With the tweasel project, we want to build a web app that detects privacy violations in mobile apps on Android and iOS. Users can select an app from the app stores and we will analyze its network traffic and consent dialogs. We will show a report to the user and offer to generate a complaint under the GDPR and ePrivacy Directive with the collected evidence. Lorenz and I are working on this thanks to NLnet funding.

To keep you up to date on everything we’re doing, we’ll start doing biweekly update posts, where we go into the progress we’ve made and features we’ve added to our tools and libraries, but also any interesting technical challenges we’ve solved. This first one is going to be a bit longer, since we have some catching up to do. Strap in.

Appstraction

Appstraction is an abstraction layer for common instrumentation functions on Android and iOS. It allows you to install, uninstall, start, and stop apps and configure their permissions, as well as manage device settings like emulator snapshots, clipboard, proxy, and certificates. Appstraction can also be used for purposes other than mobile privacy.

Cyanoacrylate

Cyanoacrylate is a toolkit for large-scale automated traffic analysis of mobile apps on Android and iOS. It uses mitmproxy to capture the HTTP(S) traffic of apps in HAR format and appstraction to instrument physical devices or, on Android, emulators. Cyanoacrylate automatically handles the management of certificate authorities and the WireGuard mitmproxy setup. It is designed to analyze the tracking behavior of mobile apps.

  • The first version of cyanoacrylate was released at the end of March. It featured a fully automatic mitmproxy setup, Android emulator control and Python environment installation. We are using the har_dump.py script to export the traffic from mitmproxy as a .har file and Lorenz wrote a mitmproxy script to communicate its events to JavaScript. This version only supported Android.

  • In version 0.2.0, we make use of WireGuard’s ability to tunnel only the traffic of specific apps, and allow you to configure the WireGuard app filtering in the options. By default, if you start a traffic collection during an app analysis, we only collect that app’s traffic. That way, you don’t have to worry about filtering out background traffic anymore.

    I implemented this by manipulating the internal config files of the WireGuard app on Android.

  • In version 0.3.0, Lorenz implemented support for traffic collection on iOS devices. This currently uses an HTTP(S) proxy (unlike on Android, where we use WireGuard) and cannot filter the traffic of individual apps. Instead, we currently always record the entire system’s traffic.

  • Finally, with version 0.4.0, we added Windows support for cyanoacrylate and simplified the setup a little.
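
To give an idea of what per-app filtering looks like at the config level: the WireGuard Android app accepts `IncludedApplications` and `ExcludedApplications` keys in the `[Interface]` section of a tunnel config. The following is only a minimal sketch of rewriting such a config so that a single app is tunneled. The function name `setIncludedApplications` and the sample config are made up for illustration, and this is not necessarily how cyanoacrylate edits the app's internal files.

```javascript
// Hypothetical helper, NOT cyanoacrylate's actual code: restricts a WireGuard
// tunnel config (wg-quick style text) to the given Android app IDs.
function setIncludedApplications(configText, appIds) {
    const out = [];
    let inInterface = false;
    let inserted = false;

    for (const line of configText.split('\n')) {
        const trimmed = line.trim();

        if (trimmed.startsWith('[')) {
            // A new section starts. Remember whether it is [Interface].
            inInterface = trimmed.toLowerCase() === '[interface]';
            out.push(line);
            if (inInterface && !inserted && appIds.length > 0) {
                out.push(`IncludedApplications = ${appIds.join(', ')}`);
                inserted = true;
            }
            continue;
        }

        // Drop any app filters that were already set on the interface.
        if (inInterface && /^(included|excluded)applications\s*=/i.test(trimmed)) continue;

        out.push(line);
    }

    return out.join('\n');
}

// Sample config with placeholder keys (assumed values, for illustration only).
const sampleConfig = [
    '[Interface]',
    'PrivateKey = <private key>',
    'Address = 10.0.0.1/32',
    'ExcludedApplications = com.example.other',
    '',
    '[Peer]',
    'PublicKey = <public key>',
    'AllowedIPs = 0.0.0.0/0',
].join('\n');

console.log(setIncludedApplications(sampleConfig, ['de.check24.check24']));
```

With a config like this, only the listed app's traffic goes through the tunnel and therefore through mitmproxy, which is why no background traffic needs to be filtered out afterwards.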

TrackHAR and trackers.tweasel.org

TrackHAR is a library for detecting tracking data transmissions from traffic in HAR format. It uses custom adapters to handle different tracking endpoints and extract the transmitted data. TrackHAR also aims to produce outputs that can be used to generate human-readable documentation of the tracking data. This documentation is hosted at trackers.tweasel.org, a wiki that explains how TrackHAR recognizes and decodes the requests, and provides some sample information from research data.

  • TrackHAR had its first release in April. With that, we have laid down the design and schema for the adapters and implemented the basic functionality. Most of the adapters from my master’s thesis have been ported over but have received only limited additional testing and checking so far. Also, the documentation for the containedDataPaths is still lagging behind what we are aiming for.

  • The adapter-based matching approach TrackHAR primarily uses necessarily means that a significant portion of requests will be unprocessed (as we can’t write an adapter for every possible endpoint, especially developer-/app-specific ones). To alleviate that somewhat, I implemented indicator matching as an (optional) fallback. With indicator matching, the user can provide an object that maps data types to honey data like this:

    {
        localIp: ['10.0.0.2', 'fd31:4159::a2a1'],
        idfa: '6a1c1487-a0af-4223-b142-a0f4621d0311'
    }
    

    TrackHAR then searches for these values in the requests. In addition to string matching in plain text, we also support searching in base64- and URL-encoded text. Support for additional encodings and hashes is planned.

  • In April, we also launched Lorenz' initial implementation of trackers.tweasel.org. This documentation is generated completely automatically from the adapters in TrackHAR. We are even creating a human-readable description of the decoding steps. I also included static example values of the actual data transmitted to the tracking endpoints based on the data from my master’s thesis. Ultimately, we want to have a constantly-updated public database of tracking requests and dynamically list examples of observed values for each data path.
    We hope that this will become a valuable resource for people who want to dig deeper into tracking.
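
The indicator matching described above can be sketched roughly as follows. This is a simplified illustration and not TrackHAR's actual implementation: `findIndicators` and `encoders` are hypothetical names, and the base64 check only finds values that were encoded on their own, not values embedded in a larger base64-encoded payload.

```javascript
// Hypothetical sketch of indicator matching, NOT TrackHAR's real API.
// Each encoder turns a honey data value into the form it may take in a request.
const encoders = {
    plain: (value) => value,
    base64: (value) => btoa(value),
    url: (value) => encodeURIComponent(value),
};

function findIndicators(requestText, indicators) {
    const matches = [];

    for (const [dataType, honeyValues] of Object.entries(indicators)) {
        // A data type can map to a single string or to an array of strings.
        for (const value of [honeyValues].flat()) {
            const seenNeedles = new Set();

            for (const [encoding, encode] of Object.entries(encoders)) {
                const needle = encode(value);
                // Some encodings leave a value unchanged. Don't report those twice.
                if (seenNeedles.has(needle)) continue;
                seenNeedles.add(needle);

                const index = requestText.indexOf(needle);
                if (index !== -1) matches.push({ dataType, value, encoding, index });
            }
        }
    }

    return matches;
}

// The honey data from the example above.
const indicators = {
    localIp: ['10.0.0.2', 'fd31:4159::a2a1'],
    idfa: '6a1c1487-a0af-4223-b142-a0f4621d0311',
};

// A made-up request body with a URL-encoded IP address and a base64-encoded IDFA.
const body = `ip=${encodeURIComponent('fd31:4159::a2a1')}&id=${btoa('6a1c1487-a0af-4223-b142-a0f4621d0311')}`;

console.log(findIndicators(body, indicators));
```

Each match records the data type, the encoding it was found in, and the position in the request, which is enough to point a human reviewer to the relevant part of an otherwise unprocessed request.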

CLI

Tweasel CLI is a command-line tool that allows you to instrument and analyze mobile apps and their traffic using the tweasel project libraries. You can record the traffic of an Android or iOS app in HAR format (based on cyanoacrylate), and detect tracking data transmissions from the traffic (based on TrackHAR). Tweasel CLI provides a convenient wrapper around these libraries for common use cases, so you don’t have to write any code.

  • In April, we also released the first version of our CLI (the implementation of which was more painful than it should have been…). This initial release supports two commands:

    With record-traffic, you can record the traffic of an Android or iOS app in HAR format. Through command line arguments, you can configure various aspects of the traffic collection like a timeout and whether to record only the traffic of one app or the entire system.

    With detect-tracking, you can then detect tracking data transmissions from traffic in HAR format (whether recorded with a tweasel tool or otherwise). The traffic in the specified HAR file will be analyzed using TrackHAR. The detected tracking data can be output as JSON or as a human-readable table:

    Screenshot of running the detect-tracking command on a HAR file recorded from de.check24.check24.har. Two POST requests are shown, with a table of the detected data transmissions underneath, each with a property, context, path, and value. The first request is to app.adjust.net.in and transmitted appId, appVersion, idfa, otherIdentifiers, language, model, osName, osVersion, country, manufacturer, screenWidth, and screenHeight. The second request is to app-measurement.com and transmitted appId, appVersion, idfa, osName, and osVersion.
  • Since then, I made two more changes that are not released yet (both requested by Malte):

    • I’ve implemented an “interactive timeout”. If the user doesn’t provide an explicit --timeout flag, we wait until they manually stop the traffic recording. I think the CLI is more likely to be used for manual analysis, so this makes more sense as a default.

      I also added support for multiple traffic collections. With a new --multiple-collections flag, after each time the user stops an interactive timeout, we ask them to enter a name to start a new traffic collection or leave it empty to stop. This is really useful for analyzing apps with consent dialogs. This way, you can easily do a manual analysis and record the traffic from before and after an interaction with the consent dialog separately.

    • I also displayed the “setting up” steps more granularly:

      Screenshot of the output of the record-traffic command, showing granular substeps for the first 'Setting up…' step that is displayed as in progress: 'Starting analysis…' (done), 'Checking tracking domain resolution…' (skipped), 'Waiting for device…' (in progress), 'Checking device connecting and setting up…', 'Starting app analysis…' (last two not started yet). The following first-level steps are: 'Installing app…', 'Starting app…', 'Collecting traffic…', 'Stopping app…', 'Cleaning up…'
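
The control flow behind multiple traffic collections, as described above, can be sketched like this. It is a hedged illustration only, not the CLI's actual code: `recordMultipleCollections`, `recordUntilStopped`, `askForName`, and the default name `initial` are all hypothetical, and the recording and prompting functions are passed in by the caller so the loop itself stays independent of any real device or terminal handling.

```javascript
// Hypothetical sketch of the multiple-collections loop, NOT the real CLI code.
// `recordUntilStopped(name)` is assumed to resolve with the recorded traffic
// once the user manually stops the recording (the "interactive timeout").
// `askForName()` is assumed to resolve with whatever the user typed.
async function recordMultipleCollections({ recordUntilStopped, askForName, firstName = 'initial' }) {
    const collections = [];
    let name = firstName;

    // Keep recording until the user leaves the name empty.
    while (name) {
        const har = await recordUntilStopped(name);
        collections.push({ name, har });
        name = (await askForName()).trim();
    }

    return collections;
}

// Example with stand-ins: the user records once before interacting with a
// consent dialog, starts a second collection afterwards, then stops.
const answers = ['after-consent', ''];
recordMultipleCollections({
    recordUntilStopped: async (name) => ({ log: { entries: [], comment: name } }),
    askForName: async () => answers.shift(),
}).then((collections) => console.log(collections.map((c) => c.name)));
```

Keeping each collection as its own named result is what makes it easy to compare the traffic from before and after an interaction with the consent dialog.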

Everything else

  • Just at the end of last year, we gave a talk at the FireShonks year-end event. We talked about how mobile apps track us and what data they send to third parties. We showed how we analyzed thousands of apps automatically and what we found out. We also explained the legal framework of tracking in the EU and why most apps and consent dialogs don’t comply with it. The talk was recorded (it was in German but there is an English live dub available).
  • We also have a parse-tunes library for fetching select data on iOS apps from the Apple App Store via undocumented internal iTunes APIs. I wrote a Mastodon thread on that already back in January.
  • Our explicit goal is to make our libraries and tools not just for us, but also for other NGOs, data protection authorities, researchers, etc. In April, we gave a presentation before the tech advisory board of the European Data Protection Board (EDPB) about our results and the tools we developed for mobile app tracking research. The meeting was not recorded but our slides are of course available.
    We’ll also be giving a training course on how to use our tools for a German authority next month. And we’re already in contact with two other organizations fighting against tracking to work together on this issue. If you’re also interested in collaborating, please reach out! We are more than happy to help you use our tools, implement feature requests, etc.
written by Benjamin Altpeter
on
licensed under: Creative Commons Attribution 4.0 International License