The Google Play Store recently introduced a data safety section to give users accessible insights into apps’ data collection practices. We analyzed the labels of 43,927 popular apps. Almost one third of the apps with a label claim not to collect any data. But we also found frequently downloaded apps, including apps meant for children, admitting to collecting and sharing highly sensitive data like the user’s sexual orientation or health information for tracking and advertising purposes. To verify the declarations, we recorded the network traffic of 500 apps and found more than one quarter of them transmitting tracking data not declared in their data safety label.
At the end of April 2022, Google launched their new data safety section for Android apps, a feature meant to give users reliable information about how apps distributed through the Play Store handle their users’ data. App developers are required to list the types of data their apps process and the purposes each data type is used for. For each data type, they also need to state whether they collect it themselves or share it with third parties. In addition, developers have to declare whether users can ask for their data to be deleted.
This information is then displayed in the Play Store as the data safety label, with the stated goal of allowing users to decide for themselves whether they are okay with an app’s privacy practices before installing it.
Google’s launch of the data safety labels follows a similar effort by Apple, which introduced comparable privacy labels for iOS in late 2020. In both cases, all information in the labels is self-declared by the app developers, and it is unclear whether and to what extent Google and Apple verify the details. There is a risk of intentionally or accidentally false declarations by developers misleading users into believing that an app is more privacy-friendly than it actually is. We previously contributed to a study on the honesty of privacy labels on iOS, which showed that some labels contain obvious inconsistencies (like claiming to collect user IDs that are not linked to the user) and that 16 % of the checked apps transmitted data not declared in their label.
A few months have now passed since the introduction of the data safety labels and many apps have provided one, so it’s time for us to look into the situation on Android.
What do the labels say?
We’ll start by getting a general overview of what the apps say in their data safety labels. For that, we want to look at the most popular apps. The Play Store compiles top charts for each category. Through the website, one can only view the top 45 apps per category, but it is possible to access the full top charts using an internal API endpoint. For the following statistics, we looked at the data safety labels of the top apps across all categories, with 43,927 apps in total (after deduplicating those appearing in multiple charts).
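Because an app can appear in the top charts of several categories, the per-category lists have to be merged before counting. A minimal sketch of that deduplication step, assuming the charts have already been fetched as lists of package names; the internal API endpoint isn’t reproduced here, and the `charts` dictionary, the category keys, and the package names are made up for illustration:

```python
def merge_top_charts(charts: dict[str, list[str]]) -> list[str]:
    """Merge per-category top charts into one list of unique app IDs.

    Keeps the order of first appearance so the result is deterministic.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for category in sorted(charts):  # sorted for reproducible output
        for app_id in charts[category]:
            if app_id not in seen:
                seen.add(app_id)
                merged.append(app_id)
    return merged


# Hypothetical example data: one app appears in two charts.
charts = {
    "GAME_PUZZLE": ["com.example.puzzle", "com.example.shared"],
    "SOCIAL": ["com.example.chat", "com.example.shared"],
}
print(len(merge_top_charts(charts)))  # 3
```

Counting on the merged list rather than on the raw charts avoids apps that rank in several categories being counted more than once.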
According to Google’s documentation, all apps were supposed to provide a data safety label by July 20, 2022. Now, one and a half months after that deadline, more than one fifth of apps (9,255) have still not provided one. These apps can no longer publish updates and “may face additional enforcement actions in the future, such as the removal of [the] app’s store listing from Google Play”.
29.8 % (10,347) of the apps that do provide one say they neither share nor collect any data, and 57.2 % (19,848) claim to at least not share any data with third parties. Those numbers sound encouraging, as many apps can indeed function entirely locally on the phone without transmitting data. But remember that those are self-declarations by the developers, and we can’t tell yet whether these claims are actually truthful.
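Note that these percentages refer to the apps that do have a label, not to all 43,927 apps. A quick check of the arithmetic, using only the counts reported above and rounding to one decimal:

```python
TOTAL_APPS = 43_927
WITHOUT_LABEL = 9_255

# Apps that provide a label form the denominator for the label statistics.
with_label = TOTAL_APPS - WITHOUT_LABEL


def share(count: int, base: int) -> float:
    """Percentage of `count` relative to `base`, rounded to one decimal."""
    return round(100 * count / base, 1)


print(with_label)                        # 34672
print(share(WITHOUT_LABEL, TOTAL_APPS))  # 21.1 (more than one fifth)
print(share(10_347, with_label))         # 29.8 (neither collect nor share)
print(share(19_848, with_label))         # 57.2 (don't share)
```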
But what about the apps that do say they process data? Here, the situation looks less privacy-friendly: The four most commonly declared data types are all typically used for tracking purposes: device IDs, crash logs, app interactions, and diagnostic data. Only after those do we see data types that some apps might actually need, like user IDs and the user’s name.
65.5 % (22,728) of apps with a data safety label self-declare to collect or share at least one data type that is only useful for tracking[1]. That’s almost all of the apps that admit to collecting or sharing any data at all! Meanwhile, only 53.8 % (18,661) self-declare to collect or share at least one data type that can be used for purposes other than tracking[2]. And nearly 10 % (3,348) only share data with third parties but don’t collect any themselves. How generous of them.
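Whether a label counts towards these numbers depends on how its declared data types are classified. A minimal sketch of such a classification, assuming a simplified, hypothetical label format that maps each data category to the set of declared actions (“collected” and/or “shared”). The non-tracking categories are the ones we list in the footnotes; as a simplifying assumption, the sketch treats every other category (in Google’s taxonomy: App activity, Web browsing, App info and performance, and Device or other IDs) as tracking-only:

```python
# Categories we consider potentially useful for purposes other than tracking.
NON_TRACKING_CATEGORIES = {
    "Location", "Personal info", "Financial info", "Health and fitness",
    "Messages", "Photos and videos", "Audio files", "Files and docs",
    "Calendar", "Contacts",
}


def classify(label: dict[str, set[str]]) -> dict[str, bool]:
    """Classify one data safety label (hypothetical format).

    `label` maps a data category to its declared actions, e.g.
    {"Device or other IDs": {"collected", "shared"}}.
    ASSUMPTION: every category not in NON_TRACKING_CATEGORIES is tracking-only.
    """
    declared = {cat for cat, actions in label.items() if actions}
    collects = any("collected" in actions for actions in label.values())
    shares = any("shared" in actions for actions in label.values())
    return {
        "no_data": not declared,
        "tracking_type": any(c not in NON_TRACKING_CATEGORIES for c in declared),
        "other_type": any(c in NON_TRACKING_CATEGORIES for c in declared),
        "shares_only": shares and not collects,
    }


labels = [
    {},  # claims to neither collect nor share anything
    {"Device or other IDs": {"shared"}, "App activity": {"shared"}},
    {"Personal info": {"collected"}, "App info and performance": {"collected"}},
]
results = [classify(label) for label in labels]
print(sum(r["tracking_type"] for r in results))  # 2
print(sum(r["shares_only"] for r in results))    # 1
```

Summing these flags over all labels and dividing by the number of apps with a label yields percentages of the kind reported above.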
The picture stays the same when looking at the purposes the labels give for the collected data types: Analytics is the most commonly declared purpose, followed by App functionality and Advertising or marketing.
In addition to listing the data types and purposes, apps also need to declare whether users can request deletion of their data. We should expect this to be the case for all apps, considering that the GDPR grants users a right to erasure (Art. 17 GDPR). Nonetheless, 27.2 % (9,428) of apps with a label say that users cannot request deletion, though most of them at least declare that they neither collect nor share any data. Excluding those, 5.5 % (1,911) say that they collect and/or share data but users cannot request its deletion.
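Excluding the apps that declare no data at all matters here: an app that processes nothing has nothing to delete. A minimal sketch of that filter, using a hypothetical, simplified `Label` record with a `deletion_possible` flag (not the format of our published data set):

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    """Hypothetical, simplified view of one data safety label."""
    app_id: str
    collected: set[str] = field(default_factory=set)
    shared: set[str] = field(default_factory=set)
    deletion_possible: bool = False


def no_deletion_despite_data(labels: list[Label]) -> list[str]:
    """Apps that collect and/or share data but say users can't request deletion."""
    return [
        label.app_id
        for label in labels
        if not label.deletion_possible and (label.collected or label.shared)
    ]


labels = [
    Label("com.example.offline"),  # no data and no deletion: excluded
    Label("com.example.tracker", shared={"Device or other IDs"}),
    Label("com.example.shop", collected={"Name"}, deletion_possible=True),
]
print(no_deletion_despite_data(labels))  # ['com.example.tracker']
```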
While looking at the data safety labels, we noticed a worrying number of apps declaring that they collect or even share highly sensitive data, including information about their users’ sexual orientation, political or religious beliefs, and health, for tracking or advertising purposes. Remember that these are self-declarations by the app developers, not allegations by us or third parties. The app developers themselves seem to have no problem with admitting to this incredibly problematic data use.
Here are just a few examples of well-known apps with many downloads doing this[3]:
- Facebook collects political or religious beliefs, sexual orientation, and health info for analytics purposes
- Amazon Shopping collects health info for analytics purposes
- Roblox collects sexual orientation for analytics purposes and shares it for analytics, and advertising or marketing purposes
- SoundCloud: Play Music & Songs shares sexual orientation for advertising or marketing purposes
- My Little Pony: Magic Princess collects sexual orientation for analytics, and advertising or marketing purposes and shares it for advertising or marketing purposes
- FarmVille 2: Country Escape collects sexual orientation for advertising or marketing purposes
- 9GAG: Funny GIF, Meme & Video shares sexual orientation for analytics purposes
- Zalando Lounge - Shopping Club collects and shares sexual orientation for analytics, and advertising or marketing purposes
- momox: Bücher & mehr verkaufen collects and shares sexual orientation for advertising or marketing purposes
- nebenan.de - your social network for neighbours collects sexual orientation for advertising or marketing purposes
It’s unclear whether all the apps actually use the data in this way. But even if these were overzealous “just-in-case” declarations made because developers don’t know what the trackers included in their apps do, it shows a concerning disregard for their users’ privacy.
It is unclear why any of them would need to process this data in the first place, let alone for tracking or advertising purposes. This is especially true considering that all these data types fall under the “special categories of personal data” for which the GDPR affords additional protections (Art. 9 GDPR). Some companies like to claim a legitimate interest (Art. 6(1)(f) GDPR) for tracking to avoid having to ask the user for consent. That practice is questionable even for non-sensitive data, and it is definitely not applicable to special categories of personal data.
Especially shocking: Some of the apps listed above are explicitly and exclusively targeted at children. The GDPR rightfully recognizes that children need even stricter protection with regard to their personal data (Recital 38 GDPR) and thus sets even higher requirements for processing their data. Collecting and even sharing special categories of personal data about children for analytics or advertising purposes is absolutely unacceptable.
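Declarations like the ones in the list above can be found by filtering the labels for special-category data types combined with tracking-related purposes. A minimal sketch of such a filter; the `Declaration` record and the sample apps are hypothetical, while the data type and purpose names follow the wording used in the labels:

```python
from typing import NamedTuple

# Special-category data types discussed in this post.
SENSITIVE_TYPES = {"Sexual orientation", "Political or religious beliefs", "Health info"}
TRACKING_PURPOSES = {"Analytics", "Advertising or marketing"}


class Declaration(NamedTuple):
    """One (hypothetical) row: an app declaring one data type."""
    app: str
    data_type: str
    action: str  # "collected" or "shared"
    purposes: frozenset[str]


def worrying(decls: list[Declaration]) -> list[Declaration]:
    """Keep declarations of special-category data used for analytics or ads."""
    return [
        d for d in decls
        if d.data_type in SENSITIVE_TYPES and d.purposes & TRACKING_PURPOSES
    ]


decls = [
    Declaration("Example Game", "Sexual orientation", "shared",
                frozenset({"Advertising or marketing"})),
    Declaration("Example Game", "Email address", "collected",
                frozenset({"App functionality"})),
    Declaration("Example Fitness", "Health info", "collected",
                frozenset({"App functionality"})),
]
for d in worrying(decls):
    print(d.app, d.action, d.data_type, sorted(d.purposes))
```

Health info declared only for App functionality is deliberately not flagged: the concern here is the combination of sensitive data with tracking or advertising purposes.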
Checking labels against actual traffic
Finally, we ran a traffic analysis on the top 500 apps overall[4] to check the truthfulness of the declarations in the labels. We installed and started each app in an Android emulator and let it run for a minute without any user input. In the background, we recorded the entire network traffic.
Here’s an overview of the data types we observed being transmitted:
We can see apps commonly transmitting device parameters like Android version, phone model, screen size, carrier, battery status, and volume. As we didn’t interact with the apps at all, it’s not surprising that there is hardly any traffic related to actual app functionality; most of it is tracking and advertising traffic. But it is worth noting that even benign data types like app ID and version or screen size are usually transmitted together with a unique ID for the user or device (i.e. pseudonymously)[5], making them personal data under the GDPR (Recital 26(2) GDPR).
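As defined in the footnotes, we consider a request pseudonymous if it contains a unique identifier such as the Google Advertising ID (including hashed forms), the user’s public IP address, or a tracker-specific unique ID. A minimal sketch of the first two checks on a request’s path or body; the function names are hypothetical, the set of hash algorithms is an assumption, and tracker-specific IDs are not covered because detecting them requires knowledge about each tracker:

```python
import hashlib


def identifier_variants(ad_id: str) -> set[str]:
    """Return the advertising ID plus some hashed forms of it.

    ASSUMPTION: trackers hash the ID with MD5, SHA-1 or SHA-256, in lower
    or upper case. Other encodings would be missed by this sketch.
    """
    variants = {ad_id.lower()}
    for form in {ad_id.lower(), ad_id.upper()}:
        for algo in ("md5", "sha1", "sha256"):
            variants.add(hashlib.new(algo, form.encode()).hexdigest())
    return variants


def is_pseudonymous(request_text: str, ad_id: str, public_ip: str) -> bool:
    """Check whether a request's path/body contains the ad ID or the public IP.

    Tracker-specific unique IDs are NOT detected by this sketch.
    """
    haystack = request_text.lower()  # hex digests may appear in upper case
    needles = identifier_variants(ad_id)
    if public_ip:  # an empty string would match every request
        needles.add(public_ip)
    return any(needle in haystack for needle in needles)


ad_id = "38400000-8cf0-11bd-b23e-10b96e40000d"  # example ID in AAID format
body = "aid=" + hashlib.sha256(ad_id.encode()).hexdigest() + "&screen=1080x2340"
print(is_pseudonymous(body, ad_id, "203.0.113.7"))                # True
print(is_pseudonymous("screen=1080x2340", ad_id, "203.0.113.7"))  # False
```

Simple substring matching is crude, but it is deterministic and easy to audit, which is what matters when the result is used to flag requests as pseudonymous.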
We can now compare the recorded network traffic with the declarations in the data safety labels. Of course, we can only check a small subset of the possible data types since we didn’t interact with the apps at all. Similarly, we can only make definitive statements about data we did see being transmitted; not observing a data type in the traffic doesn’t mean that the app never transmits it. Also note that Google’s requirements are less strict than the GDPR’s definition of “processing”. For example, according to Google’s policies, data that is sent to a server but deleted immediately after the request has been handled doesn’t need to be declared as “collected data”. We don’t (and can’t) account for these exceptions in our automated analysis.
Keeping that in mind, most of the declarations were correct, at least as far as we could see, but we did also observe missing declarations. Most notably, more than one quarter of apps transmitted tracking data[6] that they didn’t declare. A handful of apps transmitted the user’s location without declaring it. Additionally, 5.7 % and 6.3 % of apps contacted known tracking and advertising servers, respectively, without declaring the corresponding purpose anywhere in their label.
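At its core, this check is a set difference between the data types observed in the traffic and those declared in the label. A minimal sketch, using a small excerpt of the mapping from observed information to Google’s data types described in the footnotes; the observation keys and the function name are hypothetical:

```python
# Excerpt of the mapping from observed information to label data types.
OBSERVATION_TO_DATA_TYPE = {
    "battery_percentage": "Diagnostics",
    "network_connection_type": "Diagnostics",
    "carrier": "Other app performance data",
    "device_name": "Other app performance data",
    "advertising_id": "Device or other IDs",
    "public_ip": "Device or other IDs",
}

TRACKING_DATA_TYPES = {"Diagnostics", "Other app performance data", "Device or other IDs"}


def undeclared_tracking_types(observed: set[str], declared: set[str]) -> set[str]:
    """Tracking data types seen in the traffic but missing from the label."""
    seen_types = {
        OBSERVATION_TO_DATA_TYPE[o] for o in observed if o in OBSERVATION_TO_DATA_TYPE
    }
    return (seen_types & TRACKING_DATA_TYPES) - declared


observed = {"advertising_id", "carrier", "screen_size"}
declared = {"Device or other IDs", "Crash logs"}
print(undeclared_tracking_types(observed, declared))  # {'Other app performance data'}
```

As noted above, an empty result only means that nothing undeclared showed up in the one-minute recording; it doesn’t prove the label is complete.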
These results are in line with what we previously saw for iOS privacy labels. These labels can be a helpful tool for making important information about data collection practices, previously buried in privacy policies, approachable and easier to grasp for users. But if the labels are solely based on self-declarations by app developers, they can also dangerously misrepresent the actual data collection, misleading users into wrongly believing that apps are privacy-friendly even when they aren’t.
But the declarations in the labels also highlight the vast collection of tracking and advertising data that is worryingly ubiquitous across the web and mobile, and that sometimes concerns data that is completely inappropriate to collect. Disclosing these practices is not enough. Tracking practices need to be significantly dialed back and, at the very least, users need to be given a genuine and informed choice in the matter, as the GDPR already requires.
Analysis data set and source code
The data safety labels that the analysis in this post is based on were downloaded on September 7, 2022. We are publishing our full data set, including the recorded network traffic. We also provide a separate CSV file with just the worrying declarations described above.
The source code for the analysis is available on GitHub.
[2] We consider the following categories of data types potentially useful for purposes other than tracking: Location, Personal info, Financial info, Health and fitness, Messages, Photos and videos, Audio files, Files and docs, Calendar, and Contacts.
[4] While we ran the analysis on all 500 apps, it was only successful for 442 of them. Of the remaining ones, seven could not be downloaded for our emulator due to specific device requirements, and 51 crashed during the traffic recording.
[5] We consider the data in a request pseudonymous if the request contains at least one unique identifier for the device or user, namely the device’s Google Advertising ID (including hashed forms thereof), the user’s public IP address, or a tracker-specific unique ID.
[6] By “tracking data”, we mean the data types Diagnostics, Other app performance data, and Device or other IDs. Google doesn’t clearly define what falls under those. For the purposes of this analysis, we consider the following information as falling under the respective type:
- Diagnostics: roaming status, is device rooted?, is device an emulator?, network connection type, WiFi and cellular signal strength, charging status, battery percentage, sensor data (accelerometer, rotation), RAM usage, disk usage, uptime, volume
- Other app performance data: device name, carrier, local IPs, BSSID
- Device or other IDs: Google advertising ID, hashed Google advertising ID, IMEI, MAC address, public IP address (included in the request path or body), other unique user, session, or device IDs