Gen AI and LLM Data Privacy Ranking 2025

The rise of generative artificial intelligence (Gen AI) and large language models (LLMs) has fundamentally transformed how individuals and organizations deal with their data privacy online. While these tools may enhance productivity, most users are unaware of the complex data privacy challenges behind the scenes. As these sophisticated models become increasingly integrated into daily workflows—from content creation to code generation—the potential for unauthorized data sharing, misuse, and personal data exposure has surged faster than privacy watchdogs or assessments can keep up with.

Maintaining awareness of evolving privacy risks and data handling practices has simply become impractical for the average user. Privacy assessments of LLMs and Gen AI must cover both how these systems acquire their training data and the sensitive information exposed through users’ ongoing interactions with them.

This rapid evolution of data exposure calls for better evaluation frameworks that provide clarity and actionable insights, all in answer to a simple question: which LLMs and Gen AI platforms are best for data privacy?

To answer this, Incogni has delved deep into the most popular LLMs and developed a set of 11 criteria for assessing the data privacy risks associated with advanced machine learning programs like ChatGPT and Meta AI. The results are synthesized into a comprehensive set of privacy rankings, including an overall ranking.

Key insights

  • Le Chat by Mistral AI is the least privacy-invasive platform, with ChatGPT and Grok following closely behind. These platforms ranked highest in terms of how transparent they are about how they collect and use data, and how easy it is to opt out of having personal data used to train the underlying models.
  • Platforms developed by the biggest tech companies turned out to be the most privacy-invasive, with Meta AI (Meta) being the worst, followed by Gemini (Google) and Copilot (Microsoft). DeepSeek also ranked among the most privacy-invasive.
  • Gemini, DeepSeek, Pi AI, and Meta AI don’t seem to allow users to opt out of having prompts used to train the models.
  • ChatGPT turned out to be the most transparent about whether prompts will be used for model training and had a clear privacy policy. 
  • All investigated models collect users’ data from “publicly accessible sources,” which could include personal information.

Overall privacy ranking of “AI” platforms for 2025

Incogni’s researchers combined the results for all nine advanced machine learning programs across 11 criteria to come up with the overall privacy ranking below.

According to the researchers’ criteria, French-based Mistral AI offers the most privacy-friendly (least privacy-invasive) AI platform. While it loses some points in the transparency category, it makes up for these through its limited data collection and does well in terms of AI-specific privacy concerns.

In second place, we have OpenAI with its family of ChatGPT models. Incogni’s researchers had some concerns when it came to how the models were trained and how user data interacts with the platform’s offerings. However, ChatGPT performed best in terms of having a clear way of presenting OpenAI’s privacy policy and allowing users to understand what’s happening with their data.

xAI, known for its Grok models, came in third, due to some problems relating to transparency and the amount of data collected. Anthropic (Claude) performed similarly to xAI, but raised more concerns about how its models interact with user data.

These rankings were derived by scoring the platforms according to 11 criteria (grouped into three categories). A discussion of these categories and criteria follows.

Category 1: Training considerations

Given the nature of this research, emphasis was placed on questions related to how these models and platforms interact with personal data. To reflect this in the overall scores, ratings in this category were given increased weights, meaning that their scores are reflected more prominently in each overall score. You can learn more about this in the methodology section and the public dataset.

Criteria covering the use of user conversations to train the respective machine learning programs, the use of other data in training the models, and the facility to remove personal information from model training were each given a weight of 2 (where 3 is the maximum). The criterion covering whether prompts may be shared with entities deemed superfluous was given a weight of 3 (the maximum).

Is user data used to train models?

In most cases, users can opt out of their prompts being used to train the models, unless they specifically submit feedback. This is the case for:

  • ChatGPT
  • Copilot
  • Mistral AI
  • Grok.

Several platforms can reasonably be interpreted (based on their privacy policies and other resources) as not giving users the ability to opt out of training models with their prompts. These include:

  • Gemini
  • DeepSeek
  • Pi AI
  • Meta AI.

Anthropic, with its Claude family of models, claims that user inputs are never used to train the models. 

As will be discussed in the transparency category of privacy criteria, some of these platforms made it genuinely difficult for Incogni’s researchers to determine how private user prompts are.

Are user prompts shared with other entities?

Getting to the bottom of this question was complicated. More details are provided in the transparency section.

Based on the research, the most frequently appearing parties with which user prompts may be shared are:

  • Service providers
  • Legal and governmental bodies
  • Other businesses, upon acquisition or merger
  • Others if the user gives consent.

These were deemed reasonable by our researchers and privacy experts. However, there are several platforms that go beyond this. 

Microsoft’s privacy policy implies that user prompts may be shared with “[t]hird parties that perform online advertising services for Microsoft or that use Microsoft’s advertising technologies.” DeepSeek’s and Meta’s privacy policies indicate that prompts can be shared with companies within their respective corporate groups. Meta’s and Anthropic’s privacy policies can reasonably be understood to indicate that prompts are shared with research collaborators.

What data is used to train the models?

Any information regarding training data provided by the developers of these systems is limited at best, and a far cry from the detailed information (sometimes even including datasets) shared at the earlier stages of ChatGPT’s development.

All platforms directly or indirectly indicate that they use user feedback and publicly available data to train their models. As mentioned above, some platforms use user input as well. 

This question was not directly addressed by Inflection or DeepSeek. 

The most detailed descriptions were provided by Anthropic, OpenAI, and Meta AI, but even these are limited to a paragraph or a couple of bullet points at most.

Can user data be removed from training datasets?

As platforms like Meta start using, and xAI continues using, social media content to train their models, whether users can remove their personal data from the training data is likely to become an increasing concern. While the aforementioned examples are specific to social media, it’s likely that users’ personal data is already present in the training data for these models, whether derived from social media interactions or not.

Currently, there’s no way for users to have their personal data removed from the datasets used to train these models.

Incogni’s researchers investigated whether these platforms respect other privacy signals: namely, whether the machine learning programs under investigation respect webmasters’ wishes as communicated through robots.txt files or robots meta tags.
While we were not able to reproduce these results at the time of testing, we found reports of OpenAI, Google, Anthropic, and DeepSeek failing to respect these signals.
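
For readers who want to check such signals themselves, below is a minimal sketch using Python’s standard urllib.robotparser. The crawler user-agent tokens listed are commonly cited examples (e.g., GPTBot for OpenAI, ClaudeBot for Anthropic) and should be verified against each vendor’s current crawler documentation; this is an illustration, not the tooling Incogni used.

```python
# Minimal sketch: check whether a site's robots.txt permits given crawler
# user agents. The tokens below are commonly cited AI-crawler examples and
# should be verified against each vendor's documentation before relying on them.
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "Google-Extended", "ClaudeBot", "CCBot"]

def crawler_permissions(site: str, path: str = "/") -> dict[str, bool]:
    """Return {user_agent: allowed} for the given site and path."""
    rp = RobotFileParser()
    rp.set_url(f"{site.rstrip('/')}/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt
    return {ua: rp.can_fetch(ua, f"{site.rstrip('/')}{path}") for ua in AI_CRAWLERS}

if __name__ == "__main__":
    # A site that disallows AI-training crawlers should return False for them here.
    print(crawler_permissions("https://example.com"))
```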

Category 2: Transparency

Even if a user is willing to give up some privacy to use a so-called AI product, they should still have the ability to track what exactly is happening with their data. This category investigates how easy the platforms make it for users to understand how their data is handled.

Researchers saw this category as very important, and the relative weights assigned to its criteria reflect this: the criteria covering how readily available information is on whether prompts are used to train models, and how readable the privacy policies are, were each given a weight of 3 (out of a maximum of 3). The criterion pertaining to how easy it was to find information about how models were trained was given a weight of 2 out of 3.

How clear is it whether prompts are used for training?

Researchers noted an interesting range of levels of transparency regarding the use of user prompts for model training. A common, and likely the best, approach to being transparent about this was to make FAQs or support articles available, especially if these are searchable.

Platforms like Anthropic’s, OpenAI’s, Mistral’s, and xAI’s made this information readily available, either through a simple search on their websites or by having this information clearly presented in the privacy policy. 

Microsoft and Meta made the researchers dig through their respective websites or pick through some seemingly unrelated presentations.

When it comes to Google, DeepSeek, and Pi AI, Incogni’s researchers had to piece this information together from the privacy policies, often arriving at ambiguous or otherwise convoluted answers.

How easy is it to find information about how models were trained?

As mentioned above, information about how models were trained was very limited, if available at all. Again, there was significant variation between the platforms in terms of how discoverable this information was. 

We can summarize the results as falling into three groups:

  • Easy. OpenAI made this information the most accessible as it was clearly presented in its privacy policy. Other companies that made this information readily available were Anthropic, xAI, and Mistral.
  • Somewhat difficult. Microsoft and Meta fell into this category. For Meta, the answer required navigating through several pop-ups in the help center, while Microsoft had this information in a (possibly outdated) help article. 
  • Difficult. Google, DeepSeek, and Inflection AI made it difficult to answer this question. The way Incogni did it for the purposes of this research was to comb through their privacy policies, sometimes having to combine information from several sections. This results in a lack of certainty and transparency for the casual user.

How readable are the relevant policy documents?

Incogni’s researchers assessed the readability of the privacy policies using the Dale-Chall readability formula. They found that all the analyzed privacy policies require a college-graduate level of reading ability to understand. 
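
For reference, the (new) Dale-Chall formula combines the share of “difficult” words (those not on the Dale-Chall list of roughly 3,000 familiar words) with average sentence length; adjusted scores of about 9.0 and above are conventionally read as college level. Below is a minimal sketch of the formula with made-up counts, not the exact tooling Incogni used.

```python
def dale_chall_score(words: int, sentences: int, difficult_words: int) -> float:
    """New Dale-Chall readability score; higher means harder to read.

    "Difficult" words are those not on the ~3,000-word familiar-word list.
    Adjusted scores of roughly 9.0+ correspond to college-level reading ability.
    """
    pct_difficult = 100.0 * difficult_words / words
    raw = 0.1579 * pct_difficult + 0.0496 * (words / sentences)
    # The 3.6365 adjustment applies when more than 5% of words are difficult.
    return raw + 3.6365 if pct_difficult > 5.0 else raw

# Hypothetical policy excerpt: 4,000 words, 160 sentences, 900 "difficult" words.
print(round(dale_chall_score(4000, 160, 900), 2))  # 8.43 -> 11th-12th grade band
```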

Platforms that provide services outside their “AI” offerings, such as Meta, Microsoft, and Google, redirect those seeking privacy policies for their “AI” products to their general privacy policies, which cover multiple products. These larger platforms suffer from overly complex privacy policies, at least insofar as it’s unclear which user actions result in giving up which information. Their policies were also long and complicated to navigate.

Almost the opposite problem affects Inflection AI and DeepSeek, where all the user gets is a rather barebones privacy policy. While these might be easier to digest, they often require interpreting legal language to understand the nuances of the respective companies’ data-handling practices.

On the other hand, Incogni’s researchers found that, often, the most digestible information addressing important data-privacy questions is presented across multiple support articles. Platforms such as Anthropic’s, OpenAI’s, and xAI’s make heavy use of this. This can be very convenient if the site offers a search function where a keyword leads the user to the desired information. However, this approach runs the risk of these articles falling out of date as the relevant privacy policy is updated.

Category 3: Data collection and sharing

While data collection and sharing practices have significant consequences for personal privacy, for the purposes of this research, these criteria were not weighted as heavily as other categories.

The questions of where these “AI” platforms get information about their users and what data could be shared with third parties were both given weights of 2 (out of a maximum of 3). The other criteria were given weights of 1 out of 3.

What personal data can be shared with unessential entities?

For the purposes of this research, unessential entities are those that do not contribute to the delivery of the service (like service providers and payment processors do) or that the platforms have legal requirements to disclose information to (like law enforcement). 

Below are some highlights of what Incogni’s researchers found when investigating these platforms:

  • Research partners—personal information is disclosed to this type of entity by Meta and possibly Anthropic.
  • Members of the same corporate group—personal information is disclosed to this type of entity by Meta and DeepSeek.
  • Ill-defined affiliates—there were several instances where Incogni’s researchers noted that at least some personal information was being shared with “affiliates.” This was logged as a (partial) lack of transparency because the practical implications of these parties receiving user data are unclear.

Where do these platforms source user data?

This criterion was added to understand how data-hungry the investigated platforms are. When someone signs up for and decides to use a platform, they can reasonably assume that interacting with the platform (e.g., purchasing a premium plan) gives some data to the company. Of interest to Incogni, however, was whether these companies seek out additional information not directly given up by their users.

All platforms collected data when users signed up for the service, visited the companies’ websites, or made purchases. Furthermore, they all mention “publicly accessible sources” (or some variation thereof), meaning the models are trained on information that could include personal data.

However, Incogni’s researchers did find that:

  • Security partners are a source of personal information for ChatGPT, Gemini, and DeepSeek;
  • Marketing partners are a source of personal information for Gemini and Meta AI;
  • Financial institutions are a source of personal information for Copilot;
  • Datasets obtained through commercial agreements with third parties are a source of personal information for Claude. The lack of specificity here was concerning for Incogni’s researchers. 

Pi AI seems to draw upon the smallest variety of sources of personal information, focusing on the data users provide and that same, ambiguous “publicly accessible information.” 

Microsoft also states that it may collect information from data brokers.

What personal data do related apps collect and share?

Le Chat scored the lowest for privacy risk across both iOS and Android apps, followed closely by Pi AI. ChatGPT, coming in third, shows a slight increase in privacy risk for its iOS app compared to the aforementioned platforms. Grok, DeepSeek, Claude, and Gemini all scored similarly, with Meta AI standing out as the most invasive: it collects essentially all described data points and shares a significant number of them with third parties.

Incogni’s researchers found that several notable data points are collected from users who interact with these “AI” platforms using mobile apps:

  • Gemini and Meta AI collect precise locations and addresses
  • Gemini, Pi AI, and DeepSeek collect phone numbers.

Some notable shared data points include:

  • When it comes to its Android app, Grok indicates that it shares photos that users give it access to, as well as app interactions, with third parties.
  • Claude also shares app interactions with third parties. On top of that, email addresses and phone numbers are disclosed as being shared with third parties on Claude’s Play Store page. 
  • Meta AI shares usernames, email addresses, and phone numbers.

Notably, Microsoft claims its Android Copilot app doesn’t collect or share any data. As this was not the case for the iOS app, Incogni’s researchers altered the score to reflect that of the Copilot iOS app.

Takeaways

Having an easy-to-use, simply written support section that lets users search for answers to privacy-related questions markedly improves transparency and clarity, as long as it’s kept up to date.

Many platforms have similar data-handling practices. However:

  • Companies like Microsoft, Meta, and Google suffer from having a single privacy policy covering all of their products
  • A long privacy policy doesn’t necessarily mean it’s easy to find answers to users’ questions. 

Platforms whose privacy policies include tables that are aimed at covering GDPR or CCPA requirements are often the easiest for readers to process.

Methodology

To understand the privacy implications of the most popular AI-model-developing platforms, Incogni’s researchers created a set of criteria according to which the platforms could be assessed. Each platform was scored from 0 (the most privacy-friendly) to 1 (the least privacy-friendly) on each criterion. Weights were then applied to each criterion based on how important Incogni’s researchers and privacy experts determined it to be. The unweighted scores, along with justifications for the weighting applied and further details, are available in the public dataset.
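
The report doesn’t spell out the aggregation step beyond scores and weights, but a weighted average of the 0–1 criterion scores is the natural reading. The sketch below illustrates that kind of aggregation with invented criterion names, scores, and weights; the actual values are in the public dataset.

```python
# Illustrative aggregation of per-criterion scores (0 = most privacy-friendly,
# 1 = least privacy-friendly) into an overall score using criterion weights.
# Criterion names, scores, and weights here are invented for illustration;
# the real ones are in Incogni's public dataset.

def overall_score(scores: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted average of criterion scores; lower means more privacy-friendly."""
    total_weight = sum(weights[c] for c in scores)
    return sum(scores[c] * weights[c] for c in scores) / total_weight

example_scores = {
    "prompts_used_for_training": 0.0,          # users can opt out
    "prompts_shared_with_third_parties": 0.5,  # shared beyond essential entities
    "privacy_policy_readability": 1.0,         # college-graduate reading level
}
example_weights = {
    "prompts_used_for_training": 2,
    "prompts_shared_with_third_parties": 3,
    "privacy_policy_readability": 3,
}

print(overall_score(example_scores, example_weights))  # 0.5625
```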

The criteria were aimed at capturing three things considered important for user privacy when it comes to the use of advanced machine learning programs, like LLMs and multimodal AI platforms:

  • User data and model training: Incogni’s researchers investigated how the users’ data interacts with the base models and whether their prompts are shared with other entities.
  • Platform transparency: even if user data is handled in a heavy-handed way, a user who can make an informed decision about whether to keep interacting with an AI platform is at least not in the dark about how their data is used.
  • Data collection and sharing practices: researchers examined how much data is collected, what data is shared with which entities, and what sources of personal information the platforms draw upon.

A lot of these criteria required Incogni’s researchers to delve into the privacy policies and other legal resources provided by the platforms. The data was collected May 25 – 27. Note that some privacy policies have been updated since then. 

In order to rate iOS and Android app data collection and sharing practices, Incogni’s researchers examined the data handling practices declared for the apps on Apple’s App Store and Google’s Play Store, respectively. For every data point collected or shared, points were added to the app’s total. Incogni’s researchers penalized the collection and, especially, the sharing of sensitive and personally identifiable data.
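
As a rough illustration of this kind of points-based scoring, a sketch follows; the data-point categories and weights in it are invented for the example and are not Incogni’s actual values.

```python
# Rough illustration of a points-based app score: each collected data point adds
# points, shared data points add more, and sensitive or identifying data points
# are weighted most heavily. Categories and weights are invented for illustration.

SENSITIVE = {"precise_location", "phone_number", "email_address", "photos"}

def app_privacy_points(collected: set[str], shared: set[str]) -> int:
    """Higher totals indicate a more privacy-invasive app listing."""
    points = 0
    for dp in collected:
        points += 2 if dp in SENSITIVE else 1
    for dp in shared:
        points += 4 if dp in SENSITIVE else 2  # sharing weighs more than collecting
    return points

# Hypothetical store disclosures for a single app:
collected = {"app_interactions", "phone_number", "precise_location"}
shared = {"app_interactions", "phone_number"}
print(app_privacy_points(collected, shared))  # 11
```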

In order to attribute scores for data collection according to the privacy policies, Incogni’s researchers tried to match collected data to the categories of personal data defined in the CCPA. 

The public dataset with details about criteria and findings can be found here: public data.

Notes on data

On the Google Play Store, the developers of Microsoft’s Copilot platform claim not to collect or share any user data, while admitting to doing both on Apple’s App Store. Incogni’s researchers, believing the Google Play Store disclosure to be an error, gave Copilot the same score in the Android app ranking as it received in the Apple app ranking.

Many platforms (like Meta AI, Google Gemini, and Microsoft Copilot) lack specific privacy policies, leaving researchers with only broad, overarching privacy policies, which might overstate the extent of data collection. For example, Meta’s privacy policy details how Meta handles user information that it could only have obtained from outside of Meta AI.

In order to investigate how user data is treated, Incogni’s researchers had to rely on information specific to certain regions (e.g. the EU and California), choosing Californian privacy policies where available. This is a limitation of the research since practices specific to, for example, California might not apply to the European Union.

Incogni’s researchers generally tried to provide both company names and the names of these companies’ most popular products or models (e.g., OpenAI and ChatGPT). Meta’s most popular model, Llama, does not feature prominently in this report; instead, the product frequently appears as Meta AI, whose offerings are based on the Llama models. The researchers therefore decided to refer to the machine learning program offered by Meta as Meta AI rather than Llama.
