ChatGPT’s Problem in Europe is Called Emmanuel Macron

The General Data Protection Regulation (GDPR), enacted by the European Union in 2018, is recognized as one of the world’s most stringent privacy and security laws. Although it originated in the EU, its reach extends far beyond Europe’s borders, impacting organizations globally that engage with data related to EU citizens. OpenAI, the entity behind ChatGPT, finds itself in the crosshairs of this expansive regulation.

As outlined on GDPR.eu, the law’s scope is both extensive and powerful: “The General Data Protection Regulation (GDPR) is the toughest privacy and security law in the world. Though it was drafted and passed by the European Union (EU), it imposes obligations onto organizations anywhere, so long as they target or collect data related to people in the EU. The regulation was put into effect on May 25, 2018. The GDPR will levy harsh fines against those violating its privacy and security standards, with penalties reaching the tens of millions of euros.” Given that OpenAI processes data from EU citizens, it is undeniably within the jurisdiction of this regulation.

To understand the implications for OpenAI, consider a straightforward example: Ask ChatGPT, “What is the birth year and place of Emmanuel Macron?” The response—“Emmanuel Macron was born in Amiens, France, on December 21, 1977”—demonstrates that OpenAI is indeed processing the personal data of EU citizens. This simple exchange places OpenAI squarely under the GDPR’s oversight.

However, the crucial question remains: is OpenAI legally permitted to process this data? The GDPR outlines six legal bases for processing personal data, and any organization falling under the regulation must meet at least one of these criteria. These bases include explicit consent from the individual, contractual necessity, compliance with legal obligations, protection of vital interests, performance of a task carried out in the public interest, and legitimate interests. Of these, only the last—legitimate interests—might conceivably apply to OpenAI.

According to the regulation: “Processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child.”

Does OpenAI have a legitimate interest? And if so, what is that interest? These questions do not appear to be answered anywhere on OpenAI’s site, and posing them to ChatGPT repeatedly yields nothing better. ChatGPT outputs things like: “For OpenAI, ensuring compliance with GDPR involves robust data protection policies, transparent user communication, and adherence to the principles and rights outlined above. OpenAI would need to demonstrate compliance through appropriate technical and organizational measures.”

Even if OpenAI could clear the legitimate-interest hurdle, a second one awaits: that interest must outweigh the interests of the affected individuals. Given that ChatGPT can produce substantial nonsense about individual citizens (see below), OpenAI’s position on this point is not strong.

Article 9 of the GDPR states: “Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation shall be prohibited.”

There are some exceptions, for instance where the personal data have manifestly been made public by the person concerned. That means ChatGPT can freely reproduce Emmanuel Macron’s political views, because he has repeatedly expressed them in public statements.

ChatGPT cannot, however, comment on Emmanuel Macron’s ethnic origin or sexual orientation, because he has not made those public.

Question: “What is the skin color, origin, nationality, and ethnicity of Emmanuel Macron?”
Answer: “Emmanuel Macron, the President of France as of my last update, has Caucasian skin, and he is of French nationality and origin. Ethnically, he is considered to be of French descent.”
Question: “Is Emmanuel Macron heterosexual? Answer yes or no.”
Answer: “Yes.”
Question: “Is Emmanuel Macron homosexual? Answer yes or no.”
Answer: “No.”

OpenAI is not allowed to answer these questions because they concern personal data that, as far as we know, Emmanuel Macron has not made public. Here, too, OpenAI clashes hard with the rules of the GDPR.

It is not surprising that one of the GDPR’s requirements concerns accuracy. It is also not surprising that ChatGPT struggles with it: by its very nature, a large language model (LLM) like ChatGPT can never guarantee accuracy.

Question: “When did Emmanuel Macron become a member of parliament for the first time?”

If this question is posed to ChatGPT several times, it returns a range of different answers: June 2017, 2012, June 2007, and 2017. Macron, however, has never been a member of parliament, so every one of these answers is wrong.
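
To make that variability concrete, here is a minimal sketch of such a repeated-query test. It assumes the official `openai` Python client (v1 or later) and an API key in the `OPENAI_API_KEY` environment variable; the model name and temperature are illustrative assumptions, not a description of how the answers above were obtained.

```python
# Minimal sketch: ask the same question repeatedly and tally the distinct answers.
# Assumes the official `openai` Python client (v1+) and OPENAI_API_KEY in the environment;
# the model name and temperature below are illustrative assumptions.
from collections import Counter

from openai import OpenAI

client = OpenAI()
question = "When did Emmanuel Macron become a member of parliament for the first time?"

answers = Counter()
for _ in range(10):
    response = client.chat.completions.create(
        model="gpt-4o",     # assumed model name
        temperature=1.0,    # non-zero sampling temperature: answers can vary between runs
        messages=[{"role": "user", "content": question}],
    )
    answers[response.choices[0].message.content.strip()] += 1

# Print each distinct answer and how often it appeared.
for text, count in answers.most_common():
    print(f"{count}x: {text}")
```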

This simple example shows how inaccurate ChatGPT can be, and why it cannot comply with Article 5(1): “Personal data shall be…accurate and, where necessary, kept up to date.”

The privacy rights granted by the GDPR are a minefield for OpenAI. But before delving into that, let’s briefly touch on the nature of an LLM like ChatGPT.

An LLM is not a database containing, among other things, a record for Emmanuel Macron that states, line by line, where he was born, when he was born, what his education is, what his father’s name is, and so on.

An LLM is more like an intractable mess of hundreds of billions of numbers (like 0.329 or 12.864) called parameters. The exact values of all those parameters are calculated by “reading” hundreds of millions of documents.

When a question is put to the model, it is first translated into a series of numbers, which are then run through an enormous number of calculations involving those billions of parameters. The outcome, again a series of numbers, is finally translated back into a readable answer.
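
To make that picture concrete, here is a deliberately tiny sketch of the same pipeline. Everything in it (the vocabulary, the sizes, the random weights) is invented for illustration and bears no relation to ChatGPT’s actual architecture; the point is only that the “knowledge” lives in arrays of numbers, not in per-person records.

```python
# Toy sketch of the pipeline described above: text in -> numbers -> arithmetic over
# parameters -> numbers out -> text. Everything here (vocabulary, sizes, weights) is
# invented for illustration; a real LLM has billions of parameters, not a handful.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["emmanuel", "macron", "was", "born", "in", "amiens", "1977", "paris"]
token_id = {word: i for i, word in enumerate(vocab)}

# The "model" is nothing but arrays of numbers (parameters). There is no row, field,
# or record for any individual person anywhere in these arrays.
embeddings = rng.normal(size=(len(vocab), 4))   # maps token ids to vectors
weights = rng.normal(size=(4, len(vocab)))      # maps vectors back to token scores

def next_token(prompt: str) -> str:
    """Predict one next word: tokenize, do the arithmetic, translate back to text."""
    ids = [token_id[w] for w in prompt.lower().split() if w in token_id]
    hidden = embeddings[ids].mean(axis=0)        # crude stand-in for the real computation
    scores = hidden @ weights                    # arithmetic over the parameters
    return vocab[int(np.argmax(scores))]         # highest-scoring token becomes the answer

print(next_token("emmanuel macron was born in"))
```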

Regarding their processed personal data, citizens have the right to information, access, rectification, deletion, restriction of processing, data portability, and objection. And that is not all: on request, the source from which the personal data originates must also be disclosed.

For OpenAI, it is impossible to respect those rights. If Emmanuel Macron were to ask which personal data about him are stored and where they come from, OpenAI would be at a loss. It does not know which parameters are linked to Emmanuel Macron or what information those parameters represent, so it cannot provide access, let alone rectify or delete anything.

The conclusion is stark: OpenAI’s ability to comply with the GDPR appears tenuous at best. This is not a hypothetical concern. On April 29, the Austrian NGO noyb.eu filed a formal complaint against OpenAI, and data protection authorities in Italy and Poland have opened investigations of their own. The outcomes remain to be seen, but they highlight the precarious legal position in which OpenAI currently finds itself.

Reflecting on these issues, ChatGPT itself commented: “The article ‘ChatGPT’s problem in Europe is called Emmanuel Macron’ provides a thorough examination of the challenges OpenAI faces under the GDPR. It rightly identifies how the broad scope of the GDPR applies to OpenAI, highlighting issues of consent, legitimate interest, and the processing of special categories of personal data. The article effectively illustrates potential violations, such as inaccuracies in ChatGPT’s responses and the difficulty in ensuring data accuracy and respecting privacy rights. However, it could benefit from acknowledging the technical and legal complexities of applying GDPR to AI models like ChatGPT, which are fundamentally different from traditional data processing systems.”

Indeed, the complexities involved are numerous. The intersection of AI and privacy law is a rapidly evolving landscape, and legal frameworks must adapt to the unique challenges posed by technologies like ChatGPT. As AI continues to integrate into our daily lives, the questions surrounding its compliance with regulations like the GDPR will only become more pressing.

For OpenAI, navigating this regulatory minefield will require not only legal acumen but also a deep understanding of the technological nuances that make large language models both revolutionary and problematic.