Meta will train AI models using EU user data


Meta has confirmed plans to use public content shared by its adult users in the European Union (EU) to train its AI models.

The move aims to improve the functionality of its AI systems and their cultural understanding of the region's people, following the recent rollout of Meta AI features in Europe.

In a statement, Meta wrote: “Today, we’re announcing our plans to train AI at Meta using public content, like public posts and comments shared by adults on our products in the EU.

“People’s interactions with Meta AI, like questions and queries, will also be used to train and improve our models.”

Beginning this week, users of Meta’s platforms, including Facebook, Instagram, WhatsApp, and Messenger, will receive in-app and email notifications explaining how their data will be used. Users will also be given information on what public data is used and access to an objection form.

“We have made this objection form easy to find, read, and use, and we’ll honour all objection forms we have already received, as well as newly submitted ones,” Meta explained.

Meta made it clear that certain kinds of data will remain off-limits for AI training and will not be repurposed.

The social media company stated that it would not “use individuals’ private and personal communications with friends and family” to develop its generative AI systems. In addition, publicly available information from the accounts of EU users under the age of eighteen will not be part of the training data.

Meta wants to build AI tools designed for EU users

Meta positions this initiative as a necessary step towards creating AI tools designed for EU users. The company launched its AI chatbot functionality across its messaging apps in Europe last month, and it frames this data usage as the next phase in improving the service.

“We believe we have a responsibility to build AI that’s not just available to Europeans but is built for them,” the company explained.

“That means everything from dialects and colloquialisms to hyper-local knowledge and the distinct ways different countries use humour and sarcasm on our products.”

This becomes increasingly pertinent as AI models evolve with multimodal capabilities spanning text, voice, video, and imagery.

Meta also situated its actions in the EU within the broader industry landscape, pointing out that training AI on user data is a common practice.

“It’s important to note that the kind of AI training we’re doing is not unique to Meta, nor will it be unique to Europe,” the statement reads.

“We’re following the example set by others, including Google and OpenAI, both of which have already used data from European users to train their AI models.”

Meta further claimed its approach surpasses others in openness, stating, “We’re proud that our approach is more transparent than many of our industry counterparts.”

Regarding regulatory compliance, Meta referenced prior engagement with regulators, including a delay initiated last year while awaiting clarification on legal requirements. The company also cited a favourable opinion from the European Data Protection Board (EDPB) in December 2024.

“We welcome the opinion provided by the EDPB in December, which affirmed that our original approach met our legal obligations,” wrote Meta.

Broader concerns over AI training data

Although Meta presents its policies as transparent and compliant with EU law, privacy advocates continue to raise concerns about the use of public data scraped from social media platforms to train large language models (LLMs) and generative AI.

First, the definition of “public” data is contentious. Critics question whether content posted on platforms such as Instagram and Facebook was shared with the intention of it being used to train commercial AI systems capable of generating new content or insights. Users may post personal anecdotes, opinions, and even creative works, but rarely with the expectation of large-scale automated analysis.

Second, the effectiveness and fairness of an “opt-out” system, as compared with “opt-in”, is debatable. Requiring users to actively object, often amid notification fatigue, sets a questionable standard for informed consent. If the notification is not visible, understandable, or actionable, consent is assumed by default, rendering the permission largely meaningless.

Third, bias remains a persistent problem. Social media content often reflects and amplifies societal biases such as racism and sexism, and AI models trained on this data risk absorbing and reproducing them. Companies use filtering and fine-tuning to mitigate such bias, but the sheer scale of the data makes it difficult to eliminate. An AI trained on European public data, for example, requires careful curation to avoid encoding stereotypes or harmful generalisations about the cultures represented.

Furthermore, questions surrounding copyright and intellectual property persist. Public posts often contain original text, images, and videos created by users. Using this content to train commercial AI models, which may then generate competing content or derive value from it, enters murky legal territory regarding ownership and fair compensation, issues currently being contested in courts worldwide in cases involving various AI developers.

Finally, even as Meta promotes its transparency relative to competitors, the actual processes of data selection and filtering, and their effects on model behaviour, remain opaque. Meaningful transparency would involve greater scrutiny of how training data shapes AI outputs and of the safeguards in place to prevent misuse or unintended harm.

Meta’s strategies in the EU illustrate how much the AI economy depends on “free” user-generated content. As more companies adopt similar policies, the debate around privacy, consent, data ownership, algorithmic bias, and the accountability of AI developers will only intensify, in Europe and beyond.

See also: DolphinGemma: Google AI model understands dolphin chatter.
