Meta's Galactica AI Incident in late 2022

Meta showed Large Language Models can Hallucinate

Nov 22, 2022


Image: Battlestar Galactica.

Hey Everyone,

I wanted to delve into Meta’s Galactica AI: what it is and what happened last week. Last Tuesday, November 15th, 2022, Meta AI unveiled a demo of Galactica, a large language model designed to "store, combine and reason about scientific knowledge."

The demo version of the new tool is able to summarise academic literature, solve mathematical equations, generate Wikipedia-like articles and more.

It’s an evolving story (YouTube summary) that I found pretty interesting as human civilization gets better at building LLMs and interacts with their text prompt-based demos.

Galactica models are trained on a large corpus comprising more than 360 million in-context citations and over 50 million unique references normalized across a diverse set of sources. This enables Galactica to suggest citations and help discover related papers.

Grady Booch @Grady_Booch

Absolutely. Galactica is little more than statistical nonsense at scale. Amusing. Dangerous. And IMHO unethical.

@emilymbender@dair-community.social on Mastodon @emilymbender

Facebook (sorry: Meta) AI: Check out our "AI" that lets you access all of humanity's knowledge. Also Facebook AI: Be careful though, it just makes shit up. This isn't even "they were so busy asking if they could"—but rather they failed to spend 5 minutes asking if they could. >> https://t.co/g1Ndvy2P10

Although Galactica was intended to accelerate the writing of scientific literature, adversarial users testing it found it could also generate realistic-sounding nonsense. After several days of ethical criticism, Meta took the demo offline, as MIT Technology Review and dozens of other publications reported.

Yann LeCun @ylecun

@Grady_Booch Oh come on Grady! Is your predictive keyboard dangerous and unethical? Is GitHub Copilot dangerous and unethical? Is the Automatic Emergency Braking System in your car dangerous and unethical because it doesn't do Level-5 fully autonomous driving?

Meta AI has been tooting its own horn in recent months, but this is a bit disgraceful and yet another PR setback for Meta, the company formerly known as Facebook.

The story, however, has taken many interesting turns.

Emad @EMostaque

Should we put a @paperswithcode Galactica 120bn demo back online for science? Assuming no objections from @MetaAI given license terms 🤔

Click the above tweet to see the Poll.

The demo is currently still down.

Meanwhile, executives at Meta AI are leaving as the company continues the reorganization it has been going through in recent months and years.

Galactica was supposed to help scientists, and likely still will, but the drama around the demo is reminiscent of Microsoft Tay. Twitter users typically try to “pervert” the A.I., which usually leads to clickbait articles about how racist the model became, whereupon the demo is pulled. This happened even faster this time around, in 2022, with Galactica.

Large language models (LLMs), such as OpenAI's GPT-3 (GPT-4 to be announced soon), learn to write text by studying millions of examples and understanding the statistical relationships between words.
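To make "statistical relationships between words" a bit more concrete, here is a toy sketch in Python. To be clear, this is not how GPT-3 or Galactica work: those are neural networks with billions of parameters, while this is a simple bigram counter. The function names and the tiny corpus are made up for illustration. The basic idea of predicting the next word from patterns seen in training text is similar in spirit, though, and it hints at why the output can sound fluent without being true.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word follows each other word."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def generate(follow_counts, start_word, max_words=10, seed=0):
    """Sample a word sequence, picking each next word in proportion to its count."""
    rng = random.Random(seed)
    output = [start_word]
    for _ in range(max_words - 1):
        candidates = follow_counts.get(output[-1])
        if not candidates:  # no word ever followed this one in the corpus
            break
        next_words = list(candidates.keys())
        weights = list(candidates.values())
        output.append(rng.choices(next_words, weights=weights, k=1)[0])
    return " ".join(output)


# Tiny made-up corpus, purely for illustration.
corpus = (
    "the model writes papers . the model cites papers . "
    "the papers cite the model ."
)
model = train_bigram_model(corpus)
print(generate(model, "the"))
```

Every pair of adjacent words in the output was seen somewhere in the corpus, so the text looks locally plausible, but nothing in the model checks whether a generated sentence is actually correct.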

Enter Galactica, an LLM aimed at writing scientific literature. I actually think this is a great, even “noble”, attempt. Its authors trained Galactica on "a large and curated corpus of humanity’s scientific knowledge," including over 48 million papers, textbooks and lecture notes, scientific websites, and encyclopedias. According to Galactica's paper, Meta AI researchers believed this purportedly high-quality data would lead to high-quality output.

Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. - Part of the Abstract of the Galactica paper.


Meta AI’s ambition here is actually really impressive. This new large language model called Galactica (GAL) for automatically organizing science is really pretty neat. Galactica is trained on a large and curated corpus of humanity’s scientific knowledge. This includes over 48 million papers, textbooks and lecture notes, millions of compounds and proteins, scientific websites, encyclopedias and more.
