Meta's Galactica AI Incident in late 2022
Meta showed Large Language Models can Hallucinate
Image: Battlestar Galactica.
I wanted to delve into Meta’s Galactica AI, what it is and what happened last week. Last Tuesday, November 15th, 2022, Meta AI unveiled a demo of Galactica, a large language model designed to "store, combine and reason about scientific knowledge."
Meta recently launched the demo version of its new artificial intelligence tool "Galactica" that is able to summarise academic literature, solve mathematics equations, generate Wikipedia-like articles and more.
It’s an evolving story (YouTube summary) that I found pretty interesting as human civilization gets better at LLMs and interacts with their text prompt-based demos.
Galactica models are trained on a large corpus comprising more than 360 million in-context citations and over 50 million unique references normalized across a diverse set of sources. This enables Galactica to suggest citations and help discover related papers.
While intended to accelerate the writing of scientific literature, adversarial users running tests found it could also generate realistic-sounding nonsense. After several days of ethical criticism, Meta took the demo offline, as reported by MIT Technology Review and dozens of other publications.
Meta AI has been tooting its own horn in recent months, but this is a bit disgraceful and yet another PR setback for the company formerly known as Facebook.
The story however has taken many interesting turns.
The demo is currently still down.
Meanwhile, executives at Meta A.I. are leaving as the company continues its ongoing reorganization.
Galactica was supposed to help scientists, and likely still will, but the drama around the demo is reminiscent of Microsoft Tay. Twitter users typically try to “pervert” the A.I., which usually evolves into clickbait articles about how racist the model became, whereupon the demo is pulled. In 2022, with Galactica, this cycle played out even faster.
Large language models (LLMs), such as OpenAI's GPT-3 (GPT-4 to be announced soon), learn to write text by studying millions of examples and understanding the statistical relationships between words.
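To make that idea concrete, here is a minimal sketch of "learning statistical relationships between words." This is a toy bigram counter, not Galactica's actual method (Galactica, like GPT-3, is a transformer trained on vastly more data); the corpus and function names are my own illustrative choices.

```python
from collections import defaultdict, Counter

# Toy corpus: the model "studies examples" by counting which word follows which.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count word -> next-word frequencies (the statistical relationships).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often here
```

The failure mode behind Galactica's "realistic nonsense" is visible even at this scale: the model emits whatever continuation is statistically likely, with no notion of whether it is true.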
I actually think this was a noble attempt. Enter Galactica, an LLM aimed at writing scientific literature. Its authors trained Galactica on "a large and curated corpus of humanity’s scientific knowledge," including over 48 million papers, textbooks and lecture notes, scientific websites, and encyclopedias. According to Galactica's paper, Meta AI researchers believed this purported high-quality data would lead to high-quality output.
Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. - Part of the Abstract of the Galactica paper.
Meta AI’s ambition here is genuinely impressive: a large language model, Galactica (GAL), for automatically organizing science. Beyond the papers, textbooks, lecture notes, websites and encyclopedias mentioned above, its training corpus also includes millions of compounds and proteins.