AI Supremacy

Meta's Galactica AI Incident in late 2022

Meta showed Large Language Models can Hallucinate

Michael Spencer
Nov 22, 2022
Image: Battlestar Galactica.

Hey Everyone,

I wanted to delve into Meta’s Galactica AI: what it is and what happened last week. On Tuesday, November 15th, 2022, Meta AI unveiled a demo of Galactica, a large language model designed to "store, combine and reason about scientific knowledge."

Meta launched the demo version of its new artificial intelligence tool, "Galactica", which can summarise academic literature, solve mathematical equations, generate Wikipedia-like articles and more.

It’s an evolving story (YouTube summary) that I found pretty interesting, as we get better at building LLMs and at interacting with their text-prompt-based demos.

Galactica models are trained on a large corpus comprising more than 360 million in-context citations and over 50 million unique references normalized across a diverse set of sources. This enables Galactica to suggest citations and help discover related papers.

Grady Booch (@Grady_Booch), 12:07 AM ∙ Nov 17, 2022:
Absolutely. Galactica is little more than statistical nonsense at scale. Amusing. Dangerous. And IMHO unethical.

Quoting Emily M. Bender (@emilymbender):
Facebook (sorry: Meta) AI: Check out our "AI" that lets you access all of humanity's knowledge. Also Facebook AI: Be careful though, it just makes shit up. This isn't even "they were so busy asking if they could"—but rather they failed to spend 5 minutes asking if they could. >> https://t.co/g1Ndvy2P10

Galactica was intended to speed up the writing of scientific literature, but users who tested it adversarially found it could also generate realistic-sounding nonsense. After several days of ethical criticism, Meta took the demo offline, as MIT Technology Review and dozens of other publications reported.

Yann LeCun (@ylecun), 2:06 PM ∙ Nov 17, 2022:
@Grady_Booch Oh come on Grady! Is your predictive keyboard dangerous and unethical? Is GitHub Copilot dangerous and unethical? Is the Automatic Emergency Braking System in your car dangerous and unethical because it doesn't do Level-5 fully autonomous driving?

Meta AI has been tooting its own horn in recent months, but this episode is a bit disgraceful and yet another PR setback for the company formerly known as Facebook.

The story however has taken many interesting turns.

Emad (@EMostaque), 12:56 AM ∙ Nov 21, 2022:
Should we put a @paperswithcode Galactica 120bn demo back online for science? Assuming no objections from @MetaAI given license terms 🤔

Emad's tweet was posted with a poll on the question.

The demo is currently still down.

Meanwhile, executives at Meta AI have been leaving as the company has been repeatedly reorganized over recent months and years.

Galactica was supposed to help scientists, and likely still will, but the drama around the demo is reminiscent of Microsoft's Tay. The pattern is familiar: Twitter users try to “pervert” the AI, the episode turns into clickbait articles about how racist the model became, and the demo is pulled. With Galactica in 2022, this happened even faster.

Large language models (LLMs), such as OpenAI's GPT-3 (with GPT-4 expected to be announced soon), learn to write text by studying millions of examples and learning the statistical relationships between words.

Enter Galactica, an LLM aimed at writing scientific literature, and I actually think it was a great, or at least “noble”, attempt. Its authors trained Galactica on "a large and curated corpus of humanity’s scientific knowledge," including over 48 million papers, textbooks and lecture notes, scientific websites, and encyclopedias. According to Galactica's paper, Meta AI researchers believed this purportedly high-quality data would lead to high-quality output.

Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. - Part of the Abstract of the Galactica paper.

Meta AI’s ambition here is genuinely impressive. Galactica (GAL), a large language model for automatically organizing science, is a pretty neat idea. Beyond papers, textbooks and lecture notes, its curated training corpus also includes millions of compounds and proteins, scientific websites, encyclopedias and more.

© 2023 Michael Spencer