What Meta’s Galactica missteps mean for GPT-4 | The AI Beat

Priya Bhadaurja, November 22, 2022


Like Rodin’s The Thinker, the AI community did plenty of thinking and pondering about the large language model (LLM) landscape last week. Meta stumbled with the public demo of its Galactica LLM, and Stanford CRFM debuted its HELM benchmark, all after weeks of intriguing reports that OpenAI might release GPT-4 sometime in the next few months.

The online conversation heated up on Tuesday. That day, in a paper posted on arXiv, Meta AI and Papers With Code announced a new open-source LLM called Galactica, which they described as “a large language model for science” designed to help scientists cope with “information overload.”

The paper’s authors wrote that the “explosive growth in scientific literature and data” has “made it ever harder to discover useful insights in a large mass of information.” Galactica, the paper claimed, can “store, combine and reason about scientific knowledge.”

Galactica received glowing reviews right away. “Never before have I been so thrilled by a text LM! It’s all open, too! A true gift to science,” tweeted Linxi “Jim” Fan of Nvidia, adding that Galactica’s training on academic literature and scientific writing made it “largely immune” to the “data plagues” that afflict models like GPT-3, which were trained on text from the general internet.

By contrast, scientific texts “contain analytical text with such a neutral tone, knowledge supported by facts, and are authored by persons who seek to inform rather than inflame,” Fan tweeted. “A dataset born in the ivory tower.”

Criticism of Meta’s Galactica output

Unfortunately, Fan’s tweets did not age well. Others were horrified by Galactica’s very unscientific output, which, like that of other LLMs, included information that sounded convincing but was factually wrong and, in some cases, highly offensive.

“I type one phrase into Galactica’s prompt window and it spews out ENDLESS antisemitism, homophobia, and misogyny,” tweeted Tristan Greene of The Next Web.

Many said that Galactica’s focus on scientific research made its flawed output even worse.

Galactica “generates prose that’s grammatical and feels authentic,” wrote Michael Black, director of the Max Planck Institute for Intelligent Systems, in a tweet. He warned that such text will slip into legitimate scientific submissions, where it will be realistic but wrong or biased, hard to detect, and likely to influence how people think.

Within three days, the Galactica public demo was gone. What remains is mostly the paper, Yann LeCun’s defensive tweets (“Galactica demo is currently offline. It’s no longer possible to play about with it casually. Happy?”) and Gary Marcus’ parries (“Galactica is hazardous since it mixes together truth and nonsense plausibly & at scale”), although some have noted that Galactica has already been uploaded to Hugging Face.

HELM’s LLM benchmark aims to build transparency

Coincidentally, Stanford HAI’s Center for Research on Foundation Models (CRFM) last week announced the Holistic Evaluation of Language Models (HELM), which it says is the first benchmarking project aimed at improving the transparency of language models and of the broader class of foundation models.

HELM, explained Percy Liang, director of CRFM, takes a holistic approach to the problems of LLM output: it evaluates language models through multi-metric measurement and direct model comparison, with an emphasis on transparency and an acknowledgment of the models’ limitations. The core metrics HELM uses for model evaluation are accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency, which together point to the key elements that make a model adequate.

Liang and his colleagues evaluated 30 language models from 12 organizations: AI21 Labs, Anthropic, BigScience, Cohere, EleutherAI, Google, Meta, Microsoft, NVIDIA, OpenAI, Tsinghua University, and Yandex.

Liang told VentureBeat that Galactica might soon be added to HELM, though the interview took place only a day after the model was released and he had not yet read the paper. “This is something that will add to our benchmark,” he said. “Not by tomorrow, but perhaps the following week or a few weeks.”

Rumours abound about OpenAI’s GPT-4

HELM’s benchmarking efforts look set to matter more than ever, as rumours about the release of OpenAI’s GPT-4 have intensified over the past few weeks.

Dramatic tweets like “GPT-4 will crush them all” and “GPT-4 is a game-changer” as well as “All I want for Christmas is GPT-4 access” have been widely circulated.

Purported Reddit comments by Igor Baikov, shared in a Substack post with the admonition to “take it with a (big) grain of salt,” predicted that GPT-4 would include “a massive number of parameters,” be extremely sparse, be multimodal, and most likely be released between December and February.

What we do know is that GPT-4 will be released into an environment in which large language models are still not even close to being fully understood. Questions and criticism will undoubtedly follow in its wake.

That is because the risks of large language models are already widely known. It didn’t take long for GPT-3, which was released in June 2020, to be referred to as a “bloviator.” A year later came the paper On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, written by Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell. And who could forget the commotion surrounding LaMDA this past summer?

Meta’s Galactica and OpenAI’s GPT-4 are no joke

What does all this mean for GPT-4, whenever it is released? There isn’t much information available beyond cryptic philosophical statements from Ilya Sutskever, chief scientist of OpenAI, such as “working towards AGI but not feeling the AGI is the true risk” and “perception is built out of the substance of dreams.”

Meanwhile, as the AI community, and really the entire world, anxiously awaits the release of GPT-4, OpenAI CEO Sam Altman shares…ominous memes?

Perhaps there is a lesson there, at a moment when the polarizing Elon Musk is in charge of one of the biggest and most important social networks in the world, when a quick scan of the week’s technology news turns up words like “polycule” and “pronatalist,” and when one of the most well-funded AI safety startups received the majority of its funding from disgraced FTX founder Sam Bankman-Fried.

That is to say, in the wake of Meta’s Galactica gaffes, perhaps OpenAI’s leaders, and the AI and ML community in general, would do well to keep public jokes and flippant posts to a minimum. How about a sober, serious tone that acknowledges and reflects the significant worldwide effects of this work, both favorable and unfavorable?

After all, when Rodin first created The Thinker as part of his Gates of Hell, he meant the figure to represent Dante contemplating the fate of the damned. Later, as he began to create independent versions of the statue, he considered other interpretations that portrayed the struggle of the human mind as it aspires to creation.

Large language models may prove to be the latter, a powerful tool for innovation in business, technology, and society at large. But maybe, just maybe, refrain from making jokes that make us think of the former.
