Editor’s Note: Today we get to read Sahar Mor, a product lead at Stripe. His minimalist summaries of A.I. news are a hit. He promises us no personal takes, no summaries, and no endless scrolling: 🚀
You can catch his LinkedIn posts here (I recommend a Super follow).
If you are interested in writing a guest post on A.I. Supremacy, contact me in a DM here and pitch me your topic.
A lot happened in A.I. in April 2023 alone. So get comfortable and get ready.
It’s time to 🤿 dive right in.
May 2023.
April was an exciting month for generative AI, with a veritable treasure trove of progress: the open-source large language model space witnessed the emergence of groundbreaking models like Dolly 2.0, HuggingChat, and WebLLM, while large multimodal models in the form of MiniGPT-4 and LLaVA took center stage, unlocking new potential in the fusion of vision and language capabilities.
Moreover, brace yourselves for the AutoGPT revolution, as automated GPT agents such as BabyAGI and AutoGPT make their mark in diverse use cases, from an automated sales representative to a new frontend-developer colleague.
This month's highlights also include Meta's game-changing Segment Anything, the first foundation model for image segmentation, cutting-edge prompting techniques like Self-Refine, and an insightful glimpse into the future with Sam Altman's note on the highly anticipated GPT-5.
Researchers from UC Berkeley, CMU, Stanford, and UC San Diego unveil Vicuna-13B - an open-source chatbot that reportedly achieves 90% of ChatGPT’s quality
Databricks introduces Dolly 2.0 - the first open-source LLM for commercial use that was fine-tuned on human-generated instructions
Hugging Face introduces HuggingChat, an open source alternative to ChatGPT
Stability AI releases StableLM - its first commercially usable LLMs, with 3B- and 7B-parameter models available and more powerful ones to follow
LAION releases the OpenAssistant Conversations dataset, containing 600k human-generated data points covering a wide range of topics and writing styles in 35 languages
Together announces RedPajama - an effort to create leading, fully open-source LLMs, beginning with the release of a 1.2-trillion-token dataset
CMU and OctoML release WebLLM, bringing instruction fine-tuned LLMs to the browser
Bloomberg announces BloombergGPT - an LLM trained on financial data to support financial NLP tasks
UCSD and Microsoft open source Baize - a chat model trained on 100k dialogs generated by letting ChatGPT chat with itself
UC Berkeley publishes Koala, a dialogue model for research purposes that was trained by fine-tuning Meta’s LLaMA
Researchers introduce MiniGPT-4, an open-source vision-language LLM, powered by Vicuna-13B and BLIP-2
Microsoft and Columbia University introduce LLaVA - a large multimodal model combining a vision encoder and Vicuna for visual and language understanding, achieving impressive chat capabilities reminiscent of multimodal GPT-4
OpenGVLab releases Ask-Anything - a chatbot that understands and converses over video
Researchers release Auto-GPT - a GPT-4 program that chains together LLM "thoughts" to autonomously achieve its user's goals
A VC partner open sources BabyAGI - an AI-powered task management model that leverages GPT-4 to plan, prioritize, and execute tasks
Microsoft and Zhejiang University release HuggingGPT - a framework leveraging LLMs to connect various AI models to solve AI tasks
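The agent frameworks above share a simple core: a loop in which an LLM breaks an objective into sub-tasks and then works through them one at a time. A minimal BabyAGI-style sketch of that loop, with a stubbed-out `fake_llm` function standing in for a real GPT-4 API call (the stub and its canned responses are illustrative, not part of any of these projects):

```python
from collections import deque

def fake_llm(prompt):
    # Stand-in for a real LLM API call; returns canned responses for the demo.
    if "break down" in prompt:
        return "research topic\ndraft outline\nwrite summary"
    return f"done: {prompt}"

def run_agent(objective, llm=fake_llm, max_steps=5):
    """Minimal BabyAGI-style loop: plan sub-tasks, then execute them in order."""
    plan = llm(f"break down the objective into tasks: {objective}")
    tasks = deque(plan.splitlines())   # task queue, executed front-to-back
    results = []
    while tasks and len(results) < max_steps:
        task = tasks.popleft()
        results.append(llm(f"complete this task toward '{objective}': {task}"))
    return results
```

Real implementations add the pieces that make this interesting: a vector store for memory, a re-prioritization step that reorders the queue after each result, and tools (search, code execution) the agent can invoke.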
Meta introduces Segment Anything, the first foundation model for image segmentation
Meta open sources DINOv2 - a self-supervised method for training computer vision models that delivers strong performance without fine-tuning
Researchers introduce Track-Anything, an interactive tool for video object tracking and segmentation powered by Meta's Segment Anything
Researchers present vid2vid-zero - a novel zero-shot video editing method that utilizes off-the-shelf image diffusion models for text-to-video alignment
Nvidia showcases VideoLDM - a SOTA text-to-video model powered by Stable Diffusion, capable of generating minutes-long high-resolution videos
Researchers introduce Bark - a text-to-audio model that can generate highly realistic, multilingual speech as well as music, background noise, and sound effects
Researchers from CMU present AudioGPT, a multimodal AI that combines LLMs with foundation models to process and generate speech, music, and sound
Microsoft showcases AUDIT - editing audio with textual instructions
CMU and Google release Self-Refine - a novel approach for LLMs to iteratively refine outputs and incorporate feedback to improve performance on diverse tasks
Researchers from TAU and Allen Institute for AI present Multi-Chain Reasoning - a new approach for LLM reasoning, allowing humans to verify LLMs' output
Stanford researchers introduce Gisting - a novel technique for prompt compression in LLMs, reducing compute requirements while maintaining performance
CMU and UCI present a new prompting technique in which LLMs recursively criticize and improve their output, improving AI agents in various applications such as solving computer tasks using a mouse and keyboard
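Self-Refine and the recursive criticize-and-improve technique share the same skeleton: generate a draft, ask the model to critique it, then regenerate with the feedback attached. A minimal sketch of that loop, assuming hypothetical `generate` and `critique` callables that would wrap LLM calls (the toy model below is a stand-in for demonstration, not either paper's code):

```python
def self_refine(prompt, generate, critique, max_iters=3):
    """Self-Refine-style loop: draft, self-critique, revise with the feedback."""
    draft = generate(prompt)
    for _ in range(max_iters):
        feedback = critique(draft)
        if feedback == "OK":                      # model judges its own output acceptable
            break
        draft = generate(f"{prompt}\nRevise using this feedback: {feedback}")
    return draft

def make_toy_model():
    """Toy stand-ins for LLM calls; each generate() returns the next, better draft."""
    drafts = iter(["rough draft", "better draft", "final draft"])
    generate = lambda prompt: next(drafts)
    critique = lambda draft: "OK" if draft == "final draft" else "needs more detail"
    return generate, critique
```

In the papers, both roles are played by the same LLM with different prompts; the key design choice is a stopping criterion so the model doesn't revise forever.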
DeepSpeed-Chat: easy and fast RLHF training of ChatGPT-like models
GPT4All-J - the first Apache-2 licensed chatbot that runs locally
LLaMA-Adapter - a lightweight method for fine-tuning instruction-following LLaMA models
Lit-LLaMA - implementation of the LLaMA language model based on nanoGPT for commercial use
Stanford and Google researchers showcase a tiny Sims-like town with AI agents that behave like humans
Amazon enters the Generative AI race by unveiling Bedrock, a platform to fine-tune foundational AI models, along with CodeWhisperer - its Copilot coding companion
Sam Altman confirms OpenAI won't be developing GPT-5 any time soon
AI Tidbits April round-ups
Thanks for skimming! 🔮
After just 16 months of operations, A.I. Supremacy is a top-35 newsletter in Substack’s Technology category.