Speculating on Multimodal LLMs and GPT-4

Microsoft Kosmos-1 suddenly looks more interesting. Are MLLMs a precursor to AGI?

Mar 11, 2023

The CTO of Microsoft Germany made quite a revelation about the arrival of GPT-4, which is likely scheduled for next week (the week of March 13th, 2023). I like speculation about GPT-4 when it proves not to be all-out exaggeration or clickbait.

On March 6th, Microsoft announced Kosmos-1, a rather small LLM that’s multimodal. What could it all mean? Large language models (LLMs) have emerged as powerful tools for a wide range of natural language processing (NLP) tasks.

In 2020, the AI company OpenAI showed the huge language model GPT-3, with 175 billion parameters. Yet when it comes to multimodal LLMs, size is not what matters. In the new paper “Language Is Not All You Need: Aligning Perception with Language Models,” a Microsoft research team presents KOSMOS-1, a multimodal large language model (MLLM) that is able to perceive general modalities, learn in context, and follow instructions. KOSMOS-1 achieves impressive performance on language, perception-language, and vision tasks.

KOSMOS-1 has just 1.6 billion parameters.

Since Microsoft Germany CTO Andreas Braun confirmed that GPT-4 is coming within a week of March 9, 2023, and that it will be multimodal, KOSMOS-1 looks way more interesting. Multimodal AI means a model can work with multiple kinds of input, such as video, images, and sound.

But could multimodal LLMs be related to the emergence of AGI?
