What is Salesforce's CodeGen?
AI Model That Turns Simple Natural Language Requests Into Executable Code
Code generation is one of the hottest areas where A.I. could shape the future of programming. So you’ve heard about GitHub Copilot? Now let’s meet Salesforce's ‘CodeGen’: an AI model that turns simple natural language requests into executable code.
Could Salesforce Launch a GitHub Copilot Killer?
I’m closely following the no-code platform movement, RPA and all sorts of related trends. Salesforce has now unveiled CodeGen, a large-scale language model that turns simple English prompts into executable code. You just describe what the code should do in natural language, and the machine writes it for you.
We sometimes cover Microsoft Research, OpenAI, Meta AI and DeepMind, and their hype, a little too much. But what about Salesforce?
Advent of Conversational AI
Imagine being able to tell a machine to write an app simply by telling it what the app does. As far-fetched as it may appear, this scenario is already a reality. I’m just about ready to talk to my Slack app.
According to Salesforce AI Research, conversational AI programming is a new paradigm that brings this vision to life, thanks to an AI system that builds software.
While Microsoft had Cortana it didn’t really pan out very well, even as our interest in Amazon’s Alexa waned. For many of us this just left Google.
The CodeGen model generates functioning code that solves the problem correctly. Source: Salesforce AI Research Report
Introducing CodeGen: Turning Prompts Into Programs
The first step towards this vision is now here in the form of our large-scale language model, CodeGen, which turns simple English prompts into executable code. You don’t write any code yourself; instead, you describe what the code should do, in natural language -- and the machine writes it for you.
For a quick look at how it works, let’s ask CodeGen to solve the two-sum problem: find two integers in a list that add up to a certain number.
To begin, we simply prompt the model, in plain English, to solve the two-sum problem. As you can see in the brief video below, our CodeGen model generates functioning code that solves the problem correctly.
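To make the two-sum task concrete, here is a minimal hand-written sketch of the kind of solution such a prompt is asking for. It is not actual CodeGen output; the function name `two_sum` and the choice to return index pairs are assumptions made for illustration.

```python
def two_sum(nums, target):
    """Return indices (i, j) with i < j and nums[i] + nums[j] == target, or None."""
    seen = {}  # maps each value seen so far to its index
    for j, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            # An earlier number plus the current one hits the target.
            return seen[complement], j
        seen[value] = j
    return None  # no pair adds up to the target


print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
```

The hash map keeps the search to a single pass over the list, which is the sort of design choice a follow-up prompt could ask for or rule out.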
So where are we going with this? No-code platforms are likely to scale in the decades ahead, which will significantly advance the democratization of software and A.I. This is why companies like Google, Microsoft, Salesforce and many Chinese companies are racing toward this technology.
The CodeGen Approach: Make Coding as Simple as Speaking
CodeGen makes programming as simple as talking, which is the great promise of conversational AI programming.
The conversational AI programming implementation offers a glimpse into the future of democratizing software engineering for the general public. An “AI assistant” converts English descriptions into usable Python code – allowing anyone to write code, even if they have no programming experience. This conversational paradigm is enabled by the underlying language model, CodeGen, which will be made open source to speed up research.
CodeGen’s Two Faces: For Non-Coders and Programmers alike
While anyone, including non-coders, can use CodeGen to create software from scratch, some programming experience can help in certain circumstances. Knowing coding concepts, for example, can help one come up with follow-up commands to give CodeGen and suggest new directions to explore while writing the code (like using hash maps or recursion – or not using these techniques).
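As an illustration of what a follow-up command can change, suppose the user asks for a two-sum solution that does not use a hash map. A plausible result is the brute-force version sketched below. This is hand-written for illustration, not CodeGen output, and the name `two_sum_brute_force` is hypothetical.

```python
from itertools import combinations


def two_sum_brute_force(nums, target):
    """Check every pair of positions; return the first (i, j) that sums to target."""
    for (i, a), (j, b) in combinations(enumerate(nums), 2):
        if a + b == target:
            return i, j
    return None  # no pair adds up to the target


print(two_sum_brute_force([2, 7, 11, 15], 9))  # (0, 1)
```

This version needs no extra memory but compares every pair, so it slows down quadratically as the list grows. Recognizing that trade-off is exactly where some programming background helps in steering the conversation.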
Salesforce also gives us tips on how to interact with the smart machine world:
Some Quick Background: Terms, Definitions, Concepts
Before proceeding further, here is how Salesforce defines some of the terms and ideas used in its blog post:
Programming or Coding: A multi-step process designed to get a machine to achieve a goal:
Translate a problem into a series of steps that solve it (the algorithm)
Translate that algorithm into a computer language (the program)
Run that program to see if it works (the test)
Find out which parts of the program did not work properly (debugging)
Revise the program (adjust for errors) and run it again (re-test)
Continue the run-debug-revise cycle until the program works (a working program runs successfully and solves the problem).
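The run-debug-revise cycle above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `run_tests` and `develop`, and the two toy revisions of an absolute-value function, are all hypothetical.

```python
def run_tests(program, test_cases):
    """Run the program on each case and collect failures (the test and debugging steps)."""
    failures = []
    for args, expected in test_cases:
        try:
            actual = program(*args)
        except Exception as exc:  # a crash also counts as a failure
            failures.append((args, expected, repr(exc)))
            continue
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


def develop(revisions, test_cases):
    """Try successive revisions until one passes every test (the revise and re-test steps)."""
    for attempt, program in enumerate(revisions, start=1):
        if not run_tests(program, test_cases):
            return attempt, program
    return None, None  # no revision worked


def abs_v1(x):
    return x  # first draft: wrong for negative numbers


def abs_v2(x):
    return -x if x < 0 else x  # revised draft


tests = [((3,), 3), ((-4,), 4), ((0,), 0)]
attempt, program = develop([abs_v1, abs_v2], tests)
print(attempt)  # 2
```

In a conversational setting, each revision would come from a new prompt rather than from a hand-written list, but the loop of testing, locating failures and trying again stays the same.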
While the objective is to produce ideal programs for any problem by simply telling the computer what you want, without having to know how to code, the reality is that knowing how to code can often help you steer CodeGen toward a decent solution.
This is especially true for more complicated situations, where having the user suggest alternative approaches may assist the software in finding a working – or more efficient – solution. Even for seasoned programmers, CodeGen enables getting to a working solution faster and easier and allows rapid exploration of other approaches. In other words, CodeGen is advantageous to programmers of all levels.
Salesforce might actually be on to something here.
There’s also a huge AI for Good component here: a no-code world in which new kinds of startups can form easily, in a deflationary environment that stimulates innovation and lowers the barrier to entry.
Why Conversational AI Programming is Important: Societal Benefits and Impact
While programming is a valuable ability now, it will be a must in many tech positions in the coming decade.
Accelerated DevOps and MLOps with A.I. and No-Code Solutions
Every part of society requires more and more code, and these programs are becoming increasingly sophisticated. As a result, solutions like CodeGen (which help speed up the programming process while making it easier and more controllable) should play a key part in finishing increasingly huge coding projects and attracting a new generation of programmers to the field.
The Advent of Conversational-AI
Conversational AI programming tools like CodeGen appear destined to become vital to our future, both at Salesforce and at other enterprises. But there’s another problem on the horizon: what will happen when future programming needs become so complicated that the talents required to produce them exceed human capabilities?
The Diversification of the Software and A.I. Services Stack
Digital ecosystems are growing into systems with ever-increasing functional complexity, and the complexity of these systems may eventually exceed human ability to comprehend, let alone design, them.
No-Code Platforms Are a Paradigm Shift
Soon there will be a point when projects require technology like conversational AI programming to construct the mega-complex software systems of the future — both on the vast scale needed and at a timescale that would be hard for a team of human programmers to achieve on their own. In other words, the fast-increasing code complexity necessitates a paradigm shift.
Salesforce proposes a conversational program synthesis approach via large language models, which addresses the challenges of searching over a vast program space and specifying user intent that earlier approaches faced. The approach casts the process of writing a specification and program as a multi-turn conversation between a user and a system. It treats program synthesis as a sequence prediction problem, in which the specification is expressed in natural language and the desired program is conditionally sampled. The team trained a family of large language models, called CODEGEN, on natural language and programming language data. With weak supervision in the data and the scaling up of data size and model size, conversational capacities emerge from simple autoregressive language modeling. To study the model's behavior on conversational program synthesis, the researchers developed a multi-turn programming benchmark (MTPB), where solving each problem requires multi-step synthesis via multi-turn conversation between the user and the model. Their findings show the emergence of conversational capabilities and the effectiveness of the proposed conversational program synthesis paradigm.
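The multi-turn idea can be illustrated with a toy sketch: each user prompt is appended to a growing context, and the model's output for that turn is conditioned on everything said so far. The `stub_model` function and its canned answers are hypothetical stand-ins for a real language model, and the prompt-as-comment format is an assumption made for illustration, not a description of Salesforce's actual pipeline.

```python
# Canned "model" answers, keyed by the prompt they respond to (hypothetical).
CANNED = {
    "Define a list of numbers named nums.": "nums = [4, 8, 15, 16, 23, 42]",
    "Compute the sum of nums and store it in total.": "total = sum(nums)",
}


def stub_model(context):
    """Stand-in for a language model: sees the whole context, answers the last prompt."""
    last_line = context.rstrip().splitlines()[-1]
    last_prompt = last_line[2:]  # drop the leading "# "
    return CANNED[last_prompt]


def converse(prompts, model):
    """Build a program turn by turn; every turn is conditioned on the full history."""
    context = ""
    for prompt in prompts:
        context += f"# {prompt}\n"        # the user's turn, written as a comment
        context += model(context) + "\n"  # the model's turn, appended to the history
    return context


program = converse(list(CANNED), stub_model)
namespace = {}
exec(program, namespace)  # the accumulated conversation is itself runnable code
print(namespace["total"])  # 108
```

The second turn only works because the first turn's code (`nums`) is still in the context, which is the point of treating specification and program as one running conversation.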
Clearly Salesforce, Microsoft, Google and Facebook are doing important work to build the architecture needed for no-code platforms to work seamlessly and allow us to build new kinds of businesses. This is especially useful for scaling A.I. as a service in the Cloud.
Salesforce thinks CodeGen could also significantly impact social justice in programming and A.I. itself. I think we can all somewhat relate to this.
CodeGen democratizes programming, which has societal benefits
Salesforce’s objective includes developing technology that benefits all of society, not just the firm, and this research does just that. The conversational AI programming revolution that CodeGen represents will help many people. Here are a few illustrations –
Improving equality and fairness – Opening up coding to everyone – democratizing access to the world of programming – will assist in bringing traditionally underserved groups into the world of programming, resulting in increased job options and earnings for them.
Education/teaching/learning – Kids will learn to program interactively with the help of “AI teachers,” creating worlds and games in their language while learning and absorbing how to transfer their ideas into programming languages.
Software professionals (engineers, data scientists, and developers) – With the help of “AI assistants,” software engineers will be able to comprehend the architecture, design patterns, and essential routes of legacy systems. An artificial pair-programmer aids in the analysis of time and space complexity, security vulnerabilities, design patterns, refactorings, and test generation.
Professionals who are not in the software industry – In collaboration with “AI analysts,” business analysts will integrate complex external data sources and systems, correlate and standardize data, undertake exploratory analysis, and illustrate discoveries.
In principle, democratizing coding should benefit society as a whole.
What is Salesforce CodeGen?
Salesforce CodeGen is a large-scale language model that enables conversational AI programming – in other words, speak to the machine and “let AI write code for you”. Sounds futuristic? Well, the future is here.
Clearly, in 2022 we are just at the beginning of a no-code “platform of everything” revolution, but we can imagine it more concretely with CodeGen and GitHub Copilot. Not to mention that RPA and many other related verticals are improving as automation moves more quickly. The idea that programmers wouldn’t work in an AI-augmented labor force is quickly disappearing, and one wonders what their ultimate fate will be if A.I. ever learns to code truly independently.
These are early days for the potential of Conversational AI to impact the real work of programmers, data scientists and entrepreneurs. But with Salesforce CodeGen we can start to debate it a bit more seriously.
Will No-Code Platforms Diminish our Reliance on Programmers?
Conversational AI programming (coding by talking) flips the traditional notion of writing code for a machine on its head. Rather than needing a human to create code for a computer, the machine (automatic programming) generates code for the human through a dialogue between the two (conversational AI). CodeGen uses natural language to solve both simple and challenging issues. With little or no prior programming experience, most users can handle very straightforward coding tasks.
More challenging scenarios may necessitate a basic understanding of programming or computer science ideas to assist the system in its quest for a solution (i.e., working code that solves the stated problem). Even for seasoned programmers, CodeGen enables getting to a working solution faster and easier and allows rapid exploration of other approaches.
This innovative technique democratizes software development by allowing anyone to create apps with the help of an “AI helper” or “teacher” without having to learn to program in the traditional sense.
In theory, opening up coding to everybody will help bring traditionally underserved populations into the programming industry, resulting in greater employment options and higher earnings for them. How many years or decades are we from this reality, if it ever truly comes to pass? I think we can speculate about this and about how far Copilot and CodeGen are from real industry application.
Automation and Programming
We know that in the 2025 to 2065 period there is a great deal of automation coming. But we don’t know exactly what form it will take in software development, programming and on the engineering side.
During this period of the democratization of software, what would be the stages software engineering would have to go through to arrive at an ambient coding world?
Automatic programming: Humans write code at a high level of abstraction, and a method is then used to auto-generate a computer program from the higher-level language.
Most of today’s popular computer languages are like this; coders write in a higher-level language, and a compiler generates low-level code; this saves time and effort, since we humans don't have to worry about all the low-level details.
Interactive programming: Coding a program (or parts of a program) on-the-fly, while that program is running.
Conversational AI programming: The advent of machine learning urges us to rethink the classical paradigm. Instead of a human doing the programming, can a machine learn to program itself, with the human providing high-level guidance? Can human and machine establish an interactive discourse to write a program? The answer, as our research reveals, is a resounding Yes.
Since it combines conversational AI (interactive human-to-machine dialogue) and automatic programming (the system automatically creates the program based on a higher-level language: your conversation!), we call what CodeGen does conversational AI programming.
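The "automatic programming" stage described above, where a human writes in a higher-level language and a tool generates the lower-level code, can be seen in miniature with Python's own built-in compiler. One caveat: Python compiles to bytecode for its virtual machine rather than to native machine code, so this is an analogy for the compiler step, not a full example of it. The variable names are made up for illustration.

```python
import dis

# One line of high-level source code, as a human would write it.
source = "total = price * quantity"

# The compiler turns it into a lower-level code object.
code = compile(source, "<example>", "exec")

# Show the generated low-level instructions (exact output varies by Python version).
dis.dis(code)

# Running the compiled code gives the same result the source describes.
namespace = {"price": 3, "quantity": 4}
exec(code, namespace)
print(namespace["total"])  # 12
```

Conversational AI programming adds one more layer on top: the "source" becomes plain English, and the model plays the role of the translator.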
While this is highly speculative, there’s a good reason Microsoft acquired GitHub and is trying to monetize its investment in OpenAI. But China and companies like Salesforce might surprise us as well, in addition to RPA and other no-code pure-play startups.
Salesforce's vision for CodeGen is very optimistic. As we know from how slowly A.I. scales in real human activities and industries, innovation always takes longer to be implemented in our society and smart cities. While voice search and interacting with A.I. “speakers” are more common, they certainly haven’t been revolutionary thus far. However, on the B2B side things could change more quickly, as the Cloud and A.I. arms race has more impact on profits and corporate market share.
Will A.I. Assistants Help us Code Soon?
The implementation of conversational AI programming provides a glimpse into the future of democratizing software engineering for the masses. An “AI assistant” translates English descriptions into functional and executable Python code - allowing anyone to write code, even if one knows nothing about programming. The underlying language model, CodeGen, enables this conversational paradigm and will be made available as open source to accelerate research.
Coding in the Metaverse might not require thousands of hours of training. In the future we won’t learn to drive cars and write code; A.I. will do the heavy lifting. For kids being born today, that’s the likely future in one form or another. But we need to take this with a grain of salt. We also thought by 2030 we’d be driving around in flying cars powered by hydrogen. I’m not sure if that will take place!
Salesforce Wants us to Believe!
Salesforce as an alternative to the corporate monopoly of Microsoft is very necessary for the health of Silicon Valley’s ecosystem in the 2030s.
The Full Vision: Interactive Conversation with Computer Creates Code
This new paradigm in programming takes the form of a simple yet highly intelligent dialogue. In the concept’s full implementation (our vision of how it would work in its ultimate form), a typical fully-interactive conversation about your desired code would flow as follows:
How will we be interacting with A.I. in the future?
What Does This Mean for Developers?
Of course, CodeGen isn’t a silver bullet that will put developers out of work. As it stands, CodeGen can handle simple coding tasks on behalf of low-code professionals. When it comes to more complex problems, programming knowledge is advantageous to guide the system as it searches for a solution (or for the human to suggest a more efficient solution!).
CodeGen combines conversational AI (interactive human-to-machine dialogue) and automatic programming (the system automatically creates the program based on a higher-level language).
It’s also important to note that the model has been trained by Salesforce AI Research and is open-source – the main reason is to accelerate research with inputs from contributors.
Salesforce AI Research trained CodeGen, a 16-billion parameter auto-regressive language model, on a large corpus of natural and programming languages.
I really like what Salesforce is going after with CodeGen. The dream of automating code could potentially lead to a software A.I. revolution.
What do you think?
How Salesforce Did It
The Details: An In-Depth Look at How CodeGen Works
Approach. Salesforce AI Research trained CodeGen, a 16-billion parameter auto-regressive language model, on a large corpus of natural and programming languages. Two aspects are of particular interest: (1) sampling executable code by scaling the size of the model and dataset, (2) emergence of conversational capabilities.
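For readers unfamiliar with the term, "auto-regressive" means the model produces one token at a time, with each new token sampled from a distribution conditioned on what came before. The toy sketch below shows the sampling loop only. The `NEXT_TOKEN` table is a made-up stand-in for a trained model, and it looks only at the previous token, whereas a real model such as CodeGen conditions on the whole preceding sequence.

```python
import random

# Hypothetical next-token probabilities, standing in for a trained model.
NEXT_TOKEN = {
    "<s>": {"return": 1.0},
    "return": {"a": 0.5, "b": 0.5},
    "a": {"+": 0.7, "</s>": 0.3},
    "b": {"+": 0.7, "</s>": 0.3},
    "+": {"a": 0.5, "b": 0.5},
}


def sample(rng, max_tokens=10):
    """Generate tokens one at a time, each sampled given the previous token."""
    tokens = ["<s>"]  # start-of-sequence marker
    while len(tokens) < max_tokens:
        dist = NEXT_TOKEN[tokens[-1]]
        next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_token == "</s>":  # end-of-sequence marker: stop generating
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start marker


rng = random.Random(0)  # fixed seed so repeated runs give the same sample
print(" ".join(sample(rng)))
```

Scaling, in this picture, means replacing the tiny lookup table with a network of billions of parameters trained on a very large corpus, while the generation loop stays conceptually the same.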
Scaling. The large size of this model is motivated by the empirical observation that scaling the number of model parameters proportional to the number of training samples appears to strictly improve the performance of the model. The phenomenon is known as the scaling law. We leverage this law to learn a model which can translate a natural language (English) to a programming language (Code) with high accuracy. That is, the model is capable of not only generating reasonable code, but also executable code; the generated code is of such high quality that it can be immediately executed without revisions by a programmer, which allows even a non-professional audience to “write” code.
Conversation. Having a conversation appears to be a rather trivial task for humans. We implicitly keep track of the past conversation (a memory), resolve references to previously mentioned elements, and incrementally build a mental picture or story of the discourse. For machines, holding a realistic conversation is one of the grand challenges of our time, according to Salesforce.
Who did it? Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
I like how agnostic Erik has been in his research history.
AI Supremacy is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.