Update on DALL-E 2, OpenAI's Picture-Making Breakthrough
OpenAI's picture-generating A.I. gets an upgrade
Image credit: OpenAI’s Twitter.
This week OpenAI ran a PR campaign showcasing the latest abilities of DALL-E. DALL-E 2 was released in April 2022 and is described as a model that "can create original, realistic images and art from a text description. It can combine concepts, attributes, and styles."
OpenAI, the San Francisco artificial intelligence company that is closely affiliated with Microsoft, just announced it has created an A.I. system that can take a description of an object or scene and automatically generate a highly realistic image depicting it.
Meanwhile, it’s important to note in the deepfake era that Google has now publicly said that A.I.-generated content is against its guidelines and will be treated as a kind of spam, according to SEJ (Search Engine Journal). It’s also not clear how Google would know whether a piece of content was generated by A.I. in the first place, whether text or images.
You may remember that already in late March 2022, OpenAI released a new version of GPT-3 and Codex that can edit or insert content into existing text.
So essentially, OpenAI’s DALL-E image generator can now edit pictures, too. This means the era of more online content being synthetic, generated by A.I. rather than by people, is coming very quickly thanks to Microsoft. After a $1 billion investment in OpenAI, Microsoft is deeply involved in its products and their commercialization.
In an era where many NFTs are generated by A.I. and memes are popular, perhaps A.I. can have its fun too:
I’m personally not so sure I want to be dealing with fake human accounts on LinkedIn, though. Microsoft is getting greedy in its attempts at innovation and could fundamentally distort how we trust content online if it continues in this direction. But as with so much in machine learning and A.I., it cannot really be slowed down: if they don’t do it, someone else will.
Early last year OpenAI showed off a remarkable new AI model called DALL-E (a combination of WALL-E and Dali), capable of drawing nearly anything and in nearly any style.
Now DALL-E 2 is out, and it does what its predecessor did much, much better — scarily well, in fact. But the new capabilities come with new restrictions to prevent abuse, according to the company and their PR.
DALL-E 2 Upgraded
DALL-E 2 features a higher-resolution and lower-latency version of the original system, which produces pictures depicting descriptions written by users.
It also includes new capabilities, like editing an existing image. As with previous OpenAI work, the tool isn’t being directly released to the public. But researchers can sign up online to preview the system, and OpenAI hopes to later make it available for use in third-party apps.
While that’s impressive, OpenAI’s impact is no longer just about Microsoft’s plans, but about the future of the internet and synthetic content. If we can no longer tell what is real and what is A.I.-generated, is that even desirable as the on-ramp into the Metaverse? Do we want to create more Matrix-like simulated environments and experiences? Is that A.I. for good?
As The Verge notes, the original DALL-E, a portmanteau of the artist Salvador Dalí and the robot WALL-E, debuted in January 2021. It was a limited but fascinating test of A.I.’s ability to visually represent concepts, from mundane depictions of a mannequin in a flannel shirt to “a giraffe made of turtle” or an illustration of a radish walking a dog. Now, with these technologies always improving, where does commercial viability begin and where does regulation even start? In 2022, there are no clear answers.
The Synthetic Content Paradox
Is it fun to ask an A.I. to create an image such as “A bear riding a bicycle through a mall, next to a picture of a cat stealing the Declaration of Independence”? I’m not sure this is a triumph of A.I. worth celebrating, myself.
DALL-E 2 does the same thing fundamentally, turning a text prompt into a surprisingly accurate image. But it has learned a few new tricks.
At the time, OpenAI said it would continue to build on the system while examining potential dangers like bias in image generation or the production of misinformation. It’s attempting to address those issues using technical safeguards and a new content policy while also reducing its computing load and pushing forward the basic capabilities of the model.
First, it’s just plain better at doing the original thing. The images that come out the other end of DALL-E 2 are several times bigger and more detailed.
But only a select few vetted testers can play with the generative AI for the time being.
It’s actually faster despite producing more imagery, meaning more variations can be spun out in the handful of seconds a user might be willing to wait.
It’s 2022 and we can Dress up Dogs Really Nice
Prompt: A DALL-E 2 result for “Shiba Inu dog wearing a beret and black turtleneck.”
How about something a bit more artistic?
“A sea otter in the style of Girl with a Pearl Earring” turns out pretty good. Image Credits: OpenAI.
Cute! I mean I guess?
Part of that improvement comes from a switch to a diffusion model, a type of image creation that starts with pure noise and refines the image over time, repeatedly making it a little more like the image requested until there’s no noise left at all.
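To make that loop concrete, here is a toy NumPy sketch of diffusion-style sampling: start from pure noise and repeatedly remove a little of the predicted noise. Everything here is illustrative; the `target` array merely stands in for what a trained, prompt-conditioned neural denoiser would actually predict, so this is a sketch of the loop's shape, not OpenAI's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "target" image that our fake denoiser steers toward (hypothetical).
target = rng.random((8, 8))

# Diffusion sampling starts from pure Gaussian noise...
image = rng.normal(size=(8, 8))
init_error = np.abs(image - target).mean()

# ...and removes a small amount of predicted noise at each step.
steps = 50
for t in range(steps):
    # A real diffusion model predicts the noise with a neural network
    # conditioned on the text prompt; here we fake that prediction.
    predicted_noise = image - target
    image = image - (1.0 / steps) * predicted_noise

final_error = np.abs(image - target).mean()
print(f"mean error: {init_error:.3f} -> {final_error:.3f}")
```

Each pass shrinks the remaining "noise" a little, which is why the image looks progressively more like the request until no noise is left.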
Want to generate an image for your work of fiction? No problem. One of the new DALL-E 2 features, inpainting, applies DALL-E’s text-to-image capabilities at a more granular level. Users can start with an existing picture, select an area, and tell the model to edit it. Sorry then, no human illustrators needed.
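The mechanics of "select an area and edit it" can be sketched with a boolean mask: only the masked pixels change, and everything outside the mask is preserved exactly. This is a minimal toy in NumPy; a real inpainting model would synthesize new, prompt-matching content for the masked region rather than the flat fill used here.

```python
import numpy as np

rng = np.random.default_rng(1)

# A stand-in 16x16 grayscale "photo" with values in [0, 1].
image = rng.random((16, 16))

# The user selects a rectangular region to edit.
mask = np.zeros((16, 16), dtype=bool)
mask[4:10, 4:10] = True

# A real model would generate new content matching the prompt and the
# surrounding pixels; we fill with the mean of the untouched area just to
# show that inpainting only ever rewrites the masked pixels.
fill_value = image[~mask].mean()
edited = image.copy()
edited[mask] = fill_value
```

The key property is that `edited[~mask]` is bit-for-bit identical to the original image, which is what lets the model account for context like shadow direction from the untouched surroundings.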
When OpenAI gets going with GPT-4, my (amateur futurist) gut tells me a lot of copywriters, marketers, sales reps, and social media managers likely won’t be needed either. That’s just the nature of the beast when you open up this Pandora’s box. While I’m sure Microsoft is okay with this, I’m not sure the world will be.
The PR, of course, doesn’t mention that; it focuses only on how amazing the new iteration of the A.I. is. TechCrunch goes on to state that DALL-E 2’s capabilities are much greater: it can invent new things, for example a different kind of bird, or a cloud, or, in the case of the table, a vase of flowers or a spilled bottle of ketchup. It’s not hard to imagine useful applications for this.
There’s no opt-in as to whether we actually want A.I. to create content at scale and transform the internet into something unrecognizable in just the next five years. There’s no moral dilemma stated, not even an open question, from our dear startup OpenAI, which sold out (to BigTech) at the first chance it got. I mean, hey, whatever is good for Sam Altman’s wallet, I guess.
Because well, if Elon Musk actually one day ages, Sam Altman will be the chosen messiah of Silicon Valley. Right?
Still, the A.I. does stunning work. Especially if teddy bears with a Japanese flavor are your thing! Or, you know, imaginary flower shops (let’s call it digital Portugal).
Examples of teddy bears in an ukiyo-e style and a quaint flower shop. Image Credits: OpenAI
The third new capability is “variations,” which is accurate enough: You give the system an example image and it generates as many variations on it as you like, from very close approximations to impressionistic redos. You can even give it a second image and it will sort of cross-pollinate them, combining the most salient aspects of each.
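Under the hood, both tricks are often framed as operations on latent vectors: variations are small perturbations of one image's embedding, and cross-pollination is an interpolation between two embeddings. The sketch below is purely illustrative; the random vectors stand in for learned image latents (such as CLIP embeddings), and a real system would decode each resulting latent back into an image.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pretend latent embeddings of two source images (hypothetical stand-ins).
emb_a = rng.normal(size=128)
emb_b = rng.normal(size=128)

def blend(a, b, weight):
    """Linearly interpolate between two latent vectors."""
    return (1 - weight) * a + weight * b

# "Variations": small random perturbations of a single image's latent.
variations = [emb_a + 0.05 * rng.normal(size=128) for _ in range(3)]

# "Cross-pollination": a latent halfway between the two source images.
hybrid = blend(emb_a, emb_b, 0.5)
```

Sliding `weight` from 0 to 1 would trace a path from close approximations of the first image toward the second, which matches the "most salient aspects of each" behavior described above.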
Well, as A.I. images become “even better than the real thing,” it appears the internet will be even less about free speech and the truth, and even more about the commercialization of the fake and unreal. As an older Millennial, I might be irked by this, but I can assume a Gen Z or Alpha citizen might not care about that context.
The Corporate Dystopia of Internet Content and Sales Funnels
So where do we go from here? Up to you Microsoft!
You can block out a painting on a living room wall and replace it with a different picture, for instance, or add a vase of flowers on a coffee table. The model can fill (or remove) objects while accounting for details like the directions of shadows in a room. That sort of sounds like “creating your own world” which sounds a bit like a precursor to the Metaverse to me.
I’m all for ambient computing and A.I., preferably when it does useful stuff that doesn’t displace people from their jobs. Why would I pay an illustrator, digital artist, or blogger if I can combine A.I.-generated text with an A.I. marketer that knows the best ads, surrounded by A.I.-generated images?
BigTech has a high chance of creating synthetic creators, not even real personal brands, to ditch the “Creator Economy” aspect so they don’t have to pay those folks. Combined with GPT-4, this won’t just impact low-level illustrators and content creators. There’s a real chance marketing and sales folks will become expendable.
Progress is Faster Now
DALL-E 2 builds on CLIP, a computer vision system that OpenAI also announced last year. “DALL-E 1 just took our GPT-3 approach from language and applied it to produce an image: we compressed images into a series of words and we just learned to predict what comes next,” says OpenAI research scientist Prafulla Dhariwal, referring to the GPT model used by many text AI apps.
DALL-E 2 runs on a hosted platform for now, an invite-only test environment where developers can try it out in a controlled way. Part of that means that all their prompts for the model are evaluated for violations of a content policy that prohibits, as they put it, “images that are not G-rated.”
That means no: hate, harassment, violence, self-harm, explicit or “shocking” imagery, illegal activities, deception (e.g., fake news reports), political actors or situations, medical or disease-related imagery, or general spam. In fact much of this won’t be possible as violating imagery was excluded from the training set: DALL-E 2 can do a shiba inu in a beret, but it doesn’t even know what a missile strike is.
OpenAI sees these rules and safeguards as ethically sufficient.
CLIP was designed to look at images and summarize their contents the way a human would, and OpenAI iterated on this process to create “unCLIP” — an inverted version that starts with the description and works its way toward an image. DALL-E 2 generates the image using a process called diffusion, which Dhariwal describes as starting with a “bag of dots” and then filling in a pattern with greater and greater detail.
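The reason CLIP can be "inverted" this way is that it maps images and captions into the same vector space, where matching pairs score higher than mismatched ones. Here is a toy cosine-similarity sketch of that scoring idea; the vectors are synthetic stand-ins for real CLIP embeddings, with the matching caption constructed to lie near the image by design.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(3)

# Hypothetical embeddings in a shared image-text space.
image_emb = rng.normal(size=64)
matching_caption = image_emb + 0.1 * rng.normal(size=64)  # near by design
random_caption = rng.normal(size=64)                      # unrelated

match_score = cosine_similarity(image_emb, matching_caption)
mismatch_score = cosine_similarity(image_emb, random_caption)
```

Because good captions land near their images in this space, running the mapping in reverse (unCLIP) means starting from a caption's embedding and asking the diffusion decoder for an image whose embedding would sit nearby.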
An existing image of a room with a flamingo added in one corner.
If A.I. can design images like this in 2022, what will it be able to do in 2032?
“We hope tools like this democratize the ability for people to create whatever they want,” Alex Nichol, one of the OpenAI researchers who worked on the project, said.
That’s so incredibly reassuring.
I’m going to be putting more of my articles behind the paywall for AiSupremacy premium members. So if you don’t want to miss out, support my work.