What Is DALL-E: Understanding the Basics of the Most Popular AI Image Generator
A few days back, artificial intelligence (AI) research firm OpenAI announced that it’s expanding the beta version of its popular AI image-generating tool called DALL-E 2. As of last month, the tool was available to only a few thousand users. Now, OpenAI is extending access to 1 million people from its waiting list.
Ever since this announcement, there has been a lot of hype around this AI image generator. For those working closely in the AI space, it may not be a big deal. But for many of us non-tech folks, it’s intriguing.
So, we decided to create an introductory guide around DALL-E 2 and AI image generators in general.
Here we not only talk about the basics of DALL-E but also share many real examples of what is possible with this tool. We also briefly touch upon the various limitations of DALL-E and some of its alternatives.
So, let’s get started.
What is an AI image generator?
An AI image generator is a type of generative AI that’s capable of creating novel and original images from text.
Think of it like this: you feed an AI image generator a few words, and it creates realistic-looking images based on those words.
It uses unsupervised learning algorithms to create new, plausible content from an existing dataset. The machine learning algorithms pick up patterns in the content they are trained on and use these patterns to generate similar content.
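To make the idea of "learning patterns and generating similar content" a little more concrete, here is a deliberately tiny Python sketch. It is not how DALL-E or any real image generator works. It assumes a made-up dataset of five sunset colors (the `sunset_colors` list is hypothetical), learns the average and spread of each color channel, and then samples a brand-new color that follows the same pattern.

```python
import random
import statistics

def fit(samples):
    """Learn a simple pattern from existing data: the mean and spread of each channel."""
    channels = list(zip(*samples))
    means = [statistics.mean(c) for c in channels]
    spreads = [statistics.stdev(c) for c in channels]
    return means, spreads

def generate(model, rng):
    """Sample a new item that follows the learned pattern, clamped to the 0-255 range."""
    means, spreads = model
    return [min(255.0, max(0.0, rng.gauss(m, s))) for m, s in zip(means, spreads)]

# Hypothetical "existing dataset": RGB colors picked from sunset photos.
sunset_colors = [
    (250, 120, 40),
    (240, 100, 60),
    (255, 140, 30),
    (230, 90, 70),
    (245, 110, 50),
]

rng = random.Random(0)  # fixed seed so the run is repeatable
model = fit(sunset_colors)
new_color = generate(model, rng)
print([round(v) for v in new_color])
```

The generated color is not a copy of any entry in the list, yet it still looks like it belongs there. Real image generators do something similar in spirit, only with millions of images and far richer patterns than an average and a spread.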
What is DALL-E?
DALL-E is a proprietary language model by OpenAI that’s capable of converting plain text instructions into realistic images.
First introduced in January last year, DALL-E is a 12-billion-parameter modified version of OpenAI’s super popular language generation model GPT-3.
In simple terms, it’s an algorithm trained to generate images from nothing but text descriptions. It uses machine learning to understand the logical link between words and interpret it in a visual form.
DALL-E is among the new generation of AI image generators that are capable of creating high-resolution, realistic images without requiring any human input beyond the text prompt.
The second generation, called DALL-E 2, hasn’t been released to the general public by OpenAI yet. However, the company has offered the option to join the waitlist. Moreover, the company has open-sourced CLIP, which forms the basis of DALL-E 2. We’ll talk more about CLIP in the subsequent sections.
Fun fact: The name DALL-E is a blend of the artist Salvador Dalí and WALL-E, Pixar’s animated robot.
How does DALL-E work?
When you add a piece of text, DALL-E encodes the text instruction and tries to understand the words separately before finding a logical link between the words.
Now, to explain how DALL-E works in more depth, let’s look at four key high-level concepts that you should be aware of –
- CLIP – It’s a model trained on image-caption pairs. It learns to represent both images and text as vectors (also known as embeddings) in a shared space, so that an image and its matching caption end up close together.
- Prior model – It takes the CLIP text embeddings and turns them into CLIP image embeddings.
- Diffusion decoder – It takes the CLIP image embeddings and uses them to generate images. Since this step effectively runs the CLIP image encoder in reverse, OpenAI uses the name unCLIP for the approach.
- DALL-E 2 – It’s a combination of the prior and the decoder.
The image (by OpenAI) showcases the two-part model that is DALL-E 2.
The first generation of DALL-E was a 12-billion-parameter model trained on text-image pairs. The second generation, although it uses comparatively fewer parameters (3.5 billion), is claimed to produce a wider variety of visuals at a better resolution.
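The flow described in this section (text, then CLIP text embedding, then prior, then decoder, then image) can be sketched in a few lines of Python. Treat this as a toy under loud assumptions, not as OpenAI's implementation: every function below is a hypothetical stand-in. The "text encoder" just hashes the prompt into a vector, the "prior" only adds a little noise, the "decoder" is a crude imitation of step-by-step denoising, and the "image" is a list of eight numbers rather than pixels.

```python
import hashlib
import math
import random

DIM = 8  # toy embedding size; real CLIP embeddings are much larger

def normalize(vec):
    """Scale a vector to unit length so vectors can be compared by direction."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def clip_text_embedding(prompt):
    """Stand-in for the CLIP text encoder: the same text always gives the same vector."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return normalize([rng.gauss(0, 1) for _ in range(DIM)])

def prior(text_embedding, rng):
    """Stand-in for the prior: text embedding in, image embedding out.
    The real prior is a learned model; this one only adds a little noise."""
    return normalize([v + rng.gauss(0, 0.05) for v in text_embedding])

def decoder(image_embedding, rng, steps=50):
    """Stand-in for the diffusion decoder: start from pure noise and remove
    a bit of it at every step, guided by the image embedding."""
    image = [rng.gauss(0, 1) for _ in range(DIM)]
    for _ in range(steps):
        image = [p + 0.2 * (target - p) for p, target in zip(image, image_embedding)]
    return image

def cosine_similarity(a, b):
    """How closely two vectors point in the same direction (1.0 = identical)."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def generate_image(prompt, seed=0):
    """Chain the stages: text embedding -> prior -> decoder."""
    rng = random.Random(seed)
    text_embedding = clip_text_embedding(prompt)
    image_embedding = prior(text_embedding, rng)
    return decoder(image_embedding, rng)

prompt = "a teddy bear on a skateboard"
image = generate_image(prompt)
match = cosine_similarity(image, clip_text_embedding(prompt))
print(f"similarity between the toy image and its prompt: {match:.2f}")
```

The point of the sketch is the shape of the pipeline: the text is never turned into an image directly. It first becomes an embedding, the prior translates that into an image embedding, and only then does the decoder build the picture step by step from noise.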
Some top use cases
AI image generators are still in the early stages. Therefore, the scope and use cases will expand as the technology progresses. With time, many new industries will start seeing its applicability.
But here are some of the areas where DALL-E is currently being used or explored –
- One of the biggest areas where DALL-E could have applications is in improving the computer vision systems that power modern-day autonomous vehicles.
- It is also used by individuals with disabilities to help them create art.
- It also offers illustrators and visual designers a tool to supplement their workflows or create templates for them to work on.
Seeing the tool in action
We’ve discussed a lot about what DALL-E is and what it is capable of. Now, it’s time to see the tool in action. Check out the images below that highlight what this tool can do.
Did you say a solar panel shipping container?
An IT guy from 1506?
Prompt: "An IT-guy trying to fix hardware of a PC tower is being tangled by the PC cables like Laokoon. Marble, copy after Hellenistic original from ca. 200 BC. Found in the Baths of Trajan, 1506."
Created with DALL·E 2 by Merzmensch Kosmopol (@Merzmensch), April 27, 2022.
An artwork featuring a kid and a dog
Such realistic images
That’s absurd but wonderful
Did someone say Teddy bears?
Looks like DALL-E is competing with da Vinci now
What are the risks involved?
So, we saw what DALL-E is capable of creating. What’s more impressive is the fact that it can create such realistic images using only a simple piece of text.
But while DALL-E’s capabilities are there for all to see, we must also talk a bit about its limitations and risks. Here are a few of them –
- Models of this size are often prone to bias, toxicity, and stereotypes, and can behave in discriminatory ways.
- The tool, when in the wrong hands, can be used to create prohibited images that might be used to threaten or harass others. Think of how deepfake technology has the potential to be misused by cybercriminals.
- It can be used to create explicit content.
- It can be used to spread fake news and misinformation.
These risks are serious, and therefore organizations need to be extra careful in how they choose to use this tool.
DALL-E is not the only AI image generator in the market. There are many alternatives available, albeit not as popular as DALL-E.
Here are a few top alternatives to DALL-E –
- Deep Dream Generator (built on Google’s DeepDream)
- Big Sleep
- AI Gahaku
DALL-E, as of now, is not available to everyone. Those who do get access will receive 50 credits in the first month to try out the tool.
As for the future implications, it’s hard to predict all the avenues where this tool will find applications. But one thing is for certain: DALL-E is an exciting development and gives us a glimpse into what the future of AI might look like.