Checking Alt Text With AI Image Generators

Posted December 09, 2022

AudioEye

An accessibility symbol next to an <alt> text HTML tag and the logo for DALL·E

In this post, AudioEye’s Sarah Ippen explains how AI image generators can help designers check the accuracy and quality of their alt text.

As a designer, I’m always looking for new tools to help me work smarter and faster. Recently, I’ve seen amazing growth in the use of AI, whether it’s detecting and filling selections in Photoshop, creating smart animations in Figma, or my new favorite — automatically detecting subtitles and captions in Premiere Pro.

I have plenty of reasons to love AI, so you can imagine my excitement when I had the idea to check the quality and accuracy of my alt text descriptions by plugging them into an AI image generator.

DALL·E is an incredible tool that generates images based on short phrases provided by the user. As a visual designer, I should be shaking in my boots about a tool that can potentially do the hardest parts of my job in seconds (she typed while sweating profusely) — but the lure of being able to check my work was hard to ignore.

An assortment of different AI-generated images, including a rainbow fish in a fishbowl and a robot playing chess.

Designers Should Care About Alt Text

As a sighted designer, I have the privilege of being able to see my work and expect it to speak for itself. However, that isn’t the case for millions of screen reader users. According to a 2018 e-commerce survey by Weebly, 75% of online shoppers rely on product photos to make a purchase decision.

For people who cannot perceive images visually, all they have to rely on is the alt text. Unfortunately, these descriptions are often treated like an afterthought, giving users little more detail than “image-stuffed-animal.jpg”.
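One way to catch afterthought alt text before it ships is a quick automated pass over a page's markup. The following is a minimal sketch in Python, not a full accessibility audit. The `audit_alt_text` helper, the filename pattern, and the four-word minimum are all assumptions made for illustration, and an empty `alt=""` is skipped on the assumption that the image is decorative.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Assumption: alt text that ends in an image extension, or that is made of
# hyphen/underscore-joined tokens with no spaces, is probably a filename.
FILENAME_LIKE = re.compile(
    r"\.(jpe?g|png|gif|webp|svg)$|^\w+(?:[-_]\w+)+$", re.IGNORECASE
)

# Assumption: an arbitrary threshold; very short alt text often lacks detail.
MIN_WORDS = 4

Finding = Tuple[str, Optional[str], str]  # (src, alt, issue)


class AltTextAuditor(HTMLParser):
    """Collects <img> tags whose alt text looks missing, lazy, or too short."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: List[Finding] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = attributes.get("alt")
        if alt is None:
            self.findings.append((src, None, "missing alt text"))
            return
        alt = alt.strip()
        if alt == "":
            return  # treated as a decorative image
        if FILENAME_LIKE.search(alt):
            self.findings.append((src, alt, "looks like a filename"))
        elif len(alt.split()) < MIN_WORDS:
            self.findings.append((src, alt, "may lack detail"))


def audit_alt_text(html: str) -> List[Finding]:
    """Return a list of (src, alt, issue) tuples for suspicious images."""
    auditor = AltTextAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.findings


if __name__ == "__main__":
    sample = (
        '<img src="turtle-1.png" alt="image-stuffed-animal.jpg">'
        '<img src="turtle-2.png" alt="A stuffed animal.">'
        '<img src="turtle-3.png" alt="A stuffed animal of a smiling turtle. '
        'It is soft and fuzzy, with light green skin, a brown shell, and a red scarf.">'
    )
    for src, alt, issue in audit_alt_text(sample):
        print(f"{src}: {issue} ({alt!r})")
```

A check like this only flags the obvious cases. Whether a description actually paints the right picture still takes a human (or, as explored here, an image generator) to judge.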

Three versions of alt text, labeled Bad, Better, and Best. A caption at the top reads "The original turtle illustration"

Experimenting With DALL·E

I decided to run a test to see how an AI tool — and by extension, a person — might interpret different versions of alt text, starting with a description that lacks detail and ending with one that has all the context and detail that a sighted user can fill in when they look at an image.

I started with an image of a stuffed animal turtle that we developed for the holiday ebook. I wanted to see how much detail it would take to get an image that matched my design. My theory was that the more descriptive the alt text, the closer the AI-generated image would be to my original image. (Spoiler: Oh no, I was right!)

Let’s start with the kind of alt text that we often see on images, which sounds more like a filename than a true description: “A stuffed animal.”

A series of images of stuffed animals under a search field with the phrase "A stuffed animal." Each stuffed animal is different, including a dog, a zebra, and a bear.

Without a ton to work from, DALL·E picked a random assortment of animals. There’s a close-up photo of a stuffed dog, a zebra, a frankly scary-looking teddy bear, and a stuffed dog on the floor with either a spot or a giant eyeball. No turtle to be found.

Let’s try again, this time giving DALL·E a bit more context with the phrase “Stuffed animal, a green turtle.”

A series of stuffed animal turtles under a search field with the phrase "Stuffed animal, a green turtle."

Now we’re on the right track. It’s not exactly what we wanted, but a non-sighted user would have a decent idea of what they were looking at on a webpage. But it’s missing some key details that could make or break a purchase. Is it a happy turtle? How big is it? Is it smooth or fuzzy?

Let’s try again, this time adding as much detail as a college student trying to reach minimum word count on the last paper of the semester. After all, a picture is worth a thousand words, right? Why wouldn’t this apply to alt text?

For this search, we’ll try “A stuffed animal of a smiling turtle. It is soft and fuzzy, with light green skin, a brown shell, and a red scarf.”

A series of stuffed animal turtles wearing red scarves, under a search field with the phrase "A stuffed animal of a smiling turtle. It is soft and fuzzy, with light green skin, a brown shell, and a red scarf."

These options are eerily similar to the illustration we created, and help validate that the alt text we provided is on the right track. If you really want to go for the gold, DALL·E allows you to reference illustration styles for a closer estimate of what you’re looking for, design-wise. However, I’m not thrilled with the results of its take on vector art. This might just be a sign that I need to refine my search terms, but I’ll be sticking with Figma until DALL·E develops its vector skills.

Cartoonish drawings of turtles wearing red scarves, under a search field with the phrase "A stuffed animal of a smiling turtle. It is soft and fuzzy, with light green skin, a brown shell, and a red scarf."
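If you would rather script this comparison than type each phrase into the DALL·E search field, the three alt text versions can be turned into API requests. The sketch below only builds the requests and does not send them. The endpoint URL, the `prompt` and `n` fields, the `build_image_request` helper, and the placeholder API key are assumptions based on OpenAI's public image generation API and were not part of the original experiment, so check the current documentation before relying on them.

```python
import json
import urllib.request

# Assumption: OpenAI's image generation endpoint; verify against current docs.
IMAGES_ENDPOINT = "https://api.openai.com/v1/images/generations"

# The three alt text versions tested in this post, from least to most detailed.
ALT_TEXT_VERSIONS = [
    "A stuffed animal.",
    "Stuffed animal, a green turtle.",
    "A stuffed animal of a smiling turtle. It is soft and fuzzy, "
    "with light green skin, a brown shell, and a red scarf.",
]


def build_image_request(alt_text, api_key, image_count=4):
    """Build (but do not send) a request that uses alt text as the prompt."""
    # Assumption: four images per prompt, chosen only for illustration.
    payload = {"prompt": alt_text, "n": image_count}
    return urllib.request.Request(
        IMAGES_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    for alt_text in ALT_TEXT_VERSIONS:
        request = build_image_request(alt_text, api_key="YOUR_API_KEY")
        # Sending is left out on purpose; urllib.request.urlopen(request)
        # would submit it once a real API key is supplied.
        print(request.get_method(), request.full_url)
        print(json.loads(request.data)["prompt"])
```

Keeping the prompts in one list makes it easy to add a new draft of the alt text and compare its results against the earlier versions.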

So … what have we learned?

AI tools like DALL·E can be a useful way to validate the accuracy of alt text. But when it comes to writing image descriptions that paint a clear picture, there’s no substitute for detail. Whether it’s a person or a machine, the more detail you provide, the more likely everyone will be on the same page.

Want more tips on how to create more inclusive content? Check out our 2022 Holiday Retail Guide, which is filled with tips on writing better alt text, crafting accessible emails, picking accessible colors and creating video content that everyone can enjoy.
