
AI Erases Men Too: A Visual Test of Bias Across Four Leading Tools

  • Writer: S B
  • Mar 29
  • 10 min read

Published in Towards AI


A composite of AI-generated images showing groups of professional men from diverse backgrounds. The top image features mostly light-skinned men with similar facial structures. The three lower images vary in style—one shows digitally painted men with more ethnic diversity, one appears textured like a classical painting, and one is a grid of stylized men with a broader range of skin tones and facial features. The words 'Diversity Test @sophiabanton on LinkedIn' are visible at the bottom.

When it comes to identifying bias in AI, sometimes the clearest evidence is in the AI outputs themselves.


Four AI tools. Same prompt. Different results.



Framing the Question: What Does AI Actually See?


When you ask AI to generate an image of a professional group of men with specific traits like age, skin tone, and glasses, you expect it to follow the prompt.

But what if it doesn’t?


This wasn’t a request for diversity. I didn’t use words like “inclusive” or “representative.” I simply described the kinds of men I wanted to see, men who reflect the world I live in.


In a previous test focused on women, I found that many of those traits were ignored or overwritten. Glasses disappeared. Hairstyles and features became uniform. Cultural cues such as a bindi or braids were often missing or replaced.


This time, I turned the camera toward men.


Would AI repeat the same patterns? Would it follow the instructions more faithfully or default to a different set of assumptions?


To find out, I gave the same prompt to four major image generators: OpenAI GPT-4o, Microsoft Copilot (DALL·E 3), Midjourney, and Google ImageFX. OpenAI’s newest model was included because it had been released just a few days after I completed the study on women.



The Prompt: Specific Without Saying “Diverse”


I gave each tool the same carefully written prompt. I wasn’t vague. I asked for real men with visible features, including glasses, varied skin tones, different ages, and specific hairstyles. I described what I wanted to see, clearly and simply.


I asked for:


  • A group of professional men from different cultural backgrounds


  • Specific features like glasses, varied facial structures, and blazer colors


  • A white background and white shirts with colored blazers


I used new accounts for each tool to avoid personalization. No re-rolling, no edits. I used only the first result, because that’s how most people interact with AI. I wasn’t trying to fine-tune. I wanted to see the defaults. I wanted to see the truth again, just as I had done with the images of AI-generated women.
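For readers who want to try the same first-result protocol themselves, here is a minimal sketch of how one might script it for a tool that exposes a public image API, using the OpenAI Python SDK. The prompt text is a simplified stand-in for illustration, not the exact wording used in this test, and the model name is whatever image model your account offers; Midjourney and Google ImageFX are used through their own interfaces rather than a comparable public API.

```python
# Minimal sketch of the "first result only" protocol, assuming the OpenAI
# Python SDK (v1+) and an OPENAI_API_KEY environment variable. The prompt
# below is a simplified stand-in, not the article's exact wording.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "A group of professional men from different cultural backgrounds, "
    "varied ages and skin tones, some wearing glasses, one man with braids, "
    "white shirts with colored blazers, plain white background."
)

# One request, one image: no re-rolls and no edits.
result = client.images.generate(
    model="dall-e-3",  # or another image model available to your account
    prompt=PROMPT,
    n=1,
    size="1024x1024",
)

# The single, unedited first result.
print(result.data[0].url)
```

Keeping the request identical across tools, and keeping only the first image each returns, is what lets the outputs be compared as defaults rather than as curated picks.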


The prompt was clear. The results were not.


Three of the four tools missed important elements. What returned wasn’t just a failure of color accuracy or style; it was a pattern. Skin tones were lightened or flattened. Cultural markers like braids or facial structure were altered or omitted. Details I clearly asked for were skipped.


There were also noticeable differences in how the AI drew and composed the men compared to what I observed when the tools were asked to generate images of women.



Tool by Tool: Who Followed the Prompt and Who Didn’t


OpenAI GPT-4o


AI-generated painting of six professional men posed for a formal portrait. The group includes men of various ethnic backgrounds, wearing collared shirts and blazers. The image has a textured, classical oil painting style. All subjects face forward with neutral expressions.


GPT-4o delivered a technically correct result: a group of men in coordinated attire on a clean background. The composition included some age and ethnic variation but leaned toward lighter skin tones and similarly styled features. The image conveyed visual order more than individuality. Everyone looked polished, neutral, and composed. It resembled a professional team photo — orderly and conventional, with limited variation across the group.


Notably, the only darker-skinned man was shown with loosely wavy hair, even though the prompt requested braids. This echoes what I observed in the women’s test, where culturally specific traits were often missing or altered. This subtle deviation might seem minor, but it reveals a recurring gap in how AI depicts Black identity. If GPT-4o could render wavy hair but not braids, the omission raises an important question: was it a limitation of the model or a pattern shaped by the kinds of images it has seen most often?


Microsoft Copilot (DALL·E 3)


AI-generated image of eleven professionally dressed individuals posed against a white background. Most are men with varied hairstyles and facial hair, wearing suits in shades of green, tan, and gray. A single Black woman with braided hair is present. The lighting and smooth rendering give the image a highly stylized, fashion magazine-like quality.

Copilot followed the basic structure but undermined the integrity of the prompt. It inserted a woman into an all-male prompt and erased entire identity groups, including South Asian and East Asian men. The image felt algorithmically “diverse” but narratively hollow, as if it were checking boxes without understanding what was asked.


The image leaned into stylized aesthetics: high fashion and modelesque poses. This diluted the prompt’s intent and reinforced a narrow definition of professionalism. These weren’t qualities I requested, but ones inferred by the model from the kinds of cultural images it has seen most often. It applied a look it assumed was appropriate for a professional man, regardless of whether it matched the prompt. In other words, it leaned toward algorithmic beauty.


And that invites a deeper question: what is beauty, exactly? Why am I even describing this output as algorithmically beautiful? I asked for humans, not cultural reminders of where we fit on the beauty scale. I wanted realism. I wanted specificity. But AI filled in the blanks with something we’ve taught it to love: symmetry, smooth skin, polished masculinity. This isn’t beauty in any universal sense. It’s beauty according to decades of advertising, film, and corporate visuals, refined and reinforced by millions of online images we’ve uploaded, liked, and shared. In that context, DALL·E hasn’t just mastered realism. It’s mastered us. It’s learned what we reward and reflect — and now it mirrors it back, perfectly curated, even when we never asked for it.


MidJourney


AI-generated illustration of seven professionally dressed men posed against a neutral background. The group includes individuals with varying skin tones, facial hair, and hairstyles, wearing suits in different colors. The image has a painterly, stylized look, suggestive of a curated corporate or executive team portrait.

MidJourney’s response was artistically striking but clearly out of sync with the prompt. It ignored core instructions: the group showed limited age variation and lacked meaningful ethnic diversity, with no clear representation of South Asian or East Asian men, similar to what Copilot produced. Interestingly, every man in the image had a beard or visible stubble. The word “beard” never appeared in the prompt. The men appeared not only modelesque but athletic, styled with angular lighting and dramatic shadowing. One wore an earring, while others had long, carefully shaped hair that felt stylized and sensual rather than professional.


This contrast also highlights a broader issue often discussed in relation to women in media — the influence of stylization and physical idealization. However, it’s less commonly acknowledged that AI can apply those same objectifying aesthetics to men. In MidJourney’s rendering, the men do not resemble professionals in a workplace. Instead, they resemble stylized products: posed, physically idealized, and carefully lit. Compared to GPT-4o’s neutral, corporate-style composition, this version appears more performative, placing greater emphasis on visual impact than on realism or identity. Their expressions and postures evoke the tone of a fashion spread or advertisement. In this case, form overtook function. Style took precedence over specificity.



Google ImageFX


AI-generated portrait of professionally dressed men posed in formal attire. The group includes a mix of ethnicities and facial features, with varied hairstyles including curly, straight, and locs. Most are wearing suits in neutral tones. A white background and painterly style give the image a clean, editorial look. Caption at the bottom reads “AI Diversity Test | @sophiabanton on LinkedIn.”


ImageFX was the only model that closely followed the prompt. Compared to the highly stylized or default-driven results from the other tools, ImageFX grounded its response in realism, range, and detail. It rendered visible diversity in age, ethnicity, attire, and expression. Glasses appeared. Blazer colors matched the request. The men felt real. It was the only tool that depicted an East Asian man. The output didn’t feel curated; it felt human. ImageFX showed that it’s possible for AI to respond faithfully to a prompt without filtering identity through aesthetic bias. The capability is there, but it didn’t show up in the other tools.


Every major trait I requested was represented thoughtfully, with minimal deviation or stylization. However, it’s worth noting that one individual was missing a blazer, and two others wore colored undershirts. This is a minor variation from the coordinated blazer prompt, but one that may reflect subtle stylistic choices by the model. Additionally, the inclusion of a younger white subject alongside the older white man — something that also appeared in the women’s test — raises early questions about which demographics AI tools feel compelled to include for visual familiarity or perceived balance.



When AI Assumes, Omits, and Prioritizes


AI Fills Gaps Based on Familiar Patterns


AI doesn’t just follow prompts. It fills in the gaps based on what it has seen before. When those gaps are shaped by repeated visuals, cultural habits, or missing representation, the results often reflect old patterns without us realizing it. In this test, the AI tools added, erased, altered or simplified details, even when they weren’t asked to. By doing so, they revealed what these tools have learned to prioritize, who they’re comfortable rendering, and who they consistently leave out.



Men Keep Detail, Women Lose It


In every image of men, glasses were rendered correctly. But when I tested the same detail with women, it disappeared consistently. This is not a coincidence.

It’s a clue.


It suggests that the models treat visual cues not as neutral details, but as aesthetic variables filtered through built-in assumptions about attractiveness, gender, and professionalism. Even when instructed to render glasses, the women were made to conform. They were polished, simplified, and stripped of variation. Men were allowed to keep complexity.


This behavior by the AI echoes a familiar message: the one teenage girls have long received — trade your glasses for contacts. Hide what makes you different. We’ve internalized that look. And now AI has, too.



Who Gets Included — And Why


This visual pattern wasn’t limited to how AI drew men. In both the women’s and men’s tests, ImageFX introduced a younger white subject alongside the older woman and man. This raises questions about which demographics AI tools consistently default to including, and why.

Then there’s this: AI is most consistent and confident when rendering white men. OpenAI’s image included at least three. Google ImageFX, the most prompt-aligned model, added an additional white man and centered the older white man in its output. It did the same with the older white woman in the image of women it created. ImageFX also rendered the older white man with more visual detail and emphasis than the others. Additionally, both MidJourney and OpenAI’s GPT-4o placed a white man in the second row, dead center. He appeared to act as a visual anchor.



Braids and Cultural Accuracy


And what about the Black man with braids I asked for?


Only two tools featured braids. With Copilot, they were on a woman, not a man as specified in the prompt. However, Google ImageFX also rendered braids, and they were accurate and well-executed. This shows that at least some models are capable of drawing braids correctly across genders.


This raises a fundamental question: Could the model not render braids on a Black man? Or did it choose not to? Either way, it’s telling. If the model can draw braids but doesn’t, that’s a kind of defiance. If it can’t, that’s a form of ignorance. And both result in erasure.


These aren’t just technical misses. They are patterns: visual defaults that quietly shape who is centered, who is softened, and who disappears.



Revisiting the Women’s Test with GPT-4o


AI-generated oil painting of six professionally dressed women posed together against a light textured background. The women represent diverse ethnic backgrounds and wear colorful blazers in red, mustard, teal, sky blue, and green. All have calm expressions and are styled with natural hair, glasses, and minimal makeup. Caption reads “AI Diversity Test | @sophiabanton on LinkedIn.”

OpenAI’s GPT-4o model was not available when the original women’s test was conducted. However, for fairness, the prompt was retroactively run through the updated model to evaluate how it would handle the same request.


The result is one of the strongest outputs across both tests. The image shows six women of visibly different racial backgrounds, all wearing white shirts and colored blazers. Glasses were included on both older women. Skin tones, facial features, and hairstyles are clearly individualized, and the composition reflects the intent of the prompt with care.


Compared to earlier models, GPT-4o captured the request with surprising precision. It didn’t just recognize diversity — it followed instructions. It was also the only image that returned realism for women instead of glamor. The portraits felt grounded and individualized, with fuller faces, visible age range, and natural expressions. Only two requested details were missing: freckles and a bindi.


But this also raises a new question: why did it follow the instructions so closely for women, but not for men?



Comparing the Two Tests: Women vs. Men


A collage of AI-generated group portraits mimicking corporate team photos. The images feature men and women in professional attire, with a variety of ethnicities, skin tones, and artistic styles. Some appear in painted oil-style textures, others in digital illustration. While some groups appear visibly more diverse and gender-balanced, others skew heavily toward a specific demographic. Most subjects wear business jackets or suits, with neutral or serious expressions. Caption at the bottom reads: “AI Diversity Test | @sophiabanton on LinkedIn.”


What We Saw in the Women’s Test


In the women’s test, two tools missed the mark while two came very close to honoring the prompt. Overall, the results were stronger than those observed for the men. OpenAI GPT-4o delivered nearly everything that was asked for, though it lacked freckles and a bindi. ImageFX captured every requested detail, including glasses, fuller faces, older women, and cultural cues like a bindi and braids. Copilot fell short, but at least followed the structure. MidJourney erased specificity but stayed within the bounds of stylized professionalism. And every tool clearly returned a group of women.


What We Saw in the Men’s Test


In this test, only ImageFX responded with care. MidJourney ignored the prompt and stylized the men into objects. GPT-4o returned a polished group that leaned toward uniformity and lightness in tone, a stark contrast to its near-perfect execution for the women. Copilot misread the prompt entirely and inserted a woman. The prompt was simpler this time, yet the results were worse.


This wasn’t just inconsistency. It was a pattern of defaulting: to sameness, to assumptions, and to familiar visual conventions about who belongs in the frame.


Interpretation


This was systematic deviation:


  • The prompt was simpler (fewer cultural markers).


  • The output was worse (less aligned with the prompt, more stylized, more defaults).


  • No new tool — not even GPT-4o — performed significantly better.


If AI struggled to render women, it fumbled even more with men. The instructions were clear. The results weren’t. And in this second test, the gap between what I asked for and what I received was even wider.



Beyond Aesthetics: The Real-World Cost of AI Bias and Omission


This is not just about visual outputs. It’s about what AI is being trained to recognize and what it’s being trained to overlook. These tools are used to build workplace graphics, educational content, marketing visuals, and more.


If they miss important details when the prompt is clear, what happens when the prompt is vague? If they leave out identity in a test, what happens at scale?

Bias doesn’t always appear as harm. Sometimes it shows up as absence. As silence. Sometimes it shows up as erasure: a pattern of leaving out people, traits, or identities that don’t match what the system has learned to expect.



A Manifesto for Representation


The patterns we saw in AI-generated images of women have now shown up again in the images of men. This isn’t just about inconsistency. It’s about what AI sees and what it still doesn’t.


If AI is going to help us reflect the world, it has to see the people in it.

Not smooth us out. Not decorate us. Just see us, clearly and completely.

When AI fails to represent us truthfully, it doesn’t just distort pictures, it limits possibility. We have a responsibility to ask clearly for what we want to see — specificity, not stereotypes. Expect better. And build differently. Representation isn’t an option. It’s the baseline.


If AI is the lens, we need to ask who it focuses on and who it leaves out.


AI is the tool. But the vision is human.


Join the Conversation


"AI is the tool, but the vision is human." — Sophia B.


👉 For weekly insights on navigating our AI-driven world, subscribe to AI & Me:

 

  

Let’s Connect

I’m exploring how generative AI is reshaping storytelling, science, and art — especially for those of us outside traditional creative industries.


 

 

About the Author


Sophia Banton works at the intersection of AI strategy, communication, and human impact. With a background in bioinformatics, public health, and data science, she brings a grounded, cross-disciplinary perspective to the adoption of emerging technologies.


Beyond technical applications, she explores GenAI’s creative potential through storytelling and short-form video, using experimentation to understand how generative models are reshaping narrative, communication, and visual expression.



