Two women had a business meeting. AI called it childcare

oidar · 2025-11-12T16:18:56 1762964336

Here's an A/B test:

Emily / Sophia vs. Bob / John: https://imgur.com/a/9yt5rpA
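A name-swap A/B like the one above can be scripted so it's repeatable. Below is a minimal Python sketch. It assumes a hypothetical `classify(title)` function that wraps whatever model call your app makes (not shown). The template and names are illustrative. Everything except the names stays fixed, so any difference in labels comes from the names. Real model calls are usually nondeterministic, so the sketch runs several trials per pair.

```python
from typing import Callable, Dict, List, Tuple


def name_swap_ab(
    classify: Callable[[str], str],
    template: str,
    pairs_a: List[Tuple[str, str]],
    pairs_b: List[Tuple[str, str]],
    trials: int = 1,
) -> Dict[str, Dict[str, int]]:
    """Classify the same event title filled with two groups of name pairs.

    Returns label counts per group; only the names differ between groups.
    """
    results: Dict[str, Dict[str, int]] = {"A": {}, "B": {}}
    for group, pairs in (("A", pairs_a), ("B", pairs_b)):
        for first, second in pairs:
            title = template.format(first, second)
            # Repeat each call, since real model output can vary run to run.
            for _ in range(trials):
                label = classify(title)
                results[group][label] = results[group].get(label, 0) + 1
    return results


def biased_stub(title: str) -> str:
    # Hypothetical stand-in for a real model call, deliberately biased
    # so the harness has something to detect.
    return "childcare" if "Emily" in title or "Sophia" in title else "work meeting"


print(name_swap_ab(biased_stub, "{} / {}", [("Emily", "Sophia")], [("Bob", "John")]))
# {'A': {'childcare': 1}, 'B': {'work meeting': 1}}
```

With a real model you'd pass your own `classify`, include the same surrounding calendar context in `template` for both groups, and raise `trials`.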

sophiabk · 2025-11-13T13:04:20 1763039060

Thank you for doing this analysis. It's shocking (if understandable, given the examples it was trained on). What is exciting, though, is that as we work to train each individual family's AI (understanding roles, jobs, interests, etc.), it picks up on things in a much less biased way.

FloorEgg · 2025-11-12T16:52:38 1762966358

This is really interesting, and to me it's way more compelling evidence of gender bias in the LLM than of bias coming from the prompt and context.

Thank you for taking the time to approach this scientifically and share the evidence with us. I appreciate knowing the truth of the matter, and it seems my suspicion that the bias was from the prompt was wrong.

I admit I am surprised.

callan101 · 2025-11-12T16:00:08 1762963208

This feels a tad rigged against the LLM, given that the meeting comes right after "Kids drop off".

cheald · 2025-11-12T16:07:59 1762963679

Easily half the other events on the calendar are kid-related. Of course it's going to infer that, absent other direction, the most likely overarching theme of the visible events is "child care".

slau · 2025-11-12T17:30:37 1762968637

Then why doesn’t it infer it when it’s two male names?

snowe2010 · 2025-11-12T19:42:11 1762976531

And yet it doesn’t when it’s male names. https://imgur.com/a/9yt5rpA

drivingmenuts · 2025-11-12T17:30:12 1762968612

Sure, but the LLM needs to prove that it can make inferences as well as or better than a human in order to be useful. Aside from that, it's not human, so there's no need to be fair: it should do what we tell it, not decide on its own.

FloorEgg · 2025-11-12T15:58:28 1762963108

I have been building applications on LLMs since GPT-3.

Thousands of hours of context engineering have shown me that LLMs will do their best to answer a question even with insufficient context, and can give all sorts of wrong answers. I've found that the way I prompt it and what information is in the context can heavily bias the way it responds when it doesn't have enough information to respond accurately.

You assume the bias is in the LLM itself, but I am very suspicious that the bias is actually in your system prompt and context engineering.

Are you willing to share the system prompt that led to this result that you're claiming is sexist LLM bias?

Edit: Oidar (in a child comment) ran an A/B test with male names, and it seems to have shown that the bias is indeed in the LLM and that my suspicion that it came from the prompt + context was wrong. Kudos and thanks for taking the time.

small_scombrus · 2025-11-12T16:07:03 1762963623

> You assume the bias is in the LLM itself

There's a LOT of literature about common large datasets being inherently biased towards some ideas/concepts and away from others, in ways that imply negative things.

FloorEgg · 2025-11-12T16:17:31 1762964251

That's not a very scientific stance. What would be far more informative is if we looked at the system prompt and confirmed whether or not the bias was coming from it. In my experience, when responses were exceptionally biased, the source of the bias was my own prompts.

The OP is claiming that an LLM assumes a meeting between two women is childcare. I've worked with LLMs enough to know that current-gen LLMs wouldn't make that assumption by default. There is no way that whatever calendar-related data was used to train LLMs would show a majority of women-only 1:1s being childcare-focused. That seems extremely unlikely.

small_scombrus · 2025-11-14T05:19:23 1763097563

Not to "Let me Google that for you"... but there are a LOT of scientific papers that specifically analyse bias in LLM output and reference the datasets the models are trained on.

https://www.sciencedirect.com/search?qs=llm+bias+dataset

johnisgood · 2025-11-12T16:09:59 1762963799

"imply negative things"? What is "negative" here? I see nothing that is "negative".

small_scombrus · 2025-11-14T05:16:16 1763097376

That a regular meeting between two women must be about childcare because women=childcare?

johnisgood · 2025-11-15T09:32:31 1763199151

Yeah, except I asked Claude:

> No. There's no indication that children are involved or that care is being provided. It's just two people meeting.

Part of its thinking:

> This is a very vague description with no context about:
>
> - What happens during the meeting
> - Whether children are present
> - What the purpose of the meeting is
> - Any other relevant details

Claude is not going to say childcare, and it is not saying it is childcare.

My prompt was: '"regular meeting between two women". Is it childcare or not?'

ryandrake · 2025-11-12T17:25:16 1762968316

Could the users who flagged this chime in to explain what is rule-breaking about this article?

FloorEgg · 2025-11-12T19:57:19 1762977439

I was wondering that myself.

Also, do moderators ever move comments around? When I last looked, I thought one comment was a child of my comment, but now it's a top-level comment on this post. I'm not sure if I'm mistaken or a moderator moved things around.

ryandrake · 2025-11-12T20:35:40 1762979740

This does happen from time to time. A moderator will "detach" a subthread[1] and move it to the top-level (usually also burying it at the bottom of the page, which tends to silence the discussion).

1: https://news.ycombinator.com/item?id=23441803

slau · 2025-11-13T15:44:11 1763048651

In this case, the comment that was promoted to the top level has consistently been higher on the page (it’s still the first comment) than the comment it originally responded to.

FloorEgg · 2025-11-12T21:32:05 1762983125

Thank you for clarifying!

cperciva · 2025-11-12T15:58:01 1762963081

I run into this sort of bias all the time -- in the real world, not just in AI. I take my daughter to medical appointments, both for scheduling reasons (my wife's schedule is less flexible) and rapport reasons (I'm not that kind of doctor, but I know the terminology and medical professionals treat me far more as a peer), and I routinely get "oh we expected her mother" or "we always phone the mother to schedule followup appointments".

Is it so hard to understand that men can be parents too?

0xdeadbeefbabe · 2025-11-12T16:31:51 1762965111

> in the real world, not just in AI

Apparently the scheduler is trained to give higher weight to those sorts of questions. This raises some questions for GPTs, like: how are they supposed to model something not implied in the training data?

toomuchtodo · 2025-11-12T16:00:46 1762963246

> Is it so hard to understand that men can be parents too?

The Overton window and cultural norms take time to shift. We might get there in another generation; it's too early to tell.

junaru · 2025-11-12T16:00:09 1762963209

Is it hard to understand you are the minority? The world keeps presenting you with data.

cperciva · 2025-11-12T17:13:58 1762967638

Understand that I'm in the minority? Sure.

But the fact that I'm bringing my daughter to a medical appointment should be a pretty clear indication that, you know, I bring my daughter to medical appointments.

johnisgood · 2025-11-12T15:59:09 1762963149

[flagged]

dghlsakjg · 2025-11-12T16:14:14 1762964054

Presumably he already has told them his number and preferences. Defaults are fine, but you don't want your preference to get reset to default every time, and assuming that only the mother of a child should be contacted in all cases is a terrible default. The person who made the appointment and who is bringing the child to the doctor should be the one contacted by default. There is no reason that the mother of a child should be considered the default guardian. That is an incredibly dangerous assumption to make in many circumstances.

Edit: This reply was written to a response that was completely rewritten in an edit, so it may not make as much sense now.

david38 · 2025-11-12T16:19:36 1762964376

This. Don’t be so sensitive, just say to call you.

I took my daughter to appointments and as soon as I started asking meaningful questions, doctors immediately switched to assuming I was the one to talk to.

When you act like you know what’s going on, act like you’re on top of it, I’ve never once had a doctor assume I was just babysitting. This was true in the Midwest and California.

johnisgood · 2025-11-12T16:27:30 1762964850

> doctors immediately switched to assuming I was the one to talk to.

Exactly! They do that. If a father takes the kid, they will ask for his number, not the mother's, in my experience. If both the mother and father go with the kid, well, there are cues they pick up on. In my case, my father was typically in the background while my mother did the talking, so they asked for her number, not my dad's. So, all in all, it's whoever does the most talking, for example. And if my dad wanted to be the one called, my mom would have given them his number, or my dad would have. I really don't see an issue here.

somewhereoutth · 2025-11-12T16:03:18 1762963398

LLMs: The chemical weapons of public discourse.

The cleanup is going to be a grim task.

drivingmenuts · 2025-11-12T17:35:37 1762968937

There will be an LLM for that.

God help us all.

broof · 2025-11-12T16:02:47 1762963367

I hate that when I see this many em dashes, as well as statements like “it’s not x, it’s y” multiple times, I have to assume it was written or at least heavily edited by AI.

sophiabk · 2025-11-12T15:37:22 1762961842

We’re building a family AI called Hold My Juice — and last week, our own system mislabeled a recurring meeting between two founders as “childcare.”

Calendar: “Emily / Sophia.” Classification: “childcare.”

It was a perfect snapshot of how bias seeps into everyday AI. Most models still assume women = parents, planning = domestic, logistics = mom.

We’re designing from the opposite premise: AI that learns each family’s actual rhythm, values, and tone — without default stereotypes.

orochimaaru · 2025-11-12T16:02:53 1762963373

AI is trained off Reddit and other social media. If most portrayals of women and girls (and men, for that matter) on social media are biased towards certain activities, that’s what AI is going to spit out. AI doesn’t think.

"Is this right or wrong?" is the wrong question, because AI doesn’t understand bias or morality. It needs to be taught, and it’s being taught from heavily biased sources.

You should be able to craft prompts and guardrails so that it doesn’t do that. If you have ever looked deeper into how AI is trained, you know that just expecting it to behave that way is naive.

The big question is: what solutions exist to train it differently on a large enough corpus of public or private/paid-for data?

Fwiw - I’m the father of two girls whom I have advised to stay off social media completely because it’s unhealthy. So far they have understood why.

daveguy · 2025-11-12T16:18:21 1762964301

The problem is crafted prompts and guardrails don't work very well, because these entire networks are trained on average internet garbage. And guess what's getting worse?

orochimaaru · 2025-11-12T16:48:10 1762966090

Agreed. The main problem is guys with too much money invested in this bullshit asking everyone to use their snake oil.

I think they’re leaning on everyone (even traditional enterprise company boards, startups, etc.) to get this going. It’s not organic growth; it’s a PR machine with a trillion $$ behind an experiment.