Monday (no class work on Midterm Assignment)

Wednesday

The Problem: AI models were Black Boxes. No one knew how they were trained or where they failed, but they were being deployed in many contexts

* Intended Use: What is this for?

* Out-of-Scope Use: What is this not for?

* Factors: Demographic, environmental, or technical factors.

Margaret Mitchell & Timnit Gebru

Institutional Context: They were leading the Ethical AI team at Google Research.
The “Gender Shades” Legacy: Dr. Gebru’s previous work showed that facial recognition failed for women with darker skin (Buolamwini & Gebru, 2018).
Their Philosophy: > “A model’s performance is not a single number.” (Mitchell et al., 2019)
Standardization as Activism: They used the language of engineering to force social accountability.

Why this paper changed the world.

The Shift: Instead of one “Accuracy Score,” the paper demanded Disaggregated Evaluation.
What it reveals: A model might be 98% accurate overall, but only 60% accurate for a specific subgroup.
Impact: Within a year, Google, NVIDIA, and Salesforce adopted this framework.
This is now a “Regulatory Requirement” in many draft AI laws (like the EU AI Act).

The Success: Google marketed “Model Cards” as a sign of their leadership in safety.
The Conflict: In 2020, Gebru co-authored Stochastic Parrots, which critiqued the environmental and social costs of Large Language Models (LLMs).
The Firing: Google “accepted her resignation” (widely viewed as a retaliatory firing) when she refused to retract the critique.
Documentation is accepted by power only when it doesn’t threaten the bottom line