
Bias Auditing Methodology

Objective Frameworks for Algorithmic Fairness Validation

To trust an AI system, you need to know exactly how it evaluates candidates. The Warden Assurance standard tests AI models using a two-part framework: we check for fairness across broad demographic groups, and we verify that individual candidates are treated consistently.

While standard compliance audits often rely solely on group averages, an AI system can look fair on average while still treating specific individuals unfairly. To provide complete visibility, Warden evaluates systems from two distinct angles. We review the actual outcomes the system has produced in the past, and we run thousands of specially designed test profiles through the system to see exactly how it behaves. This approach objectively measures the AI's true potential for bias.

I. Data Configuration

A good audit requires good data. Relying only on a company's past hiring data limits the test to the types of people who have already applied. To ensure a complete evaluation, Warden uses two types of data:

  • The Warden Dataset: We use specially designed test profiles. These profiles are carefully structured to represent a wide range of backgrounds, including different genders, races, ages, and abilities. This lets us test how the system handles many different scenarios, not just typical ones.
  • Historical Data: When available, we also look at the company’s past hiring data to measure the actual outcomes the system has produced in the real world.

II. Protected Characteristics and Compliance Scope

AI systems learn by finding patterns. Without careful testing, they can repeat past human biases or use harmless-looking information to unfairly guess a candidate's background. Employment laws and emerging AI regulations, such as EEOC guidelines, Title VII, the UK Equality Act, NYC Local Law 144, and the EU AI Act, strictly define which demographic categories must be protected from discrimination.

To help organizations map their systems to these legal requirements, Warden tests models across more than 15 protected categories, including:

Sex & Gender
Race & Ethnicity
Intersectionality
Age
Disability
Religion
Sexual Orientation
Veteran Status
National Origin
Language Proficiency
Criminal History
Medical Conditions
Marital Status
Pregnancy Status

To show how we turn these broad legal rules into practical tests, here are three examples of what we look for:

Gender Bias
  • Example Test: Checking if women are selected at lower rates for technical roles.
  • Testing Objective: We compare selection rates by gender for the same role to see whether one group is at a disadvantage.

Ethnicity Bias
  • Example Test: Checking if the AI relies on zip codes.
  • Testing Objective: We identify if the system uses unprotected details (like a specific neighborhood) to unfairly guess someone's race.

Intersectional Bias
  • Example Test: Comparing outcomes for Black women versus White men.
  • Testing Objective: We look at combinations of categories to find specific biases that simple tests might miss.
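
To illustrate why combinations of categories matter, here is a minimal Python sketch. The counts, the record layout, and the helper names (`min_impact_ratio`, `cohort`) are hypothetical and chosen only for illustration; they are not Warden's data or tooling. The sketch shows a case where race alone and gender alone both look acceptable, while one intersectional group is clearly behind.

```python
from collections import defaultdict

def min_impact_ratio(records, key):
    """Lowest group passing rate divided by the highest, grouping by key(record)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [applicants, passed]
    for rec in records:
        entry = counts[key(rec)]
        entry[0] += 1
        entry[1] += int(rec["passed"])
    rates = [passed / n for n, passed in counts.values()]
    if max(rates) == 0:
        raise ValueError("no group has any passing candidates")
    return min(rates) / max(rates)

def cohort(race, gender, n, n_passed):
    """Build n hypothetical records, the first n_passed of which pass."""
    return [{"race": race, "gender": gender, "passed": i < n_passed}
            for i in range(n)]

# Hypothetical outcomes: 100 applicants in each of four combined groups.
records = (cohort("Black", "woman", 100, 30) + cohort("Black", "man", 100, 60)
           + cohort("White", "woman", 100, 60) + cohort("White", "man", 100, 50))

by_race = min_impact_ratio(records, lambda r: r["race"])
by_gender = min_impact_ratio(records, lambda r: r["gender"])
by_both = min_impact_ratio(records, lambda r: (r["race"], r["gender"]))

print(f"race only:   {by_race:.2f}")
print(f"gender only: {by_gender:.2f}")
print(f"race+gender: {by_both:.2f}")
```

In this invented data the single-category ratios are both about 0.82, while the combined grouping drops to 0.50 because one intersectional group passes at half the rate of the best-performing groups.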

III. Method 1: Group Parity

The first part of our framework checks for equality of outcome by looking at how different demographic groups perform on average.

  • The Approach: We compare the passing rates of different groups to see if any specific group is at a significant disadvantage.
  • The Standard: Following the legal guideline of the "Four-Fifths Rule," we require the passing rate of any protected group to be at least 80% of the passing rate of the group with the highest passing rate.
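
The group parity check can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Warden's implementation: the function names, the `(group, passed)` record layout, and the sample counts are all hypothetical.

```python
from collections import Counter

def passing_rates(records):
    """Return {group: passing rate} from (group, passed) pairs."""
    totals, passes = Counter(), Counter()
    for group, passed in records:
        totals[group] += 1
        if passed:
            passes[group] += 1
    return {g: passes[g] / totals[g] for g in totals}

def impact_ratios(rates):
    """Divide each group's passing rate by the highest group's passing rate."""
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group has any passing candidates")
    return {g: r / best for g, r in rates.items()}

# Hypothetical data: 50 of 100 pass in group A, 35 of 100 pass in group B.
records = ([("A", True)] * 50 + [("A", False)] * 50
           + [("B", True)] * 35 + [("B", False)] * 65)

ratios = impact_ratios(passing_rates(records))
# Groups below the four-fifths (0.80) threshold.
flagged = sorted(g for g, r in ratios.items() if r < 0.80)

print(ratios)
print("Below four-fifths threshold:", flagged)
```

Here group B passes at 35% against group A's 50%, giving an impact ratio of 0.70, which falls below the 0.80 threshold.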

Why we do more

An AI can easily pass this "group average" test if its biases cancel each other out, for example, if it unfairly favors men for one role and women for another. This is why testing individual consistency is just as important.
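
A small worked example makes the cancellation effect concrete. The roles and counts below are invented for illustration, and the `ratio` helper is a hypothetical name; the point is only that opposite biases in two roles can produce a perfect overall ratio.

```python
# Hypothetical counts: (applicants, passed) per role and gender.
outcomes = {
    "role_1": {"men": (100, 60), "women": (100, 30)},
    "role_2": {"men": (100, 30), "women": (100, 60)},
}

def ratio(rate_a, rate_b):
    """Impact ratio: the lower passing rate divided by the higher one."""
    low, high = sorted([rate_a, rate_b])
    return 1.0 if high == 0 else low / high

per_role = {}
totals = {"men": [0, 0], "women": [0, 0]}  # gender -> [applicants, passed]
for role, groups in outcomes.items():
    rates = {}
    for gender, (applied, passed) in groups.items():
        rates[gender] = passed / applied
        totals[gender][0] += applied
        totals[gender][1] += passed
    per_role[role] = ratio(rates["men"], rates["women"])

overall = ratio(totals["men"][1] / totals["men"][0],
                totals["women"][1] / totals["women"][0])

print("overall impact ratio:", overall)
print("per-role impact ratios:", per_role)
```

Pooled across both roles, men and women each pass at 45%, so the overall impact ratio is 1.0. Within each role the ratio is 0.50, well below the 0.80 threshold.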

IV. Method 2: Individual Consistency

Because average scores don't tell the whole story, the second part of our framework checks for equality of treatment. This helps us understand exactly how the AI evaluates a single person.

  • The Approach: We create matched pairs of test profiles. We change only one detail (like a name that suggests a specific gender or ethnicity) while keeping all skills, education, and experience exactly the same.
  • The Goal: We run these matched profiles through the AI. If the system gives them different scores, we have clear proof that the AI is judging candidates based on their background, not just their qualifications.
  • The Standard: Rather than operating as a formal pass/fail certification, Warden evaluates the severity of any discrepancies. To achieve the ideal grade of "Clear," we look for a consistency score of at least 95%, meaning the system treats the matched test profiles almost identically.
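
The matched-pair idea can be sketched as follows. The text above does not spell out how the consistency score is calculated, so this sketch assumes it is the share of matched pairs whose two scores differ by no more than a tolerance. The helper names, the profile fields, and the toy scoring function are hypothetical stand-ins for the system under test.

```python
def make_pair(base_profile, field, value_a, value_b):
    """Build two profiles that are identical except for one field."""
    return ({**base_profile, field: value_a}, {**base_profile, field: value_b})

def consistency_score(pairs, score, tolerance=0.0):
    """Fraction of matched pairs whose scores differ by at most tolerance.

    Assumption: this definition is illustrative, not taken from the standard.
    """
    if not pairs:
        raise ValueError("need at least one matched pair")
    consistent = sum(
        1 for a, b in pairs if abs(score(a) - score(b)) <= tolerance
    )
    return consistent / len(pairs)

def biased_score(profile):
    """Toy scorer that wrongly penalises one of the two names."""
    points = profile["years_experience"] * 10
    if profile["name"] == "Name B":
        points -= 5
    return points

def fair_score(profile):
    """Toy scorer that looks only at experience."""
    return profile["years_experience"] * 10

# Same skills and education in every pair; only the name changes.
bases = [{"years_experience": y, "degree": "BSc"} for y in range(1, 5)]
pairs = [make_pair(b, "name", "Name A", "Name B") for b in bases]

biased_result = consistency_score(pairs, biased_score)
fair_result = consistency_score(pairs, fair_score)
print(biased_result, fair_result)
```

The biased toy scorer treats every pair differently and scores 0.0, while the scorer that ignores the name scores 1.0. A real audit would choose the tolerance to match the scale of the system's scores.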

V. The Warden Grading System

We translate these technical tests into a simple grading system, giving HR and Legal teams a clear view of their compliance risk.

🟢Clear

No issues detected. The system achieved an Impact Ratio of 80% or higher, and a Consistency Score of 95% or higher.

🟡Consider

Minor issues detected that require a closer look. The system fell slightly below our ideal thresholds (Impact Ratio between 60%-79%, or Consistency Score between 90%-94%).

🔴Concern

Definite issues detected that need immediate attention. The system clearly failed the fairness checks (Impact Ratio below 60%, or Consistency Score below 90%).
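
The thresholds above can be expressed as a short function. This is a sketch, not Warden's implementation. It assumes both metrics are given as fractions and that the worse of the two metrics decides the grade, which the text implies but does not state outright. The function name `grade` and the sample inputs are hypothetical.

```python
def grade(impact_ratio, consistency_score):
    """Map the two metrics (as fractions, e.g. 0.88 and 0.62) to a grade.

    Assumption: the worse of the two metrics determines the result.
    """
    if impact_ratio < 0.60 or consistency_score < 0.90:
        return "Concern"
    if impact_ratio < 0.80 or consistency_score < 0.95:
        return "Consider"
    return "Clear"

# Hypothetical inputs covering each grade.
examples = {
    "strong on both": (0.94, 0.98),
    "slightly low impact ratio": (0.70, 0.96),
    "good average, poor consistency": (0.88, 0.62),
}
grades = {name: grade(*metrics) for name, metrics in examples.items()}

for name, result in grades.items():
    print(f"{name}: {result}")
```

Note that a system with a healthy Impact Ratio of 0.88 still lands in "Concern" here, because its Consistency Score of 0.62 is below 0.90.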

Dual-Stream Bias Detection Simulator

The three example models below show the limits of average-based testing.

Model 1
  • Method 1: Group Average. Impact Ratio: 0.65. Failed (< 0.80)
  • Method 2: Individual Treatment. Consistency Score: 45%. Failed (< 95%)

Model 1 Outcome: This model fails both the group average checks and the individual consistency checks. It clearly relies on biased patterns.

Model 2
  • Method 1: Group Average. Impact Ratio: 0.88. Passed (≥ 0.80)
  • Method 2: Individual Treatment. Consistency Score: 62%. Failed (< 95%)

Model 2 Outcome: This model passes the standard 'group average' test, but fails the individual consistency check. It hides its bias by balancing out unfair decisions across different roles.

Model 3
  • Method 1: Group Average. Impact Ratio: 0.94. Passed (≥ 0.80)
  • Method 2: Individual Treatment. Consistency Score: 98%. Passed (≥ 95%)

Model 3 Outcome: This model passes both tests, proving it evaluates candidates fairly at both the group level and the individual level.
