Eval Lab

Golden sets with LLM-judge scoring. Each task shows how the prompt evolved across versions and how each version scored on the same examples, including what it got wrong.

Extract company name from email signature

60 real-world signatures, 18 of which have no clear company and should return null. Judge: exact match on the company name, with a 'reasonable normalization' rule (e.g. LLC vs L.L.C.).

schema: { company: string | null }
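
As a rough illustration of enforcing this schema, the sketch below parses a raw model response and accepts only a string or null under the `company` key. It assumes the model replies with a JSON object; the function name `parse_company` and the choice to treat an empty string as null are assumptions made for this example, not part of the original setup.

```python
import json
from typing import Optional


def parse_company(raw: str) -> Optional[str]:
    """Parse a model response into the `company` value, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    if not isinstance(data, dict) or "company" not in data:
        raise ValueError("response must be an object with a 'company' key")

    company = data["company"]
    if company is None:
        return None
    if not isinstance(company, str):
        raise ValueError("'company' must be a string or null")

    # Assumption: an empty or whitespace-only string is treated as null.
    company = company.strip()
    return company or None
```

Rejecting malformed responses before scoring keeps format failures separate from extraction failures.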

judge rubric: 1.0 = exact match or normalized variant. 0.5 = right entity, wrong form. 0.0 = wrong or hallucinated.
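
The rubric is applied by an LLM judge, but its three tiers can be approximated with plain rules for a quick sanity check. The sketch below is one possible approximation: the `LEGAL_SUFFIXES` list, the helper names, and the normalization steps are all assumptions for this example. It cannot recognize translated names (it would give 0.0 to "Severstal Digital" against "Северсталь Цифра"), which is a case where the LLM judge is still needed.

```python
import re
from typing import Optional

# Hypothetical suffix list; a real one would be longer.
LEGAL_SUFFIXES = {"llc", "inc", "corp", "corporation", "ltd", "gmbh", "co"}


def normalize(name: str) -> str:
    """Casefold, drop periods and commas, collapse whitespace."""
    cleaned = re.sub(r"[.,]", "", name.casefold())
    return " ".join(cleaned.split())


def strip_suffixes(normalized: str) -> str:
    """Remove legal-form tokens such as 'corp' or 'llc'."""
    return " ".join(t for t in normalized.split() if t not in LEGAL_SUFFIXES)


def score(predicted: Optional[str], expected: Optional[str]) -> float:
    """Rule-based approximation of the 1.0 / 0.5 / 0.0 rubric."""
    # Null cases: full credit only when both sides are null.
    if predicted is None or expected is None:
        return 1.0 if predicted is expected else 0.0

    p, e = normalize(predicted), normalize(expected)
    if p == e:
        return 1.0  # exact match or normalized variant

    # Same entity once legal suffixes are ignored: right entity, wrong form.
    p_core, e_core = strip_suffixes(p), strip_suffixes(e)
    if p_core and p_core == e_core:
        return 0.5

    return 0.0  # wrong or hallucinated
```

For instance, `score("Acme", "Acme Corp")` returns 0.5 and `score("iPhone", None)` returns 0.0 under these rules.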

Aggregate mean score
Version               Mean score
v1: zero-shot         62%
v2: with null rule    78%
v3: three examples    91%
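
The aggregate is presumably the mean of the per-example judge scores, expressed as a percentage. A minimal sketch, assuming that definition (the function name `aggregate` is hypothetical), applied to only the four v1 scores shown in the examples on this page rather than the full 60-signature set:

```python
from statistics import mean
from typing import Sequence


def aggregate(scores: Sequence[float]) -> float:
    """Mean judge score as a percentage (assumed definition of the aggregate)."""
    if not scores:
        raise ValueError("no scores to aggregate")
    return 100 * mean(scores)


# v1 scores for the four examples shown on this page: 0.5, 0.0, 0.0, 0.5.
v1_shown = [0.5, 0.0, 0.0, 0.5]
print(f"{aggregate(v1_shown):.0f}%")  # prints "25%"
```

The four displayed examples are mostly v1 failure cases, so their mean sits well below the 62% reported for v1 across the full set.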

Examples

Input
Best,
John Doe
Senior PM, Acme Corp
john@acme.com
+1 555 0100
Expected
Acme Corp
v1: zero-shot (score: ~0.5)
Acme

Dropped 'Corp' suffix

v2: with null rule (score: 1.0)
Acme Corp

Exact match

v3: three examples (score: 1.0)
Acme Corp

Exact match

Input
Sent from my iPhone
-- 
Maria
Expected
null
v1: zero-shot (score: 0.0)
iPhone

Hallucinated company from device name

v2: with null rule (score: 1.0)
null

Correctly returns null when no company is present

v3: three examples (score: 1.0)
null

Correct

Input
Cheers,
A. Singh
Head of Data | datascience.io
@asingh on x
Expected
datascience.io
v1: zero-shot (score: 0.0)
Data

Misread department as company

v2: with null rule (score: 1.0)
datascience.io

Domain-style company

v3: three examples (score: 1.0)
datascience.io

Correct

Input
Привет,
Иван Петров
CEO @ Северсталь Цифра
ivan@severstal-digital.ru
Expected
Северсталь Цифра
v1: zero-shot (score: ~0.5)
Severstal Digital

Translated to English; rubric requires native form

v2: with null rule (score: 1.0)
Северсталь Цифра

Correct native form

v3: three examples (score: 1.0)
Северсталь Цифра

Correct