
Museum für Naturkunde Berlin
Germany

250,000 specimen labels with handwritten metadata spanning two centuries. Standard OCR failed entirely: faded ink, damaged paper, mixed scripts, and non-standard layouts.
Developed a Smart Extract model, a single-pass AI that interprets label structure in context. Added Named Entity Recognition with GeoNames enrichment to automatically tag species names and resolve place names.
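
As a rough illustration of the tagging and place-name resolution step described above, here is a minimal Python sketch. It is not the project's actual pipeline: the `GAZETTEER` dictionary is a tiny hypothetical stand-in for a GeoNames lookup, the binomial regex is a naive substitute for a trained Named Entity Recognition model, and the sample label text is invented for the example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for a GeoNames lookup: historical or German
# spellings mapped to modern English place names. A real system would
# query a gazetteer service instead of a hard-coded dictionary.
GAZETTEER = {
    "kapstadt": "Cape Town",
    "kamerun": "Cameroon",
}

# Naive pattern for a Latin binomial (capitalized genus + lowercase epithet).
# This only approximates what a trained NER model would recognize.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


@dataclass
class TaggedLabel:
    text: str
    species: list[str] = field(default_factory=list)
    places: dict[str, str] = field(default_factory=dict)


def tag_label(text: str) -> TaggedLabel:
    """Tag species names and resolve place names in one transcribed label."""
    result = TaggedLabel(text=text)
    result.species = [" ".join(m.groups()) for m in BINOMIAL.finditer(text)]
    # Check every alphabetic token against the gazetteer, case-insensitively.
    for token in re.findall(r"[^\W\d_]+", text):
        modern = GAZETTEER.get(token.lower())
        if modern is not None:
            result.places[token] = modern
    return result


# Invented sample label, for illustration only.
label = tag_label("Papilio demodocus, Kapstadt, 1887")
```

In this sketch the original spelling is kept as the dictionary key and the resolved name as the value, so the transcription stays faithful to the label while the enrichment remains searchable.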
First real-world Smart Extract deployment. Delivered a complete machine-readable dataset of 250,000 transcribed and tagged labels, a replicable model for natural history collections worldwide.









