Artificial Intelligence Machine Learning DevOps

From Expert Judgment to Autonomous Optimization: Encoding Human Expertise into LLM Judges with DSPy

Location: Universidad EAFIT

LEVEL: Intermediate
LANGUAGE: English

A single misread clause in a reinsurance contract can mean millions in liability. Our LLM pipeline could extract and summarize these documents, but how do you know the output is actually correct? String matching fails ("USD 5,000,000" vs "$5M" scores zero), human review at scale is unaffordable, and a single LLM-as-judge prompt gives inconsistent, uncalibrated scores. The real bottleneck was never generation; it was evaluation.

This talk shows how we solved it in two steps, both built entirely in Python. First, we encoded expert evaluation at scale using DSPy to distill judgments from five domain experts into a panel of calibrated LLM judges, each targeting a single quality dimension, weighted to reflect what experts actually care about. Then we closed the loop using DSPy's MIPROv2 and GEPA optimizers, wiring the judge panel as a fitness function and letting the system rewrite prompts autonomously, with regression guards and CI gates so humans review only the final score delta.

The stack is Python-native: DSPy, MLflow, LiteLLM, Pydantic. You will leave with a concrete recipe for encoding expert knowledge into automated LLM evaluation and self-improving optimization, applicable to any domain where "correct" is nuanced.
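The weighted judge panel and the regression guard described above can be illustrated with a minimal sketch. It is not the speakers' implementation: the judges here are deterministic plain-Python stand-ins for LLM judges, and every name (`exact_match`, `amount_match`, `WeightedJudge`, `panel_score`, `passes_gate`) and every weight is a hypothetical chosen for illustration. It shows why exact string matching scores "USD 5,000,000" vs "$5M" as zero, how per-dimension scores could be combined into one weighted fitness value, and how a simple gate could reject a candidate that regresses.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# A judge maps (expected, predicted) to a score in [0, 1].
# Assumption: real judges would be LLM calls; these are deterministic stand-ins.
Judge = Callable[[str, str], float]

_SUFFIX = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def exact_match(expected: str, predicted: str) -> float:
    """Naive string matching: 1.0 only if the stripped strings are identical."""
    return 1.0 if expected.strip() == predicted.strip() else 0.0


def parse_amount(text: str) -> Optional[float]:
    """Extract the first number in the text, honoring a trailing K/M/B suffix."""
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*([KMB]\b)?", text.upper())
    if match is None:
        return None
    return float(match.group(1).replace(",", "")) * _SUFFIX.get(match.group(2), 1)


def amount_match(expected: str, predicted: str) -> float:
    """Score 1.0 when both strings denote the same numeric amount."""
    a, b = parse_amount(expected), parse_amount(predicted)
    return 1.0 if a is not None and a == b else 0.0


@dataclass(frozen=True)
class WeightedJudge:
    name: str      # quality dimension this judge targets
    weight: float  # relative importance (hypothetical values below)
    judge: Judge


def panel_score(panel: List[WeightedJudge], expected: str, predicted: str) -> float:
    """Weighted average of the individual judge scores."""
    total = sum(j.weight for j in panel)
    if total <= 0:
        raise ValueError("panel weights must sum to a positive number")
    return sum(j.weight * j.judge(expected, predicted) for j in panel) / total


def passes_gate(baseline: float, candidate: float, max_regression: float = 0.0) -> bool:
    """Regression guard: accept only if the candidate does not drop below baseline."""
    return candidate >= baseline - max_regression


# Hypothetical panel: the amount dimension counts three times as much as verbatim text.
panel = [
    WeightedJudge("amount", 3.0, amount_match),
    WeightedJudge("verbatim", 1.0, exact_match),
]
score = panel_score(panel, "USD 5,000,000", "$5M")
```

In a DSPy setup, a function shaped like `panel_score` would typically play the role of the metric handed to an optimizer, with `passes_gate` run in CI against the previous best score. The exact wiring used in the talk is not specified in the abstract.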

Speakers

Mateo Rios Querubin

Senior ML Engineer @ Provectus / Universidad EAFIT

Mathematical Engineer with an MSc in Applied Mathematics (Universidad EAFIT) and 7+ years of experience as a Data Scientist and Machine Learning Engineer. Currently at Provectus, building LLM evaluation and optimization pipelines for document processing automation. Also 3+ years lecturing Advanced Analytics and foundational Mathematics at Universidad EAFIT. Specialized in designing and deploying AI solutions including LLM evaluation systems, computer vision models, recommendation systems, and end-to-end ML pipelines. AWS Certified Generative AI Developer and AWS Certified Machine Learning Specialty.


Sebastián Gómez Ahumada

Middle ML Engineer @ Provectus

Biomedical engineer from Universidad de los Andes, with a minor in neuroscience and an MSc in biomedical engineering focused on machine learning. Currently working as a machine learning engineer building production AI systems, extraction pipelines, and conversational agents. Python was my first programming language and the one where I learned how to think in code, not just write it. For the past two and a half years I was a teaching assistant for ML fundamentals at Los Andes, a role that reinforced a conviction I'll happily defend on stage: complexity is rarely the answer and almost always the excuse.

