Sebastián Gómez Ahumada

Middle ML Engineer @ Provectus

About

Biomedical engineer from Universidad de los Andes, with a minor in neuroscience and an MSc in biomedical engineering focused on machine learning. Currently working as a machine learning engineer building production AI systems, extraction pipelines, and conversational agents. Python was my first programming language and the one where I learned how to think in code, not just write it. I also spent the past two and a half years as a teaching assistant for ML fundamentals at Los Andes, a role that reinforced a conviction I'll happily defend on stage: complexity is rarely the answer and almost always the excuse.

Talk

Artificial Intelligence, Machine Learning, DevOps

From Expert Judgment to Autonomous Optimization: Encoding Human Expertise into LLM Judges with DSPy

LEVEL: Intermediate | LANGUAGE: English

A single misread clause in a reinsurance contract can mean millions in liability. Our LLM pipeline could extract and summarize these documents, but how do you know the output is actually correct? String matching fails ("USD 5,000,000" vs "$5M" scores zero), human review at scale is unaffordable, and a single LLM-as-judge prompt gives inconsistent, uncalibrated scores. The real bottleneck was never generation; it was evaluation.

This talk shows how we solved it in two steps, both built entirely in Python. First, we encoded expert evaluation at scale using DSPy to distill judgments from five domain experts into a panel of calibrated LLM judges, each targeting a single quality dimension, weighted to reflect what experts actually care about. Then we closed the loop using DSPy's MIPROv2 and GEPA optimizers, wiring the judge panel as a fitness function and letting the system rewrite prompts autonomously, with regression guards and CI gates so humans review only the final score delta.

The stack is Python-native: DSPy, MLflow, LiteLLM, Pydantic. You will leave with a concrete recipe for encoding expert knowledge into automated LLM evaluation and self-improving optimization, applicable to any domain where "correct" is nuanced.
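To make the evaluation problem concrete, here is a minimal sketch in plain Python. It is not the speakers' implementation and uses no DSPy calls. It shows why exact string matching scores "USD 5,000,000" against "$5M" as zero, how a weighted panel of per-dimension scores could be combined into one number, and what a simple regression guard on that number might look like. The dimension names, weights, tolerance, and the amount parser are all hypothetical; in the pipeline the abstract describes, each score would come from a calibrated LLM judge rather than a hard-coded value.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical multipliers for the amount suffixes this sketch understands.
_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def exact_match(expected: str, predicted: str) -> float:
    """Naive string metric: 1.0 only if the two strings are identical."""
    return 1.0 if expected.strip() == predicted.strip() else 0.0


def parse_usd(text: str) -> Optional[float]:
    """Parse amounts such as 'USD 5,000,000' or '$5M' (illustrative only)."""
    match = re.search(r"(?:USD|\$)\s*(\d[\d,]*(?:\.\d+)?)\s*([KMB]?)", text.upper())
    if match is None:
        return None
    return float(match.group(1).replace(",", "")) * _SUFFIX[match.group(2)]


def amounts_agree(expected: str, predicted: str) -> bool:
    """Compare the parsed values instead of the raw strings."""
    a, b = parse_usd(expected), parse_usd(predicted)
    return a is not None and b is not None and math.isclose(a, b)


@dataclass(frozen=True)
class JudgeScore:
    dimension: str  # the single quality dimension this judge targets
    score: float    # judge output, assumed to lie in [0, 1]
    weight: float   # assumed to reflect how much experts care


def panel_score(scores: Sequence[JudgeScore]) -> float:
    """Weighted average of the per-dimension judge scores."""
    total_weight = sum(s.weight for s in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s.score * s.weight for s in scores) / total_weight


def passes_gate(baseline: float, candidate: float, tolerance: float = 0.01) -> bool:
    """Regression guard: reject a candidate that drops below the baseline."""
    return candidate >= baseline - tolerance


# Hypothetical panel output for one extracted summary.
example = [
    JudgeScore("amount_accuracy", 1.0, weight=2.0),
    JudgeScore("clause_coverage", 0.5, weight=1.0),
    JudgeScore("readability", 0.0, weight=1.0),
]
```

A function shaped like `panel_score` is the kind of scalar a prompt optimizer can use as its fitness signal, and a check like `passes_gate` is one way a CI job could block a rewritten prompt that scores worse than the current one.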

Mateo Rios Querubin

Senior ML Engineer @ Provectus / Universidad EAFIT
