Artificial Intelligence, Machine Learning, DevOps

Mateo Rios Querubin

Senior ML Engineer @ Provectus / Universidad EAFIT

About

Mathematical Engineer with an MSc in Applied Mathematics (Universidad EAFIT) and 7+ years of experience as a Data Scientist and Machine Learning Engineer. Currently at Provectus, building LLM evaluation and optimization pipelines for document processing automation. Also a lecturer at Universidad EAFIT for 3+ years, teaching Advanced Analytics and foundational Mathematics. Specializes in designing and deploying AI solutions, including LLM evaluation systems, computer vision models, recommendation systems, and end-to-end ML pipelines. AWS Certified Generative AI Developer and AWS Certified Machine Learning Specialty.

Talk

From Expert Judgment to Autonomous Optimization: Encoding Human Expertise into LLM Judges with DSPy

A single misread clause in a reinsurance contract can mean millions in liability. Our LLM pipeline could extract and summarize these documents, but how do you know the output is actually correct? String matching fails, human review at scale is unaffordable, and a single LLM-as-judge prompt gives inconsistent, uncalibrated scores. The real bottleneck was never generation; it was evaluation. This talk shows how we solved it in two steps, both built entirely in Python. First, we encoded expert evaluation at scale using DSPy to distill judgments from five domain experts into a panel of calibrated LLM judges. Then we closed the loop using DSPy's MIPROv2 and GEPA optimizers, wiring the judge panel as a fitness function and letting the system rewrite prompts autonomously. The stack is Python-native: DSPy, MLflow, LiteLLM, Pydantic. You will leave with a concrete recipe for encoding expert knowledge into automated LLM evaluation and self-improving optimization.
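The idea of wiring a judge panel as a fitness function can be sketched in plain Python. This is a minimal illustration, not the speaker's implementation: the two judges are hypothetical deterministic stand-ins for calibrated LLM judges, the `Example` and `Prediction` classes and `make_panel_metric` are invented for this sketch, and the `metric(example, pred, trace=None)` shape is assumed to follow the metric convention DSPy optimizers accept (GEPA may expect a richer signature, so check the DSPy documentation before reusing it).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# A judge maps (source document, candidate summary) to a score in [0, 1].
# In the talk's setting each judge would be an LLM call; here they are
# hypothetical rule-based stand-ins so the sketch runs offline.
Judge = Callable[[str, str], float]


@dataclass
class Example:
    document: str


@dataclass
class Prediction:
    summary: str


def coverage_judge(document: str, summary: str) -> float:
    """Stand-in judge: fraction of long document terms kept in the summary."""
    key_terms = {w.lower().strip(".,") for w in document.split() if len(w) > 6}
    if not key_terms:
        return 1.0
    summary_words = {w.lower().strip(".,") for w in summary.split()}
    return len(key_terms & summary_words) / len(key_terms)


def brevity_judge(document: str, summary: str) -> float:
    """Stand-in judge: 1.0 if the summary is shorter than the document."""
    return 1.0 if len(summary) < len(document) else 0.0


def make_panel_metric(judges: list[Judge], threshold: float = 0.5):
    """Build a metric that averages the panel's scores (assumed aggregation)."""

    def metric(example, pred, trace=None):
        scores = [judge(example.document, pred.summary) for judge in judges]
        panel_score = mean(scores)
        # Assumed convention: return pass/fail when a trace is supplied,
        # and the raw averaged score otherwise.
        if trace is not None:
            return panel_score >= threshold
        return panel_score

    return metric


panel_metric = make_panel_metric([coverage_judge, brevity_judge])
example = Example("The reinsurer covers liability exceeding retention.")
prediction = Prediction("Reinsurer covers liability.")
print(panel_metric(example, prediction))
```

Averaging is only one way to combine a panel; a weighted mean or a majority vote would slot into `make_panel_metric` without changing the metric's signature.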

Country: Colombia
Language: English
Level: Intermediate

Want to know more?

Join the PyCon Colombia newsletter for a complete overview of our events, speakers, and community participation.