Coming soon

Evaluating LLM Outputs Beyond Vibes

Implement BLEU, ROUGE, and METEOR from scratch, build an LLM-as-judge pipeline with structured scoring, detect judging biases, and compute Cohen's Kappa.

Upcoming in Transformers & LLMs
