Routing Strategies for Heterogeneous GenAI Systems: Lessons from Real-World Practice

Thursday 11:05 in room 1.38 (ground floor)

Oliver Zeigermann

Modern Generative AI (GenAI) systems combine prompts, language models, inference servers, and specialized hardware into sophisticated stacks. As no single large GenAI system excels at all tasks, we at Techniker Krankenkasse are increasingly adopting a multi-system approach, employing different models tailored to specific tasks, domains, cost, or latency requirements. While this approach enhances robustness and efficiency, it introduces a critical operational challenge: effectively routing each incoming query to the most suitable GenAI system.

In this talk, we present our real-world experiences developing dynamic routing pipelines for selecting the optimal GenAI system based on input content and task specificity. We detail the evolution and refinement of our routing strategies, including:

Regular-expression filters to quickly capture clear-cut topics and enforce guardrails;
Off-the-shelf Named Entity Recognition (NER) modules to integrate domain-specific contextual signals;
Few-shot fine-tuned intent classifiers capable of generalizing beyond simple keyword matching;
Lightweight generative LLMs that enable cost-effective, context-aware decision-making;
Selective escalation strategies that invoke state-of-the-art LLMs only when more economical routes yield insufficient confidence.
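
As a rough illustration of how the stages listed above might be chained, here is a minimal Python sketch of a routing cascade: cheap regex rules run first, a classifier with a confidence score runs next, and an expensive model is used only as a fallback. Everything in it is an assumption made for illustration and not taken from the system described in the talk: the names (REGEX_RULES, INTENT_KEYWORDS, toy_intent_classifier, route), the target labels, and the 0.7 threshold are hypothetical, and the keyword scorer merely stands in for a trained intent classifier or a lightweight LLM. An NER stage is omitted for brevity.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Decision:
    target: str        # name of the GenAI system that should handle the query
    stage: str         # which routing stage made the decision
    confidence: float  # confidence reported by that stage


# Hypothetical regex rules: clear-cut topics and guardrails, checked first.
REGEX_RULES = [
    (re.compile(r"\bpassword\b", re.IGNORECASE), "guardrail_refusal"),
    (re.compile(r"\bopening hours\b", re.IGNORECASE), "faq_model"),
]

# Hypothetical keyword sets; a stand-in for a trained intent classifier.
INTENT_KEYWORDS = {
    "billing_model": {"invoice", "payment", "refund"},
    "medical_model": {"symptom", "treatment", "doctor"},
}


def regex_stage(query: str) -> Optional[Decision]:
    """Return a decision if any regex rule matches, else None."""
    for pattern, target in REGEX_RULES:
        if pattern.search(query):
            return Decision(target, "regex", 1.0)
    return None


def toy_intent_classifier(query: str) -> Tuple[str, float]:
    """Score intents by keyword overlap; confidence is the winner's share of hits."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    scores = {label: len(tokens & kws) for label, kws in INTENT_KEYWORDS.items()}
    total = sum(scores.values())
    if total == 0:
        return "general_model", 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def route(
    query: str,
    classifier: Callable[[str], Tuple[str, float]] = toy_intent_classifier,
    threshold: float = 0.7,
) -> Decision:
    """Try cheap stages first; escalate only when confidence is below threshold."""
    decision = regex_stage(query)
    if decision is not None:
        return decision
    label, confidence = classifier(query)
    if confidence >= threshold:
        return Decision(label, "classifier", confidence)
    # Fallback: hand the query to the most capable (and most expensive) system.
    return Decision("frontier_llm", "escalation", confidence)
```

In this sketch, route("Which doctor offers this treatment?") is settled by the classifier stage, while an ambiguous query such as "refund for doctor" falls below the threshold and is escalated. Passing the classifier as a parameter keeps the cascade unchanged if the toy scorer is swapped for a real model.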

We share insights and best practices from our real-world implementation experience.

Oliver Zeigermann

Oliver Zeigermann has been developing software for 40 years, progressing from assembly language to C, then Python, and ultimately to machine learning. He currently works as a machine learning engineer at Techniker Krankenkasse.