Routing Strategies for Heterogeneous GenAI Systems: Lessons from Real-World Practice
Oliver Zeigemann
Modern Generative AI (GenAI) systems combine prompts, language models, inference servers, and specialized hardware into sophisticated stacks. As no single large GenAI system excels at all tasks, we at Techniker Krankenkasse are increasingly adopting a multi-system approach, employing different models tailored to specific tasks, domains, cost, or latency requirements. While this approach enhances robustness and efficiency, it introduces a critical operational challenge: effectively routing each incoming query to the most suitable GenAI system.
In this talk, we present our real-world experiences developing dynamic routing pipelines for selecting the optimal GenAI system based on input content and task specificity. We detail the evolution and refinement of our routing strategies, including:
- Regular-expression filters to quickly capture clear-cut topics and enforce guardrails;
- Off-the-shelf Named Entity Recognition (NER) modules to integrate domain-specific contextual signals;
- Few-shot fine-tuned intent classifiers that generalize beyond simple keyword matching;
- Lightweight generative LLMs that enable cost-effective, context-aware decision-making;
- Selective escalation strategies employing state-of-the-art LLMs exclusively when more economical routes provide insufficient confidence.
We close with insights and best practices drawn from implementing these routing pipelines in practice.
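To make the cascade idea concrete, the following is a minimal sketch of how stages like those listed above could be chained, with cheaper stages tried first and escalation as the last resort. It is not the implementation used at Techniker Krankenkasse: all names (`Decision`, `regex_stage`, `classifier_stage`, `route`), the target labels, the keyword sets, and the 0.8 confidence threshold are hypothetical, and the classifier stage is a simple keyword-overlap stand-in for a fine-tuned intent model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Decision:
    target: str        # hypothetical name of the GenAI system to call
    confidence: float  # stage's confidence in this decision, in [0, 1]
    stage: str         # which stage produced the decision


# A stage returns a Decision, or None if it has no opinion on the query.
Stage = Callable[[str], Optional[Decision]]

# Assumed clear-cut topic pattern; real filters would be domain-specific.
BILLING = re.compile(r"\b(invoice|refund|payment)\b", re.IGNORECASE)


def regex_stage(query: str) -> Optional[Decision]:
    """Cheapest stage: a regular-expression filter for clear-cut topics."""
    if BILLING.search(query):
        return Decision("billing-model", 1.0, "regex")
    return None


def classifier_stage(query: str) -> Optional[Decision]:
    """Stand-in for an intent classifier; scores by keyword overlap."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    hits = words & {"doctor", "appointment", "prescription"}
    if not hits:
        return None
    return Decision("health-model", min(1.0, 0.5 * len(hits)), "classifier")


def route(query: str, stages: List[Stage], threshold: float = 0.8,
          fallback: str = "frontier-model") -> Decision:
    """Try stages in order of cost; escalate if none is confident enough."""
    for stage in stages:
        decision = stage(query)
        if decision is not None and decision.confidence >= threshold:
            return decision
    # No cheap stage was confident: escalate to the most capable system.
    return Decision(fallback, 0.0, "escalation")


STAGES: List[Stage] = [regex_stage, classifier_stage]

if __name__ == "__main__":
    for q in ["I need a refund for my invoice",
              "Book a doctor appointment",
              "Tell me a joke"]:
        print(q, "->", route(q, STAGES))
```

Ordering the stages by cost means the expensive fallback is reached only when every cheaper stage either abstains or falls below the threshold, which is the selective-escalation behavior described in the last bullet.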