This post is part of Incorta's Innovate with Intelligence webinar series, a four-part exploration of agentic AI built for enterprise teams. From design patterns to evaluation to governance, each session tackles a different layer of what it takes to move AI from demo to production. Catch the full series here.
The core argument is simple: almost every problem an AI team encounters has been seen before. Design patterns capture the accumulated wisdom of those who've solved these problems already - the best practices, the common pitfalls, the right tool for the right job.
The goal isn't to memorize patterns. It's to recognize which pattern fits the problem in front of you, and then benefit from everything that's already been figured out.
With that framing, here are the four patterns from this session.
Instead of writing one giant, monolithic prompt and hoping the LLM does everything correctly, prompt chaining breaks a complex problem into a sequence of smaller, focused subproblems - each with its own targeted prompt.
The output of each step feeds into the next, creating a pipeline.
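The pipeline idea can be sketched in a few lines. This is a minimal illustration, not code from the session: `call_llm` is a stubbed placeholder for whatever model client you use, and the three extraction/validation/summarization steps are hypothetical examples of "smaller, focused subproblems."

```python
# Minimal prompt-chaining sketch. `call_llm` stands in for a real model
# client (e.g. an SDK call); it is stubbed so the structure runs on its own.
def call_llm(prompt: str) -> str:
    return f"<llm-response to: {prompt[:40]}...>"  # stub

def extract_entities(document: str) -> str:
    # Step 1: a narrow prompt that only extracts entities.
    return call_llm(f"List every company name in this text:\n{document}")

def validate_entities(entities: str) -> str:
    # Step 2: a narrow prompt that only validates step 1's output.
    return call_llm(f"Remove anything that is not a real company:\n{entities}")

def summarize(validated: str) -> str:
    # Step 3: a narrow prompt that only formats the final answer.
    return call_llm(f"Summarize these companies in one sentence:\n{validated}")

def pipeline(document: str) -> str:
    # Each step's output feeds the next - the chain itself is plain code.
    return summarize(validate_entities(extract_entities(document)))
```

Because each link in the chain is an ordinary function, every intermediate output is available for logging and validation before the next prompt runs.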
Why it works:
- Each prompt stays small and focused, so the model is less likely to drift or skip instructions.
- Intermediate outputs can be inspected and validated before they propagate downstream.
- Failures are easy to localize to a single step, which makes debugging far simpler.
Where it fits best: Information processing workflows with predefined steps: data extraction, validation pipelines, structured information retrieval from unstructured sources.
Where it falls short: It doesn't handle situations where the agent needs to make dynamic decisions or take different branches based on input. For that, you need the next pattern.
Routing is the pattern for adaptive, dynamic workflows. Rather than following a fixed sequence, the LLM evaluates the input and decides which path, tool, or pipeline to invoke next.
Think of it as the agent reading the situation and choosing the right response - not having the response hardcoded in advance.
Sample use cases:
- Customer support triage, where each incoming query is dispatched to a billing, technical, or general pipeline.
- Selecting the right tool or data source for a given request.
- Choosing between a fast, inexpensive model and a slower, more capable one based on query complexity.
Routing logic can take different forms: rule-based if/else logic, a secondary LLM that decides the path, or a trained ML classifier.
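The simplest of those three forms - rule-based if/else logic - can be sketched as below. The keyword rules and handler names are illustrative assumptions, not from the session; the `classify` function could be swapped for a secondary LLM call or a trained ML classifier without changing the routing structure.

```python
# Rule-based routing sketch: a classifier picks which handler to invoke.
def classify(query: str) -> str:
    q = query.lower()
    if any(w in q for w in ("refund", "invoice", "charge")):
        return "billing"
    if any(w in q for w in ("error", "crash", "bug")):
        return "technical"
    return "general"

HANDLERS = {
    "billing":   lambda q: f"[billing pipeline] {q}",
    "technical": lambda q: f"[technical pipeline] {q}",
    "general":   lambda q: f"[general pipeline] {q}",
}

def route(query: str) -> str:
    # The path is chosen at runtime from the input, not hardcoded upfront.
    return HANDLERS[classify(query)](query)
```

Swapping in an LLM-based classifier only changes `classify`; the dispatch table and the rest of the workflow stay the same.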
The key insight: routing is what makes an agent feel intelligent and responsive rather than mechanical.
When a workflow contains components that don't depend on each other's outputs, there's no reason to run them sequentially. Parallelization means executing those independent components at the same time, cutting latency significantly.
A simple example: if your pipeline needs to search two data sources and then combine the results, you don't need to search source one, wait, then search source two. Both searches can run simultaneously, and the final step combines their outputs once both are done.
Most agentic frameworks handle the orchestration automatically through asynchronous execution - you kick off the tasks and let the framework manage the timing.
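The two-source search example above can be written directly with Python's `asyncio`. The search functions are stubs standing in for real data sources; the timing comment is the point.

```python
import asyncio

# Two independent searches run concurrently, then a final step combines them.
async def search_source_one(query: str) -> list[str]:
    await asyncio.sleep(0.1)  # simulated I/O latency
    return [f"one:{query}"]

async def search_source_two(query: str) -> list[str]:
    await asyncio.sleep(0.1)
    return [f"two:{query}"]

async def combined_search(query: str) -> list[str]:
    # gather() starts both coroutines at once; the total wait is the
    # slower of the two (~0.1s), not their sum (~0.2s).
    r1, r2 = await asyncio.gather(
        search_source_one(query),
        search_source_two(query),
    )
    return r1 + r2

results = asyncio.run(combined_search("quarterly revenue"))
```

Neither search's logic changed - only the orchestration around them did.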
The payoff: Faster pipelines, more responsive agents, better user experience, without changing any of the underlying logic.
Parallelization pairs naturally with chaining and routing to build workflows that are both sophisticated and efficient.
This is arguably the most transformative pattern of the four. Tool use enables LLMs to reach beyond their training data and interact with the outside world - databases, APIs, code execution environments, other agents, even physical systems.
Without tools, an LLM is limited to what it learned during training. With tools, it can:
- Query databases and call APIs to pull live, up-to-date data.
- Execute code and act on the results.
- Delegate work to other agents.
- Even interact with physical systems.
How it works in practice:
- The LLM decides it needs external information or an action it can't perform alone.
- It calls the appropriate tool with structured arguments.
- It observes the tool's result.
- It decides the next step: call another tool, or produce the final answer.
This loop - decide, call, observe, decide - is what gives modern agents their real capability. Tool use isn't an add-on. It's what makes agents useful.
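Here is the decide-call-observe loop in miniature. The "LLM" is a stub that requests one tool call and then answers - a deliberate simplification so the loop runs standalone; real systems drive the decide step through the model's function-calling interface. The weather tool and city are hypothetical.

```python
# The decide-call-observe loop in miniature.
def lookup_weather(city: str) -> str:
    return f"72F and sunny in {city}"  # stands in for a real API call

TOOLS = {"lookup_weather": lookup_weather}

def fake_llm(question: str, observations: list[str]) -> dict:
    # Decide: with no observations yet, request a tool; otherwise answer.
    if not observations:
        return {"action": "tool", "name": "lookup_weather", "arg": "Portland"}
    return {"action": "answer", "text": f"Based on {observations[-1]}."}

def agent(question: str) -> str:
    observations: list[str] = []
    while True:
        step = fake_llm(question, observations)
        if step["action"] == "answer":
            return step["text"]
        # Call the tool, observe its result, loop back to decide again.
        observations.append(TOOLS[step["name"]](step["arg"]))
```

The `while True` loop is the whole pattern: the model keeps deciding and observing until it judges it has enough to answer.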
After the four patterns, Abd Rahman walked through Retrieval-Augmented Generation (RAG) - one of the most practical techniques for reducing hallucination in production environments.
The core problem RAG solves: LLMs don't know what they don't know. Ask them about your internal HR policy, your proprietary data, or a recent event, and they'll either hallucinate or admit ignorance. RAG fixes this by giving the LLM a trusted knowledge base to draw from at query time.
Offline (setup):
- Split your source documents into chunks.
- Convert each chunk into an embedding vector.
- Store the embeddings in a vector database.
Online (at query time):
- Embed the user's query.
- Retrieve the most semantically similar chunks from the vector database.
- Inject those chunks into the prompt so the LLM answers from trusted context.
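The offline/online split can be sketched end to end in a toy form. This is an illustration only: the "embedding" here is a bag-of-words vector so the example runs without a model, and the two HR-policy chunks are made up. Production systems use a learned embedding model and a vector database instead.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Offline: chunk the knowledge base and index each chunk's embedding.
CHUNKS = [
    "Employees accrue 15 vacation days per year.",
    "Expense reports are due by the 5th of each month.",
]
INDEX = [(chunk, embed(chunk)) for chunk in CHUNKS]

def retrieve(query: str) -> str:
    # Online: embed the query and return the most similar chunk.
    qv = embed(query)
    return max(INDEX, key=lambda item: cosine(qv, item[1]))[0]

def answer(query: str) -> str:
    # The retrieved chunk is injected into the prompt so the LLM
    # answers from trusted context (the LLM call itself is omitted).
    return f"Context: {retrieve(query)}\nQuestion: {query}"
```

The grounding happens in `answer`: the model is handed verified context rather than left to improvise from its training data.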
We showed a demo of a practical implementation: BI engineers save verified question-and-SQL pairs as reference anchors. When a business user asks a question, the system semantically matches it to the closest reference question - and uses the verified SQL query as the foundation for the answer.
Crucially, it's not a text match. When the same question was asked with a different filter (changing "Oregon" to "Washington"), the system recognized the semantic similarity, reused the reference query, and updated only the relevant filter - leaving everything else intact.
The result: business users get answers grounded in queries that a BI expert already validated, and trust is built in.
One subtle but important point: how you chunk documents significantly affects RAG quality. Blindly splitting at fixed word counts can break semantic context mid-thought - the model ends up with fragments that don't make sense in isolation. Smart chunking strategies (semantic boundaries, paragraph-aware splits) are essential for reliable retrieval.
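A paragraph-aware splitter is straightforward to sketch. This is a minimal illustration of the idea, not a production chunker: it splits on blank lines first, then packs whole paragraphs into chunks up to a word budget, so no chunk starts or ends mid-thought. The 60-word budget is an arbitrary example value.

```python
def chunk_by_paragraph(text: str, max_words: int = 60) -> list[str]:
    # Split on blank lines so paragraph boundaries are never crossed.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in paragraphs:
        words = len(para.split())
        # Flush the current chunk if adding this paragraph would exceed
        # the budget. (A single oversized paragraph still stays whole.)
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Contrast this with a fixed-word-count splitter, which would happily cut a sentence in half and hand the retriever a fragment with no context.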
These four patterns - prompt chaining, routing, parallelization, and tool use - are the building blocks of every production-grade agentic system. And RAG is the practical answer to one of enterprise AI's most persistent problems: getting an LLM to tell you the truth about your own data.