Developers and businesses have always needed to adapt, but the landscape of what developers can build is shifting dramatically. Just as JavaScript has gone through major transformations (from jQuery to Angular to React, with each shift bringing new paradigms and possibilities), the current AI revolution in development presents a similarly fast-moving landscape. Where JavaScript's evolution played out over many years, however, AI agents are reshaping development practices at an unprecedented pace. This creates both vast opportunities and considerable confusion, as developers struggle to determine which tools and approaches will prove most valuable in this accelerated cycle of innovation.
Here I try to lay out some of the patterns that have emerged over time and that shape the current debate about agentic development. First things first, let's get some terminology out of the way.
Agent
Basically, an agent is a software system that uses artificial intelligence to execute a series of tasks in pursuit of a specific goal. Today, agents predominantly rely on Large Language Models (LLMs), which give them reasoning, memory, planning, and, ideally, autonomy in decision-making and adaptation. Reasoning lets an agent analyze and solve problems, while memory lets it retain and recall past interactions for context-aware responses. How these components are implemented varies significantly between systems, and those design choices largely determine how effectively an agent can handle complex tasks and reach its objectives.
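To make that loop concrete, here is a minimal sketch of the classic agent loop: the model is asked what to do next, may request a tool call, and the observation is fed back until it produces a final answer. The `call_llm` function is a stand-in for whichever provider you use, and the weather tool is entirely made up so the script runs end to end.

```python
import json

# Hypothetical tool the agent can use.
def get_weather(city: str) -> str:
    return f"It is 18°C and cloudy in {city}."

TOOLS = {"get_weather": get_weather}

def call_llm(messages: list[dict]) -> dict:
    """Stand-in for a real LLM call (OpenAI, Anthropic, a local model, ...).
    A real implementation would send `messages` to the provider and parse the
    response into either a tool call or a final answer."""
    # Canned behaviour so the sketch runs without any API key:
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "get_weather", "args": {"city": "Berlin"}}
    return {"type": "final", "content": "Pack a light jacket: 18°C and cloudy in Berlin."}

def run_agent(goal: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": goal}]  # memory: the running conversation
    for _ in range(max_steps):  # planning loop with a hard step limit
        decision = call_llm(messages)
        if decision["type"] == "final":
            return decision["content"]
        # Execute the requested tool and feed the observation back to the model.
        result = TOOLS[decision["name"]](**decision["args"])
        messages.append({"role": "tool", "content": json.dumps({"name": decision["name"], "result": result})})
    return "Step limit reached without a final answer."

print(run_agent("What should I wear in Berlin today?"))
```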
Tools & MCP
Tools are the collection of functions an agent can call to extend its capabilities beyond its core AI-driven reasoning. They can take many forms: custom scripts, access to files and databases, the ability to execute code, connections to external servers such as an MCP server, or even interactions with another agent. By integrating tools, an agent gains specialized functionality that expands its problem-solving capacity; a script might enable data processing, while an external server could provide real-time information retrieval. Ultimately, tools let agents operate more effectively, adapt to diverse challenges, and achieve goals that would otherwise be out of reach.
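In practice, a tool is usually just a plain function plus a machine-readable description handed to the model. The sketch below shows that pairing; the tool name, fields, and JSON-Schema-style spec are illustrative rather than tied to any particular provider.

```python
# A plain Python function that does the actual work.
def search_orders(customer_id: str, status: str = "open") -> list[dict]:
    # In a real system this would query a database or an API.
    return [{"id": "A-1001", "customer": customer_id, "status": status}]

# The description handed to the model so it knows when and how to call the tool.
# Most providers expect something close to this JSON-Schema style.
SEARCH_ORDERS_SPEC = {
    "name": "search_orders",
    "description": "Look up a customer's orders, optionally filtered by status.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "Internal customer identifier."},
            "status": {"type": "string", "enum": ["open", "shipped", "cancelled"]},
        },
        "required": ["customer_id"],
    },
}

# When the model returns a tool call, the runtime dispatches it like this:
def dispatch(tool_call: dict) -> list[dict]:
    assert tool_call["name"] == "search_orders"
    return search_orders(**tool_call["arguments"])

print(dispatch({"name": "search_orders", "arguments": {"customer_id": "C-42"}}))
```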
MCP, or Model Context Protocol, is an open standard for connecting AI systems to a wide range of data sources and tools, letting them access real-time information and act on contextual insights. By standardizing how AI applications integrate with external services, MCP improves their functionality, responsiveness, and adaptability in dynamic environments. A GitHub MCP server, for instance, exposes capabilities for interacting with GitHub repositories, allowing agents to push or pull code changes, manage branches, or even automate workflows. Because the protocol defines the communication between models and external systems, it reduces integration complexity and expands what an agent can do with real-time data and external tools.
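As a rough illustration of how little code an MCP server needs, here is a sketch using the FastMCP helper from the official MCP Python SDK. The import path and decorator reflect my reading of that SDK at the time of writing and may shift between versions, and the tool itself is made up.

```python
from mcp.server.fastmcp import FastMCP

# A minimal, hypothetical MCP server exposing one tool to any MCP-capable client.
mcp = FastMCP("repo-helper")

@mcp.tool()
def count_todos(text: str) -> int:
    """Count TODO markers in a chunk of source code."""
    return text.count("TODO")

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```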
Workflows
The term, in this context, comes from Anthropic's "Building Effective Agents" blog post, which describes a workflow as the decomposition of a task into a chain of steps. Each step is executed in turn, and the next one is fed with information from the previous. While routing can be part of a workflow, research shows that complexity is your enemy and that keeping things simple helps achieve the overall goal.
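A minimal sketch of such a chain, with stubbed LLM calls so the shape of the pattern is visible; the step names and prompts are purely illustrative:

```python
def llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"<model output for: {prompt[:40]}...>"

def outline(topic: str) -> str:
    return llm(f"Write a bullet outline for a post about {topic}.")

def draft(outline_text: str) -> str:
    return llm(f"Expand this outline into a draft:\n{outline_text}")

def polish(draft_text: str) -> str:
    return llm(f"Tighten the prose and fix grammar:\n{draft_text}")

# The workflow is just an ordered chain: each step consumes the previous output.
steps = [outline, draft, polish]
result = "AI agents"
for step in steps:
    result = step(result)
print(result)
```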
Evaluation & Verification
Evaluation is the process of assessing a model's performance for a specific application, and it is particularly crucial for AI agents. For agents, this can mean comparing outputs against previously compiled data or benchmarks to check accuracy and reliability. Another approach is to use additional Large Language Models (LLMs) to verify correct operation, though this method introduces its own complexities. Evaluation remains one of the most challenging aspects, due to the dynamic nature of AI systems, the subjective criteria involved in many tasks, and the need for continuous adaptation to evolving standards.
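A bare-bones sketch of the first approach: run the agent over a small set of cases with known expected answers and report a score. The agent, the cases, and the exact-match check are stand-ins; real evaluations are usually fuzzier.

```python
def agent(question: str) -> str:
    """Stand-in for the system under test."""
    return {"2+2": "4", "capital of France": "Paris"}.get(question, "unknown")

# Hypothetical evaluation set with expected answers.
CASES = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("largest planet", "Jupiter"),
]

def evaluate() -> float:
    passed = 0
    for question, expected in CASES:
        answer = agent(question)
        ok = answer.strip().lower() == expected.lower()  # exact match; real evals are fuzzier
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}: {question!r} -> {answer!r} (expected {expected!r})")
    return passed / len(CASES)

print(f"score: {evaluate():.0%}")
```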
Frameworks
Frameworks are either your best friend or your worst enemy. When it comes to building AI agents, many frameworks overpromise and underdeliver, particularly in terms of true automation. Most of them act as wrappers around LLM providers, offering basic memory logic and task execution, often structured as a Directed Acyclic Graph (DAG). What they all share is an attempt to navigate the ever-expanding and fragmented landscape of LLM providers, each with its own way of serving models. What follows is a non-exhaustive list of frameworks we found interesting for various reasons:
- PydanticAI leverages Pydantic classes to structure LLM outputs, making them more usable for subsequent tasks, and integrates seamlessly with Logfire for easy logging and debugging of each LLM call (see the sketch after this list).
- Dagger.io emphasizes workflow execution, resembling classic ETL frameworks like Dagster.io or Airflow, but with a unique twist—it uses containers to build reproducible workflows in any language with custom environments.
- OpenAI Agents SDK is a lightweight, provider-agnostic SDK that simplifies agent development without locking users into a specific LLM provider.
- CrewAI follows a DAG-based approach, allowing for modular and scalable agent workflows targeting enterprises.
- n8n focuses on workflow automation with an intuitive UI builder, making it accessible for users who prefer visual workflow design.
- AutoGen combines a UI builder with DAG-based workflows, offering flexibility in both code and visual development.
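To give a flavour of the structured-output idea PydanticAI is built around, here is a rough sketch. The `Agent`/`run_sync` interface and the `output_type` parameter follow my reading of the PydanticAI docs and may differ between versions; the model string also assumes an OpenAI API key is configured, so treat this as an outline rather than copy-paste code.

```python
from pydantic import BaseModel
from pydantic_ai import Agent

# The shape we want the model's answer forced into.
class TicketTriage(BaseModel):
    severity: str   # e.g. "low", "medium", "high"
    component: str  # which part of the system is affected
    summary: str    # one-line restatement of the issue

# The agent validates the LLM output against the Pydantic model,
# so downstream code can rely on typed fields instead of free text.
agent = Agent("openai:gpt-4o", output_type=TicketTriage)

result = agent.run_sync("Checkout crashes with a 500 when the cart is empty.")
triage = result.output
print(triage.severity, triage.component, triage.summary)
```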
Architectural patterns that emerge
When designing AI agents, several architectural patterns come up again and again to ensure functionality, scalability, and reliability. They address key aspects of agent behavior, including execution control, logging, safety, and extensibility. Below is a breakdown of these patterns:
- Wrappers Around Different LLM Providers:
  - A thin abstraction layer that normalizes provider-specific APIs, keeping the rest of the system provider-agnostic.
- Execution Control (Orchestration):
  - Orchestration manages the flow of tasks, ensuring they are executed in the correct sequence.
  - It can be implemented via a scheduler or DAGs.
  - Key features include:
    - Start/Stop/Resume Execution: dynamic control over task execution, accommodating interruptions or changes in requirements.
    - Handoffs: seamless transitions between tasks or sub-agents, ensuring continuity and minimizing data loss.
- Logging and Tracing LLM Calls:
  - Comprehensive logging tracks all LLM interactions, including inputs, outputs, and metadata.
- Evaluation & Guardrails:
  - Especially in production settings, agents require mechanisms to assess their performance against predefined metrics.
  - Guardrails are configurable safety checks that validate inputs and outputs to prevent harmful or nonsensical actions (see the sketch after this list).
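As a rough illustration of the guardrail idea, here is a minimal sketch that validates both the input to and the output from an LLM call. The specific checks and the stubbed model are placeholders; real guardrails are usually configurable and far more elaborate.

```python
import re

class GuardrailViolation(Exception):
    pass

def check_input(prompt: str) -> None:
    # Hypothetical input guardrails: size limit and a crude credential check.
    if len(prompt) > 4000:
        raise GuardrailViolation("prompt too long")
    if re.search(r"(?i)api[_-]?key\s*[:=]", prompt):
        raise GuardrailViolation("prompt appears to contain a credential")

def check_output(answer: str) -> None:
    # Hypothetical output guardrail: refuse empty or suspiciously long answers.
    if not answer.strip() or len(answer) > 10_000:
        raise GuardrailViolation("answer failed basic sanity checks")

def call_llm(prompt: str) -> str:
    """Stand-in for the real model call."""
    return f"<answer to: {prompt[:30]}...>"

def guarded_call(prompt: str) -> str:
    check_input(prompt)    # validate before spending tokens
    answer = call_llm(prompt)
    check_output(answer)   # validate before passing downstream
    return answer

print(guarded_call("Summarize our deployment checklist."))
```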
Conclusion
The current limitation is that we still need to manually define each step and design the workflow, rather than relying on LLMs to dictate the process. We don’t want models that claim to know everything—what we need is an LLM that excels at determining where to look for answers, ensuring we maintain control over the information it accesses. This approach allows us to curate and verify the data sources, preventing misinformation and aligning the model’s responses with our standards. Ultimately, the goal is to create a system where the LLM acts as a decision-making assistant rather than an all-knowing authority, keeping human oversight and input at the core.