Building Agentic Systems: Why You Should Start with Humans as AI Agents

The rise of Agentic Systems is reshaping how we think about building complex AI solutions. Like early software architecture, the new wave focuses on modularity, separation of concerns, and safe boundaries — smaller specialized models working together, orchestrated by a central host system.

In theory, it’s clean. In practice, it’s messy.

To bring these systems to life, every agent, tool, and integration must operate at a level that’s acceptable in risk, correctness, and cost. Complexity doesn’t just add up, it compounds as you add additional agents, especially for tasks that demand human-like reasoning, judgment, and decision-making. Without a thoughtful approach, it’s easy to end up with brittle systems, unfinished proof-of-concepts, and automation that erodes trust instead of building it.

That’s why I believe the best way to build Agentic Systems today is to design for a fully autonomous future, but start with humans role-playing as AI agents for the riskiest and most critical tasks. Let the system mature step-by-step, instead of betting everything on day-one automation.

What Does ‘Humans-as-Agents’ Mean?

The idea of humans-as-agents is to have humans perform the tasks that we eventually want AI to operate, in the specific way we expect AI to eventually approach them. This can be accomplished through an integration with a tool like Slack, email, or a custom-built app acting as a proxy to communicate with the human agent.
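One way to picture this is to give the human proxy the same interface as a future AI agent, so the orchestrator cannot tell the difference. The sketch below is illustrative: `channel`, `post()`, and `wait_for_reply()` are hypothetical stand-ins for whatever Slack, email, or custom-app integration you actually use.

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    """Common interface shared by AI agents and human proxies."""

    @abstractmethod
    def handle(self, task: dict) -> str:
        ...

class HumanProxyAgent(Agent):
    """Routes a task to a human over a messaging channel and blocks
    until the human replies. The channel is assumed to expose post()
    and wait_for_reply(); adapt these to your real integration."""

    def __init__(self, channel):
        self.channel = channel

    def handle(self, task: dict) -> str:
        # Post the task where the human agent works (e.g., a Slack thread)
        thread_id = self.channel.post(f"New task: {task['description']}")
        # The human's reply becomes the agent's output, unchanged
        return self.channel.wait_for_reply(thread_id)
```

Because `HumanProxyAgent` satisfies the same `Agent` interface, swapping it for a model-backed agent later is a one-line change in the orchestrator rather than a redesign.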

Why is this different from what we are doing today, if those tasks are perhaps already being performed by humans? At least with today’s technology, the way we approach a process and break down its tasks when the system is handled by humans is different from the way we would approach it when handled by AI.

Humans can perform complex tasks that are fluid, cross-domain, and more context-aware than AI systems.

AI-system tasks tend to be more bounded, explicitly defined, and focused on a specific domain or task.

Take this example of a task to respond to a customer support request.

Human:
  • Skim entire request, instantly spot name, tone, urgency
  • Recall prior interactions & purchases from memory or CRM quick search
  • Decide whether to resolve solo or loop in a colleague or superior, perhaps even consider exceptions to the rules
  • Draft response mixing policy + personal judgment
  • Hit send or escalate further
AI System:
  • Agent 1: Extract customer information from request
  • Agent 2: Look up customer history on multiple databases to build context around customer
  • Agent 3: Classify the request type (complaint, question, feedback, etc.) and assign it to the right specialized agent
  • Agent 4 (Specialized Agent for request type): Review the knowledge base and guidelines on how to process the response
  • Agent 5: Generate a response from the information provided by the previous agents, matching the customer’s tone and context
  • Agent 6: Validate the response and send it to the customer, or generate an escalation if the confidence level of the resolution is low.
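The AI-system breakdown above can be sketched as a simple pipeline where each agent is a step that enriches a shared context. Everything here is a stub with hypothetical names, not a specific framework’s API; the point is the shape: bounded steps, explicit hand-offs.

```python
# Each step mirrors one agent from the list above. The stub values
# stand in for real extraction, retrieval, and generation logic.

def extract_customer_info(ctx):
    ctx["customer"] = {"name": "Ada", "id": 42}      # Agent 1 (stub)
    return ctx

def lookup_history(ctx):
    ctx["history"] = ["prior ticket #1001"]          # Agent 2 (stub)
    return ctx

def classify_request(ctx):
    ctx["type"] = "complaint"                        # Agent 3 (stub)
    return ctx

def draft_response(ctx):
    # Agents 4-5 collapsed into one stub for brevity
    ctx["draft"] = f"Hi {ctx['customer']['name']}, we're sorry to hear that..."
    return ctx

def validate_or_escalate(ctx, confidence_threshold=0.8):
    # Agent 6: send only if the (stubbed) confidence clears the bar
    ctx["action"] = ("send" if ctx.get("confidence", 1.0) >= confidence_threshold
                     else "escalate")
    return ctx

PIPELINE = [extract_customer_info, lookup_history, classify_request,
            draft_response, validate_or_escalate]

def run_pipeline(request: str) -> dict:
    ctx = {"request": request}
    for step in PIPELINE:
        ctx = step(ctx)
    return ctx
```

Note that any step in `PIPELINE` could be a human proxy at first and a model later, without the other steps noticing.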

Why Start Here?

  • Responsibility by Default: When you start with humans acting as agents, you start from an assumption of zero trust in AI, and gradually build up trust in the system as it matures.
  • Agile Development: You can start testing your system solution even if you don’t have all agents available.
  • Process Discovery: As the human agent performs the task, it provides a unique opportunity to develop an understanding of the task, its challenges, and the edge cases that might be encountered.
  • Data Generation for Supervised Learning: As a human agent performs the task, the system can collect data on how the human performs it, generating a trove of data that can be used to train and validate the future AI agent.
  • Change Management: Build trust and cultural buy-in with early and direct participation of stakeholders in the process. It keeps them engaged in the development of the process, and can create a sense of ownership and accountability.

AI Agent Maturity Stages

To achieve that, I define the following stages of maturity to consider when building an Agentic System:

  • Stage 0 - No Agent: The interface is designed, but mocked up with predefined responses.
  • Stage 1 - Manual Mode: The interface is used to interact with a human through a proxy communication channel (e.g., Slack, email, etc.).
  • Stage 2 - Copilot Mode: The model suggests partial outputs and a human assembles, edits, and signs off. At least one manual action is required so nothing ships without human eyes.1
  • Stage 3 - Human-in-the-Loop Mode: The model is now responsible for handling the task, but there’s still a human in the loop accountable for reviewing the output and providing feedback to the system.
  • Stage 4 - Human Escalation Mode: The AI system has reached a level of maturity where it can be trusted with a subset of the tasks without human input or review, but still cannot handle all of them. The system is configured with a subsystem to assess the risk of the task or the confidence level of the model output, and based on those parameters or rules, can delegate the task to a human.
  • Stage 5 - Fully Autonomous Mode: The AI system is fully autonomous and can be assumed to handle all tasks. At this point, inability to handle a task correctly is considered a regression of the system. The system might still route a sample of tasks to a human to ensure it continues to perform as expected, leveraging that feedback to improve the model and prevent drift over time.
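The stages above can be encoded directly in the routing layer, so the same orchestrator behaves differently as each agent matures. This is a minimal sketch; the return strings and the 0.8 escalation threshold are illustrative placeholders, not prescribed values.

```python
from enum import IntEnum

class MaturityStage(IntEnum):
    NO_AGENT = 0
    MANUAL = 1
    COPILOT = 2
    HUMAN_IN_THE_LOOP = 3
    HUMAN_ESCALATION = 4
    FULLY_AUTONOMOUS = 5

def route_task(stage: MaturityStage, confidence: float,
               escalation_threshold: float = 0.8) -> str:
    """Decide who executes and who reviews, based on the agent's
    maturity stage and the model's confidence in its own output."""
    if stage <= MaturityStage.MANUAL:
        return "human executes"
    if stage == MaturityStage.COPILOT:
        return "model drafts, human assembles and signs off"
    if stage == MaturityStage.HUMAN_IN_THE_LOOP:
        return "model executes, human reviews"
    if stage == MaturityStage.HUMAN_ESCALATION:
        if confidence >= escalation_threshold:
            return "model executes"
        return "escalate to human"
    # Stage 5: autonomous, with only sampled audits for drift control
    return "model executes, sampled audits only"
```

Promoting an agent then becomes a configuration change (bumping its stage) rather than a code change, which makes the progression auditable.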

Additional Thoughts and Considerations

Building these systems isn’t just about orchestrating tasks — it’s about continuously measuring, improving, and adapting both agents and the system as a whole.

  • Human-in-the-Loop by Default: A side effect of starting with humans-as-agents is that the system is natively designed to keep a human-in-the-loop when needed — whether to increase oversight, pause or disable an agent or tool, or create additional labeled datasets. This can be used to set up regular re-labeling or validation rounds to perform quality checks and generate fresh supervised training data to control for drift.
  • Quantify Performance Early: Build a system to track agent and system-level performance from the start. For some tasks, automated metrics (accuracy, latency, success rate) are enough. For others, especially more subjective outputs, human evaluation loops will be necessary to measure quality and correctness.
  • Bias Management: Be cautious of bias leakage. If the human agents you use early on are not diverse, the AI systems will inherit those patterns. Consider strategies such as rotating human agents, diverse reviewer pools, or bias detection audits to mitigate this risk.
  • Latency, Queueing, and Escalation: Consider whether you might face bottlenecks. Building some classifiers or scorers first can help with the routing and prioritization of tasks. For example, a classifier can flag requests you can’t yet trust to an AI agent and want to escalate to a human, while a priority scoring model can help you prioritize your agents’ task queue.
  • Promotion Criteria: Define clear, measurable criteria for agents to graduate from one maturity stage to the next. For example:
    • Sustained performance above a target metric (e.g., 98% match rate over 30 days)
    • Human override rate below a threshold (e.g., <1%)
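A promotion gate like the one described above can be made mechanical. The sketch below assumes the example thresholds from the bullets (98% match rate sustained over 30 days, override rate under 1%); your targets will differ per task.

```python
def ready_for_promotion(daily_match_rates, override_rate,
                        target=0.98, window_days=30, max_override=0.01):
    """Return True only if the agent sustained the target metric for
    every day of the full window AND the human override rate is below
    the threshold. Thresholds are illustrative defaults."""
    recent = daily_match_rates[-window_days:]
    return (len(recent) == window_days            # enough history
            and all(r >= target for r in recent)  # sustained, not averaged
            and override_rate < max_override)
```

Using `all()` over the window, rather than an average, prevents a few strong days from masking a regression; whether that strictness is right for your task is itself a design decision worth revisiting per stage.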

Agentic Systems offer immense potential, but realizing that potential responsibly requires careful design, human insight, and progressive trust-building. Starting with humans-as-agents isn’t a compromise; it’s an operational blueprint for building robust, trustworthy AI ecosystems that can scale.


  1. It may be worth intentionally designing a UI that adds some friction, forcing additional human input to ensure the human stays deeply engaged in the review. ↩︎
