Our Three Step Process

February 7, 2024

Architecting Scalable AI Systems the Right Way

Most AI systems don’t fail because the model is bad.
They fail because the architecture never evolved beyond the demo stage.

What usually starts as a simple script built to “just test the idea” slowly becomes business-critical. At that point, the cracks begin to show: brittle code, manual workflows, security risks, and systems that collapse the moment scale is introduced.

This article walks through a practical architectural progression for AI systems:

  1. Local scripts (Proof of Concept)

  2. Model Context Protocol (MCP) for scalable data access

  3. Multi-Agent Systems using Agent-to-Agent (A2A) orchestration

We’ll ground this discussion using a real-world example: automated invoice processing.

Phase 1: Proof of Concept

The Manual, Script-First Approach

Every AI project begins here, and that’s not a mistake. It’s necessary.

At this stage, the goal is simple:
👉 Can we extract meaningful structured data from invoices using an LLM?

Typical Workflow

A basic invoice-processing script usually looks like this:

  1. Local Storage
    Invoices are manually downloaded and placed into a predefined local folder.

  2. OCR & Text Extraction
    PDF libraries such as pypdf’s PdfReader, or OCR utilities for scanned pages, read the files and convert them into raw text.

  3. LLM-Based Extraction
    The extracted text is sent to a model (e.g., Gemini 2.5 Flash) with a prompt to identify fields like:

    • Client name

    • Invoice amount

    • Product or service name

  4. Local Persistence
    The model returns structured JSON, which is parsed and stored in a local sqlite3 database.

This approach is fast to build and excellent for validating assumptions.
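
The four steps above can be sketched in a few dozen lines. This is a minimal illustration, not a reference implementation: the function names, the prompt, and the table layout are assumptions made for the example, the model call is injected as a plain callable (fake_llm returns a canned reply so the sketch runs offline), and read_invoice_text reads plain-text files where a real script would call a PDF or OCR library.

```python
import json
import sqlite3
from pathlib import Path
from typing import Callable

# Hypothetical prompt; a real one would be tuned against sample invoices.
PROMPT = (
    "Extract client_name, invoice_amount and product_name from this invoice. "
    "Reply with a single JSON object.\n\n"
)


def read_invoice_text(path: Path) -> str:
    # Stand-in for step 2: a real script would call a PDF or OCR library here.
    return path.read_text(encoding="utf-8")


def extract_fields(text: str, call_llm: Callable[[str], str]) -> dict:
    # Step 3: send prompt plus text to the model and parse its JSON reply.
    data = json.loads(call_llm(PROMPT + text))
    return {
        "client_name": str(data["client_name"]),
        "invoice_amount": float(data["invoice_amount"]),
        "product_name": str(data["product_name"]),
    }


def save_invoice(conn: sqlite3.Connection, source: str, fields: dict) -> None:
    # Step 4: persist the structured result in a local sqlite3 table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(source TEXT, client_name TEXT, invoice_amount REAL, product_name TEXT)"
    )
    conn.execute(
        "INSERT INTO invoices VALUES (?, ?, ?, ?)",
        (source, fields["client_name"], fields["invoice_amount"], fields["product_name"]),
    )
    conn.commit()


def process_folder(folder: Path, conn: sqlite3.Connection,
                   call_llm: Callable[[str], str]) -> int:
    # Step 1: every invoice is expected to sit in one local folder.
    count = 0
    for path in sorted(folder.glob("*.txt")):
        fields = extract_fields(read_invoice_text(path), call_llm)
        save_invoice(conn, path.name, fields)
        count += 1
    return count


def fake_llm(prompt: str) -> str:
    # Canned reply so the sketch runs without network access or API keys.
    return json.dumps(
        {"client_name": "Acme Corp", "invoice_amount": 1250.0, "product_name": "Consulting"}
    )
```

Even in this small form, reading, extraction, and storage all meet inside process_folder, which is the coupling the next section describes.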

Where This Breaks Down

The problems don’t appear immediately, but they are inevitable:

  • Manual file handling
    Every document must be downloaded locally. This is slow, error-prone, and not scalable.

  • Format rigidity
    The script often works only for PDFs. Adding JPEGs, scans, or Word documents requires rewriting large portions of the pipeline.

  • Security risks
    Storing financial documents locally introduces compliance and data-leak concerns.

  • Tight coupling
    OCR logic, business logic, and storage are deeply intertwined. Any change breaks something else.

At this point, most teams attempt to “patch” the script.
That’s usually the wrong move.

Phase 2: Scaling with the Model Context Protocol (MCP)

To move beyond fragile scripts, AI systems need a standardized way to interact with data and tools.
This is exactly the problem that the Model Context Protocol (MCP) solves.

The M × N Integration Problem

Without MCP:

  • M AI applications

  • N data sources

Result in M × N custom integrations.

This grows multiplicatively: every new application needs N more integrations, and every new data source needs M more. It becomes unmaintainable very quickly.

What MCP Changes

MCP introduces a clean separation:

  • MCP Server
    Defines tools and executes logic (file access, extraction, search).

  • MCP Client (your AI app)
    Simply calls tools without caring how they are implemented.

This reduces complexity to M + N integrations.
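
A quick count makes the difference concrete. The figures below are arbitrary examples, and the two function names are made up for this illustration. The second count assumes each application implements one MCP client and each data source is wrapped by one MCP server.

```python
def point_to_point(apps: int, sources: int) -> int:
    # Every application needs its own connector for every data source.
    return apps * sources


def with_shared_protocol(apps: int, sources: int) -> int:
    # One client per application plus one server per data source.
    return apps + sources


for apps, sources in [(2, 3), (5, 8), (10, 20)]:
    print(
        f"{apps} apps, {sources} sources: "
        f"{point_to_point(apps, sources)} custom integrations "
        f"vs {with_shared_protocol(apps, sources)} with a shared protocol"
    )
```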

Replacing Custom OCR with the Box MCP Server

In our invoice system, the biggest architectural win comes from removing custom OCR logic entirely.

Instead of:

  • Downloading files

  • Writing parsers

  • Handling edge cases manually

We integrate the Box MCP Server.

What This Enables

  • Remote, in-cloud processing
    Files are never downloaded locally. The MCP server processes them directly inside Box, which is faster and more secure.

  • Automatic format handling
    PDFs, images, and Word documents are handled uniformly without custom code.

  • Tool-driven workflows
    The LLM is given access to predefined tools such as:

    • list_files

    • extract_text

    • query_document

The model decides which tool to call, and the MCP server executes it.
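
The call-and-execute loop can be sketched with a toy in-process server. The message shapes loosely follow MCP’s JSON-RPC methods tools/list and tools/call, but this is a simplified illustration and not the Box MCP Server: the DOCUMENTS dictionary, the tool bodies, and handle_request are hypothetical, and a real deployment would use an MCP SDK and transport rather than hand-rolled dispatch.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory "folder" standing in for remote storage.
DOCUMENTS = {"inv-001.pdf": "Invoice for Acme Corp, total 1250.00 USD"}


def list_files() -> list[str]:
    return sorted(DOCUMENTS)


def extract_text(file_name: str) -> str:
    return DOCUMENTS[file_name]


# The server owns the tool implementations; clients only know the names.
TOOLS: dict[str, Callable[..., Any]] = {
    "list_files": list_files,
    "extract_text": extract_text,
}


def _reply(request_id: Any, *, result: Any = None, error: Any = None) -> str:
    body = {"jsonrpc": "2.0", "id": request_id}
    if error is not None:
        body["error"] = error
    else:
        body["result"] = result
    return json.dumps(body)


def handle_request(raw: str) -> str:
    # Dispatch one JSON-RPC request to the matching tool.
    request = json.loads(raw)
    request_id = request.get("id")
    method = request.get("method")

    if method == "tools/list":
        tools = [{"name": name} for name in sorted(TOOLS)]
        return _reply(request_id, result={"tools": tools})

    if method == "tools/call":
        params = request.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _reply(request_id, error={"code": -32602, "message": "unknown tool"})
        output = tool(**params.get("arguments", {}))
        content = [{"type": "text", "text": json.dumps(output)}]
        return _reply(request_id, result={"content": content})

    return _reply(request_id, error={"code": -32601, "message": "method not found"})


# The client side: the model picks a tool name and arguments, nothing more.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "extract_text", "arguments": {"file_name": "inv-001.pdf"}},
}
print(handle_request(json.dumps(request)))
```

The client never imports list_files or extract_text. Swapping the in-memory dictionary for a real storage backend changes only the server side.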

This architectural shift transforms the system from a fragile script into a production-ready data pipeline.

Phase 3: Multi-Agent Orchestration

Scaling Logic with the A2A Protocol

As systems mature, requirements grow beyond simple extraction:

  • Anomaly detection

  • Invoice classification

  • Payment scheduling

  • Compliance checks

Cramming all of this into a single “super-agent” creates a monolith that is hard to test, hard to extend, and harder to debug.

The solution is a multi-agent architecture, coordinated using the Agent-to-Agent (A2A) protocol.

What Is A2A?

A2A is an open standard that allows agents to:

  • Discover each other’s capabilities

  • Communicate through a shared protocol

  • Delegate tasks without tight coupling

Agents can be:

  • Built by different teams

  • Running on different machines

  • Using different internal tools

Yet still collaborate seamlessly.

A Practical Multi-Agent Invoice Architecture

A scalable invoice system can be decomposed into specialized agents, each doing one thing well.

1. Files Agent

Responsibility:
Locate and list documents.

  • Uses Box MCP tools to scan folders

  • Knows nothing about extraction or business rules

2. Extraction Agent

Responsibility:
Convert documents into structured data.

  • Uses the AI extraction tools exposed by the MCP server

  • Focuses only on parsing and normalization

3. Orchestrator Agent

Responsibility:
Coordinate the workflow.

  • Has no direct access to MCP tools

  • Breaks high-level requests into steps

  • Delegates tasks to the appropriate agents

How the Workflow Executes

  1. The user requests: “Process all invoices.”

  2. The orchestrator creates a plan.

  3. The Files Agent returns a list of relevant documents.

  4. The orchestrator assigns each document to the Extraction Agent.

  5. Results are aggregated and passed downstream (DB, analytics, alerts).
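
The flow above can be sketched with three small classes. This is an in-process illustration under stated assumptions: plain method calls stand in for A2A messages, the task dictionaries and class names are hypothetical, and the Extraction Agent returns a placeholder where a real one would call the MCP extraction tools.

```python
from dataclasses import dataclass
from typing import Protocol


class Agent(Protocol):
    # Every agent exposes the same narrow interface: a task in, a result out.
    def handle(self, task: dict) -> dict: ...


@dataclass
class FilesAgent:
    # Hypothetical file listing; a real agent would query storage via MCP tools.
    files: list[str]

    def handle(self, task: dict) -> dict:
        suffix = task.get("suffix", "")
        return {"files": [name for name in self.files if name.endswith(suffix)]}


class ExtractionAgent:
    def handle(self, task: dict) -> dict:
        # Placeholder result; real extraction would happen behind an MCP tool.
        return {"file": task["file"], "fields": {"client_name": "unknown"}}


@dataclass
class Orchestrator:
    # The orchestrator holds references to agents, never to tools.
    files_agent: Agent
    extraction_agent: Agent

    def process_all_invoices(self) -> list[dict]:
        # Plan: list the documents, then delegate each one for extraction.
        listing = self.files_agent.handle({"action": "list", "suffix": ".pdf"})
        return [
            self.extraction_agent.handle({"action": "extract", "file": name})
            for name in listing["files"]
        ]


orchestrator = Orchestrator(
    files_agent=FilesAgent(files=["inv-001.pdf", "notes.txt", "inv-002.pdf"]),
    extraction_agent=ExtractionAgent(),
)
results = orchestrator.process_all_invoices()
print([result["file"] for result in results])
```

Because the orchestrator depends only on the handle interface, either agent can be replaced or moved to another machine without touching the planning logic.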

Each agent is built with the Agent Development Kit (ADK) and describes itself through an AgentCard, which declares:

  • Capabilities

  • Endpoints

  • Communication rules

This allows dynamic discovery and flexible orchestration.
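
Discovery by declared capability can be sketched as follows. The AgentCardSketch class is a deliberately reduced stand-in, not the A2A Agent Card schema (the real card is a JSON document with more fields), and the agent names, URLs, and skill identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCardSketch:
    # Reduced stand-in for an agent card: who the agent is, where it
    # listens, and what it claims it can do.
    name: str
    url: str
    skills: tuple[str, ...]


class Registry:
    """Keeps published cards and answers 'who can do X?' queries."""

    def __init__(self) -> None:
        self._cards: list[AgentCardSketch] = []

    def register(self, card: AgentCardSketch) -> None:
        self._cards.append(card)

    def find(self, skill: str) -> list[AgentCardSketch]:
        return [card for card in self._cards if skill in card.skills]


registry = Registry()
registry.register(
    AgentCardSketch("files-agent", "http://localhost:8001", ("list_documents",))
)
registry.register(
    AgentCardSketch("extraction-agent", "http://localhost:8002", ("extract_invoice",))
)

# The orchestrator asks for a capability instead of hard-coding an address.
for card in registry.find("extract_invoice"):
    print(card.name, card.url)
```

An agent for anomaly detection or compliance checks could be added later by registering one more card, with no change to the orchestrator’s lookup code.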

Final Takeaway

Scalable AI systems are not built by adding more prompts or bigger models.
They are built by evolving architecture.

  • Scripts validate ideas

  • MCP standardizes data access

  • A2A enables composable intelligence

When combined, these layers transform AI from a single tool into a coordinated ecosystem: secure, modular, and future-proof.

If you’re serious about deploying AI beyond demos, this progression isn’t optional.
It’s the blueprint.
