Our Three Step Process

February 7, 2024

Architecting Scalable AI Systems the Right Way

Most AI systems don’t fail because the model is bad.
They fail because the architecture never evolved beyond the demo stage.

What usually starts as a simple script built to “just test the idea” slowly becomes business-critical. At that point, the cracks begin to show: brittle code, manual workflows, security risks, and systems that collapse the moment scale is introduced.

This article walks through a practical architectural progression for AI systems:

Local scripts (Proof of Concept)
Model Context Protocol (MCP) for scalable data access
Multi-Agent Systems using Agent-to-Agent (A2A) orchestration

We’ll ground this discussion using a real-world example: automated invoice processing.

Phase 1: Proof of Concept

The Manual, Script-First Approach

Every AI project begins here, and that’s not a mistake. It’s necessary.

At this stage, the goal is simple:
👉 Can we extract meaningful structured data from invoices using an LLM?

Typical Workflow

A basic invoice-processing script usually looks like this:

Local Storage
Invoices are manually downloaded and placed into a predefined local folder.
OCR & Text Extraction
Tools such as PdfReader or OCR utilities read the PDF files and convert them into raw text.
LLM-Based Extraction
The extracted text is sent to a model (e.g., Gemini 2.5 Flash) with a prompt to identify fields like:
- Client name
- Invoice amount
- Product or service name
Local Persistence
The model returns structured JSON, which is parsed and stored in a local sqlite3 database.

This approach is fast to build and excellent for validating assumptions.
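
The last two steps can be sketched as follows. This is a minimal illustration, not the original script: `call_llm` is a hypothetical stub standing in for the real model request so the example runs offline, and the table layout and field names (`client_name`, `amount`, `product`) are assumptions based on the fields listed above.

```python
import json
import sqlite3

REQUIRED_FIELDS = ("client_name", "amount", "product")


def call_llm(invoice_text: str) -> str:
    """Hypothetical stand-in for the model call; returns a JSON string."""
    return json.dumps(
        {"client_name": "Acme Corp", "amount": 1250.00, "product": "Consulting"}
    )


def parse_invoice_json(raw: str) -> dict:
    """Parse the model's reply and fail loudly if a field is missing."""
    data = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"model response is missing fields: {missing}")
    return data


def save_invoice(conn: sqlite3.Connection, source_file: str, data: dict) -> None:
    """Store one extracted invoice, keyed by the file it came from."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "source_file TEXT PRIMARY KEY, client_name TEXT, amount REAL, product TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO invoices VALUES (?, ?, ?, ?)",
        (source_file, data["client_name"], float(data["amount"]), data["product"]),
    )
    conn.commit()


def process_invoice(conn: sqlite3.Connection, source_file: str, text: str) -> dict:
    """Run extraction and persistence for a single invoice."""
    data = parse_invoice_json(call_llm(text))
    save_invoice(conn, source_file, data)
    return data
```

Calling `process_invoice(sqlite3.connect(":memory:"), "invoice_001.pdf", text)` exercises the whole path. Note how text extraction, the model call, and storage all live in one file, which is exactly the coupling that causes trouble later.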

Where This Breaks Down

The problems don’t appear immediately, but they are inevitable:

Manual file handling
Every document must be downloaded locally. This is slow, error-prone, and not scalable.
Format rigidity
The script often works only for PDFs. Adding JPEGs, scans, or Word documents requires rewriting large portions of the pipeline.
Security risks
Storing financial documents locally introduces compliance and data-leak concerns.
Tight coupling
OCR logic, business logic, and storage are deeply intertwined. Any change breaks something else.

At this point, most teams attempt to “patch” the script.
That’s usually the wrong move.

Phase 2: Scaling with the Model Context Protocol (MCP)

To move beyond fragile scripts, AI systems need a standardized way to interact with data and tools.
This is exactly the problem that the Model Context Protocol (MCP) solves.

The M × N Integration Problem

Without MCP:

M AI applications
N data sources

Result in M × N custom integrations.

The count grows multiplicatively: every new application or data source adds a full set of new integrations, and the system becomes unmaintainable very quickly.

What MCP Changes

MCP introduces a clean separation:

MCP Server
Defines tools and executes logic (file access, extraction, search).
MCP Client (your AI app)
Simply calls tools without caring how they are implemented.

This reduces complexity to M + N integrations.
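
To make the arithmetic concrete, here is a small sketch that compares the two counts. The function names and the sample sizes are illustrative assumptions, not figures from a real deployment.

```python
def point_to_point(apps: int, sources: int) -> int:
    """Each application needs its own integration with each data source."""
    return apps * sources


def with_shared_protocol(apps: int, sources: int) -> int:
    """Each application and each data source implements the protocol once."""
    return apps + sources


# Compare the two approaches for a few hypothetical system sizes.
for apps, sources in [(2, 3), (5, 10), (20, 50)]:
    print(
        f"{apps} apps, {sources} sources: "
        f"{point_to_point(apps, sources)} custom integrations vs "
        f"{with_shared_protocol(apps, sources)} protocol implementations"
    )
```

With 5 applications and 10 data sources, that is 50 custom integrations against 15 protocol implementations, and the gap widens as either number grows.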

Replacing Custom OCR with the Box MCP Server

In our invoice system, the biggest architectural win comes from removing custom OCR logic entirely.

Instead of:

Downloading files
Writing parsers
Handling edge cases manually

We integrate the Box MCP Server.

What This Enables

Remote, in-cloud processing
Files are never downloaded locally. The MCP server processes them directly inside Box, which is faster and more secure.
Automatic format handling
PDFs, images, and Word documents are handled uniformly without custom code.
Tool-driven workflows
The LLM is given access to predefined tools such as:
- list_files
- extract_text
- query_document

The model decides which tool to call, and MCP executes it.
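
The pattern of "model picks a tool, server executes it" can be sketched in plain Python. This is an in-process illustration only: it does not use the real MCP SDK or the Box API. `ToolServer`, `FAKE_STORE`, and the tool signatures are hypothetical, and the tool names are simply the ones listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# In-memory stand-in for remote storage; a real server would talk to Box.
FAKE_STORE: Dict[str, str] = {
    "invoice_001.pdf": "Invoice from Acme Corp. Total due: 1250.00 USD",
    "invoice_002.docx": "Invoice from Globex. Total due: 300.00 USD",
}


@dataclass
class ToolServer:
    """Owns the tool implementations; callers only see names and arguments."""

    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def tool(self, func: Callable[..., Any]) -> Callable[..., Any]:
        """Decorator that registers a function under its own name."""
        self.tools[func.__name__] = func
        return func

    def list_tools(self) -> List[str]:
        return sorted(self.tools)

    def call(self, name: str, arguments: Dict[str, Any]) -> Any:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name](**arguments)


server = ToolServer()


@server.tool
def list_files() -> List[str]:
    return sorted(FAKE_STORE)


@server.tool
def extract_text(file_name: str) -> str:
    return FAKE_STORE[file_name]


@server.tool
def query_document(file_name: str, keyword: str) -> bool:
    return keyword.lower() in FAKE_STORE[file_name].lower()


# A tool call as a model might emit it: a tool name plus JSON-like arguments.
model_request = {"tool": "extract_text", "arguments": {"file_name": "invoice_001.pdf"}}
result = server.call(model_request["tool"], model_request["arguments"])
```

The client side never imports `extract_text` or touches `FAKE_STORE`; it only sends a name and arguments. Swapping the in-memory store for a real backend would change the server, not the caller.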

This architectural shift transforms the system from a fragile script into a production-ready data pipeline.

Phase 3: Multi-Agent Orchestration

Scaling Logic with the A2A Protocol

As systems mature, requirements grow beyond simple extraction:

Anomaly detection
Invoice classification
Payment scheduling
Compliance checks

Cramming all of this into a single “super-agent” creates a monolith that is hard to test, hard to extend, and harder to debug.

The solution is multi-agent architecture, coordinated using the Agent-to-Agent (A2A) protocol.

What Is A2A?

A2A is an open standard that allows agents to:

Discover each other’s capabilities
Communicate through a shared protocol
Delegate tasks without tight coupling

Agents can be:

Built by different teams
Running on different machines
Using different internal tools

Yet still collaborate seamlessly.

A Practical Multi-Agent Invoice Architecture

A scalable invoice system can be decomposed into specialized agents, each doing one thing well.

1. Files Agent

Responsibility:
Locate and list documents.

Uses Box MCP tools to scan folders
Knows nothing about extraction or business rules

2. Extraction Agent

Responsibility:
Convert documents into structured data.

Uses MCP’s AI extraction tools
Focuses only on parsing and normalization

3. Orchestrator Agent

Responsibility:
Coordinate the workflow.

Has no direct access to MCP tools
Breaks high-level requests into steps
Delegates tasks to the appropriate agents

How the Workflow Executes

The user requests: “Process all invoices.”
The orchestrator creates a plan.
The Files Agent returns a list of relevant documents.
The orchestrator assigns each document to the Extraction Agent.
Results are aggregated and passed downstream (DB, analytics, alerts).
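
The workflow above can be sketched as plain function calls. This is a simplified, single-process illustration: in a real deployment each agent would sit behind an A2A endpoint, and `files_agent`, `extraction_agent`, and `Orchestrator` are hypothetical names with stubbed behavior.

```python
from typing import Callable, Dict, List

Invoice = Dict[str, str]


def files_agent(request: str) -> List[str]:
    """Stub: locates documents; knows nothing about extraction."""
    return ["invoice_001.pdf", "invoice_002.pdf"]


def extraction_agent(document: str) -> Invoice:
    """Stub: turns one document into structured data."""
    return {"source_file": document, "status": "extracted"}


class Orchestrator:
    """Plans and delegates; holds no file or extraction logic itself."""

    def __init__(
        self,
        find_files: Callable[[str], List[str]],
        extract: Callable[[str], Invoice],
    ) -> None:
        self.find_files = find_files
        self.extract = extract

    def run(self, request: str) -> List[Invoice]:
        # Step 1: ask the files agent which documents are relevant.
        documents = self.find_files(request)
        # Step 2: hand each document to the extraction agent.
        results = [self.extract(doc) for doc in documents]
        # Step 3: return the aggregate for downstream consumers.
        return results


orchestrator = Orchestrator(files_agent, extraction_agent)
results = orchestrator.run("Process all invoices.")
```

Because the orchestrator receives its collaborators as arguments, either agent can be replaced (or faked in a test) without touching the coordination logic.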

Each agent is defined using an AgentCard via the Agent Development Kit (ADK), which declares:

Capabilities
Endpoints
Communication rules

This allows dynamic discovery and flexible orchestration.
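
As a rough sketch of what capability-based discovery looks like, the example below uses a deliberately simplified card. It is not the actual A2A Agent Card schema or the ADK API: `SimpleAgentCard`, its fields, the capability names, and the localhost URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SimpleAgentCard:
    """Hypothetical, reduced card: who the agent is, where it lives, what it does."""

    name: str
    endpoint: str
    capabilities: Tuple[str, ...]


# Hypothetical registry; real agents would publish their own cards.
REGISTRY: List[SimpleAgentCard] = [
    SimpleAgentCard("files-agent", "http://localhost:8001", ("list_documents",)),
    SimpleAgentCard(
        "extraction-agent", "http://localhost:8002", ("extract_invoice_fields",)
    ),
]


def find_agent(
    capability: str, registry: List[SimpleAgentCard] = REGISTRY
) -> Optional[SimpleAgentCard]:
    """Return the first agent that declares the requested capability."""
    for card in registry:
        if capability in card.capabilities:
            return card
    return None


card = find_agent("extract_invoice_fields")
```

An orchestrator built this way asks for a capability rather than a specific agent, so a new agent becomes usable as soon as its card is added to the registry.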

Final Takeaway

Scalable AI systems are not built by adding more prompts or bigger models.
They are built by evolving architecture.

Scripts validate ideas
MCP standardizes data access
A2A enables composable intelligence

When combined, these layers transform AI from a single tool into a coordinated ecosystem that is secure, modular, and future-proof.

If you’re serious about deploying AI beyond demos, this progression isn’t optional.
It’s the blueprint.