
Our Three Step Process
February 7, 2024
Architecting Scalable AI Systems the Right Way

Most AI systems don’t fail because the model is bad.
They fail because the architecture never evolved beyond the demo stage.
What usually starts as a simple script built to “just test the idea” slowly becomes business-critical. At that point, the cracks begin to show: brittle code, manual workflows, security risks, and systems that collapse the moment scale is introduced.
This article walks through a practical architectural progression for AI systems:
Local scripts (Proof of Concept)
Model Context Protocol (MCP) for scalable data access
Multi-Agent Systems using Agent-to-Agent (A2A) orchestration
We’ll ground this discussion using a real-world example: automated invoice processing.
Phase 1: Proof of Concept
The Manual, Script-First Approach
Every AI project begins here, and that’s not a mistake. It’s necessary.
At this stage, the goal is simple:
👉 Can we extract meaningful structured data from invoices using an LLM?
Typical Workflow
A basic invoice-processing script usually looks like this:
Local Storage
Invoices are manually downloaded and placed into a predefined local folder.
OCR & Text Extraction
Libraries such as PdfReader and OCR utilities read PDF files and convert them into raw text.
LLM-Based Extraction
The extracted text is sent to a model (e.g., Gemini 2.5 Flash) with a prompt to identify fields like:
Client name
Invoice amount
Product or service name
Local Persistence
The model returns structured JSON, which is parsed and stored in a local sqlite3 database.
This approach is fast to build and excellent for validating assumptions.
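To make the last two steps concrete, here is a minimal sketch of parsing the model's JSON and persisting it with sqlite3. It is an illustration under assumptions, not the script behind this article: call_llm is a hypothetical stub that returns canned output instead of calling a model, and the field names and the invoices table are invented for the example. The OCR step is left out entirely.

```python
import json
import sqlite3


def call_llm(invoice_text: str) -> str:
    """Hypothetical stand-in for the model call; returns a JSON string.

    A real script would send `invoice_text` plus a prompt to the model here.
    """
    return json.dumps({
        "client_name": "Acme Corp",
        "invoice_amount": 1250.00,
        "product_name": "Consulting services",
    })


def parse_invoice(raw_json: str) -> dict:
    """Parse the model output and check that the expected fields exist."""
    data = json.loads(raw_json)
    required = ("client_name", "invoice_amount", "product_name")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"model output is missing fields: {missing}")
    return data


def save_invoice(conn: sqlite3.Connection, invoice: dict) -> None:
    """Insert one extracted invoice into the local database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "client_name TEXT, invoice_amount REAL, product_name TEXT)"
    )
    conn.execute(
        "INSERT INTO invoices VALUES "
        "(:client_name, :invoice_amount, :product_name)",
        invoice,
    )
    conn.commit()


# An in-memory database keeps the sketch self-contained;
# a real script would pass a file path instead.
conn = sqlite3.connect(":memory:")
invoice = parse_invoice(call_llm("raw text extracted from a PDF"))
save_invoice(conn, invoice)
rows = conn.execute(
    "SELECT client_name, invoice_amount FROM invoices"
).fetchall()
```

Validating the fields before the insert matters even at this stage: model output is not guaranteed to match the schema, and a silent partial row is harder to notice than an exception.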
Where This Breaks Down
The problems don’t appear immediately, but they are inevitable:
Manual file handling
Every document must be downloaded locally. This is slow, error-prone, and not scalable.
Format rigidity
The script often works only for PDFs. Adding JPEGs, scans, or Word documents requires rewriting large portions of the pipeline.
Security risks
Storing financial documents locally introduces compliance and data-leak concerns.
Tight coupling
OCR logic, business logic, and storage are deeply intertwined. Any change breaks something else.
At this point, most teams attempt to “patch” the script.
That’s usually the wrong move.
Phase 2: Scaling with the Model Context Protocol (MCP)
To move beyond fragile scripts, AI systems need a standardized way to interact with data and tools.
This is exactly the problem that the Model Context Protocol (MCP) solves.
The M × N Integration Problem
Without MCP:
M AI applications
N data sources
Result in M × N custom integrations.
The count grows multiplicatively: every new application or data source adds a whole new set of integrations, and the result becomes unmaintainable very quickly.
What MCP Changes
MCP introduces a clean separation:
MCP Server
Defines tools and executes logic (file access, extraction, search).
MCP Client (your AI app)
Simply calls tools without caring how they are implemented.
This reduces the work to M + N integrations: each application implements the client side once, and each data source is wrapped in a server once.
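A quick way to see the difference is to count integrations for a few team sizes. The sketch below is plain arithmetic; the function names and the sample sizes are made up for illustration.

```python
def point_to_point(apps: int, sources: int) -> int:
    """Every application integrates with every data source directly."""
    return apps * sources


def shared_protocol(apps: int, sources: int) -> int:
    """Each application and each data source implements the protocol once."""
    return apps + sources


# Hypothetical sizes: (number of AI applications, number of data sources).
for apps, sources in [(2, 3), (5, 10), (20, 50)]:
    print(
        f"{apps} apps, {sources} sources: "
        f"{point_to_point(apps, sources)} custom integrations vs "
        f"{shared_protocol(apps, sources)} with a shared protocol"
    )
```

With 5 applications and 10 data sources, that is 50 custom integrations against 15 protocol implementations, and the gap widens as either number grows.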
Replacing Custom OCR with the Box MCP Server
In our invoice system, the biggest architectural win comes from removing custom OCR logic entirely.
Instead of:
Downloading files
Writing parsers
Handling edge cases manually
We integrate the Box MCP Server.
What This Enables
Remote, in-cloud processing
Files are never downloaded locally. The MCP server processes them directly inside Box, which is faster and more secure.
Automatic format handling
PDFs, images, and Word documents are handled uniformly without custom code.
Tool-driven workflows
The LLM is given access to predefined tools such as list_files, extract_text, and query_document.
The model decides which tool to call, and MCP executes it.
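The tool-calling pattern can be sketched in a few lines of plain Python. This is a toy model of the idea, not the MCP SDK and not the Box MCP Server's actual interface: the tool names come from the list above, but their signatures, the DOCUMENTS dictionary, and the call_tool helper are assumptions made for the example.

```python
from typing import Callable

# Hypothetical in-memory stand-in for documents that live in remote storage.
DOCUMENTS = {
    "invoice_001.pdf": "Client: Acme Corp. Amount due: 1250.00",
    "invoice_002.docx": "Client: Globex. Amount due: 310.50",
}


def list_files() -> list[str]:
    """Return the names of all available documents."""
    return sorted(DOCUMENTS)


def extract_text(file_name: str) -> str:
    """Return the text of one document, whatever its format."""
    return DOCUMENTS[file_name]


# Server side: tools are registered by name.
TOOLS: dict[str, Callable[..., object]] = {
    "list_files": list_files,
    "extract_text": extract_text,
}


def call_tool(name: str, arguments: dict) -> object:
    """Client-facing entry point: run a registered tool by name."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


# The model's decision arrives as plain data: a tool name plus arguments.
request = {
    "name": "extract_text",
    "arguments": {"file_name": "invoice_001.pdf"},
}
result = call_tool(request["name"], request["arguments"])
```

The calling code never touches a parser or a file handle. Supporting a new format means changing the server's implementation of extract_text, while the client and the prompt stay the same.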
This architectural shift transforms the system from a fragile script into a production-ready data pipeline.
Phase 3: Multi-Agent Orchestration
Scaling Logic with the A2A Protocol
As systems mature, requirements grow beyond simple extraction:
Anomaly detection
Invoice classification
Payment scheduling
Compliance checks
Cramming all of this into a single “super-agent” creates a monolith that is hard to test, hard to extend, and harder to debug.
The solution is a multi-agent architecture, coordinated using the Agent-to-Agent (A2A) protocol.
What Is A2A?
A2A is an open standard that allows agents to:
Discover each other’s capabilities
Communicate through a shared protocol
Delegate tasks without tight coupling
Agents can be:
Built by different teams
Running on different machines
Using different internal tools
Yet still collaborate seamlessly.
A Practical Multi-Agent Invoice Architecture
A scalable invoice system can be decomposed into specialized agents, each doing one thing well.
1. Files Agent
Responsibility:
Locate and list documents.
Uses Box MCP tools to scan folders
Knows nothing about extraction or business rules
2. Extraction Agent
Responsibility:
Convert documents into structured data.
Uses MCP’s AI extraction tools
Focuses only on parsing and normalization
3. Orchestrator Agent
Responsibility:
Coordinate the workflow.
Has no direct access to MCP tools
Breaks high-level requests into steps
Delegates tasks to the appropriate agents
How the Workflow Executes
The user requests: “Process all invoices.”
The orchestrator creates a plan.
The Files Agent returns a list of relevant documents.
The orchestrator assigns each document to the Extraction Agent.
Results are aggregated and passed downstream (DB, analytics, alerts).
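The steps above can be sketched with three small classes. This is a simplified, in-process illustration: direct method calls stand in for A2A messages between separately deployed agents, and every class name, method name, and file name here is hypothetical.

```python
class FilesAgent:
    """Locates documents; knows nothing about extraction."""

    def __init__(self, files: list[str]) -> None:
        self._files = files

    def list_invoices(self) -> list[str]:
        # Assumption for the sketch: invoices are identified by name prefix.
        return [name for name in self._files if name.startswith("invoice")]


class ExtractionAgent:
    """Turns one document into structured data (stubbed here)."""

    def extract(self, file_name: str) -> dict:
        return {"file": file_name, "status": "extracted"}


class Orchestrator:
    """Coordinates the workflow; it holds agents, not tools."""

    def __init__(self, files_agent: FilesAgent,
                 extraction_agent: ExtractionAgent) -> None:
        self.files_agent = files_agent
        self.extraction_agent = extraction_agent

    def process_all_invoices(self) -> list[dict]:
        # Step 1: ask the Files Agent which documents are relevant.
        documents = self.files_agent.list_invoices()
        # Step 2: delegate each document to the Extraction Agent.
        # Step 3: aggregate the results for downstream consumers.
        return [self.extraction_agent.extract(doc) for doc in documents]


orchestrator = Orchestrator(
    FilesAgent(["invoice_001.pdf", "notes.txt", "invoice_002.docx"]),
    ExtractionAgent(),
)
results = orchestrator.process_all_invoices()
```

Because the orchestrator only depends on what each agent can do, either agent can be replaced, or moved to another machine behind a real protocol, without touching the coordination logic.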
Each agent is described by an AgentCard, defined via the Agent Development Kit (ADK). The card declares:
Capabilities
Endpoints
Communication rules
This allows dynamic discovery and flexible orchestration.
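As a rough sketch of what discovery looks like, the example below models a card as a small dataclass and looks agents up by capability. It is deliberately simplified: SimpleAgentCard is a hypothetical type, not the card schema from the A2A specification or a class from the ADK, and the names, endpoints, and capability strings are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleAgentCard:
    """Hypothetical, stripped-down description of one agent."""

    name: str
    endpoint: str
    capabilities: tuple[str, ...] = ()


# Hypothetical registry; in a real deployment each agent would publish
# its own card rather than being listed in one place.
REGISTRY = [
    SimpleAgentCard("files-agent", "http://localhost:8001",
                    ("list_documents",)),
    SimpleAgentCard("extraction-agent", "http://localhost:8002",
                    ("extract_invoice",)),
]


def find_agents(capability: str,
                registry: list[SimpleAgentCard]) -> list[SimpleAgentCard]:
    """Return every agent that declares the requested capability."""
    return [card for card in registry if capability in card.capabilities]


matches = find_agents("extract_invoice", REGISTRY)
```

An orchestrator that selects agents this way depends only on declared capabilities, so a new anomaly-detection or compliance agent can join by publishing a card.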
Final Takeaway
Scalable AI systems are not built by adding more prompts or bigger models.
They are built by evolving architecture.
Scripts validate ideas
MCP standardizes data access
A2A enables composable intelligence
When combined, these layers transform AI from a single tool into a coordinated ecosystem: secure, modular, and future-proof.
If you’re serious about deploying AI beyond demos, this progression isn’t optional.
It’s the blueprint.







