From Cockiness to 8 AI Agents: Building a Production Tax System in 3 Months
The Humbling Beginning: When All AIs Failed
It started with brotherly cockiness. My brother - the smartest person I knew growing up, #1 in tax at university, fresh off graduating top of his class at Oxford Tax - was "dying with work" while I was living my best life automating everything with AI.
"Give me a valuable task of what you are doing, and I will easily 1-shot prompt it into existence," I boasted.
Reality hit hard. o1 pro failed. Claude 3.7 failed. DeepSeek failed. Grok failed. Gemini 2.5 failed. Every single AI model I threw at Singapore tax computation produced nonsense.
The Core Problem: Teaching AI complex tax law is like teaching someone to drive by only letting them touch the steering wheel once every 5 minutes and only telling them "you crashed" without explaining why. The feedback loops are slow, visibility is limited, and the system is inherently fragmented.
What followed was a humbling two-week deep dive, reverse-engineering tax papers, studying IRAS ITA guides, and rebuilding my understanding from first principles. The breakthrough: AI won't replace tax professionals, but it can be a powerful assistant with the right architecture and guidance.
The Journey: 3 Months, 20+ Projects, 8 Final Agents
Phase 1: The Foundation (March 2025)
Project: tax-annihilator-v1
I started with traditional programming - a Python/Flask application with ML-powered expense categorization using scikit-learn. It could process transactions from JSON, CSV, and Excel, apply Singapore tax rules, and generate reports.
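# The naive beginning - ML but not LLM
class TaxCalculator:
    def __init__(self):
        self.income_tax_rate = 0.17  # Singapore corporate tax
        self.partial_exemption_threshold = 10000
        self.partial_exemption_rate = 0.75

    def calculate_tax(self, chargeable_income):
        income = max(chargeable_income, 0)
        # First $10,000 at 75% exemption
        first_band = min(income, self.partial_exemption_threshold)
        # Next $190,000 at 50% exemption
        next_band = min(income - first_band, 190000)
        exempt = first_band * self.partial_exemption_rate + next_band * 0.50
        # Complex but deterministic rules
        return (income - exempt) * self.income_tax_rate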
Key Learning: Rule-based systems work but don't scale. Every edge case requires new code.
The Research Phase: I built python-dl-iras-etax-guides to download every IRAS guide, realizing I needed deep domain knowledge to make this work.
Early Business Development: Created tax-linkedin-email, a sophisticated automation tool that scraped LinkedIn, researched companies via Perplexity API, and generated personalized outreach. This wasn't just about building tech; it was about understanding the market.
Phase 2: The LLM Awakening (April 2025)
Project: tax-annihilator-v2
The first real LLM integration. I built a modular system with swappable engines - rule-based vs LLM-based - to benchmark approaches.
# First LLM integration - the excitement was real
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class LLMExpenseEngine:
    def tag_expense(self, transaction):
        prompt = f"""
        Analyze this Singapore business expense:
        Vendor: {transaction.vendor}
        Amount: ${transaction.amount}
        Description: {transaction.description}
        Determine tax treatment under Singapore tax law.
        """
        # Chat Completions call (openai>=1.0 client)
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content
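The swappable-engine idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `ExpenseEngine`, `RuleBasedEngine`, `benchmark`, the keyword rule, and the sample labels are all hypothetical. An LLM-backed engine would plug in by implementing the same `tag_expense` method.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Transaction:
    vendor: str
    amount: float
    description: str


class ExpenseEngine(Protocol):
    """Anything with a tag_expense method can be plugged into the benchmark."""

    def tag_expense(self, transaction: Transaction) -> str: ...


class RuleBasedEngine:
    """Toy keyword rule, standing in for the deterministic engine."""

    def tag_expense(self, transaction: Transaction) -> str:
        text = transaction.description.lower()
        if "private car" in text:
            return "non-deductible"
        return "deductible"


def benchmark(engine: ExpenseEngine, labelled: list[tuple[Transaction, str]]) -> float:
    """Return the fraction of transactions tagged the same way as the expert label."""
    if not labelled:
        return 0.0
    hits = sum(engine.tag_expense(tx) == label for tx, label in labelled)
    return hits / len(labelled)


# Made-up sample data for illustration only
sample = [
    (Transaction("Shell", 80.0, "Petrol for private car"), "non-deductible"),
    (Transaction("AWS", 420.0, "Cloud hosting"), "deductible"),
]
rule_based_accuracy = benchmark(RuleBasedEngine(), sample)
```

Running the same `benchmark` call against each engine on one expert-labelled set gives a like-for-like comparison.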
Security Revelation: Financial data + LLMs = privacy concerns. Built privacy measures and local processing options.
The UI Challenge: Created multiple frontend versions:
- taxtagger-frontend: React showcase for demos
- animejs-taxtagger: Animated explanations of tax flow (because tax is complex!)
RAG Implementation: langchain-tax introduced vector search over Singapore tax documents:
# The game-changer - RAG for tax knowledge
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
# tax_documents: list of LangChain Document objects loaded from the IRAS guides
vectorstore = FAISS.from_documents(tax_documents, embeddings)
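To show what vector search is doing under the hood, here is a toy version using only the standard library. It is a sketch under loud assumptions: word counts stand in for real sentence-transformer embeddings, a sorted list stands in for a FAISS index, and the snippets are placeholders, not actual IRAS wording.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use a trained model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Placeholder snippets, not real IRAS text
docs = [
    "capital allowance claims for plant and machinery",
    "partial tax exemption for companies",
    "private car expenses are not deductible",
]
top = search("are private car expenses deductible", docs)
```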
Phase 3: The Agent Revolution (May 2025)
Project: tax-agent-3
This is where things got serious. I abandoned monolithic LLM calls for specialized agents:
# The breakthrough - specialized agents
agents = {
    "orchestrator": OrchestratorAgent(model="gemini-2.0-flash"),
    "expense_classifier": ExpenseClassificationAgent(model="gpt-4o-mini"),
    "capital_allowance": CapitalAllowanceAgent(model="gpt-4o"),
    "evaluator": EvaluatorAgent(model="gpt-4o-mini")
}

# Each agent had specific prompts and tools
class ExpenseClassificationAgent:
    tools = [CalculatorTool(), KnowledgeRetrievalTool(), LookupTool()]

    def process(self, transaction):
        # Specialized logic for expense classification
        # Can retrieve specific tax rules when needed
        raise NotImplementedError  # implementation omitted
Multi-Pass Validation: Built systems that use different models in sequence, escalating to more powerful (expensive) models only when needed.
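A rough sketch of that escalation pattern is below. It assumes each model call returns an answer plus a confidence score in [0, 1]; how that score is produced (self-rating, an evaluator agent, agreement between passes) is not specified in this post, so `classify_with_escalation`, the stub models, and the 0.8 threshold are all hypothetical.

```python
from typing import Callable

# A "model" here is any callable returning (answer, confidence in [0, 1]).
Model = Callable[[str], tuple[str, float]]


def classify_with_escalation(
    text: str, models: list[tuple[str, Model]], threshold: float = 0.8
) -> tuple[str, str]:
    """Try models from cheapest to most expensive; stop at the first confident answer.

    Returns (model_name, answer). If no model is confident, the last model's
    answer is returned.
    """
    if not models:
        raise ValueError("at least one model is required")
    name, answer = "", ""
    for name, model in models:
        answer, confidence = model(text)
        if confidence >= threshold:
            break
    return name, answer


# Stub models standing in for real API calls
def cheap_model(text: str) -> tuple[str, float]:
    return ("deductible", 0.95 if "hosting" in text else 0.4)


def strong_model(text: str) -> tuple[str, float]:
    return ("non-deductible", 0.9)


tiers = [("cheap", cheap_model), ("strong", strong_model)]
easy = classify_with_escalation("cloud hosting invoice", tiers)
hard = classify_with_escalation("car lease, mixed private use", tiers)
```

Only the second call reaches the expensive tier, which is where the cost saving comes from.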
Phase 4: The Crown Jewel (June 2025)
Project: tax-compute-mvp - The 8-Agent System
After 20+ iterations, I finally cracked it. Eight specialized agents, each mastering one aspect of Singapore tax law:
# The final architecture - 8 specialized agents
# Each agent handles a specific aspect of Singapore tax law:
# - Income classification and exemptions
# - Expense breakdown and categorization
# - Source mapping and deduction rules
# - Special deductions (R&D, training)
# - Capital allowances and depreciation
# - Final tax computation with exemptions
#
# Combined system accuracy: 90%+
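One way such a chain of specialists could be wired together is a sequential pipeline over a shared state dictionary. This is only a sketch: the post does not say how the agents hand data to each other, so the sequential design, the stage functions, and the field names here are assumptions, and each stage is a plain function where a real agent would call an LLM.

```python
from typing import Callable

State = dict
Stage = Callable[[State], State]


def classify_income(state: State) -> State:
    # Placeholder: treat all revenue as taxable trade income
    return {**state, "taxable_income": state["revenue"]}


def apply_deductions(state: State) -> State:
    # Placeholder: subtract expenses already marked deductible
    profit = state["taxable_income"] - state["deductible_expenses"]
    return {**state, "adjusted_profit": profit}


def run_pipeline(state: State, stages: list[Stage]) -> State:
    """Pass the shared state through each stage in order, recording a trace."""
    trace = []
    for stage in stages:
        state = stage(state)
        trace.append(stage.__name__)
    return {**state, "trace": trace}


result = run_pipeline(
    {"revenue": 500_000, "deductible_expenses": 120_000},
    [classify_income, apply_deductions],
)
```

Keeping every stage's output in one state object makes it easy to see which specialist produced which figure when a result looks wrong.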
The Results:
- Over 90% accuracy matching tax professionals
- Under $1 cost vs hundreds for manual process (orders of magnitude cheaper)
- Minutes vs hours of processing time (dramatically faster)
Technical Innovations That Made It Work
1. Schema-First Architecture
The biggest breakthrough: forcing LLMs to output valid JSON with automatic fixing.
// Every agent output is validated and auto-corrected
import { z } from "zod";

const schema = z.object({
  items: z.array(z.object({
    id: z.string().uuid().default(() => crypto.randomUUID()), // Auto-generates if missing
    amount: z.coerce.number(), // Auto-coerces numeric strings
    description: z.string()
  })),
  total: z.coerce.number()
});

// Automatic fixes for common LLM errors, applied before validation
for (const item of output.items) {
  if (item.id === "FAKE-ID-123") {
    item.id = crypto.randomUUID();
  }
}
2. Arithmetic Post-Processing
LLMs are bad at math. So we built automatic correction:
// LLM identifies items correctly but fails addition (item amounts shown)
// Before: { items: [100, 200, 300], total: 500 } ❌
// After: { items: [100, 200, 300], total: 600 } ✅
function fixArithmetic(output) {
  const calculatedTotal = output.items.reduce(
    (sum, item) => sum + item.amount, 0
  );
  if (output.total !== calculatedTotal) {
    console.log(`🧮 Fixed: ${output.total} → ${calculatedTotal}`);
    output.total = calculatedTotal;
  }
  return output;
}
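One caveat with recomputing totals in binary floating point is that sums of cents can drift (0.1 + 0.2 is not exactly 0.3). A possible Python variant, sketched here with the hypothetical name `fix_total`, recomputes the total with `decimal.Decimal` instead:

```python
from decimal import Decimal


def fix_total(output: dict) -> dict:
    """Recompute the total from line items, ignoring whatever the LLM wrote."""
    # Decimal(str(x)) avoids binary floating-point drift in currency sums
    total = sum(
        (Decimal(str(item["amount"])) for item in output["items"]),
        Decimal("0"),
    )
    return {**output, "total": total}


llm_output = {"items": [{"amount": 0.1}, {"amount": 0.2}, {"amount": 300}], "total": 500}
corrected = fix_total(llm_output)
```

The function returns a new dictionary rather than mutating its input, which keeps the LLM's original answer available for logging.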
3. Hybrid Model Strategy
Not all tasks need expensive models:
// Different tasks require different model capabilities
// Complex reasoning tasks use more powerful models
// Pattern matching and basic calculations use efficient models
// Result: high accuracy at a fraction of the cost
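A minimal routing table makes the idea concrete. The task categories and the `select_model` helper are hypothetical; the model names are simply the ones mentioned elsewhere in this post and would be swapped for whatever a provider offers.

```python
from enum import Enum


class Task(Enum):
    COMPLEX_REASONING = "complex_reasoning"
    PATTERN_MATCHING = "pattern_matching"
    CALCULATION = "calculation"


# Illustrative mapping from task type to model name
ROUTES = {
    Task.COMPLEX_REASONING: "o3-mini",
    Task.PATTERN_MATCHING: "gpt-4o-mini",
    Task.CALCULATION: None,  # handled in plain code, no LLM call
}


def select_model(task: Task):
    """Return the model name for a task, or None when no LLM is needed."""
    return ROUTES[task]
```

Keeping the mapping in one table means a model swap is a one-line change rather than an edit to every agent.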
4. YAML-Based Prompt Management
Tax rules as data, not code:
# Version-controlled tax logic
critical_rules:
  - rule: "S-PLATED VEHICLES ALWAYS NON-DEDUCTIBLE"
    section: "s15(1)(o)"
    examples:
      - "SBA1234A - private car"
      - "Any S-plate registration"
    test_cases: ["uber_car_expense", "grab_vehicle_lease"]
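Once parsed, rules like these can be rendered into the text an agent sees. The sketch below assumes the YAML has already been loaded into a dictionary (for example with `yaml.safe_load` from PyYAML); `render_rules` and the output layout are hypothetical, not the project's prompt format.

```python
# Assumed result of parsing the YAML rule file into a dictionary
rules = {
    "critical_rules": [
        {
            "rule": "S-PLATED VEHICLES ALWAYS NON-DEDUCTIBLE",
            "section": "s15(1)(o)",
            "examples": ["SBA1234A - private car", "Any S-plate registration"],
        }
    ]
}


def render_rules(config: dict) -> str:
    """Turn rule entries into a text block that can be prepended to an agent prompt."""
    lines = ["CRITICAL RULES:"]
    for entry in config["critical_rules"]:
        lines.append(f"- {entry['rule']} ({entry['section']})")
        lines.extend(f"    e.g. {example}" for example in entry.get("examples", []))
    return "\n".join(lines)


prompt_block = render_rules(rules)
```

Because the rules live in data, a tax-law change becomes a reviewed diff to a config file, and the `test_cases` field can name the regression tests to rerun.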
5. RAG with Retrieval Gating
Not every query needs document retrieval:
def should_retrieve(query):
    q = query.lower()  # match case-insensitively in both checks
    # Simple queries don't need RAG
    if "calculate total" in q:
        return False
    # Complex tax rules need documentation
    if any(term in q for term in ["s15", "capital allowance", "exemption"]):
        return True
    # Default (assumed): skip retrieval
    return False
Business Model Evolution
The Market Journey
Started (March): "We'll help SMEs with tax filing!"
- Reality: SMEs already pay an accountant every year, trust that accountant, and aren't interested in switching
Pivot 1 (April): "We'll be cheaper than accountants!"
- Reality: Trust > Price for tax matters
Pivot 2 (May): "We'll target growing companies!"
- Reality: The mythical "middle market" doesn't exist
Final Position (June): "AI Tax Brain for Modern Businesses"
- Freemium for small businesses
- Affordable monthly subscriptions for growing companies
- Enterprise deals for large corporations
- Channel partnerships with accountants
The Numbers That Matter
# Key Performance Metrics:
Accuracy: 90%+
Cost_Reduction: 99%
Time_Savings: 98%
Key Lessons for AI Builders
1. Domain Expertise Is Irreplaceable
My brother's tax knowledge was the secret weapon. AI amplifies expertise; it doesn't replace it.
2. Evolution, Not Revolution
- March: Basic ML categorization
- April: First LLM integration
- May: Multi-agent architecture
- June: Production-ready system
Each iteration built on previous learnings.
3. Schema-First for Production
// This saved the project
const validateOutput = (output: unknown): ValidatedOutput => {
  return outputSchema.parse(output); // Throws if invalid
};
4. Hybrid Models Are The Future
Don't use a sledgehammer for every nail:
- Complex reasoning: Expensive models (o3-mini)
- Pattern matching: Cheap models (gpt-4o-mini)
- Calculations: Post-processing (not LLM)
5. The Business Model Takes More Iteration Than The Tech
- 4 different brandings (among them Deductly → TaxEase → TaxTag)
- 3 pricing models
- 2 market segments explored
- 1 final positioning that worked
6. Building In Public Accelerates Learning
20+ repositories in 3 months seems chaotic, but each was a learning experiment. Fast iteration beats perfect planning.
Technical Assets Created (All Reusable)
# 1. Schema Validation System
class SchemaValidator:
    """Force LLMs to output valid, consistent JSON"""

# 2. Arithmetic Post-Processor
class MathFixer:
    """Automatically fix LLM calculation errors"""

# 3. Multi-Agent Pipeline
class AgentPipeline:
    """Orchestrate specialized agents with error recovery"""

# 4. Hybrid Model Router
class ModelSelector:
    """Choose optimal model based on task complexity"""

# 5. YAML Prompt Manager
class PromptVersionControl:
    """Manage prompts as configuration, not code"""

# 6. RAG with Gating
class SmartRetrieval:
    """Retrieve documents only when necessary"""
What's Next?
The tax compliance journey provided invaluable lessons, but the real insight is bigger: production AI systems require specialized architectures, not just API calls to LLMs.
The 8-agent system we built for tax can be adapted to any complex domain:
- Legal document analysis
- Medical diagnosis assistance
- Financial planning
- Compliance automation
The key is understanding that AI agents should be specialists, not generalists. Just like my brother specializes in tax, each AI agent should master one thing exceptionally well.
The Real Success Metric
Not the high accuracy. Not the minimal cost. Not even the dramatic speed improvement.
The real success? My brother now uses the system daily. The smartest tax person I know trusts AI to help with his work. That's when I knew we'd built something real.
Building AI for complex domains? Let's connect. The journey from "AI can't do this" to "AI does this better than humans" is shorter than you think - with the right architecture.
Appendix: The Full Project Timeline
March 2025:
- tax-annihilator-v1: ML-based categorization
- python-dl-iras-etax-guides: Domain research
- tax-linkedin-email: Market validation
April 2025:
- tax-annihilator-v2: First LLM integration
- tax-adjustment-analyzer: P&L analysis
- taxtagger-frontend: Demo UI
- animejs-taxtagger: Visual explanations
- langchain-tax: RAG implementation
May 2025:
- tax-adjustment-analyzer-v2: Multi-pass validation
- tax-agent-3: Agent architecture
- taxtagger-mcp: Integration attempts
June 2025:
- tax-agent-sdk: Productization
- sg-corp-tax: Specialization
- tax-agent-v4-js: JavaScript port
- tax-ai-with-fe: Full-stack app
- tax-tagger-fe-10jun: Final UI
- tax-compute-mvp: The 8-agent system
20+ projects. 3 months. 1 working system. Countless lessons learned.