From Cockiness to 8 AI Agents: Building a Production Tax System in 3 Months

    How my brother's tax struggles led to 20+ iterations, 8 specialized AI agents, and a 90%+ accurate system that costs under $1 instead of hundreds of dollars


    The Humbling Beginning: When All AIs Failed

    It started with brotherly cockiness. My brother - the smartest person I knew growing up, #1 in tax at university, fresh off graduating top of his class at Oxford Tax - was "dying with work" while I was living my best life automating everything with AI.

    "Give me a valuable task of what you are doing, and I will easily 1-shot prompt it into existence," I boasted.

    Reality hit hard. o1 pro failed. Claude 3.7 failed. DeepSeek failed. Grok failed. Gemini 2.5 failed. Every single AI model I threw at Singapore tax computation produced nonsense.

    The Core Problem: Teaching AI complex tax law is like teaching someone to drive by only letting them touch the steering wheel once every 5 minutes and only telling them "you crashed" without explaining why. The feedback loops are slow, visibility is limited, and the system is inherently fragmented.

    What followed was a humbling two-week deep dive, reverse-engineering tax papers, studying IRAS ITA guides, and rebuilding my understanding from first principles. The breakthrough: AI won't replace tax professionals, but it can be a powerful assistant with the right architecture and guidance.


    The Journey: 3 Months, 20+ Projects, 8 Final Agents

    Phase 1: The Foundation (March 2025)

    Project: tax-annihilator-v1

    I started with traditional programming - a Python/Flask application with ML-powered expense categorization using scikit-learn. It could process transactions from JSON, CSV, and Excel, apply Singapore tax rules, and generate reports.

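    # The naive beginning - ML but not LLM
    class TaxCalculator:
        def __init__(self):
            self.income_tax_rate = 0.17  # Singapore corporate tax
            self.partial_exemption_threshold = 10000  # first tier
            self.partial_exemption_rate = 0.75
            self.second_tier_amount = 190000          # next tier
            self.second_tier_rate = 0.50
        
        def calculate_tax(self, chargeable_income):
            # First $10,000 at 75% exemption
            first = min(chargeable_income, self.partial_exemption_threshold)
            # Next $190,000 at 50% exemption
            second = min(max(chargeable_income - self.partial_exemption_threshold, 0),
                         self.second_tier_amount)
            exempt = first * self.partial_exemption_rate + second * self.second_tier_rate
            # Complex but deterministic rules
            return (chargeable_income - exempt) * self.income_tax_rate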

    Key Learning: Rule-based systems work but don't scale. Every edge case requires new code.

    The Research Phase: I built python-dl-iras-etax-guides to download every IRAS guide, realizing I needed deep domain knowledge to make this work.

    Early Business Development: Created tax-linkedin-email - a sophisticated automation tool that scraped LinkedIn, researched companies via Perplexity API, and generated personalized outreach. This wasn't just about building tech; it was about understanding the market.


    Phase 2: The LLM Awakening (April 2025)

    Project: tax-annihilator-v2

    The first real LLM integration. I built a modular system with swappable engines - rule-based vs LLM-based - to benchmark approaches.

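    # First LLM integration - the excitement was real
    from openai import OpenAI
    
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    
    class LLMExpenseEngine:
        def tag_expense(self, transaction):
            prompt = f"""
            Analyze this Singapore business expense:
            Vendor: {transaction.vendor}
            Amount: ${transaction.amount}
            Description: {transaction.description}
            
            Determine tax treatment under Singapore tax law.
            """
            
            # openai>=1.0 client API (openai.ChatCompletion was removed in 1.0)
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": prompt}]
            )
            # Free-text answer; later phases forced structured JSON output
            return response.choices[0].message.content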
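
    The "swappable engines" idea above boils down to a shared interface plus a scoring loop. Here is a minimal Python sketch, assuming a `Transaction` dataclass with the same fields the prompt uses; `ExpenseEngine`, `RuleBasedEngine`, `benchmark` and the keyword rules are hypothetical names for illustration, not the original v2 code.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Transaction:
    vendor: str
    amount: float
    description: str


class ExpenseEngine(Protocol):
    """Any engine with tag_expense() can be swapped in: rule-based or LLM-based."""
    def tag_expense(self, transaction: Transaction) -> str: ...


class RuleBasedEngine:
    # Hypothetical keyword -> category rules, for illustration only
    RULES = {"taxi": "transport", "aws": "cloud_services"}

    def tag_expense(self, transaction: Transaction) -> str:
        vendor = transaction.vendor.lower()
        for keyword, category in self.RULES.items():
            if keyword in vendor:
                return category
        return "unclassified"


def benchmark(engine: ExpenseEngine, labelled: list[tuple[Transaction, str]]) -> float:
    """Share of transactions the engine tags with the expected label."""
    correct = sum(engine.tag_expense(txn) == label for txn, label in labelled)
    return correct / len(labelled)
```

    Any object with a `tag_expense()` method fits the interface, so an LLM engine can be scored on the same labelled set, provided it returns a category label rather than free text.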

    Security Revelation: Financial data + LLMs = privacy concerns. Built privacy measures and local processing options.
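
    One common privacy measure is masking identifiers before a transaction description is sent to a hosted model. A minimal sketch, assuming regex-based redaction is acceptable; the patterns and the `redact` helper are hypothetical and not a complete PII filter:

```python
import re

# Hypothetical patterns for illustration; a production list would need review.
# Singapore NRIC/FIN: prefix letter (S, T, F, G or M), 7 digits, checksum letter.
NRIC_RE = re.compile(r"\b[STFGM]\d{7}[A-Z]\b")
# Long digit runs (9-19 digits, optionally split by spaces or dashes),
# e.g. bank account or card numbers. Typical amounts and 8-digit dates are left alone.
ACCOUNT_RE = re.compile(r"\b\d(?:[ -]?\d){8,18}\b")


def redact(text: str) -> str:
    """Mask identifiers before a transaction description leaves the machine."""
    text = NRIC_RE.sub("[NRIC]", text)
    return ACCOUNT_RE.sub("[ACCOUNT]", text)
```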

    The UI Challenge: Created multiple frontend versions:

    • taxtagger-frontend - React showcase for demos
    • animejs-taxtagger - Animated explanations of tax flow (because tax is complex!)

    RAG Implementation: langchain-tax introduced vector search over Singapore tax documents:

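    # The game-changer - RAG for tax knowledge
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS
    
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    # tax_documents: a list of LangChain Document objects (e.g. chunks of IRAS guides)
    vectorstore = FAISS.from_documents(tax_documents, embeddings)
    
    # Fetch the passages most relevant to a tax question
    hits = vectorstore.similarity_search("capital allowance on computers", k=4)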

    Phase 3: The Agent Revolution (May 2025)

    Project: tax-agent-3

    This is where things got serious. I abandoned monolithic LLM calls for specialized agents:

    # The breakthrough - specialized agents
    # (OrchestratorAgent, CapitalAllowanceAgent, EvaluatorAgent and the tool
    #  classes are defined elsewhere; omitted for brevity)
    class ExpenseClassificationAgent:
        # Each agent had specific prompts and tools
        tools = [CalculatorTool(), KnowledgeRetrievalTool(), LookupTool()]
        
        def __init__(self, model):
            self.model = model
        
        def process(self, transaction):
            # Specialized logic for expense classification
            # Can retrieve specific tax rules when needed
            ...
    
    agents = {
        "orchestrator": OrchestratorAgent(model="gemini-2.0-flash"),
        "expense_classifier": ExpenseClassificationAgent(model="gpt-4o-mini"),
        "capital_allowance": CapitalAllowanceAgent(model="gpt-4o"),
        "evaluator": EvaluatorAgent(model="gpt-4o-mini")
    }

    Multi-Pass Validation: Built systems that use different models in sequence, escalating to more powerful (expensive) models only when needed.
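
    The escalation logic can be sketched as a simple cascade: try the cheapest model, validate its answer, and move up a tier only if validation fails. In this sketch each tier is a hypothetical `(model_name, run)` pair, where `run` is assumed to call the model and report whether its output passed checks; the model names are borrowed from the agent setup above.

```python
from typing import Callable, Optional

# A tier pairs a model name with a callable that runs the task on that model
# and returns (answer, passed_validation). Both are hypothetical stand-ins.
Tier = tuple[str, Callable[[str], tuple[str, bool]]]


def escalate(task: str, tiers: list[Tier]) -> tuple[Optional[str], Optional[str]]:
    """Try tiers cheapest-first; escalate only when validation fails.

    Returns (model_name, answer) from the first tier that passes,
    or (None, None) if every tier fails, so the item can go to human review.
    """
    for model_name, run in tiers:
        answer, passed = run(task)
        if passed:
            return model_name, answer
    return None, None
```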


    Phase 4: The Crown Jewel (June 2025)

    Project: tax-compute-mvp - The 8-Agent System

    After 20+ iterations, I finally cracked it. Eight specialized agents, each mastering one aspect of Singapore tax law:

    # The final architecture - 8 specialized agents
    # Each agent handles a specific aspect of Singapore tax law:
    # - Income classification and exemptions
    # - Expense breakdown and categorization  
    # - Source mapping and deduction rules
    # - Special deductions (R&D, training)
    # - Capital allowances and depreciation
    # - Final tax computation with exemptions
    # 
    # Combined system accuracy: 90%+
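
    One way to wire specialists together is a sequential pipeline over a shared state, with per-stage error recovery. This is a minimal sketch assuming each agent can be treated as a plain function; the stage names, state keys and stub values are hypothetical, not the real agents' logic or output:

```python
from typing import Any, Callable

State = dict[str, Any]
# A stage reads the shared state and returns only the keys it adds or updates.
Stage = Callable[[State], State]


def run_pipeline(stages: list[tuple[str, Stage]], state: State) -> State:
    """Run specialist agents in order over a shared state dict.

    A failing stage is logged in state["errors"] rather than aborting the run,
    so later stages still execute and a reviewer can see what went wrong.
    """
    state.setdefault("errors", [])
    for name, stage in stages:
        try:
            state.update(stage(state))
        except Exception as exc:  # error recovery: record and continue
            state["errors"].append(f"{name}: {exc}")
    return state


# Hypothetical stub stages standing in for real LLM-backed agents
def classify_income(state: State) -> State:
    return {"income": 500_000}


def break_down_expenses(state: State) -> State:
    return {"deductible_expenses": 200_000}


def compute_adjusted_profit(state: State) -> State:
    return {"adjusted_profit": state["income"] - state["deductible_expenses"]}
```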

    The Results:

    • Over 90% accuracy matching tax professionals
    • Under $1 cost vs hundreds for manual process (orders of magnitude cheaper)
    • Minutes vs hours of processing time (dramatically faster)

    Technical Innovations That Made It Work

    1. Schema-First Architecture

    The biggest breakthrough: forcing LLMs to output valid JSON with automatic fixing.

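    // Every agent output is validated and auto-corrected
    import { z } from "zod";
    
    const schema = z.object({
      items: z.array(z.object({
        id: z.string().uuid().default(() => crypto.randomUUID()),  // Auto-generates if missing
        amount: z.coerce.number(),  // Auto-coerces strings like "100.50"
        description: z.string()
      })),
      total: z.coerce.number()
    });
    
    // Automatic fixes for common LLM errors: replace placeholder IDs, then validate
    for (const item of output.items ?? []) {
      if (item.id === "FAKE-ID-123") {
        item.id = crypto.randomUUID();
      }
    }
    const validated = schema.parse(output);  // Throws a ZodError if still invalid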
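
    The same schema-first idea carries over to Python without Zod. This sketch assumes the agent replies with the `items`/`total` shape used above; `repair_and_validate` is a hypothetical helper that unwraps a markdown fence, replaces placeholder IDs, coerces numeric strings, and fails loudly on anything it cannot fix:

```python
import json
import uuid

FENCE = "`" * 3                    # markdown code fence some models wrap JSON in
PLACEHOLDER_IDS = {"FAKE-ID-123"}  # placeholder from the Zod example


def repair_and_validate(raw: str) -> dict:
    """Parse an LLM reply, patch common mistakes, then check the expected shape."""
    text = raw.strip()
    if text.startswith(FENCE):       # e.g. a fence tagged "json"
        text = text.strip("`").removeprefix("json").strip()
    data = json.loads(text)          # raises json.JSONDecodeError if still invalid
    for item in data["items"]:
        if not item.get("id") or item["id"] in PLACEHOLDER_IDS:
            item["id"] = str(uuid.uuid4())
        item["amount"] = float(item["amount"])  # "100.50" -> 100.5
        if not isinstance(item.get("description"), str):
            raise ValueError(f"item {item['id']} is missing a description")
    data["total"] = float(data["total"])
    return data
```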

    2. Arithmetic Post-Processing

    LLMs are unreliable at arithmetic, so we built automatic correction:

    // LLM identifies items correctly but fails addition
    // Before: { items: [{amount: 100}, {amount: 200}, {amount: 300}], total: 500 }  ❌
    // After:  { items: [{amount: 100}, {amount: 200}, {amount: 300}], total: 600 }  ✅
    
    function fixArithmetic(output) {
      // Sum in cents to avoid floating-point drift (e.g. 0.1 + 0.2)
      const cents = output.items.reduce((sum, item) =>
        sum + Math.round(item.amount * 100), 0
      );
      const calculatedTotal = cents / 100;
      if (output.total !== calculatedTotal) {
        console.log(`🧮 Fixed: ${output.total} → ${calculatedTotal}`);
        output.total = calculatedTotal;
      }
      return output;
    }

    3. Hybrid Model Strategy

    Not all tasks need expensive models:

    // Different tasks require different model capabilities
    // Complex reasoning tasks use more powerful models
    // Pattern matching and basic calculations use efficient models
    // Result: high accuracy at a fraction of the cost
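
    The routing itself can be as simple as a lookup table. This sketch assumes three hypothetical task types that mirror the split listed under "Hybrid Models Are The Future" later in this post (o3-mini for complex reasoning, gpt-4o-mini for pattern matching, plain code for calculations); `select_model` is an illustrative name, not a library API:

```python
from typing import Optional

# Hypothetical routing table; task-type names are illustrative.
MODEL_FOR_TASK = {
    "complex_reasoning": "o3-mini",
    "pattern_matching": "gpt-4o-mini",
    "calculation": None,  # no LLM: deterministic post-processing in code
}


def select_model(task_type: str) -> Optional[str]:
    """Return the model for a task type, or None when plain code should do it."""
    try:
        return MODEL_FOR_TASK[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None
```

    Failing loudly on an unknown task type keeps a new kind of work from silently defaulting to the most expensive model.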

    4. YAML-Based Prompt Management

    Tax rules as data, not code:

    # Version-controlled tax logic
    critical_rules:
      - rule: "S-PLATED VEHICLES ALWAYS NON-DEDUCTIBLE"
        section: "s15(1)(o)"
        examples:
          - "SBA1234A - private car"
          - "Any S-plate registration"
        test_cases: ["uber_car_expense", "grab_vehicle_lease"]
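
    Keeping rules in YAML means a small loader can turn them into prompt text and a regression list. This sketch assumes the file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); `render_rules` and `collect_test_cases` are hypothetical helpers:

```python
def render_rules(rules: dict) -> str:
    """Render the critical_rules list into a prompt section."""
    lines = ["CRITICAL RULES (follow exactly):"]
    for rule in rules.get("critical_rules", []):
        examples = "; ".join(rule.get("examples", []))
        lines.append(f"- {rule['rule']} [{rule['section']}] e.g. {examples}")
    return "\n".join(lines)


def collect_test_cases(rules: dict) -> list[str]:
    """Flatten every rule's test_cases so they can be re-run when a rule changes."""
    return [case for rule in rules.get("critical_rules", [])
            for case in rule.get("test_cases", [])]
```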

    5. RAG with Retrieval Gating

    Not every query needs document retrieval:

    def should_retrieve(query: str) -> bool:
        q = query.lower()  # match terms case-insensitively
        
        # Simple queries don't need RAG
        if "calculate total" in q:
            return False
        
        # Complex tax rules need documentation
        if any(term in q for term in ["s15", "capital allowance", "exemption"]):
            return True
        
        # No rule matched: skip retrieval by default
        return False

    Business Model Evolution

    The Market Journey

    Started (March): "We'll help SMEs with tax filing!"

    • Reality: SMEs already pay their accountants an annual fee, trust them, and weren't interested

    Pivot 1 (April): "We'll be cheaper than accountants!"

    • Reality: Trust > Price for tax matters

    Pivot 2 (May): "We'll target growing companies!"

    • Reality: The mythical "middle market" doesn't exist

    Final Position (June): "AI Tax Brain for Modern Businesses"

    • Freemium for small businesses
    • Affordable monthly subscriptions for growing companies
    • Enterprise deals for large corporations
    • Channel partnerships with accountants

    The Numbers That Matter

    # Key Performance Metrics:
      Accuracy: 90%+
      Cost_Reduction: 99%  
      Time_Savings: 98%
      
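
    As a sanity check on the cost figure, write C_AI for the system's cost (under $1) and C_manual for the manual process, taking the most conservative reading of "hundreds" ($100):

```latex
\text{cost reduction} = 1 - \frac{C_{\text{AI}}}{C_{\text{manual}}}
  > 1 - \frac{\$1}{\$100} = 99\%
```

    Any manual cost above $100 only pushes the reduction higher.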

    Key Lessons for AI Builders

    1. Domain Expertise Is Irreplaceable

    My brother's tax knowledge was the secret weapon. AI amplifies expertise; it doesn't replace it.

    2. Evolution, Not Revolution

    • March: Basic ML categorization
    • April: First LLM integration
    • May: Multi-agent architecture
    • June: Production-ready system

    Each iteration built on previous learnings.

    3. Schema-First for Production

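    // This saved the project
    import { z } from "zod";
    
    // outputSchema is the agent's Zod schema (e.g. the items/total schema in section 1)
    type ValidatedOutput = z.infer<typeof outputSchema>;
    const validateOutput = (output: unknown): ValidatedOutput => {
      return outputSchema.parse(output);  // Throws a ZodError if invalid
    };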

    4. Hybrid Models Are The Future

    Don't use a sledgehammer for every nail:

    • Complex reasoning: Expensive models (o3-mini)
    • Pattern matching: Cheap models (gpt-4o-mini)
    • Calculations: Post-processing (not LLM)

    5. The Business Model Takes More Iteration Than The Tech

    • 4 different brandings (including Deductly → TaxEase → TaxTag)
    • 3 pricing models
    • 2 market segments explored
    • 1 final positioning that worked

    6. Building In Public Accelerates Learning

    20+ repositories in 3 months seems chaotic, but each was a learning experiment. Fast iteration beats perfect planning.


    Technical Assets Created (All Reusable)

    # 1. Schema Validation System
    class SchemaValidator:
        """Force LLMs to output valid, consistent JSON"""
        
    # 2. Arithmetic Post-Processor  
    class MathFixer:
        """Automatically fix LLM calculation errors"""
        
    # 3. Multi-Agent Pipeline
    class AgentPipeline:
        """Orchestrate specialized agents with error recovery"""
        
    # 4. Hybrid Model Router
    class ModelSelector:
        """Choose optimal model based on task complexity"""
        
    # 5. YAML Prompt Manager
    class PromptVersionControl:
        """Manage prompts as configuration, not code"""
        
    # 6. RAG with Gating
    class SmartRetrieval:
        """Retrieve documents only when necessary"""

    What's Next?

    The tax compliance journey provided invaluable lessons, but the real insight is bigger: production AI systems require specialized architectures, not just API calls to LLMs.

    The 8-agent architecture we built for tax can be adapted to other complex domains:

    • Legal document analysis
    • Medical diagnosis assistance
    • Financial planning
    • Compliance automation

    The key is understanding that AI agents should be specialists, not generalists. Just like my brother specializes in tax, each AI agent should master one thing exceptionally well.


    The Real Success Metric

    Not the high accuracy. Not the minimal cost. Not even the dramatic speed improvement.

    The real success? My brother now uses the system daily. The smartest tax person I know trusts AI to help with his work. That's when I knew we'd built something real.

    Building AI for complex domains? Let's connect. The journey from "AI can't do this" to "AI does this better than humans" is shorter than you think - with the right architecture.


    Appendix: The Full Project Timeline

    March 2025:
    - tax-annihilator-v1: ML-based categorization
    - python-dl-iras-etax-guides: Domain research
    - tax-linkedin-email: Market validation
    
    April 2025:
    - tax-annihilator-v2: First LLM integration
    - tax-adjustment-analyzer: P&L analysis
    - taxtagger-frontend: Demo UI
    - animejs-taxtagger: Visual explanations
    - langchain-tax: RAG implementation
    
    May 2025:
    - tax-adjustment-analyzer-v2: Multi-pass validation
    - tax-agent-3: Agent architecture
    - taxtagger-mcp: Integration attempts
    
    June 2025:
    - tax-agent-sdk: Productization
    - sg-corp-tax: Specialization
    - tax-agent-v4-js: JavaScript port
    - tax-ai-with-fe: Full-stack app
    - tax-tagger-fe-10jun: Final UI
    - tax-compute-mvp: The 8-agent system

    20+ projects. 3 months. 1 working system. Countless lessons learned.