How We Built an AI Engineer for a 400,000-File Legacy Migration

Velocity Software Solutions
May 4, 2026 · 15 min read

Somewhere around January 2026, we ran into a problem that no amount of developer hiring was going to solve.

One of our clients — a large ecommerce company based in the UK — needed to migrate their entire platform. The system had been running for over 15 years. It started as a relatively straightforward PHP application, but after a decade and a half of feature additions, hotfixes, integrations, team changes, and “temporary” workarounds that became permanent, it had grown into something that no single person fully understood. Over 400,000 PHP files. Some individual files exceeding 15,000 lines of code. More than 600 database tables. Approximately 270 cron jobs running throughout the day. Deep integrations with third-party marketplaces and dozens of external applications.

The target: a modern stack with Node.js services on the backend and a MedusaJS-based storefront built on Next.js and React. The old platform’s dependencies were losing vendor support one by one, so migration was not optional: it was a matter of when, not if.

The real problem was not the migration itself. It was understanding what we were migrating. With 15 years of accumulated business logic, tribal knowledge scattered across departed team members, and documentation that ranged from outdated to nonexistent, we needed something that could absorb the entire system’s context and help our engineers make sense of it. That is when we decided to build what we internally started calling “the AI engineer.”

The Scale of the Problem

Before getting into the solution, it is worth understanding just how tangled a 15-year-old enterprise ecommerce system gets. This was not a clean MVC application with well-defined boundaries. It was layers upon layers of implementation — a PHP monolith that had absorbed business rules from multiple eras of the company’s growth.

Dimension                   Scale
Codebase                    400,000+ PHP files
Largest files               15,000+ lines each
Database tables             600+
Scheduled cron jobs         ~270
Third-party integrations    Multiple marketplaces + external apps
System age                  15+ years in production

The cron jobs alone were a nightmare. Some ran every minute, others once a day, a few once a week. Many of them had undocumented dependencies on each other — cron A populated a staging table that cron B consumed an hour later, but nothing in the code made this relationship explicit. Miss that dependency during migration and you silently break order processing, inventory sync, or marketplace listings.

A typical approach would be to throw a team of senior developers at the codebase, let them spend weeks reading code, and produce a migration plan. We tried that initially. The problem was that no human could hold enough of the system in their head at once. Developer A would map out the order management subsystem while Developer B worked on the product catalog, and when they compared notes, they would discover hidden coupling they had both missed. The system was simply too large for human working memory.

Building the AI Engineer

What we needed was not a chatbot. We needed an engineer that could hold the entire codebase in context, understand the business logic, write code, and review migration work done by our human team. We built this in stages, and honestly, the first few attempts did not work very well.

Stage 1: Knowledge Ingestion

The first task was giving the AI access to everything the human team would normally reference. We started by embedding all available knowledge sources:

  • Jira tickets and tasks: Over a decade of tickets containing requirements, bug reports, feature discussions, and implementation notes. These were surprisingly valuable — the ticket comments often contained the “why” behind decisions that the code itself did not explain.
  • Project wikis: Internal documentation that described business processes, integration specifications, and deployment procedures. Much of it was outdated, but even outdated docs told us what the system was supposed to do at various points in its history.
  • Database schema and relationships: All 600+ tables, their columns, foreign keys, indexes, and — crucially — the implicit relationships that were not enforced by foreign keys but relied on by application code.
  • Cron job definitions and schedules: The full list of cron jobs, their schedules, and their entry point scripts.

For the embedding infrastructure, we used Qdrant as our vector database. It handled the volume well, and its filtering capabilities turned out to be essential — we needed to search within specific knowledge domains (e.g., “only Jira tickets related to the checkout flow”) rather than across everything at once. We used OpenAI’s embedding models to generate the vector representations.

Python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
import openai
import hashlib

client = QdrantClient(host='localhost', port=6333)

# Create separate collections for different knowledge domains
COLLECTIONS = {
    'jira_tickets': 'Knowledge from Jira tickets and comments',
    'wiki_pages': 'Project wiki documentation',
    'db_schema': 'Database table definitions and relationships',
    'cron_jobs': 'Cron job definitions, schedules, and dependencies',
    'code_modules': 'Code structure and module documentation',
}

for name in COLLECTIONS:
    client.recreate_collection(
        collection_name=name,
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
    )


def embed_and_store(text: str, metadata: dict, collection: str):
    """Generate embedding and store in the appropriate Qdrant collection."""
    response = openai.embeddings.create(
        model='text-embedding-3-small',
        input=text
    )
    vector = response.data[0].embedding
    
    # Qdrant point IDs must be unsigned integers or UUIDs, so derive a 64-bit int
    point_id = int(hashlib.md5(text.encode()).hexdigest()[:16], 16)
    
    client.upsert(
        collection_name=collection,
        points=[PointStruct(
            id=point_id,
            vector=vector,
            payload={
                'text': text,
                'source': metadata.get('source', ''),
                'module': metadata.get('module', ''),
                'date': metadata.get('date', ''),
                'relevance_tags': metadata.get('tags', []),
            }
        )]
    )
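
The payload fields stored above are what make scoped retrieval possible. Below is a sketch of the kind of filtered query this enables, reusing the client and embedding setup from the block above; the 'checkout' module tag is illustrative.

Python
from qdrant_client.models import Filter, FieldCondition, MatchValue

def search_domain(query: str, collection: str, module: str, limit: int = 10):
    """Semantic search scoped to one knowledge domain and one module payload tag."""
    response = openai.embeddings.create(
        model='text-embedding-3-small',
        input=query
    )
    return client.search(
        collection_name=collection,
        query_vector=response.data[0].embedding,
        limit=limit,
        query_filter=Filter(must=[
            FieldCondition(key='module', match=MatchValue(value=module))
        ])
    )

# e.g. only Jira tickets related to the checkout flow:
# hits = search_domain('discount validation rules', 'jira_tickets', 'checkout')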

Stage 2: The Context Window Problem (and Our SQLite Solution)

This is where things got interesting — and where most teams building AI coding assistants hit a wall.

We were using Claude Code with Anthropic’s Sonnet 4.5 and Opus 4.6 models. The AI-assisted coding capabilities were genuinely impressive. Claude could read a 15,000-line PHP file, understand the business logic, and write the equivalent Node.js implementation with correct business rules preserved. The problem was that even with Claude’s large context windows, we kept running into limits.

A single migration task might require understanding:

  • The source PHP file (potentially 10,000+ lines)
  • 3-4 related PHP files it depends on
  • The database schema for 15-20 relevant tables
  • The target Node.js architecture and existing migrated modules
  • Relevant Jira context explaining business rules
  • Cron job dependencies
  • Integration specifications for affected third-party systems

Loading all of this into a single prompt would blow past any context window. And even when the context technically fit, the quality of responses degraded as the prompt got longer. The model would “forget” details from the beginning of the context when answering questions about code near the end.

RAG (retrieval-augmented generation) from our Qdrant embeddings helped, but it introduced a different problem: hallucination at the boundaries. The AI would retrieve a Jira ticket about the checkout flow, retrieve a database schema for the orders table, and then confabulate a relationship between them that did not actually exist in the codebase. With 600+ tables and thousands of implicit relationships, getting the connections right was critical.

Our solution was to build a SQLite-based business knowledge graph that sat between the raw embeddings and the AI. Instead of asking the AI to infer relationships from retrieved chunks, we pre-computed and stored the relationships explicitly.

Python
import sqlite3

def init_knowledge_db(db_path: str):
    """Initialize the business knowledge relationship database."""
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    
    # Core entities
    c.execute('''CREATE TABLE IF NOT EXISTS modules (
        id TEXT PRIMARY KEY,
        name TEXT,
        type TEXT,  -- 'php_legacy', 'nodejs_target', 'cron', 'integration'
        description TEXT,
        file_paths TEXT,  -- JSON array of file paths
        status TEXT  -- 'not_started', 'in_progress', 'migrated', 'verified'
    )''')
    
    # Explicit relationships between modules
    c.execute('''CREATE TABLE IF NOT EXISTS dependencies (
        source_module TEXT,
        target_module TEXT,
        dependency_type TEXT,  -- 'calls', 'reads_from', 'writes_to', 
                               -- 'triggered_by', 'shares_table', 'feeds_data_to'
        description TEXT,
        confidence REAL,  -- 0.0 to 1.0, how certain we are about this
        evidence TEXT,  -- where we found this: 'code_analysis', 'jira', 'wiki', 'manual'
        UNIQUE(source_module, target_module, dependency_type)
    )''')
    
    # Database table ownership and access patterns
    c.execute('''CREATE TABLE IF NOT EXISTS table_access (
        module_id TEXT,
        table_name TEXT,
        access_type TEXT,  -- 'read', 'write', 'read_write', 'create', 'drop'
        columns_used TEXT,  -- JSON array of column names
        UNIQUE(module_id, table_name, access_type)
    )''')
    
    # Business rules extracted from code, Jira, and wikis
    c.execute('''CREATE TABLE IF NOT EXISTS business_rules (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        module_id TEXT,
        rule_description TEXT,
        source TEXT,  -- 'code_comment', 'jira_ticket', 'wiki', 'inferred'
        source_ref TEXT,  -- e.g., 'PROJ-1234' or 'wiki/checkout-flow'
        criticality TEXT  -- 'high', 'medium', 'low'
    )''')
    
    # Migration mapping: legacy module -> new module
    c.execute('''CREATE TABLE IF NOT EXISTS migration_map (
        legacy_module TEXT,
        target_module TEXT,
        migration_notes TEXT,
        complexity TEXT,  -- 'simple', 'moderate', 'complex', 'rewrite'
        UNIQUE(legacy_module, target_module)
    )''')
    
    conn.commit()
    return conn

The key insight was this: instead of asking the AI “what does the checkout module depend on?” and hoping it would correctly infer dependencies from code snippets, we pre-analyzed the codebase and stored explicit facts: “Module checkout_processor reads from tables orders, order_items, and payment_transactions. It calls shipping_calculator and tax_engine. Cron job order_cleanup depends on it completing successfully.”

When the AI needed context for a migration task, we did not just dump raw code into the prompt. We queried the knowledge graph for the specific module, pulled its dependencies, business rules, and table access patterns, and constructed a focused, fact-checked context.

Python
import sqlite3
import openai
from qdrant_client.models import Filter, FieldCondition, MatchValue


def get_embedding(text: str) -> list:
    """Embed a query string with the same model used at ingestion time."""
    response = openai.embeddings.create(
        model='text-embedding-3-small',
        input=text
    )
    return response.data[0].embedding


def build_migration_context(module_id: str, db_path: str, qdrant_client) -> str:
    """Build a focused, fact-checked context for migrating a specific module.
    
    This replaces the naive 'dump everything into the prompt' approach
    with structured knowledge retrieval that prevents hallucination.
    """
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    
    context_parts = []
    
    # 1. Module overview
    c.execute('SELECT name, type, description, file_paths FROM modules WHERE id = ?', 
              (module_id,))
    module = c.fetchone()
    context_parts.append(f'## Module: {module[0]} ({module[1]})')
    context_parts.append(f'Description: {module[2]}')
    
    # 2. Direct dependencies (verified, not inferred)
    c.execute('''
        SELECT d.target_module, d.dependency_type, d.description, m.name
        FROM dependencies d
        JOIN modules m ON m.id = d.target_module
        WHERE d.source_module = ? AND d.confidence > 0.7
        ORDER BY d.confidence DESC
    ''', (module_id,))
    
    deps = c.fetchall()
    if deps:
        context_parts.append('\n## Dependencies (verified):')
        for target, dep_type, desc, name in deps:
            context_parts.append(f'- {dep_type}: {name} -- {desc}')
    
    # 3. Database tables this module touches
    c.execute('''
        SELECT table_name, access_type, columns_used
        FROM table_access WHERE module_id = ?
    ''', (module_id,))
    
    tables = c.fetchall()
    if tables:
        context_parts.append('\n## Database tables used:')
        for table, access, cols in tables:
            context_parts.append(f'- {table} ({access}): columns {cols}')
    
    # 4. Business rules (the stuff that is easy to miss)
    c.execute('''
        SELECT rule_description, source, source_ref, criticality
        FROM business_rules WHERE module_id = ?
        ORDER BY CASE criticality 
            WHEN 'high' THEN 1 WHEN 'medium' THEN 2 ELSE 3 END
    ''', (module_id,))
    
    rules = c.fetchall()
    if rules:
        context_parts.append('\n## Business rules (DO NOT IGNORE):')
        for desc, source, ref, crit in rules:
            context_parts.append(f'- [{crit.upper()}] {desc} (source: {source} {ref})')
    
    # 5. Migration mapping if exists
    c.execute('''
        SELECT target_module, migration_notes, complexity
        FROM migration_map WHERE legacy_module = ?
    ''', (module_id,))
    
    mapping = c.fetchone()
    if mapping:
        context_parts.append(f'\n## Migration target: {mapping[0]}')
        context_parts.append(f'Complexity: {mapping[2]}')
        context_parts.append(f'Notes: {mapping[1]}')
    
    # 6. Relevant Jira context from Qdrant (semantic search, but scoped)
    module_name = module[0]
    search_results = qdrant_client.search(
        collection_name='jira_tickets',
        query_vector=get_embedding(f'{module_name} business logic requirements'),
        limit=5,
        query_filter=Filter(must=[
            FieldCondition(key='module', match=MatchValue(value=module_name))
        ])
    )
    
    if search_results:
        context_parts.append('\n## Relevant Jira context:')
        for result in search_results:
            context_parts.append(f'- {result.payload["source"]}: {result.payload["text"][:300]}')
    
    conn.close()
    return '\n'.join(context_parts)

Stage 3: Code Understanding with CodeGraph and Steering Documents

For the actual code analysis, we used a combination of tools:

CodeGraph gave us a semantic knowledge graph of the codebase. Instead of grepping through 400,000 files looking for function calls, we could query the code graph directly: “What calls the calculateShipping function?” or “What modules write to the orders table?” This was orders of magnitude faster than manual code exploration, and it fed directly into our SQLite knowledge graph.
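
CodeGraph’s own query API is not reproduced here; the sketch below assumes its results have already been flattened into plain caller/callee records, and shows how such edges land in the dependencies table from Stage 2. The record values and the knowledge.db path are hypothetical; the table schema is the one defined earlier.

Python
import sqlite3

# Hypothetical flattened CodeGraph results: (source, target, type, evidence).
# The real tool's output format may differ.
code_graph_edges = [
    ('checkout_processor', 'shipping_calculator', 'calls',
     'Invokes calculateShipping() during order totaling'),
    ('checkout_processor', 'tax_engine', 'calls',
     'Invokes tax lookup before payment capture'),
]

conn = sqlite3.connect('knowledge.db')
for source, target, dep_type, desc in code_graph_edges:
    # UNIQUE(source_module, target_module, dependency_type) makes re-runs idempotent
    conn.execute('''INSERT OR IGNORE INTO dependencies
                    (source_module, target_module, dependency_type,
                     description, confidence, evidence)
                    VALUES (?, ?, ?, ?, 0.9, 'code_analysis')''',
                 (source, target, dep_type, desc))
conn.commit()
conn.close()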

Claude steering documents were essential for maintaining consistency across the migration. We created detailed specifications covering:

  • Product spec (product.md): What the ecommerce platform does, its business rules, the user journey from browsing to order fulfillment. This gave the AI the “why” behind the code.
  • Tech spec (tech.md): The target architecture — Node.js services, MedusaJS storefront on Next.js, database migration strategy, API design patterns. This told the AI what the migrated code should look like.
  • Structure spec (structure.md): Directory layout, naming conventions, how to organize services, where tests go, how to handle shared utilities. This prevented the AI from inventing its own conventions with every file it wrote.
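
For illustration, here is a condensed excerpt of the kind of content tech.md carried. The wording is hypothetical, though the stack details match the ones above; it is shown as a Python string for consistency with the other examples in this post.

Python
tech_spec_excerpt = '''
# tech.md (excerpt -- illustrative wording, not the actual document)

## Target architecture
- Backend: Node.js services, organized per legacy subsystem
- Storefront: MedusaJS on Next.js
- Each migrated module ships with tests that pin the legacy behavior
  before the corresponding PHP code is retired

## Conventions
- REST endpoints: plural nouns, versioned paths
- New code never reads legacy tables directly; go through the service layer
'''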

Database documentation as Mermaid diagrams turned out to be surprisingly effective. We generated ER diagrams for each major subsystem (orders, products, customers, inventory, marketplace integrations) and included them in the AI’s context. Mermaid’s text-based format was perfect for this — it fit cleanly into prompts without requiring image processing, and the AI could reference specific table relationships in its responses.

Python
# Example: Order subsystem ER diagram fed to the AI as context
# This was generated from our SQLite knowledge graph

order_er_diagram = '''
```mermaid
erDiagram
    orders ||--o{ order_items : contains
    orders ||--o{ order_status_history : tracks
    orders }o--|| customers : placed_by
    orders ||--o{ payment_transactions : paid_via
    orders ||--o{ shipments : shipped_as
    order_items }o--|| products : references
    order_items ||--o{ item_discounts : has
    shipments ||--o{ tracking_events : tracked_by
    shipments }o--|| shipping_providers : handled_by
    orders ||--o{ marketplace_order_refs : synced_to
    marketplace_order_refs }o--|| marketplace_channels : from
    
    orders {
        int id PK
        int customer_id FK
        string order_number
        decimal total
        decimal tax
        decimal shipping_cost
        string status
        string source_channel
        datetime created_at
        datetime updated_at
    }
    
    order_items {
        int id PK
        int order_id FK
        int product_id FK
        int quantity
        decimal unit_price
        decimal discount_amount
        string sku
    }
```
'''

# This diagram plus the business rules from SQLite gave the AI
# everything it needed to correctly migrate the order module
# without hallucinating table relationships
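
The comment above says the diagram was generated from the knowledge graph; the generation step itself is mechanical. A sketch, assuming the foreign-key relationships ingested in Stage 1 were stored in a table_relations table — that table is not part of the Stage 2 schema shown earlier, so its shape here is an assumption.

Python
import sqlite3

def generate_er_diagram(db_path: str, subsystem: str) -> str:
    """Emit a Mermaid erDiagram for one subsystem from stored FK relationships.

    Assumes a table_relations(subsystem, source_table, target_table,
    cardinality, label) table populated during schema ingestion -- a shape
    posited here for illustration.
    """
    conn = sqlite3.connect(db_path)
    rows = conn.execute('''
        SELECT source_table, cardinality, target_table, label
        FROM table_relations WHERE subsystem = ?
    ''', (subsystem,)).fetchall()
    conn.close()
    lines = ['```mermaid', 'erDiagram']
    for src, card, tgt, label in rows:
        # e.g. "orders ||--o{ order_items : contains"
        lines.append(f'    {src} {card} {tgt} : {label}')
    lines.append('```')
    return '\n'.join(lines)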

Stage 4: The AI Engineer in Action

With all the pieces in place, our AI engineer operated in three modes:

1. Analysis mode: Before any migration work on a module, the AI would analyze the legacy code, cross-reference it against the knowledge graph, identify business rules, map database dependencies, and produce a migration plan. Human engineers reviewed and adjusted this plan, but the AI did 80% of the discovery work in minutes rather than days.

2. Coding mode: Using Claude Code with Opus 4.6 for complex modules and Sonnet 4.5 for straightforward ones, the AI wrote the Node.js equivalent of the legacy PHP code. The steering documents ensured it followed our target architecture consistently. When it needed to understand a specific legacy function, it queried the knowledge graph rather than trying to hold the entire codebase in context.

3. Review mode: When our human engineers wrote migration code themselves, the AI reviewed it against the knowledge graph. It would flag things like: “This migration does not account for the discount recalculation that cron job pricing_sync performs nightly on this table. See business rule #47 from Jira ticket PROJ-3891.” These were exactly the kinds of subtle dependencies that humans miss when working on unfamiliar legacy code.
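
Routing between the two models was driven by the complexity column in migration_map. A minimal sketch of that routing; the model identifier strings are placeholders, not exact API names.

Python
import sqlite3

# Placeholder identifiers -- substitute the exact model names your API exposes
MODEL_BY_COMPLEXITY = {
    'simple': 'claude-sonnet',
    'moderate': 'claude-sonnet',
    'complex': 'claude-opus',
    'rewrite': 'claude-opus',
}

def pick_model(db_path: str, legacy_module: str) -> str:
    """Choose a model tier from the complexity recorded in migration_map."""
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        'SELECT complexity FROM migration_map WHERE legacy_module = ?',
        (legacy_module,)).fetchone()
    conn.close()
    # Default to the stronger model when a module has not been assessed yet
    return MODEL_BY_COMPLEXITY.get(row[0] if row else 'complex', 'claude-opus')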

What Worked (and What Did Not)

It would be dishonest to pretend everything went smoothly. Here is a candid assessment.

What worked well

The SQLite knowledge graph was the single most valuable piece. It turned the AI from a smart-but-amnesiac coding assistant into something that actually understood the system. Without it, the AI kept confabulating relationships between modules. With it, the AI’s accuracy on dependency questions went from roughly 60% to over 95%. The key was giving it verified facts instead of asking it to infer from code snippets.

Qdrant performed well at scale. With millions of embedded chunks across Jira tickets, wiki pages, and code documentation, Qdrant’s filtered search let us scope queries to specific domains without sacrificing speed. The ability to filter by module, date range, or source type was essential for building focused contexts.

Mermaid diagrams for database documentation. We did not expect this to be as effective as it was. The AI understood Mermaid ER diagrams natively and could reference them precisely when writing database migration code. Plain text descriptions of table relationships were ambiguous; Mermaid diagrams were not.

Claude steering documents saved enormous time. Instead of explaining the target architecture in every prompt, the steering docs acted as persistent memory. Every interaction started with the AI already understanding what “good” looked like for this project.

What did not work

Naive RAG was not enough. Our initial approach — embed everything, retrieve relevant chunks, stuff them into the prompt — produced too many hallucinations. The AI would “connect the dots” between unrelated code snippets in ways that seemed plausible but were wrong. This is particularly dangerous in a migration project where incorrect assumptions lead to data loss or broken business logic.

Context window limits are real, even with 200K tokens. For the most complex modules, even Claude’s large context window was not sufficient to hold all the source code, target architecture docs, database schemas, and business rules simultaneously. The SQLite knowledge graph was born out of this limitation — we needed a way to give the AI exactly the right context for each task without overwhelming it.

The AI needs human judgment for ambiguous business rules. When the legacy code contained contradictory logic — and 15-year-old systems always do — the AI would flag the contradiction but could not decide which behavior was “correct.” That required human engineers who understood the business context. The AI reduced the discovery time for these ambiguities from weeks to hours, but humans still had to resolve them.

The Architecture at a Glance

Other
                ┌─────────────────────────────────────┐
                │      CLAUDE CODE (AI Engineer)      │
                │     Opus 4.6 / Sonnet 4.5 Models    │
                │                                     │
                │  Modes: Analysis | Coding | Review  │
                └──────────────────┬──────────────────┘
                                   │
              ┌────────────────────┼────────────────────┐
              │                    │                    │
              ▼                    ▼                    ▼
    ┌───────────────────┐  ┌───────────────┐  ┌───────────────────┐
    │  SQLite Knowledge │  │    Qdrant     │  │    CodeGraph      │
    │  Graph            │  │  Embeddings   │  │  (Code Analysis)  │
    │                   │  │               │  │                   │
    │  - Modules        │  │  - Jira       │  │  - Function calls │
    │  - Dependencies   │  │  - Wikis      │  │  - Symbol lookup  │
    │  - Table access   │  │  - DB docs    │  │  - Impact analysis│
    │  - Business rules │  │  - Code docs  │  │                   │
    │  - Migration map  │  │               │  │                   │
    └───────────────────┘  └───────────────┘  └───────────────────┘
              │                    │                    │
              └────────────────────┼────────────────────┘
                                   │
              ┌────────────────────┼────────────────────┐
              │                    │                    │
              ▼                    ▼                    ▼
    ┌───────────────────┐  ┌───────────────┐  ┌───────────────────┐
    │  Steering Docs    │  │  Mermaid ER   │  │  Legacy Codebase  │
    │  (product.md,     │  │  Diagrams     │  │  (400K+ PHP       │
    │   tech.md,        │  │  (600+        │  │   files)          │
    │   structure.md)   │  │   tables)     │  │                   │
    └───────────────────┘  └───────────────┘  └───────────────────┘

Results So Far

The migration is ongoing, but the impact of the AI engineer on the team’s velocity has been significant:

  • Module discovery time reduced by ~80%. Understanding a complex legacy module used to take a senior developer 3-5 days. With the AI engineer, the initial analysis and dependency mapping takes hours.
  • Hidden dependencies surfaced early. The knowledge graph has caught dozens of implicit dependencies between modules that would have caused production bugs if missed during migration.
  • Consistent code quality. The steering documents and review mode ensure that all migrated code follows the same patterns, regardless of which developer or AI wrote it.
  • Business rule preservation. By extracting business rules into the SQLite database and flagging them in every migration context, we have significantly reduced the risk of silently dropping business logic during the rewrite.

The AI engineer is not replacing our human developers. What it does is eliminate the tedious, error-prone discovery work that consumes most of the time in legacy migration projects, and catch the subtle dependencies and business rules that humans inevitably miss when working on systems this large.

Frequently Asked Questions

Can this approach work for systems smaller than 400,000 files?

Yes, and it is actually simpler. The SQLite knowledge graph and context-building approach work for any system too large to fit in an AI’s context window — which in practice means anything over a few thousand files. For smaller systems, you might skip the knowledge graph and use direct RAG from embeddings. The approach scales down gracefully.

Why SQLite instead of a graph database like Neo4j?

Pragmatism. SQLite is zero-configuration, runs anywhere, and the relationship queries we needed (module dependencies, table access patterns) are straightforward joins, not complex graph traversals. We considered Neo4j but decided the operational overhead was not justified for this use case. If we needed to traverse chains of dependencies more than 3-4 levels deep, we would reconsider.
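
For a concrete sense of what “straightforward” means here, this is a sketch of a transitive dependency walk using a recursive CTE over the dependencies table defined in Stage 2; the depth cap and confidence threshold are illustrative.

Python
import sqlite3

def transitive_dependencies(db_path: str, module_id: str, max_depth: int = 4):
    """Walk the dependency graph up to max_depth levels with a recursive CTE."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute('''
        WITH RECURSIVE dep_chain(module, depth) AS (
            SELECT target_module, 1 FROM dependencies
            WHERE source_module = :root AND confidence > 0.7
            UNION
            SELECT d.target_module, dc.depth + 1
            FROM dependencies d
            JOIN dep_chain dc ON d.source_module = dc.module
            WHERE dc.depth < :max_depth
        )
        SELECT module, MIN(depth) AS depth FROM dep_chain GROUP BY module
    ''', {'root': module_id, 'max_depth': max_depth}).fetchall()
    conn.close()
    return rows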

How do you keep the knowledge graph up to date as migration progresses?

Every time a module is migrated and verified, its status is updated in the knowledge graph. New dependencies discovered during migration are added. The graph is a living document — it gets more accurate over time as we learn more about the system. This also means the AI’s context improves as the project progresses, because it has access to migration outcomes and lessons from earlier modules.
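
Mechanically, these updates are small writes against the same tables. A minimal sketch, leaning on the UNIQUE constraints from the Stage 2 schema for idempotency:

Python
import sqlite3

def record_migration_outcome(db_path: str, module_id: str, new_deps: list):
    """Mark a module verified and record dependencies discovered during migration.

    new_deps is a list of (target_module, dependency_type, description) tuples.
    """
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute("UPDATE modules SET status = 'verified' WHERE id = ?", (module_id,))
    for target, dep_type, desc in new_deps:
        # UNIQUE(source_module, target_module, dependency_type) makes this idempotent
        c.execute('''INSERT OR IGNORE INTO dependencies
                     (source_module, target_module, dependency_type,
                      description, confidence, evidence)
                     VALUES (?, ?, ?, ?, 1.0, 'manual')''',
                  (module_id, target, dep_type, desc))
    conn.commit()
    conn.close()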

What LLM models do you use and why?

We use Claude Opus 4.6 for complex analysis and migration tasks — modules with intricate business logic, heavy database dependencies, or significant cron job interactions. Claude Sonnet 4.5 handles more straightforward migrations and code reviews where speed matters more than depth. The cost difference is significant, so routing tasks to the appropriate model keeps the project economically viable.

Build Your AI Engineer

If you are facing a similar challenge — a legacy system that needs migration but is too complex for any single person to fully understand — we can help you build an AI engineer tailored to your codebase. At Velsof, we have done this for enterprise-scale ecommerce systems and can apply the same approach to ERP migrations, platform modernization, and large-scale refactoring projects.

The approach is not magic. It is careful engineering: structured knowledge extraction, verified relationship mapping, focused context building, and human-AI collaboration where each does what they are best at. The AI handles the breadth (reading 400,000 files). The humans handle the judgment (deciding what the business actually needs).

Talk to our team about your legacy migration project, or learn more about our AI automation and custom AI agent development services.
