Robert Craig

Tech Stack

Python Qwen 2.5 (7B) Ollama Pydantic Playwright rapidfuzz Flask HTMX SQLite Azure SQL SharePoint

About this project

The challenge

Marshall Fire needed roughly 3,000 to 5,000 project records moved into a new system, and no single source held the truth. The data was spread across a project-management Excel spreadsheet, an Azure SQL database, and a pile of SharePoint documents (emails, Word files, PDFs), and the destination system gave no direct database access, so every record had to go in through its web forms. Done by hand, each record meant cross-checking several sources, resolving where they disagreed, and re-keying the result into a multi-step form. At that volume it is hundreds of hours of careful, error-prone work.

The approach

I built a pipeline that does the corroboration and submission automatically but keeps a person as the final gate on every record. The parts that made it non-trivial are the cross-source matching, and the fact that the AI and the human are two separate checkpoints rather than one.

Marshall Fire migration pipeline: five sources reconciled into one record, gated by a local LLM and a human reviewer, then submitted to the destination CRM by Playwright — Five sources reconciled into one candidate record, gated by an on-prem LLM and a human reviewer, then submitted to the destination CRM by Playwright.

1. Ingest: five sources into one

Project-management Excel spreadsheet
Azure SQL backup
Azure SQL live scrape for records added after the backup cut-off
SharePoint files: emails, Word documents, PDFs
A scrape of the new system to detect what has already been migrated

All five land in a single local SQLite database, the working store for everything downstream.

2. Match and reconcile

rapidfuzz token-scoring matches the same project across Excel, Azure SQL and SharePoint, and a three-way pass flags where the sources disagree. This is the step that turns five noisy inputs into one candidate record per project.

3. Local LLM (on-prem)

Qwen 2.5 7B runs locally through Ollama, on-premises, so none of the client's data leaves the machine. It runs two advisory passes: a judge and veto on the Azure SQL matches, and a verdict on fee lines before submission.

4. Validation and normalisation

Pydantic schemas validate every record, and canonical-value maps fold messy inputs (status, sector, project type) into the exact values the destination system will accept.

5. Human review

A Flask and HTMX review queue puts a person in control: a SharePoint evidence panel alongside each record, per-field overrides, and explicit approval before anything is submitted. The LLM is advisory; the human is the gate.

6. Submission

Playwright drives the destination's web forms in order, Contact then Company then Project, because direct database access was not available. It runs a list-page duplicate probe before each insert and re-queries after Save and Exit to catch silent failures, so the automation does not quietly skip or double-enter records. A post-submission audit and live-patch CLI verifies what landed and fixes records after the fact, so it is not fire-and-forget.

Outcome

The pipeline covers roughly 3,000 to 5,000 records that would otherwise have to be corroborated and submitted entirely by hand. Running the model locally kept a confidential migration confidential, and the two-gate design, an AI veto plus human approval, gave the speed of automation without giving up control over what actually got written into the destination system.

Gallery

Marshall Fire – Fire & Security Services