Real Estate / PropTech

20 to 40 Hours of Manual Research Per County, Replaced by a Pipeline That Runs Itself

A Florida real estate investor needed deed restriction data across US counties before making acquisition decisions. Manual research took 20 to 40 hours per county. At state scale, Florida's 67 counties add up to 1,340 to 2,680 hours of research before a single decision gets made. The pipeline now runs itself.

n8n Automation · Custom MCP Servers

20 n8n workflows: 200+ nodes total

50K+ records classified by AI

74 min: Harris County, TX, 4,510 pages processed with zero timeouts

12 bugs found and fixed in production

Minutes: research that took 20 to 40 hours per county now runs in minutes, and searches return results in seconds

50,000+ deed records classified by AI, without anyone reading raw legal text

3,200 US counties in the registry the pipeline can reach

Zero timeouts: Harris County, Texas, 4,510 pages processed in 74 minutes after four task runner refactors

Challenge

The Problem: 20 to 40 Hours Per County, and Every County Is Different

Deed restriction research does not scale. That is the core problem.

A Florida real estate investor needed to know, before making acquisition decisions, what restrictions applied to properties across US counties. Lot size minimums. Building requirements. Land use limits. Anything in the deed record that affects what a buyer can and cannot do with a property. That information lives in county deed records, served through a commercial deed data API that is paginated and requires OAuth authentication.

But raw deed records are not usable as-is. They contain two distinct categories of language. The first is actual restrictions: recorded limits that affect property value and legal use. The second is boilerplate: standard legal phrasing that appears in every deed and means nothing operationally. A human can tell the difference, but only by reading each record. At county scale, with thousands of pages per county, reading each record is not a research strategy. It is a full-time job.

The manual baseline was 20 to 40 hours per county. That is not a rough estimate. It is the actual time cost of retrieving paginated deed records, reading through the legal language, separating real restrictions from boilerplate, and organizing the results into something an investor can use for acquisition decisions. A state with 67 counties means 1,340 to 2,680 hours of research before a single decision gets made.

The competitive dimension made this more urgent. Deed restriction data determines which properties are worth bidding on. The investor who identifies undervalued properties with clean restriction profiles before competitors do is operating with a real advantage. Speed of research directly controlled deal flow. Manual research at 20 to 40 hours per county was not a bottleneck. It was a ceiling.

Solution

How the Pipeline Handles Any County Without Timing Out

The answer was not a single workflow that runs start to finish. It is 20 workflows running on a dedicated VPS, 12 of them exposed as independent webhook HTTP endpoints that the frontend or other workflows can call. Each stage of the pipeline is its own API surface.

Core Pipeline (6 workflows)

Auth Token manages the OAuth token lifecycle so the investor never sees an authentication failure. Preview returns a page count and time estimate before a job starts. Fetch Page retrieves records one page at a time. Upsert Records handles batch writes with idempotent logic, so running the same job twice never creates duplicate records. Job Run is the main orchestrator: it creates a job, loops through retrieval in batches, then calls itself for the next batch. Counties handles the registry of approximately 3,200 US counties the pipeline can reach.
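The guarantee that running the same job twice never creates duplicates usually comes from a unique natural key plus an insert-or-update statement. A minimal sketch, assuming a hypothetical `deed_records` table keyed on a hypothetical `(county_fips, document_id)` pair; SQLite's `INSERT ... ON CONFLICT DO UPDATE` stands in for whatever database the production pipeline actually writes to:

```python
import sqlite3

# Hypothetical schema: the natural key (county_fips, document_id) is the primary key,
# so re-running a job updates existing rows instead of inserting duplicates.
SCHEMA = """
CREATE TABLE IF NOT EXISTS deed_records (
    county_fips TEXT NOT NULL,
    document_id TEXT NOT NULL,
    raw_text    TEXT NOT NULL,
    PRIMARY KEY (county_fips, document_id)
)
"""

UPSERT = """
INSERT INTO deed_records (county_fips, document_id, raw_text)
VALUES (:county_fips, :document_id, :raw_text)
ON CONFLICT (county_fips, document_id) DO UPDATE SET raw_text = excluded.raw_text
"""


def upsert_records(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Write one batch of records; safe to call repeatedly with the same batch."""
    with conn:  # one transaction per batch: all rows commit or none do
        conn.executemany(UPSERT, records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    batch = [
        {"county_fips": "48201", "document_id": "D-1", "raw_text": "..."},
        {"county_fips": "48201", "document_id": "D-2", "raw_text": "..."},
    ]
    upsert_records(conn, batch)
    upsert_records(conn, batch)  # the same job, run twice
    print(conn.execute("SELECT COUNT(*) FROM deed_records").fetchone()[0])  # 2
```

Because the conflict target is the natural key, replaying a batch (for example after a retry) rewrites the same rows rather than adding new ones.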

Enrichment Pipeline (5 workflows)

Job Enrich chains four downstream workflows in sequence. Classify Matches sends batches of 100 deed records to Claude Sonnet AI for classification, separating real restrictions from boilerplate at scale. Geo Enrich pulls school district and zoning data with per-county caching. AI Analyze runs per-record deep analysis on records that cleared classification: zoning type, lot sizes, restrictions, allowed uses, risk factors. The two-stage approach keeps cost proportional to the volume of meaningful data: cheap batch classification first, then expensive targeted analysis only on records that matter.
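The two-stage cost control can be sketched as a loop that classifies every record in batches of 100 and calls the expensive per-record analysis only for records labeled as real restrictions. This is a minimal sketch: `classify_batch` and `analyze_one` are hypothetical stand-ins for the Claude Sonnet calls, and the label strings are assumptions:

```python
from typing import Callable, Iterator

BATCH_SIZE = 100  # batch size used by Classify Matches


def chunked(items: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def two_stage(
    records: list[dict],
    classify_batch: Callable[[list[dict]], dict[str, str]],
    analyze_one: Callable[[dict], dict],
) -> tuple[int, list[dict]]:
    """Classify everything cheaply in batches; analyze only real restrictions.

    Returns (number of classification calls, list of per-record analyses).
    """
    classify_calls = 0
    restrictions: list[dict] = []
    for batch in chunked(records, BATCH_SIZE):
        labels = classify_batch(batch)  # hypothetical: {record id: "restriction" | "boilerplate"}
        classify_calls += 1
        restrictions.extend(r for r in batch if labels.get(r["id"]) == "restriction")
    analyses = [analyze_one(r) for r in restrictions]  # the expensive, per-record step
    return classify_calls, analyses


if __name__ == "__main__":
    def stub_classify(batch: list[dict]) -> dict[str, str]:
        # Stub: every tenth record counts as a real restriction.
        return {r["id"]: ("restriction" if int(r["id"]) % 10 == 0 else "boilerplate") for r in batch}

    records = [{"id": str(i)} for i in range(250)]
    calls, analyses = two_stage(records, stub_classify, lambda r: {"id": r["id"]})
    print(calls, len(analyses))  # 3 25
```

With 250 records and a stub that marks one in ten as a restriction, this makes 3 classification calls and 25 analysis calls, which is the proportionality the two-stage design is after.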

Publishing and Research (3 workflows)

Publish to App transforms pipeline data into the searchable frontend tables. County Research generates AI-written county reports with structured infrastructure analysis. County Backfill runs on a cron every 30 minutes to fill missing research data for counties that entered the system outside the main path.
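County Backfill's job reduces to finding registry entries that have no research yet; in standard cron syntax, every 30 minutes is `*/30 * * * *`. A minimal sketch of the selection step, assuming hypothetical `counties` and `county_research` tables keyed on county FIPS code and a hypothetical per-tick limit that keeps each run small:

```python
import sqlite3

# Hypothetical schema: `counties` is the registry, `county_research` holds finished reports.
FIND_MISSING = """
SELECT c.county_fips
FROM counties AS c
LEFT JOIN county_research AS r ON r.county_fips = c.county_fips
WHERE r.county_fips IS NULL
ORDER BY c.county_fips
LIMIT ?
"""


def counties_needing_research(conn: sqlite3.Connection, limit: int = 25) -> list[str]:
    """One backfill tick: a bounded list of registry counties with no research row yet."""
    return [row[0] for row in conn.execute(FIND_MISSING, (limit,))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counties (county_fips TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE county_research (county_fips TEXT PRIMARY KEY, report TEXT)")
    conn.executemany("INSERT INTO counties VALUES (?)", [("12086",), ("12095",), ("48201",)])
    conn.execute("INSERT INTO county_research VALUES ('12095', 'done')")
    print(counties_needing_research(conn))  # ['12086', '48201']
```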

Self-Batching: How the Pipeline Handles 4,510 Pages Without Failing

n8n's task runner has a 60-second execution timeout per workflow run. Harris County, Texas, has 4,510 pages of deed records, and processing them synchronously would hit that limit long before finishing. The solution: each execution processes exactly 10 pages (approximately 30 seconds), writes the current position to the database, then calls its own webhook endpoint for the next batch. No external job queue. No Redis. The database is the state store, and the webhook is the continuation mechanism. Harris County: 4,510 pages, 74 minutes, zero timeouts.
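The self-batching loop fits in a few lines. In this minimal sketch, SQLite holds the cursor (the `jobs` table and its columns are hypothetical), `fetch_page` stands in for the Fetch Page workflow, and `schedule_next` stands in for the HTTP call to the workflow's own webhook; `simulate` replaces incoming webhook requests with an in-process queue so the example runs offline:

```python
import sqlite3
from collections import deque
from typing import Callable

PAGES_PER_RUN = 10  # each execution handles exactly 10 pages


def run_batch(
    conn: sqlite3.Connection,
    job_id: int,
    fetch_page: Callable[[int, int], None],
    schedule_next: Callable[[int], None],
) -> None:
    """One execution: read the cursor, process up to 10 pages, save the cursor, re-trigger."""
    next_page, total_pages = conn.execute(
        "SELECT next_page, total_pages FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    end = min(next_page + PAGES_PER_RUN, total_pages + 1)  # pages are 1-indexed
    for page in range(next_page, end):
        fetch_page(job_id, page)
    with conn:  # persist progress before handing off
        conn.execute("UPDATE jobs SET next_page = ? WHERE id = ?", (end, job_id))
    if end <= total_pages:
        schedule_next(job_id)  # in production (assumed): POST to this workflow's own webhook


def simulate(total_pages: int) -> tuple[int, list[int]]:
    """Drive a whole job offline; the deque plays the role of incoming webhook calls."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, next_page INTEGER, total_pages INTEGER)"
    )
    with conn:
        conn.execute("INSERT INTO jobs VALUES (1, 1, ?)", (total_pages,))
    fetched: list[int] = []
    pending = deque([1])
    executions = 0
    while pending:
        executions += 1
        run_batch(
            conn,
            pending.popleft(),
            fetch_page=lambda job, page: fetched.append(page),
            schedule_next=pending.append,
        )
    return executions, fetched


if __name__ == "__main__":
    runs, pages = simulate(4510)
    print(runs, len(pages))  # 451 4510
```

Run with 4,510 pages, `simulate` performs 451 executions of 10 pages each. Because the cursor advances only after a batch finishes, an execution that dies mid-batch replays at most those 10 pages when resumed from the stored cursor, which is harmless when the writes are idempotent upserts.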

AI Classification

Real Restrictions vs. Boilerplate at Scale

The enrichment pipeline's core value is the classification stage. Without it, 50,000 deed records are a wall of legal text.

Classify Matches sends records in batches of 100 to Claude Sonnet AI. Each batch returns a classification decision for every record: real restriction or boilerplate. The model reads the actual deed language and distinguishes between language that constrains property use and language that is standard legal phrasing with no operational significance.
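Each batch response has to be mapped back to record IDs before anything downstream can trust it, and the production fixes listed under Results include markdown-wrapped AI responses. A minimal sketch of a defensive parser, assuming the prompt asks the model for a JSON array of `{"id", "label"}` objects (a hypothetical response format) with labels `restriction` or `boilerplate`:

```python
import json
import re

VALID_LABELS = {"restriction", "boilerplate"}
# Optional markdown code fence (three backticks, optionally tagged "json") around the payload.
FENCE = re.compile(r"^\s*`{3}(?:json)?\s*(.*?)\s*`{3}\s*$", re.DOTALL)


def parse_classifications(raw: str, batch_ids: list[str]) -> dict[str, str]:
    """Turn one model response into {record id: label}; reject incomplete or odd batches."""
    match = FENCE.match(raw)
    payload = match.group(1) if match else raw.strip()
    items = json.loads(payload)  # expected: [{"id": "...", "label": "..."}, ...]
    labels = {str(item["id"]): item["label"] for item in items}
    missing = set(batch_ids) - set(labels)
    if missing:
        raise ValueError(f"model skipped {len(missing)} record(s): {sorted(missing)}")
    unexpected = {rid: lab for rid, lab in labels.items() if lab not in VALID_LABELS}
    if unexpected:
        raise ValueError(f"unexpected labels: {unexpected}")
    return {rid: labels[rid] for rid in batch_ids}
```

Raising on a skipped record, rather than silently dropping it, lets the batch be retried so no deed record leaves the pipeline unclassified.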

Only records classified as real restrictions proceed to the expensive per-record deep analysis. The AI Analyze stage produces structured output per record: zoning type, lot size parameters, what uses are permitted, what restrictions apply, and identified risk factors. County Research adds a third AI layer: per-county reports generated from the processed deed data.
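Structured output is only useful if it is validated before it reaches the frontend tables. A minimal sketch of one possible record shape; the field names, the square-foot unit, and the validation rules are hypothetical, chosen to mirror the categories listed above:

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DeedAnalysis:
    """Hypothetical per-record shape for AI Analyze output (field names are illustrative)."""
    zoning_type: str
    min_lot_size_sqft: Optional[float]  # None when the deed sets no minimum
    permitted_uses: list[str] = field(default_factory=list)
    restrictions: list[str] = field(default_factory=list)
    risk_factors: list[str] = field(default_factory=list)


def to_analysis(data: dict[str, Any]) -> DeedAnalysis:
    """Validate a parsed model response before it is written to the app tables."""
    zoning = data.get("zoning_type")
    if not isinstance(zoning, str) or not zoning.strip():
        raise ValueError("zoning_type must be a non-empty string")
    lot = data.get("min_lot_size_sqft")
    if lot is not None and (isinstance(lot, bool) or not isinstance(lot, (int, float)) or lot <= 0):
        raise ValueError("min_lot_size_sqft must be a positive number or null")
    lists: dict[str, list[str]] = {}
    for key in ("permitted_uses", "restrictions", "risk_factors"):
        value = data.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key} must be a list of strings")
        lists[key] = value
    return DeedAnalysis(
        zoning_type=zoning.strip(),
        min_lot_size_sqft=None if lot is None else float(lot),
        **lists,
    )
```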

Across the current dataset, that adds up to approximately 50,000 AI classifications, all processed without the investor reading a single page of raw legal text.

Results

The Impact

Search results that would have taken 20 to 40 hours of manual research per county appear in seconds. The investor does not navigate a county portal. The investor does not read raw legal text. The investor sees classified, analyzed, geolocated property records in a structured interface.

Four task runner refactors took timeouts from approximately 100 per 3-minute window during heavy processing down to zero. The fix in each case was the same: move expensive multi-call operations out of Code nodes (which hold runner slots during I/O) and into HTTP nodes (which release slots while waiting on I/O). Twelve bugs were found and fixed in production across 10-plus sessions from March to May 2026, spanning 30-plus git commits.

20 to 40 hrs per county reduced to minutes (74 min for 4,510 pages)

50,000+ records classified vs. hundreds possible manually

Zero timeouts after 4 refactors, down from ~100 per 3-minute window

3,200 US counties in registry vs. 1 at a time manually

None of the production reliability work is visible to the investor using the platform. The 32-hour silent hang on the Harris County run, the 11,978 duplicate rows, the markdown-wrapped AI responses, the missing backfill cron: all of it was found, diagnosed, and fixed. The investor sees clean search results.

If the research bottleneck in your operation looks like this one

The diagnostic call is 30 minutes. No sales pitch, just an honest assessment.

Book a Diagnostic Call