
Achieving 50% Reduction in AI False Positives for a Leading Real Estate Investment Platform

When your AI system tags thousands of investment properties every day, false positives aren't just annoying—they're expensive. Every incorrect tag means manual review. Every manual review erodes trust in the system. And when your clients are managing portfolios worth millions, that trust is everything. That was the situation facing one of the nation's leading real estate investment platforms, and Commerce Architects was brought in to diagnose the problem and fix it.

About the Client


Our client operates one of the nation's leading technology platforms for single-family residential real estate investment. Backed by prominent venture capital and institutional investors, the company has built an end-to-end SaaS solution that enables institutional and mid-market investors to discover, evaluate, and acquire investment properties with unprecedented efficiency.


Since launching in 2017, the platform has facilitated more than $5 billion in real estate transactions across 29 U.S. markets. Their competitive advantage lies in combining powerful data analytics with AI-driven automation—allowing clients to analyze thousands of properties and move quickly on the best opportunities.


The Challenge


A core feature of the platform is automated property tagging. Using AI, the system analyzes listing descriptions and extracts key attributes—things like "new roof," "updated kitchen," "motivated seller," or "low HOA fees"—that help investors filter and prioritize properties at scale.


However, as the platform grew, the tagging system struggled to keep pace. The client engaged Commerce Architects to assess the situation and recommend a path forward. Our team's analysis identified three interconnected problems undermining performance:


Fragmented Technology Stack. The tagging workflow relied on three separate tools: one for prompt management, another for experiment tracking, and a third for structured data outputs. We recognized immediately that this fragmentation was creating inefficiencies at every turn—engineers were spending valuable time switching contexts between systems, debugging was cumbersome, and improvements to one component often created unexpected issues in another.


Lack of Workflow Engineering. Our review revealed a deeper architectural issue: the system wasn't designed as a workflow at all. The approach was essentially "send everything to the LLM and hope for the best"—with prompts asking the AI to perform six or more classification tasks in a single call. This created two problems. First, the model struggled to maintain accuracy across so many simultaneous objectives. Second, and just as importantly, many of these tasks didn't require an LLM in the first place. Simple operations like filtered keyword searches were being routed through expensive AI inference calls when straightforward programmatic logic would have been faster, cheaper, and more reliable.
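To illustrate the distinction, an "ultra-strict" tag that requires an exact phrase in the listing text can be resolved with plain string logic and no inference call at all. The sketch below is illustrative only—the tag names and the `match_strict_tags` helper are hypothetical, not the client's code:

```python
import re

# Hypothetical strict tags: each requires an exact phrase in the listing
# description, so a compiled regex resolves it deterministically -- no LLM call.
STRICT_TAG_PATTERNS = {
    "new_roof": re.compile(r"\bnew roof\b", re.IGNORECASE),
    "low_hoa": re.compile(r"\blow hoa\b", re.IGNORECASE),
}

def match_strict_tags(description: str) -> list[str]:
    """Return the strict tags whose exact phrase appears in the description."""
    return [tag for tag, pattern in STRICT_TAG_PATTERNS.items()
            if pattern.search(description)]
```

A check like this runs in microseconds, costs nothing per call, and never hallucinates—exactly the profile you want for tasks that don't require language understanding.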


Excessive False Positives. The combination of fragmented tooling and undisciplined AI usage resulted in an unacceptably high rate of incorrect tags. Each false positive required manual review, consuming staff time and—more importantly—eroding user confidence in the platform's recommendations.


Based on our assessment, we recommended a comprehensive redesign built on proper workflow engineering principles—not just better prompts, but a fundamentally smarter architecture that would use AI where it adds value and conventional programming where it doesn't.


Our Solution


Commerce Architects worked closely with the client to redesign the property tagging workflow using modern AI orchestration principles. We led the engagement across three strategic priorities:


Consolidating to a Unified Framework

We replaced the fragmented three-tool stack with a single, integrated framework built on LangChain—the industry's most widely adopted platform for AI application development. This consolidation delivered immediate benefits: a cleaner codebase, faster development cycles, and significantly easier debugging. Our approach enabled the client to manage prompts, track experiments, and generate structured outputs within one cohesive system rather than juggling multiple disconnected tools.


Intelligent Workflow Engineering

This is where our approach diverged from typical AI implementations. Rather than simply optimizing prompts, we restructured the entire process into a true workflow using LangGraph—one that intelligently routes tasks to the right tool for the job.


The workflow we designed includes both LLM-powered nodes and conventional programmatic nodes working together:


  • Tag Identification — An LLM-powered analysis pass identifies candidate tags present in the listing description, leveraging the model's strength in understanding natural language context

  • Domain-Specific Evaluation — A hybrid approach where some tag categories route to LLM nodes for nuanced evaluation, while others—like ultra-strict tags requiring exact text matches—route to efficient keyword search logic that executes in milliseconds without an inference call

  • Quality Filtering — A programmatic aggregation step applies confidence thresholds to ensure only high-quality tags reach the end user

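The routing pattern above can be sketched in a few lines of plain Python standing in for the LangGraph nodes. Everything here is an illustrative assumption—the tag names, the 0.8 confidence threshold, and the stub `llm_score` function (which a real system would replace with a model call) are not the client's implementation:

```python
from dataclasses import dataclass

@dataclass
class TagResult:
    tag: str
    confidence: float

# Illustrative routing table: which evaluation path each tag category takes.
KEYWORD_EVALUATED = {"new roof", "low hoa fees"}  # exact-match only, no LLM

def llm_score(tag: str, description: str) -> float:
    """Stub for an LLM-powered evaluation node (a real system calls a model here)."""
    return 0.9 if tag.split()[0] in description.lower() else 0.2

def keyword_score(tag: str, description: str) -> float:
    """Deterministic node: exact substring match, executes without inference."""
    return 1.0 if tag in description.lower() else 0.0

def evaluate_tags(candidates: list[str], description: str,
                  threshold: float = 0.8) -> list[TagResult]:
    """Route each candidate tag to the appropriate node, then filter by confidence."""
    results = []
    for tag in candidates:
        if tag in KEYWORD_EVALUATED:
            score = keyword_score(tag, description)   # programmatic node
        else:
            score = llm_score(tag, description)       # LLM node
        results.append(TagResult(tag, score))
    # Quality-filtering step: only high-confidence tags reach the end user.
    return [r for r in results if r.confidence >= threshold]
```

The structure mirrors the workflow: candidates enter, each is routed to the cheapest tool that can evaluate it reliably, and a final programmatic filter enforces the quality bar.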

This hybrid architecture delivers the best of both worlds: AI handles the tasks that genuinely require language understanding, while proven programmatic techniques handle everything else. The result is a system that's not only more accurate, but also faster and more cost-effective to operate.


Implementing End-to-End Observability

To support ongoing optimization, we integrated LangSmith tracing throughout the workflow. This provides complete visibility into how each request moves through the system—from initial input to final output. When issues arise, engineers can now pinpoint exactly where and why, rather than troubleshooting blind. We also configured prompt versioning and automated evaluation capabilities, enabling the team to measure the impact of changes before deploying them to production.
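In LangChain applications, LangSmith tracing is typically switched on through environment variables rather than code changes; once set, every chain and graph invocation in the process is traced automatically. The project name below is a placeholder:

```shell
# Enable LangSmith tracing for all LangChain/LangGraph runs in this process.
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="<your-langsmith-api-key>"   # issued in the LangSmith UI
export LANGCHAIN_PROJECT="property-tagging"           # placeholder project name
```

With tracing enabled, each node's inputs, outputs, latency, and token usage appear in the LangSmith dashboard, which is what turns debugging from guesswork into inspection.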


Results

The system we delivered produced substantial improvements across the metrics that matter most:


Dramatic Accuracy Gains. False positives dropped by 50%, significantly reducing the need for manual review and restoring user confidence in the platform's AI-driven recommendations.

Streamlined Operations. Our consolidation from three tools to one unified framework simplified the codebase and accelerated the team's ability to experiment, debug, and deploy improvements.

Optimized Resource Usage. By routing appropriate tasks to programmatic nodes instead of defaulting to LLM inference, we reduced unnecessary AI compute costs while actually improving accuracy on tasks better suited to deterministic logic.

Complete Visibility. The end-to-end tracing we implemented transformed troubleshooting from guesswork into a systematic process, reducing the time required to identify and resolve issues.

Reusable Architecture. The patterns we established—intelligent task routing, hybrid LLM/programmatic workflows, confidence-based filtering—now serve as a template the client is applying to other AI initiatives across their platform.


The Takeaway


There's a temptation in AI projects to route everything through a language model. It feels cutting-edge, and modern LLMs are remarkably capable. But capability isn't the same as suitability. When you ask an LLM to perform tasks that don't require language understanding—exact string matching, mathematical comparisons, structured lookups—you're paying for power you don't need while often getting worse results than simpler approaches would deliver.


The discipline of workflow engineering asks a different question: What's the right tool for each task? Sometimes that's an LLM. Sometimes it's a keyword filter. Often it's both, working together in a thoughtfully designed pipeline.


Commerce Architects brought this engineering mindset to our client, transforming a struggling tagging system into a strategic asset. The 50% reduction in false positives was the immediate win. But the greater value lies in the architecture now in place—one that's faster, more accurate, more cost-effective, and ready to scale with their business.


Commerce Architects helps organizations design and build AI systems that perform reliably in production. If your team is navigating similar challenges, we'd welcome the opportunity to discuss how we can help.
