What is on-device vs cloud AI for CRE? On-device vs cloud AI for CRE is the architectural choice between running AI models directly on a CRE firm's local hardware (laptops, workstations, or self hosted servers) and calling models in the cloud through APIs from providers like Anthropic, OpenAI, and Google. The choice trades privacy and control against raw capability and cost predictability for commercial real estate workflows. In May 2026, on-device AI is no longer a theoretical option for CRE: smaller systems such as Apple Intelligence's local model, gpt-oss-20b, and Llama 4 Scout run usefully on a single workstation. For a complete model overview, see our pillar guide on AI model comparison for CRE investors.
Key Takeaways
- Cloud AI models (Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro) currently lead on-device models by 25 to 40 percent on complex CRE reasoning tasks like full deal underwriting.
- On-device models are now genuinely useful for moderate complexity workflows: lease abstraction, OM summarization, basic comparable lookups, and IC memo drafts.
- The latency profile is reversed from intuition: on-device is faster for short prompts because it skips the network round trip; cloud is faster for very long context tasks because of higher compute capacity.
- Total cost of ownership crosses over in favor of on-device for firms running over 5 million tokens per day, similar to the open source crossover point.
- Genuine air gapped on-device AI is the only deployment option for CRE firms with sovereign or government adjacent clients who require zero data leaving the network.
What "On-Device AI" Actually Means in 2026
Just two years ago, on-device AI ranged from impossible to laughably bad. In May 2026 it is a credible option for many CRE workflows. The current landscape:
- Phone class on-device: Apple Intelligence (a 3 billion parameter local model on iPhone 17), Google Gemini Nano on Pixel 10. Useful for quick text rewriting, simple summarization, basic Q&A on documents.
- Laptop class on-device: Mistral 7B, Llama 3.1 8B, gpt-oss-20b (21 billion parameters, 16 GB memory footprint). Run on M3 or M4 MacBook Pro or higher with 32 GB or more unified memory. Useful for moderate complexity tasks like lease abstraction, market research summarization, and basic underwriting analysis.
- Workstation class on-device: Llama 4 Scout (109 billion total parameters, 17 billion active via mixture of experts), Mistral Small 4, gpt-oss-120b. Run on workstations with 1 or 2 RTX 6000 Ada or H100 GPUs. Useful for production CRE workflows including IC memo drafting and longer reasoning tasks.
- Server class on-device (self hosted): Llama 4 Maverick (400 billion total parameters, mixture of experts with 128 expert networks), DeepSeek V4 Pro (1.6 trillion total parameters, 49 billion active). Requires multi GPU H100 or H200 servers. Approaches but does not match flagship cloud models.
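The hardware tiers above follow from a simple sizing rule of thumb: weight memory is roughly total parameters times bytes per weight at the chosen quantization level, plus overhead for the KV cache and activations. The sketch below illustrates that arithmetic; the 20 percent overhead multiplier and the quantization levels are illustrative assumptions, not vendor specifications.

```python
def min_memory_gb(total_params_b: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough minimum memory (GB) to hold a model's weights for local inference.

    total_params_b: total parameters, in billions
    bits_per_weight: quantization level (16 = fp16, 8 = int8, 4 = int4)
    overhead: assumed multiplier for KV cache and activations (~20 percent)
    """
    weight_gb = total_params_b * bits_per_weight / 8  # billions of params x bytes each
    return round(weight_gb * overhead, 1)

# A 21 billion parameter model at 4-bit quantization fits a ~16 GB laptop footprint:
print(min_memory_gb(21, 4))
# The same model at fp16 pushes into workstation territory:
print(min_memory_gb(21, 16))
```

This is why mixture of experts models like Llama 4 Scout matter for on-device use: only the active parameters run per token, but all total parameters must still fit in memory, so the memory bill is driven by the larger number.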
The Performance Gap: Where Cloud Still Wins
Cloud flagship models retain a meaningful capability lead on the hardest CRE reasoning tasks. Specifically:
- Long context analysis: Claude Opus 4.7 supports 1 million token context windows. The largest on-device models top out around 128,000 tokens, with performance degrading beyond 32,000 tokens.
- Complex multi step reasoning: Sensitivity analysis, deal risk assessment, and contract interpretation still favor flagship cloud models by 25 to 40 percent accuracy.
- Live data and search: Cloud models with built in search (GPT-5.5, Perplexity, Gemini) handle market research natively. On-device models need to integrate retrieval, web access, and data lookup separately.
- Specialized capabilities: High resolution vision tasks (rent roll image OCR, site photo analysis at 2576px supported by Claude Opus 4.7), voice intelligence, and image generation are all stronger in cloud flagship models.
For empirical evidence of the gap on a specific workflow, see our analysis of Claude vs ChatGPT property valuation accuracy, and our AI property valuation accuracy verification guide for testing methodology.
Where On-Device Now Wins
For a meaningful slice of CRE workflows, on-device is now genuinely the better option:
- Routine document classification: Sorting leases, rent rolls, financials, and environmental reports by type. On-device models reach roughly 90 percent accuracy at near zero marginal cost.
- Lease abstraction: Extracting key terms from commercial leases. On-device Llama 4 Scout with fine tuning matches GPT-5.5 quality on commonly seen lease structures.
- Call and meeting summarization: Local Whisper for transcription plus on-device gpt-oss-20b for summary. Zero cost, zero latency, full privacy.
- Draft IC memos: First draft generation. Editor reviews and revises with a flagship cloud model.
- Internal Q&A on firm proprietary data: Retrieval augmented generation over Yardi, SharePoint, or proprietary databases. On-device avoids leaking deal data into cloud vendors.
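The internal Q&A pattern in the last bullet can be sketched in a few lines. This is a deliberately minimal stand-in: real on-device RAG stacks use embedding search rather than term overlap, and the document names and contents below are invented for illustration. The point is that retrieval and ranking run entirely on local data.

```python
def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Rank documents by simple term overlap with the query (a toy
    stand-in for the embedding search a real local RAG stack would use)."""
    q_terms = set(query.lower().split())
    scores = {
        name: len(q_terms & set(text.lower().split()))
        for name, text in docs.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [name for name, score in ranked if score > 0][:k]

# Hypothetical firm documents that never leave the local network:
docs = {
    "lease_401main.txt": "tenant pays base rent of 42 per sq ft with 3 percent annual escalations",
    "t12_401main.txt": "trailing twelve month net operating income 1.8 million",
    "om_elmstreet.txt": "offering memorandum for elm street retail center cap rate 6.2 percent",
}
print(retrieve("what is the base rent escalation on 401 main", docs))
```

In a production deployment the retrieved chunks would be passed as context to a local model such as gpt-oss-20b, so neither the query nor the documents ever reach a cloud vendor.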
Latency: An Unexpected Reversal
The intuition that cloud is faster than on-device is backwards for many CRE tasks. For short interactive prompts (under 500 tokens), on-device models running on a recent MacBook Pro respond in 200 to 400 milliseconds because they skip the network round trip. Cloud flagship models show 600 to 1,400 ms time to first token under the same network conditions.
The intuition flips back for long context tasks. A 100,000 token deal analysis runs faster on Claude Opus 4.7 in the cloud than on Llama 4 Scout locally because cloud providers have far more compute capacity per inference job.
The practical takeaway is that on-device is often faster for interactive use and slower for batch use. The right architecture pairs both.
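The reversal falls out of a two-term latency model: fixed startup cost (network round trip plus time to first token) versus prompt processing that scales with context length. The rates below are illustrative assumptions chosen to match the figures cited above, not measured benchmarks.

```python
def response_time_ms(prompt_tokens: int, local: bool) -> float:
    """Toy latency model: fixed time to first token plus prompt
    prefill that scales with context length. All rates are assumed
    round numbers, not benchmarks."""
    if local:
        ttft = 250            # no network round trip
        prefill_rate = 2_000  # tokens/sec prompt processing on a workstation
    else:
        ttft = 400 + 600      # network round trip + cloud time to first token
        prefill_rate = 10_000 # tokens/sec on datacenter hardware
    return ttft + prompt_tokens / prefill_rate * 1000

# Short interactive prompt: on-device wins on the fixed cost
print(response_time_ms(500, local=True), response_time_ms(500, local=False))
# 100,000 token deal analysis: cloud wins on prefill throughput
print(response_time_ms(100_000, local=True), response_time_ms(100_000, local=False))
```

The crossover point shifts with hardware, but the shape of the tradeoff holds: fixed cost dominates short prompts, throughput dominates long ones.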
Privacy and Compliance
For CRE firms with strict fiduciary obligations or government adjacent clients, on-device AI provides a level of privacy that no cloud enterprise tier can match. Specifically:
- No data leaves the network: Rent rolls, T12s, LP communications, and purchase agreements never touch a third party server.
- No vendor risk: No reliance on a cloud vendor's data retention, security, or business continuity practices.
- Regulatory compliance: Easier to satisfy GDPR data residency, ITAR for defense related CRE, or SEC fiduciary obligations.
- Audit simplicity: All prompts and outputs stay in the firm's logging infrastructure.
Cloud enterprise tiers have closed much of this gap. ChatGPT Enterprise, Claude Enterprise, and Google Workspace Enterprise with Gemini all offer zero data retention, audit logging, and dedicated tenants. For most CRE firms, the enterprise tier is functionally equivalent on privacy. The remaining 5 to 10 percent of firms (government adjacent, sovereign wealth, defense related) genuinely need on-device.
Total Cost of Ownership
The cost calculation looks similar to the open source vs closed comparison:
- Cloud API consumption: Claude Opus 4.7 at $5 per million input tokens and $25 per million output tokens; similar pricing for GPT-5.5 and Gemini 3.1 Pro.
- On-device per user: $4,000 to $8,000 one time for a workstation capable of running Llama 4 Scout, amortized over 3 to 4 years. Marginal cost per prompt is essentially the cost of electricity.
- On-device per firm (self hosted server): $50,000 to $250,000 capital expense, plus 0.25 to 0.5 FTE in MLOps for operations.
For a 5 person CRE firm running moderate volume, cloud APIs are cheaper. For a 50 person firm running high volume pipeline screening, on-device pays back within 6 to 12 months.
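The crossover claim above can be checked with back-of-envelope arithmetic. The input/output token split, daily volumes, and workstation price are illustrative assumptions; the per-token prices are the ones cited earlier in this section.

```python
def annual_cost_cloud(tokens_per_day: float, input_share: float = 0.8) -> float:
    """Annual cloud API spend at the cited Opus-class pricing:
    $5 per million input tokens, $25 per million output tokens.
    The 80/20 input/output split is an assumption."""
    daily = tokens_per_day / 1e6 * (input_share * 5 + (1 - input_share) * 25)
    return daily * 365

def annual_cost_on_device(workstations: int, unit_cost: float = 6_000, years: int = 4) -> float:
    """Amortized workstation hardware; marginal inference cost treated as ~0."""
    return workstations * unit_cost / years

# 5-person firm at ~1M tokens/day: cloud is cheaper
print(annual_cost_cloud(1e6), annual_cost_on_device(5))
# 50-person firm screening pipeline at ~50M tokens/day: on-device wins
print(annual_cost_cloud(50e6), annual_cost_on_device(50))
```

The exact crossover moves with the input/output mix and hardware prices, but the structure is fixed: cloud cost scales linearly with volume while on-device cost is flat after the capital outlay.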
According to JLL research, 92 percent of CRE firms have piloted AI but only 5 percent report achieving most of their AI program goals. Deployment model choice is one of several reasons; the other is governance and workflow design.
Hybrid Architectures in Practice
Most institutional CRE firms in May 2026 run a hybrid deployment. The pattern looks roughly like:
- On-device for sensitive routine work: Internal retrieval augmented generation, deal data analysis, lease abstraction, call summarization.
- Cloud flagship for high stakes complex work: Full deal underwriting, IC memos for committee review, LP communications.
- Cloud lightweight for general productivity: Email drafting, calendar management, general research.
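The three-tier pattern above amounts to a routing policy. A minimal sketch, assuming two illustrative task labels (sensitive, high_stakes) rather than any real product's schema:

```python
def route(task: dict) -> str:
    """Toy routing policy mirroring the hybrid pattern: sensitivity
    is checked first, then stakes. The task fields are illustrative."""
    if task["sensitive"]:
        return "on-device"        # deal data never leaves the network
    if task["high_stakes"]:
        return "cloud-flagship"   # underwriting, IC memos, LP communications
    return "cloud-lightweight"    # email drafts, general research

print(route({"sensitive": True,  "high_stakes": False}))  # on-device
print(route({"sensitive": False, "high_stakes": True}))   # cloud-flagship
print(route({"sensitive": False, "high_stakes": False}))  # cloud-lightweight
```

Checking sensitivity before stakes is the important design choice: a high stakes task on confidential deal data still stays on-device, and the flagship model is only used for work the firm is comfortable sending to a vendor.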
If you are evaluating whether on-device, cloud, or a hybrid architecture fits your firm's mix of deals and privacy posture, The AI Consulting Network specializes in this exact kind of deployment design.
Frequently Asked Questions
Q: Is on-device AI accurate enough for CRE underwriting?
A: For preliminary underwriting and OM screening, yes. Llama 4 Scout running locally matches roughly 75 to 90 percent of cloud flagship accuracy on routine financial analysis. For committee level final underwriting, cloud flagship models are still preferred.
Q: What hardware do I need to run a useful on-device AI for CRE?
A: A MacBook Pro M3 or M4 with 32 to 128 GB unified memory handles gpt-oss-20b and Llama 3.1 8B comfortably. For Llama 4 Scout, you need a workstation with one or two RTX 6000 Ada GPUs or an Apple Mac Studio with 128 GB unified memory.
Q: Is on-device AI more private than cloud enterprise tiers?
A: Materially, yes. Cloud enterprise tiers with zero data retention are very close, but only on-device guarantees that prompts and outputs never touch a third party server. For most CRE firms the cloud enterprise tier is sufficient; for sovereign or government adjacent deals, on-device is required.
Q: Can I run on-device AI with retrieval over my firm's Yardi or SharePoint data?
A: Yes. Several open source retrieval augmented generation stacks (LlamaIndex, LangChain, Haystack) integrate cleanly with on-device models and connect to Yardi, SharePoint, and proprietary databases. Setup typically takes 1 to 3 weeks of engineering work.
Q: What is the simplest way to test whether on-device works for my CRE workflow?
A: Install LM Studio or Ollama on a MacBook Pro, download Llama 3.1 8B or gpt-oss-20b, and run 20 typical workflow prompts. Compare to your existing cloud AI subscription. Most firms find on-device handles 60 to 80 percent of routine tasks well.
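A pilot like this is easier to judge with a simple scoring harness. The sketch below only covers the grading step: the responses themselves would come from LM Studio or Ollama, and a reviewer marks each answer acceptable or not. The results data and the 60 percent threshold are illustrative.

```python
def pass_rate(results: list[dict], threshold: float = 0.6) -> tuple[float, bool]:
    """Fraction of test prompts where the local model's answer was
    judged acceptable, and whether the pilot clears the threshold.
    'acceptable' is a human judgment recorded per prompt."""
    passed = sum(1 for r in results if r["acceptable"])
    rate = passed / len(results)
    return rate, rate >= threshold

# 20 typical workflow prompts graded by a reviewer (illustrative data:
# 14 of 20 acceptable):
results = [{"prompt": f"task {i}", "acceptable": i % 10 < 7} for i in range(20)]
print(pass_rate(results))  # (0.7, True)
```

Running the same 20 prompts through the existing cloud subscription and comparing pass rates gives a concrete basis for the on-device vs cloud decision rather than an impression.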