
AI Model Multi-Modal Capabilities for CRE: Who Reads Site Plans Best

By Avi Hacker, J.D. · 2026-05-13

What are AI multi-modal capabilities for CRE site plans? Multi-modal capability is a large language model's ability to interpret images such as architectural drawings, site plans, condition photos, and marketing renderings in addition to processing text. In 2026, this matters more than it did 12 months ago: Claude Opus 4.7 (released April 16, 2026) shipped a 3x image resolution upgrade to 3.75 megapixels, GPT-5.5 brought native image input alongside a 1 million token context window, and Gemini 3.1 Pro added improved spatial reasoning for floor plans and elevations. The question of which model reads site plans best now has a real answer, and it matters for due diligence, condition assessment, and entitlement review. For a comprehensive comparison framework, see our pillar guide on AI model comparison for CRE investors.

Key Takeaways

  • Claude Opus 4.7's 3x image resolution upgrade (3.75 megapixels per image as of April 16, 2026) makes it the strongest current model for reading detailed site plans, lease floor plans, and engineering drawings.
  • GPT-5.5 performs strongly on photo-based condition assessments (HVAC, roofing, parking lot conditions) and on extracting numbers from imaged spreadsheets.
  • Gemini 3.1 Pro leads on multi-image comparison tasks (before and after renovation, comp property comparisons) and integrates tightly with Google Workspace.
  • No frontier model reliably reads engineering-precision dimensions off a scanned blueprint; vision is good for understanding, not for measurement.
  • For high-resolution drawings, OCR pre-processing of any text annotations on the drawing increases accuracy by 30 to 50 percent regardless of model.

Why Multi-Modal Matters for CRE

CRE is image-heavy. A typical due diligence package includes the property condition report (50 to 200 photos), the site plan (architectural drawing in PDF), the floor plan for each suite, marketing brochures with renderings, photos in the appraisal, and aerial imagery. A model that can only read text leaves all that visual content untouched. A model that can read images can answer questions like, "Which of these 47 unit photos shows a recently renovated kitchen?" or "Does the site plan show ingress and egress consistent with the lease's parking ratio?"

For commercial property due diligence, our AI property condition assessment guide covers the photo-based workflows in depth. This article focuses on the comparison across models.

The Three Model Vision Profiles in May 2026

Claude Opus 4.7

Anthropic released Opus 4.7 on April 16, 2026 with a 3x image resolution upgrade. The model now supports input images up to 3.75 megapixels (2,576 pixels on the longer edge) compared to the prior 1.15 megapixels. For CRE, this means architectural site plans and detailed unit layouts can be processed without aggressive downsampling. The model is also notably better at reading text inside images (room labels on a floor plan, callout numbers on a survey, dimensions on a building elevation).
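Before uploading a scanned drawing, it is worth checking whether it exceeds the stated input limits. The sketch below uses the 3.75 megapixel and 2,576-pixel figures quoted above; the helper names and thresholds are illustrative, not part of any SDK:

```python
def needs_downsampling(width_px: int, height_px: int,
                       max_megapixels: float = 3.75,
                       max_long_edge: int = 2576) -> bool:
    """Return True if the image exceeds either stated input limit."""
    megapixels = (width_px * height_px) / 1_000_000
    return megapixels > max_megapixels or max(width_px, height_px) > max_long_edge

def downsample_factor(width_px: int, height_px: int,
                      max_long_edge: int = 2576) -> float:
    """Scale factor that brings the longer edge within the limit (1.0 = no change)."""
    return min(1.0, max_long_edge / max(width_px, height_px))

# A 24 x 36 inch site plan scanned at 300 DPI is 7200 x 10800 pixels:
print(needs_downsampling(7200, 10800))            # True
print(round(downsample_factor(7200, 10800), 3))   # 0.239
```

In practice this means even a modest 300 DPI scan of a full-size sheet must be downsampled before upload, which is why scan quality and crop strategy matter as much as model choice.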

Practical strengths:

  • Reading and explaining engineering drawings, site plans, and floor plans.
  • Pulling text annotations off architectural drawings.
  • Understanding spatial relationships such as "the loading dock is on the south side of the building behind the utility room."

Practical limits:

  • Cannot measure dimensions to engineering precision. The model can identify a roughly 30 by 50 foot suite but cannot calculate exact square footage from a drawing alone.
  • Struggles with low-resolution scanned blueprints under 200 DPI.

GPT-5.5

GPT-5.5 brought a 1 million token context window and native image input. Its strengths are in photo-based work rather than engineering drawings.
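For photo-based work, a request typically pairs a question with a base64-encoded image. A minimal sketch of assembling that message, following the common data-URL convention for chat-style vision APIs (the exact request shape for any given endpoint should be checked against its documentation):

```python
import base64

def build_vision_message(image_bytes: bytes, question: str) -> dict:
    """Build a chat message pairing a base64-encoded image with a question,
    using the widely used data-URL convention for inline image input."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }

# image_bytes would be the raw contents of a condition photo:
msg = build_vision_message(b"\xff\xd8\xff", "What condition is the rooftop HVAC unit in?")
```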

Practical strengths:

  • Reading numeric data from imaged spreadsheets, scanned rent rolls, and photographed offering memorandums.
  • Photo-based condition assessment: parking lot cracks, roof seam conditions, HVAC equipment, signage condition.
  • OCR of typed text on photographed documents.

Practical limits:

  • Lower image resolution support than Opus 4.7, which can cause loss of detail on complex architectural drawings.
  • Less reliable on spatial reasoning, for example "which side of the building is the elevator on?"

Gemini 3.1 Pro

Gemini 3.1 Pro added adaptive thinking and a 1 million token context window. It is the strongest current model on multi-image comparison tasks because Google's vision stack was trained heavily on side-by-side image data.

Practical strengths:

  • Comparing two images of the same scene (before and after renovation, two comparable properties).
  • Identifying differences across a series of progress photos.
  • Integration with Google Workspace and Google Cloud Vision for chained workflows.

Practical limits:

  • Less reliable on dense single-image analysis (a 50-room hotel floor plan with 200 labels).
  • Spatial reasoning is improved but still behind Claude Opus 4.7 for complex drawings.
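Multi-image comparison works best when before-and-after shots are sent together in one request. A small sketch of pairing photos by filename so each pair goes into a single comparison prompt; the `_before`/`_after` naming convention here is an assumption for illustration, not a standard:

```python
from collections import defaultdict

def pair_before_after(filenames: list[str]) -> dict[str, dict[str, str]]:
    """Group photos named '<unit>_before.jpg' / '<unit>_after.jpg' so each
    pair can be sent together in one multi-image comparison request.
    (The filename convention is assumed, not standardized.)"""
    pairs: dict[str, dict[str, str]] = defaultdict(dict)
    for name in filenames:
        stem = name.rsplit(".", 1)[0]
        if stem.endswith("_before"):
            pairs[stem[: -len("_before")]]["before"] = name
        elif stem.endswith("_after"):
            pairs[stem[: -len("_after")]]["after"] = name
    # Keep only complete pairs; an unmatched photo gets a human look instead.
    return {unit: p for unit, p in pairs.items() if len(p) == 2}

photos = ["unit12_before.jpg", "unit12_after.jpg", "lobby_before.jpg"]
pairs = pair_before_after(photos)  # only 'unit12' survives as a complete pair
```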

Workflow-by-Workflow Recommendations

Site plan review

Best model: Claude Opus 4.7. Provide the site plan at the highest available resolution. Ask specific questions: "Are there 200 parking spaces shown?" "Is the egress on the east side?" "Does the building footprint match the legal description?" The model will not measure to precision but will answer factual questions reliably.

Floor plan and lease drawing review

Best model: Claude Opus 4.7 for single-suite drawings; Gemini 3.1 Pro if you are comparing the as-built drawing to a marketing floor plan. Ask: "Are there four offices and two conference rooms shown?" "Where is the IT closet relative to the main entrance?"

Property condition photos

Best model: GPT-5.5 for individual photo analysis; Gemini 3.1 Pro for before-and-after comparisons. Upload a folder of 20 to 50 photos and ask: "Identify any photos showing visible roof damage." "Rank the unit photos from most to least renovated."
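Uploading 20 to 50 photos usually means batching them into multiple requests. A sketch of filtering a due diligence folder listing down to photos and chunking them; the batch size of 10 is arbitrary and should be tuned to the provider's per-request image limit:

```python
from pathlib import Path

PHOTO_EXTS = {".jpg", ".jpeg", ".png"}

def photo_batches(paths: list[str], batch_size: int = 10) -> list[list[str]]:
    """Filter a file listing down to photos, sort by name, and split into
    batches small enough to send in one vision request each."""
    photos = sorted(p for p in paths if Path(p).suffix.lower() in PHOTO_EXTS)
    return [photos[i:i + batch_size] for i in range(0, len(photos), batch_size)]

files = [f"unit_{i:02d}.jpg" for i in range(23)] + ["rent_roll.pdf"]
batches = photo_batches(files, batch_size=10)
print([len(b) for b in batches])  # [10, 10, 3]
```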

Imaged spreadsheets and scanned offering memorandums

Best model: GPT-5.5 for spreadsheet extraction; Claude Opus 4.7 for memorandums with embedded drawings. Always OCR the document first if possible; vision should be the fallback, not the first pass.

Survey and ALTA review

Best model: Claude Opus 4.7 because the 3x resolution upgrade lets it read smaller text and finer line work. Use vision to extract callouts and notes, then have a human verify dimensions against the title commitment.

Critical Limits That Apply to All Models

  1. No engineering-precision measurement. Frontier models can describe what they see but cannot reliably measure dimensions off a drawing. For any number that goes into an underwriting model, have a human read the scale and verify.
  2. Resolution matters more than model choice. A high-resolution scan processed by any of the three models produces better results than a low-resolution scan of the same drawing in the strongest model. Scan settings are the largest variable.
  3. Text in images needs OCR. If a drawing has typed annotations (zoning callouts, surveyor notes, easement descriptions), OCR the document first. The model will then have both the OCR text and the visual context.
  4. Vision does not replace specialized tools. For roof leak detection, dedicated CV models still outperform general LLMs. For multi-frame photogrammetry, dedicated tools are required. LLMs are good for question-answering, not for forensic analysis.
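The OCR-first pattern in point 3 amounts to sending the model two things at once: the OCR text and the image. A minimal sketch of the combined prompt, assuming the annotation text has already been extracted by a separate OCR tool such as Tesseract (the wording of the template is illustrative):

```python
def combined_drawing_prompt(ocr_text: str, question: str) -> str:
    """Pair OCR-extracted annotations with a visual question so the model
    gets precise text plus the image's spatial context. The returned prompt
    is sent alongside the drawing image itself."""
    return (
        "The following text was extracted from this drawing by OCR. "
        "Treat it as the authoritative reading of any annotations:\n\n"
        f"{ocr_text}\n\n"
        f"Using both the OCR text and the attached image, answer: {question}"
    )

prompt = combined_drawing_prompt(
    ocr_text="EASEMENT 15' UTILITY - NORTH BOUNDARY\nSCALE 1\" = 30'",
    question="Where does the utility easement run relative to the building?",
)
```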

Implementation Workflow

A practical workflow for a CRE firm doing image-based due diligence in 2026:

  1. Standardize on Claude Opus 4.7 for drawings and Gemini 3.1 Pro or GPT-5.5 for photos. Use the API or an enterprise plan with admin controls.
  2. For every property due diligence file, OCR all PDFs first. Then run vision queries against the original images for content that OCR cannot capture.
  3. Write standard prompts for each document type (site plan review, floor plan review, condition photo triage). Save them in Claude Projects or as custom prompts in your enterprise tool.
  4. Run two-pass verification on any numeric output: the vision model identifies, a human verifies on the drawing.
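The two-pass step can be partially automated: pass one lets the vision model answer, pass two flags every number in that answer for a human to check against the drawing. A sketch of the flagging side (the regex handles plain and comma-grouped figures; it is a starting point, not a complete number parser):

```python
import re

NUMBER_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")

def flag_numbers_for_review(model_output: str) -> list[str]:
    """Pull every numeric value out of a vision model's answer so a human
    can verify each one against the original drawing (pass two)."""
    # Strip trailing commas the pattern can pick up from sentence punctuation.
    return [m.rstrip(",") for m in NUMBER_PATTERN.findall(model_output)]

answer = "The plan shows 212 parking spaces and a 48,500 sq ft footprint."
print(flag_numbers_for_review(answer))  # ['212', '48,500']
```

Anything this returns goes on the human verification checklist before it touches an underwriting model.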

For CRE investors looking to roll out a vision-enabled due diligence workflow firm-wide, The AI Consulting Network specializes in exactly this integration. To understand the broader vision-vs-text accuracy tradeoffs, our Claude vs ChatGPT property valuation accuracy comparison covers the underlying benchmarks. Industry research from CBRE on AI adoption in 2026 reinforces that document-heavy due diligence is one of the highest-ROI workflows for AI in CRE.

Frequently Asked Questions

Q: Which AI model is best for reading commercial real estate site plans in 2026?

A: Claude Opus 4.7, whose April 2026 release brought a 3x image resolution upgrade to 3.75 megapixels. The higher resolution preserves the fine line work and small annotations that are typical on commercial site plans.

Q: Can AI measure dimensions off a site plan or floor plan?

A: Not to engineering precision. All current frontier models can estimate approximate dimensions (around 30 by 50 feet) but cannot replace a surveyor or measure off a drawing accurately. For any number that drives an underwriting model, have a human verify against the scale.

Q: What resolution should I scan a site plan for AI review?

A: At least 300 DPI, ideally 600 DPI. Higher resolution always helps, up to the model's input limit. Claude Opus 4.7 accepts up to 2,576 pixels on the longer edge; GPT-5.5 and Gemini 3.1 Pro accept smaller maxima.

Q: Do I need to OCR a drawing before sending it to a vision-enabled model?

A: Yes, if there is significant typed text on the drawing. OCR captures the text precisely; the model's vision provides the spatial and visual context. Sending both improves accuracy by 30 to 50 percent compared to vision alone.

Q: How is photo-based condition assessment different from site plan review?

A: Photo-based condition assessment relies on real-world image interpretation (is this roof in good condition, is this paint peeling). Site plan review relies on architectural drawing interpretation (where is the loading dock, where are the easements). GPT-5.5 is stronger on the first; Claude Opus 4.7 is stronger on the second.