Thinking Machines Interaction Models: What 0.4-Second Voice AI Means for CRE Brokerages in 2026

By Avi Hacker, J.D. · 2026-05-17

What are Thinking Machines Interaction Models? Thinking Machines Interaction Models are a new class of multimodal AI built by Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, designed to listen, watch, and respond simultaneously instead of waiting for a user to finish speaking. The first model in the family, TML-Interaction-Small, was unveiled in a May 11, 2026 research preview and replies in roughly 0.4 seconds, fast enough to support real-time tenant calls, leasing tours, and broker negotiations. For CRE leaders mapping the next wave of AI commercial real estate tools, this is the first model purpose-built for the voice and video workflows where real-time interaction matters.

Key Takeaways

  • Thinking Machines Lab released TML-Interaction-Small on May 11, 2026, a 276B parameter Mixture-of-Experts model with 12B active parameters built for real-time voice and video.
  • TML-Interaction-Small responds in 0.40 seconds on FD-bench v1, versus 0.57 seconds for Gemini 3.1 Flash Live and 1.18 seconds for GPT-Realtime-2.0, which is roughly one-third of GPT-Realtime-2.0's latency and fast enough for conversation to feel natural.
  • The model uses a multi-stream architecture with 200 millisecond micro-turns, processing audio, video, and text in parallel rather than the turn-based loop used by most current AI assistants.
  • For CRE, real-time voice AI changes the economics of property management call centers, leasing intake, broker support, and tenant retention workflows over the next 12 to 24 months.
  • Murati's lab raised $2 billion in seed funding from a16z, Nvidia, AMD, Accel, and ServiceNow, signaling investor confidence in real-time AI as the next interaction paradigm.

What Thinking Machines Just Shipped

On May 11, 2026, Thinking Machines Lab published the research preview for TML-Interaction-Small, its first model. The architecture breaks from the turn-based loop that defines today's AI assistants. Most current systems wait for the user to finish, transcribe the audio, run a language model, then play back synthesized speech. The result is a stitched stack with 1 to 2 seconds of unavoidable latency. Thinking Machines argues this is a dead end, because real conversation is continuous rather than turn-based.

TML-Interaction-Small is a 276 billion parameter Mixture-of-Experts model with 12 billion active parameters per token. It processes audio, video, and text in continuous 200 millisecond micro-turns, perceiving and responding at the same time. On FD-bench v1, its turn-taking latency is 0.40 seconds. Gemini 3.1 Flash Live Preview comes closest at 0.57 seconds. GPT-Realtime-2.0 at minimal settings takes 1.18 seconds, and at high reasoning takes 1.63 seconds. On the FD-bench v1.5 interaction quality benchmark, TML-Interaction-Small scored 77.8, against 54.3 for Gemini 3.1 Flash Live and 46.8 for GPT-Realtime-2.0.
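To put the quoted FD-bench v1 latencies side by side, the short Python sketch below expresses each system's latency as a multiple of TML-Interaction-Small's. The numbers are the ones reported above; the dictionary labels and the helper name compare_to_baseline are illustrative and not part of any benchmark tooling.

```python
# Turn-taking latencies in seconds, as quoted above for FD-bench v1.
LATENCY_S = {
    "TML-Interaction-Small": 0.40,
    "Gemini 3.1 Flash Live Preview": 0.57,
    "GPT-Realtime-2.0 (minimal)": 1.18,
    "GPT-Realtime-2.0 (high reasoning)": 1.63,
}


def compare_to_baseline(latencies: dict, baseline: str) -> dict:
    """Return each latency as a multiple of the baseline system's latency."""
    base = latencies[baseline]
    return {name: round(value / base, 2) for name, value in latencies.items()}


ratios = compare_to_baseline(LATENCY_S, "TML-Interaction-Small")
for name, ratio in ratios.items():
    extra_ms = (LATENCY_S[name] - LATENCY_S["TML-Interaction-Small"]) * 1000
    print(f"{name}: {LATENCY_S[name]:.2f}s, {ratio:.2f}x baseline, +{extra_ms:.0f} ms")
```

Read this way, GPT-Realtime-2.0 at minimal settings takes close to three times as long to respond, and Gemini 3.1 Flash Live Preview a little over 1.4 times as long.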

The underlying trick is what the lab calls encoder-free early fusion: instead of running massive separate encoders like Whisper for audio, the model ingests raw audio signals and image patches through a lightweight embedding layer, with the full transformer co-trained across modalities from scratch. Streaming server support has already been upstreamed to SGLang, so infrastructure teams that already run open inference servers have a path to the same latency profile. Access today is limited to research partners, with broader rollout expected later in 2026.

Why This Matters for CRE Brokerages

For commercial real estate brokerages and property management firms, the practical question is simple: where in the operating model does sub-second AI voice change the cost curve? Three places stand out.

First, the property management call center. A 2026 Famulor case study of a housing cooperative with 1,200 managed units showed 76% of inbound tenant calls handled end-to-end by an AI voice agent, with average handle time dropping from 8 minutes to roughly 3 minutes per maintenance ticket. Those gains were achieved with current-generation voice stacks that still feel robotic because of 1 to 2 second latency. At 0.4 seconds, the AI should sound much closer to a human dispatcher, which in turn should lift containment rates, NPS, and renewal probability. This is the same direction FORE Real and Pegasus are pushing in commercial property management.
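The case study figures are easier to apply to your own operation as simple arithmetic. The sketch below is a back-of-the-envelope calculation only: the 76% containment rate and the 8 and 3 minute handle times come from the case study cited above, while the monthly call volume of 1,000 is a hypothetical input chosen for illustration, as is the function name.

```python
def call_center_snapshot(monthly_calls: int, containment_rate: float,
                         aht_before_min: float, aht_after_min: float) -> dict:
    """Split call volume into contained vs. escalated and report the handle-time cut."""
    contained = monthly_calls * containment_rate
    escalated = monthly_calls - contained
    aht_reduction = (aht_before_min - aht_after_min) / aht_before_min
    return {
        "contained_calls": contained,
        "escalated_calls": escalated,
        "aht_reduction": aht_reduction,
    }


# monthly_calls=1000 is a hypothetical volume, not a figure from the case study.
snapshot = call_center_snapshot(
    monthly_calls=1000, containment_rate=0.76,
    aht_before_min=8, aht_after_min=3,
)
print(f"Contained by the agent: {snapshot['contained_calls']:.0f} calls")
print(f"Escalated to staff:     {snapshot['escalated_calls']:.0f} calls")
print(f"Handle-time reduction:  {snapshot['aht_reduction']:.1%}")
```

Swapping in your own call volume and measured rates gives a first-pass estimate to test against a pilot, not a forecast.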

Second, leasing intake and tour scheduling. Brokerages lose deals when prospects do not get a same-day callback. A real-time voice agent that can qualify a tenant, surface available units from a CRM, schedule a tour, and hand off context to a human agent transforms top-of-funnel economics. For a multifamily owner with 5,000 units, even a 10 percentage point improvement in lead-to-tour conversion translates into meaningful occupancy gains over a year.
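A 10 percentage point improvement is not the same as a 10 percent improvement, and the difference matters when sizing a pilot. The sketch below contrasts the two using hypothetical inputs: 400 leads per month and a 30% baseline lead-to-tour conversion rate are placeholders for illustration, not figures from any study.

```python
def tours_per_month(monthly_leads: int, conversion_rate: float) -> float:
    """Expected tours booked given a lead count and a lead-to-tour rate."""
    return monthly_leads * conversion_rate


MONTHLY_LEADS = 400   # hypothetical lead volume
BASELINE_RATE = 0.30  # hypothetical baseline lead-to-tour conversion

baseline = tours_per_month(MONTHLY_LEADS, BASELINE_RATE)
# +10 percentage points: 30% becomes 40%.
point_lift = tours_per_month(MONTHLY_LEADS, BASELINE_RATE + 0.10)
# +10 percent relative: 30% becomes 33%.
relative_lift = tours_per_month(MONTHLY_LEADS, BASELINE_RATE * 1.10)

print(f"Baseline tours:            {baseline:.0f}")
print(f"With +10 percentage points: {point_lift:.0f}")
print(f"With +10 percent relative:  {relative_lift:.0f}")
```

Under these assumed inputs the percentage point lift adds several times as many tours as the relative lift, which is why the distinction should be explicit in any pilot target.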

Third, broker support and underwriting copilots. Real-time interaction lets a broker on a call with an LP or seller pull comps, model a basic cap rate or DSCR, and surface relevant clauses without breaking the conversation. Cap rate is calculated as NOI divided by purchase price, while DSCR is NOI divided by annual debt service. Both can be sanity-checked live by a fast voice copilot, instead of being typed into a spreadsheet after the call.
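The two formulas above are simple enough to express directly. The following is a minimal Python sketch of the kind of sanity check a copilot might run; the function names and the example deal (NOI of $1.2 million, a $20 million purchase price, and $900,000 of annual debt service) are hypothetical and chosen only to illustrate the arithmetic.

```python
def cap_rate(noi: float, purchase_price: float) -> float:
    """Capitalization rate: net operating income divided by purchase price."""
    if purchase_price <= 0:
        raise ValueError("purchase_price must be positive")
    return noi / purchase_price


def dscr(noi: float, annual_debt_service: float) -> float:
    """Debt service coverage ratio: NOI divided by annual debt service."""
    if annual_debt_service <= 0:
        raise ValueError("annual_debt_service must be positive")
    return noi / annual_debt_service


# Hypothetical deal used purely for illustration.
noi = 1_200_000
price = 20_000_000
debt_service = 900_000

print(f"Cap rate: {cap_rate(noi, price):.2%}")
print(f"DSCR:     {dscr(noi, debt_service):.2f}x")
```

A DSCR below 1.0 means NOI does not cover debt service, so a live check like this can flag a problem while the counterparty is still on the line.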

How Thinking Machines Fits the Broader 2026 AI Stack

Thinking Machines is not building a CRE product. It is building a foundation model that voice-first applications can sit on top of. That puts it in the same architectural layer as OpenAI Realtime, Google's Gemini Live, and Anthropic's Claude voice. The bet investors made when they wrote checks for the $2 billion seed round, led by a16z with Nvidia, AMD, Accel, and ServiceNow participating, is that real-time interaction will be a distinct product category, not a feature of existing chatbots. For CRE, this is consistent with the broader 2026 pattern: foundation labs ship infrastructure, and vertical platforms wire it into workflows.

It is also consistent with the answer engine optimization (AEO) trend. A study reported in the HubSpot AEO Sensor launch showed organic traffic at HubSpot customers down 27% year over year, with ChatGPT business referral traffic hitting a 12-month low in April 2026. Voice AI accelerates this shift further: the next discovery moment for many tenants and investors will not be a Google search but a spoken question to an AI agent. CRE firms that own the voice-layer relationship with their tenants, prospects, and LPs will have a durable advantage. The AI Consulting Network specializes in exactly this kind of voice and agent workflow design for CRE operators.

How CRE Leaders Should Respond

Practical next steps depend on portfolio size. Independent sponsors and small operators (under 1,000 units) should start by piloting a current-generation voice agent on a single workflow, typically tenant maintenance intake, and design the architecture so the underlying model can be swapped when interaction models become widely available. Mid-market sponsors (1,000 to 10,000 units) should run a 3-month voice agent pilot with measurable KPIs: containment rate, average handle time, after-hours coverage, and tenant satisfaction. Enterprise operators should formalize an interaction layer strategy that spans leasing, property management, and asset management, because the labor and retention math becomes material at scale.
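The four pilot KPIs can be computed from a plain call log. The sketch below shows one possible way to do it, under stated assumptions: the CallRecord fields, the definition of after-hours coverage as the share of after-hours calls that were answered, and the use of a 1 to 5 satisfaction score are all illustrative choices, not an industry standard.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class CallRecord:
    """One inbound call in a hypothetical pilot log."""
    answered: bool          # did anyone (agent or staff) pick up?
    contained: bool         # resolved by the voice agent without escalation
    handle_time_min: float  # minutes from pickup to resolution
    after_hours: bool       # arrived outside staffed hours
    csat: Optional[int] = None  # 1-5 satisfaction score, if the tenant gave one


def pilot_kpis(calls: list) -> dict:
    """Compute containment, handle time, after-hours coverage, and satisfaction."""
    answered = [c for c in calls if c.answered]
    after_hours = [c for c in calls if c.after_hours]
    rated = [c.csat for c in answered if c.csat is not None]
    return {
        "containment_rate": sum(c.contained for c in answered) / len(answered) if answered else 0.0,
        "avg_handle_time_min": mean(c.handle_time_min for c in answered) if answered else 0.0,
        "after_hours_coverage": sum(c.answered for c in after_hours) / len(after_hours) if after_hours else 0.0,
        "avg_csat": mean(rated) if rated else None,
    }


# Four made-up calls, just to exercise the function.
sample_log = [
    CallRecord(answered=True, contained=True, handle_time_min=3.0, after_hours=False, csat=5),
    CallRecord(answered=True, contained=True, handle_time_min=4.0, after_hours=True, csat=4),
    CallRecord(answered=True, contained=False, handle_time_min=9.0, after_hours=False),
    CallRecord(answered=False, contained=False, handle_time_min=0.0, after_hours=True),
]
kpis = pilot_kpis(sample_log)
print(kpis)
```

Agreeing on these definitions before the pilot starts matters more than the code itself, since containment and coverage can be measured several ways.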

Regardless of size, the data layer matters most. Voice agents are only as good as the property data, lease abstractions, and CRM integrations behind them. CRE investors looking for hands-on AI implementation support can reach out to Avi Hacker, J.D. at The AI Consulting Network. As BCG recently noted, only 25% of real estate firms qualify as AI leaders versus 40% across industries, which means the early movers on real-time voice will compound advantage faster than the rest of the sector.

Frequently Asked Questions

Q: What is TML-Interaction-Small?

A: TML-Interaction-Small is the first model released by Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati. It is a 276 billion parameter Mixture-of-Experts model with 12 billion active parameters, designed for real-time voice and video interaction with 0.40 second turn-taking latency.

Q: How does Thinking Machines compare to OpenAI Realtime and Gemini Live?

A: On FD-bench v1, TML-Interaction-Small achieves 0.40 second turn-taking latency, versus 0.57 seconds for Gemini 3.1 Flash Live Preview and 1.18 seconds for GPT-Realtime-2.0 at minimal settings. On interaction quality (FD-bench v1.5), it scored 77.8, against 54.3 for Gemini and 46.8 for GPT-Realtime.

Q: Can CRE firms use Thinking Machines Interaction Models today?

A: Not yet at general availability. The model is in research preview with limited partners as of May 2026, with broader rollout planned later in 2026. CRE firms should design current voice agent pilots so the underlying model can be swapped when interaction models become widely available.
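One way to keep the model swappable is to have the workflow depend on a narrow interface rather than on a specific vendor. The sketch below illustrates the idea with a Python Protocol. Every name in it (VoiceBackend, ScriptedBackend, MaintenanceIntake) is hypothetical, and the text-in, text-out method is a simplification: a real interaction model would stream audio and video rather than exchange whole transcripts.

```python
from typing import Protocol


class VoiceBackend(Protocol):
    """Hypothetical interface: anything that can answer a caller utterance."""

    def respond(self, transcript: str) -> str:
        ...


class ScriptedBackend:
    """Stand-in for a current-generation voice model, used here for illustration."""

    def respond(self, transcript: str) -> str:
        if "leak" in transcript.lower():
            return "I have opened an urgent maintenance ticket for the leak."
        return "I have logged your request and a team member will follow up."


class MaintenanceIntake:
    """Workflow code depends only on the VoiceBackend interface."""

    def __init__(self, backend: VoiceBackend) -> None:
        self.backend = backend

    def handle(self, transcript: str) -> str:
        return self.backend.respond(transcript)


# Swapping models later means passing a different backend; the workflow is unchanged.
intake = MaintenanceIntake(ScriptedBackend())
reply = intake.handle("There is a leak under my kitchen sink.")
print(reply)
```

When a new model becomes available, only a new backend class needs to be written and tested; the intake logic, CRM integrations, and KPI reporting stay as they are.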

Q: Where does real-time voice AI move the needle in CRE?

A: The biggest near-term wins are property management call centers (handling tenant maintenance, leasing inquiries, and billing questions), leasing intake and tour scheduling for multifamily and office, and live broker copilots that surface comps, cap rates, and DSCR figures during calls with LPs or sellers.

Q: Why did investors give Thinking Machines $2 billion before it shipped a product?

A: Investors bet that real-time interaction will be its own product category, not a feature of existing chatbots. The $2 billion seed round was led by Andreessen Horowitz (a16z) with participation from Nvidia, AMD, Accel, and ServiceNow, which view voice and video AI as the next interaction paradigm after text-based assistants.