Uncategorized

Karpathy’s March of Nines shows why 90% AI reliability isn’t even close to enough

“When you get a demo and something works 90% of the time, that’s just the first nine.” — Andrej Karpathy The “March of Nines” frames a common production reality: You can reach the first 90% reliability with a strong demo, and each additional nine often requires comparable engineering effort. For enterprise teams, the distance between…

Read More

Anthropic launches Claude Marketplace, giving enterprises access to Claude-powered tools from Replit, GitLab, Harvey and more

San Francisco startup Anthropic continues to ship new AI products and services at a blistering pace, despite a messy ongoing dispute with the U.S. Department of War. Today, the company announced Claude Marketplace, a new offering that lets enterprises with an existing Anthropic spend commitment apply part of it toward tools and applications powered by…

Read More

LangChain’s CEO argues that better models alone won’t get your AI agent to production

As models get smarter and more capable, the “harnesses” around them must also evolve. This “harness engineering” is an extension of context engineering, says LangChain co-founder and CEO Harrison Chase in a new VentureBeat Beyond the Pilot podcast episode. Whereas traditional AI harnesses have tended to constrain models from running in loops and calling tools,…

Read More

New KV cache compaction technique cuts LLM memory 50x without accuracy loss

Enterprise AI applications that handle large documents or long-horizon tasks face a severe memory bottleneck. As the context grows longer, so does the KV cache, the area where the model’s working memory is stored. A new technique developed by researchers at MIT addresses this challenge with a fast compression method for the KV cache. The…

Read More

Google PM open-sources Always On Memory Agent, ditching vector databases for LLM-driven persistent memory

Google senior AI product manager Shubham Saboo has turned one of the thorniest problems in agent design into an open-source engineering exercise: persistent memory. This week, he published an open-source “Always On Memory Agent” on the official Google Cloud Platform Github page under a permissive MIT License, allowing for commercial usage. It was built with…

Read More

Google Workspace CLI brings Gmail, Docs, Sheets and more into a common interface for AI agents

What’s old is new: the command line — the original, clunky non-graphical interface for interacting with and controlling PCs, where the user just typed in raw commands in code — has become one of the most important interfaces in agentic AI. That shift has been driven in part by the rise of coding-native tools such…

Read More

OpenAI launches GPT-5.4 with native computer use mode, financial plugins for Microsoft Excel, Google Sheets

The AI updates aren’t slowing down. Literally two days after OpenAI launched a new underlying AI model for ChatGPT called GPT-5.3 Instant, the company has unveiled another, even more massive upgrade: GPT-5.4. Actually, GPT-5.4 comes in two varieties: GPT-5.4 Thinking and GPT-5.4 Pro, the latter designed for the most complex tasks. Both will be available…

Read More

Databricks built a RAG agent it says can handle every kind of enterprise search

Most enterprise RAG pipelines are optimized for one search behavior. They fail silently on the others. A model trained to synthesize cross-document reports handles constraint-driven entity search poorly. A model tuned for simple lookup tasks falls apart on multi-step reasoning over internal notes. Most teams find out when something breaks. Databricks set out to fix…

Read More

Black Forest Labs’ new Self-Flow technique makes training multimodal AI models 2.8x more efficient

To create coherent images or videos, generative AI diffusion models like Stable Diffusion or FLUX have typically relied on external “teachers”—frozen encoders like CLIP or DINOv2—to provide the semantic understanding they couldn’t learn on their own. But this reliance has come at a cost: a “bottleneck” where scaling up the model no longer yields better…

Read More

Microsoft built Phi-4-reasoning-vision-15B to know when to think — and when thinking is a waste of time

Microsoft on Tuesday released Phi-4-reasoning-vision-15B, a compact open-weight multimodal AI model that the company says matches or exceeds the performance of systems many times its size — while consuming a fraction of the compute and training data. The release marks the latest and most technically ambitious chapter in the software giant’s year-long campaign to prove…

Read More