Bringing AI agents into the enterprise software development lifecycle is fast becoming the norm. As developers experiment with new platforms, organizations are exposed to potential security and orchestration failures. Systems that work in pilots may fail once the agents start working with real-time data.
Legacy tech giant IBM is one of several companies trying to address that gap by introducing more structure into how these workflows run. Yesterday, it announced the global launch of Bob, its AI-powered software development platform designed to write and test code across the development cycle. Bob is already in use by more than 80,000 IBM employees, after starting with just 100 internal users in summer 2025.
Bob introduces a structured layer that regularly pauses for human-led checkpoints. Even so, by harnessing AI models to perform agentic tasks, IBM says it has saved some teams up to 70% of time “on selected tasks…equaling an average time savings of 10 hours per week.”
Supported models include IBM’s own Granite series, Anthropic’s Claude, models from French AI firm Mistral and other smaller distilled models; Alibaba’s Qwen and other fully open source models are not included.
This approach reflects a shift in how enterprises want to pursue AI-led development: building systems that not only create applications but also execute complex, multi-step workflows without relying on a single model or a single orchestration framework. It provides a structured, guarded form of automation that keeps humans more central to the process and closes audit gaps.
Neal Sundaresan, general manager, Automation and AI at IBM, told VentureBeat in an exclusive interview that a large part of using AI for software development is being systematic.
“Model capability alone isn’t enough,” Sundaresan said. “How you deploy it, how you structure context, and how you keep humans in the loop is what determines whether AI actually delivers.”
That divide is shaping how enterprises choose AI tools: whether to prioritize flexibility and experimentation or reliability and auditability.
Varying approaches to AI-led development
A growing class of open or autonomous agent systems has pushed the boundaries of what developers can do. They can now run extended or stateful workflows without much human intervention.
The rise of OpenClaw showed enterprises how far experimentation can go, especially when agents are trained on local data and run in sandboxes. But it also exposed a trade-off between easier agent and workflow creation on one hand and security on the other.
Some companies have embraced this spirit of experimentation.
Enterprise providers like Nvidia chose to embrace OpenClaw-like systems by adding a fence around the sandbox environment that runs autonomous agents, using NemoClaw. Kilo launched Kilo Claw, aimed at providing security for autonomous agents. OpenAI, in its updated Agents SDK, added support for sandbox agent implementations that mirror a lot of the usage patterns of systems like OpenClaw.
Sundaresan said enterprises continue to experiment with how they want to approach coding and agent building. He doesn’t want to close the door on fully autonomous agents proactively completing tasks, but he believes enterprises will want to exercise more caution as well.
“If you tell me that the final answer will be OpenClaw, then we will get there,” he said. “But it’s better to open the gate slowly than say, ‘oops, how do I close it now?’”
Bob reflects that thought process, and the broader shift it signals for enterprises.
How Bob compares
Bob acts as a coding platform, but unlike similar products, it aims to standardize and govern the agent workflows created on it.
Tools like Cursor and Claude Code position the user at the beginning of the task: the user writes the prompts, chains the steps and debugs. LangGraph works similarly while also allowing teams to define agent flows.
The difference is not about capabilities but about control, and whether the system enterprises use explores potential solutions or delivers predictable execution.
With those tools, the human employee starts and ends the process. If the agent is unable to complete its task or makes a mistake, the problem is handled after the fact.
Bob, on the other hand, essentially pre-structures the development lifecycle into role-based stages. The agents will often check in with the user for approval at natural workflow checkpoints. Sundaresan said the idea is to combine the human and automated workflows.
What is becoming clear is that the next phase of enterprise AI no longer relies on model power, but rather on how well tools are designed to balance autonomy and control.
Pricing and availability
As mentioned previously, Bob is now available in all regions where IBM does business. IBM’s pricing structure for Bob consists of four primary subscription tiers per user/seat and is built around an internal credits system called “Bobcoins,” intended to keep usage transparent and predictable.
Bobcoins are set at a fixed rate of $0.50 USD per coin. Users consume them by performing specific actions, such as generating code, running commands or performing file operations. If a user exhausts their balance, they must upgrade their plan to continue using the service.
Here are the plans currently offered and how many Bobcoins the user obtains by subscribing to each tier.
- 30-day Free Trial providing 40 Bobcoins
- Pro plan at $20 per month with 40 Bobcoins
- Pro+ plan at $60 per month with 160 Bobcoins
- Ultra tier at $200 per month with 500 Bobcoins
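At the stated $0.50-per-Bobcoin rate, the paid tiers imply different effective per-coin prices. A quick illustrative calculation (plan names and figures taken from the list above; this is not an official IBM tool):

```python
# Effective price per Bobcoin for each paid tier, using the
# published monthly prices and coin allotments listed above.

def cost_per_bobcoin(price_usd: float, bobcoins: int) -> float:
    """Return the effective USD cost of one Bobcoin on a given plan."""
    return price_usd / bobcoins

PLANS = {
    "Pro": (20, 40),
    "Pro+": (60, 160),
    "Ultra": (200, 500),
}

for name, (price, coins) in PLANS.items():
    print(f"{name}: ${cost_per_bobcoin(price, coins):.3f} per Bobcoin")
```

Only the Pro tier works out to exactly the stated $0.50 rate; the larger allotments come out cheaper per coin ($0.375 on Pro+, $0.40 on Ultra).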
All standard plans provide access to core features including specialized agentic modes, literate coding, the Bob Shell for intelligent CLI workflows, and Model Context Protocol (MCP) integration.
While all individual plans are restricted to a single user, an Enterprise plan is available through sales contact, offering centralized team management, flexible role assignments, and the ability to distribute Bobcoins across an organization.
Enterprise subscribers receive additional benefits such as priority support and a dashboard to track entitlements and usage.