Microsoft open sources AI evaluation framework for enterprise agents – InfoWorld


Microsoft has open-sourced an AI evaluation framework that converts natural-language requirements into executable tests, expanding its push into enterprise AI governance as organizations struggle to systematically validate agent behavior before production deployments.
The framework, called ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing), generates evaluation scenarios, datasets, metrics, and scorecards from written specifications, product requirements, and governance documents, Microsoft said in a blog post announcing the release.
“Agents fail in ways that are hard to see,” Microsoft wrote in the blog post. “They drift from policy, produce unsafe outputs in edge cases, and behave differently in production than they did in testing. Generic benchmarks do not catch these failures because they are not built around your policies, your agent, or your use case.”
Rather than requiring developers to manually create evaluation suites, ASSERT translates written intent into reusable tests that can be integrated into AI development pipelines, the company said in the blog post.
With ASSERT, Microsoft is entering an increasingly competitive AI evaluation market that already includes platforms such as LangChain’s LangSmith, Braintrust, Patronus AI, Galileo, Arize AI’s Phoenix, and Promptfoo, which help enterprises benchmark, monitor, and validate large language model applications.
The release comes as enterprises rapidly expand AI agent deployments while formal evaluation practices remain the exception rather than the rule.
“Most organizations, in fact, 99% of them, do not evaluate any AI agents pre-production,” said Anushree Verma, senior director analyst at Gartner.
According to Verma, the industry’s next competitive advantage will depend less on advances in reasoning models than on how effectively organizations simulate and stress-test AI agents before deployment.
“The next competitive moat in agentic AI is not about the sophistication of reasoning models or the underlying architecture,” she said. “It will be about the depth and realism of the training environment through agentic simulation, particularly for mission-critical deployments.”
Gartner estimates that by 2029, more than 75% of domain-specific agents designed without agentic simulation in regulated industries will fail to deliver value.
Forrester sees enterprises moving toward behavioral evaluation but says most organizations have yet to make it a formal production requirement.
“Most enterprises are still in an intermediate stage where behavioral evaluation is inconsistently applied rather than treated as a formal production gate,” said Biswajeet Mahapatra, principal analyst at Forrester.
According to Forrester data, more than 45% of organizations are already using AI agents, and another 25% are piloting them, yet many continue to struggle with scaling because of immature governance and limited operational rigor.
“The net is that behavioral evaluation is becoming important, but for most organizations it is still ad hoc or tool-driven rather than a standardized release gate enforced across the lifecycle,” Mahapatra said.
Microsoft said ASSERT uses large language models as judges, with model-generated evaluations agreeing with human reviewers 80% to 90% of the time in the company’s internal validation.
That level of agreement can help automate large portions of AI testing, but should not be treated as a standalone governance mechanism, Mahapatra said.
“An 80% to 90% agreement rate with human reviewers indicates strong alignment but is not sufficient as a standalone control for governance or compliance,” he said.
Instead, enterprises should adopt layered oversight where AI evaluates AI at scale while humans retain supervisory accountability for high-risk, regulated, or ambiguous scenarios. Buyers should also watch for bias, consistency issues, and overreliance on a single model acting as both generator and evaluator, he added.
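To make the agreement figure concrete, the short Python sketch below compares a set of LLM-judge verdicts with human verdicts on the same cases. It is an illustration only: the labels are invented for the example, the function names are hypothetical, and nothing here is drawn from ASSERT's code or Microsoft's validation data. It reports raw percent agreement alongside Cohen's kappa, a standard statistic that discounts the agreement two raters would reach by chance.

```python
from collections import Counter


def percent_agreement(judge, human):
    """Fraction of cases where the judge and the human gave the same label."""
    if len(judge) != len(human) or not judge:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(j == h for j, h in zip(judge, human))
    return matches / len(judge)


def cohens_kappa(judge, human):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(judge)
    p_o = percent_agreement(judge, human)
    judge_counts = Counter(judge)
    human_counts = Counter(human)
    labels = set(judge_counts) | set(human_counts)
    # Chance agreement: probability both raters pick the same label
    # if each labelled independently at their own observed rates.
    p_e = sum(judge_counts[l] * human_counts[l] for l in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical verdicts for ten test cases (not real evaluation data).
judge_labels = ["pass", "pass", "fail", "pass", "fail",
                "pass", "pass", "fail", "pass", "pass"]
human_labels = ["pass", "pass", "fail", "pass", "pass",
                "pass", "pass", "fail", "pass", "fail"]

agreement = percent_agreement(judge_labels, human_labels)
kappa = cohens_kappa(judge_labels, human_labels)
print(f"percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```

When most cases share one label, as in this made-up sample, a high raw agreement rate can sit beside a noticeably lower kappa, which is one reason a headline agreement percentage may not be enough on its own for high-risk or regulated scenarios.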
Microsoft released ASSERT under the MIT open-source license, allowing organizations to inspect, modify, and integrate the framework into existing AI development workflows.
But open sourcing a framework does not eliminate questions around evaluation neutrality, Mahapatra said.
“Open sourcing under an MIT license reduces lock-in concerns and enables broad interoperability across model ecosystems,” he said. “However, it does not fully eliminate trust or conflict-of-interest questions because the originating vendor still influences how evaluation criteria, scoring logic, and definitions of acceptable behaviour are encoded.”
Instead of relying on a single evaluation framework, enterprises should validate AI systems against multiple evaluation approaches and retain ownership of internal evaluation policies, he said.
Gyana Swain is a seasoned technology journalist with over 20 years’ experience covering the telecom and IT space. He is a consulting editor with VARINDIA and earlier in his career, he held editorial positions at CyberMedia, PTI, 9dot9 Media, and Dennis Publishing. A published author of two books, he combines industry insight with narrative depth. Outside of work, he’s a keen traveler and cricket enthusiast. He earned a B.S. degree from Utkal University.
