Topic #agent benchmarking

AI: Events

Holo3: A New Record in AI-Powered Computer Control

Technical context • Products

H Company has introduced Holo3, an agent model that set a record on a key computer-operation benchmark and is designed for autonomous work in corporate environments.

Hugging Face • huggingface.co • Apr 2, 2026

AI: Events

Holo3: A New Record for AI Agents That Operate Computers

Products

H Company has announced the release of Holo3, a model that has set a new record on the leading benchmark for AI agents that operate computers.

H Company • hcompany.ai • Mar 31, 2026

AI: Events

When an Agent Doesn't Know the Answer: How Retrieval Models Are Learning to Find the Unreachable

Products

Mixedbread has released Search v3 – a retrieval model that significantly narrows the gap between what an agent actually finds and what is theoretically discoverable within the data.

Mixedbread • www.mixedbread.com • Mar 25, 2026

AI: Events

MolmoWeb: An Open AI Agent for Autonomous Web Browsing

Products

The Allen Institute has introduced MolmoWeb, an open-source web agent. It navigates browsers visually, much like a human, and outperforms many proprietary competitors.

Ai2 • allenai.org • Mar 25, 2026

AI: Events

coSTAR: How Databricks Launches AI Agents Quickly and Reliably

Development

Databricks has developed its own approach to creating AI agents – the coSTAR system, which allows the team to work quickly without losing control over quality.

Databricks • www.databricks.com • Mar 22, 2026

AI: Events

Assessing AI Agent Skills: What to Look For

Development

We explore why assessing AI agents' skills isn't just a formality, but a crucial step toward building systems you can trust with real-world tasks.

OpenHands • openhands.dev • Mar 18, 2026

AI: Events

How to Tell if Your AI Agent Is Actually Working or Just Looking Convincing

Development

LightOn has introduced the NOVA evaluation system. We explore how it works and why a "gut feeling" isn't enough to verify AI agents.

LightOn AI • www.lighton.ai • Mar 12, 2026

AI: Events

OpenAI and Federal Permits: How AI Is Accelerating One of the Slowest U.S. Bureaucratic Systems

Regulation

In partnership with a national laboratory, OpenAI has developed a tool for evaluating AI agents intended to speed up federal approvals, and it is already seeing the first measurable results.

OpenAI • openai.com • Mar 6, 2026

AI: Events

A Powerful AI Agent Without the Cloud: How LFM2-24B-A2B Runs Directly on Your Computer

Products

Liquid AI has introduced the LFM2-24B-A2B model, capable of running AI agents with tool-calling capabilities directly on consumer hardware, without the cloud or the latency that comes with it.

Liquid • www.liquid.ai • Mar 6, 2026