Intellectual hub of the topic

AI benchmarks

AI: Events

16 AI Models, 9,000+ Documents: Who Came Out on Top?

Products

A large-scale test of 16 AI models on real-world documents revealed surprising results: expensive solutions don't always outperform their more affordable counterparts.

Nanonets (nanonets.com), Mar 20, 2026

AI: Events

How to Measure Our Proximity to True AI: Google DeepMind Proposes a New Framework

Research

Google DeepMind has introduced a cognitive framework for assessing progress toward artificial general intelligence (AGI) and launched a Kaggle hackathon to develop relevant benchmarks.

Google DeepMind (deepmind.google), Mar 19, 2026

AI: Events

Inference: Why a Single Metric Can't Judge an AI Accelerator

Infrastructure

AMD explains why comparing AI accelerators using a single performance metric is misleading and advocates for a multi-dimensional evaluation approach.

AMD (www.amd.com), Mar 19, 2026

AI: Events

M4-RAG: When AI Seeks Answers in Images, Not Just Text, and Across Multiple Languages

Research

Researchers have introduced M4-RAG, a large-scale benchmark for evaluating systems that answer questions about images by drawing on external knowledge and operating in multiple languages.

Capital One (www.capitalone.com), Mar 17, 2026

AI: Events

Sber Can Now Test Whether AI Can Truly Peer Into the Future

Research

Sber researchers have launched an open-source platform for objectively assessing how accurately AI models can predict chains of events over long time horizons.

SberLabs (sberlabs.com), Mar 16, 2026

Lab

A Voice at the Appointment: Why AI Can't Make Out the Doctor

Electrical Engineering & System Sciences

Researchers tested whether AI systems can comprehend real-world medical conversations, and the results delivered a harsh verdict for the entire industry.

Dr. Alexey Petrov, Mar 11, 2026

AI: Events

Spatial Orientation: Can AI Models Handle What We Take for Granted?

Research

Stanford researchers tested leading AI models on their ability to navigate space and found surprisingly poor results.

Stanford AI Laboratory (ai.stanford.edu), Mar 5, 2026

AI: Events

EDiTh: How to Test Corporate Search Without Revealing Company Secrets

Products

LightOn has released EDiTh, an open-source benchmark that allows testing corporate search on realistic documents without the risk of leaking confidential data.

LightOn AI (www.lighton.ai), Mar 4, 2026

AI: Events

OpenHands Index: How Developers Are Improving the Evaluation of AI Coding Agents

Research

The OpenHands team explains how their benchmark for evaluating AI agents works and why conventional metrics don't always reflect the true picture.

OpenHands (openhands.dev), Feb 21, 2026
