LLM Model Benchmarks - Search News

6don MSN

Google's Gemini 3 Flash model outperforms GPT-5.2 in some benchmarks

Almost exactly a month after the debut of Gemini 3 Pro in November, Google has begun rolling out the more efficient Flash ...

SiliconANGLE

MLCommons releases new AILuminate benchmark for measuring AI model safety

MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.

19d

Nvidia's new AI framework trains an 8B model to manage tools like a pro

Instead of a single, massive LLM, Nvidia's new 'orchestration' paradigm uses a small model to intelligently delegate tasks to a team of tools and specialized models.

Business Wire

Cognite Launches the Cognite Atlas AI™ LLM & SLM Benchmark Report for Industrial Agents

AUSTIN, Texas & OSLO, Norway--(BUSINESS WIRE)--Cognite, the global leader in AI for industry, today announced the launch of the Cognite Atlas AI™ LLM & SLM Benchmark Report for Industrial Agents. The ...

Geeky Gadgets

DeepSeek-v2.5 open source LLM performance tested – Beats Claude 3, GPT-4o and Google Gemini

The development of DeepSeek v2.5 involved the fusion of two highly capable models: DeepSeek version 2 0628 and DeepSeek Coder version 2 0724. By combining the strengths of these models, DeepSeek v2.5 ...

Tech Xplore on MSN

Squashing 'fantastic bugs' hidden in AI benchmarks

After reviewing thousands of benchmarks used in AI development, a Stanford team found that 5% could have serious flaws with ...

insideHPC

MLCommons Launches LLM Safety Benchmark

Dec. 4, 2024 — MLCommons today released AILuminate, a safety test for large language models. The v1.0 benchmark – which provides a series of safety grades for the most widely-used LLMs – is the first ...

The Economist

GPT, Claude, Llama? How to tell which AI model is best

When Meta, the parent company of Facebook, announced its latest open-source large language model (LLM) on July 23rd, it claimed that the most powerful version of Llama 3.1 had “state-of-the-art ...

SiliconANGLE

Google updates Gemini 2.5 LLM series with new entry-level model, pricing changes

Google LLC today introduced a new large language model, Gemini 2.5 Flash-Lite, that can process prompts faster and more cost-efficiently than its predecessor. The algorithm is rolling out as part of a ...

EurekAlert!

Release of “Fugaku-LLM” – a large language model trained on the supercomputer “Fugaku”

A team of researchers in Japan released Fugaku-LLM, a large language model with enhanced Japanese language capability, using the RIKEN supercomputer Fugaku. A team of researchers in Japan released ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results