Programming Language Benchmarks

Autonomous AI Coding Clears 60,000-Line Ceiling: MirrorCode Benchmark Released

AI coding benchmark MirrorCode published its full results June 26, showing Claude Opus 4.7 autonomously rebuilt a 60,000-line interpreter and scored 56% overall — completing tasks that take human ...

GitHub

LangArena: A Balanced Programming Language Benchmark Suite

The suite started with my original implementation in Crystal. AI tools assisted in translating it to other languages. Throughout this process, I reviewed and edited the implementation for semantic ...

Small Language Models Outperform Frontier AI On Cost, Speed And Accuracy

Bigger has defined AI from day one. New data says task-specific small models beat frontier LLMs on accuracy, cost and speed — ...

Slator

AI Translation’s Key Benchmark Takes Aim at Low-Resource Languages

The Eleventh Conference on Machine Translation (WMT26) has moved into its active evaluation phase, with test data releases and submission windows now opening across several of the conference’s shared ...

10d

Cecuro reaches 91.45% detection rate on EVMBench security benchmark

Cecuro reports 91.45% vulnerability detection on EVMBench, the independent benchmark from OpenAI and Paradigm, up from 87.17% and more than double the best general-purpose frontier model ...

MIT Technology Review

A startup claims it broke through a bottleneck that’s holding back LLMs

The Miami-based AI startup Subquadratic came out of stealth mode last month with a huge claim. It announced that it had ...

IEEE

Large Language Models Versus Static Code Analysis Tools: A Systematic Benchmark for Vulnerability Detection

Abstract: Modern software relies on a multitude of automated testing and quality assurance tools to prevent errors, bugs and potential vulnerabilities. This study sets out to provide a head-to-head, ...

TechRepublic

The 5 Best Online C Programming Courses for 2026

Learning to program in C on an online platform can provide structured learning and a certification to show along with your resume. Learning C can still be useful in 2026, especially if you want to ...

17d

Z.ai’s open-weights GLM-5.2 beats GPT-5.5 on multiple long-horizon coding benchmarks for 1/6th the cost

It allows engineering teams to host frontier-level AI on their own sovereign infrastructure, entirely eliminating vendor lock ...

Medical Xpress

Multilingual benchmark evaluates how well AI interprets clinical text and health records in nine languages

Researchers at Mass General Brigham recently developed BRIDGE, a multilingual benchmark that evaluates how well large language models (LLMs) understand clinical patient care text, including language ...

Senior Housing News

Benchmark Revamps Memory Care Programming With Montessori Methods, Eye on Reducing Medications

Benchmark Senior Living is emphasizing personal connection and supporting more nonpharmacological interventions in memory care through new programming and engagement. The Waltham, Massachusetts-based ...
