Programming Language Benchmarks

The Anoa-L01 Benchmark: Prompt-Based Zero-Shot Evaluation for Sulawesi’s Regional Languages Detection in LLMs

Abstract: In recent years, large language models (LLMs) have demonstrated impressive performance across a wide range of natural language processing tasks. However, their performance on low-resource ...

Security Boulevard

Cut your coding agent’s cost with Sonar Vortex

New benchmarks show semantic code graphs helping coding agents find change locations faster and complete updates more ...

Tech Times

Claude AI Beats Human Robotics Teams 20x: Anthropic Marks Physical AI Turn

Claude AI robotics benchmark shows Opus 4.7 finishing physical robot programming in 9 minutes, against 181 minutes for ...

Tech Times

Autonomous AI Coding Clears 60,000-Line Ceiling: MirrorCode Benchmark Released

AI coding benchmark MirrorCode published its full results June 26, showing Claude Opus 4.7 autonomously rebuilt a 60,000-line interpreter and scored 56% overall — completing tasks that take human ...

GitHub

LangArena: A Balanced Programming Language Benchmark Suite

The suite started with my original implementation in Crystal. AI tools assisted in translating it to other languages. Throughout this process, I reviewed and edited the implementation for semantic ...
