Haze Seas, based on the One Piece universe, has a roster of Devil Fruits and Races that each bring a distinct playstyle and ...
New benchmarks show semantic code graphs helping coding agents find change locations faster and complete updates more ...
Abstract: Magnetic field sensors based on quantum effects have shown superior performance in sensitivity, wherein the nitrogen-vacancy (NV) center in diamond has emerged as a microfabrication ...
Skill Eval Harness is a Python CLI for testing whether an Agent Skill changes observable output. It reads evals/shared-benchmark.json, emits answer-key-safe task rows, grades files under eval-runs/, ...