This is the official repository for the paper "Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory". In this paper, we introduce PSN-IRT, a framework based on Item ...
Ready-to-use configurations for Anthropic's Claude Code. A comprehensive collection of AI agents, custom commands, settings, hooks, external integrations (MCPs), and project templates to enhance your ...