This is the official repository for the paper "Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory". In this paper, we introduce PSN-IRT, a framework based on Item ...
Ready-to-use configurations for Anthropic's Claude Code. A comprehensive collection of AI agents, custom commands, settings, hooks, external integrations (MCPs), and project templates to enhance your ...