This repository contains all code for reproducing experiments from the paper "Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?" Given a BPE tokenizer, our attack infers ...
See the VS Code Tips wiki for a quick primer on getting started with VS Code.

Setting up the JDK

The extension requires JDK 17 or newer to run. Optionally, set a different JDK to compile and run ...