Official project repository for the Codec-SUPERB benchmark and codec-superb-tiny regression results.
Codec-SUPERB evaluates neural sound codec models through downstream speech, audio, and music tasks. The benchmark focuses on whether codec tokenization and reconstruction preserve task-relevant information, and it provides a unified protocol for comparing codec families under consistent evaluation conditions.
- Out-of-the-Box Interface: Intuitive API for easy integration and rapid experimentation with diverse codec models.
- Multi-Perspective Leaderboard: Comprehensive assessment across various speech processing dimensions, with rankings for competitive transparency.
- Standardized Environment: Ensures fair and consistent comparisons by using uniform testing conditions for all models.
- Unified Datasets: Curated collection of datasets testing a wide range of real-world speech processing scenarios.
- Batch Processing Support: Highly optimized batch encoding/decoding for significant performance speedups.
```bash
# Clone the repository
git clone https://github.com/voidful/Codec-SUPERB.git
cd Codec-SUPERB

# Install dependencies
pip install -r requirements.txt
```

```python
from SoundCodec import codec

# List all available codecs
print(codec.list_codec())

# Load a specific codec
model = codec.load_codec('encodec_24k_6bps')
```

```python
import torchaudio

# Load audio
waveform, sample_rate = torchaudio.load('sample_audio.wav')
data_item = {'audio': {'array': waveform.numpy()[-1], 'sampling_rate': sample_rate}}

# Extract discrete units
sound_unit = model.extract_unit(data_item).unit

# Reconstruct audio
reconstructed = model.synth(data_item, local_save=False)['audio']['array']
```

Codec-SUPERB supports efficient batch operations, typically providing a 3-5x performance improvement on GPU.
```python
# Prepare multiple samples
data_list = [
    {'audio': {'array': wave1, 'sampling_rate': 16000}},
    {'audio': {'array': wave2, 'sampling_rate': 16000}}
]

# Option 1: Batch extraction and decoding (recommended)
batch_extracted = model.batch_extract_unit(data_list)
batch_decoded = model.batch_decode_unit(batch_extracted)

# Option 2: Complete batch synthesis pipeline
results = model.batch_synth(data_list, local_save=False)
```

> **Tip:** Grouping samples by similar lengths can further optimize batch processing efficiency.
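A minimal sketch of that length-grouping idea, using NumPy zero arrays as stand-in waveforms; the `sort_into_length_batches` helper is illustrative and not part of the Codec-SUPERB API:

```python
import numpy as np

def sort_into_length_batches(data_list, batch_size=2):
    """Sort samples by waveform length, then chunk into batches so each
    batch contains similarly-sized waveforms (illustrative helper only)."""
    ordered = sorted(data_list, key=lambda d: len(d['audio']['array']))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# Dummy mono waveforms of varying lengths
samples = [
    {'audio': {'array': np.zeros(48000), 'sampling_rate': 16000}},
    {'audio': {'array': np.zeros(16000), 'sampling_rate': 16000}},
    {'audio': {'array': np.zeros(32000), 'sampling_rate': 16000}},
]
batches = sort_into_length_batches(samples, batch_size=2)
```

Each resulting batch can then be passed to `model.batch_extract_unit` or `model.batch_synth`, reducing the padding waste caused by mixing very short and very long clips in one batch.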
The website leaderboard is generated from reruns on `voidful/codec-superb-tiny`, a 6,000-row Hugging Face subset with Speech, Audio, and Music splits. The dataset download is about 3.2 GB, so prefer targeted model reruns and cleanup flags when working on shared disks.
Run targeted codecs directly on the tiny dataset without saving reconstructed audio files:
```bash
PYTHONPATH=. python3 scripts/benchmarking.py \
  --dataset voidful/codec-superb-tiny \
  --models llmcodec bigcodec_1k auv \
  --max_workers 1 \
  --output_dir results/codec-superb-tiny \
  --no_save_audio \
  --cleanup_cache
```

`--no_save_audio` prevents large `reconstructed_*` folders, `--cleanup_cache` removes the temporary `cache_original` directory after evaluation, and `--output_dir` keeps result JSON files under `results/codec-superb-tiny/`. Omit `--cleanup_cache` only when you intentionally want to reuse the cached original WAVs for another immediate run.
After benchmark JSON files are generated, rebuild the React data file:

```bash
python3 scripts/update_leaderboard.py
```

The updater scans `results/codec-superb-tiny/*codec-superb-tiny*evaluation_results*.json`, ignores temporary cache directories, excludes `llmcodec_abl_*` variants from the published leaderboard, and writes `web/src/results/data.js`.
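The updater's selection rules can be mimicked with standard-library matching; a minimal sketch over made-up filenames and model names (only the `*codec-superb-tiny*evaluation_results*.json` scan pattern and the `llmcodec_abl_*` exclusion are taken from the description above, the helpers themselves are illustrative):

```python
import fnmatch

# Scan pattern used when collecting result files (per the updater description).
FILE_PATTERN = '*codec-superb-tiny*evaluation_results*.json'

def select_result_files(filenames):
    """Keep only files matching the updater's scan pattern."""
    return [f for f in filenames if fnmatch.fnmatch(f, FILE_PATTERN)]

def published_models(model_names):
    """Drop llmcodec_abl_* ablation variants from the published leaderboard."""
    return [m for m in model_names if not fnmatch.fnmatch(m, 'llmcodec_abl_*')]

# Example filenames and model names (hypothetical)
files = select_result_files([
    'voidful_codec-superb-tiny_evaluation_results_2024.json',
    'datasets_voidful_codec-superb-tiny_synth_evaluation_results_2024.json',
    'readme.md',
])
models = published_models(['llmcodec', 'llmcodec_abl_nocfg', 'bigcodec_1k'])
```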
The committed tiny result artifacts live in results/codec-superb-tiny/. The generated website shows per-metric direction labels, BPS/TPS grouped visual analysis, and the full sortable table.
For larger sweeps where repeated metric runs are needed, you can still create a synthesized dataset first. This uses more disk space because each codec split is materialized.
```bash
PYTHONPATH=. python3 scripts/dataset_creator.py \
  --dataset voidful/codec-superb-tiny \
  --specific_codecs llmcodec bigcodec_1k auv

PYTHONPATH=. python3 scripts/benchmarking.py \
  --dataset datasets/voidful/codec-superb-tiny_synth \
  --models llmcodec bigcodec_1k auv \
  --output_dir results/codec-superb-tiny \
  --no_save_audio
```

- Locate the generated JSON file under `results/codec-superb-tiny/`, such as `voidful_codec-superb-tiny_evaluation_results*.json` for direct tiny reruns or `datasets_voidful_codec-superb-tiny_synth_evaluation_results*.json` for pre-synthesized reruns.
- Open a New Issue in this repository titled `New Benchmark Result: [Codec Name]`.
- Attach the JSON file or paste its content.
Certain codecs (e.g., s3tokenizer) focus on tokenization and do not support reconstruction. Codec-SUPERB handles these automatically:
- Benchmarking: Automatically skipped during reconstruction evaluation.
- API: Raises `NotImplementedError` if `decode_unit` is called, with clear messaging.
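Callers that mix reconstruction-capable and tokenization-only codecs can guard for this explicitly. A minimal sketch, using a stub class in place of a real loaded codec; only the `NotImplementedError` behavior of `decode_unit` is taken from the API note above, everything else is illustrative:

```python
class TokenizerOnlyCodec:
    """Stub standing in for a tokenization-only codec (e.g., s3tokenizer)."""
    def decode_unit(self, unit):
        raise NotImplementedError('This codec does not support reconstruction.')

def try_reconstruct(model, unit):
    """Attempt reconstruction, skipping gracefully for tokenization-only codecs."""
    try:
        return model.decode_unit(unit)
    except NotImplementedError as err:
        print(f'Skipping reconstruction: {err}')
        return None

result = try_reconstruct(TokenizerOnlyCodec(), unit=[1, 2, 3])
```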
```bash
# Run all tests
python -m pytest SoundCodec/test/

# Verify all codecs (initialization & synthesis)
PYTHONPATH=. python3 scripts/check_all_codecs.py
```

If you use Codec-SUPERB in your research, please cite:

ACL:
```bibtex
@inproceedings{wu-etal-2024-codec,
    title = "Codec-{SUPERB}: An In-Depth Analysis of Sound Codec Models",
    author = "Wu, Haibin and Chung, Ho-Lam and Lin, Yi-Cheng and Wu, Yuan-Kuei and Chen, Xuanjun and Pai, Yu-Chi and Wang, Hsiu-Hsuan and Chang, Kai-Wei and Liu, Alexander and Lee, Hung-yi",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    year = "2024",
    url = "https://aclanthology.org/2024.findings-acl.616",
    doi = "10.18653/v1/2024.findings-acl.616",
    pages = "10330--10348",
}
```

SLT:

```bibtex
@article{wu2024codec,
    title={Codec-superb: An in-depth analysis of sound codec models},
    author={Wu, Haibin and Chung, Ho-Lam and Lin, Yi-Cheng and Wu, Yuan-Kuei and Chen, Xuanjun and Pai, Yu-Chi and Wang, Hsiu-Hsuan and Chang, Kai-Wei and Liu, Alexander H and Lee, Hung-yi},
    journal={arXiv preprint arXiv:2402.13071},
    year={2024}
}
```

Arxiv:

```bibtex
@article{wu2024towards,
    title={Towards audio language modeling-an overview},
    author={Wu, Haibin and Chen, Xuanjun and Lin, Yi-Cheng and Chang, Kai-wei and Chung, Ho-Lam and Liu, Alexander H and Lee, Hung-yi},
    journal={arXiv preprint arXiv:2402.13236},
    year={2024}
}
```

Contributions are highly encouraged! See CONTRIBUTING.md for details.

This project is licensed under the MIT License.
