
Codec-SUPERB: Sound Codec Speech Processing Universal Performance Benchmark

Overview


Official project repository for the Codec-SUPERB benchmark and codec-superb-tiny regression results.


πŸ“– Introduction

Codec-SUPERB evaluates neural sound codec models through downstream speech, audio, and music tasks. The benchmark focuses on whether codec tokenization and reconstruction preserve task-relevant information, and it provides a unified protocol for comparing codec families under consistent evaluation conditions.


✨ Key Features

  • πŸš€ Out-of-the-Box Interface: Intuitive API for easy integration and rapid experimentation with diverse codec models.
  • πŸ“Š Multi-Perspective Leaderboard: Comprehensive assessment across various speech processing dimensions with rankings for competitive transparency.
  • πŸ—οΈ Standardized Environment: Ensures fair and consistent comparisons by using uniform testing conditions for all models.
  • πŸ“š Unified Datasets: Curated collection of datasets testing a wide range of real-world speech processing scenarios.
  • ⚑ Batch Processing Support: Highly optimized batch encoding/decoding for significant performance speedups.

πŸ› οΈ Installation

# Clone the repository
git clone https://github.com/voidful/Codec-SUPERB.git
cd Codec-SUPERB

# Install dependencies
pip install -r requirements.txt

πŸš€ Quick Start

List and Load Codecs

from SoundCodec import codec

# List all available codecs
print(codec.list_codec())

# Load a specific codec
model = codec.load_codec('encodec_24k_6bps')

Single Audio Processing

import torchaudio

# Load audio
waveform, sample_rate = torchaudio.load('sample_audio.wav')
# torchaudio returns (channels, frames); take the first channel as a 1-D array
data_item = {'audio': {'array': waveform.numpy()[0], 'sampling_rate': sample_rate}}

# Extract discrete units
sound_unit = model.extract_unit(data_item).unit

# Reconstruct audio
reconstructed = model.synth(data_item, local_save=False)['audio']['array']

⚑ Advanced Usage: Batch Processing

Codec-SUPERB supports efficient batch operations, typically yielding a 3-5x speedup on GPU.

# Prepare multiple samples
data_list = [
    {'audio': {'array': wave1, 'sampling_rate': 16000}},
    {'audio': {'array': wave2, 'sampling_rate': 16000}}
]

# Option 1: Batch extraction and decoding (Recommended)
batch_extracted = model.batch_extract_unit(data_list)
batch_decoded = model.batch_decode_unit(batch_extracted)

# Option 2: Complete batch synthesis pipeline
results = model.batch_synth(data_list, local_save=False)

Tip

Grouping samples by similar lengths can further optimize batch processing efficiency.
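
The length-grouping tip can be sketched as a small sort helper. This is a hypothetical utility, not part of the Codec-SUPERB API; it only assumes data items follow the dict layout shown above:

```python
def sort_by_length(data_list):
    # Order samples by waveform length so neighboring batches contain
    # similarly sized items, reducing padding overhead during encoding.
    return sorted(data_list, key=lambda item: len(item['audio']['array']))

# Example with three dummy samples of different lengths
samples = [
    {'audio': {'array': [0.0] * 48000, 'sampling_rate': 16000}},
    {'audio': {'array': [0.0] * 16000, 'sampling_rate': 16000}},
    {'audio': {'array': [0.0] * 32000, 'sampling_rate': 16000}},
]
ordered = sort_by_length(samples)
lengths = [len(s['audio']['array']) for s in ordered]
# lengths == [16000, 32000, 48000]
```

Slicing the sorted list into fixed-size chunks then yields batches whose members need little padding.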


🎯 Benchmarking & Leaderboard

The website leaderboard is generated from reruns on voidful/codec-superb-tiny, a 6,000-row Hugging Face subset with Speech, Audio, and Music splits. The dataset download is about 3.2 GB, so prefer targeted model reruns and cleanup flags when working on shared disks.

Tiny Rerun Command

Run targeted codecs directly on the tiny dataset without saving reconstructed audio files:

PYTHONPATH=. python3 scripts/benchmarking.py \
    --dataset voidful/codec-superb-tiny \
    --models llmcodec bigcodec_1k auv \
    --max_workers 1 \
    --output_dir results/codec-superb-tiny \
    --no_save_audio \
    --cleanup_cache

  • --no_save_audio prevents large reconstructed_* folders.
  • --cleanup_cache removes the temporary cache_original directory after evaluation.
  • --output_dir keeps result JSON files under results/codec-superb-tiny/.

Omit --cleanup_cache only when you intentionally want to reuse the cached original WAVs for another immediate run.

Update the Web Leaderboard

After benchmark JSON files are generated, rebuild the React data file:

python3 scripts/update_leaderboard.py

The updater scans results/codec-superb-tiny/*codec-superb-tiny*evaluation_results*.json, ignores temporary cache directories, excludes llmcodec_abl_* variants from the published leaderboard, and writes web/src/results/data.js.
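
The selection rules above can be illustrated with a small filename filter. This is a rough sketch of the described behavior, not the actual updater code, and matching ablation variants by filename prefix is an assumption:

```python
import fnmatch

def select_result_files(filenames):
    # Keep JSONs matching the tiny-dataset results pattern and drop
    # llmcodec_abl_* ablation variants, mirroring the rules described above.
    kept = []
    for name in filenames:
        if not fnmatch.fnmatch(name, '*codec-superb-tiny*evaluation_results*.json'):
            continue
        if name.startswith('llmcodec_abl_'):
            continue
        kept.append(name)
    return kept

files = [
    'voidful_codec-superb-tiny_evaluation_results_encodec.json',
    'llmcodec_abl_v2_codec-superb-tiny_evaluation_results.json',
    'notes.txt',
]
# select_result_files(files) keeps only the first entry
```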

Current Tiny Rerun Results

The committed tiny result artifacts live in results/codec-superb-tiny/. The generated website shows per-metric direction labels, BPS/TPS grouped visual analysis, and the full sortable table.

Optional Pre-Synthesized Workflow

For larger sweeps where repeated metric runs are needed, you can still create a synthesized dataset first. This uses more disk space because each codec split is materialized.

PYTHONPATH=. python3 scripts/dataset_creator.py \
    --dataset voidful/codec-superb-tiny \
    --specific_codecs llmcodec bigcodec_1k auv

PYTHONPATH=. python3 scripts/benchmarking.py \
    --dataset datasets/voidful/codec-superb-tiny_synth \
    --models llmcodec bigcodec_1k auv \
    --output_dir results/codec-superb-tiny \
    --no_save_audio

Submit Results

  1. Locate the generated JSON file under results/codec-superb-tiny/, such as voidful_codec-superb-tiny_evaluation_results*.json for direct tiny reruns or datasets_voidful_codec-superb-tiny_synth_evaluation_results*.json for pre-synthesized reruns.
  2. Open a New Issue in this repository titled New Benchmark Result: [Codec Name].
  3. Attach the JSON file or paste its content.

πŸ›‘οΈ Encode-Only Codec Support

Certain codecs (e.g., s3tokenizer) focus on tokenization and do not support reconstruction. Codec-SUPERB handles these automatically:

  • Benchmarking: Automatically skipped during reconstruction evaluation.
  • API: Raises NotImplementedError if decode_unit is called, with clear messaging.
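
A caller that must work across both codec types can guard the decode step. A minimal sketch, where EncodeOnlyCodec is a stand-in class rather than a real library codec:

```python
class EncodeOnlyCodec:
    # Stand-in for a tokenize-only codec such as s3tokenizer.
    def decode_unit(self, unit):
        raise NotImplementedError("This codec supports encoding only.")

def try_decode(model, unit):
    # Return reconstructed audio, or None when the codec cannot decode.
    try:
        return model.decode_unit(unit)
    except NotImplementedError:
        return None

result = try_decode(EncodeOnlyCodec(), [1, 2, 3])
# result is None for encode-only codecs
```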

πŸ§ͺ Testing

# Run all tests
python -m pytest SoundCodec/test/

# Verify all codecs (Initialization & Synthesis)
PYTHONPATH=. python3 scripts/check_all_codecs.py

πŸ“ Citation

If you use Codec-SUPERB in your research, please cite the ACL paper:

@inproceedings{wu-etal-2024-codec,
    title = "Codec-{SUPERB}: An In-Depth Analysis of Sound Codec Models",
    author = "Wu, Haibin and Chung, Ho-Lam and Lin, Yi-Cheng and Wu, Yuan-Kuei and Chen, Xuanjun and Pai, Yu-Chi and Wang, Hsiu-Hsuan and Chang, Kai-Wei and Liu, Alexander and Lee, Hung-yi",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    year = "2024",
    url = "https://aclanthology.org/2024.findings-acl.616",
    doi = "10.18653/v1/2024.findings-acl.616",
    pages = "10330--10348",
}

SLT:

@article{wu2024codec,
  title={Codec-superb: An in-depth analysis of sound codec models},
  author={Wu, Haibin and Chung, Ho-Lam and Lin, Yi-Cheng and Wu, Yuan-Kuei and Chen, Xuanjun and Pai, Yu-Chi and Wang, Hsiu-Hsuan and Chang, Kai-Wei and Liu, Alexander H and Lee, Hung-yi},
  journal={arXiv preprint arXiv:2402.13071},
  year={2024}
}

arXiv:

@article{wu2024towards,
  title={Towards audio language modeling-an overview},
  author={Wu, Haibin and Chen, Xuanjun and Lin, Yi-Cheng and Chang, Kai-wei and Chung, Ho-Lam and Liu, Alexander H and Lee, Hung-yi},
  journal={arXiv preprint arXiv:2402.13236},
  year={2024}
}

🀝 Contribution & License

Contributions are highly encouraged! See CONTRIBUTING.md for details. This project is licensed under the MIT License.


Developed with ❀️ by the Codec-SUPERB Team
