This challenge evaluates self-supervised speech representation learning on real-world heterogeneous audio at scale. Participants train models without transcripts using the Unsupervised People's Speech (UPS) dataset.
Performance will be assessed via phonetic discrimination, low-resource ASR probes, multilingual language identification, and speaker clustering.
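As one illustration of how representations might be probed without transcripts, the following is a minimal sketch of a phonetic discrimination test in the ABX style: given embeddings A and X from one phonetic category and B from another, it measures how often X lies closer to A than to B. The exact metric, pooling, and distance used by the challenge are not specified here; frame-averaged embeddings and Euclidean distance are assumptions for illustration.

```python
import numpy as np

def euclidean(u, v):
    """Euclidean distance between two pooled embedding vectors."""
    return np.linalg.norm(u - v)

def abx_accuracy(a_embs, b_embs, x_embs):
    """Fraction of (A, B, X) triples where X is closer to A than to B.

    A and X come from the same phonetic category, B from a different one,
    so higher is better and 0.5 is chance level.
    """
    correct, total = 0, 0
    for a in a_embs:
        for b in b_embs:
            for x in x_embs:
                total += 1
                if euclidean(x, a) < euclidean(x, b):
                    correct += 1
    return correct / total

# Toy data: two well-separated synthetic "phone categories" in a
# 16-dimensional embedding space (stand-ins for model representations).
rng = np.random.default_rng(0)
cat_a = rng.normal(loc=0.0, size=(5, 16))
cat_b = rng.normal(loc=3.0, size=(5, 16))
x = rng.normal(loc=0.0, size=(5, 16))  # drawn from the same category as A

score = abx_accuracy(cat_a, cat_b, x)
print(f"ABX accuracy: {score:.2f}")
```

Because the toy categories are well separated, the score approaches 1.0; on real embeddings the gap between the score and the 0.5 chance baseline reflects how phonetically structured the learned representation is.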
Significance
This challenge advances speech research by moving beyond curated corpora, emphasizing multilingual and accent diversity in realistic acoustic conditions that include noise and music. By evaluating self-supervised models on spontaneous, heterogeneous audio, the challenge promotes scalable, inclusive representation learning aligned with the Interspeech 2026 theme "Speaking Together", encouraging models that generalize across global voices and real-world scenarios.