Oman-Speech: A Multi-Layer Annotated Speech Corpus for Omani Arabic Dialects
Authors/contributors
- Khadhuri, Rayyan S Al (Author)
- Mahrouqi, Firas Al (Author)
- Mandhari, Salim Al (Author)
- Kathiri, Amir al- (Author)
- Alshahri, Omar Said (Author)
- Alsaqr, Ghassab Mansoor (Author)
- Mudhsh, Badri Abdulhakim (Author)
- Fatnassi, Tarek (Author)
Title
Oman-Speech: A Multi-Layer Annotated Speech Corpus for Omani Arabic Dialects
Abstract
Automatic Speech Recognition (ASR) has achieved strong performance in high-resource languages; however, Dialectal Arabic remains significantly under-resourced. This gap is particularly evident in Oman, where Arabic exhibits substantial sociolinguistic variation shaped by settlement patterns between sedentary (Hadari) and nomadic (Badu) communities, which are often overlooked by urban-centric or generalized Gulf Arabic datasets. We introduce OMAN-SPEECH, a sociolinguistically stratified spoken corpus for Omani Arabic comprising approximately 40 hours of spontaneous and semi-spontaneous speech from 32 speakers across 11 Wilayats (provinces). The corpus is balanced to capture regional and lifestyle variation and is annotated at the sentence level with Arabic transcription, English translation, and phonetic transcription using the International Phonetic Alphabet (IPA) through a human-in-the-loop annotation pipeline. OMAN-SPEECH provides a foundational resource for evaluating ASR and related speech technologies on Omani and Gulf Arabic varieties and supports more granular modeling of regional dialectal variation.
Proceedings Title
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script (AbjadNLP 2026)
Conference Name
2nd Workshop on NLP for Languages Using Arabic Script (AbjadNLP 2026)
Publisher
Association for Computational Linguistics
Place
Kerrville
Date
2026
Pages
229-235
Citation Key
khadhuriOmanSpeechMultiLayerAnnotated2026
Language
eng
Citation
Khadhuri, R. S. A., Mahrouqi, F. A., Mandhari, S. A., Kathiri, A. al-, Alshahri, O. S., Alsaqr, G. M., Mudhsh, B. A., & Fatnassi, T. (2026). Oman-Speech: A Multi-Layer Annotated Speech Corpus for Omani Arabic Dialects. Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script (AbjadNLP 2026), 229–235.
Topic
Document