A Corpus of Emirati Arabic

We have created a parallel corpus of Emirati Arabic, the main dialect of Arabic spoken in the United Arab Emirates, together with its regional variations. This is, to our knowledge, the first systematic attempt to create a database of a spoken Arabic dialect in the Gulf region. The corpus contains two million words of Emirati Arabic, transcribed in broad IPA and paired with English translations. The material is spoken language drawn from broadcasters in the region, including TV and radio stations. At a later stage it will be annotated using established, standardized techniques so that the output can be used in a variety of contexts. The spoken corpus will enable researchers to base their inquiries about language on objective data rather than on speakers' introspection.

Sample Files

EAC001

EAC026

If you would like to use this corpus for research purposes, please return a signed copy of the license agreement below to me at dimitrios_n[at]uaeu.ac.ae, or directly to the UAEU Research Affairs office at research.office[at]uaeu.ac.ae.

License Agreement