Center for Robust Speech Systems



This corpus is designed to support English dialect research. It includes three dialects: Australian English (AU), United Kingdom English (UK), and American English (US). It provides both audio and text, with the aim of supporting acoustic and language modeling for accent research. The data was collected from public sources on the internet and is intended for academic research only; it is not to be used for commercial purposes or applications.


  • John H.L. Hansen, Gang Liu, “Unsupervised accent classification for deep data fusing of acoustic and language information”, Speech Communication, Volume 78, April 2016, pages 19-33.
  • Rahul Chitturi, John H.L. Hansen, “Dialect Classification for Online Podcasts Fusing Acoustic and Language Based Structural and Semantic Information”, ACL-08: HCT: Association for Computational Linguistics (ACL): Human Communication Technologies Conf., pp. 21-24, Columbus, Ohio, June 15-20, 2008.