Sociolinguistic Corpus of Englishes in Hong Kong (SCOEHK)

SCOEHK is a corpus consisting of two sub-databanks: one of online communication data and one of transcribed verbal communication data. The first will be compiled from public Tweets, gathered with an existing program I created, and from anonymized WhatsApp messages collected manually with the help of assistants. The second will be derived from transcribed recordings of discussions and narratives produced by 51 English-using residents of Hong Kong, stratified by region and ethnicity. Both sub-databanks will be tagged with social metadata using a combination of manual and computational methods.
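
To make the planned structure concrete, the following is a minimal sketch of how a single corpus entry and its social metadata might be represented. It is an illustration only, not the project's actual schema: the class name CorpusRecord, all field names, the helper count_by_stratum, and the placeholder values ("region_A", "group_X") are hypothetical and were chosen for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical labels for the two sub-databanks described above.
SUB_DATABANKS = {"online", "spoken"}


@dataclass
class CorpusRecord:
    """One entry in the corpus (illustrative schema, not the project's own)."""
    record_id: str
    sub_databank: str                # "online" or "spoken"
    source: str                      # e.g. "twitter", "whatsapp", "discussion", "narrative"
    text: str                        # anonymized message or transcript segment
    region: Optional[str] = None     # social metadata: stratification variable
    ethnicity: Optional[str] = None  # social metadata: stratification variable
    tagging_method: str = "manual"   # "manual" or "computational"

    def __post_init__(self) -> None:
        # Reject records that belong to neither sub-databank.
        if self.sub_databank not in SUB_DATABANKS:
            raise ValueError(f"unknown sub-databank: {self.sub_databank!r}")


def count_by_stratum(records: Iterable[CorpusRecord]) -> Counter:
    """Count spoken-data records per (region, ethnicity) stratum."""
    return Counter(
        (r.region, r.ethnicity) for r in records if r.sub_databank == "spoken"
    )


# Placeholder records; the values are invented labels, not real data.
example_records = [
    CorpusRecord("r1", "spoken", "narrative", "...", "region_A", "group_X"),
    CorpusRecord("r2", "spoken", "discussion", "...", "region_A", "group_X"),
    CorpusRecord("r3", "online", "twitter", "...", tagging_method="computational"),
]
print(count_by_stratum(example_records))
```

A count of this kind could be used to check that the spoken sub-databank stays balanced across the region and ethnicity strata as recordings are added.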