Corpus of Singapore English Messages (CoSEM)

wdwgonzales
Feb 24, 2023

The Corpus of Singapore English Messages (CoSEM) is a monitor corpus of online text messages collected between 2016 and 2022, compiled and managed by a group of scholars who share an interest in Colloquial Singapore English (CSE) research. It is available via GitHub:

https://github.com/wdwgonzales/CoSEM

Collection methodology

Please check out our overview paper.

Overview paper

In 2021, we published a paper that explains the motivations behind developing a new corpus for the investigation of CSE. It documents the process of compiling and organizing CoSEM and describes the corpus’s initial structure and composition. We further discuss the social variables used in tagging the data, as well as the ethical challenges, advantages, and disadvantages unique to online message datasets. In addition, we present preliminary analyses of two selected CSE features: (1) the Hokkien-derived expression (bo)jio and (2) sentence-final adverbs (already, also, only). We conclude the article with notes on future directions.

The paper can be found via its DOI: https://doi.org/10.1111/weng.12534

To Cite

Please cite the overview paper if you use our corpus or mention it in your work.

Gonzales, Wilkinson Daniel Wong, Mie Hiramoto, Jakob Leimgruber, and Jun Jie Lim. 2021. The Corpus of Singapore English Messages (CoSEM). World Englishes, Wiley. https://doi.org/10.1111/weng.12534

/wdwg
