  • wdwgonzales

Twitter Corpus of Philippine Englishes (TCOPE)

Updated: Sep 16, 2023

The Twitter Corpus of Philippine Englishes (TCOPE) is a 135-million-word corpus created from roughly 27 million public tweets sampled from 29 major cities in the Philippines. It is now available for download via OSF. The corpus:


  • Contains the following metadata (as reflected in tags)

    • geographical region

    • Twitter user ID

    • month of tweet

    • year of tweet

    • unique corpus line ID

  • Has different formats

    • hierarchical text format (txt): primed for concordance software such as AntConc and CasualConc

    • spreadsheet format (csv): primed for analysis using popular data analysis tools such as R and Python

  • Is tagged for part-of-speech using spaCy

  • Contains dependency parsing information derived from spaCy
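As a rough illustration of working with the csv format in Python, the sketch below counts tweets per metadata category using only the standard library. The column names (`line_id`, `region`, `user_id`, `month`, `year`, `text`) and the three sample rows are invented for this example; check the header of the file you download, since the actual names may differ.

```python
import csv
import io
from collections import Counter

# Invented sample standing in for a TCOPE csv file. The column names are
# assumptions for illustration, not the corpus's documented header.
SAMPLE = """line_id,region,user_id,month,year,text
1,Manila,u01,3,2019,I learnt it yesterday
2,Cebu,u02,7,2020,this one is more better
3,Manila,u03,11,2020,we based from the report
"""


def count_by(rows, *keys):
    """Count rows for each combination of the given metadata columns."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


# With a real file, replace io.StringIO(SAMPLE) with
# open(path, newline="", encoding="utf-8").
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

by_region = count_by(rows, "region")
by_region_year = count_by(rows, "region", "year")

print(by_region)
print(by_region_year)
```

Note that `csv.DictReader` returns every value as a string, so the year is the string `"2020"` rather than an integer. For a corpus of this size, a dataframe library or chunked reading is likely more practical than loading every row into a list.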

Overview paper

I published an overview paper on TCOPE in English World-Wide in 2023. In that paper, I first discuss the considerations that went into TCOPE’s design, the compilation procedure, the format, and access. Then, I demonstrate how the corpus can be used to examine the linguistic features of Philippine English (PhilE) as well as the relationship between these features and language-internal and language-external factors (e.g., ethno-geographic region, time, age, sex). The paper focuses on four documented PhilE features: (1) the use of the irregular past tense morpheme -t, (2) double comparatives, (3) subjunctive were in subordinate counterfactual clauses, and (4) the phrasal verb base from.

A distributional analysis of these features that did not consider other factors generally showed patterns similar to those reported in previous work. A deeper analysis of the data using Bayesian multivariate regression revealed structured heterogeneity within PhilE, pointing to the multifaceted and dynamic nature of the variety. Because of its large size, its sampling distribution, and its availability in different formats, TCOPE can be used to investigate ‘general’ contemporary PhilE as well as different types of variation within PhilE. It can broaden horizons in the diachronic and sociolinguistic study of Philippine English(es).
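To give a sense of what a simple distributional count for feature (1) might look like, here is a minimal sketch that tallies -t and -ed variants of a few verbs in a list of tweets. It is not the method used in the paper: the verb pairs and the sample tweets are invented for illustration, and matching on word form alone cannot separate past tense uses from adjectival ones (e.g., "burnt toast"), which is where the corpus's part-of-speech tags would come in.

```python
import re
from collections import Counter

# Illustrative verb pairs only; not the list used in the overview paper.
PAIRS = {
    "burnt": "burned",
    "learnt": "learned",
    "dreamt": "dreamed",
    "spelt": "spelled",
}
T_FORMS = set(PAIRS)
ED_FORMS = set(PAIRS.values())


def count_variants(texts):
    """Count -t and -ed variant tokens across an iterable of texts."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in T_FORMS:
                counts["t"] += 1
            elif token in ED_FORMS:
                counts["ed"] += 1
    return counts


# Invented example tweets.
tweets = [
    "I learnt a lot today",
    "She learned it fast",
    "The toast got burnt, totally burnt",
]

counts = count_variants(tweets)
share_t = counts["t"] / (counts["t"] + counts["ed"])
print(counts, share_t)
```

The same counting function could be applied per region or per year by grouping the tweets on the corpus metadata first, which gives raw proportions to compare before any regression modelling.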

You can find the article here.

To Cite

Please cite the overview paper if you use my corpus or mention it in your work.

Gonzales, Wilkinson Daniel Wong. In press. Broadening horizons in the diachronic and sociolinguistic study of Philippine English with the Twitter Corpus of Philippine Englishes (TCOPE). English World-Wide, John Benjamins.
