Towards developing colloquial Indonesian language pedagogy: A corpus analysis

Halim Nataprawira, Michael Carey


This study was motivated by the observation that many students of Indonesian have difficulty understanding and communicating in spoken Indonesian. Indonesian is a diglossic language in which the high and low variants use different sets of grammar and vocabulary, yet students are usually taught only the high variant. Only the high variant, formal Indonesian, has official status; the low variant, colloquial Indonesian, does not. Sneddon observed that in everyday speech the linguistic features of the high and low variants are merging into a middle variant that Errington called Middle Indonesian. This study examines the extent to which a middle variant of spoken Indonesian has formed by quantifying the high and low linguistic elements present in a corpus of everyday spoken Indonesian derived from audio recordings and written texts containing spoken language. We collected and classified a corpus of more than 14,000 words of spoken Indonesian. With reference to published descriptions of the high (formal) and low (colloquial) variants, each colloquial item in the corpus was counted and expressed as a ratio to the total word count (N) of the corpus. Colloquial features were found at an average proportion of 0.39 across the corpus, indicating that colloquial Indonesian lexicon and grammar may contribute as much as 39% of everyday spoken Indonesian. This result demonstrates the need to include this middle variant of spoken Indonesian in the design and resourcing of materials within the Indonesian language curriculum.
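The counting step described above can be illustrated with a minimal sketch. It assumes that each text is already tokenised into words, that colloquial items can be matched against a simple word list, and that the reported average is a mean of per-text proportions; none of these details is stated in the abstract. The function names and the three-word list are hypothetical and are not the study's classification scheme, which also covers grammatical features that a word list cannot capture.

```python
def colloquial_proportion(tokens, colloquial_items):
    """Return the share of tokens found in the colloquial item set."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in colloquial_items)
    return hits / len(tokens)


def mean_proportion(texts, colloquial_items):
    """Average the per-text proportions (an assumed reading of 'average')."""
    proportions = [colloquial_proportion(t, colloquial_items) for t in texts]
    return sum(proportions) / len(proportions)


# Hypothetical word list for illustration only.
COLLOQUIAL = {"gue", "nggak", "udah"}

informal = ["gue", "nggak", "tahu", "jawabannya"]
formal = ["saya", "tidak", "tahu", "jawabannya"]

print(colloquial_proportion(informal, COLLOQUIAL))    # 0.5
print(mean_proportion([informal, formal], COLLOQUIAL))  # 0.25
```

Under these assumptions, a corpus-wide value of 0.39 would mean that roughly two in every five counted items were classified as colloquial.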


Colloquial Indonesian; corpus analysis; diglossia; teaching spoken Indonesian


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.