Language Identification of English and Punjabi Code-Mixing and Code-Switching Sentences

Enjula Uchoi, Mandeep Kaur
doi: 10.48047/ecb/2023.12.si6.367


Because social media has become such an integral part of everyday life, people express their opinions freely on these platforms in a variety of informal languages. As a result, traditional language detectors find it challenging to recognise such languages in a multilingual nation like India. The primary goal of this research is to identify the language of each word in code-mixed English and Punjabi sentences. To the best of our knowledge, very little research has been done so far on English and Punjabi code-mixed and code-switched sentences. The proposed model combines a language-dependent, morphological dictionary-based model with a character n-gram language model built from frequency lexicons to classify each word. With a small dataset, we achieved an accuracy of 88%.
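The combination described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Romanized Punjabi input, tiny made-up lexicons, character trigrams with add-one smoothing, and a simple rule in which an unambiguous dictionary match decides the label and the n-gram model is used only as a fallback. The names `char_ngrams`, `NgramModel`, `identify`, and `tag_sentence`, and the labels `en` and `pa`, are hypothetical.

```python
import math
from collections import Counter

# Toy lexicons, invented for illustration only (not the paper's data).
EN_WORDS = ["the", "water", "house", "good", "where",
            "is", "not", "how", "you", "going"]
PA_WORDS = ["tusi", "kiven", "nahi", "changa", "kithe",
            "vadhiya", "paani", "ghar", "main", "hai"]


def char_ngrams(word, n=3):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"^{word.lower()}$"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramModel:
    """Character n-gram frequency model with add-one smoothing."""

    def __init__(self, words, n=3):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 reserves mass for unseen n-grams

    def score(self, word):
        """Average smoothed log-probability of the word's n-grams."""
        grams = char_ngrams(word, self.n)
        denom = self.total + self.vocab
        return sum(math.log((self.counts[g] + 1) / denom) for g in grams) / len(grams)


DICTS = {"en": set(EN_WORDS), "pa": set(PA_WORDS)}
MODELS = {"en": NgramModel(EN_WORDS), "pa": NgramModel(PA_WORDS)}


def identify(word, dictionaries, models):
    """Label one word: dictionary lookup first, n-gram score as fallback."""
    token = word.lower()
    hits = [lang for lang, lexicon in dictionaries.items() if token in lexicon]
    if len(hits) == 1:
        return hits[0]  # unambiguous dictionary match
    # No match, or a match in both dictionaries: let the n-gram models decide.
    candidates = hits or list(models)
    return max(candidates, key=lambda lang: models[lang].score(token))


def tag_sentence(sentence):
    """Tag every whitespace-separated word of a sentence with a language label."""
    return [(w, identify(w, DICTS, MODELS)) for w in sentence.split()]


if __name__ == "__main__":
    print(tag_sentence("tusi where going"))
    print(identify("kitho", DICTS, MODELS))  # out-of-dictionary word
```

In this sketch, the dictionary handles known words exactly, while the n-gram model generalises to unseen or creatively spelled words, which are common in social media text. A real system would need much larger lexicons and a morphological analysis step that is omitted here.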

