Indic Scripts and the Internet

Dibyajyoti Ghosh

30 June 2015

This post by Dibyajyoti Ghosh is part of the Studying Internets in India series. Dibyajyoti is a PhD student in the Department of English, Jadavpur University. He has four years of full-time work experience in projects which dealt with digital humanities, especially the digitisation of material in Indic scripts. In this essay, Dibyajyoti explores the effects the English language has on the Internet population of India.

Internet Usage Statistics in India

According to the latest statistics [1], while the rural mobile tele-density in India is 47.78%, the urban tele-density for mobile phones is 143.08% (that is, more than one registered SIM card per person; the same phenomenon of multiple SIM cards per person is presumably reflected in the rural figures as well). On the other hand, only about 6.5% of the population has access to ‘broadband’ Internet (>= 512 kbps) through a phone or a dongle and only 1.23% has access to a wired-broadband connection. However, roughly 20% of India’s population is connected to the Internet [2]. Thus, roughly 12% of the population has access only to low-speed Internet. What these figures do not reveal is the quantum of data consumed. Given the comparatively high cost of mobile Internet usage and the difficulty of feeding large amounts of data through a mobile phone, it can be safely assumed that consumption is significantly higher among computer Internet users than among mobile users. Yet, as these statistics reveal, India’s chances of being connected to the Internet depend largely on mobile phones rather than desktop or laptop computers.
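The ‘roughly 12%’ figure above can be checked with a short calculation. This is only a sketch: it assumes that the mobile-broadband and wired-broadband groups do not overlap and that both are counted within the 20% of connected users, neither of which the cited sources state explicitly. The function name is hypothetical.

```python
# Figures quoted in the paragraph above (percent of India's population).
connected_total = 20.0    # connected to the Internet [2]
mobile_broadband = 6.5    # 'broadband' through a phone or dongle [1]
wired_broadband = 1.23    # wired broadband [1]


def low_speed_share(total, *broadband_shares):
    """Share of the population left once broadband users are subtracted.

    Assumption: the broadband groups are disjoint and all lie inside `total`.
    """
    return total - sum(broadband_shares)


low_speed = low_speed_share(connected_total, mobile_broadband, wired_broadband)
print(f"Low-speed Internet users: about {low_speed:.2f}% of the population")
```

If the two broadband groups overlap, the low-speed share would be somewhat higher than this estimate.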

Thus, the status of the Internet in India is still that of a niche medium. Other than the cost-factor of having access to a device which can access the Internet and paying for the Internet data package, some other factors also hinder the growth of the Internet in India. One of them is the issue of language. Whereas the 1990s saw an over-domination of English on the Internet given the linguistic communities which were developing the world of computers and the world of the Internet [3], by 2015, some of the disparity with offline linguistic patterns has been reduced [4]. However, for Indic scripts, much less development has taken place. If one is studying the Internet in India, chances are one is studying it in English.

Languages the Indian Internet User Encounters Both Online and Offline

What does this hold for the future of these Indic scripts? Given the multi-lingual skills of Indian school-goers and the increasing amount of daily reading time of those connected to the Internet (which is somewhere between 12% and 20% of the population) being devoted to reading on the Internet, chances are reading is increasingly in English.

The importance of English-language skills in India, as indeed in the rest of the world, in 2015 is undeniable [5]. English is also a signifier of class in India. Despite the three-language policy adopted by schools, students of schools which teach primarily in other Indian languages are at an inherent disadvantage when they enter colleges and universities where the medium of teaching is usually English, and later when they take up jobs which require official reports to be written in English. Thus, Indian languages other than English offer parents and students much less incentive to pursue their study. Whereas oral conversation among the Indian population is largely conducted in languages other than English, written conversation is increasingly being conducted in English. Language is not only a political issue but also a subject of social study, not to mention the issues of linguistics. The larger socio-political issues of language are perhaps too vast to be discussed in connection with Indic scripts and the Internet. Thus, apart from this basic point about the bias towards English, I will not delve into them further.

Indic Script Software and Data Entry

Let me start by discussing natively-digital material. In the digital domain, entering text in Indic scripts is a difficult task. Indic scripts are primarily abugida scripts, which are writing systems ‘in which consonant–vowel sequences are written as a unit: each unit is based on a consonant letter, and vowel notation is secondary’ [6]. This contrasts with the Latin script used to write English, in which vowels have status equal to consonants, and with abjad scripts such as the script used to write Arabic, in which vowel marking is absent or optional. Similar difficulty is encountered in entering text in other non-Latin scripts such as Chinese. Mandarin Chinese may be the world’s most-spoken language and China may be one of the software and hardware giants, but supposedly even Chinese is not particularly amenable to the Internet [7]. Entering Indic scripts on a computer is difficult because it usually involves installing new software or tweaking existing software, either of which is slightly difficult for the novice or casual user.
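One way to see the ‘vowel notation is secondary’ property of an abugida is to look at how Unicode encodes a written unit. The sketch below uses Python's standard `unicodedata` module; the two Devanagari samples and the helper name `describe` are illustrative choices, not drawn from the sources cited here.

```python
import unicodedata


def describe(text):
    """Return (code point, general category, Unicode name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]


# One written unit "ki": a consonant letter followed by a dependent vowel sign.
ki = "\u0915\u093F"
# One conjunct "ksha": consonant + virama (vowel killer) + consonant.
ksha = "\u0915\u094D\u0937"

for label, sample in (("ki", ki), ("ksha", ksha)):
    print(label, sample, f"({len(sample)} code points)")
    for code_point, category, name in describe(sample):
        print("   ", code_point, category, name)
```

In both samples the vowel sign or virama belongs to a mark category (its category code begins with `M`) rather than a letter category, so a single visual unit is stored as a sequence of several code points.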

ISIS, developed by Gautam Sengupta of the University of Hyderabad and sponsored by the Government of India, is an early example of Indic script input software [8]. It is available online for free. It is not fully phonetic. iLEAP, developed by the Centre for Development of Advanced Computing (CDAC), a Government of India funded agency, is no longer extant, but CDAC has produced other input software since. Google too now offers an Indic language input tool [9]. For languages such as Bengali, there is software such as Bijoy, which was made by Mustafa Jabbar of Ananda Computers, Dhaka, Bangladesh, and is sold commercially [10], and the free packages BanglaWord and Avro. Avro was created in 2003 by Mehdi Hasan Khan of Mymensingh, Bangladesh, and subsequently developed by a team at Omicron Lab, Dhaka [11]. Such software exists for other individual Indic languages. Operating systems such as Windows [12] and Ubuntu [13] offer Indic script input as well, and make use of the InScript keyboard [14] too.

When it comes to mobile phones, prior to the introduction of touchscreen smartphones, text messaging offered little scope for using Indic scripts. With the introduction of multiple keyboards in touchscreen smartphones, there are now a few options. Both Android and iOS offer Indic script keyboards. Yet these are even less easy to use than computer keyboards, as one needs to toggle between several sets of keyboards to access all the characters required for Indic script input. Google has recently introduced handwriting input which supports Indic scripts [15]. It remains to be seen how much the feature is used.

In spite of the availability of these input tools in recent years, the most common method of entering Indic-language text is through transliteration. Just like Pinyin for the Chinese script, Indic scripts too have official transliteration standards. The Indian National Bibliography (Kolkata: Central Reference Library, 2004) maintains one such standard. However, such transliteration mechanisms require diacritical marks, which are again difficult to enter. Thus, more often than not, these transliteration standards are not followed except when one is maintaining strict academic standards.
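The gap between a romanisation with diacritical marks and what is easy to type on a plain keyboard can be sketched in a few lines. The sample words below are illustrative romanisations chosen for this example and are not taken from the Indian National Bibliography or any other specific standard; the helper name `strip_diacritics` is likewise hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks, leaving only the base Latin letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Romanised words with diacritics (illustrative samples only).
formal_words = ["sa\u1E43sk\u1E5Bta", "b\u0101\u1E45gl\u0101"]

for formal in formal_words:
    casual = strip_diacritics(formal)
    # Characters that cannot be typed directly on a plain ASCII keyboard.
    hard_to_type = [ch for ch in formal if ord(ch) > 127]
    print(f"{formal} -> {casual} (non-ASCII characters: {len(hard_to_type)})")
```

The stripped forms are the kind of spelling a casual writer might produce, though actual casual transliteration follows no fixed rule and varies from writer to writer.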

The point that I am trying to make is that despite the availability of tools for entering Indic scripts and even well-defined standards for transliterating Indic words in the Latin script, neither is universally followed. The reason is that both involve extra labour, as opposed to simple transliteration without any standards. Thus, what one often ends up with in casual written communication (which outnumbers formal written communication by a wide margin) in the digital domain, be it in SMSes, messages in WhatsApp or other instant messaging applications, or emails, is Indic words in non-standard transliteration into the Latin alphabet. The introduction of SMS lingo and conventions two decades back had already prepared the way for the wider acceptance of Indic words in non-standard transliteration into the Latin alphabet. When one comes to a semi-casual or semi-formal medium, such as blogs and social networks, where a message usually has more than one receiver, the forms of expression are slightly different.

Mimetic Desires on Public Platforms

Digital, crowd-sourced public platforms, such as blogs and, above all, social networks, offer a different kind of discourse. On the one hand, private habits spill into the public realm. Thus, Indic words in non-standard transliteration into the Latin alphabet are common. On the other hand, the public nature of such platforms offers a space for a kind of mimetic desire. Although the most commonly accessed sites, such as Gmail and Facebook, offer their user interfaces in Indic languages, most users prefer to retain the default English interface [16]. It is a separate issue that enabling browsers to render Indic scripts correctly is often difficult, and sometimes the problem remains unsolved despite following every instruction in the manual. The overall overdose of the English language and script on social networks such as Facebook generates a desire to blend in with the crowd. Thus, instead of typing Indic words in non-standard transliteration into the Latin alphabet, users more often than not enter text in English itself. Often, other than formal job reports and letters, social networks are the only platform on which a lot of Indians can produce written communication in English. Thus, in addition to a mimetic desire to fit in with the English-writing crowd, social networks also offer a semi-public platform to write one’s thoughts in English, a platform which for a lot of Indians was perhaps last available when they had to write essays for their compulsory English-language paper in high school. Both of these desires further reduce the incentive to write on the Internet using Indic scripts.

Blogs occupy a space somewhere in between formal websites and casual for-the-nonce social network posts. Both the structure of blogs (more structured than a social network but less structured than a website) and the status of blogs lie somewhere in between these two major platforms. Also, with the rise of social networks, the rate of growth of blogs has decreased. Thus, blogs are usually less popular than both websites as well as social networks. On blogs, the content is usually more formal, as is the presentation. Also, the mimetic desire generated by a social network is perhaps less heightened in the case of blogs. Blogs present a more one-to-many approach as opposed to a social network which largely presents a many-to-many structure.

Spelling Skills in Indic Languages

At the other end of the social spectrum in India is the class which went to an English-medium school and writes predominantly in English. Oral communication is often carried out in other Indian languages, but these languages are not often used for written communication. Even when casual written communication in the digital domain, such as SMSes, instant messages or emails, is carried out using Indic words, it is in non-standard transliteration into the Latin alphabet. For this class, the problem is the lack of exposure to reading and writing in Indian languages other than English. Thus, even this minimal writing in transliteration mode may further weaken their spelling skills in these Indian languages.

There are of course other categories into which one can group Internet users in India. The equally strong multi-lingual Indian, the equally weak multi-lingual Indian and the Indian strong in one language are three such categories. Irrespective of which class the Indian Internet user belongs to, the Internet user’s exposure to material written in Indic scripts on the Internet is low. So far I have discussed natively-digital resources.

Digitisation of Pre-Digital Resources

Let me now turn to the digitisation of pre-digital resources. Digitisation of such resources is a task involving a lot of money and labour. There are several organisations in India which are involved in such tasks. The Centre for Development of Advanced Computing (CDAC) is one such organisation. The School of Cultural Texts and Records at Jadavpur University, Kolkata is another [17]. The Centre for Studies in Social Sciences, Kolkata too is actively involved in digitisation of such material [18]. The West Bengal Public Library repository on Dspace [19] and the Digital Library of India [20] are also significant repositories, as is the portal of the National Archives of India, titled Abhilekh Patal [21]. There are some digital archives focussed on the output of a specific person, such as the MK Gandhi portal [22]. There have been a few instances of making public searchable text files from such digitised material, such as those by the Society for Natural Language Technology Research [23] and Bichitra: Online Tagore Variorum [24]. Other digitisation programmes are in progress, such as the long-running National Mission for Manuscripts [25]. Yet, in spite of this, such efforts are minuscule compared to databases, albeit commercial and not open-access, such as Early English Books Online or Eighteenth Century Collections Online. The Internet, while it offers the opportunity for an equitable digitisation of pre-digital resources in English as well as Indic scripts, does not contain as many resources in Indic scripts as it does in the Latin script. The reason is that whereas Indic script resources are primarily digitised by Indian organisations, where the money needed for such tasks is not available in great amounts, resources in English are digitised in a number of economies with a high per capita GDP. 
Given the more basic needs of enhancing the reach and level of primary, secondary and tertiary education in the country, an economy with a low per capita GDP such as India does not have the financial means to digitise vast quantities of pre-digital resources, be they in the Latin script or in Indic scripts.

When it comes to electronic books in Indic scripts, the refusal of major platforms such as Amazon Kindle Direct Publishing to list books in Indic scripts [26] is a major barrier for individuals who wish to create e-books in Indic scripts. Whereas most major newspapers in Indic scripts have online editions, the same is not true of major book publishers. Unlike a newspaper, which relies primarily on advertising for its revenue, book publishers depend on book sales. There is no infrastructure in place for selling electronic books in Indic scripts. The publishers perhaps also feel that the market for e-books in such languages is not of a significant scale, and thus do not feel incentivised enough to encourage the creation of e-books. Thus, the entire Kindle reading population in India (which is not very large in the first place [27]) is deprived of the chance of buying e-books in Indic scripts. If they read e-books in Indic scripts on Kindles and tablets, then such e-books are usually pirated scanned copies. There are some sites which make available pirated scanned copies of books printed in Indic scripts. However, such sites and such books are so few in number that they make no major dent in the revenues of the Indic-languages publishing industry.

Effects on the Indian Internet User

As a result, casual Indian Internet researchers and readers often depend on material written in English instead of material written in Indic languages. It is true that the serious researcher will of course make the effort of visiting physical libraries and archives to access books in Indic languages. But for casual reading and research, it is too much trouble. For such Internet users, not only are undigitised Indic verbal texts invisible, but the lack of engagement with such texts leads to the effacement of such texts from the public discourse and domain.

For Indian school students studying in schools where the medium of instruction is not English, the absence of such texts from the Internet means that they engage less with the Internet for academic purposes. For them, the Internet becomes more of a resource meant for non-academic purposes if they have trouble reading texts in English. It is true that English is one of the three languages that school students learn, yet, given the state of school education, the level of English attained is often not satisfactory [28]. On the other hand, given the overdose of the English language and script on social networks, the mimetic desire pushes students to generate texts in English, not only in the script but also in the language. As a result, what is generated is often English of a less than satisfactory standard. A political strain of thought treats the language that people generate as ‘the language’. Measuring such an output against other standards of English is considered politically incorrect. In fact, the regional acceptance of such local sub-groups of English has led to the wider acceptance of English and its growing presence across the world. Yet, as the notion of class in India based on the command over English shows, such sub-grouping also leads to the creation of separate classes.

Effects of the Internet on Indic Scripts

Does the Internet alleviate or exacerbate the problem caused by the hierarchy of English over other Indian languages? I suspect that the answer is not a simple nod in either direction. I would nevertheless conclude that the Internet increases the mimetic desire to generate written communication in English. Failure to communicate in English according to certain standards of English further exacerbates the creation of classes based on the command over English. While the Internet, to a certain extent, helps in improving English spelling skills owing to a greater exposure to English, at the same time, it leads to a greater deterioration in spelling skills in Indic languages. Owing to the lack of availability of pre-digital resources in Indic scripts in the digital domain, there is a slow effacement of such resources from the public discourse at large.

Possible Measures to Enhance the Status of Indic Scripts on the Internet

These are some of the effects that the Internet in India has had on Indic scripts. Given that the Internet is a niche medium and those shaping the general discourse are more likely to have access to the Internet in the first place, the low visibility of Indic scripts on the Internet is a cause for concern. However, it is true that with the growing accessibility of the Internet in India, the resources in Indic scripts are bound to increase. It is perhaps primarily up to those in power, such as the central and state governments, to ensure that their websites and mobile phone applications are available in Indic scripts as well, and that the Indic script versions of their digital resources do not lack any feature of the English-language versions. The private sector, especially the publishing industry, also needs to create a market for electronic publication in Indic scripts. Just as e-commerce in India did not wait for the entire infrastructure to be in place, but rather the infrastructure kept building up as e-commerce kept growing, so the publishing industry needs to create a digital Indic-script market, and then keep building it up. E-commerce, which perhaps has the greatest incentive to build resources, can also significantly alter the scenario by offering its services in Indic scripts. Snapdeal offers very limited components of its website in two Indic scripts. Other major e-commerce companies have not followed suit, nor is Snapdeal’s inclusion particularly effective. Yet, as the Flipkart-owned apparel company Myntra’s recent decision to go app-only and completely do away with its website has shown, e-commerce has its ways of incentivising customers to change their habits in a drastic manner. It is with such hope that I would have liked to end this brief essay on studying the Internet in India. 
Yet, as the language of this essay shows, such hopes are not particularly strong, as most scholarly writing in India on the Internet continues to be in English. Scholarly journals and research platforms in Indic scripts on the Internet continue to be so limited in number that it is hard to find particularly high-impact publications from among them. If one is studying the Internet in India, chances are one is both studying and writing in English.

Endnotes

[1] http://www.trai.gov.in/WriteReadData/PressRealease/Document/PR-34-TSD-Mar-12052015.pdf

[2] http://www.internetlivestats.com/internet-users-by-country

[3] Daniel Pimienta, Daniel Prado and Álvaro Blanco, Twelve years of measuring linguistic diversity in the Internet: balance and perspectives, UNESCO publications for the World Summit on the Information Society (2009), http://unesdoc.unesco.org/images/0018/001870/187016e.pdf

[4] http://en.wikipedia.org/wiki/Languages_used_on_the_Internet and http://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers

[5] https://hbr.org/2012/05/global-business-speaks-english

[6] http://en.wikipedia.org/wiki/Abugida

[7] http://www.newrepublic.com/article/117608/chinese-number-websites-secret-meaning-urls

[8] http://isis.keymankeyboards.com/

[9] http://www.google.com/inputtools/

[10] http://www.bijoyekushe.net/

[11] https://www.omicronlab.com/

[12] http://www.bhashaindia.com/ilit/

[13] https://help.ubuntu.com/community/ibus

[14] https://en.wikipedia.org/wiki/InScript_keyboard

[15] http://googleresearch.blogspot.in/2015/04/google-handwriting-input-in-82.html

[16] There is no open-access data for this from either Google or Facebook. Third-parties conduct such studies. A study can be found here: http://www.oneskyapp.com/blog/top-10-languages-with-most-users-on-facebook/

[17] http://www.jaduniv.edu.in/view_department.php?deptid=135

[18] http://www.savifa.uni-hd.de/thematicportals/urban_history.html

[19] http://dspace.wbpublibnet.gov.in:8080/jspui/

[20] http://www.dli.ernet.in/

[21] http://www.abhilekh-patal.in/

[22] https://www.gandhiheritageportal.org/

[23] http://www.nltr.org/

[24] http://bichitra.jdvu.ac.in/index.php

[25] http://www.namami.org/index.htm

[26] https://kdp.amazon.com/help?topicId=A9FDO0A3V0119

[27] http://timesofindia.indiatimes.com/tech/tech-news/Ebook-readers-fail-to-kindle-sales-in-India/articleshow/45802786.cms

[28] Annual Status of Education Report (Rural) 2014, facilitated by Pratham, pp. 81-82, 86, 88-89, http://img.asercentre.org/docs/Publications/ASER%20Reports/ASER%202014/fullaser2014mainreport_1.pdf

The post is published under Creative Commons Attribution 4.0 International license, and copyright is retained by the author.
