Munda Languages Project

In 2005 Living Tongues Institute began a multi-year project to comprehensively document the lexica and grammars of the modern Munda language family. The projected output is a set of talking dictionaries and multi-media online grammars. To date we have begun talking dictionaries and multi-media online grammars of four languages (Ho, Remo, Gtaʔ and Sora), and have made small sample recordings of two other languages, Bhumij and Santali. None of these projects currently has a dedicated funding stream, but the Ho project in 2008-2009 and the Remo project in 2010-2011 received small supporting grants from the Genographic Legacy Fund and from the National Science Foundation under the auspices of the Documenting Endangered Languages program, respectively. Earlier dedicated funding was received from Ironbound Films in 2007 to help support our work on Ho, Remo and Sora during the making of the film The Linguists. All of this generous support is gratefully acknowledged.

The Munda languages are a group of Austroasiatic languages spoken across portions of central and eastern India by perhaps as many as ten million people total. The Munda peoples are generally believed to represent autochthonous populations over much of their current areas of inhabitation.

Originally, Munda-speaking peoples probably extended over a somewhat larger area before being marginalized into the relatively remote hill country and (formerly) forested areas primarily in the states of Orissa and the newly constituted Jharkhand; significant Munda-speaking groups are also to be found in Madhya Pradesh, and throughout remote areas of Chhattisgarh, West Bengal, Uttar Pradesh, Andhra Pradesh, and Maharashtra, and through migration to virtually all areas of India, especially in tea-producing regions like Assam. Of course much of the original Munda-speaking territory was subsequently settled or colonized by Indo-Aryan-speakers and Dravidian-speakers.

The pre-history of the Munda languages remains obscure. Munda languages constitute the westernmost representatives of the far-flung Austroasiatic linguistic phylum. Two other Austroasiatic groups are found in the present-day territory of India, the Khasi of Meghalaya and the Nicobarese-speaking groups of the Nicobar Islands. The other subgroups of Austroasiatic are all found outside of India, and it is generally believed that the Austroasiatic ancestral language was not to be found in India but rather further to the East. Thus, at some point the ancestors of the Munda-speaking peoples must have migrated westward into the Subcontinent. When, how, and by what path they entered India remains a subject of considerable debate. Indeed, it is not even clear that there was a single migration of pre-Munda speakers, but there may have been two or more such movements.

Consensus has not yet been reached on the internal relationships of the Munda languages, but several subgroups have been proposed and some of these appear to be sound. It is hoped that further work in comparative Munda grammar and lexicon may shed light on this issue. The northern-, eastern- and westernmost groups of Munda languages are clearly related and appear to fall into two broad groupings. The first is the westernmost Munda language, Korku; the second, to which Korku appears to be a sister, is the large and complex Kherwarian dialect/language chain, the better-known varieties of which are Santali, Mundari and Ho. Kherwarian also includes a number of minor varieties, e.g. Turi, Asuri, Birhor, Bhumij, Korwa, etc. Korku and Kherwarian together are conventionally known as North Munda. The remaining Munda languages are found almost exclusively in the state of Orissa (some Kharia speakers are found in Jharkhand, West Bengal and Chhattisgarh as well), which appears to be the epicenter of diversity of the family. How each of these non-North Munda languages or subgroups (logically known as South Munda in contrast to North Munda) is related to the others remains a topic of considerable debate. Some languages clearly form subgroups, such as Sora with Gorum/Parenga, or Gutob with Remo/Bonda. The classification of the remaining three languages (Kharia, Juang and Gtaʔ/Didayi) remains an open question.

Among the most interesting linguistic phenomena in the Munda languages are the highly elaborated systems of demonstratives found in many of them, for example Santali or Gorum. Munda vowel and consonant systems can be quite complex, with different register and secondary articulatory features, many of which are still in need of description. Another topic of considerable interest is the elaborate and intersecting systems of voice/valence/transitivity, person-marking and tense/aspect that characterize Kherwarian verbal systems. Further, the highly elaborated system of noun incorporation found in Sora pushes the limits of our understanding of such constructions from a theoretical perspective. The highly developed system of reduplication and expressive formation that characterizes most Munda languages also bears mention here. Finally, the interaction of tense/aspect marking and negative operators in negative formations in South Munda Gutob stands out among the most complex of such systems known. Almost all aspects of every Munda language require more analysis before we have an adequate consensus understanding of even their basic features; in particular, syntactic issues and phonetic analysis are in desperate need of further systematic investigation. Studies on topics in the semantics and discourse of Munda languages are practically non-existent. Comprehensive comparative study of either the lexicon or the grammar has not really been possible up to this point, so a more thorough investigation into most historical linguistic issues in Munda also remains a goal for the future.

Most Munda languages have a base-10 or combined base-10/base-20 numeral system. Sora has a curious base-12/base-20 system. Thus, in Sora ‘twelve’ is migel, ‘thirteen’ is migelboj, literally [12-1], and ‘fifteen’ is migeljagi [12-3]. ‘Twenty’ is bokuri, literally [1-20]; ‘thirty-two’ is bokuri migel, literally [(1-20)-12]; ‘thirty-three’ is bokuri migelboj [(1-20)-12-1]; ‘seventy-five’ is jakuri migeljagi [(3-20)-12-3]; and so on.
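The arithmetic behind these glosses can be checked with a short sketch. It assumes that a parenthesised pair such as (3-20) is multiplied and that everything else is added, so that [(3-20)-12-3] works out to 3 × 20 + 12 + 3 = 75. The list-and-tuple encoding and the names `EXAMPLES` and `gloss_value` are hypothetical conveniences for this illustration, not part of any project tool; only the Sora forms and their glosses come from the text.

```python
# Each Sora numeral from the text is paired with its literal gloss.
# A tuple (m, b) stands for a multiplied pair "m-b" such as (1-20);
# a bare integer is simply added. This encoding is an assumption
# made for illustration only.
EXAMPLES = {
    "migel": [12],
    "migelboj": [12, 1],
    "migeljagi": [12, 3],
    "bokuri": [(1, 20)],
    "bokuri migel": [(1, 20), 12],
    "bokuri migelboj": [(1, 20), 12, 1],
    "jakuri migeljagi": [(3, 20), 12, 3],
}


def gloss_value(gloss):
    """Return the number a gloss denotes: multiply pairs, add the rest."""
    total = 0
    for part in gloss:
        if isinstance(part, tuple):
            multiplier, base = part
            total += multiplier * base
        else:
            total += part
    return total


if __name__ == "__main__":
    for form, gloss in EXAMPLES.items():
        print(form, gloss_value(gloss))
```

Running the loop prints each attested form next to its computed value, which makes it easy to confirm that the base-20 element carries the multiplier while the base-12 element and the final unit are additive.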

Estimated number of speakers of some modern Munda languages

Santali ca. 5,000,000-7,000,000	Mundari ca. 2,000,000	Ho ca. 1,000,000
Korku ca. 300,000-400,000	Sora ca. 300,000	Kharia ca. 200,000
Gutob < 50,000	Juang < 20,000
Remo < 8,000	Gorum < 5,000	Gtaʔ < 5,000

The verbal systems of the Munda languages represent the most complicated and diverse grammatical sub-system. The tense-aspect systems of the attested Munda languages present a historically complicated picture. As is the case with many languages from across the globe, the categories of tense and aspect are often intimately connected in the Munda languages; frequently elements are grammaticalized first in a particular aspectual meaning and then shift to more generalized tense functions. In the vast majority of the Munda languages, there is some formal contrast between transitive/active and intransitive/middle markers. This may be achieved through either separate transitive and intransitive series of tense markers as in the majority of South Munda languages or through a single tense/aspect marker augmented by a consistent marker of transitivity or intransitivity in the North Munda languages. To be sure, the history of tense/aspect markers is one of the most vexing, complex, and outstanding problems in the diachrony of the verbal systems of the Munda language family. See Anderson (2007) for more details.

Ho language origin myth as told by K. C. Naik Biruli

Opino Gomango is a Sora language activist

More on the Munda Languages Project

The output of the Munda Languages project will include the digitization of existing legacy materials, a searchable cross-language database of the Munda languages that will serve as the basis for all future linguistic research on this poorly known family of languages, as well as a searchable database of annotated audio/video materials on the languages (using ELAN as the basis of the annotations).

Note that only rough estimates are available for numbers of speakers of many of the endangered languages and smaller Munda-speaking populations. This is due largely to the fact that the Indian census does not list language/ethnic groups numbering under 10,000 persons. There is also considerable confusion of language and ethnic group names.

The Munda Languages project has three basic facets: documentation and archiving of endangered language materials, digitization and annotation of legacy materials dating back 45 years, and the compilation of a web-accessible database of typological features of Munda languages.

Documentation Munda Languages Project

The documentation project begins with the video and audio recording of speakers of various ages, levels of competency and dialects from the endangered languages listed above. Annotations will minimally have four (or five) tiers: one rendering the Munda language in IPA transcription, one tier of interlinearized glossing using the Leipzig glossing conventions, an English translation and a translation into Oriya and/or Hindi, whichever is appropriate (or in the case of the widespread and disparate Turi, both).
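The tier layout described above can be pictured as a simple record. The following is a minimal sketch, not the project's actual ELAN template: the tier labels, the millisecond time fields and the `Annotation` class are all hypothetical names chosen for illustration, and the tier contents in the usage note are placeholders rather than real language data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical tier labels for the tiers named in the text.
REQUIRED_TIERS = ("ipa", "gloss", "english")
REGIONAL_TIERS = ("oriya", "hindi")  # at least one is expected


@dataclass
class Annotation:
    """One annotated stretch of a recording (time fields are assumed)."""
    start_ms: int
    end_ms: int
    tiers: Dict[str, str] = field(default_factory=dict)

    def missing_tiers(self) -> List[str]:
        """List the tiers that still need to be filled in."""
        missing = [t for t in REQUIRED_TIERS if not self.tiers.get(t)]
        if not any(self.tiers.get(t) for t in REGIONAL_TIERS):
            missing.append("oriya or hindi")
        return missing
```

For example, `Annotation(0, 1500, {"ipa": "<transcription>", "gloss": "<gloss>", "english": "<translation>"}).missing_tiers()` reports that a regional translation is still outstanding. A record for Turi, where both Oriya and Hindi translations are wanted, would carry five tiers instead of four.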

The digitization is to be carried out in conjunction with ELAR at the School of Oriental and African Studies, University of London, which will also house the archived data from the project. A mirror site will be housed on a server at the local host institution, the Department of Tribal Languages, University of Ranchi, Jharkhand State, India. Dr. Ganesh Murmu is our local contact.

Digitization of Munda Legacy Materials

A number of legacy materials in different media need to be digitized and annotated to supplement the field data. These legacy materials are in a variety of formats, ranging from analog recordings dating back 40-45 years to unpublished text collections and lexical lists, including the massive Munda comparative lexical materials described below.

Typological Munda Database

The annotated sessions of the endangered Munda languages are being entered into a searchable, web-accessible relational database, linked to audio/video files and text-type annotations according to a number of typological features, viz. vocalic (including suprasegmental) and consonantal features, features of nominal and verbal morphosyntax (inflectional and derivational categories, auxiliary structures, etc.), as well as characteristics of simplex and complex clause structure. Entries consist of values and commentary discussion, time-linked to video and audio examples whenever possible.
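One way to picture such a relational layout is the following minimal sketch using Python's built-in sqlite3 module. The table and column names, the millisecond offsets and the helper functions are all assumptions made for illustration; they do not describe the project's actual database.

```python
import sqlite3

# Hypothetical schema: languages, typological features, one entry per
# language/feature pair, and time-linked media examples for each entry.
SCHEMA = """
CREATE TABLE language (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE feature (
    id     INTEGER PRIMARY KEY,
    domain TEXT NOT NULL,   -- e.g. 'verbal morphosyntax'
    name   TEXT NOT NULL
);
CREATE TABLE entry (
    id          INTEGER PRIMARY KEY,
    language_id INTEGER NOT NULL REFERENCES language(id),
    feature_id  INTEGER NOT NULL REFERENCES feature(id),
    value       TEXT NOT NULL,
    commentary  TEXT
);
CREATE TABLE media_link (
    id         INTEGER PRIMARY KEY,
    entry_id   INTEGER NOT NULL REFERENCES entry(id),
    media_file TEXT NOT NULL,
    start_ms   INTEGER NOT NULL,
    end_ms     INTEGER NOT NULL
);
"""


def build_database(path=":memory:"):
    """Create an empty database with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def entries_for_feature(conn, feature_name):
    """Return (language, value, commentary) rows for one feature."""
    query = """
        SELECT l.name, e.value, e.commentary
        FROM entry AS e
        JOIN language AS l ON l.id = e.language_id
        JOIN feature  AS f ON f.id = e.feature_id
        WHERE f.name = ?
        ORDER BY l.name
    """
    return conn.execute(query, (feature_name,)).fetchall()
```

Keeping media links in their own table lets a single entry point to several audio or video examples, and lets entries exist before any recording has been linked to them.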

This typological database of Munda languages when completed will ultimately serve as a complement to the large comparative Munda lexical database already under way (see below).

Additional Information on the Munda Languages

The Munda language family of eastern and central India represents one of the most fascinating and theoretically stimulating language families on the planet. Unfortunately, very little primary data on the 20-odd members of the Munda language family is widely known or even available to the worldwide linguistic community. This is in part because for some languages the data is quite out of date, while for others the only materials that exist are unpublished, in hard-to-find sources, and/or in languages that are not widely known by the linguistic community at large.

Where are the Munda languages spoken and how long have they been there?

Although probably immigrants from the east (where most of their sister languages in the broad Austroasiatic phylum remain today), the Munda peoples appear to be the tribal autochthons of eastern India, their ancestors having already occupied their current areas of inhabitation at a time significantly predating the arrival of the Indo-Aryan- and Dravidian-speaking populations of the region. This is codified in the standard designation applied to all Munda-speaking peoples (and, strictly speaking, to certain non-Munda peoples as well), adivasi ‘first’.

Currently Munda-speaking peoples are found in large concentrations in the Indian states of Orissa, Jharkhand, and Madhya Pradesh, with further communities in adjacent parts of the states of Chhattisgarh, Bihar, West Bengal, Uttar Pradesh, Andhra Pradesh, and Maharashtra, and even farther afield in Bangladesh and Nepal.

How many people speak these languages?

Of the roughly two dozen Munda languages still spoken, at least one quarter (if not more) appear to exhibit some degree of language endangerment, ranging from moribund (Gorum) to severely endangered with a few hundred (Koda/Kora) or a few thousand speakers (Hill and Plains Gtaʔ, Remo, Turi; also Bijori, Agariya, Bhumij, Korwa and Mahali, not covered in the present proposal); for at least one endangered Munda language, Koraku, no data is available as it is conflated in census statistics with Korwa or previously Korku. The non-endangered but threatened languages still have speakers numbering in the tens to hundreds of thousands (Juang, Kharia, Sora, Gutob, Birhor, Bhumij), while stable languages often number a million (Mundari) or several million (Santali).

Turi has maybe 4,000 Kherwarian Munda speakers scattered throughout various districts of Jharkhand, West Bengal, Chhattisgarh, Orissa and Madhya Pradesh. For certain groups, what little information there is often conflicts with other such reports. For example, Koda (Kora) appears to have but 1-2% language retention among the heavily Aryanized (Bengali) or Dravidianized (Kurux) population of 31,000 according to Parkin (1991: 24), i.e. yielding under 500 total speakers, but has been reported to have as many as 7,000-25,000 in other sources, a number that assuredly reflects ethno-linguistic identity rather than linguistic competence per se (a stated policy of the Indian census).

What languages are Munda-speaking people speaking instead of their ancestral tongue?

While many Munda-speaking peoples also command one or more Indo-Aryan or Dravidian languages fluently (e.g. Bengali, Hindi, Chhattisgarhi, Desia Oriya, Sadani/Sad[a]ri, Marathi, Kurukh, Telugu), data on the rates of ancestral ‘mother tongue’ preservation among the youngest generation, and on the sociolinguistic dynamics and contexts of its use in the actual Munda-speaking communities, are generally lacking, even in the most recent such sources (e.g. the LSI Orissa 2002 volume; Ishtiaq (1999); Itagi and Singh (ed.) (2002)).

Who are the Munda-speaking people?

Munda peoples practice a range of traditional indigenous religions sometimes mixed with locally appropriate quasi-Hindu practices (as well as Christianity in some areas), venerating stone megaliths built by their ancestors, maintaining sacred groves, and in places still practicing an ancient water-buffalo ritual sacrifice.

Over the past centuries, some Munda-speaking peoples have been largely discriminated against in India as meat-eating non-Hindus (and non-Muslims). In terms of traditional economy, Munda-speaking peoples mainly practice[d] nomadic hunter-gatherer foraging and/or subsistence agriculture. In recent times, an urban population has developed, notably in Ranchi, the capital of the newly constituted Munda-dominant state of Jharkhand.

What are Munda languages like?

Although the Munda languages are poorly known, what little is known about them seems to have great relevance to several unrelated fields of inquiry in comparative linguistics, as well as to the prehistory of the Indian Subcontinent. These include general theoretical and typological linguistic studies, South Asian areal studies, and the history of the Austroasiatic language family more widely.

In general, Munda languages appear to exhibit a typological profile that is very different from that which is typical of the Mon-Khmer languages to which they are related (cf. Donegan and Stampe 1983; Donegan 1993), but these differences are not always attributable to Dravidian and/or Indo-Aryan influence. For example, the verb structure of the North Munda languages is extremely synthetic, indeed significantly more synthetic than structures typical of either Dravidian or Indo-Aryan languages. In this way, they share certain structural affinities with so-called ‘pronominalized’ Tibeto-Burman languages, with which they may have formed an earlier areal group, prior to the intrusion of Dravidian- and Indo-Aryan-speaking populations. A better understanding of the nature and origin of the Munda languages will help elucidate the complex issues surrounding the nature and degree of synthesis characteristic of the ancestral Proto-Austroasiatic [PAA] language as well as the original clausal syntax and system of nominal categorization and inflection found in PAA.

With regard to verbal and syntactic phenomena characteristic of Munda languages (insofar as these can be gleaned from the attested sources), there appear to be noun incorporation patterns that are highly marked or even unique, such as double argument and even agent argument noun incorporation in Sora.

Another characteristic of the verbal systems of particular Munda languages that is rare or unique among the world’s languages is the agreement of a verb with both an argument and a logical possessor of that argument. This differs from ‘possessor raising’, a system found in numerous languages worldwide, in which the possessor of the argument is preferentially encoded as the argument itself. Such agreement is attested in Santali (Neukom 1999) and in Santali-like Kherwarian North Munda varieties, e.g. the Turi language covered in this proposal, which has been reported to be a Santalized Mundari-like speech variety. For more see Anderson (2007).

Several Munda languages are reported to have contrastive creaky voice and/or low pitch, laryngealization or other voice register phenomena.

What kind of documentation is there of Munda languages?

The majority of the Munda languages could be considered poorly documented, including some of the larger and non-endangered ones (e.g. Ho, Sora, Korku). Even basic demographic information on certain of these groups is lacking. Despite the existence of some Munda language materials in the (1904) Linguistic Survey of India [LSI], these are far from satisfactory. As Emeneau put it in his 1955 work (cited in Mahapatra et al. [eds.] 2002) “[o]n the Munda languages little need to be said. They have so far either been badly described or known only as names in the Survey, which certainly did not succeed in mapping them all.” Many of the Munda varieties in the LSI are represented simply by translations of the Prodigal Son tale from the Bible.

The Munda Online Comparative Dictionary

The Munda lexical database project has been underway for several years and in its current draft form holds roughly 50,000 entries from 12 languages. Dr. Manideepa Patnaik began it in 1999, and Dr. Gregory Anderson joined the following year. It currently exists in Word and Excel formats, but it has insufficient metadata and does not yet have any associated sound files (although many have been recorded, for example for Ho and Sora and to a lesser extent Remo).

This database will include not only the attested forms: for a number of entries, intermediate proto-language forms and, where possible, Proto-Munda forms are being added as well (based on currently ongoing research), thus furthering its usefulness to researchers in other historical or social scientific disciplines dealing with India. Living Tongues Institute for Endangered Languages is currently collecting sound and other media files to populate this resource.

Online Resources

Remo Talking Dictionary

Ho Talking Dictionary

Sora Talking Dictionary

Sora – English Talking Dictionary

Ho Warang Chiti Unicode Initiative

Researchers from Living Tongues Institute have been working with representatives of both the Ho community of India and the Unicode Consortium to facilitate constructive dialogue between these groups on the proper encoding of the indigenous Warang Chiti (Varang Kshiti) script, so that the Ho community may communicate over the Internet and have an Internet presence of their own design.