Your Total Health!: Genetics and archaeogenetics of South Asia

Tuesday, May 19, 2009

Genetics and archaeogenetics of South Asia

The genetics and archaeogenetics of the ethnic groups of South Asia aim at uncovering these groups' genetic history.

One major issue is the presence of "intrusive" genetic material, which is identified by some studies (Bamshad et al. (2001), Spencer Wells, Journey of Man (2002), Basu et al. (2003), Cordaux et al. (2004) and others) and not by others (Kivisild et al. (2003), Sengupta et al. (2005), Sahoo et al. (2006) and others).

Most studies based on mtDNA variation have reported genetic unity among Indian populations, with the basic clustering of maternal lineages not specific to any particular language or caste.

The only distinct ethnic groups in South Asia, according to studies of genetic history undertaken by the Human Genome Diversity Project, are the Naga, Manipuri, Balochi, Brahui, Burusho, Hazara, Kalash, and Pathan peoples, all found at either the northeastern or the northwestern extreme of South Asia.

mtDNA

The largest Indian mtDNA haplogroups are M, R and U. Most seem to be native to South Asia and show a wide distribution within the subcontinent.

Macrohaplogroup M

The macrohaplogroup M comprises c. 60% of Indian mtDNA and includes many subgroups that are still poorly studied; the South Asian clades of M are mostly different from the East Asian ones.

Virtually all modern Central Asian mtDNA M lineages seem to belong to the Eastern Eurasian (Mongolian) rather than the Indian subtypes of haplogroup M, which indicates that no large-scale migration from the present Turkic-speaking populations of Central Asia to India (or vice versa) could have occurred (Kivisild 2000).

Most important South Asian haplogroups within M:

M2 is widespread through the subcontinent except in the Northwest, where it is rare. It shows peaks in Bangladesh, Andhra Pradesh, coastal Tamil Nadu and Sri Lanka.
- M2a extends through all of peninsular India and Sri Lanka but shows its highest concentration (c. 10%) in Bangladesh.
- M2b is also widespread (except in the NW) and reaches its highest density in SE India.
M3 is a numerically important clade. It is widespread through the whole subcontinent except the Northeast, reaching peaks of c. 20% in Rajasthan and Madhya Pradesh. It is also very dense in Maharashtra, Uttar Pradesh, Haryana, Gujarat and Karnataka, and important (>6%) in almost all of South Asia. It is also found in Hadramaut (Yemen).
- M3a is the most important subclade of M3 and is especially concentrated (>10%) in the same core area: Rajasthan, Madhya Pradesh and Maharashtra. It is found almost everywhere else too.

Metspalu makes the following observations about M3a in his study: "The frequency of M3a is at its highest amongst the Parsees of Mumbai (22%). Given the low M3a diversity amongst the Parsees – the twelve M3a mtDNAs fall into the two most common haplotypes – the high frequency is likely a result of admixture and subsequent founder events. On the other hand, it is intriguing that, despite its low frequency, M3a penetrates into central and southwestern Iran the historic origin of the Zoroastrian Parsees. In addition to the Parsees we found M3a at high frequencies amongst the Brahmins of Uttar Pradesh (16%) and the Rajputs of Rajasthan (14%)" (Metspalu et al. 2004).

M4a is most important in Pakistan and Kashmir, with a minor presence in NW India. It is rare in the middle and lower Ganges, as well as in SW India.
M6 is especially relevant around Kashmir and near the coasts of the Bay of Bengal, from Bangladesh to Andhra Pradesh.
- M6a shows some density only in SE India and among Sri Lankan Tamils. It is rare elsewhere but is present in Oman.
- M6b makes up the largest part of M6 and shows a similar distribution, with peaks in Kashmir and both Bengals.
M18 is widespread through most of the subcontinent, with peaks in Rajasthan and Andhra Pradesh.
M25 is widespread in most of India (but rare outside it) at low rates, with peaks in western Maharashtra and Kerala, as well as Indian Punjab.

Macrohaplogroup R

The macrohaplogroup R (a very large and old subdivision of macrohaplogroup N) is also widely represented and accounts for the other 40%. A very old and most important subdivision of it is haplogroup U, which, while also present in West Eurasia, has several subclades specific to South Asia.

Most important South Asian haplogroups within R:

R2 may be of Central Asian origin but is also present in the west of the subcontinent at rather low rates.
R5 is widely distributed through most of India, also at low rates, reaching its highest concentration in coastal SW India.
R6 is also widespread at low rates, with peaks among Tamils and Kashmiris.
W is probably of West Asian origin and is found especially in Pakistan, as well as in Kashmir and Punjab. It is rare further east and not found in India.

Within haplogroup U (part of R):

U2* (a parahaplogroup) is sparsely distributed, especially in the northern half of the subcontinent. It is also found in SW Arabia.
U2a shows relatively high density in Pakistan and NW India but also in Karnataka, where it reaches its highest density.
U2b has its highest concentration in Uttar Pradesh but is also found in many other places, especially in Kerala and Sri Lanka. It is also found in Oman.
U2c is especially important in Bangladesh and West Bengal.
U2i is perhaps the most numerically important of the U subclades in South Asia, reaching especially high concentrations (over 10%) in Uttar Pradesh, Sri Lanka, Sindh and parts of Karnataka. It also has some importance in Oman. mtDNA haplogroup U2i is dubbed "Western Eurasian" in the Bamshad et al. study but "Eastern Eurasian (mostly India specific)" in the Kivisild et al. study.
U7 is a mainly West Eurasian haplogroup that has a significant presence in NW India and Pakistan.

Y chromosome

Figure: The divergence of Haplogroup F and its descendants.

Clustering analysis by Rosenberg (2006) shows a significant level of similarity between Indo-Aryan and other Indian populations, and a notable change in cluster composition specifically in the Indo-Aryan populations of the Punjab, Sindh and Kashmir regions in the north-west of South Asia.

In a 2004 paper, Cordaux argues for independent origins of Indian caste and tribal paternal lineages: “Thus, the quantitative comparison of an extensive dataset of Y chromosome haplogroups in both Indian caste and tribal groups, as well as nongenetic information, support a scenario of independent origins of Indian caste and tribal paternal lineages, with recent immigration of caste Y lineages and subsequent bidirectional gene flow between caste and tribal groups. This conclusion contrasts with the earlier suggestion that both Indian caste and tribal Y chromosomes largely derive from the same Pleistocene genetic heritage, with only limited recent gene flow from external sources. In contrast with the Y chromosome evidence, the mtDNA evidence suggests a common origin of tribal and caste groups. It is likely that most maternal lineages largely represent the original mtDNA gene pool of India, implying that caste maternal lineages mainly derive from local tribal ancestors.”

This supersedes the earlier work (Kivisild et al. 2003b; Cordaux et al. 2003), which emphasizes that the combined results from mtDNA, Y-chromosome and autosomal markers suggest that "Indian tribal and caste populations derive largely from the same genetic heritage of Pleistocene southern and western Asians and have received limited gene flow from external regions since the Holocene" (Kivisild 2003b).

Research published in 2007 provides evidence that both caste and tribal populations are autochthonous to India. In "Peopling of South Asia: investigating the caste-tribe continuum in India", Metspalu M, Kivisild T et al. arrive at the following conclusion: "Molecular studies and archaeological record are both largely consistent with autochthonous differentiation of the genetic structure of the caste and tribal populations in South Asia. High level of endogamy created by numerous social boundaries within and between castes and tribes, along with the influence of several evolutionary forces such as genetic drift, fragmentation and long-term isolation, has kept the Indian populations diverse and distant from each other as well as from other continental populations." (Bioessays, Jan 2007)

R1a1

Figure: Distribution of R1a (purple) and R1b (red).

The haplogroup R1a1 (M17) is often linked with the ancient Kurgan (Yamna - "ямная") culture and Proto-Indo-Europeans of Southern Russia/Ukraine, who supposedly migrated to Europe, Central Asia and India between 3000 and 1000 BC (Passarino et al. 2001; Quintana-Murci et al. 2001; Wells et al. 2001).

Alternatively, the high frequency of R1a1 found in several South Indian tribes, including the Chenchu and the Badagas, together with a higher R1a1-associated STR diversity in India and Iran compared with Europe and Central Asia, has been taken as evidence for an origin of R1a1 (M17) in Southern or Western Asia (Kivisild 2003b). Stephen Oppenheimer regards the evidence as highly suggestive that India is the origin of the Eurasian mtDNA haplogroups, which he calls the "Eurasian Eves". According to Oppenheimer, it is highly probable that nearly all human maternal lineages in Europe (and similarly in East Asia) descended from only four mtDNA lines that originated in South Asia 50,000-10,000 years ago.

Unfortunately, there are not enough data to draw a final conclusion about the origin of R1a1. Doing so would require a comparative study of R1a1 haplogroup diversity in the populations of Ukraine (and/or South/Central Russia), Pakistan and India, using the same large set of microsatellite markers. So far, only one such study has been attempted, by Passarino in 2001. This study employs the 49a, f/TaqI Y-specific system and a set of seven microsatellite markers to compare the diversity of the R1a1 (M17, Eu19) haplogroup in 29 world populations (including Ukraine, Poland, and India). According to Passarino (2001), “the 49a, f Ht 11 displays a major diversification in East Europe with respect to the other areas. Actually, in East Europe, all the derivatives of the 49a, f Ht 11 were observed (9 vs 6 in the "Balkans," 4 in the "Middle East," 1 in India, and 2 in West Europe). Moreover, Ukraine presents at least twice as many derivatives as the other East European populations. These findings suggest that East Europe is the place where this lineage originated or started to expand, particularly in Ukraine, which also includes a refuge area during the LGM.” However, more extensive studies, including Kashmiri populations, are necessary to reach reliable conclusions.

In his 2003 paper, Kivisild compares the diversity of the R1a1 (M17) haplogroup in Indian, Pakistani, Iranian, Central Asian, Czech and Estonian populations. The study shows that the diversity of R1a1 in India (as well as Pakistan and Iran) is higher than in Czechs and Estonians. More than one third of the Y-chromosome gene pool in Estonians is represented by the “Uralic” N3 haplotype.
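Comparisons of this kind rest on a diversity statistic computed from the haplotypes observed within a haplogroup. The following is a minimal sketch of one common choice, Nei's unbiased haplotype diversity, h = n/(n-1) × (1 − Σ p_i²). It is not necessarily the statistic used in the studies cited above, and the sample haplotypes (repeat counts at three hypothetical STR loci) are invented purely for illustration.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical STR haplotypes (repeat counts at three loci); NOT real data.
sample_a = [(15, 12, 25), (15, 12, 25), (16, 12, 25),
            (15, 13, 24), (17, 12, 25), (15, 12, 26)]
sample_b = [(15, 12, 25)] * 5 + [(16, 12, 25)]

print(round(haplotype_diversity(sample_a), 3))
print(round(haplotype_diversity(sample_b), 3))
```

With these invented samples the first population, which carries five distinct haplotypes among six chromosomes, scores about 0.93, while the second, dominated by a single haplotype, scores about 0.33. A higher value is usually read as a sign of a longer local history of the lineage, although sample size and the choice of markers strongly affect the result.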

Some new data on R1a diversity in Southeastern Europe (Croatia, Bosnia and Herzegovina, Serbia and Montenegro, and Macedonia) are presented in a 2005 paper by Peričić et al. (The defining mutation of R1a is SRY-1523 = SRY10831, which precedes the M17 mutation that defines R1a1.) According to this paper, the R1a haplotype shows high diversity in this area (especially in Bosnia and Herzegovina), “and the estimated range expansion at 15.8 ± 2.1 KYA, consistent with its deep Paleolithic time depth”.

A study by S. Sharma, published by the American Society for Human Genetics in December 2007, found that R1a*, the ancestral clade to Hg R1a1, has its highest incidence among Kashmiri Pandits (Brahmins) and Saharias, a Central Indian tribe. The author takes this as establishing the indigenous origin of Brahmins and their link to Indian tribals.

Recent studies indicate that the haplogroups C5-M356, H-M69*, F*, L1 and R2 are indigenous to South Asia (Sengupta 2006: 211). According to Sengupta (2006), “our overall inference is that an early Holocene expansion in northwestern India (including the Indus Valley) contributed R1a1-M17 chromosomes both to the Central Asian and South Asian tribes prior to the arrival of the Indo-Europeans.”

A 2001 examination of male Y-DNA by Indian and American scientists indicated that higher castes are genetically closer to Western Eurasians than are individuals from lower castes, whose genetic profiles are similar to those of other Asians. According to Bamshad et al. (2001), higher-caste Telugus have a higher frequency of haplogroup 3 (R1a1) than lower castes. Haplogroup 3 is also characteristic of Eastern Europeans. In the study, Bamshad and his team wrote, "Our results demonstrate that for biparentally inherited autosomal markers, genetic distances between upper, middle, and lower castes are significantly correlated with rank; upper castes are more similar to Europeans than to Asians; and upper castes are significantly more similar to Europeans than are lower castes." There is some evidence that a few millennia ago, a group of people with (Eastern) European genetic affinities migrated into the Indian subcontinent from the northwest. In the abstract to their paper, Bamshad et al. stated, "In the most recent of these waves, Indo-European-speaking people from West Eurasia entered India from the northwest and diffused throughout the subcontinent. They purportedly admixed with or displaced indigenous Dravidic-speaking populations. Subsequently they may have established the Hindu caste system and placed themselves primarily in castes of higher rank".

However, critics point out that the South Indian state of Andhra Pradesh might not be the best place for such a study. One of the upper castes, the Kshatriyas (Rajus), makes up only a minuscule part of the Telugu population. Also, historically, South Indian royal families had marital ties with Central and East Indian royal families. In other words, the Kshatriyas were not as isolated as the Chenchu tribe. In the regions of present-day Andhra Pradesh, the dominant and generally feudal castes were the Kapus, Reddys and Kammas, though they were classified as Shudras. Treating Brahmins in South India as proof of the dominance of Indo-European people has also been questioned on the basis of the Brahmin migration to South India. Critics further point out that the European-specific markers, however controversial their origins might be, are observed across caste lines in the North-West of India.

The study also revealed another classic anthropological observation: women are significantly more mobile in terms of caste and hierarchical class than men, who are barely socially mobile at all in those terms. Genetic evidence reveals that over millennia, men from higher castes have married women from lower castes, but women from higher castes have rarely married men from lower castes. Thus the researchers imply that caste and class are to a large extent perpetuated by women, which has also contributed to the minimal mixing of Aryan blood with the natives. A recent paper in Current Biology, Cordaux et al. (2004), confirms the Bamshad (2001) results and concludes that the paternal lineages of Indian caste groups are primarily descendants of Indo-European speakers who migrated from Central Asia about 3,500 years ago.

However, other studies (Kivisild 2003a; Kivisild 2003b) have revealed that haplogroup 3 (R1a1) occurs in about half of the male population of Northwestern India and is also frequent in Western Bengal. These results, together with the fact that haplogroup 3 is much less frequent in Iran and Anatolia than it is in India, indicate that haplogroup 3 among high-caste Telugus did not necessarily originate from Eastern Europeans. The high diversity of haplogroups 3 and 9 in India suggests that these haplogroups may have originated in India (Kivisild 2003a).

Other haplogroups

The Neolithic spread of farmers to Europe from the Levant/Middle East has also been linked to 12f2 (haplogroup J) and the markers M35 (haplogroup E3b) and M201 (haplogroup G). But while M35 (E3b) is present in Europe, Anatolia, the South Caucasus and Iran, Indians generally do not have the Alu insertion (YAP) in their Y chromosomes. The lack of YAP+ chromosomes (haplogroup E) in India suggests that M35 appeared in the Middle East only after a migration from Iran to South Asia had taken place, but earlier than the later migration of Near and Middle Eastern farmers to Europe (Kivisild 2003a).

Most of the pro-migration papers imply that R1a1 is the genetic marker representative of a migration, due to its high frequency in Eurasia. But an equally likely genetic marker is haplogroup L. This haplogroup is present in Greek, Turkish, Lebanese, Iranian, Central Asian, and South Asian populations (and in Europe, see Kivisild). The marker is found in locations where written sources record the presence of Indo-European languages and peoples: Greeks, Hittites, Mitanni, Iranians and South Asians. Its peak frequency is found in Indo-Iranian populations. However, the latest studies suggest that Pakistan, which has the maximum diversity of Hg L clades (namely L1, L2 and L3), could be the source of this haplogroup. The 'Western Eurasian' components found in Indian mtDNA show a distribution closer to that found in the Southern Caucasus and Middle East than to that found in Eastern Europe. This could also be the result of geographical contiguity.

There is also the question of why one should assume that only one Y haplogroup is representative of the Aryan gene pool. R1a1, R1b, J2, L and H, all of which are present in India and in Central and West Asia, are all possibilities. However, haplogroup L has a very low level of diversity in the Punjab. This is suggestive of a recent migration or expansion event in the area, and is supported by the fact that the diversity of R1a1, J2 and haplogroup C is higher in the region. Haplogroup C is supposed to be a remnant of the "Out of Africa" migration of humans, but it still retains a high level of diversity. Haplogroup L is also found in South India at relatively high frequencies and has been associated by some (along with J2) with the spread of farming and Dravidian languages. However, haplogroup L1 is the dominant one in southern India, and hence may represent an expansion event in the South (or elite dominance from the North).

Interestingly, studies show that there has been very little mixing of the male lines between castes/clans for some time. They show distinct haplotypes even though many clans within a region have similar haplogroups. For instance, Northwest Indians exhibit mainly haplogroups R1a1, R1b, J2 and L, yet there is very little sharing of haplotypes with other castes/clans in the same region.

The J2 haplogroup is almost absent from tribals, but occurs among some Austro-Asiatic tribals (11%). The frequency of J2 is higher in South Indian castes (19%) than in North Indian castes (11%) or Pakistan (12%) (Sengupta 2006).

Autosomal markers

One more important marker for Caucasian ancestry in admixed populations may be taken into consideration: the H2 haplotype of the gene MAPT. It has been shown to be Caucasian in origin and may work as a good estimator of European admixture. “The constancy of the H2 allele frequency in Caucasian populations from the Middle East to the Orkneys suggest that its origin in European populations is ancient and coincides with the colonization of Europe.” MAPT is represented “by two distinct lineages, H1 and H2, that have diverged for as much as 3 million years and show no evidence of having recombined”. “The H2 lineage is rare in Africans, almost absent in East Asians but found at a frequency of 20% in Europeans”. There is some “evidence suggesting that Homo neanderthalensis contributed the H2 MAPT haplotype to Homo sapiens”. H2 is found in many Pakistani populations.

Interestingly, the map of the worldwide frequencies of ASPM (Brain Size Determinant in Homo sapiens) haplogroup D ("derived") matches the map of the H2 haplotype distribution surprisingly well. “The frequency of haplogroup D chromosomes is ... 44% in Europeans and Middle Easterners”. The authors “estimated the coalescence age (i.e., time to the most recent common ancestor) of haplogroup D at 5800 years, with a 95% confidence interval between 500 and 14,100 years.” One should of course take into consideration that ASPM “haplogroup D ... rose to high frequency under strong positive selection”, so the frequency of ASPM haplogroup D is expected to be higher than that of MAPT haplogroup H2. However, given that only a few Pakistani populations were sampled and that both markers (ASPM haplogroup D, MAPT haplogroup H2) are present not only in European but also in Middle Eastern populations, the distribution of these markers should be considered only as a suggestion of an eastward migration of “Caucasian peoples” (Europeans and/or Middle Easterners). Taken alone, the distribution of these markers can hardly prove a specific Indo-Aryan migration or invasion.

Intriguingly, the well-discussed CCR5 delta 32 mutation may be older than previously suspected: it was detected in 2900-year-old skeletal remains from different burial sites in central Germany and southern Italy at a rather high allele frequency (11.9%). Thus this mutation may work as a marker of European (vs. Middle Eastern) ancestry. According to the 2002 Khaliq paper, the frequency of the CCR5 delta 32 allele ranged from 0.62% to 3.57% in Pakistani ethnic groups, which is much lower than that found in European populations (10% average frequency) and similar to that in the Middle East. One possible explanation of this geographical distribution is the migration of mutation carriers from a territory of high mutation frequency into an area where the mutation is absent.
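Allele frequencies such as those quoted above are derived from genotype counts, and with small samples they carry wide uncertainty. The sketch below shows the standard calculation together with a Wilson score interval. The genotype counts are hypothetical and are not taken from the Khaliq paper; the function names are introduced here only for illustration.

```python
import math


def allele_frequency(n_hom, n_het, n_individuals):
    """Allele frequency from counts of homozygous and heterozygous carriers (diploid locus)."""
    if n_individuals <= 0 or n_hom + n_het > n_individuals:
        raise ValueError("inconsistent genotype counts")
    return (2 * n_hom + n_het) / (2 * n_individuals)


def wilson_interval(p, n_chromosomes, z=1.96):
    """Wilson score interval for a proportion p observed on n_chromosomes (z=1.96 gives ~95%)."""
    denom = 1 + z * z / n_chromosomes
    center = (p + z * z / (2 * n_chromosomes)) / denom
    half = z * math.sqrt(p * (1 - p) / n_chromosomes
                         + z * z / (4 * n_chromosomes ** 2)) / denom
    return center - half, center + half


# Hypothetical sample: 100 individuals, 5 heterozygous carriers, no homozygotes.
p = allele_frequency(n_hom=0, n_het=5, n_individuals=100)
low, high = wilson_interval(p, n_chromosomes=200)
print(f"frequency = {p:.3f}, approx. 95% interval = ({low:.3f}, {high:.3f})")
```

In this invented sample the point estimate is 2.5%, but the interval spans roughly 1% to 6%, which illustrates why differences between sparsely sampled populations should be interpreted with caution.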

South Asia and Central Asia

A recent study (Sengupta 2006) found that the “influence of Central Asia on the pre-existing gene pool was minor. The ages of accumulated microsatellite variation in the majority of Indian haplogroups exceed 10,000–15,000 years, which attests to the antiquity of regional differentiation.” and it concluded: “Our reappraisal indicates that pre-Holocene and Holocene-era—not Indo-European—expansions have shaped the distinctive South Asian Y-chromosome landscape.”
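Ages of "accumulated microsatellite variation" are typically obtained by dividing the observed variance in repeat counts by a mutation rate. The following is a minimal sketch of that idea under a simple stepwise mutation model, in which the average squared difference (ASD) from a founder haplotype is expected to grow as mutation rate × generations. It is not the exact procedure of Sengupta (2006); the founder haplotype, the sample, the mutation rate and the generation length below are all assumptions chosen only for illustration.

```python
def asd_to_founder(haplotypes, founder):
    """Average squared difference in repeat count from the founder, per locus per chromosome."""
    n_loci = len(founder)
    total = 0
    for hap in haplotypes:
        total += sum((allele - f) ** 2 for allele, f in zip(hap, founder))
    return total / (len(haplotypes) * n_loci)


def age_in_years(asd, mutation_rate, generation_years):
    """Stepwise-model estimate: generations = ASD / mutation rate, then converted to years."""
    return asd / mutation_rate * generation_years


# Hypothetical data: assumed founder haplotype and four sampled chromosomes (three STR loci).
founder = (15, 12, 25)
sample = [(15, 12, 25), (16, 12, 25), (15, 13, 25), (15, 12, 24)]

asd = asd_to_founder(sample, founder)
# ASSUMPTIONS: 0.002 mutations per locus per generation, 25 years per generation.
print(age_in_years(asd, mutation_rate=0.002, generation_years=25))
```

With these invented inputs the ASD is 0.25 and the estimate comes to about 3,100 years. The result scales inversely with the assumed mutation rate, so published ages for the same data can differ several-fold depending on which rate is used.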

According to Sahoo (2006), “The sharing of some Y-chromosomal haplogroups between Indian and Central Asian populations is most parsimoniously explained by a deep, common ancestry between the two regions, with diffusion of some Indian-specific lineages northward. The Y-chromosomal data consistently suggest a largely South Asian origin for Indian caste communities and therefore argue against any major influx, from regions north and west of India, of people associated either with the development of agriculture or the spread of the Indo-Aryan language family.”
