Principal Component Analysis and Clustering Reveal Human Maternal Ancestry from Complete Mitochondrial Sequences
We develop a simple, direct method to infer the phylogenetic tree of the human maternal lineage using principal component analysis and consensus ensemble clustering. Unlike standard methods such as parsimony and maximum likelihood, our method is fast, gives a unique tree, makes no a priori assumptions, uses all polymorphisms in the data, and has high internal branch consensus. It confirms that modern humans came from Africa in at least two migrations and that the common maternal ancestor of humans, or "mitochondrial Eve", lived in Africa ~200,000 years ago. It also suggests that the so-called "R Clade", usually defined by a polymorphism at locus 12705, is too heterogeneous to have derived from a single common ancestor, and it places haplogroups B/R5/F in the Asian branch of the N Clade, in agreement with their current geographic location.
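As a rough illustration of the two ingredients named above, the following Python sketch projects a toy binary polymorphism matrix onto its leading principal components and then builds a consensus (co-clustering) matrix from repeated clustering runs. It is a minimal sketch under stated assumptions, not the procedure used in this work: the toy data, the function names (`pca_scores`, `kmeans_labels`, `consensus_matrix`), the choice of k-means as the base clusterer, and all parameter values are hypothetical, and the step that turns consensus clusters into a tree is not shown.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each site (column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def kmeans_labels(Z, k, rng, n_iter=50):
    """Plain Lloyd's k-means with random initial centers (assumed base clusterer)."""
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance from every sample to every center
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):              # leave empty clusters unchanged
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def consensus_matrix(Z, k, n_runs=20, seed=0):
    """Fraction of runs in which each pair of samples shares a cluster."""
    rng = np.random.default_rng(seed)
    n = len(Z)
    C = np.zeros((n, n))
    for _ in range(n_runs):
        labels = kmeans_labels(Z, k, rng)
        C += labels[:, None] == labels[None, :]
    return C / n_runs

# Hypothetical toy data: 6 sequences x 16 polymorphic sites (1 = variant present).
X = np.zeros((6, 16))
X[:3, 0:6] = 1                                   # variants shared by group A
X[3:, 6:12] = 1                                  # variants shared by group B
for row, site in [(0, 12), (2, 13), (3, 14), (5, 15)]:
    X[row, site] = 1                             # private variants

Z = pca_scores(X, n_components=2)
C = consensus_matrix(Z, k=2)
print(np.round(C, 2))
```

Pairs of sequences with consensus values near 1 are grouped together in almost every run, which is the sense in which a branch can be said to have high consensus. Repeating the procedure within each resulting group is one conceivable way to obtain a hierarchy, though that is an assumption here.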