Title Page
An Investigation Into The
Distribution Of Human Molecular
Genetic Variation In Sub-Saharan
Africa
By
Krishna Ranganaden Veeramah
SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
JANUARY 2008
MINOR CORRECTIONS SUBMITTED JUNE 2008
UNIVERSITY COLLEGE LONDON (UCL)
The Centre for Genetic Anthropology
Department of Biology
UCL
Supervisor: Dr Mark G. Thomas
Second Supervisor: Dr Mike E. Weale
Declaration of Ownership
I, Krishna Ranganaden Veeramah, confirm that the work presented in this
thesis is my own. Where information has been derived from other sources, I
confirm that this has been indicated in the thesis.
Abstract
Sub-Saharan Africa is believed to possess more human genetic diversity than any other
region of the world, a likely consequence of it being the probable place of origin of
anatomically modern man. Despite its evolutionary importance, studies into the
distribution of this genetic variation have been somewhat limited in comparison to
those in Europe, Asia and the Americas, especially with respect to fine-scale studies that
would help elucidate local histories and the consequences of ethnic and linguistic interactions.
Another possible consequence of this lack of knowledge of genetic diversity is that much
information on functionally important genetic variants that are potentially relevant to
pharmacogenetic research is not available. This lack of information can add to an already
prevalent Eurocentric ascertainment bias in current knowledge of genetic variation,
depriving sub-Saharan African communities of the potential medical benefits
pharmacogenetics has to offer. This thesis describes three case studies that form part of an
investigation into human genetic variation in sub-Saharan Africa.
Chapter 2 uses sex-specific genetic systems to successfully differentiate between two
alternative oral histories of the ethnogenesis of the Nso´ people of Cameroon. Chapter 3
establishes that substantial male and female gene flow has occurred among the peoples of
the Cross River region of Nigeria, a region that includes multiple ethnic groups speaking
distinct languages that appear to have separated hundreds to thousands of years ago.
Chapter 4 demonstrates that the drug metabolising enzyme Flavin-containing
Monooxygenase 2, which has been shown to be non-functional in all European and
Asian individuals collected to date, has a putative functional allele in approximately one
third of sub-Saharan Africans, a finding that may have important implications for
therapeutic intervention strategies and xenobiotic exposure. This thesis demonstrates, inter
alia, the value of conducting genetic studies in sub-Saharan Africa using large datasets of
well-known provenance.
Table Of Contents
Title Page .................... 1
Declaration of Ownership .................... 2
Abstract .................... 3
Table Of Contents .................... 4
List of Figures and Tables .................... 7
Abbreviations .................... 11
Acknowledgements .................... 13
1. Introduction .................... 15
1.1. Rationale Of The Study .................... 15
1.2. Notable Geographical Features Of Sub-Saharan Africa .................... 21
1.3. The Languages Of Sub-Saharan Africa .................... 22
1.3.1. Important Methods and Concepts of Historical Linguistics .................... 22
1.3.2. The Distribution of sub-Saharan African Languages .................... 27
1.4. Previous Work On The Distribution Of Genetic Variation In Sub-Saharan Africa .................... 32
1.4.1. Classical Markers .................... 32
1.4.2. Molecular Data .................... 38
1.5. DNA Sampling Issues .................... 56
1.6. Statement of work performed by Krishna Veeramah in this thesis .................... 57
1.6.1. Chapter 2 .................... 57
1.6.2. Chapter 3 .................... 57
1.6.3. Chapter 4 .................... 57
2. Sex-Specific Genetic Data Support One of Two Alternative Versions Of The Foundation Of The Ruling Dynasty Of The Nso´ In Cameroon .................... 59
2.1. Introduction .................... 59
2.1.1. The geography, history and sociology of the Nso´ .................... 59
2.1.2. Expectations of sex-specific genetic variation in the Nso´ .................... 63
2.2. Materials and Methods .................... 70
2.2.1. Sample Collection Procedure .................... 70
2.2.2. Y-chromosome typing .................... 70
2.2.3. mtDNA typing .................... 72
2.2.4. Statistical and Population Genetic Analysis .................... 73
2.2.5. Dating of the Y*(xBR,A3b2) clade .................... 74
2.2.6. Comparison of duy vs nshiylav and mtaar genealogy depths .................... 81
2.3. Results and Discussion .................... 82
2.3.1. The NRY and mtDNA distribution in the Nso´ .................... 82
2.3.2. Association of the Y*(xBR,A3b2) lineage with the indigenous hunter-gatherer Visale .................... 84
2.3.3. Dating of the Y*(xBR,A3b2) lineage in the Nso´ .................... 88
2.3.4. The possible evolution of a relaxed patrilineal system of descent for the won nto´ .................... 91
2.4. Conclusion .................... 92
2.5. Supplementary Section for Chapter 2 .................... 94
2.5.1. Supplementary Section 2S.1: The expectation of NRY type frequencies in the won nto´ and duy of the Nso´. .................... 94
3. It All Depends On The Scale: Little Sex-Specific Genetic Variation In The Presence Of Substantial Language Variation In Peoples Of The Cross River Region Of Nigeria Assessed Within The Wider Context Of West Central Africa. .................... 114
3.1. Introduction .................... 114
3.1.1. A brief description of the Peoples and Languages of the Cross River region .................... 115
3.1.2. Genetics and Language .................... 122
3.1.3. Expectations of the distribution of NRY and mtDNA variation in the Cross River region .................... 123
3.2. Materials and Methods .................... 128
3.2.1. Sample Collection Procedure. .................... 128
3.2.2. Y-chromosome typing .................... 129
3.2.3. mtDNA typing .................... 129
3.2.4. Statistical and Population Genetic Analysis .................... 130
3.3. Results .................... 134
3.3.1. The distribution Of NRY variation .................... 134
3.3.2. The distribution of mtDNA variation .................... 142
3.3.3. Are clan communities collected from different locations distinguishable? .................... 144
3.3.4. Are different clans of the same language group collected from the same location distinguishable? .................... 145
3.3.5. Are different language groups collected from the same location distinguishable? .................... 145
3.3.6. Are the same language groups collected from different locations distinguishable? .................... 145
3.3.7. Are speakers of the six Cross River languages distinguishable? .................... 146
3.3.8. Are speakers of the six Cross River languages distinguishable when two groups from Igboland are added to the analysis? .................... 148
3.3.9. Can differences between the Cross River region and Cameroonian and Ghanaian groups be established? .................... 149
3.3.10. Are there correlations of genetic distances and geographic and linguistic distances? .................... 154
3.3.11. The Origins of the Efik .................... 156
3.4. Discussion .................... 159
3.4.1. General observations regarding NRY and mtDNA variation .................... 159
3.4.2. The Cross River region as a genetically homogenous region .................... 160
3.4.3. Cross River, Ghana and Cameroon as genetically distinct regions .................... 163
3.4.4. No genetic evidence that the Efik Uwanse have an origin in ancient Palestine .................... 166
3.5. Conclusion .................... 166
3.6. Supplementary Section for Chapter 3 .................... 168
4. The potentially deleterious functional variant FMO2*1 is at high frequency throughout sub-Saharan Africa .................... 170
4.1. Introduction .................... 170
4.1.1. Previous work on Flavin-containing Monooxygenase 2 .................... 170
4.1.2. The rationale for studying FMO2 in Africans .................... 172
4.2. Materials and Methods .................... 172
4.2.1. Sample Collection .................... 172
4.2.2. g.23238C>T typing .................... 173
4.2.3. Statistical and Population Genetic Analysis .................... 175
4.3. Results .................... 184
4.3.1. The distribution of 23238C>T in Africa .................... 184
4.3.2. Examining FMO2 for evidence of Natural Selection .................... 191
4.3.3. Analysis of NIEHS FMO2 re-sequencing data .................... 195
4.4. Discussion .................... 199
4.4.1. Functional FMO2 is found at high frequency throughout sub-Saharan Africa .................... 199
4.4.2. The possible consequences of FMO2 functionality in Africans .................... 200
4.4.3. The Evolution of FMO2 .................... 202
4.5. Conclusion .................... 203
5. Conclusion .................... 205
5.1. Implications for investigating human history and behaviour .................... 205
5.2. Implications for investigating medically relevant genetic variation .................... 208
5.3. Future Work .................... 210
5.3.1. Future work derived from Chapter 2 (Sex-Specific Genetic Data Support One Of Two Alternative Versions Of The Foundation Of The Ruling Dynasty Of The Nso´ In Cameroon) .................... 211
5.3.2. Future work derived from Chapter 3 (It All Depends On The Scale: Little Sex-Specific Genetic Variation In The Presence Of Substantial Language Variation In Peoples Of The Cross River Region Of Nigeria Assessed Within The Wider Context Of West Africa) .................... 212
5.3.3. Future work derived from Chapter 4 (The potentially deleterious functional variant FMO2*1 is at high frequency throughout sub-Saharan Africa) .................... 214
5.4. Final Comments .................... 215
Appendix A: Criteria for and problems associated with collecting African samples for The Centre for Genetic Anthropology (TCGA) DNA bank. .................... 216
Appendix B: An example sociological data sheet used during DNA sample collection .................... 220
Appendix C: Extraction of DNA from Buccal Swabs .................... 221
Appendix D: Legends of figures and tables found on the attached CD. .................... 223
Appendix E: LRH test Source Code .................... 225
References .................... 226
List of Figures and Tables
Figure 1.1: A political map of Africa (from the Perry-Castañeda map collection). ....................23
Figure 1.2: A physical geography map of Africa (from the Perry-Castañeda map collection). ....................24
Figure 1.3: A simplified linguistic map of Africa according to the classification of Greenberg (1963) (a vectorisation by Mark Dingemanse). ...........................28
Figure 1.4: Average linkage tree for 42 populations. The abscissa shows the genetic distances (modified Nei) calculated on the basis of 120 allele frequencies taken from 42 classic genetic marker systems. Taken directly from Cavalli Sforza et al. (1994). ..................................................................34
Table 1.1: Some examples of natural selection detected in Africans. ....................54
Figure 2.1: Map showing towns in Cameroon where samples were collected. ....................61
Figure 2.2: Lineage tree showing the relationship of won nto´ individuals and the transition of won nto´ to duy under Royal Social Status Rule A. (M = male offspring, F = female offspring, * = individual inherits the same NRY type as a fon). Won nto´ are shown in black and duy in red. ....................64
Figure 2.3: Lineage tree showing the relationship of won nto´ individuals and the transition of won nto´ to duy under Royal Social Status Rule B. (M = male offspring, F = female offspring, * = individual inherits the same NRY type as a fon). Won nto´ are shown in black and duy in red. ....................65
Figure 2.4: Genealogical relationships of UEP markers used to define NRY haplogroups ..................................................................................................71
Table 2.1: Distribution of NRY haplogroups (NRY at UEP level) in the four Nso´ social classes. ..............................................................................................83
Table 2.2: Distribution of NRY haplogroups in the peoples of the western Grassfields and Tikar plain. ..........................................................................86
Figure 2.5: PCO plot of UEP-based population pairwise FST values. The PCO plot is constructed using pairwise genetic distances, FST, between the four Nso´ classes (labelled by name) and other populations of the western Grassfields and Tikar Plain (labelled using abbreviations as defined in Table 2.2). PCO1 and PCO2 explain 97.91% and 1.92% of the variation respectively. ............87
Table 2.3: Comparison of the depth of two genealogies. The probability of observing results equal to or more extreme than the difference between the Average Square Distance values of a) the duy and b) the nshiylav and mtaar combined. (Three independent run simulations for each set of criteria) .......90
Table 2.4: Cultural identity of won nto´ males sampled in the study as well as the cultural identity of each sample’s father, mother, father's father and mother's mother. .........................................................................................................93
Supplementary Figure 2S.1: Lineage tree showing the relationship of won nto´ individuals under Royal Social Status Rule A. (M = male offspring, F = female offspring, * = this individual inherits the same NRY type as a fon). ...95
Supplementary Figure 2S.2: Diagram showing the relative contributions of different won nto´ lineages to the won nto´ under Royal Social Status Rule A. .....................................................................................................................96
Supplementary Figure 2S.3: Lineage tree showing the relationship of won nto´ individuals under Royal Social Status Rule A for the two most recent adult generations of won nto´ males. (M = male offspring, F = female offspring, * = this individual inherits the same NRY type as a fon. Numbers refer to specific Lineage Representatives). ....................98
Supplementary Table 2S.3: Probability of sampling Nso´ Y-chromosomes given Royal Social Status Rule A. ..........................................................................99
Supplementary Figure 2S.4: Lineage tree showing the transition of won nto´ to duy under Royal Social Status Rule A. (M = male offspring, F = female offspring, * = this individual inherits the same NRY type as a fon). Duy are shown in red. ..............................................................................................101
Supplementary Figure 2S.5: Lineage tree showing the relationship of won nto´ individuals under Royal Social Status Rule B. (M = male offspring, F = female offspring, * = this individual inherits the same NRY type as a fon). .102
Supplementary Figure 2S.6: Diagram showing the relative contributions of different won nto´ lineages to the won nto´ under Royal Social Status Rule B. ...................................................................................................................103
Supplementary Figure 2S.7: Lineage tree showing the relationship of won nto´ individuals under Royal Social Status Rule B for the two most recent adult generations of won nto´ males. (M = male offspring, F = female offspring, * = this individual inherits the same NRY type as a fon. Numbers refer to specific Lineage Representatives)...........................................................................105
Supplementary Table 2S.4: Probability of sampling Nso´ Y-chromosomes given Royal Social Status Rule B. ........................................................................106
Supplementary Figure 2S.8: Lineage tree showing the transition of won nto´ to duy under Royal Social Status Rule B. (M = male offspring, F = female offspring, * = this individual inherits the same NRY type as a fon). Duy are shown in red. ..............................................................................................107
Supplementary Figure 2S.9: Lineage tree showing the relationship of won nto´ individuals under Royal Social Status Rule C for the two most recent adult generations of won nto´ males. (M = male offspring, F = female offspring, * = this individual inherits the same NRY type as a fon. Numbers refer to specific Lineage Representatives). ....................109
Supplementary Figure 2S.10: Lineage tree showing the relationship of won nto´ individuals under Royal Social Status Rule D for the two most recent adult generations of won nto´ males. (M = male offspring, F = female offspring, * = this individual inherits the same NRY type as a fon. Numbers refer to specific Lineage Representatives). ....................110
Supplementary Table 2S.5: Probability of sampling Nso´ Y-chromosomes given Royal Social Status Rule C. .......................................................................111
Supplementary Table 2S.6: Probability of sampling Nso´ Y-chromosomes given Royal Social Status Rule D. .......................................................................111
Figure 3.1: Map showing the position where samples were collected from in West Central Africa. Political borders are shown by black lines. Colour bar indicates elevation in metres. .....................................................................116
Table 3.1: Summary of cultural practices of Cross River ethnic groups utilised in this study. ...................................................................................................117
Figure 3.2: Broad relationships of the differing language groups used or described in this chapter based on Williamson and Blench (2000). Branch lengths are not informative. ........................................................................118
Table 3.2: Nigerian Cross River sample collection details. ....................125
Table 3.3: First languages of parents of Cross River region samples utilised in this study. ....................126
Table 3.4a: Lexicostatistic similarity percentages for various Niger-Congo languages. ‘?’ indicates no available data. ....................135
Table 3.4b: Lexicostatistic dissimilarity matrix for 6 Cross River languages, 3 Cameroon Grassfields languages and 2 Ghanaian languages. ....................136
Figure 3.3: Language network based on distance matrix inferred from partial lexicostatistic matrix (Table 3.4b). ..............................................................137
Table 3.5: Haplogroup proportions in Cross River, Cameroonian Grassfield and Ghanaian groups. .......................................................................................139
Table 3.6: Hierarchical AMOVA results of Cross River, Cameroonian and Ghanaian groups at various molecular levels. Colour indicates significance level of Fixation Indices P-values: Yellow = 0.01<P<0.05, Orange = 0.001<P<0.01, Red = P<0.001. Each grouping is followed, indicated by ‘n’, by the number of groups and, if applicable, the number of individual populations analysed. ....................140
Figure 3.4: Consensus neighbour joining trees for Cross River population using various methods of genetic distance for both NRY and mtDNA. Only individual node bootstrap values over 30% are shown on tree. ..................150
Table 3.7a: ETPD P-values (upper triangle) at various NRY and mtDNA levels for pooled Cameroonian, Ghanaian and Nigerian datasets. Colour code is same as Table 3.6. ...............................................................................................151
Table 3.7b: Genetic Distances (lower triangle) and P-values (upper triangle) at various NRY and mtDNA levels for pooled Cameroonian, Ghanaian and Nigerian datasets. Colour code is same as Table 3.6. ....................151
Figure 3.5: Various PCO plots at different NRY and mtDNA analysis levels for populations from the Cross River region, the Cameroon Grassfields and Ghana.........................................................................................................153
Table 3.8: Results of Mantel and Partial Mantel tests at different levels of NRY and mtDNA analysis using various distance matrices. Colour code is same as Table 3.6. ...............................................................................................155
Table 3.9: NRY and mtDNA haplogroup frequencies in the Efik Uwanse. ....................157
Figure 3.6: Various PCO plots at different NRY and mtDNA analysis levels for populations from the Efik Uwanse and comparison populations. ....................158
Figure 4.1: Diagrammatic representation of 23238C>T SNP restriction enzyme assay. ....................174
Figure 4.2: #rs6661174 Mbo/MseI Complementary Restriction Enzyme Digest Banding Patterns. ....................175
Figure 4.3: Distribution of 10,000 EHH values calculated 0.4cM from core alleles with A) SNP haplotype density not controlled and B) SNP haplotype density controlled at 1 SNP per 0.05cM (8 SNP extended haplotypes). ....................182
Table 4.1: 23238C>T Genotype and Allele frequencies. ....................186
Figure 4.4: Map showing the percentage of individuals with at least one FMO2*1 allele in Africa and two nearby countries. ....................187
Table 4.2: Pearson's Chi Square Test on individual regions. ....................188
Table 4.3: Fisher's Exact tests between regions. ....................188
Table 4.4: Fisher's Exact tests between CEA populations. ....................188
Figure 4.5: PCO plot of 23238 C>T-based population FST values. ....................189
Figure 4.6: Contour map based on FMO2*1 allele frequencies in Central East African populations with areas of rapid allele frequency change shown with blue circles. ....................190
Figure 4.7: Spatial Autocorrelation Analysis of 23238C>T allele frequency data using (A) Moran’s I and (B) Geary’s c. ....................192
Table 4.5: Table of ethnic identities found in the various populations examined in this chapter. ................................................................................................193
Table 4.6: Various Population Pairwise Fisher’s Exact Tests. ....................194
Table 4.7: P-values for EHH values calculated at various genetic (a) and physical distances (b) from alleles present at the 23238C>T locus in the upstream (-) and downstream (+) directions in the YRI, CEU and CHB+JPT datasets with a SNP haplotype density of 0.05cM per SNP in (a) and 10kb per SNP in (b). ....................196
Table 4.8: Table showing inferred haplotypes for FMO2 genomic variants from NIEHS sequencing data. ............................................................................197
Supplementary Table 2S.1: Distribution of NRY types, defined by UEP haplogroups and microsatellite haplotypes, in the four Nso´ social classes and people of the western Grassfields and Tikar Plain. .............................223
Supplementary Table 2S.2: Distribution of mtDNA types, defined by VSO haplotypes, in the four Nso´ social classes. ....................223
Supplementary Table 2S.7: Confidence intervals for TMRCA calculations in the duy, the nshiylav and mtaar, and the won nto´ and duy, using two mutation models. .......................................................................................................223
Supplementary Tables 3S.1: Pairwise ETPD P-values for various levels of NRY and mtDNA analysis for Cross River samples, Cameroon and Ghana. Level of analysis shown in top left cell of matrix. Colour code is same as Table 3.6. ...................................................................................................................223
Supplementary Table 3S.2: Distribution of NRY types, defined by UEP haplogroups and microsatellite haplotypes, in the Cross River region, Cameroon and Nigeria. ..............................................................................223
Supplementary Table 3S.3: Pairwise genetic distances and associated P-values for various levels of NRY and mtDNA analysis. Level of analysis shown in top left cell of matrix. Colour code is same as Table 3.6. ...........................224
Supplementary Table 3S.4: Distribution of mtDNA types, defined by HVS-1 mtDNA haplogroups and VSO haplotypes, in the Cross River region, Cameroon and Nigeria. ..............................................................................224
Supplementary Table 3S.5: Distribution of NRY types, defined by UEP haplogroups and microsatellite haplotypes, in Ethiopia, Israeli and Palestinian Arabs, Lake Chad and Sudan. .................................................224
Supplementary Table 3S.6: Distribution of mtDNA types, defined by HVS-1 mtDNA haplogroups and VSO haplotypes, in Ethiopia, Israeli and Palestinian Arabs, Lake Chad and Sudan. ...................................................................224
Supplementary Table 3S.7: Pairwise genetic distances and associated P-values for various levels of NRY and mtDNA analysis for Efik Uwanse comparisons. Colour code is same as Table 3.6. .............................................................224
Abbreviations
AQ Amodiaquine
AMOVA Analysis of Molecular Variance
ASD Average Squared Distance
CAF Central African Republic
CEA Central East Africa
CI Confidence Interval
CYP Cytochrome P450
DRC Democratic Republic of Congo
DNA Deoxyribonucleic acid
DME Drug Metabolising Enzyme
ETA Ethionamide
ETPD Exact Test of Population Differentiation
EBSP Expansion of the Bantu-speaking peoples
EHH Extended Haplotype Homozygosity
FMO Flavin-containing Monooxygenase
HIV Human Immunodeficiency Virus
HVS-1 Hypervariable Segment 1
K2 Kimura 2 parameter
LR Lineage Representative
L-SMM Linear Length Dependent Stepwise Mutation Model
LD Linkage Disequilibrium
LRH Long Range Haplotype
MS Microsatellite
mtDNA mitochondrial DNA
MRCA Most Recent Common Ancestor
MR Multiregional
NIEHS National Institute of Environmental Health Sciences
NJ Neighbour Joining
NRY Non-Recombining portion of the Y chromosome
NA North Africa
PCR Polymerase Chain Reaction
PCA Principal Component Analysis
PCO Principal Co-ordinate Analysis
RAO Recent African Origin
RFLP Restriction Fragment Length Polymorphism
SMM Simple Stepwise Mutation Model
SNP Single Nucleotide Polymorphism
SEA South East Africa
TCGA The Centre for Genetic Anthropology
TMRCA Time to the Most Recent Common Ancestor
UEP Unique Event Polymorphism
UPGMA Unweighted Pair Group Method with Arithmetic Mean
VSO Variable Site Only
WA West Africa
WMH Won ntoʹ Modal Haplotype
Acknowledgements
I thank all those individuals who gave samples as well as those who helped in the
collection of samples, especially Matthew Forka and Loveline Lum who accompanied me
during fieldwork in Cameroon.
I also thank those individuals who have made my experience at UCL, and TCGA in
particular, over the last 4 years an enjoyable (yet productive!) one. This includes (in a
somewhat chronological order) Abigail Jones, Karine Rousseu, Ian Barnes, Elizabeth
Caldwell, Charlotte Mulcare, Isabel Homlquist, Andrew Loh, Luke Warren, Gianpiero
Cavalleri (JP), Catherine Ingram, Lorenzo Zannette, Ana Texieira, Chris Plaster, Yuval
Itan, Sarah Browning, Adam Powell, Laura Horsfall, Naser Ansari Pour and Lauren
Johnson.
I am also indebted to (this time in no particular order) Professor Elizabeth Shephard,
Professor Ian Philips, Professor David Zeitlyn, Dr Bruce Connell, Professor Verkijika
Fanso, Professor Robert Griffiths, Professor Dallas Swallow, Professor Sue Povey,
Professor Nancy Mendell and (last but by no means least) Dr Mike Weale for their
invaluable guidance during the different works described in this thesis.
However my main thanks are reserved for my supervisor Dr Mark Thomas, my industrial
sponsor Dr Neil Bradman and my parents, Lutcheemee and Ven Veeramah. Without their
support I would have simply been unable to undertake and complete this work.
Chapter 1:
Introduction
1. Introduction
1.1. Rationale Of The Study
Africa is the world's second-largest continent at some 30 million km², with the vast
majority of the landmass lying south of the Sahara desert, a region termed sub-Saharan
Africa. Sub-Saharan Africa, and more specifically somewhere between eastern and southern
Africa (Quintana-Murci et al. 1999; Jobling, Hurles & Tyler-Smith 2004-Chapter 8;
Prugnolle, Manica & Balloux 2005; Ray et al. 2005; Amos & Manica 2006; Liu et al.
2006; Cramon-Taubadel & Lycett 2007), is also now widely accepted as the likely place
of origin of anatomically modern man, the ancestor of all present day humans. Relative to
its geographical size the population of sub-Saharan Africa is small, at
approximately 700 million individuals, when compared with the geographically much smaller
areas that constitute China and India. However, the population growth rate is the highest
in the world at around 2.5% per year. Sub-Saharan Africa is linguistically diverse, with
approximately 2000 different languages (Crystal 1997; Ethnologue 2005) distributed
across the region. Given the close ties that exist between language and social identity, this
indicates a considerable variety of ethnic groups, many of which have complex
relationships with other neighbouring and distant groups. However, to much of the
developed world sub-Saharan Africa is best known for its high poverty rate, political
instability and high incidence of infectious diseases such as tuberculosis, malaria
and HIV, aspects which, especially recently, have led to a great deal of media attention.
Though Africa has been described as the 'cradle of humanity', prehistory in much of
sub-Saharan Africa extends to recent times (Ki-Zerbo 1989), with many ethnic groups,
especially those found inland, having documented records dating only to the arrival of
European colonists. As a consequence, archaeology, anthropology and linguistics have
been important tools in attempts to piece together sub-Saharan African histories.
Over the past 50 or so years researchers have examined the distribution of human genetic
variation in Africa ('genetic' here referring to low resolution phenotypic data such as
blood groups as well as higher resolution molecular data). Many studies had as their focus
an attempt to distinguish between the multi-regional and out-of-Africa models for the
origins of Homo sapiens sapiens (Cann, Stoneking & Wilson 1987; Vigilant et al. 1991;
Chen et al. 1995; Horai et al. 1995; Jorde et al. 1995; Horai 1995; Seielstad et al. 1999;
Takahata, Lee & Satta 2001; Liu et al. 2006). They often compared sets of individuals of
African and non-African ancestry, with the results favouring the out-of-Africa explanation
(discussed later). In addition, a number of studies have examined variation associated with
infectious diseases prevalent in sub-Saharan Africa, with many reports from the 1960s
to the 1980s utilising serological techniques in the investigation of blood-based disorders
such as glucose-6-phosphate dehydrogenase deficiency. The general consensus from
studies performed up to the present day is that human genetic variation appears to be
greater in sub-Saharan Africa than in the rest of the world (Olerup et al. 1991; Vigilant et
al. 1991; Bowcock et al. 1994; Armour et al. 1996; Tishkoff et al. 1996; Seielstad et al.
1999; Kaessmann et al. 1999; Jorde et al. 2000; Yu et al. 2002; Macfarlane & Simmonds
2004; Witherspoon et al. 2006). However it is not yet clear how this greater genetic
variation in Africa is distributed.
While there has been some work examining sub-Saharan African populations on a macro-scale
(covering multiple politically defined countries), with the general trend in
conclusions being that a) genetic distance among populations increases with geographic
distance and b) genetic diversity decreases as geographic distance increases from East
Africa (Prugnolle, Manica & Balloux 2005; Handley et al. 2007), there is very little
available information on variation at a fine scale.
For example, what is the extent of variation within defined groupings: among individuals,
within villages and among towns, within and among ethnic groups and within and among
larger geographic regions? Also, is ethnic identity, language spoken or location better
correlated with genetic differences among groups?
Study of the distribution of human genetic variation in sub-Saharan Africa may lead to
interesting insights into our past. Previous work elsewhere in the world has already shown
the power of genetics, including the sex-specific genetic systems, in elucidating different,
and often very specific, aspects of population demographic history, for example origins
and migration events. As mentioned previously, studies have already examined African
populations in the course of trying to ascertain the geographic origins of humanity and, at
the continent-wide scale, studies on the distribution of genetic variation in sub-Saharan
Africa have provided insights into the expansion of the Bantu-speaking peoples through
demic rather than cultural diffusion processes (Underhill et al. 2001; Cruciani et al. 2002;
Salas et al. 2002; Wood et al. 2005). This expansion is considered to be one of the largest
human migration events of the recent past and is believed to have started some 4000 years
ago when agriculturists spread from around the present day Nigerian-Cameroonian border
into much of sub-Saharan Africa, propagating the many Bantu languages now
encountered in the region. However, genetic analysis has been little used in the region to
uncover population history at a fine scale, e.g. for particular ethnic groups. Given that
many peoples in sub-Saharan Africa have a relatively recent prehistory, with accounts of
their past reliant on oral histories sometimes supplemented by archaeological and
linguistic data, genetic studies may well be a useful tool when used alongside other more
traditional disciplines.
Patterns of genetic variation observed at a fine scale in sub-Saharan Africa should be of
particular interest to linguists working in the region. Approaches used to model the
evolution of DNA are often similar to those used to describe the evolution of languages.
Relationships in both can be represented by phylogenetic trees, though these trees are
often simplifications of much more complex processes (DNA trees can be greatly affected
by recombination, while language trees are subject to problems of horizontal
transmission/word borrowing and language replacement). Despite all the difficulties, if
phylogenies based on genetic data correspond with those based on language this will
often be seen as highly significant evidence of a particular model of human behaviour or
demographic history. However, linguists have often questioned the approach of genetic
studies (MacEachern 2000), especially because of the lack of appropriate sampling,
sometimes referring to the methodology as the 'out of the freezer' approach (Blench 2006,
p. 20). This may go some way to explaining why studies that show correlations between
genetic and linguistic data often involve large distances, making it difficult to disentangle
the relative contribution of geographic distance in any correlations of genetics and
language. As Africa possesses almost a third of the world's languages it is of interest to
establish whether careful and appropriate sampling at a fine scale can help researchers
gain greater insight, especially with regard to linguists working on questions related to the
effects of language contact. If groups each speaking a different language can co-exist in
close geographic proximity, can these same groups maintain genetic isolation from each
other over long periods of time in such circumstances? At the same time, linguistic
information can be important in structuring genetic studies since linguistics will often be
highly correlated with culture and therefore patterns of linguistic evolution can be used to
investigate patterns of cultural evolution. If an aim is to establish whether geography or
ethnicity is the better predictor of genetic difference, the languages spoken may be a vital
component in the differentiation and maintenance of identity.
No less important is the potentially beneficial medical application of knowledge of
diversity within the peoples of sub-Saharan Africa. The substantial genetic diversity
present in sub-Saharan Africa suggests that there are likely to be many alleles present in
individuals that contribute to increased resistance to a variety of infectious diseases
indigenous to the region. Some alleles may well be restricted to a particular group or
groups. In addition while common non-infectious diseases, under the common
disease/common variant hypothesis, would be expected to be caused by a few high
frequency variants present in a number of populations, rare alleles that probably lead to
complex disease susceptibility may well be restricted to particular groups, each with their
own specific set of causative alleles (Tishkoff & Williams 2002). Knowledge and
understanding of these resistance and susceptibility variants can aid in understanding the
causes of disease and contribute to new therapeutic interventions. However, gaining this
knowledge will require screening many individuals and groups, a task which is not
always easy in sub-Saharan Africa.
One way in which understanding patterns of distributions of genetic variation in sub-
Saharan Africa has the potential to bring relatively early benefits to its inhabitants is in
the selection, when choices are available, of the more appropriate pharmaceutical
intervention. Pharmacogenetics is the study of the genetic basis of individual variation in
drug response (Johnson 2003; Weinshilboum 2003; Wilke et al. 2007). This field is
receiving increasing attention from the scientific community as the ability to analyse ever
greater numbers of genetic variants, at greater speeds, rapidly increases as a consequence
of technological advances. Genetic variants have already been identified, especially
amongst the Cytochrome P450 genes, which have been shown to influence drug
metabolism. However, since sub-Saharan African populations have generally been
underrepresented in pharmacogenetic studies compared to European and Asian
populations, there is likely to be substantial ascertainment bias in the literature in the
reporting of functional variants. Given that the rest of the world is believed to possess
only a subset of the genetic diversity present in Africa, it is likely that a large number of
potentially important variants that affect drug metabolism have yet to be uncovered.
Again, these variants may well be restricted to particular regions or groups of people.
The ultimate aim of pharmacogenetics is to define individual drug administration profiles
that remove the risk of an adverse drug reaction due to an individual's genetic makeup
and maximise efficacy. In sub-Saharan Africa, at least, this is unlikely to become reality
in the foreseeable future due to a lack of appropriate infrastructure and to economic
constraints. However, population genetic theory suggests that it may be
possible to increase the probability of providing the most appropriate therapeutic
intervention by basing the decision on an understanding of the genetic characteristics of
the larger group to which a particular individual belongs. For example, if it is shown that
in a particular region genetic identity is more closely tied to ethnic identity than to
geographic location, it may be possible to greatly reduce inappropriate pharmaceutical
intervention by administering drugs suited to the pharmacogenetic profile of each ethnic
group. Although it may be preferable to construct pharmacogenetic profiles for each
ethnic group, it may be possible, by understanding the pattern of distribution of genetic
variation in a region, to infer the likely efficacy of a drug and reduce genetically determined
adverse events in a group by extrapolating from data obtained from genetic studies of
other groups, for example where the frequency of particular
variants has been shown to change in a clinal fashion across a region.
There is currently a paucity of information on the distribution of human genetic variation
in sub-Saharan Africa, especially at a fine geographic scale. There is therefore a great
need for studies that address this, which may lead to important insights into a) the
histories of populations and b) the cultural evolution of human society at a fine scale as
reflected in the relationship of genetic and linguistic patterns. A further important benefit
of studying the distribution of human genetic diversity is in improving disease
management by providing insights that may assist in more appropriate pharmaceutical
intervention.
In this thesis each of the above three aspects is addressed in a case-study format,
through three independent studies that illustrate the potential utility of investigating the
distribution of human genetic variation in sub-Saharan Africa.
The three studies are briefly explained below.
Sex-Specific Genetic Data Support One Of Two Alternative Versions Of The Foundation
Of The Ruling Dynasty Of The Nso´ In Cameroon
In this study sex-specific genetic data are used to shed new light on the origins of the
Nsoʹ, a prominent ethnic group in the Grassfields of Cameroon. The alternative oral
histories of their ethnogenesis have been the subject of fierce debate among historians and
anthropologists. These oral histories have been well documented (Mzeka 1978; Mzeka
1990) and the group have a unique social class system that enabled the formulation of
hypotheses that could be evaluated by analysing genetic data.
It All Depends On The Scale: Little Sex-Specific Genetic Variation In The Presence Of
Substantial Language Variation In Peoples Of The Cross River Region Of Nigeria
Assessed Within The Wider Context Of West Central Africa.
In this study sex-specific genetic data from a very large and sociologically well
characterised dataset are used to examine genetic variation among peoples speaking
different languages in the relatively small Cross River region of south-east Nigeria. Many
of the languages have been studied in detail by linguists, who have shown them to
have diverged from each other over periods ranging from hundreds to thousands
of years. The analysis is undertaken within the context of the wider geographic area of
West Central Africa by extending it to include groups from Cameroon and Ghana.
The potentially deleterious functional variant FMO2*1 is at high frequency throughout
sub-Saharan Africa
This study examines the distribution across Africa of variation in the gene that encodes
the drug metabolising enzyme Flavin-containing Monooxygenase 2 (FMO2). The effect
of FMO2 protein expression has previously been largely ignored because, owing to a
premature stop codon, the protein has been non-functional in all European and Asian
individuals examined to date. Expression of the enzyme may be important in the efficacy
and safety of drugs used to treat tuberculosis in Africa. The frequency of the putative
functional allele in sub-Saharan Africans is assessed and is found to be high across the region.
Evidence for selection and the possible date of origin of the mutation are also assessed, in
some instances using software developed in this thesis. The study illustrates the utility of
undertaking frequency surveys in DNA collections of known provenance with a view to
improving healthcare.
Before presenting these studies, some background information on sub-Saharan Africa is
given, with particular emphasis on previous work on human genetic variation in the
region, to allow the reader to better place in context the findings of the three case
studies.
1.2. Notable Geographical Features Of Sub-Saharan Africa
As the name suggests, the area described as sub-Saharan Africa in this thesis encompasses
all of the African continent that lies south of the present day Sahara Desert, which itself
covers most of northern Africa. The sub-Saharan African mainland consists of a total of
42 politically defined countries (see Figure 1.1 for a political map of Africa). As well as
the Sahara there are two other major desert areas: the Kalahari Desert, which is semi-arid
and covers much of Botswana and parts of Namibia (though the surrounding basin
extends to Angola, Zambia, Zimbabwe and South Africa), and the more arid and hostile
Namib Desert, which extends from Namibia up to southwest Angola. Marking the southern
border of the Sahara is the Sahel, a strip of semi-arid grassland (see Figure 1.2 for a
physical geography map) that stretches from the Horn of Africa to the Atlantic Ocean and
bridges the gap between the arid desert to the north and tropical wetland to the south.
For obvious reasons none of these regions is particularly densely populated.
The Congo Basin holds the world's second largest rainforest and lies in the heart of
sub-Saharan Africa. Though mostly covering the Democratic Republic of Congo (formerly
Zaire), it also extends to the north into parts of the Central African Republic and Sudan,
to the west into parts of the Republic of Congo and to the south into parts of Angola and
Zambia. The higher relief of the Great Lakes forms a boundary to the east. The term Great
Lakes region is rather flexible and takes in many east African countries. The
Great Lakes are mainly Lake Victoria, Lake Tanganyika, Lake Malawi, Lake
Turkana, Lake Albert and Lake Kivu. Lake Victoria is the largest of all and, along with
Lake Albert and Lake Edward, forms connections with the White Nile.
The White Nile flows north from the Great Lakes region until it reaches Khartoum in
central Sudan, where it joins the Blue Nile, which has its origins in Lake Tana in Ethiopia.
From there the Nile continues north through mostly desert until it reaches the
Mediterranean. There are four other major rivers: the Congo River, which passes through
the Congo Basin towards the Atlantic; the Zambezi River, which flows from Zambia to
the Indian Ocean; the Niger River, which flows from Guinea through Mali and Nigeria to
the Atlantic Ocean; and the Orange River, which flows from eastern South Africa to the
Atlantic through Namibia.
The Chad Basin in north central Africa contains Lake Chad, which straddles northeast
Nigeria, northern Cameroon, Chad and Niger. It is a particularly shallow lake that
fluctuates constantly in depth and has shrunk considerably over time.
Major highlands in sub-Saharan Africa include the high relief provided by the Great
Lakes region, the Ethiopian Highlands and the Great Escarpment in South Africa.
Another distinctive topographical feature of sub-Saharan Africa is the Rift Valley that
runs from Syria in the Middle East to Mozambique. It is formed as a result of the meeting
of tectonic plates and it contains and is responsible for part of the Great Lakes region and
the high surrounding relief.
1.3. The Languages Of Sub-Saharan Africa
1.3.1. Important Methods and Concepts of Historical Linguistics
The languages different groups of people speak play a prominent role in this thesis,
especially in Chapter 3, with an individual's cultural identity closely linked to their
linguistic affiliation. By studying the distribution and structure of these languages it is
possible to gain useful insights into the past movements of people. Before describing how
this type of analysis (part of the wider field of 'historical linguistics') relates to
sub-Saharan African languages, some important concepts such as language classification,
comparison and dating are introduced.
Figure 1.1: A political map of Africa (from the Perry-Castañeda map collection).
Figure 1.2: A physical geography map of Africa (from the Perry-Castañeda map collection).
Languages are often grouped into 'language families' based on evidence of shared
characteristics such as phonology, morphology, lexicon and syntax. This is based on the
concept that languages gradually change and diverge over time, resulting in new
languages. Therefore a language family is a group of languages that can be related by
descent (back in time) to a common ancestor. This common ancestor (which will no
longer exist) is referred to as a proto-language (e.g. proto-Bantu). Usually the proto-
language is not known because of a lack of written records but it can be reconstructed by
comparing two or more languages in the proposed family. This generally involves the
assembly of sets of words with the same definition from each language that are likely to
share a common origin, also called cognates, and inference, based on general
sound change laws, of the most plausible changes that are likely to have occurred in these
words for the languages compared to have diverged to their present status. The process
requires considerable linguistic expertise and is somewhat subjective, especially with
regard to the reconstruction of the proto-language. Therefore two linguists may assemble
very different reconstructions, though the confirmation that languages are of the same
family because of common descent is generally more objective.
By performing this analysis on multiple languages in a family it is possible to determine
the hierarchical relationship for how the languages have diverged, allowing the
construction of a phylolinguistic tree (though in reality language divergence is gradual,
not instantaneous as implied by a tree). Given this 'genetic' ordering of languages it is
possible to further subdivide the family into smaller phylolinguistic units. However, there
is no hard and fast rule for these subdivisions, as reflected by the various terms
that can be used (subfamily, branch, section, group, subgroup), for whose use there is no
consensus (Blench 2006). These subdivisions are generally applied on a
case-by-case basis; for example when discussing the Niger-Congo family one might make
reference to the Benue-Congo subfamily of languages in one circumstance and the Cross
River subfamily in another.
Some language families themselves are considered language phyla. A language phylum
would be described as any language family for which the external affiliation cannot be
determined. For example it is not clear how Niger-Congo languages are related to other
language families (such as Afro-Asiatic) back in time with regard to a common ancestor.
Therefore this language family would be considered a language phylum (Blench 2006).
Analogous to the method used to classify language phyla, some languages are considered
isolates as they do not demonstrate any descent from a common ancestor with any other
known existing language. These include languages such as Basque of Spain/France
(Trask 1997) and Hadza of Tanzania (Sands 1998) and they are sometimes considered
language families containing only one language.
Another method for elucidating the hierarchical structure of a language family, one that
does not require the reconstruction of a proto-language, is lexicostatistics. This uses a list,
called a Swadesh list (named after Morris Swadesh, who developed the technique), of
around 100 concepts that are likely to be found in all human languages, such as 'sleep' or
'nose', and that are free of cultural meaning (Swadesh 1955). Analysis is performed in pairwise
language comparisons so that each slot on the Swadesh list is filled by a word from each
language. The percentage of concepts for which the pair of words (one from each language
analysed) are cognates is then determined (again a decision made by a linguist),
giving a metric for how similar a pair of languages are (a lexicostatistic). If this analysis
is performed for multiple languages it is possible to build up a phylolinguistic tree, with
the highest lexicostatistic being found between the two languages that split most recently.
Like the comparative method described for the reconstruction of proto-languages, this
analysis is affected by issues such as bias from word borrowing and the imposition of
specific cultural meanings for specific words. Some researchers have questioned whether
a universal list can ever really be applied (Gudschinsky 1956).
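The pairwise calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a method used in this thesis: the language names, the five-slot list and the cognate-class labels are all hypothetical, and the cognate judgements are assumed to have been made beforehand by a linguist and encoded as matching labels.

```python
from itertools import combinations

# Hypothetical data: one cognate-class label per Swadesh-list slot for each
# language. Two words are treated as cognates when their labels match.
COGNATE_CLASSES = {
    "LangA": ["a1", "b1", "c1", "d1", "e1"],
    "LangB": ["a1", "b1", "c1", "d2", "e1"],
    "LangC": ["a1", "b2", "c2", "d3", "e1"],
}


def lexicostatistic(classes_x, classes_y):
    """Return the percentage of list slots whose words were judged cognate."""
    if len(classes_x) != len(classes_y) or not classes_x:
        raise ValueError("both languages need the same non-empty list of slots")
    shared = sum(x == y for x, y in zip(classes_x, classes_y))
    return 100.0 * shared / len(classes_x)


def pairwise_lexicostatistics(data):
    """Compute the lexicostatistic for every pair of languages in `data`."""
    return {
        (a, b): lexicostatistic(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }


def closest_pair(scores):
    """Return the pair with the highest lexicostatistic (most recent split)."""
    return max(scores, key=scores.get)


scores = pairwise_lexicostatistics(COGNATE_CLASSES)
print(scores)
print(closest_pair(scores))
```

On this toy input LangA and LangB share the highest lexicostatistic, so they would be the first pair joined when building a tree.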
The lexicostatistic analysis described above can be extended to estimate the time when
languages diverged (Swadesh 1952). This methodology, called glottochronology, uses
languages with known divergence dates and their associated lexicostatistics to calibrate a
linguistic clock that can be applied to the lexicostatistics of pairs of languages with
unknown divergence dates. Glottochronology remains a very controversial field because
of its various assumptions (see Renfrew, McMahon & Trask 2000).
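As a hedged illustration of how such a linguistic clock works, the sketch below uses the classic constant-retention formulation usually associated with Swadesh and Lees, t = ln(c) / (2 ln(r)), where c is the fraction of list items that are cognate between two languages and r is the fraction of the list a single language is assumed to retain per millennium. The formula, the function names and the numbers in the usage lines are assumptions made for illustration and are not taken from this thesis.

```python
import math


def calibrate_retention_rate(shared_fraction, known_time_millennia):
    """Estimate the per-millennium retention rate r from a language pair whose
    divergence date is known, by inverting c = r ** (2 * t)."""
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if known_time_millennia <= 0.0:
        raise ValueError("known_time_millennia must be positive")
    return shared_fraction ** (1.0 / (2.0 * known_time_millennia))


def divergence_time_millennia(shared_fraction, retention_rate):
    """Estimate time since divergence as t = ln(c) / (2 * ln(r)).

    The factor of two reflects that both lineages are assumed to lose
    vocabulary independently at the same constant rate."""
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0.0 < retention_rate < 1.0:
        raise ValueError("retention_rate must be in (0, 1)")
    return math.log(shared_fraction) / (2.0 * math.log(retention_rate))


# Invented calibration pair: 64% shared cognates after a known 1000 years.
r = calibrate_retention_rate(0.64, 1.0)
# Apply the calibrated clock to a pair with an unknown divergence date.
print(divergence_time_millennia(0.40, r))
```

The assumption of a single constant retention rate across languages and periods is exactly the kind of assumption that makes the method controversial.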
The above types of analysis rely heavily on languages branching off and diverging in a
tree-like manner. However, in reality the phenomenon of 'language shift/replacement' is a
major force in the evolution of culture and will severely skew any analysis such as
lexicostatistic calculations. Language shift is the process where a community shifts from
a native language to another language for some cultural reason. This is generally a
gradual process (though the actual speed depends on the circumstances involved) and
initially the community will become bilingual before the native language is dropped. In
Africa this can be seen directly with populations learning the lingua franca of the
different regions. For example Hausa has replaced many minority languages in Nigeria
(as part of a process called Hausaisation) while small Chadic speaking communities have
adopted Arabic, with the spread of Islam being a major factor (Blench 2006).
1.3.2. The Distribution of sub-Saharan African Languages
While it is not possible to detail all the linguistic relationships found in sub-Saharan
Africa because of the sheer volume of very specialised information collected over the
years, some features that may be useful to readers of this thesis are briefly described
based on Roger Blench's book Archaeology, Language, and the African Past (2006),
which reviews the linguistic and archaeological data on Africa currently available. The
purpose of the section is to act as a guide for those unaware of the histories of some of the
regions and peoples discussed later in the thesis. It should in no way be taken as a
comprehensive review of the material, as the conclusions drawn for the different
migrations and histories described are of varying confidence, the details of which are
discussed by Blench (2006).
Different points of interest have simply been summarised to create as full a picture of the
linguistic history of sub-Saharan Africa as possible for the reader. Where indicated by
Blench (2006), the secondary sources from which the different conclusions are drawn
have also been included.
Sub-Saharan African languages can be divided into four broad language phyla:
Niger-Congo, Afro-Asiatic, Nilo-Saharan and Khoisan (Greenberg 1963) (see Figure 1.3 for a
linguistic map of Africa). In addition there are languages that do not fit into these
traditional groupings such as Jalaa in Nigeria, Ongota in Ethiopia and Kwadi in Angola.
There are relatively few of these isolates in comparison to other regions in the world,
which is possibly a result of recent large-scale population movements. Finally
Austronesian languages, which are typically spoken outside the continent, are found
throughout the island of Madagascar, brought there by mariners from southeast Asia
around 2000 years ago during the great Austronesian expansions, probably from southern
Borneo (Adelaar 1995).
Figure 1.3: A simplified linguistic map of Africa according to the
classification of Greenberg (1963) (a vectorisation by Mark Dingemanse).
Niger-Congo languages are by far the most widespread group of languages found in
sub-Saharan Africa, with speakers found from Senegal to the southern tip of South Africa
and with different languages apparently continuously connected, which suggests some
form of population expansion. A peculiar exception to this general Niger-Congo
connectivity is provided by groups in the Nuba Hills in Sudan that speak Kordofanian, a
Niger-Congo language, yet are completely surrounded by Nilo-Saharan speakers. There are
1,514 Niger-Congo languages according to the latest Ethnologue build (2005). Bantu
languages in particular are numerous and geographically widespread but make up only a
small part of the diversity of the family. It is likely that the proto-Niger-Congo language
arose in West Africa earlier than 7000 years before present. The proto-Niger-Congo
speakers would have overwhelmed and/or assimilated hunter-gatherer groups and expanded
due to some technological advantage. This may have been the development of agriculture
leading to a possible crop domestication centre (Renfrew 1992; Ehret 2002), though the
evidence for this is not very concrete (Neumann, Kalheber & Uebel 1998; Ehret 2002), or
the use of more effective hunting technologies such as the bow and arrow.
Bantu speakers, a branch of Benue-Congo (the largest branch of the Niger-Congo family
with regard to the number of languages, speakers and geographic area covered), appear to
have expanded to the east 4000 years ago from a northwest Cameroonian origin
(Greenberg 1955) and spread through much of sub-Saharan Africa. This movement is
commonly termed the ‘Bantu Expansion’ but it is more appropriate to refer to it as the
‘expansion of the Bantu-speaking peoples’. The key to this expansion was almost
certainly an agropastoral lifestyle (Vansina 1990) while the development of iron
smelting may also have helped in later phases. The expansion to the south of sub-Saharan
Africa was probably split into two streams of people. One stayed on the west side of the
continent and moved through the rainforest (banana domestication may have been
important for this) to the southeast where they encountered Khoisan speakers in the
Kalahari desert (Denbow 1986; Denbow 1990). The other stream first moved east along
the north fringe of the forest to the Great Lakes. From here they further expanded to the
south of the continent along the east coast (Huffman 1998).
There are currently 400 Afro-Asiatic languages. Most are found in North Africa and the
Middle East. However a diverse range of Chadic languages are found in the Chad Basin
and Semitic, Omotic and Cushitic languages are spoken in East Africa. Afro-Asiatic
probably arose in southwest Ethiopia 9-10 thousand years ago, in the same
location where Omotic languages are spoken today (Bender 1997; Blench 1999a). There
does not appear to be any evidence that Omotic speakers have moved anywhere else since
this origin. It is likely that the Omotic and Cushitic proto-languages diverged when Cushitic
speakers took up animal domestication while Omotic speakers retained their hunter-
gatherer lifestyle.
Cushitic speakers probably subsequently spread rapidly in many directions including
towards Lake Chad 4-5 thousand years ago (Blench 1999b), establishing Chadic speakers
in the region as well as heading as far south as Zambia. Another group may have headed
north and would later become Berber, Egyptian and Semitic speakers. The proto-Semitic
speakers arrived in the Near East but then turned back in a westward direction towards
Ethiopia where they probably displaced Omotic and Cushitic speakers. The Chadic
linguistic diversity is probably a result of admixture with many other settled populations.
Nilo-Saharan has only relatively recently been defined as a language group and its
existence is still contested by some researchers. It appears to be very geographically
fragmented and it is very difficult to ascribe the different languages in the phylum a
simple chronological order of origin and dispersal. However Blench has attempted to do
this in his book using glottochronological estimates of dates based on the internal
diversity of subgroups correlated with archaeological/climatological evidence and a
proposed order of dispersal of major Nilo-Saharan language subgroups is described
below. However he warns that the dates “need to be treated with great caution”.
It is likely that the Nilo-Saharan heartland was somewhere in south Ethiopia. The Nilo-
Saharan speakers then spread westward approximately 18,000 years ago towards the
confluence of the Blue and White Nile rivers. They were still a small group, probably of
hunter-gatherers, at least 12,000 years ago and at that point some spread north along the
Nile while others migrated further westward towards Lake Chad in what may have been a
popular corridor for migration. At around 9000 years before present, the Nilo-Saharan
speakers in Lake Chad started to occupy parts of the Sahara desert (to become today’s
Saharan speakers) while 6,000-8,000 years before present another group headed towards
the Niger river, the Nilo-Saharan languages now having travelled across most of the
width of the continent. Other movements occurred but, like much of the above, it is not
easy to work out how they got to where they eventually settled as the language group is
not particularly cohesive. Nubian speakers may have been a result of a back migration of
Saharan speakers from the west towards the Nile where they displaced settled Afro-
Asiatic speakers around 2,500 years ago. There has also been some debate regarding the
close relationship between Niger-Congo and Nilo-Saharan and whether they should be
placed within a macrophylum (Gregersen 1972), with Niger-Congo possibly being a
lower-level branch of Nilo-Saharan (Blench 1995).
Around 30 Khoisan or ‘click’ languages are spoken at present (Güldemann & Voßen
2000) in small scattered communities across south western Africa though groups are also
found in Angola and Tanzania (the Hadza). They are a very diverse phylum though their
‘click’ attributes keep them linguistically grouped together as ‘click’ languages are not
found elsewhere in the world. Therefore due to their common origin it has been suggested
that Khoisan speakers once extended up to Somalia though expanding Cushitic and
particularly Bantu speakers have substantially reduced their range.
Pygmies are usually by definition of short stature (eastern pygmies are smaller than
western pygmies, possibly because of less Bantu farmer introgression). Generally they
have adopted the languages of their agricultural neighbours. For example the Aka and
Biaka western pygmies speak Bantu and Adamawa respectively. However they appear to
share some common words that hint at a lost pygmy language (Bahuchet 1992; Bahuchet
1993). It has been suggested that the Pygmies were once a single group of hunter-
gatherers who have their origins in the tropical rainforest. One suggestion for their current
distribution is that reduction in the size of the rainforest around 10,000 years ago led to
separation and isolation of multiple pygmy groups. As the rainforest expanded again the
pygmy groups started to disperse at which point they encountered Bantu farmers
alongside or amongst whom they settled. A recent study by Migliano et al. (2007) suggests that
the development of characteristic pygmy height is not, as previously thought, a direct
adaptation to a specific environment such as ease of movement through dense tropical
forests (Cavalli-Sforza 1986) but is instead a by-product of the need to reproduce
relatively early in life due to particular ecological conditions causing high early mortality
rates.
1.4. Previous Work On The Distribution Of Genetic Variation In Sub-Saharan Africa
1.4.1. Classical Markers
The first studies on human genetic variation did not directly involve DNA but rather were
based on detecting and assessing variation using so-called ‘classical markers’. Types of
classical markers range from the different variants found in the blood group systems (the
most basic being the A, B and O blood groups), the different forms of particular proteins
found in blood, liver and muscle such as the haemoglobins and the many different Human
Leukocyte Antigen (HLA) isoforms. The substantial use of classical markers through the
1970s and 80s for evolutionary studies was a result of large collections being made for
unrelated reasons such as tissue and blood matching coupled with the fact that methods of
detection of these alleles were considerably easier than examining DNA at the molecular
level (techniques for DNA typing were only introduced in the mid-1980s). With
technological advances classical markers were superseded and direct analysis of DNA
undertaken. However vast amounts of data have been assembled using these classical
markers that have been very useful in assessing population history. Much of the classical
marker data up to 1986 were collated and analysed in Luigi Cavalli-Sforza and
colleagues’ seminal work The History and Geography of Human Genes (1994). Though a
number of issues with regard to the methodologies used were raised after publication
many of the conclusions drawn from the study underpin present day thinking in genetic
history. The contents and timing of the book mark the period of transition from the use
of classical to molecular DNA markers, as witnessed by the limited discussion of the
latter. It did however include the well-known mitochondrial DNA (mtDNA) study of
Cann et al. (1987). Though analysis in this thesis exclusively utilises molecular markers,
given the historical importance of the classical markers a brief summary of this earlier
work is provided to the extent that it relates to sub-Saharan Africa.
In The History and Geography of Human Genes sub-Saharan African samples were
analysed within two main frameworks, the first examining sub-Saharan populations as
well as populations found elsewhere in the world, the second examining sub-Saharan
populations as well as other African populations. Within the first framework 42
populations were assembled from the years 1978 to 1986 with data defining 120 classic
marker alleles. Six of these populations were considered sub-Saharan African: Bantu;
East African Ethiopians; Nilo-Saharan; West African; !Kung San from Botswana and
Namibia; Mbuti pygmies from Zaire. It should be noted that the 42 populations were
assembled by pooling data based on geographic, ethnic and linguistic affinities to gain a
suitable balance between available genetic data and the number of populations analysed.
The authors claim that the adverse effect of possible heterogeneity introduced by pooling
data should be minimal, while reducing the effect of noise caused by genetic drift in
particularly small populations.
Unweighted Pair Group Method with Arithmetic mean (UPGMA) trees based on both
Nei’s genetic identity (Nei 1987) (Figure 1.4) and Reynolds’ FST (Reynolds, Weir &
Cockerham 1983) measures clearly showed clustering of the six sub-Saharan
African populations and their separation from non-sub-Saharan African populations, including the Berber
population from North Africa. The separation of sub-Saharan African from non-sub-
Saharan African populations was strongly maintained when bootstrapping (a method that
creates a number of resampled sets of the observed dataset using random sampling with
replacement) the data used to construct the UPGMA trees (though to a substantially lesser
extent for the East African and San populations). A similar pattern was observed in
Principal Component Analysis (PCA) Maps (the first principal component also grouped
Europeans with sub-Saharan Africans but this was thought to be a result of the
comparatively low number of sub-Saharan African samples utilised rather than any
genetic similarity between the two regions). Pooling populations in seven distinct regions
further reinforced this strong African (in reality sub-Saharan African) and non-African
split. A crude date for the split of African and non-African populations based on these
data and calibrated using archaeological findings was estimated at around 100,000 years
before present.
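As an illustration of the distance measure and resampling approach described above, the following minimal Python sketch computes Nei's standard genetic identity and distance from allele frequencies and bootstraps the distance by resampling loci with replacement. The function names, the toy frequencies and the choice of loci as the resampling unit are assumptions made for this example. It is not the procedure used by Cavalli-Sforza and colleagues, who applied a modified Nei measure to a much larger dataset.

```python
import math
import random


def nei_identity(pop_x, pop_y):
    """Nei's standard genetic identity I between two populations.

    Each population is a list of loci; each locus is a list of allele
    frequencies summing to 1, with alleles in the same order in both
    populations.
    """
    n_loci = len(pop_x)
    # Average homozygosity within each population and between the pair.
    j_x = sum(sum(p * p for p in locus) for locus in pop_x) / n_loci
    j_y = sum(sum(q * q for q in locus) for locus in pop_y) / n_loci
    j_xy = sum(
        sum(p * q for p, q in zip(locus_x, locus_y))
        for locus_x, locus_y in zip(pop_x, pop_y)
    ) / n_loci
    return j_xy / math.sqrt(j_x * j_y)


def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(I)."""
    identity = nei_identity(pop_x, pop_y)
    if identity == 0:
        # No shared alleles at any locus: the distance is unbounded.
        return math.inf
    return -math.log(identity)


def bootstrap_distances(pop_x, pop_y, n_reps=1000, seed=0):
    """Recompute D on datasets built by sampling loci with replacement."""
    rng = random.Random(seed)
    n_loci = len(pop_x)
    distances = []
    for _ in range(n_reps):
        idx = [rng.randrange(n_loci) for _ in range(n_loci)]
        distances.append(
            nei_distance([pop_x[i] for i in idx], [pop_y[i] for i in idx])
        )
    return distances


# Hypothetical two-locus, two-allele frequencies for two populations.
pop_a = [[0.5, 0.5], [1.0, 0.0]]
pop_b = [[0.5, 0.5], [0.0, 1.0]]
print(nei_identity(pop_a, pop_b), nei_distance(pop_a, pop_b))
```

In a tree-based analysis, a matrix of such pairwise distances would be passed to a clustering method such as UPGMA, and the proportion of bootstrap replicates in which a given cluster recurs would serve as its support.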
According to the authors the Nei’s I-based tree and the linguistic tree showed a striking
correlation, including for four of the six sub-Saharan African populations. The Mbuti
Pygmies could not be analysed as it was believed that they had lost their original
language and adopted that of neighbouring farmers while the Ethiopians were on very
different branches from the Berbers, though both were classified as Afro-Asiatic speaking
groups. It is difficult to assess the significance of the lack of Ethiopian-Berber linguistic
correlation given the cultural and linguistic heterogeneity found in Ethiopia. In addition
some correlation was observed between genetic and geographic distance of populations
though to a much lesser extent than that seen in Asia.
The authors found this analysis to correspond with an origin for modern humans from
Africa some 100,000 years ago and it should be noted that though they do not rule out
limited admixture with archaic Homo sapiens they do not even consider, let alone
formally test against, the multi-regional hypothesis of the origin of modern humans.
Figure 1.4: Average linkage tree for 42 populations. The abscissa shows the
genetic distances (modified Nei) calculated on the basis of 120 allele
frequencies taken from 42 classic genetic marker systems. Taken directly
from Cavalli-Sforza et al. (1994).
When examining sub-Saharan African populations within an African framework 49
distinct African populations were assembled, again pooling data in a number of different
ways (ethnicity, language and geography). The average number of genes per population
was 47.6 but the actual average number of genes analysed in pairwise comparisons was
only 28.6. A UPGMA tree of these populations showed two distinct clusters, one with
only sub-Saharan African populations and a second with North African and East African
populations. This second cluster can be separated into North and Eastern populations
except that the Algerian population falls within the Eastern set while the Somali
population is found amongst sub-Saharan Africans rather than amongst the other Eastern
populations.
However, the majority of evolutionarily interesting groupings fall within the macro-sub-
Saharan African cluster where there are two major sub-clusters, the first a Central-
Southern cluster consisting of Bantu and Nilo-Saharan speaking populations and the
second consisting of West African populations. There are also four distinct outlying
groups of populations not found in either of these two sub-clusters. These include a group
of Khoisan populations, the Mbuti Pygmies and a group of Senegalese populations.
Overall the genetic tree appeared to display a fair level of congruence with both language
and geography. Of the main sub-Saharan African cluster a PCA plot showed the Bantu-
speaking populations to be particularly homogenous in comparison to the West African
and Nilo-Saharan speakers. The West African populations were more heterogeneous than
the Nilo-Saharan speakers and more dissimilar to the Bantu-speaking populations. In
regard to the Bantu speakers, neighbouring populations were more similar to each other,
perhaps reflecting the eastern and western streams of the expansion of the Bantu-speaking
peoples, but there was still a remarkable similarity between the north-western and south-
eastern Bantu-speaking populations. The Nilo-Saharan speakers’ similarity to the Bantu
speakers should be interpreted with caution as the group was not particularly well defined
with no major representatives of the groups present in the study. The authors expressed a
view that the West African populations can be separated into three main clusters, one
Senegalese cluster, one a collection of populations lying between Senegal and Nigeria
and one consisting of four ethnically diverse Nigerian groups. The West African
heterogeneity was explained as being due to the region having had a greater time for
populations to differentiate in comparison to Bantu-speaking populations, possibly as a
consequence of a much older agricultural expansion, given that Bantu-speaking
populations have essentially had a maximum of around four thousand years (the start date
of the expansion) in which to differentiate.
The PCA plot also shows the East African populations lying between the sub-Saharan and
Northern African populations. The difference between the Northern and sub-Saharan
populations was expected given the barrier to gene flow the Sahara represents and the
differing physical features of people from the two regions. The intermediate status of the
East African populations is likely to have been a result of admixture between sub-Saharan
Africans in the south and Northern African and Middle Eastern individuals from the
north. Admixture percentages were estimated (using standard methods that calculate m of
a target population from two proposed parent populations based on relative gene
frequencies in the three populations (see page 55 of Cavalli-Sforza, Menozzi & Piazza
1994)) as approximately 60% of the former and 40% of the latter, though the exact level
of admixture is likely to vary depending on the location of particular groups, with
Ethiopians likely to be especially heterogeneous due to cultural and language differences
found within the country.
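The kind of calculation involved can be sketched as follows. Under a simple linear mixing model, the allele frequency in the admixed population is p_h = m * p_1 + (1 - m) * p_2, and m can be estimated by least squares across alleles. The estimator, the function name and the toy frequencies below are assumptions made for illustration and are not necessarily the method given on page 55 of Cavalli-Sforza, Menozzi & Piazza (1994).

```python
def admixture_proportion(parent1, parent2, target):
    """Least-squares estimate of m, the contribution of parent1 to target.

    Assumes target = m * parent1 + (1 - m) * parent2 for every allele.
    Each argument is a list of allele frequencies in the same allele order.
    """
    numerator = sum(
        (t - b) * (a - b) for a, b, t in zip(parent1, parent2, target)
    )
    denominator = sum((a - b) ** 2 for a, b in zip(parent1, parent2))
    if denominator == 0:
        raise ValueError("parent populations have identical frequencies")
    return numerator / denominator


# Hypothetical frequencies of three alleles in two parental populations.
parent_a = [0.9, 0.2, 0.5]
parent_b = [0.1, 0.6, 0.3]
# A target population constructed as 60% parent_a and 40% parent_b.
mixed = [0.6 * a + 0.4 * b for a, b in zip(parent_a, parent_b)]

print(admixture_proportion(parent_a, parent_b, mixed))
```

With real data the estimate is only as good as the choice of parental populations, and drift since the admixture event adds noise that this simple estimator ignores.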
In regard to the outliers observed in the tree the most intriguing were the positions of the
click-speaking Khoisan and the Pygmies. The Khoi, who practise farming, appear to
have been involved in substantial gene flow with southeast Bantu speakers. Admixture
for the Bantu element in the Khoi was estimated at approximately 60%. Conversely San
admixture with Bantu-speaking groups had been estimated for example in the Xhosa at
around 60%. These admixture events may have been paralleled by elements of the click
languages moving into some Bantu languages. On the other hand the San, who still retain
a hunter-gatherer lifestyle, show a loose similarity with East Africans and Asians. The
data presented by Cavalli-Sforza et al. (1994) therefore suggest that, despite the proposal
from some anthropologists that Khoisan are relics of the earliest human groups, they are
in fact the result of an early mixture between native Africans and Asians in similar
manner to Ethiopians, though the two events occurred at distinctive periods of time.
Three pygmy groups were assessed in this study using classical markers, of which the
smallest group, the Mbuti from the Democratic Republic of Congo (formerly Zaire), are
the most genetically distinct, from both the other pygmy groups and from other sub-
Saharan Africans, hence their outlier position in both worldwide and African-specific
trees. Though it should be interpreted with caution the authors suggest a separation time
from sub-Saharan Africans of 18,000 years based on the FST estimated between the two
groups, with genetic distance calibrated with archaeological separation times. In addition
the Mbuti appear to show no obvious similarity to San or East Africans. The Biaka appear
to have experienced considerable admixture with their neighbouring Bantu-speaking
farmers while the third heterogeneous ‘pygmoid’ group are almost indistinguishable from
other sub-Saharan Africans both genetically as well as phenotypically in terms of height
and culture. The work on Africa in this study finishes off with a particularly interesting
statement, commenting that “differences between most-sub-Saharan Africans other than
Khoisan and Pygmies seem rather small”.
Despite now being regarded as somewhat of a classic, the book was met with criticism in
regard to some of the analyses performed and its treatment of linguistic data. O’Grady et
al. (1989) in particular raised various issues on the choices of pooling samples and on the
linguistic tree used. Overall, they found the fit of the genetic with linguistic data less than
‘remarkable’. Cavalli-Sforza and colleagues did respond to this criticism but serious
scepticism of choices of linguistic relationships and population affiliations, particularly in
Africa, that severely undermine the analysis still remain amongst anthropologists and
linguists (MacEachern 2000), with the criticism that many of the conclusions drawn from
these data are of an ad hoc nature.
Nei and Ota (1991) also criticised the use of UPGMA trees and an assumption of a
constant rate of evolution (which is not likely because a) there were previous population
bottlenecks and b) some loci used may have been under selection, especially with regard to
malaria). However, while using a Neighbour-Joining (NJ) tree that allowed for
evolutionary rate heterogeneity did considerably alter much of the structure (though using
a new set of classical markers (Nei & Roychoudhury 1993)) the main African and non-
African split that supported the out-of-Africa hypothesis was still prominent and the San
were again an outlier from other sub-Saharan Africans.
While the many positive aspects of this study using classical markers have underpinned
later research using molecular data, including additional evidence of the common origin
of modern humans in sub-Saharan Africa (the book is still widely referenced), other
negative aspects of this study also persist, notably in the form of insufficiently rigorous
sampling strategies.
1.4.2. Molecular Data
Though sub-Saharan African molecular DNA has been collected and analysed in order to
investigate disease, by far the most substantial collection of samples from African
populations has been undertaken to answer questions concerning the origins of modern
humans. A comprehensive discussion of this topic, which many reviews have previously
addressed (Templeton 1997; Harpending & Rogers 2000; Hawks et al. 2000; Excoffier
2002; Brauer, Collard & Stringer 2004; Templeton 2005; Reed & Tishkoff 2006;
Garrigan & Hammer 2006; Torroni et al. 2006), is beyond the scope of this thesis but a
brief overview is presented.
1.4.2.1. The Origins of Anatomically Modern Humans (AMH)
Prior to the availability of genetic data there were two main schools of thought in regard
to the model proposed for the origin of modern humans: the ‘Recent African Origin’
(RAO) model (Howells 1976; Harpending 1993) in which modern man arose somewhere
in Africa around 100,000-200,000 years ago and replaced all other archaic humans in an
expansion across the world; and the ‘Multiregional’ (MR) model (Weidenreich 1946;
Thorne & Wolpoff 1981) in which modern man evolved over a period of a few million
years in the presence of continuous gene flow at a global level among groups of
geographically disparate archaic humans. In addition there are also models intermediate
between these two extremes (Smith 1985; Eswaran 2002; Templeton 2002) but in much of the
early molecular work the focus was on the two extremes. As stated above the classical
marker data appeared to side with the RAO model though not particularly strongly.
There were early indications with restriction enzyme digests of the beta-globin gene
(Wainscoat et al. 1986) and other autosomal regions that molecular data would parallel
the classical data showing a split between the African and non-African haplotype
diversity. However the study that stimulated the most debate was a multiple restriction
enzyme digest of mtDNA in a ‘worldwide panel’ of individuals that was reported by
Cann et al. (1987), which appeared to show a most parsimonious tree with a clear split
between some African individuals and all other individuals, as well as identifying a
‘mitochondrial Eve’ that lived some 140,000-290,000 years ago. The African samples
appeared to be the most genetically diverse with all non-Africans possessing a subset of
this diversity. This was much vaunted as being evidence that supported the RAO model.
A number of issues concerning the methodology used in the study were raised: a) its use
of African Americans rather than ‘true’ Africans (Jobling, Hurles & Tyler-Smith 2004 pg
255), b) the tree used was only one of a number of equally parsimonious trees, others of which severely
altered many of the branches (Maddison 1991), c) the assumption of a continuous rate of
evolution for all branches (Wills 1992), d) its use of mid-point rooting (Darlu & Tassy
1987) and e) not taking into account the effects of natural selection acting on mtDNA
(Excoffier & Langaney 1989). Despite these criticisms the paper encouraged over the
next few years a number of other studies in which mtDNA was analysed. They improved
on the methodologies and samples used (though the actual number of African samples
included in these was still quite small). All these studies essentially came to the same
conclusion: modern man originated in Africa some 100,000-200,000 years ago
(Hasegawa & Horai 1991; Vigilant et al. 1991; Pesole et al. 1992; Horai et al. 1995).
Autosomal genes apparently showed a similar pattern using various types of marker
(Restriction Fragment Length Polymorphisms (RFLP), microsatellites) though the date of
the common ancestor appeared to fluctuate to a greater degree (Olerup et al. 1991; Nei &
Roychoudhury 1993; Bowcock et al. 1994; Armour et al. 1996; Tishkoff et al. 1996).
However it was also noted that the existence of greater genetic diversity within Africa on
its own was not sufficient to reject the MR model as this could also be the result of a
higher effective population size (Relethford & Harpending 1994), a point many of the
early studies claiming support of the RAO model had not considered. In addition some
studies did not show as great a distinction between Africans and non-Africans or as recent
a common ancestor as seen when analysing mtDNA (Jorde et al. 1995; Harding et al.
1997; Jorde et al. 1997). This led some authors to be less confident about their
conclusions concerning the best choice of model (especially without formal testing of the
models). Nevertheless the majority of researchers were agreed that Africa was a particularly
important region in human evolution (Hammer et al. 1997; Stoneking et al. 1997;
Relethford & Jorde 1999). Researchers have attempted to account for higher African
population size by attributing this to an African-specific expansion that occurred before
an out-of-Africa migration (Reich & Goldstein 1998).
More focus was placed on statistically discriminating between the two models. Takahata
et al. (2001) used computer simulations to show that a large difference in the breeding
sizes of African and non-African populations would be needed to accept the MR
hypothesis. Data that show higher African genetic diversity using different genetic
systems have continued to mount (Seielstad et al. 1999; Jorde et al. 2000; Ingman &
Gyllensten 2001; Watkins et al. 2001; Macfarlane & Simmonds 2004; Witherspoon et al.
2006) and have led many in the scientific community to consider the case more or less
closed (see opinion box 8.4 in Jobling, Hurles and Tyler-Smith (2004)).
However some studies still demonstrate results that conflict with this general belief (Zhao
et al. 2000; Yu et al. 2001). The conspicuous absence of data directly supporting the MR
hypothesis is noteworthy. The issue is rather the inability to fully distinguish the MR
from the RAO model. It is becoming clearer that the strict MR model is of very little
value (note though that some researchers claim that the model has been misrepresented in
the literature (Templeton 2007)), which has led proponents of multiregionalism to turn
to an ‘assimilation’ model where Africa was the predominant influence but gene flow has
been experienced at a more limited level with smaller non-African archaic human
populations vulnerable to bottleneck and extinction events (Stringer 2002). This approach
seems somewhat more useful, and a promising model proposed by Eswaran (2002) in
which humans expanded from Africa as a wave and incorporated genes of archaic
humans at the wavefront appears to fit the current data quite well (Eswaran, Harpending
& Rogers 2005). As sequencing technologies increase in sensitivity, data indicating
possible gene flow of archaic alleles into modern human genomes are being uncovered.
This suggests that models incorporating some archaic human introgression (Wall &
Hammer 2006; Evans et al. 2006; Hawks et al. 2008) may become more acceptable in the
future causing the simple RAO model to be modified or rejected. Certainly more
information will be elucidated from application of the new high throughput sequencing
technologies, which have already successfully sequenced large amounts (~1,000,000bp)
of Neanderthal DNA (Green et al. 2006) (this particular investigation is part of a
collaborative effort between the Max Planck Institute and the 454 Life Sciences
Corporation to sequence the entire Neanderthal genome).
However even the assimilation models discussed above demonstrate the prominence of
the original African contribution to the modern human gene pool. If this is indeed the case
then current genetic data most likely place the origin of these ‘ancestral Africans’
somewhere between eastern and southern Africa (Quintana-Murci et al. 1999; Jobling,
Hurles & Tyler-Smith 2004, Chapter 8; Prugnolle, Manica & Balloux 2005; Ray et al.
2005; Amos & Manica 2006; Liu et al. 2006; Cramon-Taubadel & Lycett 2007).
1.4.2.2. The Genetic History of sub-Saharan Africa based on Molecular data
Molecular genetic studies in sub-Saharan Africa have been limited but have provided
interesting insights into human history, especially at large geographical scales (fine-scale
studies have been hindered by a lack of dense sample sets). The majority of inferences
have been drawn from mtDNA studies, followed closely by studies on the non-
recombining portion of the paternally inherited Y-chromosome (NRY), while autosomal
work has been limited. Analysis of autosomal data has been constrained by many factors
including difficulties in haplotype inference (Niu 2004), evaluating the effects of
selection, unavailability of suitable genotyping and sequencing technologies and
difficulties in undertaking fieldwork. These problems are now being surmounted and a
great number of studies involving autosomal genetic systems may be expected in the
future. Studies discussed below in general support the conclusions drawn in Cavalli-
Sforza’s classical marker work.
1.4.2.2.1. The Differentiation Between sub-Saharan and Northern Africa
One striking confirmation of Cavalli-Sforza’s work is the clear genetic separation of sub-
Saharan African populations from North African populations. The latter more closely
resemble Middle Eastern and Eurasian populations in almost all mtDNA, NRY and
autosomal studies (for good examples see Poloni et al. (1997); Scozzari et al. (1999);
Luis et al. (2004); Cruciani et al. (2002); Salas et al. (2002); Salas et al. (2004); Terreros
et al. (2005)), demonstrating the major genetic barrier that the Sahara Desert has been
through much of modern man’s occupation of the African continent. However there is
evidence of contact in both directions involving both male and female mediated gene
flow in populations lying close to the boundaries of the Sahara: the Chad Basin, Guinea-
Bissau and Algeria (see Salas et al. (2002), Coia et al. (2005), Rosa et al. (2004), Rosa et
al. (2007), Cerny et al. (2007), Flores et al. (2001), Richards et al. (2003)), with the
expansions of Berbers, migrations of Fulani and the Arab slave trade possibly being
major influences.
1.4.2.2.2. The Expansion of the Bantu-speaking Peoples
Cavalli-Sforza and colleagues commented on the generally homogenous nature of sub-
Saharan Africa and this appears to be reflected in most of the molecular data (Underhill et
al. 2001; Renquin et al. 2001; Salas et al. 2002; Berniell-Lee et al. 2006). MtDNA, NRY
and autosomal data show that Niger-Congo (and especially Bantu-speaking) populations
are often indistinguishable at close (Destro-Bisol et al. 2000; Renquin et al. 2001;
Donaldson et al. 2002; Lane et al. 2002), intermediate (Scozzari et al. 1994; Lecerf et al.
2007) and large geographic distances (Donaldson et al. 2002; Steinlechner et al. 2002;
Collins-Schramm et al. 2002; Alves et al. 2005) throughout much of sub-Saharan Africa.
The expansion of the Bantu-speaking people some 3,000-5,000 years ago is likely to have
been a major driving force for this homogeneity. Both NRY and mtDNA lineages appear
to have been spread through much of sub-Saharan Africa as a result of this expansion
though the patterns observed for men and women are quite distinct. The majority of men
in Bantu-speaking populations possess the NRY defined haplogroup E3a (Underhill et al.
2001; Cruciani et al. 2002) and a particular haplotype on this E3a background defined by
six microsatellites is the modal type in numerous Bantu-speaking populations, stretching
all the way from Cameroon and Nigeria to Southern Africa (Thomas et al. 2000; Pereira
et al. 2002; Berniell-Lee et al. 2006). Given its predominant presence in Bantu-speaking
populations, its relatively low within-haplogroup diversity (Scozzari et al. 1999) and an
estimated time for the most recent common ancestor for South African Bantu possessing
E3a chromosomes of 3000-5000 years before present (Thomas et al. 2000) this
distribution is best interpreted as a signature of Bantu-speaking males expanding across
sub-Saharan Africa.
The pattern seen with mtDNA is somewhat different. There appear to be numerous lineages,
usually characterised by a star-like genealogy resulting from a recent founder effect,
that have different geographic origins and that have been dispersed as part of the
expansion (for example L1a, L2a1a, L2a2, L3e1 (Bandelt et al. 2001; Salas et al. 2002)).
This high Bantu mtDNA diversity appears to suggest that the expansion involved many
short bursts and involved numerous women being absorbed into groups of the incoming
Bantu-speaking men at various stages (Salas et al. 2002) while the male Bantu farmers
migrated much larger distances (Wood et al. 2005) and had lower effective population
sizes as evidenced by the low NRY haplogroup diversity, overwhelming most of the pre-
Bantu expansion NRY types (mainly Haplogroups A and B) throughout the continent
(Underhill et al. 2001). The features of the female-mediated aspects of the expansion have
also allowed more ancient diversity to be retained in the genetic record such as lineages
L1a and L1b that show evidence, because of their star-like structure, of internal
migrations occurring in sub-Saharan Africa starting 60,000-80,000 years ago (Watson et al.
1997).
MtDNA, because of its diversity, has been interpreted as providing the best genetic
evidence so far that the Bantu expansion occurred in distinct east and west streams. Both
streams appear to possess common features, especially in regard to the NRY, but also
show subtle differences (Pereira et al. 2001; Salas et al. 2002; Pereira et al. 2002). The
populations that were part of the eastern stream possess some mtDNA types mostly
specific to that region that indicate that it originated in the Great Lakes region of eastern
Africa (Soodyall et al. 1996; Salas et al. 2002). The western stream appears, as expected,
to have been greatly influenced by mtDNA types that seemingly originate in West Africa
(Plaza et al. 2004; Beleza et al. 2005) and its expansion seems to have been a more
gradual process, with a great diversity of mtDNA types and very few signs of founder
effects (Beleza et al. 2005). However it also appears that the western and eastern streams
may have connected near the end of the expansion, possibly along the southern African
savannah, as suggested by the presence of apparently distinct eastern and western stream
signature mtDNA and NRY types in the same populations, with the majority of gene flow
going from the east to the west (Pereira et al. 2002; Plaza et al. 2004; Beleza et al. 2005).
1.4.2.2.3. East Central Africa
East Central Africa, which in this sense encompasses Ethiopia, Sudan, Somalia and
Eritrea, is the most genetically distinct and heterogeneous region of sub-Saharan Africa,
having experienced, because of its geographic position, possible gene flow with sub-
Saharan African, North African, Middle Eastern, European and Asian populations
(Passarino et al. 1998; Quintana-Murci et al. 1999; Underhill et al. 2001; Salas et al.
2002; Richards et al. 2003; Kivisild et al. 2004; Lovell et al. 2005; Sanchez et al. 2005).
Like much of sub-Saharan Africa, many mtDNA types are found in the region.
Haplogroup L3 is found at highest frequencies in East Africa (Salas et al. 2002) and
carriers of this lineage appear to have been among the first that migrated out of Africa
(Watson et al. 1997), seemingly giving rise to haplogroups M and N, the most
phylogenetically ancestral haplogroups present outside Africa, with Ethiopia being
proposed as the most likely region of origin for L3 (Quintana-Murci et al. 1999). There
has also been some debate on whether haplogroup M (as well as N) arose in East Africa
(Passarino et al. 1998; Quintana-Murci et al. 1999) or is present in the region due to a
back migration from Asia (Olivieri et al. 2006; Gonzalez et al. 2007), with different
sample sets and analyses giving subtly different signals. No general consensus has yet
been reached. The NRY profile of East Africans is also relatively heterogeneous in
comparison to much of sub-Saharan Africa and appears to possess both African-specific
clades, including the most ancestral haplogroups A and B, as well as clades common
outside of Africa (Underhill et al. 2000; Underhill et al. 2001).
The diversity found in Ethiopian populations encompasses most of the genetic
heterogeneity of East Africa, appearing to be a composite of sub-Saharan African, West
Eurasian, Middle Eastern, and North African mtDNA, NRY and X chromosome lineages
(Passarino et al. 1998; Scozzari et al. 1999; Underhill et al. 2001; Cruciani et al. 2002;
Semino et al. 2002; Kivisild et al. 2004; Lovell et al. 2005), with some of these non-
African lineages entering Ethiopia through past back migrations from the Levant and/or
the Horn of Africa (Semino et al. 2002; Luis et al. 2004; Kivisild et al. 2004). In addition
male-mediated non-sub-Saharan African gene flow into Ethiopia appears to have been
substantially greater than female-mediated gene flow (Passarino et al. 1998). While one
explanation for the composite Ethiopian genetic profile is admixture between sub-Saharan
African and non-sub-Saharan African people, as suggested by Cavalli-Sforza’s work on
classical markers (Cavalli-Sforza, Menozzi & Piazza 1994), Tishkoff et al. (1996)
suggested from analysis of the CD4 autosomal locus that, as Ethiopian Jews (as well as
Somalians) only possessed a subset of sub-Saharan African diversity, populations from
this region ‘may represent the modern descendants of the ancestral population that
spawned the migration from Africa’ and thus already possessed a large amount of their
heterogeneity before the out-of-Africa expansion. Other points to note are a) that
haplogroup N, which is ancestral to most European mtDNA lineages, increases
significantly in frequency in a north eastern direction in Ethiopia, while haplogroup M is
much more evenly distributed (Kivisild et al. 2004) and b) previous NRY, mtDNA and
autosomal studies have been unable to differentiate between the two largest ethnic groups
of the region, the Amhara and Oromo (Scozzari et al. 1999; Sanchez-Mazas 2001;
Kivisild et al. 2004; Lovell et al. 2005). The presence of ancestral mtDNA and NRY
types in Ethiopian genetic profiles also appears to indicate some ancient link with Khoisan
speakers, a subject discussed later in this chapter.
In terms of genetic research, other parts of East Africa have been substantially less
well studied than Ethiopia. However a study by Krings et al. (1999) did show that the
River Nile has been acting as a corridor for bidirectional gene flow, at least as far as
mtDNA diversity between Egypt and Sudan is concerned, with the greater (or older)
movement being south into Sudan. In addition there is no NRY evidence that suggests the
expansion of the Bantu-speaking peoples reached Somalia, which shows a close
similarity, as would be expected, to Ethiopia (Sanchez et al. 2005).
1.4.2.2.4. West Africa
NRY data from Senegal, Guinea Bissau, Gambia and Ghana show a high frequency of
haplogroup E3a (Semino et al. 2002; Wood et al. 2005; Rosa et al. 2007), which is
interesting given that the haplogroup is a putative signature of the Bantu expansion
(Underhill et al. 2001), suggesting it has an older and geographically more widespread
significance, possibly being a signature of the original proto-Niger-Congo speakers. In
addition, or alternatively, the low NRY haplogroup diversity in West Africa may be a
product of agricultural expansion throughout the region or another part of the same
expansion that includes that of the Bantu-speaking peoples. Many of the mtDNA types
found in West Africa (from Senegal, Guinea Bissau and Sierra Leone) appear to be
specific to the region (Rando et al. 1998; Salas et al. 2002; Rosa et al. 2004; Jackson et al.
2005) and some of the many putative Bantu signature mtDNA lineages are not found at
high frequencies (e.g. L1a or L3e) indicating any agricultural expansion in West Africa
may have been somewhat distinct from the expansion of the Bantu-speaking peoples
towards southern Africa (Rosa et al. 2004). Given that there are no Bantu languages
spoken in West Africa and that E3a, due to its greater haplotype diversity, appears to have
been established in the region prior to the expansion of the Bantu-speaking peoples
(Cruciani et al. 2002), this region may have been the original wellspring of farming in
sub-Saharan Africa and acted as a source of knowledge for the subsequent Bantu expansion
(Rosa et al. 2007), i.e. the West African agricultural expansion predated the
expansion of the Bantu-speaking peoples. The Mandeka of Guinea Bissau possess a
particularly high E3a microsatellite haplotype diversity and an mtDNA L2a/L2c star
phylogeny indicative of an expansion that suggests that they benefited from this
agriculture-driven population expansion (Rosa et al. 2004; Rosa et al. 2007). In addition
studies focusing on Guinea Bissau show mtDNA lineage sharing with North African
populations (Rosa et al. 2004) while the Felupe-Djola appear to possess NRY and
mtDNA types similar to those found in East Africa, consistent with their proposed oral
tradition of a migration from Sudan in the 15th century (Rosa et al. 2004; Rosa et al.
2007).
1.4.2.2.5. South Eastern Africa
Though it was previously stated that to a considerable extent the sex-specific genetic
profiles of the peoples of sub-Saharan Africa have been homogenised by the expansion of
the Bantu-speaking peoples, East Africa does appear to show more genetic as well as
linguistic diversity than West Central and Central sub-Saharan Africa (Underhill et al.
2001; Salas et al. 2002; Wood et al. 2005), suggesting the movement and replacement of
existing populations by Bantu-speaking farmers was less complete in this region (Salas et
al. 2002). This is demonstrated best by the greater proportion of more ancestral NRY
(Haplogroups A and B) and mtDNA types (e.g. the putative Khoisan type L1d),
especially in the more sampled populations within Mozambique (Underhill et al. 2001;
Pereira et al. 2001; Salas et al. 2002) and Tanzania (Wood et al. 2005; Gonder et al.
2007). Mozambicans (Brandstatter et al. 2004), Kenyans (Brandstatter et al. 2004),
Rwandan Hutus (Tofanelli et al. 2003), Zimbabweans (Gene et al. 2001) and Ugandans
(Gene et al. 2001) have all demonstrated substantial differences with each other and/or
populations further to the west at some molecular level. Mozambique datasets have also
been shown to possess European Y-chromosome lineages at a frequency of
approximately 5% (Pereira et al. 2002), possibly due to the influence of the Portuguese
slave traders.
1.4.2.2.6. The Chad Basin
The Chad Basin presents a very heterogeneous genetic profile that differs significantly
between male- and female-specific lineages, consistent with the complex population
movements the region has experienced. NRY data, mostly from datasets collected in
northern Cameroon, show a substantial proportion of R1*-M173 types (Scozzari et al.
1999; Cruciani et al. 2002), a clade not usually found elsewhere in sub-Saharan Africa but
present in Asia (Luis et al. 2004). This has been presented as evidence for a possible back
migration from Asia to sub-Saharan Africa through the Levantine corridor. However,
mtDNA showed no such signal, suggesting that admixture of the immigrating group was
primarily male-mediated, at least once they reached their destination (Coia et al. 2005).
However the mtDNA data are still very heterogeneous with many different types showing
a mostly Central African connection but with possible gene flow from East Africa and
from West Africa (Cerny et al. 2004; Cerny et al. 2007) as well as a small North African
influence (Coia et al. 2005), demonstrating that the Sahel along which the Chad Basin lies
has been a major corridor for human migration in Africa.
1.4.2.2.7. Khoisan Speakers
The NRY (Knight et al. 2003; Cerny et al. 2007), mtDNA (Scozzari et al. 1988; Watson
et al. 1996; Knight et al. 2003) and autosomal systems (Ramsay & Jenkins 1988; Patin et
al. 2006) of the click-speaking Khoisan have been shown over the past two decades to be
consistently and substantially different from those of all other sub-Saharan Africans.
Khoisan tend to possess the most ancestral (most ancestral here referring to the NRY type
least derived from the NRY of the most recent common ancestor of all present day males)
NRY and mtDNA types at high frequencies (Scozzari et al. 1999; Underhill et al. 2000;
Underhill et al. 2001; Hammer et al. 2001) and thus are usually the earliest major outlier
branch in most phylogenies when conducting inter-population comparisons (Hammer et
al. 1997; Chen et al. 2000; Forster et al. 2000; Knight et al. 2003; Wood et al. 2005). The
Khoisan almost exclusively possess the mtDNA haplogroup L1d (Salas et al. 2002),
which is thus often used as a signature of Khoisan introgression.
Though they share many genetic features, the agricultural Khoi groups (such as
the Nama) show substantial genetic differences from the hunter-gatherer San groups
(such as the !Kung) (Soodyall et al. 1996; Scozzari et al. 1999; Renquin et al. 2001;
Cruciani et al. 2002). This dissimilarity of the two groups is probably in part due to drift
acting in the presence of mutual isolation and restricted numbers. Of the two, the San tend
to be the more homogeneous (Ramsay & Jenkins 1988; Watson et al. 1996). Another
reason for the differentiation (rather than simple isolation following a common origin) is
that the Khoi have experienced greater gene flow from Bantu-speaking farmers
(consistent with their farming lifestyle) which has added a substantial Bantu signature to
their mtDNA and NRY profiles (Scozzari et al. 1999; Chen et al. 2000; Cruciani et al.
2002). Conversely some southern Bantu speakers appear to have inherited Khoisan
signature types (Underhill et al. 2001; Salas et al. 2002).
The Khoisan share some of their ancestral NRY and mtDNA clades with Ethiopian
populations but this similarity appears to be quite ancient and the common clades of the
two groups have experienced substantial divergence (Scozzari et al. 1999; Cruciani et al.
2002; Salas et al. 2002; Semino et al. 2002). These clades are rare elsewhere in Africa
and form, for the NRY at least, the basal branch of the genealogical tree. This is
consistent with the assertion that the range of Khoisan-speakers once extended over a
much wider area including up to present day Ethiopia and it is contact within this wide
zone that has led to the similarity that is still observed. In this scenario the Khoisan are
an offshoot of the larger ancestral population that has experienced drift as a consequence
of their isolation (Salas et al. 2002).
Supporting evidence that suggests the Khoisan were once much more widespread is the
presence of a click-speaking hunter-gatherer group, the Hadza, in Tanzania. A study by
Knight et al. (2003) that compared NRY and mtDNA profiles in the Hadza to San, Bantu
and East African populations showed the Hadza to be much closer to the non-San groups
but still distinct from them. This study therefore seems to provide support for the
suggestion of an ancient common origin for the present day click-speaking populations
that have become isolated and evolved independently over a wide area for perhaps tens of
thousands of years.
1.4.2.2.8. Pygmies
Five Pygmy populations have been characterised in some way by genetic studies. They
can be broadly placed into two groups: the western pygmies (the Biaka, Mbenzele and Aka
from the Central African Republic (CAF) and the Bakolo from southern Cameroon) and
the eastern pygmies (the Mbuti of the Democratic Republic of Congo (DRC)). Like the
Khoisan, the pygmies are distinguishable from other sub-Saharan Africans (including the
Khoisan) by NRY (Cruciani et al. 2002; Coia et al. 2005), mtDNA (Watson et al. 1996)
and autosomal data (Zekraoui et al. 1997; Destro-Bisol et al. 2000; Sanchez-Mazas 2001;
Renquin et al. 2001; Patin et al. 2006) with most groups being more homogeneous
(Lucotte et al. 1994; Watson et al. 1996; Renquin et al. 2001) and with a greater time to
the most recent common ancestor (Watson et al. 1996) than most Bantu-speaking
populations.
The western (mostly from the CAF) and eastern pygmies can also be differentiated from
each other (Chen et al. 1995; Destro-Bisol et al. 2000; Batini et al. 2007) though both
groups show ancient similarities. The western pygmies possess mostly mtDNA types L1a
and L1c (Salas et al. 2002; Batini et al. 2007) and NRY haplogroup B (Wood et al. 2005),
reflecting similarity to Bantu-speaking CAF populations while the eastern pygmies
possess mtDNA types L1e and L2 (Salas et al. 2002; Batini et al. 2007) and NRY
haplogroup A (Wood et al. 2005), reflecting similarity with East African populations. The
large genetic distance between the eastern and western groups has been interpreted as
evidence of an independent origin and evolution of the pygmy characteristics in each
group (Chen et al. 1995; Destro-Bisol et al. 2004), which occurred a minimum of 18
thousand years ago, although dating of mtDNA clades points to a much older event. It should be
noted that the western pygmy populations tend to closely resemble each other and show
evidence of Bantu introgression (Destro-Bisol et al. 2000; Destro-Bisol et al. 2004; Coia
et al. 2004) though Batini et al. (2007) have suggested most of the Bantu/Pygmy similarity
is ancient rather than the result of recent introgression, with a split some 30 to 60 thousand
years ago.
1.4.2.2.9. The Fulani
Genetic studies on the Fulani have been fairly limited but those that have been conducted
using NRY, mtDNA and autosomal systems have shown nomadic Fulani to be generally
distinct from neighbouring populations (Scozzari et al. 1999; Modiano et al. 2001; Cerny
et al. 2006) and mtDNA data have shown Fulani from Burkina Faso, Chad and Cameroon
to be somewhat similar to each other (Cerny et al. 2006) in comparison to neighbouring
sedentary populations, with 10% of female lineages being of non-sub-Saharan African
origin, possibly from the northern massifs of the Central Sahara. However a relatively old
mtDNA study showed that the Senegalese Fulani, the Peul, who are semi-settled farmers,
could not be discriminated from other Senegalese populations (Scozzari et al. 1988) while
an HLA class I study demonstrated significant differences between Burkina Faso and
Gambian Fulani (Modiano et al. 2001).
1.4.2.2.10. Sao Tome
The islands of Sao Tome on the Gulf of Guinea have been the subject of an unusually
large amount of genetic investigation, as they provide a potentially excellent model in
which to observe the interplay of drift and admixture. The islands were, from the 15th
century, a Portuguese colony. Eventually they were inhabited by slaves of mixed
sub-Saharan African origin. The first study of the islands looked at the HVS-1 mtDNA profile
in comparison to the neighbouring Bioko Island, which underwent a much more ancient
and smaller migration with the immigrants eventually becoming today’s Bubi tribe. As
would be expected, Bioko Island was much more homogeneous while the sampled
population from Sao Tome possessed many different African mainland mtDNA types
(Mateu et al. 1997). Focusing in on three Sao Tome Islands (Angolares, Forros and
Tongas), mtDNA data showed the Angolares to be most differentiated from the other two
islands but no probable European ancestry was detected (Trovoada et al. 2004).
Y-chromosome studies revealed the Angolares to be the most homogeneous island (Trovoada
et al. 2001) while 10% of lineages on the other two islands were of European ancestry
(Trovoada et al. 2007; Goncalves, Spinola & Brehm 2007). This is consistent with
Angolares being a more isolated island with only central and south western African slaves
present on the island while Forros and Tongas experienced substantial numbers of unions
between Portuguese men and African slave women. Interestingly one autosomal study
found no European component (Gusmao et al. 2001) while another found approximately
10% European admixture (Tomas et al. 2002).
1.4.2.2.11. The Lemba
Another interesting genetic history investigation in sub-Saharan Africa is that on the
southern African Lemba people, whose oral tradition has been interpreted as indicating
that they are a ‘lost tribe of Israel’, having migrated from Judea to Sena (which was
possibly in the Yemen) (Parfitt 1997), before finally settling in Mozambique, Zimbabwe
and South Africa. An initial NRY survey by Spurdle et al. (1996) using a four marker
system showed that approximately 50% of NRY lineages were of possible Semitic origin
due to the presence of the p12f2 marker though alternative Arabic and Jewish origins
could not be established. However a higher-resolution NRY study by Thomas et al.
(2000) was able to establish that some of the Semitic clades in the Lemba possessed the
Cohen Modal haplotype, a signature compound biallelic / microsatellite haplotype found
at particularly high frequencies in Ashkenazic and Sephardic Cohanim Jews, thus
supporting their remarkable claim of a Jewish origin.
1.4.2.3. Pharmacogenetics in Africa
Though there is no strict definition, pharmacogenetics is the study of the genetic basis of
drug metabolism. A review of the entire field is beyond the scope of this thesis (there are
entire journals dedicated to the subject and there are many excellent review articles
available (Johnson 2003; Weinshilboum 2003; Evans 2003; Agrawal & Khan 2007; Hall
& Sayers 2007; Kayser 2007; Lanfear & McLeod 2007; Swen et al. 2007; Wilke et al.
2007)). However some previous pharmacogenetic work that has involved sub-Saharan
African populations is detailed below. Studies on the molecular genetic basis of drug
metabolism in sub-Saharan African populations have been very limited (there has been
substantially more involving African Americans), especially in comparison to European-
and Asian-specific investigations.
Much of the early work focused on establishing whether genetic variants of drug
metabolising enzyme (DME) genes that were initially discovered in Europe and Asia
were also found in sub-Saharan African populations. The majority of the focus was
directed towards the Cytochrome P450 gene CYP2D6, which was and is by far the best
characterised of the CYP family because of the remarkable correlation it demonstrated
between genotype and phenotype (particularly so far as variation in the rate of drug
metabolism is concerned) (Weinshilboum 2003). The initial studies on the Shona of
Zimbabwe by Masimirembwa et al. (1993) showed marked differences in the frequency
of the Eurasian determined variants in Africans and also noted a marked difference in
expected phenotype from the genotypic profile (Masimirembwa et al. 1996a), with a
tendency towards slower metabolism in Africans. It was soon found, by re-sequencing
Shona individuals, that approximately 34% of individuals possessed another SNP in exon
2, defined as haplotype CYP2D6*17, that accounted for much of this phenotypic
discrepancy in Africa (Masimirembwa et al. 1996b) (other SNPs were later shown to
contribute to the resultant CYP2D6*17 phenotype (Oscarson et al. 1997)). Subsequent
studies showed this allele to be very prevalent in many sub-Saharan African populations
(South Africa = 24% (Dandara et al. 2001), Gabon = 23% (Panserat et al. 1999), Tanzania
= 17-23% (Wennerholm et al. 1999; Dandara et al. 2001; Wennerholm et al. 2001),
Ghana = 27.7% (Griese et al. 1999)) though it was much lower in Ethiopia (9%) (Aklillu et
al. 1996) and was absent in Europeans (Wennerholm et al. 2002). Another allele,
‘CYP2D6*29’, would also subsequently be shown to contribute substantially (Tanzania
= 19%) to the African-specific ‘lower rate of metabolism’ phenotype (Wennerholm et al.
2001).
The practice of assaying for Eurasian-derived variants in limited numbers of poorly
defined sub-Saharan African populations (especially in Zimbabwe Shona, South African
Venda and Ethiopians) continued for other DMEs (e.g. CYP2C19 (Masimirembwa et al.
1995; Persson et al. 1996; Bathum et al. 1999; Allabi et al. 2003), CYP3A4 (Tayeb et al.
2000; Zeigler-Johnson et al. 2002; Cavaco et al. 2003; Chelule et al. 2003), CYP2C9
(Allabi et al. 2003), N-acetyltransferase (NAT) 1 (Loktionov et al. 2002), NAT2
(Loktionov et al. 2002) and ABCB1 (Schaeffeler et al. 2001; Chelule et al. 2003)) and thus
many of the frequencies observed were unremarkable. As suggested by Bapiro et al.
(2002) for the CYP2D6*17 variant and Wojnowski et al. (2004) for CYP3A5, predicting
phenotypic effects of pharmaceuticals in African populations based on Eurasian
ascertained variation is likely to be hazardous due to the impact of African-specific
variation. Therefore the valuable insights, following the example of CYP2D6, have come and
will come from sequencing African individuals in order to identify African-specific
variation that may affect phenotypic response.
Aklillu et al. (2002) (CYP1B1), Dandara et al. (2003) (NAT), Allabi et al. (2005)
(ABCB1) and Quaranta et al. (2006) (CYP3A5) have sequenced African individuals
(albeit a small number) and identified novel variants, some of which are found at high
frequencies and may affect drug metabolism. The survey by Quaranta et al. (2006) in
particular demonstrates the potentially large amount of information that has yet to be
revealed in sub-Saharan Africa in comparison to the rest of the continent. They used a
polymerase chain reaction-single strand conformational polymorphism approach to
interrogate the CYP3A5 gene in French Caucasians (‘Caucasian’ here
referring to ‘white skinned’ individuals of recent European origin), Gabonese and
Tunisians and found 8, 17 and 10 novel SNPs respectively. The study by Aklillu et al.
(2003) on CYP1A2 in Ethiopians not only found a novel variant but was also able to
show that the variant substantially lowered enzyme activity.
More recent work has been directed towards the functional aspects of alleles with
particular emphasis on the interaction of genetic variants found in sub-Saharan Africans
with pharmaceuticals that may be practically relevant. Mehlotra et al. (2006) identified
that the CYP2B6 enzyme was involved in the metabolism of artemisinin, a drug used to treat
multi-drug resistant strains of falciparum malaria, which is prevalent in sub-Saharan
Africa, and thus determined the frequencies of a range of CYP2B6 haplotypes in West
Africans, while Penzak et al. (2007) examined the G516T SNP in HIV-infected
individuals in Uganda and showed the genotype to influence concentrations of nevirapine
(a drug used to treat HIV infection). Röwer et al. (2005) noted that amodiaquine (AQ) has
become the first line treatment of malaria in Ghana and, given that CYP2C8 metabolises AQ,
decided to type CYP2C8 alleles in Ghanaian children. They were able to demonstrate an
allele frequency of 17% (much higher than found previously in Caucasians) for the
CYP2C8*2 allele that is associated with a decrease in enzyme activity, and thus it is
possible that a fair proportion of Ghanaian individuals may experience adverse drug
effects during the administration of AQ. Sim et al. (2006) and Mirghani et al. (2006) have
similarly performed investigations that examine practically relevant pharmacogenetic
variation at a functional level in sub-Saharan Africans.
Consistent with most genetic studies, major differences between the pharmacogenetic
profiles of Africans and non-Africans have been demonstrated (Loktionov et al. 2002;
Allabi et al. 2003; Chelule et al. 2003; Garsa, McLeod & Marsh 2005; Mehlotra et al.
2006; Quaranta et al. 2006). However very few pharmacogenetic variation studies have
involved inter-sub-Saharan African population comparisons, the best example possibly
being that of Dandara et al. (2002), which itself only examines three broadly defined East
African groups of low sample size. Hopefully this paucity of sub-Saharan African
information will be addressed in the future.
1.4.2.4. Natural Selection in Africa
Human sub-Saharan African genetic variation appears to have been substantially shaped
by demographic events, particularly the expansion of the Bantu-speaking peoples.
However natural selection can also be a major force. Table 1.1 lists some genes that show
unique signals of natural selection in sub-Saharan Africa as well as, when available,
possible reasons for this selection. This list is far from exhaustive and is simply to guide
the reader towards some of the more prominent examples present in the literature. It is
notable in Table 1.1 that the majority of the statistically significant signatures of selection
in sub-Saharan Africa have only been elucidated very recently. This is a consequence of
the amounts of available data increasing rapidly because of the recent development and
54
Table 1.1: Some examples of natural selection detected in Africans.
Gene Name
(Gene Symbol)
Chromosome
Location Type Of Selection Signal Of Selection
Statistical Method of Selection
Detection
Possible Environmental Pressure
Causing Selection
Duffy blood
group,
chemokine
receptor (DARC)
1 Positive selection
for FY*O allele
Low level of sequence variation
(Hamblin & Di Rienzo 2000;
Hamblin, Thompson & Di
Rienzo 2002)
HKA, Fu and Li‟s D, FST, Fay
and Wu‟s H
Homozygotes are resistant to p
Plasmodium vivax (Livingstone 1984)
β-hemoglobin
(HBB) 11
Balancing selection
of sickle cell trait
High degree of haplotype
similarity for βs alleles
(Hanchard et al. 2006; Hanchard
et al. 2007)
Long Range haplotype
similarity
Heterozygotes are resistant to
Plasmodium falciparium (Aidoo et al.
2002)
Glucose-6-
phosphate
dehydrogenase
(G6PD)
X Positive selection
for G6PD A- allele
Long range LD (Sabeti et al.
2002b; Saunders et al. 2005) on
A- allele background and low
sequence diversity (Tishkoff et
al. 2001; Saunders, Hammer &
Nachman 2002)
Coalescent simulation of
neutral microsatellite variation
and LD, LRH
G6PD A- allele confers resistance to
Plasmodium falciparium
CD40 ligand
(CD40LG) X
Positive selection
for TNFSF5-CH4
haplotype (with
726C SNP)
Long range LD on CH4
haplotype (Sabeti et al. 2002b)
and low sequence diversity
(Sabeti et al. 2002a)
LRH
TNFSF5 heavily involved in immune
response and CH4 may confer
resistance to Plasmodium falciparium
Taste receptor,
type 2, member
16 (TAS2R16)
7 Balancing selection
around K172N SNP
Excess of derived alleles
detected but higher frequency of
ancestral K172 allele specific to
central Africa (Soranzo et al.
2005)
Fay and Wu‟s H, Exact Test of
LD
172N allele increases protection
against cyanogenic plants foods but
K172 allele allows low level ingestion
of cyanogenic foods, which increases
resistant to Plasmodium falciparium
Lactase (LCT) 2
Positive selection
for lactase
persistence allele -
14010C
Long Range LD on -14010C
allele background (Tishkoff et
al. 2007)
LRH
Lactase persistence and thus drinking
fresh milk offers increase nutritional
benefits and milk is a good source of
water in arid environments.
Continues overleaf…
55
Table 1.1 continued
Like-glycosyltransferase (LARGE) | 22 | Positive selection | Long range LD and extreme derived allele frequency difference between populations (Sabeti et al. 2007) | LRH | Loss of normal glycosylase function prevents α-dystroglycan modification, preventing binding of Lassa Fever virus
Dystrophin (DMD) | X | Positive selection | Long range LD and extreme derived allele frequency difference between populations (Sabeti et al. 2007) | LRH | Loss of normal cytosolic adaptor function prevents normal α-dystroglycan function, preventing binding of Lassa Fever virus
Major histocompatibility complex, class II, DR beta 1 (HLA-DRB1) | 6 | Balancing selection | Extreme level of heterozygosity in DRB1 gene (Renquin et al. 2001) | Ewens-Watterson’s and Slatkin’s neutrality test | High levels of haplotype diversity in HLA genes such as DRB1 allow greater response to a broader range of pathogens
Melanocortin 1 receptor (MC1R) | 16 | Purifying selection | Deficit in expected number of non-synonymous changes (John et al. 2003) | McDonald-Kreitman, Tajima’s D, Fu and Li’s F, D, F* and D* | Functional constraint required to maintain dark skin pigmentation in sub-Saharan Africans due to effect of high sunshine rate
ATP-binding cassette, sub-family B (MDR/TAP), member 1 (ABCB1) | 7 | Positive selection for mh7 haplotype | Long range LD on SNP e26/3435C background (Tang et al. 2004) (note: in African Americans, not sub-Saharan Africans) | LRH | MDR1 may regulate drug, xenobiotic and enveloped virus traffic but actual selection stimulus somewhat unknown
Opsin 1 (cone pigments), long-wave-sensitive (OPN1LW) | X | Balancing selection | Excess of non-synonymous changes (Verrelli & Tishkoff 2004) | HKA, McDonald-Kreitman, Tajima’s D | Variation in L-cone colour vision may have allowed adaptive evolution of hunter-gatherers.
vav 3 guanine nucleotide exchange factor (VAV3) | 1 | Positive | Long range haplotype LD (Walsh et al. 2006) | LRH | There is no known mechanism of selection for VAV3, a haemopoietic cell-specific guanine nucleotide exchange factor
application of high-throughput DNA variation typing and sequencing technologies as well
as the development of more powerful statistical methods for analysing these data (for a
recent review see Sabeti et al. (2006)). Understanding the effect of natural selection and
being able to discriminate these effects from demographic processes will be important for
understanding patterns of genetic variation in sub-Saharan Africa, so further technological
and analytical developments will be vital in the near future.
1.5. DNA Sampling Issues
As methods to characterise the DNA of individuals, including new sequencing
technologies like 454 (Margulies et al. 2005), Solexa and SOLiD (Shendure et al. 2005)
sequencing, continue to advance at a rapid rate it is clear that DNA sampling will be the
major methodological bottleneck when conducting genetic studies in sub-Saharan Africa.
The majority of genetic studies described above have used either relatively poorly defined
or pooled sample sets (e.g. populations labelled as ‘West Africa’ or ‘Tanzanian’) or, if
samples have been more carefully defined (e.g. populations labelled as Amharic born in
Addis Ababa), there has been very little other available population data characterised in
similar detail with which such datasets can be compared. Sampling criteria have, in
addition, been very variable and the original rules governing fieldwork collections are
often no longer available. DNA sampling in sub-Saharan Africa presents a
unique challenge for the investigator and can be particularly troublesome. Inconsistent
sampling strategies, while understandable, should be avoided. As investigations are
undertaken at ever finer scales, such as those described in Chapters 2 and 3 of this thesis,
the criteria used to collect samples become increasingly important. In addition, subsequent
analysis must take account of the adopted sampling strategy. It is clear that stringent,
standardised and declared sampling strategies must be an important consideration of future
studies.
The Centre for Genetic Anthropology criteria are described in Appendix A together with
the practical problems encountered by the author of this thesis in the field. Definitive
criteria for collecting samples in the field in Africa have not yet been formulated but are
increasingly necessary. Until such time as they are, the necessary minimal requirement is
that all research reports should contain, in the fullest detail, information on when, where,
how and by whom samples were collected so that this information can be taken into
account when data are analysed. In the three case studies (especially Chapters 2 and 3,
where appropriate population definition is critical) described in this thesis a large number
of densely sampled datasets, each of significant size, that have been relatively well
described have been used in an attempt to address some of the sampling issues of
previous studies.
1.6. Statement of work performed by Krishna Veeramah in this thesis
1.6.1. Chapter 2
All field sampling, DNA extraction and Y-chromosome typing of samples from Bafut,
Foumban, Nkambe, Wum, Bankim, Magba and Sabongari of Cameroon was performed
by Krishna Veeramah. All mtDNA typing and processing of samples from Cameroon was
performed by me. All statistical analysis was performed by Krishna Veeramah.
1.6.2. Chapter 3
All field sampling, DNA extraction, Y-chromosome typing, mtDNA typing and
processing of samples from Cameroon was performed by Krishna Veeramah. All
processing of mtDNA data from Nigeria was performed by me. All statistical analysis
was performed by Krishna Veeramah.
1.6.3. Chapter 4
All FMO2 g.23238C>T SNP typing was performed by Krishna Veeramah. All statistical
analysis was performed by Krishna Veeramah, except for the Logistic Regression, which
was carried out by Professor Nancy R. Mendell, and the interpolation step of the genetic
boundary analysis, which was carried out by Dr Mark Thomas.
2. Sex-Specific Genetic Data Support One Of Two Alternative Versions Of The Foundation
Of The Ruling Dynasty Of The Nso´ In Cameroon
2.1. Introduction
2.1.1. The geography, history and sociology of the Nso´
The history of western Cameroon is dominated by rival polities which by the eighteenth
and nineteenth centuries had formed city states (also called kingdoms or fon-doms). Their
rivalries were a feature of the pre-colonial history of the Grassfields, the highlands which
form the West and North West Provinces of present day Cameroon. Although they are not
the only groups living in the region, these centralised polities have dominated regional
politics for centuries. This chapter discusses the early history of one of the most
celebrated kingdoms, the fon-dom of Nso´, using novel genetic data that throw new light
on a long-standing controversy among Nso´ historians.
Nso´ is one of the Grassfields states (see Figure 2.1 for geographic location) whose royal
family claims Tikar descent. This chapter will not attempt to revisit a complex topic
which has been much discussed in the literature (see Jeffreys 1964; Chilver & Kaberry
1971; Price 1979; Fowler & Zeitlyn 1996). For the purposes of this study it is sufficient to
note that, like many but not all royal families in the region, the Nso´ royal family traces
its origins to the royal family of the Tikar of the Tikar Plain, from near present-day
Bankim (there may be significant political advantages for a group to claim Royal Tikar
descent).
By the nineteenth century the Nso´ state had expanded, so that it was in effect a small
empire holding sway over surrounding ethnic groups, over whose control wars were
fought with rival states such as the Bamum state centred on Foumban (Kaberry 1962b;
Tardits 1980). Concerning the period before the establishment of the larger state there are
oral history accounts of uncertain antiquity which deal with the ethnogenesis of the Nso´
people. The most common account tells of a Princess Ngonnso´ travelling with followers
from the Tikar region, approximately 100 km to the east, separating from her brothers
(who founded neighbouring settlements) on the journey and encountering a small
indigenous group of hunter-gatherers (the Visale) amongst whom she settled (Mzeka
1990). Opinion differs between Mzeka (1978), who states that she was accompanied by a
husband, and most members of the Nso´ Historical Society1 who claim that she settled
without an accompanying husband (Tatah Humprey Mbuy, senior member of the Nso´
Historical Society, personal communication). However, both sides of the debate agree
that her son became the first fon2 of the Nso´ and it is from him that the current fon is
directly descended along the paternal line.
Nso´ is unusual among the Grassfields kingdoms in having a system of named descent-based
social classes with varying rules of affiliation and transmission. They were first
described by Kaberry (1952; 1959; 1962a) and Chilver and Kaberry (1960), and more
recently by Goheen (1996) and Chem-Langhëë and Fanso (1997). The four groups are: 1)
the won nto´ (descendants of a fon down to the third or fourth generation (see below)); 2)
the duy (descendants of a fon who ruled more than three or four generations ago (see
below) together, according to Chem-Langhëë and Fanso (1997), with some members of
commoner lineages whose heads are descendants of princesses, and associated patriclans
or clan segments providing state counsellors, allegedly founded by immigrant royals); 3)
the nshiylav (subjects born or recruited3 into Palace service (patrilineally inherited)); and 4)
the mtaar (commoners (patrilineally inherited)). Although the majority of the Nso´ are
self-identifying Christians of the Roman Catholic denomination, the fon has, through the
generations, maintained a polygynous household, which in 2005 numbered over 70
women4. Access to the fon’s wives has traditionally been strictly controlled with illicit
unions subject to capital punishment (Chilver & Kaberry 1968). While paternal descent
from a fon is a necessary precondition for enthronement, the new fon’s mother must, in
the same tradition, be a mtaar commoner (Mzeka 1978).
1 The Nso´ Historical Society is open to all Nso´ people as well as non-Nso´ individuals undertaking
research on Nso´ history and traditions. Address: Nso´ History Society, Tourist Home, P.O. Box 33, Kumbo
Nso´, North West Province, Cameroon. Telephone number: 00237 348 17 65.
2 A fon can be thought of as a traditional ruler or leader of a group or village though the actual degree of
power held by a fon is variable from group to group. While the term is somewhat specific to the Grassfields
of Cameroon, a fon is analogous to a traditional tribal chief.
3 Members of the nshiylav may also be recruited from the other categories (for example, from the mtaar) by
the fon and given a special (high) status (such as personal page).
Figure 2.1: Map showing towns in Cameroon where samples were
collected.
The membership rules, as commonly stated in abbreviated form, do not cover all possible
cases, particularly where the change in status from won nto´ to duy is concerned. Since
this has implications for the distribution of sex-specific genetic markers in the wider
population, David Zeitlyn5 and Verkijika G. Fanso6, with whom the author of this thesis
worked closely during this investigation, conducted some field research in an attempt to
clarify the position.
4 This information came from the late Emmanuel Nkem Mbinglo, a paternal brother of the fon.
5 David Zeitlyn is the Professor of Anthropology at the University of Kent who works primarily on the
Mambila population in North-West Cameroon.
The problem is that Chem-Langhëë and Fanso (1997) define the rule differently from
Kaberry (1959). The former state that individuals are won nto´ members if they are
‘descendants of any fon of Nso´ to the fourth generation through agnatic lines [strictly
patrilineal descent] and to the third generation through uterine connexions [cognatic or
strictly matrilineal descent]’ (Chem-Langhëë & Fanso 1997) (this is named Royal Social
Status Rule A; see Figure 2.2). They then go on to state that individuals descended from
one or more won nto´ are duy if either they are descendants of a fon more than four
generations ago along agnatic lines or are descendants of a fon more than three
generations ago along uterine connexions (and in both cases not a descendant of a more
recent fon). Kaberry (1959) simply says ‘Descendants of a Fon down to the third or fourth
generation are described as wonto’. There is consequently uncertainty about the status of
members of the fourth generation from a fon. To explore this a fictional family tree was
drawn, designed to fit on a single side of paper, which was used as the basis of a set of
interviews with some knowledgeable Nso´ (five males) conducted by David Zeitlyn and
Verkijika G. Fanso in April 2007. The conclusion drawn (David Zeitlyn, personal
communication) was that there really is a degree of uncertainty when there are some female
links in any particular descent line (informants varied on whether the son of either a
second or third generation female descendant of a fon is won nto´ or duy). In practice this
can be exploited tactically as part of Nso´ politics. The ‘Kaberry’ formulation of the rule
can be developed to remove uncertainty in a variety of ways, for example as ‘a person is a
member of won nto´ (down to the fourth generation (if a man) and third generation (if a
woman)) if she or he is both a child of a won nto´ member and a descendant of a fon’ (this
is named Royal Social Status Rule B; see Figure 2.3). The interviews ruled out any
interpretation that won nto´ status is inherited solely along paternal lines. However, it
should be emphasised that uncertainty about group membership has not, to date, been
seen as a problem in Nso´. There are few circumstances when an individual has to state
their category membership and the five Nso´ informants agreed it might be possible for
someone to be accepted in some circumstances as a won nto´ member but in others as a
member of duy. There are formal criteria, critically whether a father has the right to
bestow his daughter in marriage, in which case he is duy, or must offer her to the Palace to
bestow, in which case he is a member of won nto´7. Opinions also varied about whether
the Ŋgwerong8 society could arbitrate when membership was disputed. However, at the
boundary, the question of membership does not appear to be controversial and none of the
informants could recall disputes about category membership arising. These enquiries
were viewed as abstract academic questions, concerned with the overall system, and
were not taken very seriously. The discrepancy between Royal Social Status Rules A and
B sets limits for genetic modelling and historical reconstruction, which are addressed in
the text below.
6 Verkijika G. Fanso teaches in the Department of History at the University of Yaoundé 1, Cameroon, and
is of Nso´ ethnicity.
2.1.2. Expectations of sex-specific genetic variation in the Nso´
Analysis of sex-specific9 genetic systems (the non-recombining portion of the paternally
inherited Y-chromosome (NRY) and the maternally inherited mitochondrial DNA
(mtDNA)) has proved useful in elucidating the history of diverse ethnic groups where
well-defined alternative scenarios can be identified (see e.g. Thomas et al. 2000; Tambets
et al. 2004). There was a possibility that genetic data would be consistent with only one of
the two variants of the oral history regarding the origin of the father of the first fon of
Nso´ i.e. that Princess Ngonnso´ was already married to a man of Tikar origin when she
encountered the indigenous hunter-gatherers (Visale) or that after arrival in the
Grassfields she took a Visale husband who consequently fathered her child. To be able to
make such a distinction two conditions must be satisfied: a) the distribution of
NRY and mtDNA variation in the Nso´ must be consistent with the expectations arising from
the group's declared social practices and b) the NRY profile in the Visale must have been
distinct from that of migrants from the Tikar plain.
7 As Chem-Langhëë and Fanso (1997) make clear there are controversial distinctions within the category of
won nto´ (not all patrilineally descended males are eligible to be fon, some say that it is only sons conceived
while a man is fon who are eligible to succeed to the Fonship, not those born before his selection). There
are also distinctions within the category of duy: the duy shiŋgwaŋ (duy of the salt) and the duy nsaansa’
(general duy), but these are beyond the scope of this chapter.
8 The Ŋgwerong are the Nso´ regulatory society that is in charge of law and policy enforcement.
9 Strictly these systems are not sex-specific but are inherited in a sex-specific manner. The term is used
throughout this thesis as a matter of convenience.
Figure 2.2: Lineage tree showing the relationship of won nto´ individuals and the transition of won nto´ to duy under Royal
Social Status Rule A (M = male offspring, F = female offspring, * = individual inherits the same NRY type as a fon). Won
nto´ are shown in black and duy in red.
Figure 2.3: Lineage tree showing the relationship of won nto´ individuals and the transition of won nto´ to duy under Royal
Social Status Rule B (M = male offspring, F = female offspring, * = individual inherits the same NRY type as a fon). Won
nto´ are shown in black and duy in red.
2.1.2.1. Expectations arising from the group's declared social practices
To undertake the analysis it was first examined whether it is possible to conclude that
either Royal Social Status Rule A or Royal Social Status Rule B has been followed. If Royal Social
Status Rule A has been followed it would be expected that 33.4% - 46.7% of won nto´
Nso´ males sampled would share identical NRY types while this same NRY type would
be expected asymptotically to approach a frequency of 12.5% in won nto´-descended-duy,
depending on the number of generations since the original fon. However, if Royal Social
Status Rule B has been followed, it would be expected that only 1.0% - 24.1% of sampled
won nto´ males would share the same NRY type. This same NRY type would be expected
to be at a frequency of 12.5% in the won nto´-descended-duy, irrespective of the number
of generations of descent from a fon. These expectations make it possible to establish
whether a) Royal Social Status Rule A, b) Royal Social Status Rule B or c) neither A nor
B has been followed. Generating the expectations given above (33.4% - 46.7% for won
nto´ under Royal Social Status Rule A, 1.0% - 24.1% for won nto´ under Royal Social
Status Rule B and 12.5% for duy under both rules) involves a very detailed process that
would disrupt the narrative of the chapter somewhat and so is fully described at the end of
the chapter in the section entitled Supplementary Section 2S.1. However, a more concise
description of the process is given below.
Royal Social Status Rules A and B are described above and represented in Figures 2.2
and 2.3 respectively and the expectations of the percentage of individuals expected to
carry a fon’s NRY type are derived from these rules. Assumptions when generating these
expectations include: (a) that the Y-chromosome of fons and their patrilineal descendants
can be distinguished from those of non-patrilineal descendants; (b) that the numbers of
males and females throughout each generation are equal; (c) that generations are discrete;
(d) that the number of children a fon has (excluding his heir) is ‘2n’ (‘n’ of whom are
males); and (e) that the number of children a non-fon won nto´ has is ‘2y’ (‘y’ of whom are
males). Simple expectations based upon these rules can be generated given these
assumptions. However in this study such expectations are complicated by the sampling
strategy utilised (which is performed as a matter of routine within the TCGA laboratory)
where sampling of individuals who are the brother, father, son or paternal line cousin of
an individual from whom a buccal swab has already been collected is not permitted. In
addition, given the range of ages of samples collected (data not shown but available on
request), it must also be assumed that sampling is from both the most recent and second
most recent generation of adult won nto´ males (for simplicity possible sampling from the
third most recent generation is ignored).
As previously described the key to acquiring won nto´ status is descent from a fon no more
than three and, in certain circumstances, four generations ago. Under Royal Social Status
Rules A and B any particular generation of won nto´ should contain individuals who
descend from the previous four fons, though the contribution of the fourth most recent fon
differs between the two rules and it is the won nto´ that descend from this fourth fon that
lead to different expectations under the two rules.
Any one won nto´ individual can be categorised based on how many generations ago they
descend from a fon (not including the fon from their own generation) and along what
lineage type/path they trace this ancestry (for example the third most recent fon along a
purely patrilineal path, or from the second most recent fon where their mother was a won
nto´ whose own father was a fon). Therefore it is possible to calculate, using the ‘n’ and
‘y’ notation introduced earlier for the number of children a fon or won nto´ has (as well as
the other assumptions), the relative contributions to the total won nto´ of any particular
generation for each specific category (based on fon of descent and lineage path) of won
nto´ individual. From these relative contributions it is also possible to use probability
theory to calculate the probability of actually sampling each of these categories given the
sampling strategy utilised and that sampling is from two consecutive but discrete
generations of won nto´. Some of these categories of won nto´ are descended from a fon
down a strictly patrilineal lineage and so will possess the fon NRY type. Summing the
probabilities of sampling these fon NRY type categories and dividing this by the sum of
the probabilities for all possible categories of won nto´ will give the expected proportion of
sampled individuals that possess the fon NRY type in the won nto´.
Algebraic simplifications of these ratios for Royal Social Status Rules A and B are
shown below.
Royal Social Status Rule A:
$$\frac{\frac{1}{y+1}\left(n(1+y)^{2}+1\right)+1}{\frac{1}{y+1}\left(3n(1+y)^{2}+1\right)+1}$$
Royal Social Status Rule B:
$$\frac{\frac{1}{y+1}\left(n(1+y)^{2}+1\right)+1}{\frac{1}{y+1}\left(4ny^{3}+10ny^{2}+9ny+3n+1\right)+1}$$
The actual values of these ratios will vary depending on which values of ‘n’ and ‘y’ are
used. In order to take account of uncertainty concerning the real values of ‘n’ and ‘y’, a
range of different combinations of ‘n’ and ‘y’ extending from 1 to 25 was substituted
into the above expressions. Under Royal Social Status Rule A 33.4% - 46.7% of won nto´
males sampled should possess a fon’s NRY type while 1.0% - 24.1% should do so
under Royal Social Status Rule B.
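The quoted ranges can be checked with a short calculation. The following is a minimal sketch, not the procedure used in the thesis: it assumes that ‘n’ and ‘y’ take integer values only and that every combination from 1 to 25 is tried. The function names are hypothetical and introduced purely for illustration.

```python
from fractions import Fraction


def fon_nry_ratio_rule_a(n, y):
    """Expected proportion of sampled won nto' males with a fon's NRY type (Rule A)."""
    common = Fraction(1, y + 1)
    numerator = common * (n * (1 + y) ** 2 + 1) + 1
    denominator = common * (3 * n * (1 + y) ** 2 + 1) + 1
    return numerator / denominator


def fon_nry_ratio_rule_b(n, y):
    """Expected proportion of sampled won nto' males with a fon's NRY type (Rule B)."""
    common = Fraction(1, y + 1)
    numerator = common * (n * (1 + y) ** 2 + 1) + 1
    denominator = common * (4 * n * y**3 + 10 * n * y**2 + 9 * n * y + 3 * n + 1) + 1
    return numerator / denominator


def percent_range(ratio, low=1, high=25):
    """Minimum and maximum percentage over all integer (n, y) pairs (an assumption)."""
    values = [ratio(n, y) for n in range(low, high + 1) for y in range(low, high + 1)]
    return round(float(min(values)) * 100, 1), round(float(max(values)) * 100, 1)


print(percent_range(fon_nry_ratio_rule_a))  # (33.4, 46.7)
print(percent_range(fon_nry_ratio_rule_b))  # (1.0, 24.1)
```

Under these assumptions the upper bound of each range is reached at n = y = 1 and the lower bound at n = y = 25.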
As duy status is inherited patrilineally the expected frequency of fon NRY types within
this social category is much simpler to predict. One additional assumption is that
duy status acquired by methods other than descent from a fon is ignored. Ideally every fon
since the first fon of Nso´ except for the most recent three fons will contribute duy to the
current generation. Though the numbers of duy descended from any fon should increase
with every passing generation, the actual proportion that possess the fon NRY type will,
as shown in Figures 2.2 and 2.3, always be 12.5%. Under Royal Social Status Rule A the
proportion never actually reaches 12.5% but asymptotically approaches it depending on
the number of generations since the first fon. This is because for the fourth most recent
fon the lineage possessing the fon NRY type does not become duy until the subsequent
generation. However the effect of this solitary fon should be negligible.
The overall pattern of a shared NRY type in the won nto´ and duy should be most evident
with respect to a battery of rapidly evolving microsatellites, a finding which could
demonstrate that male line continuity of fons has been maintained for at least the past four
generations. The NRY of the won nto´ would be expected to be significantly less diverse
than those of the other social classes. Furthermore, if the rules governing selection of a
fon had been strictly adhered to and there had been no false paternity in the line of fons
since the foundation of the Nso´ then this homogenous NRY type would be that possessed
by the first fon of Nso´ and his father. In addition, given that a) the mother
of a fon is required to be a commoner, b) women move more freely among social categories in the
patrilineal Nso´ society and c) extreme polygyny is practised by fons, it would be
expected that the distribution of mtDNA types among all four social classes would be similar.
2.1.2.2. Expectations arising from a possible distinct Visale NRY profile
If the above expectations were met then current knowledge of NRY variation in sub-Saharan
Africa could be used to explore the oral history of the Nso´. In a previous
publication Underhill et al. (2001) suggested that previously common NRY lineages may,
throughout sub-Saharan Africa, have been replaced by a lineage associated with the
expansion of the Bantu-speaking peoples (EBSP)10. Underhill et al. (2001) and Scozzari
et al. (1999) have identified the modal NRY of the EBSP to be E3a (using the
nomenclature of the Y-chromosome consortium (2002)). It would be expected that the
putative replaced NRY lineages would be observed at low frequencies with a patchy
distribution across sub-Saharan Africa. These NRY types would be remnants of past
hunter-gatherer populations that have been overwhelmed by the E3a NRY type and
become isolated from each other for a significant time period. This would be reflected by
high genetic distances at the microsatellite haplotype level among geographically
separated groups possessing the same SNP defined low frequency NRY lineages
(Underhill et al. 2001). However, these replaced NRY lineages may be found at high
local frequencies in existing populations that pre-date the EBSP.
It would therefore be reasonable to assume that the hunter-gatherer Visale may have
possessed one of these putative replaced NRY lineages at a significant frequency in
comparison to the Tikar (who speak a Bantoid language, so are connected to the EBSP).
If the signature NRY type found in the won nto´ was shown to be one of the pre-EBSP
lineages and was also not found in neighbouring Tikar populations (as well as other
nearby ethnic groups that may have experienced some contact with Tikar in the past) this
would favour the scenario whereby the immigrant princess married an indigenous Visale.
Conversely, the presence of a homogenous E3a lineage would favour Princess Ngonnso´
travelling with a husband of Tikar origin who then fathered the first fon of Nso´.
10 The considerable simplification implicit in this statement is noted. Gene flow is not necessarily associated
with language dispersion but there is enough hard data to suggest a close correlation for this to be sufficient
for the present chapter, which is not primarily about the Bantu expansion. Some of the complexity has been
discussed by MacEachern (2000), Zeitlyn and Connell (2003) and Vansina (1995).
2.2. Materials and Methods
2.2.1. Sample Collection Procedure
Buccal swabs were collected from males over eighteen years old in the Cameroonian
town of Kumbo (n=151). In addition, buccal swabs were collected in four other western
Grassfields towns (Bafut (n=103), Foumban (n=117), Nkambe (n=82) and Wum
(n=116)), seven towns or villages on the Tikar plain (Atta (n=29), Bankim (n=73), Magba
(n=96), Nyamboya (n=98), Sabongari (n=94), Somie (n=100) and Songkolong (n=43))
and one town north of the Tikar plain (Mayo Darle (n=111)) (see Figure 2.1). All samples
were collected anonymously with informed consent. Individuals were recruited in a
fashion blinded to social class and a local resident assisted in ensuring that only one in
each of the following sets participated: a) brothers, b) father and sons, c) grandfather and
paternal line grandsons. The practice of not sampling individuals who are brothers,
fathers, sons or paternal line cousins of participants was adopted for ethical reasons and to
ensure consistency with other DNA sample collections at The Centre for Genetic
Anthropology (TCGA). Sociological data were also collected from each individual
including age, current residence, birthplace, self-declared cultural identity (and Nso´
social class) and religion for the individual and the individual’s father, mother, paternal
grandfather and maternal grandmother.
Standard phenol-chloroform DNA extractions were performed on all samples (see
Appendix C).
2.2.2. Y-chromosome typing
Standard TCGA kits were used to characterise six microsatellites (DYS19, DYS388,
DYS390, DYS391, DYS392, DYS393) and eleven biallelic Unique Event Polymorphism
(UEP) markers (92R7, M9, M13, M17, M20, SRY+465, SRY4064, SRY10831, sY81,
Tat, YAP), as described by Thomas et al. (1999). Microsatellite repeat sizes were
assigned according to the nomenclature of Kayser et al. (1997). Where necessary an
additional marker, p12f2, was typed as described by Rosser et al. (2000). NRY
Haplogroups were defined by the twelve UEP markers according to the nomenclature
proposed by the Y-chromosome Consortium (2002) (see Figure 2.4).
Figure 2.4: Genealogical relationships of UEP markers used to define NRY
haplogroups
These multiplex UEP/ microsatellite kits have already been shown to be reliable under a
wide range of conditions, consistently giving similar signal intensities across all UEPs
and microsatellites within each kit (Thomas, Bradman & Flinn 1999). Therefore any
multiplex runs that showed at least one UEP or microsatellite peak of substantially lower
intensity were repeated. Any samples that gave UEP-1 and UEP-2 results that were
incompatible with the known phylogenetic tree for the NRY were also retyped using both kits.
Microsatellite results were also analysed for outliers and homoplasy amongst UEP
haplogroups, and such samples were retyped for confirmation.
The UEPs used in this study were chosen primarily on the prior development and
standard application of multiplex UEP typing kits in the TCGA laboratory. It was
recognised beforehand that the use of these UEPs leads to a relatively crude resolution of
NRY haplogroups, with only a few markers likely to be relevant to investigating sub-Saharan
African individuals (who tend to fall within haplogroups A, B and E).
However the use of microsatellites in this study should aid in further resolving the fine-
scale phylogenetic relationship of samples (though further SNP typing would still be
preferred; see Chapter 5 for further discussion). It should be noted that only six
microsatellites were typed in this study, which limits the effectiveness of elucidating
these relationships somewhat and further typing would also have been preferable (over 50
NRY microsatellites have currently been identified), especially with regard to the
estimation of TMRCA dates. Unfortunately, given economic restrictions and the time
available the development and typing of further UEPs and microsatellites was not
possible though this is certainly a priority for possible future work (see Chapter 5).
To characterise NRY lineages potentially associated with populations replaced by the
EBSP as proposed by Underhill et al. (2001) the samples described above were analysed
(given a group label of ‘Grassfields of Cameroon’ (n=1213)) along with unpublished
data (n=8072) held in The Centre for Genetic Anthropology database consisting of
sample sets collected from populations in sub-Saharan Africa: northern Cameroon
(n=778), southern Cameroon (n=174), Ethiopia (43 different locations covering most of
the country) (n=3368), north eastern Ghana (n=258), north western Ghana (n=471), south
eastern Ghana (n=161), south western Ghana (n=206), central Malawi (n=207), northern
Malawi (n=56), Mozambique (n=86), Cross River region-Nigeria (n=1247), southern
Senegal (n=95), western Senegal (n=90), Pretoria-South Africa (n=96), Sudan (n=647),
Tanzania (n=45), Uganda (n=36) and Zimbabwe (n=51).
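As a consistency check on the sample sizes quoted in Sections 2.2.1 and 2.2.2, the per-location counts can be tallied against the stated totals. The sketch below simply transcribes the figures given in the text; the dictionary names are hypothetical and used only for illustration.

```python
# Sample sizes transcribed from Section 2.2.1 (Grassfields of Cameroon)
grassfields = {
    "Kumbo": 151, "Bafut": 103, "Foumban": 117, "Nkambe": 82, "Wum": 116,
    "Atta": 29, "Bankim": 73, "Magba": 96, "Nyamboya": 98, "Sabongari": 94,
    "Somie": 100, "Songkolong": 43, "Mayo Darle": 111,
}

# Sample sizes transcribed from Section 2.2.2 (unpublished TCGA database sets)
tcga_unpublished = {
    "northern Cameroon": 778, "southern Cameroon": 174, "Ethiopia": 3368,
    "north eastern Ghana": 258, "north western Ghana": 471,
    "south eastern Ghana": 161, "south western Ghana": 206,
    "central Malawi": 207, "northern Malawi": 56, "Mozambique": 86,
    "Cross River region-Nigeria": 1247, "southern Senegal": 95,
    "western Senegal": 90, "Pretoria-South Africa": 96, "Sudan": 647,
    "Tanzania": 45, "Uganda": 36, "Zimbabwe": 51,
}

grassfields_total = sum(grassfields.values())
unpublished_total = sum(tcga_unpublished.values())
print(grassfields_total, unpublished_total)  # 1213 8072
```

Both sums agree with the group totals stated in the text (n=1213 and n=8072).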
2.2.3. mtDNA typing
The mtDNA HVS-1 region of all samples collected from Kumbo was sequenced as
described by Thomas et al. (2002) except that primers conL1-mod, conL2 and conH3
were replaced by conL849 (CTA TCT CCC TAA TTG AAA ACA AAA TA), conL884
(TGT CCT TGT AGT ATA A) and conHmt3 (CCA GAT GTC GGA TAC AGT TC)
respectively. HVS-1 Variable Site Only (VSO) haplotypes were determined for all
samples with sequence data covering a minimum of nucleotides 16020-16400 by
comparison to the Cambridge Reference Sequence (Anderson et al. 1981), with
haplotypes consisting of the nucleotide positions where substitutions, insertions or
deletions occurred as well as the actual base change.
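As a rough illustration of how a VSO haplotype is derived, the sketch below compares a sample with a reference over the same stretch and records only the positions that differ. It assumes the two sequences are already aligned and of equal length, and it handles substitutions only; the function name and the toy sequences are hypothetical and are not taken from this study.

```python
def vso_haplotype(reference, sample, start=16020):
    """List variable sites as '<position><sample base>' strings.

    Assumes `reference` and `sample` are pre-aligned strings of equal length
    whose first base sits at nucleotide position `start`. Insertions and
    deletions are not handled in this simplified sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{start + offset}{base}"
        for offset, (ref_base, base) in enumerate(zip(reference, sample))
        if base != ref_base
    ]


# Toy sequences (hypothetical, not real HVS-1 data).
toy_reference = "ACGTAC"
toy_sample = "ACATAT"
print(vso_haplotype(toy_reference, toy_sample))
```

Two samples then share a VSO haplotype when their lists of variable sites are identical.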
Each sample's chromatogram was manually inspected for generally high levels of
background noise across its whole length of sequence. The 5ʹ and 3ʹ ends of raw
chromatograms were trimmed until at least 10 out of 15 bases at these ends had
confidence scores above 25%. The ends were then trimmed further by manually
inspecting the sequence. For each 96 sample sequencing run each position with a
proposed SNP, insertion, deletion or ambiguous position was examined manually. All
samples with any ambiguous sites after manual curation were sequenced again. In
addition sequencing of samples was repeated when the forward and reverse sequences did
not match.
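The automated part of the end-trimming rule can be sketched as a sliding window. This is only one possible reading of the rule: it assumes per-base confidence scores on the same scale as the threshold of 25 quoted above, trims one base at a time, and treats a read as failed if fewer than 15 bases remain. The function name and the toy scores are hypothetical.

```python
def trim_ends(scores, window=15, min_good=10, threshold=25):
    """Return (start, end) so that scores[start:end] is the retained region.

    A window passes when at least `min_good` of its `window` bases have a
    confidence score above `threshold`.
    """
    def good_enough(chunk):
        return sum(score > threshold for score in chunk) >= min_good

    start, end = 0, len(scores)
    # Trim the 5' end one base at a time until the leading window passes.
    while end - start >= window and not good_enough(scores[start:start + window]):
        start += 1
    # Trim the 3' end in the same way, using the trailing window.
    while end - start >= window and not good_enough(scores[end - window:end]):
        end -= 1
    if end - start < window:
        return 0, 0  # assumption: nothing usable is left, treat the read as failed
    return start, end


# Toy read: 8 poor bases, 30 good bases, 8 poor bases (hypothetical scores).
toy_scores = [0] * 8 + [50] * 30 + [0] * 8
print(trim_ends(toy_scores))
```

Manual inspection of the remaining ends, as described above, would still follow this step.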
2.2.4. Statistical and Population Genetic Analysis
The Pearson's chi-square goodness of fit test was performed within the R programming
environment. Genetic diversity, h (the probability of randomly sampling two different
haplotypes in a population), and its standard error were estimated from the unbiased
formulae of Nei (1987). Genetic differences between pairs of populations when individuals in
populations were described by mtDNA HVS-1 VSO haplotypes were assessed using an
Exact Test of Pairwise Population Differentiation (ETPD) with 10,000 Markov steps
(Raymond & Rousset 1995; Goudet et al. 1996). This test is analogous to a Fisher's
Exact test (Lee et al. 2004) but the size of the contingency table is extended to the number
of populations being compared (two in a pairwise population comparison, two or greater
in a global test) by the total number of different haplotypes present. Because of the
complexity introduced by the number of extra rows and columns, a null distribution
of tables against which to test the observed data is generated using a random walk via a
Markov chain rather than by comparison to a predefined distribution such as the
hypergeometric distribution.
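For reference, the unbiased estimator of genetic diversity is h = n/(n-1) * (1 - sum of squared haplotype frequencies), where n is the number of chromosomes sampled. The sketch below computes h only (not its standard error) from a list of haplotype counts; the function name and the example counts are hypothetical.

```python
def gene_diversity(haplotype_counts):
    """Unbiased gene diversity h for a sample described by haplotype counts."""
    n = sum(haplotype_counts)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    sum_squared_freqs = sum((count / n) ** 2 for count in haplotype_counts)
    return n / (n - 1) * (1.0 - sum_squared_freqs)


# Toy sample: four chromosomes carrying three distinct haplotypes.
print(round(gene_diversity([2, 1, 1]), 4))
```

A sample in which every chromosome carries the same haplotype gives h = 0, while a sample in which every chromosome is different gives h = 1.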
Population Genetic Structure was estimated using Hierarchical Analysis of Molecular
Variance (AMOVA) (Excoffier, Smouse & Quattro 1992) based on a particular mutation
model (which allowed the evolutionary distance between pairs of haplotypes to be taken
into account) to generate a single Fixation Index statistic, FST, when a simple structure of
populations within a single group was defined. Significance of Fixation Indices was
assessed by randomly permuting individuals (given that only haploid systems were
considered) among populations or groups of populations, depending on the Fixation Index
being tested. After each of the 10,000 rounds of permutation the Fixation Indices were
recalculated to create a null distribution. Population pairwise genetic
distances were estimated from Analysis of Molecular Variance φST values (Excoffier,
Smouse & Quattro 1992). The genetic distances used were a) FST (Reynolds, Weir &
Cockerham 1983) (when individuals in populations were described by UEP haplogroups)
and b) RST (Slatkin 1995) (when NRYs of a particular haplogroup were characterised by
the six microsatellites). Significance of genetic distances was assessed by permutation of
individuals as described above for testing significance of Fixation Indices. All the above
was performed using Arlequin software (Schneider, Roessli & Excoffier 2000). AMOVA
is analogous to a traditional analysis of variance (ANOVA) (Sokal & Rohlf 1994) except
that it takes into account the degree of difference between haplotypes. In addition all
hypotheses are tested using permutation analysis and so no assumption of a normal
distribution is required. However assumptions of AMOVA include that all samples are
independent and randomly chosen, that mate choice is random and that inbreeding does
not occur within the populations.
Principal Coordinates Analysis (PCO) (Gower 1966) was performed using the 'R'
statistical package (www.R-project.org) by implementing the 'cmdscale' function found
in the 'mva' package on pairwise FST matrices and visualised using MS Excel.
2.2.5. Dating of the Y*(xBR,A3b2) clade
Y-time software (Behar et al. 2003) (URL:
http://www.ucl.ac.uk/tcga/software/index.html) was used to estimate the TMRCA, as well
as its associated confidence intervals, of the Y*(xBR,A3b2) NRYs identified in the Nso´
under three schemes:
(Scheme A): the TMRCA for all duy sampled who possess Y*(xBR,A3b2);
(Scheme B): the TMRCA for all nshiylav and mtaar sampled who possess Y*(xBR,A3b2); and
(Scheme C): the TMRCA for all won nto´ and duy sampled who possess Y*(xBR,A3b2).
The analysis utilised six microsatellites, DYS19, DYS388, DYS390, DYS391, DYS392
and DYS393. Due to uncertainty in its mutation behaviour (it may not be mutating in a
consistent stepwise manner as it displays a bimodal distribution within haplogroup
P*(xR1a) (Thomas et al. 2000)) all analysis was also repeated without DYS388.
It should be noted that, as all samples collected were unrelated at the paternal grandfather
level, all TMRCA estimates were effectively those of the samples' paternal grandfathers.
Consequently, after all TMRCA point estimates and confidence intervals were calculated
they were increased by two generations (40 years) to allow for the sampling strategy
used in this study.
2.2.5.1. Y-time Parameters
The Y-time parameters used, with their corresponding Y-time code given in parenthesis,
are listed below:
Ancestral haplotype (anc) = '14 20 11 14 13' or '14 12 20 11 14 13'
Number of chromosomes (n) = various (see below)
Number of microsatellite loci (nloci) = 5 or 6
Mutation rate per generation under Simple Stepwise Mutation Model (mua) = 0.001925752 (Behar et al. 2003)
mua under Linear Length Dependent Stepwise Mutation Model (mua) = -0.004758677 (see Y-time user guide)
mub under Linear Length Dependent Stepwise Mutation Model (mub) = 4.46E-04 (see Y-time user guide)
Lower and upper boundary for equal-tailed 95% confidence limits (q) = 0.025-0.975
Upper boundary for one-tailed 95% confidence limits (q) = 0.05 or 0.95
Number of simulations to perform at each value of T (MCruns) = 1000
Population growth model (Rgrowth) = various (see below)
The ancestral haplotype was chosen based on its status as the modal haplotype in
Schemes A, B and C. This analysis does not take into account error in the choice of
ancestral haplotype.
2.2.5.2. Mutation Rate and Mutation Models
As there are limited data available for individual loci, the mutation rate used in this chapter
of 0.00193 is an average value of numerous pedigree-based estimates for tri- and
tetranucleotide NRY microsatellites (Heyer et al. 1997; Bianchi et al. 1998; Forster et al.
1998; Kayser et al. 2000) as utilised by Behar et al. (2003). It should be noted that more
refined estimates are
now available such as from Gusmao et al. (2005) but the average from this study is only
around 10% lower so should not greatly impact the results presented here, increasing any
date estimates by approximately 11%. While pedigree-based estimates tend to agree with
those estimated from sperm-based analysis (Holtkemper et al. 2001), those based on
unrelated population data (which involves counting the number of mutations in a
phylogenetic network for a population and calibrating against a known event in that
population) are almost 10-fold lower (Caglia et al. 1997; Forster et al. 2000), which
would increase any date estimates by 900%. While there is no clear consensus of what
methodologies of mutation rate estimation are most reliable, the number of studies
utilising pedigree-based analysis far outweigh that of the population-based approach and
there are also a number of assumptions applied by the population-based method (e.g. the
date used for calibration of mutation rate, only considering mutations that change by one
step), that may be leading to an underestimation of the mutation rate (Zhivotovsky et al.
2004).
Under a Simple Stepwise Mutation Model the mutation rate is independent of the number
of repeats and when a mutation occurs the repeat length will change by one repeat, with
an increase or decrease being equally likely (e.g. a locus with 12 repeats is equally likely to
mutate to 13 or 11 repeats). This concept can be extended to the Linear Length Dependent
Stepwise Mutation Model, a more realistic representation of the mutation process. Under
this model increases and decreases by one repeat size are again equally likely. However
the rate at which these mutations occur increases as a linear function of microsatellite
length (i.e. the greater the number of repeats, the more likely a mutation is to occur). This
mechanism is based on the principle that if mutations are occurring because of replication
slippage and can occur between any two adjacent repeat units with equal probability, the
more repeat units that are available the more likely a mutation will take place. The
changing mutation rate can be represented by the equation µ = a + bL, where µ is the
mutation rate, L is the repeat length at a particular time and a and b are constants.
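The two mutation models can be sketched as a single-generation update of a repeat count. The constants are the mua and mub values listed in section 2.2.5.1; the function names are hypothetical, and clipping the rate at zero for short alleles (where a + bL would be negative) is an assumption of this sketch rather than documented Y-time behaviour.

```python
import random

MUA_SSM = 0.001925752    # Simple Stepwise Mutation Model: constant rate
MUA_LLD = -0.004758677   # Linear Length Dependent model: intercept a
MUB_LLD = 4.46e-4        # Linear Length Dependent model: slope b


def mutation_rate(repeats, a, b=0.0):
    """Per-generation mutation rate mu = a + b*L, clipped at zero (assumption)."""
    return max(0.0, a + b * repeats)


def mutate(repeats, a, b=0.0, rng=random):
    """Advance one generation: with probability mu, step up or down by one repeat."""
    if rng.random() < mutation_rate(repeats, a, b):
        return repeats + rng.choice((-1, 1))
    return repeats


for length in (11, 14, 20):
    print(length, mutation_rate(length, MUA_SSM), mutation_rate(length, MUA_LLD, MUB_LLD))
```

Setting b = 0 recovers the Simple Stepwise Mutation Model, in which the rate is the same for every repeat length.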
2.2.5.3. Population Growth Models
Various population growth models (Rgrowth) were tested for Schemes A, B and C.
Below is a description of the rationale for selecting the various growth models.
2.2.5.3.1. Star Genealogy
When a simple microsatellite network of all Y*(xBR, A3b2) Y-chromosomes in the Nso′
is drawn, the network strongly resembles a Star genealogy. However, the likely
genealogies of the samples used in Schemes A, B and C will probably be highly
correlated, resulting in an underestimation of confidence intervals.
2.2.5.3.2. Rgrowth=0
This setting results in an assumed genealogy of constant size, which, except in the case of
an extreme bottleneck, should take into account any uncertainty in the genealogy with
respect to the level of tree correlation. A consequence of this is that confidence intervals
are likely to be overestimated so this approach is conservative.
2.2.5.3.3. Other Rgrowth
When not assigned 'STAR' or '0', Rgrowth is determined by two other independent
parameters, N and r, which are related to it by the following equation:
Rgrowth = N * r
where N = the current effective population size and r = instantaneous growth rate (see
footnote 11).
Separate values for N and r were considered (including r=0.05-see below) for Schemes A,
B and C respectively and are discussed below.
r = 0.05
A value of r=0.05 was adopted as a rough lower boundary for the growth rate in a
sub-Saharan African population, having regard to calculations of population sizes in
sub-Saharan Africa during the period 400BC-1970AD (see Table 2.1.1 from
Cavalli-Sforza, Menozzi & Piazza 1994).
Scheme A:
N=682
An effective population size (N) for Nso′ duy individuals with a Y*(xBR, A3b2) NRY
was estimated on the basis that (a) all males with Y*(xBR, A3b2) Y-chromosomes in the
Nso´ duy are paternal line descendants of the first fon, and no males with other Y-
chromosomes are descendants of the first fon, (b) there are 200,000 Nso′ (according to the
latest census (Second general census of population and housing of Cameroon. Volume
3:preliminary analysis 1987)) and half of the Nso′ are male, (c) 51/132 of Nso′ males are
duy (estimated from Nso′ DNA sample survey), (d) effective population size is typically
taken as 1/10th of the census population size, (e) samples were collected randomly from
members of the four classes and (f) nine chromosomes out of 51 duy were Y*(xBR,
A3b2). Therefore the effective population size (N) was calculated as:
N = (200,000 * 0.5 * 9) / (132 * 10) = 682
Footnote 11: The instantaneous growth rate assumes overlapping generations and a constant breeding period.
r = 0.252 and 0.706
Two other estimates for the instantaneous growth rate were calculated using the
continuous population growth model based on features of the oral history:
lnx = lnx0 + rt
where x = current actual population size, x0 = initial actual population size, t = time in
generations and r = instantaneous growth rate.
As the interest here is in the TMRCA from the first fon, x0 = 1, while x = 200,000 * 0.5 *
(9/132) (as above) = 6818. Two different estimates of r were calculated using upper and
lower boundaries for the date of origin taken from alternative accounts of the oral
tradition (Mzeka 1990).
The earliest time of origin of the Nso´ suggested by oral history is 700 years ago, or
35 generations at 20 years per generation. As a longer period implies slower growth, this
gives the lower boundary for the instantaneous growth rate:
r(lower) = (ln(6818) - ln(1))/35 = 0.252
The most recent time of origin of the Nso´ suggested by oral history is 250 years ago, or
12.5 generations at 20 years per generation. This gives the upper boundary for the
instantaneous growth rate:
r(upper) = (ln(6818) - ln(1))/12.5 = 0.706
Scheme B:
N=833
An effective population size (N) for Nso′ nshiylav and mtaar individuals with a Y*(xBR,
A3b2) NRY was estimated on the basis that (a) all males with Y*(xBR, A3b2) Y-
chromosomes in the Nso´ nshiylav and mtaar are paternal line descendants of the Visale,
and no males with other Y-chromosomes are descendants of the Visale, (b) there are
200,000 Nso′ (according to the latest census (Second general census of population and
housing of Cameroon. Volume 3:preliminary analysis 1987)) and half of the Nso′ are
male, (c) 63/132 of Nso′ males are either nshiylav or mtaar (estimated from the Nso′
DNA sample survey), (d) effective population size is typically taken as 1/10th of the
census population size, (e) samples were collected randomly from members of the four
classes and (f) eleven chromosomes out of 63 nshiylav and mtaar were Y*(xBR, A3b2).
Therefore the effective population size (N) was calculated as:
N = (200,000 * 0.5 * 11) / (132 * 10) = 833
r = 0.161 and 0.450
Two other estimates for the instantaneous growth rate were calculated using the
continuous population growth model based on features of the oral history:
lnx = lnx0 + rt
where x = current actual population size, x0 = initial actual population size, t = time in
generations and r = instantaneous growth rate.
According to oral tradition, there were 30 Visale males when Princess Ngonnso′
encountered the Visale (x0 = 30), while x = 200,000 * 0.5 * (11/132) (as above) = 8333.
Two different estimates of r were calculated using upper and lower boundaries for the
date of origin taken from alternative accounts of the oral tradition (Mzeka 1990).
The earliest time of origin of the Nso´ suggested by oral history is 700 years ago, or
35 generations at 20 years per generation. As a longer period implies slower growth, this
gives the lower boundary for the instantaneous growth rate:
r(lower) = (ln(8333) - ln(30))/35 = 0.161
The most recent time of origin of the Nso´ suggested by oral history is 250 years ago, or
12.5 generations at 20 years per generation. This gives the upper boundary for the
instantaneous growth rate:
r(upper) = (ln(8333) - ln(30))/12.5 = 0.450
Scheme C:
N=1439
An effective population size (N) for Nso′ won nto´ and duy individuals with a Y*(xBR,
A3b2) NRY was estimated on the basis that (a) all males with Y*(xBR, A3b2) Y-
chromosomes in the Nso´ won nto´ and duy are paternal line descendants of the first fon,
and no males with other Y-chromosomes are descendants of the first fon, (b) there are
200,000 Nso′ (according to the latest census (Second general census of population and
housing of Cameroon. Volume 3:preliminary analysis 1987)) and half of the Nso′ are
male, (c) 69/132 of Nso′ males are either won nto´ or duy (estimated from the Nso′ DNA
sample survey), (d) effective population size is typically taken as 1/10th of the census
population size, (e) samples were collected randomly from members of the four classes
and (f) 19 chromosomes out of 69 won nto´ and duy were Y*(xBR, A3b2). Therefore the
effective population size (N) was calculated as:
N = (200,000 * 0.5 * 19) / (132 * 10) = 1439
r = 0.274 and 0.766
Two other estimates for the instantaneous growth rate were calculated using the
continuous population growth model based on features of the oral history:
lnx = lnx0 + rt
where x = current actual population size, x0 = initial actual population size, t = time in
generations and r = instantaneous growth rate
As the interest here is in the TMRCA from the first fon, x0 = 1, while x = 200,000 * 0.5 *
(19/132) (as above) = 14394. Two different estimates of r were calculated using upper
and lower boundaries for the date of origin taken from alternative accounts of the oral
tradition (Mzeka 1990).
The earliest time of origin of the Nso´ suggested by oral history is 700 years ago, or
35 generations at 20 years per generation. As a longer period implies slower growth, this
gives the lower boundary for the instantaneous growth rate:
r(lower) = (ln(14394) - ln(1))/35 = 0.274
The most recent time of origin of the Nso´ suggested by oral history is 250 years ago, or
12.5 generations at 20 years per generation. This gives the upper boundary for the
instantaneous growth rate:
r(upper) = (ln(14394) - ln(1))/12.5 = 0.766
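The arithmetic for Schemes A, B and C follows one pattern, so it can be rechecked with a short sketch. The inputs are the figures stated above (200,000 Nso′, half male, 132 males sampled, effective size taken as one tenth of census size, 20 years per generation); the function and variable names are hypothetical. The sketch carries unrounded values of x through the logarithm, which agrees with the figures in the text to the precision quoted.

```python
import math

NSO_CENSUS = 200_000      # census figure quoted in the text
MALE_FRACTION = 0.5
SAMPLED_MALES = 132       # size of the Nso' DNA sample survey
NE_FRACTION = 0.1         # effective size taken as 1/10th of census size
GENERATION_YEARS = 20


def scheme_parameters(carriers, founders, oldest_years=700, youngest_years=250):
    """Return (N, r_lower, r_upper) for a scheme.

    carriers: sampled Y*(xBR,A3b2) chromosomes in the scheme's classes
    founders: assumed initial number of males (x0)
    """
    x = NSO_CENSUS * MALE_FRACTION * carriers / SAMPLED_MALES  # current actual size
    n_eff = x * NE_FRACTION

    def rate(years):
        generations = years / GENERATION_YEARS
        return (math.log(x) - math.log(founders)) / generations

    return n_eff, rate(oldest_years), rate(youngest_years)


for name, carriers, founders in [("A", 9, 1), ("B", 11, 30), ("C", 19, 1)]:
    n_eff, r_lower, r_upper = scheme_parameters(carriers, founders)
    print(f"Scheme {name}: N={n_eff:.0f}, r(lower)={r_lower:.3f}, r(upper)={r_upper:.3f}")
```

The older date of origin gives the smaller growth rate because the same increase in population size is spread over more generations.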
2.2.6. Comparison of duy vs nshiylav and mtaar genealogy depths
In order to establish whether the nshiylav and mtaar Y*(xBR,A3b2) genealogy was
deeper than that of the duy, the probability of obtaining a difference in Average Squared
Distance (ASD) equal to or more extreme than that observed between a) duy and b) nshiylav
and mtaar was estimated, assuming the two groups were drawn from the same genealogy.
This methodology is described below.
The duy and nshiylav and mtaar Y*(xBR,A3b2) NRYs were grouped together (n=20) and
the Average Squared Distance for these samples calculated (ASD=0.0667 (0.06 without
using DYS388)). Trees were then simulated under this ASD value and the two mutation
and four demographic criteria described below.
For each simulated tree the 20 samples at the tips of the tree were randomly assigned to
either a group of final size n=9 (representing the duy) or a group of final size n=11
(representing the nshiylav and mtaar). The ASD was then calculated for each group. If the
ASD of the group with n=9 was equal to 0.0 (the ASD of the original duy) the pair of
ASD results were recorded. If the ASD of the group with n=9 was greater than 0.0 the
results were discarded. This process was repeated until 10,000 pairs of ASD values were
recorded where the group with n=9 had an ASD of 0.0.
A P-value was estimated as the proportion of recorded pairs of ASD values in which the
difference between the two values was equal to or greater than 0.1212 (the ASD of the
original nshiylav and mtaar (0.1091 without using DYS388)), with P<0.05 taken as the
level of significance.
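The conditioning procedure described above can be sketched as follows. The ASD is taken here as the squared difference in repeat number from the ancestral haplotype, averaged over chromosomes and loci, which is an assumption about the definition used by Y-time. The tree simulator itself is not reproduced: `simulate_tips` is a hypothetical stand-in that returns the tip haplotypes of one simulated tree, and `max_trees` is a safety limit added for this sketch.

```python
import random


def asd(haplotypes, ancestral):
    """Average squared distance from the ancestral haplotype, in repeat units,
    averaged over all chromosomes and loci."""
    total = sum(
        (allele - anc) ** 2
        for hap in haplotypes
        for allele, anc in zip(hap, ancestral)
    )
    return total / (len(haplotypes) * len(ancestral))


def conditional_p_value(simulate_tips, ancestral, observed_diff,
                        n_small=9, n_pairs=10_000, max_trees=1_000_000, seed=None):
    """Proportion of recorded ASD pairs at least as extreme as `observed_diff`.

    `simulate_tips(rng)` is a hypothetical callable returning the list of tip
    haplotypes for one simulated tree. If `max_trees` is reached first, the
    proportion is based on however many pairs were recorded.
    """
    rng = random.Random(seed)
    recorded = extreme = 0
    for _ in range(max_trees):
        tips = list(simulate_tips(rng))
        rng.shuffle(tips)                          # random assignment to the two groups
        small, large = tips[:n_small], tips[n_small:]
        asd_small = asd(small, ancestral)
        if asd_small != 0.0:
            continue                               # discard: smaller group must have ASD 0
        recorded += 1
        if asd(large, ancestral) - asd_small >= observed_diff:
            extreme += 1
        if recorded == n_pairs:
            break
    if recorded == 0:
        raise RuntimeError("no simulated tree met the conditioning criterion")
    return extreme / recorded
```

With the figures above, the call would use `observed_diff=0.1212` (or 0.1091 without DYS388), with 20 tips per tree split into groups of 9 and 11.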
This analysis was performed for four demographic models and two mutation models, a
Simple Stepwise Mutation Model and a Linear Length Dependent Stepwise Mutation
Model.
The four demographic models used were:
a) 'Star'
b) Rgrowth=0
c) Rgrowth=754.5041
d) Rgrowth=10,000,000
a) and b) were used as these are the two most extreme demographic scenarios
possible.
c) was used as this is a more realistic demographic model and was calculated in a similar
manner to the parameters described in section 2.2.5. Here N=1515 and r=0.498.
d) was used as it was an unfeasibly large growth model that was still not as extreme as a
'Star' demography.
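The value used for model c) can be reproduced from Rgrowth = N * r. The inputs below are an inference rather than something stated in the text: they assume the 20 combined Y*(xBR,A3b2) chromosomes out of 132 sampled males, an initial population of 30 males (as in Scheme B) and the 250-year (12.5 generation) date of origin. These assumptions are adopted here only because they recover the quoted N and r.

```python
import math

# Assumed inputs (inferred, not stated in the text): 20 carriers of 132 sampled
# males, x0 = 30 founders and 12.5 generations since the origin.
x = 200_000 * 0.5 * 20 / 132                  # current actual population size
n_eff = x / 10                                # effective size, 1/10th of census size
r = (math.log(x) - math.log(30)) / 12.5       # instantaneous growth rate
rgrowth_c = n_eff * r                         # Rgrowth = N * r

print(round(n_eff), round(r, 3), round(rgrowth_c, 2))
```

Using the rounded values N=1515 and r=0.498 gives a slightly different product, so the stated Rgrowth appears to have been computed from unrounded quantities.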
All genealogy comparisons were performed using adapted Y-time routines recoded in
Python (Code available on request from Krishna Veeramah).
2.3. Results and Discussion
2.3.1. The NRY and mtDNA distribution in the Nso΄
The modal NRY haplogroup in the won nto´ was Y*(xBR,A3b2) with a frequency of
55.6% (See Table 2.1). This haplogroup was also found at a frequency of 17.6% in the
duy. Furthermore, all of these Y*(xBR,A3b2) chromosomes had the same microsatellite
haplotype (14-12-20-11-14-14) (see Supplementary Table 2S.1 for all relevant NRY
data). For convenience only this NRY type and the associated microsatellite haplotype is
referred to as the won nto´ Modal haplotype (WMH). The modal NRY haplogroup in the
non-won nto´ social classes was E3a with a diverse range of NRY types at the
microsatellite haplotype level (h= 0.94 ± 0.01). Y*(xBR,A3b2) NRYs were found in the
other non-royal social classes but these included microsatellite haplotypes that were 1-3
mutation steps different from the WMH, suggesting that they had originated in the won
nto´ or paternal ancestors of a founder of the won nto´ some time ago and subsequently
diverged from the WMH. This accords with Nso´ rules of class inheritance.
Table 2.1: Distribution of NRY haplogroups (NRY at UEP level) in the four
Nso´ social classes.
Assigned NRY   Sample Cultural Identity
haplogroup     won nto´ (n=18)  duy (n=51)       mtaar (n=21)     nshiylav (n=42)  Total (n=132)
P*(xR1a)       0 (0.000)        1 (0.020)        0 (0.000)        0 (0.000)        1 (0.008)
BR*(xDE,JR)    2 (0.111)        0 (0.000)        0 (0.000)        1 (0.024)        3 (0.023)
E*(xE3a)       0 (0.000)        2 (0.039)        0 (0.000)        1 (0.024)        3 (0.023)
Y*(xBR,A3b2)   10 (0.556)       9 (0.176)        3 (0.143)        8 (0.190)        30 (0.227)
E3a            6 (0.333)        39 (0.765)       18 (0.857)       32 (0.762)       95 (0.720)
Note. Figures indicate the number of NRY characterised while relative frequencies are
shown in brackets. Haplogroup nomenclature is that proposed by the Y-chromosome
Consortium (2002).
In regard to expectations inferred from the Nso´'s declared social practices the frequency
(at approximately 56%) and extreme homogeneity of Y*(xBR,A3b2) observed in the won
nto´ made the WMH the likely candidate to be the NRY type possessed by Nso´ fons and
the knowledge that a high status man generally considered to be a paternal descendant of
a recent fon possessed the WMH confirmed this. A Pearson's chi-square goodness of fit
test was performed to test the deviation of the observed WMH frequency in the won nto´
from the expectation of the proportion of individuals who possess the fon's NRY type
from both Royal Social Status Rule A and Royal Social Status Rule B respectively. That
Royal Social Status Rule A has been followed could not be rejected at the 1% level (and
was only barely rejected at the 5% level, and then only against the lower limit)
(Chi-square test against upper limit of expected frequency of 46.7%: P = 0.45,
X2 = 0.567, df = 1, and lower limit of 33.4%: P = 0.046, X2 = 3.97, df = 1) in contrast
to Royal Social Status Rule B for which non-compliance with the rule is
statistically significant (Chi-square test against upper limit of expected frequency of
24.1%: P = 0.001, X2 = 9.73, df = 1 and lower limit of 1.0%: P < 0.0001, X2 = 541.15,
df = 1). While these tests are dependent on twin assumptions of random sampling and equal
reproductive success of non-fon won nto´ males, both of which may not hold exactly, the
size of the difference in P-values is strongly indicative of a real effect. This support for
Royal Social Status Rule A is notable given that the WMH types appear in non-won nto´
males at a low frequency and therefore fon NRY types could enter the won nto´ through
non-won nto´ males resulting in the prior expectation being an underestimate. These data
support male line continuity of Nso´ fons up to at least the fourth generation and the
WMH can thus be considered a likely candidate for the NRY type passed down from the
first fon of Nso´. Also, there was insufficient statistical support to reject the hypothesis
that the frequency of Y*(xBR,A3b2) found in the duy is in accordance with expectations
based on declared social practices (Chi-square test against expected frequency of 12.5%:
P = 0.26, X2=1.23, df=1). There was no statistical difference in the frequency of mtDNA
types (see Supplementary Table 2S.2) in combined or pairwise comparisons among the
four Nso´ classes (Global ETPD P-value=0.82±0.05, pairwise ETPD P-values > 0.25).
Therefore the pattern of both NRY and mtDNA variation in the Nso´ was in concordance
with expectations based on Royal Social Status Rule A.
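The goodness of fit statistics quoted above can be rechecked with a short sketch. It assumes a two-category test (carriers versus non-carriers of the WMH, df = 1) using the observed count of 10 out of 18 won nto´ from Table 2.1, and it obtains the P-value from the identity P = erfc(sqrt(X2/2)), which holds for one degree of freedom. The function name is hypothetical; the analysis in this chapter was run in R.

```python
import math


def chi_square_gof(observed, n, expected_freq):
    """Two-category chi-square goodness of fit test (df = 1).

    observed: number of individuals in the category of interest
    n: sample size
    expected_freq: expected proportion in that category
    """
    expected_yes = n * expected_freq
    expected_no = n * (1.0 - expected_freq)
    chi2 = ((observed - expected_yes) ** 2 / expected_yes
            + ((n - observed) - expected_no) ** 2 / expected_no)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # upper tail for df = 1
    return chi2, p_value


# Rule A, upper and lower limits of the expected frequency.
for expected in (0.467, 0.334):
    chi2, p = chi_square_gof(10, 18, expected)
    print(f"expected={expected:.3f}: X2={chi2:.3f}, P={p:.3f}")
```

The same function applied to the duy (9 of 51 against an expected 12.5%) gives a statistic close to the X2 = 1.23 reported above.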
2.3.2. Association of the Y*(xBR,A3b2) lineage with the indigenous hunter-gatherer
Visale
The merits of the two versions of the oral history were then examined. This first required
the investigation of the likely origins of Y*(xBR,A3b2). To see whether it is credible that
Y*(xBR,A3b2) is one of the NRY lineages replaced by the EBSP as proposed by
Underhill et al. (2001), samples were analysed from the Cameroon Grassfields, including
the Nso´, (total n=1213) alongside unreported data held in the TCGA database consisting of
sample sets collected from across sub-Saharan Africa, including from the region of the
EBSP (n=8072). The frequencies of E3a and Y*(xBR,A3b2) NRYs (again characterised
by a battery of twelve UEPs and six microsatellites) were compared. Consistent with the
suggestion of Underhill et al. (2001), E3a was the most common haplogroup within each
population (lowest population frequency=46.3%, mean=80.2%, standard
deviation=14.9%), except in Ethiopia, Sudan and the Lake Chad region of northern
Cameroon where the EBSP is not believed to have had a major impact, while the
frequency of Y*(xBR,A3b2) never exceeded 14%. In eight of the populations examined
(northern Cameroon, north eastern Ghana, Mozambique, western Senegal, Sudan,
Tanzania, Uganda and Zimbabwe) Y*(xBR,A3b2) was not represented. Y*(xBR,A3b2)
was represented, however, in eleven other, widely distributed, populations (southern
Cameroon*, Grassfields of Cameroon*, Ethiopia*, north western Ghana, south eastern
Ghana, south western Ghana*, central Malawi, northern Malawi, Pretoria-South Africa,
Cross River region-Nigeria and southern Senegal*). In the five populations in which the
Y*(xBR,A3b2) count was greater than 10 (indicated by an asterisk in the list above) the
modal haplogroup E3a had an among-group variance, assessed using AMOVA
(Excoffier, Smouse & Quattro 1992; Michalakis & Excoffier 1996), of 1.97%, a low
figure (relative to other haplogroups) which is consistent with either a recent common
origin or high inter-group gene flow and low effective population size. The putative
replaced Y*(xBR,A3b2), on the other hand, had a high among-group variance of 87.31%,
which is consistent with inter-group isolation (see footnote 12). In previous publications
Y*(xBR,A3b2) (or haplogroups of relative equivalence) has been reported at 20-45%
(Hammer et al. 1998; Scozzari et al. 1999; Underhill et al. 2001) in Khoisan groups, which
have origins that pre-date the EBSP. This distribution therefore suggests that
Y*(xBR,A3b2) could be common in hunter-gatherer populations that predate the EBSP.
To establish that the WMH was not common in other groups inhabiting the Grassfields or
the land to the east, including the Tikar plain, the NRY of males from 10 other
neighbouring ethnic groups (n=780) (Table 2.2) were analysed. Only one self-declared
non-Nso´ had a Y*(xBR,A3b2) chromosome and this individual was born in Kumbo, the
Nso´ capital.
A PCO plot (Figure 2.5) based on a pairwise FST distance matrix calculated using NRY
haplogroup frequencies clearly distanced the won nto´ from both the other Nso´ social
classes and the other ethnic groups, demonstrating that high frequencies of
Y*(xBR,A3b2) are not typical of the Grassfields and Tikar plain NRY profiles.
Accordingly, as Y*(xBR,A3b2) is typical of a hunter-gatherer population and WMH is the
most likely candidate to be the NRY type of the father of the first fon of Nso´, the NRY
data favour the oral tradition of the Princess marrying an indigenous Visale from which
all subsequent fons descend.
Footnote 12: It should be noted that the AMOVA among-group variance is used following the approach of Di
Giacomo et al. (2004) as a convenient statistic for comparing the distribution of haplotypes within a single
haplogroup where the haplogroup is present in multiple ethnic groups (in this case haplogroups E3a and
Y*(xBR,A3b2)). In doing so E3a and Y*(xBR,A3b2) are treated as separate haploid populations from
which samples have been selected at random. No inferences are drawn other than that low among-group
variance is consistent with a recent common origin or gene flow between the members of the haplogroup
and high among-group variance is consistent with isolation of the separate collections of representatives of
the haplogroup.
Table 2.2: Distribution of NRY haplogroups in the peoples of the western Grassfields and Tikar plain.
Assigned NRY   Cultural identity (a)
haplogroup     A (n=99)     B (n=66)     BT (n=30)    Bl (n=20)    Bm (n=152)   K (n=75)     M (n=154)    T (n=81)     W (n=56)     Y (n=47)     Total (n=780)
P*(xR1a)       0 (0.000)    0 (0.000)    0 (0.000)    0 (0.000)    0 (0.000)    1 (0.013)    3 (0.019)    1 (0.012)    0 (0.000)    0 (0.000)    5 (0.006)
BR*(xDE,JR)    5 (0.051)    0 (0.000)    0 (0.000)    0 (0.000)    10 (0.066)   1 (0.013)    2 (0.013)    8 (0.099)    2 (0.036)    1 (0.021)    29 (0.037)
E*(xE3a)       0 (0.000)    0 (0.000)    0 (0.000)    0 (0.000)    6 (0.039)    1 (0.013)    7 (0.045)    2 (0.025)    0 (0.000)    1 (0.021)    17 (0.022)
A3b2           0 (0.000)    0 (0.000)    0 (0.000)    0 (0.000)    0 (0.000)    0 (0.000)    8 (0.052)    0 (0.000)    0 (0.000)    0 (0.000)    8 (0.010)
Y*(xBR,A3b2)   0 (0.000)    0 (0.000)    0 (0.000)    0 (0.000)    0 (0.000)    0 (0.000)    0 (0.000)    0 (0.000)    1 (0.018)    0 (0.000)    1 (0.001)
E3a            94 (0.949)   66 (1.000)   30 (1.000)   20 (1.000)   136 (0.895)  72 (0.960)   134 (0.870)  70 (0.864)   53 (0.946)   45 (0.957)   720 (0.923)
NOTE.-Figures indicate the number of NRY characterised while relative frequencies are shown in brackets. Haplogroup nomenclature is that
proposed by the Y-chromosome Consortium (2002). (a) A = Aghem speakers, located in Wum; B = Bafut speakers, located in Bafut and did not
declare a Tikar ethnic identity; BT = Bafut speakers, located in Bafut and declared a Tikar ethnic identity; Bl = Bamileke speakers, located
throughout the western Grassfields and Tikar plain after, it is claimed, being displaced from their homeland of Mbam living on the Tikar plain;
Bm = Bamun speakers, located in Foumban; K = Kwandja speakers, located in towns in the north-eastern region of the Tikar plain such as
Nyamboya; M = Mambila speakers located in towns near the Nigerian border on the Tikar plain, such as Atta, Somie and Songkolong as well as
Mayo Darle; T = Tikar speakers, located in towns on the Tikar plain such as Magba, Sabongari and Bankim; W = Wimbum speakers, located in
Nkambe; Y = Yamba speakers, located in towns throughout the Tikar plain, such as Sabongari, Magba, Bankim, Somie, Songkolong and Atta as
well as Mayo Darle.
Figure 2.5: PCO plot of UEP-based population pairwise FST values. The PCO plot is constructed using pairwise genetic
distances, FST, between the four Nso´ classes (labelled by name) and other populations of the western Grassfields and
Tikar Plain (labelled using abbreviations as defined in Table 2.2). PCO1 and PCO2 explain 97.91% and 1.92% of the
variation respectively.
2.3.3. Dating of the Y*(xBR,A3b2) lineage in the Nso´
Given the close match between the distribution of NRY types and one of the two main
versions of the foundation story of the ruling dynasty and the potentially large number of
offspring likely to be descended from the founder of the line, the time to the most recent
common ancestor (TMRCA) of the randomly collected Y*(xBR,A3b2) NRYs observed
in different social classes was estimated to investigate specific aspects of Nso´ history.
Oral history suggests a time since the first fon of some 250-700 years before present
(Mzeka 1990).
Considering the rules of social class inheritance and the previous assertion that the first
ever fon of Nso´ carried a WMH it was reasonable to postulate that all sampled duy with
a Y*(xBR,A3b2) chromosome were male line descendants of the WMH carrying first
fon. Consequently, in order to see how similar an estimate of the TMRCA for all duy with
a Y*(xBR,A3b2) chromosome was to the period suggested by oral history, the method of
Behar et al. (2003) was applied using both a Simple Stepwise Mutation Model and a
Linear Length Dependent Mutation Model as well as utilising a variety of demographic
models to compute associated confidence intervals (see Methods and Materials Section
2.2.5. for a full explanation). As all individuals analysed had the same microsatellite
haplotype (the WMH) the actual point estimate of the TMRCA was non-informative
(Average Squared Distance (ASD) = 0.0 with and without DYS388) but the upper limit of
the 95% one-tailed confidence interval (CI) under realistic demographic models was 1035
years (assuming an intergeneration time of 20 years^13) (1112 years without using
DYS388) (see Supplementary Table 2S.7 for associated confidence intervals). Therefore
these data are consistent with the oral history that suggests that the founding of the Nso´
Royal family was a recent event that occurred within the last 1000 years. Even under a
more conservative demographic model of constant population size, which is known not to
be the case for the general population in recent centuries and is unlikely for a paternally
inherited genetic system possessed by an agnatically defined elite social group practicing
male polygamy, the upper limit was 1497 years (1771 years without using DYS388). This
analysis was also repeated with the addition of the Y*(xBR,A3b2) NRYs found in
the won nto´, who also all descend from the first fon, but the results are not reported here
(though they can be found in Supplementary Table 2S.7). This was because the won
nto´ may enjoy a reproductive advantage due to their elevated social position (at least so
far as a reigning fon is concerned), which is likely to inflate the contribution made by
their recent shared ancestry to the TMRCA calculation, and thus adversely affect the
confidence intervals for the TMRCA estimates on these samples (in this case by reducing
them).
^13 An intergeneration time of 20 years was applied after consultation with David Zeitlyn, who
specialises in working within the Grassfields region.
Analysis of the Y*(xBR,A3b2) NRYs in the nshiylav and the mtaar gave an ASD of
0.1212 (0.1091 without DYS388). This gave a TMRCA point estimate of 1299 years
[64.94 generations] (1173 years without using DYS388 [58.65 generations]) under the
Simple Stepwise Mutation Model and 1672 years [83.606 generations] (1351 years
without using DYS388 [67.57 generations]) under the Linear Length Dependent Stepwise
Mutation Model (combined two-tailed 95 % CIs for both estimates: 176-7119 years
(116-6293 years without using DYS388)). The upper limit of the CI for the nshiylav and
the mtaar TMRCA estimate is much older than that for the duy. However, as both CIs
overlap it was not possible, from this analysis alone, to distinguish between the depths of
the genealogies of these two groups.
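The ASD statistic used in these comparisons can be illustrated with a minimal sketch. It assumes the usual definition of ASD as the squared difference in repeat number from the ancestral haplotype, averaged over loci and chromosomes. The function name, the six-locus layout and the repeat counts below are hypothetical and are not the survey data.

```python
def average_squared_distance(haplotypes, ancestral):
    """ASD between sampled haplotypes and an assumed ancestral haplotype.

    Each haplotype is a sequence of microsatellite repeat counts, one per locus.
    """
    n_loci = len(ancestral)
    total = 0
    for hap in haplotypes:
        if len(hap) != n_loci:
            raise ValueError("haplotype and ancestral haplotype differ in length")
        total += sum((allele - anc) ** 2 for allele, anc in zip(hap, ancestral))
    return total / (len(haplotypes) * n_loci)


# Hypothetical six-locus modal haplotype (illustrative repeat counts only).
modal = (15, 12, 21, 10, 11, 13)

# If every chromosome carries the modal haplotype the ASD is zero, which is
# why a point estimate of the TMRCA is uninformative in that situation.
identical_sample = [modal] * 9
asd_identical = average_squared_distance(identical_sample, modal)

# Hypothetical sample of 11 chromosomes, three of which differ from the modal
# haplotype by a single one-step mutation at one locus.
one_step = (15, 12, 22, 10, 11, 13)
mixed_sample = [modal] * 8 + [one_step] * 3
asd_mixed = average_squared_distance(mixed_sample, modal)
```

In this hypothetical case three squared steps are spread over 11 chromosomes and six loci, so `asd_mixed` equals 3/66.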
Given that the ancestral haplotype predicted for a) the duy Y*(xBR,A3b2) NRYs, and b)
the nshiylav and mtaar Y*(xBR,A3b2) NRYs is identical (the WMH), if the depths of the
genealogies of the two groups, a) and b), were the same, it is likely that they would share
the same MRCA at the root of a common genealogy. As a consequence the expectation
would be that the Average Square Distance (ASD) of a) the duy Y*(xBR,A3b2) NRYs and
that of b) the nshiylav and mtaar Y*(xBR,A3b2) NRYs combined would be similar. If,
however, the nshiylav and mtaar had an older genealogy than did the duy then the
expectation would be that the ASD of the nshiylav and mtaar combined would be greater
than that of the duy. Numerous
genealogies of final generation size n = 20 (representing the total number of duy (n=9),
nshiylav and mtaar (n=11) Y*(xBR,A3b2) NRYs) were simulated under the ASD value
of all duy, nshiylav and mtaar Y*(xBR,A3b2) NRYs combined (0.067 with DYS388,
0.060 without DYS388) under various demographic and mutation models. Individuals
were then randomly assorted at the tips of the simulated trees to one of two groups of size
n=9 (representing the original duy) and n=11 (representing the original nshiylav and
mtaar) and the ASD of the two groups calculated to estimate the probability of observing
a duy / nshiylav and mtaar ASD difference equal to or more extreme than that calculated
in the survey if the two groups, a) and b), shared a common MRCA (see Methods and
Materials Section 2.2.6. for full explanation). The survey-based data were for the duy
(ASD = 0.0) and for the nshiylav and mtaar (ASD with DYS388 = 0.1212, ASD without
DYS388 = 0.1091). Under all demographic and mutation models significantly low P-
values (P<0.05) were obtained, except under a star genealogy where the P-value was
approaching significance (P=0.093) (Table 2.3). As a star genealogy is not a particularly
realistic demographic model in this case it is reasonable to reject the hypothesis that a)
and b) share the same MRCA and therefore assert that the nshiylav and the mtaar have a
significantly older genealogy than the duy.
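The logic of the comparison can be sketched for the star genealogy case only. This is a much-simplified illustration and not the simulation procedure described in the Methods and Materials: it assumes independent lineages, symmetric single-step mutations, a Poisson number of mutations per locus whose mean equals the combined ASD (0.067), and six loci. All function names and the number of loci are assumptions made for the example.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate by Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_offsets(rng, n_chrom, n_loci, expected_asd):
    """Repeat-count offsets from the ancestor for each tip of a star genealogy."""
    sample = []
    for _ in range(n_chrom):
        hap = []
        for _ in range(n_loci):
            steps = poisson(rng, expected_asd)  # mutations on this lineage
            hap.append(sum(rng.choice((-1, 1)) for _ in range(steps)))
        sample.append(hap)
    return sample


def asd(sample):
    """Average squared distance from the ancestral state (offset 0)."""
    return sum(x * x for hap in sample for x in hap) / (len(sample) * len(sample[0]))


def star_p_value(observed_diff, n_a=9, n_b=11, n_loci=6,
                 expected_asd=0.067, n_sims=5000, seed=1):
    """Proportion of simulations in which ASD(b) - ASD(a) >= observed_diff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = simulate_offsets(rng, n_a + n_b, n_loci, expected_asd)
        rng.shuffle(sample)  # random assignment of tips to the two groups
        diff = asd(sample[n_a:]) - asd(sample[:n_a])
        if diff >= observed_diff:
            hits += 1
    return hits / n_sims


# Observed difference with DYS388: 0.1212 (nshiylav and mtaar) minus 0.0 (duy).
p_star = star_p_value(observed_diff=0.1212 - 0.0)
```

Because the sketch does not condition on the observed combined ASD or use the demographic models of the study, its output should not be expected to match the values in Table 2.3; it only shows how a one-sided P-value is obtained from repeated random assignment of tips to groups of 9 and 11.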
Table 2.3: Comparison of the depth of two genealogies. The probability of
observing results equal to or more extreme than the difference between the
Average Square Distance values of a) the duy and b) the nshiylav and mtaar
combined. (Three independent simulation runs for each set of criteria.)

                                    P-value           P-value minus DYS388
Demographic model     Run        SSM      L-SMM         SSM      L-SMM
Star genealogy        Run 1      0.090    NA            0.061    NA
                      Run 2      0.091    NA            0.058    NA
                      Run 3      0.093    NA            0.062    NA
Rgrowth=0             Run 1      0.003    0.003         0.002    0.002
                      Run 2      0.003    0.004         0.003    0.003
                      Run 3      0.003    0.003         0.002    0.003
Rgrowth=894.21        Run 1      0.018    0.021         0.016    0.017
                      Run 2      0.019    0.023         0.016    0.017
                      Run 3      0.017    0.022         0.016    0.017
Rgrowth=10,000,000    Run 1      0.047    0.052         0.032    0.038
                      Run 2      0.044    0.054         0.034    0.036
                      Run 3      0.043    0.049         0.029    0.034

NOTE.-SSM = Simple Stepwise Mutation Model. L-SMM = Linear Length Dependent
Stepwise Mutation Model. NA = not available.
This finding suggests that the Y*(xBR,A3b2) NRYs in the nshiylav and the mtaar
descend not just from individuals of the Royal social class but also from those Visale
individuals that were not made part of the royal family when the Princess arrived and
instead were made commoners, as the hunter-gatherer Visale would be expected to have a
much older TMRCA than the duy. This is consistent with the previously held belief that
the indigenous Visale accepted the rule of the Princess and her heir and became a mtaar
lineage (there are believed to be approximately 20 existing mtaar lineages (Chilver &
Kaberry 1968)) with the condition that all future fons must have a mother who is of mtaar
social class (Mzeka 1978).
2.3.4. The possible evolution of a relaxed patrilineal system of descent for the won nto´
When testing whether the observed distribution of Nso´ NRY types met expectations of
social practice, though Royal Social Status Rule A could not be rejected, the observed
frequency of the WMH at 55.6% was somewhat above the expected range. A higher than
expected frequency is not a problem in the subsequent analysis since male line continuity
of fons has been clearly demonstrated, permitting the definition of a putative NRY type
for the first fon of Nso´. Examination of the sociological data collected along with the
DNA of Nso´ males (see Methods and Materials) shows that 15 of 18 males (83.3%)
inherited won nto´ status through their father and paternal grandfather (one further won
nto´ male had a won nto´ father but no won nto´ grandfather) while their mothers and
paternal grandmothers were of other social classes (see Table 2.4). The two remaining
won nto´ males appear to have inherited their won nto´ status through the matrilineal line.
Given that the sampling strategy utilised in this study (described in full in the Methods
and Materials section) under-records the proportion of fon NRY types in the actual
population, the elevated frequency of the WMH is striking as is the extremely large
number of individuals claiming won nto´ membership through paternal inheritance
compared to those with affiliations through a uterine connection. One possible
explanation is that the Nso´ royal family may have evolved or is evolving into a more
patrilineally defined group. An almost strictly patrilineal model of won nto´ status
inheritance, where non-patrilineally inherited membership of the won nto´ is restricted to
children of a fon's daughter, was named Royal Social Status Rule C (see Supplementary
Section 2S.1). This rule generates an expected range of 50.7%-93.1% for fon NRY types
given the sampling strategy used (estimated using a methodology similar to that used for
the expectations for Royal Social Status Rules A and B). While the lower limit of 50.7%
appears a reasonable fit (P = 0.68, χ² = 0.17, df = 1) to our observed data, the upper limit
(93.1%) shows a significant deviation (P < 0.0001, χ² = 39.49, df = 1). A more relaxed
model, Royal Social Status Rule D (see Supplementary Section 2S.1), where won nto´
membership is restricted to paternal line inheritance plus inheritance through a line of
three generations containing only one female, generates an expected range of 40.7%-53.9%
for the percentage of males with a fon's NRY that may be expected to be sampled.
In this case neither the upper nor lower limit can be rejected using a Pearson's Chi Square
test (Upper limit: P = 0.89, χ² = 0.02, df = 1; Lower limit: P = 0.20, χ² = 1.65, df = 1).
Given the above it is possible that the Nso´ royal inheritance system may in practice be
more patrilineal than previously described. Clearly further field work may establish
whether this is in fact the case. The limited exploration of the rules of won nto´ affiliation
undertaken with Nso´ elders described above suggests that continuing development of the
rules of won nto´ membership is a possibility.
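The chi-square comparisons reported in this section can be illustrated with a short sketch of a two-category Pearson goodness-of-fit test with one degree of freedom. The counts are an assumption: the observed WMH frequency of 55.6% is taken to correspond to 10 carriers among the 18 won nto´ males, and the function name is hypothetical.

```python
import math


def chi_square_gof(n_with, n_total, expected_proportion):
    """Pearson chi-square (df = 1) for a two-category goodness-of-fit test."""
    expected_with = n_total * expected_proportion
    expected_without = n_total - expected_with
    n_without = n_total - n_with
    chi2 = ((n_with - expected_with) ** 2 / expected_with
            + (n_without - expected_without) ** 2 / expected_without)
    # For one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed counts: 10 WMH carriers among 18 won nto' males (10/18 = 55.6%).
limits = [("Rule C lower", 0.507), ("Rule C upper", 0.931),
          ("Rule D lower", 0.407), ("Rule D upper", 0.539)]
results = {label: chi_square_gof(10, 18, proportion) for label, proportion in limits}

for label, (chi2, p) in results.items():
    print(f"{label}: chi2 = {chi2:.2f}, P = {p:.2g}")
```

Under the assumed counts the statistics agree with those quoted in the text to the precision reported there.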
2.4. Conclusion
It is frequently difficult to establish in what manner, where and when events that are the
subject of oral history occurred, even in accounts in which categorical assertions are
made. Nevertheless, such narratives can prove valuable sources from which information
can be extracted. Confidence in conclusions reached from the analysis of oral tradition is
increased when they are supported by data from other sources, for example linguistics and
archaeological excavation. This study has shown that the distribution of NRY and
mtDNA is consistent with an oral history that describes a) fusion of an indigenous hunter-
gatherer group with later migrants and b) paternal descent of the ruling dynasty from the
indigenous inhabitants of the land over the period covered by the oral history.
The frequency of the won nto´ Modal Haplotype (WMH) in the won nto´ social class
accords very well with what one would predict from population genetic theory and the
sampling strategy utilised in this study and illustrates the power of genetic anthropology
to confirm the genetic consequences of social practices and labels. Notably support has
been provided for one description of the social system put forward by local researchers as
opposed to that advanced by western-based scholars. In this study it has also been
illustrated how, in the investigation of the histories of groups living in sub-Saharan
Africa, genetic analysis may prove a valuable additional tool in the armoury of scholars
seeking to elucidate, on a fine-scale, the pre-histories of sub-Saharan African populations.
Table 2.4: Cultural identity of won nto´ males sampled in the study as well
as the cultural identity of each sample’s father, mother, father's father and
mother's mother.

Sample       Self-declared       Father's            Father's father's   Mother's            Mother's mother's
Identifier   cultural identity   cultural identity   cultural identity   cultural identity   cultural identity
NSO-01       won nto´            won nto´            won nto´            duy                 Nso´
NSO-02       won nto´            won nto´            won nto´            nshiylav            Bamun
NSO-03       won nto´            won nto´            won nto´            mtaar               mtaar
NSO-04       won nto´            won nto´            won nto´            duy                 Nso´
NSO-05       won nto´            won nto´            won nto´            nshiylav            nshiylav
NSO-06       won nto´            won nto´            won nto´            nshiylav            nshiylav
NSO-07       won nto´            won nto´            won nto´            duy                 nshiylav
NSO-08       won nto´            won nto´            won nto´            duy                 duy
NSO-09       won nto´            won nto´            won nto´            won nto´            mtaar
NSO-10       won nto´            won nto´            won nto´            mtaar               nshiylav
NSO-11       won nto´            won nto´            won nto´            Nsungli             Wimbum
NSO-12       won nto´            won nto´            won nto´            Nso                 nshiylav
NSO-13       won nto´            won nto´            won nto´            mtaar               mtaar
NSO-14       won nto´            won nto´            won nto´            nshiylav            duy
NSO-15       won nto´            won nto´            won nto´            duy                 mtaar
NSO-16       won nto´            won nto´            Nooni               nshiylav            duy
NSO-17       won nto´            duy                 duy                 won nto´            won nto´
NSO-18       won nto´            nshiylav            nshiylav            won nto´            won nto´
2.5. Supplementary Section for Chapter 2
Because of their large size, for Supplementary Tables 2S.1, 2S.2 and 2S.7 please see
attached CD-ROM.
2.5.1. Supplementary Section 2S.1: The expectation of NRY type frequencies in the won
nto´ and duy of the Nso´.
Within the main text it is stated that:
If Royal Social Status Rule A has been followed it would be expected that 33.4% - 46.7%
of won nto´ Nso´ males sampled would share identical NRY types while this same NRY
type would be expected asymptotically to approach a frequency of 12.5% in won nto´-
descended-duy, depending on the number of generations since the original fon. However,
if Royal Social Status Rule B has been followed, it would be expected that only 1.0% -
24.1% of sampled won nto´ males would share the same NRY type. This same NRY type
would be expected to be at a frequency of 12.5% in the won nto´-descended-duy,
irrespective of the number of generations of descent from a fon.
Below is a description of how the above conclusions were elucidated.
Assumptions:
For the expected frequencies in the won nto´:
(a) All fons with paternal descent from the first fon will share the same NRY
type.
(b) Male and female births and survival rates are similar.
(c) Generations do not overlap.
(d) Female won nto´ marry males who are not patrilineal descendants of the
first fon.
2.5.1.1. Royal Social Status Rule A
Royal Social Status Rule A can be described as: individuals are assigned won nto´ status
if either for up to four generations they are descendants of a fon along agnatic lines
(interpreted to mean, when expressed unambiguously, 'an exclusively paternal line of
inheritance') or for up to three generations they are descendants of a fon along uterine
connexions (mixed gender or strictly matrilineal lineages).
Ignoring duy status acquired by other means, individuals are duy if either they are
descendants of a fon along agnatic lines of not less than five generations or are
descendants of a fon for not less than four generations along lines with uterine
connexions. Inheritance of duy status is thereafter patrilineal.
The genealogy of the won nto´ descendants of a fon over four generations is illustrated
below (Supplementary Figure 2S.1).
Supplementary Figure 2S.1: Lineage tree showing the relationship of won
nto´ individuals under Royal Social Status Rule A. (M = male offspring, F =
female offspring, * = this individual inherits the same NRY type as a fon).
won nto´ males of the current generation are a summation of the following:
a) sons of the current fon.
b) grandsons of the previous fon.
c) great grandsons of the second previous fon.
d) great great grandsons of the third previous fon through exclusively paternal descent.
If every fon has an equal number of sons and daughters (the total of which can be any
even number) as well as one extra son who becomes the next fon, while all other
individuals have only one son and one daughter, then, as can be seen in Supplementary
Figure 2S.2, the relative proportions of a)-d) individuals in the current generation of male
won nto´ would be: a) 0.125, b) 0.25, c) 0.5 and d) 0.125.
Supplementary Figure 2S.2: Diagram showing the relative contributions of
different won nto´ lineages to the won nto´ under Royal Social Status Rule
A.
The percentage of males in a)-d) who would share the same NRY type (*) of a fon would
be: a) 100%, b) 50%, c) 25% and d) 100%.
Therefore the relative proportions of a)-d) individuals in the current generation of male
won nto´ who possess the NRY* type would be:
a) 100% * 0.125 = 0.125
b) 50% * 0.25 = 0.125
c) 25% * 0.5 = 0.125
d) 100% * 0.125 = 0.125
The proportion of males in the current generation who would share the NRY* type is the
sum of the above i.e. 0.5. Therefore in any one generation it would be expected that half
of won nto´ males would share the same NRY type. However, since it is not reasonable to
assume that won nto´ have only two children, and since sampling of individuals who are the
brother, father, son or paternal line cousin of an individual from whom a buccal swab has
already been collected is not permitted, the expectation is different (as now described).
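The arithmetic behind the figure of one half can be restated exactly. The sketch below simply multiplies the relative contributions of categories a)-d) by the share of each category carrying the fon's NRY type, using the values given in the text; the variable names are illustrative.

```python
from fractions import Fraction

# Relative contribution of each category a)-d) to the current generation of
# male won nto' under Royal Social Status Rule A, as stated in the text.
contributions = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(1, 8)]

# Share of each category that carries the fon's NRY type, as stated in the text.
nry_share = [Fraction(1), Fraction(1, 2), Fraction(1, 4), Fraction(1)]

per_category = [c * s for c, s in zip(contributions, nry_share)]
expected_proportion = sum(per_category)
```

Each category contributes exactly one eighth, so the four terms sum to one half before any allowance is made for family size or the sampling restrictions.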
If it is assumed a) that sampling is from both the most recent and second most recent
generation of adult won nto´ males and that the third most recent generation are not
sampled, b) that individuals are not sampled who are the brother, father, son or paternal
line cousin of another subject, c) that the number of children a fon has (excluding his heir) is
'2n' ('n' of whom are males) and d) that the number of children a non-fon won nto´ has is
'2y' ('y' of whom are males), the proportion of males who possess a fon's NRY type that
are expected to be sampled if rule A has been followed is:
\[
\frac{\dfrac{1}{y+1}\left(n(1+y)^{2}+1\right)+1}{\dfrac{1}{y+1}\left(3n(1+y)^{2}+1\right)+1}
\]
The above expression assesses the probability of sampling the different individual
lineages included in Supplementary Figure 2S.3 given the sampling strategy
utilised in this study (see Supplementary Table 2S.3 for probabilities). To assist
understanding of the approach adopted a description of how the proportions of individuals
belonging to a specific lineage were calculated is given below.
The proportions of individuals of lineage representative (LR) 2 in Supplementary Figure
2S.3 from whom buccal swabs are taken is calculated as follows: calculate the probability
of sampling individuals of LR 2 in the population rather than their sons (LR 3). This is a
Supplementary Figure 2S.3: Lineage tree showing the relationship of won nto´ individuals under Royal Social Status Rule
A for the two most recent adult generations of won nto´ males. (M = male offspring, F = female offspring, * = this individual
inherits the same NRY type as a fon. Numbers refer to specific Lineage Representatives).
function of the relative contributions of LRs 2 and 3 combined to the total population.
The contribution of LR 2 will be, starting from Fon4 and working down the lineage, 'n' *
'y' * 'y' or ny^2. Similarly the contribution of LR 3 will be ny^3. Therefore the probability
of sampling from LR 2 rather than LR 3 will be the contribution of LR 2 divided by the
sum of the contributions of LR 2 and LR 3, i.e. ny^2 / (ny^2 + ny^3). As expressed in b) above
only one individual of LR 2 is sampled among all that share the same father and paternal
grandfather. Consequently the maximum number of LR 2 that can be sampled is 'n'. The
number of males that can be sampled of LR 2 is therefore 'n' multiplied by
ny^2 / (ny^2 + ny^3), as shown in the second row of Supplementary Table 2S.3, while the
number for LR 3 is 'ny' multiplied by ny^3 / (ny^2 + ny^3), as shown in the third row of
Supplementary Table 2S.3.
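Since the factor ny^2 is common to the numerator and denominator, the two sampling probabilities for LR 2 and LR 3 depend on 'y' alone. The short derivation below only restates the expressions given above; the numerical values n = 3 and y = 2 are hypothetical and chosen purely for illustration.

```latex
\[
\frac{ny^{2}}{ny^{2}+ny^{3}} = \frac{1}{1+y},
\qquad
\frac{ny^{3}}{ny^{2}+ny^{3}} = \frac{y}{1+y}
\]
% Expected numbers of males that can be sampled:
\[
\text{LR 2: } n \cdot \frac{1}{1+y} = \frac{n}{1+y},
\qquad
\text{LR 3: } ny \cdot \frac{y}{1+y} = \frac{ny^{2}}{1+y}
\]
% Hypothetical example with n = 3 and y = 2:
\[
\text{LR 2: } \frac{3}{3} = 1,
\qquad
\text{LR 3: } \frac{3 \cdot 4}{3} = 4
\]
```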
Supplementary Table 2S.3: Probability of sampling Nso´ Y-chromosomes
given Royal Social Status Rule A.

Lineage           Total number of males of this lineage type that     Proportion of the
Representative    can be sampled conditional on the probability of    lineage possessing a
                  sampling this individual in terms of 'n' and 'y'    fon's NRY type
1                 ny ( ny^3 / ( ny^4 + ny^3 ) )                       1
2                 n ( ny^2 / ( ny^3 + ny^2 ) )                        1
3                 ny ( ny^3 / ( ny^3 + ny^2 ) )                       1
4                 ny ( ny^2 / ( ny^3 + ny^2 ) )                       0
5                 ny / ( ny^2 + ny )                                  1
6                 n ( ny^2 / ( ny^2 + ny ) )                          1
7                 ny ( ny^2 / ( ny^2 + ny ) )                         0
8, 9, 10          1                                                   1
11                n ( ny / ( ny + n ) )                               0
12                n ( ny / ( ny^2 + ny ) )                            0
13                n ( ny^2 / ( ny^2 + ny ) )                          0
14                ny ( ny^2 / ( ny^2 + ny ) )                         0
15                n ( ny^2 / ( ny^3 + ny^2 ) )                        0
16                ny ( ny^2 / ( ny^3 + ny^2 ) )                       0

NOTE.-'2n' is the number of children a fon has ('n' of which are male). '2y' is the
number of children a won nto´ has ('y' of which are male).
The process is repeated for each lineage in Supplementary Figure 2S.3 and summed to
yield the total number of won nto´ sampled. Some of the lineages are patrilineal with an
origin in a fon and will therefore have the same NRY type as the fon. These lineages have
a probability of having a fon NRY of ‘1’ as shown in column 3 of Supplementary Table
2S.3. Summing all the lineages yields the total number of individuals who share the fon’s
NRY type expected to be sampled. This figure is divided by the total number of samples
(described above) to calculate the proportion of won nto´ expected to share the fon’s
NRY. The expression above is an algebraic simplification performed following
summation.
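As a check on the algebra, the rows of Supplementary Table 2S.3 can be summed exactly and compared with the closed-form expression for Rule A. The sketch below is illustrative only: the row expressions are transcribed from the table with superscripts restored, the combined entry for lineage representatives 8, 9 and 10 is assumed to contribute 1 in total, and the function names are hypothetical.

```python
from fractions import Fraction


def rule_a_rows(n, y):
    """(expected number sampled, proportion with the fon's NRY) per table row."""
    n, y = Fraction(n), Fraction(y)
    return [
        (n * y * (n * y**3 / (n * y**4 + n * y**3)), 1),  # LR 1
        (n * (n * y**2 / (n * y**3 + n * y**2)), 1),      # LR 2
        (n * y * (n * y**3 / (n * y**3 + n * y**2)), 1),  # LR 3
        (n * y * (n * y**2 / (n * y**3 + n * y**2)), 0),  # LR 4
        (n * y / (n * y**2 + n * y), 1),                  # LR 5
        (n * (n * y**2 / (n * y**2 + n * y)), 1),         # LR 6
        (n * y * (n * y**2 / (n * y**2 + n * y)), 0),     # LR 7
        (Fraction(1), 1),                                 # LR 8, 9, 10 (assumed total)
        (n * (n * y / (n * y + n)), 0),                   # LR 11
        (n * (n * y / (n * y**2 + n * y)), 0),            # LR 12
        (n * (n * y**2 / (n * y**2 + n * y)), 0),         # LR 13
        (n * y * (n * y**2 / (n * y**2 + n * y)), 0),     # LR 14
        (n * (n * y**2 / (n * y**3 + n * y**2)), 0),      # LR 15
        (n * y * (n * y**2 / (n * y**3 + n * y**2)), 0),  # LR 16
    ]


def proportion_from_rows(n, y):
    """Proportion of sampled males expected to carry the fon's NRY type."""
    rows = rule_a_rows(n, y)
    with_fon_nry = sum(count * share for count, share in rows)
    total = sum(count for count, _ in rows)
    return with_fon_nry / total


def closed_form(n, y):
    """Closed-form Rule A expression given in the text."""
    n, y = Fraction(n), Fraction(y)
    numerator = (n * (1 + y) ** 2 + 1) / (y + 1) + 1
    denominator = (3 * n * (1 + y) ** 2 + 1) / (y + 1) + 1
    return numerator / denominator


agree = all(proportion_from_rows(n, y) == closed_form(n, y)
            for n in range(1, 26) for y in range(1, 26))
```

Exact rational arithmetic is used so that agreement between the row sums and the closed form is tested without rounding error.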
Given a range of different combinations of 'n' and 'y' extending from 1 to 25 (to take
account of uncertainty concerning the real values of 'n' and 'y') and applying the above
expression it is observed that, under Royal Social Status Rule A and the sampling strategy
utilised, 33.4% - 46.7% of won nto´ males tested will possess a fon's NRY type.
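The quoted range can be reproduced by evaluating the Rule A expression over the grid of family sizes. This is a minimal sketch that assumes 'n' and 'y' take integer values from 1 to 25 inclusive; the function name is hypothetical.

```python
def rule_a_proportion(n, y):
    """Expected proportion of sampled won nto' males with a fon's NRY type (Rule A)."""
    numerator = (n * (1 + y) ** 2 + 1) / (y + 1) + 1
    denominator = (3 * n * (1 + y) ** 2 + 1) / (y + 1) + 1
    return numerator / denominator


# Evaluate every combination of n and y from 1 to 25.
values = {(n, y): rule_a_proportion(n, y)
          for n in range(1, 26) for y in range(1, 26)}

lowest_at = min(values, key=values.get)
highest_at = max(values, key=values.get)
lowest, highest = values[lowest_at], values[highest_at]
```

The maximum falls at the smallest family sizes (n = 1, y = 1) and the minimum at the largest (n = 25, y = 25), with the proportion tending towards one third as family sizes grow.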
For the expected frequencies in the duy:
All the assumptions above apply as well as:
(a) Only duy who descend from the first fon are considered; those individuals who
acquire duy status in other ways are not considered e.g. because of claimed
royal descent originating in other ethnic groups incorporated into the Nso´
empire.
(b) duy status once acquired is inherited in a strictly patrilineal manner.
(c) duy do not marry won nto´ that have paternal line descent from a fon.
Supplementary Figure 2S.4 illustrates the transition of a fon's descendants from won nto´
to duy.
Supplementary Figure 2S.4: Lineage tree showing the transition of won nto´
to duy under Royal Social Status Rule A. (M = male offspring, F = female
offspring, * = this individual inherits the same NRY type as a fon). Duy are
shown in red.
In Supplementary Figure 2S.4 there are eight male duy of the present generation, only one
of whom possesses a fon's NRY. All things being equal, every fon should contribute the
same number of male duy individuals, 12.5% of whom possess a fon's NRY. However,
when sampling from the most recent generation the present and three previous fons will
not have had sufficient descendant generations to contribute any duy while the fon of four
generations ago will have contributed seven of the eight males. Nevertheless he would not
have produced the one duy with a fon's NRY. If sampling from the most recent
generation the frequency of male duy with a fon's NRY will approach 12.5% but never
reach it. The more fons there have been since the first fon, the closer the proportion
approaches 0.125. Note that 12.5% is independent of both 'n' and 'y' and the sampling
strategy utilised in this study.
2.5.1.2. Royal Social Status Rule B
Royal Social Status Rule B can be described as: a person is a member of won nto´ (down
to the fourth generation (if a man) and third generation (if a woman)) if she or he is both a
child of a won nto´ and a descendant of a fon.
The genealogy of the won nto´ descendants of a fon over four generations is illustrated
below (Supplementary Figure 2S.5).
Supplementary Figure 2S.5: Lineage tree showing the relationship of won
nto´ individuals under Royal Social Status Rule B. (M = male offspring, F =
female offspring, * = this individual inherits the same NRY type as a fon).
won nto´ males of the current generation are a summation of the following:
a) sons of the current fon.
b) grandsons of the previous fon.
c) great grandsons of the second previous fon.
d) great great grandsons of the third previous fon.
If every fon has an equal number of sons and daughters (the total of whom can be any
even number) as well as one extra son who becomes the next fon, and all other
individuals have only one son and one daughter, then, as can be seen in
Supplementary Figure 2S.6, the relative proportions of a)-d) individuals in the current
population of male won nto´ would be: a) 0.067, b) 0.133, c) 0.267 and d) 0.533.
Supplementary Figure 2S.6: Diagram showing the relative contributions of
different won nto´ lineages to the won nto´ under Royal Social Status Rule
B.
The percentage of males in a)-d) who would share the same NRY type (*) of a fon would
be: a) 100%, b) 50%, c) 25% and d) 12.5%.
Therefore the relative proportions of a)-d) individuals in the current population of male
won nto´ who possess the NRY* type would be:
a) 100% * 0.067 = 0.067
b) 50% * 0.133 = 0.067
c) 25% * 0.267 = 0.067
d) 12.5% * 0.533 = 0.067
The proportion of males in the current population who would share the NRY* type is the
sum of the above i.e. 0.27. Therefore in any one generation it would be expected that just
over one quarter of won nto´ males would share the same NRY type. However, since it is
not reasonable to assume that won nto´ have only two children, and since sampling individuals
who are the brother, father, son or paternal line cousin of an individual from whom a
buccal swab has already been collected is not permitted, the expectation is different (as
now described).
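The rounded contributions quoted for Rule B (0.067, 0.133, 0.267 and 0.533) are read here as the fractions 1/15, 2/15, 4/15 and 8/15; that reading is an inference from the doubling pattern rather than a statement in the text. On that assumption the simple expectation can be written exactly:

```latex
\[
1 \cdot \frac{1}{15}
+ \frac{1}{2} \cdot \frac{2}{15}
+ \frac{1}{4} \cdot \frac{4}{15}
+ \frac{1}{8} \cdot \frac{8}{15}
= \frac{4}{15} \approx 0.267
\]
```

Each of the four categories contributes 1/15, which is why the sum is slightly above one quarter.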
If it is assumed a) that sampling is from both the most recent and second most recent
generation of adult won nto´ males and that the third most recent generation are not
sampled, b) that individuals are not sampled who are the brother, father, son or paternal
line cousin of another subject, c) that the number of children a fon has (excluding his heir) is
'2n' ('n' of whom are males) and d) that the number of children a non-fon won nto´ has is
'2y' ('y' of whom are males), the proportion of males who possess a fon's NRY type that
are expected to be sampled if rule B has been followed is:
\[
\frac{\dfrac{1}{y+1}\left(n(1+y)^{2}+1\right)+1}{\dfrac{1}{y+1}\left(4ny^{3}+10ny^{2}+9ny+3n+1\right)+1}
\]
The above expression assesses the probability of sampling the different individual
lineages included in Supplementary Figure 2S.7 given the sampling strategy utilised in
this study (see Supplementary Table 2S.4 for probabilities and Royal Social Status Rule
A for how these probabilities are calculated). Given a range of different combinations of
'n' and 'y' extending from 1 to 25 (to take account of uncertainty concerning the real
values of 'n' and 'y') and applying the above expression it is observed that, under Royal
Social Status Rule B and the sampling strategy utilised in this study, 1.0% - 24.1% of won
nto´ males tested will possess a fon's NRY type.
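The range quoted for Rule B can be reproduced in the same way as for Rule A. This is a minimal sketch assuming integer values of 'n' and 'y' from 1 to 25 inclusive; the function name is hypothetical.

```python
def rule_b_proportion(n, y):
    """Expected proportion of sampled won nto' males with a fon's NRY type (Rule B)."""
    numerator = (n * (1 + y) ** 2 + 1) / (y + 1) + 1
    denominator = (4 * n * y**3 + 10 * n * y**2 + 9 * n * y + 3 * n + 1) / (y + 1) + 1
    return numerator / denominator


# Evaluate every combination of n and y from 1 to 25.
grid = [(n, y) for n in range(1, 26) for y in range(1, 26)]
rule_b_values = [rule_b_proportion(n, y) for n, y in grid]

rule_b_lowest = min(rule_b_values)
rule_b_highest = max(rule_b_values)
```

Because the denominator grows with the cube of 'y' while the numerator grows with its square, the expected proportion under Rule B falls quickly as non-fon family size increases, which accounts for the much wider and lower range than under Rule A.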
For the expected frequencies in the duy:
All the assumptions above apply as well as:
(a) Only duy who descend from the first fon are considered; those individuals who
acquire duy status in other ways are not considered e.g. because of claimed
royal descent originating in other ethnic groups incorporated into the Nso´
empire.
(b) duy status once acquired is inherited in a strictly patrilineal manner.
(c) duy do not marry won nto´ that have paternal line descent from a fon.
Supplementary Figure 2S.7: Lineage tree showing the relationship of won nto´ individuals under Royal Social Status Rule
B for the two most recent adult generations of won nto´ males. (M = male offspring, F = female offspring, * = this individual
inherits the same NRY type as a fon. Numbers refer to specific Lineage Representatives).
Supplementary Table 2S.4: Probability of sampling Nso´ Y-chromosomes
given Royal Social Status Rule B.

Lineage           Total number of males of this lineage type that     Proportion of the
Representative    can be sampled conditional on the probability of    lineage possessing a
                  sampling this individual in terms of 'n' and 'y'    fon's NRY type
1                 ny ( ny^3 / ( ny^4 + ny^3 ) )                       1
2                 ny^2 ( ny^3 / ( ny^4 + ny^3 ) )                     0
3                 ny ( ny^3 / ( ny^4 + ny^3 ) )                       0
4                 ny^2 ( ny^3 / ( ny^4 + ny^3 ) )                     0
5                 n ( ny^2 / ( ny^3 + ny^2 ) )                        1
6                 ny ( ny^3 / ( ny^3 + ny^2 ) )                       1
7                 ny^2 ( ny^3 / ( ny^3 + ny^2 ) )                     0
8                 ny ( ny^2 / ( ny^3 + ny^2 ) )                       0
9                 ny ( ny^3 / ( ny^3 + ny^2 ) )                       0
10                ny^2 ( ny^3 / ( ny^3 + ny^2 ) )                     0
11                ny / ( ny^2 + ny )                                  1
12                n ( ny^2 / ( ny^2 + ny ) )                          1
13                ny ( ny^2 / ( ny^2 + ny ) )                         0
14, 15, 16        1                                                   1
17                n ( ny / ( ny + n ) )                               0
18                n ( ny / ( ny^2 + ny ) )                            0
19                n ( ny^2 / ( ny^2 + ny ) )                          0
20                ny ( ny^2 / ( ny^2 + ny ) )                         0
21                n ( ny^2 / ( ny^3 + ny^2 ) )                        0
22                ny ( ny^3 / ( ny^3 + ny^2 ) )                       0
23                ny^2 ( ny^3 / ( ny^3 + ny^2 ) )                     0
24                ny ( ny^2 / ( ny^3 + ny^2 ) )                       0
25                ny ( ny^3 / ( ny^3 + ny^2 ) )                       0
26                ny^2 ( ny^3 / ( ny^3 + ny^2 ) )                     0
27                ny ( ny^3 / ( ny^4 + ny^3 ) )                       0
28                ny^2 ( ny^3 / ( ny^4 + ny^3 ) )                     0
29                ny ( ny^3 / ( ny^4 + ny^3 ) )                       0
30                ny^2 ( ny^3 / ( ny^4 + ny^3 ) )                     0

NOTE.-'2n' is the number of children a fon has ('n' of which are male). '2y' is the
number of children a won nto´ has ('y' of which are male).
Supplementary Figure 2S.8 illustrates the transition of a fon's descendants from won nto´
to duy.
Supplementary Figure 2S.8: Lineage tree showing the transition of won nto´
to duy under Royal Social Status Rule B. (M = male offspring, F = female
offspring, * = this individual inherits the same NRY type as a fon). Duy are
shown in red.
In Supplementary Figure 2S.8 there are eight male duy of the present generation, only one
of whom possesses a fon's NRY. All things being equal, every fon should contribute the
same number of male duy individuals, 12.5% of whom possess a fon's NRY. When
sampling from the present generation, the overall frequency of male duy with a fon's
NRY will be 12.5% (unlike Royal Social Status Rule A which will approach but never
reach 12.5%). Note that 12.5% is independent of both 'n' and 'y' and the sampling
strategy utilised in this study.
2.5.1.3. Royal Social Status Rules C and D
The expectations for the proposed Royal Social Status Rules C and D are generated in
a similar manner to Royal Social Status Rules A and B by assessing the probabilities of
sampling the individual lineages included in Supplementary Figures 2S.9 and 2S.10 (see
Supplementary Tables 2S.5 and 2S.6 for probabilities).
2.5.1.4. The implications of won nto´ women marrying a) won nto´ men and b) non- won
nto´ men carrying the fon’s NRY type
Table 2.4 shows one case of a marriage between a won nto´ man and a won nto´ woman.
In evaluating the effect of the sampling strategy utilised it was assumed that exclusively
won nto´ marriages do not occur. Such marriages could affect the expectation since each
lineage might no longer be discrete. Correcting for this assumption is complicated given
the ways lineages may interact. It has not been done since approximate calculations
indicate that at realistic levels of family size the effect would be small and most probably
increase the proportion of fon NRY types in the won nto´. Furthermore any such small
increase in the expectation for the incidence of fon NRY types should not affect the
conclusions set out above.
However the relatively simple correction to take account of fon NRY types entering the
won nto´ class as a consequence of won nto´ women marrying non-won nto´ men carrying
the fon NRY type has been made. The correction requires assumptions concerning (a) the
expected proportion of marriages of won nto´ women with men of a different class or
ethnicity and (b) the proportion of men in other groups carrying the fon NRY type. (a) is
assumed based on the proportion of men sampled of each class/ethnicity included in the
survey (the correction applies the survey proportions after allowing for unions with
non-Nso´ men; see Table 2.4, which contains one case out of 18 of a marriage between a
non-Nso´ and a member of the won nto´). Therefore the proportion of marriages to each of
the Nso´ classes is reduced by a factor of 17/18 to take into account non-Nso´ males
marrying won nto´ females, and it is assumed that none of the non-Nso´ males carry the fon
NRY type. Since (b) was estimated by later typing, the correction is post hoc and
consequently is not included in the principal text of the chapter.
Supplementary Figure 2S.9: Lineage tree showing the relationship of won nto´ individuals under Royal Social Status
Rule C for the two most recent adult generations of won nto´ males. (M = male offspring, F = female offspring, * = this
individual inherits the same NRY type as a fon. Numbers refer to specific Lineage Representatives).
Supplementary Figure 2S.10: Lineage tree showing the relationship of won nto´ individuals under Royal Social Status
Rule D for the two most recent adult generations of won nto´ males. (M = male offspring, F = female offspring, * = this
individual inherits the same NRY type as a fon. Numbers refer to specific Lineage Representatives).
Supplementary Table 2S.5: Probability of sampling Nso´ Y-chromosomes
given Royal Social Status Rule C.
Lineage Representative | Total number of males of this lineage type that can be sampled, conditional on the probability of sampling this individual, in terms of 'n' and 'y' | Proportion of the lineage possessing a fon's NRY type
1 | ny ( ny^3 / ( ny^4 + ny^3 ) ) | 1
2 | n ( ny^2 / ( ny^3 + ny^2 ) ) | 1
3 | ny ( ny^3 / ( ny^3 + ny^2 ) ) | 1
4 | ny / ( ny^2 + ny ) | 1
5 | n ( ny^2 / ( ny^2 + ny ) ) | 1
6, 7, 8 | 1 | 1
9 | n ( ny / ( ny + n ) ) | 0
10 | n ( ny / ( ny^2 + ny ) ) | 0
11 | n ( ny^2 / ( ny^2 + ny ) ) | 0
12 | n ( ny^2 / ( ny^3 + ny^2 ) ) | 0
NOTE.-'2n' is the number of children a fon has ('n' of which are male). '2y' is the
number of children a won nto´ has ('y' of which are male).
Supplementary Table 2S.6: Probability of sampling Nso´ Y-chromosomes
given Royal Social Status Rule D.
Lineage Representative | Total number of males of this lineage type that can be sampled, conditional on the probability of sampling this individual, in terms of 'n' and 'y' | Proportion of the lineage possessing a fon's NRY type
1 | ny ( ny^3 / ( ny^4 + ny^3 ) ) | 1
2 | n ( ny^2 / ( ny^3 + ny^2 ) ) | 1
3 | ny ( ny^3 / ( ny^3 + ny^2 ) ) | 1
4 | ny ( ny^2 / ( ny^3 + ny^2 ) ) | 0
5 | ny / ( ny^2 + ny ) | 1
6 | n ( ny^2 / ( ny^2 + ny ) ) | 1
7 | ny ( ny^2 / ( ny^2 + ny ) ) | 0
8, 9, 10 | 1 | 1
11 | n ( ny / ( ny + n ) ) | 0
12 | n ( ny / ( ny^2 + ny ) ) | 0
13 | n ( ny^2 / ( ny^2 + ny ) ) | 0
14 | n ( ny^2 / ( ny^3 + ny^2 ) ) | 0
NOTE.-'2n' is the number of children a fon has ('n' of which are male). '2y' is the
number of children a won nto´ has ('y' of which are male).
If a1 is the adjusted expected number of sampled fon NRY types, a0 is the initial expected
number of sampled fon NRY types and xi are the three non-won nto´ classes and non-Nso´
ethnicities, the correction is:
a1 = a0 + [(1-a0)*∑((proportion of fon NRY types in non-won nto´ group xi * (17/18)) * proportion of sampled xi males in Nso´)].
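The arithmetic of this correction can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a0 is expressed as a proportion (as the term 1-a0 implies), the function and variable names are hypothetical, and the input values in the example are invented for demonstration and are not the values used in this chapter.

```python
def adjusted_expectation(a0, groups, nso_marriage_share=17 / 18):
    """Apply the correction a1 = a0 + (1 - a0) * sum(f_i * 17/18 * s_i).

    a0     -- initial expected proportion of sampled fon NRY types
    groups -- iterable of (f_i, s_i) pairs, one per non-won nto' group x_i:
              f_i = proportion of fon NRY types in group x_i
              s_i = proportion of sampled x_i males in Nso'
    nso_marriage_share -- share of won nto' marriages made to Nso' men
                          (17/18 in the text; kept as a parameter here)
    """
    inflow = sum(f * nso_marriage_share * s for f, s in groups)
    return a0 + (1 - a0) * inflow


# Hypothetical inputs, for illustration only.
example_groups = [(0.2, 0.5), (0.1, 0.3), (0.0, 0.2)]
a1 = adjusted_expectation(0.5, example_groups)
print(f"adjusted expectation: {a1:.4f}")
```

Because every term in the sum is non-negative, the adjusted expectation can only equal or exceed the initial one, which matches the direction of the changes reported in the following paragraph.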
Applying this correction for Royal Social Status Rule A, a Pearson’s Chi Square test
demonstrates greater support for the rule, with neither the new upper limit (upper limit
expectation of 52.8%: P = 0.815, χ² = 0.055, df = 1) nor the new lower limit (lower limit
expectation of 41.0%: P = 0.209, χ² = 1.58, df = 1) significantly different from the
observed data. For Royal Social Status Rule B both the adjusted upper limit (upper limit
expectation of 32.8%: P = 0.040, χ² = 4.23, df = 1) and lower limit (lower limit
expectation of 12.3%: P < 0.0001, χ² = 31.22, df = 1) expectations deviated significantly
from the observed data. Therefore Royal Social Status Rule A still appears the more
likely scenario for won nto´ social practice. The conclusions drawn from analysis of
Royal Social Status Rules C and D are unchanged despite applying the correction
described above (Royal Social Status Rule C [upper limit expectation of 59.1%: P =
0.760, χ² = 0.09, df = 1; lower limit expectation of 47.5%: P = 0.494, χ² = 0.47, df = 1],
Royal Social Status Rule D [upper limit expectation of 93.9%: P < 0.0001, χ² = 46.20,
df = 1; lower limit expectation of 56.4%: P = 0.942, χ² = 0.005, df = 1]).
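As a check on the reported values, the P value for a chi-square statistic with one degree of freedom can be recovered from the statistic alone. The sketch below uses only the Python standard library and the identity that, for df = 1, the upper-tail probability equals erfc(sqrt(statistic / 2)). The function name is hypothetical, and the statistics are those quoted above for Rules A and B, so small differences from the reported P values reflect rounding of the statistics.

```python
import math


def chi2_sf_df1(statistic):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Chi-square statistics as quoted in the text (df = 1 in every case).
reported = [
    ("Rule A, upper limit", 0.055),
    ("Rule A, lower limit", 1.58),
    ("Rule B, upper limit", 4.23),
    ("Rule B, lower limit", 31.22),
]

for label, statistic in reported:
    print(f"{label}: statistic = {statistic}, P = {chi2_sf_df1(statistic):.4f}")
```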
Chapter 3:
It All Depends On The Scale: Little
Sex-Specific Genetic Variation In The
Presence Of Substantial Language
Variation In Peoples Of The Cross
River Region Of Nigeria Assessed
Within The Wider Context Of West
Central Africa.
3. It All Depends On The Scale: Little Sex-
Specific Genetic Variation In The Presence
Of Substantial Language Variation In
Peoples Of The Cross River Region Of
Nigeria Assessed Within The Wider Context
Of West Central Africa.
3.1. Introduction
There have been many studies seeking to compare genetic and language differences
among peoples. Some have demonstrated genetic continuity across linguistic boundaries
(Rosser et al. 2000; Zegura et al. 2004), while others have concluded that language
boundaries are associated with increased genetic distances (Karafet et al. 2002; Wood et
al. 2005). Studies of possible associations between languages and sex-specific genetic
systems in sub-Saharan Africa however are few in number and limited in scale. This
study attempts to address this gap by examining genetic variation in the paternally
inherited non-recombining portion of the Y-chromosome (NRY) and the maternally
inherited mitochondrial DNA (mtDNA) in multiple groups from West Central Africa at
various levels of identity (clan, self declared ethnic identity, first language affiliation) and
geographic separation (Cross River region of Nigeria, Grassfields of Cameroon and
Ghana).
While the sex-specific genetic systems represent what are in effect just two loci, each
comprised of linked markers, they have a considerable advantage in population studies
because of their smaller effective population size (Jobling, Hurles & Tyler-Smith 2004 pg
134), which leads to increased rates of genetic drift and thus population differentiation.
This is useful when seeking to identify evidence of isolation among communities. It is
unlikely that the frequencies of well characterised NRY and mtDNA types will both be
statistically similar unless the groups share a recent common origin or there has been
substantial gene flow among them. This study examines a large number of groups that
speak clearly distinguishable languages, with estimated times to separation ranging from
500 (Connell & Maison 1994) to several thousands of years, and asks whether the
language separation has taken place, or at least has been maintained, in the presence of
substantial male and/or female gene flow, as evidenced by sex-specific genetic systems.
In the course of doing so, possible associations between geographic and genetic distance
are also examined. Finally, since one of the groups (the Efik) has a long-standing claim of
an ancient origin in the Palestine of antiquity, the sex-specific genetic systems are
examined for evidence supporting this claim.
First the regions, peoples and languages included in this study are described.
3.1.1. A brief description of the Peoples and Languages of the Cross River region
The Cross River region (named after the river of the same name which passes through the
region) is situated in the extreme southeast of Nigeria and adjacent parts of Cameroon.
The physical geography is varied with mountains, rainforest and an alluvial plain at its
estuary at the Atlantic.
Linguistically and culturally, it is one of the most diverse regions of the world (as
assessed by the number of languages given the size of the region). It is home to more than
60 distinct languages. Early European missionaries reported that every village had its own
language. These languages can be classified into a number of distinct language groups.
The most notable, though not only, groupings are ‘Cross River’ and ‘Bantoid’, both of
which include many subgroups.
The land to the north east of the Cross River region (Figure 3.1) is now generally
accepted as the area from which the expansion of the Bantu-speaking peoples began
approximately three to five thousand years ago (Greenberg 1955; Vansina 1990; Blench
2006). Bantu languages are now spoken throughout most of sub-Saharan Africa south of
the equator.
Figure 3.1: Map showing the locations in West Central Africa from which
samples were collected. Political borders are shown by black lines. Colour bar
indicates elevation in metres.
The Cross River region was a major source of slaves during the Atlantic slave trade with
Calabar, at the confluence of the Cross and Calabar Rivers, becoming both the region’s
principal urban centre and one of the trade’s most active ports. The Efik, the most
numerous group in the town, played a significant role in the trade, often as intermediaries
between Europeans on the coast and in-land groups.
The resident peoples have been characterised as ‘Syncretic Christian’; that is to say,
nominally Christian but retaining aspects of traditional animist worship. In general their
social structures are ‘acephalous’ (lacking a fixed, centralised political structure)
although the Efik do have a ‘king’, or paramount ruler, the Obong. Nevertheless the
power and role of the Obong are not equivalent to, say, either that of the fon in Grassfields
societies or the Oba in former African kingdoms such as those of Benin or of the Yoruba.
Interestingly the more centralised system of the Efik developed during the rise of Calabar
and is a direct result of close contact with their British trading partners (Latham 1973;
Noah 1980).
Brief details of the cultural practices of the Cross River peoples included in this study are
given in Table 3.1 (information on the Lower Cross groups Anaang, Efik, Ibibio and Oron
is from Forde and Jones (1950), Udo (1983) and Uya (1984); Efut: from Connell
(1983); Ejagham: Talbot (1912); Igbo: Basden (1966) and Forde and Jones (1950)). In all
cases information has been supplemented from Connell's unpublished field notes (1983).
Because of its linguistic and cultural diversity, proximity to the Bantu homeland and role
in the slave trade, the peoples of the Cross River are of considerable interest to linguists
(especially those concerned with historical linguistics and the nature of language contact),
historians and other researchers interested in the mechanisms and consequences of
population movements. They also provide an opportunity to examine possible
associations of language and genetic difference on a fine-scale. In this study data from
1113 residents of the Cross River region speaking six languages as their mother tongue
and drawn from 24 clans and 20 locations were analysed.
Table 3.1: Summary of cultural practices of Cross River ethnic groups
utilised in this study.
Ethnic Group | Marriage Practice | Patrilocal/Matrilocal | Patrilineal/Matrilineal | Religion | Ruling Structure
Anaang | exogamous | patrilocal | patrilineal | syncretic Christian | acephalous
Efik | exogamous | patrilocal | patrilineal | syncretic Christian | centralised, paramount ruler
Efut | exogamous | patrilocal | patrilineal | syncretic Christian | acephalous
Ejagham | exogamous | patrilocal | patrilineal | syncretic Christian | acephalous
Ibibio | exogamous | patrilocal | patrilineal | syncretic Christian | acephalous
Igbo | exogamous | patrilocal | patrilineal | syncretic Christian | acephalous
Oron | exogamous | patrilocal | patrilineal | syncretic Christian | acephalous
In the past most of the extensive variety of languages in the region were categorised as
“Semi-Bantu”, a linguistic designation considered obsolete since the work of Greenberg
(1963). Currently the accepted classification (subject to some dispute[14]) identifies
‘Bantoid’ and ‘Cross River’ as the two most important groups of languages found in the
Cross River region. As branches of Benue-Congo (one of the main families within the
large and diverse Niger-Congo phylum) they share a common parent language, though
one spoken at least 6000 years ago (Figure 3.2).
Figure 3.2: Broad relationships of the differing language groups used or
described in this chapter based on Williamson and Blench (2000). Branch
lengths are not informative.
[14] See Connell (1994), Connell (1998) and Williamson and Blench (2000) for further details. Williamson &
Blench (2000) argue that Cross River and Bantoid are sufficiently similar to be grouped together while still
falling under Benue-Congo.
Cross River is divided into Bendi and Delta Cross, with the latter comprised of four
subgroups: Central Delta, Ogoni, Upper Cross and Lower Cross. The best studied of
these, from a comparative perspective, is Lower Cross. Lower Cross itself is comprised of
some twenty languages (Connell 1994; Connell & Maison 1994) including Anaang, Efik,
Ibibio and Oron; and is spoken over most of the lower region of the Cross River basin –
the alluvial plain of its geography – and consequently includes Calabar. Dialect variation
exists within some of the Lower Cross languages, particularly Ibibio and Anaang, and this
variation has sometimes been claimed to correspond to clan groupings.
Details of the relationships among the four Delta Cross subgroups are not fully
understood; indeed, further work may lead to a reassessment of this grouping. Similarly,
solid evidence to unite Bendi with Delta Cross is at present lacking and some scholars,
most recently Blench (2001) are more comfortable placing it within Bantoid.
Evidence from comparative linguistics, oral tradition (Connell 1994; Connell & Maison
1994) and documentary material (Ardener 1968; Latham 1973) indicates that the Lower
Cross languages, together with the people that speak them, are in the process of separating
and spatially dispersing. Connell & Maison (1994) suggest the major dispersal, with
perhaps one or two earlier exceptions, began approximately 500-600 years ago. It appears
to have consisted of a general movement towards the coast from an inland-situated
homeland. Some of the available oral traditions speak of these migrations (see below),
explaining them as a response to the arrival of Europeans and a search for increased trade
opportunities. Latham (1973), citing reports of Europeans from this period, concludes that
the site that has since become Calabar was only settled after the first contact with
Europeans.
The component groupings within Bantoid, on a broad sweep, are shown in Figure 3.2.
The primary branching is of North and South Bantoid. North Bantoid is comprised of
Mambiloid, and more controversially Dakoid and Tikar[15]. South Bantoid comprises
numerous subgroups, including Bantu (made up of several hundred languages). Those in
proximity to the Cross River region include Tivoid, Grassfields, Beboid, Nyang and
Ekoid. Of Bantu itself, the Northwest group of languages (also known as ‘A’ Bantu in
Guthrie’s (1967) alpha-numeric nomenclature) is found in and adjacent to the Cross River
region.
[15] Boyd (1996) questions the inclusion of Dakoid, while Connell (2000) suggests the existence of the
division itself is questionable.
A further refinement of the linguistic classification, now widely accepted, divides Benue-
Congo into East and West branches (EB-C and WB-C). Cross River and Bantoid are both
part of EB-C. Another language grouping found partly in, but primarily to the west of the
Cross River region, is Igboid, which consists mainly of a range of Igbo dialects. Despite
the geographical proximity of Igboland to the Cross River basin, Igboid languages are
classified as WB-C, which reflects the considerable time (some thousands of years) since
the existence of a common parent (viz. Proto Benue-Congo) of Igbo on one hand and
Cross River and Bantoid on the other.
The oral traditions of the different Lower Cross groups have been examined in some
detail in Connell & Maison (1994). While movements of peoples in various directions
are indicated, they, in general, relate an expansionary movement southward in search of
trade opportunities with the newly arrived Europeans. A village or region named ‘Ibom’
is often suggested as a point of origin. There is today a village called Ibom near the Igbo
town of Arochukwu which is situated in the northwest of the Cross River area, in the
border region between Igbo and Ibibio territories. An alternative account suggests
dispersal from the Ibom Arochukwu area was a response to conflict with the expanding
Igbo speaking population. It should also be noted that most Lower Cross traditions deal
with the relatively recent past in comparison to the oral traditions discussed.
Several of the Lower Cross groups also have diverging traditions, for example of having
migrated from Cameroon. The Efik, in particular, have a variety of conflicting traditions,
which are summarised in Noah (1980). Among them is a claim of origin in, and migration
from, ancient Palestine (Akak 1986). This story tells of a migration from the Middle East
via Sudan, Chad and Benin, with stops among the Igbo and then Ibibio, and the founding
of Calabar. However some versions of the account claim no more than that the Efik are of
Igbo origin. Most of the Efik traditions have as a common thread a final stop among the
Ibibio, specifically in the Uruan area. The Hart Commission (Hart 1964) investigated
various Efik claims and essentially concluded that they were without foundation. The
report concluded:
“The last tribes among whom the Efiks might have lived were the Ibibio. If
they had lived among the Ibo [sic.] and were in fact Ibo [sic.] in origin,
there is no means ready to hand to determine the truth or falsity of this
claim of origin.”
The Efut, another group found within the boundaries of Calabar, claim an origin in a
Bantu-speaking area to the east of the Cross River estuary in Cameroon. It is claimed by
some that their language was once Londo, a Northwest Bantu language (A11 in
the nomenclature of Guthrie (1967)) which is still spoken in Cameroon (Connell 1983;
Thompson 1983), but they have since adopted Efik as their primary tongue.
The oral traditions of the Ejagham, also known as Ekoi, are less well documented. The
main body of the Ejagham population is to be found in the Upper Cross River basin and
extends southward. One Ejagham subgroup (also known as ‘Qua’ or ‘Ekin’) occupies a
part of Calabar and claims to have arrived there before the Efik, having migrated
southward from the main Ejagham area (Noah 1980). This claim is supported by the
practice, continued to the present day, in which the Efik pay tribute to the Qua (Hart
1964).
The Igbo constitute the third largest ethnic group in Nigeria, numbering approximately
18,000,000 (Ethnologue 2005). They occupy much of the southeast of the country,
forming an arc around the Cross River region. The Igbo are well known as traders and
merchants and are found in every major urban area in Nigeria, including a sizeable
population in Calabar. Many Igbo were brought to Calabar during the era of the slave
trade. In more recent times, many others have settled and established businesses.
The Igbo are a relatively diverse group and from the linguistic standpoint comprise over
20 different lects (Manfredi 1989). Their oral traditions broadly speak of a north to south
expansion (Forde & Jones 1950). This expansion may still be in progress since only in
relatively recent times has a sizeable Igbo population settled in the coastal areas of the
Niger Delta.
3.1.2. Genetics and Language
Exploration of correlations between differences among the languages of peoples and
variation in their sex-specific genetic systems has been encouraged by the representation
of both languages and genetic systems by bifurcating and multi-furcating trees. Inferences
drawn from such trees, notwithstanding that the trees themselves are frequently gross
approximations to the actual demographic processes involved, have provided interesting
insights into human history and social behaviour.
Most studies to date of the correlation between genetics and language have concentrated
on the relationship over a broad canvas, often at a continental or intercontinental scale,
with considerable emphasis on any link between long-range language dispersals and the
spread of early farmers. Rosser et al. (2000) for example found the distribution of NRY in
Europe to be associated primarily with geography rather than language and suggested that
the current European genetic landscape has been greatly influenced by the expansion of
farmers from the Near East during the Neolithic. In contrast Wood et al. (2005) found a
correlation in Africa between genetic and linguistic distances when analysing NRY, and
to a lesser extent mtDNA, with differences between Bantu-speaking and non-Bantu-
speaking groups having an especially large influence on the correlation. Other studies on
the peoples of the Americas (Zegura et al. 2004), Pacific Islands (Hurles et al. 2002) and
Siberia (Karafet et al. 2002) have also had varying degrees of success in attempting to
establish a linguistic/genetics link. More recent work has begun to examine, and find,
relationships between linguistics and DNA at a finer scale. (See for example the study of
Lansing et al. (2007) on the Sumba populations of eastern Indonesia. This found a
correlation between NRY frequencies and the level of influence of incoming farmers on
the languages of different islands.)
An advantage of analysing NRY and mtDNA in fine-scale studies where peoples are in
close geographic proximity is that both systems, being effectively single loci and of
smaller effective population size than the autosomal system, are more prone to drift.
Although it has not yet been conclusively demonstrated, given a sufficient battery of
markers (say for the NRY six microsatellites and for mtDNA 350 nucleotides of the
HVR-I region) and sufficiently large sample sizes (~50), in the absence of inter-group
gene flow or recent common origin it is likely that two groups will have significantly
different distributions of either NRY types, mtDNA types or both (Nasidze et al. 2004;
Thomas et al. 2007; Trovoada et al. 2007; Chaubey et al. 2007; Cox 2007). The NRY
frequently demonstrates a greater degree of population structuring than do other systems,
which is likely due to the practice of patrilocality (Seielstad, Minch & Cavalli-Sforza
1998). It is of course important to appreciate that failure to detect dissimilarity does not
establish identity. Other studies (see for example Chapter 2 in this thesis) have
revealed that susceptibility to drift can lead to substantial differences in the distribution of
NRY types even among classes and caste-like clans of the same ethnic identity.
Prior to this study the variation in ethnic identities, cultural practices, oral histories and
languages of the peoples of the Cross River was well known with many tongues believed
to have separated hundreds, and in some cases thousands, of years ago. It is interesting
therefore to examine whether patterns of distribution of differences in sex-specific genetic
systems among the groups are similar to those suggested by the linguistic data. The
absence of detectable differences would on the other hand suggest either that the
relationships postulated by the linguistic analysis do not reflect reality or, in the
alternative, languages, cultural practices and oral histories have all been maintained in the
face of extensive gene flow.
3.1.3. Expectations of the distribution of NRY and mtDNA variation in the Cross River
region
In this study the NRY and mtDNA in multiple well characterised groups in the
linguistically diverse Cross River region were surveyed in what is the most densely
sampled and well defined sub-Saharan African dataset collected to date from a localised
geographic area. Groups speaking six different Benue-Congo languages known to be
predominant in the Cross River region were included: Anaang, Efik, Ejagham, Igbo,
Ibibio, and Oron, and samples were collected at multiple locations and at various levels of
ethnic identity (Table 3.2).
The principal aim was to establish whether there had been substantial inter-language
group gene flow in the Cross River region, analysis for which this particular dataset was
well suited. Crude expectations of the level of gene flow between different language
groups were generated based on sociological data that were collected from each
individual who would subsequently be analysed for NRY and mtDNA genetic markers as
part of this study.
Of the 1113 males analysed in this study, 918 had fathers that spoke as their first
language one of the six languages described in the paragraph above. Of these 918, 88.2%
had mothers who spoke the same language as their first language. In the same manner 887
of the Cross River samples had mothers that spoke one of the six languages as their first
language, 89.4% of whom had fathers that spoke the same language as their first language
(see Table 3.3). While in sociological-anthropological analysis it may appear that
language is a strong factor in mate choice, in the context of population genetic theory
these figures equate to a high migration rate among language groups (treating each
language group as a distinct population). Under a very crude Wright Island model with
‘islands’ of at least 1000 individuals this migration rate of 10% would, given sufficient
time, give a Fixation Index of at most 0.002, a very low value that suggests a substantial
amount of gene flow between ‘islands’.
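The proportions of same-language parental pairs can be recomputed from the counts printed in Table 3.3. The following Python sketch assumes the per-language counts and totals exactly as they appear in that table; the dictionary and variable names are illustrative only.

```python
# Samples whose father and mother share the same first language (Table 3.3).
same_language = {"Annang": 57, "Efik": 101, "Ejagham": 115,
                 "Ibibio": 402, "Igbo": 85, "Oron": 17}

# Per-language totals from Table 3.3, grouped by father's or mother's language.
totals_by_father = {"Annang": 69, "Efik": 115, "Ejagham": 132,
                    "Ibibio": 458, "Igbo": 87, "Oron": 19}
totals_by_mother = {"Annang": 61, "Efik": 135, "Ejagham": 116,
                    "Ibibio": 436, "Igbo": 103, "Oron": 18}

shared = sum(same_language.values())
prop_by_father = shared / sum(totals_by_father.values())
prop_by_mother = shared / sum(totals_by_mother.values())

print(f"same language, fixed on father: {prop_by_father:.3f}")
print(f"same language, fixed on mother: {prop_by_mother:.3f}")
# One minus each proportion is the crude per-generation rate of
# between-language unions that the text treats as a migration rate.
print(f"between-language rate: {1 - prop_by_father:.3f}, {1 - prop_by_mother:.3f}")
```

One minus each proportion, roughly 0.11 to 0.12, is the between-language rate that the text treats as a migration rate of about 10%.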
However the sociological information on inter-group gene flow is based on data from
only the last two generations before present (samples were collected from adult males of a
wide range of ages) while the Fixation Index referred to is based on a model that assumes
a substantially longer time period. If substantial genetic structuring was observed among
Cross River language groups this would suggest that the practice of high male and/or
female gene flow is a recent phenomenon while an overall homogeneous NRY and/or
mtDNA distribution would suggest that gene flow has been maintained over a long period
despite some apparently very important cultural differences among peoples of the region.
The Cross River dataset also allowed the investigation, in a more limited way and without
any preconceived expectations, of whether, in the small geographical area of the Cross
River region, differences at other, varying, levels of grouping could be observed.
Specifically these questions were posed: a) are clan communities collected from different
locations distinguishable? b) are clans of the same language group collected from the
same location distinguishable? c) are different language groups collected from the same
location distinguishable? d) are representatives of the same language group collected
from different locations distinguishable?
The analysis was then extended to interpret the results within the broader geographical
context of West Central Africa by analysing NRY and mtDNA from groups resident in
Cameroon and Ghana (see Figure 3.1 and Table 3.2). Examination of the sociological
Table 3.2: Nigerian Cross River sample collection details.
Code | Language | Place collected / Clan or secondary affiliation | Latitude | Longitude | Total n
SOUTH EAST NIGERIA
AN-EA | Annang | Afaha Esang, Ikot Ubom / Ediene Abak | 5.050 | 7.717 | 26
AN-AO | Annang | Afaha Esang, Ikot Ubom / Afaha Obong | 5.050 | 7.717 | 37
AN-IO | Annang | Abak, Ikot Obioma, Ikot Ekpene, Ukanafun | 4.992 | 7.758 | 47
EF-EE | Efik | Eniong, Atan Ono Yom / Efut | 5.167 | 7.983 | 50
EF-INE | Efik | Ikot Nakanda, Ikot Ene / Efut | 4.908 | 8.442 | 48
EF-OEU | Efik | Oyo Efam, Ikot Abasi Obori / Uwanse | 4.950 | 8.317 | 50
EK-CA | Ejagham | Calabar Akampka | 4.950 | 8.317 | 18
EK-CC | Ejagham | Calabar Calabar | 4.950 | 8.317 | 29
EK-CI | Ejagham | Calabar Ikom | 4.950 | 8.317 | 40
EK-NA | Ejagham | Netim Akampka | 5.350 | 8.350 | 51
IB-ANMWN | Ibibio | Afaha Nsit, Mbiokporo / Western Nsit | 4.833 | 7.900 | 38
IB-EAEEUAE | Ibibio | Etebe Afaha Eket, Ekpene Ukpa / Afaha Eket | 4.717 | 7.867 | 50
IB-EUE | Ibibio | Ette Ukpom Ette | 4.620 | 7.650 | 50
IB-IAAUA | Ibibio | Ikot Akpan, Afaha Ubiom / Awa | 4.690 | 7.815 | 28
IB-IEINOI | Ibibio | Ikot Essien, Ikot Ntu / Oku-Iboku | 5.133 | 7.933 | 50
IB-IMIEI | Ibibio | Ikot Mbonde, Ikot Ekang / Itam | 5.042 | 7.842 | 50
IB-IOINO | Ibibio | Ikot Oku, Ikot Ntuenoku / Oku | 5.100 | 7.967 | 50
IB-MNENN | Ibibio | Mkpok Ndon Eyo Nnung Ndem | 4.633 | 7.850 | 50
IB-NEI | Ibibio | Ndiya Edienne Ikono | 4.783 | 7.883 | 50
IB-OII | Ibibio | Obong Itam Itam | 5.133 | 7.967 | 50
IB-ONMNI | Ibibio | Onoh, Ntan Mbat Ntan Ibiono | 5.233 | 7.933 | 50
IG-C | Igbo | Calabar | 4.950 | 8.317 | 100
OR-AO | Oron | Oron Afaha Okpo | 4.833 | 8.233 | 28
OR-ENEEAU | Oron | Eyo Nsik, Eyo Ekpe / Afaha Ukwong | 4.750 | 8.250 | 73
IG-E | Igbo | Enugu | 6.433 | 7.483 | 57
IG-N | Igbo | Nenwe | 6.117 | 7.517 | 52
CAMEROON
CA-BT | Tikar | Bankim | 6.083 | 11.500 | 34
CA-FB | Bamun | Foumban | 5.717 | 10.917 | 117
CA-WA | Aghem | Wum | 6.383 | 10.067 | 118
GHANA
GH-AEW | Akan | Enchi | 5.817 | -2.817 | 21
GH-AKE | Akan | Kibi | 6.167 | -0.550 | 51
GH-ASWW | Akan | Sefwi-Wiawso | 6.333 | -2.267 | 22
GH-FEWR | Akan | Enchi | 5.817 | -2.817 | 61
GH-EHVR | Ewe | Ho | 6.600 | 0.467 | 88
Table 3.3: First languages of parents of Cross River region samples utilised
in this study.
(a) Samples grouped by father's first language (one of the 6 Cross River languages analysed in this study)
Father's first language | Mother's first language of same samples | Number of samples
Annang | Annang | 57
Annang | Ibibio | 8
Annang | Efik | 3
Annang | Bekwara | 1
Annang Total | | 69
Efik | Efik | 101
Efik | Ibibio | 5
Efik | Annang | 1
Efik | Bekwara | 1
Efik | English | 1
Efik | Igbo | 1
Efik | Oron | 1
Efik | Tiv | 1
Efik | Ugep | 1
Efik | Umon | 1
Efik | Yoruba | 1
Efik Total | | 115
Ejagham | Ejagham | 115
Ejagham | Efik | 6
Ejagham | Ibibio | 5
Ejagham | English | 2
Ejagham | Mbembe | 2
Ejagham | Igbo | 1
Ejagham | Umon | 1
Ejagham Total | | 132
Ibibio | Ibibio | 402
Ibibio | Efik | 21
Ibibio | Igbo | 15
Ibibio | Eket | 6
Ibibio | Ijaw | 5
Ibibio | Annang | 3
Ibibio | Yoruba | 3
Ibibio | Hausa | 1
Ibibio | Nembe | 1
Ibibio | Pidgin | 1
Ibibio Total | | 458
Igbo | Igbo | 85
Igbo | Efik | 1
Igbo | Ibibio | 1
Igbo Total | | 87
Oron | Oron | 17
Oron | Yoruba | 2
Oron Total | | 19
Grand Total | | 880
Proportion of samples where both parents speak the same language, fixed on father's first language type | | 0.882
(b) Samples grouped by mother's first language (one of the 6 Cross River languages analysed in this study)
Mother's first language | Father's first language of same samples | Number of samples
Annang | Annang | 57
Annang | Ibibio | 3
Annang | Efik | 1
Annang Total | | 61
Efik | Efik | 101
Efik | Ibibio | 21
Efik | Ejagham | 6
Efik | Annang | 3
Efik | Abakpa | 1
Efik | Boki | 1
Efik | English | 1
Efik | Igbo | 1
Efik Total | | 135
Ejagham | Ejagham | 115
Ejagham | Nde | 1
Ejagham Total | | 116
Ibibio | Ibibio | 402
Ibibio | Annang | 8
Ibibio | Eket | 8
Ibibio | Efik | 5
Ibibio | Ejagham | 5
Ibibio | English | 4
Ibibio | Pidgin | 3
Ibibio | Igbo | 1
Ibibio Total | | 436
Igbo | Igbo | 85
Igbo | Ibibio | 15
Igbo | Efik | 1
Igbo | Ejagham | 1
Igbo | English | 1
Igbo Total | | 103
Oron | Oron | 17
Oron | Efik | 1
Oron Total | | 18
Grand Total | | 869
Proportion of samples where both parents speak the same language, fixed on mother's first language type | | 0.894
data showed that there were no instances where an individual from the Cross River,
Ghanaian or Cameroonian datasets had one parent from one of the three datasets and
the other parent from a different one. Under the same Wright Island model as
previously, even allowing for one migrant every generation (0.1%), a Fixation Index
of around 0.2 would be expected, a value that is consistent with substantial
inter-group isolation. Therefore observable differences among the NRY and mtDNA
profiles of these three regions would be expected.
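The two Fixation Index expectations can be illustrated with the classical equilibrium approximation for Wright's island model, FST ≈ 1 / (4Nm + 1). The chapter does not state which form of the model was applied, so the formula, the function name and the parameter names in this Python sketch are assumptions made for illustration. With 'islands' of 1000 individuals, this form gives about 0.2 for one migrant per generation, in line with the figure quoted above.

```python
def island_model_fst(island_size, migration_rate):
    """Equilibrium FST under Wright's island model, classical approximation.

    Assumes FST ~ 1 / (4 * N * m + 1); this is the standard diploid form and
    is used here only as a crude guide, as in the text.
    """
    return 1.0 / (4.0 * island_size * migration_rate + 1.0)


N = 1000  # minimum 'island' size used in the text

# Between language groups within the Cross River region (m of about 10%).
print(f"m = 0.1:   FST = {island_model_fst(N, 0.1):.4f}")

# Between regions, allowing one migrant per generation (m = 0.1%).
print(f"m = 0.001: FST = {island_model_fst(N, 0.001):.4f}")
```

Larger islands or higher migration rates only lower the expected value, which is why the within-region figure is described as an upper bound.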
Finally it was examined whether the NRY and mtDNA genetic data drawn from the Efik
Uwanse sample provided support for an origin in the Palestine of antiquity by comparing
this group to a possible source population (Israeli Arabs/Palestinians) as well as possible
contributing populations that the Efik Uwanse may have met along their proposed route
of migration (Ethiopians, Sudanese, a population from Lake Chad, Igbo speakers and
Ibibio speakers).
3.2. Materials and Methods
3.2.1. Sample Collection Procedure.
Buccal swabs were collected from males over eighteen years old, unrelated at the paternal
grandfather level, from locations in South East Nigeria as shown in Table 3.2. All buccal
swabs were collected anonymously with informed consent. Sociological data were also
collected from each individual including age, current residence, birthplace, self-declared
cultural identity, first language, second language and (when available) clan affiliation for
the individual as well as similar information on the individual’s father, mother, paternal
grandfather and maternal grandmother. The samples were classified into groups primarily
by first language spoken, then by place of collection and thirdly, when available, by clan
or some other subsidiary criterion. Where collections from a particular group were made
in more than one location (for example the Ediene Abak were collected from two
neighbouring villages: Afaha Esang and Ikot Ubom) and co-ordinate data are available
for both sites, the location is represented by the average of the site co-ordinates.
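A minimal Python sketch of this averaging is given below, together with one way such averaged co-ordinates might be turned into a geographic distance between groups. The village co-ordinates are hypothetical values chosen for illustration, and the haversine formula, the Earth radius and the function names are assumptions rather than the procedure used in this chapter. The Calabar and Enugu co-ordinates are those listed in Table 3.2.

```python
import math


def mean_location(sites):
    """Average a list of (latitude, longitude) pairs given in decimal degrees."""
    lats = [lat for lat, _ in sites]
    lons = [lon for _, lon in sites]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1 = (math.radians(v) for v in a)
    lat2, lon2 = (math.radians(v) for v in b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


# Hypothetical co-ordinates for two neighbouring collection sites.
village_sites = [(5.040, 7.700), (5.060, 7.734)]
print("group location:", mean_location(village_sites))

# Co-ordinates as listed in Table 3.2.
calabar = (4.950, 8.317)
enugu = (6.433, 7.483)
print(f"Calabar to Enugu: {haversine_km(calabar, enugu):.0f} km")
```

Simple arithmetic averaging of latitude and longitude is adequate here only because the sites concerned are neighbouring villages.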
Buccal swabs and similar sociological data as described above were also collected from
males eighteen years or older unrelated at the paternal grandfather level from the
following groups:
LC-AFα,β: Afade speakers from Lake Chad, Cameroon (n=48)
CA-BTα,β: Tikar speakers from Bankim, Cameroon (n=34)
CA-FBα,β: Bamoun speakers from Foumban, Cameroon (n=117)
CA-WAα,β: Aghem speakers from Wum, Cameroon (n=118)
ET-AAα,β: Amharic speakers from Addis Ababa, Ethiopia (n=72)
GH-AEWα,β: Twi speakers from Enchi, Ghana (n=21)
GH-AKEα,β: Twi speakers from Kibi, Ghana (n=51)
GH-ASWWα,β: Twi speakers from Sefwi Wiawso, Ghana (n=22)
GH-EHVRα,β: Ewe speakers from Ho, Ghana (n=88)
GH-FEWRα,β: Fante speakers from Enchi (n=61)
SU-KHα: Arabic speakers from Khartoum, Sudan (n=75)
SU-KAβ: Sudanese from Kassala (n=75)
IPAα: Israeli Arabs/Palestinians (n=143)
Standard phenol-chloroform DNA extractions were performed on all samples (see
Appendix C).
3.2.2. Y-chromosome typing
The NRY of all South East Nigerian samples as well as those samples in groups with the α
notation were typed in the following manner. Standard TCGA kits were used to
characterise six microsatellites (DYS19, DYS388, DYS390, DYS391, DYS392,
DYS393) and eleven biallelic Unique Event Polymorphism (UEP) markers (92R7, M9,
M13, M17, M20, SRY+465, SRY4064, SRY10831, sY81, Tat, YAP), as described by
Thomas et al. (1999). Microsatellite repeat sizes were assigned according to the
nomenclature of Kayser et al. (1997). Where necessary an additional marker, p12f2, was
typed as described by Rosser et al. (2000). NRY Haplogroups were defined by the twelve
UEP markers according to the nomenclature proposed by the Y-chromosome Consortium
(2002) (see Figure 2.4). See Chapter 2 for a discussion on the choice of UEP and
microsatellite markers used.
These multiplex UEP/microsatellite kits have already been shown to be reliable under a
wide range of conditions, consistently giving similar signal intensities across all UEPs
and microsatellites within each kit (Thomas, Bradman & Flinn 1999). Therefore any
multiplex runs that showed at least one UEP or microsatellite peak of substantially lower
intensity were repeated. Any samples that gave UEP-1 and UEP-2 results that were
incompatible with the known phylogenetic tree for the NRY were also retyped for both kits.
Microsatellite results were also analysed for outliers and homoplasy amongst UEP
haplogroups, and retyped for confirmation.
3.2.3. mtDNA typing
The mtDNA HVS-1 region of all South East Nigerian samples as well as those samples in
groups with the β notation was sequenced as described by Thomas et al. (2002) except
that primers conL1-mod, conL2 and conH3 were replaced by conL849 (CTA TCT CCC
TAA TTG AAA ACA AAA TA), conL884 (TGT CCT TGT AGT ATA A) and conHmt3
(CCA GAT GTC GGA TAC AGT TC) respectively. HVS-1 Variable Site Only (VSO)
haplotypes were determined for all samples from South East Nigeria by comparing
sequence data covering nucleotides 16020-16400 with the Cambridge Reference
Sequence (Anderson et al. 1981). Haplotypes were defined by base changes and
nucleotide positions where substitutions, insertions or deletions occurred. Tentative
mtDNA Africa-specific haplogroup classification was based on the scheme of Salas et al.
(2004). HVS-1 Variable Site Only (VSO) haplotypes were also determined for all
samples from groups with the β notation with sequence data covering nucleotides 16023-
16380. South East Nigerian HVS-1 coverage was reduced to this range during
comparisons with these groups. In addition the IPA2β (Israeli Arabs/Palestinians) mtDNA
dataset was taken from Richards et al. (2000).
Each sample's chromatogram was manually inspected for generally high levels of
background noise across its whole length of sequence. The 5ʹ and 3ʹ ends of raw
chromatograms were trimmed until at least 10 out of 15 bases at these ends had
confidence scores above 25%. The ends were then trimmed further by manually
inspecting the sequence. For each 96 sample sequencing run each position with a
proposed SNP, insert, deletion or ambiguous position was examined manually. All
samples with any ambiguous sites after manual curation were sequenced again. In
addition sequencing of samples was repeated when the forward and reverse sequences did
not match.
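The automated part of the end-trimming rule described above can be sketched in a few lines of Python. This is only an illustration and not the procedure actually used, which also relied on manual inspection. The function name trim_ends, the representation of a chromatogram as a list of per-base confidence scores and the treatment of the 25% threshold as a plain number are all assumptions.

```python
def trim_ends(scores, window=15, min_good=10, threshold=25):
    """Return (start, end) of the retained region, with end exclusive.

    Hypothetical helper: scores is a list of per-base confidence values.
    An end is accepted once at least min_good of the window bases at that
    end have a score above threshold.
    """
    def window_ok(w):
        return len(w) == window and sum(s > threshold for s in w) >= min_good

    start, end = 0, len(scores)
    # Move the 5' boundary inwards until the leading window passes.
    while end - start >= window and not window_ok(scores[start:start + window]):
        start += 1
    # Move the 3' boundary inwards until the trailing window passes.
    while end - start >= window and not window_ok(scores[end - window:end]):
        end -= 1
    if end - start < window:
        return (0, 0)  # no region of acceptable quality remains
    return (start, end)


# Example: a read with a few poor bases at each end.
example_scores = [5] * 6 + [40] * 30 + [5] * 8
print(trim_ends(example_scores))
```

Note that the rule tolerates up to five low-scoring bases inside an accepted window, which is why the text describes a further manual trimming step.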
3.2.4. Statistical and Population Genetic Analysis
Genetic diversity, h, (the probability of randomly sampling two different haplotypes in a
population) and its standard error were estimated using the unbiased formulae of Nei (1987).
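As a minimal sketch of this calculation, the Python function below computes h as n/(n-1) multiplied by one minus the sum of squared haplotype frequencies. The sampling variance is written in the form commonly given for haploid data and is assumed, not confirmed, to be the one applied here. The function name gene_diversity is hypothetical.

```python
import math
from collections import Counter


def gene_diversity(haplotypes):
    """Return (h, standard error) for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    freqs = [count / n for count in Counter(haplotypes).values()]
    sum_p2 = sum(p ** 2 for p in freqs)
    sum_p3 = sum(p ** 3 for p in freqs)
    # Unbiased estimate of the chance that two sampled haplotypes differ.
    h = n / (n - 1) * (1.0 - sum_p2)
    # Sampling variance (assumed form for a haploid system).
    variance = (2.0 / (n * (n - 1))) * (
        2.0 * (n - 2) * (sum_p3 - sum_p2 ** 2) + sum_p2 - sum_p2 ** 2
    )
    return h, math.sqrt(max(variance, 0.0))


# Example with a made-up sample dominated by one haplogroup.
sample = ["E3a"] * 87 + ["BR*"] * 8 + ["E*"] * 5
print(gene_diversity(sample))
```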
Genetic differences between pairs of populations when individuals in populations were
described by a) NRY UEP haplogroups, b) combined NRY UEP haplogroup and six
microsatellite haplotypes (UEP+MS) or c) mtDNA HVS-1 VSO haplotypes were
assessed using an Exact Test of Pairwise Population Differentiation (ETPD) with 10,000
Markov steps (Raymond & Rousset 1995; Goudet et al. 1996). This test is analogous to a
Fisher's Exact test (Lee et al. 2004) but the size of the contingency table is extended to
the number of populations being compared (two in a pairwise population comparison, two
or greater in a global test) by the total number of different haplotypes present. Due to the
complexity introduced by the sheer number of extra rows and columns, a null distribution
of tables to test against the observed data is generated using a random walk via a Markov
chain rather than by comparison to some predefined distribution such as the hypergeometric
distribution.
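The logic of such a test can be illustrated with a simplified Python sketch. Note that it swaps the Markov chain random walk for plain random permutation of population labels, so it is an analogue of the ETPD and not the implementation used in this chapter. Tables with the same margins are generated by shuffling individuals between the two populations, and the P-value is the proportion of shuffled tables that are no more probable than the observed one. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def table_log_prob(labels, haplotypes):
    """Log-probability of the population x haplotype table, up to a constant.

    With fixed row and column totals the table probability is proportional
    to 1 / product(cell count!), so only that term is needed.
    """
    cells = Counter(zip(labels, haplotypes))
    return -sum(math.lgamma(count + 1) for count in cells.values())


def exact_test_mc(pop_a, pop_b, n_perm=10000, seed=1):
    """Monte Carlo P-value for differentiation between two populations."""
    haplotypes = list(pop_a) + list(pop_b)
    labels = [0] * len(pop_a) + [1] * len(pop_b)
    rng = random.Random(seed)
    observed = table_log_prob(labels, haplotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps both sample sizes and haplotype counts fixed
        if table_log_prob(labels, haplotypes) <= observed + 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Example with made-up haplotype labels.
print(exact_test_mc(["h1"] * 12 + ["h2"] * 8, ["h1"] * 5 + ["h3"] * 15, n_perm=2000))
```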
Population Genetic Structure was estimated using Hierarchical Analysis of Molecular
Variance (AMOVA) (Excoffier, Smouse & Quattro 1992) based on a particular mutation
model (which allowed the evolutionary distance between pairs of haplotypes to be taken
into account) to generate a single Fixation Index statistic, FST, when a simple structure of
populations within a single group was defined, or three Fixation Indices, FST (the within-
population Fixation Index), FSC (the among-populations within-group Fixation Index) and
FCT (the among-group Fixation Index), when a more complex structure of populations
within multiple groups was defined. The significance of the Fixation Indices was assessed by
randomly permuting individuals (given that only haploid systems are considered) among
populations or groups of populations, depending on the Fixation Index being tested. After
each of the 10,000 rounds of permutation the Fixation Indices were recalculated to create a
null distribution.
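The simple case of populations within a single group can be sketched as follows. This is a minimal Python illustration of a one-level AMOVA with a permutation test, not the Arlequin implementation. The function names and the use of a 0/1 mismatch between haplotype labels as the squared distance are assumptions; any squared evolutionary distance could be passed in instead.

```python
import random


def mismatch(a, b):
    """Squared distance between two haplotypes: 0 if identical, else 1."""
    return 0.0 if a == b else 1.0


def phi_st(pops, dist=mismatch):
    """One-level AMOVA fixation index for a list of populations."""
    n_total = sum(len(p) for p in pops)
    n_pops = len(pops)

    def ssd(group):
        # Sum of squared deviations from pairwise squared distances.
        total = 0.0
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                total += dist(group[i], group[j])
        return total / len(group)

    ssd_total = ssd([h for p in pops for h in p])
    ssd_within = sum(ssd(p) for p in pops)
    msd_among = (ssd_total - ssd_within) / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    n0 = (n_total - sum(len(p) ** 2 for p in pops) / n_total) / (n_pops - 1)
    var_among = (msd_among - msd_within) / n0
    total_var = var_among + msd_within
    return var_among / total_var if total_var > 0 else 0.0


def permutation_p_value(pops, dist=mismatch, n_perm=1000, seed=0):
    """Share of label permutations giving an index at least as large."""
    rng = random.Random(seed)
    observed = phi_st(pops, dist)
    sizes = [len(p) for p in pops]
    pooled = [h for p in pops for h in p]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if phi_st(shuffled, dist) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Negative values of the index can arise when within-population variation exceeds that among populations, as seen for several entries in Table 3.6.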
Population pairwise genetic distances were estimated from Analysis of Molecular
Variance φST values (Excoffier, Smouse & Quattro 1992). The genetic distances used
were a) FST (Reynolds, Weir & Cockerham 1983) (when individuals in populations were
described by UEP haplogroups, UEP+MS haplotypes and mtDNA HVS-1 VSO
haplotypes), b) RST (Slatkin 1995) (when NRY were characterised by the six
microsatellites) and c) the Kimura 2-parameter model (which allows different transition
and transversion rates) with a gamma distribution value of 0.47 (K2) (Kimura 1980) (when
mtDNA was characterised by HVS-1 sequences with gaps removed). Significance of
genetic distances was assessed by permutation of individuals as described above for
testing significance of Fixation Indices. All the above was performed using Arlequin
software (Schneider, Roessli & Excoffier 2000). AMOVA is analogous to a traditional
analysis of variance (ANOVA) (Sokal & Rohlf 1994) except that it takes into account the
degree of difference between haplotypes. In addition all hypotheses are tested using
permutation analysis and so no assumption of a normal distribution is required. However
assumptions of AMOVA include that all samples are independent and randomly chosen,
that mate choice is random and that inbreeding does not occur within the populations.
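To make the K2 distance concrete, the following Python sketch computes the Kimura 2-parameter distance with gamma-distributed rates between two aligned, gap-free sequences. It assumes that the value 0.47 quoted above is the shape parameter of the gamma distribution, and it uses the standard gamma-corrected form d = (a/2)[(1-2P-Q)^(-1/a) + (1/2)(1-2Q)^(-1/a) - 3/2], where P and Q are the proportions of transitional and transversional differences. The function name is hypothetical.

```python
TRANSITIONS = frozenset([("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")])


def k2_gamma_distance(seq1, seq2, shape=0.47):
    """Gamma-corrected Kimura 2-parameter distance (upper-case A/C/G/T only)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    n = len(seq1)
    n_ts = sum(1 for a, b in zip(seq1, seq2) if (a, b) in TRANSITIONS)
    n_tv = sum(1 for a, b in zip(seq1, seq2)
               if a != b and (a, b) not in TRANSITIONS)
    p, q = n_ts / n, n_tv / n
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for this correction")
    return (shape / 2.0) * (w1 ** (-1.0 / shape) + 0.5 * w2 ** (-1.0 / shape) - 1.5)


# Example: one transition in ten sites.
print(k2_gamma_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

As the shape parameter becomes large the result approaches the uncorrected Kimura 2-parameter distance, which is a useful check on the formula.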
It should be noted that, in some instances where individuals were described by UEP+MS
haplotypes and two populations were significantly different at the 1% level using the ETPD,
the most frequently observed haplotype in each population that was not shared with the
other population was removed (as long as this haplotype was not the modal haplotype in
either or both populations). This was done to establish whether, given the overall
similarity of Cross River populations, the observed significant difference could have been
caused by overrepresentation of just one particular haplotype in each group.
The TMRCA and confidence intervals for the NRY were estimated using Y-time software
(Behar et al. 2003) (URL: http://www.ucl.ac.uk/tcga/software/index.html).
Principal Coordinates Analysis (PCO) (Gower 1966) was performed using the 'R'
statistical package (www.R-project.org) by implementing the 'cmdscale' function found
in the 'mva' package on pairwise FST matrices and visualised using MSExcel.
3.2.4.1. Phylogenetic Analysis
NRY UEP+MS haplotype and mtDNA HVS-1 haplotype FST distance matrices were
constructed using Phylip 3.67 package Gendist. The FST genetic distance used in Gendist
was that of Reynolds, Weir, and Cockerham (1983). In addition 1,000 bootstrap replicates
of the observed data were constructed by randomly sampling haplotypes with
replacement within each separate population to generate 1,000 new datasets (source code
available on request from Krishna Veeramah). FST distance matrices for these 1,000
bootstrapped replicates were generated as for the original observed data. Phylogenetic
analysis was performed on these 1,001 distance matrices using the Phylip 3.67 packages
Neighbour and Consense to create a consensus tree with internal node confidence values.
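The resampling step can be sketched in Python as below. This is not the author's source code, which is available on request as stated above; the function name and the dictionary layout (population name mapped to a list of haplotypes) are assumptions made for illustration.

```python
import random


def bootstrap_populations(populations, n_replicates=1000, seed=0):
    """Resample haplotypes with replacement within each population.

    populations maps a population name to its list of haplotypes. Each
    replicate keeps the original sample size of every population.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        replicate = {
            name: [rng.choice(haps) for _ in haps]
            for name, haps in populations.items()
        }
        replicates.append(replicate)
    return replicates


# Example with made-up haplotype labels.
data = {"pop1": ["h1", "h2", "h2", "h3"], "pop2": ["h2", "h4", "h4"]}
print(bootstrap_populations(data, n_replicates=2))
```

A distance matrix would then be computed for each replicate in the same way as for the observed data before building the consensus tree.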
NRY microsatellite phylogenetic analysis was performed using POPTREE software
written by N. Takezaki. These Neighbour Joining trees were constructed using genetic
distance matrices based on Goldstein et al.'s (1995) (δμ)² pairwise distance measure and
1,000 bootstrap datasets were created for internal node confidence values.
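For reference, the (δμ)² measure is the squared difference between the mean repeat counts of two populations at a microsatellite locus. The Python sketch below averages this quantity over loci, which is an assumption about how loci are combined and not a description of POPTREE itself. The function name is hypothetical, and each haplotype is represented as a tuple of repeat counts, one per locus.

```python
def delta_mu_squared(pop_a, pop_b):
    """Average over loci of the squared difference in mean repeat count."""
    n_loci = len(pop_a[0])
    total = 0.0
    for locus in range(n_loci):
        mean_a = sum(hap[locus] for hap in pop_a) / len(pop_a)
        mean_b = sum(hap[locus] for hap in pop_b) / len(pop_b)
        total += (mean_a - mean_b) ** 2
    return total / n_loci


# Example with two loci and made-up repeat counts.
pop1 = [(15, 12), (15, 12), (16, 12)]
pop2 = [(17, 12), (16, 13), (17, 12)]
print(delta_mu_squared(pop1, pop2))
```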
In order to generate mtDNA K2-based trees using HVS-1 sequence data a genetic
distance matrix was constructed using the average net number of substitutions measure of
Nei (1987) based on a K2 mutation model (all positions with insertions and deletions in
comparison with the reference sequence were removed from the sequence alignment prior
to distance calculations). In addition 100 bootstrap replicates of the observed data were
constructed by randomly sampling entire sequences with replacement from each separate
population and the K2-based genetic distance matrices recalculated for each matrix
(source code available on request from Krishna Veeramah). Phylogenetic analysis was
performed on these 101 distance matrices using the Phylip 3.67 packages Neighbour and
Consense to create a consensus tree with internal node confidence values.
All trees were visualised using Treeview software (Page 1996).
3.2.4.2. Mantel and Partial Mantel Tests
Mantel and Partial Mantel tests (Sokal & Rohlf 1994) were performed between genetic
distance and both geographic and linguistic distance using the 'R' package 'Vegan' which
uses the Pearson product-moment method. Significance was assessed by permuting the
rows and columns of the matrices 1,000 times.
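A simple Mantel test of this kind can be sketched in Python as follows. It is an illustration only and not the 'Vegan' implementation: the function names are hypothetical, the test is one-sided (larger correlations count as more extreme) and the Partial Mantel test is not covered. Both inputs are assumed to be square symmetric distance matrices given as lists of lists.

```python
import math
import random


def _lower_triangle(matrix, order):
    """Lower-triangle entries after reordering rows and columns together."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(1, n) for j in range(i)]


def _pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a distance matrix has no variation")
    return sxy / math.sqrt(sxx * syy)


def mantel(matrix_a, matrix_b, n_perm=1000, seed=0):
    """Return (Pearson r, permutation P-value) for two distance matrices."""
    n = len(matrix_a)
    identity = list(range(n))
    y = _lower_triangle(matrix_b, identity)
    r_obs = _pearson(_lower_triangle(matrix_a, identity), y)
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permute rows and columns of one matrix together
        if _pearson(_lower_triangle(matrix_a, order), y) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```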
Geographic distances were Great Circle distances estimated from latitude and longitude
data. Linguistic distances were constructed using the method described below.
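As an illustration of the Great Circle distances mentioned above, the haversine formula gives the distance between two points from their latitude and longitude. The Python sketch below is one common way to compute it and is not necessarily the method used here; the function name and the mean Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great Circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2)
    # Guard against rounding pushing a fractionally above 1.
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Example with made-up coordinates (degrees north and east).
print(great_circle_km(5.0, 8.0, 6.0, 10.0))
```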
Lexicostatistic similarity percentages shown in Table 3.4a were compiled using the
following sources: the pairwise values for the Lower Cross languages (Anaang, Efik,
Ibibio and Oron) were taken from Connell & Maison (1994). No lexicostatistic similarity
percentages were available for Ejagham languages in comparison with the other five
Cross River region languages. Therefore data for three other Bantoid languages, Tunen and
Mambila (which represent different branches of Bantoid spoken near the Cameroon-Nigeria
borderland) and Bobangi (a Southern Bantu language spoken in the Democratic
Republic of Congo), were used as surrogates for Ejagham, as lexicostatistic similarity
percentages had been calculated for these languages in comparisons between Efik and
Igbo as well as each other by Schadeberg (1986). The pairwise value between Akan and
Ewe is from Schadeberg (1986), with Asante, a particular dialect, representing Akan. The
pairwise value between Ekoid (the Ekoid language in question being Nkim, not
Ejagham) and Mambila is from Piron (1995b). The pairwise value between Aghem and
Tikar is from Piron (1995a). The pairwise comparisons between Tikar and Mambila and
Tikar and Tunen are from Piron (1998). No suitable lexicostatistics were available for
Foumban so its similarity to Aghem (both are Narrow Grassfields groups) was estimated
on the assumption that the similarity is larger than the average similarity between the
three Southern Bantoid languages Tunen, Tikar, Bobangi but smaller than the average
similarity between Oron and the three Lower Cross languages.
An incomplete lexicostatistic distance matrix was then calculated for the six Cross River,
three Cameroonian and two Ghanaian languages used in this study by subtracting the
lexicostatistic similarity percentages from 100% as performed by Weng and Sokal (1995),
with cells containing Ejagham pairwise comparisons found by taking the average
lexicostatistic dissimilarity for the appropriate Tunen, Mambila, Bobangi and Nkim
pairwise comparisons. Missing data in the distance matrix shown in Table 3.4a (indicated
by a question mark) were then estimated using the weighted least-square approach of
Makarenkov and Lapointe (2004) via the T-Rex software package
(http://www.labunix.uqam.ca/~makarenv/trex.html) to give the linguistic distance matrix
shown in Table 3.4b. The neighbour joining tree generated by this distance matrix (see
Figure 3.3) is of similar structure to that proposed by other sources such as the
Ethnologue (2005).
3.3. Results
3.3.1. The distribution of NRY variation
3.3.1.1. Cross River region
The twelve typed UEP markers define 14 distinct NRY haplogroups, of which eight were
observed in the Cross River dataset (n=1081). The modal haplogroup was E3a (87%)
using the nomenclature of the Y-chromosome Consortium (The Y Chromosome
Consortium 2002) (see Table 3.5). Gene diversity based on UEP haplogroups for the
entire region was 0.231±0.017 and for the individual clans ranged from 0.067 to 0.378
with a mean of 0.23 and a variance of 0.007; for individual locations it ranged from 0.117
to 0.378 with a mean of 0.25 and a variance of 0.006 and for individual language groups
it ranged from 0.188 to 0.265 with a mean of 0.229 and a variance of 0.0006. In all clans
the E3a haplogroup was modal (mean: 0.87, variance: 0.003, range: 0.77-0.97). There
were seven pairwise differences between clans (assessed using a Pairwise ETPD) at 5%
significance and none at 1% significance (see Supplementary Table 3S.1 for all ETPD
results tables). Furthermore of the seven significant pairwise comparisons none were
significant even at 5% significance when haplotypes were defined by UEP+MS. Gene
Table 3.4a: Lexicostatistic similarity percentages for various Niger-Congo languages. ‘?’ indicates no available data.
         Anaang  Efik    Ibibio  Oron    Tunen   Mambila Bobangi Aghem   Bamun   Nkim    Tikar   Igbo    Asante  Ewe
Anaang   --
Efik     83      --
Ibibio   90      90      --
Oron     70      73      71      --
Tunen    ?       29      ?       ?       --
Mambila  ?       21      ?       ?       34      --
Bobangi  ?       25      ?       ?       40      31      --
Aghem    ?       ?       ?       ?       ?       ?       ?       --
Bamun    ?       ?       ?       ?       ?       ?       ?               --
Nkim     ?       ?       ?       ?       ?       ?       ?       ?       ?
Tikar    ?       ?       ?       ?       20      20      ?       32      ?       34      --
Igbo     ?       24      ?       ?       29      20      23      ?       ?       ?       ?       --
Asante   ?       17      ?       ?       22      19      21      ?       ?       ?       ?       24      --
Ewe      ?       17      ?       ?       18      17      20      ?       ?       ?       ?       26      26      --
Table 3.4b: Lexicostatistic dissimilarity matrix for 6 Cross River languages, 3 Cameroon Grassfields languages and 2 Ghanaian languages.
         Anaang  Ibibio  Efik    Oron    Ejagham Aghem   Bamun   Tikar   Igbo    Akan    Ewe
Anaang   0.0
Ibibio   10.0    0.0
Efik     14.2    12.8    0.0
Oron     29.5    28.1    28.4    0.0
Ejagham  75.9    74.5    74.7    75.0    0.0
Aghem    75.9    74.4    74.7    75.0    68.0    0.0
Bamun    75.9    74.4    74.7    75.0    68.0    49.0    0.0
Tikar    75.9    74.4    74.7    75.0    66.0    68.0    68.0    0.0
Igbo     77.8    76.3    76.6    76.9    75.2    75.2    75.2    75.2    0.0
Akan     83.2    81.7    82.0    82.3    80.6    80.6    80.6    80.6    74.7    0.0
Ewe      83.9    82.4    82.7    82.9    81.3    81.2    81.2    81.2    75.3    74.0    0.0
Figure 3.3: Language network based on distance matrix inferred from
partial lexicostatistic matrix (Table 3.4b).
diversity based on UEP+MS haplotypes for the entire region was 0.937±0.005 and for the
individual clans ranged from 0.882 to 0.966 with a mean of 0.93 and a variance of
0.0005, for locations from 0.913 to 0.966 with a mean of 0.94 and a variance of 0.0002
and for language groups 0.919 to 0.949 with a mean of 0.94 and a variance of 0.0001. As
expected (see Materials and Methods), of the three cases where inter-clan differences
were observed using UEP+MS haplotypes at 1% significance none were maintained even
at the 5% threshold when the most frequently observed unshared haplotype from each
group in a pairwise comparison was removed. Interestingly in all clans but one the
UEP+MS modal haplotype was E3a-15-12-21-10-11-13 (mean: 0.21, variance: 0.003,
range: 0.13-0.32) (see Supplementary Table 3S.2 for all NRY data), which has been
identified as a possible signature type for the expansion of the Bantu-speaking peoples
(Thomas et al. 2000). The one clan in which it was not modal (the Ejagham Akampka
from Calabar (EK-CA)) comprised only 18 samples and its frequency was 0.11 ± SE
0.07. In pairwise comparisons using RST (see Supplementary Table 3S.3 for all genetic
distance results tables), pairwise genetic distances were not significantly different in any
clan comparisons (P>0.01). The AMOVA-based Fixation Indices at UEP, UEP+MS and
RST levels for all clans were not significant (P≥0.131; see Table 3.6 for all
AMOVA results).
3.3.1.2. Cameroon
Six haplogroups were found in the Cameroon Grassfields dataset (n=266, number of
subgroups=3) where the modal type was again E3a (90%). Gene diversity based on UEP
haplogroups for the pooled dataset was 0.189±0.032 and for the three individual groups
ranged from 0.083-0.280 with a mean of 0.20 and a variance of 0.012. In all groups the
E3a haplogroup was modal (mean: 0.89, variance: 0.003, range: 0.85-0.96). There were
two pairwise differences between groups at the 5% significance level and none at the 1%
level. However differences at the UEP+MS level in all three population pairwise
comparisons were highly significant (P<0.0001) as were all pairwise RST (P<0.0001).
Gene diversity based on UEP+MS haplotypes for the entire region was 0.946±0.005 and
for the individual clans ranged from 0.887 to 0.958 with a mean of 0.92 and a variance of
0.001. The UEP+MS modal haplotype was different in each of the three groups (CA-BT:
E3a-16-10-21-10-11-13 (Freq=0.18), CA-FB: E3a-16-12-21-10-11-16 (Freq=0.26), CA-WA:
E3a-15-12-21-10-11-15 (Freq=0.23)) while the frequency of the putative Bantu Expansion
haplotype ranged from 0.026 to 0.090 among the three groups. The AMOVA-based Fixation Index for
Table 3.5: Haplogroup proportions in Cross River, Cameroonian Grassfield and Ghanaian groups.
Columns are NRY UEP haplogroups (according to the nomenclature of the Y-chromosome Consortium (2002)).
                          P*(xR1a)  BR*(xDE,JR)  E*(xE3a)  K*(xL,N3,O2b,P)  Y*(xBR,A3b2)  DE*(xE)  A3b2  E3a   J
AN-AO                     0.00      0.03         0.00      0.00             0.00          0.00     0.00  0.97  0.00
AN-EA                     0.00      0.08         0.08      0.00             0.00          0.00     0.00  0.85  0.00
AN-IO                     0.00      0.17         0.02      0.00             0.00          0.00     0.00  0.81  0.00
EF-EE                     0.00      0.06         0.04      0.00             0.00          0.00     0.00  0.90  0.00
EF-INE                    0.02      0.19         0.02      0.00             0.00          0.00     0.00  0.77  0.00
EF-OEU                    0.00      0.10         0.00      0.00             0.02          0.00     0.00  0.88  0.00
EK-CA                     0.06      0.06         0.00      0.00             0.00          0.00     0.00  0.89  0.00
EK-CC                     0.00      0.07         0.00      0.00             0.00          0.00     0.00  0.93  0.00
EK-CI                     0.03      0.05         0.05      0.00             0.00          0.00     0.00  0.86  0.00
EK-NA                     0.02      0.08         0.04      0.00             0.00          0.00     0.00  0.86  0.00
IB-ANMWN                  0.00      0.06         0.03      0.00             0.00          0.00     0.00  0.92  0.00
IB-EAEEUAE                0.00      0.13         0.04      0.00             0.00          0.00     0.00  0.83  0.00
IB-EUE                    0.00      0.08         0.04      0.00             0.00          0.02     0.00  0.86  0.00
IB-IAAUA                  0.00      0.18         0.04      0.00             0.00          0.00     0.00  0.79  0.00
IB-IEINOI                 0.00      0.12         0.00      0.00             0.00          0.00     0.00  0.88  0.00
IB-IMIEI                  0.00      0.04         0.00      0.00             0.02          0.00     0.00  0.94  0.00
IB-IOINO                  0.00      0.04         0.06      0.00             0.00          0.00     0.00  0.90  0.00
IB-MNENN                  0.00      0.02         0.02      0.00             0.00          0.02     0.00  0.94  0.00
IB-NEI                    0.00      0.13         0.06      0.00             0.00          0.00     0.00  0.81  0.00
IB-OII                    0.02      0.08         0.04      0.02             0.00          0.02     0.00  0.80  0.02
IB-ONMNI                  0.00      0.08         0.02      0.00             0.00          0.00     0.00  0.90  0.00
IG-C                      0.00      0.07         0.01      0.00             0.01          0.01     0.00  0.90  0.00
OR-AO                     0.00      0.11         0.00      0.00             0.00          0.00     0.00  0.89  0.00
OR-ENEEAU                 0.00      0.04         0.05      0.00             0.01          0.01     0.00  0.88  0.00
Cross River Grand Total   0.00      0.08         0.03      0.00             0.00          0.00     0.00  0.87  0.00
IG-E                      0.00      0.06         0.02      0.00             0.00          0.02     0.00  0.91  0.00
IG-N                      0.00      0.04         0.00      0.00             0.00          0.00     0.00  0.96  0.00
CA-BT                     0.03      0.06         0.06      0.00             0.00          0.00     0.00  0.85  0.00
CA-FB                     0.00      0.09         0.04      0.00             0.01          0.00     0.01  0.85  0.00
CA-WA                     0.00      0.04         0.00      0.00             0.00          0.00     0.00  0.96  0.00
GH-AEW                    0.00      0.00         0.00      0.00             0.00          0.00     0.00  1.00  0.00
GH-AKE                    0.00      0.00         0.08      0.00             0.00          0.00     0.00  0.92  0.00
GH-ASWW                   0.00      0.00         0.00      0.00             0.23          0.00     0.00  0.77  0.00
GH-EHVR                   0.02      0.02         0.05      0.00             0.00          0.00     0.00  0.91  0.00
GH-FEWR                   0.03      0.00         0.02      0.00             0.02          0.00     0.00  0.93  0.00
All Populations Total     0.01      0.07         0.03      0.00             0.01          0.00     0.00  0.89  0.00
Table 3.6: Hierarchical AMOVA results of Cross River, Cameroonian and Ghanaian groups at various molecular levels.
Colour indicates significance level of Fixation Indices P-values: Yellow = 0.01<P<0.05, Orange = 0.001<P<0.01, Red =
P<0.001. Each grouping is followed, indicated by ‘n’, by the number of groups and, if applicable, the number of individual
populations analysed.
Groupings and the Fixation Index reported for each:
(1) Cross River region (n=24): FST
(2) Cameroon Grassfields (n=3): FST
(3) Ghana (n=5): FST
(4) Ibibio (n=11): FST
(5) Cross River pooled groups of language speakers (n=6): FST
(6) Cross River clans grouped by language (n=6,24): FCT
(7) Cross River clans grouped by language with 2 Igbo populations (n=6,26): FCT
(8) Cross River region + Ghana + Cameroon Grassfields (n=3,32): FCT
Each cell gives the Fixation Index followed by its P-value in parentheses. Columns are the genetic system and level of molecular resolution.
Grouping  NRY UEP FST      NRY UEP+MS FST   NRY MS RST       mtDNA HVS-1 VSO FST  mtDNA HVS-1 K2
(1)       0.000 (0.474)    0.000 (0.562)    0.004 (0.131)    0.000 (0.242)        -0.001 (0.663)
(2)       0.020 (0.034)    0.139 (<0.0001)  0.004 (<0.0001)  0.010 (0.000)        0.001 (0.351)
(3)       0.041 (0.005)    0.003 (0.212)    0.008 (0.175)    0.000 (0.374)        0.001 (0.368)
(4)       0.002 (0.355)    -0.002 (0.831)   0.004 (0.189)    0.001 (0.138)        0.000 (0.498)
(5)       -0.003 (0.876)   -0.001 (0.800)   -0.001 (0.613)   0.001 (0.100)        0.001 (0.191)
(6)       -0.004 (0.919)   -0.001 (0.778)   -0.003 (0.886)   0.001 (0.130)        0.002 (0.105)
(7)       -0.001 (0.597)   0.000 (0.475)    -0.002 (0.772)   0.000 (0.202)        0.002 (0.086)
(8)       0.006 (0.049)    0.015 (0.000)    -0.025 (0.025)   0.005 (<0.001)       0.016 (<0.001)
all three groups at the UEP level was significant at the 5% threshold, while at the
UEP+MS and RST levels the Fixation index was highly significant (P-value: < 0.0001).
3.3.1.3. Ghana
Five haplogroups were found in the Ghanaian dataset (n=242, number of sub-groups=5)
where the modal type was E3a (91%). Gene diversity based on UEP haplogroups for the
pooled dataset was 0.164±0.032 and for the five individual groups ranged from 0.000-
0.368 with a mean of 0.16 and a variance of 0.018. In all groups the E3a haplogroup was
modal (mean: 0.91, variance: 0.007, range: 0.77-1.00). There were four pairwise
differences between clans at 5% significance and three at 1% significance. Of the four
significant pairwise comparisons none were significant even at the 5% significance level
when haplotypes were defined by UEP+MS. Gene diversity based on UEP+MS
haplotypes for the pooled dataset was 0.958±0.006 and for the five individual groups
ranged from 0.933 to 0.954 with a mean of 0.94 and a variance of 0.0001. For the one
case where an inter-ethnic group pairwise difference using UEP+MS haplotypes was
observed at 1% significance (GH-EHVR:GH-FEWR) it was not maintained even at 5%
significance when the most frequently observed unshared haplotype from each group was
removed (P=0.232). The putative Bantu Expansion signature haplotype E3a-15-12-21-10-
11-13 was the UEP+MS modal haplotype in three of the five Ghanaian groups (mean:
0.16, variance: 0.0005, range: 0.14-0.18). In GH-AEW it was the co-modal haplotype
along with E3a-17-12-21-10-11-15 (Freq=0.19) while in GH-AKE it was the second most
frequently observed haplotype (Freq=0.137) following its one-step neighbour E3a-15-12-
21-10-11-14 (Freq=0.178). In pairwise comparisons using RST, genetic distances were not
significantly different in any ethnic group comparisons (P>0.01). The AMOVA-based
Fixation Index at the UEP level for all groups was significant at 1% but at the UEP+MS
and RST levels the structuring was not considered statistically significant at the 1%
threshold.
3.3.1.4. Igboland
Four haplogroups were found in the Igboland dataset (n=109, number of subgroups =2),
where the modal type was E3a (93%). Gene diversity based on UEP haplogroups for the
pooled dataset was 0.130±0.044 and for the two individual groups ranged from 0.080-
0.176. In both groups the E3a haplogroup was modal (range: 0.91-0.96). No significant
difference was found between the two groups based on UEP frequencies at 5%
significance (P=0.526). In addition no significant difference was found using UEP+MS
haplotypes at the 5% level, though the P-value was close to 0.05 (P=0.058). Gene
diversity based on UEP+MS haplotypes for the pooled dataset was 0.925±0.011 and for
the two individual groups ranged from 0.915 to 0.928. The putative Bantu Expansion
signature haplotype was the UEP+MS modal haplotype in IG-N (Freq=0.16) and was the
joint third most frequent haplotype in IG-E (Freq=0.13), where E3a-15-12-21-10-11-14
and E3a-17-12-21-10-11-14 were the co-modal haplotypes (Freq=0.15). In pairwise
comparisons using RST, the one pairwise genetic distance between IG-N and IG-E was
significantly different at the 5% threshold but not at 1% (P=0.048).
3.3.2. The distribution of mtDNA variation
Tentative mtDNA haplogroup classifications according to the nomenclature of Salas et al.
(2004) have been reported for the interest of the reader. However, because of the
difficulty of correctly predicting mtDNA haplogroups through HVS-1 sequence data
alone as described by Torrino et al. (2000) no statistical analysis has been performed at
this level and any conclusions based on this classification are at best tentative. The
subject of interest in this chapter lies in exploring the similarity and dissimilarity of
existing populations for which a predetermined phylogenetic relationship of mtDNA
types is not required.
3.3.2.1. Cross River region
363 distinct mtDNA HVS-1 haplotypes were observed in the Cross River dataset
(n=1088) (see Supplementary Table 3S.4 for all mtDNA data). Gene diversity based on
mtDNA HVS-1 haplotypes for the entire region was 0.991±0.001 and for the individual
clans ranged from 0.978 to 1.000 with a mean of 0.991 and a variance of 0.00002, for
individual locations it ranged from 0.978 to 0.997 with a mean of 0.990 and a variance of
0.00002 and for individual language groups it ranged from 0.986 to 0.992 with a mean of
0.990 and a variance of 0.00001. Of the 24 Cross River clans there were 44 haplotypes
that were modal or co-modal amongst the groups, which would be expected given the
high mtDNA haplotype diversity. However one particular haplotype, 126C-187T-189C-
223T-264T-270T-278T-293G-311C, was modal or co-modal in ten populations and its
overall frequency was the highest observed in the Cross River region (Freq=0.043). The
closest to this haplotype in frequency was 129A-209C-223T-292T-295T-311C
(Freq=0.030), which was modal or co-modal in seven populations. Four other haplotypes
were co-modal in three populations, all of which had a frequency of 0.22% or less. 205 of
the 363 haplotypes were observed only once in the dataset (4.4%, 5.4% and 1% of which
were found at varying frequencies in the Cameroonian, Ghanaian and Igboland datasets
respectively). The mean number of pairwise differences per pair of sequences per
population ranged from 8.02 to 10.99 with a mean of 9.75 and a variance of 0.492. There
were twelve population pairwise differences between clans (assessed using a Pairwise
ETPD) at 5% significance and one (IB-MNENN versus IB-IOINO) at 1%. In pairwise
comparisons using the K2 model three pairwise genetic distances were significant
(0.01>P>0.001) but the IB-MNENN/IB-IOINO comparison was not one of them. The
AMOVA-based Global Fixation Indices at the mtDNA VSO haplotype FST and mtDNA
K2 levels for all clans considered were not significant (P≥0.242).
3.3.2.2. Cameroon
In all 133 distinct mtDNA HVS-1 haplotypes were observed in the Cameroonian
Grassfield dataset (n=256). Gene diversity based on mtDNA HVS-1 haplotypes for the
entire region was 0.991±0.001 and for three groups ranged from 0.968 to 0.990 with a
mean of 0.981 and a variance of 0.0001. Each of the three groups possessed a different
modal haplotype ranging in frequency from 0.056-0.147. 78 of the 113 haplotypes were
observed only once in the dataset (19.2%, 10.3% and 6% of which were found at varying
frequencies in the Cross River, Ghanaian and Igboland datasets respectively). The mean
number of pairwise differences per pair of sequences per population ranged from 8.93 to
9.31 with a mean of 9.15 and a variance of 0.039. Two of the three population pairwise
comparisons showed highly significant differences between populations (P<0.001) using
pairwise ETPD but no significantly large genetic differences were found even at 5%
significance using K2-based genetic distances. The AMOVA-based Global Fixation
Index at the mtDNA VSO haplotype FST level was highly significant (P<0.0001) but was
not significant using a K2 model (P=0.351).
3.3.2.3. Ghana
There were 144 distinct mtDNA HVS-1 haplotypes observed in the Ghanaian dataset
(n=238). Gene diversity based on mtDNA HVS-1 haplotypes for the entire region was
0.988±0.003 and for the five groups ranged from 0.985 to 0.995 with a mean of 0.989 and
a variance of 0.00001. The 223T-278T-294T-309G-390A haplotype was modal in two of
the five groups (Freq 0.06-0.08) and was co-modal in a further one. Different modal
haplotypes were found in the other two groups. 108 of the 144 haplotypes were observed
only once in the dataset (21.3%, 7.4% and 11% of which were found at varying
frequencies in the Cross River, Cameroonian and Igboland datasets respectively). The
mean number of pairwise differences per pair of sequences per population ranged from
6.84 to 8.14 with a mean of 7.26 and a variance of 0.415. There was one population
pairwise difference between groups at 5% significance but it was not significant at the
1% threshold. In pairwise comparisons using the K2 model no pairwise genetic distances
were significantly different in any population comparison (P>0.05). The AMOVA-based
Global Fixation Indices at mtDNA VSO haplotype FST and mtDNA K2 levels were not
significant (P≥0.368).
3.3.2.4. Igboland
74 distinct mtDNA HVS-1 haplotypes were observed in the Igboland dataset (n=105).
Gene diversity based on mtDNA HVS-1 haplotypes for the entire region was 0.988±0.004
and for the two groups ranged from 0.982 to 0.991. The 172C-183C-189C-223T-320T
haplotype was modal in IG-E (Freq=0.07) and 126C-187T-189C-223T-264T-270T-278T-
293G-311C was modal in IG-N (Freq=0.12). 56 of the 74 haplotypes were observed only
once in the dataset (41.1%, 17.9% and 19.6% of which were found at varying frequencies
in the Cross River, Cameroonian and Ghanaian datasets respectively). The mean number
of pairwise differences per pair of sequences per population ranged from 9.54 to 10.00.
There was no pairwise difference between IG-N and IG-E at 5% significance while the
K2 pairwise genetic distance was also not significant (P>0.05).
A series of questions posed in the introduction that examine the Cross River data at
various levels of grouping (clan, location and language) are now addressed.
3.3.3. Are clan communities collected from different locations distinguishable?
In two cases datasets consisting of the same clan or secondary affiliation were collected
from more than one location: i) the Ejagham of Akampka from a) Calabar (EK-CA) and
b) Netim (EK-NA) and ii) the Efik Efut from a) Eniong and Atan Ono Yom (EF-EE) and
b) Ikot Nakanda and Ikot Ene (EF-INE). The Ejagham of Akampka showed no significant
differences in the NRY (UEP and UEP+MS levels assessed using the ETPD and MS level
assessed using AMOVA RST-based genetic distance) or mtDNA comparisons (HVS-1
haplotype level assessed using ETPD and HVS-1 sequence level assessed using
AMOVA-based K2 genetic distance). However the Efut did show a significant difference
between clans at the UEP+MS level (P<0.01) though, as stated earlier, this significant
difference was lost even at 5% significance when the most frequently observed unshared
haplotype from each clan was removed. No significant differences were found at any
other NRY or mtDNA level for the Efut.
3.3.4. Are different clans of the same language group collected from the same location
distinguishable?
In two cases datasets consisting of different clans (or other parallel secondary affiliations)
of the same language group were collected from the same location: from i) Afaha
Esang/Ikot Ubom the a) Annang Ediene Abak (AN-EA) and b) the Annang Afaha Obong
(AN-AO) and from ii) Calabar the a) Ejagham of Akampka (EK-CA), b) the Ejagham of
Ikom (EK-CI) and c) the Ejagham of Calabar (EK-CC). The two clans from Afaha
Esang/Ikot Ubom showed no significant differences at any NRY or mtDNA level.
However a highly significant difference was found in Calabar between the Ejagham of
Akampka and the Ejagham of Calabar (P<0.01) at the UEP+MS level though once again
this difference was lost, even at 5% significance, when the most frequently observed
unshared haplotype was removed from each group. In addition significant differences
were found between the Ejagham of Akampka and the Ejagham of Ikom at 5%
significance at the UEP+MS and RST levels but not at 1% significance. No significant
pairwise differences were found at any mtDNA level.
3.3.5. Are different language groups collected from the same location distinguishable?
There was one dataset where two different language groups were collected from the same
location: from i) Calabar a) Ejagham speakers and b) Igbo speakers. There were no
significant differences between these two language groups at any NRY level (P-value:
UEP = 0.60, UEP+MS = 0.47, RST= 0.48). An ETPD was significant at the mtDNA
haplotype level at 5% significance but not at 1% (P=0.048) and the K2 genetic distance
was not significant (P=0.16).
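The logic of a pairwise exact test on haplotype frequencies can be sketched as follows. This is a simplified Monte Carlo version that shuffles population labels, and it is an assumption made for illustration rather than the Markov chain procedure of the ETPD used in this study. The P-value is the proportion of shuffled haplotype-by-population tables that are no more probable than the observed table. All names and the default number of permutations are hypothetical.

```python
import math
import random
from collections import Counter


def table_log_weight(labels, haplotypes):
    """Log probability of the contingency table, up to a constant fixed by the margins."""
    cells = Counter(zip(labels, haplotypes))
    return -sum(math.lgamma(c + 1) for c in cells.values())


def permutation_exact_test(pop_a, pop_b, n_perm=10000, seed=0):
    """Monte Carlo P-value for differentiation between two haplotype samples."""
    rng = random.Random(seed)
    haplotypes = list(pop_a) + list(pop_b)
    labels = ["A"] * len(pop_a) + ["B"] * len(pop_b)
    observed = table_log_weight(labels, haplotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign individuals to populations at random
        if table_log_weight(labels, haplotypes) <= observed + 1e-9:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Two samples with identical haplotype composition give a P-value near 1, while two samples sharing no haplotypes give a P-value near the minimum attainable for the chosen number of permutations.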
3.3.6. Are the same language groups collected from different locations distinguishable?
In the Cross River dataset there are five language groups where samples were collected
from the same language speakers in two or more locations: the Annang (two locations),
the Efik (three locations), the Ejagham (two locations), the Ibibio (eleven locations) and
the Oron (two locations). No significant differences were observed at any NRY or
mtDNA levels for the Annang, Ejagham and Oron while the one difference in Efik
pairwise comparisons was the same one as previously described in testing for differences
between clans at different locations when examining the two Efut clans. For the 11 Ibibio
groups there were a total of 55 pairwise comparisons at each of the different levels of
analysis. At all NRY levels there were no significant differences at the 1% threshold. At
the mtDNA level there was one significant difference (P<0.01) found using pairwise
ETPD while K2-based genetic distances revealed one significant pairwise genetic
distance (P<0.01). Because of the large number of pairwise comparisons the Ibibio were
also additionally analysed using hierarchical AMOVA. The AMOVA-based Fixation
Index for the individual Ibibio clans/locations was not significant at any NRY or mtDNA
level (P-value > 0.138).
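The count of 55 pairwise comparisons for the 11 Ibibio groups, and the number of significant results that chance alone might produce, can be checked with a short calculation. The sketch treats the comparisons as independent, which pairwise comparisons sharing populations are not, so the expected counts are only a rough guide. The function names are hypothetical.

```python
from math import comb


def n_pairwise(k):
    """Number of distinct pairs among k groups: k * (k - 1) / 2."""
    return comb(k, 2)


def expected_false_positives(k, alpha):
    """Expected number of chance rejections if all pairwise tests were independent nulls."""
    return n_pairwise(k) * alpha


ibibio_pairs = n_pairwise(11)                        # 55 comparisons
chance_at_1pc = expected_false_positives(11, 0.01)   # about 0.55
chance_at_5pc = expected_false_positives(11, 0.05)   # about 2.75
```

On this rough reckoning a single significant result at the 1% threshold among 55 comparisons is close to what chance alone would give.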
3.3.7. Are speakers of the six Cross River languages distinguishable?
This section addresses the principal question posed in the introduction: do the six Cross
River language group datasets indicate any sex-specific genetic system structuring or has
gene flow among them been sufficient to prevent differences developing? Using pooled
datasets of language speakers in the Cross River region (where clans were pooled based
on their principal language) the hierarchical AMOVA-based Fixation indexes were not
significant at any NRY or mtDNA level. There were also no pairwise significant
differences between language groups at any NRY level. However at the mtDNA level two
pairwise significant differences (P<0.01) were observed using an ETPD (between the
Ejagham and Ibibio and between the Ejagham and Efik). The Ejagham and Ibibio
pairwise comparison also gave a significant (P<0.01) AMOVA-based K2 genetic
distance.
To take into account any differences among language groups that are due to differences
within language groups, each clan was analysed separately but within a framework where
clans were also grouped by the language spoken.
The AMOVA-based Fixation Indices for among-group differences (with populations (in
this case clans) grouped by language spoken; FCT is the Among-Group Fixation Index)
were not significant at any NRY or mtDNA level of analysis (P-value>0.105).
Though the Fixation Indices above indicate a lack of among-group structure between
different language-speaking clans, significant individual pairwise differences were
observed at every NRY and mtDNA level. Below is a description of the distribution of these
differences for each language group. The percentage of pairwise comparisons of
clans within each language group with clans of all other language groups that were
significantly different at the 5% threshold at each NRY and mtDNA level of analysis is
also reported. While there are possibly issues of multiple testing because of the large
number of non-independent pairwise comparisons these figures do provide a useful report
of the distribution of pairwise differences and can indicate potentially interesting patterns
and candidate outliers.
At the UEP haplogroup level five significant pairwise comparisons were observed
between clans of different language groups at the 5% level, four of which involved EF-
INE, while none were found at 1% significance. The percentage of pairwise comparisons
involving each language group with all others that were significant at the 5% threshold
was: Annang: 1.6%, Efik: 6.3%, Ejagham: 0.0%, Ibibio: 2.8%, Igbo: 0.0%, Oron: 2.3%.
At the UEP+MS level differences were found in eight pairwise comparisons at 5%
significance, five of which involved EF-EE. One difference was found at 1%
significance, that between AN-AO and IB-EAEEUAE, which was not significant even at
5% when the most frequently observed unshared haplotype was removed from each
group. The percentage of pairwise comparisons involving each language group with all
others that were significant at the 5% threshold was: Annang: 3.2%, Efik: 9.5%,
Ejagham: 3.8%, Ibibio: 4.9%, Igbo: 0.0%, Oron: 0.0%.
Twelve significant genetic distances were observed using RST at the 5% level, five of
which involved EF-INE, while none were observed at 1%. The percentage of pairwise
genetic distances involving each language group with all others that were significant at
5% was: Annang: 4.8%, Efik: 7.9%, Ejagham: 6.3%, Ibibio: 6.3%, Igbo: 4.3%, Oron:
2.3%.
At the mtDNA haplotype level seven pairwise comparisons were significantly different at
the 5% level and none at 1%. The percentage of pairwise comparisons involving each
language group with all others that were significant at the 5% threshold was: Annang:
6.3%, Efik: 0.0%, Ejagham: 1.3%, Ibibio: 4.2%, Igbo: 0.0%, Oron: 6.8%.
Thirteen significant genetic distances were observed using the K2 distance model at the
5% level, seven of which involved AN-EA, while two were observed at the 1% threshold,
both of which involved IB-ANMWN. The percentage of pairwise genetic distances
involving each language group with all others that were significant at the 5% threshold
was: Annang: 12.7%, Efik: 4.8%, Ejagham: 7.5%, Ibibio: 7.7%, Igbo: 4.3%, Oron: 2.3%.
3.3.8. Are speakers of the six Cross River languages distinguishable when two groups
from Igboland are added to the analysis?
In the Cross River region Calabar is considered a particularly cosmopolitan city where
different ethnicities reside together at an unusually high frequency for the region as a
whole. Of the six language groups considered here the Igbo would be expected to be the
most genetically distinct. Given that samples were collected from only one group of Igbo
speakers from Calabar two groups from Igboland to the west of the Cross River region
(IG-E and IG-N) were added to the inter-language group analysis to take into account the
potentially unusually high levels of inter-ethnic admixture (in comparison to the region as
a whole) that may have taken place involving Igbo from Calabar.
The AMOVA-based FCT values were slightly higher at all NRY but not mtDNA levels
when the IG-N and IG-E were grouped with the Igbo speaking group from Calabar (all
other language group structures were the same) but none of these FCT values were
significant (P-value>0.086).
In comparisons between the two Igboland groups and the Igbo from Calabar no
significant differences were found at the 5% level using UEP frequencies. Significant
differences were found between the Igbo from Calabar and both Igboland groups using
UEP+MS haplotypes at the 5% level but not at 1% (P>0.025). However when using RST
genetic distances these two pairwise comparisons were not significantly different even at
the 5% threshold (P>0.382). No significant differences were found using mtDNA
haplotype frequencies (P>0.055) but the K2-based genetic distance between the Igbo
from Calabar and IG-N was significant at the 5% level (P=0.024).
In comparisons with the non-Igbo Cross River region groups, at the UEP level IG-E
showed no significant pairwise differences even at the 5% level with Cross River clans
while IG-N showed three significant differences at the 5% level and none at the 1% level.
At the UEP+MS level IG-E showed significant differences with four populations at 5%
significance and six populations at 1% significance while IG-N showed differences with
ten populations at 5% significance and seven populations at 1% significance. At the
microsatellite level there were no significant RST genetic distances between IG-E and any
other Cross River population, even at 5% significance, while IG-N showed four
significant genetic distances at the 5% level and one at 1%.
At the mtDNA haplotype level IG-E showed two significant pairwise differences with
Cross River populations at 5% significance and none at 1%, while IG-N showed no
significant differences at 5% and one at 1%. mtDNA K2 genetic distances were not
significant in any comparisons involving IG-E while for IG-N there were four significant
K2 genetic distances at 5% and three significant genetic distances at 1%, in which all
five other language groups apart from the Oron were involved on some occasion.
As expected from AMOVA results, phylogenetic analyses of Cross River clans through
consensus neighbour joining trees at various NRY and mtDNA (Figure 3.4) levels
showed no consistent language groupings with very low internal node bootstrap values
across the trees, suggesting the various branches for each tree are somewhat
interchangeable, even in the presence of the two Igboland populations.
3.3.9. Can differences between the Cross River region and Cameroonian and Ghanaian
groups be established?
Using three pooled datasets consisting of the 24 Cross River region clans, five Ghanaian
groups and three Cameroonian groups respectively, pairwise ETPD showed significant
differences at the 1% threshold between all three datasets at all NRY and mtDNA levels,
except at the UEP level, where there was no significant difference between the Cross
River region and Cameroon. NRY RST and mtDNA K2 genetic distances were also
significant at the 1% threshold (see Table 3.7).
Figure 3.4: Consensus neighbour joining trees for Cross River populations using various methods of genetic distance for
both NRY and mtDNA. Only individual node bootstrap values over 30% are shown on the trees.
Table 3.7a: ETPD P-values (upper triangle) at various NRY and mtDNA levels for pooled Cameroonian, Ghanaian and
Nigerian datasets. Colour code is same as Table 3.6.
            NRY UEP level                 NRY UEP+ms level              mtDNA VSO level
            Cameroon  Ghana   Nigeria     Cameroon  Ghana   Nigeria     Cameroon  Ghana   Nigeria
Cameroon
Ghana       0.001                         0.000                         0.000
Nigeria     0.620     0.000               0.000     0.000               0.000     0.000
Table 3.7b: Genetic Distances (lower triangle) and P-values (upper triangle) at various NRY and mtDNA levels for pooled
Cameroonian, Ghanaian and Nigerian datasets. Colour code is same as Table 3.6.
            NRY UEP-based FST             NRY UEP+ms-based FST          NRY Microsatellite-based RST
            Cameroon  Ghana   Nigeria     Cameroon  Ghana   Nigeria     Cameroon  Ghana   Nigeria
Cameroon    *         0.045   0.312       *         0.000   0.000       *         0.000   0.000
Ghana       0.007     *       0.002       0.021     *       0.002       0.062     *       0.010
Nigeria     0.000     0.016   *           0.028     0.004   *           0.042     0.006   *

            mtDNA VSO-based FST           mtDNA VSO-based K2
            Cameroon  Ghana   Nigeria     Cameroon  Ghana   Nigeria
Cameroon    *         0.000   0.000       *         0.000   0.000
Ghana       0.007     *       0.000       0.029     *       0.000
Nigeria     0.005     0.004   *           0.014     0.015   *
To account for possible within-region differentiation the Cross River clans and Ghanaian
and Cameroonian groups were compared on a population-by-population basis but within
a framework where populations were also grouped by their country of origin.
The AMOVA-based Fixation Indices for among-group differences, FCT, (with populations
grouped by one of the three countries) were significant at the 5% threshold using NRY
UEP defined haplogroups and RST and were significant at the 1% level using UEP+MS
haplotypes and at both levels of mtDNA analysis (P-value<0.001).
The percentage of pairwise comparisons of Cameroon Grassfields populations with Cross
River clans that were significantly different at the 5% level was: for NRY UEP-based
pairwise ETPD=16.7%, for NRY UEP+MS-based pairwise ETPD =86.1% (the majority
are highly significant), for RST genetic distance=88.9% (again the majority are highly
significant), for mtDNA haplotype-based ETPD=97.2% (again the majority are highly
significant), for mtDNA K2 genetic distance= 36.1%.
The percentage of pairwise comparisons of Ghanaian populations with Cross River clans
that were significantly different at the 5% level was: for NRY UEP-based pairwise
ETPD (an analogue of Fisher's exact test) = 40.8%, for NRY UEP+MS-based pairwise ETPD =
16.7%, for RST genetic distance = 17.5%, for mtDNA haplotype-based ETPD = 44.2%, for
mtDNA K2 genetic distance = 30.8%.
Supplementary Tables 3S.1 and 3S.3 also show that at the UEP+MS, RST and mtDNA
haplotype levels (and to some extent mtDNA K2 levels) pairwise comparisons between
Ghanaian and Cameroonian populations indicated highly significant differences.
PCO plots of NRY and mtDNA genetic distances at various levels of resolution showed a
general pattern (see Figure 3.5) at all levels where the Cross River datasets clustered
together, with the Cameroonian and Ghanaian populations tending to lie on the periphery
(though when examining each Cameroonian and Ghanaian population individually some
populations were observed deep within the Cross River cluster while others are distinct
outliers).
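The construction of a PCO plot from a matrix of pairwise genetic distances can be sketched with classical (metric) multidimensional scaling. The sketch assumes a symmetric distance matrix and uses NumPy; the function name is hypothetical and the code is not the software used to produce the plots in this chapter. Negative eigenvalues, which can arise because genetic distances need not be Euclidean, are truncated to zero here.

```python
import numpy as np


def pco(dist, n_axes=2):
    """Principal coordinates from a symmetric matrix of pairwise distances."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram-like matrix.
    b = -0.5 * centring @ (d ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_axes]   # largest eigenvalues first
    vals = np.clip(eigvals[order], 0.0, None)    # drop non-Euclidean (negative) parts
    return eigvecs[:, order] * np.sqrt(vals)
```

Each row of the returned array gives the coordinates of one population, so plotting the first two columns against each other gives a two-dimensional PCO plot.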
Figure 3.5: Various PCO plots at different NRY and mtDNA analysis levels
for populations from the Cross River region, the Cameroon Grassfields and
Ghana.
3.3.9.1. Estimation of the TMRCA of individuals possessing the E3a haplogroup in the 3
West Central African regions
A crude estimate of the TMRCA of the E3a clade across all samples analysed from the Cross
River region, the Cameroon Grassfields and Ghana was obtained using Y-time, assuming a) a
star genealogy, b) a mutation rate per generation of 0.00193 (Behar et al. 2003), c) a Simple
Stepwise Mutation Model and d) that the ancestral haplotype was the Bantu signature
haplotype. The estimate was 279 generations before present, or 5580 years assuming an
inter-generation time of 20 years (95% Confidence Interval (CI) = 268 to 291 generations,
or 5360 to 5820 years, before present).
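The reasoning behind an estimate of this kind can be sketched as follows. Under a simple stepwise mutation model and a star genealogy, the expected squared difference in repeat number between a sampled haplotype and the ancestral haplotype is the mutation rate multiplied by the number of generations, so the average squared distance (ASD) divided by the mutation rate gives a point estimate of the TMRCA. The code is a minimal illustration under those assumptions, not the Y-time program itself; the function names and the toy haplotypes are hypothetical and no confidence interval is computed.

```python
def asd_tmrca(haplotypes, ancestral, mutation_rate):
    """TMRCA in generations: mean squared repeat difference per locus / mutation rate."""
    n_loci = len(ancestral)
    total = 0
    for hap in haplotypes:
        total += sum((a - b) ** 2 for a, b in zip(hap, ancestral))
    asd = total / (len(haplotypes) * n_loci)
    return asd / mutation_rate


def generations_to_years(generations, generation_time=20):
    """Convert generations to years for a given inter-generation time."""
    return generations * generation_time


# Toy microsatellite haplotypes (hypothetical repeat counts at three loci).
ancestral = (15, 12, 21)
sample = [(15, 12, 21), (16, 12, 21), (15, 12, 20), (15, 13, 21)]
toy_tmrca = asd_tmrca(sample, ancestral, mutation_rate=0.00193)

# Conversion of the reported estimate and its interval bounds.
years = [generations_to_years(g) for g in (268, 279, 291)]  # [5360, 5580, 5820]
```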
3.3.10. Are there correlations of genetic distances and geographic and linguistic
distances?
A Mantel test of correlation between genetic and linguistic distance for the Cross River
clans showed no correlation at any NRY or mtDNA level (P> 0.085) (see Table 3.8 for all
Mantel and Partial Mantel test results). A test for correlation between genetic and
linguistic distance while holding geographic distance constant resulted in a P-value using
UEP+MS haplotype-based FSTs of 0.058 (r=0.305), while genetic distances at all other
NRY and mtDNA levels of analysis gave very non-significant P-values (P-value> 0.264).
In addition no correlation was found between genetic and geographic distance at any level
(P>0.386), even when holding linguistic distance constant (P>0.091).
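The principle of a Mantel test can be illustrated with a minimal permutation sketch. It assumes two symmetric distance matrices with populations in the same order, uses the Pearson correlation of their upper triangles as the statistic, and computes a one-tailed P-value for a positive correlation by permuting the rows and columns of one matrix together. The function name, default number of permutations and use of NumPy are assumptions made for illustration and are not necessarily those of the software used in this study.

```python
import numpy as np


def mantel(x, y, n_perm=9999, seed=0):
    """Return (r, one-tailed P) for the correlation between two distance matrices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    iu = np.triu_indices(n, k=1)          # each pair of populations once
    r_obs = np.corrcoef(x[iu], y[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permute population labels of x, keeping rows and columns matched.
        r = np.corrcoef(x[np.ix_(p, p)][iu], y[iu])[0, 1]
        if r >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

Because the test is one-tailed, a negative observed correlation yields a P-value close to 1, which is the pattern seen for the negative r values in Table 3.8.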
Performing these Mantel and Partial Mantel tests for correlations between genetic,
linguistic and geographic distances but restricting the dataset to only Lower Cross
languages (therefore excluding Ejagham and Igbo) suggests no correlation at any NRY or
mtDNA level with very high P-values in all cases (P>0.271).
However expanding the Cross River dataset to include the Igboland populations does
reveal a highly significant correlation between NRY UEP+MS FSTs and linguistic
distance using a normal Mantel test (P=0.004, r=0.354) (but there is no correlation at any
other NRY or mtDNA level of analysis (P>0.157)). The correlation between NRY
UEP+MS FST and geographic distances using this same Igboland included dataset is also
close to significance (P=0.074, r=0.253). A partial Mantel test of the correlation between
NRY UEP+MS genetic and geographic distance while holding linguistic distance
constant was not significant (P>0.159, r=0.103). However a significant correlation
between NRY UEP+MS genetic and linguistic distance is still apparent when holding
geographic distance constant (P=0.034, r=0.275), though the correlation is less
pronounced.
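A partial Mantel test, in which the correlation between genetic distance and one predictor is assessed while a second predictor is held constant, can be sketched in the same way. The sketch assumes the first-order partial correlation formula applied to the upper triangles of three distance matrices, with significance assessed by permuting the genetic matrix only. This is one common permutation scheme and is an assumption here, since other schemes exist; all names are hypothetical.

```python
import numpy as np


def _upper(m):
    """Upper-triangle entries (excluding the diagonal) of a square matrix."""
    m = np.asarray(m, dtype=float)
    return m[np.triu_indices(m.shape[0], k=1)]


def partial_r(x, y, z):
    """Correlation of x and y with the linear effect of z removed from both."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_mantel(gen, pred, ctrl, n_perm=9999, seed=0):
    """Return (partial r, one-tailed P) for gen vs pred, controlling for ctrl."""
    gen = np.asarray(gen, dtype=float)
    n = gen.shape[0]
    y, z = _upper(pred), _upper(ctrl)
    r_obs = partial_r(_upper(gen), y, z)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        if partial_r(_upper(gen[np.ix_(p, p)]), y, z) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

In the terms used above, `gen` would be the NRY UEP+MS FST matrix, `pred` the linguistic distance matrix and `ctrl` the geographic distance matrix, or the latter two exchanged.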
When the 24 Cross River region populations were considered with the five Ghanaian and
three Cameroonian groups highly significant correlations were found between genetic and
linguistic distance (P<0.01) at all NRY and mtDNA levels apart from at the UEP
Table 3.8: Results of Mantel and Partial Mantel tests at different levels of
NRY and mtDNA analysis using various distance matrices. Colour code is
same as Table 3.6.
Correlation analysis type and groups utilised (rows) by genetic distance matrix type calculated (columns).
                                    NRY UEP-        NRY UEP+ms-     NRY             mtDNA VSO-      mtDNA VSO-
                                    based FST       based FST       Microsatellite- based FST       based K2
                                                                    based RST
Groups utilised                      R       P-value R       P-value R       P-value R       P-value R       P-value
Mantel: Geography
  Cross River + Cameroon + Ghana     0.458   0.001   0.123   0.212   0.235   0.060   0.300   0.035   0.432   0.001
  Nigeria (Includes IG-N and IG-E)   0.107   0.196   0.253   0.074   0.078   0.263   0.098   0.223   0.142   0.182
  Cross River                       -0.012   0.524  -0.078   0.838   0.008   0.446  -0.033   0.656   0.018   0.386
  Lower Cross                       -0.006   0.495  -0.087   0.797  -0.046   0.664  -0.141   0.910  -0.052   0.561
Mantel: Linguistics
  Cross River + Cameroon + Ghana     0.166   0.073   0.364   0.001   0.317   0.002   0.372   0.000   0.347   0.001
  Nigeria (Includes IG-N and IG-E)  -0.101   0.820   0.354   0.004   0.111   0.165   0.113   0.149   0.080   0.263
  Cross River                       -0.2243  0.980   0.270   0.085   0.067   0.298   0.074   0.261  -0.015   0.509
  Lower Cross                       -0.137   0.835  -0.164   0.887  -0.281   0.972  -0.062   0.647  -0.145   0.906
Partial Mantel: Geography controlling Linguistics
  Cross River + Cameroon + Ghana     0.455   0.001  -0.131   0.819   0.057   0.321   0.103   0.201   0.298   0.005
  Nigeria (Includes IG-N and IG-E)   0.176   0.091   0.103   0.159   0.029   0.382   0.051   0.353   0.119   0.201
  Cross River                        0.054   0.287  -0.167   0.987  -0.011   0.534  -0.056   0.761   0.023   0.365
  Lower Cross                        0.045   0.310  -0.032   0.651   0.060   0.264  -0.128   0.892  -0.135   0.893
Partial Mantel: Linguistics controlling Geography
  Cross River + Cameroon + Ghana    -0.156   0.945   0.366   0.004   0.227   0.036   0.251   0.015   0.120   0.123
  Nigeria (Includes IG-N and IG-E)  -0.173   0.964   0.275   0.034   0.084   0.227   0.076   0.246   0.014   0.398
  Cross River                       -0.230   0.980   0.305   0.058   0.067   0.314   0.087   0.264  -0.021   0.546
  Lower Cross                       -0.144   0.872  -0.143   0.845  -0.284   0.972  -0.013   0.537  -0.001   0.429
haplogroup level, which was close to significance at the 5% threshold (P>0.073). When
geographic distance was held constant the UEP-based correlation was still not significant
while the correlation using RST was only significant at the 5% threshold (P=0.036).
However the correlation using UEP+MS FSTs was still highly significant (P=0.004,
r=0.366). The correlation between mtDNA K2-based genetic distances and linguistic
distance was no longer significant (P>0.123, r=0.120) but the correlation using mtDNA
haplotype FST was still significant but to a lesser degree than previously (P=0.015,
r=0.251). Highly significant correlations were also found between genetic and geographic
distance using this geographically widespread dataset at the UEP haplogroup FST and
mtDNA K2 levels (P<0.001) while using the mtDNA FST distance produced a significant
correlation at 5% significance (P=0.035, r=0.300). The RST-based distance was almost
significant (P>0.06, r=0.235). However the significant correlation using mtDNA-based
FST was lost when linguistic distance was held constant (P>0.201, r=0.103).
3.3.11. The Origins of the Efik
Initial examination of the distribution of NRY and inferred mtDNA haplogroups (Table
3.9) revealed an extremely high frequency of African-specific types in the Efik Uwanse,
which immediately reduced the likelihood that they had a Middle Eastern origin. To
investigate this further, given the expectations set out in the introduction, the Efik
Uwanse (EF-OUE) were compared to the Ibibio and Igbo as well as the following non-
Nigerian populations: Arabe speakers from Lake Chad (LC-AF), Amharic speakers from
Ethiopia (ET-AA), Israeli and Palestinian Arabs (IPA for NRY data, IPA2 for mtDNA
data) and Sudanese (SU-KH for NRY data, SU-KA for mtDNA data) (see Supplementary
Tables 3S.5 and 3S.6 for NRY and mtDNA data). It should be noted that this is the best
comparative dataset currently available and it is not claimed that each group completely
represents its area of origin. However they are likely to possess the major genetic
signatures that the Efik might have acquired from origin or admixture in the past. The two
Efik Efut populations (EF-EE and EF-INE) who claim a separate Cameroonian origin and
have recently adopted the Efik language were, for comparison, also separately included in
the pairwise analysis.
PCO plots of pairwise genetic distances (see Figure 3.6) at all NRY levels showed the
EF-OUE to be firmly clustered with the Ibibio and Igbo populations and considerably
differentiated from the four non-Nigerian comparison populations. The genetic distances
on which these PCO plots were based showed highly significant differences between the
EF-OUE and the four non-Nigerian populations (see Supplementary Table 3S.7).
Table 3.9: NRY and mtDNA haplogroup frequencies in the Efik Uwanse.
NRY UEP Haplogroup (according to the nomenclature of the Y Chromosome Consortium (2002))
Haplogroup        EF-OUE
BR*(xDE,JR)       5
Y*(xBR,A3b2)      1
E3a               44
Total             50
Inferred mtDNA HVS-1 Haplogroup (according to the nomenclature of Salas et al. (2002))
Haplogroup        EF-OUE
L0a1              2
L1*               2
L1b               8
L1c1              1
L1c2              5
L2a               10
L2b               4
L2d               1
L3* M* N*         1
L3b               1
L3e1*             4
L3e2*             3
L3e2b             2
L3e3              1
L3e4              1
L3f               2
Total             48
The general level of genetic differentiation was less pronounced at the mtDNA level,
especially when using the K2 mutation model, but PCO plots showed the EF-OUE to still
be clustered amongst the Ibibio and Igbo populations rather than the non-Nigerian
populations. All genetic distances between the EF-OUE and the four non-Nigerian
populations at both levels of mtDNA analysis were significant at the 1% threshold except
between it and the Lake Chad dataset using the K2 distance where the genetic distance
was not significant even at the 5% threshold. The two Efik Efut groups were also
indistinguishable from the main Ibibio/Igbo cluster at all levels.
Figure 3.6: Various PCO plots at different NRY and mtDNA analysis levels
for populations from the Efik Uwanse and comparison populations.
3.4. Discussion
The main finding of this study is that the Cross River region can be genetically
differentiated, at least by the sex-specific genetic systems, from other geographically
separated regions in West Central Africa but the different ethnic groups found within the
region, which all speak different languages, cannot be distinguished in the main from
each other. This appears to fit the prior expectation that gene flow is more restricted
between geographically distant populations in comparison to populations that lie within a
common region despite the presence of significant cultural and linguistic differences.
However, despite the overall homogeneity displayed in the Cross River region,
differences were found among groups at various levels. Therefore, while at a macro-scale
differences among groups would be predicted, as the scale is reduced the populations
should homogenise, though random, unpredictable differences may arise.
3.4.1. General observations regarding NRY and mtDNA variation
The NRY and mtDNA types found in the populations included in this study are fairly
typical of those observed in West Central Africa and sub-Saharan Africa (excluding East
Africa) as a whole. As would be expected E3a (which has previously been found at high
frequencies across sub-Saharan Africa (Wood et al. 2005)) is by far the predominant UEP
haplogroup in all Cross River, Cameroonian and Ghanaian populations, suggesting a
recent common paternal ancestry of most sub-Saharan African males (again disregarding
East Africa). Much of this common ancestry appears to have been driven by expanding
Bantu-speaking farmers spreading the E3a NRY type across the continent (Underhill et
al. 2001) as evidenced by the presence of the proposed Bantu signature haplotype as the
modal type as far away as South Africa (Thomas et al. 2000). However given that none of
the groups studied here actually speak a Bantu language the effect of the Bantu expansion
on this region is likely to have been limited.
The putative Bantu signature E3a UEP+MS haplotype is found at relatively high
frequencies in the Cross River region, which, given its proximity to the proposed Bantu
homeland, suggests that the proposed signature haplotype is likely to have been well
established at high frequency in Western Central Africa prior to the start of the Bantu
expansion, while its significant presence in the Ghanaian region, where Bantu languages
are not spoken, is likely due to some other movement of peoples that either brought the
haplotype into or, as implied by Rosa et al. (2007), from West Africa. Interestingly the
Bantu signature haplotype is not the modal NRY type in any of the Cameroon Grassfields
groups, another region very close to the proposed Bantu homeland. This suggests that this
part of Cameroon was somewhat isolated from the farmers that initiated the expansion of
Bantu-speaking peoples and may have retained much of its prior genetic diversity.
The distribution of mtDNA variation was, as expected, much more diverse than for the
NRY and was also very similar to that which has previously been reported for sub-
Saharan Africa, with almost all HVS-1 haplotypes observed in the Cross River,
Cameroonian and Ghanaian datasets able to be placed, albeit tentatively, in a number of
'L' haplogroups. The major haplogroups found in 'Central' and 'West' Africa by Salas et
al. (2002) (L1a, L1b, L1c, L2a and L3e in Central Africa, L1b, L2a, L3b/d and
L3e in West Africa) all appear to be represented at appreciable frequencies amongst the
three datasets included in this study. The extremely high h values for the HVS-1 region
for all three groups were comparable to those previously observed in West Africa by
Salas et al. (2002) (mean h = 0.99) and slightly higher than those found across all of sub-
Saharan Africa (mean h = 0.97, excluding pygmies and Khoisan speakers) while the
average number of pairwise differences values were also similar to those found across
sub-Saharan Africa by Salas et al. (2002) (mean = 7.92).
3.4.2. The Cross River region as a genetically homogenous region
The results of this study showed very little sex-specific genetic differentiation at any
NRY or mtDNA level amongst the different groups of peoples living in the Cross River
region of Nigeria. As mentioned previously the vast majority of members of the different
groups of the region are likely to have shared a recent common paternal ancestor as
evidenced by the high E3a frequency observed in all populations. The main reason for the
homogeneity is likely to be that gene flow has been substantial over a long period (as is
supported for recent times by the sociological data). It is notable that the level of gene
flow mediated by men and women appears similarly high, though the data assembled here
are not directly comparable. (In fact while neither genetic system showed significant
genetic structuring, FST and FCT P-values were much closer to significance at the mtDNA
level, the opposite to what would be expected given that all the language groups
considered here are considered to comprise patrilocal communities). A major
consequence of this gene flow appears to be that there is no genetic differentiation among
the six different language groups studied, even in comparisons with the Igbo speakers of
Calabar, a group which is believed to have separated from the Lower Cross groups some
thousands of years ago. This demonstrates that major language differences can be
maintained in the presence of substantial gene flow, a finding that will be of considerable
interest to linguists working on aspects of language contact and suggests that a)
demographic history and language spoken can, in West Central Africa at least, be
independent and b) oral histories may relate more to the extant group as a cultural
construct than as an entity defined by biological ancestry.
Given the lack of genetic differentiation among the Cross River region populations it is
unsurprising that no correlation was found between either a) genetic and linguistic
distance or b) genetic and geographic distance, suggesting that gene flow has been
multi-directional. However at the UEP+MS level the addition of the two Igboland groups
did appear to result in a significant correlation between genetic and linguistic distance
even when controlling for geographic distance. In addition a number of significant
differences were found between these two Igboland groups (especially IG-N) and the
other Cross River region clans at various NRY and mtDNA levels. These were notable
when compared with the number of pairwise significant differences observed just among
the Cross River groups themselves. Differences were even identified when comparing the
Igboland groups to the Igbo from Calabar. These differences, coupled with the general
lack of differentiation within the region, appear to support the idea of the Cross River
region being a distinct genetic region, though how far this region extends, and therefore
how far the same level of gene flow extends, is unclear. As the correlation between
genetic and linguistic distance was almost significant for the Cross River region clans at
the NRY UEP+MS level but was significant when the neighbouring Igbo groups were
added to the analyses, this suggests that the groups present in Cross River region
experience a level of male-mediated gene flow that is close to the permitted limit if
linguistic difference is to be maintained.
One factor that may have contributed to the Cross River region being particularly
homogenous was its position as a major slave post, which may have led to extensive
mixing of members of different ethnic groups that would normally have had somewhat
less contact with each other as a consequence of geographic separation. This process,
which may have occurred for as long as 200 years, could have significantly increased
gene flow among speakers of different languages. This may go some way to explaining
the very high levels of both male- and female-mediated gene flow among primarily
patrilineal groups. Intriguingly some Y chromosome haplogroups that are possibly
indicative of European ancestry (P*(xR1a), J) are found at very low frequencies (less than
1%) amongst the Cross River samples. It is possible that these may have entered the
Cross River gene pool as a consequence of male introgression from slave traders. However,
neither of the two haplogroups described above is unequivocally European and further
UEP delineation would be required to truly test for the presence of this process (for
example, Haplogroup P*(xR1a) contains Haplogroup R1b, which is found amongst
Western Europeans (Zalloua et al. 2008), as well as Haplogroup R2, which is found
amongst South Asians (Kivisild et al. 2003)). A few mtDNA lineages may also
potentially demonstrate recent European ancestry but it is impossible to truly establish this
based only on HVS-1 data, and female European introgression would be unlikely, at least
with regard to the impact of the slave trade.
However, in spite of a general level of genetic homogeneity, significant differences were
observed at different resolutions of groupings, such as between clans from different
locations (Efik Efut) and between language groups from different locations. Conversely,
significant differences were not observed in other comparisons of the same type (Ejagham
of Akampa from Calabar and Netim). There appears to be no obvious pattern and these
differences are either an artefact, a consequence of multiple pairwise comparisons, or
local transient differences that gene flow will eventually extinguish (although some, at
least, may represent emerging differences which might be determined by further
anthropological fieldwork). Often the significant differences appeared at the UEP+MS
level of analysis and were lost when only one haplotype was removed from each group,
showing that at such a fine scale of analysis the differential reproductive success of just one
man in each group can potentially cause significant differentiation.
Therefore the data presented here appear to be consistent with the following demographic
model of male and female mediated gene flow in the Cross River region:
Culturally defined demes (be they defined by language or clan) are experiencing
substantial multidirectional gene flow, independent of geographic location, that has led to
a highly homogeneous meta-population (that encompasses all demes). At times genetic
differentiation may occur within the meta-population among a subset of demes as a
consequence of reproductive success of one or more individuals within one or more
demes of this subset. However within a relatively few generations the high level of
among-deme gene flow that characterises the region will cause the distribution of the sex-
specific genetic systems of the culturally distinct demes to be statistically similar.
It would be interesting in the future to estimate, via computer simulations, the range of
values that parameters such as migration rate, migratory distance, deme population size,
generation time, deme and meta-population growth rate and reproductive success can
have that is consistent with the patterns of genetic variation presented here.
However, as discussed in more detail in Chapter 5, the resolution of the majority of Y
chromosomes in this study is limited to haplogroup E3a. Further delineation of E3a
with other UEP markers has the potential to reveal more genetic structuring within the
Cross River region (even though the microsatellites also did not demonstrate any
structuring) and therefore the demographic model described above should be treated with
some caution until further work is performed.
3.4.3. Cross River, Ghana and Cameroon as genetically distinct regions
The results presented here clearly show genetic differentiation of both NRY and mtDNA
systems among the three geographically separated regions included in this study: the
Cross River region, the Grassfields of Cameroon and Ghana. This was expected as a
consequence of the large geographic distances involved, which would substantially
reduce gene flow and thus promote regional heterogeneity. The underlying genetic relationship
of the three regional groupings is evidenced by the high E3a haplogroup frequencies in all
three regions. This similarity at the UEP haplogroup level is an important contributor to
the AMOVA-based Fixation Indices being significant only at the 5% threshold. The
estimate of 5580 years before present of the TMRCA of all E3a-possessing individuals in
the three West Central African regions appears to support the scenario of E3a being
established in West Africa and expanding towards the Cameroon-Nigeria border in an
event that occurred prior to the expansion of the Bantu-speaking peoples since it does not
lie in the 3000-5000 year range previously attributed to the start of the latter migration.
Despite the crude assumptions used to generate this estimate (see footnote 16), if an
inter-generation time of 25 years were used the TMRCA would be 6,975 years before present,
which is considerably older than the 4,839 years before present given by Thomas et al. (2000) for
the TMRCA of their E3a South African Bantu Y-chromosomes using the same method
(including a 25-year inter-generation time).
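The relationship between the two TMRCA figures quoted above can be checked with a short calculation. The sketch below assumes that the estimate in years is simply the number of generations multiplied by the inter-generation time; under that assumption 6,975 years at 25 years per generation corresponds to 279 generations, and 5,580 years would then correspond to a 20-year inter-generation time. The 20-year value is an inference from the arithmetic, not a figure stated in this section, and the function name is hypothetical.

```python
def rescale_tmrca(tmrca_years, old_generation_time, new_generation_time):
    """Rescale a TMRCA in years from one inter-generation time to another.

    Assumes the estimate in years is (number of generations) x (generation time).
    """
    generations = tmrca_years / old_generation_time
    return generations * new_generation_time


# 6,975 years at 25 years per generation, re-expressed at 20 years per generation
# (the 20-year figure is an assumption inferred from the quoted values).
generations = 6975 / 25
tmrca_at_20 = rescale_tmrca(6975, 25, 20)
print(generations, tmrca_at_20)
```

Because the rescaling is linear, any uncertainty in the inter-generation time carries over proportionally to the TMRCA in years.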
The three Cameroon Grassfields groups appear to be more genetically differentiated from
the Cross River region than the five Ghanaian groups. Yet the Cameroonian groups are
closer geographically and linguistically to the Cross River region. In addition, amongst
themselves the Cameroon Grassfields groups are more genetically heterogeneous than are
the Ghanaian populations, which are relatively homogeneous despite being more
geographically disparate. The Grassfields, despite its name, is a largely highland area
made up of valleys broken up by hills and mountains (Mount Oku is located in the
Grassfields and is the second highest mountain in West Central Africa). Therefore the
greater differentiation of the three Cameroonian populations both from each other and
from the Cross River region groups is perhaps unsurprising given that the topography of the
region may have presented major physical barriers to gene flow among populations.
All three regions appear to broadly share the same NRY and mtDNA haplogroup types
and therefore share some recent common ancestry. As a consequence the genetic
differentiation between the Cameroon groups and the other two groups is not as
pronounced in UEP and mtDNA K2 genetic distance-based calculations (where the older
evolutionary relationships of NRY and mtDNA types have greater influence in the
generation of genetic distances) than in UEP+MS, MS and mtDNA FST genetic distance-
based calculations (where differences of recent origin are given equal or greater weight
than are those due to evolutionary older differences).
A significant correlation was observed between geographic and genetic distance at the
UEP and mtDNA K2 levels, but also between linguistic and genetic distance at the
UEP+MS, MS and mtDNA VSO levels. The correlations in all cases are likely driven
[Footnote 16: The 95% CI of 5360-5820 years for this TMRCA estimate is likely a substantial underestimate because
of the crude assumptions applied in the method. Therefore, while presented for the interest of the reader as a
matter of routine, it is recommended that conclusions are not drawn from these CIs with regard to
population history.]
primarily by the many pairs of small genetic and geographic distances resulting from
pairwise comparisons between the numerous Cross River groups and larger genetic and
geographic distances in pairwise comparisons of Cross River groups and non-Cross River
groups. Whether the correlation is ultimately best explained by geographic or linguistic
distance is substantially driven by the level of genetic differentiation of the Cameroonian
groups from the Cross River groups. Figure 3.3 shows that the linguistic distance matrix
records only a slightly greater distance between the Cross River languages and Ghanaian
languages than between the Cross River languages and the Cameroonian Grassfield
languages. The geographic matrix however records a much greater geographic distance
between the Cross River region and Ghana than between the Cross River region and
Cameroon. The level of Cameroonian and Cross River genetic differentiation is lowest in
the UEP- and K2-based calculations and hence genetic distances in this analysis show a
better fit with the geographic matrix, while the level of Cameroonian and Cross River
genetic differentiation is highest at the UEP+MS, MS and mtDNA VSO levels and thus
the genetic distances show a better fit with the linguistic distance matrix.
While these results show that both geographical location and language spoken are likely
to have impacted on the pattern of genetic diversity observed among and within the three
West Central African regions, the high level of heterogeneity in Cameroon demonstrates
the additional major influence of topography on this diversity. Therefore attempting to
interpret the significant correlations observed between genetic distance and linguistic
distance in some cases, and geographic distance in others, is probably an over-simplistic
approach that will lead to explanations that do not truly take into account the complex
demographic processes involved at such a fine geographical scale and thus will be of little
value. Both factors may, of course, be involved and applying other analytical techniques
such as multiple regression analysis (Lichstein 2007) may allow, in the future, the relative
contributions (Freckleton 2002) of geographic and language separation to sex-specific
differentiation among the datasets in this study to be established. Indeed the analyses
undertaken to date may be interpreted as indicating that over longer timescales
geographic distance has played a larger part in genetic isolation while languages currently
spoken have, to the extent that there is genetic differentiation, played their part in more
recent times. Ease of movement over the landscape, e.g. travel time on foot, is clearly a
better measure than raw distance. (Such an approach would be similar to that used on a
larger scale by Prugnolle et al. (2005).) However, at this time such measurements are not
available and further work will be required to generate them. Better definition of the
linguistic matrix may also improve the analyses somewhat. (The matrix used here has a
fair measure of uncertainty attached to it, not least because some distances have been
estimated based on approximate lexicostatistics derived from figures for
phylolinguistically similar languages while others were inferred by other crude means.)
3.4.4. No genetic evidence that the Efik Uwanse have an origin in ancient Palestine
Analysis of the NRY and mtDNA profiles of the Efik Uwanse and comparisons to other
groups showed little or no evidence of a Palestinian origin for the group. PCO plots of
genetic distance clearly showed the Efik to be genetically similar, at least with regard to
the sex-specific systems, to the Ibibio and possibly the Igbo, though it was difficult to
differentiate between the contributions of these two Nigerian groups. However the almost
complete lack of sharing of NRY and mtDNA types found in the Efik Uwanse with
possible founder and source populations from Palestine, and along the proposed route to
present day Calabar, argues against admixture with any of these populations. This is
consistent with the findings of the Hart report (Hart 1964). Given the homogeneity of the
region it is not clear that the data support an Ibibio origin but such an origin is likely
given that all oral traditions, even those that claim an original eastern origin, record that
in the past the Efik lived amongst the Ibibio.
3.5. Conclusion
This study demonstrates the value of having dense sampling strategies and DNA of
known and detailed provenance, when at all possible, in studies of the distribution of
human genetic diversity in sub-Saharan Africa. There has, unfortunately, been a tendency
in some studies to use a limited number of sample sets, often of small size and undeclared
origin and relationships. This study has utilised a large, sociologically well defined,
dataset of a total of 1113 males collected from the Cross River region of Nigeria, an area
of some 7,000 km². A recent Y-chromosome study of Guinea-Bissau by Rosa et al. (2007)
analysed 282 samples to represent different ethnic groups from across the country, which
has a total area of some 35,000 km². That study stated that its sample set “extends
significantly the Y-chromosomal coverage of West African populations … both in size
and number of surveyed ethnic groups”. Whilst the findings of this particular paper are
not disputed here, given the paucity of information regarding genetic variation at a
fine scale across sub-Saharan Africa as a whole there is no reason to believe that sample sizes of
the magnitude previously used are large and varied enough to permit genetic analysis to
make a significant contribution to answering the many complex questions likely to be
encountered in the course of unravelling demographic histories of specific African
ethnicities. Similarly, one must be careful about extrapolating to the rest of sub-Saharan
Africa or even to West Central Africa as a whole the conclusions drawn from this study
of the Cross River region. The Cross River region is in close proximity to the proposed
place of origin of the expansion of the Bantu-speaking peoples but is not part of it.
Therefore it may contain genetic characteristics that are atypical when viewed in a wider
geographical context.
In summation it has been shown that major cultural and language differences among
individuals and groups in West Central Africa can be maintained even in the presence of
substantial male and female mediated gene flow. Gene flow was inferred to be reduced as
geographic and linguistic distance among populations was increased, resulting in genetic
differentiation among neighbouring regions in sub-Saharan Africa. However it is likely
that much more complex processes are at work in these regions than are revealed by the
somewhat simplistic population genetic models used in this study. The value of well
defined datasets collected at a fine geographic scale as previously called for by
anthropologists and linguists (MacEachern 2000) working in Africa has been
demonstrated. Given the interesting similarities and differences observed among
culturally distinct groups living in close proximity revealed in this study, the undertaking
of further genetic surveys elsewhere in sub-Saharan Africa utilising in depth sampling
strategies and more advanced analysis should be encouraged.
3.6. Supplementary Section for Chapter 3
Because of their large size, Supplementary Tables 3S.1, 3S.2, 3S.3, 3S.4, 3S.5, 3S.6
and 3S.7 are provided on the attached CD-ROM.
Chapter 4: The Potentially Deleterious Functional Variant FMO2*1 Is At High Frequency Throughout Sub-Saharan Africa
4. The potentially deleterious functional variant FMO2*1 is at high frequency throughout sub-Saharan Africa
4.1. Introduction
Flavin-containing Monooxygenases (FMOs, EC 1.14.13.8) catalyze the NADPH-dependent
oxidative metabolism of a variety of foreign chemicals that contain, as their site of
oxidation, a soft nucleophilic heteroatom, such as nitrogen, phosphorus, sulphur or
selenium (Cashman 2000; Krueger & Williams 2005). Substrates include therapeutic
drugs, dietary-derived compounds and environmental pollutants.
Humans possess five functional FMO genes, designated FMO1 to FMO5 (Lawton et al.
1994; Phillips et al. 1995; Hernandez et al. 2004). All but the FMO5 gene are present
within a 220-kb cluster on chromosome 1q24.3 (Hernandez et al. 2004). FMO5 is located
~26Mb closer to the centromere at 1q21.1 (Hernandez et al. 2004). A sixth gene, FMO6,
present within the cluster, does not produce a correctly spliced mRNA and thus appears to
be a pseudogene (Hines et al. 2002). A second FMO gene cluster, containing five
pseudogenes, FMO7P to FMO11P, is located ~4Mb centromeric of the FMO gene cluster
(Hernandez et al. 2004).
4.1.1. Previous work on Flavin-containing Monooxygenase 2
In most mammals, including non-human primates, FMO2 is the major isoform expressed
in the lung (Phillips et al. 1995; Yueh, Krueger & Williams 1997; Dolphin et al. 1998;
Krueger et al. 2001; Janmohamed et al. 2004). In humans, a single-nucleotide polymorphism
(SNP) in exon 9 (g.23238C>T, dbSNP #rs6661174) has been identified that converts a glutamine
codon at position 472 to a stop codon (Q472X), resulting in the production of a truncated
polypeptide that is functionally inactive (Dolphin et al. 1998). In
populations of European (n=79) and Asian (n=118) origin all individuals tested have been
found to be homozygous for this allele (FMO2*2A) (Dolphin et al. 1998; Whetstine et al.
2000). However, an allele, FMO2*1, that has previously been shown to encode a full-
length, functionally active protein (Dolphin et al. 1998; Krueger et al. 2002) has been
found in African-Americans (26%, n=180) (Dolphin et al. 1998; Whetstine et al. 2000;
Furnes et al. 2003) and Hispanics (2-7%, n=280 and 327; see footnote 17) (Krueger et al. 2004).
Substrates of human FMO2 include thioether-containing organophosphate pesticides,
such as phorate and disulfoton (Henderson et al. 2004a). In this case, products of the
FMO2-catalyzed reaction are substantially less toxic than the parent compounds (Neal &
Halpert 1982) and thus the enzyme has a protective role. However, FMO2 has also been
shown to catalyze S-oxygenation of thiourea and some of its derivatives (Henderson et al.
2004b), producing sulfenic and/or sulfinic acid metabolites, which are more toxic than the
parent compound (Neal & Halpert 1982). Sulfenic acid derivatives of thioureas can
deplete glutathione, leading to oxidative stress (Krieter et al. 1984); they can also bind to
sulphydryl groups on proteins and thus may directly perturb cell function (Onderwater et
al. 1999). Thus, if exposed to thiourea or its derivatives, individuals who possess an
FMO2*1 allele are predicted to be at increased risk of pulmonary toxicity. With an
estimated global production of 10,000 tonnes (CICADA2003), thioureas are present in a
wide range of industrial, household and medical products and, consequently, exposure to
these chemicals is widespread.
FMOs are also involved in the metabolism of therapeutic drugs, including several that are
used to treat multidrug-resistant tuberculosis (Vannelli, Dykman & Ortiz de Montellano
2002; Fraaije et al. 2004; Qian & Ortiz de Montellano 2006), which is a major health
problem in Africa, with an estimated 544,000 deaths in 2005
[http://www.who.int/mediacentre/factsheets/fs104/en/]. There is evidence that at least one
of these drugs, ethionamide (ETA), is a substrate for human FMO2 (Krueger & Williams
2005), but it is not known whether metabolism of the drug by FMO2 will increase or
decrease its efficacy or toxicity.
[Footnote 17: According to the authors of the study, 'Hispanic' in this case referred to individuals of Mexican or Puerto
Rican descent.]
4.1.2. The rationale for studying FMO2 in Africans
It has been shown previously that most African Americans have a significant European
contribution to their ancestry (~4 to ~30% (Reed 1969; Parra et al. 1998; Destro-Bisol et
al. 1999)) so it is likely that a functional FMO2 will be found at an even higher incidence
in sub-Saharan Africans than in African Americans. Since this may be important in regard
to drug efficacy and public safety the distribution of the FMO2*1 and FMO2*2A alleles
in multiple populations across Africa was assessed. Samples from the Middle East
(Turkey and Yemen) were also characterised to determine whether the FMO2*1 allele
was present at appreciable frequencies in populations outside but close to Africa.
In addition, the Long-Range Haplotype test (Sabeti et al. 2002b), which examines the
level of allele-specific haplotype linkage disequilibrium (LD), was used to analyse data
from the International HapMap project for evidence of positive selection at the
g.23238C>T SNP. Sequence data from the NIEHS SNP program were used to a)
examine the haplotype backgrounds of the two g.23238C>T alleles and b) estimate the
time of origin of the FMO2*2A allele. Together these analyses provide preliminary insights into the
evolutionary history and future of the FMO2 enzyme.
4.2. Materials and Methods
4.2.1. Sample Collection
DNA samples were prepared from buccal swabs from a sample of males over eighteen
years old unrelated at the paternal grandfather level from the following locations in and
around Africa: Algeria-Mostaganem (n=43), Algeria-Port Say (n=118), Cameroon-Mayo
Darle (n=119), Cameroon-Lake Chad (n=76), Ethiopia-Gambella (n=106), Ethiopia-
Addis Ababa (n=24), Ethiopia-Borena (and surrounding area) Wollo (n=36), Ethiopia-
Dessie (and surrounding area) Wollo (n=26), Ghana-Sandema (n=90), Ghana-Navrongo
(n=45), Malawi-Lilongwe (n=144), Malawi-Mangochi (n=60), Malawi-Mzuzu (n=56),
Morocco-Ifrane (n=70), Mozambique-Sena (n=84), Nigeria-Calabar (n=88), Senegal-
southern region (n=94), Senegal-Dakar (n=95), South Africa-Pretoria (n=41), Sudan-
northern region (n=136), Sudan-southern region (n=126), Tanzania-Kilimanjaro (n=50),
Turkey-East Anatolia (n=31), Turkey-West Anatolia (n=28), Uganda-Ssese Islands
(n=39), Yemen-Sena (n=34), Yemen-Hadramaut region (n=83), Zimbabwe-Mposi
(n=34). All samples were collected anonymously with informed consent. Sociological
data, including age, current residence, birthplace, self-declared cultural identity and
religion of the individual and of the individual's father, mother, paternal grandfather and
maternal grandmother were also collected. In addition, the African populations sampled
were grouped into four geographic regions (North Africa-NA, West Africa-WA, Central
East Africa-CEA, South East Africa-SEA), as delineated in Table 4.1. The two Anatolian
Turkish samples were considered to be from a single region (TU), as were the two
Yemeni samples (YE).
4.2.2. g.23238C>T typing
A 68-bp region containing the g.23238C>T SNP was amplified by PCR using the primers
FMO2-1414-UM (5ʹ-TGG CTG TGA GAC TCT ATT TCG GAC CCT GCA ACT CCG
A-3ʹ) and FMO2-1414-LM (5ʹ-CCA TTG CCC AGG CCC AAC CAG GCG ATA TT-
3ʹ). Each primer contained a single mismatch to its target sequence at the 3ʹ-end
penultimate nucleotide (underlined). The design of the primers was such that the
amplification product would contain recognition sites for the restriction endonucleases
MboI (GATC), if the target sequence contained a C at position 23238, and MseI (TTAA),
if the target sequence contained a T at position 23238.
DNA was amplified in 10-µl reaction volumes containing 0.4 µM of each primer, 0.13
units Taq DNA polymerase (HT Biotech, Cambridge, UK), 9.3 nM TaqStartTM
monoclonal antibody (BD Biosciences Clontech, Oxford, UK), 200 µM dNTPs and
reaction buffer supplied with the Taq polymerase. The cycling parameters were: 5 min of
pre-incubation at 93ºC, followed by 37 cycles of 93ºC for 1 min, 55ºC for 1 min and 72ºC
for 1 min.
The resultant PCR product was used for two independent, complementary restriction
endonuclease (RE) digestions that each targeted one of the two introduced RE sites (see
Figure 4.1). RE digestions were carried out in 10-µl volumes containing 4 µl of PCR
product, 0.7 units RE (MboI or MseI), BSA and reaction buffer according to the
supplier's recommendations (New England Biolabs, Hitchin, UK). All reactions were
incubated overnight at 37°C. After RE digestion DNA fragments were resolved by
electrophoresis through a 3.5% agarose gel. When full-length PCR product is digested
with MboI, FMO2*1 alleles are cleaved, resulting in two fragments of length 35bp and
33bp, respectively. When full-length PCR product is digested with MseI, FMO2*2A
alleles are cleaved, resulting in two fragments of length 38bp and 30bp, respectively. The
gel-banding patterns observed for the two assays and associated genotypes are shown in
Figure 4.2. Samples where the genotype had already been determined by a previous
laboratory (Professor Ian Phillip, School of Biological and Chemical Sciences, Queen Mary,
University of London) using alternate methodologies (sequencing and allele-specific PCR)
were used to test the assay described above.
Figure 4.1: Diagrammatic representation of 23238C>T SNP restriction
enzyme assay.
Figure 4.2: #rs6661174 MboI/MseI Complementary Restriction Enzyme
Digest Banding Patterns.
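The logic of the complementary digests described in this section can be summarised as a lookup from banding pattern to genotype. The following is a minimal sketch, assuming that bands are scored by their nominal sizes (68 bp uncut, 35 bp and 33 bp after MboI cleavage of FMO2*1, 38 bp and 30 bp after MseI cleavage of FMO2*2A) and that a sample is only called when both digests agree. The function and variable names are hypothetical and are not taken from the laboratory protocol.

```python
AMPLICON_BP = 68

# Fragment sizes (bp) produced from each allele by each enzyme, as given in the text.
ALLELE_FRAGMENTS = {
    "FMO2*1": {"MboI": (35, 33), "MseI": (AMPLICON_BP,)},
    "FMO2*2A": {"MboI": (AMPLICON_BP,), "MseI": (38, 30)},
}

GENOTYPES = [
    ("FMO2*1", "FMO2*1"),
    ("FMO2*1", "FMO2*2A"),
    ("FMO2*2A", "FMO2*2A"),
]


def expected_bands(genotype, enzyme):
    """Return the band sizes expected for a genotype in one digest, largest first."""
    sizes = set()
    for allele in genotype:
        sizes.update(ALLELE_FRAGMENTS[allele][enzyme])
    return sorted(sizes, reverse=True)


def call_genotype(mboi_bands, msei_bands):
    """Return the genotype consistent with both digests, or None if they disagree."""
    observed = (
        sorted(set(mboi_bands), reverse=True),
        sorted(set(msei_bands), reverse=True),
    )
    for genotype in GENOTYPES:
        expected = (expected_bands(genotype, "MboI"), expected_bands(genotype, "MseI"))
        if observed == expected:
            return genotype
    return None


print(call_genotype([68, 35, 33], [68, 38, 30]))
```

Requiring agreement between the two digests means that an incomplete digestion in one assay yields no call rather than a wrong one, which is the practical benefit of running the assays as a complementary pair.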
4.2.3. Statistical and Population Genetic Analysis
Tests for departure of observed genotype frequencies from those expected under Hardy-
Weinberg equilibrium (Guo & Thompson 1992) were performed using Arlequin software
(Schneider, Roessli & Excoffier 2000). Pairwise FST values were estimated from
AMOVA ΦST values (Reynolds, Weir & Cockerham 1983).
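For a two-allele locus such as g.23238C>T, the idea behind an exact test of Hardy-Weinberg proportions can be illustrated by enumerating every possible number of heterozygotes given the observed allele counts. The sketch below does this by full enumeration; it is not the Markov-chain procedure of Guo & Thompson (1992) as implemented in Arlequin, and the function names and genotype counts are hypothetical.

```python
from math import exp, lgamma, log


def _log_factorial(k):
    """Natural log of k! computed via the log-gamma function."""
    return lgamma(k + 1)


def het_probability(n_het, n_a, n_b):
    """P(n_het heterozygotes | allele counts n_a and n_b) under Hardy-Weinberg."""
    n = (n_a + n_b) // 2
    n_aa = (n_a - n_het) // 2
    n_bb = (n_b - n_het) // 2
    log_p = (
        _log_factorial(n)
        - _log_factorial(n_aa) - _log_factorial(n_het) - _log_factorial(n_bb)
        + n_het * log(2)
        + _log_factorial(n_a) + _log_factorial(n_b)
        - _log_factorial(2 * n)
    )
    return exp(log_p)


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact p-value: total probability of outcomes no more likely than the observed one."""
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n_bb + n_ab
    observed = het_probability(n_ab, n_a, n_b)
    p_value = 0.0
    # The heterozygote count must have the same parity as the allele counts.
    for n_het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        prob = het_probability(n_het, n_a, n_b)
        if prob <= observed * (1 + 1e-9):
            p_value += prob
    return min(p_value, 1.0)


# Hypothetical genotype counts: 50 homozygotes of each type and no heterozygotes.
print(hwe_exact_p(50, 0, 50))
```

Full enumeration is only practical for two alleles; for multi-allelic loci the number of genotype configurations grows quickly, which is why sampling-based procedures are used there.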
Logistic Regression analysis was performed to evaluate the differences in the FMO2*1
allele frequency among subgroups within regions and among regions in which the
subgroups had similar allele frequencies. This was undertaken by first testing for fit of the
subgroup frequencies to a model which allowed only for regional differences in the
FMO2 allele frequencies. Pearson chi square tests were subsequently performed to test
for overall heterogeneity within individual regions. If significant heterogeneity was found
in a region, further pairwise comparisons of the subgroups within the region were made
by Fisher Exact tests. For logistic regression analysis and post hoc region and subgroup
comparisons, individuals were categorised into two groups on the basis of whether or not
(Y = 0, 1) they possessed at least one FMO2*1 allele (in this way the sample size equalled
the number of individuals studied, n, rather than the number of chromosomes, 2n, thus
ensuring that the observations were truly independent). This analysis assumes that
individuals in populations are unrelated.
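The coding of individuals as carriers or non-carriers, and a pairwise comparison of two subgroups on that basis, can be sketched as follows. This is an illustration only: the two-sided Fisher exact test is written out from the hypergeometric distribution, the counts are hypothetical, and the function names are not those of any software used in this study.

```python
from math import comb


def carrier_status(genotype):
    """Return Y = 1 if the genotype includes at least one FMO2*1 allele, else Y = 0."""
    return int("FMO2*1" in genotype)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are subgroups; columns are carriers and non-carriers.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first subgroup.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1) if prob(x) <= observed * (1 + 1e-9)
    )
    return min(p_value, 1.0)


# Hypothetical subgroups: 3 carriers of 3 individuals versus 0 carriers of 3 individuals.
print(carrier_status(("FMO2*1", "FMO2*2A")), fisher_exact_two_sided(3, 0, 0, 3))
```

Counting individuals rather than chromosomes, as in the text, keeps each row of the table to one observation per person.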
Principal coordinates analysis (PCO) was performed, using GENSTAT5 software, on
pairwise similarity matrices. Here similarity was quantified as being equal to the value of
the genetic distance subtracted from 1.0 (1-FST). Values along the main diagonal,
representing the similarity of each population sample to itself, were calculated from the
estimated genetic distance between two copies of the same sample. For AMOVA-based
FST distances, the resulting similarity of a sample to itself simplifies to n/(n–1).
A Mantel test for the correlation between a matrix of pairwise FST values and a
corresponding matrix of pairwise geographic distances was performed within the R
programming environment, using routines found in the APE package.
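The principle of a Mantel test can be illustrated with a short permutation sketch. The version below uses the Pearson correlation between the off-diagonal elements of the two matrices as the test statistic and permutes the population labels of one matrix; it is written in Python for illustration, is not a reproduction of the APE routine used for the analyses, and uses hypothetical distance values.

```python
import random
from itertools import combinations
from math import sqrt


def _upper_triangle(matrix, order):
    """Off-diagonal values of a symmetric matrix after relabelling populations by `order`."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=0):
    """Return (observed r, one-tailed permutation p-value) for two distance matrices."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo_values = _upper_triangle(geographic, identity)
    r_observed = _pearson(_upper_triangle(genetic, identity), geo_values)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of one matrix together
        if _pearson(_upper_triangle(genetic, order), geo_values) >= r_observed - 1e-12:
            hits += 1
    return r_observed, hits / (permutations + 1)


# Hypothetical example: four populations on a line, genetic distance proportional to geography.
geographic = [[abs(i - j) * 100.0 for j in range(4)] for i in range(4)]
genetic = [[abs(i - j) * 0.01 for j in range(4)] for i in range(4)]
print(mantel(genetic, geographic))
```

Permuting whole rows and columns together, rather than individual cells, preserves the dependence among the pairwise distances that share a population.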
Spatial autocorrelation analysis was performed using AIDA software. Ten distance
classes were set so that the number of pairwise comparisons of chromosomes was as
similar as possible across classes, and II and cc indices (Bertorelle &
Barbujani 1995), analogous to Moran's I and Geary's c, respectively, were calculated
within each class. Graphs were plotted using Microsoft Excel.
Genetic boundary analysis was performed as described by Barbujani et al. (1989). Surface
interpolation was performed on observed FMO2*1 allele frequencies using SURFACE,
part of the Generic Mapping Tools package (Wessel & Smith 1998), to create a grid
(surface) of estimated allele frequencies every 0.5° latitude and 0.5° longitude over a
region covering Central East Africa (Latitude: 6-17° N, Longitude: 27-41° E). A vector
consisting of the measures Average Value of Absolute Magnitude (AVMA) and Average
Direction (AD) was calculated from the centre of all 0.5° longitude by 0.5° latitude
regions (termed pixels) across the entire surface. Major genetic boundaries were found
using 'criterion 2' of Barbujani et al. (1989). The highest decile described by Barbujani et
al. (1989) was replaced with the highest 5% of AVMAs and the second highest decile was
replaced with the next highest 5% of AVMAs. This analysis was performed using
software developed at TCGA (Python code available on request from Krishna Veeramah).
A contour map (tension factor = 0.25) of estimated FMO2*1 allele frequencies for this
region was created using Generic Mapping Tools software (Wessel & Smith 1998) and
proposed boundaries plotted onto this map.
4.2.3.1. The Long-Range Haplotype Test
The Long-Range Haplotype test (Sabeti et al. 2002b) involves calculating the extended
haplotype homozygosity (EHH) statistic at some pre-defined distance from a core region.
When the core region is a binary SNP, the EHH for an allele x at the core SNP is the
probability that two randomly chosen samples from a population of individuals with x
have the same SNP-based haplotype extending from x to a SNP at some pre-defined
distance. EHH therefore is a measure of haplotype conservation or linkage disequilibrium
from the allele x and is on a scale of 0-1.
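The definition above translates directly into a count of matching pairs: among the chromosomes carrying allele x, EHH is the number of pairs that are identical over the interval divided by the total number of pairs. The following is a minimal sketch of that calculation, assuming phased haplotypes are supplied as equal-length strings of alleles ordered by position. The function name and the toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """EHH for `core_allele`, from the core SNP out to the SNP at `end_index` (inclusive).

    Returns the probability that two randomly chosen chromosomes carrying the core
    allele are identical at every SNP in the interval.
    """
    low, high = sorted((core_index, end_index))
    segments = [h[low:high + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(segments)
    if n < 2:
        raise ValueError("at least two chromosomes must carry the core allele")
    counts = Counter(segments)
    identical_pairs = sum(comb(count, 2) for count in counts.values())
    return identical_pairs / comb(n, 2)


# Toy data: the core SNP is the first position of each haplotype.
toy_haplotypes = ["CAAG", "CAAG", "CAAT", "CGAT", "TAAG", "TAAG"]
print(ehh(toy_haplotypes, 0, "C", 1))  # 3 identical pairs out of 6
print(ehh(toy_haplotypes, 0, "C", 3))  # 1 identical pair out of 6
print(ehh(toy_haplotypes, 0, "T", 3))  # both chromosomes identical
```

In the toy data EHH for the C allele decays as the interval is extended, while the T allele retains a value of 1, which is the kind of contrast between alleles that the Long-Range Haplotype test is designed to detect.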
The Long-Range Haplotype test was performed in this chapter using two different
International HapMap Project dataset releases [http://www.hapmap.org] from four
different populations.
HapMap Phase I encompasses the following: the rel#16c.1 YRI build, consisting of
1,076,451 SNPs genotyped in 30 parent-offspring trios from the Yoruba in Ibadan, Nigeria, the
rel#16c.1 CEU build, consisting of 1,105,072 SNPs genotyped in 30 parent-offspring
trios from the Centre d'Etude du Polymorphisme Humain (CEPH-Utah residents with
ancestry from northern and western Europe) panel, the rel#16c.1 CHB build, consisting of
1,088,689 SNPs genotyped in 45 unrelated Han Chinese from Beijing, China and the
rel#16c.1 JPT build, consisting of 1,088,426 SNPs genotyped in 45 unrelated Japanese in
Tokyo, Japan. In each dataset approximately 1 SNP is genotyped every 5kb across the
human genome.
HapMap Phase II encompasses the following: the rel#21 YRI build, consisting of
3,241,616 SNPs genotyped in 30 parent-offspring trios from the Yoruba in Ibadan,
Nigeria, the rel#21 CEU build, consisting of 1,105,072 SNPs genotyped in 30 parent-
offspring trios from the CEPH panel and the rel#21 CHB+JPT build, consisting of
3,305,784 SNPs genotyped in the 45 CHB+JPT panel. Because of their high genetic
similarity it is accepted practice to pool the CHB and JPT datasets. In each dataset
approximately 1 SNP is genotyped every 2kb across the human genome.
Haplotype phase inference for these data was performed by the HapMap consortium
using Phase 2.0 software. Recombination rate data are based on averaged recombination
rates across all four HapMap populations (Hudson 2001; Myers et al. 2005): YRI, CEU,
CHB and JPT.
The iHS method of Voight et al. (2006) was applied to the g.23238C>T SNP in the
HapMap Phase I YRI dataset and to the FMO2 gene in the HapMap Phase I YRI, CEU
datasets and a pooled JPT+CHB dataset, using the web-based tool Haplotter
[http://pritch.bsd.uchicago.edu/data.html].
A similar Long-Range Haplotype test method for detecting selection using HapMap
Phase II data was developed specifically during this study. This method is described
below.
4.2.3.1.1. Calculating EHH at the 23238C>T locus
SNP haplotypes were extracted from the HapMap dataset for each of the three populations
(either the 60 unrelated YRI parents, the 60 unrelated CEU parents or all 90 unrelated
CHB+JPT individuals) over a region extending 2.0cM either side of the core SNP
(23238C>T). To enable a more direct comparison with EHH values generated for other
core SNPs, the overall haplotype SNP density was controlled at approximately one SNP
every 0.05cM. Individual haplotypes were then placed into one of two groups based on
which allele they possessed at the core SNP (or just one group if the SNP was
monomorphic, as is the case for 23238C>T in the CEU and CHB+JPT populations), and
EHH values calculated for both alleles (if applicable) at twenty pre-defined genetic
distances (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8,
1.9 and 2.0cM) either side of the core SNP, as described by Sabeti et al. (2002b) and
Mueller and Andreoli (2004).
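The EHH calculation described above can be illustrated with a minimal Python sketch. It assumes the usual definition of EHH (Sabeti et al. 2002b) as the probability that two randomly chosen chromosomes carrying the same core allele are identical across the interval from the core SNP to a given marker. The function name, the string encoding of haplotypes and the toy data are hypothetical, and the mapping from a genetic distance (cM) to a marker index is deliberately left out; this is not the software used in the study.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """EHH among chromosomes carrying core_allele, from the core SNP to end_index."""
    lo, hi = sorted((core_index, end_index))
    # Keep only chromosomes with the chosen core allele, trimmed to the interval.
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # EHH is undefined for fewer than two chromosomes
    # Probability that two randomly drawn carriers share the same extended haplotype.
    counts = Counter(carriers)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy phased haplotypes (hypothetical data); position 0 is the core SNP.
toy = ["C000", "C000", "C001", "C011", "T000", "T110"]
ehh_c = [ehh(toy, 0, "C", i) for i in (1, 2, 3)]  # EHH decays with distance
ehh_t = ehh(toy, 0, "T", 1)
```

In this toy example EHH for the C allele falls as the interval is extended, which is the decay that the twenty pre-defined distances are intended to capture.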
4.2.3.1.2. Creating an empirical null distribution using genetic and physical distances
To test whether any of the EHH values generated from analysis of 23238C>T indicated
positive selection in the YRI dataset, 10,000 SNPs of similar homozygosity (See Table
1.10.1 of Cavalli-Sforza, Menozzi & Piazza 1994) (±2.5%) to 23238C>T, and with a
minimum haplotype density of one SNP every 0.05cM for up to 2cM either side of the
core SNP, were randomly chosen from across the genome (X and Y-chromosomes
excluded) from within the same HapMap population dataset. EHH values were then
calculated for these SNPs, as described for the 23238C>T SNP above. This resulted in
two empirical distributions of 10,000 EHH values at each of the twenty pre-defined
genetic distances, one for lower frequency alleles and one for higher frequency alleles.
When testing for significance of EHH values for the only allele present in the CEU and
CHB+JPT samples, FMO2*2A, all random core SNPs were required to have a
homozygosity value of one. Because in these populations the SNPs were monomorphic
there was only one empirically observed distribution at each genetic distance.
EHH values for the 23238C>T alleles could then be compared with the relevant
distribution, via a one-tail test, to establish whether the level of LD extending over a
particular distance from the core SNP was unusual in comparison with other alleles of
similar frequency found across the genome. EHH values were considered outliers if they
lay in the upper 5% tail of the relevant distribution. All cM genetic distances were
estimated using the recombination rates determined by HapMap. This results in
evaluation of EHH values for each allele at 20 predefined genetic distances. If positive
selection had taken place EHH values within the upper 5% tail of the relevant distribution
would be expected over a relatively continuous region (i.e. a number of consecutive cM
intervals, significance of which could be assessed via a Runs Test (Sokal & Rohlf 1994)).
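The comparison with the empirical null distribution can be sketched as follows. The sketch assumes that the one-tailed P-value is the proportion of null EHH values equal to or greater than the observed value, and it counts the longest run of consecutive distances below the 5% threshold. It does not implement the Runs Test itself, and the function names and the strict "less than" threshold are illustrative assumptions.

```python
def empirical_p(observed, null_values):
    """One-tailed empirical P-value: share of null values >= the observed value."""
    return sum(v >= observed for v in null_values) / len(null_values)


def longest_significant_run(p_values, alpha=0.05):
    """Length of the longest stretch of consecutive distances with P < alpha."""
    longest = current = 0
    for p in p_values:
        current = current + 1 if p < alpha else 0
        longest = max(longest, current)
    return longest


# Hypothetical null distribution of ten EHH values and one observed value.
null = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
p_obs = empirical_p(0.9, null)

# P-values reported in Table 4.7 for FMO2*2A in CHB+JPT, 0.5 to 0.1cM upstream.
run = longest_significant_run([0.338, 0.159, 0.095, 0.023, 0.139])
```

A run length of one, as obtained for the CHB+JPT values above, would not meet the criterion of elevated EHH over a continuous region.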
The LRH test was repeated using physical rather than genetic distances. EHH was
estimated at 20 pre-defined physical distances (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9 and 2.0Mb) either side of the core SNP and a
haplotype SNP density of approximately one SNP every 10kb. The null distribution was
based on only 1,000 data points, with the intention that this could be increased to 10,000
if a particular run showed indications of selection. The rationale for repeating this
analysis using physical distances was that both the estimated genetic distances and EHH
values from core SNPs are derived from the same HapMap data. If genetic distances are
utilised and the SNP is biallelic, the recombination rate is estimated from the SNP and
will be an average for both allele backgrounds. In this case, extreme EHH outliers from
the average rate can be detected and are signatures of selection (though this analysis may
be considered conservative). However, if the SNP being tested is monomorphic the
recombination rate estimated from this point will be directly correlated to the single EHH
value estimated and so no potential selection will be detected. By using physical
distances, EHH acts as a proxy for LD and allows for the identification of possible positive
selection if the recombination rate for a region extending from a particular locus for a pre-
defined distance is exceptionally low. However, this method is not ideal as it will not
control for local variation in recombination rates (e.g., recombination hotspots), which is
why the use of genetic distances is preferable when applicable (non-monomorphic data).
Interestingly, comparison of the Illumina pedigree-based genetic map
[http://www.illumina.com/pages.ilmn?ID=191] with the HapMap recombination map, in
the region in which the g.23238C>T (rs6661174) SNP (which is not included in the
Illumina map) lies, showed there to be a greater genetic distance between two SNPs that
are included in both maps (rs913257 and rs7877) in the Illumina map (1.26cM) than in
the HapMap map (0.99cM). On the other hand, the distance between the first (rs884080)
and last (rs2027432) SNPs in the Illumina map for chromosome 1 is shorter (273.14cM)
than in the HapMap map (280.391cM). This raises the possibility that the FMO2 gene in
HapMap has a lower than expected LD in comparison with the rest of the chromosome,
resulting in extreme EHH values for a particular allele at the 23238C>T locus being even
stronger evidence of selection. This analysis was performed using software designed at
TCGA (Python code available on request from Krishna Veeramah).
In comparison to the method of Voight et al. (2006), the approach described above has
the advantage of controlling for haplotype SNP density, which may severely impact
actual EHH estimates (see Figure 4.3 for an example of two distributions using different
haplotype densities), whereas the Voight et al. (2006) approach is likely to be a more
sensitive test because the level of haplotype homozygosity surrounding a SNP is more
finely described by the standardised iHS statistic. However, the underlying principle of
looking for unusually high levels of haplotype homozygosity in large sets of empirical
distributions is similar and the method presented here effectively complements that of
Voight et al. (2006).
4.2.3.2. Estimating the age of the g.23238C>T mutation
Individuals of various ethnicities, including a subset of HapMap samples, have been
sequenced for all exons of environmental response genes, including FMO2, as part of the
National Institute of Environmental Health Sciences (NIEHS) SNP program (NIEHS
SNPs, NIEHS Environmental Genome Project, University of Washington, Seattle, WA
[http://egp.gs.washington.edu] (June, 2007)). FMO2 sequencing data from the
NIEHS SNP program were utilised to a) examine the distribution of genetic variants in
individuals of recent African descent as well as b) to estimate the time when the
g.23238C>T mutation occurred.
All FMO2 exon variants for all 95 NIEHS Panel 2 samples were extracted and haplotypes
were inferred using PHASE version 2.1 (Stephens, Smith & Donnelly 2001; Stephens &
Donnelly 2003). Haplotypes for a subset of NIEHS variants (those variants with a
relatively large minor allele frequency with few or no undetermined genotypes) had
already been inferred by the NIEHS and this information was utilised when inferring
phase of the remaining variants. The pairs of haplotypes of the 12 Yoruba and 15 African-
American NIEHS samples were then examined for the presence and distribution of
synonymous mutations, non-synonymous mutations, insertions and deletions.
To estimate the time of origin of the g.23238C>T SNP all biallelic SNPs, insertions and
deletions found within FMO2-defined exons, introns and untranslated regions by the
NIEHS SNP program for all 95 NIEHS Panel 2 samples were extracted and haplotypes
inferred using fastPHASE version 1.2 (Scheet & Stephens 2006). PHASE was not used on
this occasion because of the large number of variants present. When the coding-region
haplotypes inferred using PHASE 2.1 were compared with the fastPHASE output, the phased
haplotypes of only two individuals were found to differ. One individual differed by a
single variant that was found only once in the entire dataset, and is therefore
irresolvable by either method; the other individual differed at only two SNPs but was
homozygous T for the g.23238C>T SNP, making the impact on any further analysis
minimal. Because the g.23238C
allele was present only in Yoruba and African-American NIEHS Panel 2 samples and the
African-American samples are of uncertain demographic origin, only the NIEHS Yoruba
samples were used in subsequent analysis.
Figure 4.3: Distribution of 10,000 EHH values calculated 0.4cM from core alleles with A) SNP haplotype density not
controlled and B) SNP haplotype density controlled at 1 SNP per 0.05cM (8 SNP extended haplotypes).
Within the SNAP workbench (Price & Carbone 2005; Aylor, Price & Carbone 2006),
RecPars software (Hein 1990) showed that recombination was likely to have occurred a
number of times within the sequences found in the Yoruba samples. Therefore the
coalescent-based approach of Griffiths and Marjoram (1996) (using the program
recom58) was applied, which takes recombination into account when calculating
maximum likelihood estimates of mutation rate and recombination rate as well as
estimating the time to the most recent common ancestor and the time since the origin of
every variant nucleotide within a collection of observed sequences. However, the
approach of Griffiths and Marjoram (1996) will fail with very large numbers of
recombination events. Therefore LDhat version 2.0 was used to identify a region of
sequence surrounding the g.23238C>T SNP in the Yoruba that had sufficiently low
numbers of recombination events to enable recom58 to be used. This was done by
iteratively removing 500bp of sequence from the 5ʹ end of the gene and running LDhat to
estimate the recombination rate, ρ (the number of recombination events per gene
(sequence) per generation). Sequence from the 3ʹ end was not removed because the
g.23238C>T SNP is located close to this end of the gene. A region 8500bp in length was
eventually found (extending 6280bp upstream and 2219bp downstream of g.23238C>T),
which contained 38 segregating sites (inclusive of g.23238C>T) and had a ρ
value of approximately 2, which was deemed reasonable for use in recom58. LDhat
also gave Watterson's estimate of θ (Watterson 1975), the mutation rate (the number of
mutations per gene (sequence) per generation), for this region, of approximately 10. The
ancestral state of each segregating site was found by comparison with the chimpanzee and
macaque FMO2 genome assembly sequences found within Ensembl Genome Database.
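As a rough check on the reported θ, the standard form of Watterson's estimator (the number of segregating sites divided by the sum of 1/i for i from 1 to n-1) can be evaluated directly. The sketch below assumes that the 38 segregating sites and the 24 phased Yoruba chromosomes are the relevant inputs; the exact calculation performed inside LDhat may differ, and the function name is hypothetical.

```python
def watterson_theta(segregating_sites, n_sequences):
    """Watterson's per-sequence estimate of theta: S divided by the harmonic sum a_n."""
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / a_n


# 38 segregating sites in 24 chromosomes (assumed inputs, taken from the text).
theta_w = watterson_theta(38, 24)
```

Under these assumptions the estimate is a little above 10, in line with the value of approximately 10 quoted for the 8500bp region.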
Recom58 was run on observed data (derived from the sequence of the 8500bp region of
FMO2 that contained g.23238C>T, from 24 phased NIEHS Yoruba chromosomes) for
5,000,000 iterations, with initial generating parameters of θ = 10 and ρ=2, over a
likelihood surface for θ ranging from 0.25 to 20.00, with 0.25 increments, and for ρ
ranging from 0.1 to 4.0, with 0.1 increments. From this first run, maximum-likelihood
estimates of θ and ρ were 10.50 and 0.80, respectively. Recom58 was then rerun on the
same observed data using these new estimated parameter values of θ and ρ (again running
for 5,000,000 iterations over the same likelihood surface range), to investigate
characteristics of the ancestral distributions, including the time of origin of the
g.23238C>T SNP. The values of θ and ρ estimated on this occasion were 11.25 and 0.70,
respectively, similar to the previous run. The mean of the expected time since mutation
for the g.23238C>T SNP from this run, in coalescent units (T), was 1.1962 (standard
deviation = 0.2127). T can be converted into time t, in years, using the expression
t = 2 × Ne × T × (generation time in years), where Ne is the effective population size. An Ne figure
of 7500 was used. This value was estimated using linkage disequilibrium data from the
Yoruba HapMap samples (Tenesa et al. 2007). Given T, Ne and an inter-generation time
for humans of 28 years (Fenner 2005), an estimated time since mutation for g.23238C>T
in years (t) can be calculated.
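The conversion from coalescent units to years can be written out as a short worked example. The sketch simply applies the expression given above to the stated values of T, Ne and generation time; the function and variable names are illustrative only.

```python
def coalescent_to_years(T, Ne, generation_years):
    """Convert a time in coalescent units to years: t = 2 * Ne * T * generation time."""
    return 2 * Ne * T * generation_years


# Values stated in the text: T = 1.1962, Ne = 7500, generation time = 28 years.
t_years = coalescent_to_years(1.1962, 7500, 28)
```

With these inputs the expression gives approximately 502,404 years.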
Lower and upper boundaries were also calculated for the estimated age of the mutation. A
gamma distribution (α = 31.616, θ = 0.037) was estimated from the mean and standard
deviation of the coalescent time estimate from recom58 in order to calculate a 95%
confidence interval (2.5% = 0.816, 97.5% = 1.648) in the R programming environment.
Upper (8751) and lower (4889) estimates of Ne were taken from the maximum and minimum
values of Ne calculated for the Yoruba HapMap sample for each individual chromosome
(disregarding the acrocentric chromosomes 21 and 22 and the X chromosome, which all
demonstrate substantially different Ne estimates to the other chromosomes and thus may be
behaving somewhat abnormally), adjusted here to take into account the 18% underestimate
from HapMap data due to ascertainment bias (Tenesa et al. 2007). Lower and upper
generation times were taken as 19.4 years (the mean female age at first birth in
hunter-gatherer societies) and 36.1 years (the mean female age at last birth in nation
states) (Fenner 2005). Given T and the above values of Ne and inter-generation time for
humans, a lower and upper boundary for the time since mutation for g.23238C>T in years (t)
can be calculated.
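The gamma-based interval for T can be approximated with the Python standard library. The sketch below assumes that the gamma parameters were obtained by moment matching (shape = (mean/sd)^2, scale = sd^2/mean) and estimates the 2.5% and 97.5% quantiles by simulation rather than analytically, so it is an illustration of the idea and not a reproduction of the R calculation. The small differences from the reported α and θ may reflect rounding of the published mean and standard deviation.

```python
import random


def gamma_from_moments(mean, sd):
    """Moment-matched gamma parameters: shape = (mean/sd)^2, scale = sd^2/mean."""
    return (mean / sd) ** 2, sd ** 2 / mean


def simulated_quantiles(shape, scale, probs=(0.025, 0.975), draws=200_000, seed=1):
    """Approximate gamma quantiles from sorted random draws (fixed seed)."""
    rng = random.Random(seed)
    sample = sorted(rng.gammavariate(shape, scale) for _ in range(draws))
    return [sample[int(p * (draws - 1))] for p in probs]


# Mean and standard deviation of T reported in the text.
shape, scale = gamma_from_moments(1.1962, 0.2127)
lower_T, upper_T = simulated_quantiles(shape, scale)
```

The simulated quantiles should lie close to the reported bounds of 0.816 and 1.648 coalescent units.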
4.3. Results
4.3.1. The distribution of 23238C>T in Africa
The g.23238C>T allele frequencies and geographic locations for populations typed in this
study are shown in Table 4.1 and Figure 4.4. The overall FMO2*1 allele frequency for all
samples from Africa (n=1800) was 0.153, with 28.3% of individuals having at least one
FMO2*1 allele. Across all 24 populations in Africa the observed percentage of
individuals who have at least one FMO2*1 allele ranged from 4.3% to 49.1%. For samples from
sub-Saharan Africa (n=1569) the overall FMO2*1 allele frequency was 0.170, with
31.4% of individuals having at least one FMO2*1 allele and across these 21 populations
the observed range of frequencies of FMO2*1-carrying individuals was 17.8-49.1%. The
Yemen sample (n=117) had an overall FMO2*1 allele frequency of 0.047, with 8.5% of
individuals having at least one FMO2*1 allele. The FMO2*1 allele was not observed in
the Anatolian Turkish sample (n=59). No population deviated significantly from Hardy-
Weinberg equilibrium (P>0.12).
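The summary statistics quoted above follow directly from the genotype counts. The sketch below derives the FMO2*1 allele frequency and the proportion of carriers for one population, and adds a one-degree-of-freedom chi-square check of Hardy-Weinberg proportions. The choice of a chi-square test is an assumption made for illustration, since the test used in the study is not specified here, and the function name is hypothetical.

```python
from math import erfc, sqrt


def summarise_genotypes(n11, n12, n22):
    """Allele frequency, carrier proportion and HWE chi-square P-value (1 df)."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)       # frequency of allele 1
    carriers = (n11 + n12) / n          # individuals with at least one copy
    expected = (p * p * n, 2 * p * (1 - p) * n, (1 - p) ** 2 * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n11, n12, n22), expected) if e > 0)
    p_value = erfc(sqrt(chi2 / 2))      # chi-square survival function for 1 df
    return p, carriers, p_value


# Mayo Darle (Cameroon) counts from Table 4.1: *1/*1 = 2, *1/*2A = 39, *2A/*2A = 78.
freq, carrier_prop, hwe_p = summarise_genotypes(2, 39, 78)
```

For Mayo Darle this reproduces the tabulated allele frequency (0.181) and carrier percentage (34.45%).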
Using Logistic Regression on the proportion of individuals with at least one FMO2*1
allele, significant differences were found both among regions (P<0.0001, df=5) and
among populations within regions (P<0.04, df=23). The major factor contributing to
among-region differences is likely to be the noticeably lower FMO2*1 frequencies
observed in non-sub-Saharan African populations in comparison to sub-Saharan African
populations. Pearson's Chi Square tests were performed to explore within-region
differences (see Table 4.2). The only statistically heterogeneous region was Central East
Africa (CEA) (P<0.003, df=5). Exclusion of CEA from the Logistic Regression analysis
resulted in no significant differences (P=0.25, df=15) among populations within the
remaining regions.
In order to make pairwise comparisons of regions using Fisher's Exact test, populations
within each region were pooled, except in the case of CEA, which had previously been
identified as having statistically significant heterogeneity and therefore was excluded
from this analysis. From these pairwise comparisons (see Table 4.3) the following
arrangement of regions based on frequencies of individuals with at least one FMO2*1
allele could be discerned:
TU<(YE=NA)<(SEA=WA)
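A pairwise comparison of this kind can be sketched with a small two-sided Fisher's Exact test on a 2x2 table of carriers and non-carriers. The implementation sums the probabilities of all tables that are no more likely than the observed one, which is one common definition of the two-sided test; the software and definition actually used in the study are not stated, so the result need not match Table 4.3 exactly. The pooled Turkey and Yemen counts are derived here from Table 4.1.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Carriers vs non-carriers of FMO2*1: Turkey (0 of 59) and Yemen (10 of 117).
p_tu_ye = fisher_exact_two_sided(0, 59, 10, 107)
```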
Further examination of populations in CEA, with pairwise Fisher's Exact tests (see Table
4.4), showed the populations in this region to be roughly split into two main groups, one
consisting of north Sudan and the three Amharic populations and the other consisting of
the Anuak and south Sudan. A PCO plot of pairwise FST values for all populations in
Africa (Figure 4.5) showed the Anuak of Gambella and south Sudan to be genetically
close to each other with respect to the g.23238C>T SNP, probably due to them both
Table 4.1: 23238C>T Genotype and Allele frequencies.
Regions: West Africa (WA), Central East Africa (CEA), South East Africa (SEA), North Africa (NA), Turkey (TU), Yemen (YE).
The columns *1/*1, *1/*2A and *2A/*2A give the FMO2 genotype frequency.
Region  Country       Location       Cultural Identity  *1/*1  *1/*2A  *2A/*2A  n     FMO2*1 frequency  FMO2*2A frequency  At least one FMO2*1 allele  Latitude  Longitude
WA      Cameroon      Mayo Darle     Various            2      39      78       119   0.181             0.819              34.45%                      6.54      11.45
WA      Cameroon      Lake Chad      Various            2      25      49       76    0.191             0.809              35.53%                      12.28     14.75
WA      Ghana         Sandema        Bulsa              3      21      66       90    0.15              0.85               26.67%                      10.73     -1.28
WA      Ghana         Navrongo       Kasena             1      7       37       45    0.1               0.9                17.78%                      10.88     -1.09
WA      Nigeria       Calabar        Igbo               0      22      66       88    0.125             0.875              25.00%                      4.96      8.31
WA      Senegal       South          Manj               3      25      66       94    0.165             0.835              29.79%                      12.99     -15.88
WA      Senegal       Dakar          Wolof              1      24      70       95    0.137             0.863              26.32%                      14.69     -17.45
CEA     Ethiopia      Gambella       Anuak              5      47      54       106   0.269             0.731              49.06%                      8.25      34.58
CEA     Ethiopia      Addis Ababa    Amharic            1      7       16       24    0.188             0.813              33.33%                      9.01      38.85
CEA     Ethiopia      Borena, Wollo  Amharic            0      7       29       36    0.097             0.903              19.44%                      10.75     38.77
CEA     Ethiopia      Dessie, Wollo  Amharic            1      6       19       26    0.154             0.846              26.92%                      11.23     39.53
CEA     Sudan         North          Various            2      37      97       136   0.151             0.849              28.68%                      15.21     33.04
CEA     Sudan         South          Various            10     43      73       126   0.25              0.75               42.06%                      10.85     29.77
SEA     Malawi        Lilongwe       Various            4      35      105      144   0.149             0.851              27.08%                      -13.98    33.77
SEA     Malawi        Mangochi       Various            1      16      43       60    0.15              0.85               28.33%                      -14.47    35.27
SEA     Malawi        Mzuzu          Various            0      17      39       56    0.152             0.848              30.36%                      -11.47    34.02
SEA     Mozambique    Sena           Sena               2      22      60       84    0.155             0.845              28.57%                      -17.44    35.027
SEA     South Africa  Pretoria       Bantu              2      15      24       41    0.232             0.768              41.46%                      -25.75    28.30
SEA     Tanzania      Kilimanjaro    Chagga             2      18      30       50    0.22              0.78               40.00%                      -5.38     38.05
SEA     Uganda        Ssese          Bantu              0      10      29       39    0.128             0.872              25.64%                      -0.45     32.56
SEA     Zimbabwe      Mposi          Shona              0      7       27       34    0.103             0.897              20.59%                      -17.31    31.328
NA      Algeria       Mostaganem     Unspecified        0      5       38       43    0.058             0.942              11.63%                      35.94     0.09
NA      Algeria       Port Say       Unspecified        0      10      108      118   0.042             0.958              8.47%                       35.08     -2.18
NA      Morocco       Ifrane         Berbers            0      3       67       70    0.021             0.979              4.29%                       33.59     -5.17
TU      Turkey        East Anatolia  Anatolian Turks    0      0       31       31    0                 1                  0.00%                       40.28     33.25
TU      Turkey        West Anatolia  Anatolian Turks    0      0       28       28    0                 1                  0.00%                       39.68     31.21
YE      Yemen         Sena           Unspecified        0      4       30       34    0.059             0.941              11.76%                      15.41     44.24
YE      Yemen         Hadramaut      Unspecified        1      5       77       83    0.042             0.958              7.23%                       16.81     49.94
Total                                                   43     477     1456     1976  0.142             0.858              26.32%
Figure 4.4: Map showing the percentage of individuals with at least one
FMO2*1 allele in Africa and two nearby countries.
Table 4.2: Pearson's Chi Square Test on individual regions.
Region df Chi square P-value
CEA 5 18.12 0.003*
NA 2 2.15 0.34
SEA 7 7.51 0.37
WA 6 7.34 0.29
Yemen 1 0.64 0.43
NOTE.- df = degrees of freedom. * indicates P-value is less than 0.05.
Table 4.3: Fisher's Exact tests between regions.
WA SEA NA TU
SEA 0.7915 \ \ \
NA 0.0001* 0.0001* \ \
TU 0.0001* 0.0001* 0.0294* \
YE 0.0001* 0.0001* 0.8361 0.0321*
NOTE.- * indicates P-value is less than 0.05.
Table 4.4: Fisher's Exact tests between CEA populations.
               Gambella  Addis Ababa  Borena, Wollo  Dessie, Wollo  Sudan-North
Addis Ababa    0.1812
Borena, Wollo  0.0018*   0.2418
Dessie, Wollo  0.0492*   0.7598       0.5475
Sudan-North    0.0013*   0.6340       0.2982         1.0000
Sudan-South    0.2933    0.5008       0.0180*        0.1883         0.0277*
NOTE.- * indicates P-value is less than 0.05.
possessing slightly elevated FMO2*1 frequencies in comparison with the other African
populations surveyed here. Addis Ababa appears to be somewhat separated from all
populations, but this may be a stochastic effect due to its low sample size. A Pearson's
Chi Squared test comparing the frequencies of individuals with at least one FMO2*1
allele in all populations in sub-Saharan Africa was significant (P<0.003), but removing
only the Anuak and south Sudan populations resulted in non-significance (P=0.526),
Figure 4.5: PCO plot of 23238 C>T-based population FST values.
emphasising that these two populations are outliers from the overall allele distribution
observed across sub-Saharan Africa. Genetic Boundary analysis on this region also
revealed that, despite their geographical proximity, the Anuak in Gambella are separated
from other Ethiopian groups by a sharp allele frequency gradient (Figure 4.6).
Figure 4.6: Contour map based on FMO2*1 allele frequencies in Central
East African populations with areas of rapid allele frequency change shown
with blue circles.
A significant correlation between matrices of pairwise genetic distances (FST) and
geographic distances (km) was found using the Mantel test when all populations typed in
this study (P<0.001) and only African populations (P<0.003) were considered, but not
when only sub-Saharan African populations were analysed (P=0.741). In addition, the
autocorrelation indices Moran's I and Geary's c for sub-Saharan African populations showed no apparent
correlation with geographic distance (see Figure 4.7), confirming the generally similar
distribution of g.23238C>T alleles across sub-Saharan Africa.
When samples were grouped by self-declared ethnic identity (a group was treated
separately if 15 or more samples shared the same self-declared ethnic
identity (see Table 4.5)), no significant differences were found between the same ethnic
group living in multiple locations (Fisher's Exact, P>0.24) (see Table 4.6), for example,
the Amharic speakers who were sampled in three locations (Pearson's Chi Square,
P=0.47, df=2), or among different ethnic groups collected at the same location (Fisher's
Exact, P>0.09).
4.3.2. Examining FMO2 for evidence of Natural Selection
Typing of the g.23238C>T SNP and many neighbouring SNPs by the International
HapMap project allowed the investigation, using the Long-Range Haplotype test (Sabeti
et al. 2002b), of whether a signal suggestive of positive selection of either allele at this
locus could be detected. The FMO2*1 allele frequency in the YRI dataset is 0.175, which
is similar to that observed in sub-Saharan Africa. In contrast, FMO2*1 was absent in the
CEU and CHB+JPT datasets, consistent with previous studies (Dolphin et al. 1998;
Whetstine et al. 2000).
The method of Voight et al. (2006), which uses a derivative of the EHH statistic,
standardised iHS, allows direct comparisons of SNPs of different frequencies and
provides a measure of haplotype conservation around the target SNP in comparison to the
rest of the genome. The web-based tool Haplotter, which applies the method of Voight et
al. (2006) on HapMap Phase I data, was used to look for evidence of recent positive
selection at the g.23238C>T locus in the YRI dataset.
The standardised iHS for this locus is 0.992, a value which lies in the 84th percentile on a
standard normal curve. This indicates that the increased level of haplotype homozygosity
on the derived T allele (as iHS is positive) is not significantly different (P>0.05, two
tailed test) from that expected from the genome as a whole and therefore provides no
evidence of recent positive selection for either allele.
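The percentile and two-tailed P-value implied by a standardised iHS can be checked against the standard normal distribution. The sketch below assumes only that the standardised iHS is treated as a standard normal deviate; the function name is illustrative.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))


ihs = 0.992  # standardised iHS reported for g.23238C>T in the YRI dataset
percentile = normal_cdf(ihs)
p_two_tailed = 2 * (1 - normal_cdf(abs(ihs)))
```

The value falls at roughly the 84th percentile, and the two-tailed P-value is well above 0.05.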
Figure 4.7: Spatial Autocorrelation Analysis of 23238C>T allele frequency
data using (A) Moran’s I and (B) Geary’s c.
Table 4.5: Table of ethnic identities found in the various populations
examined in this chapter.
Country       Location         Ethnic Group (Self Declared)  CC  CT  TT   n
Cameroon      Mayo Darle       Fulbe                         0   19  33   52
Cameroon      Mayo Darle       Haousa                        0   8   15   23
Cameroon      Mayo Darle       Mambila                       0   6   13   19
Cameroon      Mayo Darle       Other                         2   6   17   25
Cameroon      Mayo Darle       Total                         2   39  78   119
Cameroon      Lake Chad        Kotoko                        1   12  24   37
Cameroon      Lake Chad        Other                         1   13  25   39
Cameroon      Lake Chad        Total                         2   25  49   76
Ghana         Sandema          Bulsa                         3   21  66   90
Ghana         Navrongo         Kasena                        1   7   37   45
Nigeria       Calabar          Igbo                          0   22  66   88
Senegal       South            Manj                          3   25  66   94
Senegal       Dakar            Wolof                         1   24  70   95
Ethiopia      Gambella         Anuak                         5   47  54   106
Ethiopia      Addis Ababa      Amharic                       1   7   16   24
Ethiopia      Borena, Wollo    Amharic                       0   7   29   36
Ethiopia      Dessie, Wollo    Amharic                       1   6   19   26
Sudan         North            Ga'ali                        1   10  33   44
Sudan         North            Shaigi                        0   4   14   18
Sudan         North            Other                         1   23  50   74
Sudan         North            Total                         2   37  97   136
Sudan         South            Dinka                         1   13  28   42
Sudan         South            Nuer                          0   2   14   16
Sudan         South            Other                         9   28  31   68
Sudan         South            Total                         10  43  73   126
Malawi        Mangochi         Yao                           0   7   20   27
Malawi        Mangochi         Chewa                         1   5   9    15
Malawi        Mangochi         Other                         0   4   14   18
Malawi        Mangochi         Total                         1   16  43   60
Malawi        Mzuzu            Tumbuka                       0   13  29   42
Malawi        Mzuzu            Other                         0   4   10   14
Malawi        Mzuzu            Total                         0   17  39   56
Malawi        Lilongwe         Chewa                         1   17  51   69
Malawi        Lilongwe         Yao                           1   7   20   28
Malawi        Lilongwe         Tumbuka                       1   3   14   18
Malawi        Lilongwe         Other                         1   8   20   29
Malawi        Lilongwe         Total                         4   35  105  144
Mozambique    Sena             Sena                          1   2   16   19
Mozambique    Sena             Tembo                         0   8   11   19
Mozambique    Sena             Other                         1   12  33   46
Mozambique    Sena             Total                         2   22  60   84
Tanzania      Kilimanjaro      Chagga                        2   18  30   50
Uganda        Ssese            Bantu                         0   10  29   39
Zimbabwe      Mposi            Shona                         0   7   27   34
South Africa  Pretoria         Bantu                         2   15  24   41
Algeria       Mostaganem       Undeclared                    0   5   35   40
Algeria       Mostaganem       Other                         0   0   3    3
Algeria       Mostaganem       Total                         0   5   38   43
Algeria       Port Say         Undeclared                    0   5   77   82
Algeria       Port Say         Other                         0   5   31   36
Algeria       Port Say         Total                         0   10  108  118
Morocco       Ifrane           Berbers                       0   3   67   70
Yemen         Yemen-Sena       Other                         0   4   30   34
Yemen         Yemen-Hadramaut  Other                         1   5   77   83
Turkey        East Anatolia    Anatolian Turks               0   0   31   31
Turkey        West Anatolia    Anatolian Turks               0   0   28   28
Note.- Declared identities below 15 per individual label and undeclared identities at
each location have been grouped under the term 'Other'.
Table 4.6: Various Population Pairwise Fisher’s Exact Tests.
Same geographic location
Same ethnic group
Same country
Cameroon
                        Mayo Darle                      Lake Chad
                        Fulbe     Haousa    Mambila
Mayo Darle     Haousa   1.0000    \
               Mambila  0.7842    1.0000    \
Lake Chad      Kotoko   1.0000    1.0000    1.0000
Ghana
                        Kasena
               Bulsa    0.2895
Senegal
                        Wolof
               Manj     0.6297
Ethiopia
                        Addis Ababa  Borena, Wollo
                        Amharic      Amharic
Borena, Wollo  Amharic  0.2418       \
Dessie, Wollo  Amharic  0.7598       0.5475
Sudan
                        South Sudan         North Sudan
                        Dinka     Nuer      Ga'ali
South Sudan    Nuer     0.1884    \
North Sudan    Ga'ali   0.4786    0.4814    \
               Shaigi   0.5417    0.6602    1.0000
Malawi
                        Mangochi  Mzuzu     Lilongwe
                        Yao       Tumbuka   Chewa     Yao
Mzuzu          Tumbuka  0.7877    \
Lilongwe       Chewa    1.0000    0.6640    \
               Yao      1.0000    1.0000    0.8047    \
               Tumbuka  1.0000    0.5501    1.0000    0.7393
Mozambique
                        Tembo
               Sena     0.0902
This particular analysis was not possible using the CEU or JPT+CHB datasets because
the g.23238C>T SNP is monomorphic in these populations. However, examination of the
whole FMO2 gene, which involves examining the proportion of SNPs in the gene that
have extreme iHS values in comparison to other genes (see Voight et al. (2006)), again
using Haplotter, showed no evidence of selection in any population (P-values: CEU=
0.351631, YRI= 0.999955, JPT+CHB= 0.99954).
An alternative implementation of the Long-Range Haplotype test, developed as a part of this
study (see Methods and Materials), was also used to complement the analysis described
above. Unlike the method of Voight et al. (2006), it controls for haplotype SNP density,
which may heavily influence estimated EHH values, and it uses HapMap Phase II data.
Using this method no EHH values for either allele in the YRI dataset or
for the FMO2*2A allele in the CEU and CHB+JPT datasets (the FMO2*1 allele could not
be evaluated because it was not present at all in these two datasets) were found that were
significantly different from their corresponding null distributions, using either genetic or
physical distances from the core SNP (Table 4.7), except at 0.2cM upstream of the
FMO2*2A allele in the CHB+JPT dataset. This elevated EHH value could be regarded as
a stochastic effect because (a) there is only significance at one point, therefore not
reaching the criterion of elevated EHH values over a continuous region and (b) it was
detected using a genetic rather than a physical map, which, as discussed in the methods
section, is unreliable for monomorphic data. Therefore, consistent with results using the
method of Voight et al. (2006), no evidence was found that suggests that either the
FMO2*1 allele or the FMO2*2A allele has been favoured by recent positive selection
inside or outside of Africa using the alternative Long-Range Haplotype method
developed in this thesis.
4.3.3. Analysis of NIEHS FMO2 re-sequencing data
The NIEHS SNP program identified, from whole gene sequencing, 19 FMO2 coding-
region variants among the Panel 2 samples (see Table 4.8), 14 of which were previously
reported (Furnes et al. 2003) in African Americans. Four mutations were synonymous,
nine were non-synonymous, one was found in the 3ʹ untranslated region, two were
insertions (one of which was found in the 3ʹ untranslated region), one was a deletion and
two were premature stop codons (including 23238C>T).
Table 4.7: P-values^a for EHH values calculated at various (a) genetic and
(b) physical distances from alleles present at the 23238C>T locus in the
upstream (-) and downstream (+) directions in the YRI, CEU and CHB+JPT
datasets with a SNP haplotype density of 0.05cM per SNP in (a) and 10kb
per SNP in (b).
(a) Distance from core SNP (cMs)  YRI FMO2*1  YRI FMO2*2A  CEU FMO2*2A  CHB+JPT FMO2*2A  (b) Distance from core SNP (Mbs)  YRI FMO2*1  YRI FMO2*2A  CEU FMO2*2A  CHB+JPT FMO2*2A
-2 0.202 1.000 0.566 0.763 -2 0.157 1.000 0.100 1.000
-1.9 0.221 1.000 0.658 0.83 -1.9 0.517 1.000 0.564 1.000
-1.8 0.243 0.902 0.756 0.74 -1.8 0.538 1.000 0.578 1.000
-1.7 0.275 0.929 0.837 0.819 -1.7 0.553 1.000 0.594 1.000
-1.6 0.315 0.952 0.890 0.839 -1.6 0.571 0.853 0.614 1.000
-1.5 0.364 0.789 0.930 0.832 -1.5 0.592 0.713 0.627 1.000
-1.4 0.169 0.871 0.820 0.826 -1.4 0.608 0.740 0.646 0.616
-1.3 0.227 0.931 0.870 0.909 -1.3 0.626 0.757 0.669 0.632
-1.2 0.178 0.648 0.780 0.727 -1.2 0.647 0.779 0.690 0.652
-1.1 0.264 0.622 0.870 0.582 -1.1 0.669 0.742 0.720 0.691
-1 0.378 0.612 0.903 0.464 -1 0.690 0.714 0.743 0.661
-0.9 0.297 0.618 0.831 0.489 -0.9 0.561 0.694 0.687 0.637
-0.8 0.148 0.578 0.654 0.242 -0.8 0.595 0.718 0.698 0.614
-0.7 0.273 0.489 0.549 0.353 -0.7 0.555 0.743 0.714 0.645
-0.6 0.349 0.456 0.492 0.289 -0.6 0.590 0.769 0.693 0.707
-0.5 0.282 0.453 0.478 0.338 -0.5 0.560 0.706 0.710 0.670
-0.4 0.155 0.220 0.307 0.159 -0.4 0.603 0.767 0.746 0.736
-0.3 0.064 0.123 0.119 0.095 -0.3 0.589 0.842 0.817 0.824
-0.2 0.189 0.306 0.068 0.023 -0.2 0.494 0.798 0.766 0.820
-0.1 0.537 0.303 0.108 0.139 -0.1 0.490 0.791 0.672 0.738
0.1 0.475 0.400 0.602 0.919 0.1 0.388 0.217 0.526 0.551
0.2 0.155 0.098 0.225 0.554 0.2 0.288 0.079 0.356 0.356
0.3 0.372 0.400 0.290 0.311 0.3 0.232 0.046 0.210 0.237
0.4 0.407 0.397 0.165 0.206 0.4 0.342 0.131 0.255 0.171
0.5 0.282 0.256 0.145 0.161 0.5 0.322 0.138 0.178 0.124
0.6 0.134 0.113 0.131 0.285 0.6 0.244 0.092 0.128 0.075
0.7 0.578 0.266 0.243 0.412 0.7 0.453 0.253 0.262 0.193
0.8 0.457 0.283 0.148 0.577 0.8 0.392 0.279 0.258 0.158
0.9 0.402 0.254 0.127 0.478 0.9 0.365 0.309 0.319 0.223
1 0.773 0.298 0.170 0.535 1 0.345 0.391 0.303 0.295
1.1 1.000 0.225 0.219 0.534 1.1 0.305 0.371 0.356 0.333
1.2 1.000 0.445 0.396 0.680 1.2 0.276 0.354 0.309 0.296
1.3 1.000 0.662 0.545 0.617 1.3 0.296 0.335 0.363 0.272
1.4 1.000 0.587 0.610 0.491 1.4 0.363 0.345 0.342 0.241
1.5 1.000 0.444 0.563 0.371 1.5 0.341 0.315 0.319 0.255
1.6 1.000 0.413 0.789 0.397 1.6 0.319 0.306 0.347 0.315
1.7 1.000 0.300 0.697 0.292 1.7 0.294 0.294 0.323 0.293
1.8 1.000 0.212 0.902 0.254 1.8 0.286 0.272 0.359 0.270
1.9 1.000 0.145 0.860 0.189 1.9 0.263 0.259 0.340 0.243
2 1.000 0.580 0.800 0.262 2 0.157 0.596 0.396 1.000
NOTE.- ^a P-values here are the proportion of values in the relevant 10,000 (or 1000 for physical
distances) value distribution that are equal to or are more extreme than the 1414C>T-based value.
Table 4.8: Table showing inferred haplotypes for FMO2 genomic variants from NIEHS sequencing data.
Columns: Haplotype; SNP position and functional change (19 variants, in the order listed here); Ethnic Identity of NIEHS samples (AA, YR, AS, EU, HI); Total.
g.7A>G (D36G) *
g.7695T>A (F69Y)
g.7702_7703insGAC (FS) *
g.7731T>C (F81S)
g.10951delG (FS) *
g.13693T>C (F182S) *
g.13732C>T (S195L) *
g.13733G>A (Synon) *
g.18237G>A (R238Q) *
g.18269C>T (R249X)
g.19679A>G (E314G)
g.19839A>G (Synon) *
g.19910G>C (R391T) *
g.22027G>A (Synon) *
g.22060T>G (N413K) *
g.23087A>G (Synon)
g.23238C>T (Q472X) *
g.23300A>G (3ʹUTR) *
g.23412_23413insT (3ʹUTR) *
Anc type → A T - T + T C G G C A A G G T A C A - AA YR AS EU HI
1 A A 2 3 0 0 0 5
2 A A A 2 1 0 0 0 3
3 A 1 0 0 0 0 1
4 A C A 2 0 0 0 0 2
5 C A G T G 1 0 0 0 0 1
6 C G T G 1 4 1 2 3 11
7 C G G T + 0 0 0 1 0 1
8 A G G T 0 1 0 0 0 1
9 A G T 9 0 22 23 13 67
10 A G T G 0 0 0 2 2 4
11 A G G T 0 0 0 0 1 1
12 A G G T + 0 0 0 1 0 1
13 A T G G T 0 1 0 0 0 1
14 G T G 1 0 0 0 0 1
15 G G T + 0 0 0 1 0 1
16 T G G T 0 0 0 1 0 1
17 T G T 2 3 0 7 4 16
18 T G T + 0 4 0 0 0 4
19 T G T G 0 0 5 3 6 14
20 T G G T + 1 0 12 0 9 22
21 + C - T G G T 6 5 0 0 1 12
22 + C - T G T 2 0 0 0 0 2
23 + C - T G T G 0 0 0 1 0 1
24 G A T G T 0 1 0 0 2 3
25 G T G T 0 1 8 1 1 11
26 G T G G T + 0 0 0 1 2 3
Total 30 24 48 44 44 190
NOTE.- Anc Type = Ancestral type from Chimpanzee and Macaque, FS = Frame shift mutation, UTR = Untranslated Region mutation, AA = African
American, YR = Yoruban, AS = Asian, EU = European, HI = Hispanic. * indicates that variant was found by Furnes et al. (2003). g.23238C>T SNP is shown
in bold type
After haplotype inference of the 12 NIEHS Yoruba individuals (24 chromosomes), four
chromosomes were shown to possess the 23238C allele (see Table 4.8). Three of these
chromosomes had identical haplotypes (haplotype 1), with two synonymous changes
(g.13733G>A, g.22027G>A) in comparison with an ancestral reference sequence
(elucidated from chimpanzee and macaque data), one of which was found only on a
23238C background (g.22027G>A). The fourth chromosome had an additional, non-
synonymous, mutation (g.18237G>A (R238Q)) that was only found on a 23238C
background (haplotype 2).
Addition of the 15 phased NIEHS African-American samples (30 chromosomes) showed
that a further seven chromosomes possessed the 23238C SNP. Six of the seven had the
two synonymous mutations while the other lacked the g.13733G>A variant (haplotype 3).
The R238Q variant was also found in two 23238C African-American individuals while an
additional non-synonymous mutation (g.19910G>C (R391T)) was found in a further two
23238C chromosomes (haplotype 4).
The 23238T-possessing chromosomes found in the Yoruba, African-American, European,
Hispanic and Asian NIEHS samples possessed a number of variants including
non-synonymous and synonymous mutations as well as insertions and deletions, often in
combination. For example, the g.7702_7703insGAC insertion is found on the same
background as a deletion (g.10951delG), a stop codon (g.23238T) and two non-
synonymous mutations (g.7731T>C (F81S) and g.13732C>T (S195L)) (n = 15,
haplotypes 21, 22 and 23).
Utilizing phased FMO2 genomic data for the Yoruba NIEHS samples produced an
estimate of the time of occurrence of the 23238C>T mutation of 502,404 years before
present (lower boundary: 2 * 4889 * 0.816 * 19.4 = 154,790 years before present, upper
boundary: 2 * 8751 * 1.648 * 36.1 = 1,041,243 years before present), using the
coalescent-based method described by Griffiths and Marjoram (1996).
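The boundary arithmetic quoted above can be checked with a short sketch. The three factors are not named in this passage; the parameter names below (an effective population size, a scaled time and a generation length in years) are assumptions made for readability and should not be read as the thesis's own labels.

```python
def scaled_time_to_years(effective_size: float, scaled_time: float, years_per_generation: float) -> float:
    """Multiply out 2 * size * scaled time * generation length (names are assumed labels)."""
    return 2 * effective_size * scaled_time * years_per_generation


# Factors exactly as quoted in the text for the lower and upper boundaries.
lower_bound_years = scaled_time_to_years(4889, 0.816, 19.4)
upper_bound_years = scaled_time_to_years(8751, 1.648, 36.1)

print(round(lower_bound_years))  # 154790
print(round(upper_bound_years))  # 1041243
```

Both products round to the figures given in the text, so the quoted boundaries are internally consistent.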
4.4. Discussion
4.4.1. Functional FMO2 is found at high frequency throughout sub-Saharan Africa
The g.23238C>T SNP allele distribution reported in this study is consistent with the
expectation based on the proportion of FMO2*1 in African-American and Hispanic
individuals. The ancestral allele of g.23238C>T is present at even higher frequencies in
most sub-Saharan populations than in the admixed populations of the Americas, with
approximately one third of individuals throughout the sub-continent possessing this
variant.
The results in this chapter suggest that frequencies of g.23238C>T alleles are fairly
similar throughout most of sub-Saharan Africa. However, there are two groupings, the
Anuak and south Sudan, which display significantly higher frequencies of the ancestral
allele than was found elsewhere in this survey. In Ethiopia the Anuak from Gambella
display a marked difference in g.23238C>T allele frequency compared with all three
Amharic Ethiopian groups. The distribution of the g.23238C>T polymorphism in the
population from southern Sudan is also significantly different from that in the northern
Sudanese. If these two populations were not included in this survey, CEA would have
been similar to both WA and SEA, emphasising the overall similarity throughout sub-
Saharan Africa. It should also be noted that the Anuak in Ethiopia are thought to be an
immigrant population associated with a larger group of Anuak, who reside in south-
eastern Sudan (personal correspondence Ambaye Ogato). This may go some way to
explaining the similar allele frequencies observed in the southern Sudanese group and the
Anuak.
The data presented here are somewhat similar to the observed distribution of Y-
chromosome variation in Africa, with a great deal of similarity among Niger-Congo
speaking populations, a large part of which is likely to be a consequence of the expansion
of the Bantu-speaking peoples, and more genetic differentiation among populations
speaking the tongues of other language families, such as Afro-Asiatic and Nilo-Saharan
(Wood et al. 2005).
The substantial difference in FMO2*1 allele frequencies between northern-African and
sub-Saharan African populations is consistent with other genetic studies, using classical
markers, and more recent studies, using the non-recombining portion of the Y-
chromosome (NRY) (Cruciani et al. 2002) and mitochondrial DNA (Salas et al. 2002)
data, which show large genetic differences between the two regions, with the Saharan
desert acting as a major barrier to gene flow. The presence of the FMO2*1 allele at a low
frequency in the Maghreb as well as in the Yemen could be due to the transfer of
indigenous sub-Saharan Africans to northern Africa and the Arabian Peninsula in the
course of the Arab slave trade during the 8th to 19th centuries (Fisher 2001; Richards et al.
2003). The absence of FMO2*1 from the Turkish datasets is in agreement with previous
work, which has shown that the FMO2*1 allele is not present in populations that are not
of recent-African descent (Dolphin et al. 1998; Whetstine et al. 2000).
Although the dataset used in this study was sufficient to explore the general distribution
of the g.23238C>T variant across Africa, more localised sampling will be needed to
answer other potentially important questions. For example, despite the absence of many
statistically significant inter-group differences among the sub-Saharan populations typed
in this study, the range in frequency of individuals possessing at least one FMO2*1 allele
was wide, at 31.3% (17.8-49.1%). If the FMO2*1 variant is shown to be of medical
importance then fine-scale surveys involving greater numbers of subjects will be needed
to identify local groups with particularly high frequencies.
4.4.2. The possible consequences of FMO2 functionality in Africans
Given the observed similarity in the distribution of the g.23238C>T polymorphism across
sub-Saharan Africa, it is possible to extrapolate from the data reported here to estimate
the number of people in sub-Saharan Africa as a whole who have at least one FMO2*1
allele. Based on a study of Hispanic-Americans of Puerto-Rican and Mexican origin
(Krueger et al. 2005), which found that three mutations known to decrease enzyme
function segregated with the truncation mutation, it is currently reasonable to assume that
the FMO2*1 allele found in Africans results in a fully functional FMO2 enzyme
(however, other, unidentified, mutations may render the FMO2 enzyme less catalytically
active or even completely inactive). Given that the total population of sub-Saharan Africa
is 726 million (725,800,000 – 2004 World Bank estimate [http://www.worldbank.org]),
226 million individuals may possess at least one allele that encodes functional FMO2.
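As a quick arithmetic check on this extrapolation, the two figures quoted above can be divided to recover the carrier proportion they imply. This is only a back-calculation from the numbers in the text; the variable names are introduced here for illustration.

```python
# Figures as quoted in the text.
sub_saharan_population = 725_800_000  # 2004 World Bank estimate
estimated_carriers = 226_000_000      # individuals with at least one FMO2*1 allele

# Proportion of individuals implied by the two figures.
implied_carrier_proportion = estimated_carriers / sub_saharan_population
print(f"{implied_carrier_proportion:.1%}")  # about 31%
```

The implied proportion of roughly 31% agrees with the statement in Section 4.4.1 that approximately one third of individuals possess this variant.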
Sequence data from the NIEHS SNP programme for Yoruba and African-American
samples support the suggestion that the FMO2*1 allele results in functionally active
FMO2. While no statistical support is offered, because of uncertainty in regard to certain
aspects of the NIEHS data (i.e., there is possible error in haplotype inference because of
the presence of very rare variants and there are large regions where successful sequencing
coverage in all samples has not been achieved), it would appear that a large majority of
variants that may affect the functional activity of the enzyme lie on an FMO2*2A
background. This suggests that chromosomes possessing this allele are in mutational free
fall (the evolutionary pressure to conserve sequence identity has been relaxed) because of
the loss of function caused by the g.23238C>T mutation, whereas chromosomes with
FMO2*1 may have been evolutionarily conserved as they still retain enzymatic activity.
However, given the small number of g.23238C-possessing individuals (n=12) in the
NIEHS Yoruba dataset it is necessary to be cautious in drawing conclusions about FMO2
activity in Africa from these data alone. With such a considerable number of individuals
potentially at risk of thiourea toxicity, however, the effect of FMO2 expression in humans
on the metabolism of this family of chemicals (as well as of other chemical families that
may also act as substrates of FMO2) requires further investigation. If the action of the
enzyme is shown to be detrimental then the risk of future exposure to offending substrates
will need to be considered very carefully by regulatory authorities.
Drugs that are primarily metabolised by FMOs may, in general, have certain advantages
over those metabolised by cytochrome P450 enzymes (CYPs), because FMOs are not as
readily inhibited or induced, thus reducing the risk of drug-drug interactions (Cashman
2005). If, however, FMO2 is involved in the metabolic pathway of drugs used to treat
common diseases in Africa and if products of enzymatic activity have a toxic effect then
great caution should be applied in the distribution and use of such drugs. Given the large
numbers potentially at risk it is important that the activity of the enzyme encoded by the
FMO2*1 allele in African populations is established as quickly as possible, not least
because of the current widespread use of ETA in the treatment of tuberculosis.
Knowledge of local allele frequencies of important drug-metabolizing enzyme variants
that are easy to type, such as g.23238C>T, could prove useful in predicting drug response
in Africa. This is because a) compiling individual profiles (Johnson 2003; Weinshilboum
2003; Evans 2003) of the activity of drug-metabolizing enzymes is unlikely to be feasible
for the foreseeable future, due to economic constraints and a lack of appropriate
infrastructure, and b) variation in individual drug response may well be geographically
and ethnically structured (Wilson et al. 2001). In addition, small, isolated populations in
which genetic drift may lead to significant changes in allele frequencies, as may have
been observed in the Anuak, could well benefit from the collection of such data. It may
also be prudent, as genetic characterisation becomes more common in the developed
world, for individuals with a significant sub-Saharan African ancestry to be typed for the
g.23238C>T SNP.
4.4.3. The Evolution of FMO2
The Long-Range Haplotype test revealed no evidence for positive selection on either
allele at the g.23238C>T SNP in any of three HapMap populations (YRI, CEU and
CHB+JPT), so the high frequency of the derived FMO2*2A allele cannot readily be
explained by it having a recent selective advantage. As a consequence of this and the
presence of FMO2*1 throughout sub-Saharan Africa at roughly similar frequencies it is
suggested that the most likely explanation for the absence of the FMO2*1 allele
outside Africa is that it was lost in a bottleneck when anatomically modern humans
migrated out of Africa sometime after 65,000 years ago (see Mellars 2006 and Chapter 1)
and that therefore the g.23238C>T SNP must have a sub-Saharan African origin prior to
this event. However, Sabeti et al. (2002b) have indicated that the EHH statistic is unable
to detect positive selection that occurred more than 30,000 years ago, so it cannot be
ruled out that a strong selective pressure existed before this date and resulted in
the increase in FMO2*2A allele frequency and the complete loss of the
FMO2*1 allele outside of Africa. Another explanation is that selection acted only on the
populations migrating out of Africa, and since the allele went to fixation the signal is not
visible via the iHS test. However, under that scenario extended LD would be expected
around the FMO2 gene in non-African HapMap populations, which is not found.
Interestingly there is evidence that selection has acted on one member of the FMO family,
FMO3, which has been the subject of balancing selection (Allerston et al. 2007).
Dating of when the g.23238C>T SNP arose, through the use of NIEHS sequencing data,
appears, notwithstanding the need to apply somewhat crude assumptions, to support the
ancient origin of this SNP with a time of 502,404 years before present, well before any
estimates of the first exodus of modern humans from Africa into the rest of the world.
Even the lower boundary yields a time some 90,000 years before this event. It will be
interesting to observe the frequency of FMO2*1 in isolated traditional hunter-gatherer
groups such as the Khoisan (which, like any pygmy populations, were not available as
they are currently not part of the TCGA African DNA sample database) that are thought
to be one of the earliest diverging human populations. Similarly, analysis of linked
microsatellites may prove useful in understanding more about the mechanism of its
dispersal.
4.5. Conclusion
The peoples of sub-Saharan Africa demonstrate one of the fastest population growth rates
in the world, while the region itself is widely accepted as the place of origin of
anatomically modern humans. However, in comparison with other regions, studies
investigating the distribution of human genetic variation at the molecular level have been
sparse. Those that have been performed have often been limited in scope to a few
populations and small sample sizes. This study has contributed to redressing this
imbalance and shown that a gene previously considered to be of little interest, but now
thought to encode an enzymatic variant that may be important in human healthcare, is
present at relatively high frequencies in multiple populations throughout the continent.
Surveys such as this are not only of benefit to the indigenous populations of Africa, but
are also of increasing importance in the planning of healthcare in the developed world,
where the number of individuals of recent African descent is growing and, in some areas,
such as the Americas and Europe, is already substantial. Sub-Saharan Africa is thought to
possess more human genetic diversity than the rest of the world combined. However, it is
not yet clear how this diversity is distributed and indeed what part of that diversity is not
present outside the continent. It is obvious that variation not recognised cannot be studied
in vivo. Paucity of such knowledge can lead to inappropriate therapeutic, prophylactic
and diagnostic intervention and increase the risk of an adverse drug reaction. There is a
need for more studies on human genetic diversity in Africa; research from which all
people of recent African descent, wherever living, should benefit.
Chapter 5:
Conclusion
5. Conclusion
This chapter discusses implications of the findings from the three case studies described
in this thesis for genetic studies to elucidate a) local histories, b) the structure and extent
of genetic diversity in the presence of cultural diversity and c) potentially medically
relevant variation in sub-Saharan Africa. It also describes how the methodologies used
can be developed to increase their utility. Finally, further research relating both to the
questions addressed in this thesis and to related matters is suggested.
It is clear from the three case studies that even rather rudimentary molecular techniques
and relatively conventional statistical analysis applied at a fine-scale level of
discrimination can produce results that are of interest over a wide range of disciplines. In
each case the question addressed should be well defined with expectations or
hypotheses clearly stated. Sampling strategies and methods must be carefully planned.
Samples collected or selected for each study must be appropriate for testing prior
hypotheses, of sufficient number and with a known provenance. Studies that do not
meet these criteria are of limited value.
5.1. Implications for investigating human history and behaviour
The majority of studies investigating genetic diversity in sub-Saharan Africa have
covered a large geographic area and utilised samples already available. Sometimes
samples from multiple ethnic groups are pooled with little justification (for example see
Watson et al. (1996) and Hammer et al. (1997)) while phylogeographical approaches that
seek to fit genetic data to known demographic or prospective selective events are applied
in a somewhat ad hoc manner (for example see Underhill et al. (2001) and Salas et al.
(2002)). At best these approaches offer starting points for future investigation since a
single genetic outcome can usually be explained by multiple demographic scenarios, and
multiple genetic outcomes can result from the same demographic scenario as a result of
evolutionary variance (i.e., drift effects).
Academics in the social sciences such as anthropology and linguistics have previously
appealed for fine-scale studies in sub-Saharan Africa with dense sampling strategies
(MacEachern 2000) and Chapters 2 and 3 have demonstrated the value of such research.
The survey of FMO2 variation, while suited to addressing the questions posed in the
study, is of limited use in elucidating issues relating to human history and language
evolution. Though one plausible demographic explanation for the present distribution of
FMO2*1 has been suggested, many other scenarios are also possible.
Sample selection should follow the formulation of testable hypotheses. To ensure that
sample collection is appropriate geneticists should collaborate closely with linguists,
anthropologists, historians (including local historians) and archaeologists, all of whom
can contribute to understanding the complex processes and events that may have
occurred, or are still in progress. The very precise expectations formulated with respect to
royal social status inheritance described in Chapter 2 illustrate the advantages of this
approach.
The structure of language trees can be the subject of fierce debate among linguists and,
often, these differences of opinion are insufficiently understood by geneticists when
seeking correlations between 'genetics' and 'language' (see criticism from outside the
genetics community, e.g. O'Grady et al. (1989), Bolnick et al. (2004), Campbell et al.
(2006)). Some studies have sought to account for this uncertainty by varying branch
lengths between languages but still fail to take into account different interpretations of the
underlying shape of the language tree. To address these issues in the study described in
Chapter 3, the author of this thesis worked closely with Dr Bruce Connell, a linguist
specialising in South Nigerian languages. Such cross-disciplinary collaboration is
particularly necessary in the formulation of the questions to be addressed when particular
aspects of language practices of the ethnic groups being studied might easily be
overlooked by a non-specialist.
It is interesting to note in Chapter 2 the congruence of oral history with the genetic data in
regard to the ethnogenesis of the Nso΄ and in Chapter 3 the lack of congruence in regard
to the origins of the Efik Uwanse. Traditional historians have tended to be rather sceptical
about the utility of oral histories (Blench 2006) even though they can be potentially
valuable sources for understanding the past, especially in sub-Saharan Africa where
written records are somewhat recent (Ki-Zerbo 1989). At the source of this difficulty is
deciding which oral histories, or parts of an oral history, record real events and which do
not.
Chapters 2 and 3 have shown the potential of genetic studies to provide supporting
evidence for one or more alternative accounts. This finding should be of particular
interest to anthropologists and local historians. They also show the value of appropriate
DNA sampling methodologies and the necessity in analysis to take full account of the
sampling strategy adopted. The Nso΄ study made use of the characteristics of
agriculturists and hunter-gatherers, both of which appear to have left distinct genetic
signatures (at least for the NRY) that can be detected across sub-Saharan Africa
(Underhill et al. 2001), and the group's well defined hierarchical social system. Oral
histories incorporating both of these tendencies are likely to be particularly amenable to
in-depth genetic investigation.
Within the discipline of genetic history (the elucidation of past events through the
interpretations of genetic data) the sex-specific systems analysed in this thesis have
proved particularly useful. Though studies using NRY and mtDNA have been popular
over the past 10-15 years because of their relative ease of characterisation, recent
advances in haplotype inference (Niu 2004; Li et al. 2005; see Browning & Browning
2007) and sequencing technologies (Mitnik et al. 2001; see Mitchelson 2003) have
increased the availability of useable autosomal data. Nevertheless, in appropriate
circumstances the increased susceptibility to drift of NRY and mtDNA combined with
sex-specific demographic events recorded in many accounts of local histories, and
cultural evolution, ensure that NRY and mtDNA are frequently the genetic systems of
choice. For example, autosomal markers are unlikely to have been particularly useful in
elucidating the history of the Nso΄.
However, this is not to suggest that analysis of autosomal data will not be of any use.
Much of the sex-specific genetic variation in sub-Saharan Africa is likely to have been
shaped by the expansion of the Bantu speaking peoples. As the autosomes a) are less
prone to genetic drift because of their four-fold effective population size (ignoring the
effects of reproductive variance (see Jobling, Hurles & Tyler-Smith 2004 page 134 Box
5.1)) and b) possess more genetic material to analyse, evidence of demographic events and
origins may be preserved. New large scale sequencing technologies (Schuster 2008) such
as 454 (Margulies et al. 2005), Solexa (Bentley 2006) and SOLiD (Shendure et al. 2005)
sequencing and development of more realistic (and presumably complex) models of
human evolution combined with developments in analysis of large datasets should enable
at least parts of this archive to be interpreted.
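The four-fold difference in effective population size mentioned above follows from counting chromosome copies. The short derivation below is a sketch under simplifying assumptions introduced here: equal numbers of breeding males and females and no variance in reproductive success, with N_m and N_f as illustrative symbols for the numbers of breeding males and females.

```latex
% Copies per breeding population: each individual carries two autosomal copies,
% each male carries one NRY copy, and mtDNA is transmitted through females only.
\begin{align*}
  \text{autosomal copies} &= 2N_m + 2N_f, \\
  \text{NRY copies} &= N_m, \qquad \text{transmitted mtDNA copies} = N_f, \\
  N_m = N_f \;\Rightarrow\;
  \frac{\text{autosomal copies}}{\text{NRY copies}}
    &= \frac{2N_m + 2N_f}{N_m} = 4 .
\end{align*}
```

The same ratio applies to mtDNA under these assumptions, which is why both sex-specific systems are expected to drift faster than the autosomes.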
5.2. Implications for investigating medically relevant genetic variation
It is obvious that finding genetic variation in sub-Saharan Africa that is absent elsewhere
should be of potential medical benefit. However it is the approach to achieving this
objective that is of most interest in this discussion. Ideally one would collect large
samples from every ethnic group and sequence entire genomes. However this is currently
impractical. Are there approaches that quickly and cheaply identify important variants of
relatively immediate and widespread practical relevance given the economic constraints
of working in sub-Saharan Africa? Chapter 4 illustrates one good, albeit simple, approach when
seeking pharmacogenetically relevant variants.
The immediate objective is to identify potentially important variants i.e. genetic variation
of therapeutic, diagnostic or prophylactic importance present at significant frequencies in
one or more ethnic or geographic groupings ('significant frequency' in this situation is to
be assessed in the context of medical cost/benefit assessments, which can vary from
group to group). Meeting this criterion should ensure that knowledge of the variation can
be used to benefit peoples of sub-Saharan Africa. One target is genetic variation in genes
coding for drug metabolising enzymes (especially those involved in the metabolism of
drugs used to treat diseases prevalent in sub-Saharan Africa). Often there will be reports
of their existence in African Americans (e.g. Whetstine et al. 2000; Hirunsatit et al. 2007;
Gong et al. 2007).
The distribution of genetic variants can then be assessed in sample sets of populations
across sub-Saharan Africa as in Chapter 4. This should determine geographic structuring
and the likelihood of local variation at a continent wide level. In particular, based on
existing genotype/phenotype association studies, the significance of variation can be
assessed and the likely number of individuals affected determined. Establishing the above
would enable researchers to focus more efficiently on variation that is likely to be of
benefit to the greatest number of individuals possible, as seen in Chapter 4 with the
finding that the 23238C allele is likely present in over 200 million sub-Saharan Africans, a
variant that may possibly have a substantial effect on how these individuals respond to
treatment for tuberculosis.
The next step is to determine whether a variant of potential functional significance based
on observations made outside Africa has the same functional association within African
populations. This might not be so since inter alia redundancy within drug metabolising
enzyme systems might prevent the expression of a phenotypic effect. This will be
achieved by functional expression studies, focusing especially on the effects of the variant
on the metabolism of drugs used to treat diseases prevalent in sub-Saharan Africa. Such
work requires close collaboration between genetics and biochemistry laboratories.
Having established that a genetic variant has a sufficiently important functional
consequence within sub-Saharan African populations, information on its consequences
and distribution should be provided to health care providers and an economic cost-benefit
analysis undertaken on which to base future policy.
In the absence of individualised profiling it is anticipated that first choice therapeutic,
diagnostic and prophylactic intervention will be based on information about geographical
and inter-ethnic group distributions of variation (Tishkoff & Kidd 2004; Vizirianakis
2004; Reinbold 2007). Characterisation of each and every group is impractical,
and genetic drift shaped by demographic history may cause considerable
variation in small isolated populations. Even so, knowledge of genetic variation at a higher but still
relatively local geographic scale, combined with knowledge of relationships informed by
anthropology and linguistics, may permit useful predictions of the pharmacogenetic
profiles of uncharacterised groups within a region. For example, Chapter 3 showed
differences among three neighbouring regions in West Central Africa as a result of
differential gene flow. A better understanding of the factors that have caused these
differences and development of appropriate models of relationships between genetic
variation and demographic history could aid in the prediction of pharmacogenetic profiles
of individuals in these regions. Relatively small scale sampling and typing of variants
combined with the knowledge of population sizes, social structures and practice might
make important contributions to the improvement of efficacy and the reduction of adverse
events in healthcare. Fundamental to this approach is the greater use of fine-scale surveys
to generate data which can be used to construct more sensitive models.
Of course not all, or perhaps even most, variation in drug efficacy and safety is due to
genetic variation. Environmental influences, drug-drug interactions and poor compliance
with medical advice all make a contribution to therapeutic outcomes. Nevertheless,
medical intervention based on a better prediction of genetic control and metabolic
pathways has the potential to benefit people in sub-Saharan Africa, a region in which
medically relevant genetic variation is likely to be greater than elsewhere in the world.
What is more, such benefits could arrive in the relatively near future, whereas
individualised pharmacogenetic targeting is unlikely to have applications in sub-Saharan
Africa in the foreseeable future. Technology is approaching the point when, in the near
future, entire genomes will be routinely sequenced in a matter of days or even hours.
Theoretical methods that can handle such large masses of data will also have to be
developed but the greatest challenge, in Africa, may be the economic cost of
implementation. If knowledge of human genetic variation is to be harnessed in the pursuit
of better healthcare, investigators will need access to DNA biobanks with well-provenanced
samples. Given the current poor infrastructure to support such collections
(Tishkoff & Williams 2002) it is important that when investigators do have the
opportunity to collect samples they work with anthropologists and other social scientists
to select appropriate targets.
5.3. Future Work
Each of the three case studies described in this thesis has been performed within the
time frame and using the resources available. However, each study has also revealed scope
for additional research, both to evaluate more thoroughly the findings of Chapters 2-4 and
also to gain insight into aspects not addressed in the projects. Below is a description of
potential further work not discussed in the case studies themselves that might be
undertaken.
5.3.1. Future work derived from Chapter 2 (Sex-Specific Genetic Data Support One Of
Two Alternative Versions Of The Foundation Of The Ruling Dynasty Of The Nso΄ In
Cameroon)
The potential to infer the history of the Nso΄ from genetic data depended heavily on the
ability to determine whether sex-specific genetic profiles in the won nto´ and duy
conformed to prior expectations given previously reported alternative rules concerning
inheritance of royal status (Royal Social Status Rules A and B). These expectations are
based on a range of assumptions, including that all fons ('n') and all won nto´ and duy
('y') have in each case an equal number of offspring. Combinations of 'n' and 'y' yield a
range of probabilities for the proportion of fon NRY types present in the won nto´ and
duy. This may be considered by some a somewhat unsatisfactory approach. Rather than
generating discrete probabilities an alternative approach could be to generate estimates of
proportions by simulating won nto´ and duy genealogies from an initial single fon under a
given set of rules in silico. This would permit variation in reproductive success among
individuals. Such an approach could allow for the effects of same social class marriages
and acquisition of duy status other than by descent from won nto´. At the end of Chapter 2
it was suggested that the won nto´ descent system may be evolving and further
anthropological work to investigate the possibility that a somewhat more patrilineal
system is emerging was proposed. The effect of this possibility and others could be
incorporated into such simulations. In addition given that Nso΄ fons are known to have
many more children than other won nto´ it is possible to explore associations of higher
male status and reproductive success.
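The kind of in silico genealogy simulation suggested above might be sketched as follows. This is a deliberately simplified illustration and not an implementation of Royal Social Status Rules A or B: it follows only male lines from a single founder, draws each male's number of sons from a Poisson distribution to allow variation in reproductive success, and lets additional males carrying a different NRY type join the class at a fixed rate. The function names and all parameters (`mean_sons`, `outside_entry_rate`, `max_males`) are hypothetical.

```python
import math
import random


def _poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_founder_share(generations, mean_sons, outside_entry_rate, max_males=2000, seed=None):
    """Return, per generation, the share of males in the class with the founder's NRY type."""
    rng = random.Random(seed)
    founder_males, other_males = 1, 0  # start from a single founder (hypothetical set-up)
    shares = []
    for _ in range(generations):
        # Variable reproductive success: each male has a Poisson number of sons.
        founder_sons = sum(_poisson(rng, mean_sons) for _ in range(founder_males))
        other_sons = sum(_poisson(rng, mean_sons) for _ in range(other_males))
        # Hypothetical rule: for each son born, one extra male with a different
        # NRY type may join the class other than by paternal descent.
        born = founder_sons + other_sons
        other_sons += sum(1 for _ in range(born) if rng.random() < outside_entry_rate)
        total = founder_sons + other_sons
        if total == 0:
            break  # the class has died out in this replicate
        if total > max_males:
            # Keep the simulated class at a manageable size by random subsampling.
            kept = rng.sample([True] * founder_sons + [False] * other_sons, max_males)
            founder_sons = sum(kept)
            other_sons = max_males - founder_sons
        founder_males, other_males = founder_sons, other_sons
        shares.append(founder_males / (founder_males + other_males))
    return shares


# One replicate; many replicates would be needed to estimate a distribution.
example_shares = simulate_founder_share(generations=10, mean_sons=1.5, outside_entry_rate=0.2, seed=1)
```

Repeating such a simulation many times under each candidate rule set would give a distribution of expected proportions rather than a discrete probability, which could then be compared with the observed frequency of fon NRY types.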
Another aspect in which this case study can be refined is in the dating of the
Y*(xBR,A3b2) clade in the won nto´ and duy. As stated in Chapter 2 the point estimate
was non-informative because the Y*(xBR,A3b2) clade in the won nto´ and duy was
homogenous, meaning that the associated upper confidence interval for the most recent
common ancestor is zero years, which is obviously nonsense. It should be possible to
tighten confidence intervals by typing of further NRY microsatellites in Y*(xBR,A3b2)
individuals, collecting a new larger sample set or both.
5.3.2. Future work derived from Chapter 3 (It All Depends On The Scale: Little Sex-
Specific Genetic Variation In The Presence Of Substantial Language Variation In Peoples
Of The Cross River Region Of Nigeria Assessed Within The Wider Context Of West
Africa)
Given the high level of homogeneity of sex-specific systems observed in the peoples of
the Cross River region it would be interesting to see whether the populations can be
discriminated using additional markers (NRY) and sequences (mtDNA) and, if so, at what
point such discrimination becomes possible. For the NRY, haplogroup E3a, because of its
high frequency, is the primary candidate for further resolution. According to the
nomenclature of the Y-chromosome Consortium (2002) E3a can be further characterised
at one additional level of UEP markers (Haplogroups E3a*, E3a1-E3a6). The results from
NRY microsatellite analysis suggest that there may not be observable genetic structure
even at this level of genealogical resolution. However extra UEP typing can on occasion
reveal fine-scale population structure that microsatellites cannot and this is a particularly
plausible scenario in Chapter 3 where the number of microsatellites typed (six) is quite
low. It would therefore be appropriate to type a subset of the Cross River samples to
assess whether or not further typing would be informative. In addition some further
insight concerning relationships among groups might be generated by more detailed
characterisation of haplogroup BR*(xDE,JR) samples (Haplogroup B is Africa-specific
while Haplogroup R is mostly found in Eurasia (Underhill et al. 2001)). Ultimately, any
study that considers the genealogical relationships of a non-recombining system will be
biased by the specific lineage delineators typed. While the UEP markers used here were
not chosen with any particular global region in mind, biases in their ascertainment will
obviously affect the conclusions drawn from the genetic studies described in this thesis
(see Jobling & Tyler-Smith 2003; Wilder et al. 2004).
Also, with the reduction in cost of whole mtDNA typing, it is now practical to envisage
complete sequencing of a subset of samples from Cross River groups, which would allow
more accurate and reliable definition of mtDNA haplogroups. The assumption that
genetic drift would be the major evolutionary force causing genetic structuring within
the Cross River region, and that the effect of novel mutations would be negligible, is a
reasonable one given the time periods during which the various languages separated.
However, it is possible that signature mtDNA haplogroups exist that lend clues to
more ancient demographic features of the Cross River region.
The findings of Chapter 3 suggest many questions that would require further sampling to
address. For example, although samples from speakers of six of the most prominent
languages of the Cross River region were analysed, there are numerous other groups
speaking their own languages that have not been sampled. It would be interesting to see whether
groups speaking less common languages (such as Efai or Ibuno, each of which has
fewer than 10,000 speakers (Ethnologue 2005)) have experienced similar levels of
male- and female-mediated gene flow as the larger groups. More data on mating patterns
would also be interesting in these cases, since it may be that in a linguistically diverse
region smaller populations must avoid inter-language unions if their language is to
survive.
The addition of the Igboland groups to the study indicated that genetic differentiation may
be greater outside the Cross River region. Further characterisation of populations at
various distances from the region may establish if this is the case. The two Igboland
groups did indicate some male-specific inter-group differentiation. This leads to two
further questions. a) Can different populations in Igboland be differentiated and, if so, by
what criteria, e.g. geography or dialect? Given the Igbo's prominent role in Nigeria
(there are almost 20 million Igbo speakers (Ethnologue 2005)) and their diverse range of
dialects, a detailed study is clearly appropriate. b) Is the level of sex-specific genetic
homogeneity observed in the Cross River region common in South East Nigeria? Carrying
out a study similar to that pursued in Chapter 3 in other regions, especially ones likely to
have been less influenced by the slave trade, would be informative.
Comparison of the peoples of the Cross River region with other West Central African
populations discriminated between these and Ghanaian and Cameroon Grassfields
populations. At the same time it was shown that these two other regions also
demonstrated very different patterns of sex-specific genetic diversity. The Grassfields
populations were more heterogeneous than the Ghanaian populations despite covering a
smaller geographical area. Fine-scale investigation of these two regions and the reasons
for their different patterns of genetic diversity could uncover the underlying causes. In the
case of the Grassfields it is possible that topographic variation has been a major factor.
5.3.3. Future work derived from Chapter 4 (The potentially deleterious functional variant
FMO2*1 is at high frequency throughout sub-Saharan Africa)
This study examined the distribution of the FMO2 g.23238C>T SNP in sub-Saharan
Africa in a broad geographical context. While the frequency is generally homogeneous
across sub-Saharan Africa, there was nevertheless a large range. East Africa seems
particularly variable and further analysis of populations in southern Sudan and Ethiopia is
called for.
Of immediate importance is to establish a) whether the FMO2*1 allele does code for a
functional FMO2 enzyme in Africans and not just in African Americans and b) what the
medical impact of functionality is. Early indications from work in the laboratories of
Professor Ian Phillips at Queen Mary, University of London and Professor Elizabeth
Shephard at University College London (unpublished data) suggest that African FMO2*1
alleles are in fact catalytically active. If the allele does code for a functionally active
enzyme, it is important to establish how this functionality affects the metabolism of
thiourea-based drugs such as ethionamide (e.g. are there dosage-specific effects?) and
whether any other drug-metabolizing enzymes may interact with or compensate for
a) functional FMO2 and b) non-functional FMO2 (either directly or indirectly).
The NIEHS data revealed many SNPs on the 23238C background that may affect protein
structure. Sequencing of FMO2 exons as well as possible promoters, enhancers and splice
sites in a large cohort of Africans may identify further variants, some of which may be
population- or region-specific, and again it will be important to establish the effect of
these variants on enzyme activity.
Though it was not the primary focus of the study, tests for recent positive selection were
performed using the Long Range Haplotype (LRH) test (Sabeti et al. 2002b). This test
was chosen because the International HapMap Project has made the necessary SNP data
readily available. No evidence of positive selection on either 23238C>T allele was detected.
However, as discussed in Chapter 4, the LRH test has power to detect only very recent
selective sweeps (<30,000 years before present (Sabeti et al. 2006)) because
recombination will extinguish evidence of earlier events. Given the age of the SNP
(~502,404 years before present; lower boundary: 154,790 years, upper boundary:
1,041,243 years), other methods are necessary to explore the possibility that the fixation of
the non-functional 23238T allele in Europeans and Asians was a result of positive
selection acting outside the range amenable to LRH testing. It will be necessary to
re-sequence the FMO2 gene in African and Eurasian chromosomes. This would allow us, in
a manner similar to that utilised by Xue et al. (2006), to conduct tests of neutrality using
Tajima's D (Tajima 1989), Fu and Li's D and F (Fu & Li 1993) and Fay and Wu's H (Fay
& Wu 2000), as well as to examine whether the level of haplotype diversity for either
allele is that expected under neutrality. Such methods might allow us to detect
signatures not only of possible positive selection but also of balancing selection. Though
NIEHS sequence data are available for these populations, they are probably not of the
required quality to perform such analyses accurately: there are large regions where
successful sequencing coverage has not been achieved in all samples, which may affect the
ability to detect selection because analyses such as Tajima's D depend on correctly
identifying singleton variation (Filatov 2002). Only once this re-sequencing has been
performed will it be possible to assess evidence for selection as a factor in the present-day
distribution of 23238C>T alleles.
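To make the dependence on singleton variation concrete, the following minimal Python sketch computes Tajima's D from a set of aligned sequences using the standard constants of Tajima (1989). It is an illustration only: the function name and the input format (equal-length strings with no missing data) are assumptions, and real re-sequencing data would need gaps and uncalled bases to be handled first.

```python
from collections import Counter
from math import comb, sqrt


def tajimas_d(sequences):
    """Tajima's D for aligned, equal-length sequences.

    Returns None when the statistic is undefined (fewer than four
    sequences, or no segregating sites).
    """
    n = len(sequences)
    if n < 4:
        return None
    pairs = comb(n, 2)
    seg_sites = 0      # S: number of segregating sites
    diff_pairs = 0     # total pairwise differences summed over sites
    for site in range(len(sequences[0])):
        counts = Counter(seq[site] for seq in sequences)
        if len(counts) > 1:
            seg_sites += 1
            # Pairs that differ = all pairs minus pairs sharing an allele.
            diff_pairs += pairs - sum(comb(c, 2) for c in counts.values())
    if seg_sites == 0:
        return None
    pi = diff_pairs / pairs  # mean number of pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    # Every variant is a singleton: D is negative.
    print(tajimas_d(["AAA", "TAA", "ATA", "AAT"]))
    # Every variant is at intermediate frequency: D is positive.
    print(tajimas_d(["AAA", "AAA", "TTT", "TTT"]))
```

Because singletons add to S but contribute little to the mean pairwise difference, singleton sites that are missed or miscalled shift D upwards or downwards respectively, which is why incomplete sequencing coverage matters for this test.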
5.4. Final Comments
The three case studies presented in this thesis have produced important findings that help
elucidate local human history, demographic behaviour and medically relevant variation in
sub-Saharan Africa. They demonstrate how examining the distribution of human genetic
diversity can generate useful insights in many diverse areas. Throughout this thesis these
studies have emphasised the importance of adopting appropriate sampling methodologies
and utilizing the expertise of collaborators working in other disciplines such as
anthropology and linguistics. With the rapid advance of relevant genotyping and
sequencing technologies and rapidly falling costs, the scope for such work is increasing. It
is to be hoped that these advances will be matched by increasing attention to fieldwork
and to the raw material for such studies, i.e. the choice of individuals from whom samples
are taken. Only then can the peoples of sub-Saharan Africa start to reap, in
practical ways, the benefits that this research can generate.
Appendix A: Criteria for and problems
associated with collecting African samples for
The Centre for Genetic Anthropology
(TCGA) DNA bank.
1. Ethical and Legal Consents
No collection is made unless it is permitted by national and local law and
appropriate ethical consent has been obtained in the country in which the collection
is made. No collection is made if doing so would breach local custom. All
collections are made with the consent of local communal leaders and in ways
acceptable to local custom.
Problems
It is not always possible to establish what the relevant law is, nor is it always
possible to identify a suitable body from which ethical consent should be obtained.
It is sometimes difficult to identify the local officials whose consent should be
sought.
2. No Coercion
No collections are made under arrangements in which donors are instructed to
participate.
3. Random collection
To the greatest extent possible, samples are collected from donors randomly.
Where there is a preset number of samples to be collected they are collected using a
'first come, first served' approach. There are three approaches to collection: a)
establishing presence in a public place e.g. a weekly market and waiting for persons
to offer to provide mouth swabs, b) prearranged gatherings e.g. in a village or town
hall, advertised previously by a local representative and c) visits to small hamlets.
4. Informed Consent
The purposes of the study are explained in simple terms to all donors.
Problem
Frequently the explanation has to be given through a local interpreter speaking in a
local language.
5. Thank you gift
Donors are rewarded for providing a sample by being given a Polaroid photograph.
Problem
There is a risk that a donor may provide false information in order to qualify to give
a mouth swab or attempt to give a mouth swab on more than one occasion in order
to get more than one photograph.
6. Donors
Samples are collected only from males aged 18 years or older who do not share a
common paternal grandfather.
Problems
It is necessary to be careful to ensure that persons under 18 do not give false
information about their age in order to obtain a photograph and that persons
sharing the same paternal grandfather do not give false information for the same
reason. A local adviser (normally an interpreter) is recruited to ensure that these
activities do not take place. When collections are made at a market or in a village
or town hall, whenever possible, they are completed in a single day to minimise the
risk of breaching these rules.
Collecting only from males can sometimes cause females in the same location to feel
discriminated against. It would be preferable to ensure that individuals, in addition
to not sharing a common paternal grandfather, do not share a common maternal
grandmother. In fact it would be preferable if individuals did not share any
grandparent, either maternal or paternal. In practice however this objective cannot
be achieved while collecting random and anonymous samples in rural African
locations. To attempt to do so would require a level of questioning and record
keeping that is not practical or consistent with collecting samples anonymously.
Given frequent occurrences of polygamy (formal and informal), introducing a
criterion of not sharing a common maternal grandmother as well as not sharing a
common paternal grandfather is not practical. In addition, since it was not a
requirement of early TCGA collections made for Y-chromosome studies, not
introducing this criterion ensures consistency.
At markets, in particular, crowds can become animated making it difficult to keep
order.
Complying with the rule not to collect from individuals sharing the same paternal
grandfather is necessary to ensure that cases of false paternity are not identified.
A further purpose is to ensure consistency across TCGA collections which were
originally compiled for Y-chromosome studies. Complying with this restriction
does introduce an element of bias preventing a collection being entirely random. At
an extreme it is possible that in a village consisting of only one or two clans,
perhaps only one sample can be collected from each clan.
In collections made in markets, obtaining the information for datasheets can be
difficult and it is necessary to have a sufficiently large team of recorders and local
interpreters to ensure the task is performed satisfactorily. The possibility of
inaccurate information being provided by interpreters must be recognised.
Collectors need to ensure that answers are provided by the donor and are not
imposed on the donor by an interpreter.
7. Anonymity
All samples are collected anonymously.
8. Donor Targets
Prior to collection commencing a target figure for the number of samples to be
collected is defined and is normally set at 100.
Before collection starts a decision is made as to whether to collect from persons attending
a particular location randomly or to restrict the collection to persons born or living
at a particular location, in a particular region, possessing a particular self-defined
identity, speaking a particular first or second language or defined by some other
stated criterion.
Problem
It is possible that potential donors will provide inaccurate information in order to
obtain a photograph.
9. Form of Collection
Only mouth swabs are collected.
Problem
The yield of DNA is far lower than if blood is taken.
Appendix B: An example sociological data
sheet used during DNA sample collection
Appendix C: Extraction of DNA from Buccal
Swabs
For collection of samples in the field buccal swabs are rubbed along both cheeks on the
inside of the mouth for approximately 20 seconds to collect cheek cells. This is usually
performed by the collector but occasionally by the individual being sampled himself (for
example a high ranking individual such as a chief of a village may not be allowed contact
with other individuals). The buccal swab is then placed within a 1.5ml tube so that the
swab end makes contact with 1ml of a preservative solution of 0.05M
ethylenediaminetetraacetic acid (EDTA) and 0.5% sodium dodecyl sulfate (SDS). The following
Phenol/Chloroform DNA extraction protocol is then performed for each sample.
1. 40 µl of 10 mg ml⁻¹ proteinase K is added to 20ml of sterile distilled water.
2. 0.8ml of the water/proteinase K solution described in step 1 is added to the 1.5ml
tube containing the buccal swab immersed in EDTA/SDS solution.
3. The mixture from step 2 is then incubated at 56˚C for between 1 and 3 hours.
4. 0.8ml of the mixture from step 3 is added to a microfuge tube containing 0.6ml of
phenol/chloroform (1:1) mix.
5. The sample from step 4 is mixed and then centrifuged for 10 minutes at maximum
speed.
6. The resultant aqueous (upper) phase (layer) in the microfuge tube is transferred to
a microfuge tube containing 0.6ml of chloroform and 30µl of 5M NaCl using a
standard Gilson pipette.
7. The sample from step 6 is mixed and then centrifuged for 10 minutes at maximum
speed.
8. The resultant aqueous (upper) phase (layer) in the microfuge tube is transferred to
a microfuge tube containing 0.7ml of chloroform using a standard Gilson pipette.
9. The sample from step 8 is mixed and then centrifuged for 10 minutes at maximum
speed.
10. The resultant aqueous (upper) phase (layer) in the microfuge tube is transferred to
a screw-top microfuge tube (which is used for long term storage of the DNA)
containing 0.7ml of isopropanol using a standard Gilson pipette.
11. The sample from step 10 is mixed and then centrifuged for 13 minutes at
maximum speed.
12. The resultant supernatant is carefully discarded (to avoid dislodging the DNA from
the walls of the tube) and the tube is inverted at a 45˚ angle for one minute in order to
drain off any remaining supernatant.
13. 0.8ml of 70% Ethanol is then added to the screw-top microfuge tube.
14. The sample from step 13 is then centrifuged for 10 minutes at maximum speed.
15. The resultant supernatant is carefully discarded and the tube is inverted at a 45˚ angle
for 20 minutes in order to drain off any remaining supernatant.
16. 200 µl of TE (pH 9.0) is then added to the microfuge tube.
17. The mixture from step 16 is then incubated at 56˚C for 10 minutes, mixing
occasionally.
18. The resulting DNA with TE mixture is then stored upright in a freezer at -20˚C,
ready for use.
The above protocol is performed in batches of samples to increase throughput. Steps
1-3 are performed in batches of 48, steps 4-10 in batches of 24 (the maximum capacity of
the microfuge) and steps 11-18 in batches of 96. Custom TCGA DNA
extraction sheets and appropriate labelling are used throughout to prevent mixing up of
samples.
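The batch sizes above, together with the dilution in step 1, lend themselves to a short worked calculation. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the batch sizes are those given in the text, and the proteinase K concentration is computed simply as mass over total volume.

```python
from math import ceil

# Batch sizes stated in the protocol description.
BATCH_SIZES = {"steps 1-3": 48, "steps 4-10": 24, "steps 11-18": 96}


def batches_needed(n_samples):
    """Number of batches required at each stage for n_samples swabs."""
    return {stage: ceil(n_samples / size) for stage, size in BATCH_SIZES.items()}


def proteinase_k_working_conc(stock_mg_per_ml=10.0, stock_ul=40.0, water_ml=20.0):
    """Concentration (mg/ml) of the water/proteinase K solution from step 1."""
    stock_ml = stock_ul / 1000.0
    mass_mg = stock_mg_per_ml * stock_ml
    return mass_mg / (water_ml + stock_ml)


if __name__ == "__main__":
    # A collection of 100 samples, the usual target given in Appendix A.
    print(batches_needed(100))
    print(proteinase_k_working_conc())
```

For a collection of 100 samples this gives three batches for steps 1-3, five for steps 4-10 and two for steps 11-18, and a step 1 working solution of roughly 0.02 mg/ml proteinase K.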
Appendix D: Legends of figures and tables
found on the attached CD.
Chapter 2
Supplementary Table 2S.1: Distribution of NRY types, defined by UEP
haplogroups and microsatellite haplotypes, in the four Nso´ social classes
and people of the western Grassfields and Tikar Plain.
Supplementary Table 2S.2: Distribution of mtDNA types, defined by VSO
haplotypes, in the four Nso´ social classes.
Supplementary Table 2S.7: Confidence intervals for TMRCA calculations in
the duy, the nshiylav and mtaar, and the won nto´ and duy, using two
mutation models.
Chapter 3
Supplementary Tables 3S.1: Pairwise ETPD P-values for various levels of
NRY and mtDNA analysis for Cross River samples, Cameroon and Ghana.
Level of analysis shown in top left cell of matrix. Colour code is same as
Table 3.6.
Supplementary Table 3S.2: Distribution of NRY types, defined by UEP
haplogroups and microsatellite haplotypes, in the Cross River region,
Cameroon and Nigeria.
Supplementary Table 3S.3: Pairwise genetic distances and associated P-
values for various levels of NRY and mtDNA analysis. Level of analysis
shown in top left cell of matrix. Colour code is same as Table 3.6.
Supplementary Table 3S.4: Distribution of mtDNA types, defined by HVS-1
mtDNA haplogroups and VSO haplotypes, in the Cross River region,
Cameroon and Nigeria.
Supplementary Table 3S.5: Distribution of NRY types, defined by UEP
haplogroups and microsatellite haplotypes, in Ethiopia, Israeli and
Palestinian Arabs, Lake Chad and Sudan.
Supplementary Table 3S.6: Distribution of mtDNA types, defined by HVS-1
mtDNA haplogroups and VSO haplotypes, in Ethiopia, Israeli and
Palestinian Arabs, Lake Chad and Sudan.
Supplementary Table 3S.7: Pairwise genetic distances and associated P-
values for various levels of NRY and mtDNA analysis for Efik Uwanse
comparisons. Colour code is same as Table 3.6.
Appendix E: LRH test Source Code
The original Python source code written by the author to perform the version
of the LRH test developed at the TCGA and described in section 4.2.3.1.1 is
available on the CD that accompanies this thesis (SNPsig-v35-phase2-build36-cm.py).
This version uses only SNPs for the core region, but a further version that can use
haplotype core regions has also been developed as part of another study and is available
from the author on request. This code uses build35 of the HapMap dataset and requires
HapMap files to be downloaded into the following directory structure (though this
structure is easily editable within the code):
C:\Hapmap-build35\Allelefrequencies\(All unzipped allele frequency files)
C:\Hapmap-build35\Phasedata\(All unzipped phased data files)
C:\Hapmap-build35\Recombrates\(All unzipped genetic map data)
The code was written in Python version 2.4 so should be compatible with this and any
future versions of Python. Backward compatibility has not been tested, but the Python
programming environment is freely available at www.python.org. The code also requires the
Python package 'numarray' to be downloaded and installed. The REHH values given
by the program should NOT be used, as it does not yet account for EHH values of 0 in
the calculation of REHH. This version is also quite memory intensive, as it
requires a lot of re-accessing of HapMap data files. The author is currently working on a
quicker method, which may be available in the near future. Contact
Krishna.veeramah@ucl.ac.uk for any further enquiries.
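The EHH and REHH quantities mentioned above can be illustrated with a short sketch. This is not the code supplied on the CD. It is a minimal example written for current Python 3 (3.8 or later, for `math.comb`), with hypothetical function names and a toy input format in which each phased chromosome is a string of alleles. EHH is taken, following Sabeti et al. (2002b), as the proportion of pairs of chromosomes carrying a core allele that are identical from the core to a given marker. Returning `None` when the comparison EHH is 0 is one possible way of handling the zero case and is an assumption of this example.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """EHH for chromosomes carrying core_allele, from the core SNP to end_index.

    Returns None if fewer than two chromosomes carry the core allele.
    """
    lo, hi = sorted((core_index, end_index))
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return None
    # Pairs of carriers that are identical over the whole interval.
    same_pairs = sum(comb(c, 2) for c in Counter(carriers).values())
    return same_pairs / comb(n, 2)


def rehh(haplotypes, core_index, core_allele, other_allele, end_index):
    """EHH of core_allele relative to other_allele at a two-allele core SNP.

    Returns None when the ratio is undefined, including when the EHH of
    the comparison allele is 0 (assumed handling for this example).
    """
    numerator = ehh(haplotypes, core_index, core_allele, end_index)
    denominator = ehh(haplotypes, core_index, other_allele, end_index)
    if numerator is None or not denominator:
        return None
    return numerator / denominator


if __name__ == "__main__":
    # Toy phased data: the first position is the core SNP (alleles C and T).
    haps = ["CAAG", "CAAG", "CAAT", "TAGG", "TAGG", "TGAT"]
    for end in (1, 2, 3):
        print(end, ehh(haps, 0, "C", end), rehh(haps, 0, "C", "T", end))
```

Making the zero case explicit, rather than dividing by zero or silently skipping it, keeps undefined REHH values distinguishable from genuinely low ones in downstream plots.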
References
1. 1987. Second general census of population and housing of Cameroon. Volume
3:preliminary analysis.: SUPECAM, Yaounde.
2. The Concise International Chemical Assessment Document 49 (CICADA 49).
2003. 20 Avenue Appia, 1211 Geneva 27, Switzerland, UN Environment
Programme, the International Labour Organization and the World Health
Organization.
3. 2005. Ethnologue: Languages of the World. Dallas, Texas: SIL International.
4. Adelaar, A. 1995. Asian roots of the Malagasy: a linguistic perspective. Bijdragen
tot de Taal-Land en Volkenkunde, 151: 325-356.
5. Agrawal, S. & Khan, F. 2007. Human genetic variation and personalized
medicine. Indian J.Physiol Pharmacol., 51 (1): 7-28.
6. Aidoo, M. et al 2002. Protective effects of the sickle cell gene against malaria
morbidity and mortality. Lancet, 359 (9314): 1311-1312.
7. Akak, E. O. 1986. The Palestine Origin of the Efiks. Calabar: Akak and Sons.
8. Aklillu, E. et al 2003. Genetic polymorphism of CYP1A2 in Ethiopians affecting
induction and expression: characterization of novel haplotypes with single-
nucleotide polymorphisms in intron 1. Mol.Pharmacol., 64 (3): 659-669.
9. Aklillu, E. et al 2002. Functional analysis of six different polymorphic CYP1B1
enzyme variants found in an Ethiopian population. Mol.Pharmacol., 61 (3): 586-
594.
10. Aklillu, E. et al 1996. Frequent distribution of ultrarapid metabolizers of
debrisoquine in an ethiopian population carrying duplicated and multiduplicated
functional CYP2D6 alleles. J.Pharmacol.Exp.Ther., 278 (1): 441-446.
11. Allabi, A. C. et al 2003. Genetic polymorphisms of CYP2C9 and CYP2C19 in the
Beninese and Belgian populations. Br.J.Clin.Pharmacol., 56 (6): 653-657.
12. Allabi, A. C. et al 2005. Single nucleotide polymorphisms of ABCB1 (MDR1)
gene and distinct haplotype profile in a West Black African population.
Eur.J.Clin.Pharmacol., 61 (2): 97-102.
13. Allerston, C. K. et al 2007. Molecular evolution and balancing selection in the
flavin-containing monooxygenase 3 gene (FMO3). Pharmacogenet.Genomics, 17
(10): 827-839.
14. Alves, C. et al 2005. STR allelic frequencies for an African population sample
(Equatorial Guinea) using AmpFlSTR Identifiler and Powerplex 16 kits. Forensic
Sci.Int., 148 (2-3): 239-242.
15. Amos, W. & Manica, A. 2006. Global genetic positioning: evidence for early
human population centers in coastal habitats. Proc.Natl.Acad.Sci.U.S.A, 103 (3):
820-824.
16. Anderson, S. et al 1981. Sequence and organization of the human mitochondrial
genome. Nature, 290 (5806): 457-465.
17. Ardener, E. 1968. Documentary and linguistic evidence for the rise of the trading
polities between Rio del Rey and Cameroons. In: I. M. Lewis, ed., History and
Social Anthropology. London: 1500-1650.
18. Armour, J. A. et al 1996. Minisatellite diversity supports a recent African origin
for modern humans. Nat.Genet., 13 (2): 154-160.
19. Aylor, D. L., Price, E. W. & Carbone, I. 2006. SNAP: Combine and Map modules
for multilocus population genetic analysis. Bioinformatics., 22 (11): 1399-1401.
20. Bahuchet, S. 1992. Dans la Forêt d'Afrique Centrale:Les Pygmées Parmi le
Peuples d'Afrique Centrale., Histoire d'une civilisation forestière I. Paris: Peeters-
SELAF.
21. Bahuchet, S. 1993. La Rencontre des Agriculteurs:Les Pygmées Parmi le Peuples.,
Histoire d'une civilisation forestière II. Paris: Peeters-SELAF.
22. Bandelt, H. J. et al 2001. Phylogeography of the human mitochondrial haplogroup
L3e: a snapshot of African prehistory and Atlantic slave trade. Ann.Hum.Genet.,
65 (Pt 6): 549-563.
23. Bapiro, T. E. et al 2002. The molecular and enzyme kinetic basis for the
diminished activity of the cytochrome P450 2D6.17 (CYP2D6.17) variant.
Potential implications for CYP2D6 phenotyping studies and the clinical use of
CYP2D6 substrate drugs in some African populations. Biochem.Pharmacol., 64
(9): 1387-1398.
24. Barbujani, G., Oden, N. L. & Sokal, R. R. 1989. Detecting Regions of Abrupt
Change in Maps of Biological Variables. Systematic Zoology, 38 (4): 376-389.
25. Basden, G. T. 1966. Among the Ibos of Nigeria. London: Frank Cass.
26. Bathum, L. et al 1999. Phenotypes and genotypes for CYP2D6 and CYP2C19 in a
black Tanzanian population. Br.J.Clin.Pharmacol., 48 (3): 395-401.
27. Batini, C. et al 2007. Phylogeography of the human mitochondrial L1c
haplogroup: genetic signatures of the prehistory of Central Africa.
Mol.Phylogenet.Evol., 43 (2): 635-644.
28. Behar, D. M. et al 2003. Multiple origins of Ashkenazi Levites: Y chromosome
evidence for both Near Eastern and European ancestries. Am.J.Hum.Genet., 73
(4): 768-779.
29. Beleza, S. et al 2005. The genetic legacy of western Bantu migrations.
Hum.Genet., 117 (4): 366-375.
30. Bender, M. L. 1997. Upside-down Afrasian. Afrikanische Arbeitspapiere, 50: 19-34.
31. Bentley, D. R. 2006. Whole-genome re-sequencing. Curr.Opin.Genet.Dev., 16
(6): 545-552.
32. Berniell-Lee, G. et al 2006. Y-chromosome diversity in Bantu and Pygmy
populations from Central Africa. International Congress Series, 1288: 234-236.
33. Bertorelle, G. & Barbujani, G. 1995. Analysis of DNA diversity by spatial
autocorrelation. Genetics, 140 (2): 811-819.
34. Bianchi, N. O. et al 1998. Characterization of ancestral and derived Y-
chromosome haplotypes of New World native populations. Am.J.Hum.Genet., 63
(6): 1862-1871.
35. Blench, R. 1995. Is Niger-Congo Simply a Branch of Nilo-Saharan. In: R. Nicolai
& F. Rottland, eds., Proceedings of the Fifth Nilo-Saharan Linguistics Colloquium,
Nice, 1992. Cologne, Germany: Rudiger Koppe. 68-118.
36. Blench, R. 1999a. The Languages of Africa: Macrophyla Proposals and
Implications for Archaeological Interpretation. In: R. Blench & M. Spriggs, eds.
IV edn. London: Routledge. 29-47.
37. Blench, R. 1999b. The Westward Wanderings of Cushitic Pastoralists. In: C.
Baroin & J. Boutrais, eds., L'Homme et l'Animale Dans le Bassin du Lac Tchad.
Paris: IRD.
38. Blench, R. 2001. The Bendi languages: More lost Bantu languages. Unpublished work.
39. Blench, R. 2006. Archaeology, Language, and the African Past. Lanham:
AltaMira Press.
40. Bolnick, D. A. et al 2004. Problematic use of Greenberg's linguistic classification
of the Americas in studies of Native American genetic variation.
Am.J.Hum.Genet., 75 (3): 519-522.
41. Bowcock, A. M. et al 1994. High resolution of human evolutionary trees with
polymorphic microsatellites. Nature, 368 (6470): 455-457.
42. Boyd, R. 1996. Chamba Daka and Bantoid: A further look at Chamba Daka
classification. Journal of West African Languages, 26 (2): 29-43.
43. Brandstatter, A. et al 2004. Mitochondrial DNA control region sequences from
Nairobi (Kenya): inferring phylogenetic parameters for the establishment of a
forensic database. Int.J.Legal Med., 118 (5): 294-306.
44. Brauer, G., Collard, M. & Stringer, C. 2004. On the reliability of recent tests of
the Out of Africa hypothesis for modern human origins. Anat.Rec.A
Discov.Mol.Cell Evol.Biol., 279 (2): 701-707.
45. Browning, S. R. & Browning, B. L. 2007. Rapid and accurate haplotype phasing
and missing-data inference for whole-genome association studies by use of
localized haplotype clustering. Am.J.Hum.Genet., 81 (5): 1084-1097.
46. Caglia, A. et al 1997. Y-chromosome STR loci in Sardinia and continental Italy
reveal islander-specific haplotypes. Eur.J.Hum.Genet., 5 (5): 288-292.
47. Campbell, L. 2006. Languages and Gene in Collaboration: some Practical Matters.
Unpublished work.
48. Cann, R. L., Stoneking, M. & Wilson, A. C. 1987. Mitochondrial DNA and
human evolution. Nature, 325 (6099): 31-36.
49. Cashman, J. R. 2000. Human flavin-containing monooxygenase: substrate
specificity and role in drug metabolism. Curr.Drug Metab, 1 (2): 181-191.
50. Cashman, J. R. 2005. Some distinctions between flavin-containing and
cytochrome P450 monooxygenases. Biochem.Biophys.Res.Commun., 338 (1):
599-604.
51. Cavaco, I. et al 2003. CYP3A4*1B and NAT2*14 alleles in a native African
population. Clin.Chem.Lab Med., 41 (4): 606-609.
52. Cavalli-Sforza, L. L. 1986. African Pygmies. Orlando, Florida: Academic Press.
53. Cavalli-Sforza, L. L., Menozzi, P. & Piazza, A. 1994. The History and Geography
of Human Genes. New Jersey: Princeton University Press.
54. Cerny, V. et al 2006. MtDNA of Fulani nomads and their genetic relationships to
neighboring sedentary populations. Hum.Biol., 78 (1): 9-27.
55. Cerny, V. et al 2004. mtDNA sequences of Chadic-speaking populations from
northern Cameroon suggest their affinities with eastern Africa. Ann.Hum.Biol., 31
(5): 554-569.
56. Cerny, V. et al 2007. A bidirectional corridor in the Sahel-Sudan belt and the
distinctive features of the Chad Basin populations: a history revealed by the
mitochondrial DNA genome. Ann.Hum.Genet., 71 (Pt 4): 433-452.
57. Chaubey, G. et al 2007. Peopling of South Asia: investigating the caste-tribe
continuum in India. Bioessays, 29 (1): 91-100.
58. Chelule, P. K. et al 2003. MDR1 and CYP3A4 polymorphisms among African,
Indian, and white populations in KwaZulu-Natal, South Africa.
Clin.Pharmacol.Ther., 74 (2): 195-196.
59. Chem-Langhee, B. & Fanso, V. G. 1997. Social categories, local politics and the
uses of oral tradition in Nso'. Paideuma, 43: 313-327.
60. Chen, Y. S. et al 2000. mtDNA variation in the South African Kung and Khwe-
and their genetic relationships to other African populations. Am.J.Hum.Genet., 66
(4): 1362-1383.
61. Chen, Y. S. et al 1995. Analysis of mtDNA variation in African populations
reveals the most ancient of all human continent-specific haplogroups.
Am.J.Hum.Genet., 57 (1): 133-149.
62. Chilver, E. M. & Kaberry, P. M. 1960. From Tribute to Tax in a Tikar Chiefdom.
Africa, 30 (1): 1-19.
63. Chilver, E. M. & Kaberry, P. M. 1968. Traditional Bamenda; The Pre-colonial
History and Ethnography of the Bamenda Grassfields.: Ministry of Primary
Education and Social Welfare and West Cameroon Antiquities Commission.
64. Chilver, E. M. & Kaberry, P. M. 1971. The Tikar problem: a non-problem.
Journal of African Languages, 10 (2): 13-14.
65. Coia, V. et al 2004. Binary and microsatellite polymorphisms of the Y-
chromosome in the Mbenzele pygmies from the Central African Republic.
Am.J.Hum.Biol., 16 (1): 57-67.
66. Coia, V. et al 2005. Brief communication: mtDNA variation in North Cameroon:
lack of Asian lineages and implications for back migration from Asia to sub-
Saharan Africa. Am.J.Phys.Anthropol., 128 (3): 678-681.
67. Collins-Schramm, H. E. et al 2002. Markers that discriminate between European
and African ancestry show limited variation within Africa. Hum.Genet., 111 (6):
566-569.
68. Connell, B. 1983. Unpublished fieldnotes.
69. Connell, B. 1994. The Lower Cross languages: a prolegomena to the classification
of the Cross River languages. Journal of West African Languages, XXIV (1): 3-
46.
70. Connell, B. 1998. Classifying Cross River. vol. 2. Lawrenceville, NJ: Africa
World Press. 17-25.
71. Connell, B. 2000. The Integrity of Mambiloid. In: H. E. Wolff & O. Gensler, eds.,
Proceedings from the 2nd World Congress of African Linguistics. Leipzig.
Cologne: Rüdiger Köppe Verlag. 197-213.
72. Connell, B. & Maison, K. B. 1994. A Cameroun homeland for the Lower Cross
languages? Sprache und Geschichte in Afrika, 15: 47-90.
73. Cox, M. 2007. Extreme patterns of variance in small populations: placing limits
on human Y-chromosome diversity through time in the Vanuatu Archipelago.
Ann.Hum.Genet., 71 (Pt 3): 390-406.
74. Cramon-Taubadel, N. & Lycett, S. J. 2007. Human cranial variation fits iterative
founder effect model with African origin. Am.J.Phys.Anthropol.
75. Cruciani, F. et al 2002. A back migration from Asia to sub-Saharan Africa is
supported by high-resolution analysis of human Y-chromosome haplotypes.
Am.J.Hum.Genet., 70 (5): 1197-1214.
76. Crystal, D. 1997. The Cambridge Encyclopedia of Language., 2nd edn.
Cambridge: Cambridge University Press.
77. Dandara, C. et al 2003. Arylamine N-acetyltransferase (NAT2) genotypes in
Africans: the identification of a new allele with nucleotide changes 481C>T and
590G>A. Pharmacogenetics, 13 (1): 55-58.
78. Dandara, C. et al 2001. Genetic polymorphism of CYP2D6 and CYP2C19 in east-
and southern African populations including psychiatric patients.
Eur.J.Clin.Pharmacol., 57 (1): 11-17.
79. Dandara, C. et al 2002. Genetic polymorphism of cytochrome P450 1A1
(Cyp1A1) and glutathione transferases (M1, T1 and P1) among Africans.
Clin.Chem.Lab Med., 40 (9): 952-957.
80. Darlu, P. & Tassy, P. 1987. Disputed African origin of human populations.
Nature, 329 (6135): 111-112.
81. Denbow, J. R. 1986. A new look at the later prehistory of the Kalahari. Journal of
African History, 27: 3-28.
82. Denbow, J. R. 1990. Congo to Kalahari: Data and hypotheses about the political
economy of the western stream of the Early Iron Age. African Archaeological
Review, 8: 139-176.
83. Destro-Bisol, G. et al 2000. Microsatellite variation in Central Africa: an analysis
of intrapopulational and interpopulational genetic diversity. Am.J.Phys.Anthropol.,
112 (3): 319-337.
84. Destro-Bisol, G. et al 2004. The analysis of variation of mtDNA hypervariable
region 1 suggests that Eastern and Western Pygmies diverged before the Bantu
expansion. Am.Nat., 163 (2): 212-226.
85. Destro-Bisol, G. et al 1999. Estimating European admixture in African Americans
by using microsatellites and a microsatellite haplotype (CD4/Alu). Hum.Genet.,
104 (2): 149-157.
86. Di Giacomo, F. et al 2004. Y chromosomal haplogroup J as a signature of the
post-neolithic colonization of Europe. Hum.Genet., 115 (5): 357-371.
87. Dolphin, C. T. et al 1998. The flavin-containing monooxygenase 2 gene (FMO2)
of humans, but not of other primates, encodes a truncated, nonfunctional protein.
J.Biol.Chem., 273 (46): 30599-30607.
88. Donaldson, I. J. et al 2002. Unique TCR beta-subunit variable gene haplotypes in
Africans. Immunogenetics, 53 (10-11): 884-893.
89. Ehret, C. 2002. Languages Family Expansion: Broadening Our Understanding of
Cause from an African Perspective. In: P. Bellwood & C. Renfrew, eds.,
Examining the Farming/Language Dispersal Hypothesis. Cambridge: McDonald
Institute for Archaeological Research. 163-176.
90. Eswaran, V. 2002. A Diffusion Wave out of Africa: The Mechanism of the
Modern Human Revolution. Current Anthropology, 49: 1-18.
91. Eswaran, V., Harpending, H. & Rogers, A. R. 2005. Genomics refutes an
exclusively African origin of humans. J.Hum.Evol., 49 (1): 1-18.
92. Evans, P. D. et al 2006. Evidence that the adaptive allele of the brain size gene
microcephalin introgressed into Homo sapiens from an archaic Homo lineage.
Proc.Natl.Acad.Sci.U.S.A, 103 (48): 18178-18183.
93. Evans, W. E. 2003. Pharmacogenomics: marshalling the human genome to
individualise drug therapy. Gut, 52 Suppl 2: ii10-ii18.
94. Excoffier, L. 2002. Human demographic history: refining the recent African
origin model. Curr.Opin.Genet.Dev., 12 (6): 675-682.
95. Excoffier, L. & Langaney, A. 1989. Origin and differentiation of human
mitochondrial DNA. Am.J.Hum.Genet., 44 (1): 73-85.
96. Excoffier, L., Smouse, P. E. & Quattro, J. M. 1992. Analysis of molecular
variance inferred from metric distances among DNA haplotypes: application to
human mitochondrial DNA restriction data. Genetics, 131 (2): 479-491.
97. Fay, J. C. & Wu, C. I. 2000. Hitchhiking under positive Darwinian selection.
Genetics, 155 (3): 1405-1413.
98. Fenner, J. N. 2005. Cross-cultural estimation of the human generation interval for
use in genetics-based population divergence studies. Am.J.Phys.Anthropol., 128
(2): 415-423.
99. Filatov, D. A. 2002. proseq: A software for preparation and evolutionary analysis
of DNA sequence data sets. Molecular Ecology Notes, 2 (4): 621-624.
100. Fisher, H. J. 2001. Slavery in the History of Muslim Black Africa., 1st edn.
London: C.Hurst & Co. Ltd.
101. Flores, C. et al 2001. Y-chromosome differentiation in Northwest Africa.
Hum.Biol., 73 (4): 513-524.
102. Forde, D. & Jones, G. I. 1950. The Ibo and Ibibio-speaking Peoples of South-
eastern Nigeria. London: Oxford University Press.
103. Forster, P. et al 1998. Phylogenetic resolution of complex mutational features at
Y-STR DYS390 in aboriginal Australians and Papuans. Mol.Biol.Evol., 15 (9):
1108-1114.
104. Forster, P. et al 2000. A short tandem repeat-based phylogeny for the human Y
chromosome. Am.J.Hum.Genet., 67 (1): 182-196.
105. Fowler, I. & Zeitlyn, D. 1996. Introductory Essay: the Grassfields and the Tikar.
Oxford: Berghahn.
106. Fraaije, M. W. et al 2004. The prodrug activator EtaA from Mycobacterium
tuberculosis is a Baeyer-Villiger monooxygenase. J.Biol.Chem., 279 (5): 3354-
3360.
107. Freckleton, R. P. 2002. On the misuse of residuals in ecology: regression of
residuals vs. multiple regression. Journal of Animal Ecology, 71: 542-545.
108. Fu, Y. X. & Li, W. H. 1993. Statistical tests of neutrality of mutations. Genetics,
133 (3): 693-709.
109. Furnes, B. et al 2003. Identification of novel variants of the flavin-containing
monooxygenase gene family in African Americans. Drug Metab Dispos., 31 (2):
187-193.
110. Garrigan, D. & Hammer, M. F. 2006. Reconstructing human origins in the
genomic era. Nat.Rev.Genet., 7 (9): 669-680.
111. Garsa, A. A., McLeod, H. L. & Marsh, S. 2005. CYP3A4 and CYP3A5
genotyping by Pyrosequencing. BMC.Med.Genet., 6: 19.
112. Gene, M. et al 2001. The Bubi population of Equatorial Guinea characterised by
HUMTH01, HUMVWA31A, HUMCSF1PO, HUMTPOX, D3S1358, D8S1179,
D18S51 and D19S253 STR polymorphisms. Int.J.Legal Med., 114 (4-5): 298-300.
113. Goheen, M. 1996. Men Own the Fields, Women Own the Crops; Gender and
Power in the Cameroon Grassfields., 1st edn.: The University of Wisconsin Press.
114. Goldstein, D. B. et al 1995. Genetic absolute dating based on microsatellites and
the origin of modern humans. Proc.Natl.Acad.Sci.U.S.A, 92 (15): 6723-6727.
115. Goncalves, R., Spinola, H. & Brehm, A. 2007. Y-chromosome lineages in Sao
Tome e Principe islands: evidence of European influence. Am.J.Hum.Biol., 19 (3):
422-428.
116. Gonder, M. K. et al 2007. Whole-mtDNA genome sequence analysis of ancient
African lineages. Mol.Biol.Evol., 24 (3): 757-768.
117. Gong, Y. et al 2007. Single nucleotide polymorphism discovery and haplotype
analysis of Ca2+-dependent K+ channel beta-1 subunit.
Pharmacogenet.Genomics, 17 (4): 267-275.
118. Gonzalez, A. M. et al 2007. Mitochondrial lineage M1 traces an early human
backflow to Africa. BMC.Genomics, 8: 223.
119. Goudet, J. et al 1996. Testing differentiation in diploid populations. Genetics, 144
(4): 1933-1940.
120. Gower, J. C. 1966. Some distance properties of latent root and vector methods
used in multivariate analysis. Biometrika, 53: 325-328.
121. Green, R. E. et al 2006. Analysis of one million base pairs of Neanderthal DNA.
Nature, 444 (7117): 330-336.
122. Greenberg, J. H. 1955. Studies in African Linguistic Classification. Branford:
Compass.
123. Greenberg, J. H. 1963. The Languages of Africa. Bloomington: The Hague:
Mouton.
124. Gregersen, E. A. 1972. Kongo-Saharan. Journal of African Languages, 11 (1): 69-
89.
125. Griese, E. U. et al 1999. Analysis of the CYP2D6 gene mutations and their
consequences for enzyme function in a West African population.
Pharmacogenetics, 9 (6): 715-723.
126. Griffiths, R. C. & Marjoram, P. 1996. Ancestral inference from samples of DNA
sequences with recombination. J.Comput.Biol., 3 (4): 479-502.
127. Gudschinsky, S. 1956. The ABC's of lexicostatistics. Word, 12: 175-210.
128. Güldemann, T. & Voßen, R. 2000. Khoesan. In: B. Heine & D. Nurse, eds.,
African Languages: An Introduction. Cambridge: Cambridge University Press. 99-
122.
129. Guo, S. W. & Thompson, E. A. 1992. Performing the exact test of Hardy-
Weinberg proportion for multiple alleles. Biometrics, 48 (2): 361-372.
130. Gusmao, L. et al 2001. STR data from S. Tome e Principe (Gulf of Guinea, West
Africa). Forensic Sci.Int., 116 (1): 53-54.
131. Gusmao, L. et al 2005. Mutation rates at Y chromosome specific microsatellites.
Hum.Mutat., 26 (6): 520-528.
132. Guthrie, M. 1967. Comparative Bantu: an introduction to the comparative
linguistics and prehistory of the Bantu languages. Farnborough: Gregg Press.
133. Hall, I. P. & Sayers, I. 2007. Pharmacogenetics and asthma: false hope or new
dawn? Eur.Respir.J., 29 (6): 1239-1245.
134. Hamblin, M. T. & Di Rienzo, A. 2000. Detection of the signature of natural
selection in humans: evidence from the Duffy blood group locus.
Am.J.Hum.Genet., 66 (5): 1669-1679.
135. Hamblin, M. T., Thompson, E. E. & Di Rienzo, A. 2002. Complex signatures of
natural selection at the Duffy blood group locus. Am.J.Hum.Genet., 70 (2): 369-
383.
136. Hammer, M. F. et al 1998. Out of Africa and back again: nested cladistic analysis
of human Y chromosome variation. Mol.Biol.Evol., 15 (4): 427-441.
137. Hammer, M. F. et al 2001. Hierarchical patterns of global human Y-chromosome
diversity. Mol.Biol.Evol., 18 (7): 1189-1203.
138. Hammer, M. F. et al 1997. The geographic distribution of human Y chromosome
variation. Genetics, 145 (3): 787-805.
139. Hanchard, N. et al 2007. Classical sickle beta-globin haplotypes exhibit a high
degree of long-range haplotype similarity in African and Afro-Caribbean
populations. BMC.Genet., 8: 52.
140. Hanchard, N. A. et al 2006. Screening for recently selected alleles by analysis of
human haplotype similarity. Am.J.Hum.Genet., 78 (1): 153-159.
141. Handley, L. J. et al 2007. Going the distance: human population genetics in a
clinal world. Trends Genet., 23 (9): 432-439.
142. Harding, R. M. et al 1997. Archaic African and Asian lineages in the genetic
ancestry of modern humans. Am.J.Hum.Genet., 60 (4): 772-789.
143. Harpending, H. & Rogers, A. 2000. Genetic perspectives on human origins and
differentiation. Annu.Rev.Genomics Hum.Genet., 1: 361-385.
144. Harpending, H. C. 1993. The genetic structure of ancient human populations.
Current Anthropology, 34: 483-496.
145. Hart, A. K. 1964. Report of the Enquiry into the Dispute over the Obongship of
Calabar. Enugu: Government Printer.
146. Hasegawa, M. & Horai, S. 1991. Time of the deepest root for polymorphism in
human mitochondrial DNA. J.Mol.Evol., 32 (1): 37-42.
147. Hawks, J. et al 2008. A genetic legacy from archaic Homo. Trends Genet., 24 (1):
19-23.
148. Hawks, J. et al 2000. Population bottlenecks and Pleistocene human evolution.
Mol.Biol.Evol., 17 (1): 2-22.
149. Hein, J. 1990. Reconstructing evolution of sequences subject to recombination
using parsimony. Math.Biosci., 98 (2): 185-200.
150. Henderson, M. C. et al 2004a. S-oxygenation of the thioether organophosphate
insecticides phorate and disulfoton by human lung flavin-containing
monooxygenase 2. Biochem.Pharmacol., 68 (5): 959-967.
151. Henderson, M. C. et al 2004b. Human flavin-containing monooxygenase form 2
S-oxygenation: sulfenic acid formation from thioureas and oxidation of
glutathione. Chem.Res.Toxicol., 17 (5): 633-640.
152. Hernandez, D. et al 2004. Organization and evolution of the flavin-containing
monooxygenase genes of human and mouse: identification of novel gene and
pseudogene clusters. Pharmacogenetics, 14 (2): 117-130.
153. Heyer, E. et al 1997. Estimating Y chromosome specific microsatellite mutation
frequencies using deep rooting pedigrees. Hum.Mol.Genet., 6 (5): 799-803.
154. Hines, R. N. et al 2002. Alternative processing of the human FMO6 gene renders
transcripts incapable of encoding a functional flavin-containing monooxygenase.
Mol.Pharmacol., 62 (2): 320-325.
155. Hirunsatit, R. et al 2007. Sequence variation and linkage disequilibrium in the
GABA transporter-1 gene (SLC6A1) in five populations: implications for
pharmacogenetic research. BMC.Genet., 8: 71.
156. Holtkemper, U. et al 2001. Mutation rates at two human Y-chromosomal
microsatellite loci using small pool PCR techniques. Hum.Mol.Genet., 10 (6):
629-633.
157. Horai, S. 1995. Evolution and the origins of man: clues from complete sequences
of hominoid mitochondrial DNA. Southeast Asian J.Trop.Med.Public Health, 26
Suppl 1: 146-154.
158. Horai, S. et al 1995. Recent African origin of modern humans revealed by
complete sequences of hominoid mitochondrial DNAs. Proc.Natl.Acad.Sci.U.S.A,
92 (2): 532-536.
159. Howells, W. W. 1976. Explaining modern man: evolutionists versus
migrationists. Journal of Human Evolution, 5 (5): 477-495.
160. Hudson, R. R. 2001. Two-locus sampling distributions and their application.
Genetics, 159 (4): 1805-1817.
161. Huffman, T. N. 1998. The antiquity of Lobola. South African Archaeological
Bulletin, 53: 57-62.
162. Hurles, M. E. et al 2002. Y chromosomal evidence for the origins of oceanic-
speaking peoples. Genetics, 160 (1): 289-303.
163. Ingman, M. & Gyllensten, U. 2001. Analysis of the complete human mtDNA
genome: methodology and inferences for human evolution. J.Hered., 92 (6): 454-
461.
164. Jackson, B. A. et al 2005. Mitochondrial DNA genetic diversity among four ethnic
groups in Sierra Leone. Am.J.Phys.Anthropol., 128 (1): 156-163.
165. Janmohamed, A. et al 2004. Cell-, tissue-, sex- and developmental stage-specific
expression of mouse flavin-containing monooxygenases (Fmos).
Biochem.Pharmacol., 68 (1): 73-83.
166. Jeffreys, M. D. W. 1964. Who are the Tikar? African Studies, 23 (3/4): 141-153.
167. Jobling, M., Hurles, M. E. & Tyler-Smith, C. 2004. Human Evolutionary
Genetics: Origins, People and Disease. Abingdon: Garland Science.
168. Jobling, M. A. & Tyler-Smith, C. 2003. The human Y chromosome: an
evolutionary marker comes of age. Nat.Rev.Genet., 4 (8): 598-612.
169. John, P. R. et al 2003. DNA polymorphism and selection at the melanocortin-1
receptor gene in normally pigmented southern African individuals.
Ann.N.Y.Acad.Sci., 994: 299-306.
170. Johnson, J. A. 2003. Pharmacogenetics: potential for individualized drug therapy
through genetics. Trends Genet., 19 (11): 660-666.
171. Jorde, L. B. et al 1995. Origins and affinities of modern humans: a comparison of
mitochondrial and nuclear genetic data. Am.J.Hum.Genet., 57 (3): 523-538.
172. Jorde, L. B. et al 1997. Microsatellite diversity and the demographic history of
modern humans. Proc.Natl.Acad.Sci.U.S.A, 94 (7): 3100-3103.
173. Jorde, L. B. et al 2000. The distribution of human genetic diversity: a comparison
of mitochondrial, autosomal, and Y-chromosome data. Am.J.Hum.Genet., 66 (3):
979-988.
174. Kaberry, P. M. 1952. Women of the Grassfields. London: HMSO.
175. Kaberry, P. M. 1959. Traditional Politics in Nsaw. Africa, 24 (4): 370.
176. Kaberry, P. M. 1962a. Retainers and Royal Households in the Cameroon
Grasslands. Cahiers D'Etudes Africaines, 3 (10): 282-298.
177. Kaberry, P. M. 1962b. The Date of the Bamun-Banso War 1885-1889. Man, 62
(s220): 140.
178. Kaessmann, H. et al 1999. DNA sequence variation in a non-coding region of low
recombination on the human X chromosome. Nat.Genet., 22 (1): 78-81.
179. Karafet, T. M. et al 2002. High levels of Y-chromosome differentiation among
native Siberian populations and the genetic signature of a boreal hunter-gatherer
way of life. Hum.Biol., 74 (6): 761-789.
180. Kayser, M. et al 1997. Evaluation of Y-chromosomal STRs: a multicenter study.
Int.J.Legal Med., 110 (3): 125-129.
181. Kayser, M. et al 2000. Characteristics and frequency of germline mutations at
microsatellite loci from the human Y chromosome, as revealed by direct
observation in father/son pairs. Am.J.Hum.Genet., 66 (5): 1580-1588.
182. Kayser, S. R. 2007. Pharmacogenomics and the potential for personalized
therapeutics in cardiovascular disease. Prog.Cardiovasc.Nurs., 22 (2): 104-107.
183. Ki-Zerbo, J. 1989. General History of Africa: Methodology and African
Prehistory.: James Currey Ltd.
184. Kimura, M. 1980. A simple method for estimating evolutionary rates of base
substitutions through comparative studies of nucleotide sequences. J.Mol.Evol., 16
(2): 111-120.
185. Kivisild, T. et al 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow
across and around the gate of tears. Am.J.Hum.Genet., 75 (5): 752-770.
186. Kivisild, T. et al 2003. The genetic heritage of the earliest settlers persists both in
Indian tribal and caste populations. Am.J.Hum.Genet., 72 (2): 313-332.
187. Knight, A. et al 2003. African Y chromosome and mtDNA divergence provides
insight into the history of click languages. Curr.Biol., 13 (6): 464-473.
188. Krieter, P. A. et al 1984. Increased biliary GSSG efflux from rat livers perfused
with thiocarbamide substrates for the flavin-containing monooxygenase.
Mol.Pharmacol., 26 (1): 122-127.
189. Krings, M. et al 1999. mtDNA analysis of Nile River Valley populations: A
genetic corridor or a barrier to migration? Am.J.Hum.Genet., 64 (4): 1166-1176.
190. Krueger, S. K. et al 2002. Identification of active flavin-containing
monooxygenase isoform 2 in human lung and characterization of expressed
protein. Drug Metab Dispos., 30 (1): 34-41.
191. Krueger, S. K. et al 2005. Haplotype and functional analysis of four flavin-
containing monooxygenase isoform 2 (FMO2) polymorphisms in Hispanics.
Pharmacogenet.Genomics, 15 (4): 245-256.
192. Krueger, S. K. et al 2004. Differences in FMO2*1 allelic frequency between
Hispanics of Puerto Rican and Mexican descent. Drug Metab Dispos., 32 (12):
1337-1340.
193. Krueger, S. K. & Williams, D. E. 2005. Mammalian flavin-containing
monooxygenases: structure/function, genetic polymorphisms and role in drug
metabolism. Pharmacol.Ther., 106 (3): 357-387.
194. Krueger, S. K. et al 2001. Characterization of expressed full-length and truncated
FMO2 from rhesus monkey. Drug Metab Dispos., 29 (5): 693-700.
195. Lane, A. B. et al 2002. Genetic substructure in South African Bantu-speakers:
evidence from autosomal DNA and Y-chromosome studies.
Am.J.Phys.Anthropol., 119 (2): 175-185.
196. Lanfear, D. E. & McLeod, H. L. 2007. Pharmacogenetics: using DNA to optimize
drug therapy. Am.Fam.Physician, 76 (8): 1179-1182.
197. Lansing, J. S. et al 2007. Coevolution of languages and genes on the island of
Sumba, eastern Indonesia. Proc.Natl.Acad.Sci.U.S.A, 104 (41): 16022-16026.
198. Latham, A. J. H. 1973. Old Calabar 1600-1891: The Impact of the International
Economy upon a Traditional Society. Oxford: Clarendon Press.
199. Lawton, M. P. et al 1994. A nomenclature for the mammalian flavin-containing
monooxygenase gene family based on amino acid sequence identities.
Arch.Biochem.Biophys., 308 (1): 254-257.
200. Lecerf, M. et al 2007. Allele frequencies and haplotypes of eight Y-short tandem
repeats in Bantu population living in Central Africa. Forensic Sci.Int., 171 (2-3):
212-215.
201. Lee, A. C. et al 2004. Molecular evidence for absence of Y-linkage of the Hairy
Ears trait. Eur.J.Hum.Genet., 12 (12): 1077-1079.
202. Li, J. et al 2005. [Analysis and application of SNP and haplotype in the human
genome]. Yi.Chuan Xue.Bao., 32 (8): 879-889.
203. Lichstein, J. 2007. Multiple regression on distance matrices: a multivariate spatial
analysis tool. Vegetatio, 188 (2): 117-131.
204. Liu, H. et al 2006. A geographically explicit genetic model of worldwide human-
settlement history. Am.J.Hum.Genet., 79 (2): 230-237.
205. Livingstone, F. B. 1984. The Duffy blood groups, vivax malaria, and malaria
selection in human populations: a review. Hum.Biol., 56 (3): 413-425.
206. Loktionov, A. et al 2002. Differences in N-acetylation genotypes between
Caucasians and Black South Africans: implications for cancer prevention. Cancer
Detect.Prev., 26 (1): 15-22.
207. Lovell, A. et al 2005. Ethiopia: between Sub-Saharan Africa and western Eurasia.
Ann.Hum.Genet., 69 (Pt 3): 275-287.
208. Lucotte, G. et al 1994. Reduced variability in Y-chromosome-specific haplotypes
for some Central African populations. Hum.Biol., 66 (3): 519-526.
209. Luis, J. R. et al 2004. The Levant versus the Horn of Africa: evidence for
bidirectional corridors of human migrations. Am.J.Hum.Genet., 74 (3): 532-544.
210. MacEachern, S. 2000. Genes, tribes, and African history. Current Anthropology,
41 (3): 357-384.
211. Macfarlane, C. & Simmonds, P. 2004. Allelic variation of HERV-K(HML-2)
endogenous retroviral elements in human populations. J.Mol.Evol., 59 (5): 642-
656.
212. Maddison, D. R. 1991. African Origin of Human Mitochondrial DNA
Reexamined. Systematic Zoology, 40 (3): 355-363.
213. Makarenkov, V. & Lapointe, F. J. 2004. A weighted least-squares approach for
inferring phylogenies from incomplete distance matrices. Bioinformatics., 20 (13):
2113-2121.
214. Manfredi, V. 1989. Igboid. In: J. Bendor-Samuel, ed., The Niger-Congo
Languages. Lanham: University Press of America. 337-358.
215. Margulies, M. et al 2005. Genome sequencing in microfabricated high-density
picolitre reactors. Nature, 437 (7057): 376-380.
216. Masimirembwa, C. et al 1995. Phenotyping and genotyping of S-mephenytoin
hydroxylase (cytochrome P450 2C19) in a Shona population of Zimbabwe.
Clin.Pharmacol.Ther., 57 (6): 656-661.
217. Masimirembwa, C. et al 1996a. Phenotype and genotype analysis of debrisoquine
hydroxylase (CYP2D6) in a black Zimbabwean population. Reduced enzyme
activity and evaluation of metabolic correlation of CYP2D6 probe drugs.
Eur.J.Clin.Pharmacol., 51 (2): 117-122.
218. Masimirembwa, C. et al 1996b. A novel mutant variant of the CYP2D6 gene
(CYP2D6*17) common in a black African population: association with
diminished debrisoquine hydroxylase activity. Br.J.Clin.Pharmacol., 42 (6): 713-
719.
219. Masimirembwa, C. M. et al 1993. Genetic polymorphism of cytochrome P450
CYP2D6 in Zimbabwean population. Pharmacogenetics, 3 (6): 275-280.
220. Mateu, E. et al 1997. A tale of two islands: population history and mitochondrial
DNA sequence variation of Bioko and Sao Tome, Gulf of Guinea.
Ann.Hum.Genet., 61 (Pt 6): 507-518.
221. Mehlotra, R. K. et al 2006. Prevalence of CYP2B6 alleles in malaria-endemic
populations of West Africa and Papua New Guinea. Eur.J.Clin.Pharmacol., 62
(4): 267-275.
222. Mellars, P. 2006. Why did modern human populations disperse from Africa ca.
60,000 years ago? A new model. Proc.Natl.Acad.Sci.U.S.A, 103 (25): 9381-9386.
223. Michalakis, Y. & Excoffier, L. 1996. A generic estimation of population
subdivision using distances between alleles with special reference for
microsatellite loci. Genetics, 142 (3): 1061-1064.
224. Migliano, A. B., Vinicius, L. & Lahr, M. M. 2007. Life history trade-offs explain
the evolution of human pygmies. Proc.Natl.Acad.Sci.U.S.A, 104 (51): 20216-
20219.
225. Mirghani, R. A. et al 2006. CYP3A5 genotype has significant effect on quinine 3-
hydroxylation in Tanzanians, who have lower total CYP3A activity than a
Swedish population. Pharmacogenet.Genomics, 16 (9): 637-645.
226. Mitchelson, K. R. 2003. The use of capillary electrophoresis for DNA
polymorphism analysis. Mol.Biotechnol., 24 (1): 41-68.
227. Mitnik, L. et al 2001. Recent advances in DNA sequencing by capillary and
microdevice electrophoresis. Electrophoresis, 22 (19): 4104-4117.
228. Modiano, D. et al 2001. HLA class I in three West African ethnic groups: genetic
distances from sub-Saharan and Caucasoid populations. Tissue Antigens, 57 (2):
128-137.
229. Mueller, J. C. & Andreoli, C. 2004. Plotting haplotype-specific linkage
disequilibrium patterns by extended haplotype homozygosity. Bioinformatics., 20
(5): 786-787.
230. Myers, S. et al 2005. A fine-scale map of recombination rates and hotspots across
the human genome. Science, 310 (5746): 321-324.
231. Mzeka, N. P. 1978. The Core Culture of Nso'. Agawam, Ma.: Jerome Radin Co.
232. Mzeka, N. P. 1990. Four Fons of Nso': Nineteenth and Early Twentieth Century
Kingship in the Western Grassfields of Cameroon. Bamenda, Cameroon: The
Spider Publishing Enterprise.
233. Nasidze, I. et al 2004. Mitochondrial DNA and Y-chromosome variation in the
Caucasus. Ann.Hum.Genet., 68 (Pt 3): 205-221.
234. Neal, R. A. & Halpert, J. 1982. Toxicology of thiono-sulfur compounds.
Annu.Rev.Pharmacol.Toxicol., 22: 321-339.
235. Nei, M. 1987. Molecular Evolutionary Genetics.: Columbia University Press.
236. Nei, M. & Ota, T. 1991. Evolutionary relationships of human populations at the
molecular level. In: S. Osowa & T. Honjo, eds., Evolutions of life. Tokyo:
Springer. 415-428.
237. Nei, M. & Roychoudhury, A. K. 1993. Evolutionary relationships of human
populations on a global scale. Mol.Biol.Evol., 10 (5): 927-943.
238. Neumann, K., Kalheber, S. & Uebel, D. 1998. Remains of woody plants from
Saouga, a medieval west African village. Vegetation History and Archaeobotany,
7: 57-77.
239. Niu, T. 2004. Algorithms for inferring haplotypes. Genet.Epidemiol., 27 (4): 334-
347.
240. Noah, M. E. 1980. Old Calabar: The City States and the Europeans. Calabar:
Scholars' Press.
241. O'Grady, R. T. et al 1989. Genes and tongues. Science, 243 (4899): 1651.
242. Olerup, O. et al 1991. HLA-DR and -DQ gene polymorphism in West Africans is
twice as extensive as in north European Caucasians: evolutionary implications.
Proc.Natl.Acad.Sci.U.S.A, 88 (19): 8480-8484.
243. Olivieri, A. et al 2006. The mtDNA legacy of the Levantine early Upper
Palaeolithic in Africa. Science, 314 (5806): 1767-1770.
244. Onderwater, R. C. et al 1999. Activation of microsomal glutathione S-transferase
and inhibition of cytochrome P450 1A1 activity as a model system for detecting
protein alkylation by thiourea-containing compounds in rat liver microsomes.
Chem.Res.Toxicol., 12 (5): 396-402.
245. Oscarson, M. et al 1997. A combination of mutations in the CYP2D6*17
(CYP2D6Z) allele causes alterations in enzyme function. Mol.Pharmacol., 52 (6):
1034-1040.
246. Page, R. D. 1996. TreeView: an application to display phylogenetic trees on
personal computers. Comput.Appl.Biosci., 12 (4): 357-358.
247. Panserat, S. et al 1999. CYP2D6 polymorphism in a Gabonese population:
contribution of the CYP2D6*2 and CYP2D6*17 alleles to the high prevalence of
the intermediate metabolic phenotype. Br.J.Clin.Pharmacol., 47 (1): 121-124.
248. Parfitt, T. 1997. Journey to the vanished city. London: Phoenix.
249. Parra, E. J. et al 1998. Estimating African American admixture proportions by use
of population-specific alleles. Am.J.Hum.Genet., 63 (6): 1839-1851.
250. Passarino, G. et al 1998. Different genetic components in the Ethiopian
population, identified by mtDNA and Y-chromosome polymorphisms.
Am.J.Hum.Genet., 62 (2): 420-434.
251. Patin, E. et al 2006. Sub-Saharan African coding sequence variation and haplotype
diversity at the NAT2 gene. Hum.Mutat., 27 (7): 720.
252. Penzak, S. R. et al 2007. Cytochrome P450 2B6 (CYP2B6) G516T influences
nevirapine plasma concentrations in HIV-infected patients in Uganda. HIV.Med.,
8 (2): 86-91.
253. Pereira, L. et al 2002. Bantu and European Y-lineages in Sub-Saharan Africa.
Ann.Hum.Genet., 66 (Pt 5-6): 369-378.
254. Pereira, L. et al 2001. Prehistoric and historic traces in the mtDNA of
Mozambique: insights into the Bantu expansions and the slave trade.
Ann.Hum.Genet., 65 (Pt 5): 439-458.
255. Persson, I. et al 1996. S-mephenytoin hydroxylation phenotype and CYP2C19
genotype among Ethiopians. Pharmacogenetics, 6 (6): 521-526.
256. Pesole, G. et al 1992. The evolution of the mitochondrial D-loop region and the
origin of modern man. Mol.Biol.Evol., 9 (4): 587-598.
257. Phillips, I. R. et al 1995. The molecular biology of the flavin-containing
monooxygenases of man. Chem.Biol.Interact., 96 (1): 17-32.
258. Piron, P. 1995a. Classification interne du groupe bantoïde. Université Libre de
Bruxelles.
259. Piron, P. 1995b. Identification lexicostatistique des groupes bantoïdes stables.
Journal of West African Languages, 25 (2): 3-39.
260. Piron, P. 1998. Internal classification of the Bantoid language group, with special
focus on the relation between Narrow Bantu, Southern Bantoid and Northern
Bantoid. In: Language History and Linguistic Description in Africa. Trenton, N.J.:
Africa World Press. 65-74.
261. Plaza, S. et al 2004. Insights into the western Bantu dispersal: mtDNA lineage
analysis in Angola. Hum.Genet., 115 (5): 439-447.
262. Poloni, E. S. et al 1997. Human genetic affinities for Y-chromosome P49a,f/TaqI
haplotypes show strong correspondence with linguistics. Am.J.Hum.Genet., 61
(5): 1015-1035.
263. Price, D. 1979. Who are the Tikar now? Paideuma, 25: 89-98.
264. Price, E. W. & Carbone, I. 2005. SNAP: workbench management tool for
evolutionary population genetic analysis. Bioinformatics., 21 (3): 402-404.
265. Prugnolle, F., Manica, A. & Balloux, F. 2005. Geography predicts neutral genetic
diversity of human populations. Curr.Biol., 15 (5): R159-R160.
266. Qian, L. & Ortiz de Montellano, P. R. 2006. Oxidative activation of thiacetazone
by the Mycobacterium tuberculosis flavin monooxygenase EtaA and human
FMO1 and FMO3. Chem.Res.Toxicol., 19 (3): 443-449.
267. Quaranta, S. et al 2006. Ethnic differences in the distribution of CYP3A5 gene
polymorphisms. Xenobiotica, 36 (12): 1191-1200.
268. Quintana-Murci, L. et al 1999. Genetic evidence of an early exit of Homo sapiens
sapiens from Africa through eastern Africa. Nat.Genet., 23 (4): 437-441.
269. Ramsay, M. & Jenkins, T. 1988. Alpha-globin gene cluster haplotypes in the
Kalahari San and southern African Bantu-speaking blacks. Am.J.Hum.Genet., 43
(4): 527-533.
270. Rando, J. C. et al 1998. Mitochondrial DNA analysis of northwest African
populations reveals genetic exchanges with European, near-eastern, and sub-
Saharan populations. Ann.Hum.Genet., 62 (Pt 6): 531-550.
271. Ray, N. et al 2005. Recovering the geographic origin of early modern humans by
realistic and spatially explicit simulations. Genome Research, 15 (8): 1161-1167.
272. Raymond, M. & Rousset, F. 1995. An Exact Test for Population Differentiation.
Evolution, 49 (6): 1280-1283.
273. Reed, F. A. & Tishkoff, S. A. 2006. African human diversity, origins and
migrations. Curr.Opin.Genet.Dev., 16 (6): 597-605.
274. Reed, T. E. 1969. Caucasian genes in American Negroes. Science, 165 (895): 762-
768.
275. Reich, D. E. & Goldstein, D. B. 1998. Genetic evidence for a Paleolithic human
population expansion in Africa. Proc.Natl.Acad.Sci.U.S.A, 95 (14): 8119-8123.
276. Reinbold, H. 2007. [Ethnic background related pharmacological differences].
MMW.Fortschr.Med., 149 (42): 34, 36.
277. Relethford, J. H. & Harpending, H. C. 1994. Craniometric variation, genetic
theory, and modern human origins. Am.J.Phys.Anthropol., 95 (3): 249-270.
278. Relethford, J. H. & Jorde, L. B. 1999. Genetic evidence for larger African
population size during recent human evolution. Am.J.Phys.Anthropol., 108 (3):
251-260.
279. Renfrew, C. 1992. Archaeology, genetic and linguistic diversity. Man, 27 (3):
445-478.
280. Renfrew, C., McMahon, A. & Trask, L. 2000. Time Depth in Historical
Linguistics. Cambridge, England: The McDonald Institute for Archaeological
Research.
281. Renquin, J. et al 2001. HLA class II polymorphism in Aka Pygmies and Bantu
Congolese and a reassessment of HLA-DRB1 African diversity. Tissue Antigens,
58 (4): 211-222.
282. Reynolds, J., Weir, B. S. & Cockerham, C. C. 1983. Estimation Of The
Coancestry Coefficient: Basis For A Short-Term Genetic Distance. Genetics, 105
(3): 767-779.
283. Richards, M. et al 2000. Tracing European founder lineages in the Near Eastern
mtDNA pool. Am.J.Hum.Genet., 67 (5): 1251-1276.
284. Richards, M. et al 2003. Extensive female-mediated gene flow from sub-Saharan
Africa into near eastern Arab populations. Am.J.Hum.Genet., 72 (4): 1058-1064.
285. Rosa, A. et al 2004. MtDNA profile of West Africa Guineans: towards a better
understanding of the Senegambia region. Ann.Hum.Genet., 68 (Pt 4): 340-352.
286. Rosa, A. et al 2007. Y-chromosomal diversity in the population of Guinea-Bissau:
a multiethnic perspective. BMC.Evol.Biol., 7: 124.
287. Rosser, Z. H. et al 2000. Y-chromosomal diversity in Europe is clinal and
influenced primarily by geography, rather than by language. Am.J.Hum.Genet., 67
(6): 1526-1543.
288. Rower, S. et al 2005. Short communication: high prevalence of the cytochrome
P450 2C8*2 mutation in Northern Ghana. Trop.Med.Int.Health, 10 (12): 1271-
1273.
289. Sabeti, P. et al 2002a. CD40L association with protection from severe malaria.
Genes Immun., 3 (5): 286-291.
290. Sabeti, P. C. et al 2002b. Detecting recent positive selection in the human genome
from haplotype structure. Nature, 419 (6909): 832-837.
291. Sabeti, P. C. et al 2006. Positive natural selection in the human lineage. Science,
312 (5780): 1614-1620.
292. Sabeti, P. C. et al 2007. Genome-wide detection and characterization of positive
selection in human populations. Nature, 449 (7164): 913-918.
293. Salas, A. et al 2002. The making of the African mtDNA landscape.
Am.J.Hum.Genet., 71 (5): 1082-1111.
294. Salas, A. et al 2004. The African diaspora: mitochondrial DNA and the Atlantic
slave trade. Am.J.Hum.Genet., 74 (3): 454-465.
295. Sanchez, J. J. et al 2005. High frequencies of Y chromosome lineages
characterized by E3b1, DYS19-11, DYS392-12 in Somali males.
Eur.J.Hum.Genet., 13 (7): 856-866.
296. Sanchez-Mazas, A. 2001. African diversity from the HLA point of view: influence
of genetic drift, geography, linguistics, and natural selection. Hum.Immunol., 62
(9): 937-948.
297. Sands, B. 1998. Eastern and Southern African Khoesan: Evaluating Claims of a
Distant Linguistic Relationship. Quellen zur Khoesan-Forschung 14. Cologne,
Germany: Rüdiger Köppe.
298. Saunders, M. A., Hammer, M. F. & Nachman, M. W. 2002. Nucleotide variability
at G6pd and the signature of malarial selection in humans. Genetics, 162 (4):
1849-1861.
299. Saunders, M. A. et al 2005. The extent of linkage disequilibrium caused by
selection on G6PD in humans. Genetics, 171 (3): 1219-1229.
300. Schadeberg, T. C. 1986. The lexicostatistic base of Bennett & Sterk's
reclassification of Niger-Congo with particular reference to the cohesion of Bantu.
Studies in African Linguistics, 17: 69-83.
301. Schaeffeler, E. et al 2001. Frequency of C3435T polymorphism of MDR1 gene in
African people. Lancet, 358 (9279): 383-384.
302. Scheet, P. & Stephens, M. 2006. A fast and flexible statistical model for large-
scale population genotype data: applications to inferring missing genotypes and
haplotypic phase. Am.J.Hum.Genet., 78 (4): 629-644.
303. Schneider, S., Roessli, D. & Excoffier, L. 2000. Arlequin: A software for population
genetics data analysis. Ver 2.000. Genetics and Biometry Lab, Dept. of
Anthropology, University of Geneva.
304. Schuster, S. C. 2008. Next-generation sequencing transforms today's biology.
Nat.Methods, 5 (1): 16-18.
305. Scozzari, R. et al 1999. Combined use of biallelic and microsatellite Y-
chromosome polymorphisms to infer affinities among African populations.
Am.J.Hum.Genet., 65 (3): 829-846.
306. Scozzari, R. et al 1994. Genetic studies in Cameroon: mitochondrial DNA
polymorphisms in Bamileke. Hum.Biol., 66 (1): 1-12.
307. Scozzari, R. et al 1988. Genetic studies on the Senegal population. I.
Mitochondrial DNA polymorphisms. Am.J.Hum.Genet., 43 (4): 534-544.
308. Seielstad, M. et al 1999. A view of modern human origins from Y chromosome
microsatellite variation. Genome Research, 9 (6): 558-567.
309. Seielstad, M. T., Minch, E. & Cavalli-Sforza, L. L. 1998. Genetic evidence for a
higher female migration rate in humans. Nat.Genet., 20 (3): 278-280.
310. Semino, O. et al 2002. Ethiopians and Khoisan share the deepest clades of the
human Y-chromosome phylogeny. Am.J.Hum.Genet., 70 (1): 265-268.
311. Shendure, J. et al 2005. Accurate multiplex polony sequencing of an evolved
bacterial genome. Science, 309 (5741): 1728-1732.
312. Sim, S. C. et al 2006. A common novel CYP2C19 gene variant causes ultrarapid
drug metabolism relevant for the drug response to proton pump inhibitors and
antidepressants. Clin.Pharmacol.Ther., 79 (1): 103-113.
313. Slatkin, M. 1995. A measure of population subdivision based on microsatellite
allele frequencies. Genetics, 139 (1): 457-462.
314. Smith, F. H. 1985. Continuity and change in the origin of modern Homo sapiens.
Z.Morphol.Anthropol., 75 (2): 197-222.
315. Sokal, R. R. & Rohlf, F. J. 1994. Biometry, 3rd edn. New York: W. H. Freeman
and Co.
316. Soodyall, H. et al 1996. mtDNA control-region sequence variation suggests
multiple independent origins of an "Asian-specific" 9-bp deletion in sub-Saharan
Africans. Am.J.Hum.Genet., 58 (3): 595-608.
317. Soranzo, N. et al 2005. Positive selection on a high-sensitivity allele of the human
bitter-taste receptor TAS2R16. Curr.Biol., 15 (14): 1257-1265.
318. Spurdle, A. B. & Jenkins, T. 1996. The origins of the Lemba "Black Jews" of
southern Africa: evidence from p12F2 and other Y-chromosome markers.
Am.J.Hum.Genet., 59 (5): 1126-1133.
319. Steinlechner, M. et al 2002. Gabon black population data on the ten short tandem
repeat loci D3S1358, VWA, D16S539, D2S1338, D8S1179, D21S11, D18S51,
D19S433, TH01 and FGA. Int.J.Legal Med., 116 (3): 176-178.
320. Stephens, M. & Donnelly, P. 2003. A comparison of bayesian methods for
haplotype reconstruction from population genotype data. Am.J.Hum.Genet., 73
(5): 1162-1169.
321. Stephens, M., Smith, N. J. & Donnelly, P. 2001. A new statistical method for
haplotype reconstruction from population data. Am.J.Hum.Genet., 68 (4): 978-
989.
322. Stoneking, M. et al 1997. Alu insertion polymorphisms and human evolution:
evidence for a larger population size in Africa. Genome Research, 7 (11): 1061-
1071.
323. Stringer, C. 2002. Modern human origins: progress and prospects.
Philos.Trans.R.Soc.Lond B Biol.Sci., 357 (1420): 563-579.
324. Swadesh, M. 1952. Lexico-statistic dating of prehistoric ethnic contacts.
Proceedings of the American Philosophical Society, 96: 453-463.
325. Swadesh, M. 1955. Towards greater accuracy in lexicostatistic dating.
International Journal of American Linguistics, 21: 121-137.
326. Swen, J. J. et al 2007. Translating pharmacogenomics: challenges on the road to
the clinic. PLoS.Med., 4 (8): e209.
327. Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by
DNA polymorphism. Genetics, 123 (3): 585-595.
328. Takahata, N., Lee, S. H. & Satta, Y. 2001. Testing multiregionality of modern
human origins. Mol.Biol.Evol., 18 (2): 172-183.
329. Talbot, P. A. 1912. In the Shadow of the Bush. London: Heinemann.
330. Tambets, K. et al 2004. The western and eastern roots of the Saami--the story of
genetic "outliers" told by mitochondrial DNA and Y chromosomes.
Am.J.Hum.Genet., 74 (4): 661-682.
331. Tang, K. et al 2004. Genomic evidence for recent positive selection at the human
MDR1 gene locus. Hum.Mol.Genet., 13 (8): 783-797.
332. Tardits, C. 1980. Le Royaume Bamoun. Paris: Librairie Armand Colin.
333. Tayeb, M. T. et al 2000. CYP3A4 promoter variant in Saudi, Ghanaian and
Scottish Caucasian populations. Pharmacogenetics, 10 (8): 753-756.
334. Templeton, A. 2002. Out of Africa again and again. Nature, 416 (6876): 45-51.
335. Templeton, A. R. 1997. Out of Africa? What do genes tell us?
Curr.Opin.Genet.Dev., 7 (6): 841-847.
336. Templeton, A. R. 2005. Haplotype trees and modern human origins.
Am.J.Phys.Anthropol., Suppl 41: 33-59.
337. Templeton, A. R. 2007. Genetics and recent human evolution. Evolution
Int.J.Org.Evolution, 61 (7): 1507-1519.
338. Tenesa, A. et al 2007. Recent human effective population size estimated from
linkage disequilibrium. Genome Research, 17 (4): 520-526.
339. Terreros, M. C., Martinez, L. & Herrera, R. J. 2005. Polymorphic Alu insertions
and genetic diversity among African populations. Hum.Biol., 77 (5): 675-704.
340. The Y Chromosome Consortium 2002. A Nomenclature System for the Tree of
Human Y-Chromosomal Binary Haplogroups. Genome Research, 12 (2): 339-
348.
341. Thomas, M. G. et al 2007. New genetic evidence supports isolation and drift in the
Ladin communities of the South Tyrolean Alps but not an ancient origin in the
Middle East. Eur.J.Hum.Genet.
342. Thomas, M. G., Bradman, N. & Flinn, H. M. 1999. High throughput analysis of
10 microsatellite and 11 diallelic polymorphisms on the human Y-chromosome.
Hum.Genet., 105 (6): 577-581.
343. Thomas, M. G. et al 2000. Y chromosomes traveling south: the Cohen modal
haplotype and the origins of the Lemba--the "Black Jews of Southern Africa".
Am.J.Hum.Genet., 66 (2): 674-686.
344. Thomas, M. G. et al 2002. Founding mothers of Jewish communities:
geographically separated Jewish groups were independently founded by very few
female ancestors. Am.J.Hum.Genet., 70 (6): 1411-1420.
345. Thompson, R. F. 1983. Flash of the Spirit: African & Afro-American art &
philosophy. New York: Vintage.
346. Thorne, A. G. & Wolpoff, M. H. 1981. Regional continuity in Australasian
Pleistocene hominid evolution. Am.J.Phys.Anthropol., 55 (3): 337-349.
347. Tishkoff, S. A. et al 1996. Global patterns of linkage disequilibrium at the CD4
locus and modern human origins. Science, 271 (5254): 1380-1387.
348. Tishkoff, S. A. & Kidd, K. K. 2004. Implications of biogeography of human
populations for 'race' and medicine. Nat.Genet., 36 (11 Suppl): S21-S27.
349. Tishkoff, S. A. et al 2007. Convergent adaptation of human lactase persistence in
Africa and Europe. Nat.Genet., 39 (1): 31-40.
350. Tishkoff, S. A. et al 2001. Haplotype diversity and linkage disequilibrium at
human G6PD: recent origin of alleles that confer malarial resistance. Science, 293
(5529): 455-462.
351. Tishkoff, S. A. & Williams, S. M. 2002. Genetic analysis of African populations:
human evolution and complex disease. Nat.Rev.Genet., 3 (8): 611-621.
352. Tofanelli, S. et al 2003. Variation at 16 STR loci in Rwandans (Hutu) and
implications on profile frequency estimation in Bantu-speakers. Int.J.Legal Med.,
117 (2): 121-126.
353. Tomas, G. et al 2002. The peopling of Sao Tome (Gulf of Guinea): origins of
slave settlers and admixture with the Portuguese. Hum.Biol., 74 (3): 397-411.
354. Torroni, A. et al 2006. Harvesting the fruit of the human mtDNA tree. Trends
Genet., 22 (6): 339-345.
355. Torroni, A. et al 2000. mtDNA haplogroups and frequency patterns in Europe.
Am.J.Hum.Genet., 66 (3): 1173-1177.
356. Trask, L. 1997. The History of Basque. London: Routledge.
357. Trovoada, M. J. et al 2001. Evidence for population sub-structuring in Sao Tome e
Principe as inferred from Y-chromosome STR analysis. Ann.Hum.Genet., 65 (Pt
3): 271-283.
358. Trovoada, M. J. et al 2004. Pattern of mtDNA variation in three populations from
Sao Tome e Principe. Ann.Hum.Genet., 68 (Pt 1): 40-54.
359. Trovoada, M. J. et al 2007. Dissecting the genetic history of Sao Tome e Principe:
a new window from Y-chromosome biallelic markers. Ann.Hum.Genet., 71 (Pt 1):
77-85.
360. Udo, E. A. 1983. Who are the Ibibio? Onitsha: Africana-FEP Publishers.
361. Underhill, P. A. et al 2001. The phylogeography of Y chromosome binary
haplotypes and the origins of modern human populations. Ann.Hum.Genet., 65 (Pt
1): 43-62.
362. Underhill, P. A. et al 2000. Y chromosome sequence variation and the history of
human populations. Nat.Genet., 26 (3): 358-361.
363. Uya, O. E. 1984. A History of the Oron People. Oron: Manson.
364. Vannelli, T. A., Dykman, A. & Ortiz de Montellano, P. R. 2002. The
antituberculosis drug ethionamide is activated by a flavoprotein monooxygenase.
J.Biol.Chem., 277 (15): 12824-12829.
365. Vansina, J. 1990. Paths in the Rainforests: Toward a History of Political Tradition
in Equatorial Africa. The University of Wisconsin Press.
366. Vansina, J. 1995. New Linguistic Evidence and the Bantu Expansion. Journal of
African History, 36 (2): 173-195.
367. Verrelli, B. C. & Tishkoff, S. A. 2004. Signatures of selection and gene
conversion associated with human color vision variation. Am.J.Hum.Genet., 75
(3): 363-375.
368. Vigilant, L. et al 1991. African populations and the evolution of human
mitochondrial DNA. Science, 253 (5027): 1503-1507.
369. Vizirianakis, I. S. 2004. Challenges in current drug delivery from the potential
application of pharmacogenomics and personalized medicine in clinical practice.
Curr.Drug Deliv., 1 (1): 73-80.
370. Voight, B. F. et al 2006. A map of recent positive selection in the human genome.
PLoS.Biol., 4 (3): e72.
371. Wainscoat, J. S. et al 1986. Evolutionary relationships of human populations from
an analysis of nuclear DNA polymorphisms. Nature, 319 (6053): 491-493.
372. Wall, J. D. & Hammer, M. F. 2006. Archaic admixture in the human genome.
Curr.Opin.Genet.Dev., 16 (6): 606-610.
373. Walsh, E. C. et al 2006. Searching for signals of evolutionary selection in 168
genes related to immune function. Hum.Genet., 119 (1-2): 92-102.
374. Watkins, W. S. et al 2001. Patterns of ancestral human diversity: an analysis of
Alu-insertion and restriction-site polymorphisms. Am.J.Hum.Genet., 68 (3): 738-
752.
375. Watson, E. et al 1996. mtDNA sequence diversity in Africa. Am.J.Hum.Genet., 59
(2): 437-444.
376. Watson, E. et al 1997. Mitochondrial footprints of human expansions in Africa.
Am.J.Hum.Genet., 61 (3): 691-704.
377. Watterson, G. A. 1975. On the number of segregating sites in genetical models
without recombination. Theor.Popul.Biol., 7 (2): 256-276.
378. Weidenreich, F. 1946. Apes, Giants and Men. Chicago: University of Chicago
Press.
379. Weinshilboum, R. 2003. Inheritance and drug response. N.Engl.J.Med., 348 (6):
529-537.
380. Weng, Z. & Sokal, R. R. 1995. Origins of Indo-Europeans and the spread of
agriculture in Europe: comparison of lexicostatistical and genetic evidence.
Hum.Biol., 67 (4): 577-594.
381. Wennerholm, A. et al 2002. The African-specific CYP2D6*17 allele encodes an
enzyme with changed substrate specificity. Clin.Pharmacol.Ther., 71 (1): 77-88.
382. Wennerholm, A. et al 2001. Characterization of the CYP2D6*29 allele commonly
present in a black Tanzanian population causing reduced catalytic activity.
Pharmacogenetics, 11 (5): 417-427.
383. Wennerholm, A. et al 1999. Decreased capacity for debrisoquine metabolism
among black Tanzanians: analyses of the CYP2D6 genotype and phenotype.
Pharmacogenetics, 9 (6): 707-714.
384. Wessel, P. & Smith, W. 1998. New, improved version of Generic Mapping Tools
released. EOS Transactions, 79 (47): 579.
385. Whetstine, J. R. et al 2000. Ethnic differences in human flavin-containing
monooxygenase 2 (FMO2) polymorphisms: detection of expressed protein in
African-Americans. Toxicol.Appl.Pharmacol., 168 (3): 216-224.
386. Wilder, J. A. et al 2004. Global patterns of human mitochondrial DNA and Y-
chromosome structure are not influenced by higher migration rates of females
versus males. Nat.Genet., 36 (10): 1122-1125.
387. Wilke, R. A. et al 2007. Identifying genetic risk factors for serious adverse drug
reactions: current progress and challenges. Nat.Rev.Drug Discov., 6 (11): 904-
916.
388. Williamson, K. & Blench, R. 2000. Niger-Congo. Cambridge: Cambridge
University Press. 11-42.
389. Wills, C. 1992. Human origins. Nature, 356 (6368): 389-390.
390. Wilson, J. F. et al 2001. Population genetic structure of variable drug response.
Nat.Genet., 29 (3): 265-269.
391. Witherspoon, D. J. et al 2006. Human population genetic structure and diversity
inferred from polymorphic L1(LINE-1) and Alu insertions. Hum.Hered., 62 (1):
30-46.
392. Wojnowski, L. et al 2004. Increased levels of aflatoxin-albumin adducts are
associated with CYP3A5 polymorphisms in The Gambia, West Africa.
Pharmacogenetics, 14 (10): 691-700.
393. Wood, E. T. et al 2005. Contrasting patterns of Y chromosome and mtDNA
variation in Africa: evidence for sex-biased demographic processes.
Eur.J.Hum.Genet., 13 (7): 867-876.
394. Xue, Y. et al 2006. Spread of an inactive form of caspase-12 in humans is due to
recent positive selection. Am.J.Hum.Genet., 78 (4): 659-670.
395. Yu, N. et al 2002. Larger genetic differences within Africans than between
Africans and Eurasians. Genetics, 161 (1): 269-274.
396. Yu, N. et al 2001. Global patterns of human DNA sequence variation in a 10-kb
region on chromosome 1. Mol.Biol.Evol., 18 (2): 214-222.
397. Yueh, M. F., Krueger, S. K. & Williams, D. E. 1997. Pulmonary flavin-containing
monooxygenase (FMO) in rhesus macaque: expression of FMO2 protein, mRNA
and analysis of the cDNA. Biochim.Biophys.Acta, 1350 (3): 267-271.
398. Zalloua, P. A. et al 2008. Y-chromosomal diversity in Lebanon is structured by
recent historical events. Am.J.Hum.Genet., 82 (4): 873-882.
399. Zegura, S. L. et al 2004. High-resolution SNPs and microsatellite haplotypes point
to a single, recent entry of Native American Y chromosomes into the Americas.
Mol.Biol.Evol., 21 (1): 164-175.
400. Zeigler-Johnson, C. M. et al 2002. Ethnic differences in the frequency of prostate
cancer susceptibility alleles at SRD5A2 and CYP3A4. Hum.Hered., 54 (1): 13-21.
401. Zeitlyn, D. & Connell, B. 2003. Ethnogenesis and Fractal History on the African
Frontier: Mambila-Njerep-Mandulu. Journal of African History, 44 (1): 117-138.
402. Zekraoui, L. et al 1997. High frequency of the apolipoprotein E *4 allele in
African pygmies and most of the African populations in sub-Saharan Africa.
Hum.Biol., 69 (4): 575-581.
403. Zhao, Z. et al 2000. Worldwide DNA sequence variation in a 10-kilobase
noncoding region on human chromosome 22. Proc.Natl.Acad.Sci.U.S.A, 97 (21):
11354-11358.
404. Zhivotovsky, L. A. et al 2004. The effective mutation rate at Y chromosome short
tandem repeats, with application to human population-divergence time.
Am.J.Hum.Genet., 74 (1): 50-61.