BIOINFORMATICS TRICKS FOR ANCIENT DNA ANALYSIS, TATIANA TATARINOVA


Source: …bioinformaticsinstitute.ru/.../tatarinova_ancient_dna_summer_2019.pdf

BIOINFORMATICS TRICKS FOR ANCIENT DNA ANALYSIS

TATIANA TATARINOVA


WHAT HAS BEEN SEQUENCED?


KHAZARIA: WHERE AND WHEN?

• HOW THE PROPHETIC OLEG NOW PREPARES

• TO TAKE REVENGE ON THE FOOLISH KHAZARS*:

• THEIR VILLAGES AND FIELDS, FOR A VIOLENT RAID,

• HE HAS DOOMED TO SWORDS AND FIRES.

*Khazars: a nomadic people who once lived in the south of Russia.


KHAZARIAN PUZZLE

Khazars were first mentioned by several Arabic historians in the VIII century AD, and last mentioned in the XIII century, as one of the peoples conquered by Batu Khan.


CLAIMS OF ASHKENAZI CONNECTION?

Arthur Koestler, The Thirteenth Tribe: The Khazar Empire and Its Heritage. Hutchinson of London, London, 1976

Lev Gumilev, Discovery of Khazaria

No written sources from Khazaria other than three manuscripts in ancient Hebrew

One of the rulers was called Joseph

Jewish artefacts

Legends about Jewish practices


THE MISSING LINK OF JEWISH EUROPEAN ANCESTRY: CONTRASTING THE RHINELAND AND THE KHAZARIAN HYPOTHESES, BY ERAN ELHAIK, GENOME BIOLOGY AND EVOLUTION, 2013


SO, WHO IS MR. KHAZAR?

aDNA may provide answers to this historic riddle


Ingredient 1: high quality input

Stepped grave vs Niche grave


ANCIENT DNA (ADNA) IS THUS EXPECTED TO REVOLUTIONIZE EVOLUTIONARY GENETICS IN THE SAME MANNER THAT A SYSTEMATIC APPROACH TO THE ANALYSIS OF FOSSIL RECORDS REVOLUTIONIZED PALEONTOLOGY: IT IS A DIRECT WINDOW INTO THE PAST, A “TIME CAPSULE”.

RECENTLY, DNA SAMPLES WERE OBTAINED FROM NEANDERTHAL, DENISOVA, MAMMOTH, PALEO-HORSE, ANCIENT SEEDS, ETC.

We addressed many of these questions in this paper: Toward high-resolution population genomics using archaeological samples. Irina Morozova, Pavel Flegontov, Alexander Mikheyev, Hosseinali Asgharian, Petr Ponomarenko, Vladimir Klyuchnikov, GaneshPrasad ArunKumar, Sergey Bruskin, Egor Prokhortchouk, Yuriy Gankin, Evgeny Rogaev, Yuri Nikolsky, Ancha Baranova, Eran Elhaik, Tatiana V. Tatarinova. DNA Research, 2016.

Ingredient 2: high-quality sequencing


GENOTYPING VS SEQUENCING

Genotyping: a large but limited number of high-quality calls. 1 million SNPs can be genotyped for $100. Error rate (wrong calls) is <1% (reported by 23andMe).

Sequencing: potentially, every position in a genome is studied. However, quality is variable and lower than for the SNP chip (0.1% error is achieved for 75% of bases). Some areas require read depths of 100 or more.


Der Sarkissian et al. 2015

http://mammoth.psu.edu/hair.html


QUALITY ASSESSMENT

• NUMBER OF SNPS 124,780,238

• QUALITY (Q) (3.01, 226.77)

• MEAN Q 14.14, MEDIAN Q 7.80

• AVERAGE DEPTH OF COVERAGE 5

• COVERED 1-2% OF GENOME
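If the Q values above are Phred-scaled (an assumption: the slide does not name the scale, although variant callers commonly report quality this way), each one maps to an error probability p = 10^(-Q/10). A minimal Python sketch of that conversion, applied to the numbers quoted on the slide:

```python
def phred_to_error(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)

# Minimum, median, mean and maximum Q as quoted on the slide.
for label, q in [("min", 3.01), ("median", 7.80), ("mean", 14.14), ("max", 226.77)]:
    print(f"{label:>6} Q = {q:7.2f} -> error probability {phred_to_error(q):.3g}")
```

Under that assumption, the median Q of 7.80 corresponds to roughly a 17% chance that a call is wrong, and the minimum Q of 3.01 to about 50%.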

Consider 300 Bronze Age genomes published in 2014-2015:

• Allentoft et al. 2015 (RISE*)

• Haak, Lazaridis et al. Nature 2015 (I0*)

• Gamba et al. Nature Communications 2014 (I1*)

• Mathieson et al. 2015 (I*)


SYSTEMATIC ERRORS

Samples I and RISE: both sequenced on HiSeq. RISE: whole genome; I: targeted.


SO, WE DEAL WITH POOR QUALITY

• LOW COVERAGE

• POOR QUALITY

• INSUFFICIENT NUMBER OF SNPS PER INDIVIDUAL

• DIFFERENT GROUPS GET DIFFERENT RESULTS FROM THE SAME SAMPLES

• INDIVIDUAL SNPS CANNOT BE TRUSTED!


APPROACH: AGGREGATION

• ADMIXTURE

• GPS

• PATHWAYS

• USING PROBABILITY TO MODEL

• LOCALLY AGGREGATED ANCESTRY RLAI


To infer population structure from genotype data, it is first necessary to reduce the dimensionality of the dataset, because it encompasses thousands of SNPs.

From SNPs to Admixture

Thousands of SNPs

Admixture components: North East Asian, Mediterranean, South African, South West Asian, Native American, Oceanian, South East Asian, Northern European, Sub-Saharan African

HGDP00985 0.5253 0.0202 0 0.2222 0.0404 0.0101 0.0101 0.1717 0

HGDP01094 0.04 0.04 0 0.03 0.83 0 0.01 0.05 0

HGDP00982 0.0102 0.1531 0.0306 0.0714 0.0408 0 0.0102 0.2041 0.4796

ADMIXTURE

Admixture proportions are similar in geographically adjacent populations, such as Italians and Greeks, and in populations sharing a similar history, such as the British and Germans.


GPS ORIGIN PREDICTION

[Figure: panels A and B]

$\Delta_{GEO} = \alpha \times \Delta_{GEN} + \beta$
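The linear relation above converts a genetic distance into a geographic one. The following Python sketch shows the idea under several assumptions that are not on the slide: Euclidean distance between admixture vectors stands in for the genetic distance, the reference panel and its coordinates are invented, and `alpha` and `beta` are placeholders rather than fitted values.

```python
import math

# Hypothetical reference panel: name -> (admixture vector, (lat, lon)).
# All numbers are invented for illustration; the slides use 9 components.
REFERENCE = {
    "PopA": ([0.70, 0.20, 0.10], (55.0, 37.0)),
    "PopB": ([0.10, 0.80, 0.10], (41.0, 29.0)),
    "PopC": ([0.05, 0.15, 0.80], (62.0, 130.0)),
}

def genetic_distance(u, v):
    """Euclidean distance between two admixture vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_population(sample, reference):
    """Return (name, distance) of the reference population closest to the sample."""
    name = min(reference, key=lambda n: genetic_distance(sample, reference[n][0]))
    return name, genetic_distance(sample, reference[name][0])

def geographic_distance(delta_gen, alpha, beta):
    """Linear relation from the slide: delta_geo = alpha * delta_gen + beta."""
    return alpha * delta_gen + beta

sample = [0.65, 0.25, 0.10]
name, d_gen = nearest_population(sample, REFERENCE)
# alpha and beta below are placeholders, not values from the talk.
print(name, round(d_gen, 3), geographic_distance(d_gen, alpha=1000.0, beta=0.0))
```

In a real analysis the coefficients would be estimated by regressing known geographic distances on genetic distances between reference populations.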


APPLICATION OF GPS TO ADNA (BRONZE AGE)

30 OUT OF 100 BRONZE AGE SAMPLES (ALLENTOFT ET AL. 2015) HAD OVER 500 ANCESTRY INFORMATIVE MARKERS.

WE APPLIED THE GPS ALGORITHM TO FIND THE CLOSEST MODERN POPULATION.


GPS accurately assigned:

• ~100% of all individuals to their continental regions

• 80% of all individuals to their country of origin

• 60% of all individuals to their region within the country


PCA


[Figure, panel B: admixture proportions (axis 0% to 100%) for ancient samples and modern populations: I1280, RISE568, British, I0550, I0246, Tatars, RISE546, I0060, I0115, I0235, RISE240, RISE154, Poland, I0805, Chuvashs, RISE562, Bulgarians, I1530, I1281, I1303, Ashkenazi_Poland, Northern Caucasian, Nogais, Sephardic Jews. Legend: NorthEastAsian, Mediterranean, SouthAfrican, SouthWestAsian, NativeAmerican, Oceanian, SouthEastAsian, NorthernEuropean, SubsaharanAfrican]


SNPS PER PATHWAYS

Changes in biological pathways during 6,000 years of civilization in Europe, Chekalin et al., 2018, Molecular Biology and Evolution


More complex cases? reAdmix

• READMIX WAS DEVELOPED TO TREAT INDIVIDUALS OF MIXED ORIGIN; IT REPRESENTS AN INDIVIDUAL AS A LINEAR COMBINATION OF ADMIXTURE VECTORS OF REFERENCE POPULATIONS

• EXAMPLE: 30% BRITISH + 10% RUSSIAN + 60% CHINESE

• $P = a_1 r_{s_1} + a_2 r_{s_2} + \dots + a_p r_{s_p} + \text{error}$


HOW IT WORKS

• WE ASSUME RIGHT AWAY THAT THE GIVEN ANCIENT PROPORTIONS CONTAIN ERROR

• START WITH A GUESS POPULATION

• ADD/REMOVE POPULATIONS TO ACHIEVE OPTIMAL FIT

• CONDITIONAL OPTIMIZATION (SUCH AS “I KNOW THAT THERE WAS A JEWISH ANCESTOR SOMEWHERE IN MY PEDIGREE”)


READMIX APPROACH

Aim: to find the smallest subset of modern populations whose combined admixture components are similar to those of the individual within a small tolerance margin.

The algorithm consists of three phases:

1. Iteratively build the first candidate solution and improve it.

2. Generate a predefined number M of additional candidate solutions randomly and apply Differential Evolution (DEEP).

3. Identify the populations that have stable membership in the solution across the set, that is, are part of the solution in at least 75% of cases.
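Phase 3 can be illustrated with a small Python sketch. It assumes each candidate solution is simply a set of population names; the candidate sets below are invented, and only the 75% threshold comes from the slide.

```python
from collections import Counter

def stable_populations(solutions, threshold=0.75):
    """Return populations present in at least `threshold` of candidate solutions."""
    counts = Counter(pop for solution in solutions for pop in set(solution))
    total = len(solutions)
    return sorted(pop for pop, count in counts.items() if count / total >= threshold)

# Invented candidate solutions, one set of population names per solution.
candidates = [
    {"Yakut", "Saami"},
    {"Yakut", "Saami", "Abkhazian"},
    {"Yakut"},
    {"Yakut", "Saami"},
]
print(stable_populations(candidates))
```

Here Yakut appears in 4 of 4 candidates and Saami in 3 of 4, so both meet the 75% threshold, while Abkhazian (1 of 4) is dropped.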

Let $R = \{r_i\}_{i=1..I}$ be the set of modern populations, where $r_i = (r_{i,1}, \dots, r_{i,K})$ and $K$ is the dimension ($K = 9$).

We seek two sets $S = (s_1, \dots, s_p)$ and $A = (a_1, \dots, a_p)$, where $s_i$ are the indices of modern populations and $a_i$ are the coefficients of modern populations in the approximation of each test vector $T$.
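To make the formulation above concrete, here is a toy Python sketch that searches for the smallest subset S and coefficients A approximating a test vector T within a tolerance. It is not the reAdmix implementation: it swaps the iterative build and Differential Evolution for an exhaustive search over a coarse grid of coefficients, and the reference vectors, the tolerance, and the 3-component space (instead of K = 9) are all assumptions made for illustration.

```python
from itertools import combinations, product
import math

def mix(weights, vectors):
    """Weighted sum of admixture vectors: sum_i a_i * r_{s_i}."""
    return [sum(w * v[j] for w, v in zip(weights, vectors))
            for j in range(len(vectors[0]))]

def distance(u, v):
    """Euclidean distance, used here as the approximation error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def smallest_mixture(target, reference, tolerance=0.05, max_size=3, grid=10):
    """Return (subset, weights, error) for the smallest subset within tolerance.

    Weights are multiples of 1/grid and sum to at most 1. Returns None if no
    subset of up to max_size populations fits within the tolerance.
    """
    names = sorted(reference)
    for size in range(1, max_size + 1):
        best = None
        for subset in combinations(names, size):
            vectors = [reference[n] for n in subset]
            for parts in product(range(1, grid + 1), repeat=size):
                if sum(parts) > grid:
                    continue
                weights = [p / grid for p in parts]
                err = distance(target, mix(weights, vectors))
                if best is None or err < best[2]:
                    best = (subset, weights, err)
        if best is not None and best[2] <= tolerance:
            return best
    return None

# Invented 3-component reference vectors for three populations.
reference = {
    "British": [0.80, 0.10, 0.10],
    "Russian": [0.60, 0.30, 0.10],
    "Chinese": [0.05, 0.05, 0.90],
}
# Target built as 30% British + 10% Russian + 60% Chinese.
target = mix([0.3, 0.1, 0.6], [reference[n] for n in ("British", "Russian", "Chinese")])
print(smallest_mixture(target, reference))
```

With these toy vectors and the default tolerance, the search stops at a two-population answer even though the target was built from three, which shows how the tolerance margin trades fit for parsimony.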


SOHN ET AL. (2012) BENCHMARK

• 2 COMPONENTS

• 4 COMPONENTS

4-dim space: European, African, Native American and East Asian

Color coding: red: European, green: African, yellow: Native American, blue: East Asian, white: unassigned


reAdmix


RLAI (ROBUST INFERENCE OF LOCAL ANCESTRY)


RLAI METHOD

In every window find the most similar position
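The slide gives only this one-line description, so the following Python sketch is one possible reading rather than the RLAI algorithm itself: split the genome into fixed windows of SNPs and assign each window to the reference whose alleles mismatch the sample least. The 0/1 allele coding, the window size, and the reference panels are all assumptions.

```python
def window_ancestry(sample, references, window=4):
    """Assign each window of SNPs to the reference with the lowest mismatch rate.

    sample: list of alleles coded 0/1, with None for missing calls.
    references: dict mapping a name to a list of 0/1 alleles of the same length.
    Returns a list of (start, stop, name); name is None if the window has no calls.
    """
    calls = []
    for start in range(0, len(sample), window):
        stop = min(start + window, len(sample))
        best_name, best_rate = None, None
        for name in sorted(references):
            ref = references[name]
            pairs = [(sample[i], ref[i]) for i in range(start, stop)
                     if sample[i] is not None]
            if not pairs:
                continue
            rate = sum(s != r for s, r in pairs) / len(pairs)
            if best_rate is None or rate < best_rate:
                best_name, best_rate = name, rate
        calls.append((start, stop, best_name))
    return calls

# Invented example: first half resembles RefA, second half RefB, with gaps.
sample = [0, 0, None, 0, 1, 1, 1, None]
references = {"RefA": [0] * 8, "RefB": [1] * 8}
print(window_ancestry(sample, references))
```

Aggregating over a window means a single miscalled or missing SNP does not flip the assignment, which fits the low-coverage setting described earlier in the talk.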


COMPARISON WITH OTHERS

• LAMP

  • PROBABILITY OF A SEGMENT TO BELONG TO A SPECIFIC POPULATION

• LAMP-ANC

  • MODIFICATION OF LAMP, SKIPPING ESTIMATION OF ANCESTRAL ALLELES, THEREFORE MORE RELIABLE

• RFMIX

  • TREATS ORIGIN AS A HIDDEN PARAMETER


COMPARISON

• RFMIX HAS THE HIGHEST ACCURACY FOR THE EUROPE-JAPAN AND EUROPE-AFRICA MIXES. TRIPLE MIXES SHOW A DROP IN QUALITY


RLAI

• RLAI SHOWS ACCURACY ABOVE 0.9 FOR ALL MIXES INCLUDING TRIPLE


RLAI ACCURACY AS A FUNCTION OF GENERATIONS


ZOOMING IN AND OUT


CORDED WARE ANALYSIS

Unique ID: I0047
Part: Tooth
Archaeological culture: Central_LNBA
Reference: Haak, Lazaridis et al. Nature 2015
Date (2-sigma): 2111-1891 cal BCE
Min Age: 4037
Max Age: 3952
Location: Halberstadt-Sonntagsfeld
Country: Germany
Lat: 51.89
Lon: 11.04
Coverage: 1.655
SNPs: 836,247
Sex: F
#reads: 17,431,013
mtDNA haplogroup: V9
% endogenous: 0.449


Das et al. 2016; Behar et al. 2013


Sample  Bone                          Gender  Age    Century   Location                  Race
67      Left humerus                  M       35-40  IX        Martynovsky district      mongoloid
166     Left femur                    F       25-30  VIII-IX   Martynovsky district      mongoloid
531     Right tibia and left ulna     M       35-40  VIII-IX   Dubovsky district         europoid
619     Left femur                    M       35-40  VII-VIII  Dubovsky district         mongoloid
656     Right tibia                   M       30-35  VII-VIII  Dubovsky district         europoid (?)
1251    Left humerus                  M       40     IX        Zimovnikovsky district    undefined
1564    Left tibia                    M       25-35  VIII-IX   Belokalitvinsky district  europoid (?)
1566    Right humerus                 M       35-40  VIII-IX   Belokalitvinsky district  undefined
1986    Right humerus and left tibia  M       35-45  VIII-X    Orlovsky district         europoid (?)


NINE 8TH-9TH CENTURY GENOMES OF KHAZARS

• DNA extraction conducted in two labs independently

• Sequencing performed by Dr. Mikheyev (OIST)

• All samples tested on MiSeq and the best samples on HiSeq

• 0.32-0.48 of the human genome covered

• Average depth of coverage ~0.75X
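The last two figures can be related with a back-of-the-envelope model. Assuming reads land uniformly at random (the idealized Lander-Waterman/Poisson model, which is an assumption and not something stated in the talk), the expected fraction of the genome covered at least once at mean depth c is 1 - exp(-c). A short Python sketch:

```python
import math

def expected_breadth(depth: float) -> float:
    """Expected fraction of positions covered at least once under a Poisson model."""
    return 1.0 - math.exp(-depth)

# Mean depth quoted on the slide.
print(round(expected_breadth(0.75), 3))  # 0.528
```

The observed 0.32-0.48 is somewhat below this idealized value; one plausible reason is that reads from degraded DNA do not map uniformly across the genome.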


BIOINFORMATICS PROCEDURES

• USING MULTIPLE PIPELINES IN PARALLEL

• PALEOMIX (SCHUBERT ET AL., CHARACTERIZATION OF ANCIENT AND MODERN GENOMES BY SNP DETECTION AND PHYLOGENOMIC AND METAGENOMIC ANALYSIS USING PALEOMIX. NAT PROTOC. 2014)

• MAPDAMAGE, SCHMUTZI, ANGSD, FASTQC, CUTADAPT, FOLLOWED BY GATK

• PILEUPCALLER (HTTP://STEPHANSCHIFFELS.DE/SOFTWARE/)


MTDNA

Sample 67 166 531 619 656 1251 1564 1566 1986

Coverage 30.69 62.11 5.43 7.51 30.86 71.07 86.44 31.29 38.52

Haplogroup D4e5 C4 X2e2 H1a3 C4a1 H5b H13c1 D4b1a1a C4a1c

Using BAM Analysis Kit


YDNA

• 619 - Q

• 1986 - R1A

• 1251 - R1A

• 656 - C3


NGSADMIX ANALYSIS


ANCESTRY INFORMATIVE MARKERS ADMIXTURE

Sample   Q20 DP2   Q30 DP2   Q20 DP3   Q30 DP3
1251     7057      6715      1347      1140
1566     7404      7115      1380      1158
1564     3448      3274      512       439
166      6538      6289      1113      927
1986     10049     9572      2273      1886
656      877       858       57        47
531      389       385       8         7
67       1166      1152      79        75
619      1041      1036      37        35
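The marker counts above differ by orders of magnitude between samples and filters. As a quick illustration, this Python sketch lists which samples keep at least a chosen number of markers under each filter. The counts are copied from the table; the cutoff of 500 is borrowed from the Bronze Age GPS slide and is only an assumed threshold here.

```python
FILTERS = ("Q20 DP2", "Q30 DP2", "Q20 DP3", "Q30 DP3")

# Ancestry informative marker counts per sample, copied from the table above.
AIM_COUNTS = {
    "1251": (7057, 6715, 1347, 1140),
    "1566": (7404, 7115, 1380, 1158),
    "1564": (3448, 3274, 512, 439),
    "166": (6538, 6289, 1113, 927),
    "1986": (10049, 9572, 2273, 1886),
    "656": (877, 858, 57, 47),
    "531": (389, 385, 8, 7),
    "67": (1166, 1152, 79, 75),
    "619": (1041, 1036, 37, 35),
}

def samples_passing(counts, filter_name, minimum=500):
    """Return the samples with at least `minimum` markers under the given filter."""
    column = FILTERS.index(filter_name)
    return sorted(sample for sample, row in counts.items() if row[column] >= minimum)

for name in FILTERS:
    print(name, samples_passing(AIM_COUNTS, name))
```

With this cutoff, every sample except 531 passes the loosest filter, while only four samples (1251, 1566, 166, 1986) pass the strictest one.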


GPS ANALYSIS, MODERN AND ANCIENT SAMPLES AS REFERENCE

GPS algorithm: Elhaik, Tatarinova et al (2014)

SAMPLE  NEAREST MODERN  DISTANCE  NEAREST ANCIENT      DISTANCE
1251    Tajik           0.18      Steppe MLBA          0.06
1564    Lebanese        0.16      Levant BA            0.19
1566    Yakut           0.04      Pazyryk IA (Altai)   0.27
166     Evenk           0.09      Pazyryk IA (Altai)   0.52
1986    Shor            0.10      Pazyryk IA (Altai)   0.14
531     Ishkasim        0.19      Early Sarmatian IA   0.17
619     Turkmen         0.26      Pazyryk IA (Altai)   0.40
656     Kazakh          0.29      Pazyryk IA (Altai)   0.41
67      Khanty          0.12      Pazyryk IA (Altai)   0.16
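One simple way to read the table above is to ask, per sample, which of the two nearest references is closer. The Python sketch below does that with the values copied from the table. It assumes distances to the modern and ancient panels are directly comparable, which the slide does not state.

```python
# sample: (nearest modern, distance, nearest ancient, distance), from the table above.
GPS_RESULTS = {
    "1251": ("Tajik", 0.18, "Steppe MLBA", 0.06),
    "1564": ("Lebanese", 0.16, "Levant BA", 0.19),
    "1566": ("Yakut", 0.04, "Pazyryk IA (Altai)", 0.27),
    "166": ("Evenk", 0.09, "Pazyryk IA (Altai)", 0.52),
    "1986": ("Shor", 0.10, "Pazyryk IA (Altai)", 0.14),
    "531": ("Ishkasim", 0.19, "Early Sarmatian IA", 0.17),
    "619": ("Turkmen", 0.26, "Pazyryk IA (Altai)", 0.40),
    "656": ("Kazakh", 0.29, "Pazyryk IA (Altai)", 0.41),
    "67": ("Khanty", 0.12, "Pazyryk IA (Altai)", 0.16),
}

def closer_reference(row):
    """Return ('modern' or 'ancient', population name) for the smaller distance."""
    modern, d_modern, ancient, d_ancient = row
    if d_ancient < d_modern:
        return "ancient", ancient
    return "modern", modern

for sample, row in GPS_RESULTS.items():
    print(sample, *closer_reference(row))
```

Only samples 1251 and 531 come out closer to an ancient reference than to a modern one under this comparison.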


READMIX ANALYSIS, MODERN REFERENCE

Sample  Populations (proportions)
1251    Turkmen (0.494), Abkhazian (0.142), Belarusian (0.238), Yizu (0.127)
1564    Druze (0.649)
1566    Yakut (1)
166     Yakut (0.637), Even Sakha (0.339)
1986    Yakut (0.368), Saami (0.428), Abhaz (0.167)
531     Yaghnobi (Tajikistan) (0.506), Kets (0.053), Kurmi (0.202), Selkup (0.239)
619     Egypt (0.315), Yakut (0.183), Azeri (0.206), Yizu (0.297)
656     Mongolian (0.369), Even Sakha (0.252), Egypt (0.193), Yizu (0.187)
67      Yakut (0.623)


READMIX, ANCIENT REFERENCE

Sample  Populations (proportions)
1251    Steppe Eneolithic (0.624), Anatolia Neolithic (0.149), SE Iberia CA (0.124), Zevakino Chilikta IA (0.103)
1564    Peloponnese Neolithic (0.595)
1566    Pazyryk IA (1.000)
166     Pazyryk IA (1.000)
1986    Pazyryk IA (0.832)
531     Beaker Central Europe (0.388), Armenia MLBA (0.217), Yamnaya Ukraine (0.258), Maros.SG (0.138)
619     Beaker Central Europe (0.489), Pazyryk IA (0.237), Anatolia Neolithic (0.175), Peloponnese Neolithic (0.099)
656     Beaker Central Europe (0.443), Armenia MLBA (0.299), Pazyryk IA (0.193)
67      Pazyryk IA (0.875)


Ancestry and demography and descendants of Iron Age nomads of the Eurasian Steppe, Martina Unterlander et al, Nature Comm 2017


F3 OUTGROUP

166 1564


OVERLAP WITH THE ASHKENAZI GENOME CONSORTIUM MARKERS

• SEQUENCING AN ASHKENAZI REFERENCE PANEL SUPPORTS POPULATION-TARGETED PERSONAL GENOMICS AND ILLUMINATES JEWISH AND EUROPEAN ORIGINS, SHAI CARMI, KEN Y. HUI, …, ITSIK PE’ER, NATURE COMMUNICATIONS VOLUME 5, ARTICLE NUMBER: 4835 (2014)

Sample  Total    Overlapping positions*  Same allele as in Ashkenazi database
1251    3.1E+09  247                     1
1564    3.1E+09  190                     0
1566    3.1E+09  301                     3
166     3.1E+09  282                     1
1986    3.1E+09  327                     5
531     3.1E+09  21                      0
619     3.1E+09  23                      0
656     3.1E+09  48                      0
67      3.1E+09  70                      0

*Out of 953 mutations in known Ashkenazi genes
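The counts above are easier to judge as rates. This Python sketch, using only the numbers from the table, computes for each sample the share of overlapping positions that carry the same allele as the Ashkenazi database, plus the pooled share across all nine samples.

```python
SAMPLES = ["1251", "1564", "1566", "166", "1986", "531", "619", "656", "67"]
OVERLAPPING = [247, 190, 301, 282, 327, 21, 23, 48, 70]   # positions covered per sample
SAME_ALLELE = [1, 0, 3, 1, 5, 0, 0, 0, 0]                 # matches to the database

def match_rates(samples, overlapping, same_allele):
    """Return {sample: matches / overlapping positions}."""
    return {s: m / n for s, n, m in zip(samples, overlapping, same_allele)}

rates = match_rates(SAMPLES, OVERLAPPING, SAME_ALLELE)
for sample, rate in rates.items():
    print(f"{sample:>5}: {rate:.2%}")

pooled = sum(SAME_ALLELE) / sum(OVERLAPPING)
print(f"pooled: {sum(SAME_ALLELE)} of {sum(OVERLAPPING)} = {pooled:.2%}")
```

Pooled over all samples, 10 of 1509 covered positions match, well under 1%; the highest single-sample rate is 5 of 327 for sample 1986.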


ASHKENAZIM AND KHAZARS

1. NO SIGNIFICANT ASHKENAZI GENETIC AFFINITY WAS DETECTED IN ANY OF THE SEQUENCED INDIVIDUALS

2. ALL OF THE STUDIED KHAZARS, EVEN THOSE WITH SIGNIFICANT CAUCASIAN ANCESTRY, HAD SIGNIFICANT ASIATIC NUCLEAR GENETIC CONTRIBUTIONS, WHICH ARE MISSING FROM PRESENT-DAY JEWISH POPULATIONS

3. WHILE LOCAL WOMEN WERE RECRUITED INTO ASHKENAZI COMMUNITIES, NONE OF THE IDENTIFIED MITOCHONDRIAL HAPLOTYPES ARE COMMON IN PRESENT-DAY ASHKENAZI JEWS

4. THE EUROPEAN GENETIC COMPONENTS OF THE KHAZARS DERIVE FROM THE CAUCASUS TRIBES THAT WERE UNDER CONTROL OF THE KHAGANATE, RATHER THAN FROM MORE DISTANT LEVANTINE POPULATIONS MORE CLOSELY RELATED TO ASHKENAZI AND SEPHARDIC JEWS. WHILE JEWS PROBABLY LIVED IN THE TERRITORY OF THE KHAZAR KHAGANATE ALONG WITH CHRISTIANS, MUSLIMS AND PAGANS, IT SEEMS UNLIKELY THAT THEY FORMED ITS RULING CLASSES, WHICH WERE DOMINATED BY STEPPE NOMADS FROM THE EAST, AND THUS THE KHAZARS WERE NOT LIKELY PROGENITORS OF THE ASHKENAZIM.


CONCLUSIONS

• USE MULTIPLE METHODS OF ANALYSIS FOR VALIDATION

• THERE WERE TWO GROUPS OF KHAZARS, EUROPEAN AND ASIAN, BOTH GROUPS MIXED

• KHAZARS WERE PROBABLY NOT THE DIRECT ANCESTORS OF ASHKENAZI

• NEED MORE MONEY FOR MORE GENOMES


ACKNOWLEDGEMENTS

• OKINAWA: ALEXANDER MIKHEYEV

• ROSTOV: IGOR KORNIENKO, ELENA BATYEVA, VLADIMIR KLYUCHNIKOV

• PETERSBURG: YURI ORLOV, IVAN DMITRIEVSKY

• TOMSK: ALEXEI ZARUBIN

• MOSCOW: NIKITA MOSHKOV
