Understanding Human Y Chromosome · odin.mdacc.tmc.edu/~ryu/materials/ChrY.pdf

Understanding Human Y Chromosome Robert Yu 2014



Understanding Human Y Chromosome

Robert Yu, 2014


Human Genome
“the complete set of genetic information for humans…as DNA sequences within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria.” – Human Genome, Wikipedia [http://en.wikipedia.org/wiki/Human_genome]

Human mitochondrial DNA: ~16,600 bp, 37 genes
Human chromosomes: ~3.1 billion bp of DNA, 20K‐25K genes


Genome‐wide Association Studies (GWAS)
• “…also known as whole genome association study…is an examination of many common genetic variants in different individuals to see if any variant is associated with a trait.” – Genome‐wide association study, Wikipedia [http://en.wikipedia.org/wiki/Genome-wide_association_study]

• A typical GWAS report contains results only for the autosomal chromosomes: where are the X, Y and mt chromosomes?

Ref: http://en.wikipedia.org/wiki/Genome‐wide_association_study

An illustration of a Manhattan plot depicting several strongly associated risk loci. Each dot represents a SNP, with the X‐axis showing genomic location and Y‐axis showing association level. 


Genetic Studies of Mendelian Trait
• The Austrian monk Gregor Mendel announced his discovery of Mendel’s Principles of Heredity, or Mendelian Inheritance, on March 8, 1865.
• Example:
– The pea plant has two alleles, B and b, of a flower color gene.
– At random, one allele (B or b) came from the sperm and the other (b or B) from the egg of the grandparents.
– Both current parents, the pollen (father) and the pistil (mother), are Bb.
– With random inheritance, 4 offspring (children) are expected to be one BB, two Bb and one bb.

• Mendel’s 3rd law says “Some alleles are dominant while others are recessive; an organism with at least one dominant allele will display the effect of the dominant allele”.

• This 3rd law laid the foundation of the genetic models in GWAS. Assuming b is associated with a trait (disease):

– Additive model: the trait (disease) is enhanced if an individual has two trait‐associated alleles, bb, compared with individuals who have only one copy, Bb.

– Dominant model: the trait (disease) appears whenever an individual carries one or two copies of the trait‐associated allele, Bb or bb.

– Recessive model: the trait (disease) appears only when an individual carries TWO copies of the trait‐associated allele, i.e. only bb.

Note: sometimes the trait‐associated allele turns out to be the actual causal allele of the disease (trait).

Ref: “Mendelian inheritance” at http://en.wikipedia.org/wiki/Mendelian_inheritance
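The Bb × Bb cross and the three genetic models can be sketched in a few lines of Python. This is only an illustration: the function names `cross` and `encode` are made up here, and coding the recessive model as 0/1 is an assumption of the sketch (other codings are in use).

```python
from collections import Counter
from itertools import product

def cross(parent1: str, parent2: str) -> Counter:
    """Enumerate offspring genotypes of a one-gene cross (Punnett square)."""
    # Sort the two alleles so that 'bB' and 'Bb' count as the same genotype.
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))

def encode(genotype: str, model: str, risk: str = "b") -> int:
    """Numeric code of a genotype under a genetic model (assumed coding)."""
    copies = genotype.count(risk)  # copies of the trait-associated allele
    if model == "additive":
        return copies              # 0, 1 or 2
    if model == "dominant":
        return int(copies >= 1)    # one copy is enough
    if model == "recessive":
        return int(copies == 2)    # both copies are needed
    raise ValueError(f"unknown model: {model}")

offspring = cross("Bb", "Bb")      # one BB, two Bb, one bb
codes = {g: [encode(g, m) for m in ("additive", "dominant", "recessive")]
         for g in offspring}
```

Here `codes` maps each offspring genotype to its additive, dominant and recessive code, which is the form in which genotypes usually enter an association test.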


Genetic markers, e.g. SNPs, on chromosomes 1‐22 fit Mendelian inheritance principles well; markers on chromosomes X, Y and mt fit them less well or not at all.

We discussed the case on chromosome X last year. Now we focus on Chromosome Y.

"We used to think most of the Y chromosome was junk," said Dr. Martin Bialer, a medical geneticist with North Shore‐LIJ Health System in Great Neck, N.Y.

"But … there could be something on the chromosome that's important in cancer prevention," Bialer said. "What that is, is unclear."

April 28, 2014


First, let us look at a few research publications on the Y chromosome.


• SNP from blood
• Age: 70.7‐83.6
• 1153 male samples
• Association: LOY (loss of chromosome Y)


A review of some basic biology may help us get the picture.


A Set of Human Chromosomes
Ref: http://en.wikipedia.org/wiki/Human_genome


Sex Chromosomes

Ref: http://phylogenous.wordpress.com/2010/07/19/y‐chromosome‐ii‐what‐is‐even‐on‐it/


Data source: Ensembl genome browser release 68, July 2012

Chromosomes 1 – 22: autosomal chromosomes
Chromosomes X and Y: allosomal chromosomes (sex‐determining chromosomes)
“Putative protein” = protein predicted from sequencing data or other information.
“Confirmed protein” = protein actually observed.
“Length” = bp × 0.34 nanometers (the distance between adjacent base pairs).
“Variations” = unique DNA sequence changes identified so far by Ensembl (Jul 2012); this number will rise as more personal genomes are examined.
“miRNA” = microRNA, functions as a post‐transcriptional regulator of gene expression.
“rRNA” = ribosomal RNA, critical in the synthesis of proteins.
“snRNA” = small nuclear RNA, regulating transcription factors.
“snoRNA” = small nucleolar RNA, primarily functions in guiding chemical modifications to other RNA molecules.
“pseudogenes” = inactive copies of protein‐coding genes, often generated by gene duplication. Gene duplication is one step by which new genetic material is added.
“ncRNA” = noncoding RNA, with roles as epigenetic elements in protein synthesis and RNA processing, e.g. tRNA, rRNA, miRNA and others.
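The “Length” rule can be worked through for the Y chromosome. A minimal sketch: the ~58 million bp figure is the one quoted for the Y chromosome in this deck, and the function name is invented for illustration.

```python
NM_PER_BP = 0.34  # distance between adjacent base pairs, in nanometers

def dna_length_mm(base_pairs: int) -> float:
    """Stretched-out length of a DNA molecule in millimetres."""
    return base_pairs * NM_PER_BP * 1e-6  # 1 mm = 1e6 nm

# ~58 million bp for the human Y chromosome gives roughly 19.7 mm.
y_length_mm = dna_length_mm(58_000_000)
```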


Human Y Chromosome
• ~58 million base pairs, 2% of total DNA
• 200+ genes, at least 72 of them protein coding
• 5% of the Y chromosome is PAR1 and PAR2, which can recombine with their homologues on X

• 95% of the Y chromosome is NRY, the non‐recombining region of the Y chromosome.

• SNPs from the NRY are used to trace direct paternal ancestral lines.

Ref: "Ensembl Human MapView release 43". February 2014. Retrieved 2007‐04‐14.


Y chromosome
• The DNA in the human Y chromosome is composed of about 59 million base pairs.

• The Y chromosome is passed only from father to son. Mitochondrial DNA is inherited only from the mother, by sons and daughters alike, so only daughters pass it on.

• With a 30% difference between humans and chimpanzees, the Y chromosome is one of the fastest‐evolving parts of the human genome.

• To date, over 200 Y‐linked genes have been identified. All Y‐linked genes are expressed and (apart from duplicated genes) hemizygous (present on only one chromosome), except in cases of aneuploidy such as Klinefelter's syndrome (47,XXY) or XXYY syndrome.

http://en.wikipedia.org/wiki/Y_chromosome


Evolution of Chromosomes X and Y

XCR/YCR: X or Y conserved region
SRY: sex‐determining region Y
XAR/YAR: X or Y added region
PAR: pseudoautosomal region
MSY: male‐specific region

Ref: Linda Hellborg, 2004, “Evolutionary Studies of the Mammalian Y Chromosome”, http://www.diva-portal.org/smash/get/diva2:164238/FULLTEXT01.pdf

Time axis: million years


Fundamental Tasks of GWAS

Association between phenotype and genomic elements?

Phenotype:
• Diseases
• Traits of interest
• Behavior

Genomic elements:
• DNA
• RNA
• Epigenetic markers

DNA (sequencing based)

[Figure: paired strands AAGCTC / AAGCTC and AAGGTC / AAGCTC, labeled “SNP” and “de novo mutation”]

Levels of analysis:
1. Single allele based
2. Genotype based
3. Haplotype based

Sequence fragment: AGACCTT vs AGAGCTA

SNP   Alleles  Genotype        Additive  Dominant  Recessive
SNP1  C, G     CG (minor: C)   1         1         0
SNP2  A, T     AA (minor: T)   0         0         0
SNP2  A, T     TT (minor: T)   2         1         2

Haplotype / HG (SNP1 + SNP2, on the same strand): GT vs GA; CT vs CA

Glossary:

A mutation is defined as any change in a DNA sequence away from normal. This implies there is a normal allele that is prevalent in the population and that the mutation changes this to a rare and abnormal variant.

A polymorphism is a DNA sequence variation that is common in the population. In this case no single allele is regarded as the standard sequence. Instead there are two or more equally acceptable alternatives.

The arbitrary cut‐off point between a mutation and a polymorphism is 1 per cent.

Ref:Wellcome Trust at http://genome.wellcome.ac.uk/doc_WTD020780.html
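The 1 per cent cut‐off in the glossary can be written as a one‐line rule. A minimal sketch: the function name is invented, and treating a frequency of exactly 1% as a polymorphism is an assumption, since the glossary only calls the cut‐off arbitrary.

```python
def classify_variant(minor_allele_frequency: float, cutoff: float = 0.01) -> str:
    """Label a variant by the population frequency of its rarer allele."""
    if not 0.0 <= minor_allele_frequency <= 0.5:
        raise ValueError("a minor allele frequency lies between 0 and 0.5")
    # Assumption: the boundary value itself counts as a polymorphism.
    return "polymorphism" if minor_allele_frequency >= cutoff else "mutation"

common = classify_variant(0.20)   # seen in 20% of chromosomes
rare = classify_variant(0.001)    # seen in 0.1% of chromosomes
```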



Ref: http://humupd.oxfordjournals.org/content/14/4/293/F1.expansion


Basic Biochemistry

http://novellaqalive2.mhhe.com/sites/dl/free/0071402357/156709/figure56_2.jpg

One strand of DNA forms one chromosome. It is inherited from one of the two parents.

The other strand of DNA comes from the other chromosome, i.e. from the other parent.

Here we see:
• Nucleotides, e.g. A, C, G and T.
• A DNA sequence is the sequential listing of A, C, G and T on one strand.

• The ordered listing of SNPs from one DNA strand is called haplotype.

• A genotype is a pair of nucleotides, e.g. AG or CC, at the same sequencing site (position) but from two different DNA strands of the two chromosomes.
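The haplotype and genotype definitions can be illustrated with the two fragments AGACCTT and AGAGCTA used elsewhere in this deck. This is a sketch under a simplifying assumption: every site where the two strands differ is treated as a SNP, and the helper names are invented.

```python
def snp_positions(strand_a: str, strand_b: str) -> list:
    """0-based sites where the two aligned strands differ."""
    return [i for i, (a, b) in enumerate(zip(strand_a, strand_b)) if a != b]

def haplotype(strand: str, positions: list) -> str:
    """Ordered listing of the SNP alleles on ONE strand."""
    return "".join(strand[i] for i in positions)

def genotypes(strand_a: str, strand_b: str, positions: list) -> list:
    """Pair of nucleotides at each SNP site, one from EACH strand."""
    return [strand_a[i] + strand_b[i] for i in positions]

strand_a, strand_b = "AGACCTT", "AGAGCTA"
sites = snp_positions(strand_a, strand_b)      # 4th and 7th bases differ
hap_a = haplotype(strand_a, sites)             # alleles along strand a
hap_b = haplotype(strand_b, sites)             # alleles along strand b
geno = genotypes(strand_a, strand_b, sites)    # one pair per SNP site
```

A haplotype reads along one strand, while a genotype reads across the two strands at a single site.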


https://www.msu.edu/course/isb/202/tsao/images/DNA_to_trait2.jpg


http://genome.crg.es/courses/Madrid04/exercises/ensembl/

Note: UPE and DPE (upstream and downstream promoter elements) are regulatory elements that influence gene transcription.


http://lecturer.ukdw.ac.id/dhira/BacterialStructure/Proteins.html

Building Blocks of Proteins: 20 common amino acids


Basic Biology: Cell types and divisions

Meiosis is germ cell division for reproduction (passed to offspring).

Mitosis is somatic cell division for growth (remains with self).

http://mscoppinsbio.wikispaces.com/file/view/meiosis‐and‐mitosis‐comparison.gif/129266029/527x477/meiosis‐and‐mitosis‐comparison.gif


Glossary: Chromosome Banding Pattern and Nomenclature
according to the International System for Cytogenetic Nomenclature (ISCN)

Ref: http://www.nature.com/scitable/topicpage/Chromosome‐Mapping‐Idiograms‐302

Figure labels: arm, region, band, sub‐band, centromere, major Giemsa‐staining band, additional bands, individual area

Example
Gene “NLGN4X” was reported located at
• Xp22.33, from 6,147,085 to 5,758,678
Gene “NLGN4Y” was reported located at
• Yq11.221, from 16,634,488 to 16,957,538

Ref: AceView, “http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=human&term=NLGN4X&submit=Go”
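A band label such as Xp22.33 can be split into its parts. A minimal sketch, assuming the common layout chromosome, arm, one region digit, one band digit and optional sub‐band digits after the dot; `BandLocation` and `parse_band` are names invented for illustration.

```python
import re
from typing import NamedTuple, Optional

class BandLocation(NamedTuple):
    chromosome: str
    arm: str                 # 'p' = short arm, 'q' = long arm
    region: int
    band: int
    sub_band: Optional[str]  # digits after the dot, if any

_BAND = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")

def parse_band(label: str) -> BandLocation:
    """Split a cytogenetic band label into chromosome, arm, region, band."""
    match = _BAND.match(label)
    if match is None:
        raise ValueError(f"not a recognised band label: {label!r}")
    chrom, arm, region, band, sub = match.groups()
    return BandLocation(chrom, arm, int(region), int(band), sub)

nlgn4x = parse_band("Xp22.33")   # short arm of X, region 2, band 2
nlgn4y = parse_band("Yq11.221")  # long arm of Y, region 1, band 1
```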


Source: Doris Bachtrog and Brian Charlesworth (2001) “Minireview: Towards a complete sequence of the human Y chromosome”, Genome Biology, 2(5): reviews 1016.1‐1016.5


Y‐STR: short tandem repeat (STR) on the Y‐chromosome.

• Short tandem repeats (STRs) are repeating sequences of 2‐5 base pairs of DNA.

• Y‐STR Analysis: The likelihood of two people having the same number of repeated sequences is extremely small, and becomes even smaller the more regions that are analyzed. This makes up the basis of short tandem repeat analysis.
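The “number of repeated sequences” at one STR locus can be counted directly from a sequence. This is a sketch on a made‐up sequence with an assumed GATA motif, not a forensic pipeline; the function name is invented.

```python
import re

def max_tandem_repeats(sequence: str, motif: str) -> int:
    """Longest uninterrupted run of `motif`, counted in repeat units."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical locus: five GATA units, a 2 bp break, then two more units.
locus = "CC" + "GATA" * 5 + "TT" + "GATA" * 2
repeat_count = max_tandem_repeats(locus, "GATA")
```

Repeating the count over many loci gives the list of numbers that is compared between individuals.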


Y‐Chromosome STRs: important genetic markers on Y chromosome

Source: http://www.cstl.nist.gov/div831/strbase/ystr_fact.htm


Example from FamilyTreeDNA

Source: https://www.familytreedna.com/public/Haplogroup_A/default.aspx?section=yresults


Y‐STR positions

Ref: http://www.cstl.nist.gov/div831/strbase/ystrpos1.htm


Haplogroup
• A haplogroup is a group of similar haplotypes that share a common ancestor having the same single nucleotide polymorphism (SNP) mutation in all haplotypes.

• It is possible to predict a haplogroup from haplotypes. An SNP test confirms a haplogroup. 

• Haplogroups are assigned letters of the alphabet, and refinements consist of additional number and letter combinations, for example R1b1. 

• Y‐chromosome and mitochondrial DNA haplogroups have different haplogroup designations. 

• Haplogroups pertain to deep ancestral origins dating back thousands of years.

• The Y‐Chromosome Consortium (YCC) naming system labels the major Y‐DNA haplogroups A through T.

http://en.wikipedia.org/wiki/Haplogroup


http://dnaconsultants.com/images/links/49‐conversion.pdf


Y‐chromosome DNA haplogroups and evolutionary tree

http://en.m.wikipedia.org/wiki/Haplogroup


Glossary: Y‐chromosomal Adam
• In human genetics, Y‐chromosomal Adam (Y‐MRCA) is a hypothetical name given to the most recent common ancestor (MRCA) from whom all currently living people are descended patrilineally (tracing back only along the paternal or male lines of their family tree). However, the title is not permanently fixed on a single individual.

http://en.wikipedia.org/wiki/Y‐chromosomal_Adam


Glossary: Mitochondrial Eve
• Mitochondrial Eve refers to the matrilineal most recent common ancestor (MRCA) of all currently living anatomically modern humans, who is estimated to have lived approximately 100,000–200,000 years ago.

http://en.wikipedia.org/wiki/Mitochondrial_Eve

Figure (MtDNA‐MRCA‐generations‐Evolution): Through random drift or selection, a lineage will trace back to a single person. In this example over 5 generations, the colors represent extinct matrilineal lines and black the matrilineal line descended from the MRCA.
Ref: http://en.wikipedia.org/wiki/Most_recent_common_ancestor
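Tracing maternal lines back to a matrilineal MRCA can be sketched on a toy pedigree. The names, the `mother_of` mapping and both functions are hypothetical; the same walk along a `father_of` mapping would give the patrilineal MRCA.

```python
def maternal_line(person: str, mother_of: dict) -> list:
    """The person, then mother, grandmother, ... as far as recorded."""
    line = [person]
    while line[-1] in mother_of:
        line.append(mother_of[line[-1]])
    return line

def matrilineal_mrca(living: list, mother_of: dict):
    """Most recent woman found on every living person's maternal line."""
    lines = [maternal_line(p, mother_of) for p in living]
    shared = set(lines[0]).intersection(*map(set, lines[1:]))
    # Walking up one line, the first shared name is the most recent one.
    return next((a for a in lines[0] if a in shared), None)

# Hypothetical pedigree: child -> mother.
mother_of = {"Ann": "Eve", "Bea": "Eve",
             "Cat": "Ann", "Dee": "Ann", "Fay": "Bea", "Gus": "Bea"}
mrca_all = matrilineal_mrca(["Cat", "Dee", "Fay", "Gus"], mother_of)
mrca_sibs = matrilineal_mrca(["Cat", "Dee"], mother_of)
```

The second call shows why the title is not fixed on one individual: when the set of living people changes, the MRCA can move to a more recent ancestor.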


Y‐chromosome DNA haplogroups and markers 

http://www.phylotree.org/Y/tree/index.htm


Y‐chromosome DNA haplogroups and markers 

Source: http://www.phylotree.org/Y/marker_list.htm


Source: http://www.yfull.com/tree/CT/
Based on ISOGG v9.29 at 2 March 2014 and YFull Experimental YTree v2.25 at 14 October 2014; © 1000 Genomes Project © HGDP Project


STR and SNP
• STR testing usually yields a “number” for each marker, which is the count of repeats in the DNA segment. The pattern of these counts is called a haplotype.

• Introductory‐level STR testing includes 37‐marker and 67‐marker tests. Advanced‐level testing covers 111 or more markers.

• SNP testing usually reports a result of “‐” or “+”, representing the absence or presence of a mutation. Results of SNP testing determine a person’s haplogroup.

• A SNP‐based haplogroup correlates with an STR‐based haplotype in terms of tracing the evolutionary path.

http://dgmweb.net/DNA/General/DNA‐Haplo‐text‐page.html
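Two STR haplotypes can be compared marker by marker. A minimal sketch: the repeat counts are made up, and summing absolute differences in repeat count is one simple distance chosen for illustration, not necessarily what any testing company reports.

```python
def genetic_distance(hap_a: dict, hap_b: dict) -> int:
    """Sum of absolute repeat-count differences over the shared markers."""
    shared = hap_a.keys() & hap_b.keys()
    return sum(abs(hap_a[m] - hap_b[m]) for m in shared)

# Hypothetical repeat counts at four Y-STR markers for two men.
man_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
man_2 = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10}
distance = genetic_distance(man_1, man_2)  # differs by one repeat at two markers
```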


Schematic Representation of Formation of Haplogroups on Human Y‐chromosome

Figure labels: MRCA‐A, MRCA‐Aa, HG‐A1, HG‐A1a, HG‐A1B...n

“All men carrying mutation A form a single haplogroup, and all … mutation B are part of this haplogroup…”
“… each mutation defines a set of specific Y chromosomes called a haplogroup.”
(http://en.wikipedia.org/wiki/Haplogroup)
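The nesting of haplogroups by successive mutations can be sketched as a small tree lookup. Everything named here (the tree, the mutation labels `mutA` to `mutD`, the function names) is hypothetical and only mirrors the idea that each mutation defines a haplogroup inside its parent.

```python
from typing import Optional

# Hypothetical tree: haplogroup -> (parent haplogroup, defining mutation).
TREE = {
    "A":   (None, "mutA"),
    "A1":  ("A",  "mutB"),
    "A1a": ("A1", "mutC"),
    "A2":  ("A",  "mutD"),
}

def lineage(haplogroup: str, tree: dict) -> list:
    """The haplogroup followed by all of its ancestors up to the root."""
    path = []
    while haplogroup is not None:
        path.append(haplogroup)
        haplogroup = tree[haplogroup][0]
    return path

def assign_haplogroup(derived: set, tree: dict = TREE) -> Optional[str]:
    """Deepest haplogroup whose whole chain of mutations is present."""
    supported = [hg for hg in tree
                 if all(tree[h][1] in derived for h in lineage(hg, tree))]
    return max(supported, key=lambda hg: len(lineage(hg, tree)), default=None)

hg_1 = assign_haplogroup({"mutA", "mutB"})          # carries A and B
hg_2 = assign_haplogroup({"mutA", "mutB", "mutC"})  # also carries C
```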


Public Databases

http://www.cstl.nist.gov/div831/strbase/y_strs.htm


Public Databases

http://www.usystrdatabase.org/


A simple practical case

• 1481 SNPs (17 homologous genes on X)
• 480 SNPs (34 genes on Y only)
• 263 SNPs (PAR1 and PAR2, the recombining regions present on both X and Y)

• Discovery group: male (white) cohort: 883 cases and 445 controls

• Contrast group: female (white) cohort: 526 cases and 1073 controls

• Contrast‐exploratory cohorts: male (AA) 428:169, female (AA) 253:339.

• PLINK model: AD ~ age + [c1 … c10] (principal components from PCA)
• PCA was performed only on Chr 1‐22, using EIGENSTRAT
• Allelic frequency comparison
• Logistic regression

• Multiple‐testing correction used an adjusted Bonferroni method in SNPSpD
• The corrected significance levels were set at 1.3 × 10^-4 in White and 1.0 × 10^-4 for AA.

• Power analysis used the pwr package in R.
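The link between a Bonferroni‐type threshold and the number of tests can be checked with simple arithmetic. This is a sketch only: a family‐wise alpha of 0.05 is an assumption (the slide does not state it), and the effective numbers of tests are back‐calculated here rather than taken from SNPSpD output.

```python
def bonferroni_threshold(alpha: float, n_effective_tests: float) -> float:
    """Per-test significance level after a Bonferroni-type correction."""
    return alpha / n_effective_tests

def implied_effective_tests(alpha: float, threshold: float) -> float:
    """Number of effective tests that would produce a given threshold."""
    return alpha / threshold

ALPHA = 0.05  # assumed family-wise error rate
m_white = implied_effective_tests(ALPHA, 1.3e-4)  # about 385 effective tests
m_aa = implied_effective_tests(ALPHA, 1.0e-4)     # about 500 effective tests
```

Under that assumption, the two reported thresholds would correspond to roughly 385 and 500 effectively independent tests.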


Non‐sex Chromosomes Processing in PLINK


2 Pseudoautosomal Regions

The locations of PAR1 and PAR2 on GRCh38 are:

Chr  PAR1              PAR2
Y    10,000‐2,781,479  56,887,902‐57,217,415
X    10,000‐2,781,479  155,701,382‐156,030,895

Ref:
• http://genome-euro.ucsc.edu/cgi-bin/hgGateway
• http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/
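With these coordinates, a position on X or Y can be labeled as PAR1, PAR2 or neither. A minimal sketch: the table name and function are invented, and treating both interval ends as inclusive is an assumption.

```python
# GRCh38 pseudoautosomal regions, as listed on this slide (start, end).
PAR_GRCH38 = {
    "Y": {"PAR1": (10_000, 2_781_479), "PAR2": (56_887_902, 57_217_415)},
    "X": {"PAR1": (10_000, 2_781_479), "PAR2": (155_701_382, 156_030_895)},
}

def classify_position(chrom: str, position: int) -> str:
    """Return 'PAR1', 'PAR2' or 'non-PAR' for a position on X or Y."""
    for name, (start, end) in PAR_GRCH38[chrom].items():
        if start <= position <= end:  # assumption: inclusive interval ends
            return name
    return "non-PAR"

label_1 = classify_position("Y", 1_000_000)
label_2 = classify_position("Y", 57_000_000)
label_3 = classify_position("X", 57_000_000)
```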


Example SNP Data:

Data
GSGT Version     1.8.4
Processing Date  8/26/2011 2:59 PM
Content          HumanOmniExpress-12v1_H.bpm
Num SNPs         730,525
Total SNPs       730,525
Num Samples      1,728
Total Samples    1,728

SNPs in allosomal chromosomes
Chr      Counts  Starting / Ending
-----------------------------------------------------------------
23 (X)   18,055  2,700,157 - 154,916,845
24 (Y)   1,409   2,655,180 - 59,032,197
25 (XY)  473     62,321 - 2,697,868
                 154,939,018 - 155,236,747
-----------------------------------------------------------------

The location of PAR1 and PAR2 on GRCh38
Chr  PAR1              PAR2
--------------------------------------------------
Y    10,000-2,781,479  56,887,902-57,217,415
X    10,000-2,781,479  155,701,382-156,030,895

Chr XY data
no     Chr  SNP         BP
1      25   rs28503286  62,321
2      25   rs6423165   169,805
(data omitted)
380    25   rs17808254  2,642,269
381    25   rs312259    2,658,299
382    25   rs312257    2,658,698
383    25   rs35193207  2,665,647
(data omitted)
394    25   rs2534634   2,697,868
395    25   rs28814596  154,939,018
(data omitted)
473    25   rs2981828   155,236,747

Chr Y data
no     Chr  SNP         BP
1      24   rs11575897  2,655,180
2      24   rs2534636   2,657,176
3      24   rs35840667  2,661,306
4      24   rs2253109   2,661,694
5      24   rs35067692  2,663,685
6      24   rs2058276   2,668,456
7      24   rs13303871  2,679,100
8      24   rs28813670  2,703,038
9      24   rs11799203  2,705,854
(data omitted)
23     24   rs13304723  2,810,628
24     24   rs13303695  2,810,629
25     24   rs11799194  2,818,883
(data omitted)
1403   24   rs9785925   28,792,268
(data omitted)
1407   24   rs9786224   28,817,458
1408   24   rs9786720   58,883,690
1409   24   rs28715603  59,032,197

Chr X data
no     Chr  SNP         BP
1      23   rs5939319   2,700,157
2      23   rs1419931   2,703,633
(data omitted)
16     23   rs1905995   2,778,526
17     23   rs17330993  2,779,749
18     23   rs5939137   2,783,555
19     23   rs5939382   2,787,455
20     23   rs5982603   2,788,707
21     23   rs5939384   2,789,848
22     23   rs12011788  2,792,160
23     23   rs6567674   2,794,461
24     23   rs211665    2,802,368
25     23   rs211666    2,802,568
26     23   rs11796830  2,811,185
(data omitted)
3912   23   rs10126493  28,762,640
(data omitted)
3916   23   rs7888309   28,789,883
(data omitted)
3920   23   rs4633214   28,823,041
3921   23   rs1384575   28,841,156
(data omitted)
7614   23   rs4384157   58,483,247
7615   23   rs9778500   61,694,576
(data omitted)
18041  23   rs28582039  154,668,682
(data omitted)
18055  23   rs669237    154,916,845
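A per‐chromosome summary like “SNPs in allosomal chromosomes” can be produced from (Chr, SNP, BP) rows. A minimal sketch using a few rows from the listing: the function name is invented, and the code‐to‐label table follows the usual PLINK numeric convention (23 = X, 24 = Y, 25 = XY, 26 = MT).

```python
PLINK_CODES = {23: "X", 24: "Y", 25: "XY", 26: "MT"}

def summarize(rows):
    """Count SNPs and track the first and last position per chromosome."""
    summary = {}
    for code, _snp, bp in rows:
        label = PLINK_CODES.get(code, str(code))
        count, start, end = summary.get(label, (0, bp, bp))
        summary[label] = (count + 1, min(start, bp), max(end, bp))
    return summary

# A few rows taken from the example listing (Chr, SNP, BP).
rows = [
    (25, "rs28503286", 62_321), (25, "rs6423165", 169_805),
    (24, "rs11575897", 2_655_180), (24, "rs2534636", 2_657_176),
    (23, "rs5939319", 2_700_157), (23, "rs1419931", 2_703_633),
]
table = summarize(rows)
```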


Haplogroup Markers (SNPs) matching

http://www.phylotree.org/Y/marker_list.htm

List of markers included in the minimal reference tree (partial). Positions are relative to GRCh37.

Marker     Alias(es)         rsSNP ID     Position   Mutation  Major clade
Z142       S211              ‐            1,519,693  A>G       R
M236                         ‐            2,649,696  G>C       B
M176       SRY465; Page63    rs11575897   2,655,180  G>A       O
SRY10831   SRY1532; Page65   rs2534636    2,657,176  T>C; C>T  A4=BCDEF; R
CTS6       Z2469; S3403      ‐            2,657,349  T>C       R
M6529                        rs113717750  2,657,411  G>A       B
M177       SRY9138           rs369688174  2,658,869  C>T       M
M40        SRY4064; SRY8299  rs9786608    2,663,943  C>T       E
Z5865                        ‐            2,667,783  C>T       H
Z1828      CTS15             ‐            2,669,716  C>T       J
Z2220                        ‐            2,687,198  G>C       J
Z161                         ‐            2,696,497  C>G       I
P305                         rs72625368   2,710,154  G>A       A1'2'3'4
M7104                        rs112887592  2,711,408  C>G       B
CTS94                        ‐            2,726,402  T>G       D
M130       RPS4Y711; Page51  rs35284970   2,734,854  C>T       C
M8495                        rs368790958  2,736,732  T>G       B
M386       CTS117            ‐            2,738,986  C>T       C
M4                           rs3895       2,744,628  A>G       M
M406       PF3285            ‐            2,749,995  T>G       G
M410                         rs371079691  2,751,678  A>G       J
Z5857                        ‐            2,759,285  C>G       H
L1085                        ‐            2,790,726  T>C       A0'1'2'3'4
V244                         rs112298449  2,798,459  C>T       B
F11        KL4               rs17276393   2,815,303  C>G       O
M324                         rs13447361   2,821,786  G>C       O
L1086                        ‐            2,826,312  A>T       A00
P201       JST021354         rs2267801    2,828,196  T>C       O
JST021355                    rs2267802    2,828,425  A>G       D
DF13       S521; CTS241      rs373989227  2,836,431  A>C       R
M7583                        rs374589940  2,844,095  G>T       B
Z5899                        ‐            2,852,640  T>C       C
Z5867                        ‐            2,866,232  C>T       H
M347                         ‐            2,877,479  A>G       C
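Matching array SNPs against the haplogroup marker list can be sketched as a lookup by rsSNP ID. The rows below are copied from the two listings in this deck; requiring the position to agree as well is an assumption of this sketch and only makes sense if both listings use the same reference build.

```python
# (marker, rsSNP ID or None, GRCh37 position, mutation, major clade)
MARKERS = [
    ("M236", None, 2_649_696, "G>C", "B"),
    ("M176", "rs11575897", 2_655_180, "G>A", "O"),
    ("SRY10831", "rs2534636", 2_657_176, "T>C; C>T", "A4=BCDEF; R"),
    ("M6529", "rs113717750", 2_657_411, "G>A", "B"),
]

# First three chromosome Y SNPs from the example array listing: rsID -> BP.
ARRAY_SNPS = {"rs11575897": 2_655_180,
              "rs2534636": 2_657_176,
              "rs35840667": 2_661_306}

def match_markers(array_snps: dict, markers: list) -> list:
    """Markers whose rsSNP ID is on the array at the same position."""
    return [(name, rs, clade)
            for name, rs, position, _mutation, clade in markers
            if rs in array_snps and array_snps[rs] == position]

matched = match_markers(ARRAY_SNPS, MARKERS)
```

Markers without an rsSNP ID, such as M236 here, cannot be matched this way and would need a position‐based lookup instead.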


Distribution of 1409 SNPs along Y Chr

1409 SNPs 

91 SNPs (of 1409) that matched HG listing.


rs1558843 (M306)


FID IID PID MID Sex Pheno 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
WEI0002 WEI0002 0 0 1 2 G G G G G G A A 0 0 A A G G A A G G G G C C A A A A G G G G G G G G C C A A 0 0 A A G G G G A A G G A A G G
WEI0004 WEI0004 0 0 1 2 G G G G G G A A G A G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0005 WEI0005 0 0 1 2 G G G G G G A A 0 0 G G G G A A A A G G C C A A A A G G G G G G G G C C A A 0 0 A A G G G G A A G G A A G G
WEI0007 WEI0007 0 0 1 2 G G G G G G A A G A G G G G A A A A G G C C A A C C G G G G A A G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0010 WEI0010 0 0 1 2 G G G G G G A A 0 0 G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0011 WEI0011 0 0 1 2 G G G G G G A A G A G G G G A A A A G G C C A A C C G G G G A A G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0012 WEI0012 0 0 1 2 G G G G G G A A G A G G G G A A A A G G C C A A C C G G G G A A G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0014 WEI0014 0 0 1 2 G G G G G G A A 0 0 G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0015 WEI0015 0 0 1 2 G G G G G G A A 0 0 G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0016 WEI0016 0 0 1 2 G G G G G G A A 0 0 G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0017 WEI0017 0 0 1 2 G G G G G G A A G A G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0018 WEI0018 0 0 1 2 G G G G G G A A G A G G G G A A A A G G C C A A C C G G G G A A G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0019 WEI0019 0 0 1 2 G G G G G G A A 0 0 G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0023 WEI0023 0 0 1 2 G G G G G G A A 0 0 G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0027 WEI0027 0 0 1 2 G G A A G G A A G A G G G G A A A A G G C C A A C C G G A A G G G G C C A A 0 0 A A G G G G A A G G A A G G
WEI0028 WEI0028 0 0 1 2 G G G G G G A A G A G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0029 WEI0029 0 0 1 2 G G G G G G A A 0 0 G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0030 WEI0030 0 0 1 2 G G G G G G A A 0 0 G G G G A A A A G G C C A A C C G G G G A A G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0031 WEI0031 0 0 1 2 G G G G G G A A G A G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G
WEI0032 WEI0032 0 0 1 2 G G G G G G A A G A G G G G A A A A G G C C A A C C G G G G G G G G A A A A 0 0 A A G G G G A A G G A A G G

WEI0002 GGGAXAGAGGCAAGGGGCAXAGGAGAGAAAGGAGGAAAGGGAAAAAGGGGGAGGGAGTGAAAAACGCAGGAGAGGAGGAGCAAGCGGAGAA
WEI0004 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0005 GGGAXGGAAGCAAGGGGCAXAGGAGAGAAAGGAAGAAAGGGAAAAAGAGGGAGGGAGTGAAAAACGCAGGAAAGGAGGAGCAAGCGGAGAA
WEI0007 GGGAXGGAAGCACGGAGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0010 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGAAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0011 GGGAXGGAAGCACGGAGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0012 GGGAXGGAAGCACGGAGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0014 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0015 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0016 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0017 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0018 GGGAXGGAAGCACGGAGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0019 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0023 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0027 GAGAXGGAAGCACGAGGCAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAAAAACGCAGGAGAGGAGGAGCAAGAGGAGTC
WEI0028 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0029 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0030 GGGAXGGAAGCACGGAGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGXCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0031 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC
WEI0032 GGGAXGGAAGCACGGGGAAXAGGAGAGAAAGGAGGAAAGGGAAACAGAGGGAGAGAGTGAGGAAGGCCAGAGAGGAGGAGCAAGAGGAGAC

Genotype data in diploid (PLINK) format

Genotype data in haploid format
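The two formats shown on this slide appear to be related by a simple collapse: comparing the rows, a homozygous pair such as "G G" becomes "G", while both missing ("0 0") and heterozygous ("G A") pairs become "X". The slides do not state this rule; the sketch below assumes it, and the function name `diploid_to_haploid` is hypothetical.

```python
def diploid_to_haploid(alleles, missing="0", flag="X"):
    """Collapse PLINK-style allele pairs into one character per SNP.

    Assumed rule (inferred from the slide, not stated on it):
    homozygous non-missing pair -> that allele; anything else -> flag.
    """
    if len(alleles) % 2 != 0:
        raise ValueError("expected an even number of allele tokens")
    calls = []
    for a1, a2 in zip(alleles[0::2], alleles[1::2]):
        if a1 == a2 and a1 != missing:
            calls.append(a1)      # clean haploid call
        else:
            calls.append(flag)    # missing or heterozygous
    return "".join(calls)


# First 13 SNPs of sample WEI0004 from the diploid table.
row = ("WEI0004 WEI0004 0 0 1 2 "
       "G G G G G G A A G A G G G G A A A A G G C C A A C C").split()
sample_id, genotypes = row[1], row[6:]   # first six PED fields are metadata
print(sample_id, diploid_to_haploid(genotypes))
```

For this row the result starts the same way as the WEI0004 line in the haploid table, which is what suggests the rule.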


Counting “singletons”
Total 881 “singletons” in 1409 SNPs of 1302 samples (batch 1 data set).

singletons without "X"
No  SNP_pos  label  Allele:Counts  Total (size)
1   7     <S1>  <G:1302>  <sum=1302>
2   40    <S1>  <A:1302>  <sum=1302>
3   185   <S1>  <G:1302>  <sum=1302>
4   249   <S1>  <G:1302>  <sum=1302>
5   266   <S1>  <A:1302>  <sum=1302>
6   270   <S1>  <A:1302>  <sum=1302>
7   271   <S1>  <G:1302>  <sum=1302>
8   282   <S1>  <A:1302>  <sum=1302>
9   361   <S1>  <A:1302>  <sum=1302>
(omitted)
38  1139  <S1>  <A:1302>  <sum=1302>
39  1212  <S1>  <G:1302>  <sum=1302>
40  1236  <S1>  <A:1302>  <sum=1302>
41  1261  <S1>  <G:1302>  <sum=1302>
42  1312  <S1>  <C:1302>  <sum=1302>
43  1347  <S1>  <G:1302>  <sum=1302>
44  1374  <S1>  <G:1302>  <sum=1302>
45  1389  <S1>  <A:1302>  <sum=1302>
46  1393  <S1>  <G:1302>  <sum=1302>

singletons with "X"
No   SNP_pos  label  Allele:Count  X:Count  Total (size)
1    1     <SX>  <G:1293>  <X:9>   <sum=1302>
2    3     <SX>  <A:1291>  <X:11>  <sum=1302>
3    4     <SX>  <G:1294>  <X:8>   <sum=1302>
4    5     <SX>  <A:1292>  <X:10>  <sum=1302>
5    8     <SX>  <C:1294>  <X:8>   <sum=1302>
6    9     <SX>  <G:1292>  <X:10>  <sum=1302>
7    10    <SX>  <A:1292>  <X:10>  <sum=1302>
8    11    <SX>  <G:1292>  <X:10>  <sum=1302>
9    13    <SX>  <C:1294>  <X:8>   <sum=1302>
10   14    <SX>  <G:1293>  <X:9>   <sum=1302>
11   17    <SX>  <G:1294>  <X:8>   <sum=1302>
12   19    <SX>  <A:1292>  <X:10>  <sum=1302>
13   20    <SX>  <T:1294>  <X:8>   <sum=1302>
(omitted)
831  1405  <SX>  <G:1294>  <X:8>   <sum=1302>
832  1406  <SX>  <A:1294>  <X:8>   <sum=1302>
833  1407  <SX>  <C:1294>  <X:8>   <sum=1302>
834  1408  <SX>  <A:1294>  <X:8>   <sum=1302>
835  1409  <SX>  <G:1268>  <X:34>  <sum=1302>
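On this slide a “singleton” appears to mean a SNP position where only one real allele is observed across all samples, with the label <S1> when no "X" calls are present and <SX> when some are. The following is a minimal sketch of that tally under this reading; `classify_site` and the toy haplotype strings are illustrative and not taken from the study data.

```python
from collections import Counter


def classify_site(column, flag="X"):
    """Label one SNP column: 'S1' (one allele, no X), 'SX' (one allele plus X),
    or None when the site is not a singleton in the slide's sense."""
    counts = Counter(column)
    alleles = [a for a in counts if a != flag]
    if len(alleles) != 1:
        return None, counts           # polymorphic, or nothing but X
    label = "SX" if counts[flag] > 0 else "S1"
    return label, counts


# Toy haploid strings, one per sample (hypothetical data).
haplotypes = ["GAXA", "GAGA", "GCGA", "GAGA"]

# zip(*haplotypes) walks the data column by column, i.e. SNP by SNP.
for pos, column in enumerate(zip(*haplotypes), start=1):
    label, counts = classify_site(column)
    print(pos, label, dict(counts))
```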


• Get GWAS Data Set (370K SNPs)
  – Case: diseased
  – Control: separately sourced non‐diseased samples

• Examining Sex Effect on a sub‐phenotype
  – Case: diseased with e.g. HPV
  – Control: diseased without HPV

• Focus on SNPs on Y Chromosome
  – Extracting SNPs using HaploGroupMarkers List

• Checking Data
  – Remove females
  – Remove heterozygous genotypes

• Preliminary Analysis (PLINK)
  – Logistic regression, adjusted by age or not

• Narrow Down to a small set of SNPs
  – Allelic frequency comparison

Overall data set

                  male   female  Total
non‐diseased      2483   2024    4507
diseased          1685    500    2185
Total             4168   2524    6692
p‐value < 0.0001

                  male   female  Total
diseased others    712    338    1050
diseased HPV       973    162    1135
Total             1685    500    2185
p‐value < 0.0001

                  male   female  Total
diseased others   1386    408    1794
diseased NHPV      299     92     391
Total             1685    500    2185
p‐value 0.7371

Sub data set ‐ batch 1

                  male   female  Total
non‐diseased       976    566    1542
diseased           899    255    1154
Total             1875    821    2696
p‐value < 0.0001

                  male   female  Total
diseased others    343    171     514
diseased HPV       556     84     640
Total              899    255    1154
p‐value < 0.0001

                  male   female  Total
diseased others    758    205     963
diseased NHPV      141     50     191
Total              899    255    1154
p‐value 0.1369
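The slides do not name the test behind the p‐values in these sex-by-status tables. As a sketch, assuming a Pearson chi-square test on each 2x2 table with one degree of freedom and no continuity correction, the calculation can be written with the standard library alone; `chi_square_2x2` is an illustrative helper name.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df, via the normal tail.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Overall data set: "diseased others" vs "diseased NHPV", male/female counts.
chi2, p = chi_square_2x2(1386, 408, 299, 92)
print(f"chi2 = {chi2:.4f}, p = {p:.4f}")
```

Under that assumption the result for this table comes out close to the 0.7371 reported on the slide, which suggests, but does not prove, that an uncorrected chi-square test was used.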

Logistic regression: "sex" = age, HPV (or NHPV)

Batch 1 data set, with diseased and non‐diseased

HPV model: 2182 (1532) samples were used
        p‐value   OR      95% CI
age     <.0001    0.976   0.969  0.984
hpv     <.0001    0.276   0.214  0.356

NHPV model: 1733 (1117) samples were used
        p‐value   OR      95% CI
age     <.0001    0.973   0.965  0.981
nhpv    0.0987    0.747   0.528  1.056

Batch 1 data set, with diseased only

HPV model: 1154 (899 males) samples were used
        p‐value   OR      95% CI
age     0.246     0.992   0.98   1.005
hpv1    <.0001    0.295   0.219  0.397

NHPV model: 1154 (899 males) samples were used
        p‐value   OR      95% CI
age     0.9295    0.999   0.987  1.012
nhpv1   0.138     1.314   0.916  1.885

Result: sex may be a factor in HPV status?
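For intuition about the odds ratios reported by the regression, an unadjusted odds ratio with a Wald 95% confidence interval can be computed directly from a 2x2 table. The sketch below uses the batch 1 diseased-only counts (HPV: 556 males, 84 females; others: 343 males, 171 females) and assumes that "female" is the event being modelled, which the slides do not state. It ignores the age adjustment, so it is not expected to reproduce the regression output exactly.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio for the table
           event  non-event
    exposed    a      b
    unexposed  c      d
    with a Wald confidence interval computed on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return odds_ratio, lower, upper


# Exposure = HPV, event = female (assumed coding).
or_, lo, hi = odds_ratio_wald(a=84, b=556, c=171, d=343)
print(f"OR = {or_:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The unadjusted value lands near the age-adjusted estimate on the slide (0.295, 0.219 to 0.397), consistent with age having little effect in the diseased-only model.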


Example Analysis (1)
• Retrieve the SNP list from the Haplogroup Marker list: 338 rsIDs in total.
• 91 of the 338 rsIDs were matched in the data; 1154 (batch 1) and 1031 (batch 2) samples were available.
• Check the data for heterozygous genotypes as a basic QA step.

– 39 (batch 1) and 47 (batch 2) of the 91 SNPs had heterozygous calls.

• To take advantage of PLINK’s functions, run a preliminary analysis, e.g. logistic regression using PLINK’s “genotypic model” setting. Two traits, trait‐1 and trait‐2, were selected; for each trait, samples previously assigned to that trait were treated as cases and the rest as controls.

• Four SNPs were noted as significant (results listed here are from batch 1 only)

CHR  SNP        BP        A1  TEST  NMISS  OR     SE      L95    U95    STAT   P
24   rs895530   7963031   A   ADD   898    1.564  0.1562  1.152  2.124  2.864  0.004185
24   rs9785994  16222561  G   ADD   897    1.558  0.1562  1.147  2.117  2.841  0.004504
24   rs9785659  18248698  A   ADD   899    1.499  0.1458  1.126  1.995  2.775  0.005516
24   rs1558843  22750583  C   ADD   899    1.578  0.1556  1.163  2.14   2.93   0.003385

freq_b1_HN_male_cases
SNP        A1  A2  MAF     NCHROBS
rs895530   A   C   0.3051  898
rs9785994  G   A   0.3055  897
rs9785659  A   G   0.3871  899
rs1558843  C   A   0.3092  899

freq_b2_HN_male_cases
SNP        A1  A2  MAF     NCHROBS
rs895530   A   C   0.3092  786
rs9785994  G   A   0.3099  784
rs9785659  A   G   0.4084  786
rs1558843  C   A   0.3134  785

HaploGroup Marker Info
Marker Name  Alias(es)      rsSNP ID   Position (GRCh37)  Mutation
P295         PF5866; S8     rs895530   7,963,031          T>G
P331         M1221; PF5911  rs9785994  16,222,561         C>T
P311         S128           rs9785659  18,248,698         A>G
M306         S1             rs1558843  22,750,583         C>A
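The OR, SE, L95, U95, STAT and P columns in the regression output above are numerically consistent with a Wald test on the log odds ratio. The sketch below recomputes the last four from OR and SE for rs895530, assuming a normal approximation with a 1.96 critical value. It is a consistency check on the printed numbers, not a description of PLINK's internals, and `wald_summary` is an illustrative name.

```python
import math


def wald_summary(odds_ratio, se, z_crit=1.96):
    """Recompute confidence bounds, test statistic and two-sided p-value
    from an odds ratio and the standard error of its logarithm."""
    beta = math.log(odds_ratio)               # regression coefficient
    stat = beta / se                          # Wald statistic
    l95 = math.exp(beta - z_crit * se)
    u95 = math.exp(beta + z_crit * se)
    p = math.erfc(abs(stat) / math.sqrt(2))   # two-sided normal tail
    return l95, u95, stat, p


# rs895530 in batch 1: OR = 1.564, SE = 0.1562
l95, u95, stat, p = wald_summary(1.564, 0.1562)
print(f"L95={l95:.3f} U95={u95:.3f} STAT={stat:.3f} P={p:.6f}")
```

Small differences from the table in the last digit are expected, because the inputs here are already rounded.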


[Screenshots: map file, ped file, log file, PLINK command, and output]


Output from checking heterozygous SNPs in batch 1 and batch 2 data sets

Rows with a single heterozygote count had one of the two No_Het cells empty on the slide.

No_Het_b1 No_Het_b2 No CHR SNP bp A1 A2 MAF batch 1 A1 A2 MAF batch 2
snp 5 311 481 5 24 rs2071394 6736154 G A NA 0 G A 0 1
snp 6 11 6 24 rs9786139 6753519 A G 0.07675 899 A G 0.08015 786
snp 8 1 2 8 24 rs34555473 6893093 C A 0.001115 897 0 A 0 775
snp 11 4 10 11 24 rs16981290 7568568 A C 0 897 0 C 0 785
snp 12 3 3 12 24 rs7892855 7879415 C A 0.05575 879 C A 0.06041 778
snp 18 7 8 18 24 rs9786194 9170545 C A 0.3737 899 C A 0.3873 785
snp 20 2 20 24 rs35617575 14028148 C A 0.3548 31 0 0 NA 0
snp 22 2 3 22 24 rs16980473 14159846 A G 0.04672 899 A G 0.05089 786
snp 24 3 24 24 rs34043621 14486667 0 A 0 890 G A 0 779
snp 25 6 6 25 24 rs9786371 14636457 A G 0.001119 894 0 G 0 784
snp 29 2 3 29 24 rs2032597 14847792 C A 0.1752 896 C A 0.1802 777
snp 30 2 9 30 24 rs2032598 14850341 G A 0.003341 898 G A 0.003822 785
snp 31 5 4 31 24 rs2032601 14869076 0 G 0 898 0 G 0 784
snp 32 7 14 32 24 rs35285796 14871976 A G 0.004459 897 0 G 0 783
snp 33 2 7 33 24 rs2032600 14888783 C A 0.001117 895 0 A 0 780
snp 34 2 15 34 24 rs20320 14898163 A G 0.005574 897 A G 0.001277 783
snp 35 5 35 24 rs20321 14902414 A G 0.03782 899 A G 0.02554 783
snp 36 2 11 36 24 rs34442126 14922583 G A 0.003337 899 G A 0.001274 785
snp 37 1 4 37 24 rs2032603 14968527 G A 0.001112 899 0 A 0 784
snp 38 5 38 24 rs2032604 14969634 C A 0.03007 898 C A 0.03831 783
snp 41 1 4 41 24 rs8179021 15018582 A G 0.002225 899 A G 0.002554 783
snp 42 9 13 42 24 rs2032590 15019613 C A 0.001112 899 0 A 0 785
snp 43 2 7 43 24 rs9341290 15020578 0 A 0 897 0 A 0 783
snp 45 3 5 45 24 rs2032624 15026424 A C 0.3096 898 A C 0.3125 784
snp 46 1 1 46 24 rs2032668 15437333 0 A 0 899 0 A 0 786
snp 47 13 47 24 rs2032666 15437564 0 G 0 898 0 G 0 783

No_Het_b1 No_Het_b2 No CHR SNP bp A1 A2 MAF batch 1 A1 A2 MAF batch 2
snp 48 3 48 24 rs9786043 15472863 G A 0.2962 898 G A 0.3078 783
snp 50 13 17 50 24 rs2032659 15576203 0 G 0 899 0 G 0 781
snp 52 16 18 52 24 rs7892889 15668070 G A 0.007786 899 G A 0.003841 781
snp 53 6 4 53 24 rs34893929 15999244 0 G 0 899 0 G 0 778
snp 55 1 6 55 24 rs16980588 16253694 A G 0.002225 899 A G 0.003822 785
snp 59 2 59 24 rs16980502 17294958 A G 0.002227 898 A G 0.003827 784
snp 60 6 60 24 rs9786420 17398598 G A 0.04799 896 G A 0.0535 785
snp 61 6 16 61 24 rs9786076 17844018 A G 0.3871 899 A G 0.4084 786
snp 63 1 2 63 24 rs17316592 18560005 0 A 0 897 0 A 0 784
snp 64 9 6 64 24 rs3897 18571026 0 A 0 899 0 A 0 778
snp 65 12 9 65 24 rs9785702 18656508 C G 0.3746 897 C G 0.4078 748
snp 67 2 15 67 24 rs17276338 18842841 0 C 0 898 0 C 0 779
snp 68 3 5 68 24 rs9786283 18907236 A C 0.3871 899 A C 0.4089 785
snp 69 2 69 24 rs9786111 19054889 G A 0.3737 899 G A 0.3865 784
snp 71 2 71 24 rs34534058 19136822 G A 0.01763 397 G A 0.01175 766
snp 77 1 3 77 24 rs2032629 21865821 A G 0.002225 899 A G 0.006402 781
snp 78 2 8 78 24 rs2032612 21866491 0 G 0 899 0 G 0 786
snp 81 13 18 81 24 rs2032617 21896261 0 C 0 896 A C 0.001279 782
snp 82 8 82 24 rs2032648 21904023 0 A 0 898 G A 0.005096 785
snp 86 5 10 86 24 rs13447357 22751863 A G 0.002225 899 0 G 0 786
snp 88 2 88 24 rs34459399 22942897 G A 0.02004 898 G A 0.02163 786
snp 89 1 10 89 24 rs34954951 23292782 0 G 0 898 A G 0.002554 783
snp 90 4 28 90 24 rs17250535 23473201 T A 0.04983 883 T A 0.05628 764
snp 91 2 91 24 rs2033003 23550924 A C 0.3538 766 A C 0.3127 774
sum 474 845

• Columns “No_Het_b1” and “No_Het_b2” list the counts of heterozygous genotypes detected for each SNP (row) among the samples in batches 1 and 2.

• Columns in gray show the information for batch 1, and columns in blue for batch 2.
• Columns “batch 1” and “batch 2”, next to “MAF” (minor allele frequency), give the number of non‐missing genotypes counted.
• Column “A1” gives the minor allele detected in the samples.
• Note: MAF was calculated with PLINK, which treats the Y chromosome differently from the autosomes: whenever it encounters a heterozygous genotype on the Y chromosome, it automatically treats it as missing, and when the missing rate exceeds a certain threshold the SNP is not calculated. This was exactly the case for “snp5” (the first row) in the table.

• This also indicates that the analytical tools in PLINK may not be suitable for Y chromosome data without modification.
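Because males carry a single Y chromosome, a heterozygous call at a Y SNP usually signals a genotyping problem, which is why the check above counts them per SNP. Below is a minimal sketch of such a count over PED-style rows, assuming six metadata fields per row and "0" as the missing code. `count_heterozygous` is an illustrative helper, not the script used to produce the table.

```python
def count_heterozygous(ped_rows, n_meta=6, missing="0"):
    """Return one heterozygous-call count per SNP for PED-style text rows."""
    counts = None
    for row in ped_rows:
        alleles = row.split()[n_meta:]
        pairs = list(zip(alleles[0::2], alleles[1::2]))
        if counts is None:
            counts = [0] * len(pairs)
        elif len(pairs) != len(counts):
            raise ValueError("rows have different numbers of SNPs")
        for i, (a1, a2) in enumerate(pairs):
            # Heterozygous: two different, non-missing alleles.
            if a1 != a2 and missing not in (a1, a2):
                counts[i] += 1
    return counts or []


# First six SNPs of two samples from the diploid example shown earlier.
rows = [
    "WEI0002 WEI0002 0 0 1 2 G G G G G G A A 0 0 A A",
    "WEI0004 WEI0004 0 0 1 2 G G G G G G A A G A G G",
]
print(count_heterozygous(rows))
```

SNPs with a nonzero count can then be reviewed or dropped before any association test.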


Example Analysis (2)

Counting alleles in cases and controls, and then run Fisher’s exact test

Note: what are listed here are only from batch 1 set (males).

original data format | simplified format | phenotype info
13 54 62 85 | snp13 snp54 snp62 snp85 | FID IID Pheno Age Sex HPV NHPV
A A G G A A C C | A G A C | WEI0002 WEI0002 2 73 1 2 1
C C A A G G A A | C A G A | WEI0004 WEI0004 2 24 1 1 1
A A G G A A C C | A G A C | WEI0005 WEI0005 2 67 1 1 1
C C A A G G A A | C A G A | WEI0007 WEI0007 2 18 1 1 1
C C A A G G A A | C A G A | WEI0010 WEI0010 2 64 1 1 1
(rows omitted to save space)
C C A A G G A A | C A G A | WEI1690 WEI1690 2 60 1 2 1
C C A A G G A A | C A G A | WEI1692 WEI1692 2 80 1 2 1

Each block lists allele counts by group, the ratio of the first allele to the second, and the exact test p‐value.

rs895530 (snp13)
          missing  A    C    Total  Ratio (A vs C)  Exact test p‐value
others    1        86   256  343    0.34
HPV       0        188  368  556    0.51            0.0071595
Total     1        274  624  899

others    1        239  518  758    0.46
NHPV      0        35   106  141    0.33            0.1124057
Total     1        274  624  899

rs9785994 (snp54)
          missing  A    G    Total  Ratio (A vs G)  Exact test p‐value
others    2        255  86   343    2.97
HPV       0        368  188  556    1.96            0.0071681
Total     2        623  274  899

others    1        518  239  758    2.17
NHPV      1        105  35   141    3.00            0.1343267
Total     2        623  274  899

rs9785659 (snp62)
          missing  A    G    Total  Ratio (A vs G)  Exact test p‐value
others    0        113  230  343    0.49
HPV       0        235  321  556    0.73            0.0059557
Total     0        348  551  899

others    0        297  461  758    0.64
NHPV      0        51   90   141    0.57            0.5117231
Total     0        348  551  899

rs1558843 (snp85)
          missing  A    C    Total  Ratio (A vs C)  Exact test p‐value
others    0        256  87   343    2.94
HPV       0        365  191  556    1.91            0.0047702
Total     0        621  278  899

others    0        515  243  758    2.12
NHPV      0        106  35   141    3.03            0.0923098
Total     0        621  278  899
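Fisher's exact test on one of these allele-count tables can be sketched with the standard library only. The version below uses the common two-sided definition (summing the probabilities of all tables that are no more likely than the observed one) and drops the "missing" column; whether the slides used exactly this definition is an assumption, and `fisher_exact_two_sided` is an illustrative name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d          # row totals
    c1 = a + c                     # first column total
    n = r1 + r2
    denom = comb(n, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    lo, hi = max(0, c1 - r2), min(r1, c1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-9))
    return min(p, 1.0)


# rs895530 (snp13), HPV comparison: others A=86, C=256; HPV A=188, C=368.
p = fisher_exact_two_sided(86, 256, 188, 368)
print(f"p = {p:.7f}")
```

The slide reports 0.0071595 for this table. A result of the same order from the sketch would support the assumed definition, while a small discrepancy could come from a different two-sided convention.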


The 4th SNP, rs1558843, is located in the gene EIF1AY, which has an X‐chromosome counterpart, EIF1AX.


Page 59: Understanding Human Y Chromosomeodin.mdacc.tmc.edu/~ryu/materials/ChrY.pdf · chromosome pairs in cell nuclei and in a small DNA molecule found within ... Linda Hellborg, 2004, “Evolutionary

SNP, rs1558843, in the human Y‐chromosome phylogeny


• This was just a simple example of analysis running allelic frequency comparison.

• Further analysis of SNP haplotypes may yield more results.

Example: HPV vs “haplotype” (4 SNPs)

"haplotype"      control (n = 343)    case (n = 556)
                 count  freq          count  freq
block 1  AGAC    86     0.250729      188    0.338129
block 2  CAAA    25     0.072886      44     0.079137
block 3  CAGA    229    0.667638      321    0.577338
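The haplotype frequencies in this example are counts divided by the group size (for instance 86 / 343 for AGAC in controls). A minimal sketch of that tally follows, joining the four simplified SNP calls of each sample into one string. `haplotype_frequencies` and the four-sample toy group are illustrative only.

```python
from collections import Counter


def haplotype_frequencies(haplotypes, group_size=None):
    """Map each haplotype string to (count, frequency).

    group_size defaults to the number of haplotypes supplied; pass it
    explicitly when the denominator should include unlisted samples.
    """
    counts = Counter(haplotypes)
    n = group_size if group_size is not None else len(haplotypes)
    return {hap: (c, c / n) for hap, c in counts.items()}


# Simplified-format calls (snp13, snp54, snp62, snp85) for a toy group.
calls = [["A", "G", "A", "C"], ["C", "A", "G", "A"],
         ["C", "A", "G", "A"], ["C", "A", "A", "A"]]
haps = ["".join(c) for c in calls]
print(haplotype_frequencies(haps))

# Reproducing one cell of the slide's table: 86 controls out of 343.
print(round(86 / 343, 6))
```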


Partial section of sequence data files


Example Sequencing Data: SNP site length of Y region in 431 VCF files

[Figure: plot with axes labeled “samples” (ticks 0 to 3500) and “Mbp” (ticks 1 to 61)]

Quality Score                                               Counts
PASS                                                        22
low_coverage;low_qual                                       1128
low_VariantReads;low_coverage;low_qual                      4653
low_coverage;read_end_ratio;low_qual                        1
low_qual                                                    85
low_VariantReads;low_coverage                               3929
low_VariantReads;low_VariantRatio;read_end_ratio;low_qual   1
low_VariantReads;low_coverage;read_end_ratio                33
low_VariantReads;low_coverage;read_end_ratio;low_qual       367
low_VariantReads                                            36
read_end_ratio;low_qual                                     1
low_VariantReads;read_end_ratio;low_qual                    1
low_VariantReads;low_VariantRatio;low_qual                  4
low_VariantRatio;low_qual                                   1
low_coverage                                                1020
low_VariantReads;low_VariantRatio                           2
low_coverage;read_end_ratio                                 1
low_VariantReads;low_qual                                   62
sum                                                         11347
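A tally like the one above can be produced by counting the values of the FILTER column (the seventh tab-separated field) across VCF records. The sketch below assumes the “Quality Score” categories on the slide are FILTER values, which the slide does not state explicitly; the example records are made up for illustration.

```python
from collections import Counter


def tally_filters(vcf_lines):
    """Count FILTER values over VCF data lines (header lines are skipped)."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue                      # meta-information, header, or blank
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue                      # malformed record
        counts[fields[6]] += 1            # FILTER is the 7th VCF column
    return counts


# Hypothetical records (CHROM POS ID REF ALT QUAL FILTER INFO).
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "Y\t2655180\t.\tG\tA\t50\tPASS\t.",
    "Y\t2657176\t.\tC\tT\t12\tlow_coverage;low_qual\t.",
    "Y\t2658271\t.\tA\tG\t9\tlow_coverage;low_qual\t.",
]
for name, n in tally_filters(example).most_common():
    print(name, n)
```

To tally across many files, the same function can be applied to each open file handle and the resulting counters added together.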