Considering positional bias in regulatory motifs of genes associated with breast cancer

Nathaniel Gustafson, Dr. Garry Larson (City of Hope)

Transcript of Considering positional bias in regulatory motifs of genes associated with breast cancer

Page 1

Considering positional bias in regulatory motifs of genes associated with breast cancer

Nathaniel Gustafson

Dr. Garry Larson (City of Hope)

Page 2

Cancer Studies

Can we tie genetic variation (e.g., SNPs) to cancer risk?

Myriad Genetics found BRCA1 and BRCA2

Mutations in BRCA1/2 are tied to an 800% increase in breast cancer risk

Most research is on exonic regions

Changes there alter protein composition

http://members.cox.net/amgough/Fanconi-genetics-genetics-primer.htm

Page 3

Our approach

What about regulatory regions?

Motif: recurring sequence, usually 6-20 bp, generally functional

Hypothesis: Regulatory motifs upstream of the transcriptional start site (TSS) may play some role in breast cancer

[Figure: gene diagram with the 5' and 3' ends, the ATG start codon, and the stop codon labeled; the region of interest is 5' (upstream)]

YGCGYRCGCATCMNTCCGYTGAYRTCAGCTNWTTGK...
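
Motif strings like the one above use IUPAC degenerate base codes (for example, Y = C or T, S = C or G, N = any base). A minimal sketch of how such a motif could be searched for in an upstream sequence follows. It uses Python rather than the Perl mentioned in the talk, and the function names and the example sequence are hypothetical, not taken from the original work.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Compile a degenerate motif into a regex; the lookahead keeps overlapping hits."""
    body = "".join(IUPAC[code] for code in motif.upper())
    return re.compile("(?=(" + body + "))")

def find_motif(motif, sequence):
    """Return the 0-based start offset of every match of motif in sequence."""
    return [m.start() for m in motif_to_regex(motif).finditer(sequence.upper())]

# Hypothetical upstream fragment, made up for illustration only.
upstream = "TTGCGGAAGTGGCCGGAAGCAA"
hits = find_motif("SCGGAAGY", upstream)
print(hits)
```

This scans one strand only; a fuller search would presumably also check the reverse complement of each motif.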

Page 4

Disease Mutations in Phylogenetically Conserved Motifs

[Figure: sequence phylogeny. Each aligned sequence is listed with its approximate divergence time where one is given.]

G A C C T A C T A C A
G A G C T A C T A C T   ~5 myr
G A C T T A A T T C A   ~70 myr
G A G C T A C - A G A   ~300 myr
G A G T T A A T G G T   ~475 myr
G A C C T T C T A C A   BrCa Pt. mutation

Orthologous bases: identical by descent

Nonorthologous bases shown in red

Page 5

Background

Meta-analysis pools several BrCa ER+/-* studies

Statistics used to find genes that have consistent differences in expression levels in ER+ vs. ER- cell lines

*ER = Estrogen Receptor – a common way of classifying breast cancer cells

Page 6

GCCATnTT x 50

GCCATnTT x 9

Page 7

Aims

Investigate regulatory motifs for these genes

Compare occurrences of each motif across gene sets

Hypothesis: genes overexpressed in the same tumor type share motifs

Page 8

Weak Results

Page 9

Are we missing the signal?

Old counting method

[Figure: motif occurrences along the upstream region (-2000, -1500, -1000, -500, TSS) for the ER+ < ER- gene set and the ER+ > ER- gene set; counts shown: 15 and 10]

P-val: .30

NOT significant
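
The slides do not say which statistical test produced the p-values, so the following is only a sketch of one way two occurrence counts could be compared: an exact two-sided binomial test. It assumes that both gene sets offer the same amount of upstream sequence, so each hit is equally likely to fall in either set (`p=0.5`). The function name is hypothetical, and the sketch is not expected to reproduce the p-values shown in the talk.

```python
from math import comb

def binomial_two_sided(count_a, count_b, p=0.5):
    """Exact two-sided binomial p-value for how count_a + count_b hits split
    between two gene sets, where p is the expected share of hits in set A."""
    n = count_a + count_b
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    observed = pmf[count_a]
    # Add up every outcome that is no more likely than the observed split.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

# Counts as they appear on the old-counting slide.
p_old = binomial_two_sided(15, 10)
print(round(p_old, 4))
```

If the gene sets differ in size, `p` would need to reflect each set's share of the total sequence searched.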

Page 10

Are we missing the signal?

New counting method: use position bias

[Figure: motif occurrences along the upstream region (-2000, -1500, -1000, -500, TSS) for the ER+ < ER- gene set and the ER+ > ER- gene set; counts shown: 12 and 3]

P-val: .03

Significant
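
The position-aware counting idea can be sketched as follows: instead of counting every motif hit in the upstream region, count only hits that fall near the motif's preferred position relative to the TSS. The window of 100 bp on either side is an assumption (the results page mentions reading 100 bp from the positional bias without saying whether that is per side or in total). The function name and the hit positions are hypothetical; the bias of -24 is the value reported for SCGGAAGY in the results.

```python
def count_near_bias(positions, bias, window=100):
    """Count motif positions (relative to the TSS, negative = upstream)
    that lie within `window` bp of the motif's preferred position `bias`."""
    return sum(1 for pos in positions if abs(pos - bias) <= window)

# Hypothetical hit positions for two gene sets, made up for illustration.
set_a = [-30, -55, -110, -900, -1500, -1800]
set_b = [-10, -700, -1200, -1950]

old_counts = (len(set_a), len(set_b))  # old method: every hit counts
new_counts = (count_near_bias(set_a, -24), count_near_bias(set_b, -24))
print(old_counts, new_counts)
```

In this made-up example the hits far from the preferred position drop out, which sharpens the contrast between the two sets.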

Page 11

Tools

Perl: handy scripting language, great for parsing textual data

MySQL: storage and retrieval of structured data

www.yusoft.net/yu-graph/main/logo-mysql.jpg

Page 12

Problems

Lack of data specificity: what do Xie’s positional biases mean?

Insufficient data: needed the position of each motif relative to the TSS

Improperly annotated data: position shown to be inconsistent

Collaboration: Norway is about 10 time zones away

Page 13

Results

Reading 100 bp from positional bias

Motif     1down count  1up cnt  5down count  5up cnt  pos. bias  Pvaltop1  Pvaltop5
SCGGAAGY  5            8        36           71       -24        0.40116   0.00019
...       ...          ...      ...          ...      ...        ...       ...

No window (previous results)

Motif     1down count  1up cnt  5down count  5up cnt  pos. bias  Pvaltop1  Pvaltop5
SCGGAAGY  31           41       168          206      -24        0.10299   0.00719
...       ...          ...      ...          ...      ...        ...       ...

Page 14

Any SNPs in this motif?

One SNP from HapMap was found in this motif

But it was at a degenerate position (e.g., Y = C or T), so the variant still satisfied the motif

Might still affect expression
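
A small sketch of the check described here: substitute the SNP's alternate allele into a motif instance and test whether the site still satisfies the degenerate motif. In the IUPAC code, Y allows C or T and S allows C or G. The function names, the example site, and the SNP offsets are hypothetical and are not the actual HapMap SNP from the study.

```python
# IUPAC nucleotide codes as strings of allowed bases.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def satisfies(motif, site):
    """True if every base of `site` is allowed by the motif code at that position."""
    return len(motif) == len(site) and all(
        base in IUPAC_SETS[code] for code, base in zip(motif.upper(), site.upper())
    )

def snp_keeps_motif(motif, site, offset, alt_allele):
    """Swap in the alternate allele at `offset` and re-test the site against the motif."""
    variant = site[:offset] + alt_allele + site[offset + 1:]
    return satisfies(motif, variant)

site = "CCGGAAGT"  # hypothetical motif instance, made up for illustration
degenerate_snp = snp_keeps_motif("SCGGAAGY", site, 7, "C")  # T>C at the Y position
core_snp = snp_keeps_motif("SCGGAAGY", site, 3, "A")        # G>A at a fixed position
print(degenerate_snp, core_snp)
```

A SNP that passes this check leaves the motif intact by definition, though, as the slide notes, it might still affect expression.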

Page 15

Biological Significance

SCGGAAGY found more in ER+ overexpressed genes

Known as a binding site for ELK-1

Might provide some insight into ER+/ER- cell differentiation

Verification in vivo remains to be done

Page 16

[Flowchart; box labels follow]

3’UTR motif list (6/7-mer miRNA seeds, phylogenetically conserved motifs)
HapMap
BrCa GWAS datasets: Hunter, et al. (CGEMS); Gold, et al. (MSKCC); Easton, et al. (UK) (unavailable); Stacey (deCode) (unavailable)
SNP_list
SNPs rank & biological testing
BrCa somatic mutations (Sjöblom)
Linkage studies in BrCa (Smith, et al.)
LOH (aCGH) in BrCa
Thermodynamic profiling (STarMir, PITA)
In-house independent association studies
3’UTR-luc fusion assay
Reciprocal allelic testing: effect
Evolutionary conservation (miRNA seeds)
LD mapping, proxy SNPs
Allele frequency in HapMap population(s)
Reciprocal allelic testing: no effect
Additional biological testing

Page 17

GWAS

“Genome Wide Association Study”

Genotypes cases and controls at thousands of loci

Intended to be an unbiased approach

Potentially identifies pertinent mutations

http://www2.bioinformatics.tll.org.sg/img/species/karyotype_Homo_sapiens.png

Page 18

BrCa GWAS Datasets

Study: Hunter, et al. (Nat Genet 39, 2007)
Assay platform: Illumina Hap 550 (keep 528K)
Cases/controls: 1,145 / 1,142
Comment_1: Prospective, post-menopausal women
Comment_2: Logistic regression
Public dataset: YES (CGEMS)

Study: Easton, et al. (Nature 447, 2007)
Assay platform: Affy, 266k SNPs (keep 227k)
Cases/controls: Stage 1: 380 / 364; Stage 2: 3,990 / 3,916 ctrls; Stage 3: 21,860 / 22,578 ctrls
Comment_1: Stage 1 cases (2 first-degree relatives with Fam Hx)
Comment_2: 3 stage association; Stage 2: top 5% of Stage 1; Stage 3: top 30 SNPs from Stage 2; TNRC9 high score
Public dataset: NO

Study: Stacey, et al. (Nat Genet 39, 2007), deCode dataset
Assay platform: Illumina Hap300 (keep 311k)
Cases/controls: 1,600 Icelandic cases / 11,563 ctrls
Comment_1: Top 10 SNPs GTP’d in 2nd Icelandic sample and 2-3 ind. European cohorts
Comment_2: 1 SNP in strong LD with 9995 BRCA2, removed from study; found SNP near TNRC9
Public dataset: NO

Study: Gold, et al. (PNAS 105, March 2008)
Assay platform: Affy GTP 435K SNPs (keep 150k)
Cases/controls: 249 AJ Fam Hx (3 cases, BRCA1 & 2 neg) vs. 299 Ca-free AJ ctrls
Comment_1: 3 stage design
Comment_2: Reproduced FGFR2 region
Public dataset: MAYBE?

Page 19

[Same BrCa GWAS Datasets table as on Page 18; the only addition on this page is a YES, apparently marking the Gold, et al. dataset as available.]

Page 20

Bring this...

SM70 SG74 LF52 SM17 SH14 L5721 SM56 SF63 L5957 L5349 L5420 L5713 SH5 LF48 SJG4 L6029 SG21 L5352 L6121 SG69 L5952 SM78 SM113 SF23 L5573 SN6 SF1 SM91 L5895 L5518 L5501 L5328 L5772 SG08 SG28 SM52 SM106 SM67 L5463 L5494 SA17 L5796 L6014 SN15

rs2180341 chr6 127642323 + ncbi_b35 MSKCCOffit AffyEAv3 PhaseIGold_et_al TT TT TT CT CT CC TT CT TT TT TT TT CT TT CT TT CT TT TT CT TT TT TT CT CT CT CT CT CT CT TT CT CT TT TT CT CC TT TT CT TT TT CT TT CT CT CT CT TT TT TT CT CT CT TT TT CT TT TT TT TT TT TT CT TT TT TT TT CT TT TT CT TT TT TT CT CT TT CT CT TT CT TT ...

rs6569480 chr6 127663441 + ncbi_b35 MSKCCOffit AffyEAv3 PhaseIGold_et_al GG GG GG GG AG AA GG AG GG GG GG GG AG GG GG GG AG GG GG AG GG GG GG AG AG AG AG AG AG AG GG AG AG GG GG AG AA GG GG AG GG GG AG GG AG AG AG AG GG GG GG AG AG AG GG GG AG GG GG NN GG GG GG AG GG GG GG GG AG GG GG AG GG GG GG AG GG GG AG AG GG AG GG...

To this...

rs_num      chr    pos        analysis_name          p_value  OR_het  OR_hom  build
rs10510126  chr10  124992475  chi square - genotype  4e-06    0.5918  0.6387  ncbi_b36
rs10510126  chr10  124992475  chi square - allele    2e-06    0.5918  0.6387  ncbi_b36
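
One way the step from per-sample genotype calls to a per-SNP association statistic could look is an allele-based chi-square test on a 2x2 table of allele counts in cases versus controls. This is a sketch under stated assumptions, not the study's actual pipeline: the source does not say which samples are cases or controls, so the genotype lists below are made up, and the function names are hypothetical. `NN` is treated as a missing call, as it appears in the raw data.

```python
from math import erfc, sqrt

def allele_counts(genotypes, allele):
    """Count copies of `allele` and of all other called alleles; 'NN' = no call."""
    called = [g for g in genotypes if g != "NN"]
    hits = sum(g.count(allele) for g in called)
    return hits, 2 * len(called) - hits

def allele_chi_square(cases, controls, allele):
    """Allelic chi-square test (1 df) and allelic odds ratio for one SNP.
    No handling of zero cells: a sketch, not production code."""
    a, b = allele_counts(cases, allele)
    c, d = allele_counts(controls, allele)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # chi-square survival function for 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio

# Hypothetical case/control genotype calls, made up for illustration.
cases = ["TT", "TT", "CT", "CT", "CC", "TT"]
controls = ["TT", "TT", "TT", "CT", "TT", "NN"]
chi2, p_value, odds_ratio = allele_chi_square(cases, controls, "C")
print(round(chi2, 3), round(p_value, 3), odds_ratio)
```

The genotype-based test and the separate heterozygote and homozygote odds ratios in the results table would need a 2x3 table instead of the 2x2 allele table used here.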

Page 21

Future Work

We’ve digested the Gold data set

Employ this in the triage for producing a gene list

Combine with other triage methods to find the most interesting genes

Test these in vivo

Page 22

Special Thanks

Dr. Garry Larson
SoCalBSI program
SoCalBSI mentors
City of Hope

Funding

Komen for the Cure
National Science Foundation
National Institutes of Health
Employment and Workforce Development

Page 23

References

Xie X, Mikkelsen TS, Gnirke A, Lindblad-Toh K, Kellis M, Lander ES. Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc Natl Acad Sci U S A. 2007 Apr 24;104(17):7145-50.

D. Smith, P. Sætrom, O. Snøve Jr, C. Lundberg, G. Rivas, C. Glackin and G. Larson. Meta-analysis of breast cancer microarray studies in conjunction with conserved cis-elements suggest patterns for coordinate regulation. BMC Bioinformatics 2008, 9:63