Considering positional bias in regulatory motifs of genes associated with breast cancer

Post on 24-Feb-2016


Considering positional bias in regulatory motifs of genes associated with breast cancer

Nathaniel Gustafson

Dr. Garry Larson (City of Hope)

Cancer Studies

Can we tie genetic variation (e.g. SNPs) to cancer risk?
Myriad Genetics found BrCa1 and BrCa2
  Mutations in BrCa1/2 tied to 800% increase in breast cancer risk
Most research is on exonic regions
  Changes protein composition

http://members.cox.net/amgough/Fanconi-genetics-genetics-primer.htm

Our approach

What about regulatory regions?
Motif: recurring sequence, usually 6-20 bp. Generally functional
Hypothesis: Regulatory motifs upstream of the transcriptional start site (TSS) may play some role in breast cancer

[Figure: gene diagram running 5' (upstream) to 3', with ATG and stop marked. Example motifs in the upstream region: YGCGYRCGCATCMNTCCGYTGAYRTCAGCTNWTTGK...]
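The motifs on this slide are written with degenerate IUPAC nucleotide codes (for example Y, R, S, N). As a minimal sketch of how such a motif could be located in an upstream sequence, the Python below expands each code into a regular-expression character class. The function names and the test sequence are hypothetical illustrations, not the scripts used in the project (which the Tools slide says were written in Perl), and only the given strand is searched.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def motif_to_regex(motif):
    """Turn a degenerate motif such as SCGGAAGY into a compiled regex."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    # A lookahead is used so that overlapping occurrences are all reported.
    return re.compile("(?=(" + "".join(parts) + "))")

def find_motif(motif, sequence):
    """Return 0-based start offsets of every match on the given strand."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Hypothetical sequence containing two matches to SCGGAAGY.
hits = find_motif("SCGGAAGY", "TTGCGGAAGTAACCGGAAGCTT")
print(hits)
```

A fuller version would also scan the reverse complement, since a regulatory motif can sit on either strand.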

Disease Mutations in Phylogenetically Conserved Motifs

[Figure: sequence phylogeny with aligned sequences. Orthologous bases: identical by descent. Nonorthologous bases in red.]

G A C C T A C T A C A
G A G C T A C T A C T   (~5 myr)
G A C T T A A T T C A   (~70 myr)
G A G C T A C - A G A   (~300 myr)
G A G T T A A T G G T   (~475 myr)
G A C C T T C T A C A   (BrCa Pt. mutation)

Background

Meta-analysis pools several BrCa ER+/-* studies
Statistics used to find genes that have consistent differences in expression levels in ER+ vs ER- cell lines

*ER = Estrogen Receptor, a common way of classifying breast cancer cells

[Figure labels: GCCATnTT x 50; GCCATnTT x 9]

Aims

Investigate regulatory motifs for these genes
Compare occurrences of each motif across gene sets
Hypothesis: genes overexpressed in the same tumor type share motifs
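The slides do not say which statistical test was used to compare motif occurrences between gene sets. As one possible illustration only, the sketch below runs a two-sided Fisher's exact test on a 2x2 table of genes with and without a motif in two gene sets. The function name and the counts are hypothetical, and the p-values it gives should not be expected to reproduce those reported in the talk.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two gene sets; columns are genes with / without the motif.
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x motif-carrying genes in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: 12 of 20 genes in one set carry the motif, 3 of 20 in the other.
p_value = fisher_exact_two_sided(12, 8, 3, 17)
print(f"p = {p_value:.4f}")
```

Fisher's test is chosen here only because it is exact for small counts; a chi-square or binomial test would be an equally plausible reading of the slides.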

Weak Results

Are we missing the signal?

Old counting method

[Figure: motif occurrences along the region from -2000 to the TSS (axis ticks at -2000, -1500, -1000, -500, TSS), one panel for the ER+ < ER- gene set and one for the ER+ > ER- gene set. Counts shown: 15 and 10.]

P-val: .30 (NOT significant)

Are we missing the signal?

New counting method: use position bias

[Figure: motif occurrences along the region from -2000 to the TSS (axis ticks at -2000, -1500, -1000, -500, TSS), one panel for the ER+ < ER- gene set and one for the ER+ > ER- gene set. Counts shown: 12 and 3.]

P-val: .03 (significant)
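The difference between the two counting methods can be sketched in a few lines of Python. The old method counts every motif occurrence between -2000 and the TSS, while the new method counts only occurrences near the motif's positional bias. The sketch assumes the window means plus or minus 100 bp around the bias, which is one reading of "Reading 100 bp from positional bias" on the Results slide. All function names, gene names and positions are hypothetical.

```python
def relative_to_tss(hit_start, tss, strand):
    """Position of a motif hit relative to the TSS; negative means upstream."""
    return hit_start - tss if strand == "+" else tss - hit_start

def count_hits(positions_by_gene, bias=None, half_width=100, upstream_limit=-2000):
    """Count motif hits over a gene set.

    With bias=None, every hit between upstream_limit and the TSS is counted
    (the old method). Otherwise only hits within half_width bp of the
    positional bias are counted (the new method).
    """
    total = 0
    for positions in positions_by_gene.values():
        for pos in positions:
            if bias is None:
                if upstream_limit <= pos <= 0:
                    total += 1
            elif abs(pos - bias) <= half_width:
                total += 1
    return total

# Hypothetical TSS-relative hit positions for two small gene sets.
set_a = {"geneA": [-30, -1500], "geneB": [-10, -700], "geneC": [-90]}
set_b = {"geneD": [-1800], "geneE": [-400, -1200], "geneF": [-60]}

old_counts = (count_hits(set_a), count_hits(set_b))
new_counts = (count_hits(set_a, bias=-24), count_hits(set_b, bias=-24))
print("old method:", old_counts)
print("new method:", new_counts)
```

In this toy data the two sets look similar under the old method and clearly different under the new one, which is the effect the slide describes.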

Tools

Perl
  Handy scripting language
  Great for parsing textual data
MySQL
  Storage and retrieval of structured data

www.yusoft.net/yu-graph/main/logo-mysql.jpg

Problems

Lack of data specificity
  What do Xie’s pos. biases mean?
Insufficient data
  Needed position of motif relative to TSS
Improperly annotated data
  Position shown to be inconsistent
Collaboration
  Norway is about 10 time zones away

Results

Reading 100 bp from positional bias

Motif     1down count  1up count  5down count  5up count  pos. bias  Pvaltop1  Pvaltop5
SCGGAAGY  5            8          36           71         -24        0.40116   0.00019
...       ...          ...        ...          ...        ...        ...       ...

No window (Previous results)

Motif     1down count  1up count  5down count  5up count  pos. bias  Pvaltop1  Pvaltop5
SCGGAAGY  31           41         168          206        -24        0.10299   0.00719
...       ...          ...        ...          ...        ...        ...       ...

Any SNPs in this motif?

One SNP was found from HapMap in this motif
But it was at a degenerate position (e.g. Y = C or T), so the variant still satisfied the motif
Might still affect expression
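Whether a SNP breaks a degenerate motif depends on which IUPAC code sits at the SNP's position. Under the standard codes, Y stands for C or T and S for C or G. The sketch below checks which alleles of a SNP remain compatible with a motif at a given offset. The function name, offsets and alleles are hypothetical and do not describe the actual HapMap SNP mentioned on the slide.

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def alleles_matching(motif, offset, alleles):
    """Return the alleles that the motif allows at the given 0-based offset."""
    allowed = IUPAC[motif[offset].upper()]
    return [allele for allele in alleles if allele.upper() in allowed]

# A C/T SNP at the final, degenerate position (Y) of SCGGAAGY:
# both alleles are allowed, so the motif is preserved.
print(alleles_matching("SCGGAAGY", 7, ["C", "T"]))

# A G/A SNP at a fixed G position: only the G allele keeps the motif.
print(alleles_matching("SCGGAAGY", 3, ["G", "A"]))
```

A SNP that preserves the motif may still change binding strength, which is why the slide notes it might still affect expression.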

Biological Significance

SCGGAAGY found more in ER+ overexpressed genes
Known as a binding site for ELK-1
Might provide some insight into ER+/ER- cell differentiation
Verification in vivo remains to be done

[Figure: flow chart. Box labels as extracted:]

3’UTR Motif List (6/7mer miRNA seeds; phylogenetically conserved motifs)
HapMap
BrCa GWAS Datasets: Hunter, et al. (CGEMS); Gold, et al. (MSKCC); Easton, et al. (UK) (unavailable); Stacey (deCode) (unavailable)
SNP_list
SNPs Rank & Biological Testing
BrCa Somatic Mutations (Sjöblom)
Linkage Studies in BrCa (Smith, et al.)
LOH (aCGH) in BrCa
Thermodynamic Profiling (STarMir, PITA)
In-House Independent Association Studies
3’UTR-luc Fusion Assay
Reciprocal Allelic Testing: effect
Evolutionary Conservation (miRNA seeds)
LD Mapping Proxy SNPs
Allele frequency in HapMap Population(s)
Reciprocal Allelic Testing: no effect
Additional Biological Testing

GWAS

“Genome Wide Association Study”
Genotypes cases and controls at thousands of loci
Intended to be an unbiased approach
Potentially identifies pertinent mutations

http://www2.bioinformatics.tll.org.sg/img/species/karyotype_Homo_sapiens.png

BrCa GWAS Datasets

Hunter, et al. (Nat Genet 39, 2007)
  Assay Platform: Illumina Hap 550 (keep 528K)
  Cases / Controls: 1,145 / 1,142
  Comment_1: Prospective, post-menopausal women
  Comment_2: Logistic Regression
  Public Dataset: YES (CGEMS)

Easton, et al. (Nature 447, 2007)
  Assay Platform: Affy, 266k SNPs (keep 227k)
  Cases / Controls: Stage 1 - 380 / 364; Stage 2 - 3,990 / 3,916 ctrls; Stage 3 - 21,860 / 22,578 ctrls
  Comment_1: Stage 1 cases (2 first-degree relatives with Fam Hx)
  Comment_2: 3 stage association; Stage 2 - top 5% of stage 1; Stage 3 - top 30 SNPs from Stage 2; TNRC9 high score
  Public Dataset: NO

Stacey, et al. (Nat Genet 39, 2007), deCode Dataset
  Assay Platform: Illumina Hap300 (keep 311k)
  Cases / Controls: 1,600 Icelandic cases / 11,563 ctrls
  Comment_1: Top 10 SNPs GTP’d in 2nd Icelandic sample and 2-3 ind. European cohorts
  Comment_2: 1 SNP strong LD with 9995 BRCA2 - removed from study; found SNP near TNRC9
  Public Dataset: NO

Gold, et al. (PNAS 105, March 2008)
  Assay Platform: Affy GTP 435K SNPs (keep 150k)
  Cases / Controls: 249 AJ Fam Hx (3 cases, BRCA1 & 2 neg) vs. 299 Ca-free AJ ctrls
  Comment_1: 3 stage design
  Comment_2: Reproduced FGFR2 region
  Public Dataset: MAYBE?


[Label added on the repeated BrCa GWAS Datasets slide: YES]

Bring this...

SM70 SG74 LF52 SM17 SH14 L5721 SM56 SF63 L5957 L5349 L5420 L5713 SH5 LF48 SJG4 L6029 SG21 L5352 L6121 SG69 L5952 SM78 SM113 SF23 L5573 SN6 SF1 SM91 L5895 L5518 L5501 L5328 L5772 SG08 SG28 SM52 SM106 SM67 L5463 L5494 SA17 L5796 L6014 SN15

rs2180341 chr6 127642323 + ncbi_b35 MSKCCOffit AffyEAv3 PhaseIGold_et_al TT TT TT CT CT CC TT CT TT TT TT TT CT TT CT TT CT TT TT CT TT TT TT CT CT CT CT CT CT CT TT CT CT TT TT CT CC TT TT CT TT TT CT TT CT CT CT CT TT TT TT CT CT CT TT TT CT TT TT TT TT TT TT CT TT TT TT TT CT TT TT CT TT TT TT CT CT TT CT CT TT CT TT ...

rs6569480 chr6 127663441 + ncbi_b35 MSKCCOffit AffyEAv3 PhaseIGold_et_al GG GG GG GG AG AA GG AG GG GG GG GG AG GG GG GG AG GG GG AG GG GG GG AG AG AG AG AG AG AG GG AG AG GG GG AG AA GG GG AG GG GG AG GG AG AG AG AG GG GG GG AG AG AG GG GG AG GG GG NN GG GG GG AG GG GG GG GG AG GG GG AG GG GG GG AG GG GG AG AG GG AG GG...

To this...

rs_num      chr    pos        analysis_name          p_value  OR_het  OR_hom  build
rs10510126  chr10  124992475  chi square - genotype  4e-06    0.5918  0.6387  ncbi_b36
rs10510126  chr10  124992475  chi square - allele    2e-06    0.5918  0.6387  ncbi_b36
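The step from per-sample genotype calls to one summary row per SNP can be sketched as follows. The example computes an allele-based chi-square test and the heterozygote and homozygote odds ratios, matching the p_value, OR_het and OR_hom columns. The slide does not show which samples are cases and which are controls, so the genotype lists below are invented, and the function names are hypothetical rather than taken from the project's scripts.

```python
import math
from collections import Counter

def normalise(genotype):
    """Sort the two alleles so that 'TC' and 'CT' are treated alike."""
    return "".join(sorted(genotype))

def allele_counts(genotypes, allele):
    """Count copies of `allele` and of the other allele, skipping NN calls."""
    hits = other = 0
    for genotype in genotypes:
        if genotype == "NN":
            continue
        for base in genotype:
            if base == allele:
                hits += 1
            else:
                other += 1
    return hits, other

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def genotype_odds_ratios(cases, controls, minor, major):
    """OR_het and OR_hom relative to the major-allele homozygote.

    Raises ZeroDivisionError if a required genotype class is empty.
    """
    ca = Counter(normalise(g) for g in cases)
    co = Counter(normalise(g) for g in controls)
    ref, het, hom = major * 2, normalise(major + minor), minor * 2
    or_het = (ca[het] * co[ref]) / (co[het] * ca[ref])
    or_hom = (ca[hom] * co[ref]) / (co[hom] * ca[ref])
    return or_het, or_hom

# Invented genotype calls for one SNP (not taken from the Gold data set).
cases = ["TT"] * 6 + ["CT"] * 3 + ["CC"] * 1
controls = ["TT"] * 3 + ["CT"] * 4 + ["CC"] * 3

case_c, case_t = allele_counts(cases, "C")
ctrl_c, ctrl_t = allele_counts(controls, "C")
chi2, p_value = chi_square_2x2(case_c, case_t, ctrl_c, ctrl_t)
or_het, or_hom = genotype_odds_ratios(cases, controls, minor="C", major="T")
print(f"chi2={chi2:.3f} p={p_value:.3f} OR_het={or_het:.3f} OR_hom={or_hom:.3f}")
```

The "chi square - genotype" row in the summary table would use a 2x3 table of genotype counts instead of the 2x2 allele table shown here.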

Future Work

We’ve digested the Gold data set
  Employ this in the triage for producing a gene list
Combine with other triage methods to find the most interesting genes
Test these in vivo

Special Thanks

Dr. Garry Larson
SoCalBSI program
SoCalBSI mentors
City of Hope

Funding

Komen for the Cure
National Science Foundation
National Institutes of Health
Employment and Workforce Development

References

Xie X, Mikkelsen TS, Gnirke A, Lindblad-Toh K, Kellis M, Lander ES. Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc Natl Acad Sci U S A. 2007 Apr 24;104(17):7145-50.

Smith D, Sætrom P, Snøve O Jr, Lundberg C, Rivas G, Glackin C, Larson G. Meta-analysis of breast cancer microarray studies in conjunction with conserved cis-elements suggest patterns for coordinate regulation. BMC Bioinformatics. 2008;9:63.