Motivation

SW-ARRAY: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data


Page 1: Motivation

SW-ARRAY: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data

Page 2: Motivation

Motivation

• Chromosomal changes cause genetic diseases
– Aneusomies
  • Easy to detect
– Copy number changes of genes
  • Not so easy

Page 3: Motivation

Array CGH

• Comparative Genomic Hybridization (CGH) applied to DNA microarrays

• Method for detecting copy number changes
– Data analyzed using thresholds
– Not reliable for detecting single-copy gains or losses when using large insert clones as probes
– High rates of false positives and false negatives
– Inconsistent for probes from different chromosomal regions

• Cannot be used for clinical diagnostic applications!

Page 4: Motivation

Data Adjustment

• Normalization and Correction
– Reason: variations between probes
– Control vs. control data ratio
  • Find mean and SD
– Divide control vs. test ratios by that mean
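
The adjustment above can be sketched in a few lines. This is a minimal illustration only: it assumes the mean and SD are taken per probe over several control vs. control ratios, which the slide does not spell out, and the function and parameter names are hypothetical.

```python
from statistics import mean, stdev

def normalize_ratios(control_ratios, test_ratios):
    """Divide each probe's control vs. test ratio by the mean of that
    probe's control vs. control ratios (assumed to be a per-probe mean).

    control_ratios: one list of control vs. control ratios per probe
    test_ratios:    one control vs. test ratio per probe
    Returns (normalized ratios, per-probe SDs of the control ratios).
    """
    normalized = []
    spreads = []
    for controls, test in zip(control_ratios, test_ratios):
        probe_mean = mean(controls)
        # The SD is reported alongside; it is not used in the division.
        spreads.append(stdev(controls) if len(controls) > 1 else 0.0)
        normalized.append(test / probe_mean)
    return normalized, spreads
```

A probe whose control ratios average 2.0 and whose test ratio is 1.0 ends up with a normalized ratio of 0.5.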

Page 5: Motivation

Threshold method

• Compare each data point from the control vs. test experiment to threshold values
– Below 0.8 = deletion
– Above 1.2 = polysomy
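
As a sketch of the threshold rule, using the 0.8 and 1.2 cutoffs from the slide (the function name and the "normal" label for in-between values are assumptions made for illustration):

```python
def classify_by_threshold(ratio, low=0.8, high=1.2):
    """Label one normalized control vs. test ratio using fixed cutoffs."""
    if ratio < low:
        return "deletion"
    if ratio > high:
        return "polysomy"
    # Label for values between the cutoffs is an assumption of this sketch.
    return "normal"

calls = [classify_by_threshold(r) for r in (0.5, 1.0, 1.5)]
```

Each probe is judged in isolation here, which is the weakness SW-ARRAY addresses by scoring runs of neighbouring probes together.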

Page 6: Motivation

SW-ARRAY

• Smith-Waterman algorithm adapted for Array CGH

• New way to analyze Array CGH data

• Reason:
– Log ratio data is a contiguous one-dimensional series, where runs of high values may indicate polysomic regions and runs of low values may indicate deletions

Page 7: Motivation

SW-ARRAY

• Step 1:
– Remove outlying probes
  • Log intensity ratio more than 2.5 MAD away from the median of the other probes in the array
  • MAD = Mean Absolute Deviation
    – Robust measure of standard deviation

$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \bar{x}\right|$$
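
A minimal sketch of Step 1, assuming MAD is the mean absolute deviation as defined on this slide. For simplicity the median is taken over all probes rather than over "the other probes" only; that simplification and the function names are assumptions of this sketch.

```python
from statistics import mean, median

def mean_absolute_deviation(values):
    """MAD as defined on the slide: average distance from the mean."""
    center = mean(values)
    return sum(abs(v - center) for v in values) / len(values)

def remove_outliers(log_ratios, k=2.5):
    """Keep probes whose log ratio lies within k MADs of the median."""
    med = median(log_ratios)
    mad = mean_absolute_deviation(log_ratios)
    return [x for x in log_ratios if abs(x - med) <= k * mad]
```

For the input `[0.0, 0.1, -0.1, 0.05, -0.05, 5.0]`, only the value 5.0 lies more than 2.5 MADs from the median and is dropped.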

Page 8: Motivation

SW-ARRAY

• Step 2:
– Subtract t0 from the log ratio data
– Ensures that the mean of the adjusted data is negative
  • t0 = median + 0.2 × MAD
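
Step 2 can be sketched as follows, again assuming MAD means the mean absolute deviation from the earlier slide. The function name and the `weight` parameter are hypothetical.

```python
from statistics import mean, median

def adjust_log_ratios(log_ratios, weight=0.2):
    """Subtract t0 = median + weight * MAD from every log ratio."""
    center = mean(log_ratios)
    mad = sum(abs(x - center) for x in log_ratios) / len(log_ratios)
    t0 = median(log_ratios) + weight * mad
    return [x - t0 for x in log_ratios], t0

adjusted, t0 = adjust_log_ratios([-0.1, 0.0, 0.1])
```

For roughly symmetric data such as this example, the shift pushes the mean of the adjusted values below zero, so only sustained runs of high values can build up a positive score in the next step.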

Page 9: Motivation

SW-ARRAY

• Step 3:
– Search for high-scoring islands
  • Definition
    – A locally high-scoring segment: a positive-scoring segment whose score cannot be increased by shrinking or expanding the segment boundaries

Page 10: Motivation

SW-ARRAY

$$T(p, q) = \sum_{i=p}^{q} X(i)$$

T(p,q) = score of the segment from probe p to probe q
X(i) = score for the ith probe, ordered along the genome
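
The segment score is a plain sum, so a small helper makes the indexing explicit. This sketch uses 1-based, inclusive positions to match the slide's notation; the prefix-sum approach and the function name are choices made here for illustration.

```python
from itertools import accumulate

def segment_score(x, p, q):
    """T(p, q): sum of X(i) for i = p..q (1-based, inclusive)."""
    prefix = [0.0] + list(accumulate(x))  # prefix[k] = X(1) + ... + X(k)
    return prefix[q] - prefix[p - 1]

scores = [-1, 2, 3, -4, 1]
best_segment = segment_score(scores, 2, 3)
```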

Page 11: Motivation

SW-ARRAY

S(p) = score of the island ending at p
B(p) = beginning point of the island
S(0) = 0
p > 0
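
The recurrence that these definitions belong to did not survive in the transcript. The sketch below assumes the standard Smith-Waterman-style form, S(p) = max(S(p-1) + X(p), 0), with B(p) carried forward while the running score stays positive; treat it as an illustration of that assumed form rather than the exact formula from the slide.

```python
def island_scores(x):
    """Compute S(p) and B(p) for p = 0..n, with x holding X(1)..X(n).

    Assumed recurrence: S(p) = max(S(p-1) + X(p), 0).
    B(p) is 0 wherever no island ends at p.
    """
    n = len(x)
    S = [0.0] * (n + 1)  # S[0] = 0 by definition
    B = [0] * (n + 1)
    for p in range(1, n + 1):
        extended = S[p - 1] + x[p - 1]
        if extended > 0:
            S[p] = extended
            # Continue the current island, or start a new one at p.
            B[p] = B[p - 1] if S[p - 1] > 0 else p
        else:
            S[p] = 0.0
            B[p] = 0
    return S, B

S, B = island_scores([-1, 2, 3, -4, 1])
```

In this example the highest value of S is 5 at p = 3, with B(3) = 2, so the top island spans probes 2 to 3.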

Page 12: Motivation

SW-ARRAY

• Iterate through locations along gene probes

• Search where scores > 0
– Find max-scoring island
– Record data
– Set island = 0
– Find next max-scoring island
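
The loop above can be sketched as repeated maximum-segment searches. "Set island = 0" is interpreted here as zeroing the scores inside the recorded island before searching again; that reading, the 0-based indices, and the function name are assumptions of this sketch.

```python
def find_islands(x, max_islands=None):
    """Repeatedly extract the max-scoring island as (start, end, score).

    Indices are 0-based and inclusive. After each island is recorded,
    its scores are set to zero so the next search finds a different one.
    """
    scores = list(x)
    islands = []
    while max_islands is None or len(islands) < max_islands:
        best_score, best_start, best_end = 0.0, None, None
        running, start = 0.0, 0
        for i, value in enumerate(scores):
            if running <= 0:
                running, start = 0.0, i  # begin a new candidate island
            running += value
            if running > best_score:
                best_score, best_start, best_end = running, start, i
        if best_start is None:
            break  # no positive-scoring island remains
        islands.append((best_start, best_end, best_score))
        for i in range(best_start, best_end + 1):
            scores[i] = 0.0
    return islands

islands = find_islands([-1, 2, 3, -4, 1, -2, 4, -1])
```

Islands come out in decreasing order of score, so a caller interested only in the strongest candidates can pass `max_islands`.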

Page 13: Motivation

SW-ARRAY

• Statistical Significance
– In 1000 runs, each with the log ratios permuted across probes
  • Find the frequency of the highest-scoring island in each run
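
A permutation test along these lines might look like the sketch below. It assumes significance is estimated as the fraction of permuted runs whose highest-scoring island matches or beats the observed island score, which is one common reading of the slide; the function names and the fixed seed are illustrative.

```python
import random

def max_island_score(x):
    """Score of the highest-scoring island in x (0 if none is positive)."""
    best = running = 0.0
    for value in x:
        running = max(running + value, 0.0)
        best = max(best, running)
    return best

def permutation_p_value(x, observed_score, runs=1000, seed=0):
    """Fraction of permuted runs whose top island reaches observed_score."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    shuffled = list(x)
    hits = 0
    for _ in range(runs):
        rng.shuffle(shuffled)
        if max_island_score(shuffled) >= observed_score:
            hits += 1
    return hits / runs
```

Shuffling destroys the ordering along the genome while keeping the same set of values, so a real island should score higher than almost all permuted runs.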

Page 14: Motivation

Experiment

• Test Group
– DNA from subjects with well-characterized monosomies

• Control groups

• Data analyzed using 2 methods
– Threshold
– SW-ARRAY

Page 15: Motivation

Experiment Results

• Threshold Method
– 78.1% correct identification of copy-number changes

• SW-ARRAY
– Identified 13/14 of the monosomic regions with high significance levels in the 14 blind tests

Page 16: Motivation

Ideal Conditions for SW-ARRAY

• Numerous probes border the region of copy number change

• Long sequences, for which edge effects are minimized

Page 17: Motivation

Output