Sequence Design for DNA Computing
2004. 10. 16, Advanced AI
Soo-Yong Shin and Byoung-Tak Zhang
Biointelligence Laboratory
© 2004, SNU Biointelligence Lab, http://bi.snu.ac.kr/
DNA
A single-stranded DNA molecule is a sequence over four possible nucleotides
Hydrogen bonds Hybridization
Watson-Crick Complement
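
As a small illustration of Watson-Crick complementarity, the following Python sketch computes the complement of a strand. The function names are hypothetical and chosen only for this example; the strand ATGCATGCAT is the one used later in these slides.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement; if the input is read 5'->3', the result reads 3'->5'."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The complementary strand rewritten in the 5'->3' direction."""
    return complement(strand)[::-1]


print(complement("ATGCATGCAT"))          # TACGTACGTA
print(reverse_complement("AACCTTGGAC"))  # GTCCAAGGTT
```

Two strands hybridize perfectly when one is the reverse complement of the other, which is why most design criteria in these slides compare a strand against reversed or complemented versions of the others.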
Glossary Terms (in these slides)
Duplex
♦ Double-stranded DNA
Library
♦ A set of DNA strands for DNA computing
Role of DNA Sequences
Short DNA strands are the units of information storage and manipulation in a computation process.
♦ Just like a computer memory
Usually, in DNA computing, the solution to the given problem is a long strand, typically a concatenation of short DNA strands.
Role of DNA Sequences
DNA bases → DNA strands: each strand represents a city ⇒ information
Concatenation of DNA strands ⇒ Computing process
We have to design DNA strands very carefully.
Sequence Design
Design a set of sequences that correctly assemble into the desired longer molecules.
♦ Each strand forms a stable duplex only with its complement.
♦ Two distinct strands are non-interacting:
- between pairs of strands
- between a strand and the Watson-Crick complement of another
♦ Non-interacting means relatively unstable compared with any perfectly matched duplex formed from a DNA strand and its complement.
Sequence Design
Strands:
5’-ATGCATGCAT-3’
3’-TACGTACGTA-5’
5’-AACCTTGGAC-3’
3’-TAGGATCAGA-5’
Desired output:
5’-ATGCATGCAT-3’
3’-TACGTACGTA-5’
Unexpected output:
5’-ATGCATGCAT-3’
3’-TAGGATCAGA-5’
ΔG (unexpected duplex) > ΔG (desired duplex)
Sequence Design Example
Does R follow from P ∧ Q → R, S ∧ T → Q, S, T, P?
S: CTCA
P: ACGT
T: GTTA
¬R: TGAC
¬Q ∨ ¬P ∨ R: GACT TGCA ACGT
¬S ∨ ¬T ∨ Q: GAGT CAAT CTGA
15-mer for each variable
Sequence Design Example
¬Q ∨ ¬P ∨ R: 5’→3’
Q ∨ ¬T ∨ ¬S: 3’→5’
S: 5’→3’
T: 3’→5’
P: 5’→3’
R: 5’→3’
¬R: 3’→5’
CGT ACG TAC GCT GAA CTG CCT TGC GTT GAC TGC GTT CAT TGT ATG
GTC AAC GCA AGG CAG
TTC AGC GTA CGT ACG TCA ATT TGC GTC AAT TGG TCG CTA CTG CTT
AAG CAG TAG CGA CCA
ATT GAC GCA AAT TGA
CAT ACA ATG AAC GCA
TGC GTT CAT TGT ATG
Sequence Design Example - Reaction in a Test Tube
Strands in the tube: R ∨ ¬P ∨ ¬Q, Q ∨ ¬T ∨ ¬S, S, T, P, ¬R
Sequence Design Example - Hybridization and Ligation
R ∨ ¬P ∨ ¬Q: GTA TGT TAC TTG CGT CAG TTG CGT TCC GTC AAG TCG CAT GCA TGC
¬R: CAT ACA ATG AAC GCA
S: ACC AGC GAT GAC GAA
Q ∨ ¬T ∨ ¬S: TTC AGC GTA CGT ACG TCA ATT TGC GTC AAT TGG TCG CTA CTG CTT
T: AGT TAA ACG CAG TTA
P: GTC AAC GCA AGG CAG
Research Directions
Theoretical models, to study general properties of libraries
Theoretical models, to estimate bounds on the size of a library
Algorithms to design the libraries
Software tools
Theoretical Models
Results related to the coding problem can be derived from the classical theory of codes.
♦ e.g., error-correcting codes
Watson-Crick complementarity is a new feature to be considered.
H-system or splicing system
Strand Design Criteria
Preventing undesired reactions
Controlling the secondary structures
Controlling the chemical characteristics of the library
Restricting DNA sequences
Preventing Undesired Reactions
Forces the library to form duplexes only between a given DNA strand and its complement.
♦ Hamming distance
♦ Reverse complement Hamming distance
♦ Similarity
♦ H-measure
♦ 3’-end H-measure
Preventing Undesired Reactions
Similarity
♦ Simple Hamming distance with (or without) position shifts
♦ Compares a sequence with other sequences in the same direction
5’-ATGCATGC-3’
5’-ACCAATCG-3’
Similarity = 3
5’-ATGCATGC-3’
5’-ACCAATCG-3’ (with a position shift)
Similarity = 2
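
The similarity values above can be reproduced by counting positions where two strands, both read 5’→3’, carry the same base. The sketch below assumes that interpretation (a match count rather than a mismatch count); the function names and the convention of taking the maximum over shifts are assumptions made for this example, not necessarily the exact definition used in NACST.

```python
def matches_at_shift(x: str, y: str, shift: int = 0) -> int:
    """Count positions i where x[i] == y[i + shift], over the overlapping region only."""
    return sum(
        1
        for i, base in enumerate(x)
        if 0 <= i + shift < len(y) and base == y[i + shift]
    )


def similarity(x: str, y: str, max_shift: int = 0) -> int:
    """Largest match count over all shifts in [-max_shift, max_shift] (assumed convention)."""
    return max(matches_at_shift(x, y, s) for s in range(-max_shift, max_shift + 1))


print(matches_at_shift("ATGCATGC", "ACCAATCG"))     # 3 (no shift)
print(matches_at_shift("ATGCATGC", "ACCAATCG", 3))  # 2 (one possible shift)
```

A library with low pairwise similarity keeps strands from being mistaken for one another when they hybridize with each other's complements.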
Preventing Undesired Reactions
H-measure
♦ Simple Hamming distance with (or without) position shifts
♦ Compares a sequence with other sequences in the opposite direction
♦ Ensures that duplexes form only at the planned positions
5’-ATGCATGC-3’
3’-GCTAACCA-5’
H-measure = 1
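
The H-measure example can be checked by laying the second strand antiparallel to the first and counting Watson-Crick pairs. This is a minimal sketch under that reading of the slide; `h_measure` and its `shift` parameter are hypothetical names, and both inputs are assumed to be written 5’→3’.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def h_measure(x: str, y: str, shift: int = 0) -> int:
    """Count complementary base pairs when y is placed antiparallel under x.

    Both x and y are given 5'->3'; y is reversed so that it reads 3'->5'.
    """
    y_rev = y[::-1]
    return sum(
        1
        for i, base in enumerate(x)
        if 0 <= i + shift < len(y_rev) and PAIR[base] == y_rev[i + shift]
    )


# 3'-GCTAACCA-5' from the slide is ACCAATCG when written 5'->3'.
print(h_measure("ATGCATGC", "ACCAATCG"))      # 1
print(h_measure("ATGCATGCAT", "ATGCATGCAT"))  # 10 (a strand that is its own reverse complement)
```

A high H-measure between two strands that are not meant to pair signals a likely cross-hybridization.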
Controlling the Secondary Structures
Secondary structures are usually formed by the interaction of single-stranded DNA.
♦ Internal loop, hairpin loop, bulge loop, and so on.
Prediction methods
♦ Thermodynamic approach
♦ Continuity
Depending on the target problem, secondary structures may be encouraged or prohibited.
DNA Structures
SantaLucia Jr., Annu. Rev. Biophys. Biomol. Struct. 2004, 33:415-440, Fig. 1
DNA Structures for DNA Computing
Self-assembly computation
♦ Winfree et al., Nature, 394: 539-544
DNA Structures for DNA Computing
Whiplash PCR
Controlling the Secondary Structures
Thermodynamic approach
♦ Based on nearest neighbor parameters and dynamic programming
♦ Mfold
♦ Vienna RNA package
Controlling the Secondary Structures
Continuity
♦ Limit runs of the same base to at most a threshold length.
♦ If the same base appears continuously, a reaction is not well controllable since the structure of DNA will become unstable.
5’-ATGGGGGCATGC-3’
Continuity = 5
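
The continuity value in the example equals the length of the longest run of a single base. A minimal sketch of that reading follows; the function names and the threshold check are assumptions for illustration, and an actual design tool may score continuity differently (for example, by penalizing every run above the threshold).

```python
from itertools import groupby


def longest_run(seq: str) -> int:
    """Length of the longest stretch of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def exceeds_threshold(seq: str, threshold: int) -> bool:
    """True if some base repeats more than `threshold` times in a row."""
    return longest_run(seq) > threshold


print(longest_run("ATGGGGGCATGC"))           # 5
print(exceeds_threshold("ATGGGGGCATGC", 3))  # True
```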
Controlling the Chemical Characteristics of Library
It is desirable for the strands to have similar chemical characteristics so that DNA operations succeed.
♦ Free energy
♦ Melting temperature
♦ GC ratio
Controlling the Chemical Characteristics of Library
Free energy
♦ The energy to make a duplex
♦ Actually, it is defined as the energy required to break a duplex
♦ The most reliable measure for the relative stability of a DNA duplex
♦ Easily calculated by the nearest neighbor model
Controlling the Chemical Characteristics of Library
Nearest neighbor parameters
SantaLucia Jr., Annu. Rev. Biophys. Biomol. Struct. 2004, 33:415-440, Table 1
Controlling the Chemical Characteristics of Library
Melting temperature
♦ The temperature at which 50% of the DNA strands and their perfect complements are in duplex.
$T_m = \dfrac{\Delta H^\circ}{\Delta S^\circ + R \ln(|C_T|/4)}$
$\Delta H^\circ = \Delta H^\circ_{ends} + \Delta H^\circ_{init} + \sum_{k \in \{stacks\}} \Delta H^\circ_k$
$\Delta S^\circ = \Delta S^\circ_{ends} + \Delta S^\circ_{init} + \sum_{k \in \{stacks\}} \Delta S^\circ_k$
ΔH: enthalpy
ΔS: entropy
R: gas constant (1.987 cal/(K mol))
[C_T]: total molar strand concentration
T_m: in Kelvin
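
To make the melting temperature formula concrete, here is a small sketch that evaluates T_m from already-summed ΔH° and ΔS° values. The nearest neighbor table itself is not reproduced; the numbers passed in below are hypothetical placeholders rather than parameters from the cited table, and the function name is an assumption for this example.

```python
import math

R = 1.987  # gas constant in cal/(K mol)


def melting_temperature(delta_h: float, delta_s: float, total_conc: float) -> float:
    """T_m in Kelvin from total enthalpy and entropy of duplex formation.

    delta_h:    total enthalpy in cal/mol (ends + init + stacks)
    delta_s:    total entropy in cal/(K mol) (ends + init + stacks)
    total_conc: total molar strand concentration C_T
    """
    return delta_h / (delta_s + R * math.log(total_conc / 4.0))


# Hypothetical totals for illustration only.
tm = melting_temperature(delta_h=-60000.0, delta_s=-160.0, total_conc=1e-6)
print(f"{tm:.1f} K = {tm - 273.15:.1f} C")
```

In a library design loop, the same function would be applied to every strand so that the spread of T_m values can be kept small.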
Controlling the Chemical Characteristics of Library
GC ratio
♦ The percentage of G or C in a whole DNA sequence
♦ The simplest method, but unreliable
5’-ATGGTTGCATGC-3’
GC ratio = 50%
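
The GC ratio in the example is a direct count, sketched below; the function name is chosen for this illustration.

```python
def gc_ratio(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


print(f"{gc_ratio('ATGGTTGCATGC'):.0%}")  # 50%
```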
Restricting DNA Sequences
Restriction of the composition (DNA base or subsequence) of a DNA sequence.
♦ One of the four DNA bases is reserved for special purposes.
♦ Special DNA sequences such as restriction enzyme sites
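
Both kinds of restriction can be checked with simple string tests. The sketch below is one possible way to do it; the function names are hypothetical, the reserved base (G) is an arbitrary choice, and GAATTC (the EcoRI recognition site) is used only as an example of a special subsequence.

```python
def uses_only(seq: str, allowed: str) -> bool:
    """True if every base in seq belongs to the allowed alphabet."""
    return set(seq) <= set(allowed)


def forbidden_hits(seq: str, forbidden) -> list:
    """Return the forbidden subsequences that occur somewhere in seq."""
    return [site for site in forbidden if site in seq]


# Example: G is reserved, so strands use only A, C, and T.
print(uses_only("ACTTCAACT", "ACT"))             # True
# Example: reject strands that contain an EcoRI site.
print(forbidden_hits("TTGAATTCAA", ["GAATTC"]))  # ['GAATTC']
```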
Sequence Design Algorithm
Exhaustive search
Random search
Template-map strategy
Graph method
Stochastic methods
Dynamic programming
Evolutionary algorithms
Biologically inspired methods
Sequence Design Algorithm
Exhaustive search
♦ Hartemink et al. DNA4, pp. 227-235, 1998.
Random search
♦ Penchovsky and Ackermann, Journal of Comput. Biology, 10(2): 215-229, 2003.
Sequence Design Algorithm
Template-map strategy
♦ Wenbin Liu et al. J. Chem. Inf. Comput. Sci. 2003, 43, 2014-2018
Template 10010011
Map 10100101
Sequence TCGACGAT
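
The template and map combine bit by bit into a base. The lookup table in this sketch is inferred only from the single example above (template bit 1 selects A or T, template bit 0 selects C or G, and the map bit picks one of the two); it is an assumption for illustration and may differ from the convention in the cited paper.

```python
# (template bit, map bit) -> base, inferred from the slide's example.
BASE = {
    ("1", "1"): "T",
    ("1", "0"): "A",
    ("0", "0"): "C",
    ("0", "1"): "G",
}


def combine(template: str, mapping: str) -> str:
    """Build a DNA sequence from a template bit string and a map bit string."""
    if len(template) != len(mapping):
        raise ValueError("template and map must have the same length")
    return "".join(BASE[(t, m)] for t, m in zip(template, mapping))


print(combine("10010011", "10100101"))  # TCGACGAT
```

One appeal of this split is that the template fixes where the A/T and G/C positions fall, while the map can be varied to generate many strands that share that pattern.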
Sequence Design Algorithm
Graph method
♦ Feldkamp et al. GPEM 4: 153-171, 2003.
Sequence Design Algorithm
Stochastic methods
♦ Simulated annealing
♦ Tanaka et al., DNA7, pp. 179-188, 2001.
Dynamic programming
♦ Marathe et al., DNA5, pp. 75-89, 1999.
Sequence Design Algorithm
Evolutionary algorithm
♦ Deaton et al., Physical Review Letters, 80(2): 417-420, 1998.
♦ Shin et al., IEEE Trans. Evolutionary Computation.
Sequence Design Algorithm
Biologically inspired methods
♦ Deaton et al. DNA8, pp. 196-204, 2002.
Requirements of DNA Sequence Design Systems
Sequence reliability
User friendliness
Analysis capability
Sequence reusability
NACST/Seq
Sequence Generator
♦ Based on MOEA (multi-objective evolutionary algorithm)
♦ Using 6 objectives
- GC Ratio, Tm, Continuity, Hairpin, H-measure, Similarity
Each run of MOEA
Selected Pareto optimal
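
Selecting the Pareto optimal candidates means keeping every candidate that no other candidate beats on all objectives at once. The sketch below shows a plain non-dominated filter, assuming every objective is to be minimized; it is not the NACST/Seq implementation, and the objective vectors are made-up numbers.

```python
def dominates(a, b) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (H-measure, similarity) scores for four candidate pools.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(candidates))  # [(1, 5), (2, 2), (5, 1)]
```

With six objectives the same filter applies unchanged, since `dominates` compares tuples of any length.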
Sequence Generation
① Generation Options
② Sequence Structure
③ Sequence Options
④ Fitness Options
⑤ Fitness Parameter Setting
⑥ Genetic Algorithm Options
NACST/Report
① Show Fitness Result
- Shows each objective value of the generated pools.
- The user can get fitness information for each pool.
② Compare two pools
- Visualizes which pool is superior for each fitness measure by comparing the sequences of the two pools.
- The user can select pools arbitrarily.
③ Analyze a pool
- Shows each nucleotide.
- Finds the given subsequence.
- Finds the given complementary sequence.
- Finds continuous occurrences of each nucleotide.
- The user can choose a pool arbitrarily.
NACST/Plot
① Project Plot
- Plots fitness results.
- Plots the comparison result of two pools.
- The user can browse the plotting history.
- Plotted graphs can be saved as PostScript files.
② Data Plot
- Plots arbitrary data from a given file.
Figures: comparison graph, fitness results graph, data plot