Model for Evaluation of DNA Synthesis Created by: Ori Kaplan Gilad Myerson Supervised by: Gregory...

Post on 06-Jan-2018



Model for Evaluation of DNA Synthesis

Created by: Ori Kaplan, Gilad Myerson

Supervised by: Gregory Linshiz, Weizmann Institute; Prof. Udi Shapiro, Weizmann Institute

Synthesizing DNA

Currently, there are few successful ways of synthesizing DNA.

Most common: Assembly PCR.

Methods are costly and slow (about 3 weeks from order to delivery of a DNA strand).

Synthesizers shown: ABI 3900, Mer-Made 6

New Approach

Prof. Udi Shapiro / Gregory Linshiz: New confidential method of in-vitro DNA molecule synthesis.

Goal – synthesize DNA quicker, easier and cheaper.

Part of this method involves elongation of oligonucleotides.

Elongation success rate (until now) ≈ 80-90%.

New Approach

Elongation of DNA includes…..

Since the elongation of oligonucleotides in-vitro is done on the pattern of synthetic DNA strands, we will give a brief explanation of synthetic oligonucleotide synthesis.

Oligonucleotide synthesis is a remarkably simple process that has far-reaching implications. It is extremely useful in laboratory procedures, where it is used to make the primers that are crucial in methods such as PCR replication. Custom oligonucleotides are additionally useful because they bind only to the region of DNA that is complementary to their sequence, which allows specific segments of DNA to be amplified. Custom synthesis also allows other sequences, such as restriction sites, to be added on to the desired oligonucleotide. Custom oligonucleotides are generally 50 bases in length, which can limit how many additional sequences can be added on to the desired primer sequence.

Oligonucleotides are synthesized using DNA phosphoramidite monomer bases as building blocks. The active sites of the monomer bases are all chemically blocked in such a way that they can be unblocked at will with unblocking solutions. The synthesis involves 4 stages:

Stage 1: Deblocking
The first base, which is attached to the solid support, is at first inactive because all of its active sites have been blocked or protected. To add the next base, the DMT group protecting the 5'-hydroxyl group must be removed. This is done by adding an acid. The 5'-hydroxyl group is now the only reactive group on the base monomer, which ensures that the next base will bind only to that site.

Stage 2: Base condensation
The next base monomer cannot be added until it has been activated. This is achieved by adding tetrazole to the base. The active 5'-hydroxyl group of the preceding base and the newly activated phosphorus bind to loosely join the two bases together.

Stage 3: Capping
Any unbound, still-active 5'-hydroxyl group is capped with a protective group, which subsequently prohibits that strand from growing again. This is done by adding acetic anhydride and N-methylimidazole to the reaction column.

Stage 4: Oxidation
To stabilize the newly formed linkage, a solution of dilute iodine in water, pyridine, and tetrahydrofuran is added to the reaction column, oxidizing the linkage to a stronger phosphate linkage.

Top Secret

Sequencing

After the DNA synthesis procedure, sequencing the new molecules will indicate whether the right molecule was synthesized.

A chromatogram of DNA synthesis:

Chromatogram

What does a chromatogram portray?

“Clean” chromatogram – all molecules are identical (e.g. all A at a position)

“Noisy” chromatogram – ambiguous (e.g. some A, some T at a position)

The problem

Let's assume this is the sequencing result (a chromatogram with noise):

I. Is the experiment successful???
II. What needs to be changed in order to improve the method? pH, temp, polymerase, dNTPs, concentrations…

The problem contd.

Which result is better…?

Conventional Analysis

CLONE TO UNDERSTAND THE SEQUENCING

Isolation cloning: isolate single molecules and read their exact sequence.

Cloning several oligos gives an insight into the method's degree of success.

Theoretically, clone all of them in order to see whether the experiment was successful.

Weizmann’s request

Cloning – very long, hard and expensive.

Please try to figure out a way to assess the degree of success “visually” using the chromatogram…

OK…

If we smile, will they think we understand???

We'll try anyway.

OK…

I've got it, I've got it, I've got it...

A Solution ???

Let's treat the graph like LEGO© and see what we can do with the pieces…

Perfect Sequencing

A C T G

C A C T G A C A C G C T T A C T G C C G

10 molecules

Mutations occur

“Dirty” chromatogram

Mutations marked: deletion (×3), insertion, substitution

Two ways to try to understand the graph

Sequence every single oligonucleotide (isolation cloning) – Impossible

Sample sequencing and assessment of the result – Statistically inaccurate

Another Option

Mathematically “build” oligonucleotide molecules in such a way that the accumulated graph of those molecules will be identical to the chromatogram.

Graph → Table

Per-position base counts (column breaks lost):
A 1018271361
G 936136136
C 1010282646113613971
T 1013961361

If I had 10 molecules, each 20 nucleotides long, how many bases of each kind do I have at each position?
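The question on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `count_matrix` and the choice to ignore positions past a shortened molecule's end are assumptions, not the authors' implementation. The reference sequence is the 20-base example from the "Perfect Sequencing" slide.

```python
BASES = "AGCT"  # row order used in the slide's table

def count_matrix(molecules, length):
    """Return {base: [count at position 0, 1, ...]} for the given molecules."""
    table = {base: [0] * length for base in BASES}
    for molecule in molecules:
        # A molecule shortened by a deletion contributes nothing to the
        # trailing positions; one lengthened by an insertion is truncated.
        for position, base in enumerate(molecule[:length]):
            table[base][position] += 1
    return table

reference = "CACTGACACGCTTACTGCCG"     # 20 bases, from the slides
perfect = [reference] * 10             # 10 identical molecules
table = count_matrix(perfect, len(reference))

# One molecule with its first base deleted shifts left by one position.
mixed = [reference] * 9 + [reference[1:]]
mixed_table = count_matrix(mixed, len(reference))
```

With ten identical molecules every column holds a single 10; a single deletion already spreads counts across rows, which is the "dirty" picture the slides describe.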

Table → Molecules

Random procedure

Molecules → Graph

New Problem

How do we choose the 100 molecules that build the graph?

Linear – too many options to check: $O(4^n)$!

Choose 100 from $4^n$.

If the oligo is 100 nucleotides long, $n = 100$: choose 100 molecules from $4^{100} \approx 1.6 \times 10^{60}$.

$$\binom{4^n}{k} = \binom{1.6 \times 10^{60}}{100} \approx \ldots$$

OK…

Smile – maybe they'll give us 100

We'll try...

OK…

I've got it, I've got it, I've got it...

The problem

Don’t choose from all possibilities; assume that each molecule has only one mutation – Edit Distance 1 (ED1).

Reduced molecules: $4^n \to 8n$

Select 100 molecules from 800 (instead of $1.6 \times 10^{60}$).
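A minimal sketch of what an "Edit Distance 1" molecule bank could look like, assuming the three mutation types named in the slides (deletion, insertion, substitution). The function name `edit_distance_one_bank` is hypothetical. There are n deletions, 3n substitutions and 4(n+1) insertions, i.e. 8n + 4 raw edits, which is where an estimate of roughly 8n comes from. Because this sketch stores results in a set, edits that produce the same sequence collapse into one entry, so the bank can be somewhat smaller.

```python
BASES = "ACGT"

def edit_distance_one_bank(reference):
    """Return the set of sequences exactly one edit away from `reference`."""
    bank = set()
    n = len(reference)
    for i in range(n):
        # Deletion of the base at position i.
        bank.add(reference[:i] + reference[i + 1:])
        # Substitution of the base at position i by each of the other bases.
        for base in BASES:
            if base != reference[i]:
                bank.add(reference[:i] + base + reference[i + 1:])
    for i in range(n + 1):
        # Insertion of each base before position i (i == n appends).
        for base in BASES:
            bank.add(reference[:i] + base + reference[i:])
    return bank

reference = "CACTGACACGCTTACTGCCG"   # the 20-base example from the slides
bank = edit_distance_one_bank(reference)
print(len(bank), "distinct ED1 molecules for a", len(reference), "base oligo")
```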

Still a problem

How do we choose 100 molecules from 800?

Linear:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \binom{800}{100} \approx 3 \times 10^{129} \text{ possibilities}$$
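The two search-space sizes quoted in the slides can be checked with exact integer arithmetic. The sketch below assumes Python 3.8 or later (for `math.comb`); the helper name `log10_choose` is made up for this example.

```python
import math

def log10_choose(n, k):
    """Return log10 of the binomial coefficient C(n, k), using exact integers."""
    return math.log10(math.comb(n, k))

oligo_length = 100                  # n in the slides
chosen = 100                        # molecules per candidate result

all_sequences = 4 ** oligo_length   # every possible sequence of length n
ed1_bank_size = 8 * oligo_length    # the slides' Edit Distance 1 estimate

print(f"4^n          ~ 10^{math.log10(all_sequences):.1f}")
print(f"C(4^n, 100)  ~ 10^{log10_choose(all_sequences, chosen):.1f}")
print(f"C(800, 100)  ~ 10^{log10_choose(ed1_bank_size, chosen):.1f}")
```

The ED1 figure comes out at about 10^129.5, in line with the 3*10^129 on the slide. The unrestricted figure is thousands of orders of magnitude larger. Either way, exhaustive checking is out of reach, which motivates the heuristic search that follows.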

Genetic Algorithm

Genetic algorithm

Define initial mutation rates: deletions, insertions (?), substitutions (?).

Normalize graph and convert graph to matrix (4×n).

Build a molecule bank of “Edit Distance 1”.
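The slides do not say how the graph is normalized. One plausible reading, shown here purely as an assumption, is to scale the four signal heights at each position so that every column of the 4×n matrix sums to the number of molecules being modelled. The function name `normalize_columns` and the sample numbers are illustrative only.

```python
def normalize_columns(trace, molecules=100):
    """Scale each column of a 4 x n matrix so that it sums to `molecules`."""
    n = len(trace[0])
    normalized = [[0.0] * n for _ in trace]
    for j in range(n):
        total = sum(row[j] for row in trace)
        if total == 0:
            continue  # no signal at this position; leave the column at zero
        for i, row in enumerate(trace):
            normalized[i][j] = molecules * row[j] / total
    return normalized

# Hypothetical raw peak heights, rows in the order A, G, C, T.
raw = [
    [200, 0],
    [0, 50],
    [0, 150],
    [0, 0],
]
matrix = normalize_columns(raw, molecules=10)
```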

Population

There is a population of 100 – each entity in the population represents a single result.

Each result consists of 100 molecules (from the ED1 bank) that build up a graph.

The population is initialized using the mutation rate.

Diagram labels: 100 × 100, one result

Evaluation function

The current evaluation function is:

$$F(e) = \sum_{i,j} \left| M_{ij} - R_{ij} \right|$$

In the future the function will take the number of substitutions into consideration.

Diagram labels: F(e), ∑, experiment result
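As a minimal sketch, the evaluation function is a cell-by-cell sum of absolute differences between two 4×n matrices. The slides do not define M and R; here it is assumed that one is the matrix accumulated from a candidate's molecules and the other is the matrix derived from the experimental chromatogram. The name `evaluate` is hypothetical.

```python
def evaluate(candidate_matrix, target_matrix):
    """Sum of absolute cell-wise differences; 0 means a perfect match."""
    return sum(
        abs(m - r)
        for m_row, r_row in zip(candidate_matrix, target_matrix)
        for m, r in zip(m_row, r_row)
    )

# Tiny 4 x 3 example (rows A, G, C, T); the numbers are made up.
experiment = [[10, 0, 1], [0, 9, 0], [0, 1, 9], [0, 0, 0]]
candidate = [[10, 0, 0], [0, 10, 0], [0, 0, 10], [0, 0, 0]]
score = evaluate(candidate, experiment)
```

A lower score is better under this reading, so the genetic algorithm is minimizing F(e).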

Generation

Generation Policy (current):

Replication – Always replicate best 10.
Crossover – Biased choice of entities for crossover.
Mutations – i: mutate best 10.
ii: randomly mutate the whole population.

Local Minimum Policy: 20 generations without improvement – shake the population.
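The policy above can be sketched as a small generic loop. Everything beyond the numbers on the slide (best 10, 20 stalled generations) is an assumption: the rank-based parent weighting, the 5% mutation chance, the way `shake` perturbs the population, and the toy digit-matching problem used to exercise the loop are illustrative stand-ins, not the authors' code.

```python
import random

def next_generation(population, fitness, crossover, mutate, rng,
                    elite=10, mutation_chance=0.05):
    """Build one new generation; lower fitness is better."""
    ranked = sorted(population, key=fitness)
    size = len(ranked)
    best = ranked[:elite]
    new_population = list(best)                         # replication
    new_population += [mutate(e, rng) for e in best]    # mutations (i)
    weights = [size - rank for rank in range(size)]     # bias toward the best
    while len(new_population) < size:
        mother, father = rng.choices(ranked, weights=weights, k=2)
        child = crossover(mother, father, rng)
        if rng.random() < mutation_chance:              # mutations (ii)
            child = mutate(child, rng)
        new_population.append(child)
    return new_population[:size]

def shake(population, fitness, mutate, rng):
    """Keep the best entity and mutate all others to leave a local minimum."""
    ranked = sorted(population, key=fitness)
    return ranked[:1] + [mutate(e, rng) for e in ranked[1:]]

def evolve(population, fitness, crossover, mutate, rng,
           generations=200, patience=20):
    best_score = min(map(fitness, population))
    stalled = 0
    for _ in range(generations):
        population = next_generation(population, fitness, crossover, mutate, rng)
        score = min(map(fitness, population))
        if score < best_score:
            best_score, stalled = score, 0
        else:
            stalled += 1
        if stalled >= patience:                         # local minimum policy
            population = shake(population, fitness, mutate, rng)
            stalled = 0
    return min(population, key=fitness)

# Toy problem (hypothetical): evolve a list of digits toward TARGET.
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]

def toy_fitness(entity):
    return sum(abs(a - b) for a, b in zip(entity, TARGET))

def toy_crossover(mother, father, rng):
    cut = rng.randrange(1, len(mother))
    return mother[:cut] + father[cut:]

def toy_mutate(entity, rng):
    copy = list(entity)
    copy[rng.randrange(len(copy))] = rng.randrange(10)
    return copy

rng = random.Random(0)
start = [[rng.randrange(10) for _ in TARGET] for _ in range(100)]
best = evolve(start, toy_fitness, toy_crossover, toy_mutate, rng)
```

In the setting of the slides, an entity would be a list of 100 molecules drawn from the ED1 bank and the fitness would be F(e). The best entities are carried over unchanged, so the best score never gets worse from one generation to the next.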

File Handling

Sequence data is initially in *.ab1 files. In order to utilize the data:

Retranslate *.ab1 file – Sequencing Analysis

Convert *.ab1 → *.txt – BioEdit

Manage *.txt – Excel (also calculate del rate)

Genetic Algorithm

Result slides (figures not included in the transcript):

No mutations - before

No mutations - after

10*del at 1, 10*del at 9 (four slides)

15 scattered subs - before

15 scattered subs

Setbacks

ED1 – Result will never be 100% correct.

Genetic Algorithm setbacks: heuristic, different final results, local min, evaluation function…

No indication if results are correct.

Algorithm deals with successful experiments.

Data input – noise interpretation, normalized data.

Advantages

New method of sequencing analysis.

Potentially save many hours of isolation cloning.

Mathematically – result is correct.

Development potential for different areas of research.

Personal View

Thrown into deep water, and swam.

Idea will (hopefully) be practical and useful.

Learned a great deal – new programs, languages, methods.

Mathematical analysis of chromatogram sequencing – ever done before???

Thank you