DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Page 1: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA Computing

Charles Ormsby III

CSE 497

4/15/2004

Page 2: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Outline

• DNA Computing Characteristics

• Different Approaches

• Lipton’s Paper

– DNA Solution of Hard Computational Problems

• Practical Purposes

• Future Work/Funding

• References

Page 3: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA Computing Characteristics

(Advantages & Disadvantages)

Page 4: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA Computation Characteristics

Parallel Processing

Processes all possible solutions simultaneously!

Well kind of, but it is not instantaneous

AND, it is a Physical Process!

Therefore, the molecular steps required to process the solution set can take weeks

But we are finding ways to improve time efficiency! More to come

Page 5: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA Computation Characteristics

Read/Write Rate of DNA

DNA replication rate = 500 base pairs per second
- 10 times faster than human cells
- Very low error rates

But at 2 bits per base pair that is only 1000 bits/sec. Compared to the data throughput of an average hard drive? SLOW!!!

Can anyone think of an advantage that DNA-based computers might have over the way today’s PCs interact with memory?

http://www.arstechnica.com/reviews/2q00/dna/dna-2.html

Page 6: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA Computation Characteristics

…YES, copies of the replication enzymes can work on DNA in parallel

*Bonus* - Replication enzymes can start on the second replicated strand of DNA even before they're finished copying the first one. So already the data rate jumps to 2000 bits/sec

Electronic computers are incapable of such a feat!

http://www.arstechnica.com/reviews/2q00/dna/dna-2.html

Page 7: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA Computation Characteristics

Read/Write Rate of DNA (cont’d)

Look what happens after each replication iteration
– The number of DNA strands increases exponentially
  • 2^n after n iterations
– The data rate increases by 1000 bits/sec per strand

After 10 iterations, replication rate = about 1 Mbit/sec

And after 30 iterations it increases to about 1000 Gbits/sec

This is well beyond the sustained data rates of the fastest hard drives!!!

http://www.arstechnica.com/reviews/2q00/dna/dna-2.html
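
The doubling arithmetic on this slide can be checked with a short sketch. It assumes 2 bits per base (four possible bases) and that every strand is copied at the quoted 500 bases per second with no shortage of enzymes or nucleotides. The function and constant names are illustrative, not taken from the source.

```python
BITS_PER_BASE = 2        # assumption: four bases {A, T, C, G} carry 2 bits each
BASES_PER_SECOND = 500   # replication rate quoted on the slide


def replication_data_rate(iterations: int) -> int:
    """Aggregate copy rate in bits/sec after the given number of doublings."""
    strands = 2 ** iterations                      # strand count doubles each iteration
    return strands * BASES_PER_SECOND * BITS_PER_BASE


if __name__ == "__main__":
    for n in (0, 1, 10, 30):
        print(f"{n:2d} iterations: {replication_data_rate(n):,} bits/sec")
```

This is an idealized upper bound: it ignores the time each doubling takes and any physical limit on reagents.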

Page 8: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA Computation Characteristics

Data density – { A, T, C, G}

Bases spaced every 0.35 nanometers

1 dimension = 18 Mbits per inch

2 dimensions = over one million Gbits per square inch

(assuming one base per square nanometer)

Typical high-performance hard drive data density = 7 Gbits per square inch

A factor of over 100,000 smaller!!

http://www.arstechnica.com/reviews/2q00/dna/dna-2.html
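
The density figures can be reproduced with a few lines of arithmetic. This sketch assumes 2 bits per base, the 0.35 nm spacing and one-base-per-square-nanometer packing quoted on the slide, and the 7 Gbit per square inch hard drive figure. Under these assumptions the linear density comes out near 145 Mbit (about 18 megabytes) per inch, so the slide's "18" may be a byte figure; that reading is an inference, not something the source states.

```python
NM_PER_INCH = 2.54e7          # 1 inch = 2.54 cm = 2.54e7 nm
BITS_PER_BASE = 2             # assumption: 4 bases -> 2 bits each
BASE_SPACING_NM = 0.35        # spacing quoted on the slide
HARD_DRIVE_BITS_PER_SQ_INCH = 7e9


def linear_density_bits_per_inch() -> float:
    """Bits stored along one inch of a single strand."""
    return NM_PER_INCH / BASE_SPACING_NM * BITS_PER_BASE


def areal_density_bits_per_sq_inch() -> float:
    """Bits per square inch, assuming one base per square nanometer."""
    return NM_PER_INCH ** 2 * BITS_PER_BASE


if __name__ == "__main__":
    areal = areal_density_bits_per_sq_inch()
    print(f"linear: {linear_density_bits_per_inch() / 1e6:.0f} Mbit per inch")
    print(f"areal:  {areal / 1e9:,.0f} Gbit per square inch")
    print(f"ratio to hard drive: {areal / HARD_DRIVE_BITS_PER_SQ_INCH:,.0f}x")
```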

Page 9: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA Computation Characteristics

Double-stranded nature
- Every DNA sequence has a natural complement

If S = ATTACGTCG

S' = TAATGCAGC, its complement

DNA’s complementary nature makes it a unique data structure for computation and can be exploited in many ways, such as Error Correction
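
As a small illustration of the complement rule and of how it supports error checking, here is a sketch using the base-by-base complement shown on the slide (not the reverse complement). The helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base Watson-Crick complement, as written on the slide."""
    return "".join(COMPLEMENT[base] for base in strand)


def find_mismatches(strand: str, partner: str) -> list[int]:
    """Positions where the partner strand is not the complement of the strand."""
    expected = complement(strand)
    return [i for i, (e, p) in enumerate(zip(expected, partner)) if e != p]


if __name__ == "__main__":
    s = "ATTACGTCG"
    print(complement(s))
    # A damaged partner strand: the last base was changed from C to G.
    print(find_mismatches(s, "TAATGCAGG"))
```

A mismatch shows that the two strands disagree at that position; deciding which strand is the damaged one requires extra information, which repair enzymes obtain from chemical cues this sketch does not model.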

Page 10: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA Computation Characteristics

DNA Error Rates
• Biological error rate: 1/10^9 copied bases
• Hard drive read error rate: 1/10^13

Error Correction: Errors occur due to many factors, for example…
– Incorrect insertions/deletions
– Damage from thermal energy and UV energy from the sun

However, if the error occurs in one of the strands of double-stranded DNA, repair enzymes can restore the proper DNA sequence by using the complement strand as a reference.

Comparable to a RAID 1 (mirrored) array

http://www.arstechnica.com/reviews/2q00/dna/dna-1.html

Page 11: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA Computation Characteristics

The Statistics of Randomness

Pertaining to Adleman’s method…

All candidate paths for the Hamiltonian Directed Path Problem (HDPP) are equally likely to be formed during the random production of sequences

In other words, over a large, well-distributed solution set, all solutions (or at least a great majority) should be present

*This is key because in order for the DNA computer to arrive at the correct solution, the solution must first exist in the solution set

Statistics – If only 99% of the solutions exist in the solution set, then the method will have a success rate of only…?

Page 12: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Different Approaches

Free Floating vs. DNA Chips

Page 13: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Free Floating

Approach 1: Bits of DNA float freely in a test tube – (pioneered by Leonard M. Adleman)

Page 14: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Free Floating

Advantages:
- Strong general problem-solving application
- Increased freedom in experimentation, i.e. immediate scalability by amplification
  (could the freedom also be considered a disadvantage?)
- Can encode unique problems
- Scales very well

Can you think of any other advantages?

HAHA, neither could I

Page 15: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA-based Chips

Approach 2: A gold-plated square of glass (one inch square) anchors as many as a trillion individual strands of DNA.

Microarrays

http://www.dhgp.de/ethics/ethics02.html

Page 16: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA-based Chips

Advantages:
- Easier to handle, specific orientation
- Keeps out impurities
- Serves as a building block to scale upwards
- Programmable interfaces (in the future)
- Very useful for storing information about bio-agents

Business Quiz:

Why is this approach more appealing to corporations and institutions who fund research?

Page 17: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

DNA-based Chips

Can be manufactured!!! =

$$$$$$$$$$$$$$$$

Page 18: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Lipton’s Paper

DNA Solution of Hard Computational Problems

Lipton, Richard J., DNA Solution of Hard Computational Problems. Science, New Series, Vol. 268, No. 5210 (April 28, 1995), 542-545

Page 19: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Richard Lipton’s: DNA Solution of Hard Computational Problems

Two factors limit any computer’s performance

1) Parallel processing capabilities: 3 grams of water contain approximately 10^22 molecules

2) Computations per unit time: 100 million instructions per second

Human Time vs. Computation Time


Page 20: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Richard Lipton’s: DNA Solution of Hard Computational Problems

State-of-the-Art Supercomputer
– 100 million instructions per second
– Biological computers are limited to only a fraction of an experiment per second
  • Doesn’t the complexity of the experiment determine the difference?

However, DNA computers counter the instruction time disparity with parallelism


Page 21: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Richard Lipton’s: DNA Solution of Hard Computational Problems

Traveling Salesman Revisited
– A conventional computer can solve a tour with 70 cities, but would fail with 100 or more cities
  • Even with 10^23 parallel processors, brute force is too inefficient

However, are DNA computers only advantageous for problems with very large solution sets?

No, Adleman’s work can be extended to solve, in much less time, problems both within and beyond the reach of traditional CPUs


Page 22: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Richard Lipton’s: DNA Solution of Hard Computational Problems

The Satisfiability Problem (SAT): NP-complete

SAT is a simple search problem, and was one of the first problems shown to be NP-complete

Consider:

F = (x ∨ y) ∧ (¬x ∨ ¬y)

Current Best Method: test all 2^n assignments for ‘n’ variables


Page 23: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Richard Lipton’s: DNA Solution of Hard Computational Problems

Truth Table

Current Best Method: test all 2^n assignments for ‘n’ variables
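
The truth table for the two-variable example F = (x OR y) AND (NOT x OR NOT y) can be generated by brute force, which is the 2^n enumeration the slide refers to. This is a minimal sketch; the function names are illustrative.

```python
from itertools import product


def formula(x: bool, y: bool) -> bool:
    """F = (x OR y) AND (NOT x OR NOT y) from the preceding slide."""
    return bool((x or y) and ((not x) or (not y)))


def truth_table() -> list[tuple[bool, bool, bool]]:
    """Evaluate F on all 2^n assignments (n = 2 here)."""
    return [(x, y, formula(x, y)) for x, y in product([False, True], repeat=2)]


if __name__ == "__main__":
    print("x y F")
    for x, y, f in truth_table():
        print(int(x), int(y), int(f))
```

Only the assignments 01 and 10 satisfy F, which matches the contents of test tube t6 in the extraction walk-through later in the deck.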


Page 24: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Richard Lipton’s: DNA Solution of Hard Computational Problems

Initial Assumptions/Conditions
– This model is simple and idealized
  • Ignores many known complex effects, but is an excellent first-order approximation
– Strands of DNA are just sequences
  • α1,…, αk over the alphabet {A,C,G,T}
– Double-stranded DNA is a pair of sequences
  • α1,…, αk and b1,…, bk, both over {A,C,G,T}, where for i = 1,…,k each αi must complement bi, meaning A pairs with T and C pairs with G
– Only consider strands with a length of 20 nucleotides


Page 25: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Richard Lipton’s: DNA Solution of Hard Computational Problems

Five simple operations that can be performed on test tubes that contain DNA strands

1) Possible to synthesize a large number of copies of any single strand

2) Annealing produces a double strand from a single strand and its complementary strand

3) Given a test tube of DNA, one can extract a strand that contains some simple pattern of length ‘l’

4) Using a Polymerase Chain Reaction (PCR), one can detect whether there are any DNA strands at all in the test tube

5) All of the DNA in the test tube may be amplified by replicating the strands in the test tube


Page 26: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

The Theory

One fixed test tube
– The set in the test tube corresponds to the following graph G_n


Page 27: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

All paths that travel from a_1 to a_(n+1) encode an ‘n’-bit binary string

At each stage, a path has exactly two choices
1) Unprimed node encodes a 1
2) Primed node encodes a 0

Therefore, the example path a_1 x’ a_2 y a_3 encodes 01
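
The path-to-bits rule can be sketched in a few lines. The sketch assumes that a path alternates between junction nodes (a_1, a_2, ...) and variable nodes, and that a primed node is written with a trailing ASCII apostrophe; the graph figure is not in this transcript, so that layout is an assumption, and the function names are hypothetical.

```python
from itertools import product


def decode_path(path: list[str]) -> str:
    """Read the bit string off a path: unprimed node -> 1, primed node -> 0."""
    variable_nodes = path[1::2]   # skip the junction nodes a1, a2, ...
    return "".join("0" if node.endswith("'") else "1" for node in variable_nodes)


def all_paths(variables: list[str]) -> list[list[str]]:
    """Enumerate every path from a1 to a(n+1) for the given variable names."""
    paths = []
    for choice in product([False, True], repeat=len(variables)):
        path = ["a1"]
        for k, (name, primed) in enumerate(zip(variables, choice), start=1):
            path.append(name + "'" if primed else name)
            path.append(f"a{k + 1}")
        paths.append(path)
    return paths


if __name__ == "__main__":
    print(decode_path(["a1", "x'", "a2", "y", "a3"]))
    for p in all_paths(["x", "y"]):
        print(" ".join(p), "->", decode_path(p))
```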

Page 28: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

The Solution Set Discovery

1) Encode graph’s vertices in DNA

2) Encode edges in DNA

3) Encode starting and ending points in DNA


Page 29: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Step 1 - vertices in DNA

• The graph is encoded in a test tube of DNA
  – Each vertex of the graph is assigned a random pattern of length ‘l’ from {A,C,G,T}

• Each encoding is referred to as the name of the vertex and consists of two parts

1st half: p_i

2nd half: q_i

Therefore, each vertex can be referenced by p_i q_i


Page 30: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Step 2 - edges in DNA

Then, fill a test tube with the following…

…For each vertex i, add many copies of a 5’→3’ DNA sequence of the form p_i q_i

…For each edge i → j, add many copies of a 3’→5’ sequence of the form ¬q_i ¬p_j, where ¬ denotes the base-by-base complement

If…
Vertex i = ATCGGCTACTCCTGACTTGA
p_i = ATCGGCTACT
q_i = CCTGACTTGA

Vertex j = AGGTTCAGTCAGGCCTATTC
p_j = AGGTTCAGTC
q_j = AGGCCTATTC

Therefore, for edge i → j a sequence like the following would be added…

¬q_i = GGACTGAACT + ¬p_j = TCCAAGTCAG
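
The worked example can be reproduced in code. This sketch builds the edge strand as the complement of the second half of vertex i followed by the complement of the first half of vertex j, which is what the example sequences on this slide show. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement of a strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def split_name(vertex: str) -> tuple[str, str]:
    """Split a vertex name into its first half p and second half q."""
    half = len(vertex) // 2
    return vertex[:half], vertex[half:]


def edge_strand(vertex_i: str, vertex_j: str) -> str:
    """Strand for edge i -> j: complement of q_i, then complement of p_j."""
    _, q_i = split_name(vertex_i)
    p_j, _ = split_name(vertex_j)
    return complement(q_i) + complement(p_j)


if __name__ == "__main__":
    vertex_i = "ATCGGCTACTCCTGACTTGA"
    vertex_j = "AGGTTCAGTCAGGCCTATTC"
    print(edge_strand(vertex_i, vertex_j))
```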

Page 31: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Step 3 – end points in DNA

Then, add the following DNA strands…

…Add a 3’→5’ sequence of length ‘l/2’ that is complementary to the first half of the initial vertex

…Similarly, add a 3’→5’ sequence of length ‘l/2’ that is complementary to the last half of the final vertex

In other words, add ¬p_1 and ¬q_n (the complements of p_1 and q_n)

If the initial vertex was…ACTTGCCATCTCCGATACTT
and the final vertex was…TCGCCTAATCTACGATCTTA

then add…TGAACGGTAG + ATGCTAGAAT
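
The two end-cap strands follow the same complement rule, applied to the outer halves of the first and last vertices. A minimal sketch with a hypothetical helper name:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement of a strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def end_caps(initial_vertex: str, final_vertex: str) -> tuple[str, str]:
    """Cap strands: complement of the initial vertex's first half and of the
    final vertex's last half."""
    start_cap = complement(initial_vertex[: len(initial_vertex) // 2])
    end_cap = complement(final_vertex[len(final_vertex) // 2:])
    return start_cap, end_cap


if __name__ == "__main__":
    print(end_caps("ACTTGCCATCTCCGATACTT", "TCGCCTAATCTACGATCTTA"))
```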


Page 32: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Goal of Initial Solution Set

KEY = Every legal path in G_n corresponds to a correctly matched sequence of vertices and edges

*** Any path through the graph must contain a sequence that alternates between vertex, edge, vertex, edge,...

Try this visual…
Consider the edge v → u: any path that passes through v and then passes through u must fit together like “bricks”


Page 33: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

So, the top 5’→3’ strand represents a series of vertices

Whereas the bottom 3’→5’ strand represents the edges

Furthermore…

Vertex ‘v’ is encoded as p_v q_v

Edge ‘u → v’ is encoded as ¬q_u ¬p_v


Page 34: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Why is this ordering significant?


Page 35: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

…the end of the vertex and the beginning of the edge can anneal because they are complementary!

Similarly, the end of the edge and the beginning of the next vertex can anneal too!

High probability of no inadvertent paths
1) Sequences are chosen at random
2) The sequence lengths are large

After the annealing, every possible path through the graph will be encoded as a DNA sequence representing an ‘n’-bit string
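
The "bricks" picture can be checked in code: for a path, the start cap, the edge strands, and the end cap laid end to end should be exactly the complement of the concatenated vertex names, with each edge strand straddling the boundary between two vertices. This is an idealized sketch that ignores strand direction and chemistry; the two vertex names are the ones from the Step 2 example and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement of a strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def bottom_pieces(path_vertices: list[str]) -> list[str]:
    """Start cap, one strand per edge, and end cap for a path of vertex names."""
    half = len(path_vertices[0]) // 2          # assumes equal-length names
    pieces = [complement(path_vertices[0][:half])]             # start cap
    for v, u in zip(path_vertices, path_vertices[1:]):
        pieces.append(complement(v[half:]) + complement(u[:half]))  # edge v -> u
    pieces.append(complement(path_vertices[-1][half:]))        # end cap
    return pieces


if __name__ == "__main__":
    path = ["ATCGGCTACTCCTGACTTGA", "AGGTTCAGTCAGGCCTATTC"]
    top = "".join(path)
    bottom = "".join(bottom_pieces(path))
    print(top)
    print(bottom)
    print("fully paired:", bottom == complement(top))
```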


Page 36: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Similarity Between Sequences

At any given vertex in a path, the choice is simply left or right, therefore, all paths are similar

What does this mean?
All paths are equally likely to be formed during the random production of sequences

In other words, over a large well distributed solution set, all solutions (or at least a great majority) should be present

***This is key because in order for the computer to arrive at the correct solution, the solution must first exist in the solution set

Statistics!


Page 37: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Extraction Operations

Notation
E(t, i, a) denotes all sequences in test tube ‘t’ for which the i-th bit equals a

This takes one extract operation, which…

checks for the sequence that corresponds to the name of x_i if a = 1,

…and if a = 0, it checks for x’_i


Page 38: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Extraction Operations

1) Construct a series of test tubes

Test tube                                   Values present

t0 = all 2-bit sequences                    {00,01,10,11}

t1 = E(t0, 1, 1)                            {10,11}

t’1 = remainder of t0 after extracting t1   {00,01}

t2 = E(t’1, 2, 1)                           {01}

Pour t1 and t2 together to form t3

t3 = t1 + t2                                {01,10,11}


Page 39: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Extraction Operations

2) Construct a second series of test tubes

Test tube                                   Values present

t4 = E(t3, 1, 0)                            {01}

t’4 = remainder of t3 after extracting t4   {10,11}

t5 = E(t’4, 2, 0)                           {10}

Pour t4 and t5 together to form t6

t6 = t4 + t5                                {01,10}


Page 40: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Extraction Operations

3) Check to see if there are DNA strands available in t6

Any strands left in t6 encode the satisfying assignments!
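
The test-tube sequence above can be simulated with Python sets standing in for tubes. This is only a model of the logic: it assumes perfect extraction and represents each strand by the bit string it encodes. The `extract` helper is a hypothetical stand-in for the E(t, i, a) operation.

```python
def extract(tube: set[str], i: int, a: int) -> set[str]:
    """E(t, i, a): the strings in the tube whose i-th bit (1-indexed) equals a."""
    return {s for s in tube if s[i - 1] == str(a)}


# First clause (x OR y)
t0 = {"00", "01", "10", "11"}      # all 2-bit sequences
t1 = extract(t0, 1, 1)             # x = 1
t1_rest = t0 - t1                  # remainder after extracting t1
t2 = extract(t1_rest, 2, 1)        # y = 1 among the remainder
t3 = t1 | t2                       # pour t1 and t2 together

# Second clause (NOT x OR NOT y)
t4 = extract(t3, 1, 0)             # x = 0
t4_rest = t3 - t4                  # remainder after extracting t4
t5 = extract(t4_rest, 2, 0)        # y = 0 among the remainder
t6 = t4 | t5                       # pour t4 and t5 together

if __name__ == "__main__":
    print("t3 =", sorted(t3))
    print("t6 =", sorted(t6))
    print("satisfiable:", bool(t6))   # the detect step
```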


Page 41: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Understanding How it Works

Test tube t3 consists of all the sequences that satisfy the first clause {01,10,11}

…and, similarly t6 consists of all those that satisfy the second clause and are contained in t3


Page 42: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

More General Case

Any SAT problem on…

‘n’ variables, and

‘m’ clauses,

can be solved with on the order of ‘m’ extract steps (one per literal, as in the four extracts used for the two-clause example)

(with one detect step at end)

Lipton’s Acknowledgments

Operations are assumed perfect and without error
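
The same procedure generalizes to any formula in clause form. The sketch below is a software model under the slide's assumption of error-free operations: sets stand in for tubes, each literal costs one extract, and the final emptiness check plays the role of the detect step. The clause representation (pairs of variable index and wanted bit) and the function names are assumptions made for illustration.

```python
from itertools import product


def extract(tube: set[str], i: int, a: int) -> set[str]:
    """E(t, i, a): the strings in the tube whose i-th bit (1-indexed) equals a."""
    return {s for s in tube if s[i - 1] == str(a)}


def solve_sat(num_vars: int, clauses: list[list[tuple[int, int]]]) -> tuple[set[str], int]:
    """Return (satisfying bit strings, number of extract operations used).

    Each clause is a list of literals (variable_index, wanted_bit):
    (1, 1) means x1 and (1, 0) means NOT x1.
    """
    tube = {"".join(bits) for bits in product("01", repeat=num_vars)}
    extracts = 0
    for clause in clauses:
        remainder = tube
        satisfied: set[str] = set()
        for var, bit in clause:
            hit = extract(remainder, var, bit)   # one extract per literal
            extracts += 1
            satisfied |= hit                     # pour into the clause's tube
            remainder = remainder - hit          # keep working on what is left
        tube = satisfied
    return tube, extracts


if __name__ == "__main__":
    # F = (x OR y) AND (NOT x OR NOT y)
    solutions, used = solve_sat(2, [[(1, 1), (2, 1)], [(1, 0), (2, 0)]])
    print(sorted(solutions), "extracts:", used)
```

Note that the simulation still stores all 2^n strings explicitly; the point of the DNA version is that the test tube holds them all at once.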


Page 43: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Practical Purposes

Page 44: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Purposes

Counter Bioterrorism/Monitor Genetic Progression

Institute for Countermeasures against Agricultural Bioterrorism (ICAB):

Plan:

1) Obtain DNA sequences from crops, animals, bio-agents, etc.

2) Deploy DNA-chip technology to identify and characterize

3) Build geo-referenced information system

4) Predict and track the spread of bio-agents after introduction

5) Create powerful DNA-based tools for monitoring and enhanced diagnosis

DNA microarrays & DNA-based chips

- Can store 1,000 to 100,000 different diagnostic DNA sequences

Next generation will contain one million tags!

http://icab.tamu.edu/

Page 45: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Purposes

Predictive Gene Testing

http://www.dhgp.de/ethics/ethics02.html

Page 46: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Poker Playing

DNA Computing: 7th International Workshop on DNA Based Computers, Dna7, Tampa, Florida, June 10-13, 2001: Revised Papers

Page 47: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Weighted-Recursive Algorithms

DNA Computing: 7th International Workshop on DNA Based Computers, Dna7, Tampa, Florida, June 10-13, 2001: Revised Papers

Page 48: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Pessimism

1) Too fragile and prone to error

2) The field is dominated by hard-core enthusiasts who will be forced to "slog through and do the heavy research" before there is a major breakthrough

http://www.jsonline.com/alive/news/0607dna.stm

Optimism

However, keep in mind that the first commercially available electronic computer was not well received, and in 1951 IBM had to rework what it had spent millions of dollars and years developing to fit customers’ needs (such as payroll)

Page 49: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

The Future of DNA Computing

Commercial application by 2010

Alternative to traditional computing by 2020

Vision: Today we have not one but several companies making "DNA chips," where DNA strands are attached to a silicon substrate in large arrays (for example Affymetrix's genechip). Production technology of MEMS is advancing rapidly, allowing for novel integrated small scale DNA processing devices. The Human Genome Project is producing rapid innovations in sequencing technology. The future of DNA manipulation is speed, automation, and miniaturization

http://www.jsonline.com/alive/news/0607dna.stm

Page 50: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Research Funding

Funding: National Science Foundation

Pentagon's Defense Advanced Research Projects Agency - Much of the military's interest arises from the increasing sophistication of encryption techniques that other countries can use to encode their data. As a result, Washington needs ever-more-powerful computers for code breaking

Page 51: DNA Computing Charles Ormsby III CSE 497 4/15/2004.

Internet References

http://chronicle.com/data/articles.dir/art-44.dir/issue-4.dir/14a02301.htm

http://www.jsonline.com/alive/news/0607dna.stm

http://www.arstechnica.com/reviews/2q00/dna/dna-1.html

Book/Papers References

Lipton, Richard J., DNA Solution of Hard Computational Problems. Science, New Series, Vol. 268, No. 5210 (April 28, 1995), 542-545

DNA Computing: 8th International Workshop on DNA Based Computers, Dna8, Sapporo, Japan, June 10-13, 2002: Revised Papers (Lecture Notes in Computer Science, 2568)

DNA Computing: 7th International Workshop on DNA Based Computers, Dna7, Tampa, Florida, June 10-13, 2001: Revised Papers

Future References

http://www.nas.nasa.gov/

http://www.nas.nasa.gov/Research/Reports/reportsarchive.html