
Copyright © 2005 by Peng Yin

All rights reserved


DNA BASED SELF-ASSEMBLY AND NANO-DEVICE: THEORY AND PRACTICE

by

Peng Yin

Department of Computer Science
Duke University

Date:
Approved:

Prof. John H. Reif, Supervisor

Prof. Pankaj K. Agarwal

Prof. Alexander J. Hartemink

Prof. Thomas H. LaBean

Prof. Andrew J. Turberfield

Prof. Hao Yan

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University

2005


ABSTRACT

DNA BASED SELF-ASSEMBLY AND NANO-DEVICE: THEORY AND PRACTICE

by

Peng Yin

Department of Computer Science
Duke University

Date:
Approved:

Prof. John H. Reif, Supervisor

Prof. Pankaj K. Agarwal

Prof. Alexander J. Hartemink

Prof. Thomas H. LaBean

Prof. Andrew J. Turberfield

Prof. Hao Yan

An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University

2005


To my loved ones


Abstract

The construction of complex systems at the 1-100 nanometer (1 nanometer = $10^{-9}$ meter) scale is a key challenge in current nanoscience. This challenge can be most effectively addressed by the “bottom-up” nano-construction methodology based on self-assembly, a process in which substructures autonomously associate with each other to form superstructures driven by the selective affinity of the substructures. DNA, with its immense information encoding capacity and well defined Watson-Crick complementarity, has recently emerged as an excellent material for constructing self-assembled nano-structures. In this dissertation, we study four closely related aspects of DNA based self-assembly and nano-devices: complexity of self-assembly, fault-tolerant self-assembly, DNA robotics devices, and DNA computing devices.

Complexity of self-assembly. We establish a framework that models assemblies resulting from the cooperative effects of repulsion and attraction forces in a general setting of graphs. By capturing a much wider range of interesting self-assembly phenomena, it advances previous work that models simple rectangular grid structures formed by attraction forces only. We define an accretive graph assembly model and a self-destructible graph assembly model, and obtain several complexity results, including the first PSPACE-complete result in the study of self-assembly.

Fault tolerant self-assembly. Fault tolerance is essential for building complex synthetic self-assembled systems at the molecular scale. In the practical context of algorithmic DNA tiling lattices, we propose an information encoding scheme using overlaid redundant computation, which, for the first time, reduces the error rate from $\epsilon$ to a higher power of $\epsilon$ without increasing the size of the assembled lattice.

DNA robotics devices. A major challenge in nanotechnology is to precisely transport a nanoscale object from one location on a nano-structure to another location along a designated path. To address this challenge, we design DNA motors capable of autonomous, unidirectional, progressive linear motion along self-assembled DNA tracks. The practicality of the designs is partially supported by the experimental construction of a three-anchorage autonomous DNA walking device.

DNA computing devices. Building on the designs of the above robotics devices, we obtain the designs of autonomous DNA mechanical computing devices embedded in DNA lattices. These devices represent a novel converging point for studies on nano-lattice assembly, nano-robotics, and nano-computing. In particular, we present the designs of an autonomous universal DNA Turing machine and an autonomous universal DNA cellular automaton.


Acknowledgements

In the final hours of my Ph.D. years, I must express my deep gratitude to the many people who have helped me to grow into an independent researcher and, in the course of this, to produce this work.

My advisor, John Reif, has been most invaluable. From the very beginning, John treated me as a friend and as an intellectual peer. I deeply appreciate his respect and the accompanying trust. John never pushed me for immediate results on any particular project, but instead let me set my own pace and follow my own course in exploring the rich land of research. Working with John, I felt the need, the joy, and the reward of growing into an independent thinker. At the same time, John continually shared with me his remarkable vision of science, discussed with me the exciting, promising new directions to pursue, inspired me with numerous novel ideas, and provided me with the most needed and beneficial resources. The freedom John gave me, coupled with his guidance and support, has made my Ph.D. journey both fruitful and full of fun. The best thing John has taught me is not only how to conduct research but also how to enjoy it. John has a deep passion for science and enjoys conducting research in an amazingly diverse array of fields. John’s infectious passion has influenced and benefited me tremendously.

Besides John, I am fortunate to have a most enviable committee to guide me: Andrew Turberfield, Hao Yan, Alexander Hartemink, Thom LaBean, and Pankaj Agarwal. Andrew worked from the other side of the Atlantic Ocean. Yet the distance did not, by any means, diminish his remarkable and beneficial influence on my dissertation work. Indeed, the work on nano-robotics (Chapter 4) and nano-computation (Chapter 5) in this thesis was initiated by his elegant ideas when he visited John at Duke in the winter of 2002. I worked with Hao very closely on the experimental aspect of my thesis. Hao showed me the delicate intricacies of experimental research in DNA-based nanoscience. Working with Hao has always been invaluably inspiring and educational. Alex has encouraged me all along the way. I wrote my first major scientific paper with Alex and benefited greatly from that experience. In addition, Alex is always open and ready to help me with my various career concerns. A rigorous researcher and a fun friend, Thom always gives me an unfailing smile, helpful feedback, and warm encouragement. I greatly enjoyed working and talking with Thom both in and outside the lab. Pankaj provided me with rigorous training in formal methods through the Computational Complexity course and kindly advised me at several critical points during my Ph.D. career. I am grateful for his valuable advice, which was always pithy, thoughtful, and to the point. It is a great honor to have these wonderful scientists on my committee.

I am also fortunate to have studied at the Department of Computer Science at Duke, a most wonderful place where academic rigor and excellence are combined with openness and friendliness. I am grateful to the many faculty members who introduced me to the fascinating discipline of Computer Science by teaching inspiring classes, including Pankaj Agarwal, Lars Arge, Owen Astrachan, Jeff Chase, Robert Duvall, Chris Dwyer, Herbert Edelsbrunner, Carla Ellis, Alex Hartemink, Gershon Kedem, Thom LaBean, Alvy Lebeck, Kamesh Munagala, Ron Parr, John Reif, Don Rose, Xiaobai Sun, Amin Vahdat, Jeff Vitter, and Jun Yang. I am grateful to my fellow Computer Science students and postdocs who shared my learning experience and shared great fun on and off duty, especially Yusu Wang, Haifeng Yu, Sudheer Sahu, Hai Yu, Ke Yi, Tingting Jiang, Nabil Mustafa, Xiuwen Ouyang, Wenbin Pan, Jeff Phillips, Xiaobo Fan, Danxia Xie, Vijay Natarajan, Sathish Govindarajan, Zheng Sun, David Cohen-Steiner, Andrew Ban, Hao He, Haoying Li, Junyi Xie, Vicky Choi, Tong Li, Yujuan Bao, Albert Meixner, Zheng Sun, Andrew Danner, Avik Bhattacharya, Jaidev Patwardhan, Fareed Zaffar, and Ashish Gehani. I am particularly grateful to the Theory group for providing an intellectual heaven where ideas freely ferment and flow, in particular to Yusu Wang, Sudheer Sahu, Hai Yu, Nabil Mustafa, and Ke Yi. I would also like to thank all my lab mates in the DNA Nanotech group, including Hao Yan, Thom LaBean, Sung Ha Park, Xiaoju Daniell, Liping Feng, Hanying Li, Sang Jung Ahn, Natalie Johnson, and Dage Liu. Without their help, I could have done little. I am also grateful for stimulating discussions with the Computational Biology group, including Alexander Hartemink, Allister Bernard, and Leelavati Narlikar, and with the Nano-Architecture group, including Alvy Lebeck, Chris Dwyer, Jaidev Patwardhan, and Vijeta Johri.

I would like to express my special gratitude to Yusu Wang, my best friend, and Sudheer Sahu, my academic “younger brother”. Yusu has been my best friend and intellectual companion throughout my years at Duke. We shared many fun memories, stimulated and witnessed each other’s growth, and deeply influenced each other’s perspectives on research and on life. Sudheer and I worked together with John as our academic advisor. We worked on many common projects and spent many sleepless working nights, rejoicing in accomplishments and trudging through challenges.

Reaching further into the past, I would like to thank the Medical Center community at Duke. I spent two wonderful years with them before embarking on Ph.D. study in Computer Science. The solid biomedical foundation I gained during those two years greatly helped me in my later interdisciplinary research. In particular, I would like to thank my advisor in Molecular Cancer Biology, Robert Abraham, as well as several other faculty members, including Xiao-Fan Wang, Christopher Counter, Tso-Pang Yao, and Tobias Meyers. I would also like to thank the many postdocs and students who helped me, in particular, Christine C. Hudson, Yanan Fang, Xuan Zhao, Ji Zhang, Xing Shen, Jingwei Meng, Xuefang Bai, Kang Shen, and Sandy Gibbons.

I also wish to thank all the staff in the Computer Science department for providing smooth administrative and technical support, in particular Diane Riggs, Jewel Wheeler, and Jeff Wright.


Lastly and most importantly, I must thank my family: my wife Vicky, my parents, and my younger sister Yunxia. I am eternally grateful to life for meeting Vicky and for falling and staying in love with her – this is the most wonderful thing I could ever expect from this wide world. Her ever fresh love and her unconditional support have accompanied me through all these years. She makes my life beautiful. My deepest love and gratitude also go to my parents and my sister. My parents taught Yunxia and me how to live one’s life honestly and with one’s full heart. They nurtured in us, ever since our earliest childhood, the strongest belief that life is beautiful; they continually encouraged and supported us to make our very best effort to pursue our dreams. Thanks Vicky, thanks Mom, thanks Dad, thanks Yunxia. This work must be dedicated to you, for it is yet another drop in a beautiful river powered by your endless love.


Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction
1.1 Related Research Areas
1.1.1 Mathematical Theory of Self-Assembly
1.1.2 Self-assembled DNA Structures
1.1.3 DNA Nano-devices
1.1.4 DNA Computation
1.2 Contributions
1.3 Publication List, Collaborators, and Remarks

2 Complexity of Graph Self-Assembly
2.1 Introduction
2.2 Accretive Graph Assembly Model
2.3 AGAP and PAGAP are NP-complete
2.3.1 4-DEGREE AGAP is NP-complete
2.3.2 5-DEGREE PAGAP is NP-complete
2.4 #AGAP and SAGAP are #P-complete
2.4.1 #AGAP is #P-complete
2.4.2 SAGAP is #P-complete
2.5 Self-Destructible Graph Assembly Model
2.6 DGAP is PSPACE-complete
2.7 Conclusion

3 Fault Tolerant DNA Assembly
3.1 Introduction
3.2 Algorithmic Assembly Problems
3.2.1 Algorithmic Assembly in Abstract Tile Assembly Model
3.2.2 Thermodynamic Error Analysis in Kinetic Tile Assembly Model
3.3 Error-Resilient Assembly Using Two-Way Overlay Redundancy
3.3.1 Construction
3.3.2 Error Analysis
3.4 Error-Resilient Assembly Using Three-Way Overlay Redundancy
3.4.1 Construction
3.4.2 Error Analysis
3.5 Computer Simulation
3.6 Discussion

4 Autonomous DNA Walking Devices
4.1 Theoretical Designs of Three Autonomous DNA Walking Devices
4.1.1 Definitions
4.1.2 Device I
4.1.3 Device II
4.1.4 Device III
4.2 Experimental Implementation of an Autonomous DNA Walker
4.2.1 Experimental Design
4.2.2 Methods and Materials
4.2.3 Experimental Results
4.3 Discussion

5 Autonomous DNA Cellular Computing Devices
5.1 Designs of Autonomous DNA Turing Machine
5.1.1 Introduction to Universal Turing Machine
5.1.2 Notation
5.1.3 Structural Overview
5.1.4 Operational Overview
5.1.5 Step-by-step Implementation
5.1.6 Complete Molecule Sets
5.1.7 Futile Reactions
5.1.8 Encoding Space
5.1.9 Computer Simulation
5.2 Design of Autonomous DNA Cellular Automaton
5.2.1 Introduction to Cellular Automata
5.2.2 Structural Overview
5.2.3 Operational Overview
5.2.4 Step-by-step Implementation
5.2.5 Complete Molecule Sets
5.2.6 Futile Reactions
5.2.7 Computer Simulation
5.2.8 Two-Dimensional Autonomous DNA Cellular Automaton
5.3 Discussion

Bibliography

Biography


List of Tables

3.1 Instance of [symbol illegible in source]
4.1 Implementation of device I
4.2 Implementation of energy free device
4.3 Implementation of device II
4.4 Implementation of device II
4.5 Implementation of device III
5.1 Encoding of symbol-molecule
5.2 Symbol-molecules
5.3 Transition of head-molecule


List of Figures

1.1 Binary counter tiling
1.2 Branched DNA molecules
1.3 Molecular tweezer
2.1 Example of graph assembly in the accretive model
2.2 (a) Clause gadget. (b) AGAP reduction from 3SAT
2.3 (a) Identifying graph. (b) PAGAP graph
2.4 Bipartite graph and corresponding graph
2.5 Example self-destructible graph assembly system
2.6 Sequence of operations that assemble a target graph
2.7 Turing machine simulation
2.8 Turing machine simulation
2.9 Cyclic gadget and integrated proof scheme
3.1 Binary counter and Sierpinski triangle assemblies
3.2 Assembly with no error corrections
3.3 Compact error resilient assembly version I
3.4 Proof in assembly version I
3.5 Proof in assembly version I
3.6 Proof in assembly version I
3.7 Version 2
3.8 Proof in assembly version 2
3.9 Proof in assembly version 2
3.10 Proof in version 2
3.11 Proof in version 2
3.12 Proof in version 2
3.13 Binary counter
3.14 Sierpinski triangle
3.15 Simulation of error resilient tilings
4.1 Introduction to DNA
4.2 Endonuclease
4.3 Design and operation of device I
4.4 Implementation of device I
4.5 Real enzymes
4.6 Design of device II
4.7 Implementation of device II
4.8 Construction of device II
4.9 Real enzymes
4.10 Design of device III
4.11 Enzymes in device III
4.12 Real enzymes in device III
4.13 Structural design and operation of the walker
4.14 Evidence of unidirectional motion
4.15 Control experiments
4.16 DNA strand structure and sequences
4.17 Time course
4.18 Test for inter-molecular reactions
5.1 Information encoding
5.2 Autonomous DNA Turing Machine
5.3 Enzymes used in the construction of Autonomous DNA Turing Machine
5.4 Operation of Autonomous DNA Turing Machine
5.5 Step 1 in the operation of Autonomous DNA Turing Machine
5.6 Step 2
5.7 Step 3
5.8 Complete set of assisting-molecules
5.9 Step 4
5.10 Step 5
5.11 Step 6
5.12 Step 7
5.13 Step 8
5.14 Overview of operation of Autonomous DNA Turing Machine
5.15 head-molecules and symbol-molecules
5.16 Futile reactions
5.17 Encoding schemes
5.18 Cellular automata
5.19 Structure of Autonomous DNA Cellular Automaton
5.20 Enzymes in Autonomous DNA Cellular Automaton
5.21 Structural change
5.22 Information flow
5.23 Reaction 0.1 in Autonomous DNA Cellular Automaton
5.24 Reaction 0.2
5.25 Reaction 0.3
5.26 Reaction 1.1
5.27 Reaction 1.2
5.28 Molecule set A, B, C, and I
5.29 Molecule set R
5.30 Molecule set T
5.31 Molecule E
5.32 Futile reactions
5.33 Structural overview of 2D Autonomous DNA Cellular Automaton
5.34 Operational overview of 2D Autonomous DNA Cellular Automaton


Chapter 1

Introduction

The construction of complex systems at the 1-100 nanometer (1 nanometer = $10^{-9}$ meter) scale is a key challenge currently facing science and technology and lies at the core of the emerging field of nanoscience. How can such systems be constructed? There are two basic approaches: the “top-down” approach and the “bottom-up” approach. In the “top-down” approach, microscopic manipulations are performed on a small number of molecules by external devices, such as the scanning tip of an atomic force microscope [57]. This is the conventional approach by which human beings construct structures in the macroscopic world. For example, to build a pyramid, people lay bricks upon bricks. It is also an approach that people naturally turn to for molecular constructions. In his seminal talk delivered in 1959, Feynman discussed this approach with a sensational scheme of “one hundred tiny hands” operating on atomic scale objects [28]. Indeed, in the past half century, remarkable progress has been made in this direction, as exemplified by the diverse nanoscale patterns and devices fabricated using electron beam lithography [52, 58] and scanning probe microscopy [31, 89]. Complementary to this “top-down” approach, another (arguably more) powerful approach is the “bottom-up” approach based on the phenomenon of self-assembly, in which a great number of molecules autonomously self-organize into complex structures in parallel.

Self-assembly is a process in which substructures autonomously associate with each other to form superstructures driven by the selective affinity of the substructures. As an illustrative example, let’s revisit how to build a pyramid and imagine the following self-assembly based bottom-up approach. Suppose each side of each brick is labeled with a predefined glue such that two bricks can stick to each other if and only if their abutting sides have the same glue. Given enough distinct types of glues, one can label the bricks in such a way that they can stick to each other only in the unique arrangement in which they appear in the finished pyramid. Further suppose the bricks can move freely in the air and collide with each other; then the bricks, once released in the air, can self-assemble into a pyramid automatically. Now further imagine that you can make a million copies of each brick and release them all in the air; then you can simultaneously build a million such pyramids. Or more precisely, a million pyramids emerge by themselves via the self-assembly of the bricks. This may sound like scientific fantasy in the macroscopic world, but it actually represents a most powerful methodology for molecular scale constructions, i.e., the bottom-up approach based on self-assembly. A wide range of structures have been constructed using a diverse set of materials, such as lipids, peptides, and nucleic acids (DNA/RNA).
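
The brick analogy above can be mimicked in a few lines of code. The following is a toy sketch only, not a model used in this thesis. It assumes that bricks are unit squares on a 2D grid, that every pair of abutting faces in the target shape gets its own glue type, and that free bricks collide at random with faces of the growing structure. The names `design_bricks` and `self_assemble`, the choice of seed brick, and the flat pyramid profile are all illustrative assumptions.

```python
import random

# Toy model (an assumption for illustration): unit-square bricks on a grid,
# one glue label per side, None meaning "no glue".
DIRS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def design_bricks(target):
    """Label the bricks so that each shared face of the target has a unique glue."""
    bricks = {}
    for (x, y) in target:
        glues = {}
        for side, (dx, dy) in DIRS.items():
            nbr = (x + dx, y + dy)
            if nbr in target:
                # Both abutting sides carry the same label: the sorted cell pair.
                glues[side] = tuple(sorted([(x, y), nbr]))
            else:
                glues[side] = None  # outer faces carry no glue
        bricks[(x, y)] = glues
    return bricks


def self_assemble(bricks, seed_cell, rng):
    """Random collisions; a brick sticks only if the abutting glues are equal."""
    placed = {seed_cell: bricks[seed_cell]}
    free = [cell for cell in bricks if cell != seed_cell]
    while free:
        name = rng.choice(free)             # a random free brick ...
        anchor = rng.choice(list(placed))   # ... hits a random placed brick
        side = rng.choice(list(DIRS))       # ... on a random face
        dx, dy = DIRS[side]
        site = (anchor[0] + dx, anchor[1] + dy)
        glue = placed[anchor][side]
        if site in placed or glue is None:
            continue                        # occupied site or glue-free face
        if bricks[name][OPPOSITE[side]] == glue:
            placed[site] = bricks[name]     # glues match: the brick sticks
            free.remove(name)
    return placed


# A flat "pyramid" profile with rows of width 5, 3, and 1.
target = {(x, 0) for x in range(5)} | {(x, 1) for x in range(1, 4)} | {(2, 2)}
bricks = design_bricks(target)
result = self_assemble(bricks, (0, 0), random.Random(0))
print(sorted(result) == sorted(target))
```

Because every shared face has its own glue, a brick can only stick at the position it was designed for, so the order of the random collisions does not affect the final shape. Running the same loop on many independent copies of the brick set corresponds to the "million pyramids" built in parallel.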

Not only a powerful nanoconstruction methodology, self-assembly is also a ubiquitous and fundamental mechanism that accounts for the organization of many natural structures, ranging from the subatomic scale to the cosmic scale, and from simple inorganic crystals to complex living cells. For example, on the cosmic scale, planets can self-organize into celestial systems mediated by gravitational forces; on the atomic scale, ions can interact with each other and form crystals. Some of the most fascinating instances of self-assembled structures can be found in biomolecular systems, as exemplified by the spontaneous organization of the ribosome from more than 50 individual components.

Though self-assembly has been identified as a fundamental natural phenomenon and

a powerful nanoconstruction approach, its complex nature is still poorly understood and

its immense power inadequately harnessed. In particular, only relatively simple synthetic

structures have been constructed. Further understanding the self-assembly process and

hence fully utilizing its power to build more complex and useful nanoscale systems is thus a

key challenge facing current science and technology. In this thesis, I address this challenge


in the practical context of self-assembled synthetic DNA nanostructures and devices, and

study the following three closely related questions.

• How to self-assemble complex nanoscale systems?

• How to move things?

• How to conduct computation?

I approach these questions both by studying the fundamental mathematical and sci-

entific properties of the self-assembly process, and by designing, simulating, and fabricating

practical self-assembled systems using DNA as a nano-construction material.

1.1 Related Research Areas

The work presented in this thesis has its intellectual roots in several interwoven scientific

threads: mathematical theory of self-assembly, experimental construction of DNA nanos-

tructures and devices, and DNA computation.

1.1.1 Mathematical Theory of Self-Assembly

Most work investigating the complexity of self-assembly studies the self-assembly of ori-

ented unit squares (tiles) on a 2D plane. One of the most successful models is the Tile

Assembly Model proposed by Rothemund and Winfree [71], which builds on the tiling

model initially proposed by Wang in 1960 [95]. In this model, each of the four sides of a

tile has a glue (also called a pad) and each glue has a type and a positive integral strength.

Assembly occurs by the accretion of tiles iteratively to an existing assembly, starting with

a special seed tile. A tile can be “glued” to a position in an existing assembly if the tile

can fit in the position such that each pair of adjacent pads of the tile and the assembly have


Figure 1.1: The construction of a binary counter using self-assembly of tiles. The pads and the tile set are shown on the left and the assembled binary counter is shown on the right. The pads of strength 2 have black borders while the strength 1 pads are border-less. The first row of tiles on the left are the internal tiles; the second row are the frame tiles. A special frame tile is the seed tile (labeled with S).

the same glue type and the total strength of these glues is greater than or equal to the

temperature, a system parameter.
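
The attachment rule just stated can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the thesis: the names `Glue`, `Tile`, and `can_attach`, the N/E/S/W side labels, and the example tiles are all hypothetical, and a glue is modelled as a (type, strength) pair.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: a glue (pad) is a (type, strength) pair and a
# tile maps each of its four sides to a glue, or to None for a bare side.
Glue = Tuple[str, int]
Tile = Dict[str, Optional[Glue]]

# The side of a neighbouring tile that faces a given side of the new tile.
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def can_attach(tile: Tile, neighbours: Dict[str, Tile], temperature: int) -> bool:
    """Return True if `tile` may be glued into a site of the assembly.

    `neighbours` maps a side of the new tile to the already assembled tile
    adjacent to that side. Following the rule stated in the text, every pair
    of adjacent pads must have the same glue type, and the total strength of
    these glues must reach the temperature.
    """
    total = 0
    for side, neighbour in neighbours.items():
        mine = tile.get(side)
        theirs = neighbour.get(OPPOSITE[side])
        if mine is None or theirs is None or mine[0] != theirs[0]:
            return False  # adjacent pads differ, so the tile does not fit
        total += mine[1]
    return total >= temperature


# Illustrative tiles (labels are made up): strength-2 pads let frame tiles
# attach one at a time, strength-1 pads need two cooperating neighbours.
seed = {"N": ("f", 2), "W": ("g", 2), "E": None, "S": None}
frame_up = {"S": ("f", 2), "W": ("0", 1), "N": ("f", 2), "E": None}
frame_left = {"E": ("g", 2), "N": ("0", 1), "W": ("g", 2), "S": None}
internal = {"E": ("0", 1), "S": ("0", 1), "N": ("0", 1), "W": ("0", 1)}

print(can_attach(frame_up, {"S": seed}, 2))                          # True
print(can_attach(internal, {"E": frame_up}, 2))                      # False
print(can_attach(internal, {"E": frame_up, "S": frame_left}, 2))     # True
```

With the temperature set to 2, a single strength-1 bond is not enough, which is why the internal tile in this sketch attaches only once two neighbours are in place.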

As a concrete example, we describe a binary counter constructed by Winfree [71] in

Figure 1.1. Here, the temperature of the system is set to 2. Two adjacent pads (glues) on

neighboring tiles can be glued to each other if they are of the same type. The assembly

starts with the seed tile S at the lower right corner and proceeds to the left and to the top

by the accretion of individual tiles. First, the reverse-L-shaped frame, composed of the

frame tiles, is assembled. Note that the glue strength between two neighbouring frame tiles

is 2, which is greater than or equal to the temperature, and hence the assembly of the frame

tiles can carry through. Next, the internal tiles are assembled. Since the glue strength of

a pad on an internal tile is 1, the assembly of an internal tile requires cooperative support

from two other already assembled tiles. More specifically, after the assembly of the frame,

the two frame tiles immediately neighbouring the seed tile S cooperatively

form a binding site for an internal tile that has the matching labels on its left side and on its

bottom side. This tile can attach itself at this site. This in turn produces further

growing sites for internal tiles on top of and to the left of the just-assembled tile.


Thus the growth can go on inductively by the accretion of appropriate individual tiles. It

is straightforward to verify that the accretion of the tiles forms a binary counter with each

row representing a binary number. Though the above example appears simple, it has been

proven that algorithmic assembly of tiles holds universal computing power by simulating

a one dimensional cellular automaton [98].

A major part of theoretical research in the algorithmic self-assembly field is study-

ing the complexity of and algorithms for (uniquely and terminally) producing assemblies

with given properties, such as shape. In [70], Rothemund and Winfree showed that the

construction of $n \times n$ squares has a program size complexity (i.e., the number of distinct

types of tiles) of $\Theta(\log n / \log \log n)$. The upper bound is obtained by simulating a binary counter

and the lower bound by analyzing the Kolmogorov complexity of the tiling system [40].

The model was later extended by Adleman et al. to include the time complexity of gen-

erating specified assemblies [5]. Later work studied various combinatorial optimization

and complexity problems in the standard tile assembly model as well as some of its vari-

ants [6, 8, 9, 18, 23, 24, 25, 36, 70, 76].

However, most of the above complexity-theoretic studies of self-assembly have two

limitations: 1) only attraction, and not repulsion, is studied; 2) only assembled structures

of two dimensional square grids are studied. In Chapter 2 of this thesis, I study the com-

plexity of the assemblies resulting from the cooperative effect of repulsion and attraction

in a more general setting of graphs. This allows for the study of a more general class of

self-assembled structures than the previous tiling model.

1.1.2 Self-assembled DNA Structures

DNA, the information carrier for living cells, has recently emerged as an ideal engi-

neering material on the nanoscale [79]. As such a material, DNA possesses the following


appealing properties: 1) its well-defined Watson-Crick complementarity and its immense

information encoding capacity, 2) its minuscule size (diameter of about 2 nm for its double

stranded B form), 3) the stiffness of its double stranded form (persistence length of about 50 nm,

or about 15 full turns) as well as the flexibility of its single stranded form (persistence length

of about 1 nm, or 3 bases), 4) the ease of manipulating it with a rich set of enzymes, and 5)

the topological diversity of DNA constructions achievable by a combinatorial arrangement

of branched DNA molecules. For comprehensive reviews of DNA based nanoscience,

see [34, 66, 79, 90].

Construction with DNA goes back to the 1970s, when researchers started to “glue” together

duplex DNA fragments via sticky ends in genetic manipulations. A sticky end is a single

stranded overhang at the end of a duplex DNA fragment. Two duplex DNA fragments with

complementary sticky ends (i.e., sticky ends with reverse complementary arrangements

of A, T, C, and G) can hybridize with each other by forming hydrogen bonds between

corresponding bases (A with T, G with C) of the sticky ends, and thus be “glued” together.

However, since each duplex DNA can have at most two sticky ends, the structures formed

from DNA duplexes are rather “dull” – only linear structures or rings are permissible.
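
The sticky-end pairing rule described above (A with T, G with C, read in opposite directions) can be checked mechanically. The sketch below is an illustration only; the function names and the example sequences are assumptions introduced here, and both overhangs are assumed to be written in the 5' to 3' direction.

```python
# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def sticky_ends_match(end_a: str, end_b: str) -> bool:
    """True if two single-stranded overhangs (both 5' to 3') can hybridize
    along their full length, i.e. one is the reverse complement of the other."""
    return end_a.upper() == reverse_complement(end_b)


print(reverse_complement("GATTC"))          # GAATC
print(sticky_ends_match("GATTC", "GAATC"))  # True
print(sticky_ends_match("GATTC", "GATTC"))  # False
print(sticky_ends_match("AATT", "AATT"))    # True: a self-complementary overhang
```

A real design would also need to consider partial matches and melting temperature, which this sketch deliberately ignores.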

A crucial step forward is the construction of branched DNA molecules and the self-

assembly of larger complexes with these branched molecules. Figure 1.2 gives exam-

ples of unbranched (simple duplex), 3-branched, and 4-branched DNA molecules. See-

man’s group are the pioneers in constructing geometrical objects using these branched

molecules [20, 78]. A problem with these early constructions of branched DNA molecules

is that the junctions of the molecules are floppy, and this presents a barrier to forming well-

defined two-dimensional lattices. This problem was solved with the invention of stiffer

branched DNA molecules – e.g. DNA double-crossover (DX) molecules [41] and DNA

triple-crossover (TX) molecules [35]. A DX molecule consists of two parallel coplanar

double helices connected to each other through two crossover points. A TX tile has three


parallel co-planar double-helices linked together through four crossover points.

The sticky end sequences of the above stiff molecules can be properly designed so that

these molecules can associate with each other autonomously and form large lattices. This

process is known as tiling and each molecule is called a DNA tile. For example, lattices

composed of hundreds of thousands of TX tiles and extending up to at least 10 microns

on their long edge have been created. Other experimentally demonstrated

periodic two dimensional lattices include those made from double-crossover (DX) DNA

tiles [102], rhombus tiles [49], “4x4” tiles [107], triangle tiles [44], and hexagonal

tiles [2]. Aperiodic barcode DNA lattices have also been experimentally constructed [106].

In addition to forming extended lattices, DNA tiles can also form tubes [45, 72].

Self-assembly of DNA tiles can be used to carry out computation, by encoding data

and computational rules in the sticky ends of tiles [99]. Such self-assembly of DNA tiles

is known as algorithmic self-assembly or computational tiling. Researchers have exper-

imentally demonstrated a one-dimensional algorithmic self-assembly of triple-crossover

DNA molecules (TX tiles), which performs a 4-step cumulative XOR computation [48].

A one-dimensional “string” tiling assembly was also experimentally constructed that com-

putes an XOR table in parallel [105]. Recently, a two dimensional algorithmically self-

assembled DNA crystal was constructed that demonstrates the pattern of Sierpinski trian-

gles [73].
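
To make the two computations mentioned above concrete, the following sketch computes a cumulative XOR and the rows of a Sierpinski pattern in ordinary software. It is an illustration under stated assumptions, not a model of the DNA experiments: cumulative XOR is taken to mean that each output bit is the XOR of the previous output bit and the current input bit, and the Sierpinski rows are generated by XOR-ing each pair of adjacent bits of the previous row. The function names are hypothetical.

```python
from typing import List


def cumulative_xor(bits: List[int]) -> List[int]:
    """Output bit i is the XOR of input bits 0..i."""
    out = []
    acc = 0
    for b in bits:
        acc ^= b          # each step combines the running result with one input bit
        out.append(acc)
    return out


def sierpinski_rows(n_rows: int) -> List[List[int]]:
    """Rows of a Sierpinski pattern: each bit is the XOR of the two bits
    above it (the previous row is padded with zeros at both ends)."""
    if n_rows < 1:
        return []
    rows = [[1]]
    while len(rows) < n_rows:
        padded = [0] + rows[-1] + [0]
        rows.append([padded[i] ^ padded[i + 1] for i in range(len(padded) - 1)])
    return rows


print(cumulative_xor([1, 1, 1, 0]))   # [1, 0, 1, 1]
for row in sierpinski_rows(8):
    print("".join("#" if bit else "." for bit in row).center(8))
```

In a tiling implementation each XOR step would correspond to the attachment of one tile whose sticky ends encode the two input bits and the output bit.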

The successful construction of the two-dimensional DNA Sierpinski triangle crystal is

exciting progress, which further confirms the practical feasibility of the various theoretical

constructions proposed by theoreticians as discussed in Sec. 1.1.1. However, at the same

time, it also clearly illustrates a current severe barrier to fully realizing the power of algo-

rithmic self-assembly: the self-assembly of algorithmic crystals is error-prone. Indeed, to

build error-resiliency into algorithmic self-assembly of tiles represents one of the key chal-

lenges to both theoreticians and experimentalists in DNA nanoscience. Winfree proposes a


Figure 1.2: Unbranched, 3-branched, and 4-branched DNA molecules. DNA strands are drawn as directed line segments. Figure excerpted from [34].

seminal strategy to decrease tiling self-assembly errors without decreasing the intrinsic er-

ror rate of assembling a single tile. His technique, however, results in a final structure larger

than the original one (four times larger for decreasing error from $\epsilon$ to $\epsilon^2$, nine times larger

for decreasing error to $\epsilon^3$) [101]. In Chapter 3, we propose the first compact error-resilient

tiling method that decreases the error rate in the assembled tiling without increasing its

size. Following Winfree’s work, more schemes have been proposed that aim at decreasing the er-

ror of the computational tilings by designing new information encoding schemes or novel

molecular mechanisms [18, 19, 29, 76].
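
The size-versus-error trade-off of the proofreading strategy described above can be tabulated with a small helper. This is only an arithmetic sketch: it assumes the intrinsic per-tile error rate is written epsilon, and it generalises the two cases given in the text (four times larger for the square of the error rate, nine times larger for the cube) to a hypothetical parameter `k`, which is an assumption made here for illustration.

```python
from typing import Tuple


def proofreading_tradeoff(epsilon: float, k: int) -> Tuple[int, float]:
    """Return (size factor, resulting error rate) when the structure is made
    k*k times larger in exchange for lowering the error rate to epsilon**k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return k * k, epsilon ** k


for k in (1, 2, 3):
    size_factor, error = proofreading_tradeoff(0.01, k)
    print(f"k={k}: structure {size_factor}x larger, error rate about {error:.0e}")
```

A compact scheme, by contrast, aims to keep the size factor at 1 while still lowering the error rate.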

1.1.3 DNA Nano-devices

The construction of nanoscale mechanical devices, or nanomotors, is another key challenge

in nanoscience. Nanomotors are common in biological systems. Indeed, biomolecular mo-

tors are compared to the cell’s workhorses, whose proper functioning is indispensable for

the survival of a living cell. Biomolecular motors are usually protein complexes that con-

vert chemical energy stored in ATP, a universal biological energy carrier, into di-

rected motion and hence useful work. Biomolecular motors can be classified into two broad

categories, rotary motors (e.g. ATP synthase) and linear motors (e.g. myosin and kinesin).

Active efforts have been made to fabricate nanomotors that can demonstrate similar

functions. This includes research on harnessing the power of natural or modified protein

motors and making them work in a synthetic engineered environment [54] and research on


constructing motors by fabricating novel molecules and molecular complexes [11].

DNA nanoscientists have been particularly successful in constructing a wide range

of molecular motors. Experimentally constructed DNA nanomechanical devices include

those that exhibit motions such as open/close [83, 84, 115], extension/contraction [10, 27,

39], and rotation [50, 108], mediated by external environmental changes such as the ad-

dition and removal of DNA “fuel” strands [10, 39, 27, 83, 84, 108, 115] or the change of

ionic composition of the solution [50]. As a concrete example, Figure 1.3 shows a pro-

totype DNA “tweezer” that uses DNA strands as fuels [115]. The device consists of two

duplex DNA fragments (the “arms” of the tweezer) connected by a flexible single strand

DNA “hinge”. A closing DNA strand D hybridizes with the dangling single strand exten-

sions of the two arms of an open tweezer and thus sets the tweezer to a closed configuration.

An end portion of strand D does not pair with the dangling ends of the arms and this non-

pairing portion of D serves as a “toehold” to pull in a removal strand $\bar{D}$. In a process known

as strand displacement, $\bar{D}$ will fully hybridize with D and thus release D from the tweezer,

restoring the tweezer to its open configuration. Similar principles are exploited by a series

of later work on DNA nano-mechanical devices [10, 27, 39, 83, 84, 91, 108, 115].

In this thesis, I ask how to construct nanomechanical devices capable of

more complex and useful motions. In particular, I aim at constructing a system where an

autonomous motor can precisely transport a nanoparticle from one location on a nanos-

tructure to another location, following a designated programmable path. Systems of such

nature are analogous to the complex molecular “highway” system found in living cells:

actin filaments and microtubules form a complex network, along which motor proteins

autonomously transport cargoes from one location in the cell to another designated loca-

tion. Synthetic systems capable of such autonomous programmable transportation can have

many potential nanotechnology applications, for example, performing complex nanoscale

manufacturing tasks.


Figure 1.3: The operation of a molecular tweezer. Excerpted from [115]

The diverse set of self-assembled DNA lattices described in Sec. 1.1.2 can provide

platforms on which embedded DNA nanomechanical devices perform autonomous pro-

grammable transportations. However, the DNA nanodevices described above are not suit-

able for such purpose since 1) they do not move autonomously and 2) they only demon-

strate local conformational changes, not linear progressive motions. Reif proposed various

theoretical designs of autonomous DNA walking and rolling devices, which demonstrate

random bidirectional motion [62]. Turberfield proposed using DNA hybridization as an en-

ergy source for autonomous molecular motors. In addition, the experimental constructions

of non-autonomous DNA biped walker devices [81, 82] and autonomous DNA tweez-

ers [21, 22] have been recently reported.

In Chapter 4 of this thesis, I take the next important step of designing and construct-

ing DNA walkers capable of autonomous, unidirectional, linear progressive motion. By

embedding dangling DNA duplex fragments in self-assembled DNA lattices, we have de-

signed a suite of walking DNA devices capable of autonomous, programmable, unidi-

rectional motions along linear tracks [110]. The practicality of our designs is partially

supported by our experimental construction of a three-anchorage walking device [113].


1.1.4 DNA Computation

Starting with Adleman’s seminal demonstration of using DNA to compute in 1994 [3], the

field of DNA-based computing has flourished and has left us with a

rich legacy.

Adleman uses DNA as computing material to solve the NP-hard Hamiltonian path

problem [3]. The Hamiltonian path problem asks the following question. Given a directed

graph, decide whether there exists a path connecting a designated source vertex and a des-

ignated destination vertex such that the path passes each vertex exactly once. Adleman

represents each vertex with a single strand DNA such that no two vertex strands hybridize

with each other. An edge connecting vertex A to vertex B is also represented as a single

strand DNA that can hybridize with both vertices simultaneously in a fashion such that

vertex A’s 5’ end is juxtaposed with vertex B’s 3’ end. In the presence of ligase, vertex

A’s 5’ end can thus be joined to vertex B’s 3’ end via a covalent bond. As such, when

all the vertex strands and the edge strands are mixed in a solution containing ligase, the strands rep-

resenting all the legal paths in the graph will be formed. By fishing out the target strand

that represents the desired Hamiltonian path, if it exists, one can obtain the solution to the

Hamiltonian path problem.
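
The generate-then-filter structure of this computation can be mimicked in software. The sketch below is a loose analogy rather than a model of the chemistry: exhaustive enumeration of walks stands in for the random ligation of vertex and edge strands, and a simple filter stands in for fishing out the target strand. The function names and the example graph are hypothetical.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # adjacency lists of a directed graph


def all_walks(graph: Graph, source: str, length: int) -> List[List[str]]:
    """Every walk with `length` vertices that starts at `source`.

    This plays the role of the strands formed by ligation: each walk follows
    only edges that exist in the graph, but may revisit vertices.
    """
    walks = [[source]]
    for _ in range(length - 1):
        walks = [w + [nxt] for w in walks for nxt in graph.get(w[-1], [])]
    return walks


def hamiltonian_path(graph: Graph, source: str, dest: str) -> Optional[List[str]]:
    """Keep a walk that ends at `dest` and visits every vertex exactly once."""
    vertices = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(vertices)
    for walk in all_walks(graph, source, n):
        if walk[-1] == dest and len(set(walk)) == n:
            return walk
    return None


example = {"a": ["b", "c"], "b": ["c"], "c": ["b", "d"], "d": []}
print(hamiltonian_path(example, "a", "d"))   # ['a', 'b', 'c', 'd']
```

The number of candidate walks grows exponentially with the number of vertices, in software as in the test tube; the DNA approach trades this time cost for massive molecular parallelism.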

Subsequent to Adleman’s work, a rich set of DNA computational schemes have been

proposed and implemented [26, 37, 43, 46, 48, 55, 59, 60, 61, 68, 74, 87, 97]. Of these

DNA computing schemes, two are most relevant to the work described in this thesis: com-

putational tilings [48, 97] as described above in Section 1.1.1 and a prototype enzyme

driven finite state automaton built by Benenson and colleagues [12, 13, 14].

In Benenson’s construction, a duplex DNA encoding the sequence of input symbols is

digested sequentially by an endonuclease in a fashion mimicking the processing of input

data by a finite state automaton. Its hardware consists of a restriction nuclease and a ligase


and its software and input are encoded by double-stranded DNA. By selecting appropriate

software molecules, the automaton can be programmed to execute different programs. The

automaton processes the duplex input molecule via a cascade of restriction, hybridization,

and ligation cycles and produces a detectable output molecule. The output molecule en-

codes the computational result, which is the automaton’s final state. A limitation of this

automaton is that the data are destroyed as the finite state automaton proceeds. Though

this feature does not affect the operation of it as a finite state automaton, it poses a barrier

to further extending this finite state automaton to more powerful computing devices such

as a Turing machine.
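
The limitation noted above, that the input is destroyed as it is processed, can be seen in a small software analogue of a finite state automaton. This sketch is illustrative only: the transition table (an automaton that tracks whether the number of `b` symbols read so far is even) and all names are assumptions chosen here, not the molecular design.

```python
from typing import Dict, List, Tuple

Transitions = Dict[Tuple[str, str], str]  # (state, symbol) -> next state


def run_automaton(transitions: Transitions, start: str, symbols: str) -> Tuple[str, List[str]]:
    """Process the input one symbol at a time, discarding each symbol once read.

    Returns the final state and whatever input remains, which is empty after
    a complete run: like the digested DNA input, the data cannot be revisited.
    """
    state = start
    remaining = list(symbols)
    while remaining:
        symbol = remaining.pop(0)              # the symbol is consumed and lost
        state = transitions[(state, symbol)]
    return state, remaining


# Hypothetical two-state program: S0 means an even number of b's so far.
EVEN_B: Transitions = {
    ("S0", "a"): "S0", ("S0", "b"): "S1",
    ("S1", "a"): "S1", ("S1", "b"): "S0",
}

print(run_automaton(EVEN_B, "S0", "abba"))   # ('S0', [])
print(run_automaton(EVEN_B, "S0", "ab"))     # ('S1', [])
```

A Turing machine, in contrast, must be able to move back over its tape and rewrite it, which a consume-once input of this kind does not permit.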

In Chapter 5 of this thesis, I describe my work in constructing a suite of autonomous

DNA nanomechanical devices capable of parallel universal computation. I call them DNA

cellular computing devices. In particular, I present my designs of an Autonomous DNA

Turing Machine and an Autonomous DNA Cellular Automaton. These DNA cellular com-

puting devices hold universal computing power and perform meaningful mechanical mo-

tion in the process of computing. They thus represent a central converging point for DNA

lattice construction, nanorobotics, and nanocomputing and may have important applica-

tions in nanofabrication, nano-sensors, and nano-actuated electronics.

1.2 Contributions

In this thesis, we study the following closely related aspects of self-assembled biomolec-

ular structures: mathematical theory of self-assembly, fault-tolerant self-assembly, DNA

robotics devices, and DNA computing devices. The development of a rigorous mathematical

theory builds a solid theoretical foundation for studying self-assembled structures, while

fault tolerance is crucial for constructing large scale nanostructures such as DNA lattices,

which are intrinsically error prone. Successful construction of DNA lattices in turn pro-

vides a platform on which novel nanorobotics devices can be built; these robotics devices


can be further extended to nanomechanical computing devices, which can potentially serve

as central controlling units for complex nanoscale information processing, transporting,

and possibly manufacturing networks.

Mathematical theory of self-assembly. A central question regarding an abstract complex

system is how complex the system is. In the case of self-assembly, such a question can

be addressed by computational complexity theory. To study the complexity of diverse

biomolecular systems, in Chapter 2, we establish a framework that models assemblies

resulting from the cooperative effect of repulsion and attraction forces in a general setting

of graphs [65]. By capturing a much wider range of interesting self-assembly phenomena,

this model advances previous work that models simple rectangular grid structures formed

by only attraction forces. In particular, we define two novel assembly models, the accretive

graph assembly model and the self-destructible graph assembly model, and identify one

common fundamental problem in them: the sequential construction of a given graph. We

obtain several complexity results including the first PSPACE-complete result in the study

of self-assembly.

Fault tolerant self-assembly. Fault tolerance, a vital property of natural self-assembled

biological systems, is also a key requirement for building complex synthetic self-assembled

systems since self-assembly at molecular scale is intrinsically error prone. In Chapter 3,

we attack this problem in the practical context of DNA tiling systems and propose a novel

information encoding scheme, which, for the first time, reduces the error rate (from $\epsilon$ to a higher power of $\epsilon$)

without increasing the size of the assembled structure [64]. In addition to mathematical

and thermodynamic analysis, we analyze the error rate using computer simulation.

DNA robotics devices. Biomolecular systems possess autonomous protein motors for


intracellular transport, e.g. kinesin moving along a microtubule. To construct

a similar mechanism in synthetic biomolecular systems is widely considered as a grand

challenge to current nanoscience. To address this challenge, in Chapter 4, we design a suite

of DNA motors capable of autonomous, unidirectional, progressive linear motion [110].

We further partially implemented one of these motors in a biochemistry lab [113].

DNA computing devices. Building on the designs of the above robotics devices, in Chap-

ter 5, we obtain the designs of autonomous DNA cellular computing devices, which are

DNA mechanical computing devices embedded in DNA lattices. These devices represent

a novel converging point for studies on nano-lattice assembly, nano-robotics, and nano-

computing. In particular, we present the designs of an autonomous universal DNA Turing

machine [111] and an autonomous universal DNA cellular automaton [109]. The correct

operation of the devices is proved by theoretical analysis and further validated by computer

simulation.

1.3 Publication List, Collaborators, and Remarks

The work presented in the thesis uses material from the following papers.

• Chapter 2 uses material from

John H. Reif, Sudheer Sahu, Peng Yin, “Complexity of Graph Self-Assembly in

Accretive Systems and Self-Destructible Systems” [65]

• Chapter 3 uses material from

John H. Reif, Sudheer Sahu, Peng Yin, “Compact Error-Resilient Computational

DNA Tiling Assemblies” [64]

• Chapter 4 uses material from

Peng Yin, Andrew J. Turberfield, John H. Reif, “Designs of Autonomous Unidirec-


tional Walking DNA Devices” [110]

and

Peng Yin, Hao Yan, Xiaoju G. Daniell, Andrew J. Turberfield, John H. Reif, “A

Unidirectional DNA Walker Moving Autonomously Along a Linear Track” [113]

• Chapter 5 uses material from

Peng Yin, Andrew J. Turberfield, Sudheer Sahu, John H. Reif, “Design of an Au-

tonomous DNA Nanomechanical Device Capable of Universal Computation and

Universal Translational Motion” [111]

and

Peng Yin, Sudheer Sahu, Andrew J. Turberfield, John H. Reif, “Design of Au-

tonomous DNA Cellular Automata” [109]

Except where noted below, the intellectual ideas, theoretical designs and proofs, exper-

imental implementations, figures, and writing contained in this thesis are due to me. I am

solely responsible for any mistakes herein.

Chapter 2. The concept of accretive systems and self-destructible systems and the

definition of the graph assembly problem therein are due to me. I defined and produced the

proof for the AGAP problem; Sudheer Sahu extended my proof to the planar case. John Reif

pointed out the problem of SGAP and the proof direction of using reduction from PER-

MANENT. In discussion with Sudheer, I produced the final version of the proof presented

in the thesis. I defined the DGAP problem and gave the motivating example therein. John

conjectured that DGAP is PSPACE-complete and pointed out the proof direction of simulating a

Turing machine. I identified the reduction from INSPACE problem, constructed the cyclic

gadget, and produced the integrated proof scheme. Sudheer helped in further concretiz-

ing and debugging the DGAP proof and in generating figures therein. The initial drafting

was done in collaboration with Sudheer. I am responsible for writing the final production


version, which appears in this thesis. John Reif, Yusu Wang, and Hai Yu gave helpful

comments on the exposition of the paper.

Chapter 3. John Reif wrote the original draft of the paper and contributed most of the

intellectual work therein. I rewrote the paper extensively, including redesigning and pro-

ducing all the figures used in the proofs. The � error reduction construction presented in

the paper is due to me. The thermodynamic analysis of error rate was done in collaboration

with Sudheer. Sudheer and I further collaborated in constructing the examples used in the

simulation. Sudheer ran the simulation using XGrow software written by Erik Winfree,

and produced the simulation figure. Sudheer also helped with fine tuning the figures and

proofreading the text of the paper. I am responsible for writing the final production ver-

sion of the paper, which appears in this thesis. Thomas LaBean and Hao Yan contributed

helpful comments on the exposition of the paper.

Chapter 4. 1) Theoretical Designs. Andrew Turberfield designed and provided an

initial description of Device I and Device II. I abstracted out the designing principles and

designed some variants. The design of Device III is due to me. I drafted the paper and

benefited from discussion with John Reif. Yusu Wang, Alex Hartemink, and Hai Yu gave

helpful comments on the exposition of the paper. 2) Experimental Design and Implemen-

tation. John Reif initiated the project and organized the team. The theoretical design of

device I by Andrew provides the most critical intellectual basis for the subsequent exper-

imental work. Based on Andrew’s theoretical design, I designed the prototype molecular

system, and benefited from critical feedback from Hao Yan and John Reif. Hao pointed

out the direction of using radioactive labeling to monitor the motion of the walker. The ex-

periments were designed by me under the supervision of and in collaboration with Hao.

All the experiments presented in the paper were performed by me and supervised by Hao,

who also trained me on acquiring the essential experimental techniques. Xiaoju Daniell

helped with purifying DNA strands and in optimizing some experimental conditions using


unlabeled DNA strands. Liping Feng maintained a very supportive environment in the lab.

Under Hao’s guidance, I produced the figures and wrote the first draft of the paper. Hao,

Andrew, and John contributed critical edits to improve the exposition of the paper. Thomas

H. LaBean provided helpful comments on drafting the paper.

Chapter 5. John Reif motivated the study in this chapter: he laid out the vision of con-

structing reusable DNA computers, posed the question of extending DNA robotics devices

into computing devices, and defined the concept of DNA cellular computing devices. In

discussion with John, Andrew Turberfield extended his design of robotics Device I into an

accumulative XOR computing device during his visit at Duke, which provided the initial

inspiration for the design of DNA cellular computing devices presented in this disserta-

tion. The design of the DNA Turing machine and the drafting of the paper are due to me.

A primitive computer simulator was designed and implemented in collaboration with Sud-

heer Sahu, and was used to debug my design (the simulator is described in Sudheer’s thesis

and not included in this thesis). The design of the DNA cellular automata and the drafting

of the paper are due to me. I benefited from discussion with John, especially on designing

the synchronization mechanism. Sudheer again helped in debugging the system with the

computer simulator. Nabil Mustafa gave helpful comments on the exposition of the Turing

machine paper.

This work is supported by NSF ITR Grants EIA-0086015 and CCR-0326157, NSF

QuBIC Grants EIA-0218376 and EIA-0218359, and DARPA/AFOSR Contract F30602-

01-2-0561.


Part I:

Self-Assembly


Chapter 2

Complexity of Graph Self-Assembly in Accretive Systems and Self-Destructible Systems

Self-assembly is a process in which small objects autonomously associate with each other

to form larger complexes. It is ubiquitous in biological constructions at the cellular and

molecular scale and has also been identified by nanoscientists as a fundamental method

for building molecular scale structures. Recent years have seen converging interest and effort

in studying self-assembly from mathematicians, computer scientists, physicists, chemists,

and biologists. However, most complexity theoretic studies of self-assembly use mathematical

models with two limitations: 1) only attraction, and no repulsion, is studied; 2)

only assembled structures on two-dimensional square grids are studied. In this chapter, we

study the complexity of the assemblies resulting from the cooperative effect of repulsion

and attraction in a more general setting of graphs. This allows for the study of a more gen-

eral class of self-assembled structures than the previous tiling model. We define two novel

assembly models, namely the accretive graph assembly model and the self-destructible

graph assembly model, and identify one fundamental problem in them: the sequential con-

struction of a given graph, referred to as Accretive Graph Assembly Problem (AGAP) and

Self-Destructible Graph Assembly Problem (DGAP), respectively. Our main results are:

(i) AGAP is NP-complete even if the maximum degree of the graph is restricted to 4

or the graph is restricted to be planar with maximum degree 5; (ii) counting the number

of sequential assembly orderings that result in a target graph (#AGAP) is #P-complete;

and (iii) DGAP is PSPACE-complete even if the maximum degree of the graph is restricted

to 6 (this is the first PSPACE-complete result in self-assembly). We also extend

the accretive graph assembly model to a stochastic model, and prove that determining the


probability of a given assembly in this model is #P-complete.

2.1 Introduction

Self-assembly is the ubiquitous process in which small objects associate autonomously

with each other to form larger complexes. For example, atoms can self-assemble into

molecules; molecules into crystals; cells into tissues, etc. Recently, self-assembly has also

been explored as a powerful and efficient mechanism for constructing synthetic molecular

scale objects with nano-scale features. This approach is particularly fruitful in DNA based

nanoscience, as exemplified by the diverse set of DNA lattices made from self-assembled

branched DNA molecules (DNA tiles) [35, 49, 102, 106, 107]. Another nanoscale example

is the self-assembly of peptide molecules [17]. Self-assembly is also used for mesoscale

construction, for example, via the use of capillary forces [16, 69] or magnetic forces [1] to

provide attraction and repulsion between meso-scale tiles and other objects.

Building on classical Wang tiling models [67, 95] dating back to the 1960s, Rothemund

and Winfree [71] in 2000 proposed an elegant discrete mathematical model for complexity

theoretic studies of self-assembly known as the tile assembly model. In this model, DNA

tiles are treated as oriented unit squares (tiles). Each of the four sides of a tile has a

glue with a positive integral strength. Assembly occurs by accretion of tiles iteratively

to an existing assembly, starting with a distinguished seed tile. A tile can be “glued” to

a position in an existing assembly if the tile can fit in the position such that each pair of

abutting sides of the tile and the assembly have the same glue and the total strength of

the glues is greater than or equal to the temperature, a system parameter. Research in

this field largely focuses on studying the complexity of and algorithms for (uniquely and

terminally) producing assemblies with given properties, such as shape. It has been shown

that the construction of $n \times n$ squares has a program size complexity (the minimum number

of distinct types of tiles required) of $\Theta(\log n / \log\log n)$ [5, 71]. The upper bound is obtained by


simulating a binary counter and the lower bound by analyzing the Kolmogorov complexity

of the tiling system [40]. The model was later extended by Adleman et al. to include

the time complexity of generating specified assemblies [5]. Later work studies various

combinatorial optimization and complexity problems in the standard tile assembly model

as well as some of its variants [6, 8, 9, 18, 23, 24].

Though substantial progress has been made in recent years in the study of self-assembly

using the above tile assembly model, which captures many important aspects of self-

assembly in nature and in nano-fabrications, the complexity of some other important as-

pects of self-assembly remains unexplored:

• Only attraction, and no repulsion, is studied. However, repulsive forces often occur

in self-assembly. For example, there is repulsion between hydrophobic and hydrophilic

tiles [16, 69] and between tiles labeled with magnetic pads of the same polarity [1];

there is also electrostatic repulsion in molecular systems. Indeed,

the study of repulsive forces in self-assembly systems was posed as an open question

by Adleman and colleagues in [5]. Though there has been previous work on

the kinetics of such systems, e.g. Klavins' “waterbug” model [33], no complexity

theoretic study has been directed towards such systems.

• Generally only assembled structures on two-dimensional square grids are studied. In

contrast, many molecular self-assemblies using DNA and other materials involve the

assembly of more diverse structures in both two and three dimensions. For example,

Seeman’s group constructed self-assembled non-regular graphs using DNA junction

molecules as vertices and duplex DNA molecules as edges [75].

In this chapter, we study the cooperative effect of repulsion and attraction in a graph

setting. This approach allows the study of a more general class of assemblies as described

above.


We distinguish two systems, namely the accretive system and the self-destructible sys-

tem. In an accretive system, an assembled component cannot be removed from the assem-

bly. In contrast, in the self-destructible system, a previously assembled component can

be “actively” removed from the assembly by the repulsive force exerted by another newly

assembled component. In other words, the assembly can (partially) destruct itself. We

define the accretive graph assembly model for the former and the self-destructible graph

assembly model for the latter.

We first define an accretive assembly model and study a fundamental problem in this

model: the sequential construction of a given graph, referred to as the Accretive Graph Assembly

Problem (AGAP). Our main result for this model is that AGAP is NP-complete

even if the maximum degree of vertices in the graph is restricted to 4; the problem remains

NP-complete even for planar graphs (planar AGAP, or PAGAP) with maximum degree 5.

We also prove that the problem of counting the number of sequential assembly orderings

that lead to a target graph (#AGAP) is #P-complete. We further extend the AGAP model

to a stochastic model, and prove that determining the probability of a given assembly in

this model (stochastic AGAP, or SAGAP) is #P-complete.

If we relax the assumption that an assembled component always stays in the assembly,

repulsive force between assembled components can cause self-destruction in the assem-

bly. Self-destruction is a common phenomenon in nature, at least in biological systems.

One renowned example is apoptosis, or programmed cell death [88]. Programmed cell

death can be viewed as a self-destructive behavior exercised by a multi-cellular organism,

in which the organism actively kills a subset of its constituent cells to ensure the normal

development and function of the whole system. It has been shown that abnormalities in

programmed cell death regulation can cause a diverse range of diseases such as cancer

and autoimmunity [88]. It is also conceivable that self-destruction can be exploited in self-

assembly based nano-fabrication: the components that serve to generate intermediate prod-


ucts but are unnecessary or undesirable in the final product should be actively removed. We

provide an illustrative abstract example in Figure 2.5 and Figure 2.6 in Section 2.5.

To the best of our knowledge, our self-destructible graph assembly model is the first

complexity theoretic model that captures and studies the fundamental phenomenon of self-

destruction in self-assembly systems. Our model is different from previous work on re-

versible tiling systems [4, 7, 103]. These previous studies use elegant thermodynamic or

stochastic techniques to investigate the reversible process of tile assembly/disassembly:

an assembled tile has a probability of “falling” off the assembly in a kinetic system. In

contrast, our self-destructible system models the behavior of a self-assembly system that

“actively” destructs part of itself.

To model the self-destructible systems, we define a self-destructible graph assembly

model, and consider the problem of sequentially constructing a given graph, referred to

as the Self-Destructible Graph Assembly Problem (DGAP). We prove that DGAP is

PSPACE-complete even if the graph is restricted to have maximum degree 6.

The rest of the chapter is organized as follows. We first define the accretive graph

assembly model and the AGAP problem in Section 2.2. In this model, we first show the

NP-completeness of AGAP and PAGAP (planar AGAP) in Section 2.3 and then show

the #P-completeness of SAGAP (stochastic AGAP) in Section 2.4. Next, we define the

self-destructible graph assembly model and the DGAP problem in Section 2.5 and show

the PSPACE-completeness of DGAP in Section 2.6. We close with a discussion of our

results in Section 2.7.

2.2 Accretive Graph Assembly Model

Let $\mathbb{N}$ and $\mathbb{Z}$ denote the set of natural numbers and the set of integers, respectively. A

graph assembly system is a quadruple $\mathcal{T} = \langle G = (V, E), v_s, w, \tau \rangle$, where $G = (V, E)$ is

a given graph with vertex set $V$ and edge set $E$, $v_s \in V$ is a distinguished seed vertex,


Figure 2.1: (a) An example of graph assembly in the accretive model. (b) A step-by-step illustration of the example assembly sequence.


$w: E \to \mathbb{Z}$ is a weight function (corresponding to the glue function in the standard tile

assembly model [71]), and $\tau \in \mathbb{N}$ is the temperature of the system (intuitively temperature

provides a tunable parameter to control the stability of the assembled structure). In contrast

to the canonical tile assembly model in [71], which allows only positive edge weight, we

allow both positive and negative edge weights, with positive (resp. negative) edge weight

modeling the attraction (resp. repulsion) between the two vertices connected by this edge.

We will see that this simple extension makes the assembly problem significantly more

complex.

Roughly speaking, given a graph assembly system $\mathcal{T} = \langle G, v_s, w, \tau \rangle$, $G$ is sequentially

constructible if we can attach all its vertices one by one, starting with the seed vertex;

a vertex $v$ can be assembled if the support to it is equal to or greater than the system

temperature $\tau$, where support is the sum of the weights of the edges between $v$ and its

assembled neighbors.

Figure 2.1 gives an example. Here the graph is shown in Figure 2.1 (a) and the temper-

ature is set to 2. Figure 2.1 (b) gives a step-by-step illustration of the assembly sequence.

Note that if � gets assembled before � , then the whole graph can get assembled: an exam-

ple assembly ordering can be &I���/���/�������M�������{������� . In contrast, if vertex

� gets assembled before � , the graph cannot be assembled: � can be assembled only if it

gets support from both � and � ; while � cannot get assembled without the support from � .

Formally, given a graph assembly system $\mathcal{T} = \langle G, v_s, w, \tau \rangle$, $G$ is sequentially constructible

if there exists an ordering of all the vertices in $V$, $O_{\mathcal{T}} = (v_s = v_0 \prec v_1 \prec v_2 \prec \cdots \prec v_{n-1})$, such that

$\sum_{v_j \in N_G(v_i),\, j < i} w(v_i, v_j) \ge \tau$ for $0 < i \le n-1$, where $N_G(v_i)$ denotes

the set of vertices adjacent to $v_i$ in $G$. The ordering $O_{\mathcal{T}}$ is called an assembly ordering

for $G$. $\sigma_O(v_i) = \sum_{v_j \in N_G(v_i),\, j < i} w(v_i, v_j)$ is called the support of $v_i$ in ordering $O$. When

the context is clear, we simply use $O$ and $\sigma(v_i)$ to denote assembly ordering and support,

respectively.
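The definition above translates directly into a checker. The following is a minimal sketch, assuming the weight function is stored as a dictionary keyed by vertex pairs; the names `support`, `is_assembly_ordering`, and the four-vertex toy system are hypothetical and are not the graph of Figure 2.1.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Assumed representation: one dictionary entry per undirected edge.
Weights = Dict[Tuple[Hashable, Hashable], int]


def support(v: Hashable, assembled: set, weights: Weights) -> int:
    """sigma(v): total weight of edges joining v to already-assembled vertices."""
    total = 0
    for (a, b), w in weights.items():
        if (a == v and b in assembled) or (b == v and a in assembled):
            total += w
    return total


def is_assembly_ordering(order: Sequence[Hashable], weights: Weights,
                         seed: Hashable, tau: int) -> bool:
    """Check that `order` starts at the seed, lists every vertex exactly once,
    and gives each later vertex support >= tau at the moment it is attached."""
    vertices = {x for edge in weights for x in edge} | {seed}
    if not order or order[0] != seed:
        return False
    if len(order) != len(vertices) or set(order) != vertices:
        return False
    assembled = {seed}
    for v in order[1:]:
        if support(v, assembled, weights) < tau:
            return False
        assembled.add(v)
    return True


# Hypothetical 4-vertex system (illustration only), temperature 2.
toy_weights = {("s", "a"): 2, ("a", "b"): 2, ("s", "c"): 2,
               ("a", "c"): 1, ("b", "c"): -1}
good = is_assembly_ordering(["s", "a", "b", "c"], toy_weights, "s", 2)
bad = is_assembly_ordering(["s", "a", "c", "b"], toy_weights, "s", 2)
```

In the toy system, attaching `c` before `b` leaves `b` with support 2 - 1 = 1, below the temperature, so the second ordering is rejected. This illustrates how a single repulsive edge makes the order of attachment matter.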


We define the accretive graph assembly problem as follows.

Definition 2.2.1 Accretive Graph Assembly Problem (AGAP): Given a graph assembly

system $\mathcal{T} = \langle G, v_s, w, \tau \rangle$ in the accretive model, determine whether there exists an

assembly ordering $O$ for $G$.

The above model is accretive in the sense that once a vertex is assembled, it cannot be

‘knocked off’ by the subsequent assembly of any other vertex. If we relax this assumption,

we will obtain a self-destructible model, which is described in Section 2.5.

2.3 AGAP and PAGAP are NP-complete

2.3.1 4-DEGREE AGAP is NP-complete

Figure 2.2: (a) A clause gadget. The top vertices and the bottom vertex are colored black; the literal vertices are white. (b) A graph construction corresponding to an AGAP reduction from the 3SAT formula $(x_1 \vee x_2 \vee x_3) \wedge (\bar{x}_1 \vee \bar{x}_3 \vee x_2) \wedge (x_1 \vee \bar{x}_2 \vee x_3)$. An edge between two literal vertices is depicted as a dashed arc and assigned weight -1; all other edges have weight 2.

Lemma 2.3.1 AGAP is in NP.

Proof: Given an assembly ordering of the vertices, sequentially check whether each vertex

can be assembled (with sufficient support). This takes polynomial time. □

Recall that the NP-complete 3SAT problem asks: Given a Boolean formula $\phi$ in conjunctive

normal form with each clause containing 3 literals, determine whether $\phi$ is satisfiable

[56]. Also recall that 3SAT remains NP-complete for formulas in which each


variable appears at most three times, and each literal at most twice [56]. We will reduce

this restricted 3SAT to AGAP to prove AGAP is NP-hard.

Lemma 2.3.2 AGAP is NP-hard.

Proof: Given a 3SAT formula $\phi$ where each variable appears at most three times, and

each literal at most twice, we will construct below an accretive graph assembly system

$\mathcal{T} = \langle G, v_s, w, \tau \rangle$ for $\phi$. We will then show that the satisfiability problem of $\phi$ can be

reduced (in logarithmic space) to the sequential constructibility problem of $G$ in $\mathcal{T}$.

For each clause in $\phi$, construct a clause gadget as in Figure 2.2 (a). For each literal,

we construct a literal vertex (colored white in Figure 2.2 (a)). We further add top vertices

(black) above and bottom vertices (black) below the literal vertices as in Figure 2.2 (a).

We next take care of the structure of formula $\phi$ as follows. Connect all the clause gadgets

sequentially via their top vertices as in Figure 2.2 (b); connect two literal vertices if and

only if they correspond to two complement literals. This produces graph $G$. Designate the

leftmost top vertex as the seed vertex $v_s$. We next assign weight $-1$ to an edge between two

literal vertices and weight $2$ to all the other edges. Finally, set the temperature $\tau = 2$. This

completes the construction of $\mathcal{T} = \langle G, v_s, w, \tau \rangle$. For a concrete example, see Figure 2.2

(b).

The following proposition implies the lemma.

Proposition 2.3.3 There is an assembly ordering $O$ for $\mathcal{T}$ if and only if $\phi$ is satisfiable.

($\Leftarrow$) First we show that if $\phi$ can be satisfied by truth assignment $\mathcal{A}$, then we can derive an

assembly ordering $O$ based on $\mathcal{A}$.

Stage 1. Starting from the seed vertex, assemble all the top vertices sequentially. This

can be easily done since each top vertex will have support 2, which is greater than or equal

to $\tau = 2$, the temperature.


Stage 2. Assemble all the literal vertices assigned True. Since two True literals cannot

be complement literals, no two literal vertices to be assembled at this stage can have a

negative edge between them. Hence all these True literal vertices will receive a support 2

($\ge \tau = 2$).

Stage 3. Assemble all the bottom vertices. Since truth assignment $\mathcal{A}$ satisfies $\phi$,

every clause in $\phi$ has at least one True literal. Thus every clause gadget in

$G$ has at least one literal vertex (a True literal vertex) assembled in stage 2, which in turn

allows us to assemble the bottom vertex in that clause gadget.

Stage 4. Assemble all the remaining literal vertices (the False literal vertices). Observe

that any remaining literal vertex $v$ has support $4$ from its already assembled neighboring

top vertex and bottom vertex and that $v$ can have negative support at most $-2$ from its

assembled literal vertex neighbors (recall that each literal vertex can have at most two

literal vertex neighbors since each variable appears at most three times in $\phi$). Hence the

total support for $v$ will be at least $2$ ($\ge \tau$).

($\Rightarrow$) Suppose that there exists an assembly ordering $O$; then we can derive a satisfying truth

assignment $\mathcal{A}$ for $\phi$. For each literal vertex, assign its corresponding literal True if it appears

in $O$ before all of its literal vertex neighbors (this assures no two complement literals are

both assigned True); otherwise assign it False.

To show that $\mathcal{A}$ satisfies $\phi$, we only need to show every clause contains at least one

True literal. For contradiction, suppose there exists a clause gadget $C$ with three False literal

vertices, where $v$ is the literal vertex assembled first. However, $v$ cannot be assembled:

it has support $2$ from the top vertex; no support from the bottom vertex ($v$ gets assembled

first and hence the bottom vertex in $C$ cannot be assembled before $v$); at least $-1$ negative

support from one of its literal vertex neighbors ($v$ is assigned False); the total support of $v$

is thus at most $1$, less than temperature $\tau = 2$. Contradiction. Hence $\mathcal{A}$ must satisfy $\phi$. □
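The reduction and the four-stage ordering can be sketched in code. This is a minimal illustration under stated assumptions, not the construction as drawn in Figure 2.2: it assumes each clause gadget has one top vertex per literal (the top vertices forming a single chain across gadgets) and one bottom vertex adjacent to all three literal vertices, which is what the support values in the proof suggest. The function names, the integer encoding of literals, and the example formula are hypothetical.

```python
from itertools import combinations


def build_agap_from_3sat(clauses):
    """Return (weights, seed, tau) for a list of 3-literal clauses.
    A literal is a nonzero int: k stands for x_k, -k for its complement."""
    weights = {}
    tops = [("top", c, p) for c in range(len(clauses)) for p in range(3)]
    for a, b in zip(tops, tops[1:]):          # chain of top vertices
        weights[(a, b)] = 2
    literal_vertices = []
    for c, clause in enumerate(clauses):
        for p, lit in enumerate(clause):
            weights[(("top", c, p), ("lit", c, p))] = 2
            weights[(("lit", c, p), ("bot", c))] = 2
            literal_vertices.append((("lit", c, p), lit))
    for (u, lu), (v, lv) in combinations(literal_vertices, 2):
        if lu == -lv:                          # complementary literals repel
            weights[(u, v)] = -1
    return weights, tops[0], 2


def ordering_from_assignment(clauses, assignment):
    """Stages 1-4 of the proof: tops, True literals, bottoms, False literals."""
    def is_true(lit):
        return assignment[abs(lit)] == (lit > 0)
    tops = [("top", c, p) for c in range(len(clauses)) for p in range(3)]
    lits = [(("lit", c, p), lit)
            for c, clause in enumerate(clauses) for p, lit in enumerate(clause)]
    true_lits = [v for v, lit in lits if is_true(lit)]
    false_lits = [v for v, lit in lits if not is_true(lit)]
    bottoms = [("bot", c) for c in range(len(clauses))]
    return tops + true_lits + bottoms + false_lits


def is_assembly_ordering(order, weights, seed, tau):
    """Verify support >= tau for every vertex attached after the seed."""
    vertices = {x for edge in weights for x in edge}
    if order[0] != seed or set(order) != vertices or len(order) != len(vertices):
        return False
    assembled = {seed}
    for v in order[1:]:
        s = sum(w for (a, b), w in weights.items()
                if (a == v and b in assembled) or (b == v and a in assembled))
        if s < tau:
            return False
        assembled.add(v)
    return True


# Hypothetical formula (x1 v x2 v x3) ^ (~x1 v ~x2 v x3), chosen for illustration.
formula = [(1, 2, 3), (-1, -2, 3)]
weights, seed, tau = build_agap_from_3sat(formula)
satisfying = {1: True, 2: False, 3: True}
falsifying = {1: True, 2: True, 3: False}
ok = is_assembly_ordering(ordering_from_assignment(formula, satisfying),
                          weights, seed, tau)
not_ok = is_assembly_ordering(ordering_from_assignment(formula, falsifying),
                              weights, seed, tau)
```

The ordering built from the satisfying assignment passes the check. The one built from the falsifying assignment fails at stage 3, because the bottom vertex of the second gadget has no assembled literal neighbor. That failure only shows this particular ordering is invalid; since the formula is satisfiable, other valid orderings exist.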


The following theorem follows immediately from Lemma 2.3.1 and Lemma 2.3.2.

Theorem 2.3.4 AGAP is NP-complete.

Let $k$-DEGREE AGAP be the AGAP in which the largest degree of any vertex in graph

$G$ is $k$. Observe that the largest degree of any vertex in the graph construction in the proof

of Lemma 2.3.2 is $4$. Hence we have

Corollary 2.3.5 4-DEGREE AGAP is NP-complete.

2.3.2 5-DEGREE PAGAP is NP-complete


Figure 2.3: (a) and (b) are respectively an identifying graph and a PAGAP graph constructioncorresponding to the P3SAT formula ä ÈÚå{È/æIÈÚçMÈÚèêé Ã1Ä ÅKëìÅKí Ç È Ã1Ä ÅKë Ç È Ã í¯ÅÚî Ç È ÃeÉëïÅÉî ÇðÈ Ã�Éí)Å ÉÄ Ç . The larger (smaller) white circles represent clauses (literals); black vertices in (b)represent assisting vertices. Note that each clause is adjacent to at most three literals; each literal isadjacent to at most two clauses. The grey loop in (a) is loop ñ ; integers in (b) indicate edge weights

We next study the planar AGAP (PAGAP) problem, where the graph $G$ in the assembly

system $\mathcal{T}$ is planar. First, note that the following lemma is trivially true.

Lemma 2.3.6 PAGAP is in NP.


We show that PAGAP is NP-hard by a reduction from the NP-hard planar three-satisfiability

problem (P3SAT) [42], defined in the following way. Given a 3SAT formula

$\phi$, construct its identifying graph $G = (V, E)$ as follows: the vertex set is

$V = \{l : l \text{ is a variable}\} \cup \{c : c \text{ is a clause}\}$; the edge set is $E = \{(l, c) : l \text{ is a variable in clause } c\}$. If $G$ is

planar, $\phi$ is referred to as a planar 3SAT (P3SAT) formula. The P3SAT problem is to decide

the satisfiability of a P3SAT formula $\phi$.

We use the identifying graph construction in [51], which represents each variable $x$

with two vertices (one for $x$ and one for $\bar{x}$) connected by an edge. See Figure 2.3 (a) for

an example. We use the following two properties of this construction in our proof: 1)

There exists a loop $L$ that passes between all pairs of literals without intersecting any edge

between a literal and a clause; 2) Any literal can belong to at most two clauses [51].

Lemma 2.3.7 PAGAP is NP-hard.

Proof: Given an arbitrary P3SAT formula $\phi$, we first construct an assembly system $\mathcal{T} = \langle G, v_s, w, \tau \rangle$.

We then show that the satisfiability problem of $\phi$ can be reduced in logarithmic

space to the sequential constructibility problem of $G$ in $\mathcal{T}$.

We construct a graph $G = (V, E)$ by modifying the identifying graph of $\phi$: along the

loop $L$, add an assisting vertex $v_i$ between every two consecutive pairs of literal vertices

and connect $v_i$ with all these four vertices as shown in Figure 2.3 (b). Next, we assign edge

weights. The weight of an edge between a literal and a clause is $4$; the weight of an edge

between a literal $x$ and its complement $\bar{x}$ is $-6$ if neither of them is connected to more than

one clause; it is $-10$ if at least one of the literals is connected to two clause vertices. The

weight of an edge connecting an assisting vertex and a literal vertex $x$ is $4$ if the weight of

edge $(x, \bar{x})$ is $-10$ and $x$ is connected to only one clause vertex; otherwise it is $2$. Finally,

we select an arbitrary assisting vertex, say $v_1$, as the seed vertex $v_s$ and set the temperature

$\tau = 2$. This completes the construction of $\mathcal{T}$.
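As a sanity check on these weights, the following worked arithmetic (an illustration derived from the assignment above, not part of the original text) computes the support of a literal vertex $x$ that is attached after its complement $\bar{x}$. It assumes that both assisting vertices adjacent to $x$ and all clause vertices adjacent to $x$ are already assembled, with temperature $\tau = 2$ and clause edges of weight 4.

```latex
\begin{align*}
\text{$x$ and $\bar{x}$ each in one clause, } w(x,\bar{x}) = -6:
  &\quad \underbrace{2 + 2}_{\text{assisting}} + \underbrace{4}_{\text{clause}} - 6 = 2 \ge \tau, \\
\text{$x$ in two clauses, } w(x,\bar{x}) = -10:
  &\quad \underbrace{2 + 2}_{\text{assisting}} + \underbrace{4 + 4}_{\text{clauses}} - 10 = 2 \ge \tau, \\
\text{$x$ in one clause, $\bar{x}$ in two, } w(x,\bar{x}) = -10:
  &\quad \underbrace{4 + 4}_{\text{assisting}} + \underbrace{4}_{\text{clause}} - 10 = 2 \ge \tau.
\end{align*}
```

In each case the support equals the temperature exactly, so dropping any single clause term leaves the support at $-2 < \tau$. Under these assumptions, a literal attached after its complement therefore needs every one of its clause neighbors to be assembled first.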


We next prove the following proposition, which completes the proof of the lemma.

Proposition 2.3.8 There is an assembly ordering $O$ if and only if $\phi$ is satisfiable.

($\Leftarrow$) Suppose there exists a truth assignment $\mathcal{A}$ that satisfies $\phi$; we give the following assembly

ordering.

Stage 1. Assemble all the assisting vertices and True literals as follows. Starting

from the seed vertex, following the clockwise direction along loop $L$, we assemble alternately

True literals (one of $x$ and $\bar{x}$ is necessarily True) and assisting vertices, till we

reach the seed vertex again. For example, a satisfying truth assignment ,��û`CüÕ`�ý"`ei/5{W,¹ÑÓÒSÔÕ� `¤��&~×_Ø�� `þÑÓÒSÔÕ� `¤��&~×-Ø@�@5 in Figure 2.3 (b) will give the assembly ordering dgf�W�d~��� Eü �d!K��ýÌ�Ûd � Ei���dÿK�Ö� .

Stage 2. Assemble all the clauses. Since $\mathcal{A}$ satisfies $\phi$, each clause contains at least

one True literal and hence is now connected to at least one True literal vertex assembled in

stage 1. Thus all the clause vertices can be assembled now.

Stage 3. Assemble all the False literals and thus complete the whole graph. Since all

the neighbors of each False literal have already been assembled, it is easy to verify that

there is enough support for it.

($\Rightarrow$) Suppose that there exists an assembly ordering $O$; we derive from $O$ a truth assignment $\mathcal{A}$

by assigning a literal vertex $x$ True if it appears before $\bar{x}$ in $O$, and False otherwise.

We claim that $\mathcal{A}$ satisfies $\phi$.

For contradiction, assume there is a clause, say $C$, unsatisfied, with all its literals $x$, $y$,

and $z$ assigned False. This implies that $\bar{x}$ (resp. $\bar{y}$, $\bar{z}$) appears before $x$ (resp. $y$, $z$) in $O$.

Assume w.l.o.g. that $x \prec y \prec z$ in $O$. Since $C$ is adjacent to only $x$, $y$, and $z$, vertex $x$

must appear before $C$ in $O$. However, by the edge weight assignment, if $x$ appears after its


complement $\bar{x}$, then it can be assembled only after all the clause vertices connected to $x$

are assembled. In particular, clause $C$ must appear before $x$. Contradiction. We

thus conclude that $\mathcal{A}$ must satisfy $\phi$. □

Putting together Lemma 2.3.6 and Lemma 2.3.7, we have

Theorem 2.3.9 PAGAP is NP-complete.

Corollary 2.3.10 5-DEGREE PAGAP is NP-complete.

2.4 #AGAP and SAGAP are #P-complete

2.4.1 #AGAP is #P-complete

Figure 2.4: (a) and (b) show an example bipartite graph $B$ and the corresponding graph $G$ used in the proof of Lemma 2.4.2, respectively. In (b), the $c_i$'s denote connector vertices (colored white); $u_1$ is the seed vertex. The weight of an edge connecting two connector vertices (dashed line) is $-4$; the weight of any other edge is $2$.

We now consider a more general version of AGAP: given an accretive graph assembly

system $\mathcal{T} = \langle G, v_s, w, \tau \rangle$ and a target vertex set $V_t \subseteq V$, determine if there exists an

ordering $O_t(V, V_t)$ of $V$ such that $V_t$ is assembled after we attempt assembling each vertex

$v \in V$ sequentially according to $O_t$. Vertex $v$ will be assembled if there is enough support;

otherwise it will not. $O_t$ is called an assembly ordering of $V$ for $V_t$. When the context

is clear, we simply call it assembly ordering for $V_t$ and denote it by $O_t$. Note that the

assembly ordering $O_t$ is an ordering on all the vertices in $V$, but we only care about the


assembly of the target vertex set $V_t$: the assembly of vertices in $V \setminus V_t$ is neither required

nor prohibited. For $V_t = V$, the general AGAP is then precisely the standard AGAP we

have been studying. The problem of counting the number of assembly orderings for $V_t \subseteq V$

under this general AGAP model is referred to as #AGAP.
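For very small systems, #AGAP can be evaluated by exhaustive enumeration, which makes the "attempt each vertex once, skip it if support is insufficient" semantics concrete. The sketch below assumes every ordering begins with the seed and that a vertex whose attempt fails is not retried; `count_assembly_orderings` and the three-vertex path are hypothetical names and data chosen for illustration. The running time is factorial in the number of vertices, so this is an illustration of the definition rather than a practical algorithm.

```python
from itertools import permutations


def count_assembly_orderings(weights, seed, tau, target):
    """Count orderings of the non-seed vertices after which every vertex in
    `target` is assembled. Each vertex is attempted exactly once, in order."""
    vertices = {x for edge in weights for x in edge} | {seed}
    others = [v for v in vertices if v != seed]
    count = 0
    for perm in permutations(others):
        assembled = {seed}
        for v in perm:
            s = sum(w for (a, b), w in weights.items()
                    if (a == v and b in assembled) or (b == v and a in assembled))
            if s >= tau:            # enough support: v joins the assembly
                assembled.add(v)
        if set(target) <= assembled:
            count += 1
    return count


# Hypothetical path s - a - b with weight-2 edges, temperature 2.
path = {("s", "a"): 2, ("a", "b"): 2}
both = count_assembly_orderings(path, "s", 2, {"a", "b"})
only_a = count_assembly_orderings(path, "s", 2, {"a"})
```

With target {a, b} only the ordering a, b succeeds. With target {a} both orderings count, because the failed attempt on b in the ordering b, a is neither required nor prohibited.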

Lemma 2.4.1 #AGAP is in #P.

We next show #AGAP is #P-hard, using a reduction from the #P-complete problem

PERMANENT, the problem of counting the number of perfect matchings in a bipartite

graph [56].

Lemma 2.4.2 #AGAP is #P-hard.

Proof: Given a bipartite graph $B = (U, V, E)$ with two partitions of vertices $U$ and $V$ and

edge set $E$, where $U = \{u_1, \ldots, u_n\}$, $V = \{v_1, \ldots, v_n\}$, and $E = \{e_1, \ldots, e_m\}$ (recall that

by definition of bipartite graph, there is no edge between any two vertices in $U$ and no edge

between any two vertices in $V$), we construct an assembly system $\mathcal{T} = \langle G, v_s, w, \tau \rangle$. First,

we derive graph $G$ by adding vertices and edges to $B$ (see Figure 2.4 for an example):

on each edge $e_k$ add a splitting connector vertex $c_k$; add an edge (dashed line) between

two connector vertices if they share a same neighbor in $U$; connect $u_i$ and $u_{i+1}$ for $i = 1, \ldots, n-1$.

Next, assign weight $-4$ to an edge between two connector vertices; assign

weight $2$ to all the other edges. Finally, designate $u_1$ as the seed vertex $v_s$, and set the

temperature $\tau = 2$. The target vertex set $V_t$ is $U \cup V$.

A crucial property of $G$ is that the assembly of one connector vertex $c$ will make all of

$c$'s connector vertex neighbors unassemblable, due to the negative edge connecting $c$ and

its neighbors. Consequently, starting from a vertex $u \in U$, only one connector vertex and

hence only one $v \in V$ can be assembled. For a concrete example, see Figure 2.4 (b): starting

from $u_1$, if we sequentially assemble $c_1$ and $v_1$, vertex $c_1$ will render $c_2$ unassemblable,

and hence the assembly sequence $u_1 \prec c_2 \prec v_2$ is not permissible.


We first show that if there is no perfect matching in < , there is no assembly ordering

for � õ ^ . If there is no perfect matching in < , then there exists %%��^ s.t. óO²�,[%a5@ó�®}óO%Úó(Hall’s theorem), where ²),_%45&�'� is the set of neighboring vertices to the vertices in % in

original graph < . However, as argued above, one vertex in � can lead to the assembly of

at most one vertex in ^ . Thus ó ²),_%45�ó�®\óþ%Úó implies that at least one vertex in % remains

unassembled. Hence, no assembly ordering exists that can assemble all vertices in � õ ^ .

Next, when perfect matchings exist in $B$, we show that each perfect matching in $B$ corresponds to a fixed number of assembly orderings for $U \cup V$. First note that the total number of vertices in graph $G$ is $2n + m$ (recall that $m$ is the number of edges in $B$ and hence the number of connector vertices in $G$), giving a total of $s = (2n + m)!$ permutations. We divide $s$ by the following factors to get the number of assembly orderings for $U \cup V$.

1. For every matching edge $e$ between $u \in U$ and $v \in V$, we have to follow the strict order $u \prec c_e \prec v$, where $c_e$ is the connector vertex on $e$. This is ensured by our construction as argued above. There are altogether $n$ such matching edges, so we need to divide $s$ by $(3!)^n$.

2. For the $n$ vertices in $U$, we have to follow the strict order of assembling the vertices from left to right, and hence we need to divide $s$ by $n!$.

3. Denote by $d_i$ the degree of $u_i$ in graph $B$. For the $d_i$ connector vertices corresponding to the $d_i$ edges incident on $u_i$, the connector vertex corresponding to the matching edge must be assembled first, and thus we need to further divide $s$ by $\prod_{i=1}^{n} d_i$.

Putting together 1), 2), and 3), each perfect matching in $B$ corresponds to

$$\frac{(2n + m)!}{(3!)^n \, (n!) \, \left(\prod_{i=1}^{n} d_i\right)}$$

assembly orderings for $U \cup V$ in $G$. This completes the proof. $\Box$

Lemma 2.4.1 and Lemma 2.4.2 imply

Theorem 2.4.3 #AGAP is #P-complete.
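The per-matching factor derived in the proof above, $(2n+m)!$ divided by $(3!)^n$, $n!$, and the product of the degrees $d_i$, can be evaluated directly for small instances. The following is only an illustrative sketch: the function name `orderings_per_matching` and its argument layout are hypothetical and do not appear in the original text.

```python
from fractions import Fraction
from math import factorial, prod


def orderings_per_matching(n, m, degrees):
    """Evaluate (2n + m)! / ((3!)^n * n! * prod(d_i)).

    n       -- number of vertices on each side of the bipartite graph B
    m       -- number of edges in B (one connector vertex per edge)
    degrees -- degrees d_1, ..., d_n of the vertices u_1, ..., u_n in B
    """
    if len(degrees) != n:
        raise ValueError("expected one degree per vertex in U")
    total_permutations = factorial(2 * n + m)
    divisor = factorial(3) ** n * factorial(n) * prod(degrees)
    # Fraction keeps the result exact and makes a non-integer value visible.
    return Fraction(total_permutations, divisor)


# Smallest case: one u, one v, one connector; the only ordering is u, c, v.
print(orderings_per_matching(1, 1, [1]))  # prints 1
```

Using `Fraction` rather than integer division is a deliberate choice here: it avoids silently truncating the value if the formula were applied to inputs for which the divisor does not divide the factorial.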

2.4.2 SAGAP is #P-complete

An intimately related question to counting the total number of assembly orderings is the problem of calculating the probability of assembling a target structure in a stochastic setting. We extend the accretive graph self-assembly model to a stochastic accretive graph self-assembly model as follows. Given a graph $G = (V, E)$, where $|V| = n$, and starting with the seed vertex $v_s$, what is the probability that the target vertex set $V_t \subseteq V$ gets assembled if at any time any unassembled vertex can be picked with equal probability? This problem is referred to as stochastic AGAP (SAGAP).

Since any unassembled vertex has equal probability of being selected and the assembly has to start with the seed vertex, the total number of possible orderings is $(n-1)!$. SAGAP then asks precisely how many of these $(n-1)!$ orderings are assembly orderings for the target vertex set $V_t$. Thus, #AGAP can be trivially reduced to SAGAP, and the reduction is obviously a logarithmic space parsimonious reduction. We immediately have

Theorem 2.4.4 SAGAP is #P-complete.
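The relation between #AGAP and SAGAP can be illustrated by brute force on a tiny instance. The sketch below is a hedged illustration, not the thesis's method: it assumes the target set is the whole vertex set, assumes that a vertex may attach exactly when its total support from already assembled neighbors is at least the temperature, and uses hypothetical names (`count_assembly_orderings`, `sagap_probability`, the toy graph) throughout.

```python
from itertools import permutations
from math import factorial


def is_assembly_ordering(order, weights, tau):
    """Check that every non-seed vertex has support >= tau when it attaches."""
    assembled = [order[0]]  # the seed is assumed to be present from the start
    for v in order[1:]:
        support = sum(weights.get(frozenset((v, u)), 0) for u in assembled)
        if support < tau:
            return False
        assembled.append(v)
    return True


def count_assembly_orderings(vertices, seed, weights, tau):
    """#AGAP by exhaustive enumeration (exponential time, toy sizes only)."""
    others = [v for v in vertices if v != seed]
    return sum(
        is_assembly_ordering((seed,) + rest, weights, tau)
        for rest in permutations(others)
    )


def sagap_probability(vertices, seed, weights, tau):
    """Fraction of the (n - 1)! seed-first orderings that are assembly orderings."""
    count = count_assembly_orderings(vertices, seed, weights, tau)
    return count / factorial(len(vertices) - 1)


# Toy instance (hypothetical): "b" needs support from both "s" and "a".
toy_vertices = ["s", "a", "b"]
toy_weights = {
    frozenset(("s", "a")): 2,
    frozenset(("s", "b")): 1,
    frozenset(("a", "b")): 1,
}
print(count_assembly_orderings(toy_vertices, "s", toy_weights, tau=2))  # prints 1
print(sagap_probability(toy_vertices, "s", toy_weights, tau=2))  # prints 0.5
```

In the toy instance only the ordering s, a, b is valid, so the count is 1 and the probability is 1/2!, which mirrors the parsimonious relation used in the reduction.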

2.5 Self-Destructible Graph Assembly Model

The assumption in the above accretive model is that once a vertex is assembled, it cannot be "knocked off" by the assembly of another vertex at a later stage. Next, we relax this assumption and obtain a more general model: the self-destructible graph assembly model. In this model, the incorporation of a vertex $a$ that exerts repulsive force on an already assembled vertex $b$ can make $b$ unstable and hence "knock" $b$ off the assembly. This phenomenon gives the assembly system an interesting dynamic property, namely (partial) self-destruction.

The self-destructible graph assembly system operates on a slot graph. A slot graph $\tilde{G} = (S, E)$ is a set of "slots" $S$ connected by edges $E \subseteq S \times S$. Each "slot" $s \in S$ is associated with a set of vertices $V(s)$. During the assembly process, a slot $s$ is either empty or is occupied by a vertex $v \in V(s)$. A slot $s$ occupied by a vertex $v$ is denoted as $\langle s, v \rangle$.

A self-destructible graph assembly system is defined as $\mathcal{T} = \langle \tilde{G} = (S, E), V, M, W, \langle s_s, v_s \rangle, \tau \rangle$, where $\tilde{G} = (S, E)$ is a given slot graph with slot set $S$ and edge set $E \subseteq S \times S$; $V = \bigcup_{s \in S} V(s)$ is the set of vertices; the association rule $M \subseteq S \times V$ is a binary relation between $S$ and $V$, which maps each slot $s$ to its associated vertex set $V(s)$ (note that the sets $V(s)$ are not necessarily disjoint); for any edge $(s_a, s_b) \in E$, we define a weight function $W : V(s_a) \times V(s_b) \to \mathbb{Z}$ (here a weight is determined cooperatively by an edge $(s_a, s_b)$ and the two vertices occupying $s_a$ and $s_b$); $\langle s_s, v_s \rangle$ is a distinguished seed slot $s_s$ occupied by vertex $v_s$; $\tau \in \mathbb{N}$ is the temperature of the system. The size of a self-destructible graph assembly system is the size of its bit representation.

A configuration of $\tilde{G}$ is a function $A : S \to V \cup \{\mathit{empty}\}$, where $\mathit{empty}$ indicates a slot being unoccupied. For ease of exposition, a configuration is alternatively referred to as a graph, denoted as $G$. When the context is clear, we simply refer to a slot occupied by a vertex as a vertex, for readability.

[Figure 2.5: An example self-destructible graph assembly system. Panels: (a) Slot Graph; (b) Vertex Set; (c) Association; (d) Edge Weights; (e) Target Graph.]

[Figure 2.6: The sequence of operations that assemble the target graph $G_t$. Panels: (1) through (7), (8a), (8b), and (9).]

Given the above self-destructible graph assembly system, we aim at assembling a target graph, i.e. reaching a target configuration, $G_t$, starting with the seed vertex $\langle s_s, v_s \rangle$ and using the following unit assembly operations. In each unit operation, we temporarily attach a vertex $v$ to the current graph $G$ and obtain a graph $G'$, and then repeat the following procedure until no vertex can be removed from the assembly: inspect all the vertices in the current graph $G'$; find the vertex $v'$ with the smallest support, i.e. the sum of the weights of edges between $v'$ and its assembled neighbors, breaking ties arbitrarily (note that $v'$ can be $v$); if the support of $v'$ is less than $\tau$, remove $v'$. This procedure ensures that when a vertex that repulses its assembled neighbors is incorporated into the existing assembly, all the vertices whose support drops below the system temperature are removed. However, when the vertex to be attached exerts no repulsive force on its already assembled neighbors, the above standard unit assembly operation simplifies as follows: a vertex can be assembled if the total support it receives from its assembled neighbors is equal to or greater than the system temperature $\tau$; this is exactly the same as the operation in the accretive graph assembly model.
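The unit assembly operation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than a definitive implementation: the data layout (a dict from slot to vertex, a weight table keyed by pairs of slot/vertex pairs), the deterministic tie-breaking by slot name, and the rule that the seed slot is never removed are all assumptions introduced here; the names `support`, `unit_operation`, and the toy system are hypothetical.

```python
def support(config, slot, edges, weights):
    """Sum of edge weights between `slot` and its occupied neighboring slots."""
    total = 0
    for a, b in edges:
        if slot in (a, b) and a in config and b in config:
            key = frozenset(((a, config[a]), (b, config[b])))
            total += weights.get(key, 0)  # unlisted pairs contribute weight 0
    return total


def unit_operation(config, slot, vertex, edges, weights, tau, seed_slot):
    """Temporarily attach `vertex` at `slot`, then remove unstable vertices."""
    if slot in config:
        raise ValueError("slot is already occupied")
    new_config = dict(config)
    new_config[slot] = vertex
    while True:
        # Assumption: the seed slot is never a candidate for removal.
        candidates = [s for s in sorted(new_config) if s != seed_slot]
        if not candidates:
            break
        # min() returns the first minimum, so ties are broken by slot name.
        weakest = min(
            candidates, key=lambda s: support(new_config, s, edges, weights)
        )
        if support(new_config, weakest, edges, weights) >= tau:
            break
        del new_config[weakest]
    return new_config


# Toy system (hypothetical): a triangle of slots with one repulsive edge.
toy_edges = [("s0", "s1"), ("s1", "s2"), ("s0", "s2")]
toy_weights = {
    frozenset((("s0", "black"), ("s1", "grey"))): 2,
    frozenset((("s0", "black"), ("s2", "black"))): 3,
    frozenset((("s1", "grey"), ("s2", "black"))): -1,
}
before = {"s0": "black", "s1": "grey"}
after = unit_operation(before, "s2", "black", toy_edges, toy_weights,
                       tau=2, seed_slot="s0")
print(after)  # prints {'s0': 'black', 's2': 'black'}
```

In the toy run, attaching the black vertex at `s2` lowers the support of the grey vertex at `s1` to 1, which is below the temperature 2, so the grey vertex is knocked off while the newcomer stays.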

Figure 2.5 and Figure 2.6 give a concrete example of a self-destructible graph assembly system $\mathcal{T}$ and a sequence of unit assembly operations that assemble a target graph $G_t$ in $\mathcal{T}$. Figure 2.5 illustrates the assembly system $\mathcal{T} = \langle \tilde{G} = (S, E), V, M, W, \langle s_s, v_s \rangle, \tau \rangle$. Here, slot $s_a$ is designated as the distinguished seed slot $s_s$ and temperature $\tau$ is set to 2. Figure 2.5 (a) depicts the slot graph $\tilde{G} = (S, E)$, where $S = \{s_a, s_b, s_c, s_d, s_e, s_f, s_g, s_h, s_i\}$ and $E = \{(s_a, s_b), (s_b, s_c), (s_a, s_d), (s_b, s_e), (s_c, s_f), (s_d, s_e), (s_e, s_f), (s_d, s_g), (s_e, s_h), (s_f, s_i), (s_g, s_h), (s_h, s_i)\}$. Figure 2.5 (b) gives the vertex set $V = \{\mathit{black}, \mathit{grey}\}$. Figure 2.5 (c) shows the association rule $M$: $V(s_e) = \{\mathit{black}, \mathit{grey}\}$; $V(s) = \{\mathit{black}\}$ for $s \in S \setminus \{s_e\}$. Figure 2.5 (d) illustrates $W$. A numerical value indicates the weight of an edge incident to two occupied slots. The left panel of Figure 2.5 (d) describes the case when both slots incident to an edge are occupied by black vertices; the right panel describes the case when slot $s_e$ is occupied by a grey vertex but its neighboring slot is occupied by a black vertex. For example, the weight for edge $(s_e, s_h)$ is 2 when both $s_e$ and $s_h$ are occupied by black vertices, and $-2$ when $s_e$ is occupied by a grey vertex and $s_h$ by a black one. The negative weight is further indicated by the dashed edge. Figure 2.5 (e) depicts the target graph (configuration) $G_t$, where each slot in $S$ is occupied by a black vertex, i.e. $A(s) = \mathit{black}$ for any $s \in S$.

An example sequence of unit assembly operations that sequentially assemble the target graph $G_t$ is illustrated step by step in Figure 2.6. We start with $\langle s_s, \mathit{black} \rangle$, where $s_s = s_a$. In step (1), a black vertex is put into slot $s_b$ and stays there, since the support it receives from the black vertex occupying slot $s_a$ is 2, which is greater than or equal to the system temperature $\tau = 2$. In step (5), a grey vertex occupies slot $s_e$ and is attached to the existing assembly. It stays in slot $s_e$ since it receives a total support of 2 from its neighboring assembled vertices (support 1 from the black vertex occupying slot $s_b$ and support 1 from the black vertex occupying slot $s_d$). Step (8) has two stages, (8a) and (8b). In step (8a), a black vertex is temporarily put into slot $s_h$. Now the grey vertex occupying slot $s_e$ has the least support among all the vertices in this temporary assembly. Since its support 1 is less than temperature $\tau = 2$, the grey vertex in $s_e$ is removed from the assembly in step (8b), according to the unit assembly operation rule. Now no vertex can be removed since all vertices have support greater than or equal to $\tau = 2$. In step (9), a black vertex is put into slot $s_e$ and this completes the assembly of the target graph.

Here we emphasize that in the above example, the grey vertex at slot $s_e$ serves as a "stepping stone" for assembling the target graph: its incorporation into the assembly enables the subsequent assembly of a black vertex at slot $s_f$, which in turn effects the assembly of a black vertex at $s_i$. However, at this stage, the grey vertex at slot $s_e$ becomes a barrier for the progress of the assembly towards the target configuration: it must be knocked off the assembly to evacuate slot $s_e$ for the assembly of a black vertex at $s_e$. This is achieved by the incorporation of a black vertex at slot $s_h$. This is precisely the power of (partial) self-destruction: the system actively gets rid of the undesirable components to ensure the progress of further assembly. Finally, we point out that the grey vertex associated with $s_e$ is indispensable for the assembly of the target graph. The reader can verify that without this grey vertex, the target graph cannot be sequentially constructed.

In the above example, the assembly of a black vertex at slot $s_h$ "deterministically" and "irreversibly" knocks off the grey vertex at slot $s_e$. However, the self-destructible graph assembly model can also exhibit interesting non-deterministic, reversible behavior under the following circumstance: the assembly of component $a$ knocks off component $b$, while the immediate re-assembly of component $b$ can in turn knock off the newly assembled component $a$. For a concrete example, now assume that in Figure 2.6, the weight for edge $(s_h, s_i)$ (when slots $s_h$ and $s_i$ are occupied by black vertices) is 2 instead of 3. Then at step (8b) either the black vertex at slot $s_h$ or the grey vertex at slot $s_e$ can be removed, since both vertices have support $1 < 2 = \tau$ and we break ties arbitrarily. For the same reason, in the case when the grey vertex at $s_e$ is removed, an immediate reassembly of a grey vertex at slot $s_e$ can result in the disassembly of the black vertex at $s_h$. In this sense, the system at this stage becomes "non-deterministic" and "reversible". This property is used in the construction of a cyclic gadget, which provides the basis for our PSPACE-completeness proof in Section 2.6.

Now we are ready to define the Self-Destructible Graph Assembly Problem (DGAP).

Definition 2.5.1 Self-Destructible Graph Assembly Problem (DGAP): Given a self-destructible graph assembly system $\mathcal{T} = \langle \tilde{G} = (S, E), V, M, W, \langle s_s, v_s \rangle, \tau \rangle$ and a target graph (configuration) $G_t$, determine whether there exists a sequence of assembly operations such that $G_t$ can be assembled starting from $\langle s_s, v_s \rangle$.

2.6 DGAP is PSPACE-complete

Theorem 2.6.1 DGAP is PSPACE-complete.

Proof: Recall that the PSPACE-complete problem IN-PLACE ACCEPTANCE is as follows: given a deterministic Turing machine (TM for short) $\mathcal{M}$ and an input string $x$, does $\mathcal{M}$ accept $x$ without leaving the first $|x| + 1$ symbols of the string [56]? We reduce IN-PLACE ACCEPTANCE to DGAP using a direct simulation of a deterministic TM $\mathcal{M}$ on $x$ with self-destructible graph assembly in PSPACE.

The proof builds on 1) a classical technique for simulating a TM using self-assembly of square tiles [67, 71], which takes exponential space for deciding PSPACE-complete languages; and 2) our new cyclic gadget, which helps the classical TM simulation to reuse space and thus achieve a PSPACE simulation. We will first reproduce the classical simulation; next introduce our modification to the classical simulation; then describe our cyclic gadget; finally integrate the cyclic gadget with the modified TM simulation to obtain a PSPACE simulation and thus conclude the proof.

Classical TM simulation. The classical scheme uses the assembly of vertices on a 2D square grid to mimic a TM's transition history [67, 71]. Consecutive configurations of the TM are represented by successive horizontal rows of assembled vertices.

Given a TM $\mathcal{M}(Q, \Sigma, \delta, q_0)$, where $Q$ is a finite set of states, $\Sigma$ is a finite set of symbols, $\delta$ is the transition function, and $q_0 \in Q$ is the initial state, we construct a self-destructible assembly system $\mathcal{T} = \langle G = (S, E), V, M, W, \langle s_s, v_s \rangle, \tau \rangle$ as follows. The slot graph $G = (S, E)$ is an infinite 2D square grid; each node of the grid corresponds to a slot $s \in S$. A vertex $v \in V$ is represented as a quadruple $v = \langle a, b, c, d \rangle$, where $a$, $b$, $c$, and $d$ are referred to as the North, East, South, and West 'glues' (see Figure 2.7). Each glue $x$ is associated with an integral strength $g(x)$. More specifically, we construct the following vertices:

• For each $s \in \Sigma$, construct a symbol vertex $\langle s, \phi, s, \phi \rangle$, where $\phi$ is a special symbol, $\phi \notin \Sigma$.

• For each $\langle q, s \rangle \in Q \times \Sigma$, construct state vertices $\langle \langle q, s \rangle, \phi, s, \overrightarrow{q} \rangle$ and $\langle \langle q, s \rangle, \overleftarrow{q}, s, \phi \rangle$.

• For each transition $\langle q, s \rangle \to \langle q', s', \mathrm{L} \rangle$ (resp. $\langle q, s \rangle \to \langle q', s', \mathrm{R} \rangle$), where L (resp. R) is the head moving direction "Left" (resp. "Right"), construct a transition vertex $\langle s', \phi, \langle q, s \rangle, \overleftarrow{q'} \rangle$ (resp. $\langle s', \overrightarrow{q'}, \langle q, s \rangle, \phi \rangle$).

• For transition $\langle q, s \rangle \to$ ACCEPT (resp. REJECT), construct a termination vertex $\langle \mathrm{ACCEPT}, \phi, \langle q, s \rangle, \phi \rangle$ (resp. $\langle \mathrm{REJECT}, \phi, \langle q, s \rangle, \phi \rangle$).

[Figure 2.7: Vertices used in the basic TM simulation. Panels: symbol vertices; state vertices; transition vertices; termination vertices.]

The glue strength $g(\langle q, s \rangle)$ is set to 2; all other glue strengths are 1. Mapping relation $M$: every vertex in $V$ can be mapped to every slot in $S$. We next describe the weight function $V \times V \times E \to \mathbb{Z}$. Consider two vertices $v_1 = \langle a, b, c, d \rangle$ and $v_2 = \langle a', b', c', d' \rangle$ connected by edge $e$. If $e$ is horizontal and $v_1$ lies to the East (resp. West) of $v_2$, the weight function is $g(b', d)$ (resp. $g(b, d')$); if $e$ is vertical and $v_1$ lies to the North (resp. South) of $v_2$, the weight function is $g(c, a')$ (resp. $g(a, c')$); where $g(x, y) = g(x)$ (resp. 0) if $x = y$ (resp. $x \neq y$). In other words, the edge weight for two neighboring vertices is the strength of the abutting glues, if the abutting glues are the same; otherwise it is 0.
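The glue-based edge weight rule can be written out as a small function. The sketch below is an illustration under assumptions: glues are encoded as plain strings, the special side glue is spelled `"phi"`, the state/symbol glue is spelled `"(A,0)"`, any glue missing from the strength table defaults to strength 1, and the names `Vertex`, `glue_weight`, and `edge_weight` are hypothetical.

```python
from typing import NamedTuple


class Vertex(NamedTuple):
    """A vertex as a quadruple of North, East, South, and West glues."""
    north: str
    east: str
    south: str
    west: str


def glue_weight(x, y, strength):
    """Strength of glue x if the abutting glues x and y match, else 0."""
    return strength.get(x, 1) if x == y else 0


def edge_weight(v1, v2, position, strength):
    """Weight of the edge between v1 and v2.

    `position` states where v1 lies relative to v2: "E", "W", "N", or "S".
    """
    if position == "E":  # v2's East glue abuts v1's West glue
        return glue_weight(v2.east, v1.west, strength)
    if position == "W":  # v1's East glue abuts v2's West glue
        return glue_weight(v1.east, v2.west, strength)
    if position == "N":  # v1's South glue abuts v2's North glue
        return glue_weight(v1.south, v2.north, strength)
    if position == "S":  # v1's North glue abuts v2's South glue
        return glue_weight(v1.north, v2.south, strength)
    raise ValueError("position must be one of 'E', 'W', 'N', 'S'")


# Hypothetical glue table: only the state/symbol glue has strength 2.
strength = {"(A,0)": 2}
symbol_0 = Vertex("0", "phi", "0", "phi")
symbol_1 = Vertex("1", "phi", "1", "phi")
state_a0 = Vertex("(A,0)", "phi", "0", "A-from-west")
transition = Vertex("1", "phi", "(A,0)", "B-to-west")

print(edge_weight(transition, state_a0, "N", strength))  # prints 2
print(edge_weight(symbol_0, symbol_0, "E", strength))    # prints 1
print(edge_weight(symbol_1, symbol_0, "N", strength))    # prints 0
```

The three calls cover the three cases in the text: matching glues of strength 2, matching glues of strength 1, and mismatched glues contributing weight 0.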

It is straightforward to show that the assembly of the vertices in $V$ on the slot graph $G = (S, E)$ simulates the operation of the TM $\mathcal{M}$. Figure 2.8 (a) gives a concrete example to illustrate the simulation process as in [71]. Here we assume the bottom row in the assembly in Figure 2.8 (a) is pre-assembled.

Our modified TM simulation. We add two modifications to the classical simulation and obtain the scheme in Figure 2.8 (b): 1) a set of vertices is added to assemble an input row (bottom row in the figure) and 2) a dummy column is added at the left of the assembly. For the construction, see the self-explanatory Figure 2.8 (b). The leftmost bottom vertex is the seed vertex and a thick line indicates a weight 2 edge. The reason for adding the dummy column is as follows. The glue strength $g(\langle q, s \rangle)$ is 2 in Figure 2.8 (a); this is necessary to initiate the assembly of a new row and hence a transition to the next configuration. However, due to a subtle technical point explained later (in the part "Integrating cyclic gadget with TM simulation"), we cannot allow weight 2 edges in a column unless all the edges in this column have weight 2. So we add the leftmost dummy column of vertices connected by weight 2 edges, and this enables us to set $g(\langle q, s \rangle) = 1$ and thus avoid weight 2 edges other than those in the dummy column.

The modified scheme simulates a TM on input $x$ with the head initially residing at the first symbol of $x$ and never moving to the left of it. The assembly proceeds from bottom to top; within each row, it starts from the leftmost dummy vertex and proceeds to the right (note the difference in the assembly sequence in Figure 2.8 (a) and (b), as indicated by the thick grey arrows).

Figure 2.8: (a) An example classical simulation of a Turing machine $\mathcal{M}(Q, \Sigma, \delta, q_0)$, where $Q = \{A, B, C\}$; $\Sigma = \{0, 1\}$; transition function $\delta$ is shown in the figure; $q_0 = A$. The top of the left panel shows two symbol vertices; below are some example transition rules and the corresponding state vertices and transition vertices. The right panel illustrates the simulation of $\mathcal{M}$ on the given input (simulated as the bottom row, which is assumed to be preassembled), according to the transition rules in the figure; the head's initial position is on the leftmost vertex. Each transition of $\mathcal{M}$ adds a new row. (b) Our modified scheme. The leftmost bottom vertex is the seed vertex. The leftmost column is the dummy column. In both (a) and (b), a thick line indicates a weight 2 edge; a thin line indicates weight 1; thick grey arrows indicate the assembly sequence.

Our cyclic gadget. The above strategy of simulating a TM by laying out its configurations one above another can result in a graph with height exponential in the size of the input ($|x|$): the height of the graph is precisely the number of transitions plus one. A crucial observation is that once row $i$ is assembled, row $i - 1$ is no longer needed: row $i$ holds sufficient information for assembling row $i + 1$ and hence for the simulation to proceed. Thus, we can evacuate row $i - 1$ and reuse the space to assemble a future row, say row $i + 2$. Using this trick, we can shrink the number of rows from an exponential number to a constant. The self-destructible graph assembly model provides precisely this power. To realize this power of evacuating and reusing space, we construct a cyclic gadget, shown in Figure 2.9 (a). The gadget contains three kinds of vertices: the computational vertices ($a$, $b$, and $c$) that carry out the actual simulation of the Turing machine; the knocking vertices ($x$, $y$, and $z$) that serve to knock off the computational vertices and thus release the space; the anchor vertices ($x'$, $y'$, and $z'$) that anchor the knocking vertices. Edge weights are labeled in the figure.

Figure 2.9: (a) The construction and operation of our cyclic gadget. The counterclockwise grey cycle indicates the desired sequence of events. (b) The integrated scheme. Grey edges have weight 2. Unlabeled black edges have weight 1. $v_s$ indicates the seed vertex; $z'_0$ is the seed slot. $v'_s$ indicates a distinguished computational "seed".

For ease of exposition, we introduce a little more notation. The event in which a new vertex $b$ is attached to a pre-assembled vertex $a$ is denoted as $a \to b$; the event in which $a$ knocks off $b$ is denoted as $a \dashv b$.

We next describe the operation of the cyclic gadget. We require that anchor vertices $x'$, $y'$, and $z'$ and computational vertex $a$ are pre-assembled. The knocking vertices and computational vertices will keep getting assembled and then knocked off in a counterclockwise fashion. First, $b$ is attached to $a$ (event $a \to b$). Then $x$ is attached to $b$ (event $b \to x$). At this point, $x$ has total support 1 from $b$, $x'$, and $a$ (providing support 2, 2, and $-3$, respectively); $a$ has total support $-1$ from $b$ and $x$ (providing support 2 and $-3$, respectively). Since the temperature is 2, $x$ will knock off $a$ ($x \dashv a$). Next, we have $b \to c$ followed by $c \to y$. At this point, $y$ and $b$ each have total support 1 from their assembled neighbors. Therefore, either $y \dashv b$ or $b \dashv y$ can happen, but $y \dashv b$ is in the desired counterclockwise direction. Next, we will have cycles of (reversible) events. In summary, the following sequence of events occurs, providing the desired cyclicity:

$a \to b$, $b \to x$, $x \dashv a$; $b \to c$, $c \to y$, $y \dashv b$; $(c \to a$, $a \dashv x$, $a \to z$, $z \dashv c$; $a \to b$, $b \dashv y$, $b \to x$, $x \dashv a$; $b \to c$, $c \dashv z$, $c \to y$, $y \dashv b)^*$

The steps inside the parentheses keep repeating. Note that the steps inside the parentheses are reversible, which facilitates our reversible simulation of a Turing machine below.

Integrating cyclic gadget with TM simulation. We next integrate the cyclic gadget with the modified TM simulation in Figure 2.8 (b). In the resulting scheme, we obtain a reversible simulation of a deterministic TM on a slot graph of constant height, by evacuating old rows and reusing the space: row $i$ is evacuated after the assembly of row $i + 1$, providing space for the assembly of row $i + 3$.

Figure 2.9 (b) illustrates the integrated scheme. Slot rows $A$, $B$, and $C$ correspond to rows $i = 3k$, $i = 3k + 1$, and $i = 3k + 2$ in Figure 2.8 (b), respectively. Let $|x| = n$. $A$ is a sequence of slots $A = \langle a_0, a_1, \ldots, a_{n+1} \rangle$; similarly, $B = \langle b_0, b_1, \ldots, b_{n+1} \rangle$ and $C = \langle c_0, c_1, \ldots, c_{n+1} \rangle$ as in Figure 2.9 (b). Slots $a_0$, $b_0$, and $c_0$ are dummy slots (corresponding to the dummy column in Figure 2.8 (b)). For each $a_j$, $b_j$, and $c_j$, we construct a cyclic gadget by introducing slots $x_j$, $y_j$, $z_j$, $x'_j$, $y'_j$, and $z'_j$. Slot $z'_0$ is designated as the seed slot $s_s$ and one of its associated vertices as the seed vertex $v_s$; the temperature is again set to 2.
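The space-reuse idea behind the three slot rows can be illustrated independently of the assembly dynamics. The sketch below is only an abstraction under stated assumptions: it keeps TM configurations in three buffers, stores row $i$ in buffer $i \bmod 3$, and clears row $i$ once row $i+1$ has been written; it does not model glues, knocking vertices, or reversibility. The function name, the transition-table format, and the toy machine are hypothetical.

```python
def simulate_with_three_rows(delta, tape, start_state, max_steps=1000):
    """Run a deterministic TM in place while storing rows in three buffers.

    delta maps (state, symbol) to "ACCEPT", "REJECT", or a triple
    (new_state, new_symbol, move) with move in {"L", "R"}.
    Returns (accepted, rows_assembled, peak_rows_alive).
    """
    rows = [None, None, None]  # rows[r] holds row i with i % 3 == r
    rows[0] = (start_state, 0, tuple(tape))
    peak_alive = 1
    for i in range(max_steps):
        state, head, cells = rows[i % 3]
        action = delta[(state, cells[head])]
        if action in ("ACCEPT", "REJECT"):
            return action == "ACCEPT", i + 1, peak_alive
        new_state, new_symbol, move = action
        new_cells = cells[:head] + (new_symbol,) + cells[head + 1:]
        new_head = head + (1 if move == "R" else -1)
        if not 0 <= new_head < len(cells):
            raise ValueError("head left the in-place region")
        rows[(i + 1) % 3] = (new_state, new_head, new_cells)  # assemble row i+1
        peak_alive = max(peak_alive, sum(r is not None for r in rows))
        rows[i % 3] = None  # evacuate row i, freeing its buffer for row i+3
    raise RuntimeError("step limit reached without halting")


# Toy machine (hypothetical): scan right over 1s and accept at the first "_".
toy_delta = {("q0", "1"): ("q0", "1", "R"), ("q0", "_"): "ACCEPT"}
print(simulate_with_three_rows(toy_delta, "111_", "q0"))  # prints (True, 4, 2)
```

In the toy run four rows are assembled in total, the fourth one reusing the buffer of the first, and at most two rows are alive at any time, so the storage stays constant regardless of the number of transitions.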

The edge weights are shown in the figure. We emphasize that the weight for an edge between two computational vertices (vertices in $A$, $B$, and $C$) $u$ and $v$ is set to the glue strength if $u$ and $v$ have the same glue on their abutting sides; otherwise it is 0. This is consistent with the scheme in Figure 2.8 (b) and helps to ensure the proper operation of the computational assembly. In contrast, the weight for any other edge is always set to the value shown in Figure 2.9 (b), regardless of the actual computational vertices present in the slots in $A$, $B$, and $C$; this ensures the proper operation of the cyclic gadget.

There are some subtle technical points regarding edge weight assignment. First, the weight for the edge connecting vertices $v_s = z'_0$ and $v'_s$ is 2, while the weight for an edge connecting $z'_0$ and subsequent vertices other than $v'_s$ that occupy slot $a_0$ is 0. This ensures the correct operation of the cyclic gadget for the dummy slots. Second, the assembly of the first row (input row) involves computational vertices with glue strength 2 (rather than 1) and hence weight 2 edges between neighboring vertices in this row. However, no modification of the weights of the edges incident to the knocking vertices and anchor vertices is required to accommodate this edge weight difference: the initial step ($a \to b$, $b \to x$, $x \dashv a$) is irreversible and it is straightforward to check that $x \dashv a$ can occur successfully. Third, except for the edges connecting dummy vertices, no weight 2 edge exists between the computational vertices after the evacuation of the input row. This is essential for upper bounding the number of vertices associated with each slot: otherwise, an exponential number of knocking vertices and anchor vertices would be required.

The assembly proceeds as follows. First, the frame of anchoring vertices (the subgraph with grey edges) is assembled, starting from the seed vertex at $z'_0$. The seed vertex at $z'_0$ pulls in a distinguished computational vertex $v'_s$ (corresponding to the seed vertex in Figure 2.8 (b)) at slot $a_0$, and $v'_s$ subsequently initiates the assembly of the input row (corresponding to the bottom row in Figure 2.8 (b)). Then the computational vertices assemble, simulating the process shown in Figure 2.8 (b). Meanwhile, the cyclic gadget functions along each layer of $a_j$, $b_j$, and $c_j$ (corresponding to column $j$ in Figure 2.8 (b)), effecting the reuse of space. More specifically, vertices corresponding to those in rows $i = 3k$, $i = 3k + 1$, and $i = 3k + 2$ in Figure 2.8 (b) are assembled in $A$, $B$, and $C$ respectively. Similar to the process in Figure 2.9 (a), row $i + 1$ gets assembled with the support from row $i$, and subsequently pulls in knocking vertices, which knock off row $i$ and thus evacuate space for future row $i + 3$ to assemble. Within a row, the vertices are knocked off sequentially from left to right, starting with the dummy vertex.

Concluding the proof. We set the target graph $G_t$ as a complete row of vertices that contains the ACCEPT termination vertex $\langle \mathrm{ACCEPT}, \phi, \langle q, s \rangle, \phi \rangle$. Then $G_t$ can be assembled if and only if TM $\mathcal{M}$ accepts $x$. We insist that $G_t$ be a complete row of vertices (occupying $s_0, s_1, \ldots, s_{|x|+1}$, where $s \in \{a, b, c\}$) to avoid false positives. Note that the size of the slot graph used in the proof is polynomial in the size of the input $|x|$ and hence our simulation is in PSPACE. $\Box$

Corollary 2.6.2 6-DEGREE DGAP is PSPACE-complete.

2.7 Conclusion

In this chapter, we define two new models of self-assembly and obtain the following complexity results: 4-DEGREE AGAP is NP-complete; 5-DEGREE PAGAP is NP-complete; #AGAP and SAGAP are #P-complete; 6-DEGREE DGAP is PSPACE-complete. One immediate open problem is to determine the complexity of these problems with lower degrees. In addition, it would be nice to find approximation algorithms for the optimization version of the NP-hard problems. Note that AGAP can be solved in polynomial time if only positive edges are permitted in graph $G$, using a greedy heuristic. In contrast, when negative edges are allowed, for each negative edge $e = (v_1, v_2)$, we need to decide the relative order for assembling $v_1$ and $v_2$. Thus $k$ negative edges imply $2^k$ choices, and we have to find out whether any of these $2^k$ choices can result in the assembly of the target graph. This is the component that makes the problem hard.

Chapter 3

Error Resilient Computational DNA Tilings

The self-assembly process for bottom-up construction of nanostructures is of key importance to the emerging scientific discipline of nanoscience. However, self-assembly at the molecular scale is prone to a quite high rate of error. Such a high error rate is a major barrier to large-scale experimental implementation of DNA tiling. The goals of this chapter are to develop theoretical methods for compact error-resilient self-assembly and to analyze these methods by thermodynamic analysis and computer simulation. Prior work by Winfree provided an innovative approach to decrease tiling self-assembly errors without decreasing the intrinsic error rate $\epsilon$ of assembling a single tile. However, his technique resulted in a final structure that is larger than the original one (four times larger for decreasing the error to $\epsilon^2$, nine times for $\epsilon^3$). In this chapter, we describe various compact error-resilient tiling methods that do not increase the size of the tiling assembly. These methods apply to the assembly of Boolean arrays which perform input sensitive computations (among other computations). Our 2-way (3-way) overlay redundancy construction decreases the error rate from $\epsilon$ to approximately $\epsilon^2$ ($\epsilon^3$), without increasing the size of the assembly. As in Winfree's constructions, the number of distinct tile types required is also increased in our error-resilient tiling constructions. These results were further validated using computer simulation.

3.1 Introduction

Self-assembly is a process in which simple objects associate into large (and complex) struc-

tures. The self-assembly of DNA tiles can be used both as a powerful computational mech-

anism [36, 61, 97, 100, 103] and as a bottom-up nanofabrication technique [79]. Periodic


2D DNA lattices have been successfully constructed with a variety of DNA tiles, for example, double-crossover (DX) DNA tiles [102], rhombus tiles [49], triple-crossover (TX) tiles [35], 4x4 tiles [107], triangle tiles [44], and hexagonal tiles [2]. Two-dimensional algorithmic self-assembly, in contrast, has proven comparatively resistant to experimental demonstration, partially due to the large number of errors in the assembled structure.

How can such errors be decreased? There are primarily two kinds of approaches. The first is to decrease the intrinsic error rate $\epsilon$ by optimizing the physical environment in which a fixed tile set assembles [103], by improving the design of the tile itself using new molecular mechanisms [18], or by using novel materials. The second approach is to design new tile sets that can reduce the total number of errors in the final structure even with the same intrinsic error rate. A seminal work in this direction is the proofreading tile set constructed by Winfree [101].

One desirable improvement on Winfree’s construction (which results in an assembled structure four times the size of the original one) is to make the design more compact. Here we report construction schemes that achieve performance comparable to Winfree’s tile set without scaling up the assembled structure. We will describe our work primarily in the context of self-assembling Sierpinsky triangles and binary counters, but note that the design principle can be applied to a more general setting. The basic idea of our construction is to overlay redundant computations and hence force consistency in the scheme (in a spirit similar to that of [101]). The idea of using redundancy to enhance the reliability of a system constructed from unreliable individual components goes back to von Neumann [94].

The rest of the chapter is organized as follows. In Section 3.2, we introduce the algorithmic assembly problem by reviewing Winfree’s abstract Tile Assembly Model (aTAM) and kinetic Tile Assembly Model (kTAM) [101]. In Section 3.3, we describe a scheme that decreases the error rate from $\epsilon$ to $3\epsilon^2$. In Section 3.4, this scheme is further improved to an error rate on the order of $\epsilon^3$ using a three-way overlay redundancy technique. Two concrete constructions are given in Section 3.5, together with an empirical study of our tile sets by computer simulation. We conclude with a discussion of future work in Section 3.6.

3.2 Algorithmic Assembly Problems

3.2.1 Algorithmic Assembly in Abstract Tile Assembly Model

Figure 3.1: (a) Binary counter tiling assembly. (b) Sierpinsky triangle tiling assembly. In both (a) and (b), the pads and the tile set are shown on the left and the corresponding assembled structures are shown on the right. The pads of strength 2 have black borders while the strength 1 pads are border-less. The first row of tiles on the left are four internal tiles (computational tiles); the second row are three frame tiles, one of which is a special seed tile (labeled with S).

The growth process of a tiling assembly is elegantly captured by the abstract Tile Assembly Model (aTAM) proposed by Winfree [71], which builds on the tiling model initially proposed by Wang in 1960 [95]. In this model, each of the four sides of a tile has a glue (also called a pad), and each glue has a type and a positive integral strength. Assembly occurs by the iterative accretion of tiles to an existing assembly, starting with a special seed tile. A tile can be “glued” to a position in an existing assembly if the tile can fit in the position such that each pair of adjacent pads of the tile and the assembly have the same glue type and the total strength of these glues is greater than or equal to the temperature, a system parameter.
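The attachment rule just stated can be sketched in a few lines of Python. This is only an illustrative sketch: the representation of a glue as a `(type, strength)` pair, the side names, and the example tiles `internal` and `frame` (with made-up glue labels) are assumptions introduced here, not notation from the text.

```python
from typing import Dict, Optional, Tuple

# Assumption: a glue (pad) is modelled as a (type, strength) pair.
Glue = Tuple[str, int]
SIDES = ("top", "right", "bottom", "left")


def can_attach(tile: Dict[str, Glue],
               site: Dict[str, Optional[Glue]],
               temperature: int = 2) -> bool:
    """Check the attachment rule described above for one candidate tile.

    site[side] is the glue presented by the already assembled neighbour
    facing that side of the candidate tile, or None/absent if the side is open.
    """
    total_strength = 0
    for side in SIDES:
        facing = site.get(side)
        if facing is None:
            continue  # no neighbour on this side
        if tile[side] != facing:
            return False  # adjacent pads must carry the same glue
        total_strength += facing[1]
    # The summed strength of the matched glues must reach the temperature.
    return total_strength >= temperature


# Hypothetical tiles: an internal tile with strength-1 pads and a
# frame tile with strength-2 pads (labels are illustrative only).
internal = {"top": ("0", 1), "right": ("1", 1),
            "bottom": ("1", 1), "left": ("1", 1)}
frame = {"top": ("f", 2), "right": ("f", 2),
         "bottom": ("f", 2), "left": ("f", 2)}
```

Under these assumptions, `can_attach(internal, {"right": ("1", 1), "bottom": ("1", 1)})` holds at temperature 2 because two strength-1 bonds cooperate, while a single strength-1 bond is not enough; a frame tile needs only one strength-2 bond.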

As a concrete example, we describe a binary counter constructed by Winfree [71] in Figure 3.1 (a). Here, the temperature of the system is set to 2. Two adjacent pads (glues) on neighboring tiles can be glued to each other if they are of the same type. The assembly starts with the seed tile S at the lower right corner and proceeds to the left and to the top by the accretion of individual tiles. First, the reverse-L-shaped frame, composed of the frame tiles, is assembled. Note that the glue strength between two neighbouring frame tiles is 2, which is greater than or equal to the temperature, and hence the assembly of the frame tiles can carry through. Next, the internal tiles are assembled. Since the glue strength of a pad on an internal tile is 1, the assembly of an internal tile requires cooperative support from two other already assembled tiles. More specifically, after the assembly of the frame, the two frame tiles immediately neighbouring the seed tile S cooperatively form a binding site for an internal tile whose pads match the labels they present (see Figure 3.1 (a)), and this internal tile can attach itself at this site. This in turn produces further growing sites for internal tiles on top of and to the left of the just-assembled tile. Thus the growth can go on inductively by the accretion of appropriate individual tiles. It is straightforward to verify that the accretion of the tiles forms a binary counter with each row representing a binary number. As another concrete example, the tile set in Figure 3.1 (b) forms a Sierpinsky triangle [15]. Though the above two examples appear simple, it has been proven that algorithmic assembly of tiles holds universal computing power by simulating a one-dimensional cellular automaton [98].

Figure 3.2: Tile $T(i,j)$ takes input $U(i-1,j)$ and $V(i,j-1)$; determines $V(i,j) = U(i-1,j) \;\mathrm{OP}_1\; V(i,j-1)$ and $U(i,j) = U(i-1,j) \;\mathrm{OP}_2\; V(i,j-1)$; displays $V(i,j)$.

Note that each internal tile performs two computations: the right pad and bottom pad of each tile serve as two input bits; the left pad represents an output bit as the result of the binary AND of the two input bits; the upper pad represents the result of the binary XOR operation of the two input bits. (Recall that XOR is exclusive OR, a binary operator that outputs bit 1 if the two input bits are different and 0 otherwise.)

By modifying the internal computational tiles and letting the left pad represent an output bit as the result of the binary XOR of the two input bits, we obtain a set of tiles that can self-assemble into a Sierpinsky triangle [15] (Figure 3.1 (b)).

The above two assemblies serve as illustrating examples for the general algorithmic assembly problem considered in this chapter, the assembly of a Boolean array. A Boolean array assembly is an $N \times M$ array, where the elements of each row are indexed over $\{0, \ldots, N-1\}$ from right to left and the elements of each column are indexed over $\{0, \ldots, M-1\}$ from bottom to top. The bottom row and rightmost column both have some given values. Let $V(i,j)$ be the value of the $i$-th (from the right) bit on the $j$-th row (from the bottom), displayed at position $(i,j)$ and communicated to the position $(i,j+1)$. Let $U(i,j)$ be a Boolean value communicated to the position $(i+1,j)$. For $i = 1, \ldots, N-1$ and $j = 1, \ldots, M-1$, we have $V(i,j) = U(i-1,j) \;\mathrm{OP}_1\; V(i,j-1)$ and $U(i,j) = U(i-1,j) \;\mathrm{OP}_2\; V(i,j-1)$, where $\mathrm{OP}_1$ and $\mathrm{OP}_2$ are two Boolean functions, each with two Boolean arguments and one Boolean output. See Figure 3.2 for an illustration.

The binary counter shown in Figure 3.1 (a) is an $N \times 2^N$ Boolean binary array. In a binary counter, the bottom row has all 0s and the $j$-th row (from the bottom) is the binary representation of counter value $j$, for $j = 0, \ldots, 2^N - 1$. Note that the $i$-th bit is $i$-th from the right; this is in accordance with the usual binary notation, in which the least significant bit is the rightmost one. $V(i,j)$ represents the value of the $i$-th (from the right) counter bit on the $j$-th row (from the bottom), and $U(i,j)$ is the value of the carry bit from the counter bit at position $(i,j)$. In the binary counter, we have $V(0,j) = V(0,j-1) \;\mathrm{XOR}\; 1$; $V(i,j) = U(i-1,j) \;\mathrm{XOR}\; V(i,j-1)$ for $i = 1, \ldots, N-1$; and $U(i,j) = U(i-1,j) \;\mathrm{AND}\; V(i,j-1)$. Hence $\mathrm{OP}_1$ is the XOR operation and $\mathrm{OP}_2$ is the AND operation. The Sierpinsky triangle shown in Figure 3.1 (b) is an $N \times N$ Boolean binary array, where the bottom row and rightmost column all have 1s; its $\mathrm{OP}_1$ and $\mathrm{OP}_2$ operators are both XOR.
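The Boolean array recurrence can be checked with a short simulation. The sketch below is not part of the original construction: the function name `boolean_array`, its argument names, and the way the boundary values are passed in are assumptions made for illustration. For the counter, the rightmost column is supplied as $V(0,j) = j \bmod 2$ with outgoing carry $U(0,j) = V(0,j-1)$, which is what incrementing bit 0 by one in every row amounts to.

```python
from math import comb
from operator import and_, xor


def boolean_array(n, m, op1, op2, bottom_v, right_v, right_u):
    """Fill an n-by-m Boolean array from its boundary values.

    V[i][j] and U[i][j] follow the recurrence
        V(i,j) = U(i-1,j) OP1 V(i,j-1),  U(i,j) = U(i-1,j) OP2 V(i,j-1)
    for i >= 1 and j >= 1. Column 0 is the rightmost column and
    row 0 is the bottom row; both are given as inputs.
    """
    V = [[0] * m for _ in range(n)]
    U = [[0] * m for _ in range(n)]
    for i in range(n):
        V[i][0] = bottom_v[i]
    for j in range(m):
        V[0][j] = right_v[j]
        U[0][j] = right_u[j]
    for j in range(1, m):
        for i in range(1, n):
            V[i][j] = op1(U[i - 1][j], V[i][j - 1])
            U[i][j] = op2(U[i - 1][j], V[i][j - 1])
    return V, U


# Binary counter with N = 4 bits: OP1 = XOR, OP2 = AND.
N = 4
M = 2 ** N
counter_V, _ = boolean_array(
    N, M, xor, and_,
    bottom_v=[0] * N,
    right_v=[j % 2 for j in range(M)],
    right_u=[0] + [(j - 1) % 2 for j in range(1, M)],
)

# Sierpinsky pattern: both operators are XOR, boundaries are all 1s.
K = 8
sierpinsky_V, _ = boolean_array(
    K, K, xor, xor,
    bottom_v=[1] * K, right_v=[1] * K, right_u=[1] * K,
)
```

With these boundary values, row $j$ of `counter_V` read as a binary number equals $j$, and `sierpinsky_V[i][j]` equals the binomial coefficient $\binom{i+j}{i}$ modulo 2, which is the familiar Pascal-triangle description of the Sierpinsky pattern.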

To construct a Boolean array assembly, we make each side of each tile, denoted $T(i,j)$, a binary valued pad. The bottom, right, top, and left pads of tile $T(i,j)$ represent the values of $V(i,j-1)$ (as communicated from the tile below, $T(i,j-1)$), $U(i-1,j)$ (as communicated from the tile on its right, $T(i-1,j)$), $V(i,j)$ (as computed by $V(i,j-1) \;\mathrm{OP}_1\; U(i-1,j)$), and $U(i,j)$ (as computed by $V(i,j-1) \;\mathrm{OP}_2\; U(i-1,j)$), respectively. In the practical context of DNA tiling assemblies, a determined value $V(i,j) = 1$ can be displayed by the tile $T(i,j)$ using, for example, an extruding stem loop of single-stranded DNA. Note that such an assembly requires only 4 tile types in addition to 3 frame tiles, but results in rather small-scale error-free assemblies (with the actual size contingent on the probability of a single pad mismatch between adjacent tiles).

3.2.2 Thermodynamic Error Analysis in Kinetic Tile Assembly Model

Experimental construction of Boolean array assemblies has shown that such algorithmic assemblies are error prone. In particular, the experimental construction of Sierpinsky triangles suffers a pad mismatch rate $\epsilon$ of 1% to 10% [73]. To analyze the error rate, Winfree further extended the above aTAM model to a kinetic Tile Assembly Model (kTAM), which includes rates both for tiles to associate to (forward rate) and to dissociate from (reverse rate) growing assemblies [101].

Winfree’s kTAM model computes the forward and reverse rates as thermodynamic parameters. The forward rate is determined solely by the concentration of tiles, not by the type of the tiles. When the concentration of the tiles is fixed, the absolute forward rate is given by

$$ r_f = k_f [\text{monomer tile}] = k_f e^{-G_{mc}}, $$

where $G_{mc} = -\ln([\text{monomer}]/\mathrm{M})$ is a unitless free energy that measures the monomer, i.e. tile, concentration in the system.

In contrast, the reverse reaction rate depends inversely exponentially on the number of base pair bonds that must be broken for the tile to dissociate from the assembly. It is given by

$$ r_{r,b} = k_{r,b} = k_f e^{-b G_{se}}, $$

where $G_{se} = \Delta G / RT$ is the unitless free energy corresponding to the dissociation of a single sticky end, and $b$ is the number of such sticky ends.

It has been shown that when $G_{mc}$ is a little smaller than $2G_{se}$, the algorithmic self-assembly under temperature 2 proceeds with optimal error rate. Intuitively, when $G_{mc} \approx 2G_{se}$, the assembly occurs near the melting temperature of the system. Under such conditions, the self-assembly can achieve equilibrium, and the probability of observing a particular assembly $A$ is given by

$$ \Pr(A) = \frac{1}{Z} e^{-G(A)} \quad \text{with} \quad Z = \sum_{A'} e^{-G(A')}, $$

where $G(A) = n G_{mc} - b G_{se}$ is the free energy of the assembly, $n$ is the number of tiles in the assembly, $b$ is the number of matched sticky-end bonds in the assembly, and $Z$ is the partition function. As such, an $n$-assembly with $\Delta m$ more mismatches (and hence $\Delta m$ fewer matched bonds) is $e^{\Delta m \, G_{se}}$ times less likely to occur.

Now let $A_m$ be the collection of assemblies with $m$ mismatches and let $k_m$ be the number of distinct types of $A_m$ assemblies. In particular, $A_0$ is the unique correct assembly and $k_0 = 1$. In addition, $A_1$ represents the assemblies with exactly one mismatch. Since there are altogether $2n$ bonds in an $n$-assembly, $k_1 = 2n$. Then we have

\begin{align}
\Pr(A_0) &= \frac{[A_0]}{\sum_{m=0}^{n} [A_m]} \tag{3.1} \\
&= \frac{1}{\sum_{m=0}^{n} [A_m]/[A_0]} \tag{3.2} \\
&= \frac{1}{1 + k_1 e^{-G_{se}} + k_2 e^{-2G_{se}} + k_3 e^{-3G_{se}} + \cdots + k_n e^{-nG_{se}}} \tag{3.3} \\
&\approx \frac{1}{1 + 2n e^{-G_{se}}} \tag{3.4} \\
&\approx 1 - 2n e^{-G_{se}}. \tag{3.5}
\end{align}

On the other hand, since it takes $n$ error-free steps to assemble $A_0$, we have

$$ \Pr(A_0) = ((1-\epsilon)^2)^n \approx 1 - 2n\epsilon, \tag{3.6} $$

where $\epsilon$ is the pad mismatch rate. Comparing equations 3.5 and 3.6, we have $\epsilon = e^{-G_{se}}$.

Under the equilibrium conditions, Winfree further showed that the net growth rate of the assembly is given by

$$ r^* = r_f - r_r \approx \beta e^{-G_{mc}} \approx \beta e^{-2G_{se}} = \beta \epsilon^2, $$

where $\beta$ is a constant reflecting the small difference between $G_{mc}$ and $2G_{se}$. Now, based on the above formula, a straightforward method to reduce the error rate $\epsilon$ is to reduce the growth rate $r^*$. However, since $r^*$ depends quadratically on $\epsilon$, a small decrease in error rate may entail a dramatic decrease in the growth rate.
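The kTAM quantities of this section are simple enough to evaluate numerically. The following Python sketch is illustrative only: the function names are introduced here, and the rate constant passed as `k_f` is a placeholder with no physical calibration. It encodes the forward rate $k_f e^{-G_{mc}}$, the reverse rate $k_f e^{-bG_{se}}$ for a tile held by $b$ sticky ends, the pad mismatch rate $e^{-G_{se}}$, and the error-free probability $((1-\epsilon)^2)^n$.

```python
import math


def forward_rate(k_f: float, g_mc: float) -> float:
    """Forward (association) rate: k_f * exp(-G_mc)."""
    return k_f * math.exp(-g_mc)


def reverse_rate(k_f: float, b: int, g_se: float) -> float:
    """Reverse (dissociation) rate for a tile held by b sticky ends."""
    return k_f * math.exp(-b * g_se)


def pad_mismatch_rate(g_se: float) -> float:
    """Pad mismatch rate near equilibrium: exp(-G_se)."""
    return math.exp(-g_se)


def prob_error_free(n: int, eps: float) -> float:
    """Probability that an n-tile assembly has no mismatch: ((1-eps)^2)^n."""
    return ((1.0 - eps) ** 2) ** n


# Example: choose G_se so that the mismatch rate is 5%.
g_se = math.log(20.0)
eps = pad_mismatch_rate(g_se)
# At G_mc = 2 * G_se, a tile held by two sticky ends attaches and
# detaches at the same rate (k_f = 1.0 is an arbitrary placeholder).
balanced = math.isclose(forward_rate(1.0, 2 * g_se), reverse_rate(1.0, 2, g_se))
```

For small $\epsilon$ the value `prob_error_free(n, eps)` stays close to the linear approximation $1 - 2n\epsilon$, which is the comparison that yields $\epsilon = e^{-G_{se}}$.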

3.3 Error-Resilient Assembly Using Two-Way Overlay Redundancy

Let $\epsilon$ be the probability of a single pad mismatch between adjacent assembling DNA tiles, and assume that the likelihood of a pad mismatch error is independent for distinct pairs of pads as long as they do not involve the binding of the same two tiles. As such, a pad mismatch rate of $\epsilon = 5\%$ would imply an error-free assembly with an expected size of only 20 tiles, which is disappointingly small. Thus, a key challenge in experimentally demonstrating large-scale algorithmic assemblies is to construct error-resilient tiles. Winfree’s construction is an exciting step towards this goal [101]. However, to reduce the error rate to $\epsilon^2$ (resp. $\epsilon^3$), his construction replaces each tile with a group of $2 \times 2 = 4$ (resp. $3 \times 3 = 9$) tiles and hence increases the size of the tiling assembly by a factor of 4 (resp. 9). Our construction described below, in contrast, reduces the tiling error rate without scaling up the size of the final assembly. This would be an attractive feature in the attempt to obtain assemblies with large computational capacity. We call our constructions compact error-resilient assemblies and describe them below in detail.

3.3.1 Construction

To achieve the goals stated above, we propose the following error-resilient tiling scheme. Our Error-Resilient Assembly I (using two-way overlay redundancy) uses only 8 computational tile types plus the 4 frame tile types. This drops the probability of assembly error to $3\epsilon^2$, which is 0.75% for $\epsilon = 5\%$, potentially allowing for error-free assemblies of expected size in the hundreds of tiles.
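A quick back-of-the-envelope check of these sizes is given below. It rests on an assumption that is not derived in the text: the expected error-free assembly size is taken to be roughly the reciprocal of the per-tile error rate. The helper name `expected_error_free_size` is hypothetical.

```python
def expected_error_free_size(error_rate: float) -> float:
    """Rough estimate (assumption): about 1/error_rate tiles before the first error."""
    return 1.0 / error_rate


eps = 0.05                                   # 5% pad mismatch rate
baseline_size = expected_error_free_size(eps)
improved_rate = 3 * eps ** 2                 # two-way overlay redundancy
improved_size = expected_error_free_size(improved_rate)
```

Under this rule of thumb the baseline is about 20 tiles, and the improved rate of 0.75% corresponds to roughly 133 tiles.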

Figure 3.3: Construction of compact error-resilient assembly version I. Each pad has two portions. A portion encoding an input (resp. output) value is indicated with a dark blue (resp. light pink) colored arrow head. The error checking portion is depicted as a checked rectangle. Tile $T_1(i,j)$ takes inputs $U(i-2,j)$, $V(i-1,j-1)$, and $V(i,j-1)$; determines $V(i-1,j) = U(i-2,j) \;\mathrm{OP}_1\; V(i-1,j-1)$, $U(i-1,j) = U(i-2,j) \;\mathrm{OP}_2\; V(i-1,j-1)$, and $V(i,j) = U(i-1,j) \;\mathrm{OP}_1\; V(i,j-1)$; displays $V(i,j)$.

The construction is depicted in Figure 3.3. Tiles in this construction are denoted as $T_1$ tiles (for version 1). Each pad of each tile encodes a pair of bits. The basic idea to achieve error resiliency is to use two-way overlay redundancy: each tile $T_1(i,j)$ computes the outputs for its own position $(i,j)$ and also for its right neighbor’s position $(i-1,j)$; the redundant computation results obtained by $T_1(i,j)$ and its right neighbor $T_1(i-1,j)$ are compared via an additional error checking portion on $T_1(i,j)$’s right pad (which is the same as $T_1(i-1,j)$’s left pad). Tile $T_1(i,j)$’s right neighbor $T_1(i-1,j)$ is not likely to bind to $T_1(i,j)$ if these pad values are not consistent. Hence if only one of $T_1(i,j)$ and $T_1(i-1,j)$ is in error (incorrectly placed), the kinetics of the assembly may allow the incorrectly placed tile to be ejected from the assembly.

The four pads of $T_1(i,j)$ are constructed as follows (Figure 3.3).

• The right and left portions of the bottom pad represent the values of $V(i-1,j-1)$ and $V(i,j-1)$ respectively, as communicated from the tile $T_1(i,j-1)$.

• The top portion of the right pad represents the value of $U(i-2,j)$ as communicated from the tile $T_1(i-1,j)$. The bottom portion of the right pad represents the value of $V(i-1,j)$ as determined by the tile $T_1(i,j)$. Note that the value $V(i-1,j)$ is also redundantly determined by $T_1(i-1,j)$; hence this bottom portion performs a comparison of the two values. It is referred to as the error checking portion and is labeled with a checked background in Figure 3.3.

• The top and bottom portions of the left pad represent the values of $U(i-1,j)$ and $V(i,j)$ respectively, as determined by the tile $T_1(i,j)$. Again, the bottom portion is the error checking portion.

• The right and left portions of the top pad represent the values of $V(i-1,j)$ and $V(i,j)$ respectively, as determined by tile $T_1(i,j)$.

The above tile design allows the values $V(i-1,j-1)$ and $V(i,j-1)$ to be communicated to tile $T_1(i,j)$ from the tile $T_1(i,j-1)$ just below $T_1(i,j)$. The value $U(i-2,j)$ is communicated to tile $T_1(i,j)$ from its immediate right neighbour $T_1(i-1,j)$. These three values, $V(i-1,j-1)$, $V(i,j-1)$, and $U(i-2,j)$, can be viewed as input bits to tile $T_1(i,j)$, and the other portions of the pads as outputs. The values $V(i-1,j)$ and $U(i-1,j)$ are determined by tile $T_1(i,j)$ from $V(i-1,j-1)$ and $U(i-2,j)$: $V(i-1,j) = U(i-2,j) \;\mathrm{OP}_1\; V(i-1,j-1)$ and $U(i-1,j) = U(i-2,j) \;\mathrm{OP}_2\; V(i-1,j-1)$. The value $V(i,j)$ is determined from $V(i,j-1)$ and $U(i-1,j)$: $V(i,j) = U(i-1,j) \;\mathrm{OP}_1\; V(i,j-1)$. The determined value $V(i,j) = 1$ is displayed by the tile $T_1(i,j)$.

In this construction, each pad encodes two bits. However, since the values of the left pad, the top pad, and the bottom portion ($V(i-1,j)$) of the right pad each depend only on the values of the top portion ($U(i-2,j)$) of the right pad and the bottom pad, the tile type depends on only 3 input binary bits, namely $V(i-1,j-1)$, $V(i,j-1)$, and $U(i-2,j)$. Hence only $2^3 = 8$ tile types are required. In addition, 4 tiles are required to assemble the frame, as described in Sect. 3.5.
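The count of computational tile types can be reproduced by enumerating the three input bits. In the sketch below, the function name `version1_tiles`, the dictionary layout, and the ordering of the two portions within each pad are illustrative assumptions; the three update formulas follow the description of the version I tile given in this section. Each two-portion pad is stored as one tuple, reflecting that a pad is a single unit.

```python
from itertools import product
from operator import and_, xor


def version1_tiles(op1, op2):
    """Enumerate the computational tiles of the two-way overlay construction.

    Inputs of tile T1(i,j): V(i-1,j-1), V(i,j-1) from the bottom pad and
    U(i-2,j) from the top portion of the right pad.
    """
    tiles = []
    for v_right_in, v_in, u_in in product((0, 1), repeat=3):
        v_right = op1(u_in, v_right_in)  # V(i-1,j), redundant copy
        u_mid = op2(u_in, v_right_in)    # U(i-1,j)
        v_out = op1(u_mid, v_in)         # V(i,j), the displayed value
        tiles.append({
            "bottom": (v_right_in, v_in),  # (right portion, left portion)
            "right": (u_in, v_right),      # (top portion, error checking portion)
            "left": (u_mid, v_out),        # (top portion, error checking portion)
            "top": (v_right, v_out),       # (right portion, left portion)
            "display": v_out,
        })
    return tiles


# Binary counter operators: OP1 = XOR, OP2 = AND.
counter_tiles = version1_tiles(xor, and_)
```

Since the outputs are functions of the three input bits, the enumeration yields exactly eight tiles, whatever Boolean functions are supplied for the two operators.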

We emphasize that though a pad has two portions, it should be treated as a whole unit. A value change in one portion of a pad changes the pad to a completely new pad. If the pad is implemented as single-stranded DNA, this means that the sequence of the strand will be a completely new sequence. One potential confusion to be avoided is mistakenly considering two pads whose two-bit encodings share one bit value (say 00 and 01) as having the shared portions identical or, in the context of single-stranded DNA, as having half of the DNA sequences identical. To emphasize the unity of a pad, we put a box around each pad in Figure 3.3.

3.3.2 Error Analysis

Recall that $\epsilon$ is the probability of a single pad mismatch between two adjacent DNA tiles. We further assume that the likelihood of a pad mismatch error is independent for distinct pads as long as they do not involve the binding of the same two tiles, and that $\mathrm{OP}_1$ is the function XOR.
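The reason the XOR assumption matters can be seen with a small check: an XOR output always changes when exactly one of its inputs is flipped, so a wrong input bit cannot be silently absorbed, whereas an operator such as AND can mask it. The helper name `always_propagates` below is introduced only for this illustration.

```python
from itertools import product
from operator import and_, xor


def always_propagates(op) -> bool:
    """True if flipping the second argument alone always flips op's output."""
    return all(op(u, v) != op(u, 1 - v)
               for u, v in product((0, 1), repeat=2))


xor_propagates = always_propagates(xor)
and_propagates = always_propagates(and_)
```

Here `xor_propagates` is `True`, while `and_propagates` is `False` because AND with a 0 input returns 0 regardless of the other bit.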

Our intention is that the individual tiling assembly error rate (and hence the propagation of these errors to further tile assemblies) is substantially decreased, due to the cooperative assembly of neighboring tiles, which redundantly compute the $V(\cdot,\cdot)$ and $U(\cdot,\cdot)$ values at their positions and at their right neighbours.

Without loss of generality, we consider only the cases where the pad binding error occurs on either the bottom pad or the right pad of a tile $T_1(i,j)$. Otherwise, if the pad binding error occurs on the left (resp. top) pad of tile $T_1(i,j)$, then we use the same argument below for tile $T_1(i+1,j)$ (resp. $T_1(i,j+1)$). We define the neighborhood of tile $T_1(i,j)$ to be the set of 8 distinct tiles $\{T_1(i',j') \mid |i'-i| \le 1, |j'-j| \le 1\} \setminus \{T_1(i,j)\}$ with coordinates that differ from $(i,j)$ by at most 1. A neighborhood tile $T_1(i',j')$ is dependent on $T_1(i,j)$ if both its coordinates are equal to or greater than those of $T_1(i,j)$; otherwise $T_1(i',j')$ is independent of $T_1(i,j)$. Note that a neighborhood tile $T_1(i',j')$ is dependent on $T_1(i,j)$ if and only if the values $V(i',j')$ and $U(i',j')$ are determined at least partially from $V(i,j)$ or $U(i,j)$. More specifically, the neighborhood tiles dependent on $T_1(i,j)$ are $T_1(i+1,j+1)$, $T_1(i+1,j)$, and $T_1(i,j+1)$. The neighborhood tiles independent of $T_1(i,j)$ are $T_1(i+1,j-1)$, $T_1(i,j-1)$, $T_1(i-1,j+1)$, $T_1(i-1,j)$, and $T_1(i-1,j-1)$.
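The split of the neighborhood into dependent and independent tiles follows mechanically from the coordinate rule. A minimal sketch, with function names chosen here for illustration and tiles represented simply by their coordinate pairs:

```python
def neighborhood(i, j):
    """The 8 positions whose coordinates differ from (i, j) by at most 1."""
    return [(i + di, j + dj)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]


def is_dependent(i, j, ip, jp):
    """A neighbour (ip, jp) depends on (i, j) if both coordinates are >= those of (i, j)."""
    return ip >= i and jp >= j


def split_neighborhood(i, j):
    """Return (dependent, independent) neighbours of position (i, j)."""
    dependent = [p for p in neighborhood(i, j) if is_dependent(i, j, *p)]
    independent = [p for p in neighborhood(i, j) if not is_dependent(i, j, *p)]
    return dependent, independent


dep, indep = split_neighborhood(5, 7)
```

For the position $(5,7)$ this gives the three dependent neighbours $(6,8)$, $(6,7)$, $(5,8)$ and five independent ones.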

Lemma 3.3.1 Suppose that the neighborhood tiles independent of tile $T_1(i,j)$ have correctly computed $V(\cdot,\cdot)$ and $U(\cdot,\cdot)$. If there is a single pad mismatch between tile $T_1(i,j)$ and another tile just below $T_1(i,j)$ or to its immediate right, then there is at least one further pad mismatch in the neighborhood of tile $T_1(i,j)$. Furthermore, given the location of the initial mismatch, the location of the further pad mismatch can be determined among at most three possible pad locations.

Proof: Suppose that a pad binding error occurs on the bottom pad or the right pad of tile $T_1(i,j)$ but no further pad mismatch occurs between two neighborhood tiles which are independent of $T_1(i,j)$. We now consider a case analysis of possible pad mismatches.

(1) First consider the case where the pad binding error occurs on the bottom pad of tile $T_1(i,j)$. Recall that the right and left portions of the bottom pad represent the values of $V(i-1,j-1)$ and $V(i,j-1)$ respectively, as communicated from tile $T_1(i,j-1)$. Observe that the neighborhood tiles $T_1(i,j-1)$, $T_1(i-1,j-1)$, and $T_1(i-1,j)$ are all independent of $T_1(i,j)$ and so all correctly compute $V(\cdot,\cdot)$ and $U(\cdot,\cdot)$ according to the assumption of the lemma.

(1.1) Consider the case where the pad binding error is due to the incorrect value of the right portion $V(i-1,j-1)$ of the bottom pad of tile $T_1(i,j)$, as shown in Figure 3.4. Note that the left portion $V(i,j-1)$ of the bottom pad of tile $T_1(i,j)$ may also be incorrect. In case (i), $T_1(i,j)$ has an incorrect value for the $U(i-2,j)$ portion of its right pad and hence there is a further pad mismatch on the right pad of $T_1(i,j)$. In case (ii), $T_1(i,j)$ has a correct value for the $U(i-2,j)$ portion of its right pad. Since $T_1(i,j)$ uses the formula $V(i-1,j) = U(i-2,j) \;\mathrm{OP}_1\; V(i-1,j-1)$ to compute $V(i-1,j)$ and $\mathrm{OP}_1$ is assumed to be the XOR function, it will determine an incorrect value for $V(i-1,j)$, which is distinct from the correct value of $V(i-1,j)$ determined by its (independent) right neighbor tile $T_1(i-1,j)$. This again implies a further pad mismatch on the right pad of tile $T_1(i,j)$.

Figure 3.4: Case 1.1 in the proof of Lemma 3.3.1: error in the right portion $V(i-1,j-1)$ of the bottom pad of tile $T_1(i,j)$ causes a further mismatch on the right pad of tile $T_1(i,j)$.

(1.2) Next consider the case in Figure 3.5 where the pad binding error is due to the

wrong value of the left portion ^¯,-��`�¹ ±��@5 of the bottom pad of tile Ðï�Æ,-��`�¹�5 . However, there

is a correct match in the right portion ^Í,��û±Û� `�¹J±Û�@5 of the bottom pad of tile Ð �c,-�e`D¹ö5 . In

case (i), Ðx� ,-��`�¹�5 has an incorrect value for the top portion ��,-�ð± 8�`�¹�5 of its right pad, then

there will be a mismatch on the right pad of Ð �Æ,-��`�¹�5 . In case (ii), Ðx�Æ,��e`�¹�5 has a correct value

for the top portion ��,-�£±{8ö`�¹�5 of its right pad, then it will further determine a correct value

for �G,��ö± � `D¹ö5 , since ��,-��± � `�¹�5ìWë��,-��± 8ö`�¹�5ð�3�"! ^¯,-��± � `�¹H± �@5 and both �G,��ö± 8ö`D¹ö5 and

^¯,-�~± � `�¹H± ��5 have correct values. Since ^Í,��e`�¹�5aWì��,-�~± � `D¹ö5û�3�Õ�ð^�,-��`�¹�± ��5 , �G,��~± � `�¹�5is correct and ^Í,-�e`D¹�±{�@5 is incorrect, Ð �Æ,��e`�¹�5 will determine an incorrect value for ^Í,-�e`D¹ö5 .

Note that the neighborhood tiles Ð � ,��6±ê�3`�¹/±ê�@5 , Ðx�Æ,��e`�¹/± �@5 , and Ðx�Æ,���(Ö� `�¹/±ê��5 are

independent of Ð �Æ,��e`�¹�5 and so both correctly compute ^�,�±´`·±Ú5 and ��,�±´`·±Ú5 . However,

$T_1(i,j)$'s immediate left neighbour $T_1(i+1,j)$ is dependent both on the incorrect value communicated by the pad of $T_1(i,j)$ and the correct values communicated by the pad of $T_1(i+1,j-1)$. So in case (ii) there must be a further pad mismatch at tile $T_1(i+1,j)$, as argued below. In case (iia) there is a pad mismatch on the right pad of $T_1(i+1,j)$, due to a mismatch either on the $U(i-1,j)$ portion or on the $V(i,j)$ portion. Otherwise, in case (iib), there is no mismatch on either the $U(i-1,j)$ or the $V(i,j)$ portion of the pad between $T_1(i,j)$ and $T_1(i+1,j)$. This implies that $V(i,j)$ is incorrectly computed by $T_1(i+1,j)$ (since $T_1(i,j)$ has incorrectly computed $V(i,j)$), but $T_1(i+1,j)$ has a correct value of $U(i-1,j)$. Since $V(i,j) = U(i-1,j)\ \mathrm{OP}_1\ V(i,j-1)$ and $\mathrm{OP}_1$ is XOR, this implies that the right portion $V(i,j-1)$ of the bottom pad of $T_1(i+1,j)$ has an incorrect value, and hence there is a mismatch between $T_1(i+1,j)$ and $T_1(i+1,j-1)$.

Figure 3.5: Case 1.2 in the proof of Lemma 3.3.1: a further mismatch is caused by an error in the $V(i,j-1)$ portion of the bottom pad of tile $T_1(i,j)$.

(2) Next consider the case where the pad binding error occurs on the right pad of tile $T_1(i,j)$, but there is no error on the bottom pad of $T_1(i,j)$. We first note that the top portion $U(i-2,j)$ of the right pad of $T_1(i,j)$ must have an incorrect value. Assume the opposite case where $U(i-2,j)$ is correct. But the $V(i-1,j-1)$ portion of $T_1(i,j)$'s bottom pad must also have a correct value (no mismatch on the bottom pad), this

results in a further correct value for the $V(i-1,j)$ portion of $T_1(i,j)$'s right pad. Thus both the $U(i-2,j)$ and $V(i-1,j)$ portions of $T_1(i,j)$'s right pad are correct and there must be no mismatch on the right pad, a contradiction. Therefore, $U(i-2,j)$ must have an incorrect value, and hence we only need to consider this case.

Figure 3.6: Case 2.1 in the proof of Lemma 3.3.1: a further mismatch is caused by an error in the $U(i-2,j)$ portion of the right pad of tile $T_1(i,j)$.

(2.1) Now consider the case where the pad binding error is due to the incorrect value of the top portion $U(i-2,j)$ of the right pad of tile $T_1(i,j)$, as shown in Figure 3.6. We note that $T_1(i,j)$ will compute an incorrect value for the right portion $V(i-1,j)$ of its top pad, according to the formula $V(i-1,j) = U(i-2,j)\ \mathrm{OP}_1\ V(i-1,j-1)$. Note that $T_1(i,j+1)$ is dependent on $T_1(i,j)$. In case (i), tile $T_1(i,j+1)$ has a correct value of $V(i-1,j)$. There must be a pad mismatch on $V(i-1,j)$ between $T_1(i,j+1)$ and $T_1(i,j)$, since the value of $V(i-1,j)$ determined by $T_1(i,j)$ is incorrect. In case (ii), tile $T_1(i,j+1)$ has an incorrect value of $V(i-1,j)$; using a similar argument as in case 1.1, we can show that there must be a pad mismatch on the $U(i-2,j+1)$ portion of $T_1(i,j+1)$'s right pad.

Hence we conclude that in each case, there is a further pad mismatch between a pair of adjacent tiles in the neighborhood of tile $T_1(i,j)$. Furthermore, we have shown in each case that given the location of the initial mismatch, the location of the further pad mismatch can be determined among at most three possible pad locations. $\Box$
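As an illustration of the dependencies used in this case analysis, the following sketch computes the three values that a tile $T_1(i,j)$ derives from its inputs and reports which of them change, for every input combination, when a single input bit is flipped. It is only a sketch under stated assumptions: the function names are hypothetical, $\mathrm{OP}_1$ is XOR as in the proof, and $\mathrm{OP}_2$ is arbitrarily taken to be AND, which the text does not prescribe.

```python
from itertools import product


def op1(a, b):
    """OP_1, assumed to be XOR as in the proof."""
    return a ^ b


def op2(a, b):
    """OP_2 is not fixed by the text; AND is an arbitrary illustrative choice."""
    return a & b


def t1_outputs(u_im2_j, v_im1_jm1, v_i_jm1):
    """Values derived by tile T_1(i,j) from its three input portions."""
    v_im1_j = op1(u_im2_j, v_im1_jm1)   # V(i-1,j) = U(i-2,j) OP_1 V(i-1,j-1)
    u_im1_j = op2(u_im2_j, v_im1_jm1)   # U(i-1,j) = U(i-2,j) OP_2 V(i-1,j-1)
    v_i_j = op1(u_im1_j, v_i_jm1)       # V(i,j)   = U(i-1,j) OP_1 V(i,j-1)
    return {"V(i-1,j)": v_im1_j, "U(i-1,j)": u_im1_j, "V(i,j)": v_i_j}


def always_changed(flip_index):
    """Outputs that change for EVERY input combination when one input is flipped.

    flip_index: 0 -> U(i-2,j), 1 -> V(i-1,j-1), 2 -> V(i,j-1).
    """
    changed = None
    for bits in product((0, 1), repeat=3):
        flipped = list(bits)
        flipped[flip_index] ^= 1
        before = t1_outputs(*bits)
        after = t1_outputs(*flipped)
        diff = {name for name in before if before[name] != after[name]}
        changed = diff if changed is None else changed & diff
    return changed


for index, name in enumerate(["U(i-2,j)", "V(i-1,j-1)", "V(i,j-1)"]):
    print(name, "->", sorted(always_changed(index)))
```

Under these assumptions, flipping $U(i-2,j)$ or $V(i-1,j-1)$ always corrupts $V(i-1,j)$, and flipping $V(i,j-1)$ always corrupts $V(i,j)$, which is the propagation that cases 1.1, 1.2 and 2.1 rely on.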

Using the analytical methodology described in Sect. 3.2.2, we next calculate the error rate $\varepsilon_1$ in our two-way overlay construction. The key observation here is that the number of assemblies with one mismatch is $N_1 = 0$. In addition, since one pad mismatch is linked with one of three possible further mismatches, we have $N_2 = 2n \cdot 3 = 6n$. This gives us

\begin{align}
\Pr(A_0) &= \frac{1}{1 + N_1 e^{-G_{se}} + N_2 e^{-2G_{se}} + N_3 e^{-3G_{se}} + \cdots + N_n e^{-nG_{se}}} \tag{3.7} \\
&\approx \frac{1}{1 + N_2 e^{-2G_{se}}} \tag{3.8} \\
&= \frac{1}{1 + 6n\, e^{-2G_{se}}} \tag{3.9} \\
&\approx 1 - 6n\, e^{-2G_{se}}, \tag{3.10}
\end{align}

where $A_0$ is the unique error-less assembly.

Again, we also have

\begin{equation}
\Pr(A_0) = \left((1 - \varepsilon_1)^2\right)^n \approx 1 - 2n\varepsilon_1. \tag{3.11}
\end{equation}

Putting together equations 3.10 and 3.11, we have $\varepsilon_1 = 3e^{-2G_{se}} = 3\varepsilon^2$. Thus we have shown,

Theorem 3.3.2 The error rate $\varepsilon_1$ for assemblies constructed from version 1 error resilient tiles is $3\varepsilon^2$, where $\varepsilon$ is the error rate for the corresponding assembly system with no error correction.
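The approximation steps behind this theorem can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the values $n = 100$ and $G_{se} = 8$ are arbitrary choices, not taken from the text. It keeps only the two-mismatch term of the partition sum (with $6n$ such assemblies), inverts the relation between the probability of the error-free assembly and the per-pad error rate, and compares the result with the closed form $3e^{-2G_{se}}$.

```python
import math


def pr_error_free(g_se, mismatch_counts):
    """Probability of the error-free assembly: 1 / (1 + sum_k N_k * exp(-k * G_se)).

    mismatch_counts maps k (number of mismatches) to N_k (number of assemblies).
    """
    partition = 1.0 + sum(
        n_k * math.exp(-k * g_se) for k, n_k in mismatch_counts.items()
    )
    return 1.0 / partition


def per_pad_error_rate(pr, n):
    """Solve pr = ((1 - eps)^2)^n for eps."""
    return 1.0 - pr ** (1.0 / (2 * n))


n = 100        # assembly size (arbitrary illustrative value)
g_se = 8.0     # sticky-end free energy in units of kT (arbitrary illustrative value)

pr = pr_error_free(g_se, {2: 6 * n})      # only the N_2 = 6n term is kept
eps_1 = per_pad_error_rate(pr, n)
closed_form = 3.0 * math.exp(-2.0 * g_se)

print(f"numerical eps_1 = {eps_1:.6e}")
print(f"3*exp(-2*G_se)  = {closed_form:.6e}")
```

For these values the two numbers agree to well within a fraction of a percent; the agreement degrades as $6n\,e^{-2G_{se}}$ grows, since both approximations in the derivation are first-order.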

Note that the growth rate $r_1 \approx \beta e^{-G_{mc}} \approx \beta e^{-2G_{se}} = \frac{\beta}{3}\varepsilon_1$. Hence the growth rate depends linearly on the error rate. Recall that, in contrast, in the system with no error correction, the growth rate $r_0 \approx \beta\varepsilon^2$. As such, compared with the system with no error correction, decreasing the error rate in our version 1 error resilient system results in a much smaller decrease in the speed of assembly.
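To make the comparison concrete, the following sketch evaluates both growth-rate expressions at the same target error rate. It is a minimal illustration: the function names are hypothetical and $\beta$ is set to 1, so only the ratio of the two rates is meaningful.

```python
def growth_rate_uncorrected(target_error, beta=1.0):
    """No error correction: growth rate ~ beta * eps^2, with eps = target_error."""
    return beta * target_error ** 2


def growth_rate_version1(target_error, beta=1.0):
    """Version 1 tiles: growth rate ~ (beta / 3) * eps_1, with eps_1 = target_error."""
    return beta * target_error / 3.0


for target in (1e-2, 1e-3, 1e-4):
    r0 = growth_rate_uncorrected(target)
    r1 = growth_rate_version1(target)
    print(f"target error {target:.0e}: r0 = {r0:.3e}, r1 = {r1:.3e}, r1/r0 = {r1 / r0:.1f}")
```

At equal target error rate $\delta$ the ratio is $1/(3\delta)$, so the advantage of the version 1 system grows as the required error rate is tightened.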

3.4 Error-Resilient Assembly Using Three-way Overlay Redundancy

Figure 3.7: Tile $T_2$ takes inputs $U(i-2,j)$, $U(i-2,j-1)$, $V(i-1,j-2)$ and $V(i,j-2)$; determines $V(i-1,j-1) = U(i-2,j-1)\ \mathrm{OP}_1\ V(i-1,j-2)$, $U(i-1,j-1) = U(i-2,j-1)\ \mathrm{OP}_2\ V(i-1,j-2)$, $V(i,j-1) = U(i-1,j-1)\ \mathrm{OP}_1\ V(i,j-2)$, $U(i-1,j) = U(i-2,j)\ \mathrm{OP}_2\ V(i-1,j-1)$, $V(i,j) = U(i-1,j)\ \mathrm{OP}_1\ V(i,j-1)$ and $V(i-1,j) = U(i-2,j)\ \mathrm{OP}_1\ V(i-1,j-1)$; displays $V(i,j)$.

3.4.1 Construction

We next extend the design of our scheme to a 3-way overlay scheme. The Error-Resilient

Assembly version 2 (using 3-way overlay redundancy) uses 16 computational tile types and

5 frame tile types. One mismatch on a tile forces two more mismatches in its neighborhood.

This property further lowers the assembly error.

The basic construction is shown in Figure 3.7. In this construction, each pad encodes a tuple of 3 bits and hence is an 8-valued pad. The basic idea of this error-resilient assembly is to have each tile $T_2(i,j)$ compute error checking values for positions $(i-1,j)$, $(i,j-1)$, $(i+1,j)$, and $(i,j+1)$, which are compared with corresponding error checking values computed by $T_2(i,j)$'s four neighbors. Again, the neighbors are unlikely to bind with $T_2(i,j)$ if such error checking values are inconsistent, and the kinetics of the assembly will allow these tiles to dissociate from each other, as in version 1 (2-way overlay redundancy). However, instead of introducing just one additional mismatch in $T_2(i,j)$'s neighborhood, the 3-way overlay redundancy (version 2) forces two mismatches, and hence we have a further lowered error rate.

Table 3.1: An instance of $\mathrm{OP}_2$. This binary operation can detect the incorrect value of input 1, regardless of the correctness of input 2.

Input 1 | Input 2 | Output
(the four rows of table entries are not recoverable from the extracted text)

3.4.2 Error Analysis

For error analysis, in addition to the assumptions made in Sect. 3.3.2, we require that $\mathrm{OP}_2$ can detect an incorrect value of input 1 regardless of the correctness of input 2. This property seems essential to guarantee two further mismatches in a tile's neighborhood when there is an initial mismatch on one of the tile's four pads. One example instance of $\mathrm{OP}_2$ is given in Table 3.1.
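The requirement on $\mathrm{OP}_2$ can be turned into a small exhaustive check. The sketch below uses one possible formalization, which is an assumption rather than the thesis's definition: flipping input 1 must change the output no matter what values input 2 takes in the correct and in the faulty evaluation. The function names are hypothetical, and the candidate operations tested are illustrative; they are not the entries of the table.

```python
from itertools import product


def detects_input1_errors(op):
    """Assumed reading of the requirement on OP_2.

    Returns True if op(a, b) != op(1 - a, b_other) for all bits a, b, b_other,
    i.e. a wrong input 1 always shows up in the output, whether or not
    input 2 is also wrong.
    """
    return all(
        op(a, b) != op(1 - a, b_other)
        for a, b, b_other in product((0, 1), repeat=3)
    )


def first_input(a, b):
    """Candidate operation: returns input 1 unchanged."""
    return a


def xor(a, b):
    """Candidate operation: exclusive or."""
    return a ^ b


def and_(a, b):
    """Candidate operation: logical and."""
    return a & b


for name, op in [("first_input", first_input), ("xor", xor), ("and", and_)]:
    print(name, detects_input1_errors(op))
```

Under this strict reading, XOR fails because a simultaneous error in input 2 can cancel the error in input 1, and only operations whose output is determined by input 1 alone pass. A weaker reading of the requirement would admit more operations.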

The middle portions of all the four pads (top, right, left, bottom) are computed as

described in the caption of Figure 3.7 and serve as the part to redundantly compute and

compare the outputs of two neighboring tiles as shown in the figure.

Without loss of generality, we again consider only the cases where the pad binding error occurs on either the bottom pad or the right pad of a tile $T_2(i,j)$. Otherwise, if the pad binding error occurs on the left pad of tile $T_2(i,j)$, then apply the argument below to tile $T_2(i+1,j)$; likewise, if the pad binding error occurs on the top pad of tile $T_2(i,j)$, apply the argument below to tile $T_2(i,j+1)$.

Lemma 3.4.1 Suppose that the neighborhood tiles independent of tile $T_2(i,j)$ have correctly computed $V(\cdot,\cdot)$ and $U(\cdot,\cdot)$. If there is a single pad mismatch between tile $T_2(i,j)$ and another tile just below or to its immediate right, then there are at least two further pad mismatches between pairs of adjacent tiles in the immediate neighborhood of tile $T_2(i,j)$. Furthermore, given the location of the initial mismatch, the location of the second mismatch can be determined among at most three locations in the neighborhood of $T_2(i,j)$; given the location of the initial and the second mismatches, the location of the third mismatch can be determined among at most five locations.

Proof: Suppose a pad binding error occurs on a bottom pad or right pad of tile $T_2(i,j)$ but no further pad mismatch occurs between two neighborhood tiles which are independent of $T_2(i,j)$. We now consider a case analysis of possible pad mismatches.

Figure 3.8: Case 1.1.a in the proof of Lemma 3.4.1

(1) First consider the case where the pad binding error occurs on the $V(i-1,j-2)$ portion of the bottom pad of tile $T_2(i,j)$.

Figure 3.9: Case 1.1.b in the proof of Lemma 3.4.1

(1.1) Consider the case where the pad binding error is due to the incorrect value of the right portion $V(i-1,j-2)$ of the bottom pad of tile $T_2(i,j)$ (the other portions of the bottom pad of tile $T_2(i,j)$ may also have incorrect values). Further consider case (1.1a) (Figure 3.8) when there is no mismatch on the bottom portion $U(i-2,j-1)$ of the right pad. Immediately, we have a mismatch on the portion $V(i-1,j-1)$ of the right pad of $T_2(i,j)$, since $V(i-1,j-1) = U(i-2,j-1)\ \mathrm{OP}_1\ V(i-1,j-2)$ and $\mathrm{OP}_1$ is XOR. Furthermore, tile $T_2(i,j)$ will determine an incorrect value for the $V(i-1,j-1)$ portion of its top pad, resulting in a mismatch either on the bottom or on the right pad of $T_2(i,j+1)$. Next consider case (1.1b) (Figure 3.9) when there is a mismatch on the $U(i-2,j-1)$ portion of the right pad of $T_2(i,j)$. This will result in an incorrect value of the $U(i-1,j-1)$ portion of $T_2(i,j)$'s left pad (since $\mathrm{OP}_2$ can detect the incorrect value of $U(i-2,j-1)$), leading to a further mismatch either on the right pad or on the bottom pad of $T_2(i+1,j)$.

(1.2) (Figure 3.10) Consider the case where the pad binding error is due to the incorrect value of the middle portion $V(i-1,j-1)$ of the bottom pad of tile $T_2(i,j)$, but there is a correct match in the right portion $V(i-1,j-2)$ of tile $T_2(i,j)$ (the left portion $V(i,j-2)$ of the bottom pad of tile $T_2(i,j)$ may also have an incorrect value). Since the

value of $V(i-1,j-2)$ is correct and $V(i-1,j-1)$ is determined by $U(i-2,j-1)\ \mathrm{OP}_1\ V(i-1,j-2)$ and $\mathrm{OP}_1$ is XOR, we immediately have that there must be a mismatch on the $U(i-2,j-1)$ portion of $T_2(i,j)$'s right pad, due to the incorrect value of the $U(i-2,j-1)$ portion of this pad. However, since $V(i-1,j-1) = U(i-2,j-1)\ \mathrm{OP}_1\ V(i-1,j-2)$, the value of $V(i-1,j-1)$ (right portion of its top pad) computed by $T_2(i,j)$ must be incorrect, resulting in a further mismatch either on the bottom or on the right pad of $T_2(i,j+1)$.

Figure 3.10: Case 1.2 in the proof of Lemma 3.4.1

Figure 3.11: Case 1.3.a in the proof of Lemma 3.4.1

(1.3) Consider the case where the pad binding error is due to the incorrect value of the left portion $V(i,j-2)$ of the bottom pad of tile $T_2(i,j)$, but there are correct matches in both the right portion $V(i-1,j-2)$ and the middle portion $V(i-1,j-1)$ of the bottom pad of tile $T_2(i,j)$. Further consider case (1.3a) (Figure 3.11) when there is no mismatch on the $U(i-2,j-1)$ portion of the right pad. Then $T_2(i,j)$ must compute a correct value for $U(i-1,j-1) = U(i-2,j-1)\ \mathrm{OP}_2\ V(i-1,j-2)$. $T_2(i,j)$ further computes both an incorrect value of the $V(i,j-1)$ portion of its top pad (since $V(i,j-1) = U(i-1,j-1)\ \mathrm{OP}_1\ V(i,j-2)$) and an incorrect value for the $V(i,j-1)$ portion of its left pad. The first incorrect value will result in a mismatch either on the bottom or on the right pad of $T_2(i,j+1)$. The second incorrect value will result in a mismatch either on the right or on the bottom pad of $T_2(i+1,j)$. Next consider case (1.3b) when there is a mismatch on the $U(i-2,j-1)$ portion of the right pad of $T_2(i,j)$. But this case cannot occur since both the $V(i-1,j-1)$ and $V(i-1,j-2)$ portions of $T_2(i,j)$'s bottom pad are correct, and $V(i-1,j-1) = U(i-2,j-1)\ \mathrm{OP}_1\ V(i-1,j-2)$, where $\mathrm{OP}_1 = \mathrm{XOR}$.

Figure 3.12: Case 2.1 in the proof of Lemma 3.4.1

(2) Now consider the case where the pad binding error occurs on the right pad of $T_2(i,j)$, but there is no binding error on the bottom pad of $T_2(i,j)$. We note that since both the $V(i-1,j-2)$ and $V(i-1,j-1)$ portions of the bottom pad are correct, the $U(i-2,j-1)$ and $V(i-1,j-1)$ portions of the right pad must also be correct, so we only need to consider the case (2.1) (Figure 3.12) where the binding error is due to an incorrect value of the top portion $U(i-2,j)$ of the right pad of $T_2(i,j)$, but there is no mismatch on other portions of the right pad of $T_2(i,j)$. First note that an incorrect value of $U(i-2,j)$ will result in an incorrect value of the right portion $V(i-1,j-1)$ of the top pad of $T_2(i,j)$. And this will lead to a further mismatch either between $T_2(i,j)$ and $T_2(i,j+1)$ or between $T_2(i,j+1)$ and $T_2(i-1,j+1)$. Next note that $T_2(i,j)$ must compute an incorrect value for the $U(i-1,j)$ portion of its left pad, resulting in yet another mismatch either between $T_2(i,j)$ and $T_2(i+1,j)$ or between $T_2(i+1,j)$ and $T_2(i+1,j+1)$.

We have thus proven that a mismatch in the right or bottom pad of $T_2(i,j)$ results in at least two further mismatches. And given the location of the first mismatch, the location of the second mismatch can be determined among at most three locations (between $T_2(i,j)$ and $T_2(i-1,j)$, or between $T_2(i,j)$ and $T_2(i,j+1)$, or between $T_2(i,j+1)$ and $T_2(i-1,j+1)$). Furthermore, given the locations of the first two mismatches, the location of the third mismatch can be determined among at most five locations (between $T_2(i,j)$ and $T_2(i+1,j)$, between $T_2(i+1,j)$ and $T_2(i+1,j-1)$, between $T_2(i,j)$ and $T_2(i,j+1)$, between $T_2(i,j+1)$ and $T_2(i-1,j+1)$, or between $T_2(i+1,j)$ and $T_2(i+1,j+1)$). $\Box$

We again calculate the error rate $\varepsilon_2$ for our version 2 construction using thermodynamic analysis. The key observation here is that the number of assemblies with exactly one mismatch or exactly two mismatches is 0. In addition, since one pad mismatch is linked with a second mismatch at one of three possible locations, and each of these three second mismatches is in turn linked with a third mismatch at one of five possible locations, we have $N_3 = 2n \cdot 3 \cdot 5 = 30n$. As such, we have

\begin{align}
\Pr(A_0) &= \frac{1}{1 + N_1 e^{-G_{se}} + N_2 e^{-2G_{se}} + N_3 e^{-3G_{se}} + \cdots + N_n e^{-nG_{se}}} \tag{3.12} \\
&\approx \frac{1}{1 + N_3 e^{-3G_{se}}} \tag{3.13} \\
&= \frac{1}{1 + 30n\, e^{-3G_{se}}} \tag{3.14} \\
&\approx 1 - 30n\, e^{-3G_{se}}, \tag{3.15}
\end{align}

where $A_0$ is the unique error-less assembly.

Again, we also have

\begin{equation}
\Pr(A_0) = \left((1 - \varepsilon_2)^2\right)^n \approx 1 - 2n\varepsilon_2. \tag{3.16}
\end{equation}

Putting together equations 3.15 and 3.16, we have $\varepsilon_2 = 15e^{-3G_{se}} = 15\varepsilon^3$. Thus we have shown,

Theorem 3.4.2 The error rate $\varepsilon_2$ for assemblies constructed from version 2 error resilient tiles is $15\varepsilon^3$, where $\varepsilon$ is the error rate for the corresponding assembly system with no error correction.
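The two theorems can be compared side by side with a few lines of arithmetic. The sketch below simply evaluates $\varepsilon$, $3\varepsilon^2$ and $15\varepsilon^3$ for some sample base error rates; the function name and the sample values are illustrative choices, not taken from the text.

```python
def error_rates(eps):
    """Error rates per Theorems 3.3.2 and 3.4.2 for a base error rate eps."""
    return {
        "no correction": eps,
        "version 1": 3.0 * eps ** 2,
        "version 2": 15.0 * eps ** 3,
    }


for eps in (0.1, 0.01, 0.001):
    rates = error_rates(eps)
    summary = ", ".join(f"{name}: {rate:.3e}" for name, rate in rates.items())
    print(f"eps = {eps}: {summary}")
```

By these formulas, version 1 improves on the uncorrected system only when $\varepsilon < 1/3$, and version 2 improves on version 1 only when $\varepsilon < 1/5$; for the small error rates of practical interest both conditions hold comfortably.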

Note that the growth rate $r_2 \approx \beta e^{-2G_{se}} \approx \left(\frac{1}{15}\right)^{2/3} \beta\, (\varepsilon_2)^{2/3}$.

Note that each pad encodes a tuple of three bits, and the values of the left pad, the top pad, the middle portion of the right pad, and the middle portion of the bottom pad each depend only on the values of the top portion and the bottom portion of the right pad and the right and left portions of the bottom pad. As such, the tile type depends on only 4 binary bits, and hence only $2^4 = 16$ tile types are required in addition to the initial frames at the bottom and to the right (requiring 5 additional tiles).
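The counting argument can be illustrated by enumerating the computational tiles directly from the update rules in the caption of Figure 3.7. This is a sketch under stated assumptions: the function names are hypothetical, the ordering of portions within each pad follows one reading of the proof text, $\mathrm{OP}_1$ is XOR, and $\mathrm{OP}_2$ is a placeholder that returns its first input (the thesis's own instance is given in Table 3.1).

```python
from itertools import product


def op1(a, b):
    """OP_1, assumed to be XOR."""
    return a ^ b


def op2(a, b):
    """Placeholder for OP_2 (assumption): returns its first input."""
    return a


def t2_tile(u_im2_j, u_im2_jm1, v_im1_jm2, v_i_jm2):
    """Build the pads of T_2(i,j) from its four input bits."""
    v_im1_jm1 = op1(u_im2_jm1, v_im1_jm2)   # V(i-1,j-1)
    u_im1_jm1 = op2(u_im2_jm1, v_im1_jm2)   # U(i-1,j-1)
    v_i_jm1 = op1(u_im1_jm1, v_i_jm2)       # V(i,j-1)
    u_im1_j = op2(u_im2_j, v_im1_jm1)       # U(i-1,j)
    v_i_j = op1(u_im1_j, v_i_jm1)           # V(i,j), the displayed value
    v_im1_j = op1(u_im2_j, v_im1_jm1)       # V(i-1,j)
    return {
        # Portion order (assumed): left/middle/right or top/middle/bottom.
        "bottom": (v_i_jm2, v_im1_jm1, v_im1_jm2),
        "right": (u_im2_j, v_im1_jm1, u_im2_jm1),
        "top": (v_i_jm1, v_im1_j, v_im1_jm1),
        "left": (u_im1_j, v_i_jm1, u_im1_jm1),
        "display": v_i_j,
    }


tiles = [t2_tile(*bits) for bits in product((0, 1), repeat=4)]
distinct = {tuple(sorted(tile.items())) for tile in tiles}
print("distinct computational tile types:", len(distinct))
```

Because the four input bits appear directly in the right and bottom pads, every input combination yields a different tile, which gives the 16 computational tile types regardless of the particular choice of $\mathrm{OP}_2$.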


3.5 Computer Simulation

We first give below the construction of a Sierpinski triangle using our error resilient assembly version 1, and then perform an empirical study of the error rates using computer simulation of the assembly of the Sierpinski triangle, comparing the results with those of Winfree [101].

We show below the construction of a binary counter and a Sierpinski triangle. For each of them we use a total of 12 tiles, including 8 counter tiles and 4 frame tiles, as shown in Figure 3.13 and Figure 3.14.

We would like to again emphasize that although we give the construction of the tiles in

previous sections with each pad having two or three distinct portions, a mismatch on any

portion of a pad results in a total mismatch of the whole pad instead of a partial mismatch of

only that portion. Hence, in Figure 3.14, we use a distinct label for each pad, emphasizing

the wholeness of the pad.

Figure 3.13: The construction of a binary counter using error resilient assemblies version 1. The pads and the tile set are shown on the left and the assembled binary counter is shown on the right. The pads of strength 2 have black borders while the strength 1 pads are border-less. The seed tile is labeled with S. Tiles a, b and c are the other frame tiles.

For the simulation study, we used the Xgrow simulator by Winfree [101] and simulated the assembly of Sierpinski triangles for the following cases:

• assembly without any error correction,
• assembly using Winfree's $2 \times 2$ proofreading tile set,
• assembly using Winfree's $3 \times 3$ proofreading tile set,
• assembly using our error resilient scheme version 1, $T_1$ (construction in Figure 3.14),
• assembly using our error resilient scheme version 2, $T_2$ (construction not shown).

Figure 3.14: The construction of a Sierpinski triangle using error resilient assemblies version 1. The pads and the tile set are shown on the left and the assembled Sierpinski triangle is shown on the right. The pads of strength 2 have black borders while the strength 1 pads are border-less. The seed tile is labeled with S. Tiles a, b, and c are the other frame tiles.

We performed simulations of the assembly process of a target aggregate of $512 \times 512$ tiles. A variable $N$ is defined as the largest number of tiles assembled without any permanent error in the assembly in 50% of all test cases. The variations in the value of $N$ are measured as we increase the value of the probability of a single mismatch in pads ($\epsilon$) by changing the values of $G_{mc}$ and $G_{se}$, where $G_{mc}$ and $G_{se}$ are the free energies [101]. As suggested in [101], the experiments were performed near equilibrium, where $G_{mc} \approx 2G_{se}$, to achieve optimal error rate $\epsilon \approx 2e^{-G_{se}}$.

Figure 3.15 shows the variation in $N$ with $\log \epsilon$. From the figure it can be seen that the performance of our version 1 ($T_1$) construction is comparable to Winfree’s $2 \times 2$ proofreading tile set construction, while our version 2 ($T_2$) performs comparably to Winfree’s $3 \times 3$ proofreading tile set construction.

[Figure 3.15 plot: "Size of error-free aggregate vs Probability of single mismatch". x-axis: log(probability of single mismatch), from -6 to -4; y-axis: size of error-free aggregate, from 0 to 3.5 x 10^4. Curves: no error correction; our T1 construction; Winfree 2x2 construction; our T2 construction; Winfree 3x3 construction.]

Figure 3.15: A graph showing the variation of $N$ vs. increasing value of error (probability of single mismatch) $\epsilon$

3.6 Discussion

In the proofs in this chapter, we require $OP_1$ to be XOR, for concreteness. However, note that our constructions apply to more general Boolean arrays in which $OP_1$ is an input sensitive operator, i.e. the output changes with the change of exactly one input.
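The notion of an input sensitive operator can be illustrated with a small check. The sketch below assumes the reading given in the sentence above (flipping exactly one input always changes the output); the helper name `is_input_sensitive` is hypothetical and not part of the original construction.

```python
from itertools import product


def is_input_sensitive(op, arity=2):
    """Return True if flipping any single input always changes op's output."""
    for bits in product((0, 1), repeat=arity):
        for i in range(arity):
            flipped = list(bits)
            flipped[i] ^= 1  # change exactly one input
            if op(*bits) == op(*flipped):
                return False
    return True


def xor(a, b):
    return a ^ b


def logical_and(a, b):
    return a & b


print(is_input_sensitive(xor))          # XOR: every single-bit flip changes the output
print(is_input_sensitive(logical_and))  # AND: flipping one input of (0, 0) leaves the output at 0
```

Under this reading, XOR qualifies while AND does not.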

Note that $OP_1$ and $OP_2$ are both the function XOR for the example assemblies for the Sierpinsky triangle, but this is not true for the assembly for a binary counter of N bits, since $OP_2$ is the logical AND in that example. It is an open question whether our above error-resilient constructions can be further simplified in the case of special computations, such as the Sierpinsky triangle, where $OP_1$ and $OP_2$ are the same function, such as XOR.

Another open question is to extend the construction into a more general construction such that the error probability can be decreased to $\epsilon^k$ for any given $k$, or alternatively, prove an upper bound for $k$.

Part II:

Nano-Device

Chapter 4

Theoretical Designs and Experimental Construction of Autonomous DNA Walking Devices

A major challenge in nanotechnology is to precisely transport a nanoscale object from one

location on a nanostructure to another location following a designated path. The successful

construction of self-assembled DNA nanostructures provides a solid structural foundation

to meet this challenge. DNA, with its immense information encoding capacity and well-defined Watson-Crick complementarity, has been explored as an excellent building material for nanoconstruction [53, 79]. In particular, recent years have seen remarkable success

in both the construction of self-assembled nanostructures and individual nanomechanical

devices. For example, one and two dimensional DNA lattices have been constructed from

a rich set of branched DNA molecules [2, 35, 44, 49, 73, 102, 106, 107]. These DNA

lattices could provide a platform for embedded DNA nanomechanical devices to perform

the desired transportation. A diverse group of DNA nanomechanical devices have also

been demonstrated. These include DNA nanodevices executing cycles of motions such as open/close [83, 84, 115], extension/contraction [10, 27, 39], and rotation [50, 108], mediated by external environmental changes such as the addition and removal of DNA “fuel”

strands [10, 27, 39, 83, 84, 108, 115] or the change of ionic composition of the solu-

tion [50]. However, these devices are unsuitable for the above challenge for two reasons.

First, they demonstrate only local conformation changes, not progressive motion. Sec-

ondly, they do not move autonomously. Various schemes of autonomous DNA walker

devices based on DNA cleavage and ligation have been explored theoretically but not ex-

perimentally [62]; these were limited to random bidirectional movement. The use of DNA

hybridization as an energy source for autonomous molecular motors has also been pro-

posed [91]. Recent papers report the construction of a non-autonomous DNA biped walker

device [81, 82] and autonomous DNA tweezers [22, 21].

In this Chapter, I take the next important step of designing and constructing DNA

walkers capable of autonomous, unidirectional, linear progressive motion. In particular,

by embedding dangling DNA duplex fragments in self-assembled DNA lattices, we have

designed a suite of walking DNA devices capable of autonomous, programmable, unidi-

rectional motions along linear tracks [110]. The practicality of our designs is partially

supported by our experimental construction of a three-anchorage walking device [113].

The autonomous, unidirectional, along-the-track motion demonstrated by these prototype

systems represents a novel type of motion for DNA based nanomechanical devices. Em-

bedding a walking device of this kind in a DNA lattice would result in a nano-robotics

lattice that can meet the challenge stated above: a nanoscale “walker” that moves au-

tonomously along a designated path over a microscopic structure, serving as a carrier of

information and possibly physical cargo such as nanoparticles.

In the remainder of this chapter, we first present our theoretical designs of three DNA

walking devices in Section 4.1, and then describe a demonstration experimental imple-

mentation of device I in Section 4.2. We conclude with discussions on technical issues in

designing and implementing such devices in Section 4.3.

4.1 Theoretical Designs of Three Autonomous DNA Walking Devices

In this section, we present three designs of autonomous DNA walking devices. Each

device consists of a track and a walker. The track of each device contains a periodic linear

array of anchorage sites. A walker sequentially steps over the anchorages in an autonomous

unidirectional fashion. Each walking device makes use of alternating actions of restriction

enzymes and ligase to achieve unidirectional translational motion. The action of ligase

consumes ATP as energy source. The walking devices described here make the follow-

ing improvements over the walking device presented in [63]. Firstly, they demonstrate

unidirectional motion rather than random bidirectional motion. Secondly, the moving part

(walker) in each walking device is a physical entity with a flexible body size rather than a

symbolic entity, and thus the walker can serve not only as an information carrier but also

as a nanoparticle carrier.

Device I is a short duplex DNA fragment and hence has limited capacity to serve as a

nanoparticle carrier. In contrast, device II allows a much larger body size and hence has

more carrying capacity, but it suffers a (low) risk of falling off the track. Device III com-

bines advantages of device I and device II and results in a two-footed walker with a larger

scale body and zero probability of falling off the track, though it is a more complicated

(hence less practical) construction and assumes a not yet fully-substantiated restriction en-

zyme property. For each walking device, we first present its structure and operation, and

then describe its implementation using conceptual enzymes followed by one or more con-

crete examples using commercially available enzymes. The design using conceptual en-

zymes illustrates the general principle of the design and reveals the essential information

encoding of the device that dictates its operation, while the examples using real enzymes

both validate the practicality of the design principles and illustrate some technical compli-

cations in mapping the conceptual design to real enzymes.

4.1.1 Definitions

A basic structural unit used in the construction of the walking devices is a dangler. A

dangler is a duplex DNA fragment with single strand extensions at both ends: one end is

the fixed end that is usually attached to another structural unit (e.g. the backbone of the

track or the body of the walker); the other end is the sticky end. The flexible single strand

Figure 4.1: (a) Hybridization and melting. (b) Ligation. (c) Restriction

DNA at the fixed end allows the otherwise stiff dangler to move rather freely around the

fixed end. This property is crucial to the operation of the devices. The fixed end only serves

to structurally join a dangler to another component of the device in a flexible fashion (e.g.

the linkage of an anchorage to the backbone of the track or the linkage of a foot to the body of a

walker); the sticky end, in contrast, usually encodes information and participates actively

in dictating the motion of the walker.

Two basic operational events driving the unidirectional motion of the devices are lig-

ations and restrictions. Two neighbouring danglers with complementary sticky ends can

associate with each other via the hybridization of their sticky ends. Subsequent to this hy-

bridization, the nicks at either end of the hybridized section can be sealed by a ligase and

the two duplex fragments are joined into one in a process referred to as ligation. When

the context is clear, the whole process of hybridization and subsequent ligation (joining of

two DNA strands) is referred to as ligation, for simplicity. See Figure 4.1 (a) and (b) for

schematic illustrations of hybridization and ligation, respectively. In restriction, an approx-

imately reverse process to ligation, a duplex DNA fragment is cut into two separate duplex

parts (with each usually possessing a complementary sticky end) by enzymes known as

endonucleases or restriction enzymes. Following restriction, the two duplex DNA fragments (each with a sticky end) can separate in a process known as melting. When the

context is clear, the whole process of restriction and subsequent melting is referred to as

restriction. See Figure 4.1 (c) and (a) for schematic illustrations of restriction and melt-

ing, respectively. Note that melting and hybridization are in dynamic balance as shown

in Figure 4.1 (a). Restriction by an endonuclease usually requires that the substrate DNA

fragment contains a recognition site (a specific DNA sequence) corresponding to the endonuclease and that the restriction happens at a specific restriction site along the DNA fragment.

There is a rich set of restriction enzymes. Figure 4.2 illustrates three types of restriction enzymes. Figure 4.2 (a), (c) and (e) describe the conceptual restriction enzymes that will be used in the construction of our devices. In this figure, one length parameter gives the length of the recognition site in number of bases, and two further length parameters (also in number of bases) dictate the restriction pattern. In Figure 4.2 (a), the sum of the latter two parameters is also a parameter constituting the recognition site: it has to be a specific value for a given restriction enzyme. Figure 4.2 (b), (d) and (f) show examples of corresponding real enzymes. In contrast to restriction, ligation does not require specific recognition sites, but it requires complementary sticky ends from the two parts to be joined together. Restriction uses no energy input from the external environment, while ligation consumes one molecule of ATP as energy source.

4.1.2 Device I

Overview. In device I, the walker, a short DNA fragment, moves along a track unidirec-

tionally in an autonomous fashion. Along the track is a linear array of repeated anchorage

sites, on top of which the walker can be bound. At each step, the walker currently bound to

an anchorage is ligated to another anchorage site immediately downstream along the track,

producing a ligation product that contains a new restriction enzyme recognition site. The

walker is subsequently cut onto the second anchorage, exposing a new sticky end that is

complementary to the sticky end of the next anchorage further down the track. Hence the

walker can move down the track inductively. We describe below the structural design

Figure 4.2: Panels (a), (c) and (e) illustrate conceptual endonucleases used in the construction of the walking devices, with the sequences constituting each recognition site and the length parameters (in number of bases) labeled. Panels (b), (d) and (f) show examples of real restriction enzymes corresponding to (a), (c) and (e), respectively. In panel (f), negative values are used for the two restriction-pattern parameters to differentiate this cutting pattern from that in panel (c). In all the panels, recognition sites and restriction sites are indicated with red (dark) boxes and pairs of red (dark) arrows, respectively. N indicates the position of a base whose value does not affect recognition by an endonuclease.

[Figure 4.3 labels: (a) Backbone, Anchorage (A, B, C, D), Walker; (b) Steps 1a, 1b, 2a, 2b, 3a, 3b, 4a, 4b, each driven by Ligase, Endonuclease I or Endonuclease II]

Figure 4.3: The structural design and step by step operation of the device. (a) Structural design of the device. (b) Step by step movement of the walker

and the operation of the device.

Along the backbone of the track of device I, a linear array of repeated anchorages

are attached in the pattern $(ABCD)^n$, where $A$, $B$, $C$, and $D$ are four distinct types of anchorages. Fig 4.3 (a) illustrates a basic cyclic unit of the track with the walker bound on top of anchorage $A$. Structurally, each anchorage is a DNA dangler and the walker is

a short DNA fragment. The anchorages are spaced in a way such that only neighboring

anchorages can hybridize with each other, provided that they possess complementary sticky

ends.

The walker steps sequentially on top of the anchorages in an autonomous and unidirectional fashion. Denote the walker with $*$; then each anchorage $X$, where $X = A$, $B$, $C$, or $D$, can exist in two states: the blank state, denoted as $X$, and the star state, denoted as $X^*$. Here we abuse the terminology and also call $X^*$ an anchorage. The walker initially resides on top of anchorage $A$ (hence this anchorage is denoted as $A^*$). To make the walker execute cycles of unidirectional motion on top of the anchorages, we implement the following four reactions,

$A^* + B \rightarrow A^*B \rightarrow A + B^*$

$B^* + C \rightarrow B^*C \rightarrow B + C^*$

$C^* + D \rightarrow C^*D \rightarrow C + D^*$

$D^* + A \rightarrow D^*A \rightarrow D + A^*$

where $A^*B$, $B^*C$, $C^*D$ and $D^*A$ are ligation products between the two corresponding anchorages. Each reaction has two phases: in phase a, the star anchorage which binds the walker is ligated to its immediate downstream neighbour (the formation of $A^*B$, $B^*C$, $C^*D$, and $D^*A$); in phase b, the ligation product is cleaved by an endonuclease such that the walker ($*$) is cut to the downstream anchorage. Thus each reaction moves the walker from one anchorage to its immediate downstream neighbour. Putting all four reactions together, we have,

$A^*BCDA \rightarrow AB^*CDA \rightarrow ABC^*DA \rightarrow ABCD^*A \rightarrow ABCDA^*$

This is a full induction cycle of the motion of the walker, and hence the walker can (in principle) move on infinitely. We will further make phase a of each reaction irreversible, as explained later. Thus the whole reaction is irreversible and the walker can move along the track in only one direction. This process is depicted in Fig. 4.3 (b).
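The four reactions can be read as a small state machine in which the star (the walker) is handed from an anchorage to its immediate downstream neighbour. The following is a minimal sketch of that reading only; the function names `step` and `render` are hypothetical, each call to `step` collapses ligation (phase a) and restriction (phase b) into one move, and the reversible idling of phase b is not modelled.

```python
ORDER = "ABCD"  # anchorage types in one period of the track


def render(track, pos):
    """Write the track as a string, marking the anchorage that holds the walker."""
    return "".join(t + ("*" if i == pos else "") for i, t in enumerate(track))


def step(track, pos):
    """Apply one reaction X* + Y -> X*Y -> X + Y* and return the new walker index."""
    if pos + 1 >= len(track):
        raise ValueError("walker is already on the last anchorage")
    here, nxt = track[pos], track[pos + 1]
    # Ligation is only possible with the next type in the cycle A -> B -> C -> D -> A.
    expected = ORDER[(ORDER.index(here) + 1) % len(ORDER)]
    if nxt != expected:
        raise ValueError(f"{here}* cannot be ligated to {nxt}")
    return pos + 1


track = list("ABCDA")
pos = 0  # the walker starts on the first anchorage A
states = [render(track, pos)]
while pos < len(track) - 1:
    pos = step(track, pos)
    states.append(render(track, pos))

print(" -> ".join(states))
```

Running the sketch on the track `ABCDA` reproduces the full induction cycle written above, from `A*BCDA` to `ABCDA*`.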

We next describe how to implement the above reactions. We first describe an imple-

mentation with conceptual restriction enzymes to reveal the general principle of the design;

then we give a concrete molecular implementation with commercially available enzymes

to demonstrate the practicality of the design; finally we discuss some variants of the design

with, again, conceptual restriction enzymes.

Implementation with conceptual enzymes. The above reactions can be implemented with two conceptual enzymes, endonuclease I and endonuclease II, that have cutting patterns similar to the one shown in Figure 4.2 (a). We require that the two enzymes have identical length parameters, i.e. each of the three length parameters of endonuclease I equals the corresponding parameter of endonuclease II.

In Fig 4.2 (a), sequences $\alpha$, $\bar{\alpha}$, $\alpha'$, and $\bar{\alpha}'$ constitute the recognition site for endonuclease I. Sequences $\bar{\alpha}$ and $\bar{\alpha}'$ are complementary to $\alpha$ and $\alpha'$, respectively (the complementary sequence of a sequence $s$ is usually denoted as $\bar{s}$). In real enzymes, $\alpha'$ is usually identical to $\bar{\alpha}^R$, where $s^R$ denotes the reverse sequence of $s$. In other words, the recognition site usually has a palindromic sequence. This often adds additional complication to the implementation as discussed in Section 4.1.3. Recall that to constitute a legal recognition site for endonuclease I, the distance between the sequences $\alpha$ and $\alpha'$ (and hence the distance between $\bar{\alpha}$ and $\bar{\alpha}'$) must equal the sum of the two restriction-pattern length parameters, a value determined by the endonuclease in action. If the distance is incorrect, we say that $\alpha$ and $\alpha'$ are not in frame.

Using two such enzymes, we give an implementation as shown in Fig. 4.4, which

describes the detailed step by step reactions that dictate the motion of the walker. These

steps correspond to those in Figure 4.3 (b). Since only the sequences in the region near

the top of each anchorage are relevant, we only illustrate these sequences in Figure 4.4.

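In Figure 4.4, we require $\alpha \neq \beta$, $\alpha' \neq \beta'$, $x \neq y$ (where $\beta$ and $\beta'$ label the recognition sequences of endonuclease II, and $x$ and $y$ label the sticky ends), and that the recognition sequences are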

in frame if and only if they are labeled with the horizontal bracket. Also recall that two

anchorages can be ligated if and only if they possess complementary sticky ends and are in

neighboring positions. We go through the detailed reaction steps below.

At the start of the reaction, the walker resides at the end of anchorage $A$. In step 1a,

Figure 4.4: Implementation of device I using two conceptual restriction enzymes. Endonuclease recognition sites and restriction sites are indicated with red (dark) boxes and pairs of red (dark) arrows, respectively. The walker (moving part) is shaded with blue (grey)

the sticky end $x$ of $A^*$ is ligated to its complementary sticky end $\bar{x}$ of $B$ ($A^* + B \rightarrow A^*B$); in step 1b, the ligated product $A^*B$ is cut into $A$ and $B^*$ ($A^*B \rightarrow A + B^*$). Now the walker moves from $A$ to $B$, and $B^*$ possesses sticky end $\bar{y}$, which is complementary to the sticky end $y$ of anchorage $C$. Note that the restriction pattern depicted in the figure (corresponding to the recognition site determined by the left $\alpha$ and $\alpha'$) is the only possible one for $A^*B$. In particular, only this $\alpha$ can constitute with $\alpha'$ an endonuclease I recognition site in $A^*B$ in Figure 4.4 (a), as indicated with the pair of red (dark) boxes.

In step 2a, $C$ is ligated with $B^*$ subsequent to the hybridization of their sticky ends, $y$ and $\bar{y}$; in step 2b, $B^*C$ is cut into $B$ and $C^*$, moving the walker to anchorage $C$ while generating the sticky end $x$ of $C^*$. Hence $C^*$ is ready to hybridize with $D$, whose sticky end is $\bar{x}$. Note that now $A$ is in its blank state and has sticky end $\bar{y}$, and its neighbour anchorage $B$ is also in its blank state and possesses sticky end $x$. Since $y \neq x$, $\bar{y}$ and $x$ are not complementary to each other and hence no undesirable ligation between $A$ and $B$ can happen.

The reactions in step 3 and step 4 are similar to the above, and their descriptions are omitted for brevity. Upon finishing step 4, the walker ($*$) moves to an anchorage of type $A$ again, and thus the cycle can go on inductively.

Proof of correctness. To further prove the correctness of the above induction movement,

we will prove: 1) the motor can only move unidirectionally – it cannot go backwards; 2)

the motor never falls off the track; 3) no undesirable ligations or restrictions can happen

other than those described above and in an innocuous idling process described below.

To see the unidirectionality of the motion, note that in each step, phase a is irreversible

since there is neither restriction site of enzyme I nor of enzyme II that could have un-

desirably cut the walker back to the previous anchorage. This effectively establishes the

unidirectionality of the motion of the walker. In contrast, phase b of each step is reversible, i.e. the cleaved walker can be ligated back to the previous anchorage. However, the

re-ligation product is immediately cut such that the walker is severed from the previous

anchorage and returned to the current anchorage. Thus this is only an idling process in the

motion of the walker: it neither reverses nor blocks the unidirectional motion of the walker.

To show the walker never falls off the track, we need to make explicit one assumption in

the model: the presence of nicks in the DNA duplex region spanned by recognition sites of

an endonuclease effectively inhibits the restriction by that endonuclease. This immediately

implies that the cleavage of a hybridization product formed between two anchorages is

conditional on the ligation of the two. Hence the walker cannot be cut from one anchorage

unless it is already bound to another. Therefore, the walker can never fall off the track. We

note that this assumption may only hold under certain experimental conditions. In fact, an

external energy free variation of the DNA device described in a later part of the chapter

requires an opposite assumption.

It is straightforward to see that no undesirable restriction can occur by inspection of

the ligation product formed at each step. To see that no undesirable ligations can occur,

we note the following properties of the device: 1) Without the walker bound at the top,

no two neighboring anchorages possess complementary sticky ends; 2) After the walker

passes an anchorage, the anchorage is restored to its initial state. Also recall that a ligation

can only happen between two neighbouring anchorages with complementary sticky ends.

Thus, before and during the motion of the walker, no undesirable ligations can occur.

Hence, we have proven that the construction in Figure 4.4 results in a walker that moves

along the track in an autonomous unidirectional way.

Implementation with real enzymes. The implementation with conceptual enzymes in

Figure 4.4 can be mapped precisely to an implementation with two real enzymes shown in

Figure 4.5. Here, we use endonuclease PflM I and BstAP I for conceptual endonuclease

Figure 4.5: Real enzymes used in the construction of device I. Endonuclease recognition sites and restriction sites are indicated with red (dark) boxes and pairs of red (dark) arrows, respectively. N indicates the position of a base whose value does not affect recognition by an endonuclease

Table 4.1: Implementation of device I with endonucleases PflM I and BstAP I. Ligation sites and restriction sites are denoted with - and ^, respectively. The bases that determine recognition sites in action are in upper case. The bases constituting the walker fragment are in italic fonts

Step  Reaction          Enzyme    DNA sequences
1a    A* + B -> A*B     Ligase    5' ...ccac can ntg-gtgc... 3'
                                  3' ...ggtg gtn-nac cacg... 5'
1b    A*B -> A + B*     PflM I    5' ...CCAc can^nTG Gtgc... 3'
                                  3' ...GGTg^gtn nAC Cacg... 5'
2a    B* + C -> B*C     Ligase    5' ...gcag can-ntg gtgc... 3'
                                  3' ...cgtc-gtn nac cacg... 5'
2b    B*C -> B + C*     BstAP I   5' ...gcaG CAn ntg^gTGC... 3'
                                  3' ...cgtC GTn^nac cACG... 5'
3a    C* + D -> C*D     Ligase    5' ...gcag can ntg-ctgg... 3'
                                  3' ...cgtc gtn-nac gacc... 5'
3b    C*D -> C + D*     BstAP I   5' ...GCAg can^nTG Ctgg... 3'
                                  3' ...CGTc^gtn nAC Gacc... 5'
4a    D* + A -> D*A     Ligase    5' ...ccac can-ntg ctgg... 3'
                                  3' ...ggtg-gtn nac gacc... 5'
4b    D*A -> D + A*     PflM I    5' ...ccaC CAn ntg^cTGG... 3'
                                  3' ...ggtG GTn^nac gACC... 5'

I and II, respectively. The reactions are shown in Table 4.1. In this construction, the walker ($*$) is the DNA fragment,

5'-NTG-3'
3'-GTN-5'

Due to the limited space between the recognition sequences of the real endonucleases (5 bases), we have the recognition sequences of PflM I and BstAP I each overlapped with the walker sequence by two bases in the implementation in Table 4.1. This technique of overlapping bases to reduce encoding space is a useful technique in DNA robotics device design. Note that, in this case, the overlapping of the two bases requires that the recognition sequences of PflM I and BstAP I have four identical base pairs (the inner four).
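The in-frame requirement can be checked mechanically on the top strands of the four ligation products listed in Table 4.1. The sketch below is an illustration under stated assumptions rather than part of the original design: the recognition patterns CCANNNNNTGG (PflM I) and GCANNNNNTGC (BstAP I), each cut after the seventh top-strand base, are read off the upper-case bases and the ^ marks in the table, and the helper names `find_sites` and `cuts` are hypothetical.

```python
# Recognition pattern (N = any base) and number of top-strand bases
# from the start of the site to the cut, as read off Table 4.1.
ENZYMES = {
    "PflM I": ("CCANNNNNTGG", 7),
    "BstAP I": ("GCANNNNNTGC", 7),
}

# Top strands (5'->3') of the ligation products of steps 1 to 4, spaces removed.
PRODUCTS = {
    "1b": "ccaccanntggtgc",
    "2b": "gcagcanntggtgc",
    "3b": "gcagcanntgctgg",
    "4b": "ccaccanntgctgg",
}


def find_sites(seq, site):
    """Return the start indices at which `site` matches `seq`."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        # A pattern N matches any base; an unspecified base 'N' in the
        # sequence is accepted only under a pattern N (conservative choice).
        if all(p == "N" or p == b for p, b in zip(site, window)):
            hits.append(i)
    return hits


def cuts(seq):
    """List (enzyme, top-strand cut position) for every site found in `seq`."""
    found = []
    for name, (site, offset) in ENZYMES.items():
        for start in find_sites(seq, site):
            found.append((name, start + offset))
    return found


for step_name, seq in PRODUCTS.items():
    for enzyme, cut in cuts(seq):
        print(step_name, enzyme, seq[:cut] + "^" + seq[cut:])
```

Under these assumptions each ligation product contains exactly one in-frame recognition site, belonging to the enzyme named in the table, and the computed top-strand cut falls at the position marked with ^ in the corresponding row.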

External energy free design. Two ATPs are consumed at each step as energy source for

the motion of the device. The energy provided by ATP is stored in the phosphodiester

bonds of DNA created by the ligation. This energy is later released into the system during

the hydrolysis of the phosphodiester bonds of DNA by endonucleases.

A variant of the ATP driven device I is an autonomous uni-directional DNA device fu-

eled by energy released from hydrolysis of phosphodiester bonds of the DNA molecules.

The design is in the same spirit of an energy free computational device constructed by

Shapiro’s group [12]. It is based on the assumption that, at a sufficiently low temperature,

the presence of nicks in the neighborhood of the recognition sequence of an endonuclease

does not inhibit the cleavage of DNA by that endonuclease. The above property has been

observed in the IIS family enzyme Fok I at a low temperature [12]. The immediate conse-

quence is that a hybridization complex can be cut without being ligated. Also observe that

the energy released by hydrolysis of the phosphodiester bonds of a DNA strand is sufficient

to power the restriction by an endonuclease.

Based on the above assumption and observation, we describe below an autonomous

unidirectional DNA device that requires no external energy supply. However, instead of

moving a physical entity (a DNA fragment) down a track, the external-energy-free device

induces cleavages of its anchorages sequentially and thus moves a signal or a symbol down

the track (the cleaved and hence shortened status of the anchorages).

The structure of this device is very similar to that of the device I. The track contains

a linear array of periodic anchorages. On top of each anchorage, except for the first one,

resides a star fragment, which will be cleaved and diffuse away. At each step, one star frag-

ment free anchorage hybridizes with its immediate neighbour down the track that hosts a

star fragment, creating an endonuclease recognition site. The star fragment is cut away by

the ensuing restriction of the hybridization product from the latter anchorage, and subse-

quently diffuses away. Note that no ligation occurs and hence no energy is consumed. Now

the latter star fragment free anchorage has a sticky end complementary to the sticky end of

its downstream star fragment carrying neighbour and the hybridization between these two

creates a new restriction site. Hence the reaction goes on.

Denote the star fragment as © , then each anchorage � , where � W]F , < , ² , or � ,

can exist in two states, the blank state, denoted as � , and the star state, denoted as �·¯ .Initially, the configuration of the array of anchorages is

F < ¯ ² ¯ � ¯ ,-F ¯ < ¯ ² ¯ � ¯ 5 RWe will implement the following reactions,

F ( < ¯ y F ¯ <my F ( ©­( <

<�(%² ¯ y < ¯ ²�y <�( © (%²² (�� ¯ y ² ¯ � y ² ( © (��� ( F ¯ y � ¯ F�y �d( © ( F

92

Page 112: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

The first part of each reaction is hybridization and the second part is restriction. Note that $A^*B$, $B^*C$, $C^*D$ and $D^*A$ represent the hybridization products between the two corresponding anchorages, not ligation products as in the ATP-driven device I. Applying the above reactions to the track of anchorages, we get

$$A\,B^*C^*D^*A^* \to A\,B\,C^*D^*A^* \to A\,B\,C\,D^*A^* \to A\,B\,C\,D\,A^* \to A\,B\,C\,D\,A$$

This finishes a full induction cycle, and the reaction can go on down the track infinitely.
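The propagation of the cleaved state can be illustrated with a short simulation. This is only a sketch under simplifying assumptions: each hybridization and restriction pair is collapsed into one deterministic step, and the names `render` and `cleavage_wave` are hypothetical helpers introduced here, not part of the design.

```python
def render(names, starred):
    """Write a track such as 'A B* C* D* A*' ('*' marks a star fragment)."""
    return " ".join(n + ("*" if s else "") for n, s in zip(names, starred))


def cleavage_wave(names):
    """Return every configuration visited as cleavage propagates down the track."""
    # Initially every anchorage except the first carries a star fragment.
    starred = [False] + [True] * (len(names) - 1)
    history = [render(names, starred)]
    for i in range(len(names) - 1):
        # Blank anchorage i hybridizes with its starred neighbour i + 1;
        # restriction then releases the star fragment, which diffuses away.
        starred[i + 1] = False
        history.append(render(names, starred))
    return history


if __name__ == "__main__":
    for config in cleavage_wave(list("ABCDA")):
        print(config)
```

Run on the five anchorages A, B, C, D, A, the sketch lists the same five configurations as the sequence above.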

We give a molecular implementation of the device with real enzymes in Table 4.2.

Table 4.2: Implementation of the external energy free variant of device I with endonucleases PflM I and BstAP I. Restriction sites and nicks are denoted with ^ and |, respectively. The bases that determine recognition sites in action are in upper case. The bases of the star fragment that diffuses away subsequent to the restriction are in italic fonts.

Reactions            Enzymes    DNA Sequences
A + B* → A*B         N.A.       5' ...gcag can | ntggtgc... 3'
                                3' ...cgtc | GTn naccacg... 5'
A*B → A + ∗ + B      BstAP I    5' ...gcaGCAn | ntg ^gTGC... 3'
                                3' ...cgtC | GTn ^naccACG... 5'
C* + B → C*B         N.A.       5' ...ccagcan ntg | gtgc... 3'
                                3' ...ggtcgtn | nac cacg... 5'
C*B → C + ∗ + B      PflM I     5' ...CCAgcan^ nTG | Gtgc... 3'
                                3' ...GGTc^ gtn | nACCacg... 5'
C + D* → C*D         N.A.       5' ...ccag can | ntgctgc... 3'
                                3' ...ggtc | gtn nacgacg... 5'
C*D → C + ∗ + D      BstAP I    5' ...ccaGCAn | ntg ^cTGC... 3'
                                3' ...ggtC | GTn ^nacgACG... 5'
A* + D → A*D         N.A.       5' ...gcagcan ntg | ctgc... 3'
                                3' ...cgtcgtn | nac gacg... 5'
A*D → A + ∗ + D      PflM I     5' ...GCAgcan^ nTG | Ctgc... 3'
                                3' ...CGTc^ gtn | nACGacg... 5'

Binomial distribution of the walker. The positions of the walker along an infinite

track are binomially distributed at any time point. Let $p$ denote the probability of the walker moving forward from an anchorage to the next anchorage in one time unit, and hence $1 - p$ is the probability that it stays at this anchorage in one time unit. Note that once the walker moves forward, it cannot come back. At time $t$, the probability that the walker is at the $k$-th anchorage is $0$ for $k > t + 1$; for $k \le t + 1$, the probability is

$$\binom{t}{k-1}\, p^{k-1} (1 - p)^{t-k+1}.$$


If there are a finite number $n$ of anchorages, then when $t > n$, the probability that the device is at the $n$-th anchorage is

$$\sum_{k=n}^{t+1} \binom{t}{k-1}\, p^{k-1} (1 - p)^{t-k+1}.$$

As $t \to \infty$, this probability approaches 1. In other words, the walker is destined to reach the end of the track if allowed sufficiently long time.
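The two probabilities above can be checked numerically. The sketch below assumes that the walker starts on the first anchorage at time 0 and makes one independent forward attempt per time unit; the function names `position_pmf` and `prob_at_end` are introduced here purely for illustration.

```python
from math import comb


def position_pmf(t, p):
    """Probability of standing on anchorage k (k = 1 .. t + 1) at time t
    on an infinite track, with forward-step probability p per time unit."""
    return {
        k: comb(t, k - 1) * p ** (k - 1) * (1 - p) ** (t - k + 1)
        for k in range(1, t + 2)
    }


def prob_at_end(t, p, n):
    """Probability of having reached the last anchorage n of a finite track
    by time t: the infinite-track probability mass at positions k >= n."""
    return sum(
        comb(t, k - 1) * p ** (k - 1) * (1 - p) ** (t - k + 1)
        for k in range(n, t + 2)
    )


if __name__ == "__main__":
    for t in (5, 20, 80, 200):
        print(t, prob_at_end(t, p=0.3, n=5))
```

With a fixed $p$ and $n$, the printed values increase towards 1 as $t$ grows, which is the limiting behaviour stated above.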

4.1.3 Device II

Design overview. Device II consists of two parts: the track and the walker. The walker is

the moving part of the device while the track is the immobile part along which the walker

moves. Figure 4.6 (a) gives a schematic drawing of the structure of device II. The track

contains a linear array of anchorages, $A$ and $B$. Each anchorage is a duplex DNA fragment

with a sticky end on the top, and rigidly attached to the backbone of the track. The walker

stands on top of the track. The walker consists of two parts, the body and the feet (a

front foot $C$ and a hind foot $D$). The body is a duplex DNA segment and each foot is

a DNA dangler tethered to the body via a flexible single strand DNA joint. The flexible

joint allows a foot of the walker to rove to and only to the two anchorages immediately

neighbouring the current anchorage which it has been standing on. The sticky end of a foot

is complementary to the sticky end of the anchorage which it is standing on and hence the

foot can hybridize with and be ligated with the anchorage. The ligation product between

a foot and an anchorage will be cut by an endonuclease such that both the foot and the

anchorage change their sticky ends. As a result, the foot will possess a sticky end that

is complementary to the sticky end of the anchorage immediately ahead of the anchorage

$X$ which the foot has been standing on, but not complementary to the sticky end of the

anchorage immediately behind $X$. Consequently, the foot can only hybridize with and be

ligated with the anchorage immediately ahead of $X$, but not to the one immediately behind

it. This guarantees the forward motion of the walker. The motion of the walker is described


in more detail below (Figure 4.6 (b)).

A foot or an anchorage $X$ can exist in two forms, $X$ and $X^*$, where $X = A$, $B$, $C$ and $D$. $X^*$ is derived from $X$ by altering its sticky end. $X$ and $X^*$ are required to satisfy certain properties that will be described later. At any moment during the motion, the track in front of the front foot $C$ and behind the hind foot $D$ consists of alternating danglers $A$ and $B^*$ while the track between them consists of alternating $A^*$ and $B$. Assume w.l.o.g. that at the start of the motion, both feet $C$ and $D$ are ligated to anchorages of type $A$, forming $A^*C$ and $A^*D$ respectively. Thus the initial configuration of the walker and track complex can be written as

$$(AB^*)^i\,[A^*D]\,B\,(A^*B)^j\,[A^*C]\,B^*\,(AB^*)^k,$$

where $[A^*C]$ (resp. $[A^*D]$) is the complex between anchorage $A^*$ and the front foot $C$ (resp. hind foot $D$). To make the walker move unidirectionally down the track, we implement the following reactions between a foot and an anchorage,

$$A + C^* \to A^*C \to A^* + C$$

$$B^* + C \to B^*C \to B + C^*$$

$$A^* + D \to A^*D \to A + D^*$$

$$B + D^* \to B^*D \to B^* + D$$

In phase 1 of each reaction, a foot is ligated to an anchorage; in phase 2, the foot and the anchorage are cut apart by a restriction enzyme, each now possessing a new sticky end. Applying the reactions to the walker-track complex, we have the following motion of the walker along the track,

$$(AB^*)^i\,[A^*D]\,B\,(A^*B)^j\,[A^*C]\,B^*\,(AB^*)^k \to (AB^*)^i\,A\,[B^*D]\,(A^*B)^j\,A^*\,[B^*C]\,(AB^*)^k$$


[Figure 4.6 panel labels: walker, track, joint, front foot, hind foot, anchorage, moving direction; Steps 1a-1c and 2a-2c showing ligation, cut, and rove forward.]

Figure 4.6: The structural design and step by step operation of device II. (a) Structural design of the device. (b) Step by step operation of the device.


$$\to (AB^*)^{i+1}\,[A^*D]\,B\,(A^*B)^j\,[A^*C]\,B^*\,(AB^*)^{k-1}.$$

The above is a full induction cycle of the motion of the walker, and hence the walker can (in principle) move forward along the track infinitely. We further require that phase 1 of each reaction is not reversible, thus the whole reaction is irreversible. Consequently, the walker can move along the track in only one direction.
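The rewriting of the track by the two feet can be traced with a small simulation. This is a sketch only: it assumes every reaction runs forward with no idling, treats each restriction and the following ligation as one step, and uses the hypothetical helpers `render` and `advance` introduced here for illustration.

```python
def render(kinds, star, feet):
    """Write the track; '[A*C]' marks an anchorage ligated to foot C."""
    at = {pos: foot for foot, pos in feet.items()}
    cells = []
    for i, kind in enumerate(kinds):
        if i in at:
            cells.append(f"[{kind}*{at[i]}]")
        else:
            cells.append(kind + ("*" if star[i] else ""))
    return " ".join(cells)


def advance(kinds, star, feet):
    """Move each foot one anchorage forward (restriction, then ligation)."""
    for foot, pos in list(feet.items()):
        nxt = pos + 1
        if nxt >= len(kinds) or nxt in feet.values():
            raise ValueError("no free anchorage ahead of foot " + foot)
        # Sticky end the next anchorage must carry for the foot to ligate:
        # C needs B* or A; D needs B or A* (read off the four reactions).
        needed = (kinds[nxt] == "B") if foot == "C" else (kinds[nxt] == "A")
        if star[nxt] != needed:
            raise ValueError("sticky ends are not complementary")
        # State in which restriction leaves the anchorage being vacated:
        # C leaves A* or B behind; D leaves A or B* behind.
        star[pos] = (kinds[pos] == "A") if foot == "C" else (kinds[pos] == "B")
        feet[foot] = nxt


if __name__ == "__main__":
    # Initial complex (AB*)^1 [A*D] B (A*B)^1 [A*C] B* (AB*)^1
    kinds = list("ABABABABAB")
    star = [False, True, False, False, True, False, False, True, False, True]
    feet = {"D": 2, "C": 6}
    print(render(kinds, star, feet))
    for _ in range(2):
        advance(kinds, star, feet)
        print(render(kinds, star, feet))
```

Starting from the configuration with $i = j = k = 1$, two calls to `advance` reproduce the induction cycle above: the prefix grows to $(AB^*)^2$ and the suffix shrinks to $(AB^*)^0$.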

There is a nice duality between front foot $C$ and hind foot $D$. In the process of the motion, front foot $C$ changes the configuration of the track from $AB^*$ to $A^*B$; hind foot $D$ moves on the modified track and restores it to its original configuration $AB^*$.

Implementation with conceptual endonucleases. To implement the designed reactions,

we use four conceptual enzymes b� , b:8 , b�A and b�٠. The cutting patterns of these en-

zymes are similar to the one depicted in Figure 4.2 (a). Here we require that ����±��S�:W� ÿ�± �ÆÿÚW>�·!H± �g!�Wm� ± � , where � ¬ and � ¬ are the length parameters for endonuclease

b´� . Figure 4.7 describes the detailed step by step reactions that dictate the motion of the

walker. Since only the region near the end of an anchorage or a foot is relevant for the

reactions, we only depict the end regions in Figure 4.7.

Figure 4.7 (a) depicts reaction $A + C^* \to A^*C \to A^* + C$. In this reaction, the sticky end $\bar{u}$ of anchorage $A$ is first ligated to the sticky end $u$ (complementary to $\bar{u}$) of foot $C^*$, generating ligation product $A^*C$. This corresponds to the reaction of the front foot in Step 1a in Figure 4.6 (b): $A + C^* \to A^*C$. $A^*C$ contains a recognition site for endonuclease $E_1$ and is cut by $E_1$ into $A^*$ and $C$ (Step 1b in Figure 4.6 (b): $A^*C \to A^* + C$). Note that now front foot $C$ possesses a new sticky end $\bar{u}$. Recall that the anchorage immediately ahead of the anchorage $A^*$, which front foot $C$ is standing on, is anchorage $B^*$. $B^*$ possesses a sticky end $u$ (complementary to $\bar{u}$). Thus $C$ can rove forward and hybridize with $B^*$ (Step 1c in Figure 4.6 (b)). This brings us to the reaction depicted in Figure 4.7 (b): $B^* + C \to B^*C \to B + C^*$. First, the hybridization product between $B^*$ and $C$ is ligated to form $B^*C$


Figure 4.7: Implementation of device II using four conceptual restriction enzymes. Endonuclease recognition sites and restriction sites are indicated with red (dark) boxes and pairs of red (dark) arrows, respectively.


(Step 2a in Figure 4.6 (b): $B^* + C \to B^*C$). This ligation product is subsequently cut into $B$ and $C^*$ by endonuclease $E_2$ (Step 2b in Figure 4.6 (b): $B^*C \to B + C^*$). Now front foot $C^*$ possesses sticky end $u$, and hence it will rove forward and hybridize with anchorage $A$ down the track (Step 2c in Figure 4.6 (b)). This completes a full induction cycle for the front foot.

Note that the reaction $A + C^* \to A^*C$ is irreversible: there is no restriction enzyme that can cut $A^*C$ back into $A$ and $C^*$. This effectively establishes the irreversibility of the motion of foot $C$. However, we note that after $A^*C$ is cut into $A^*$ and $C$, the two can be religated into $A^*C$ (which is subsequently cut back into $A^*$ and $C$). This represents an idling step in the motion of the walker. Similar analysis applies to the reaction $B^* + C \to B^*C \to B + C^*$.

The motion of hind foot $D$ is similar to the motion of front foot $C$ and we omit its detailed

description for brevity.

Using an overlay technique, we can reduce the number of restriction enzymes to two

(Figure 4.8). The basic idea is to use b¯� and b:8 (in a “complementary reverse” fashion) in

place of b�Ù and b:A , respectively. However, in this construction, we need to put a further

restriction that � �W E � � � and 8��W E8 � � , where E � � � (resp. E8 � � ) is the reverse of E � � (resp.

E8 � ). In other words, neither of endonucleases b¯� and b�8 can have palindromic recognition

site. Otherwise, there would be additional idling processes: < ¯ ² can also be cut by bÌ�into < ¯ ('² ; similarly, < ¯ � can be cut by b�8 into < (½� ¯ . However, these reactions

would only count as idling reactions: the unidirectional motion of the walker can neither

be reversed nor blocked. Note that the non-palindromic assumption generally does not

hold for real endonucleases.

Molecular implementation using real enzymes. We give two implementations with real

enzymes. The first one is a direct mapping of the implementation using the conceptual


Figure 4.8: Construction of device II using two conceptual restriction enzymes


Figure 4.9: Real enzymes used in the construction of device II. Endonuclease recognition sites and restriction sites are indicated with red (dark) boxes and pairs of red (dark) arrows, respectively. N indicates the position of a base whose value does not affect recognition by an endonuclease.

Table 4.3: Implementation of device II with endonucleases Ahd I, Fnu4H I, ScrF I and Xcm I. Ligation sites and restriction sites are denoted with - and ^, respectively. The bases that determine recognition sites in action are in upper case.

Reactions          Enzymes    DNA Sequences
A + C* → A*C       Ligase     5' ...gaccc-ngcgtc... 3'
                              3' ...ctgggn-cgcag... 5'
A*C → A* + C       Ahd I      5' ...GACcc n^gcGTC... 3'
                              3' ...CTGgg^n cgCAG... 5'
B* + C → B*C       Ligase     5' ...ccanngcn-gcgtc... 3'
                              3' ...ggtnncg-ncgcag... 5'
B*C → B + C*       Fnu4H I    5' ...ccannGC^n GCgtc... 3'
                              3' ...ggtnnCG n^CGcag... 5'
A* + D → A*D       Ligase     5' ...gacccn-ggnntgg... 3'
                              3' ...ctggg-nccnnacc... 5'
A*D → A + D*       ScrF I     5' ...gacCC^n GGnntgg... 3'
                              3' ...ctgGG n^CCnnacc... 5'
B + D* → B*D       Ligase     5' ...ccanngc-nggnntgg... 3'
                              3' ...ggtnncgn-ccnnacc... 5'
B*D → B* + D       Xcm I      5' ...CCAnngc n^ggnnTGG... 3'
                              3' ...GGTnncg^n ccnnACC... 5'
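As a sanity check on Table 4.3, one can verify that each ligation product contains the recognition site of the enzyme that is supposed to cut it. The sketch below is illustrative only: the site patterns in `SITES` are read off the upper-case bases in the table (N standing for any base), the top-strand sequences are the ligase rows with the ligation mark removed, and `has_site` is a hypothetical helper that treats an `n` in the designed sequence as a free choice that could match.

```python
# Recognition patterns, read off the upper-case bases in Table 4.3.
SITES = {
    "Ahd I": "GACNNNNNGTC",
    "Fnu4H I": "GCNGC",
    "ScrF I": "CCNGG",
    "Xcm I": "CCANNNNNNNNNTGG",
}

# Top strands of the four ligation products (ligase rows, '-' removed).
PRODUCTS = {
    "A*C": ("Ahd I", "gacccngcgtc"),
    "B*C": ("Fnu4H I", "ccanngcngcgtc"),
    "A*D": ("ScrF I", "gacccnggnntgg"),
    "B*D": ("Xcm I", "ccanngcnggnntgg"),
}


def has_site(site, seq):
    """True if `site` can occur in `seq`; N on either side matches anything."""
    site, seq = site.upper(), seq.upper()
    return any(
        all(s == "N" or q == "N" or s == q
            for s, q in zip(site, seq[i:i + len(site)]))
        for i in range(len(seq) - len(site) + 1)
    )


if __name__ == "__main__":
    for product, (enzyme, seq) in PRODUCTS.items():
        print(product, enzyme, has_site(SITES[enzyme], seq))
```

The check only covers the top strand and does not model cut positions; it confirms that the designed sequences are long enough and carry the fixed bases each site requires.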


Table 4.4: Implementation of device II with endonucleases Aci I, Hha I and Drd I. Ligation sites and restriction sites are denoted with - and ^, respectively. The bases that determine recognition sites in action are in upper case.

Reactions          Enzymes    DNA Sequences
A + C* → A*C       Ligase     5' ...gacnccg-c... 3'
                              3' ...ctgng-gcg... 5'
A*C → A* + C       Aci I      5' ...gacnC^CG C... 3'
                              3' ...ctgnG GC^G... 5'
B* + C → B*C       Ligase     5' ...c-cgc... 3'
                              3' ...cgc-g... 5'
B*C → B + C*       Hha I      5' ...G CG^C... 3'
                              3' ...C^GC G... 5'
A* + D → A*D       Ligase     5' ...gacnc-cggngtc... 3'
                              3' ...ctgnggc-cncag... 5'
A*D → A + D*       Drd I      5' ...GACnc cg^gnGTC... 3'
                              3' ...CTGng^gc cnCAG... 5'
B + D* → B*D       Ligase     5' ...gcg-gngtc... 3'
                              3' ...c-gccncag... 5'
B*D → B* + D       Aci I      5' ...G^CG Gngtc... 3'
                              3' ...C GC^Cncag... 5'

enzymes in Figure 4.7. The real enzymes used are shown in Figure 4.9 (a). Here, real endonucleases Ahd I, Fnu4H I, ScrF I and Xcm I correspond to conceptual endonucleases $E_1$, $E_2$, $E_3$ and $E_4$, respectively. The reactions are shown in Table 4.3 in a compact style.

The second implementation reduces the number of endonucleases to three by using a

non-palindromic endonuclease (Aci I) and its slightly more involved construction is shown

in Table 4.4. The real enzymes used are shown in Figure 4.9 (b). Note that Aci I shown in

Figure 4.9 (b) is the same as the Aci I shown in Figure 4.2 (d): the latter figure is obtained

by rotating the former one 180 degrees.

Processivity of device II. A key technical issue in the construction of device II is to assure

that the walker is constrained to stay on or near the track. An isolated foot $C$ or $D$ would

easily fall off the track and diffuse away. However, we can reduce the falling-off probability

by constructing a multi-footed walker. Instead of possessing only two feet as in Figure 4.6,

the walker has an array of alternating $C$ and $D$ feet. The feet are attached to a common

backbone: if the backbone does not move then the feet have freedom to move up and down

the track by one unit only. The walker is held to the track by multiple bonds - even if none


are ligated (so all bonds are weak 1- or 2-base hydrogen bonds) then the probability of

detachment is small. This is precisely what is needed - feet are held in the right place with

the right amount of freedom to move - it introduces the constraint that no foot can move

more than two anchorages forward until all feet have moved at least one anchorage.

4.1.4 Device III

Overview. A potential problem of device II is that it may fall off the track. Though a

walker with more feet has a lower probability of falling off, as argued above, we cannot

completely eliminate such risk. In contrast, the device we describe next is guaranteed to

stay on the track, though it has a more complicated (hence less practical) construction and

assumes a restriction enzyme property that has not yet been fully substantiated. In device

III, a two-footed walker steps over the anchorages along a track unidirectionally. The

design of device III is based on the following principle: the lifting of one foot off the track

is conditional on the attachment (ligation) of the other foot to the track. This attachment

principle can ensure that at any moment, at least one foot of the walker is attached to the

track. We describe the structure and step by step operation of device III below.

The track and the walker are depicted in Figure 4.10 (a). As in device II, the track

contains a linear array of anchorages. But the anchorages in device III are different. As

depicted, each anchorage is a duplex DNA fragment with single strand DNA overhangs

at both ends and its midpoint is tethered to the backbone of the track via single strand

DNA. Thus the anchorage is like a two-ended dangler. In addition, between every two

neighboring anchorages is tethered another dangler, referred to as a switch. As we shall

see below, the alternating arrangement of anchorages and switches is used to construct

a signaling mechanism which ensures the unidirectional and non-falling-off-track motion

of the walker. The anchorages and switches are denoted as Ð ¬ and % ¬ respectively, where

�4W � `h8ö`CA"`������c`C' . A switch % ¬ can only be ligated to its immediate anchorage neighbours


Figure 4.10: The structural design and step by step operation of device III. (a) Structural design of the device. (b) Step by step operation of the device.


Ð ¬ � � and Ð ¬ . The upper ends of Ð are of type ²T¯ , and the lower end of Ð ¬ is of type F�¯and < ¯ for odd and even � -s, respectively. Note that since an anchorage is tethered to the

backbone of the track via single strand DNA, the upper and lower ends of an anchorage

can not be held constantly in upper and lower positions – we just denote the ² ¯ type end

as upper end the F ¯ / <R¯ type end as lower end for ease of exposition. In fact, we shall see

that we do not need to fix the relative upper and lower positions of the ends for the valid

operation of device III.

The walker consists of two danglers connected with a single strand DNA . The two

danglers serve as the feet of the walker and are denoted as D�� and D ! . The ends of both

Dì� and Dx! are of type ² . The walker stands on top of the upper ends of the anchorages

and walks down the track unidirectionally, with the switch/anchorage complex of the road

serving both as attaching points and as a signal transducing device to dictate the lifting

and attaching of its feet in an alternating fashion such that it never falls off the track. In

particular, at any point, if one foot is attached to anchorage Ð ¬ , the other foot can only be

attached to Ð ¬ ’s immediate neighbours, Ð ¬ � � and Ð ¬#" � .The ends of the feet of the walker, of the anchorages and of the switches have the

following properties:

1. The complementary end pairs are: ,�F:`CFW¯h5 , ,-F�`�<R¯�5 , ,-<R¯Æ`CF°¯h5 , ,_<¯`�<R¯�5 , and ,k²J`n²V¯�5 .Two danglers with these complementary ends can be ligated.

2. The formation of a $CC^*$ ligation product at the upper end of the anchorage introduces a recognition site on the anchorage for endonuclease $E_3$. Endonuclease $E_3$ has a cleavage pattern similar to the one depicted in Figure 4.2 (b). This results in a cleavage at the other end of the anchorage such that the anchorage is cut from the switch currently ligated to it (if there is one). Similarly, the formation of $A^*A$ (resp. $B^*B$) at the lower end of the anchorage will produce a recognition site on the anchorage for endonuclease $E_1$ (resp. $E_2$) and this will result in the cleavage of $CC^*$ at the upper end of the anchorage if there is a foot end $C$ ligated to $C^*$.

We will next see how these properties guarantee the desired motion of the walker as we

go through a step by step description of the walker’s motion.

Step by step motion. Now we describe the four steps of the walker’s motion that complete

a full induction cycle. Initially, the walker and track complex is assembled in such a way

that the feet D4� and Dx! of the walker are ligated with anchorages Ðï� and Ð?! , respectively;

each switch % ¬ is ligated to the lower end of Ð ¬ , forming <�F ¯ for odd � and F < ¯ for even

� . Note that <�F ¯ and F < ¯ are different.

Step 0. Upon introduction of enzymes into the system, switches %ì� and %�! are cut from

anchorages Ðx� and Ð?! respectively, since the ²V² ¯ sequences at the upper ends of Ð � and Ð?!constitute endonuclease b:A recognition sites and result in cleavages at the lower ends of

Ðû� and Ð�! . Now %�! (with end F ) can explore its neighbouring space and be ligated to either

Ðû� (with end F ¯ ) or Ð�! (with end <R¯ ), since ,-F:`eF�¯�5 and ,-F�`�<R¯�5 both are compatible end

pairs. Ligation between %ð! and Ð?! is a just an idling step, since the ligation product will be

subsequently cut open again, while ligation of %ð! and Ðû� brings the system to Step 1.

Step 1. The ligation of %6! (with end F ) and Ðx� (with end F ¯ ) introduces a recognition

site for b� , and results in the restriction of D�� from the upper end of � . Note that the

ligation product between the lower end of Ðï� and %�! contains recognition site ( FÚF ¯ ) for

endonuclease b¯� while the ligation product between foot DH� and the upper end of Ð � con-

tains recognition site ( ²V² ¯ ) for endonuclease b:A . As such, both bÌ� and b�A will compete

to perform restriction on the common ligation product. (See Figure 4.11 (a) for detail.) It

is possible that endonuclease b:A cuts switch %û! away from anchorage Ðx� , resulting in an

idling step. However, there must also be non-zero probability that endonuclease b� cuts

foot Dì� away from anchorage Ð � , advancing the system to Step 2.

Step 2. Now foot D4� has free end ² and can swing around the ligation product between


foot D ! and anchorage Ð6! and get ligated with the upper end ²T¯ of anchorage Ð . Note

that now foot Da� is in front of foot D ! . The ligation of ²V² ¯ subsequently results in the

restriction of % from Ð .Step 3. Switch % has free end < and is ligated to the < ¯ end of anchorage Ð6! , and the

newly formed recognition site <:<H¯ leads to the action of endonuclease b�8 and results in

the cleavage between foot Dï! and anchorage Ð6! .Step 4. Foot D ! swings in front of foot Da� and is ligated with anchorage Ð�ÿ , resulting

in the cleavage of switch %�ÿ from the lower end of anchorage Ð�ÿ .Upon completion of Step 4, the walker has moved from anchorages Ð � and Ð�! to anchor-

ages Ð and ÐÕÿ . This finishes a full inductional cycle, and hence the walker can continue

moving down the track.
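The attachment principle behind these steps can be summarized as a small state machine. The sketch below is a deliberate simplification: it ignores the switches, the competing restrictions and all idling steps, numbers the anchorages 1 to n, and names the feet `F1` and `F2`; the function `walk` is hypothetical and only records which foot is ligated to which anchorage after every restriction and ligation.

```python
def walk(n):
    """Trace foot attachments as a two-footed walker steps along n anchorages.

    Each state maps an attached foot to the index of its anchorage.
    """
    attached = {"F1": 1, "F2": 2}
    trace = [dict(attached)]
    while max(attached.values()) < n:
        # The hind foot is only released while the other foot is ligated.
        assert len(attached) == 2
        hind = min(attached, key=attached.get)
        front_pos = max(attached.values())
        del attached[hind]                # restriction lifts the hind foot
        trace.append(dict(attached))
        attached[hind] = front_pos + 1    # it swings ahead and is ligated
        trace.append(dict(attached))
    return trace


if __name__ == "__main__":
    for state in walk(4):
        print(state)
```

For a track of four anchorages the trace ends with the feet on the third and fourth anchorages, and every intermediate state has at least one foot attached.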

Correctness. To show the correctness of the design, we prove the following three proper-

ties of the walker: 1) the motion of the walker is unidirectional; 2) the walker never falls

off the track; 3) the motion of the walker is never blocked. We first give high level intuition

here, and then present a rigorous proof.

To see the unidirectionality of the motion, first note that once a foot of the walker,

say, Da� , is attached to an anchorage Ð ¬ , it can not be cut from anchorage Ð ¬ unless the

other foot D ! is attached to anchorage Ð ¬#" � further down the track. But once that has

happened, the first foot is constrained to only explore the space where anchorages Ð ¬ and

Ð ¬#" ! lie. In particular, it can not reach anchorage Ð ¬ � � , which could have resulted in one

step backwards.

The reason why the walker always stays on the track is because the detachment of

one foot from an anchorage is conditional on the attachment of the other foot to another

anchorage. Thus at any time point, at least one foot is attached to an anchorage.

To prove that the motion is never blocked, first note that there are always moments


when both of the feet of the walker are attached to neighboring anchorages. This is because

we have shown that the walker never falls off the track and hence the attachment of one

foot will result in the attachment of the other foot to a neighboring anchorage since all the

upper ends of the anchorages are of the same end type ($C^*$), which is compatible with the end

type ($C$) of either foot of the walker. However, the attachment of both feet to the track will

necessarily result in the ligation between the lower end of the anchorage, which the current

hind foot is attached to, and the end of the immediate downstream switch. This event in

turn results in the cleavage of the current hind foot from the anchorage and it has non-zero

probability to explore the downstream neighbor of the anchorage that the current front foot

stands on, and hence the motion moves on.

We next prove the above intuition in a more formal fashion.

Let à denote the walker. Recall that D4� and Dx! denote the two feet of à ; % ¬ and Ð ¬denote the switches and anchorages respectively, where �GW �3`h8ö`�A"`������c`C' . For the ease

of exposition, we introduce some more definitions and notations. If an end of a foot,

anchorage or switch is not ligated to some other end, then it is referred to as a free end.

Denote a ligation between � and Ä as 7 , and a restriction that cuts a ligation product

�ÅÄ into � and Ä as � ÆÇÄ , where � / Ä can be one of D ¬ , % ª , and Ð ª , ��W � `h8 and

¹GW>� `h8ö`CA"`������c`C' . By D>7�Ð ª , we mean either D4�ì7�Ð ª or D !H7�Ð ª .

Lemma 4.1.1 After the occurrence of D 7 Ð ¬ , ligation D 7 Ð ª cannot happen, where

AG°Û�a°�' and ¹Í°Û�ð± 8 .

Proof: Prove by induction. We first show that the claim holds for � W�' .

Suppose we have D4�H7>Ð R . Note that cleavage D4�ÈÆ�Ð R cannot happen since Ð R is the

last anchorage and only a ligation between % ¬ " � and Ð ¬ can result in a cleavage on the Ð ¬end. Due to the space constraint (only danglers in proximity of each other can interact),

ligation D�7�Ð ª cannot happen for ¹Í°Û�ð± 8 .


Next we prove that the claim holds for �)® ' . Suppose for �ɦ Ý , the claim in

Lemma 4.1.1 holds, we show that it also holds for �HW|Ý . Suppose w.l.o.g. that DH��79Ъ .Prove by contradiction. Suppose that ligation Dz7XÐZ � ! happens subsequent to ligation

Dì�K7 Ш . Then Da�ÊÆ Ð¨ must have occurred. Otherwise, D�� cannot be ligated with Ъ � !since Da� is not a free end; due to the space constraint, ligation D !´7 Ш � ! cannot happen

either. Thus, cleavage D4�ËÆ�Ш must have occurred. But this means that ligation ÐZ �7m%ª " �must have occurred. This further implies that cleavage %± " �¥Æ9Ш " � must have occurred.

This is only possible if ligation D !H7�Ш " � have occurred. But we know from the induction

hypothesis that ligation Dz7X� � ! cannot occur after ligation Dz7X� " � . We have thus

reached a contradiction. Schematically, we have shown the following causal relations,ÌÎÍ Ü� � !ÐÏ Ì �ÒÑ Ü� �Ï Ü� Í $� " �ÓÏ$� " � Ñ Ü� " �ÒÏ Ì ! Í Ü� " �ÔÏÖÕ ÌÎÍ Ü� � !

Ê

Lemma 4.1.2 At any time point during walker à ’s motion, there is always a ligation

D�7�Ð ¬ for some Ð ¬ .

Proof: Prove by contradiction. At the start of the reaction, the claim is obviously true.

Now assume at time Ñ , the first violation of the claim occurs. Suppose w.l.o.g. that the

violation happens as the cleavage D4�ÅÆ Ð ¬ occurs. At time Ñ , Dï! must be a free end;

there must be a ligation % ¬#" � 7�Ð ¬ . By the same token of arguement as in Lemma 4.1.1,

% ¬#" �×Æ Ð ¬#" � must have occurred; D÷7 Ð ¬#" � must have occurred. But since at time Ñ , D !is a free end, DØÆ}Ð ¬#" � must have occurred. Hence % ¬ " !Ú7}Ð ¬#" � , % ¬#" !×Æ>Ð ¬#" ! , D 7}Ð ¬#" !must have occurred. Thus we must have that D}7�Ð ¬ occurs after D>7�Ð ¬#" ! , contradicting

Lemma 4.1.1. Ê

Lemma 4.1.3 In the case D 7ÎÐ ¬ and D 7ÎÐ ¬#" � , where ��°ù' ±�8 , a cleavage on the

ligation D�7�Ð ¬ is guaranteed to occur.


Proof: Suppose w.l.o.g. that D4�:7XÐ ¬ and D !Ì7ùÐ ¬#" � . Since we have Dï!Ì7ùÐ ¬#" � , there

must be a cleavage on the ligation % ¬#" ��7mÐ ¬#" � . Now we only need to show that % ¬#" � can

form a ligation with Ð ¬ , which will result in a cleavage on the ligation D>7�Ð ¬ . In turn, we

only need to show that Ð ¬ can be a free end at this point. But this is obviously true because

ligation Ð ¬ 7�Da� introduces a cleavage site at Ð ¬ . Ê

Lemma 4.1.4 Walker à can move down the track without occlusion.

Proof: Study the time point when D��J7\Ð ¬ and D !G7\Ð ¬#" � . According to Lemma 4.1.3,

walker à can always lift its current hind foot D�� at this point. We only need to show that

it can attach Da� to Ð ¬#" ! , but this is trivially true since Ð ¬#" ! is a free end compatible with D4� .Ê

Lemmas 4.1.1, 4.1.2, 4.1.3 and 4.1.4 lead immediately to the following theorem,

Theorem 4.1.5 Walker à is guaranteed to move unidirectionally towards and reach РR .

Implementation with conceptual enzymes. The above reactions can be implemented

with three conceptual enzymes b¯� , b�8 and b�A that have similar cutting patterns as the

one shown in Figure 4.2 (a). We require that �£�¯W �g!�W � and �S��W �·!�W � , where

� ¬ and � ¬ are the length parameters for b ¬ for �´W � , 8 and A . Figure 4.11 describes the

implementation of device III with these conceptual restriction enzymes. In Figure 4.11 (a),
two anti-parallel flows of reactions are depicted. Starting from the top, end A (of a switch)
has a sticky end sequence complementary to end A* (lower end of an anchorage), and hence
the two are ligated together. This creates a recognition site for endonuclease E1 and
results in the restriction of end C (of a foot) from end C* (upper end of an anchorage).
This downward flow of reactions can be fully reversed into the anti-parallel upward flow,
which starts at the bottom with C* and C and ends at the top with A and A*. We note that


Figure 4.11: Actions of conceptual enzymes used in the construction of device II. (a) Sequences � , , ��� , É � , É and É ��� (sequences of äHä ¯ ) together constitute the recognition site (red (dark) box) forconceptual endonuclease è � , whose restriction site is indicated with a pair of red (dark) arrows.Sequences É� � � , ɧ � , É� � � � � , § � and � � (sequences of æ ¯ æ ) together constitute the recognition site(blue (grey) box) for conceptual endonuclease è � , whose restriction site is indicated with a pairof blue (grey) arrows. (b) Two anti-parallel flows of reactions by è and è � . (c) and (d) Neitherligation of ä å ¯ or å ä ¯ results in restriction of æ�æ ¯


Table 4.5: Implementation of device III with endonucleases Bpm I, Bsg I and BpuE I. Ligation sites and restriction sites are denoted with - and ˆ, respectively. The bases that determine recognition sites in action are in upper case.

Reactions Enzymes Sequences� " � �ÚÙÛÙ   �   { Ligase � � ...ctg-gag(n) ¿¹¿ ctcaag... ��Ü��� ÙÛÙ   �   � ...g-acctc(n) ¿¹¿ gagttc... ��Ü��� ÙÛÙ   �   { Bpm I � � ...CTGGAG(n) ¿¹¿ ctc aaˆg... ��Ý��� ÙÛÙ   � "   � ...GACCTC(n) ¿¹¿ gagˆtt c... ��Ý� �ÞÙÛÙ   � "   { Ligase � � ...ctggag(n) ¿¹¿ ctcaa-g... ��Ü��� ÙÛÙ   �   � ...gacctc(n) ¿¹¿ gag-ttc... ��Ü� �ÞÙÛÙ   �   { BpuE I � � ...c tgˆgag(n) ¿¹¿ CTCAAG... �� " � �ÚÙÛÙ   �   � ...gˆac ctc(n) ¿¹¿ GAGTTC... ��å"�� �ÞÙÛÙ   �   { Ligase � � ...gtg-cag(n) ¿¹¿ ctcaag... ��ß� �ÚÙÛÙ   �   � ...c-acgtc(n) ¿¹¿ gagttc... ��ß� �ÚÙÛÙ   �   { Bsg I � � ...GTGCAG(n) ¿¹¿ ctc aaˆg... ��à� �ÞÙÛÙ   � "   � ...CACGTC(n) ¿¹¿ gagˆtt c... ��à� � ÙÛÙ   � "   { Ligase � � ...gtgcag(n) ¿¹¿ ctcaa-g... ��ß� �ÚÙÛÙ   �   � ...cacgtc(n) ¿¹¿ gag-ttc... ��ß� � ÙÛÙ   �   { BpuE I � � ...g tgˆcag(n) ¿¹¿ CTCAAG... ��å"�� � ÙÛÙ   �   � ...cˆac gtc(n) ¿¹¿ GAGTTC... �� � � ÙÛÙ   � "   { Ligase � � ...ctgcag(n) ¿¹¿ ctcaa-g... �� � �ÞÙÛÙ   �   � ...gacgtc(n) ¿¹¿ gag-ttc... �� � � ÙÛÙ   �   { BpuE I � � ...c tgˆcag(n) ¿¹¿ CTCAAG... �� "¥� � ÙÛÙ   �   � ...gˆac gtc(n) ¿¹¿ GAGTTC... �� � � ÙÛÙ   � "   { Ligase � � ...gtggag(n) ¿¹¿ ctcaa-g... �� ��� ÙÛÙ   �   � ...cacctc(n) ¿¹¿ gag-ttc... �� � � ÙÛÙ   �   { BpuE I � � ...g tgˆgag(n) ¿¹¿ CTCAAG... ���" � � ÙÛÙ   �   � ...cˆac ctc(n) ¿¹¿ GAGTTC... �

due to the fully reversible nature of the reactions, the reaction system has a nonzero probability
of exploring all three states: the top one (A, A*∼∼C*C), the middle one (AA*∼∼C*C), and
the bottom one (AA*∼∼C*, C), where ∼∼ represents the duplex portion of DNA connecting
the two ends. Similar fully reversible anti-parallel flows of reactions involving E2 and E3
are depicted in Figure 4.11 (b). In contrast, the reactions in Figure 4.11 (c) and 4.11 (d) are
not fully reversible, since neither the ligation AB* nor the ligation BA* can result in a recognition site
for an endonuclease, and hence CC* cannot be cleaved. This irreversibility ultimately
accounts for the unidirectionality of the motion of the walker. The downward reaction
flow in Figure 4.11 (a), the upward reaction flow in (d), the downward reaction flow in
(b) and the upward reaction flow in (c) correspond to Steps 1, 2, 3 and 4 in Figure 4.10,
respectively.

Molecular implementation with real enzymes. The above conceptual enzymes can be


Figure 4.12: Real enzymes used in the construction of device III. Endonuclease recognition sites and restriction sites are indicated with red (dark) boxes and pairs of red (dark) arrows, respectively. N indicates the position of a base whose value does not affect recognition by an endonuclease.

mapped directly to real enzymes in Figure 4.12, where conceptual enzymes E1, E2 and
E3 correspond to real enzymes Bpm I, Bsg I and BpuE I, respectively. Table 4.5 describes

the implementation with these real enzymes. Note that we have the following mapping

from sequences in Figure 4.11 to the sequences in Table 4.5: �ÚW�² , ÔIW�ÐÚZ , � � W�ZJF Z ,

8JW�Z , 8 � W�²JF Z , EA � � W'²/а² , Ed � W�FKF and EA � W�Z .

4.2 Experimental Implementation of an Autonomous DNA Walker

In the previous section, we presented the designs of three autonomous DNA walker devices.
In this section, we give a partial implementation of device I described in the
previous section. In particular, we describe the experimental construction of a
unidirectional DNA walker that moves autonomously along a linear DNA track [113]. The
self-assembled track contains three anchorages at which the walker, a six-nucleotide DNA
fragment, can be bound. At each step the walker is ligated to the next anchorage, then
cut from the previous one by a restriction endonuclease. Each cut destroys the previous
restriction site and each ligation creates a new site in such a way that the walker cannot
run backwards. The motor is powered by the hydrolysis of adenosine triphosphate (ATP),
a kinetically inert fuel whose breakdown may be accelerated by many orders of magnitude
by protein catalysts [96]. Operation of the motor was verified by tracking the radioactively
labeled walker using gel electrophoresis.


4.2.1 Experimental Design

The structural design of the device is shown in Figure 4.13 (a). The track of the device
consists of three evenly spaced DNA double-helical “anchorages” (A, B, and C), each
tethered to another DNA duplex segment, which forms part of the backbone of the track,
by means of a 4-nucleotide single-stranded DNA fragment that acts as a hinge. Each
anchorage consists of 13 base pairs, with a 3-nucleotide single-strand overhang (“sticky
end”). Each anchorage is positioned 3 helical turns (31 or 32 base pairs) away from its
nearest neighbours. The duplex segments of the backbone of the track and of the three
anchorages are expected to behave like rigid rods since they are much shorter than the
persistence length of duplex DNA (greater than 10 turns) [47, 85]. In contrast, the
4-nucleotide single-strand hinge is expected to be flexible, since the persistence length of
single-stranded DNA is 3 nucleotides [86]. A 6-nucleotide DNA “walker”, labeled *
and coloured red, moves sequentially along the track from anchorage A to B, then to C.
The device is constructed by mixing purified DNA oligonucleotides stoichiometrically in
hybridization buffer (see Methods) and slowly cooling the system from 90 °C to 37 °C.
The solution is then supplemented with T4 ligase, endonuclease PflM I, and endonuclease
BstAP I and incubated at 37 °C. Autonomous motion of the walker is initiated by the
addition of the energy source, ATP.
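The anchorage spacing quoted above can be checked with a short calculation. The sketch below assumes the textbook value of about 10.5 base pairs per helical turn for B-form duplex DNA (an assumption, not a figure taken from this chapter) and brackets three helical turns by the nearest whole numbers of base pairs. The function name is illustrative only.

```python
import math

# Assumed helical repeat of B-form duplex DNA (base pairs per turn);
# a textbook approximation, not a value reported in this chapter.
BP_PER_TURN = 10.5


def candidate_spacings(turns, bp_per_turn=BP_PER_TURN):
    """Return the whole-number base-pair spacings that bracket `turns` helical turns."""
    exact = turns * bp_per_turn
    return math.floor(exact), math.ceil(exact)


print(candidate_spacings(3))
```

Under this assumption, three turns correspond to 31.5 base pairs, which is consistent with the two spacings of 31 and 32 base pairs used in the track.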

The recognition sites and restriction patterns of PflM I and BstAP I are shown in
Figure 4.13 (b). Figure 4.13 (c) shows the sequence of structural changes that occur during the
motion of the walker; the right portion shows the base sequence at the end of each
anchorage at each stage, and how these are transformed by enzyme actions. The motion of the
walker depends on alternating enzymatic ligation and restriction (cleavage). Before the
motion starts, the walker, whose position is indicated by *, resides at anchorage A, as shown in
the first panel of Figure 4.13 (c). In this state anchorages A* and B have complementary sticky


ends which can hybridize with each other. T4 ligase can then heal the nicks at either end
of the newly hybridized section, covalently joining the two anchorages (A* + B → A*B);
this is an irreversible step that consumes energy provided by the hydrolysis of ATP. The
ligation product A*B contains a recognition site for endonuclease PflM I. In process II, PflM I
cleaves A*B in such a way that the walker moves to anchorage B: A*B → A + B*. The
sticky end of anchorage B* can then hybridize with the complementary sticky end of
anchorage C, and the two anchorages are ligated to form B*C in process III. Ligation product
B*C contains a recognition site for the second endonuclease, BstAP I. In process IV, B*C
is cleaved by BstAP I to regenerate anchorage B and create C*. Thus the walker moves
from anchorage B to C, completing the autonomous, programmed motion of the walker.

The motion of the walker is unidirectional: the product of ligation between two
neighbouring anchorages can only be cleaved such that the walker moves onto the downstream
anchorage (A*B and B*C can only be cut such that the walker is left attached to B and
C respectively). Two idling steps are possible: B* can be religated to A, and regenerated
by restriction by PflM I; similarly C* can be religated to B and regenerated by BstAP I.
However, these idling steps neither reverse nor block the overall unidirectional motion of
the walker. Once B* has been ligated to C the walker can never return to A.
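The ligation, restriction and idling steps described above can be viewed as a small state machine. The following is a minimal sketch, not the kinetic model of the actual device: it assumes that all three enzymes are present and that every allowed event is equally likely, which is an illustrative simplification. The names `NEXT_STATES`, `simulate` and `never_steps_back` are hypothetical.

```python
import random

# Next states available from each state when T4 ligase, PflM I and BstAP I
# are all present, following processes I-IV and the two idling steps.
NEXT_STATES = {
    "A*": ["A*B"],         # process I: ligation of A* to B
    "A*B": ["B*"],         # process II: PflM I leaves the walker on B
    "B*": ["A*B", "B*C"],  # idling religation to A, or process III
    "B*C": ["C*"],         # process IV: BstAP I leaves the walker on C
    "C*": ["B*C"],         # idling religation to B
}


def simulate(steps, seed=0, start="A*"):
    """Return the list of states visited by one device over `steps` random events."""
    rng = random.Random(seed)
    trace = [start]
    for _ in range(steps):
        # Equal probabilities are an assumption made only for illustration.
        trace.append(rng.choice(NEXT_STATES[trace[-1]]))
    return trace


def never_steps_back(trace):
    """Check that once B*C has formed, only B*C and C* are ever visited."""
    if "B*C" not in trace:
        return True
    first = trace.index("B*C")
    return all(state in ("B*C", "C*") for state in trace[first:])


trace = simulate(steps=50, seed=1)
print(trace[:8], never_steps_back(trace))
```

Whatever random choices are made, the state A* is never re-entered and no state upstream of B*C is visited after B*C first appears, which mirrors the argument that idling steps do not reverse the walker.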

4.2.2 Methods and Materials

Design and Assembly. DNA sequences were designed and optimized with the SEQUIN
DNA design software [77]. DNA strands were commercially synthesized by Integrated
DNA Technologies, Inc. The strands whose 5’ ends participate in ligation reactions were
ordered with their 5’ ends phosphorylated. DNA strands were further purified by
electrophoresis on 10%-12% denaturing polyacrylamide gels. Bands were cut out of the gel
and eluted in a solution of 500 mM ammonium acetate, 10 mM magnesium acetate, and
1 mM EDTA. After butanol purification, DNA strands were precipitated with 70%
ethanol. The precipitated strands were subsequently dried and dissolved in double-distilled
water. The concentrations of DNA strands were determined by ultraviolet absorption
at 260 nm. DNA strands were mixed stoichiometrically at 0.3 µM in hybridization buffer
and cooled in a heating block from 90 °C to 37 °C over a period of 3 hours. We used
NEB 3 buffer (New England Biolabs) as the hybridization buffer, which contains 100 mM
NaCl, 50 mM Tris-HCl, 10 mM MgCl2, and 1 mM dithiothreitol (pH 7.7 at 37 °C).
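As an illustration of how an absorbance reading at 260 nm translates into the stoichiometric mixing step, the sketch below applies the Beer-Lambert law. The absorbance, extinction coefficient and volumes are made-up example values, not measurements from this work, and both function names are hypothetical.

```python
def concentration_uM(a260, extinction_per_M_cm, path_cm=1.0):
    """Beer-Lambert law: concentration = A / (epsilon * path length), in micromolar."""
    return a260 / (extinction_per_M_cm * path_cm) * 1e6


def stock_volume_ul(stock_uM, target_uM, final_volume_ul):
    """Volume of stock needed so the strand ends up at `target_uM` in the final mix."""
    return target_uM * final_volume_ul / stock_uM


# Hypothetical strand: A260 = 0.5 with epsilon = 500,000 /M/cm (made-up values).
stock = concentration_uM(0.5, 500_000)
# Volume of that stock needed for 0.3 uM in a 100 ul annealing mix.
print(round(stock, 3), round(stock_volume_ul(stock, 0.3, 100.0), 1))
```

In practice each strand has its own sequence-dependent extinction coefficient, so the calculation would be repeated per strand before mixing.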

Enzymes and Buffers. PflM I and BstAP I were purchased from New England Biolabs.
T4 ligase was purchased from Invitrogen Inc. The reaction solution was NEB 3 buffer
supplemented with BSA and ATP, containing 100 mM NaCl, 50 mM Tris-HCl, 10 mM
MgCl2, 1 mM dithiothreitol (pH 7.7 at 37 °C), 100 µg/ml BSA and 1 mM ATP.

Radioactive labeling of DNA Strands. DNA strands were labeled with T4 polynucleotide
kinase purchased from Invitrogen Inc. To label a DNA strand without a 5’ phosphate group,
five picomoles of the strand were dissolved in 25 µL of Forward Reaction Buffer
supplemented with γ-³²P-ATP (Perkin Elmer) and T4 polynucleotide kinase, containing 70 mM
Tris-HCl (pH 7.6), 10 mM MgCl2, 100 mM KCl, 1 mM 2-mercaptoethanol, 0.1 µM
γ-³²P-ATP (1 µCi/µL) and 10 units of T4 polynucleotide kinase, and incubated for 90 minutes
at 37 °C, followed by heat deactivation at 90 °C for 10 minutes. To label a
5’-phosphorylated DNA strand, the Exchange Reaction Buffer was used, which contains 50 mM
Imidazole-HCl (pH 6.4), 12 mM MgCl2, 70 µM ADP, and 1 mM 2-mercaptoethanol.
Labeled oligonucleotides were subsequently purified.

Ligation and Endonuclease Restriction. A 30 µl solution containing 1 picomole of
assembled device was supplemented with BSA and ATP to final concentrations of 100 µg/ml BSA
and 1 mM ATP. 1 unit of T4 ligase, 24 units of PflM I, and 5 units of BstAP I were added


to the solution, followed by overnight incubation at 37 °C.

Denaturing Polyacrylamide Gel Electrophoresis. The mixture was heated at 90 °C for 10 minutes
and loaded onto a denaturing polyacrylamide gel. The positions of the radioactively labeled
strands were detected with a phosphorimager.

4.2.3 Experimental Results

The autonomous and unidirectional motion of the walker was verified by using denaturing
polyacrylamide gel electrophoresis (PAGE) to track the motion of the walker, which was
radioactively labeled. The position reached by the walker in the presence of different
combinations of enzymes can be determined by measuring the size of the labeled DNA
fragment. Figure 4.14 (a) is a schematic drawing of the experimental design. The 5’ end
of the walker (red) was labeled with γ-³²P, represented by a red dot in Figure 4.14 (a). Initially,
the labeled strand (part of A*) measures 52 nucleotides. The completion of processes I,
II, III, and IV can be detected by the appearance of radioactively labeled bands of 68, 19,
57 and 41 nucleotides respectively, corresponding to the transfer of the radioactively labeled
fragment between the anchorages along the track. The system was incubated at 37 °C
in hybridization buffer supplemented with ATP and BSA and in the presence of different
combinations of enzymes. Figure 4.14 (b) is an autoradiograph of a denaturing gel showing
the products formed during each reaction. Lane 1 contains the control reaction without
enzyme or ATP. Lane 2 contains T4 ligase and ATP: the walker is expected to complete
process I to produce a radiolabeled strand of 68 nucleotides corresponding to the formation
of A*B. Lane 3 contains both T4 ligase and endonuclease PflM I: the walker is expected to
run to the completion of process III. Upon completion of process II, A*B is cut to produce
A and B*, resulting in a radiolabeled strand of 19 nucleotides. Subsequently, B* can be
ligated to C to form B*C, giving rise to a strand of 57 nucleotides. (These stages in the
motion of the walker were also observed in a time course experiment; see Figure 4.17.)


Lane 4 contains all three enzymes: the walker is expected to run autonomously to the
completion of process IV, in which B*C is cleaved by BstAP I to produce C*, giving
a labeled strand of 41 nucleotides. The radioactively labeled bands in the gel shown in
Figure 4.14 (b) agree with all the above expectations and hence provide evidence for the
designed autonomous, unidirectional motion of the walker.
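The lane-by-lane expectations above follow from which species are reachable under each enzyme combination. The sketch below encodes the reactions of processes I-IV (plus the idling religations) and the labeled-strand lengths quoted in the text, then lists the bands that could in principle appear in each lane. It says nothing about band intensities, and the names `REACTIONS` and `possible_bands` are illustrative, not part of the original work.

```python
# Length (in nucleotides) of the radioactively labeled strand in each species,
# as quoted in the text for a walker that starts at anchorage A.
LABELED_LENGTH = {"A*": 52, "A*B": 68, "B*": 19, "B*C": 57, "C*": 41}

# (species, enzyme) -> species that the enzyme can produce from it.
REACTIONS = {
    ("A*", "T4 ligase"): ["A*B"],
    ("A*B", "PflM I"): ["B*"],
    ("B*", "T4 ligase"): ["A*B", "B*C"],
    ("B*C", "BstAP I"): ["C*"],
    ("C*", "T4 ligase"): ["B*C"],
}


def possible_bands(enzymes, start="A*"):
    """Return the sorted labeled-strand lengths of all species reachable from `start`."""
    seen, frontier = {start}, [start]
    while frontier:
        species = frontier.pop()
        for enzyme in enzymes:
            for product in REACTIONS.get((species, enzyme), []):
                if product not in seen:
                    seen.add(product)
                    frontier.append(product)
    return sorted(LABELED_LENGTH[s] for s in seen)


lanes = {
    1: [],
    2: ["T4 ligase"],
    3: ["T4 ligase", "PflM I"],
    4: ["T4 ligase", "PflM I", "BstAP I"],
}
for lane, enzymes in lanes.items():
    print(lane, possible_bands(enzymes))
```

The 41-nucleotide band becomes reachable only when BstAP I is added, and the 19- and 57-nucleotide bands only when PflM I is present, in line with the lane assignments described above.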

To further test the operation of the system we forced the device to operate in a stepwise
fashion (rather than autonomously) by sequentially adding and deactivating the enzymes.
This experiment enabled us to inspect more closely the products formed at the end of each
process. The walker was radioactively labeled as described above. Figure 4.14 (c) is an
autoradiograph of a denaturing gel showing the products after each step. The system was
first supplemented with T4 ligase: the appearance of a 68-nucleotide DNA band in Lane 2
demonstrates the completion of process I and the formation of A*B. The solution was left
at 37 °C for one day to deactivate T4 ligase, then PflM I was added (Lane 3). The band of
68 nucleotides, corresponding to A*B, diminished while a band of 19 nucleotides,
corresponding to B*, appeared, which confirms the completion of process II. The system was
then incubated at 37 °C for two more days to deactivate PflM I, and was again
supplemented with T4 ligase and ATP (Lane 4). The intensity of the 19-nucleotide band,
corresponding to B*, dramatically decreased while the intensity of the 68-nucleotide band,
corresponding to A*B, increased and a 57-nucleotide band, corresponding to B*C, also
appeared. This is consistent with our expectation that B* can be ligated to both A and C.
Note that the formation of A*B is only an idling step in the motion of the walker. After the
enzyme activity of T4 ligase had died out one day later, the addition of BstAP I resulted
in the disappearance of the 57-nucleotide band and the appearance of a 41-nucleotide band,
indicating the cleavage of B*C into B and C* (Lane 5). Note that the intensity of the
68-nucleotide band was approximately unchanged, which confirms that A*B is resistant to

the restriction activity of BstAP I as designed. These measurements provide further
confirmation that the device operates as designed.

The unidirectional motion of the walker was
also tested by the two control experiments depicted in Figure 4.15. In the first experiment,
shown in Figure 4.15 (a) and (b), we intentionally constructed the device such that the walker
initially resides at anchorage B. Figure 4.15 (a) shows the forward and idling processes
that we expect to be allowed, and the reversing processes that we expect to be forbidden. The
19-nucleotide strand of B* was labeled with γ-³²P at its 5’ end, indicated by the red dot.
In the presence of T4 ligase (Lane 2 of Figure 4.15 (b)) the appearance of 68- and
57-nucleotide bands indicates the formation of A*B and B*C respectively. Subsequent lanes
show the results of adding different combinations of restriction enzymes and ligase: in
these lanes the presence of a 19-nucleotide band indicates the regeneration of B*, and a
16-nucleotide band (if any) corresponds to the appearance of B and thus implies the
generation of A* or C*. Addition of PflM I (Lane 4), which is designed to cut A*B into A
and B*, generates B* and not B, and decreases the intensity of the A*B band as expected.
Addition of BstAP I (Lane 5), which is designed to cut B*C into B and C*, generates B
and not B*, and decreases the intensity of the B*C band as expected. Lane 3 shows the
case when all three enzymes are present.

In the second control experiment, depicted in Figure 4.15 (c) and (d), the device was
constructed with the walker initially at anchorage C. The 5’ end of the 41-nucleotide
strand of anchorage C* was labeled with γ-³²P. In the presence of T4 ligase (Lane 2 of
Figure 4.15 (d)) the appearance of a 57-nucleotide band indicates the formation of B*C
as expected. Subsequent lanes, corresponding to different combinations of restriction
enzymes and ligase, show that B*C can be restricted to B and C* by BstAP I as expected,
but that no combination of enzymes leads to the backwards step B*C → B* + C (which
would have been indicated by a 19-nucleotide radiolabeled band corresponding to B*).

The reactions described in this chapter were carried out in solution, where the possibility
exists that the anchorages of two individual devices might interact with each other in such a


way that the walker of one device might deviate from its designated track and move onto the
track of another device. In a control experiment, we have shown that under conditions
corresponding to the measurements described above
the linkage of two tracks is undetectable (see Figure 4.18).

4.3 Discussion

In this chapter, we describe three designs of autonomous DNA walking devices as well as

a partial experimental implementation of device I.

How practical are our theoretical designs? Though we have proved that each walker
will behave in its designated way in a theoretical setting, closing the gap between a
theoretical construction on paper and a working device in the real world remains challenging.
Our partial experimental implementation of device I is an exciting first step towards
bringing these designs from theory to practice.

A critical challenge in further improving our experimental implementation is to
increase the efficiency of the experimental system. By measuring the intensities of the bands
in Figure 4.17 we have estimated the following yields for steps in the operation of the
device: A* → A*B, 46%; A*B → B*C, 51%; B*C → C*, 97%. Both imprecise
stoichiometry and low ligation/cleavage efficiency could cause low measured yields. Low
enzymatic efficiencies might result from the steric constraints imposed by the design of the
motor; each substrate is created by hybridization of two anchorages, which are also linked
by the backbone of the track. Future work should investigate design improvements
including structural modifications such as increasing the length of the linkage between each
anchorage and the backbone.
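To see what these step yields imply for a complete walk from A to C, one can multiply them. This is only a rough sketch: it assumes that each yield is the fraction of the preceding species that is converted and that the steps are independent, neither of which is established here. The helper name is hypothetical.

```python
from math import prod

# Step yields estimated from band intensities, as quoted in the text.
STEP_YIELDS = {
    "A* -> A*B": 0.46,
    "A*B -> B*C": 0.51,
    "B*C -> C*": 0.97,
}


def overall_yield(step_yields):
    """Multiply the per-step yields, assuming each applies to the previous step's product."""
    return prod(step_yields.values())


print(f"{overall_yield(STEP_YIELDS):.1%}")
```

Under these assumptions roughly a quarter of the devices would complete the full walk, with the two ligation-dependent steps accounting for almost all of the loss.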

In designing device III, one assumption we make about the enzymes is that the presence
of a single strand between the recognition site and the restriction site of each endonuclease
used above will neither alter the specificity nor totally inhibit the activity of that
endonuclease. Our preliminary experimental result is in agreement with this assumption. However,
more rigorous study is still required to further substantiate it.

A walker moving unidirectionally along a track bears a striking similarity to a finite state
automaton; a walker that moves bidirectionally in a controllable way behaves much like the
head of a Turing machine, a universal computing device on which modern computers
are based. By encoding information into the walker and the anchorages, the walking device
can be extended into a powerful autonomous computing device (and hence an “intelligent”
robotics device). In the next chapter, I will present my work on designing such “Autonomous
DNA Cellular Computing Devices”.

It is also possible to embed multiple walking devices in a microscopic self-assembled

DNA lattice [35, 49, 102, 106, 107] such that each walker moves autonomously along its

own programmed route and serves as an information and/or nano-particle carrier. Col-

lectively they would produce a complicated pattern of motion and possibly form a coor-

dinated and sophisticated signaling/transportation network. Nano-robotics systems of this

kind would open new horizons in nano-computing, nano-fabrication, nano-electronics, and

nano-diagnostics/therapeutics.

Figure 4.13: The structural design and operation of the autonomous unidirectional device. (a) Structural design. The numbers give the lengths of DNA fragments in bases. (b) Recognition sites and restriction patterns of PflM I and BstAP I. N indicates the position of a base that does not affect recognition. (c) Operation of the device. The left portion shows the sequence of structural changes that occur during the device’s operation; the right portion describes the accompanying enzyme actions and shows how they affect the ends of the anchorages. In (b) and (c), green (pink) boxes indicate the recognition site of PflM I (BstAP I) and green (pink) arrows indicate its restriction sites; bases that are important for PflM I (BstAP I) recognition are shown in bold green (pink) fonts; purple curves indicate the ligation sites.

Figure 4.14: Evidence of the autonomous unidirectional motion of the walker. (a) Experimental design. The six-nucleotide walker is coloured red. The red dot indicates the radioactive label; at each stage the radioactively labeled strand is illustrated as a thickened line, with its length in bases shown near its 5’ end. (b) PAGE analysis of the autonomous motion of the walker. An autoradiograph of a 20% denaturing polyacrylamide gel identifies the position of the radioactively labeled walker. Lane 0: labeled 10 bp DNA ladder marker. Lane 1: device with no enzymes (control). Lanes 2-4: device with T4 ligase, ATP and different combinations of endonucleases PflM I and BstAP I as indicated. (c) PAGE analysis of the stepwise motion of the walker. Lane 0: labeled 10 bp DNA ladder marker. Lane 1: device with no enzymes (control). Lanes 2-5 contain samples corresponding to the stepwise completions of processes I, II, III, and IV in Figure 4.14 (a) respectively, as described in the text. Oligonucleotide lengths (in bases) corresponding to DNA bands are indicated beside the gels.

Figure 4.15: Control experiments. (a) and (c) show the design of control experiments in which the device is prepared with the walker (coloured red) initially attached to anchorages B and C respectively. Red dots indicate the γ-³²P label; the corresponding labeled strand is shown as a thickened line, with its length in bases shown near its 5’ end. A red cross on a broken arrow means the reaction indicated by that arrow is not expected to happen. (b) and (d) are autoradiographs of denaturing 20% PAGE gels showing the results of the experiments indicated in parts (a) and (c) respectively. In both gels, Lane 0 contains a labeled 10 bp molecular ladder marker. Lane 1 contains the device with no enzymes (control). Lanes 2-5: device with T4 ligase, ATP and different combinations of endonucleases PflM I and BstAP I as indicated. Oligonucleotide lengths (in bases) corresponding to DNA bands are indicated beside the gels.

Figure 4.16: DNA strand structure and sequences. (a) Base sequences of the oligonucleotides that make up the molecular device. (b) and (c) Base sequences of the oligonucleotides used to construct the monomer and dimer control molecules described in the caption to Figure 4.18.


Figure 4.17: Supplementary Figure 2: time course experiment. Supplementary Figure 2 is an autoradiograph of a 20% denaturing gel showing the time course of the device’s motion under conditions corresponding to Fig. 2b lane 3. Lane 0: 10 bp ladder marker. Lane 1: device with no enzymes (control). Lanes 2-7 contain samples incubated with T4 ligase and PflM I at 37 °C for 15 minutes, 30 minutes, 1 hour, 2 hours, 4 hours, and 8 hours, respectively. The monotonic increase in the concentration of the product, and the decrease in the concentration of the intermediate B* after the first 30 minutes, are consistent with the designed unidirectional motion of the walker.


Figure 4.18: Test for inter-molecular reactions. Complexes produced during the operation of the device were analyzed using a native gel to test for the formation of dimers caused by cross-linkage of two devices. (a) and (c) depict the molecular designs of two controls corresponding to a monomer (a single device at the end of process I or III in Figure 4.13 (c)) and to a dimer (formed by intermolecular ligation of two anchorages), respectively. The control complexes do not have exactly the same sequences or structures as the corresponding states of the device; they are designed to have approximately the same structures, and to migrate at approximately the same rates, while minimizing the possibility of the formation of higher multimers. For complete sequence information, see Figure 4.16 (b) and (c). (b) Autoradiograph of the 8% native polyacrylamide gel used to test for inter-molecular reactions. The assembled device system was incubated at 37 °C in hybridization buffer supplemented with ATP and BSA and in the presence of various combinations of enzymes. Lane 1: labeled monomer control. Lane 2: device with no enzymes (control). Lane 3: device with T4 ligase. Lane 4: device with T4 ligase and endonucleases PflM I and BstAP I. Lane 5: labeled dimer control. No dimer band was detected in lanes 2-4, indicating the lack of inter-molecular interactions during the operation of the device. We note that there is a slight displacement between bands in lanes 1 and 2, and a matching broadening of bands in lanes 3 and 4. This is consistent with the hypothesis that a device with no linkages between its anchorages (present in lane 2 and as part of the population in lanes 3 and 4) migrates slightly more slowly than a device with two anchorages ligated together (control lane 1 and part of the population in lanes 3 and 4).


Chapter 5

Designs of Autonomous DNA Cellular Computing Devices

Intelligent nanomechanical devices that operate in an autonomous fashion are of great

theoretical and practical interest. Recent research has explored DNA as a material for

self-assembly of nanoscale objects [2, 35, 49, 80, 102, 106, 107], for performing compu-

tation [3, 12, 14, 13, 46, 44, 48, 97, 100, 103], and for the construction of nanomechanical

devices [10, 21, 22, 27, 39, 50, 81, 82, 83, 84, 91, 108, 114, 115]. The exciting progress

in these three subfields of Nanoscience (DNA self-assembly, DNA computation, and DNA

robotics) has provided a solid foundation for the next step forward: designing and con-

structing autonomous DNA computing devices embedded in well defined DNA lattices

that are capable of parallel universal computation. We call them DNA cellular computing

devices. These autonomous DNA cellular computing devices are a significant step beyond

the prior computational DNA lattices, which perform a one-time computation during their

assembly. In contrast, autonomous DNA cellular computing devices, once assembled, are

capable of repeated parallel computations. These DNA cellular computing devices also

represent a new category of autonomous nanomechanical devices that are capable of arbitrarily complex motion: they can perform such motion during the process of the computation. As such, the DNA cellular computing devices represent an exciting converging point

for nanoobject assembly, nanorobotics, and nanocomputing, and may have important ap-

plications in nanofabrication, nano-sensors, and nano-actuated electronics. In this chapter,

we present the designs of two such DNA cellular computing devices, an Autonomous DNA Turing Machine and an Autonomous DNA Cellular Automaton.

A rich family of DNA computation schemes has been proposed and implemented [26, 37, 43, 46, 48, 55, 60, 61, 59, 68, 74, 87, 97] following Adleman’s seminal report in 1994 [3].

Among them, the most relevant work that has inspired the construction here is the universal

DNA Turing machine design by Rothemund [68] and the autonomous 2-state 2-color finite state automaton constructed by Shapiro’s group [12, 13, 14]. In Rothemund’s innovative design, the transition table of a universal Turing machine is encoded in a circular DNA and the encoded transitions are carried out by enzymatic cleavages and ligations. However,

these reactions need to be carried out manually for each transition. In contrast, the DNA

cellular computing devices described here operate in an autonomous fashion with no exter-

nal mediation. In the inspiring construction by Shapiro’s group, a duplex DNA encoding

the sequence of input symbols is digested sequentially by an endonuclease in a fashion

mimicking the processing of input data by a finite state automaton. A limitation of the

finite state automaton construction is that the data are destroyed as the finite state automaton

proceeds. Though this feature does not affect the proper operation of a finite state au-

tomaton, it poses a barrier to further extending the finite state automaton to more powerful

computing devices such as Turing machines.

The autonomous DNA cellular computing devices are fundamentally different from

the tiling scheme in that cellular computing devices perform computation via coordinated

nanomechanical behavior of DNA molecules embedded in DNA lattices. As a conse-

quence, once assembled, the DNA cellular computing devices are capable of repeated par-

allel computations. In contrast, the computational tiling scheme is only capable of one-time

computation, namely the computation conducted during the assembly process. In addition,

the DNA cellular computing devices are more compact – e.g. our one dimensional de-

vices are as powerful as two dimensional tilings while our two dimensional devices are as

powerful as three dimensional tilings.


5.1 Designs of Autonomous DNA Turing Machine

In this section, we present the design of an Autonomous DNA Turing Machine, a nanome-

chanical device embedded in a DNA lattice that mimics the operation of a Turing machine

in an autonomous fashion. The Autonomous DNA Turing Machine described here is ca-

pable of universal computation, by mimicking the operation of a 2-state 5-color universal

Turing machine described in [104]. In the process of computation, the device can also

demonstrate universal translational motion, which we define as the motion demonstrated

by the head of a universal Turing machine.

5.1.1 Introduction to Universal Turing Machine

A Turing machine is a theoretical computational device invented by Turing for performing

mechanical or algorithmic mathematical calculations [92, 93]. Though the construction

and operational rules of a Turing machine may seem beguilingly simple and rudimentary,

it has been shown that any computational process that can be done by present computers

can be carried out by a Turing machine.

A Turing machine consists of two parts, a read-write head and a linear tape of cells encoding the input data. The head has an internal state $q$ and each cell has a color (or data) $c$ (as described in [104]). At any step, the head resides on top of one cell, and the color of that cell and the state of the head together determine a transition: (i) the current cell may change to another color; (ii) the head may take a new state; (iii) the head may move to the cell immediately to the left or the right of the current cell.
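The transition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state, two-color rule table `RULES`, the state names `q0` and `q1`, and the function names `step` and `run` are hypothetical, and the table is not the universal machine of [104]. It merely exercises the three parts of a transition (recolor the cell, change state, move the head).

```python
from collections import defaultdict

# Hypothetical transition table, keyed by (state, color).
# Each entry gives (new_color, new_state, move); move is -1 (left) or +1 (right).
RULES = {
    ("q0", 0): (1, "q1", +1),
    ("q0", 1): (0, "q0", -1),
    ("q1", 0): (1, "q0", -1),
    ("q1", 1): (1, "q1", +1),
}


def step(tape, head, state, rules=RULES):
    """Apply one transition in place and return the new (head, state)."""
    new_color, new_state, move = rules[(state, tape[head])]
    tape[head] = new_color          # (i) the current cell may change color
    return head + move, new_state   # (ii) new state, (iii) move left or right


def run(cells, state="q0", head=0, steps=3):
    """Run a fixed number of transitions on a tape initialised from `cells`."""
    tape = defaultdict(int, enumerate(cells))  # unwritten cells have color 0
    for _ in range(steps):
        head, state = step(tape, head, state)
    return dict(tape), head, state
```

Every rule in this toy table moves the head left or right, which matches the class of machines considered in this section.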

A universal Turing machine is a Turing machine that can simulate the operation of any other Turing machine. Let $m$ and $n$ be the number of possible states and the number of possible colors of a Turing machine, respectively. The Turing machine with a proven universal computation capacity and the smallest $m \times n$ value is a 2-state 5-color Turing machine described in [104].

In this section, we describe the design of a DNA nanomechanical device that can sim-

ulate the operation of an arbitrary 2-state 5-color Turing machine whose head moves to

either its left or right neighbor in every transition, and thus, in particular, can simulate the

universal Turing machine described in [104].

5.1.2 Notation

For ease of exposition, we first introduce some notation. The Autonomous DNA Turing

Machine contains arrays of dangling-molecules tethered to the DNA tracks. A dangling-molecule is a duplex DNA fragment, with one end tethered to the track via a flexible single

strand DNA fragment and the other end possessing a single strand DNA extension (the

sticky end). Due to the flexibility of the single strand DNA linkage, a dangling-molecule

moves rather freely around its joint on the track. The only possible interactions between

two dangling-molecules are between those that are located close enough to each other. In

contrast to a dangling-molecule, a floating-molecule is a free-floating (unattached to the

tracks) duplex DNA segment with a single strand overhang at one end (sticky end). A

floating-molecule floats freely in the solution and thus can interact with another floating-

molecule or a dangling-molecule.

An information encoding DNA molecule, such as a dangling-molecule or a floating-molecule, is denoted as

$$X^{a}[x]^{b},$$

where $X$ is its duplex portion, $[x]$ is its sticky end portion, and $a$ and $b$ respectively represent the state information encoded in $X$ and $[x]$. This is illustrated in Figure 5.1. As shown in the figure, there are two ways to encode information $a$ in the duplex $X$. In Figure 5.1 (a), $a$ is encoded as a unique DNA sequence; in Figure 5.1 (b), $a$ is encoded as the number of base pairs ($n$ bp in the figure) between an endonuclease recognition site and the sticky end of the DNA molecule. The sequence of the sticky end $[x]$ encodes the state information $b$. Furthermore, we use $[\bar{x}]$ to denote the complementary sticky end of $[x]$.

Figure 5.1: Encoding state information in DNA molecules. The backbones of DNA strands are depicted as directed line segments. $X$ and $[x]$ represent the duplex portion and the sticky end of the DNA molecule, respectively. The bases shaded in blue are used to encode state information, $a$ or $b$. “$n$ bp” indicates $n$ DNA base pairs, where $n$ is a non-negative integer. In Figure (b), the number $n$ is used to encode state information $a$ and is thus shaded in blue. The red (dark) box indicates the recognition site for an endonuclease, in this case, EcoP15 I.

The ligation of two molecules $X^{a}[x]^{b}$ and $[\bar{x}]^{b}Y^{c}$ is described by the equation

$$X^{a}[x]^{b} + [\bar{x}]^{b}Y^{c} \rightarrow Z.$$

Suppose $Z$ incorporates an endonuclease recognition site and is cut into $X^{a'}[y]^{b'}$ and $[\bar{y}]^{b'}Y^{c'}$. This is represented as

$$Z \rightarrow X^{a'}[y]^{b'} + [\bar{y}]^{b'}Y^{c'}.$$

We can combine the above two equations and obtain

$$X^{a}[x]^{b} + [\bar{x}]^{b}Y^{c} \rightarrow Z \rightarrow X^{a'}[y]^{b'} + [\bar{y}]^{b'}Y^{c'},$$

or simply,

$$X^{a}[x]^{b} + [\bar{x}]^{b}Y^{c} \rightarrow X^{a'}[y]^{b'} + [\bar{y}]^{b'}Y^{c'}.$$
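The ligation and cleavage notation can be mirrored by a small data model. The sketch below is an assumption-laden illustration, not part of the design: the names `Molecule`, `complement`, `ligate`, and `cleave` are hypothetical, the example sequences are arbitrary, and cleavage is modeled abstractly (the caller supplies the new sticky end) because the real cut position depends on the endonuclease and on the sequence of the ligation product. Changes to the information carried by the duplex portions are not modeled.

```python
from dataclasses import dataclass

_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(sticky: str) -> str:
    """Watson-Crick complement of a sticky end, read in the opposite direction."""
    return "".join(_PAIR[base] for base in reversed(sticky))


@dataclass(frozen=True)
class Molecule:
    duplex: str   # information encoded in the duplex portion
    sticky: str   # base sequence of the single-stranded sticky end


def ligate(left: Molecule, right: Molecule) -> tuple:
    """Join two molecules; allowed only when their sticky ends are complementary."""
    if left.sticky != complement(right.sticky):
        raise ValueError("sticky ends are not complementary")
    return (left.duplex, right.duplex)


def cleave(product: tuple, new_sticky: str) -> tuple:
    """Cut a ligation product into two molecules with complementary new sticky ends."""
    left_info, right_info = product
    return (Molecule(left_info, new_sticky),
            Molecule(right_info, complement(new_sticky)))
```

In this model a ligation followed by a cleavage takes two molecules with complementary sticky ends to two molecules with a new pair of complementary sticky ends, which is the shape of the combined equation above.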

5.1.3 Structural Overview

Figure 5.2 illustrates the structure of Autonomous DNA Turing Machine. Autonomous DNA Turing Machine operates in a solution system. The major components of Autonomous DNA Turing Machine are two parallel arrays of dangling-molecules tethered to two rigid tracks. The two rigid tracks can be implemented as rigid DNA lattices, for example, the rhombus lattice [49] as shown in Figure 5.2. The upper and lower arrays of dangling-molecules are called head-molecules, denoted as $H$, and symbol-molecules, denoted as $S$, respectively. We require that the only possible interactions between two dangling-molecules are either a reaction between a head-molecule and the symbol-molecule immediately below it or a reaction between two neighboring dangling-molecules along the same track. This requirement can be ensured by the rigidity of the tracks and the proper spacing of dangling-molecules along the rigid tracks.

Figure 5.2: Schematic drawing of the structure of Autonomous DNA Turing Machine. $H_i$ and $S_i$ denote head-molecule and symbol-molecule, respectively. The backbones of DNA strands are depicted as line segments. The short bars represent base pairing between DNA strands.

In addition to the two arrays of dangling-molecules, there are floating-molecules. There

are two kinds of floating-molecules: the rule-molecules and the assisting-molecules. The

rule-molecules specify the computational rules and are the programmable part of Autonomous DNA Turing Machine while the assisting-molecules assist in carrying out the operations of Autonomous DNA Turing Machine, as described in detail later.

The array of symbol-molecules represents the data tape of a Turing Machine; the array of head-molecules represents the moving head of a Turing Machine (more specifically, at any time, only one head-molecule is active, and its position indicates the position of the head of a Turing Machine); the rule-molecules collectively specify the transition rules for Autonomous DNA Turing Machine; the assisting-molecules are auxiliary molecules that assist in maintaining the operation of Autonomous DNA Turing Machine.

Figure 5.3: Four endonucleases used in the molecular implementation of the Autonomous DNA Turing Machine. The recognition site of an enzyme is bounded by a red (dark) box and the restriction site indicated with a pair of red (dark) arrows. The symbol “ ” indicates the position of a base that does not affect recognition.

The duplex portion and/or the sticky end of a DNA molecule may encode the following information: (i) state, the Turing machine state; (ii) color, the color (data) encoded in a symbol-molecule; (iii) position, the position type of a head-molecule. The state, color, and position information are denoted as $q$, $c$, and $p$, respectively. For the 2-state 5-color Autonomous DNA Turing Machine, $q$ ranges over the two states of the machine, $c \in \{C_1, C_2, C_3, C_4, C_5\}$, and $p \in \{P_1, P_2, P_3\}$. The position information $p$ indicates the position type of a head-molecule. This information is essential for dictating the bidirectional motion of the head. The array of head-molecules is denoted as $(H_1, H_2, H_3, \ldots)$; the array of symbol-molecules is denoted as $(S_1, S_2, S_3, \ldots)$. To specify the motion of Autonomous DNA Turing Machine head, we have head-molecules arranged in periodic linear order along the head-track,

$$(H_1^{P_1}, H_2^{P_2}, H_3^{P_3}, H_4^{P_1}, H_5^{P_2}, H_6^{P_3}, \ldots).$$

The action of the Autonomous DNA Turing Machine is driven by the protein enzymes present in the solution. The enzymes used are illustrated in Figure 5.3.


5.1.4 Operational Overview

At the beginning of a transition operation of Autonomous DNA Turing Machine, all the symbol-molecules possess sticky ends $[\bar{s}]$. A symbol-molecule with a sticky end $[\bar{s}]$ is referred to as in its default configuration; the $[\bar{s}]$ sticky end is referred to as a default sticky end. One of the head-molecules encodes the current state of Autonomous DNA Turing Machine in its duplex portion and possesses an active sticky end $[s]$ that is complementary to the sticky end $[\bar{s}]$ of the symbol-molecule just below it. This head-molecule is referred to as the active head-molecule. In contrast, all other head-molecules (with sticky ends other than $[s]$) are in default or inactive configuration.

Figure 5.4 gives a high level description of the events that occur during one transition

of Autonomous DNA Turing Machine. For ease of exposition, we describe the operation

in 4 stages. The 8 types of ligation events that correspond to the detailed 8-step implementation of Autonomous DNA Turing Machine (Sect. 5.1.5) are also marked in the figure to assist the reader in relating the high level description in this section to the detailed step-by-step implementation in Sect. 5.1.5.

In Stage 1, the active head-molecule (labeled with a triangle, $H_3$ in the example shown in Figure 5.4) is ligated to the symbol-molecule ($S_3$ in Figure 5.4) directly below it, creating an endonuclease recognition site in the ligation product (event (1) in Figure 5.4). The ligation product is subsequently cleaved into two molecules by an endonuclease. The sticky end of each of the two newly generated molecules encodes the current state and the current color of Autonomous DNA Turing Machine.

In Stage 2, both the new symbol-molecule and the new head-molecule are ligated to floating rule-molecules (two separate ligation events marked in Figure 5.4), which possess complementary sticky ends to them and correspond to one entry in the Turing machine transition table. The ligation product between the symbol-molecule and the rule-molecule is in turn cleaved, generating a new symbol-molecule dictated by the current state and color information as well as the transition rule. The new symbol-molecule encodes the new color in its sticky end. Similarly, the ligation product between the head-molecule and the rule-molecule is cleaved, generating a new head-molecule whose duplex portion encodes information of the Turing machine’s next state and whose sticky end encodes the moving direction of the head.

Figure 5.4: Operational overview of Autonomous DNA Turing Machine. The dangling head-molecules and symbol-molecules are depicted as red (dark) line fragments. The floating rule-molecules and assisting-molecules are depicted as light colored line segments. $H$, $S$, $R$, and $A$ denote head-molecule, symbol-molecule, rule-molecule, and assisting-molecule, respectively. The triangle indicates the active head-molecule. $(i)$ indicates the ligation event that occurs in Step $i$, as will be described in the detailed step-by-step implementation of Autonomous DNA Turing Machine (Sect. 5.1.5).

In Stage 3, the newly generated symbol-molecule is further modified by an assisting-molecule so that it will encode the new color in its duplex portion (rather than sticky end) and possess an $[\bar{s}]$ sticky end (event (3) in Figure 5.4). The sticky end of the head-molecule will dictate it to hybridize with either the head-molecule to its left or to its right, depending on which of its neighbors possesses a complementary sticky end (in Figure 5.4, $H_3$ is ligated with its left neighbor $H_2$). Next, the ligation product between these two head-molecules is cleaved.

In Stage 4, the two head-molecules are modified by floating assisting-molecules (the remaining ligation events marked in Figure 5.4) so that the first head-molecule is restored to its inactive configuration (with a default sticky end) and the second head-molecule encodes the state information in its duplex part and possesses an active sticky end $[s]$ and thus becomes an active head-molecule, ready to interact with the symbol-molecule located directly below it.

This finishes a transition and the operation can thus go on inductively. We emphasize that we describe the events in stages only for ease of exposition. The proper operation of Autonomous DNA Turing Machine does not require the synchronization of the events as described above. For example, event (3) in Stage 3 can occur either before or after events that do not involve the symbol-molecule.


5.1.5 Step-by-step Implementation

We next give a detailed 8-step description of the operation of Autonomous DNA Turing Machine. Each step consists of ligation and cleavage events. The ligation events are marked in Figure 5.4 with $(i)$, where $i = 1, 2, \ldots, 8$. To demonstrate the practicality of our design, we give the full DNA sequences for the reactions of each step. In addition to the A, T, G, and C bases, we also occasionally require another pair of unnatural bases (one of which appears as E in the sequences of Table 5.1). The reason to use the unnatural bases is to minimize the futile reactions as described later and hence increase the efficiency of our Autonomous DNA Turing Machine. The practicality of the use of unnatural bases is justified by the existing technology to make such bases and incorporate them into DNA strands. For a recent survey on unnatural bases, see [30].

At the start of the operation of Autonomous DNA Turing Machine, the configuration of the head-molecule array along the head-track is

$$(\hat{H}_1^{p_1, q}[s])\,([\bar{h}]^{p_2}H_2^{p_2})\,([\bar{h}]^{p_3}H_3^{p_3}) \cdots$$

where $p_i = P_1$ for $i = 3k+1$, $p_i = P_2$ for $i = 3k+2$, and $p_i = P_3$ for $i = 3k+3$, for $k = 0, 1, 2, \ldots$. The first head-molecule is special: it is the active head-molecule and represents the current position of the active head. We use the accent in $\hat{H}$ to denote the active configuration of a head-molecule. $\hat{H}_1$ has the unique sticky end $[s]$, which is complementary to the sticky end $[\bar{s}]$ of a symbol-molecule in default configuration (in particular, the symbol-molecule directly below it). Thus, $\hat{H}_1$ can hybridize and be ligated with symbol-molecule $S_1$, and this will start the operation of the Turing machine. Recall that $p$ encodes the position type information of a head-molecule. This position type information is encoded both in the sticky end portion and in the duplex portion of a head-molecule. As we will see below, the sticky end encoding of $p$ is necessary for dictating the appropriate motion of an active head; the duplex portion encoding is necessary for restoring a head-molecule to its default configuration after it turns from an active to an inactive state.

Figure 5.5: Step 1 of the operation of Autonomous DNA Turing Machine. The current state $q$ and color $c$ are initially encoded in the duplex portion of the head-molecule and the symbol-molecule, respectively. After the ligation and cleavage, both the sticky ends of the new head-molecule and symbol-molecule encode the current state $q$ and the current color $c$. The encoding scheme of $c$ is described in Table 5.1. Bsl I recognition sites and cleavage sites are indicated with red (dark) boxes and pairs of red (dark) arrows, respectively.
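The periodic assignment of position types along the head-track can be sketched as follows. The labels `"P1"`, `"P2"`, `"P3"` and the function names are hypothetical stand-ins for the three position types; head-molecules are indexed from 1 as in the text. The check at the end illustrates one property of any period-3 pattern, stated here as an observation rather than as the author's argument: the left and right neighbors of a head-molecule always differ in type from each other and from the head-molecule itself, so a type-specific sticky end can single out one direction.

```python
POSITION_TYPES = ("P1", "P2", "P3")  # hypothetical labels for the three types


def position_type(i: int) -> str:
    """Position type of the i-th head-molecule (1-based), repeating with period 3."""
    if i < 1:
        raise ValueError("head-molecules are indexed from 1")
    return POSITION_TYPES[(i - 1) % 3]


def neighbor_types(i: int) -> tuple:
    """Position types of the left and right neighbors of head-molecule i (i >= 2)."""
    return position_type(i - 1), position_type(i + 1)


# Left neighbor, the head-molecule itself, and right neighbor are pairwise distinct.
for i in range(2, 20):
    left, right = neighbor_types(i)
    assert len({left, position_type(i), right}) == 3
```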

The symbol-molecule array along the symbol-track is

$$([\bar{s}]S_1^{c_1})\,([\bar{s}]S_2^{c_2})\,([\bar{s}]S_3^{c_3}) \cdots$$

All the symbol-molecules have the same sticky end $[\bar{s}]$. As such, whenever a head-molecule directly above a symbol-molecule becomes active, this symbol-molecule can interact with the active head-molecule. Note that $[\bar{s}]$ encodes no color information: the color information $c_i$ is instead encoded completely in the duplex portion of a symbol-molecule.

Reaction Between a Head-Molecule and a Symbol-Molecule

Step 1. In Step 1, the active state-encoding head-molecule is first ligated with the color-encoding symbol-molecule below it, and then the ligation product is cut into a new head-molecule and a new symbol-molecule, the sticky ends of which both encode the current state and color information.

Let $\hat{H}_i^{p,q}[s]$ be the current active head (encoding position type $p$ and current state $q$); let $[\bar{s}]S_i^{c}$ be the symbol-molecule below it (encoding current color $c$). $\hat{H}_i$ and $S_i$ have complementary sticky ends and hence these two are ligated into $(H_iS_i)^{p,q,c}$. An endonuclease recognizes the newly formed recognition site in the ligation product and cuts the ligation product into $\hat{H}_i^{p}[r]^{q,c}$ and $[\bar{r}]^{q,c}S_i$. Now the sticky ends of both $\hat{H}_i$ and $S_i$ encode the current color and state. Step 1 can be described by the following equation,

$$\hat{H}_i^{p,q}[s] + [\bar{s}]S_i^{c} \rightarrow (H_iS_i)^{p,q,c} \rightarrow \hat{H}_i^{p}[r]^{q,c} + [\bar{r}]^{q,c}S_i$$

The first part of the equation is the ligation of head-molecule $\hat{H}_i^{p,q}[s]$ with symbol-molecule $[\bar{s}]S_i^{c}$ into $(H_iS_i)^{p,q,c}$; the second part is the cleavage of the ligation product into head-molecule $\hat{H}_i^{p}[r]^{q,c}$ and symbol-molecule $[\bar{r}]^{q,c}S_i$. Note that now both the sticky ends of the head-molecule and the symbol-molecule are encoding the current state and color. This encoding scheme is in the same spirit as the one used in [14].

Figure 5.5 gives the molecular implementation of Step 1. For simplicity, only the relevant end sequences are given. The encoded information $p$ is not shown. Both cases of the state $q$ are depicted. $xyz$ is the color encoding region for symbol-molecule $S$. The encoding scheme used is shown in Table 5.1.
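The sticky end sequences of Table 5.1 follow a simple pattern that can be checked mechanically. In the sketch below the sequences are copied from Table 5.1 (E is an unnatural base); the names `XYZ`, `sticky_end`, and the flag `shifted` are hypothetical. In one state the exposed sticky end is the three-base color code itself, and in the other state it is a T followed by the first two bases of that code.

```python
# Three-base color codes from the first row of Table 5.1, one per color.
XYZ = ("TTA", "CTT", "CAA", "AEA", "CEA")


def sticky_end(color_index: int, shifted: bool) -> str:
    """Sticky end exposed after Step 1 for a color (0-based index into XYZ).

    shifted=False gives the three-base code itself;
    shifted=True gives T followed by the first two bases of the code.
    """
    code = XYZ[color_index]
    return "T" + code[:2] if shifted else code


ALL_ENDS = {(k, s): sticky_end(k, s) for k in range(5) for s in (False, True)}
```

Because the ten resulting sequences are pairwise different, a sticky end identifies the state and the color jointly, which is what allows a rule-molecule to detect both at once.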

Color Change of a Symbol-Molecule

After Step 1, the sticky end of $[\bar{r}]^{q,c}S_i$ encodes the current state and color. This sticky end is subsequently detected by a rule-molecule $R[r]^{q,c}$, which has a complementary sticky end. $R[r]^{q,c}$ corresponds to one entry in the transition table for Autonomous DNA Turing Machine, and determines the next color $c'$ that will be encoded in $S_i$. This color transition occurs in Step 2 and $S_i$ is modified to possess a sticky end $[\bar{t}]^{c'}$ that encodes the new color $c'$. In Step 3, $S_i$ is restored to a default configuration with a sticky end $[\bar{s}]$, and the new color $c'$ encoded in its duplex portion. We next describe the reactions in detail.

Step 2. In Step 2, rule-molecule �ô ³OÒOµ ä � hybridizes and is ligated with symbol-molecule

³0ÐÒOµ ä � î ¬ . The ligation product is cut into �ôôó ³ ò µ � � (a waste molecule that diffuses away) and

140

Page 160: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

Table 5.1: The molecular implementation of the color encoding scheme of a symbol-molecule. �is the color; Ä ë3î is the sticky q � r exposed when state � é ñâõ�� � ; T Ä ë is the sticky end q � r whenstate � é ά�çõöÏ�Ü . Note that all the ten sticky end sequences are different from each other�   ¤   ¥   ¨  ´÷  ´øä 2úùüûÕ  ³ ¼¡ý½þ TTA CTT CAA AEA CEAä 2 M�ÿ�û � T ¼¡ý TTT TCT TCA TAE TCE

Table 5.2: The relation between the length of the spacer, the sequence of sticky end q É� r and the newcolor of a symbol-molecule. � is the spacer length; É� is the sticky end sequence; � U is the new color� �   ¤   ¥   ¨   ÷   ø�� CA AC CT TT TGä 2 ùüû�  ³ � 8 7 6 5 4ä 2�M�ÿ�û � � 7 6 5 4 3

³0Ðò µ � � î ¬ . The sticky end ³0Ðò µ encodes the new color � U . Schematically, we have,

�ô ³OÒ,µ ä � ('³0ÐÒ,µ ä � î ¬ y Ñ ô î ¬ Ó ä ��� � y �ô ä �ó ³ ò µ � � (�³0Ðò µ � � î ¬Figure 5.6 describes the molecular implementation of Step 2 for the case when current

state is oðÛ ßñ� ²£ß , and the new color is � U Û ² � . The case for o Û îÐÀ � ô Ð is

similar, except that sticky end ³0ÐÒ!µ of î is ² Ð�ñÐü instead of Ð�ñÐü Ðý . The rule-molecule �ô ³þÒ,µ ä �consists of three parts, in the terminology of [14], Bpm I recognition site, spacer region,

and ® state,color ¦ detector. The ® state, color ¦ detector is the sticky end ³OÒ,µ ä � , which

hybridizes with and thus detects the sticky end ³0ÐÒ,µ ä � of the symbol molecule. The rule-

molecule and the symbol-molecule are ligated and Bpm I cuts the ligation product into

a waste rule-molecule �ô ä �ó ³ ò µ � � (ó

for waste), which diffuses away, and a new symbol-

molecule ³0Ðò µ � � î , effecting the color change of the symbol-molecule from � to � U . The length

of the spacer of �ô (see Figure 5.6) determines the position of the cut in the ligation product

and hence the sticky end ³0Ðò µ and the new color � U encoded in it. See Table 5.2 for the

relation between the length of the spacer, the sequence of sticky end ³0Ðò µ and the new color

�nU .Step 3. The symbol-molecule ³0Ðò µ � � î ¬ obtained from Step 2 needs to be restored to its

default configuration ³0ÐØ�µ/î � �¬ so that it can interact with the head-molecule À ¬ above it when

141

Page 161: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

Figure 5.6: Step 2 of the operation of Autonomous DNA Turing Machine. In this example, thecurrent state is � é ñ�õ¡��� ; the current color � and state � are encoded in the symbol-molecule’ssticky end q É� r whose sequence is ÉÄïÉë Éî ; the new color, in this case, will be � U é æ � , encoded insticky end q É� r whose sequence is “TG”. Bpm I recognition site and cleavage site are indicated witha red (dark) box and a pair of red (dark) arrows, respectively

Figure 5.7: Step 3 of the operation of Autonomous DNA Turing Machine. In this example, thenew color � U é�æ � . See Figure 5.8 for the complete set of assisting-molecules è � ¥ � � ¥ . The colorencoding regions are indicated with light blue (grey) background. EcoPl5 I recognition site andcleavage site are indicated with a red (dark) box and a pair of red (dark) arrows, respectively

Figure 5.8: The complete set of assisting-molecules è � ¥ qÛ� r � ¥ . The color encoding regions areindicated with light blue (grey) background


À ¬ becomes active again. Note that this re-usability of the symbol-molecule is essential

for the proper functioning of Autonomous DNA Turing Machine. After the restoration, the

new color � U is encoded in the duplex portion of î ¬ , whose sticky end is the default sticky

end ³0ÐØ�µ for a symbol-molecule: it encodes no color, but is ready to interact with an active

head-molecule. The reaction of Step 3 is,

� � � ³ ò µ � � (�³0Ðò µ � � î ¬�� Ñ � îñÓ � � � � ó ³¸Ø�µ¸(�³/ÐØ�µ/î � ��Figure 5.7 gives a molecular implementation of Step 3 for the case ��U Û ² � . Color �nU

is encoded both in the sticky end portion and the duplex portion of assisting-molecule� � � ³ ò µ � � . assisting-molecule

� � � ³ ò µ � � detects the color encoding sticky end of symbol-

molecule î � and transfers its color encoding duplex portion to î � via ligation and sub-

sequent cleavage. This step generates a waste product� ó ³¸Ø�µ that diffuses away. Note that

� ó ³ Ø�µ may hybridize and be ligated with some other ³0ÐØ!µ end of a symbol-molecules,say

³0ÐØ�µåîª . However, this only represents some futile reactions that will not block, reverse, or

alter the operation of Autonomous DNA Turing Machine, since� ó ³ Ø�µ will be cut sub-

sequently away from ³0ÐØ�µ/î� by EcoPl 5I. Nevertheless, this does decrease the efficiency

of Autonomous DNA Turing Machine and as the concentration of� ó ³¸Ø�µ increases, the

negative effect on the efficiency becomes more prominent. For a complete set of assisting-

molecules� � � ³ ò µ � � , see Figure 5.8.

Note that the existence of an endonuclease EcoP15 I recognition site in the duplex portion

of î � adds extra complication to Step 1 and Step 2: it results in futile reactions which are

discussed in [112].

State Change of a Head-Molecule

Step 4. In Step 4, the head-molecule À ã� ³OÒOµ ä � generated in Step 2 (with its sticky end

encoding the current state and color) is modified by a rule-molecule that decides the state


Figure 5.9: Step 4 of the operation of Autonomous DNA Turing Machine. The motion encodingregions are indicated with light blue (grey) background. � is the length of the spacer region of rule-molecule Ï . EcoPl5 I recognition site and cleavage site are indicated with a red (dark) box and apair of red (dark) arrows, respectively

transition and the motion of the head. After the modification, the new state information is encoded in the duplex portion of the modified head-molecule, and the motion direction of the head is encoded in its sticky end, which is complementary to the sticky end of one of its neighboring head-molecules. This sticky end therefore dictates whether the modified head-molecule interacts with its left or its right neighbor, and thus determines the motion of the head.

More specifically, head-molecule À ã� ³OÒ,µ ä � hybridizes and is ligated with a free floating

rule-molecule ³/ÐÒ,µ ä � ô and the ligation product Ñ À � ô Ó ã ä � is cut by endonuclease EcoPl5 I

into À ã ä �� ³�/µ ã � and ³ Ð�/µ ã � ôôó , a waste molecule that diffuses away. head-molecule À ã ä �� ³�/µ ã �encodes the new state o U in its duplex portion, and the motion direction Q U of the head in its

sticky end. The reaction of Step 4 is,

áÀ � ã ³þÒ,µ ä � (�³/ÐÒ!µ ä � ô � Ñ À � ô Ó ã ä � � áÀ � ã ä � ³��µ ã � ('³ Ð�/µ ã � ôôó

Figure 5.9 describes the molecular implementation for the case when the current state

o Û ß���� ß ; new state oOU Û îÐÀ � ô� ; the position type of the current head-molecule À �

is �ìÛ   � ; the position type of the head-molecule À�� that it will interact with is ��� Û   �(hence ¹ Û����ë­ in this case). The rule-molecule ³/Ð� µ ä�� ô consists of three parts: the de-


tector sticky end ³/Ð� µ ä�� that encodes the current state and color; the spacer, whose length

determines the transition results (new state and motion direction of the head); and recogni-

tion site for endonuclease EcoPl5 I. The rule-molecule ³0Ð� µ ä�� ô detects the current state o and

color � encoded in sticky end ³ � µ ä�� of À � and is ligated to À � . After ligation, endonuclease

EcoP15 I cuts into the motion encoding region of the head-molecule and exposes a new

sticky end that encodes the position type information � � ( and hence determines the motion

direction). Cleavages at motion encoding regions I and II result in new states o � Û�� ��� ßand � � Û îÐÀ � ô�

, respectively. Table 5.3 describes the complete set of transitions for

all the combinations of different � , � , � and � � . Note that � is not dependent on � : in each

case of � Ûë  � , � Û   � and � Û     , � is the same. This is an essential property since the

end � �! ä�� of À does not encode the � information.

Reaction between Two Adjacent Head-Molecules

head-molecule áÀ ã ä#"� �� ã " produced in Step 4 will next interact with one of its neighboring

head-molecules, � Ð� ã " À �� , where $ Û%�'& ­ for its left neighbor and $ Û%�(� ­ for its right

neighbor (Step 5). Then À)� becomes an active head-molecule encoding the new state � �(Step 6) while À � is restored to its default inactive configuration (Steps 7 and 8).

Step 5. In Step 5, head-molecule áÀ«ã ä "� �� ã " is ligated to either its left neighbor or its

right neighbor � Ð� ã " À*�� , where $ Û+�,& ­ or �-� ­ , as dictated by the �.� information encoded

in its sticky end. The ligation product Ñ À � À/��Ó ä " is cut into À ã� � Ð0 ã�ã " ä " and � 0 ã�ã " ä " áÀ)� . The

reaction of Step 5 is,

áÀ � ã ä " �� ã " �1� Ð� ã " À �� � Ñ À � À/��Ó ä " � À � � Ð0 ã�ã " ä " �+� 0 ã�ã " ä " áÀ � �Note that now both the sticky ends of À � and À)� encode position type � of À � , position

type � � of À/� , and the new state � � .Figure 5.10 gives a molecular implementation for this step. Panel I depicts an example


Table 5.3: The transition of head-molecule 2� ã� q � r ä43 to 2� ã ä65� q87 r ã 5 , and its subsequent interactionwith q:97 r ã 5 � ã 5� . In the table, ; is the position type of head-molecule 2� � , encoded in the duplexportion of � � ; q 97 r is the sticky end of head-molecule � ã� q 97 r ã in its default inactive configuration,encoding the position type ; of � � ; < is the sequence of the motion encoding region of head-molecule 2� � (see Figure 5.9); � is the moving direction of the head; ; � is the position type infor-mation encoded in sticky end q87 r of 2� � ;-= � q87 r ã 5 , dictating the moving direction of the head; q87 r � isthe reverse sequence of sticky end q87 r ; = and = � are the current state and the new state, respectively;Î and > stand for Î � õ Ï�Ü and >âõ��@? states, respectively; � is the length of the spacer region ofthe rule-molecule q 9� r ä�� Ï (see Figure 5.9). Note that q87 r � of � � is complementary to qA97 r of � �

ã B�CED F ä G ã " ä " � B CED B CHDI¢ ¤ ATC J K ¢ ¥ J 17 GA

AC TAG AG

¢¤ ATC J K ¢¦¥ ù 6 GAAC TAG AG

¢¤ ATC J L ¢¦¨ J 16 ATAC TAG TA

¢¤ ATC J L ¢¦¨ ù 5 ATAC TAG TA

¢ ¤ ATC ù K ¢ ¥ J 17 GAAC TAG AG

¢¤ ATC ù K ¢¦¥ ù 6 GAAC TAG AG

¢¤ ATC ù L ¢¦¨ J 16 ATAC TAG TA

¢¤ ATC ù K ¢¦¨ ù 5 ATAC TAG TA

¢¦¥ CAT J K ¢¦¨ J 17 ATCT GTA TA

¢ ¥ CAT J K ¢ ¨ ù 6 ATCT GTA TA

¢¦¥ CAT J L ¢¤ J 16 TGCT GTA GT

¢¦¥ CAT J L ¢¤ ù 5 TGCT GTA GT

¢¦¥ CAT ù K ¢¦¨ J 17 ATCT GTA TA

¢¦¥ CAT ù K ¢¦¨ ù 6 ATCT GTA TA

¢ ¥ CAT ù L ¢ ¤ J 16 TGCT GTA GT

¢¦¥ CAT ù K ¢¤ ù 5 TGCT GTA GT

¢¦¨ TCA J K ¢¤ J 17 TGTA AGT GT

¢¦¨ TCA J K ¢¤ ù 6 TGTA AGT GT

¢ ¨ TCA J L ¢ ¥ J 16 GATA AGT AG

¢¦¨ TCA J L ¢¦¥ ù 5 GATA AGT AG

¢¦¨ TCA ù K ¢¤ J 17 TGTA AGT GT

¢¦¨ TCA ù K ¢¤ ù 6 TGTA AGT GT

¢¦¨ TCA ù L ¢¦¥ J 16 GATA AGT AG

¢ ¨ TCA ù K ¢ ¥ ù 5 GATA AGT AG


Figure 5.10: Step 5 of the operation of Autonomous DNA Turing Machine. Panel I depicts thecase when ;NMPO � , ; � MQO   , and = � MèÎ � õ ÏSR . Panel II and III describe all the cases when= � M Î � õ ÏSR and all the cases when = � MT>âõ��@? , respectively. In panel II and III, each caseis represented in a simplified fashion that only shows the ligation product before the cleavage.BslI recognition sites and cleavage sites are indicated with red (dark) boxes and pairs of red (dark)arrows, respectively. The unique sticky ends q 9U r ã�ã 5 ä 5 are shown with blue (grey) background

Figure 5.11: Step 6 of the operation of Autonomous DNA Turing Machine. Bpm I recognition sites and cleavage sites are indicated with red (dark) boxes and pairs of red (dark) arrows, respectively


Figure 5.12: Step 7 of the operation of Autonomous DNA Turing Machine. Bsl I recognition site and cleavage site are indicated with a red (dark) box and a pair of red (dark) arrows, respectively

case in full detail; Panel II and III show all the cases in a simplified way. Note that the

sticky end � Ð0 (and � 0 ) encodes all the information for position type � of À � , position type

��� of À/� , and the new state �V� , we hence have ÒXW ÝYW Ý Ûë­¦Ý different sticky ends � Ð0 .Step 6. In Step 6, head-molecule áÀ �� is modified into a head-molecule ready to interact

with a symbol-molecule; in other words, it becomes an active head. The reaction of Step 6

is,

áÀ �� � 0 ã�ã " ä " �+� Ð0 ã�ã " ä " � À �� � áÀ �� ä " �Z �+�0ÐZ ó

Figure 5.11 describes the molecular implementation for Step 6. The mechanism of this

step is very similar to Step 4, and hence we omit its details.

Steps 7. and 8. In Step 7, the sticky end � Ð0 ã�ã " ä " of head-molecule À � is modified by

an assisting-molecule [ � 0 ã�ã "Nä6" to a new sticky end �+Ð\] ã�ã "Nä6" . In Step 8, the sticky end �+Ð\^ ã�ã "Nä6"initiates a sequential “growing-back” process which restores À � to its default (inactive)

configuration � Ð� ã À«ã� . The reaction of Step 7 is,

[ � 0 ã�ã " ä " �1� Ð0 ã�ã " ä " À � � [ À � � [ ó � \_ ã�ã " ä " �1�+Ð\] ã�ã " ä " À �The reaction of Step 8 is,

�+Ð\] ã�ã " ä " À � � � Ð� ã À«ã�148


Figure 5.13: Step 8 of the operation of Autonomous DNA Turing Machine for the case ;`MaO � ,; � MbO � , and = � Mb>�õ¡�c? . This step consists of a sequence of alternating ligations and cleavages.At each stage d , where deM+fhgji , the head-molecule is first ligated to an assisting-molecule ? �-k(stage d-lhs ), then the ligation product is cut by an endonuclease (stage d-l t ). A waste molecule ? �mk óis generated at each stage. The last panel gives a compact representation of the whole process. Theunique sticky end generated at each stage is indicated with blue (grey) background. Endonucleaserecognition sites and cleavage sites are indicated with red (dark) boxes and pairs of red (dark)arrows, respectively


Figure 5.14: Overview of the operation of Autonomous DNA Turing Machine

Figure 5.12 and Figure 5.13 describe the molecular implementation of Step 7 and Step

8 for the case � Û   � , � � Û   � , and � � Ûn� ��� ß , respectively. The figures are self-

explanatory and hence we omit the details for brevity. Note that Step 8 is a rather spectac-

ular process which illustrates a precisely controlled elongation mechanism using alternat-

ing ligations and cleavages. This mechanism may be of independent interest for designing

other molecular devices.

Overall Reaction Flow

Putting all the above steps together, we have a schematic drawing for the overall flow of the

reactions (Figure 5.14). The complete molecule set for the construction of our Autonomous

DNA Turing Machine is described in [112].

5.1.6 Complete Molecule Sets

Figure 5.15 describes the head-molecules and symbol-molecules used in the construction of the Autonomous DNA Turing Machine.

A head-molecule encodes several layers (for Steps 1, 4, 5, 6, 7 and 8) of information in a single molecule. Figure 5.15 (a), (b) and (c) describe the layout as well as the


overlaid scheme of the layers of information for head-molecules of type À � , À � and À   ,

respectively.

The symbol-molecule participates in Steps 1, 2 and 3 and its construction is compara-

tively simple. Figure 5.15 (d) illustrates the relevant sequences for each of the three steps.

5.1.7 Futile Reactions

We note that reactions other than those already described occur during the operation of the Autonomous DNA Turing Machine. Upon careful examination, we will see that these reactions do not block, reverse, or alter the proper operation of the Autonomous DNA Turing Machine, although they decrease its efficiency. Therefore, these innocuous reactions are referred to as futile reactions.

The first kind of futile reactions (F1) occur between a rule-molecule Ñ4�/Ð�! ä�� Ó4o and its

dual [o Ñ4� �! ä6� Ó (Figure 5.16 (a)) or a rule-molecule � Ð0 and its dual [ � 0 (Figure 5.16 (b)).

Note that the ßïß sticky end in Figure 5.16 (b) is different from the �Z sticky end of the

symbol-molecules since it is a protruding Ô � end instead of a protruding Ò � end.

The second kind of futile reactions (F2) occur between symbol-molecules and head-molecules (Figure 5.16 (c)) with complementary sticky ends. We cannot completely avoid the occurrence of these undesirable complementary sticky ends due to the limited encoding space. See Figure 5.17 for examples. However, the endonuclease Bsl I recognition site in the ligation product makes the undesirable ligation reversible and hence innocuous.

The third kind of futile reactions (F3) occur between î � and À � or [o – these futile

reactions are caused by the endonuclease EcoP15 I recognition site in the duplex portion of

î � . Figure 5.16 (d) and (e) illustrate these two cases. We need to pay particular attention

to futile reaction F3b. In this case, the molecule [oqp^� � p could diffuse away and hence the

ligation for regenerating [o!î could be blocked. This would disastrously block the operation


Figure 5.15: head-molecules and symbol-molecule used in Autonomous DNA Turing Machine.Panels (a), (b) and (c) describe head-molecules of position types O � , O � and O   , respectively;panel (d) illustrates a symbol-molecule. The molecule at the bottom of each panel gives the com-plete sequence information of a head-molecule or symbol-molecule while the molecules above thebottom one illustrate the relevant bases for individual steps. The sequence that belongs to an en-donuclease recognition site is bounded with a red (dark) box. The bases whose values are irrelevantto endonuclease recognition sites or unique sticky ends are denoted with “-”. In panels (a), (b) and(c), the unique sticky ends generated in Steps 5 and 8 are listed below the bottom molecule, andlabeled with q 9U r and q 9r r , respectively. Of these sticky ends, those generated by endonuclease MwoI are further labeled with blue background. See Table 5.2 for the values of s^tvu


Figure 5.16: Futile reactions during the operation of Autonomous DNA Turing Machine. Endonuclease recognition sites and cleavage sites are indicated with red (dark) boxes and pairs of red arrows, respectively


Figure 5.17: Encoding schemes used for 3-base sticky ends. The 3-base sequences are laid outin pairs such that two sequences in a pair are complementary to each other. Note that each 3-base sequence is written in 5’ to 3’ direction. During hybridization, the direction of one of thesequences will be reversed to 3’ to 5’. For example, AAC is paired with GTT. The reverse of GTTis TTG, and this sequence is complementary to AAC. The sequences used to encode the currentstate = and current color

3of head-molecule 2� � q � r ä�� and symbol-molecule q 9� r ä6� 2Î are bounded with

black boxes and labeled with3 f , 3Hw , 3 � , 3yx and

3 i (indicating color z � , z � , z   , z ¢ and z � ,respectively). The sequences used to encode the position types of head-molecules during Step 5 -8 are bounded with red boxes and labeled with 7{f , 7 w and 7Ü� (indicating the position types O � ,O � and O   , respectively). The sticky ends produced by endonuclease Mwo I are further indicatedwith blue background

of the whole Autonomous DNA Turing Machine. To fix this problem, we require that the

concentration of [oqp_� � p molecules stays sufficiently high in the system. This additional

requirement warrants the regeneration of [o!î and hence makes reaction F3b an innocuous

futile reaction.

In addition, we note that all the restriction reactions during the operation of Autonomous

DNA Turing Machine are reversible, and hence represent additional idling processes or in-

nocuous futile reactions.


5.1.8 Encoding Space

One major challenge in designing DNA nanomechanical devices is the limited encoding space dictated by the four letter vocabulary of the bases and by the sizes of the recognition, restriction and spacing regions of endonucleases. Figure 5.17 gives a table of all 64 possible 3-base sequences consisting of A, C, G and T. Among them, the sequences of 3-base sticky ends used in the construction of the Autonomous DNA Turing Machine are labeled with boxes.

The encoding schemes have the following properties. First, each sticky end is unique: this ensures that the transition of state and color, the motion of the head, and the restorations of symbol-molecules and head-molecules are conducted according to the designated rules. In addition, we need to ensure that there are no cross ligations between these sticky ends that can hinder the operation of the Autonomous DNA Turing Machine. Undesirable cross reactions could result from un-programmed hybridizations between sticky ends of molecules during different stages. Due to the limited encoding space available to 3-base sticky ends, we cannot avoid cross hybridization completely. However, we carefully ensure that such cross hybridizations only result in idling processes and do not block or alter the programmed operation of the Turing Machine. To this end, we require that cross hybridization only happen either between two sticky ends both generated by endonuclease Bsl I or between two sticky ends both generated by endonuclease Mwo I. A ligation product between two molecules resulting from such a cross hybridization will be cut back into the original molecules, and hence such a ligation only represents an idling process or an innocuous futile reaction (see Figure 5.16 (c)). Indeed, the cross hybridizations in Figure 5.17 are all of this nature, with the only exception of (CAG, CTG). However, we note that both

CAG and CTG are sticky ends generated for À   molecule. But two À   molecules can not

interact with each other since they are not neighbours.
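The size of the encoding space and the pairing convention used in Figure 5.17 can be checked with a short script. The following is a minimal sketch, not part of the original design tools: the function names are hypothetical, and it only enumerates sequences and their hybridization partners. It does not model which enzyme generates which sticky end.

```python
from itertools import product

# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the sequence, written 5' to 3', that hybridizes with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def all_sticky_ends(length=3):
    """Enumerate every sequence of the given length over A, C, G, T."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def complementary_pairs(length=3):
    """Group all sequences into unordered pairs (s, reverse_complement(s))."""
    pairs = set()
    for seq in all_sticky_ends(length):
        pairs.add(tuple(sorted((seq, reverse_complement(seq)))))
    return sorted(pairs)


if __name__ == "__main__":
    print(len(all_sticky_ends()))      # number of 3-base sequences
    print(len(complementary_pairs()))  # number of complementary pairs
    print(reverse_complement("AAC"))   # partner of AAC
    print(reverse_complement("CAG"))   # partner of CAG
```

Because a 3-base sequence can never be its own reverse complement, the 64 sequences fall into 32 disjoint pairs, which is why any two sticky ends drawn from the same pair (such as CAG and CTG) must be checked for unwanted ligation.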


5.1.9 Computer Simulation

The correct operation of the Autonomous DNA Turing Machine is verified using computer simulation (for details, see http://pengyin.org/paper/dnaUTM/).

5.2 Design of Autonomous DNA Cellular Automaton

In this section, we present the design of an Autonomous DNA Cellular Automaton which

offers the capacity for universal parallel computation, by mimicking the operation of a

2-color universal cellular automaton described in [104].

5.2.1 Introduction to Cellular Automata

A cellular automaton is a set of “colored” cells on a grid of specified shape that evolves through discrete time steps according to a set of transition rules based on the colors of neighboring cells [104]. If the grid is a one (resp. two) dimensional lattice, the cellular automaton is called a one (resp. two) dimensional cellular automaton. Figure 5.18 (a) shows the cells of an example one-dimensional cellular automaton. Each cell of this automaton can have one of two states, or equivalently two colors, WHITE and BLACK. In the initial configuration of this cellular automaton, all but one of the cells are WHITE. The evolution of a cellular automaton is specified by its transition rules. Figure 5.18 (b) illustrates one example rule set for the cellular automaton shown in Figure 5.18 (a). The rule set consists of 8 transition rules (rules (1) - (8) in the figure). For example, according to rule (1), if the current cell and both of its neighbors are BLACK, then at the next time step the middle cell changes to WHITE. This rule is denoted as BLACK, BLACK, BLACK → WHITE. Applying the rules in Figure 5.18 (b) to the initial configuration in Figure 5.18 (a), we obtain the evolution table depicted in Figure 5.18 (c). Cellular automata can have universal computing power. The cellular automaton depicted in Figure 5.18 is one such universal cellular automaton, known as rule 110, as described in [104].

Figure 5.18: A universal cellular automaton with two colors: Rule 110. The figure is adapted from Wolfram [104]
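The abstract behavior of rule 110 can be reproduced in a few lines. The sketch below is illustrative only: it encodes the two colors as 0 and 1 (taken here to be WHITE and BLACK, an assumption of this sketch), treats cells beyond either end of the row as WHITE, and uses hypothetical function names. It models the abstract automaton, not the DNA implementation.

```python
# Rule 110 lookup: (left, centre, right) -> new centre colour.
# 1 stands for BLACK and 0 for WHITE (an encoding chosen for this sketch).
RULE_110 = {
    (1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0,
}


def step(cells):
    """Apply one synchronous update; cells beyond either end are WHITE."""
    padded = [0] + list(cells) + [0]
    return [RULE_110[tuple(padded[i - 1:i + 2])]
            for i in range(1, len(padded) - 1)]


def evolve(cells, steps):
    """Return the list of configurations from time 0 to time `steps`."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(step(history[-1]))
    return history


if __name__ == "__main__":
    # All cells WHITE except the rightmost one.
    start = [0] * 15 + [1]
    for row in evolve(start, 8):
        print("".join("#" if cell else "." for cell in row))
```

Every cell is updated from the colors of the previous time step, which is the synchronous behavior that a molecular implementation has to preserve.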

Compared with a Turing machine, a cellular automaton has “more” computing power

by providing a capacity for parallel computation. In the Turing machine, there is only one

active head that scans the tape and performs the computation; in contrast, in the cellular

automata, all the cells execute computations simultaneously. Cellular automata thus also

represent a category of more compact universal computing devices [32, 38]. For exam-

ple, a shift-adder multiplier can be implemented in a one-dimensional cellular automaton,

while, in contrast, its implementation requires a two-dimensional tiling assembly in the

computational tiling scheme [99]. This further attests to the compactness of our proposed computing devices. There is a wide variety of known applications of cellular automata computations that can, as a consequence, be executed at the molecular scale: these include the discrete approximation of a large class of one- and two-dimensional partial differential equations, linear time execution of various matrix computations such as matrix product and inverse, and recognition of context free languages.

5.2.2 Structural Overview

Recall that there is only one active head in a Turing machine, and hence only one active

cell in the Turing machine at any time point. In contrast, in a cellular automaton, all the


Figure 5.19: Top panel: an abstract cellular automaton. Bottom panel: schematic drawing of the structure of an Autonomous DNA Cellular Automaton corresponding to the abstract cellular automaton in the top panel. The backbones of DNA strands are depicted as directed line segments. The short bars represent base pairing between DNA strands

Figure 5.20: Three endonucleases used in the molecular implementation of the Autonomous DNA Cellular Automaton. The recognition site of an enzyme is bounded by a box and the cleavage site is indicated with a pair of bold arrows. The symbol “ g ” indicates the position of a base that does not affect endonuclease recognition

cells are active and perform state transitions simultaneously. As such, given the design of

the universal DNA Turing machine, a challenge to extend it to a DNA cellular automaton

is to ensure a synchronization mechanism during the operation of the device: no cell can

advance more than one step in its state transition unless all other cells have also advanced

a step.
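A small abstract example shows why this synchronization requirement matters. The sketch below is an illustration under stated assumptions, not the molecular mechanism: it uses rule 110 with colors encoded as 0 and 1, treats cells beyond the row as 0, and compares a proper synchronous update with a naive left-to-right sweep in which each cell already sees the new color of its left neighbor. All function names are hypothetical.

```python
def rule110(left, centre, right):
    """New colour of the centre cell under rule 110 (bit lookup in 110)."""
    return (110 >> (4 * left + 2 * centre + right)) & 1


def synchronous_step(cells):
    """Every cell reads the OLD colours of both neighbours."""
    padded = [0] + list(cells) + [0]
    return [rule110(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(cells) + 1)]


def unsynchronized_sweep(cells):
    """Naive in-place sweep: a cell reads the NEW colour of its left neighbour."""
    cells = list(cells)
    for i in range(len(cells)):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i + 1 < len(cells) else 0
        cells[i] = rule110(left, cells[i], right)
    return cells


if __name__ == "__main__":
    start = [0, 1, 1]
    print(synchronous_step(start))      # [1, 1, 1]
    print(unsynchronized_sweep(start))  # [1, 0, 1]
```

The two results differ, so a device in which a cell may run ahead of its neighbors does not compute the intended automaton. Any implementation must therefore ensure that a cell's transition uses its neighbors' colors from the same time step.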

The Autonomous DNA Cellular Automaton operates in a solution system. Figure 5.19 illustrates an example abstract cellular automaton in the top panel, and the structure of the corresponding Autonomous DNA Cellular Automaton in the bottom panel. The Autonomous DNA Cellular Automaton is composed of four parts: a rigid symbol track, a linear array of dangling molecules tethered to the symbol track, a set of floating molecules, and a group of floating protein enzymes.

• Symbol track. The symbol track provides a rigid structural platform on which the dangling-molecules are tethered. It can be implemented, for example, as a rigid addressable DNA lattice, such as the barcode DNA lattice reported in [106].

• Dangling DNA molecules. The array of dangling-molecules, also called symbol-molecules, tethered to the symbol track represents the array of cells (symbols) in the cellular automaton (hence the name symbol-molecule). Recall that, as defined in Sect. 5.1.2, a dangling-molecule is a duplex DNA fragment, with one end tethered to the symbol track via a flexible single strand DNA fragment and the other end possessing a single strand DNA extension (the sticky end). Due to the flexibility of the single strand DNA linkage, a dangling-molecule moves rather freely around its joint on the symbol track.

We require that the only possible interactions between two dangling-molecules are those between two immediate neighbors. This requirement can be ensured by properly spacing the dangling-molecules along the rigid symbol track.

• Floating DNA molecules. In addition to the array of dangling-molecules, the system contains floating-molecules. Recall that, as defined in Sect. 5.1.2, a floating-molecule is a free floating (unattached to the tracks) duplex DNA segment with a single strand overhang at one end (sticky end). A floating-molecule floats freely in the solution and thus can interact with another floating-molecule or a dangling-molecule provided that they possess complementary sticky ends. There are two kinds of floating-molecules: the rule-molecules and the assisting-molecules. The rule-molecules collectively specify the computational rules and are the programmable part of the Autonomous DNA Cellular Automaton, while the assisting-molecules assist in carrying out the operations of the Autonomous DNA Cellular Automaton, which we describe in detail in Sect. 5.2.3.

• Protein enzymes. The system also contains floating DNA ligase and three types of DNA endonucleases. The enzymes perform ligations and cleavages on the DNA molecules to effect the designed structural changes and hence the information processing in the Autonomous DNA Cellular Automaton. The cleavage patterns of the endonucleases are described in Figure 5.20.

5.2.3 Operational Overview

Structural Changes

Figure 5.21 illustrates the structural changes during the operation of the Autonomous DNA Cellular Automaton.

Initial configuration. Figure 5.21 (a) depicts an example abstract cellular automaton in its top panel, and a corresponding Autonomous DNA Cellular Automaton in its bottom panel. For simplicity and clarity, the floating enzymes and the floating DNA molecules in the Autonomous DNA Cellular Automaton are omitted from the figure; the symbol track, as well as the duplex and sticky end portions of a dangling-molecule, is depicted as a thick line segment; the flexible hinge of a dangling-molecule is depicted as a thin curve. The leftmost symbol-molecule is a special initiator dangling-molecule, I, representing the cell 0 in the abstract cellular automaton (see Figure 5.21). To the right of I, three types of dangling-molecules, A, B, and C, are positioned evenly along the track in a periodic order such that cells 3i+1, 3i+2, and 3i+3, where i is a non-negative integer, in the abstract cellular automaton are represented in the Autonomous DNA Cellular Automaton by symbol-molecules A,


Figure 5.21: Structural changes during the operation of an Autonomous DNA Celluar Automaton

161

Page 181: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

Ê , and Ë , respectively. The symbol-molecules differ in their default sticky ends, i.e. the

sticky ends they possess in their respective initial configurations before the reaction starts.

As we shall see later, this is essential for the synchronization of the operation of the the

Autonomous DNA Celluar Automaton. The color of each cell in the abstract cellular au-

tomaton is encoded in a corresponding symbol-molecule in the Autonomous DNA Celluar

Automaton.
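The periodic layout just described can be sketched in a few lines of Python. This is only an illustration: the function names `molecule_type` and `track_layout` are hypothetical, and the labels I, A, B, and C simply stand for the initiator and the three dangling-molecule types.

```python
def molecule_type(cell_index: int) -> str:
    """Return the dangling-molecule type that represents a given cell.

    Cell 0 is the initiator I; cells 3i+1, 3i+2, 3i+3 map to A, B, C.
    """
    if cell_index < 0:
        raise ValueError("cell index must be non-negative")
    if cell_index == 0:
        return "I"
    return "ABC"[(cell_index - 1) % 3]


def track_layout(num_cells: int) -> list:
    """List the molecule types along the track, from left to right."""
    return [molecule_type(k) for k in range(num_cells)]


if __name__ == "__main__":
    print(track_layout(8))
```

Because only three molecule types are used, a molecule's type alone never identifies its cell; the position along the track does.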

Reaction wave. Figure 5.21 (b) illustrates structural changes. During the operation of the Autonomous DNA Cellular Automaton, the initiator molecule I sends out a “reaction wave” that travels down the track from left to right. In Stage 0, the reaction wave starts at the initiator I at position 0, then travels sequentially to A in Stage 1, B in Stage 2, and C in Stage 3. The reaction wave finishes one full cycle in Stages 1, 2, and 3, and thus proceeds inductively down the track. This reaction wave is also indicated by a red (dark) arrow in the abstract cellular automaton depicted in the bottom panel of Figure 5.21 (b).

In Stage $i$, where $i = 0, 1, 2$, and $3$, three types of reactions occur, namely reactions $i.1$, $i.2$, and $i.3$.

• In Stage 0, I has a complementary sticky end to its right neighbor A and is thus ligated to A, and the ligation product is subsequently cleaved by an endonuclease (Reaction 0.1). Next, I is “modified” by an assisting-molecule, depicted as a pink (grey) line segment, and restored to its default configuration (Reaction 0.2). The “modification” will be implemented as ligation and cleavage events and will be described in detail in Sect. 5.2.4. In a reaction parallel to reaction 0.2, A is also modified by another assisting-molecule such that A will possess a complementary sticky end to B, and thus the reaction wave is ready to enter Stage 1 (Reaction 0.3).

• In Stage 1, similar structural changes occur as in Stage 0. However, after reaction 1.1, A will possess a sticky end that encodes the state, i.e. color, information of itself, its left neighbor I, and its right neighbor B. In the ensuing reaction 1.2, a rule-molecule corresponding to a transition rule in Figure 5.18 recognizes A’s sticky end and effects a state transition of molecule A. A will then be modified by an assisting-molecule and restored to its default configuration, encoding its new state. In the example shown in Figure 5.21 (b), a rule-molecule corresponding to rule (7) in Figure 5.18 changes the color encoded in A from WHITE to BLACK. In the parallel reaction 1.3, B will be modified to possess a complementary sticky end to C.

• In Stages 2 and 3, reactions of the same nature as in Stage 1 will occur. We omit obvious details for brevity.

Pipelined reaction waves. Figure 5.21 (b) depicts the four stages of a single reaction wave. In fact, after both molecule I and molecule A are restored to their default configurations, I will start a new reaction wave. This reaction wave is indicated by a green (grey) arrow in Figure 5.21 (c). As such, multiple reaction waves travel down the track in a “pipelined” fashion. However, we have carefully engineered the system so that a reaction wave that starts at a later stage can never overtake one that starts earlier. This ensures the synchronization of the state changes of the Autonomous DNA Cellular Automaton, and hence its correct operation.
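The no-overtaking property can be checked on an abstract model. The following Python sketch is not the molecular design itself: it assumes binary states, fixed boundary cells at both ends (the left one playing the role of the initiator), and an arbitrarily chosen rule table (Wolfram rule 110). The names `make_rule`, `sync_step`, and `run_pipelined` are hypothetical. Each wave remembers the old state of the cell it has just updated, and a later wave may update a cell only after the earlier wave has already updated that cell's right neighbor.

```python
import random


def make_rule(number: int) -> dict:
    """Build a rule table (x, y, z) -> y' from a Wolfram-style rule number."""
    return {(x, y, z): (number >> (4 * x + 2 * y + z)) & 1
            for x in (0, 1) for y in (0, 1) for z in (0, 1)}


def sync_step(cells, rule):
    """One synchronous step of the abstract automaton (boundaries fixed)."""
    new = list(cells)
    for k in range(1, len(cells) - 1):
        new[k] = rule[(cells[k - 1], cells[k], cells[k + 1])]
    return new


def run_pipelined(cells, rule, num_waves, seed=0):
    """Run several waves concurrently, advancing one randomly chosen wave at a time."""
    cells = list(cells)
    if num_waves == 0:
        return cells
    rng = random.Random(seed)
    last = len(cells) - 2                   # rightmost cell that is updated
    pos = [1] * num_waves                   # next cell each wave will update
    left_old = [cells[0]] * num_waves       # old state of the cell to the left
    while pos[-1] <= last:
        # A later wave must wait until the previous wave has updated
        # the right neighbor of the cell it wants to update.
        ready = [j for j in range(num_waves)
                 if pos[j] <= last
                 and (j == 0 or pos[j - 1] >= min(pos[j] + 2, last + 1))]
        j = rng.choice(ready)
        p = pos[j]
        old = cells[p]
        cells[p] = rule[(left_old[j], old, cells[p + 1])]
        left_old[j] = old
        pos[j] += 1
    return cells


if __name__ == "__main__":
    rule = make_rule(110)
    start = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0]
    expected = start
    for _ in range(5):
        expected = sync_step(expected, rule)
    print(run_pipelined(start, rule, 5) == expected)
```

Under these assumptions, every random interleaving of the waves yields the same final configuration as the corresponding number of synchronous steps, which is the sense in which the waves stay synchronized.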

Information Flow

We next describe the information flow during the operation of the Autonomous DNA Cellular Automaton. For ease of exposition, we use the notation introduced in Sect. 5.1.2.

Initial configuration. Figure 5.22 (a) shows the Autonomous DNA Cellular Automaton in its default configuration before the reaction starts. The top panel depicts an example abstract cellular automaton; the middle panel depicts the simplified structure of the Autonomous DNA Cellular Automaton; the bottom panel depicts the symbol-molecules as information encoding molecules using the notation we just introduced. As mentioned above, molecules A, B, and C possess different default sticky ends, namely, $[\bar{a}]$, $[\bar{b}]$, and $[\bar{c}]$ respectively. Note that the state information $y$, $z$, and $x$ is encoded in the duplex portions of A, B, and C, not their sticky ends. This is essential to ensure that repeated reactions between neighboring symbol-molecules can occur for multiple rounds, as described below.

Figure 5.22: Information flow during the operation of the Autonomous DNA Cellular Automaton

Information flow. Figure 5.22 (b) illustrates the information flow during the operation of the Autonomous DNA Cellular Automaton. We follow the framework of four-stage structural changes presented above and enumerate the involved reactions below.

1. Reaction 0.1. Initiator molecule $I^{x}[a]$ and its immediate right neighbor $[\bar{a}]A^{y}$ share complementary sticky ends $[a]$ and $[\bar{a}]$, which results in the reaction,

$$I^{x}[a] + [\bar{a}]A^{y} \rightarrow I[xy] + [\overline{xy}]A$$

Note that the sticky end $[\overline{xy}]$ of product A encodes both the state information $x$ from reactant I and the state information $y$ from reactant A.

2. Reaction 0.2. The rule-molecule $[\overline{xy}]R$ restores $I[xy]$ to its default configuration in the reaction,

$$I[xy] + [\overline{xy}]R \rightarrow I^{x}[a] + [\bar{a}]R$$

3. Reaction 0.3. Molecule $[\overline{xy}]A$ is modified by assisting-molecule ${}^{xy}[xy]$ in the reaction

$${}^{xy}[xy] + [\overline{xy}]A \rightarrow [\bar{b}] + [b]A^{xy}$$

Now A is transformed to $[b]A^{xy}$. This essentially shifts or transduces the state information $xy$ initially encoded in the sticky end of A to its duplex portion. Hence we term the assisting-molecule ${}^{xy}[xy]$ a transducer-molecule. The above reaction also modifies A’s sticky end to $[b]$, which is complementary to the default sticky end of A’s immediate right neighbor B. This effectively makes A ready to interact with B.

4. Reaction 1.1. Molecule $A^{xy}[b]$ interacts with its right neighbor $[\bar{b}]B^{z}$ in the reaction,

$$A^{xy}[b] + [\bar{b}]B^{z} \rightarrow A[xyz] + [\overline{xyz}]B$$

Now the sticky end of the product A encodes state information $xyz$, that is, the current state of A’s left neighbor, the current state of A, and the current state of A’s right neighbor. This suffices to specify a transition rule shown in Figure 5.18 and results in Reaction 1.2 below.

5. Reaction 1.2. Reaction 1.2 has two steps. In step 1.2.1, $A[xyz]$ interacts with a rule-molecule $[\overline{xyz}]R^{y'}$ in the reaction,

$$A[xyz] + [\overline{xyz}]R^{y'} \rightarrow A[ty'] + [\overline{ty'}]R$$

This essentially effects a state transition of molecule A, as specified by the rule $xyz \rightarrow y'$. However, for A to repeatedly perform computation, we need to restore A to its default configuration, i.e., a configuration with a default sticky end $[\bar{a}]$ and encoding its new state $y'$ in its duplex portion. This task is carried out by another kind of assisting-molecule called extension-molecule E. However, as a floating molecule, E needs not only to recognize A’s current state but also to distinguish A from the other two types of symbol-molecules, B and C. As such, we require that the sticky end of A encode not only its state information but also its type information $t$, where $t \in \{t_A, t_B, t_C\}$. Hence, in the above equation, the product A possesses a sticky end $[ty']$ encoding both its type information $t$ and its new state $y'$. This molecule $A[ty']$ is then modified by extension-molecule $[\overline{ty'}]E$ in reaction 1.2.2,

$$A[ty'] + [\overline{ty'}]E \rightarrow A^{y'}[\bar{a}] + [a]E$$

Note that in Figure 5.22 (b), we have omitted $y'$ to avoid “overloading” the figure.

6. Reaction 1.3. Molecule $[\overline{xyz}]B$ is modified by transducer-molecule ${}^{yz}[xyz]$ in the reaction,

$${}^{yz}[xyz] + [\overline{xyz}]B \rightarrow [\bar{c}] + [c]B^{yz}$$

Note that now B encodes state $yz$ in its duplex portion (state $x$ is not kept since it is not required for effecting B’s transition), and possesses sticky end $[c]$, which is complementary to the default sticky end of C.

7. Other reactions. Similar to reactions 1.1, 1.2, and 1.3, and hence omitted for brevity.
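The bookkeeping in reactions 0.1 through 1.3 can be traced with a small symbolic model. The Python sketch below is an abstraction under stated assumptions, not the molecular implementation: the class `Molecule`, the helper names, the string labels "a", "~a", "b", "~b", "c" for default sticky ends, and the use of a dictionary as the rule table are all hypothetical. Steps 1.2.1 and 1.2.2 are merged into one `transition` call, and the type information carried in the intermediate sticky end is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    name: str
    duplex: tuple  # state information stored in the duplex portion
    sticky: tuple  # ("default", label) or ("info", states)


def complementary(s1, s2):
    """Default sticky ends pair as label / '~label' (illustrative convention)."""
    return (s1[0] == s2[0] == "default"
            and (s1[1] == "~" + s2[1] or s2[1] == "~" + s1[1]))


def ligate_and_cleave(left, right):
    """Reactions 0.1 and 1.1: both sticky ends now carry the combined states."""
    if not complementary(left.sticky, right.sticky):
        raise ValueError("sticky ends are not complementary")
    info = left.duplex + right.duplex
    left.duplex, right.duplex = (), ()
    left.sticky = right.sticky = ("info", info)


def restore_initiator(initiator):
    """Reaction 0.2: the initiator regains its state and default sticky end."""
    initiator.duplex = initiator.sticky[1][:1]
    initiator.sticky = ("default", "a")


def transduce(mol, next_label):
    """Reactions 0.3 and 1.3: move the last two states into the duplex portion."""
    mol.duplex = mol.sticky[1][-2:]
    mol.sticky = ("default", next_label)


def transition(mol, rule, default_label):
    """Reaction 1.2: apply the rule xyz -> y' and restore the default sticky end."""
    mol.duplex = (rule[mol.sticky[1]],)
    mol.sticky = ("default", default_label)


def run_first_stages(x, y, z, rule):
    """Trace Stage 0 and Stage 1 for the first three molecules on the track."""
    I = Molecule("I", (x,), ("default", "a"))
    A = Molecule("A", (y,), ("default", "~a"))
    B = Molecule("B", (z,), ("default", "~b"))
    ligate_and_cleave(I, A)    # reaction 0.1
    restore_initiator(I)       # reaction 0.2
    transduce(A, "b")          # reaction 0.3
    ligate_and_cleave(A, B)    # reaction 1.1
    transition(A, rule, "~a")  # reaction 1.2
    transduce(B, "c")          # reaction 1.3
    return I, A, B
```

In this model, B ends Stage 1 holding the old state of A in its duplex portion, which is why A's transition does not disturb the information B needs for its own transition.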

5.2.4 Step-by-step Implementation

To demonstrate the practicality of our design, we next give a detailed description of the molecular implementation of the Autonomous DNA Cellular Automaton. We follow the framework of four-stage structural changes and information flow presented in Sect. 5.2.3 and enumerate the involved reactions below. The complete DNA molecule set will be described in detail in Sect. 5.2.5.

1. Reaction 0.1. Figure 5.23 depicts an example molecular implementation of reaction 0.1,

$$I^{x}[a] + [\bar{a}]A^{y} \rightarrow I[xy] + [\overline{xy}]A$$

For simplicity, only the end sequences of dangling-molecule A are depicted; for full sequences, see Figure 5.28. Panels (a) and (b) respectively illustrate the cases when $y$ = WHITE and $y$ = BLACK. In molecule $[\bar{a}]A^{y}$, the state information $y \in \{$WHITE, BLACK$\}$ is encoded by the presence or absence of a DNA base pair between the sticky end $[\bar{a}]$ (sequence TA) and the half recognition site for endonuclease Bsl I (sequence GG/CC) in the duplex portion. This is further indicated in Figure 5.23 by the shaded blue (grey) region. In the case $y$ = WHITE, the cleavage of ligation product IA by Bsl I produces a sticky end sequence GGT for molecule I, and CCA for molecule A. Both these unique sticky end sequences encode state information $xy$.

Note that here molecule A contains a pair of unnatural bases, i.e. synthetic bases other than the natural bases A, C, G, and T. They are required because the four-letter ACGT natural vocabulary does not provide sufficient encoding space for our construction. For a survey on experimental synthesis of unnatural bases, see [30].

Figure 5.23: Example molecular implementation of reaction 0.1. The red (dark) box and red (dark) arrows respectively indicate the recognition and cleavage sites for endonuclease Bsl I. The encoded state information is indicated with blue (grey) region. The figure also contains a pair of unnatural bases

2. Reaction 0.2. Figure 5.24 depicts an example molecular implementation of reaction 0.2,

$$I[xy] + [\overline{xy}]R \rightarrow I^{x}[a] + [\bar{a}]R$$

The endonuclease involved is EcoP15 I. This restores I to its default configuration with sticky end sequence $[a]$ = AT.

Figure 5.24: Example molecular implementation of reaction 0.2. The red (dark) box and red (dark) arrows respectively indicate the recognition and cleavage sites for endonuclease EcoP15 I

Figure 5.25: Example molecular implementation of reaction 0.3. The red (dark) box and red (dark) arrows respectively indicate the recognition and cleavage sites for endonuclease EcoP15 I. The encoded state information is indicated with blue (grey) region

3. Reaction 0.3. Figure 5.25 depicts an example molecular implementation of reaction 0.3,

$${}^{xy}[xy] + [\overline{xy}]A \rightarrow [\bar{b}] + [b]A^{xy}$$

Here, we only illustrate the case $y$ = WHITE, and omit the similar case $y$ = BLACK for brevity. Note that the state information $xy$ initially encoded in the sticky end $[\overline{xy}]$ (sequence CCA) of A is now encoded in the blue (grey) region in its duplex portion (sequence CGA/GCT).

4. Reaction 1.1. Figure 5.26 depicts an example molecular implementation of reaction 1.1,

$$A^{xy}[b] + [\bar{b}]B^{z} \rightarrow A[xyz] + [\overline{xyz}]B$$

Again, we only illustrate the case $y$ = WHITE for brevity. This reaction is similar to reaction 0.1. Both the sticky ends $[xyz]$ and $[\overline{xyz}]$ encode state information $xyz$.

Two technical points warrant explanation in this reaction. First, the bases labeled with red (dark) circles contain phosphorothioate bonds and are hence resistant to enzyme cleavage. This modification of the bases is required to prevent the unwanted cleavage of the DNA duplex by Mwo I, whose recognition sites are indicated with blue (grey) boxes in the figure. This trick will be used again in reaction 1.2.2. Second, we assume here that the cleavage by endonuclease Bsl I will occur, but the cleavage by EcoP15 I will not occur (note that both molecule A and molecule B contain an EcoP15 I recognition site, i.e. CAGCAG/CTGCTG). This assumption is based on the fact that Bsl I, a Type II endonuclease, is far more efficient than EcoP15 I, a Type III endonuclease.

Figure 5.26: Example molecular implementation of reaction 1.1. The red (dark) box and red (dark) arrows respectively indicate the recognition and cleavage sites for endonuclease Bsl I. The encoded state information is indicated with blue (grey) region. The bases labeled with red (dark) circles contain phosphorothioate bonds and are hence resistant to enzyme cleavage. The light blue (grey) box indicates the Mwo I recognition site

5. Reaction 1.2.1. Figure 5.27 (a) depicts an example molecular implementation of reaction 1.2.1,

$$A[xyz] + [\overline{xyz}]R^{y'} \rightarrow A[ty'] + [\overline{ty'}]R$$

Recall that this reaction effects a state transition for molecule A, as specified by the rule $xyz \rightarrow y'$. Here, the rule molecule incorporates a spacer region, the length of which (in bp) encodes the new state information $y'$: one spacer length yields $y'$ = BLACK and a different spacer length yields $y'$ = WHITE.

When $y'$ = WHITE (the EcoP15 I cleavage step not shown in Figure 5.27 (a)), molecule A is restored to the desired target configuration $[\bar{a}]A^{y'}$, with the default sticky end $[\bar{a}]$ = AT. In this case, the next step, reaction 1.2.2, is not required. The reaction can thus be rewritten as,

$$A[xyz] + [\overline{xyz}]R^{y'} \rightarrow A^{y'}[\bar{a}] + [a]R$$

However, when $y'$ = BLACK (the EcoP15 I cleavage step shown in the figure), molecule A is modified into $A[ty']$, with a unique sticky end $[ty']$ = 1G that encodes both A’s type information $t = t_A$ and A’s new state $y'$. (Recall that $t \in \{t_A, t_B, t_C\}$ encodes type information; in the case illustrated here, $t = t_A$. This information is initially encoded in the blue (grey) duplex portion of A, in the form of the sequence shown in the figure.) Then the reaction proceeds to the next step, reaction 1.2.2, which will finish the state transition for A.

Note that the spacer length determines the transition. Its value is determined cooperatively by the states involved in the transition. For details, see Figure 5.29.

6. Reaction 1.2.2. Figure 5.27 (b) depicts the case $y$ = WHITE, $y'$ = BLACK, which follows from the case illustrated in Figure 5.27 (a).

$$A[ty'] + [\overline{ty'}]E \rightarrow A^{y'}[\bar{a}] + [a]E$$

Here, extension-molecule $[\overline{ty'}]E$ restores A to its default configuration with sticky end $[\bar{a}]$ = AT. However, now A encodes new state $y'$ in its duplex portion.

We also illustrate the case when $y$ = BLACK, $y'$ = WHITE in Figure 5.27 (c).

Note that molecule A contains a recognition site for EcoP15 I (not shown in the figure), which could introduce an unwanted cut in molecule E. To prevent such an unwanted cut, as in reaction 1.1, we modify relevant bases in E with phosphorothioate bonds, which are resistant to enzyme cleavage. Details omitted for brevity.

7. Other reactions. Similar to the above reactions, and hence omitted for brevity.
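Since several of the steps above depend on where an enzyme recognition site does or does not occur, a simple scan for a site on both strands can be useful when checking candidate sequences. The Python sketch below uses only the EcoP15 I site quoted in the text (CAGCAG/CTGCTG); the function names and the example sequences are hypothetical, and unnatural bases are not handled beyond being treated as non-matching characters.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence over the natural bases A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_sites(top_strand: str, site: str = "CAGCAG") -> list:
    """Locate a recognition site on either strand of a duplex.

    The duplex is given by its top strand, read 5' to 3'. Each hit is
    (position, strand): '+' if the site reads on the top strand, '-' if it
    reads on the bottom strand. Positions are 0-based on the top strand.
    """
    rc_site = reverse_complement(site)
    hits = []
    for i in range(len(top_strand) - len(site) + 1):
        window = top_strand[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc_site:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    print(find_sites("TTCAGCAGAA"))  # site on the top strand
    print(find_sites("AACTGCTGTT"))  # site on the bottom strand
```

Such a scan only finds sites; whether a found site is actually cleaved depends on factors discussed above, such as phosphorothioate protection and relative enzyme efficiency.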

5.2.5 Complete Molecule Sets

We next describe the complete set of DNA molecules that constitute the molecular implementation of the Autonomous DNA Cellular Automaton. The dangling-molecules are depicted in Figure 5.28. The floating rule-molecules are depicted in Figure 5.29. These two parts are the programmable parts of the Autonomous DNA Cellular Automaton: the selections of dangling-molecules and rule-molecules respectively determine the initial configuration and the transition rules of the Autonomous DNA Cellular Automaton. Note that all the sticky ends of rule-molecules are unique.
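Since one rule-molecule is needed per neighborhood configuration, the programmable rule set corresponds to a table with eight entries. The following Python sketch enumerates such a table. It assumes a Wolfram-style rule number and the bit convention WHITE = 0, BLACK = 1; both are illustrative choices, and `build_rule_table` is a hypothetical name.

```python
from itertools import product

COLORS = ("WHITE", "BLACK")  # assumed bit convention: WHITE = 0, BLACK = 1


def build_rule_table(rule_number: int) -> dict:
    """Map each configuration (x, y, z) to the new state y' of the middle cell."""
    if not 0 <= rule_number < 256:
        raise ValueError("a one-dimensional two-state rule number lies in 0..255")
    table = {}
    for x, y, z in product((0, 1), repeat=3):
        index = 4 * x + 2 * y + z
        new_bit = (rule_number >> index) & 1
        table[(COLORS[x], COLORS[y], COLORS[z])] = COLORS[new_bit]
    return table


if __name__ == "__main__":
    for config, new_state in build_rule_table(110).items():
        print(config, "->", new_state)
```

Changing the rule number changes only the right-hand sides of the eight entries, mirroring the statement that the choice of rule-molecules determines the transition rules.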

In contrast, the floating assisting-molecules only assist in the proper operation of the Autonomous DNA Cellular Automaton and are non-programmable. Recall that there are two types of assisting-molecules, transducer-molecules and extension-molecules. They are respectively depicted in Figure 5.30 and Figure 5.31.

5.2.6 Futile Reactions

As in the Autonomous DNA Turing Machine, futile reactions exist in the systems of the Autonomous DNA Cellular Automaton (see Sect. 5.1.7). An example is shown in Figure 5.32.

5.2.7 Computer Simulation

The correct operation of the Autonomous DNA Cellular Automaton is verified using computer simulation (for details, see http://pengyin.org/paper/dnaCA/).

Figure 5.27: Example molecular implementation of reaction 1.2. Panel (a): reaction 1.2.1. Panels (b) and (c): reaction 1.2.2. The red (dark) box and red (dark) arrows respectively indicate the recognition and cleavage sites for endonuclease EcoP15 I. The encoded state information is indicated with blue (grey) region

Figure 5.28: Complete set of dangling-molecules. Panels (a) and (b) respectively depict the two cases of the encoded state information. The three non-ACGT base pairs are unnatural bases [30]

Figure 5.29: Complete set of rule-molecules. The eight columns (1-8) correspond to the eight possible configurations of $xyz$ in the rule $xyz \rightarrow y'$, where $x, y, z, y' \in \{$WHITE, BLACK$\}$. Each column is labeled with a three-symbol pattern denoting its configuration. Note that the spacer length is determined cooperatively by the states involved in the transition

Figure 5.30: Complete set of transducer-molecules. The eight columns (1-8) correspond to the eight possible configurations of $xyz$ in the rule $xyz \rightarrow y'$, where $x, y, z, y' \in \{$WHITE, BLACK$\}$. Each column is labeled with a three-symbol pattern denoting its configuration. The bases labeled with red (dark) circles contain phosphorothioate bonds and are hence resistant to enzyme cleavage (see reaction 1.1, Figure 5.26)

Figure 5.31: Complete set of extension-molecules. Panels (a) and (b) respectively depict the two cases in which the transition changes the state of the middle cell between WHITE and BLACK, where $x, z \in \{$WHITE, BLACK$\}$

Figure 5.32: An example futile reaction. The red (dark) box and red (dark) arrows respectively indicate the recognition and cleavage sites for endonuclease Mwo I

5.2.8 Two-Dimensional Autonomous DNA Cellular Automaton

We next briefly describe how to extend the one-dimensional (1-D) cellular automaton to two dimensions (2-D).

Structure. In place of a linear array of dangling-molecules, the 2-D Autonomous DNA Cellular Automaton requires the dangling-molecules to be arranged in a 2-D lattice, corresponding to the layout of the cells in an abstract 2-D cellular automaton. This can be readily achieved by embedding dangling-molecules in a 2-D DNA lattice, for example, a rhombus lattice [49] as shown in Figure 5.33.

Operation. To illustrate the operational principle of the 2-D Autonomous DNA Cellular Automaton, we first present an abstract view of the 1-D Autonomous DNA Cellular Automaton in Figure 5.34 (a) and (b). Figure 5.34 (a) illustrates a reaction wave of the 1-D Autonomous DNA Cellular Automaton: the reaction wave starts at initiator I and travels sequentially down the one-dimensional track. Figure 5.34 (b) examines one individual dangling-molecule $X$, where $X$ = A, B, C. Assume w.l.o.g. that $X$ = B. As shown in Figure 5.34 (b), B in a 1-D Autonomous DNA Cellular Automaton undergoes the following four phases in one full reaction cycle.

1. Phase 1. B has a sticky end that is complementary to its left neighbor A (indicated by a solid square on the left of $X$ in Figure 5.34 (b)). This is before reaction 1.1 as depicted in Figure 5.22. In this phase, B encodes in its duplex portion its own state information, denoted by $C$ ($C$ for center).

2. Phase 2. In reaction 1.1, B interacts with its left (i.e. west) neighbor, and enters phase 2. Now B encodes in its sticky end both the state information of itself, denoted by $C$, and the state information of its west neighbor, denoted by $W$. This sticky end is complementary to a floating transducer-molecule, indicated by a circle around $X$.

3. Phase 3. In reaction 1.3, B is modified by the floating transducer-molecule and enters phase 3. Now it encodes state information $CW$ in its duplex portion, and possesses a sticky end complementary to its right (i.e. east) neighbor (indicated by a solid square on the right of $X$ in Figure 5.34 (b)).

4. Phase 4. In reaction 2.1, B interacts with its east neighbor, and enters phase 4. Now B encodes in its sticky end the state information of itself $C$, the state information of its west neighbor $W$, and the state information of its east neighbor $E$. This sticky end thus encodes sufficient state information to effect a state transition for B, and is recognized by a floating rule-molecule (indicated by a thick circle). In the ensuing reaction 2.2, B is modified by the floating rule-molecule, undergoes a state transition, is subsequently restored to its default configuration, and thus re-enters phase 1, encoding new state information $C'$ ($C'$ not shown in the figure).

With the above understanding of the 1-D Autonomous DNA Cellular Automaton, we can extend it to 2-D in the following straightforward fashion. First, we take care of reaction waves by positioning two arrays of initiators as shown in Figure 5.34 (c). Each initiator can send out a reaction wave that travels either horizontally or vertically. Next, we take care of the information flow and synchronization, by again examining one single molecule $X$. As shown in Figure 5.34 (d), we engineer the system such that molecule $X$ undergoes 8 phases. During these 8 phases, $X$ sequentially interacts with its west ($W$), north ($N$), east ($E$), and south ($S$) neighbors to garner the state information from each of them. As such, upon entering phase 8, $X$ carries in its sticky end the state information $CWNES$. This state information is sufficient to effect a state transition for $X$. As in the 1-D case, $X$ will undergo a state transition and re-enter phase 1, completing a full cycle.
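The order in which a cell collects neighbor information in the 2-D case can be mirrored in an abstract model. The Python sketch below is illustrative only: it assumes binary states, fixed border cells, and an arbitrary parity rule standing in for the real transition rules; `gather`, `step_2d`, and `parity_rule` are hypothetical names.

```python
def gather(grid, r, c):
    """Collect state information in the order C, W, N, E, S for cell (r, c)."""
    info = [("C", grid[r][c])]
    for label, dr, dc in (("W", 0, -1), ("N", -1, 0), ("E", 0, 1), ("S", 1, 0)):
        info.append((label, grid[r + dr][c + dc]))
    return info


def step_2d(grid, rule):
    """One synchronous step on the interior cells; border cells stay fixed."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            states = tuple(state for _, state in gather(grid, r, c))
            new[r][c] = rule(states)
    return new


def parity_rule(states):
    """Example rule (an assumption): new state is the parity of C, W, N, E, S."""
    return sum(states) % 2


if __name__ == "__main__":
    grid = [[0, 0, 0],
            [1, 0, 0],
            [0, 0, 0]]
    print(gather(grid, 1, 1))
    print(step_2d(grid, parity_rule))
```

A rule over five binary inputs has 32 configurations rather than 8, which gives a sense of why the encoding space becomes a concern in the 2-D case.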

Figure 5.33: Structural overview of two-dimensional Autonomous DNA Cellular Automaton

Molecular implementation. It is conceivable that, using schemes similar to those presented in the one-dimensional case, we can implement the 2-D Autonomous DNA Cellular Automaton using DNA molecules. A potential technical difficulty is that the encoding space will be exhausted if we are restricted to the four-letter vocabulary of natural DNA bases, namely A, T, C, G. This difficulty can be overcome by using an expanded DNA base vocabulary that includes synthetic DNA bases besides the natural ones [30].

5.3 Discussion

In this chapter, we present the designs of an Autonomous DNA Turing Machine and an Autonomous DNA Cellular Automaton. In addition to general design principles, we give detailed molecular implementations using commercially available enzymes.

As a consequence of its universal computation, the Autonomous DNA Turing Machine demonstrates universal translational motion. This motion is a symbolic motion in the sense that no physical entity is moved from one location to the other. Instead, the motion is the motion of the active head symbol relative to the tracks. A nanorobotics challenge is to extend the Autonomous DNA Turing Machine to a device that can move a physical entity, probably a DNA fragment, in a universal translational motion fashion. As a first step, it is conceivable that a DNA nanomechanical device that moves a DNA fragment bidirectionally along the track can be designed and possibly experimentally constructed.

Figure 5.34: (a) (b) Operational overview of one-dimensional Autonomous DNA Cellular Automaton. (c) (d) Operational overview of two-dimensional Autonomous DNA Cellular Automaton. In panels (b) and (d), black numbers indicate the phases of the dangling-molecule $X$; blue (grey) numbers indicate reactions corresponding to reactions depicted in Figure 5.22; red (dark) letters indicate the state information carried by $X$

Our complex designs of the Autonomous DNA Turing Machine and the Autonomous DNA Cellular Automaton make some unconventional assumptions. Two lines of recent work lend partial experimental support to the practicality of our designs. The first one is the autonomous DNA finite state automata constructed by Shapiro’s group [12, 13, 14], in which a cascade of cleavages and ligations drives the operation of the machine. A more relevant study is our experimental construction of the autonomous unidirectional DNA walker that moves along a DNA track, described in Chapter 4 [113]. This walking device exploits enzyme reactions very similar to those used in the designs of the Autonomous DNA Turing Machine and the Autonomous DNA Cellular Automaton, e.g. the ligation and cleavage of DNA duplexes tethered to another DNA nanostructure and the ligation of DNA fragments with 3-base overhangs at a relatively high temperature (37 °C).

Though a full experimental implementation of the Autonomous DNA Turing Machine or Autonomous DNA Cellular Automaton appears daunting, due to the rich set of molecules, reactions, and futile reactions involved, it might be possible to experimentally test a subset of the mechanisms described here. Another challenge to experimental demonstration of the Autonomous DNA Turing Machine or Autonomous DNA Cellular Automaton is the design of an output detection mechanism.

Many futile reactions happen in the background during the operation of the Autonomous DNA Turing Machine or Autonomous DNA Cellular Automaton. A key feature of these futile reactions is that they are fully reversible. This is critical in ensuring the autonomous operation of the Autonomous DNA Turing Machine or Autonomous DNA Cellular Automaton, as explained below. We initially supply the system with sufficiently high concentrations of rule-molecules and assisting-molecules as well as all the byproducts generated in the futile reactions. As such, the futile reactions will reach a dynamic balance and the concentrations of all the components involved in the futile reactions, including both the “active” components essential for the operation of the Autonomous DNA Turing Machine or Autonomous DNA Cellular Automaton and the “futile” byproducts, will stay relatively constant during the operation of the Autonomous DNA Turing Machine or Autonomous DNA Cellular Automaton. Note that since the active components will not be depleted by the futile reactions (which could happen were some futile reactions irreversible), the autonomous operation of the Autonomous DNA Turing Machine or Autonomous DNA Cellular Automaton will not be disrupted. However, these futile reactions decrease the efficiency of the Autonomous DNA Turing Machine or Autonomous DNA Cellular Automaton. A desirable improvement of the current design is to decrease the level of futile reactions and thus increase the efficiency and robustness of the Autonomous DNA Turing Machine or Autonomous DNA Cellular Automaton.
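The role of reversibility can be illustrated with a toy mass-action model. The Python sketch below is not a model of the actual enzyme kinetics: it assumes a single futile reaction of the form X + Y ⇌ Z with made-up rate constants and concentrations, integrated with a simple forward Euler scheme; `simulate` is a hypothetical name. With a nonzero reverse rate the active component X settles at a constant level, whereas with a zero reverse rate it is steadily consumed.

```python
def simulate(x, y, z, kf, kr, dt=1e-3, steps=50_000):
    """Integrate X + Y <-> Z with forward rate kf and reverse rate kr."""
    for _ in range(steps):
        flux = (kf * x * y - kr * z) * dt  # net amount converted this step
        x -= flux
        y -= flux
        z += flux
    return x, y, z


if __name__ == "__main__":
    # Reversible futile reaction: concentrations settle at a dynamic balance.
    x_rev, y_rev, z_rev = simulate(1.0, 1.0, 0.0, kf=1.0, kr=1.0)
    print("reversible:  ", round(x_rev, 4), round(z_rev, 4))

    # Irreversible variant (kr = 0): the active component keeps being depleted.
    x_irr, y_irr, z_irr = simulate(1.0, 1.0, 0.0, kf=1.0, kr=0.0)
    print("irreversible:", round(x_irr, 4), round(z_irr, 4))
```

In the reversible case the balance point satisfies kf·x·y = kr·z, so supplying the byproduct Z at the start, as described above, simply moves the system closer to that balance from the outset.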


Bibliography

[1] http://mrsec.wisc.edu/edetc/selfassembly/.

[2] DNA triangles and self-assembled hexagonal tilings. J. Am. Chem. Soc.,126:13924–13925, 2004.

[3] L. Adleman. Molecular computation of solutions to combinatorial problems. Sci-ence, 266:1021–1024, 1994.

[4] L. Adleman. Towards a mathematical theory of self-assembly. Technical Report00-722, University of Southern California, 2000.

[5] L. Adleman, Q. Cheng, A. Goel, and M. D. Huang. Running time and programsize for self-assembled squares. In Proceedings of the thirty-third annual ACMsymposium on Theory of computing, pages 740–748. ACM Press, 2001.

[6] L. Adleman, Q. Cheng, A. Goel, M. D. Huang, D. Kempe, P. M. de Espans, andP. W. K. Rothemund. Combinatorial optimization problems in self-assembly. InProceedings of the thirty-fourth annual ACM symposium on Theory of computing,pages 23–32. ACM Press, 2002.

[7] L. Adleman, Q. Cheng, A. Goel, M. D. Huang, and H. Wasserman. Linear self-assemblies: Equilibria, entropy, and convergence rate. In Sixth International Con-ference on Difference Equations and Applications, 2001.

[8] L. Adleman, J. Kari, L. Kari, and D. Reishus. On the decidability of self-assemblyof infinite ribbons. In Proceedings of the 43rd Symposium on Foundations of Com-puter Science, pages 530–537, 2002.

[9] G. Aggarwal, M. H. Goldwasser, M. Y. Kao, and R. T. Schweller. Complexities forgeneralized models of self-assembly. In Proceedings of 15th annual ACM-SIAMSymposium on Discrete Algorithms (SODA), pages 880–889. ACM Press, 2004.

[10] P. Alberti and J. L. Mergny. DNA duplex-quadruplex exchange as the basis for ananomolecular machine. Proc. Natl. Acad. Sci. USA, 100:1569–1573, 2003.

[11] V. Balzani, A. Credi, and M. Veturi, editors. Molecular Devices and Machines.Wiley-VCH, 2002.

182

Page 202: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

[12] Y. Benenson, R. Adar, T. Paz-Elizur, Z. Livneh, and E. Shapiro. DNA moleculeprovides a computing machine with both data and fuel. Proc. Natl. Acad. Sci. USA,100:2191–2196, 2003.

[13] Y. Benenson, B. Gil, U. Ben-Dor, R. Adar, and E. Shapiro. An autonomous molec-ular computer for logical control of gene expression. Nature, 429:423–429, 2004.

[14] Y. Benenson, T. Paz-Elizur, R. Adar, E. Keinan, Z. Livneh, and E. Shapiro. Pro-grammable and autonomous computing machine made of biomolecules. Nature,414:430–434, 2001.

[15] B. A. Bondarenko. Generalized Pascal Triangles and Pyramids, Their Fractals,Graphs and Applications. The Fibonacci Association, 1993. Translated from Rus-sion and edited by R. C. Bollinger.

[16] N. Bowden, A. Terfort, J. Carbeck, and G. M. Whitesides. Self-assembly ofmesoscale objects into ordered two-dimensional arrays. Science, 276(11):233–235,1997.

[17] R. F. Bruinsma, W. M. Gelbart, D. Reguera, J. Rudnick, and R. Zandi. Viral self-assembly as a thermodynamic process. Phys. Rev. Lett., 90(24):248101, 2003 June20.

[18] H. L. Chen, Q. Cheng, A. Goel, M. D. Huang, and P. M. de Espanes. Invadable self-assembly: Combining robustness with efficiency. In Proceedings of the 15th annualACM-SIAM Symposium on Discrete Algorithms (SODA), pages 890–899, 2004.

[19] H. L. Chen and A. Goel. Error free self-assembly using error prone tiles. In DNABased Computers 10, pages 274–283, 2004.

[20] J. Chen and N. C. Seeman. The synthesis from DNA of a molecule with the con-nectivity of a cube. Nature, 350:631–633, 1991.

[21] Y. Chen and C. D. Mao. Putting a brake on an autonomous DNA nanomotor. J. Am.Chem. Soc., 126:8626–8627, 2004.

[22] Y. Chen, M. Wang, and C. Mao. An autonomous DNA nanomotor powered by aDNA enzyme. Angew. Chem. Int. Ed., 43:3554–3557, 2004.

[23] Q. Cheng and P. M. de Espanes. Resolving two open problems in the self-assemblyof squares. Technical Report 03-793, University of Southern California, 2003.

183

Page 203: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

[24] Q. Cheng, A. Goel, and P. Moisset. Optimal self-assembly of counters at temper-ature two. In Proceedings of the first conference on Foundations of nanoscience:self-assembled architectures and devices, 2004.

[25] Matthew Cook, Paul W. K. Rothemund, and Erik Winfree. Self-assembled circuitpatterns. In DNA Based Computers 9, volume 2943 of LNCS, pages 91–107, 2004.

[26] D. Faulhammer, A. R. Cukras, R. J. Lipton, and L. F. Landweber. Molecular com-putation: RNA solutions to chess problems. Proc. Natl. Acad. Sci. USA, 97:1385 –1389, 2000.

[27] L. Feng, S. H. Park, J. H. Reif, and H. Yan. A two-state DNA lattice switched byDNA nanoactuator. Angew. Chem. Int. Ed., 42:4342–4346, 2003.

[28] R. P. Feynman. There’s plenty of room at the bottom. Engineering and Science,February 1960.

[29] K. Fujibayashi and S. Murata. A method for error suppression for self-assemblingDNA tiles. In DNA Based Computing 10, pages 284–293, 2004.

[30] A. A. Henry and F. E. Romesberg. Beyond A, C, G, and T: augmenting nature’salphabet. Curr. Opin. Chem. Biol., 7:727–733, 2003.

[31] S. H. Hong, Zhu J., and Mirkin C. A. Multiple ink nanolithography: Toward amultiple-pen nano-plotter. Science, 286, 1999.

[32] Joseph Jaja. An Introduction to Parallel Algorithms. Addison-Wesley Professional,1992.

[33] Eric Klavins. Toward the control of self-assembling systems. In Control Problemsin Robotics, volume 4, pages 153–168. Springer Verlag, 2002.

[34] T. H. LaBean. Introduction to self-assembling DNA nanostructures for computa-tion and nanofabrication. In Computational Biology and Genome Informatics eds.J.T.L. Wang and C.H. Wu and and P. P. Wang ISBN 981-238-257-7 World ScientificPublishing Singapore, 2003.

[35] T. H. LaBean, H. Yan, J. Kopatsch, F. Liu, E. Winfree, J. H. Reif, and N. C. See-man. The construction, analysis, ligation and self-assembly of DNA triple crossovercomplexes. J. Am. Chem. Soc., 122:1848–1860, 2000.

184

Page 204: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

[36] M. G. Lagoudakis and T. H. LaBean. 2-D DNA self-assembly for satisfiability.In DNA Based Computers V, volume 54 of DIMACS, pages 141–154. AmericanMathematical Society, 2000.

[37] L. F. Landweber, R. J. Lipton, and M. O. Rabin. DNA × DNA computations: Apotential ’Killer App’? In H. Rubin and D. H. Wood, editors, DNA Based Comput-ers III: DIMACS Workshop, June 23-27, 1997, University of Pennsylvania, pages161–172, Providence, Rhode Island, 1997. American Mathematical Society.

[38] Frank Thomson Leighton. Introduction to Parallel Algorithms and Architectures:Arrays, Trees, Hypercubes. Morgan Kaufmann Pub, 1991.

[39] J. Li and W. Tan. A single DNA molecule nanomotor. Nano Lett., 2:315–318, 2002.

[40] M. Li and P. Vitanyi. An Introduction to Kolmogorov Complexity and Its Applica-tions. Springer Verlag, New York, second edition, 1997.

[41] X. Li, X. Yang, J. Qi, and N. C. Seeman. Antiparallel DNA double crossovermolecules as components for nanoconstruction. J. Am. Chem. Soc., 118:6131–6140,1996.

[42] D. Lichtenstein. Planar formulae and their uses. SIAM J. Comput., 11(2):329–343,1982.

[43] R. J. Lipton. DNA solution of hard computational problem. Science, 268:542–545,1995.

[44] D. Liu, M. S. Wang, Z. X. Deng, R. Walulu, and C. D. Mao. Tensegrity: Construc-tion of rigid DNA triangles with flexible four-arm dna junctions. J. Am. Chem. Soc.,126:2324–2325, 2004.

[45] Dage Liu, Sung Ha Park, John H. Reif, and Thomas H. LaBean. DNA nanotubesself-assembled from triple-crossover tiles as templates for conductive nanowires.Proc. Natl. Acad. Sci. USA, 101:717–722, 2004.

[46] Q. Liu, L. Wang, A. G. Frutos, A. E. Condon, R. M. Corn, and L. M. Smith. DNAcomputing on surfaces. Nature, 403:175–179, 2000.

[47] G. S. Manning. A procedure for extracting persistent lengths from light-scatteringdata on intermediate molecular weight DNA. Biopolymers, 20, 1981.

185

Page 205: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

[48] C. Mao, T. H. LaBean, J. H. Reif, and N. C. Seeman. Logical computation usingalgorithmic self-assembly of DNA triple-crossover molecules. Nature, 407:493–496, 2000.

[49] C. Mao, W. Sun, and N. C. Seeman. Designed two-dimensional DNA holliday junc-tion arrays visualized by atomic force microscopy. J. Am. Chem. Soc., 121:5437–5443, 1999.

[50] C. Mao, W. Sun, Z. Shen, and N. C. Seeman. A DNA nanomechanical device basedon the B-Z transition. Nature, 397:144–146, 1999.

[51] A. A. Middleton. Computational complexity of determining the barriers to interfacemotion in random systems. Phys. Rev. E, 59(3):2571–2577, 1999.

[52] D. Natelson, R. L. Willet, K. W. West, and L. N. Pfeiffer. Fabrication of extremelynarrow metal wires. Appl. Phys. Lett., 77:1991, 2000.

[53] C. M. Niemeyer and M. Adler. Nanomechanical devices based on DNA. Angew.Chem. Int. Edit., 41:3779–3783, 2002.

[54] C. M. Niemeyer and C. A. Mirkin, editors. Nanobiotechnology. Wiley-VCH, 2004.

[55] Q. Ouyang, P. D. Kaplan, S. Liu, and A. Libchaber. DNA solution of the maximalclique problem. Science, 278:446–449, 1997.

[56] C. M. Papadimitriou. Computational complexity. Addison-Wesley PublishingCompany, Inc., 1st edition, 1994.

[57] R. D. Piner, J. Zhu, F. Xu, S. Hong, and C. A. Mirkin. Dip pen nanolithography.Sciene, 283:661–663, 1999.

[58] P. Rai-Choudhury, editor. SPIE Handbook of Microlithography, Micromachiningand Microfabrication, volume 1. 1997.

[59] J. H. Reif. Parallel molecular computation: Models and simulations. In Pro-ceedings: 7th Annual ACM Symposium on Parallel Algorithms and Architectures(SPAA’95) Santa Barbara,CA, pages 213–223, 1995.

[60] J. H. Reif. Paradigms for biomolecular computation. In C. S. Calude, J. Casti, andM. J. Dinneen, editors, First International Conference on Unconventional Modelsof Computation, Auckland, New Zealand, pages 72–93. Springer Verlag, 1998.

186

Page 206: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

[61] J. H. Reif. Local parallel biomolecular computation. In H. Rubin and D. H. Wood,editors, DNA-Based Computers 3, volume 48 of DIMACS, pages 217–254. Ameri-can Mathematical Society, 1999.

[62] J. H. Reif. The design of autonomous DNA nanomechanical devices: Walking androlling DNA. The 8th International Meeting on DNA Based Computers (DNA 8),2002.

[63] J. H. Reif. The design of autonomous DNA nanomechanical devices: Walking androlling DNA. Lecture Notes in Computer Science, 2568:22–37, 2003. Published inNatural Computing, DNA8 special issue, Vol. 2, p 439-461, (2003).

[64] J. H. Reif, S. Sahu, and P. Yin. Compact error-resilient computational DNA tilingassemblies. In Proc. 10th International Meeting on DNA Computing, pages 248–260, 2004.

[65] J. H. Reif, S. Sahu, and P. Yin. Complexity of graph self-assembly in accretivesystems and self-destructible systems. In Proc. 11th International Meeting on DNAComputing, 2005. To appear.

[66] J.H. Reif. Molecular assembly and computation: From theory to experimentaldemonstrations. In 29-th International Colloquium on Automata, Languages, andProgramming(ICALP), Mlaga, Spain, pages 1–21, 2002.

[67] R. M. Robinson. Undecidability and non periodicity of tilings of the plane. Inven-tiones Math, 12:177–209, 1971.

[68] P. W. K. Rothemund. A DNA and restriction enzyme implementation of Turingmachines. In R. J. Lipton and E. B. Baum, editors, DNA Based Computers: Pro-ceedings of the DIMACS Workshop, April 4, 1995, Princeton University, volume 27,pages 75 – 119, Providence, Rhode Island, 1996. American Mathematical Society.

[69] P. W. K. Rothemund. Using lateral capillary forces to compute by self-assembly.Proc. Natl. Acad. Sci. USA, 97(3):984–989, 2000.

[70] P. W. K. Rothemund. Theory and Experiments in Algorithmic Self-Assembly. PhDthesis, University of Southern California, 2001.

[71] P. W. K. Rothemund and E. Winfree. The program-size complexity of self-assembled squares (extended abstract). In Proceedings of the thirty-second annualACM symposium on Theory of computing, pages 459–468. ACM Press, 2000.

187

Page 207: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

[72] Paul W.K. Rothemund, Axel Ekani-Nkodo, Nick Papadakis, Ashish Kumar, Deb-orah Kuchnir Fygenson, and Erik Winfree. Design and characterization of pro-grammable DNA nanotubes. J. Am. Chem. Soc., 126:16344–16353, 2004.

[73] Paul W.K. Rothemund, Nick Papadakis, and Erik Winfree. Algorithmic self-assembly of DNA sierpinski triangles. PLoS Biology 2 (12), 2:e424, 2004.

[74] A. J. Ruben and L. F. Landweber. The past, present and future of molecular com-puting. Nature Rev. Mol. Cell Biol., 1:69–72, 2000.

[75] Phiset Sa-Ardyen, Natasa Jonoska, and Nadrian C. Seeman. Self-assembling DNAgraphs. Lecture Notes in Computer Science, 2568:1–9, 2003.

[76] Rebecca Schulman and Erik Winfree. Programmable control of nucleation for al-gorithmic self-assembly. In DNA Based Computers 10, LNCS, 2005.

[77] N. C. Seeman. De novo design of sequences for nucleic acid structural engineering.J. Biomol. Struct. Dyn., 8:573–581, 1990.

[78] N. C. Seeman. Nucleic acid nanostructures and topology. Angew. Chem. Int. Ed.,37:3220–3238, 1998.

[79] N. C. Seeman. DNA in a material world. Nature, 421:427–431, 2003.

[80] R. Sha, R. Liu, D. P. Millar, and N. C. Seeman. Atomic force microscopy of parallelDNA branched junction arrays. Chemistry and Biology, 7:743–751, 2000.

[81] W. B. Sherman and N. C. Seeman. A precisely controlled DNA biped walkingdevice. Nano Lett., 4:1203–1207, 2004.

[82] J. S. Shin and N. A. Pierce. A synthetic DNA walker for molecular transport. J.Am. Chem. Soc., 126:10834–10835.

[83] F. C. Simmel and B. Yurke. Using DNA to construct and power a nanoactuator.Phys. Rev. E, 63:041913, 2001.

[84] F. C. Simmel and B. Yurke. A DNA-based molecular device switchable betweenthree distinct mechanical states. Appl. Phys. Lett., 80:883–885, 2002.

[85] S. B. Smith, L. Finzi, and C. Bustamante. Direct mechanical measurements of theelasticity of single DNA groups by using magnetic beads. Science, 258:1122–1126,1992.

188

Page 208: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

[86] S. B. Smith, C. Yujia, and C. Bustamante. Overstretching B-DNA: the elastic re-sponse of individual double-stranded and single-stranded DNA groups. Science,271:795–799, 1996.

[87] W. D. Smith. DNA computers in vitro and in vivo. In R. J. Lipton and E. B.Baum, editors, DNA Based Computers: Proceedings of the DIMACS Workshop,April 4, 1995, Princeton University, pages 121 – 186, Providence, Rhode Island,1996. American Mathematical Society.

[88] A. Strasser, L. O’Connor, and V.M. Dixit. Apoptosis signaling. Annu. Rev.Biochem., 69:217–245, 2000.

[89] H. Sugimura and N. Nakagiri. J. am. chem. soc. Nanoscopic surface architecturebased on scanning probe electrochemistry and molecular self-assembly, 119:9226,1997.

[90] A. J. Turberfield. DNA as an engineering material. Physics World, pages 43–46,March 2003.

[91] A. J. Turberfield, J. C. Mitchell, B. Yurke, Jr. A. P. Mills, M. I. Blakey, and F. C.Simmel. DNA fuel for free-running nanomachines. Phys. Rev. Lett., 90:118102,2003.

[92] A. M. Turing. On computable numbers, with an application to the Entscheidungsproblem. In Proc. London Math. Society Ser. II, volume 42 of 2, pages 230–265,1936.

[93] A. M. Turing. On computable numbers, with an application to the entschei-dungsproblem. In Proc. London Math. Society Ser. II, volume 43, pages 544–546,1937.

[94] J. von Neumann. Probabilistic logics and the synthesis of reliable organisms fromunreliable components. Autonomous Studies, pages 43–98, 1956.

[95] H. Wang. Proving theorems by pattern recognition ii. Bell Systems Technical Jour-nal, 40:1–41, 1961.

[96] F. H. Westheimer. Why nature chose phosphates. Science, 235:1173–1178, 1987.

[97] E. Winfree. Complexity of restricted and unrestricted models of molecular compu-tation. In R. J. Lipton and E. B. Baum, editors, DNA Based Computers 1, volume 27of DIMACS, pages 187–198. American Mathematical Society, 1996.

189

Page 209: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

[98] E. Winfree. On the computational power of DNA annealing and ligation. In R. J.Lipton and E. B. Baum, editors, DNA Based Computers 1, volume 27 of DIMACS,pages 199–221. American Mathematical Society, 1996.

[99] E. Winfree. Algorithmic Self-Assembly of DNA. PhD thesis, 1998.

[100] E. Winfree. Simulation of computing by self-assembly. Technical Report 1998.22,Caltech, 1998.

[101] E. Winfree and R. Bekbolatov. Proofreading tile sets: Error correction for algo-rithmic self-assembly. In DNA Based Computers 9, volume 2943 of LNCS, pages126–144, 2004.

[102] E. Winfree, F. Liu, L. A. Wenzler, and N. C. Seeman. Design and self-assembly oftwo-dimensional DNA crystals. Nature, 394(6693):539–544, 1998.

[103] E. Winfree, X. Yang, and N. C. Seeman. Universal computation via self-assemblyof DNA: Some theory and experiments. In L. F. Landweber and E. B. Baum, edi-tors, DNA Based Computers II, volume 44 of DIMACS, pages 191–213. AmericanMathematical Society, 1999.

[104] S. Wolfram. A new kind of science. Wolfram Media, Inc., Champaign, IL, 2002.

[105] H. Yan, L. Feng, T. H. LaBean, and J. H. Reif. Parallel molecular computation ofpair-wise xor using DNA string tile. J. Am. Chem. Soc., 125(47), 2003.

[106] H. Yan, T. H. LaBean, L. Feng, and J. H. Reif. Directed nucleation assembly ofDNA tile complexes for barcode patterned DNA lattices. Proc. Natl. Acad. Sci.USA, 100(14):8103–8108, 2003.

[107] H. Yan, S. H. Park, G. Finkelstein, J. H. Reif, and T. H. LaBean. DNA-templated self-assembly of protein arrays and highly conductive nanowires. Sci-ence, 301(5641):1882–1884, 2003.

[108] H. Yan, X. Zhang, Z. Shen, and N. C. Seeman. A robust DNA mechanical devicecontrolled by hybridization topology. Nature, 415:62–65, 2002.

[109] P. Yin, S. Sahu, A. J. Turberfield, and J. H. Reif. Design of autonomous DNAcellular automata. In Proc. 11th International Meeting on DNA Computing, 2005.To appear.

190

Page 210: Copyright c 2005 by Peng Yin All rights reservedreif/paper/peng/thesis/thesis.pdf · helped me in my later interdisciplinary research. In particular, I would like to thank my ad-visor

[110] P. Yin, A. J. Turberfield, and J. H. Reif. Designs of autonomous unidirectionalwalking DNA devices. In Proc. 10th International Meeting on DNA Computing,pages 119–130, 2004.

[111] P. Yin, A. J. Turberfield, S. Sahu, and J. H. Reif. Design of an autonomous DNAnanomechanical device capable of universal computation and universal translationalmotion. In Proc. 10th International Meeting on DNA Computing, pages 344–356,2004.

[112] P. Yin, A. J. Turberfield, S. Sahu, and J. H. Reif. Design of an autonomous DNAnanomechanical device capable of universal computation and universal translationalmotion. Technical Report CS-2004-07, Duke University, Computer Science Depart-ment, 2004.

[113] P. Yin, H. Yan, X. G. Daniell, A. J. Turberfield, and J. H. Reif. A unidirectionalDNA walker moving autonomously along a linear track. Angew. Chem. Int. Ed.,43:4906–4911, 2004.

[114] B. Yurke, A. P. Mills, and A. J. Turberfield. A molecular machine made of andpowdered by DNA. Biophysics, 78:2629, 2000.

[115] B. Yurke, A. J. Turberfield, Jr. A. P. Mills, F. C. Simmel, and J. L. Neumann. ADNA-fuelled molecular machine made of DNA. Nature, 406:605–608, 2000.


Biography

Peng Yin was born on May 15th, 1976 in Hebei Province, China. After receiving a B.S. in Biochemistry and Molecular Biology and a B.S. in Economics from Peking University in 1998, he was recruited by the Cellular and Molecular Biology Graduate Program at Duke University Medical Center in 1998, and he became affiliated with the Pharmacology and Molecular Cancer Biology Department at Duke University in 1999. After receiving an M.S. in Molecular Cancer Biology and a Certificate in Cellular Molecular Biology from Duke University in 2000, he joined the Ph.D. program in the Department of Computer Science of Duke University. His primary research interest lies in the emerging field of self-assembly based nanoscience, with a current focus on the theoretical, computational, and experimental study of synthetic systems composed of biological molecules such as nucleic acids (DNA, RNA) and proteins. He is also broadly interested in the emerging fields of computational biology and biomolecular computing, as well as the theoretical computer science problems arising within them. See his publications at http://pengyin.org.

Related Publications.

1. J. H. REIF, S. SAHU, AND P. YIN. Complexity of graph self-assembly in accretive systems and self-destructible systems. In Proc. 11th International Meeting on DNA Computing, 2005. To appear.

2. J. H. REIF, S. SAHU, AND P. YIN. Compact error-resilient computational DNA tiling assemblies. In Proc. 10th International Meeting on DNA Computing, pages 248–260, 2004.

3. P. YIN, A. J. TURBERFIELD, AND J. H. REIF. Designs of autonomous unidirectional walking DNA devices. In Proc. 10th International Meeting on DNA Computing, pages 119–130, 2004.

4. P. YIN, H. YAN, X. G. DANIELL, A. J. TURBERFIELD, AND J. H. REIF. A unidirectional DNA walker moving autonomously along a linear track. Angewandte Chemie International Edition, 43:4906–4911, 2004.

5. P. YIN, A. J. TURBERFIELD, S. SAHU, AND J. H. REIF. Design of an autonomous DNA nanomechanical device capable of universal computation and universal translational motion. In Proc. 10th International Meeting on DNA Computing, pages 344–356, 2004.

6. P. YIN, S. SAHU, A. J. TURBERFIELD, AND J. H. REIF. Design of autonomous DNA cellular automata. In Proc. 11th International Meeting on DNA Computing, 2005. To appear.

Other Publications.

1. P. YIN AND A. J. HARTEMINK. Theoretical and practical advances in genome halving. Bioinformatics, 21:869–879, 2005.

2. P. K. AGARWAL, Y. WANG, AND P. YIN. Lower bound for sparse Euclidean spanners. In Proc. 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 670–671, 2005.

3. S. H. PARK, P. YIN, Y. LIU, J. H. REIF, T. H. LABEAN, AND H. YAN. Programmable DNA self-assemblies for nanoscale organization of ligands and proteins. Nano Letters, 5:729–733, 2005.

4. S. SAHU, P. YIN, AND J. H. REIF. A self-assembly model of DNA tiles with time-dependent glue strength. In Proc. 11th International Meeting on DNA Computing, 2005. To appear.

5. A. SEKULIC, C. C. HUDSON, J. L. HOMME, P. YIN, D. M. OTTERNESS, L. M. KARNITZ, AND R. T. ABRAHAM. A direct linkage between the phosphoinositide 3-kinase-AKT signaling pathway and the mammalian target of rapamycin in mitogen-stimulated and transformed cells. Cancer Research, 60:3504–3513, 2000.
