Forecasting the Future of Biotechnology - The Blue Sky Workshop


September 9-10, 2001

US-EC Task Force on Biotechnology Research


Contents

Preface
Executive Summary
Introduction to the Presentations

Presentations

Keynote Address: Extending the Internet Throughout the Physical World
Molecular Computing
Nanobiotechnology
Plant Genomics
Plant Bioengineering
Bioinformatics in 2011
Bioinformatics and Computational Needs

Participants


Preface

For more than ten years, US Government science agencies and the European Commission have enhanced communication across the Atlantic via the US-EC Task Force on Biotechnology Research. Its aim has been to anticipate the needs of the science of tomorrow and to exchange ideas among program managers and administrators on the future directions of biotechnology research. Dr. Mary Clutter, Assistant Director of the US National Science Foundation and US co-chair of the US-EC Task Force on Biotechnology, strongly feels that there is an increasing realization of the need to collaborate globally. Since 1990 this US-EC Task Force has played a key role in identifying emerging areas in biotechnology research. Bruno Hansen, Director of the EC Biotechnology, Agriculture and Food Research Directorate and EC co-chair of the US-EC Task Force, believes that this Task Force has contributed significantly to strengthening collaboration in many emerging fields, such as bioinformatics, genomics, nanobiotechnology, neonatal immunity, biosafety and biodiversity, through the mechanism of joint workshops with each side contributing as equal partners.

This Blue Sky Workshop was organized to celebrate the start of the second decade of cross-Atlantic cooperation in biotechnology research. Given this anniversary, and the start of this new century, this workshop was seen as an opportunity to gather views on the future of biotechnology research and related disciplines. This special “Blue Sky” offered an opportunity for creative leaders in diverse fields to interact and formulate ideas about where research will be heading in 2010 and beyond.

One unique aspect of this workshop was to tap into the knowledge and ideas of the next generation of scientific leaders in Europe and the United States. To accomplish this goal, each participant invited to the workshop a creative, forward-thinking student or early-career researcher. This person was not asked to make a presentation, but rather was encouraged to participate actively in the panel discussions following each pair of speakers. It was anticipated that the panel discussions would provoke a lively dialogue among all attendees.

The Blue Sky Workshop was limited in attendance to seven invited speakers and about thirty participants in all. The keynote speaker, Dr. Larry Smarr, opened the meeting on Sunday evening, September 9, 2001. On Monday, September 10, 2001, there were presentations in the areas of biology, agriculture, and bioinformatics. An American and a European scientist represented each area. The presentations focused on creative ideas about where each discipline is headed and how the nature of scientific inquiry will change, not just in the next few years, but also far into the future. This meeting acted as a stimulus to the Task Force members to pursue cooperative activities in these cutting-edge areas of research.

Dr. Mary E. Clutter
Assistant Director
Directorate for Biological Sciences
National Science Foundation

Mr. Bruno Hansen
Director
Biotechnology, Agriculture and Food Research Directorate
European Commission


Executive Summary

In September 2001, a small group of renowned scientists from the European Union and the United States was convened at a “Blue Sky Workshop” to discuss emerging and promising developments in the application of biotechnology to computing and information science, human health, and agriculture.

How does the future look?

The integration of information science, biology, and engineering is creating vast new opportunities in agriculture and human health. Networks of computing with GRID capabilities will allow for integrated databases and data sharing. The challenge here is for the individual scientific fields to create the needed infrastructure, web-based databases, analytical tools and grid software. Molecular computing offers a unique opportunity to supplement and enhance existing computing needs. Innovation in biotechnology is clearly at the crossroads of the disciplines. It is now critical to realize the potential of research being information-driven rather than merely hypothesis-driven. As technology modifies and augments our research capabilities, it is critical to monitor and evaluate the impact on society and to anticipate the legal and social acceptance of the new technological developments.

At the current time it is unlikely that DNA computing, even though it has certain advantages, will replace “in silico” computing. To enhance DNA computing we will need a new malleable chemistry to produce new base pairs. It will also be possible to develop the ability to compute with proteins and polymers. The rate-limiting factors in transferring nanobiotechnologies into practice depend on the application area. The development of applications in the medical field requires biosafety and lifetime considerations and is conditioned by the successful completion of clinical trials. In the environmental area, cost and lifetime will be the most critical factors. To be successful, both the DNA computing and the nanotechnology communities need to reach a critical mass of investigators. In addition, in these communities, as in many of those discussed at this meeting, there is a need for a combination of disciplines: biology, math, physics, and computer science. Furthermore, there is a need both for individuals highly trained in one specific discipline and for interdisciplinary training of individuals in two or more disciplines. In the current environment there is a curriculum gap in interdisciplinary training both in Europe and in the US.

Biotechnology has had an extremely positive impact on plant biology. Plants are extremely sensitive living organisms, responding to the environment and to the pathogens therein. In addition, most plants have similar genomic structures. Genomic studies of regulatory mechanisms will clarify the similarities and differences in a plant’s response to a pathogen and the impact of the environment on this process. Genetic studies and modifications of plants are powerful and valuable approaches to protect a plant from pathogens or improve its response to the environment. In doing so it is also important to consider potential unpredicted effects on other populations of plants and animals and to ensure that plant diversity is retained.

High-throughput chemical analysis has been one of the revolutionary events of the last decade. What has yet to emerge is high-throughput data analysis. The field of information technology has provided us with the resources to store and handle large quantities of data, yet it still lacks the capability to swiftly sort through and mine large, complex, heterogeneous data sets. This will require real machine learning (“organic learning”), which has yet to be perfected. A further need is for state-dependent dynamic models of biology; through such modeling and computation, biologists and informaticians will be able to start simulating systems and understanding internal and external regulatory events.

It is also important to resist the temptation of “boxology” so that the life sciences and advanced technology areas can cross-fertilize each other. One can envision significant progress in the new biology coming from scientists in disparate fields who jointly and more systematically review each other’s experiences and good practice, in order to establish the level of multidisciplinarity required at both the single-researcher and team levels. There is a shared belief that the life sciences will develop their holistic nature and will offer innovation potential to information technologists and materials scientists (while the latter create new potential for future life sciences research in return), as long as these different expert communities can be led to work on biological fields of interest over prolonged periods of time.

The new “Life Scientist”

From this blend of experiences and perceptions shared by experts in fundamental biology, plant sciences and bioinformatics, one may be tempted to draw a hypothetical picture of the emerging “Life Scientist” of the years to come. This would be a scientist prepared to:

• Replace in his/her mind-set the innate sense of belonging by a developed sense of acquisition;
• Return to basic biology and uncover more of the natural biocomplexity;
• Ensure the right level of computing proficiency;
• Enter discussions with bioinformaticians at an early stage of problem definition;
• Take advantage of powerful modern analytical and modeling methods;
• Develop the capacity to communicate one’s science;
• Address the relevant societal questions.


Conclusion

The integration of information science, biology, and engineering is creating vast new opportunities in agriculture and human health. Networks of computing will allow for integrated databases and data sharing. Molecular computing offers a unique opportunity to supplement and enhance existing computing needs. Innovation in biotechnology is clearly at the crossroads of the disciplines. EC and US life sciences will be further promoted in the future through research behaviors inspired by this visionary model. The experts referred to two recurrent trends, among others, seen as cohesion factors for upcoming efforts of the research communities across the Atlantic. There is a willingness to bring American and European experts addressing the above issues closer together, and also to develop their relevant experiences very visibly and with access for the rest of the world, in the hope of drawing more partners into what might become a collective and global endeavor.


Introduction to the Presentations

Dr. Larry Smarr’s presentation reflected his views on the future computerization of the world. In his opinion, a new community of science will emerge in the near future. There will be a wireless planetary grid providing a global knowledge system. In this new era there will be a wide range of opportunities for biological research to utilize the increasing computing power (e.g. access to and interpretation of genetic information, drug design, nano-bio-info-technologies) and the information infrastructures currently under development. Notably, the Life Sciences research community has the greatest challenge and opportunity in grid technology and will need to base its future excellence in this area on availing itself of cognitive and research tools commensurate with the magnitude and complexity of the data it unveils. Equally challenging is the need to deal with the human social and ethical issues that arise in a “wired planet.”

Moving from electronic computing, Dr. Laura Landweber provided insights into, and an example of, the current state of DNA computing. DNA computing has the advantage over “in silico” computing in that DNA offers a greater storage capacity and a lower energy requirement. The disadvantage of DNA computing is that it is currently very “sloppy” and requires a large time commitment to solve simple problems. The example problem solved using DNA computing was relatively simple; solving it with DNA computing, however, was extremely complex. Important to DNA computing is the single-celled eukaryote, a ciliate, in which there is a gene rearrangement equivalent to computation. This organism has two nuclei. One contains the entire DNA, including “non-essential DNA”; this smaller micronucleus contains 98% of the DNA. The macronucleus contains only the essential genes. Understanding this important evolutionary computational development will be critical to understanding DNA computing.


Dr. George Robillard provided an overview of nanotechnology along with some examples of the current state of useful nanotechnology. In his presentation he expressed the view that, for the immediate future, nanotechnology will contribute in a major way in the health field, having high utility in diagnostics and therapeutics, using targeted and controlled release of drugs. An example of immediate usefulness is the development of microarray technologies for screening and analysis of clinical samples, providing useful and inexpensive measures of large numbers of highly relevant clinical chemistries critical for accurate diagnosis and treatment.

To cope with abiotic or pathogenic challenges, plants have evolved the most diverse sensing mechanisms necessary for survival. Dr. Barbara Baker illustrated how molecular tools combined with post-genomic data have enabled plant scientists to reveal recognition mechanisms, some of which are highly conserved among eukaryotes. Once an old observation is described in molecular terms, such as the cross-protection against an aggressive virus conferred on a plant by its prior exposure to a mild form of that virus, researchers can develop a new strategy, in this case experimental gene silencing for large-scale gene expression studies. Scientists can now precisely understand the mechanisms of molecular traffic that link plants to their changing environment. This has a bearing on our capacity to turn genetic potential into further progress in agriculture, as well as on the use of plants in global ecosystem management. Dr. Baker stated that, for this to happen, there must be greater international integration of the community of plant scientists, including geneticists, breeders, pathologists and molecular biologists, with extended access to critical resources. These resources include, but are not limited to, research sites, genetic collections and data pools. Multidisciplinarity and wide-scale data sharing will be rate-limiting conditions.


Dr. Lothar Willmitzer gave credit to the pioneers of gene technology in plants, and noted that the area of transgenesis developed from this technology. As sequencing technology becomes more readily available, the flood of molecular genetic data puts plant scientists in a position to understand the extent and value of biological diversity. This understanding will unite all levels of complexity, from populations and species to single genotypes, down to metabolic potentials. The rate-limiting steps are the identification and measurement of genetic diversity, and the supply of detection systems for the relevant biochemical properties. Analytical technology and visualization methods for protein and metabolic data are lagging behind current needs. Dr. Willmitzer wished one could describe plants as networks of molecular signals. In this respect, theoretical biology deserves more attention, and opportunities to move to “in silico” predictions could expand. This would justify going back to the innumerable plant systems already described and revisiting them with a view to deciphering natural complex organizations that reveal adaptive functions.

Dr. William Noble and Dr. Alfonso Valencia each gave presentations on the future of bioinformatics. The major message of these presentations is that it is important to improve the web and develop intelligent analysis techniques. We now have “high-throughput” analytical and production methods but lack any capability for “high-throughput” data analysis. An important and challenging area is heterogeneous data sets. We will have an invisible knowledge network, and the capability for personalized medical care. To realize this capability we very much need “Organic” Learning Algorithms for data analysis. Privacy, ethical and legal issues need careful consideration in this new distributed knowledge system.


Keynote Address

Extending the Internet Throughout the Physical World

Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Professor, Computer Science and Engineering
University of California, San Diego

After 20 years of growth, we’re reaching the end of the “S-curve” of building out the wired Internet, with hundreds of millions of personal computers serving as the end points in this architecture. Now we’re embarking on a new “S-curve” describing the buildout of wireless, high-speed Internet access. I will describe how these technology trends will create a very different world in which to study biology and practice medicine.

Over the next decade, digital wireless links will extend the Internet throughout the physical world. Billions of Internet-connected cell phones, embedded processors, hand-held devices, sensors, actuators, and laboratory instruments will support radical new applications in homeland security, biomedicine, transportation, environmental monitoring, civil infrastructure, new media arts, and interpersonal communication and collaboration. During the same decade, tens of millions of households and businesses will switch from slow modems to speedy broadband Internet connections, and an all-optical core architecture will vastly increase the Internet’s capacity to support new users and ever-more demanding applications. Peer-to-peer computing and storage will increasingly provide a vast untapped capability to power this emergent “planetary computer.”

Materials and device technologies developed over the past few decades have provided the foundation for the current explosion in computing systems, wireless communications, and optical networks. For example, fiber-optic capacity has increased over the past decade from 10 to 3,000 Gbps because of fiber amplifiers, semiconductor lasers, high-speed optical modulators, wavelength filters and routers, and high-speed devices for associated electronic circuits. Similarly, breakthroughs in materials and device technology have the potential to enable dramatic advances in new Internet telecommunications systems.

Figure 1: Projected Subscribers to Fixed and Mobile Internet (Ericsson). Chart of subscribers (millions, 0 to 2,000) by year, 1999 to 2005, with one curve each for Mobile Internet and Fixed Internet.

Further increases in optical network capacity will depend on new materials and device concepts, based on nanoscale design and engineering of photonic materials, ultra-small device structures such as thresholdless lasers, and micro-electro-mechanical systems (MEMS) for compact optical components. Wireless, high-speed, handheld access to the Internet will require new ultra-low-power, high-speed transistors, and advanced materials and processing.

Major advances in software technologies will be required to enable this vision for the future Internet, the “Grid”: a networked world of integrated computing, sensors, storage, visualization devices, and software. To be effective, however, it must address a large number of software issues: security; resiliency to failure; authentication of users, code, and data; and high utilization. Areas for exploration include large-scale ad hoc wireless networks, a secure distributed computing infrastructure, mobile agent technologies, sensor simulation, sensor network integration, and development of new middleware and human-computer interfaces. The need for innovation in algorithms to support the vast size and complexity of this new Internet, in turn, indicates that a critically important role will be played by mathematics research.

The Grid will be based on some large networks already in development. In the summer of 2002, the National Science Foundation will begin to install the hardware for the TeraGrid, a transcontinentally distributed supercomputer that should do for computing power what the Internet and the Web did for information sharing. Clusters of high-end microcomputers will be set up at four sites: the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign; the U.S. Department of Energy’s Argonne National Laboratory outside Chicago; Caltech in Pasadena, California; and the San Diego Supercomputer Center at the University of California, San Diego. Eventually, the four sites will be networked together so tightly as to create a single “virtual computer.”

This system will be able to process data at up to 13.6 trillion floating-point operations per second (13.6 teraflops), eight times faster than the most powerful supercomputer currently available. This speed will allow scientists to undertake computation-intensive tasks, for example, problems in protein folding that could inform new drug designs, climate modeling, and analysis of and correlation among various astronomical “sky surveys.” TeraGrid is a prime example of “grid computing”: the integration of massive computer systems to obtain unprecedented performance.


The High Performance Wireless Research and Education Network (HPWREN), a non-commercial, high-performance, wide-area wireless network in San Diego County, is prototyping how the Internet can be extended wirelessly to link a wide variety of scientific instruments with PCs and storage. The National Science Foundation-funded network includes backbone nodes on the University of California, San Diego campus and in a number of “hard to reach” areas in San Diego County. HPWREN is used for network research, supports high-speed Internet access for field researchers from several disciplines (geophysics, astronomy, ecology), and provides educational opportunities for rural Native American learning centers and schools.

Internationally, we can begin to see the future of a planetary-scale optical network to support scientific research. Successfully building on the NSF-funded STAR TAP in Chicago, the point of entry for international research networks that connects many countries with Internet2, NSF is now funding Star Light, which brings in optical links of up to 10 gigabits per second. The first to link in is SURFnet, which is developing a state-of-the-art, optical, multi-gigabit connection between Amsterdam and Chicago to support research. Such optical links allow scientists to experiment with Internet technologies, such as lambda networking and optical switching, which will be deployed in the near future to resolve bandwidth bottlenecks in the core of the Internet.

Figure 2: The International Research Networks Linked to the US Research Networks via NSF’s STAR TAP.


In 2000, the California Institute for Telecommunications and Information Technology, also known as Cal-(IT)2, was established. Cal-(IT)2 connects University of California campuses at San Diego and Irvine with research professionals from more than 40 leading California telecommunications, computer, software, and applications companies. The project is funded by $100 million in state capital funds, which were required to be matched by more than $200 million from industry, federal, private, and university resources. The organization’s mission is to explore the future of the Internet as it extends the reach of the current information infrastructure throughout the physical world. Cal-(IT)2 is teaming researchers and students across a wide variety of disciplines to create “living laboratories” to experiment with technologies and applications in real-world settings.

One goal of Cal-(IT)2 is to integrate personalized (genomic) medicine with wireless and sensing technology to enhance health care delivery [Cal-(IT)2 calls this area Digitally Enabled Genomic Medicine]. A wide variety of sensing devices connected by a broadband wireless Internet and linked to powerful, interoperable clinical and biomedical databases will extend the delivery of genomic medicine to remote clinical settings and increase speed and accuracy in diagnosis. Decisions crucial for urgent medical care (e.g., cardiac monitoring) will be made remotely from the patient based on receipt of real-time physiological data from noninvasive biosensors linked to wireless transceivers.

Figure 3: The layered structure for organizing research in Cal-(IT)2


At the same time that these patient sensors are being deployed, the explosion of data from medical imaging devices is continuing. What is immediately needed is a Grid infrastructure for storing and analyzing all this data in a manner that is easy for scientists to use. Towards this goal, Cal-(IT)2 is collaborating with a major NIH-funded experiment termed the Biomedical Informatics Research Network (BIRN), led by PI Mark Ellisman at UCSD. BIRN aims to integrate (over the high-speed Internet) computing, data, and imaging capabilities across the nation. As the software and infrastructure are developed and deployed, this first BIRN will enable sharing and correlation of brain data of various species obtained using experimental methodologies, at different scales of resolution, and in different file formats. This network links the San Diego Supercomputer Center and the School of Medicine at UCSD with instruments at UCLA, Caltech, Duke, and Harvard. NIH plans a series of follow-on BIRNs focused on other specific organs and disease states.

Figure 4: The BIRN Project


In the area of molecular materials, researchers are developing sensors and switches with ultra-high sensitivity, chemical and/or biological specificity, and ultra-low-power consumption. These advances allow the Internet to move inside the human body. The development of implantable biochips for in vivo monitoring of blood chemistry, combined with data collection via wireless telemetry and offline analysis, will support, for example, feedback-controlled drug-delivery systems, and wireless, self-powered environmental (chemical/biological) sensors to monitor pollution.

We can see that these simple technological trends can have profound implications for the science of biology and the practice of medicine. Subject to strong security and privacy protocols, imagine a world in which millions of people’s DNA variations can be cross-correlated with real-time read-outs of the health variables of those same people. Powerful new data mining algorithms will enable basic researchers to discover heretofore unrecognized relationships between genomic variations and the mechanical and chemical functioning of the human body. Potentially fatal events can be detected early and prevented by warnings or medications automatically generated for the individual. Perhaps more importantly, this new form of “bio-feedback” and pattern recognition across populations can help teach us healthier lifestyles, which can significantly lower health risks later in life.

Figure 5: The SDSC/CAL(IT)2 Knowledge and Data Engineering Laboratory


Molecular Computing

Laura F. Landweber
Associate Professor
Department of Ecology and Evolutionary Biology
Princeton University

Silicon chips have an upper limit in speed, which has led to searches for alternative media, one of which is DNA. DNA is an ideal option for molecular computing because it is self-complementary and is easily copied. In addition, it is readily manipulated by using restriction enzymes, ligation, sequencing, amplification, and labeling.

Because much of the human body is, in theory, a system of binary operations, biological systems lend themselves to novel concepts in computing. Thus, a series of enzymatic or chemical pathways can be used to solve a problem using DNA strands that correspond to each possible solution, relying on an algorithm to sort for the correct answer. One method, for example, uses DNA computing on surfaces, in which the solution DNA strands are affixed to a solid medium and, in a subtractive algorithm, incorrect strands are selectively destroyed. Because DNA can easily be separated from mixtures, the only components bound to the “chip” are the DNA and anything attached to it. This greatly streamlines complex, repetitive chemical processes and, perhaps most importantly, offers the promise of automation.

In 2000, a Princeton laboratory demonstrated the first use of RNA to solve a computational problem. The “Knight Problem” is a nine-bit satisfiability (SAT) problem of propositional logic that asks which configurations of knights can be placed on a 3 x 3 chessboard so that no knight threatens another. There are 94 correct solutions (out of a possible 512), ranging from one solution with zero knights on the board to two solutions with five knights on the board (see Figure 6). Landweber and associates generated a ten-bit DNA data pool, with a tenth bit included in the event that one of the previous nine computed unreliably. Each bit can be set to an “on” (=1) or “off” (=0) sequence (see Figure 7).
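The counts quoted above can be checked on a conventional computer. The following is a minimal brute-force sketch in Python, not the laboratory's method: it numbers the squares 0 to 8 row by row (an assumption made here for illustration) and simply tests all 512 boards.

```python
from itertools import product

SIZE = 3  # 3 x 3 board, squares numbered 0..8 row by row (illustrative choice)
KNIGHT_JUMPS = [(1, 2), (2, 1), (-1, 2), (-2, 1),
                (1, -2), (2, -1), (-1, -2), (-2, -1)]


def attacked_squares(square):
    """Return the squares that a knight standing on `square` threatens."""
    row, col = divmod(square, SIZE)
    targets = []
    for d_row, d_col in KNIGHT_JUMPS:
        r, c = row + d_row, col + d_col
        if 0 <= r < SIZE and 0 <= c < SIZE:
            targets.append(r * SIZE + c)
    return targets


def is_peaceful(bits):
    """True if no knight (bit == 1) threatens another knight."""
    return not any(
        bits[square] and bits[target]
        for square in range(SIZE * SIZE)
        for target in attacked_squares(square)
    )


def solve_knight_problem():
    """Enumerate all 2**9 boards and keep those with no threatened knight."""
    return [bits for bits in product((0, 1), repeat=SIZE * SIZE)
            if is_peaceful(bits)]


solutions = solve_knight_problem()
print(len(solutions))                           # number of valid boards
print(max(sum(board) for board in solutions))   # most knights on a valid board
```

The exhaustive search is trivial at nine bits; the interest of the molecular approach lies in testing every candidate in parallel rather than one at a time.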

Figure 6: Expected and observed frequencies of boards with specific numbers of knights


Each RNA strand in the data pool consisted of a series of ten 15-nucleotide bits separated by nine 5-nucleotide spacers (see Figure 8).

Figure 7

Figure 8

Landweber’s group then separated the correct solutions from the data pool by ‘destroying’ the incorrect solutions (see Figure 9), using ribonuclease H (RNase H) because it digests RNA sequences hybridized to DNA, thus destroying specifically marked strands rather than unmarked strands. This flexibility in cleaving target sequences was an important incentive for choosing RNA over DNA. With DNA, computing can be done with the finite catalogue of restriction enzymes, but these require specific, often palindromic, sequences and the use of several enzymes.

Figure 9

In Landweber’s work, all but one of the retained solutions were correct. Analysis of the incorrect readout showed that the RNA strand contained an adjacent point mutation and a deletion, preventing its hybridization to the complementary DNA strand so it could not be digested. Although this error rate is higher than that found in a computer chip, if the data pool were better purified, the source of error could be eliminated. In addition, the error rate could be reduced if the amount of RNA material in the library were increased, with reamplification of the RNA library after each digestion.
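The subtractive idea can be mimicked in software. The sketch below is a toy model under stated assumptions, not the laboratory protocol: the function `digest` is hypothetical and stands in for marking strands with complementary DNA and cleaving the hybrids, the tenth spare bit is ignored, and each digestion step is assumed to remove every strand in which two mutually attacking squares are both “on”.

```python
from itertools import product

BITS = 9  # one bit per square; the spare tenth bit is left out of this toy model

# Pairs of squares on a 3 x 3 board (numbered 0..8 row by row) that lie a
# knight's move apart. The centre square (4) attacks nothing.
ATTACKING_PAIRS = [(0, 5), (0, 7), (1, 6), (1, 8),
                   (2, 3), (2, 7), (3, 8), (5, 6)]


def digest(pool, marked_bits):
    """Remove every strand in which all `marked_bits` are set to 1.

    Hypothetical stand-in for hybridizing a DNA probe to the marked bit
    sequences and destroying the resulting RNA:DNA hybrids.
    """
    return [strand for strand in pool
            if not all(strand[bit] for bit in marked_bits)]


def run_subtractive_algorithm():
    """Start from all 512 candidate boards and destroy the incorrect ones."""
    pool = list(product((0, 1), repeat=BITS))
    for pair in ATTACKING_PAIRS:
        pool = digest(pool, pair)  # boards violating this pair are removed
    return pool


survivors = run_subtractive_algorithm()

# Length of the bit region described in the text:
# ten 15-nucleotide bits plus nine 5-nucleotide spacers.
strand_length = 10 * 15 + 9 * 5

print(len(survivors), strand_length)
```

In this idealized model every digestion is perfect, so only correct boards survive. The single incorrect strand retained in the experiment corresponds to a strand that `digest` failed to remove because its mutated sequence could not hybridize.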

The single-celled ciliate with its two nuclei, on the other hand, provides a very different example of a “binary” system in nature with potential in molecular computing (see Figure 10). All ciliates possess two types of nuclei: an active somatic macronucleus and a germline micronucleus that contributes to sexual reproduction. The macronucleus forms from the micronucleus after cell mating, during the course of development. The process of gene unscrambling in a few particular ciliates represents a unique solution to the problem of gene assembly involving the two nuclei. Recombination during meiosis is in essence a computational process. With some essential genes scrambled in as many as 51 pieces, these ciliates rely on sequence and structural cues to rebuild their fragmented genes and genomes. Direct repeats present at the boundaries between coding and noncoding sequences provide pointers to help guide assembly of the functional (macronuclear) gene.

Figure 10: The single-celled ciliate with two nuclei


The genomic copies of some protein-coding genes in the micronucleus of spirotrichous ciliates can be encrypted in three ways: 1) intervening non-protein-coding DNA segments interrupt protein-coding DNA segments and must be removed from the DNA during macronuclear development; 2) some micronuclear genes are permuted relative to the chronological order in the macronuclear copy; and 3) these “scrambled” segments are encoded in either orientation on the micronuclear DNA. During the process of decryption, the total amount of DNA eliminated from the micronucleus is as great as 98 percent. These genome rearrangements therefore present a potentially complicated cellular computational paradigm, and theoretical modeling of this biological process suggests that these spirotrichous ciliates may in principle possess the capacity to perform any formal computation carried out by an electronic computer.
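The role of pointers in decryption can be illustrated with a toy reassembly routine. This is only a sketch under simplifying assumptions: the sequences and the four-base pointers are invented for the example, non-coding DNA between segments is omitted, and each pointer is assumed to be unique. The function names `find_next` and `unscramble` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as read from the opposite DNA strand."""
    return seq.translate(COMPLEMENT)[::-1]


def find_next(remaining, wanted):
    """Find a segment that, in either orientation, starts with `wanted`."""
    for segment in remaining:
        for oriented in (segment, reverse_complement(segment)):
            if oriented.startswith(wanted):
                return segment, oriented
    raise ValueError(f"no segment starts with pointer {wanted!r}")


def unscramble(segments, start_pointer, pointer_len):
    """Order and orient scrambled segments by following pointer repeats.

    Each segment begins and ends with a pointer; the pointer that ends one
    segment is repeated at the start of the segment that follows it.
    """
    remaining = list(segments)
    gene = ""
    wanted = start_pointer
    while remaining:
        segment, oriented = find_next(remaining, wanted)
        remaining.remove(segment)
        # Keep the shared pointer only once when joining two segments.
        gene += oriented if not gene else oriented[pointer_len:]
        wanted = oriented[-pointer_len:]
    return gene


# Invented segments in their correct (macronuclear) order and orientation.
seg_a = "GGAT" + "AAAA" + "CTCA"
seg_b = "CTCA" + "CCCC" + "TGGA"
seg_c = "TGGA" + "TTTT" + "ACGG"

# Toy micronuclear arrangement: permuted, with one segment inverted.
scrambled = [seg_c, reverse_complement(seg_a), seg_b]

assembled = unscramble(scrambled, start_pointer="GGAT", pointer_len=4)
print(assembled)
```

The routine handles two of the three forms of encryption listed above (permutation and inversion); removal of intervening non-coding DNA would require locating the pointers inside a longer sequence first.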

Clearly, molecular computing is in its infancy, and still faces several challenges. First, the materials used, whether DNA, RNA, or proteins, are not reusable. A molecular computer would require periodic refueling and cleaning. Second, the molecular components used are specialized. Thus, for example, a different set of oligonucleotides would be needed for each problem. Third, as illustrated in the Knight Problem, the error rate in vitro may be unacceptable for computational problems. In order to compete with silicon-based computers, error rates would have to be lowered at least to the level acceptable in a biological system.


George T. Robillard
Professor
Biomade Technology Foundation
Groningen, The Netherlands

Nanobiotechnology

Nanobiotechnology is the underpinning technology that is accelerating advances in areas such as genomics, combinatorial chemistry, high-throughput robotic screening, drug discovery, high-throughput sequencing, and bioinformatics. The vision of nanobiotechnology is to build tiny, molecule-sized machines able to manipulate matter at the atomic level.

In the last few years, researchers have begun to construct primitive analogs of components that would be needed to build a functioning nanomachine. Examples include carbon structural frames, nano-scale grasping tools, molecule-sized motors, and logic gates that could serve as the basis for molecular-scale computers. The premise of the technology is that DNA could function as a nanoscale computer that sends instructions to nanoscopic assembly units within the cells, or ribosomes. The ribosomes then manufacture proteins, which function as tiny nanomachines building sub-units of biological cells, which in turn form whole cells, which in turn form living creatures. The hope of nanotech researchers is to copy life’s molecular manufacturing process in a more refined and improved way.

For example, analysts in the microprocessor industry expect chip manufacturing techniques to reach physical size limits sometime before 2020. Nanotechnologists are devising ways to manipulate matter at extremely small size scales to replace the silicon transistors. For example, nanoscopic, rod-shaped molecules made of only a few carbon atoms could switch between an “open” and “closed” state while occupying far less space than silicon transistors. One example of these revolutionary new technologies that is already affecting the way we live is the lab-on-a-chip device. These devices are based on silicon chip technology, and some are already in commercial production, dramatically shortening the timescales for all kinds of analysis. On a silicon chip, micro-scale channels direct picolitres of sample fluid into active sensing sites that can be from 10 to 200 micrometers in diameter. The miniaturization of all components has the immediate effect of speeding up the whole analytical process, giving results in minutes rather than hours.

This lab-on-a-chip technology has many varied applications, from fast-throughput DNA analysis and cell separation to new drug discovery. Silicon chip technology also offers the potential for creating biosensors. These rely on networks of molecular surfaces working according to combinatorial principles.

Figure 11: Channel Conformations


The advent of biochips creates the potential to map an individual’s entire genetic code from a drop of blood, almost instantaneously, which has enormous implications for pharmacogenomics. Such chips also have applications in agriculture, where they could help explain why some strains of plants are hardier and more disease resistant than others.

The ability to construct machines operating at a molecular level also could revolutionize the field of medical imaging and diagnostics. Instead of using x-rays, magnetic resonance imaging, biopsies, or exploratory surgery for diagnostic purposes, doctors would inject the patient with a fluid containing trillions of molecule-sized nanomachines. Each one would be equipped with light- or sound-based imaging systems to scan the patient’s physiological and biochemical processes down to the level of individual lipids, proteins, carbohydrates, and nucleic acids within cells. The nanomachines would then transmit this information to a computer outside the patient’s body for viewing and analysis. Moreover, nanomachines could serve as dispensers of important biochemicals that are lacking in some individuals, for example, dispensing insulin to diabetics.

Nanobiotechnology, by exploiting the properties of hydrophobins, is also being used to construct “molecular wires” able to conduct at high speeds. Hydrophobins are small proteins that are widespread among fungi, and all of them have a major effect on water surface tension. They form a more or less insoluble amphipathic membrane at hydrophobic/hydrophilic interfaces and are among the most surface-active biosurfactants known. In Class I hydrophobins, this amphipathic membrane is highly insoluble and is characterized, on the hydrophobic side, by a typical pattern that is also found on the outside of many aerial fungal structures. Modification of the genetic properties of hydrophobins can alter adherence to hydrophobic surfaces. They can also be manipulated to self-assemble on water-air, water-oil, and water-hydrophobic solid interfaces into an amphipathic film that cannot be disassembled, even at high heat. In essence, the hydrophobins line up, forming a molecular wire.

As with all new technologies, obstacles to development must be overcome. For example, biological instability can interfere with the operation of any nanobiotechnology device. As with molecular computers, finding ways to control for the natural tendency of organic molecules to rearrange themselves is a major challenge.

Figure 12: Hydrophobin self-assembled into a hydrophobic rodlet layer


Barbara Baker
Research Leader
Plant Gene Expression Center
USDA/ARS
Adjunct Professor
Department of Plant and Microbial Biology
University of California, Berkeley

Plant Genomics

Agriculture is a major driving force behind environmental change. Pollution from pesticides, soil erosion from deforestation, and soil salination from irrigation and fertilizers are just a few of the widespread effects of our growing dependence on modern agricultural methods. Because agriculture is our primary source of food and because so much of our land is used for agriculture, there is a clear need to develop methods that promote sustainable agriculture. Research in plant biology is addressing this need. The fundamental plant processes (growth, development, reproduction, photosynthesis, and responses to environmental conditions and pathogens) underlie nearly every agricultural issue we face, yet we know surprisingly little about the molecular biology behind these essential functions. Plant research has only begun to unravel the complex molecular interactions that govern plant behavior. What we have learned so far already has tremendous potential for agricultural application.

Research on how plants sense and respond to pathogens is one of the best illustrations of where plant research is headed and its impact on agriculture. Findings during the last two decades indicate that plants have exquisite molecular mechanisms for sensing and responding to other organisms and to their physical environment. Plants have developed responses to protect themselves against a diverse group of pathogens, including bacteria, viruses, fungi, and nematodes. The genetic basis of pathogen resistance has been known for at least 100 years, ever since it was noted that certain varieties of our crops were more resistant to disease than others. However, it was not until 1994 that the first resistance genes were isolated and characterized.

Figure 13: Fundamental Plant Process

In the last eight years, many more resistance genes have been identified, and most encode receptor-like proteins that are activated by the presence of specific pathogens in a process that is postulated to operate much like a lock and key. Once activated, these receptor proteins trigger a signaling network leading to resistance to the pathogen. Other genes encode the components of the signaling network. The receptor-like proteins and signaling proteins are often structurally and functionally similar across different plant species, suggesting that pathogen receptor-like proteins are linked to signaling networks common to many species. We are discovering that different plant processes, from growth and development to photosynthesis and pathogen resistance, form complex networks that often overlap. Moreover, the core mechanisms and components of these processes are similar and transcend species lines and even cut across the boundary between the plant and animal kingdoms.

Figure 14: Plant disease is caused by a diverse range of pathogens and plants have evolved resistance to pathogens

Figure 15: Pathogen sensing receptors in plants and animals are structurally and functionally conserved

The fact that plants have an inherent ability to recognize and respond to pathogens has profound implications for agriculture. We currently spend billions of dollars every year on chemical pesticides and still continue to lose billions of dollars due to disease. If we can harness a plant’s natural ability to defend itself, then we may be able to develop alternative, environmentally benign forms of pest control. Crop species often lack effective genetic resistance to some of their most virulent pathogens. Although resistance genes to these pathogens often exist in other plant species, barriers to interspecies crosses frequently prevent them from being introduced by conventional breeding. However, we may be able to circumvent this by using cloning technology to transfer resistance genes from a resistant plant species to a susceptible plant species. We may also be able to engineer genes to confer resistance to new types of pathogens, for example, genes that encode a receptor molecule that makes use of a common signaling pathway to induce resistance, but can only be activated by the specific disease-causing pathogen.

The Baker lab is focused on the identification, isolation, and characterization of genes that naturally protect crops from common diseases, and they have chosen potato as their plant model. As the fourth most important contributor to human calorie consumption, the potato has an enormous influence worldwide. Potato is part of a family that includes several other commercially important crops such as tomato, pepper, and tobacco. Although each displays a number of unique adaptive features, such as tuber formation in potato and edible fruit in tomato, they all share similar genomes with respect to gene content and genome organization. This conservation provides a platform to readily leverage data and resources from one species to others in this family. Thus, the global influence and impact of potato, coupled with its unique and shared attributes with other family members, make it extremely important and relevant to plant biology and genome research.

Potato is susceptible to several pathogens that cause significant crop loss worldwide. A century of potato breeding efforts has resulted in the introduction of disease-resistance traits from wild potato species. However, breeding cultivated potato for a single disease-resistance trait can take years, and pathogens rapidly evolve to overcome single resistance traits. Potato breeders are in need of additional information and genetic resources to meet the challenges of ever-changing pathogen populations. Genetic studies have identified vast and untapped pathogen resistance traits in wild potato. Accumulating evidence suggests that regions of wild potato genomes may contain large, clustered arrays of resistance genes, which protect against many different types of pathogens.

Among potato’s most virulent pathogens is Phytophthora infestans, a fungus-like water mold (oomycete) that causes late blight disease in both potato and tomato. Phytophthora infestans was responsible for the Irish Potato Famine of the mid-nineteenth century. To date, late blight continues to be one of the most devastating of all plant diseases. Yet, no major cultivar with adequate late blight resistance is grown in the United States today. Recently, late blight has re-emerged on a global scale due to migrations, beginning in the 1970s, of exotic strains of Phytophthora infestans from Mexico to other locations worldwide.

The Baker lab, in collaboration with several other labs, is conducting a genomics project to study resistance to late blight disease in wild species of potato. The goal is to identify, isolate, and characterize regions of the potato genome bearing resistance genes and to determine the components of signaling pathways leading to disease resistance. They want to know which genes are involved in disease resistance, where those genes are located, and the functions of the molecules they encode. To answer these questions, they are using some of the latest and most innovative techniques in biotechnology. For example, advances in genomic sequencing have allowed them to rapidly collect and analyze vast amounts of DNA sequence data so that they can identify and map chromosomal resistance regions more quickly. They are also using microarrays, also referred to as “gene chips,” to perform genome-wide expression profiling in order to identify genes that are expressed in response to a pathogen and may encode key components in disease resistance.

Once they have identified candidate resistance genes through sequencing, mapping, and microarray analyses, they then use a new and powerful technique called virus-induced gene silencing (VIGS) to test the function of those genes in the plant. Traditionally, when researchers in plant biology wish to study the function of a gene, they generate a transgenic plant in which the gene of interest is either absent or non-functional. They then grow the plant and look for aberrant physical attributes. This method presents a problem for studying the signaling components involved in disease resistance because we now know that many of these components are also involved in other fundamental plant processes such as growth and development. If, for example, the candidate gene in disease resistance is also necessary for initial cell growth and differentiation, then the transgenic plant lacking the functional gene will not be able to grow to the point where its ability to resist disease can be studied. VIGS provides an alternative because it allows researchers to block gene expression in specific parts of a plant at specific times. The Baker lab uses the VIGS technique with candidate genes like the one described above to block gene expression after a plant has already undergone its initial development.

The techniques and technology used in the Baker lab have enabled them to take a look at plant behavior from a number of different angles. High-throughput DNA sequencing and microarray gene expression profiling allow for genomic analysis. VIGS may help tie their research to real-world application by allowing them to study the function of genes uncovered through their genomic analysis.

Disease resistance, like many other areas in biological research, has benefited tremendously from advances in genomics. Genomics allows for a comprehensive study of overall expression of RNA, protein, and metabolites in a functionally relevant context and has greatly accelerated discovery in biology. Comparative genomics is being used to make predictions about protein function based on the known function of a structurally similar protein in another species. These predictions have pushed discovery rates even higher. Genomics has led to an unprecedented wealth of data. Bioinformatic analysis of these data is making a huge contribution to the understanding of cellular function and the mechanisms governing plant growth, development, and survival. Interestingly, a useful by-product of genomic research may be the development and use of plants as remote sensors in our physical environment. For example, we can envision a future in which a plant’s finely tuned sensing capabilities are linked to the wireless planetary grid to warn us of impending changes in the biotic environment.

Genomics has accelerated research on disease resistance in plants and is demonstrating the potential for real-world applications in agriculture. Future efforts to enhance agricultural productivity will benefit from greater international and interdisciplinary collaboration. The fact that so many plant processes and their signaling networks overlap, paired with the inherently comprehensive nature of genomics, sets the stage for greater integration of the international plant and agricultural research communities. By collaborating and making our data publicly available, we can speed the discovery process. This is crucial if we are to turn our molecular understanding of basic plant processes into new approaches for sustainable agriculture. The ties we are establishing through collaborative research and the bridging of technology define the future of plant biology and shape the agriculture of tomorrow.

Figure 16: The Future of Plant Research


Lothar Willmitzer
Professor
Max Planck Institute
Golm/Potsdam, Germany

Plant Bioengineering

As proof that you cannot always plan science, it is worth recalling discoveries made in the Agrobacterium system. While studying a plant disease named crown gall, caused by Agrobacterium tumefaciens, scientists discovered that the disease spread in plants by transference of genetic material from the bacterium into the plant cells. Basically, the bacterium transfers part of its DNA to the plant, and this DNA integrates into the plant’s genome, causing the production of tumors and associated changes in plant metabolism. This led to the development of techniques to cut and splice DNA and introduce genes into plants and has enabled this bacterium to be used as a tool in plant breeding. Thus, any desired genes, such as insecticidal toxin genes or herbicide-resistance genes, can be engineered into the bacterial DNA and inserted into the plant genome. The use of Agrobacterium not only shortens the conventional plant breeding process, but also allows entirely new (non-plant) genes to be engineered into crops.

In recent years, much progress has been made in the development of tools to create and characterize genetic diversity in plant systems, for example, transgenic knock-out populations, transposon insertions, chemical gene machines, and highly efficient ways to genotype single nucleotide polymorphisms (SNPs) within large populations. In addition, complete plant genomes have been elucidated, allowing for phenotyping at the molecular level and use of microarrays to determine the expression levels of thousands of genes in parallel.

Metabolic profiling using gas chromatography mass spectrometry technologies represents a largely untapped potential in the field of functional genomics.

In addition to making crops resistant to pests and weeds, plant bioengineers are beginning to cut and paste genes to make crop plants more salt- or drought-tolerant, and to produce better tasting and more nutritious foods. And, because plants are the best chemical engineers, they can be engineered to produce specific compounds, such as industrial oils, plastics, enzymes, and even drugs and vaccines.


However, successful introduction of genetically engineered plants faces technical and social barriers. Among the problems that researchers and regulators face with regard to bioengineered seeds is that continual exposure to engineered proteins in bioengineered crops could result in selection for new strains of insects that can withstand the toxic proteins of aggressive bacteria. Thus, seed companies must ensure that farmers plant a small portion of their fields with non-engineered seed. Another concern is that genes for herbicide resistance could be passed, via cross-pollination, to related weed species, so that resistance to a particular herbicide may appear in some strains of weeds. Environmental hazards must therefore be weighed along with the benefits.

Finally, some consumers are concerned that genes that produce an allergy-inducing protein in one food plant might be introduced into another plant, which might then be eaten by an unsuspecting allergic individual. Conversely, genetic engineering offers the possibility of removing offending, allergy-inducing proteins from food.


William Stafford Noble
Assistant Professor
Department of Computer Science
Columbia Genome Center
Columbia University

Bioinformatics in 2011

Computation will have a sweeping impact on both biology and clinical medicine. Two trends in computer science (transparent, distributed database technology and “organic learning”) will advance our understanding of fundamental biology.

Although the amount of genome sequence data in public databases is vast, it is still only three gigabytes, a small percentage of a typical hard disk. What makes biological data difficult to cope with is not just its size, but also its complexity. Even though the human genome is composed of a four-base alphabet, additional information must be taken into account: for example, the methylation status of each base, the sequence similarity of each subsequence with respect to various other organisms, and the structures of the genes and the proteins that they code for.

In order to understand these complex data sets on a reasonably large scale, the integration of various types of data must be accomplished automatically. We are moving toward a transparent database interface, an invisible web that offers to any bioinformatics researcher, or more importantly, to any bioinformatics computer program, the ability to browse, search, collect, and manipulate a wide variety of data that is stored in a distributed fashion. Currently, web-accessible database systems integrate several different kinds of data: sequence data, literature, and, in the not-too-distant future, microarray gene expression data. However, this centralized database architecture will not last. With the development of appropriate mark-up languages like XML, the invisible web will spring up around the centralized servers, offering a host of additional types of data and analyses. The central feature of this web will be a uniform protocol for representing and retrieving biological data, particularly for specific molecules.
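What a uniform, machine-readable record might look like can be suggested with a short sketch. The XML schema below is entirely hypothetical (the element names, attribute names, and values are invented for illustration and do not correspond to any existing standard); the parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: every element name, attribute, and value is invented.
RECORD = """
<molecule id="example-gene-1" type="gene">
  <name>exampleGene</name>
  <sequence alphabet="DNA">ATGGCCATTGTAATGGGCCGC</sequence>
  <annotation source="expression" value="upregulated"/>
  <annotation source="literature" value="2 citations"/>
</molecule>
"""


def parse_molecule(xml_text):
    """Turn one molecule record into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "type": root.get("type"),
        "name": root.findtext("name"),
        "sequence": root.findtext("sequence"),
        # Heterogeneous annotations are collected by their declared source.
        "annotations": {node.get("source"): node.get("value")
                        for node in root.findall("annotation")},
    }


record = parse_molecule(RECORD)
print(record["name"], len(record["sequence"]), sorted(record["annotations"]))
```

The point of such a format is that a program, not only a person, can request a molecule by identifier and receive every kind of data about it in one predictable structure.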

The capability to retrieve heterogeneous data will be useless without intelligent means of combining it. Algorithms will be needed, for example, to combine data about gene function based on co-expression or based on co-occurrence in the genome. The lesson is simply that more data is not always better, and that learning from heterogeneous data is not as simple as throwing it together in one place. In the rush to create and exploit high-throughput technologies, it is important to remember that we need intelligent analysis techniques to separate the signal from the noise.

Figure 17: The invisible web

Every interesting bioinformatics problem can be posed as a learning problem, in which inferences, or predictions, are made from a given collection of observations. This learning framework describes problems as various as protein structure prediction, automatic gene finding, protein homology detection, gene network inference, and on and on. It is often possible to define a network, train it on a mass of data, and end up with a system that makes reasonably good predictions. Unfortunately, it is notoriously difficult to figure out why the system works well, because the network itself is not easily interpretable. In the lingo, it is “sub-symbolic.” For complex inputs in which particular constraints are understood in advance, it is sometimes possible (but not easy) to design a network that will maintain those constraints.

Bayesian statistics assumes that everything is a random variable, and Bayesian inference adopts this same stance. The laws of probability govern every model and provide a uniform framework in which to represent essentially any kind of knowledge. For some types of data, the result is a powerful, scalable, and efficient learning algorithm. Hidden Markov models, for example, are probabilistic models of time-series data with certain independence assumptions. The most general form of Bayesian model is called a Bayesian network. The content of a Bayesian network is symbolic. Each node represents one random variable, and edges in the network typically represent causality.
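As a concrete instance of the hidden Markov models mentioned above, the following sketch computes the probability of a short DNA sequence with the standard forward algorithm. The two hidden states and all probabilities are invented for illustration and are not taken from any real gene finder.

```python
from itertools import product

# Toy two-state model; every number below is an assumption for illustration.
STATES = ("coding", "noncoding")
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.2, "noncoding": 0.8},
}
EMIT = {
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def forward_likelihood(observations):
    """P(observations) under the model, summed over all hidden state paths.

    Expects a non-empty string over the alphabet A, C, G, T.
    """
    # alpha[s] = P(observations so far, current hidden state is s)
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for symbol in observations[1:]:
        alpha = {
            s: EMIT[s][symbol]
            * sum(alpha[prev] * TRANS[prev][s] for prev in STATES)
            for s in STATES
        }
    return sum(alpha.values())


print(forward_likelihood("GCGC"), forward_likelihood("ATAT"))

# Sanity check: the likelihoods of all sequences of a fixed length sum to one.
total = sum(forward_likelihood("".join(seq)) for seq in product("ACGT", repeat=3))
print(total)
```

The independence assumptions are what keep this efficient: each step depends only on the previous hidden state, so the cost grows linearly with sequence length rather than with the number of possible state paths.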

The problem is that some learning algorithms outperform Bayesian methods on many important problems. Differences in performance are a result of the difference in an algorithm’s ability to cope with prior knowledge and to provide explanations. People have been predicting the development of such algorithms since the late 1950s. Sometime within our lifetimes there will be a watershed at which the first computer capable of “organic learning” emerges. “Organic learning” refers to an algorithm that learns continually and is capable of incorporating a wide variety of knowledge. Organic learning is the only way that we are going to be able to synthesize the biological knowledge stored in the invisible web.

Figure 18: Genomic learner

Probably the first place that a web-enabled organic learner will be employed is in genome analysis. Imagine a permanently active gene-finding engine. It operates on the complete human genome, sifting and re-sifting the data to iteratively improve its ability to predict known genes. For genes that it has trouble recognizing, it looks on the web for supplementary data: methylation patterns to identify promoter regions, sequence similarity scores from other genomes, or even predictions from other gene-finding algorithms. When new data appears on the web, the engine decides whether that data is useful. All available data is constantly prioritized: would it be better to look at this new EST data set that just appeared, or would it be more useful to spend more time looking closely at the sequence similarity scores from mouse and fish? When a user queries the engine with a particular set of genomic coordinates, the engine reports its most up-to-date prediction for that region. The prediction includes hypotheses about alternatively spliced exons, as well as certainty estimates for each gene feature.

Eventually, we would like to have a model that captures not only the genome or the proteome, but also the entire cell. A realistic, whole-cell model that provides accurate molecular scale predictions is a distant vision, but possible. It will incorporate essentially all of our knowledge about the genome, about individual proteins, their three-dimensional structures, binding affinities, and participation in metabolic and transcriptional pathways. It will enable us, for example, to predict the effect of a single-base mutation: this nucleotide change will result in a different protein sequence, which will fold differently, which will consequently interact less strongly with its substrate. The mutation’s effects thus will be traceable throughout the entire cell. Perhaps more significantly, we will be able to design and test drugs “in silico”. The whole-cell model will not only predict individual molecular interactions, but will also extrapolate the drug’s effect throughout the cell. To some extent, environmental interactions might be incorporated into the model, allowing the model of a human cell to be used to infer phenotypic effects.
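Only the first link in that chain, from nucleotide change to altered protein sequence, is easy to compute today. The following sketch illustrates that step alone, assuming the standard genetic code and an invented nine-base coding sequence; folding and substrate binding are not modeled, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def point_mutation(dna, position, new_base):
    """Return a copy of dna with the base at a 0-based position replaced."""
    return dna[:position] + new_base + dna[position + 1:]


if __name__ == "__main__":
    wild_type = "ATGGAGTAA"  # invented example sequence
    mutant = point_mutation(wild_type, 4, "T")  # GAG -> GTG
    print(translate(wild_type), translate(mutant))
```

Here a single A-to-T change converts a glutamate codon into a valine codon; predicting what that substitution does to folding and binding is the part that still requires the whole-cell model described above.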

The primary challenge in building such a model is data integration. There are many smaller problems as well, some of them extremely daunting. Protein structure prediction and gene network modeling are the two most obvious ones. Only a web-enabled organic learner will be capable of accomplishing this integration.

In clinical medicine, programs will learn to predict the onset of disease, the disease progression, and its response to various forms of treatment: in essence, personalized medicine. Microarray expression chips already provide a snapshot of the mRNA expression levels of thousands of genes in a particular tissue sample. These chips will soon cover the entire genome. They will be much more accurate and they will be complemented by chips that measure protein concentrations as well as mRNA concentrations. Imagine a SNP chip that measures in a drop of blood the occurrences of hundreds of thousands of known single nucleotide polymorphisms. Not all of these SNPs will be individually phenotypically characterized, but the pattern of SNPs will be a kind of human genomic profile, like the expression profile of a gene. These various types of genomic profiles (expression and polymorphisms) will be coupled with a profile of the individual, including a complete medical history, as well as the results of other diagnostic tests. As wearable computers become more common, this history will include detailed data collected by monitoring devices.

Your genomic-phenomic profile is compared to a database of millions of such profiles. This database is further augmented by knowledge of gene function: their molecular activities and metabolic and transcriptional roles. Today, we have identified a small list of alleles that are implicated in some diseases. None of these allelic effects, however, is nearly as statistically significant as lifestyle factors such as eating habits, exercise, or smoking. Only by integrating a complete genomic profile with a complete medical profile of the individual will powerful predictive medicine be possible. Using this picture of the entire individual, our analysis techniques will accurately predict the onset of particular diseases. More productively, the analysis will suggest appropriate lifestyle changes, perhaps emphasizing a particular vitamin or avoiding particular foods. Diseases that are in their infancy could be detected immediately, and a personalized drug treatment regime could be designed.
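The comparison step can be sketched in its simplest form as a nearest-neighbor search. The example below assumes that each profile is reduced to a fixed-length string of SNP alleles and that similarity is measured by Hamming distance; the profile names, allele strings, and function names are invented, and a real system would also have to weigh medical history and gene function.

```python
def hamming(a, b):
    """Count the positions at which two equal-length allele strings differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same SNP positions")
    return sum(x != y for x, y in zip(a, b))


def nearest_profiles(query, database, k=2):
    """Return the names of the k stored profiles closest to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: (hamming(query, item[1]), item[0]),
    )
    return [name for name, _ in ranked[:k]]


# Invented toy database: one allele character per SNP position.
DATABASE = {
    "profile_1": "AGGTC",
    "profile_2": "AGCTC",
    "profile_3": "TCCAG",
}

if __name__ == "__main__":
    print(nearest_profiles("AGGTA", DATABASE))
```

Outcomes recorded for the closest stored profiles could then inform a prediction for the query individual, which is the role the database plays in the scenario above.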

This then is the picture: ten years from now, an invisible knowledge network will surround us. The net will be accessible by long-lived, learning agents that specialize in particular domains. The learners themselves will likely be distributed and replicated, providing a computational interface to the web. The result will be a symbiosis: they will constantly be learning from us, just as we learn from them.

Figure 19: Personal medical profile


Bioinformatics and Computational Needs

Functional genomics opens up new possibilities and raises new requirements for computational tools. Interactions between experimental biologists and the bioinformatics community will be essential to developing the tools needed to move from genome sequence data to global functions. Data handling and database schemes are currently the focus of many efforts in the bioinformatics community. Bioinformatics is not simply the melding of biology and computer science but instead should be viewed as a service to molecular biology. It is an evolving field that can serve the needs of evolutionary biologists, biotechnology, new drug developments, and areas such as cellular organization, genotyping, and creation of cellular factories. Four methodological areas deserve particular attention. However, to achieve advances in these areas, it will be critically important to ensure that database schemas for DNA array data are publicly available in their entirety.

First, methods are needed to predict protein-protein interactions based on the analysis of multiple sequence alignments. These methods are related to earlier developments in sequence analysis and protein structure prediction in the area of bioinformatics. Recent advances in molecular biology have provided a vast amount of genetic information for many different organisms. Currently, one of the most challenging issues is to establish the possible interactions between different protein components at different levels, in what has been called “neighborhood relationships.” Rather than focusing on direct physical interactions, a number of computational efforts have recently addressed the problem of predicting proteins with general functional relationships. Functional interactions have been predicted based on comparisons of the species distributions of gene pairs. These methods assume that genomes encoding one member of an interaction pair will necessarily also encode its interacting partner. Even though these approaches all have promising features, they are still unable to cope with the complexity and extension of protein interaction networks in real systems. Much remains to be done, therefore, in the development of new approaches and integration of existing ones.
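The species-distribution comparison can be sketched as follows. This is an illustrative example only, assuming that each gene is summarized by the set of genomes in which it occurs and that profile similarity is scored with the Jaccard index; the gene names, species labels, threshold, and function names are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of species labels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_pairs(profiles, threshold=0.8):
    """Return gene pairs whose species distributions are at least `threshold` similar."""
    hits = []
    for g1, g2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[g1], profiles[g2])
        if score >= threshold:
            hits.append((g1, g2, score))
    return hits


# Invented presence data: the genomes in which each gene is found.
PROFILES = {
    "geneA": {"sp1", "sp2", "sp4"},
    "geneB": {"sp1", "sp2", "sp4"},
    "geneC": {"sp3"},
    "geneD": {"sp1", "sp3"},
}

if __name__ == "__main__":
    print(predict_pairs(PROFILES))
```

The strict co-occurrence assumption is visible in the threshold: a gene pair is reported only when the two genes are present and absent in nearly the same genomes, so partners that are lost or replaced in some lineages are missed.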

Second, methods are needed to predict protein-protein interactions based on the study of regulatory and other genomic signals with data provided by genome analysis and genome comparison applications. Our knowledge of the evolutionary forces and processes that play a role in the organization of genomes is far from perfect, and general approaches able to capture the relationship between genomic and functional organization have to be developed.

Alfonso Valencia
Professor
Protein Design Group
Centro Nacional de Biotecnologia
Cantoblanco, Madrid, Spain

Third, we need means to extract information on protein-protein interactions by systematic analysis of text sources, based on data mining and text analysis techniques. New approaches have emerged for the extraction of information on protein-protein interactions. These initial systems are based on previous experience in the detection of significant, characteristic keywords in sets of Medline abstracts referring to protein families, where the use of statistical methods was sufficient to generate meaningful results without the further need to implement syntactical analysis. The challenge ahead is to combine more refined statistical methods with other new computational techniques to improve the coverage and accuracy of detected interaction networks. Current approaches could also be extended beyond protein interactions to related biological issues, such as DNA-protein interactions, drug-protein binding, tissue distribution and disease-associated characteristics. Furthermore, problems in molecular biology will connect with medical informatics, where access to clinical records and medical information is currently a demanding issue.
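A purely statistical approach of this kind, with no syntactic analysis, can be sketched as a co-occurrence count. The example below assumes a hypothetical lexicon of protein names and a handful of invented abstract sentences; it is not the method used in the systems mentioned above, only an illustration of counting how often two names share an abstract.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical protein-name lexicon (lowercased) and invented abstract texts.
LEXICON = {"prota", "protb", "protc"}
ABSTRACTS = [
    "ProtA binds ProtB in vitro.",
    "ProtB and ProtA form a complex.",
    "ProtC is expressed in leaf tissue.",
    "ProtA is phosphorylated; ProtC is unaffected.",
]


def mentions(text):
    """Return the lexicon names that appear as whole tokens in the text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return tokens & LEXICON


def cooccurrence_counts(abstracts):
    """Count, for each pair of names, the abstracts that mention both."""
    counts = Counter()
    for text in abstracts:
        for pair in combinations(sorted(mentions(text)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    for pair, n in cooccurrence_counts(ABSTRACTS).most_common():
        print(pair, n)
```

The fourth invented abstract shows the weakness of counting alone: two names co-occur even though the sentence reports no interaction, which is why coverage and accuracy depend on more refined statistics.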

Finally, we need methods to simulate the behavior of metabolic and signaling pathways with techniques that include numerical and logical descriptions of interactions. It is reasonable to think that in the near future the amount of genomics and functional information available will be sufficient to define most cellular functions and interactions. Once all this information has been integrated, the molecular biology and bioinformatics communities will be at the point of taking a new step for the reconstruction of interaction networks and simulation of their behavior.
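The two styles of description can be contrasted in a small sketch. It assumes an invented four-component signaling loop with Boolean update rules for the logical case, and a single species with constant synthesis and first-order degradation, integrated by the forward Euler method, for the numerical case. All component names, rules, and rate constants are hypothetical.

```python
def step(state):
    """One synchronous update of a hypothetical Boolean signaling loop."""
    return {
        "signal": state["signal"],  # external input, held constant
        "kinase": state["signal"] and not state["phosphatase"],
        "tf": state["kinase"],  # transcription factor follows the kinase
        "phosphatase": state["tf"],  # negative feedback on the kinase
    }


def simulate_logical(state, n_steps):
    """Return the list of states visited, starting with the initial state."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


def simulate_numerical(k_syn, k_deg, p0=0.0, dt=0.01, n_steps=2000):
    """Forward Euler integration of dP/dt = k_syn - k_deg * P."""
    p = p0
    for _ in range(n_steps):
        p += dt * (k_syn - k_deg * p)
    return p


INITIAL = {"signal": True, "kinase": False, "tf": False, "phosphatase": False}

if __name__ == "__main__":
    for state in simulate_logical(INITIAL, 7):
        print(state)
    print(simulate_numerical(k_syn=2.0, k_deg=1.0))
```

The logical model needs only the wiring of the pathway and shows qualitative behavior such as oscillation under negative feedback, whereas the numerical model needs rate constants but yields concentrations, here approaching the steady state k_syn / k_deg.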

Figure 20: The flow of information in the post-genomic era


U.S. Participants List

Keynote Speaker

Dr. Larry Smarr
Director
California Institute for Telecommunications and Information Technology
Professor
Computer Science and Engineering
University of California, San Diego

Speakers

Dr. Barbara Baker
Research Leader
Plant Gene Expression Center
USDA/ARS
Adjunct Professor
Department of Plant and Microbial Biology
University of California, Berkeley

Dr. Laura F. Landweber
Associate Professor
Department of Ecology and Evolutionary Biology
Princeton University

Dr. William Stafford Noble
Assistant Professor
Department of Computer Science
Columbia Genome Center
Columbia University

Discussants (Early Career Researchers)

Dr. Mary Beth Mudgett
Post Doctoral Scholar
Department of Plant and Microbial Biology
University of California, Berkeley

Dr. Paul Pavlidis
Associate Research Scientist
Columbia Genome Center
Columbia University

Dr. Lydia L. Sohn
Professor
Department of Physics
Princeton University

US Government Representatives

Dr. Mary E. Clutter, U.S. Chair
Assistant Director
Directorate for Biological Sciences
National Science Foundation

Ms. Martha Steinbock, U.S. Executive Secretary
Technology Transfer Coordinator, Pacific West Area
Agricultural Research Service
U.S. Department of Agriculture


Dr. James L. Edwards
Deputy Assistant Director
Directorate for Biological Sciences
National Science Foundation

Dr. Elaine Z. Francis
National Program Director
Endocrine Disruptors Research Program
Office of Research and Development
US Environmental Protection Agency

Dr. Maria Y. Giovanni
Division of Microbiology and Infectious Disease
National Institute of Allergy and Infectious Disease
National Institutes of Health

Dr. Maryanna Henkart
Division Director
Division of Molecular and Cellular Biosciences
Directorate for Biological Sciences
National Science Foundation

Dr. Stephen H. Koslow
Director
Office on Neuroinformatics
Associate Director
National Institute of Mental Health
National Institutes of Health

Ms. Rachel Levinson
Assistant Director for Life Sciences
Office of Science and Technology Policy
Executive Office of the President

Dr. Caird Rexroad, Jr.
Associate Deputy Administrator
Animal Production, Product Value and Safety
Agricultural Research Service
U.S. Department of Agriculture

Dr. Deborah Sheely
Program Director
Competitive Research Grants and Awards Administration
Cooperative State Research Education and Extension Service, U.S. Department of Agriculture

Dr. Judith St. John
Associate Deputy Administrator
Crop Production, Product Value and Safety
Agricultural Research Service
U.S. Department of Agriculture

Dr. Anne K. Vidaver
Chief Scientist
National Research Initiative Competitive Grants Program,
Cooperative State Research Education and Extension Service
U.S. Department of Agriculture

Rapporteur

Dr. Kathi E. Hanna
Science and Health Policy Consultant


E.C. Participants List

Speakers

Dr. George T. Robillard
Professor
Biomade Technology Foundation
Groningen, The Netherlands

Dr. Alfonso Valencia
Professor
Protein Design Group, CNB-CSIC
Centro Nacional de Biotecnologia
Cantoblanco, Madrid, Spain

Dr. Lothar Willmitzer
Professor
Max Planck Institute
Golm/Potsdam, Germany

Discussants (Early Career Researchers)

Dr. Robert Friesen
Biomade Technology Foundation
Groningen, The Netherlands

Dr. Paulino Gomez-Puertas
Protein Design Group
CNB-CSIC, Centro Nacional de Biotecnologia
Cantoblanco, Madrid, Spain

Prof. Dr. Bernd Müller-Röber
Universität Potsdam
Potsdam, Germany

E.C. Representatives

Mr. Bruno Hansen, E.C. Chair
Director
Biotechnology Agriculture and Food Research Directorate
European Commission

Dr. Maurice Lex, E.C. Executive Secretary
Research Directorate General
European Commission

Dr. Laurent Bochereau
Research Directorate General
European Commission

Dr. Ioannis Economidis
Research Directorate General
European Commission

Dr. Patrice Laget
Head
Science, Technology & Education Section
Delegation of the European Commission

Dr. Etienne Magnien
Research Directorate General
European Commission

Mr. Carlos Martinez Riera
Research Directorate General
European Commission
