Teaching and Using Informatics Tools for Molecular Biology and Genetics

Prepared by: Andrés Aravena
March 2015

Contents Outline

- The Challenges of Modern Molecular Biology and Genetics
- The Internet makes Molecular Biology and Genetics accessible to more people
- More players come to the game
- Data is cheap, Knowledge is expensive
- PCR can soon become available for the general public
- What will be our competitive advantage in the next ten years?
- How will we be different from the rest?
- Data Analysis for Molecular Biology
- Undergrad Level Computation Courses
- Graduate Level Courses
- Systems Biology and Advanced Computational Tools
- Creating a "Laboratory of Bioinformatics and Systems Biology"
- New tools for new science
- Creating innovative experimental tools
- Making "Software" for PCR machines
- Learning by Making
- Creating a "Scientific Instrumentation Laboratory"
- References

The Challenges of Modern Molecular Biology and Genetics

The Department of Molecular Biology and Genetics at Istanbul University was created 13 years ago. In the following years it established itself as a well-respected department. Every year it attracts between 40 and 60 of the best students in the country, who can later pursue Master's and doctoral degrees. It has published 376 papers, mainly on plant genomics and on fission yeast as a model for human metabolism. Our department is young but is up to date with modern molecular biology and genetics theory and practice.

During the same years, we have seen an increasing interest of the general public in molecular biology and genetics. The sequencing of the human genome, announced by the president of the USA, captured everybody's attention. Physicists, mathematicians, computer scientists and engineers turned their attention to molecular biology questions. They come looking with new eyes and creating new theoretical and practical tools.

Molecular Biology and Genetics have become mainstream

Today genomic tools are also used outside academia. Several companies provide "personalized DNA services". They sell genome analysis kits to report hereditary health conditions and traits. One of these companies, 23andMe, is partially owned by Google.

Another case is the Genographic Project, created by the National Geographic Society and IBM. It aims to trace the ancestry and migrations of the human population. Anyone can find out what their true origins are.

In these and other cases molecular biology and genetics tools are used by companies with little connection to academia.

Molecular Biology and Genetics are on the market, and people are joining in

Source: http://mediacenter.23andme.com/en-gb/investors/ http://www.ibm.com/podcasts/howitworks/091806/HIW_09182006.pdf

Before the Internet, top science was accessible only to researchers with money to carry out complex experiments or to buy books and journals. Finding references used to take several weeks by regular mail. Professors had the only copy of the textbooks.

Today all journals are accessible online. We can now download references in minutes at low cost, or free when the article is provided in Open Access mode. Most of the important journals follow this policy. Moreover, the experimental results of each article are also available for free. This data has a structure that makes it easy for any trained scientist to process it and find new knowledge.

It is easy today to download structured data from all articles related to a subject and combine them to produce new knowledge. Many of the programs used for this meta-analysis are also accessible in Open Source mode. Anyone can download these programs without cost. Scientists can change a program's internal code to address their specific scientific question.

If the analysis requires large computational power, it can be obtained at low cost by using cloud computers. Companies like Amazon.com and Google sell their spare computing power at low prices. This enables researchers to carry out computations that would otherwise be infeasible.

This democratization of knowledge, known as the flat world (Friedman 2007), provides us with an exciting challenge. We have the opportunity to become a player in the big leagues, in a fair game on a level playing field. But the same opportunity presents itself to everyone else.

Rich countries no longer have a monopoly on knowledge. We can read the same books and the same articles, use the same machines and the same programs. Anyone could make the next scientific breakthrough, whether in New York, New Delhi or Istanbul.

The Internet makes Molecular Biology and Genetics accessible to more people

The rise of the BRICS and MINT economies is also helping to increase the number of researchers worldwide. Today there are more PhD students than ever (Cyranoski et al. 2011), and many of them will focus on Molecular Biology, Genetics and related areas.

Every day we see new interdisciplinary collaborations, with other areas of biology as well as with mathematicians, chemists and computer scientists. It is common for engineers, especially biotechnology engineers, to learn about molecular biology.

India graduates more than a million engineers each year. Egypt has 35,000 PhD students and Israel 10,000 (Hays 2011). Many of them will find jobs in molecular biology companies or academia.

More players come to the game

Source: http://www.nature.com/news/2011/110420/pdf/472276a.pdf

Data production costs are falling every day. The value is in the ideas

Advances in molecular biology and genetics have resulted in a new generation of efficient machines. They produce reproducible experimental results in big volumes and at low cost. Examples include next-generation sequencers, microarrays, mass spectrometers and real-time PCR machines.

A well-planned experiment can be outsourced to an external lab. The value is in the plan, in the selection and handling of the samples, and in the analysis. Data without analysis will not have any impact.

Research changes from producing Data to producing Information, Knowledge and Wisdom

Data is cheap, Knowledge is expensive

Today PCR thermocyclers are expensive devices found in universities and research centers, very much like desktop computers were in the 70's and 80's. Nowadays computers are low-cost and found everywhere.

Will the same happen with PCR?

PCR thermocyclers can be really low-cost. There is no technical reason why they cannot be made as cheaply as an Arçelik Mini Telve Turkish coffee maker. They have essentially the same parts.

Today only a few companies produce PCR thermocyclers, just as only a few companies, such as Apple and Samsung, produce smartphones. Nevertheless, smartphones are seen everywhere, and this is a big opportunity for creators of software applications.

PCR is the new PC

“A computer on every desk and in every home, all running Microsoft software”. Bill Gates, Microsoft’s founding mission.

Gates set this goal in the late 70's, when it was not obvious that people would even see a computer in their lives (Stross 1997; Schlender and Goldblatt 1995). PCR technology is now in the same state that personal computers were in 1975.

If PCR machines become inexpensive, and there is "a PCR on every desk and home", in hospitals, restaurants and high schools, then who will be making "software apps" for them?

PCR can soon become available for the general public

Source: Image from *Arçelik* website

What will be our competitive advantage in the next ten years?

If we don't fully analyze our own experimental data, someone else will. And they will publish it.

A Proposal to Get Ready for the Next Decade

Good science is the combined result of asking good questions, selecting and preparing good samples, carrying out correct experiments, measuring the resulting values with the proper instruments, making the most complete analysis, and drawing the relevant biological conclusions.

If we keep up the current good work, we will keep performing high-quality experiments and producing results in increasing volumes. But data without analysis is becoming less valuable. For example, the first bacterial genome sequence was published in the journal Science (Fleischmann et al. 1995); today it would be just a short communication.

Young molecular biologists need to learn how to extract relevant knowledge and biological meaning from the raw data.

And, given the large volume of data and the global competition with other researchers, this can only be done using computers and informatics tools. We should increase the competence of our undergraduate students in data science, so they can use the tools that exist today.

Our graduate students should go beyond generic data science tools. They should integrate molecular biology and analytical methods, and become familiar with "Systems Biology", the computational and mathematical modeling of complex biological systems. In particular they have to learn how to understand the emergent properties of regulatory, metabolic and cell signaling networks. The mathematical tools can be learned together with the biological context, so that they make sense and are easier to learn.

Every normal student is capable of good mathematical reasoning if attention is directed to activities of his interest (Piaget 1976)

How will we be different from the rest?

If we use the same instruments, theories and tools as everyone else, we will be just one more lab.

The second way to increase the impact of our science is to design and use new instruments: new tools that can measure faster, with more precision or at lower cost. Many of the most important advances in science are a consequence of the creation of new instruments.

At first glance it seems that instruments for molecular biology are high-technology tools that we cannot make or modify, a "space age" technology out of our reach. But all instruments are based on the same basic physicochemical principles. We know and understand these principles, and current technology allows us to make and modify our own instruments. MIT has been teaching this in the course "How to Make Almost Anything" since 1998.

We should teach our students how to make laboratory instruments, and encourage them to create the tools that do not yet exist.

Let us use computers and the Internet to quickly learn the current tools

We can and should use all online tools to our advantage. We should use online material to teach and learn faster. Students are already doing so.

We must overcome the language barrier. The language of science is English, for better or worse. Young people learn languages fast.

We should provide tools to enable interdisciplinary collaboration, in particular with data scientists. Students should learn to work online in distributed or international workgroups, learn about international funding tools, and understand intellectual property rules, patents and licenses.

And we should be making our own online teaching material.

Data Analysis for Molecular Biology

Move from creating data to creating information, knowledge and wisdom.

Undergraduate level:
- use more online resources
- English writing class
- introduction to data science
- programming for research

Graduate level:
- advanced computational tools
- biostatistics and data mining
- systems biology
- intellectual property

Create a bioinformatics and systems biology laboratory.

Undergrad Level Computation Courses

Teaching Data Science

We can transform the "Computation III" course, which today is "Databases", into an "Introduction to Data Science".

The students will learn how to handle experimental data and how to communicate with scientists of other data-oriented disciplines.

They will learn how to produce publication-quality reports with reproducible results: how to get raw data, extract the relevant information and filter it using several selection criteria; how to store and retrieve it in efficient and useful ways; and how to transform, organize, categorize and display it so that the results can be understood.

Tools include Unix command-line tools, SQL and the R statistical package. The student should also be able to understand how computer networks work and what their limitations are.
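
As an illustration of the kind of exercise such a course could include, the following minimal sketch stores a few measurements in an SQL database and filters them using two selection criteria. It uses Python's standard sqlite3 module only to keep the example self-contained; the table name, column names, gene names, values and thresholds are all made up for illustration and do not come from any real experiment.

```python
import sqlite3

# Hypothetical expression measurements: (gene, treatment, fold_change, p_value).
# All names and numbers are invented for this example.
ROWS = [
    ("geneA", "heat", 2.8, 0.001),
    ("geneB", "heat", 1.1, 0.400),
    ("geneC", "heat", 0.3, 0.004),
    ("geneD", "cold", 3.5, 0.020),
]


def load(rows):
    """Store the raw rows in an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE expression ("
        "gene TEXT, treatment TEXT, fold_change REAL, p_value REAL)"
    )
    con.executemany("INSERT INTO expression VALUES (?, ?, ?, ?)", rows)
    return con


def significant_genes(con, treatment, min_fold=2.0, max_p=0.05):
    """Return the genes that pass both selection criteria, sorted by name."""
    cur = con.execute(
        "SELECT gene FROM expression "
        "WHERE treatment = ? AND fold_change >= ? AND p_value <= ? "
        "ORDER BY gene",
        (treatment, min_fold, max_p),
    )
    return [row[0] for row in cur]


con = load(ROWS)
print(significant_genes(con, "heat"))
```

The same query could be run unchanged from the sqlite3 command-line tool or from R, which is one reason SQL is a convenient common language between disciplines.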

Teaching modern programming

Transform "Computation IV" from "Making Web pages" to "Programming for Research".

Teach Python and Biopython to analyze, model, evaluate and predict the behavior of genomic and molecular biology entities.
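
A first exercise in such a course might look like the sketch below, which computes the GC content and the reverse complement of a DNA fragment. It is written in plain Python so that it runs without extra packages; Biopython offers ready-made sequence tools for this kind of task. The function names and the example fragment are assumptions made for illustration.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Complement every base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


fragment = "ATGCGCTA"  # made-up fragment, not a real gene
print(gc_content(fragment))
print(reverse_complement(fragment))
```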

The students should be able to interact with high-end servers, use command-line tools and be comfortable in computing environments other than Microsoft Windows.

The objective of this course is not to make our students experts in computer science, but to give them the concepts and language that will allow them to collaborate in interdisciplinary groups.

Graduate Level Courses

Communication

If current global trends continue, the new generations of molecular biologists will work in frequent collaboration with peers from other countries and other disciplines. They need not only a good level of English but also a command of the language and concepts of other disciplines. We can help them by creating "Scientific communication workshops" where they can exercise and improve their writing and speaking skills. In these workshops students should practice writing papers and presenting them, in English, as a rehearsal for a conference. The workshops can be shared with other departments to encourage multidisciplinary interaction and learning of technical language.

Collaboration also requires online tools. Papers will be written by many authors simultaneously using Dropbox, Google Docs or GitHub. Or maybe it will be more like Wikipedia or arXiv. Our students need to know these tools and the methodology that each one requires.

Intellectual property

The results of many of these collaborations will not necessarily be published in scientific journals. Some may be patented, some may be licensed for industrial applications. Others may be published on Wikipedia or other web media, or delivered as open source. The alternatives are many and they are relevant for the new generation of molecular biologists. There should be a lecture where the students learn what can be patented, how to patent an invention, and about the different licenses that protect or disclose intellectual property.

Preparing our people for the molecular biology to come

Analysis of Complex Biological Systems

As molecular biology and genetics experiments continue to produce big volumes of results, the scientific value shifts from the data to the ideas. If a researcher wants to show that apples are good for health, it is no longer enough to show statistical data that correlates apple consumption with health indices. They need to propose a model of the way that apples improve health, design an experiment to test this hypothesis against alternative ones, and analyze that experiment.

One approach that has appeared in recent years for doing this in molecular biology is "Systems Biology", the study of the interactions between the components of biological systems, and of how these interactions give rise to the function and behavior of that system. This approach combines biological theory with mathematical and computational models.

Our students need to know the basic concepts of statistical analysis and mathematical modeling used in Systems Biology, and the computational concepts needed to carry out these models by themselves or in collaboration.
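
To make the idea of a mathematical model concrete, here is a minimal sketch of one of the simplest models a student might meet: an mRNA that is synthesized at a constant rate and degraded in proportion to its amount, dm/dt = k_syn - k_deg * m. The model choice, the parameter values and the simple Euler integration scheme are assumptions chosen for illustration; a real course would likely use a dedicated numerical library.

```python
def simulate_mrna(k_syn, k_deg, m0=0.0, dt=0.01, t_end=50.0):
    """Integrate dm/dt = k_syn - k_deg * m with the explicit Euler method.

    Returns the list of mRNA amounts at each time step, starting at m0.
    """
    steps = int(round(t_end / dt))
    m = m0
    trajectory = [m]
    for _ in range(steps):
        m += dt * (k_syn - k_deg * m)
        trajectory.append(m)
    return trajectory


# Illustrative parameters (arbitrary units): the model predicts that the
# amount settles at the steady state k_syn / k_deg.
traj = simulate_mrna(k_syn=2.0, k_deg=0.5)
print(round(traj[-1], 3), "expected steady state:", 2.0 / 0.5)
```

Even this small model shows the workflow: write down a hypothesis as an equation, simulate it, and compare the predicted steady state with what an experiment measures.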

Advanced Computing Concepts

Grad students who follow a specialization in Systems Biology or Computational Biology will be interacting with scientists from other areas. They need the language and concepts of parallel and distributed computing, clusters, cloud, massively parallel computation, artificial intelligence, neural networks, genetic algorithms and other topics of high-performance computing and data mining. We should provide the environment where they can grasp these concepts, whether an "Advanced Computing" lecture, a workshop or a laboratory.

Systems Biology and Advanced Computational Tools

Source: http://www.systemsbiology.org/about-systems-biology

To consolidate the lessons given in the previously discussed courses, we need a physical space where the students can put them into practice. This would be a laboratory like the others already existing in our department, only with different kinds of tools.

In a first stage it would be ideal to have:

- Books and bookshelves
- Physical space, network connection and electrical power for about 6 people, their computers and other devices
- Wifi access point
- Color laser printer
- Desktop computers (Linux and Apple)
- Linux server
- Whiteboard
- Data projector or flat TV
- Table
- Chairs

We will not teach math "just in case" it is useful. Students will learn "just in time", when they need it. The idea is that teaching be driven by the demand for knowledge, not by the supply of it.

The theory of constructivism suggests that learners construct knowledge out of their experiences. Learners with different skills and backgrounds should collaborate in tasks and discussions to arrive at a shared understanding of the truth in a specific field (Duffy and Jonassen 1992).

Creating a "Laboratory of Bioinformatics and Systems Biology"

New tools for new science

Main advances in Molecular Biology and Genetics (and in other sciences) are often the result of using new instruments, which are usually named after their inventor:

- Galileo created modern science when he made his own telescope
- Newton also invented a new kind of telescope, still used today
- Bunsen enabled spectrometry analysis with his burner
- Svedberg ultracentrifuge
- Sanger DNA sequencing method
- Southern blot method for specific DNA detection
- PCR to amplify DNA samples

Notice that several of these inventors received Nobel Prizes for their contributions.

Usually we think that building molecular biology tools is beyond our capabilities. Nevertheless, in the last 5 years this has changed dramatically.

Today low-cost 3D printers, laser cutters and computer-controlled manufacturing tools are used by amateur scientists and technology enthusiasts around the world to make all kinds of advanced and complex devices (Anderson 2012).

Here in Istanbul we find several low-cost laser-cutter companies, 3D printers are sold in shopping malls (as seen in the picture) and there are at least two "makerspaces", social clubs where anyone can teach, learn and use these technologies.

Using the same instruments on the same organisms as others is like being a cover band. We can play well but we will not make a real impact.

Creating innovative experimental tools

Picture of a 3D printer taken at Tepe Nautilus shopping mall, Kadiköy

Making "Software" for PCR machines

We have seen, in the computer and smartphone industries, that once the hardware becomes widely available, the software industry flourishes. As we discussed earlier, PCR machines are now ready to follow the same path.

If PCR machines are available everywhere, applications can be:

- Determining ancestry (e.g. race horses, farm animals, fish)
- Detection of unwanted organisms
- Marker-assisted breeding
- Targeted expression analysis
- Food quality control in restaurants (for example in a university canteen)
- Security and control of Genetically Modified Organisms
- Polymorphism detection
- Clinical diagnosis
- Personalized medicine
- Police forensic analysis
- Choosing strains that maximize yield in the biotechnology industry

Making the new “app” makers

Software for PCR means the specific parameters of an application:

- DNA extraction protocols
- Primer design
- Amplification protocols
- Detection methods
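
The sketch below suggests how such an "app" could be expressed as data: a pair of primers plus an amplification protocol derived from them. It estimates each primer's melting temperature with the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in °C, which is only a rough guide for short primers, and sets the annealing step 5 °C below the lower Tm, a common heuristic. The primer sequences, step temperatures, durations, cycle count and all names are hypothetical values for illustration, not a validated protocol and not the format of any real thermocycler.

```python
from dataclasses import dataclass


def wallace_tm(primer):
    """Rough melting temperature (°C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


def amplification_protocol(forward, reverse, cycles=30):
    """Build a three-step cycle; annealing is 5 °C below the lower primer Tm."""
    anneal = min(wallace_tm(forward), wallace_tm(reverse)) - 5
    steps = [
        Step("denature", 95, 30),   # illustrative values only
        Step("anneal", anneal, 30),
        Step("extend", 72, 60),
    ]
    return {"cycles": cycles, "steps": steps}


# Made-up primers, 18 bases each.
protocol = amplification_protocol("ATGCGTACGTTAGCCTAG", "GCTAGGCTAACGTACGCA")
for step in protocol["steps"]:
    print(step.name, step.temp_c, step.seconds)
```

Keeping the protocol as plain data makes it easy to share, review and modify, in the same way that software is shared.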

We should prepare our students to make these "apps". They should have easy access to low-cost thermocyclers, and use them frequently and creatively.

Then, as in the computer industry, they may create completely new applications that we cannot foresee now.

We can make our own low-cost PCR machines based on the OpenPCR model. The design is free and the materials are inexpensive.

Learning by Making

Amateur scientists are already building low-cost PCR machines. For example, OpenPCR is a kit that costs under USD $600. Its design files can be downloaded from the web and the machine can be built at home.

If PCR machines are open source now, which tool will be next?

We can create an advanced course on "Scientific Instrumentation", initially using software tools. In some subjects we can partner with the Electrical Engineering Department.

It is easy to enter this area. There is no need for a huge initial investment. Making instruments is now "software", not craftsmanship.

When needed we can first use the services of the existing "makerspaces". If our prototypes are successful, then we can have our own 3D printer or laser cutter.

Online collaboration enables fast changes to designs.

Computer-aided manufacturing tools transform these designs into real objects. We can understand this with a biological analogy.

Designs in digital files are like genes. 3D printers are like ribosomes, producing physical versions of the design. Online collaboration is like evolution: designs are changed to improve their fitness.

A "Scientific Instrumentation" course should also consider teaching the use of online collaboration tools.

The best tools result from iterative improvement of pre-existing designs. Just like software. And science.

Initially the theoretical part of a "Scientific Instrumentation Laboratory" can be developed in the same space as the "Systems Biology Laboratory". The basic principles can be learned in periodic workshops, where each participant can teach and learn. Prototype designs can be made on the computer. The physical versions of these prototype tools can be fabricated in collaboration with the existing "makerspaces" or by outsourcing to laser-cutting and 3D printing services.

In the second year we will evaluate the economic feasibility of having our own 3D printer and other maker tools.

Funding

Most of the ideas presented in this executive summary are zero-cost or low-cost. The only major investment would be this laboratory. The details of the costs are described in an appendix. In summary, besides the physical space, the cost is between USD $10,000 and $20,000. Specific funding sources (national or European) are open to discussion.

Creating a "Scientific Instrumentation Laboratory"

References

Anderson, C. 2012. Makers: The New Industrial Revolution. McClelland & Stewart.

Cyranoski, David, Natasha Gilbert, Heidi Ledford, Anjali Nayar, and Mohammed Yahia. 2011. “Education: The PhD Factory.” Nature 472: 276–79.

Duffy, T.M., and D.H. Jonassen. 1992. Constructivism and the Technology of Instruction: A Conversation. Lawrence Erlbaum Associates Publishers.

Fleischmann, RD, MD Adams, O White, RA Clayton, EF Kirkness, AR Kerlavage, CJ Bult, et al. 1995. “Whole-Genome Random Sequencing and Assembly of Haemophilus Influenzae Rd.” Science 269 (5223): 496–512. doi:10.1126/science.7542800.

Friedman, T.L. 2007. The World Is Flat: A Brief History of the Twenty-First Century. Farrar, Straus and Giroux.

Hays, Thomas. 2011. “PhDs: Israel Also Trains Plenty.” Nature 473 (7347): 284.

Piaget, J. 1976. The Child and Reality: Problems of Genetic Psychology. Penguin Books.

Schlender, Brent, and Henry Goldblatt. 1995. “Bill Gates and Paul Allen Talk.” Fortune, October 2, 68–86.

Stross, R.E. 1997. The Microsoft Way: The Real Story of How the Company Outsmarts Its Competition. Addison-Wesley Publishing Company.