CSE 111 Bio: Program Design I Lecture 3: Python Basics
CSE 111 Bio: Program Design I Lecture 3: Python Basics & More Bio
Robert Sloan (CS) & Rachel Poretsky (Bio)
University of Illinois, Chicago
August 31, 2017
DNA sequencing
• Getting from here:
• To here:
Why sequence DNA?
Figure 10.11
• This is a physical map of the human X chromosome. (credit: modification of work by NCBI, NIH)
Figure 10.12
• Much basic research is done with model organisms, such as the mouse, Mus musculus; the fruit fly, Drosophila melanogaster; the nematode Caenorhabditis elegans; the yeast Saccharomyces cerevisiae; and the common weed, Arabidopsis thaliana. (credit “mouse”: modification of work by Florean Fortescue; credit “nematodes”: modification of work by “snickclunk”/Flickr; credit “common weed”: modification of work by Peggy Greb, USDA; scale-bar data from Matt Russell)
The bonds between each pair of DNA bases are
A. Strong (covalent) bonds that keep the two strands permanently attached
B. Weak (hydrogen) bonds that allow the two strands to be separated
C. Ionic bonds that allow a transfer of current along the molecule
D. I don’t know
https://www.dnalc.org/view/15479-Sanger-method-of-DNA-sequencing-3D-animation-with-narration.html
• In Frederick Sanger's dideoxy chain termination method, dye-labeled dideoxynucleotides are used to generate DNA fragments that terminate at different points. The DNA is separated by capillary electrophoresis on the basis of size, and from the order of fragments formed, the DNA sequence can be read. The DNA sequence readout is shown on an electropherogram that is generated by a laser scanner.
DNA Sequencing: Sanger Method
Metagenomics involves isolating and sequencing DNA from multiple species at the same time.
• Bioinformatics
– Science that applies computational tools to DNA and protein sequences
– For the purpose of analyzing, storing, and accessing the sequences for comparative purposes
Next-generation sequencing
• Generates data 100x faster than Sanger method
• Massively parallel methods
• Large number of samples sequenced side by side
• Uses increased computer power and miniaturization
[Figure: DNA polymerase extends a growing strand along a template strand as an incoming deoxyribonucleotide triphosphate is added at the growing point, releasing PPi. In one panel, a light flash is detected by a sensor. In the other (Ion Torrent semiconductor sequencing), proton (H+) release changes pH, generating an electrical signal.]
• Semiconductor sequencing was applied to sequencing the genome of Gordon Moore, author of Moore's law, at a mean coverage of 10.6-fold. This required ~1,000 ion chips to provide 1 billion sensors.
~MINUTES NOW…
DNA Sequencing: Shotgun Assembly
© 2015 Pearson Education, Inc.
[Figure: shotgun assembly. The unknown DNA sequence TAGGTTACCACTCGAA is cut into overlapping sequenced fragments (for example TAGGTT, GGTTACCA, CCACTCGAA, CTCGAA), which are lined up by their overlaps to recover TAGGTTACCACTCGAA.]
Cleave DNA into fragments and sequence.
Computer analysis finds overlaps.
Sequence is deduced.
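The overlap-finding step can be sketched in Python. This is only a toy illustration, not how real assemblers work: it assumes the fragments contain no errors and are already listed in left-to-right order. The helper names `overlap_length` and `merge_in_order` are made up for this example, and the fragments were chosen to be consistent with the sequence in the figure.

```python
def overlap_length(left, right):
    """Length of the longest suffix of left that is also a prefix of right."""
    longest = min(len(left), len(right))
    for size in range(longest, 0, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_in_order(fragments):
    """Merge fragments that are assumed to be listed left to right."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        size = overlap_length(assembled, fragment)
        # Keep only the part of the fragment that is not already covered.
        assembled = assembled + fragment[size:]
    return assembled


# Illustrative fragments (an assumption, not read exactly off the slide)
fragments = ["TAGGTT", "GGTTACCA", "TTACCACT", "CCACTCGAA"]
print(merge_in_order(fragments))  # TAGGTTACCACTCGAA
```

Real shotgun data arrives in no particular order, so an actual program would also have to work out which fragments go next to each other.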
Suppose the shotgun sequences are: CTGCAGCATAGCATGGCTGC
What is the original sequence?
A. AGCATGCTGC
B. CTGCAGCATAGCATGGCTGC
C. GCTGCAGCATG
D. AGCATGC
E. No clue
Announcement: Assignment
• Assignment 2 ("Lab 2") is now out, available for you on Blackboard.
• Due this Friday at 9 pm (out 1 day later than usual, 45 extra hours on deadline)
• First small step on the way to finding genes in DNA sequence
Announcement 2: Bring laptop Th
• Thursday we will ask you to do short in-class survey at start of class
• We are studying CS 111!
• Qualtrics survey that should be okay on phone/tablet, but probably easier on laptop
To search DNA for specific genome need at least
• variables
• functions
• strings
• Let’s start with variables and then as time allows very light look at functions and strings that may help with Lab 2
Variables: Simple example
In [1]: biologist1 = 'Rosalind Franklin'
In [2]: biologist1
Out[2]: 'Rosalind Franklin'
In [3]: biologist2 = "Watson"
In [4]: biologist2
Out[4]: 'Watson'
In [5]: print(biologist1)
Rosalind Franklin
Variables: Simple example (cont.)
In [6]: biologist1 + biologist2
Out[6]: 'Rosalind FranklinWatson'
In [7]: print("My favorite is", biologist1)
My favorite is Rosalind Franklin
print()
• Requires those parentheses!
• Prints out what you give it, and you can give it a sequence of things separated by commas
• If some of the things you give it are variables, will print value of the variable
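A short sketch of these points about print(). The variable `count` and its value are made up for this example.

```python
biologist1 = "Rosalind Franklin"
count = 3  # made-up value for illustration

print(biologist1)                    # prints the value, not the name
print("My favorite is", biologist1)  # commas separate the things to print
print("I have", count, "sequences")  # strings and numbers can be mixed
```

print() puts one space between the things separated by commas, so the last line shows up as `I have 3 sequences`.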
Code
print(3*5)
print("3*5")
print("3" * 5)
At the end of this code, what will appear on the terminal?
A. 3*5, 3*5, '33333'
B. 15, 15, '33333'
C. 15, 3*5, '33333'
D. 3*5, 15, '33333'
E. I don’t know
Variables
• We want to tell computer to use specific value we put into its memory
– (To print out a word, to add 2 numbers together, etc.)
• Much easier for us as humans to give these things names than to remember addresses
A box that holds a value
• Think of variable as box that holds a value (Pythonistas will say value or object more or less interchangeably), and variable's name as sticky note on the box
[Figure: two boxes holding "Rosalind Franklin" and "Watson", with sticky notes biologist1 and biologist2]
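The sticky-note picture can be tried out directly. This is a minimal sketch; the name `favorite` is made up for this example.

```python
biologist1 = "Rosalind Franklin"
biologist2 = "Watson"

favorite = biologist1          # a second sticky note on the same box
print(favorite is biologist1)  # True: both names are stuck on one object

biologist1 = "Crick"           # move the biologist1 note to a new box
print(favorite)                # Rosalind Franklin (the old box is unchanged)
print(biologist1)              # Crick
```

Moving one sticky note does not disturb the box it used to be on, or any other note still stuck to that box.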
>>> biologist1 = "Watson"
>>> biologist1
'Watson'
>>> biologist2 = 'Crick'
>>> biologist2
'Crick'
>>> result = biologist1 + biologist2
>>> print("The two biologists are", result)
The two biologists are WatsonCrick
Copyright (c) Henry Grant Archive/Museum of London
>>> biologist1 = biologist1 + biologist2
>>> biologist1

Name        Value
biologist1  Watson

Name        Value
biologist1  WatsonCrick
biologist2  Crick

Name        Value
biologist1  Watson
biologist2  Crick
biologist1  WatsonCrick

Name        Value
biologist1  Watson
biologist2  Crick

A  B  C  D
E. I don’t know
Types
• Objects come in few different types.
– E.g., strings vs. numbers
• In Python computer (i.e., interpreter/Spyder lower right window) generally figures it out for us, but we still need to know little bit about this since, e.g.,
In [1]: biologist1 + biologist2
Out[1]: 'Rosalind FranklinWatson'
In [2]: 3 + biologist1
Traceback (most recent call last):
  File "<ipython-input-7-900fb9c901d0>", line 1, in <module>
    3 + biologist1
TypeError: unsupported operand type(s) for +: 'int' and 'str'
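One way to get past this error, sketched below, is to convert the number to a string with the built-in str() before joining. The names `message` and `label` are made up for this example.

```python
biologist1 = "Rosalind Franklin"

try:
    result = 3 + biologist1  # int + str: Python does not know what + means here
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# str(3) is the string "3", so + now means "join two strings".
label = str(3) + biologist1
print(label)  # 3Rosalind Franklin
```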
Some Python Types
                        Python Type   Example(s)
String                  String        "CATTAG"
Integer (whole number)  Integer       3, 0, 17, 42, -21, 100001
Decimal number          Float         3.14159
Boolean (true/false)    Boolean       True, False
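The built-in type() function reports the type of any value. Python's own names for the four types in the table are str, int, float, and bool. The variable names below are made up for this example.

```python
dna = "CATTAG"
count = 42
ratio = 3.14159
found = True

print(type(dna))    # <class 'str'>
print(type(count))  # <class 'int'>
print(type(ratio))  # <class 'float'>
print(type(found))  # <class 'bool'>
```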
Types: Pick row that is 100% correct
iClicker Choice   Integer   Float   String
A                 1         2.25    ""
B                 "1"       4.4     'h'
C                 1.0       2.0     "hello"
D                 1         2.0     goodbye
We love you Python, oh yes we do!
• We have now covered well over half of everything you will need to know about types for this semester
• Types much bigger hassle in Java, C, C++
A Bit more on Strings (Lab 2!): indexing
0  1  2  3  4  5  6  7  8  9  10 11
A  A  T  G  C  C  G  T  G  C  T  T
In [1]: myDNA = "AATGCCGTGCTT"
In [2]: myDNA[0]
Out[2]: 'A'
In [3]: myDNA[3]
Out[3]: 'G'
In [4]: myDNA[20]
IndexError: string index out of range
In [5]: myDNA[0:3]
Out[5]: 'AAT'
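A few more indexing and slicing cases on the same string, as a sketch. The built-in len() and negative indexes are not on the slide but are standard Python.

```python
myDNA = "AATGCCGTGCTT"

print(len(myDNA))   # 12: valid indexes run from 0 to 11
print(myDNA[11])    # T: the last base
print(myDNA[-1])    # T: negative indexes count from the end
print(myDNA[0:3])   # AAT: starts at index 0, stops before index 3
print(myDNA[3:6])   # GCC: the next three bases
```

A slice `[start:stop]` includes the character at `start` and leaves out the one at `stop`, so `myDNA[0:3]` has exactly 3 characters.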
Assignment to variables: Semantics
<variable> = <expression>
1. Evaluate <expression>
2. Put that value into computer's memory and attach name <variable> as "sticky note" giving name for that memory location
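Because the expression on the right is evaluated first, the same name can appear on both sides of the `=`. A small sketch, with a made-up variable `count`:

```python
count = 5          # step 1: evaluate 5; step 2: attach the name count to it
count = count + 1  # step 1: evaluate count + 1 with the old value, giving 6
                   # step 2: move the name count onto the new value
print(count)       # 6
```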
Expressions
• Can be a simple value, like
– "Rosalind Franklin" or 17
• Can be almost any mathematical statement
x = 6 * 2
y = x - 10
Memory ("object space"): x → 12
Now evaluate x - 10, getting 12 - 10 → 2
Memory ("object space") after both assignments: x → 12, y → 2
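A variable stores the value its expression had when the assignment ran, not the formula. The sketch below uses made-up numbers to show that changing x afterwards leaves y alone.

```python
x = 4
y = x + 1   # evaluates to 5 right now; y is attached to 5
x = 10      # moves the name x to a new value; y is not recomputed

print(x)    # 10
print(y)    # 5
```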
At end of this code y will be
x = 5
y = x * 3
x = 2
A. 2  B. 5  C. 6  D. 15