CSE 111 Bio: Program Design I
Lecture 3: Python Basics & More Bio
Robert Sloan (CS) & Rachel Poretsky (Bio)
University of Illinois, Chicago
August 31, 2017
Theresa McCracken @McHumor.com


Page 2

DNA sequencing

• Getting from here:
• To here:

Page 3

Why sequence DNA?

Page 4

Figure 10.11

• This is a physical map of the human X chromosome. (credit: modification of work by NCBI, NIH)

Page 5

Figure 10.12

• Much basic research is done with model organisms, such as the mouse, Mus musculus; the fruit fly, Drosophila melanogaster; the nematode Caenorhabditis elegans; the yeast Saccharomyces cerevisiae; and the common weed, Arabidopsis thaliana. (credit "mouse": modification of work by Florean Fortescue; credit "nematodes": modification of work by "snickclunk"/Flickr; credit "common weed": modification of work by Peggy Greb, USDA; scale-bar data from Matt Russell)

Page 6

The bonds between each pair of DNA bases are

A. Strong (covalent) bonds that keep the two strands permanently attached
B. Weak (hydrogen) bonds that allow the two strands to be separated
C. Ionic bonds that allow a transfer of current along the molecule
D. I don't know

Page 7

https://www.dnalc.org/view/15479-Sanger-method-of-DNA-sequencing-3D-animation-with-narration.html

Page 8

DNA Sequencing: Sanger Method

• In Frederick Sanger's dideoxy chain termination method, dye-labeled dideoxynucleotides are used to generate DNA fragments that terminate at different points. The DNA is separated by capillary electrophoresis on the basis of size, and from the order of fragments formed, the DNA sequence can be read. The DNA sequence readout is shown on an electropherogram that is generated by a laser scanner.

Page 12

Metagenomics involves isolating and sequencing DNA from multiple species at the same time.

Page 14

• Bioinformatics
  – Science that applies computational tools to DNA and protein sequences
  – For the purpose of analyzing, storing, and accessing the sequences for comparative purposes

Page 15

Next-generation sequencing

• Generates data 100x faster than Sanger method
• Massively parallel methods
• Large number of samples sequenced side by side
• Uses increased computer power and miniaturization

Page 18

[Figure labels: DNA polymerase; template strand TAGGCCTACACTTACGCGAATGT; growing strand ATCCGGAT; 5′ and 3′ strand ends; dNTPs, dGTP, PPi, ATP, dNDPs, dNMPs + Pi. Caption: Light flash is detected by sensor.]

Page 19

Ion torrent semiconductor sequencing

[Figure labels: growing chain of DNA; growing point; incoming deoxyribonucleotide triphosphate; PPi. Caption: Proton (H+) release changes pH, generating an electrical signal.]

Page 20

• Semiconductor sequencing was applied to sequencing the genome of Gordon Moore, author of Moore's law, at a mean coverage of 10.6-fold. This required ~1,000 ion chips to provide 1 billion sensors.

Page 22

~MINUTES NOW…

Page 27

DNA Sequencing: Shotgun Assembly

Page 28

[Figure: shotgun sequencing of an unknown DNA sequence, TAGGTTACCACTCGAA. Overlapping sequenced fragments, including TAGGTT, GGTTACCA, CCACTCGAA, and CTCGAA, are aligned to recover the full sequence. © 2015 Pearson Education, Inc.]

• Cleave DNA into fragments and sequence.
• Computer analysis finds overlaps.
• Sequence is deduced.
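The "computer analysis finds overlaps" step can be sketched in a few lines of Python. This is only an illustration, not how real assemblers work: it assumes the fragments are already given in left-to-right order and contain no sequencing errors. The function names `overlap` and `merge_in_order` are made up for this example, and the code uses functions and loops that come later in the course.

```python
def overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_in_order(fragments):
    """Glue fragments together, dropping the part each one shares with the sequence so far."""
    sequence = fragments[0]
    for fragment in fragments[1:]:
        size = overlap(sequence, fragment)
        sequence = sequence + fragment[size:]
    return sequence


# Three of the fragments from the figure (assumed to be in left-to-right order).
fragments = ["TAGGTT", "GGTTACCA", "CCACTCGAA"]
assembled = merge_in_order(fragments)
print(assembled)
```

Here "TAGGTT" and "GGTTACCA" share "GGTT", and the result so far shares "CCA" with "CCACTCGAA", so the shared letters are written only once.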

Page 29

Suppose the shotgun sequences are: CTGCAGCATAGCATGGCTGC

What is the original sequence?
A. AGCATGCTGC
B. CTGCAGCATAGCATGGCTGC
C. GCTGCAGCATG
D. AGCATGC
E. No clue

Page 30

VARIABLES
FUNCTIONS
STRINGS

Randall Munroe, XKCD, xkcd.com/1513/

Page 31

Announcement: Assignment

• Assignment 2 ("Lab 2") is now out, available for you on Blackboard.
• Due this Friday at 9 pm (out 1 day later than usual, 45 extra hours on deadline)
• First small step on the way to finding genes in DNA sequence

Page 32

Announcement 2: Bring laptop Thursday

• Thursday we will ask you to do short in-class survey at start of class
• We are studying CS 111!
• Qualtrics survey that should be okay on phone/tablet, but probably easier on laptop

Page 33

To search DNA for specific genome, need at least

• variables
• functions
• strings
• Let's start with variables and then, as time allows, a very light look at functions and strings that may help with Lab 2

Page 34

Variables: Simple example

In [1]: biologist1 = 'Rosalind Franklin'
In [2]: biologist1
Out[2]: 'Rosalind Franklin'
In [3]: biologist2 = "Watson"
In [4]: biologist2
Out[4]: 'Watson'
In [5]: print(biologist1)
Rosalind Franklin

Page 35

Variables: Simple example (cont.)

In [1]: biologist1 = 'Rosalind Franklin'
In [2]: biologist1
Out[2]: 'Rosalind Franklin'
In [3]: biologist2 = "Watson"
In [4]: biologist2
Out[4]: 'Watson'
In [5]: print(biologist1)
Rosalind Franklin
In [6]: biologist1 + biologist2
Out[6]: 'Rosalind FranklinWatson'
In [7]: print("My favorite is ", biologist1)
My favorite is  Rosalind Franklin
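The same session can be written as a short script. The sketch below reuses the two variables from the slide and adds two names, `both` and `spaced`, that are not in the lecture. It shows that `+` joins strings exactly as given, so any space between the names has to be added by hand.

```python
biologist1 = 'Rosalind Franklin'
biologist2 = "Watson"   # single or double quotes both make a string

# + glues the two strings together with nothing in between.
both = biologist1 + biologist2
print(both)

# To get a separator, include it as its own string.
spaced = biologist1 + " and " + biologist2
print(spaced)
```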

Page 36

print()

• Requires those parentheses!
• Prints out what you give it, and you can give it a sequence of things separated by commas
• If some of the things you give it are variables, will print value of the variable
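A small sketch of these points, using made-up values (`count` and `message` are names chosen for this example). When print() gets several items separated by commas, it writes them on one line with a single space between each pair.

```python
biologist1 = "Rosalind Franklin"
count = 2

# Variables are replaced by their values; commas become single spaces.
print("My favorite is", biologist1)
print(biologist1, "is number", count)

# The same first line built by hand, to show what print() assembles.
message = "My favorite is" + " " + biologist1
print(message)
```

Because print() already adds the space, a string that ends in a space, such as "My favorite is ", produces two spaces in the output.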

Page 37

At the end of this code, what will appear on the terminal?

Code:
print(3*5)
print("3*5")
print("3" * 5)

A          B          C          D
3*5        15         15         3*5
3*5        15         3*5        15
'33333'    '33333'    '33333'    '33333'

E. I don't know

Page 38

Variables

• We want to tell computer to use specific value we put into its memory
  – (To print out a word, to add 2 numbers together, etc.)
• Much easier for us as humans to give these things names than to remember addresses

Page 39

A box that holds a value

• Think of variable as box that holds a value (Pythonistas will say value or object more or less interchangeably), and variable's name as sticky note on the box

[Diagram: a box holding "Rosalind Franklin" with sticky note biologist1; a box holding "Watson" with sticky note biologist2]

Page 40

>>> biologist1 = "Watson"
>>> biologist1
'Watson'
>>> biologist2 = 'Crick'
>>> biologist2
'Crick'
>>> result = biologist1 + biologist2
>>> print("The two biologists are ", result)
The two biologists are  WatsonCrick
>>> biologist1 = biologist1 + biologist2
>>> biologist1

Copyright (c) Henry Grant Archive/Museum of London

A.
Name          Value
biologist1    Watson

B.
Name          Value
biologist1    WatsonCrick
biologist2    Crick

C.
Name          Value
biologist1    Watson
biologist2    Crick
biologist1    WatsonCrick

D.
Name          Value
biologist1    Watson
biologist2    Crick

E. I don't know

Page 41

Types

• Objects come in few different types.
  – E.g., strings vs. numbers
• In Python computer (i.e., interpreter/Spyder lower right window) generally figures it out for us, but we still need to know little bit about this since, e.g.,

In [1]: biologist1 + biologist2
Out[1]: 'Rosalind FranklinWatson'
In [2]: 3 + biologist1
Traceback (most recent call last):
  File "<ipython-input-7-900fb9c901d0>", line 1, in <module>
    3 + biologist1
TypeError: unsupported operand type(s) for +: 'int' and 'str'
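One way around this error, sketched below, is to convert one value so both sides of `+` have the same type. The built-in functions `str()` and `int()` do the conversion. The names `label` and `total` are chosen for this example and do not appear in the lecture.

```python
biologist1 = 'Rosalind Franklin'

# 3 + biologist1 raises TypeError: an int and a str cannot be added.
# Converting the number to a string first makes + mean "join".
label = str(3) + biologist1
print(label)

# Going the other way, int() turns a string of digits into a number,
# so + means ordinary addition.
total = 3 + int("4")
print(total)
```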

Page 42

Some Python Types

                          Python Type    Example(s)
String                    String         "CATTAG"
Integer (whole number)    Integer        3, 0, 17, 42, -21, 100001
Decimal number            Float          3.14159
Boolean (true/false)      Boolean        True, False
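To check what type Python has picked for a value, the built-in `type()` function can be used. In the sketch below the variable names are invented for illustration. Python's own short names for the four types in the table are `str`, `int`, `float`, and `bool`.

```python
sequence = "CATTAG"
count = 42
ratio = 3.14159
found = True

# type() reports the type of each value.
print(type(sequence))
print(type(count))
print(type(ratio))
print(type(found))

# The short type names, collected in the same order as the table rows.
type_names = [type(sequence).__name__, type(count).__name__,
              type(ratio).__name__, type(found).__name__]
print(type_names)
```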

Page 43

Types: Pick row that is 100% correct

iClicker Choice    Integer    Float    String
A                  1          2.25     ""
B                  "1"        4.4      'h'
C                  1.0        2.0      "hello"
D                  1          2.0      goodbye

Page 44

We love you Python, oh yes we do!

• We have now covered well over half of everything you will need to know about types for this semester
• Types much bigger hassle in Java, C, C++

Page 45

A Bit more on Strings (Lab 2!): indexing

Index:  0   1   2   3   4   5   6   7   8   9   10  11
Base:   A   A   T   G   C   C   G   T   G   C   T   T

In [1]: myDNA = "AATGCCGTGCTT"
In [2]: myDNA[0]
Out[2]: 'A'
In [3]: myDNA[3]
Out[3]: 'G'
In [4]: myDNA[20]
IndexError: string index out of range
In [5]: myDNA[0:3]
Out[5]: 'AAT'
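A few more indexing and slicing operations on the same string, as a sketch that may be handy for Lab 2. All names other than `myDNA` are chosen for this example. Indexes start at 0, so the last valid index is one less than the length, and a slice `[start:stop]` includes `start` but stops just before `stop`.

```python
myDNA = "AATGCCGTGCTT"

length = len(myDNA)              # number of characters in the string
first_base = myDNA[0]            # index 0 is the first character
last_base = myDNA[length - 1]    # the last valid index is length - 1

first_three = myDNA[0:3]         # indexes 0, 1, 2
next_three = myDNA[3:6]          # indexes 3, 4, 5

print(length, first_base, last_base, first_three, next_three)
```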

Page 46

Assignment to variables: Semantics

<variable> = <expression>

1. Evaluate <expression>
2. Put that value into computer's memory and attach name <variable> as "sticky note" giving name for that memory location
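The two steps matter most when the same name appears on both sides of the `=`. In the sketch below (`dna` and `length` are names invented for this example), the right-hand side is evaluated first using the current value, and only then is the name moved to the new value.

```python
dna = "ATG"

# Step 1: evaluate dna + "CCC" using the current value "ATG", giving "ATGCCC".
# Step 2: attach the name dna to that new value.
dna = dna + "CCC"
print(dna)

# The expression can also be a function call; its result is what gets named.
length = len(dna)
print(length)
```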

Page 47

Expressions

• Can be a simple value, like
  – "Rosalind Franklin" or 17
• Can be almost any mathematical statement

Page 48

Expressions

• Can be a simple value, like
  – "Rosalind Franklin" or 17
• Also can be almost any mathematical statement

x = 6 * 2
y = x - 10

Memory ("object space"): x → 12

Page 49

Expressions

• Can be a simple value, like
  – "Rosalind Franklin" or 17
• Also can be almost any mathematical statement

x = 6 * 2
y = x - 10

Memory ("object space"): x → 12

Now evaluate x - 10, getting 12 - 10 → 2

Page 50

Expressions

• Can be a simple value, like
  – "Rosalind Franklin" or 17
• Also can be almost any mathematical statement

x = 6 * 2
y = x - 10

Memory ("object space"): x → 12, y → 2
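The two assignments can be run as is. The sketch below also adds two expressions that are not on the slide, stored in the invented names `average` and `remainder`, to show parentheses, division, and the remainder operator `%`.

```python
x = 6 * 2                  # 6 * 2 is evaluated first, then x names the result
y = x - 10                 # uses the current value of x

average = (x + y) / 2      # parentheses group; / always gives a float
remainder = x % 5          # % gives the remainder after dividing by 5

print(x, y, average, remainder)
```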

Page 51

At end of this code y will be

x = 5
y = x * 3
x = 2

A. 2    B. 5    C. 6    D. 15