Recitation 12 Programming for Engineers in Python.


Plan
Dynamic Programming
Coin Change problem
Longest Common Subsequence
Application to Bioinformatics

Teaching Survey
Please answer the teaching survey: https://www.ims.tau.ac.il/Tal/
This will help us improve the course.
Deadline: 4.2.12

Coin Change Problem
What is the smallest number of coins I can use to make exact change?
Greedy solution: repeatedly pick the largest coin that still fits, until you reach the change needed.
With US currency this works well: to give change for 30 cents with 1, 5, 10 and 25 cent coins, take 25 + 5 → 2 coins.
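A minimal Python 3 sketch of the greedy rule (the slides use Python 2). `greedy_change` is a hypothetical name, not the course's code, and the sketch assumes a 1-cent coin is available so the remaining amount always reaches exactly 0.

```python
def greedy_change(n, coin_values):
    """Greedy change-making: repeatedly take the largest coin that still fits."""
    coins = []
    for c in sorted(coin_values, reverse=True):  # largest coin first
        while n >= c:
            n -= c
            coins.append(c)
    return coins  # assumes a 1-cent coin exists, so n always ends at 0


print(greedy_change(30, (1, 5, 10, 25)))  # [25, 5] -> 2 coins
```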

http://jeremykun.files.wordpress.com/2012/01/coins.jpg

The Sin of Greediness
What if you don't have 5 cent coins?
You have only 1, 10 and 25 cent coins.
Greedy solution: 25 + 1 + 1 + 1 + 1 + 1 → 6 coins
But a better solution is: 10 + 10 + 10 → 3 coins!
So the greedy approach isn't optimal.
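To confirm that 3 coins really is the best possible for 30 cents with 1, 10 and 25 cent coins, a brute-force search is enough at this size. This is a sketch with a hypothetical function name: it tries every multiset of coins in order of increasing size, so the first one that sums to n uses the fewest coins. It is exponential and only meant for checking tiny cases.

```python
from itertools import combinations_with_replacement


def min_coins_brute(n, coin_values):
    """Exhaustive search: try all multisets of k coins for k = 0, 1, 2, ...
    and return the first one summing to n (i.e. one with the fewest coins)."""
    for k in range(n + 1):
        for combo in combinations_with_replacement(sorted(coin_values), k):
            if sum(combo) == n:
                return combo
    return None  # unreachable when a 1-cent coin is available


print(min_coins_brute(30, (1, 10, 25)))  # (10, 10, 10)
```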

The Seven Deadly Sins and the Four Last Things by Hieronymus Bosch http://en.wikipedia.org/wiki/File:Boschsevendeadlysins.jpg

Recursive Solution
Reminder: find the minimal number of coins needed to give exact change with coins of specified values.
Assume that we can use 1 cent coins, so there is always some solution.
Denote our coin list by c1, c2, ..., ck (c1 = 1), where k is the number of coin values we can use.
Denote the change required by n.
In the previous example: n = 30, k = 3, c1 = 1, c2 = 10, c3 = 25.

Recursive Solution
Recursion Base:
If n = 0, we need 0 coins.
If k = 1, then c1 = 1, so we need n coins.

Recursion Step: If n …
    >>> print 'result', coins_change_rec(30, (1,5,10,25))
    result 2
    >>> print 'max calls', max(calls.values())
    max calls 4
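The recursion step and the original `coins_change_rec` did not survive in this transcript, so the following is only a sketch under an assumption: that the step is the same choice written later as `min_coins[i,j] = min(min_coins[i,j-ci] + 1, min_coins[i-1,j])`, i.e. either use one more coin of the largest value ck, or stop using ck altogether (forced when n < ck). `min_coins_rec` is a hypothetical name; coin values are assumed sorted ascending with c1 = 1.

```python
def min_coins_rec(n, coin_values, k=None):
    """Fewest coins from coin_values[:k] summing to n (coin_values[0] must be 1)."""
    if k is None:
        k = len(coin_values)
    if n == 0:
        return 0
    if k == 1:
        return n                      # only 1-cent coins are left
    ck = coin_values[k - 1]           # largest coin still allowed
    if n < ck:                        # ck is too big: drop it
        return min_coins_rec(n, coin_values, k - 1)
    # either use one more ck coin, or never use ck again
    return min(min_coins_rec(n - ck, coin_values, k) + 1,
               min_coins_rec(n, coin_values, k - 1))


print(min_coins_rec(30, (1, 10, 25)), min_coins_rec(30, (1, 5, 10, 25)))  # 3 2
```

The same (n, k) pairs are recomputed along different branches, which is exactly what the memoization below avoids.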

Dynamic Programming - Memoization
We want to store the values of calculations so we don't repeat them.
We create a table called mem:
Number of columns: number of cents needed + 1
Number of rows: number of coin values + 1
The table is initialized with some illegal value, for example -1:

    # one row per number of coin values 0..len(coin_values), so mem[len(coin_values)] is a valid index
    mem = [[-1 for y in range(cents_needed + 1)] for x in range(len(coin_values) + 1)]

Dynamic Programming - Memoization
For each call of the recursive function, we check whether mem already has the answer:
    if mem[len(coin_values)][cents_needed] == -1:
If it doesn't (the condition above is True), we calculate the value as before and store the result, for example:
    if cents_needed …
… ci)
We can decide not to use ci, and therefore use only c0, ..., ci-1, which gives min_coins[i-1, j].
So which way do we choose? The one with the fewest coins!
    min_coins[i, j] = min(min_coins[i, j-ci] + 1, min_coins[i-1, j])
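A sketch of both dynamic programming variants for the coin problem, assuming coin values sorted ascending with the first value equal to 1. `min_coins_memo` follows the memoization description: a `mem` table with (number of coin values + 1) rows and (cents needed + 1) columns, initialized to -1 and checked before recomputing. `min_coins_matrix` fills the `min_coins` matrix bottom-up with the recursion step min(min_coins[i][j-ci] + 1, min_coins[i-1][j]), using c0, ..., ck-1 indexing. Both names are hypothetical; the course's coins_matrix.py may differ in detail.

```python
def min_coins_memo(cents_needed, coin_values, mem=None):
    """Top-down (memoized) change-making; coin_values ascending, coin_values[0] == 1."""
    if mem is None:
        # rows: number of coin values + 1, columns: cents needed + 1; -1 = "not computed yet"
        mem = [[-1] * (cents_needed + 1) for _ in range(len(coin_values) + 1)]
    k = len(coin_values)
    if mem[k][cents_needed] == -1:                 # answer not stored yet
        if cents_needed == 0:
            result = 0
        elif k == 1:
            result = cents_needed                  # only 1-cent coins left
        elif cents_needed < coin_values[-1]:
            result = min_coins_memo(cents_needed, coin_values[:-1], mem)
        else:
            result = min(min_coins_memo(cents_needed - coin_values[-1], coin_values, mem) + 1,
                         min_coins_memo(cents_needed, coin_values[:-1], mem))
        mem[k][cents_needed] = result
    return mem[k][cents_needed]


def min_coins_matrix(n, coin_values):
    """Bottom-up: min_coins[i][j] = fewest coins from coin_values[0..i] that make j cents."""
    k = len(coin_values)
    min_coins = [[0] * (n + 1) for _ in range(k)]
    for j in range(n + 1):
        min_coins[0][j] = j                        # coin_values[0] == 1: j one-cent coins
    for i in range(1, k):
        ci = coin_values[i]
        for j in range(n + 1):
            if j >= ci:   # either use one more ci coin, or never use ci
                min_coins[i][j] = min(min_coins[i][j - ci] + 1, min_coins[i - 1][j])
            else:         # ci is too big, so only c0..c(i-1) can be used
                min_coins[i][j] = min_coins[i - 1][j]
    return min_coins[k - 1][n]


print(min_coins_memo(30, (1, 10, 25)), min_coins_matrix(30, (1, 10, 25)))  # 3 3
```

The bottom-up version uses no recursion at all; its cost is simply the k × (n + 1) table.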

Example: matrix recursion step
coins_matrix.py
The code for the matrix solution and the idea are from http://jeremykun.wordpress.com/2012/01/12/a-spoonful-of-python/

Longest Common Subsequence
Given two sequences (strings/lists), we want to find the longest common subsequence.
Definition (subsequence): B is a subsequence of A if B can be derived from A by removing zero or more elements from A, without changing the order of the remaining elements.
Examples:
[2,4,6] is a subsequence of [1,2,3,4,5,6]
[6,4,2] is NOT a subsequence of [1,2,3,4,5,6]
"is" is a subsequence of "distance"
"nice" is NOT a subsequence of "distance"

Longest Common Subsequence
Given two sequences (strings or lists), we want to find the longest common subsequence.
Example of an LCS:
Sequence 1: HUMAN
Sequence 2: CHIMPANZEE
LCS: HMAN
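The definition of a subsequence can be checked directly. A small Python 3 sketch with a hypothetical helper name: it scans A once and advances through B each time the next element of B is found, so order matters, which is why [6,4,2] fails.

```python
def is_subsequence(b, a):
    """True if b can be obtained from a by deleting zero or more elements."""
    i = 0  # number of elements of b matched so far
    for x in a:
        if i < len(b) and b[i] == x:
            i += 1
    return i == len(b)


print(is_subsequence('is', 'distance'), is_subsequence('nice', 'distance'))  # True False
```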

Applications include:
Bioinformatics (next up)
Version control

http://wordaligned.org/articles/longest-common-subsequence

The DNA
Our biological blueprint: a sequence made of four bases, A, G, C and T.
Double strand:
A connects to T
G connects to C
Every triplet encodes an amino acid. Example: GAG → Glutamate
A chain of amino acids is a protein: the biological machine!
http://sips.inesc-id.pt/~nfvr/msc_theses/msc09b/

Longest Common Subsequence
The DNA changes:
Mutation: A → G, C → T, etc.
Insertion: AGC → ATGC
Deletion: AGC → AC
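These DNA notions map naturally onto Python strings. A sketch with hypothetical helper names (`complement`, `codons`, `mutate`, `insert`, `delete`); `complement` pairs bases one by one and does not reverse the strand, whereas the two real strands run in opposite directions, so the biologically usual operation is the reverse complement.

```python
# Base pairing on the double strand: A with T, G with C.
PAIR = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}


def complement(strand):
    """Base-by-base partner strand (not reversed)."""
    return ''.join(PAIR[b] for b in strand)


def codons(strand):
    """Split a strand into consecutive triplets, dropping an incomplete tail."""
    return [strand[i:i + 3] for i in range(0, len(strand) - len(strand) % 3, 3)]


def mutate(strand, pos, base):
    """Point mutation: replace the base at pos."""
    return strand[:pos] + base + strand[pos + 1:]


def insert(strand, pos, base):
    """Insertion: a new base appears before position pos."""
    return strand[:pos] + base + strand[pos:]


def delete(strand, pos):
    """Deletion: the base at pos disappears."""
    return strand[:pos] + strand[pos + 1:]


print(insert('AGC', 1, 'T'), delete('AGC', 1), codons('GAGGAG'))  # ATGC AC ['GAG', 'GAG']
```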

Given two non-identical sequences, we want to find the parts they have in common, so we can say how different they are.
Which DNA is more similar to ours? The cat's or the dog's?

http://palscience.com/wp-content/uploads/2010/09/DNA_with_mutation.jpg

Recursion
An LCS of two sequences can be built from the LCSes of prefixes of these sequences.
Denote the sequences seq1 and seq2.
Base: check whether either sequence is empty:
    if len(seq1) == 0 or len(seq2) == 0: return [ ]
Step: build the solution from shorter sequences:
    if seq1[-1] == seq2[-1]:
        return lcs(seq1[:-1], seq2[:-1]) + [ seq1[-1] ]
    else:
        return max(lcs(seq1[:-1], seq2), lcs(seq1, seq2[:-1]), key=len)
lcs_rec.py

Wasteful Recursion
For the inputs MAN and PIG, the calls are (number of calls, arguments):
    (1, ('', 'PIG'))
    (1, ('M', 'PIG'))
    (1, ('MA', 'PIG'))
    (1, ('MAN', ''))
    (1, ('MAN', 'P'))
    (1, ('MAN', 'PI'))
    (1, ('MAN', 'PIG'))
    (2, ('MA', 'PI'))
    (3, ('', 'PI'))
    (3, ('M', 'PI'))
    (3, ('MA', ''))
    (3, ('MA', 'P'))
    (6, ('', 'P'))
    (6, ('M', ''))
    (6, ('M', 'P'))
24 redundant calls!
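The slide's recursion made runnable in Python 3, with a `collections.Counter` recording how often each argument pair is seen. The counter (named `calls` here) and the function name `lcs_rec` are illustrative additions. For MAN and PIG it reports 39 calls on 15 distinct pairs, i.e. the 24 redundant calls listed above.

```python
from collections import Counter

calls = Counter()  # (seq1, seq2) -> number of times lcs_rec was called with that pair


def lcs_rec(seq1, seq2):
    """Plain recursive LCS following the slide; returns the LCS as a list."""
    calls[(seq1, seq2)] += 1
    if len(seq1) == 0 or len(seq2) == 0:
        return []
    if seq1[-1] == seq2[-1]:
        return lcs_rec(seq1[:-1], seq2[:-1]) + [seq1[-1]]
    return max(lcs_rec(seq1[:-1], seq2), lcs_rec(seq1, seq2[:-1]), key=len)


lcs_rec('MAN', 'PIG')
total = sum(calls.values())
print(total, len(calls), total - len(calls))  # 39 15 24
```

The counter needs hashable arguments, so pass strings or tuples rather than lists.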

http://wordaligned.org/articles/longest-common-subsequence

Wasteful Recursion
When comparing longer sequences over a small alphabet, the problem is worse.
For example, DNA sequences are composed of A, G, T and C, and are long.
For lcs('ACCGGTCGAGTGCGCGGAAGCCGGCCGAA', 'GTCGTTCGGAATGCCGTTGCTCTGTAAA') we get an absurd number of calls:
    (('', 'GT'), 13,182,769)
    (('A', 'GT'), 13,182,769)
    (('A', 'G'), 24,853,152)
    (('', 'G'), 24,853,152)
    (('A', ''), 24,853,152)

http://blog.oncofertility.northwestern.edu/wp-content/uploads/2010/07/DNA-sequence.jpg

DP Saves the Day
We saw overlapping subproblems emerge: the same sequences were compared over and over again.
We saw how the solution can be found from solutions of subproblems, a property we called optimal substructure.
Therefore we will apply a dynamic programming approach.
We start with the top-down approach: memoization.

Memoization
We save the results of function calls so we don't calculate them again:
    def lcs_mem(seq1, seq2, mem=None):
        # "is None", not "not mem": an empty dict is falsy, so "not mem" would
        # replace the shared table in recursive calls and nothing would be reused
        if mem is None:
            mem = {}
        key = (len(seq1), len(seq2))  # tuples are immutable, so they can be dict keys
        if key not in mem:  # result not saved yet
            if len(seq1) == 0 or len(seq2) == 0:
                mem[key] = []
            elif seq1[-1] == seq2[-1]:
                mem[key] = lcs_mem(seq1[:-1], seq2[:-1], mem) + [seq1[-1]]
            else:
                mem[key] = max(lcs_mem(seq1[:-1], seq2, mem),
                               lcs_mem(seq1, seq2[:-1], mem), key=len)
        return mem[key]

maximum recursion depth exceeded
We want to use our memoized LCS algorithm on two long DNA sequences:
    >>> from random import choice
    >>> def base(): return choice('AGCT')
    >>> seq1 = ''.join(base() for x in range(10000))
    >>> seq2 = ''.join(base() for x in range(10000))
    >>> print lcs_mem(seq1, seq2)
    RuntimeError: maximum recursion depth exceeded in cmp
We need a different algorithm.
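One common way around the recursion limit, swapped in here rather than taken from the slides (which move on to Needleman-Wunsch), is to fill the same table bottom-up with plain loops. `lcs_table` is a hypothetical name; `L[i][j]` holds the LCS length of `seq1[:i]` and `seq2[:j]`, and a walk back from the bottom-right corner recovers one LCS.

```python
from random import choice


def lcs_table(seq1, seq2):
    """Bottom-up LCS with no recursion; returns one LCS as a list."""
    n, m = len(seq1), len(seq2)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if seq1[i - 1] == seq2[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # walk back from L[n][m] to recover the elements of one LCS
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if seq1[i - 1] == seq2[j - 1]:
            result.append(seq1[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


seq1 = ''.join(choice('AGCT') for _ in range(1000))
seq2 = ''.join(choice('AGCT') for _ in range(1000))
print(len(lcs_table(seq1, seq2)))  # runs without hitting the recursion limit
```

This removes the depth problem but not the size problem: for two 10,000-base sequences the table has about 10^8 cells, which is heavy for Python lists. Keeping only two rows gives the LCS length in O(m) memory, at the cost of the simple traceback.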

DNA Sequence Alignment
Needleman-Wunsch DP algorithm:
Python package: http://pypi.python.org/pypi/nwalign
On-line example: http://alggen.lsi.upc.es/docencia/ember/frame-ember.html
Code: needleman_wunsch_algorithm.py
Lecture videos from TAU: http://video.tau.ac.il/index.php?option=com_videos&view=video&id=4168&Itemid=53
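A compact sketch of Needleman-Wunsch global alignment, written from the standard formulation rather than from needleman_wunsch_algorithm.py or the nwalign package (whose API is not shown here). The scores (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; real DNA work uses tuned substitution scores and gap penalties. Structurally it is the LCS table with gaps allowed and penalized.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b) with '-' for gaps."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap                      # a[:i] aligned against nothing
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # traceback: follow any predecessor consistent with the recurrence
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])                   # gap in b
            bottom.append('-')
            i -= 1
        else:
            top.append('-')                        # gap in a
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], ''.join(reversed(top)), ''.join(reversed(bottom))


print(needleman_wunsch('AGC', 'AC'))  # (1, 'AGC', 'A-C')
```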