Sequences - George Mason Universitymarks/112/slides/4.sequences.pdf · sequences •sequence: an...
Sequences: lists, tuples, and strings
CS 112 @ GMU
Topics
• sequences and operations
• lists and operations
• loops and lists
Sequences
sequences
• sequence: an ordered group of values
    → each spot in a sequence is numbered.
    → example: a string is a sequence of characters
Type     mutable? (modifiable)   representations
String   immutable               'enclosed' "in quotes" '''of various''' """kinds"""
List     mutable                 [commas, surrounded, by, brackets]    [ ] [5] [10,15]
Tuple    immutable               separated, by, commas (often, between, parentheses, too)    ( ) (4,) (8, 12)
list examples
    [ ]
    [3]
    [[3]]
    [5,6,7]
    ["hello", 1, True]
    [ [1,2,3], ["a","b","c"], 6]
    [ [[1,2,3],[4,5,6]], [[7,8,9],[10,11,12]] ]
    ["one", "two", "three"]
    [True, False, False]

notes:
• the empty list exists! [ ]
• [ 3 ] != 3
• [[3]] != [3]
tuple examples
    ( )
    ("one", "two", "three")
    (4,)
    (True, False, False)
    ( (1,2,3), ["a","b","c"], 6)
    ( [3], )
    "hello", 1, True
    5,

notes:
• the empty tuple exists! ( )
• one-tuples need the comma. (3) != (3,)
• (3) is an integer with parentheses around it.
• (3,) is a tuple of length one.
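The notes above can be checked interactively. A minimal sketch (the variable names are only for illustration):

```python
# Parentheses alone only group an expression; the comma makes the tuple.
not_a_tuple = (3)
one_tuple = (3,)
bare_tuple = 5,              # parentheses are optional

print(type(not_a_tuple))     # <class 'int'>
print(type(one_tuple))       # <class 'tuple'>
print(len(one_tuple))        # 1
print(bare_tuple == (5,))    # True
```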
indexing
indexing means accessing (by spot number) the value at a particular spot.
• indexing begins at zero, and goes up each spot.
• negative indexing begins at the end with -1

    msg = "index"

    value:             i    n    d    e    x
    index:             0    1    2    3    4
    negative index:   -5   -4   -3   -2   -1

    xs = [8.5, 100, -16.3, 2.5]

    value:            8.5   100   -16.3   2.5
    index:              0     1       2     3
    negative index:    -4    -3      -2    -1
Practice Problem
sequence lengths: what is the length of each of these tuples? → (what does len(expr) yield for each expr?)
    (1,2,3)
    (4,)
    ((5,6,7,8),)
    ((9,10), (11,(12,13)), (14))
    (True, "(3,2,1)", 6, [1,2,3])
    ("(1,2,3,4)",5)
Poll – 4A
• indexing into sequences
Slicing
• we can grab a sub-sequence instead of just one index's value.
• give [start : stop] or [start : stop : step] indexes, similar to range() (but with :'s!)

    >>> msg = "hello there"
    >>> msg[3:9]
    'lo the'
    >>> msg[1:11:3]
    'eohe'
    >>> msg[8:2:-1]
    'eht ol'
    >>> msg[0:10:-1]
    ''
    >>> msg[1:1000]
    'ello there'
    >>> msg[6:]
    'there'
    >>> msg[:5]
    'hello'
    >>> msg[:]
    'hello there'
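To make the comparison with range() concrete, here is a small sketch showing that a slice visits the same indexes range() would produce. The names by_slice and by_range are made up for this illustration, and "".join() is used only to glue the characters back together.

```python
msg = "hello there"

# A slice visits the same indexes that range(start, stop, step) produces.
start, stop, step = 1, 11, 3
by_slice = msg[start:stop:step]
by_range = "".join(msg[i] for i in range(start, stop, step))
print(by_slice, by_range)    # eohe eohe

# Out-of-range slice bounds are clipped rather than causing an error.
print(msg[1:1000])           # ello there

# Indexing a single out-of-range spot does raise an error.
try:
    msg[1000]
except IndexError as err:
    print("IndexError:", err)
```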
Poll – 4B
• slicing
Sequence Operations
operation            meaning                                  result type
x in s               checks if an item in s equals x.         bool
x not in s           checks if no items in s equal x.         bool
s + t                concatenation                            same seq. type
s*n (or: n*s)        n shallow copies of s, concatenated      same seq. type
len(s)               length of s                              int
s.count(x)           find # items in s equal to x             int (#matches)
s.index(x[,i[,j]])   give index of first x in s.              int
                     (if not found, crashes with a ValueError)

(these are all expressions)
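A quick sketch of each operation from the table, using made-up lists s and t. The last part shows one way (an assumption, not the only option) to avoid the crash from index() by checking with in first.

```python
s = [3, 1, 4, 1, 5]
t = [9, 2]

print(4 in s)           # True
print(7 not in s)       # True
print(s + t)            # [3, 1, 4, 1, 5, 9, 2]
print(t * 2)            # [9, 2, 9, 2]
print(len(s))           # 5
print(s.count(1))       # 2
print(s.index(1))       # 1
print(s.index(1, 2))    # 3  (start searching at index 2)

# index() raises ValueError when the item is missing, so guard it.
if 7 in s:
    loc = s.index(7)
else:
    loc = None
print(loc)              # None
```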
Poll – 4C
• sequence operations
lists: mutable sequences
• When a sequence is mutable (as lists are), we can update part of the structure, leaving the rest alone:
• There are many operations available on mutable sequences (see next slides).
    xs = [1,2,3]
    xs[1] = 99
    print(xs) # prints out [1, 99, 3]
list operations
operation        meaning
s[i] = x         replace ith item of s with x
s[i:j] = t       replace slice i:j with t. (lengths needn't match!)
s[i:j:k] = t     replace slice i:j:k with t. (lengths must match!)
del s[i]         remove ith item from s.
del s[i:j]       remove slice i:j from s.
del s[i:j:k]     remove slice i:j:k from s.
try interactively.
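One possible interactive session, written as a script. The list s and its values are invented for the example; the comments show the list after each step.

```python
s = [0, 1, 2, 3, 4, 5]

s[1:3] = ["a", "b", "c"]     # 2 items replaced by 3: lengths needn't match
print(s)                     # [0, 'a', 'b', 'c', 3, 4, 5]

s[::2] = [10, 20, 30, 40]    # stepped slice covers 4 spots, so exactly 4 items
print(s)                     # [10, 'a', 20, 'c', 30, 4, 40]

del s[1]                     # remove one item
print(s)                     # [10, 20, 'c', 30, 4, 40]

del s[::2]                   # remove every other item
print(s)                     # [20, 30, 40]

# With a step, a length mismatch is an error and s is left unchanged.
try:
    s[::2] = [1]             # 2 spots but only 1 item
except ValueError as err:
    print("ValueError:", err)
```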
list operations
operation       meaning                                                 returned value
s.append(x)     add x as a single value at end of s.                    None value
s.extend(t)     individually append each item of t to the end of s.     None value
s.insert(i,x)   make space (push other spots to the right),             None value
                put x value at location i.
s.pop(i)        remove value at index i from sequence;                  item that was at index i
                return the value that was there
s.remove(x)     find first occurrence of x, remove it.                  None
s.reverse()     reverse the ordering of items.                          None
s.sort()        sort the items in increasing order.                     None

append: attach a value to the list.
extend: attach a sequence to the list.
try interactively.
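The difference between append and extend is easiest to see side by side. A short sketch with invented lists xs and ys:

```python
xs = [1, 2]
xs.append([3, 4])     # the whole list becomes ONE new item
print(xs)             # [1, 2, [3, 4]]

ys = [1, 2]
ys.extend([3, 4])     # each item is appended individually
print(ys)             # [1, 2, 3, 4]

ys.insert(1, 99)      # push later items right, put 99 at index 1
print(ys)             # [1, 99, 2, 3, 4]

last = ys.pop(4)      # pop returns the removed item
print(last, ys)       # 4 [1, 99, 2, 3]

ys.remove(99)         # removes the first 99 it finds
print(ys)             # [1, 2, 3]
```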
Programming TRAP
• many mutable sequence operations return the None value
→ value is directly modified: rather than returning a modified copy, returns the None value
→ assigning the result back to the variable discards the value!
    xs = [2,5,4,1,3]
    ys = [2,5,4,1,3]
    xs.sort()
    ys = ys.sort()   # the trap: .sort() returns None, and we store it in ys

    print (xs, type(xs))
    print (ys, type(ys))

output when run:

    [1, 2, 3, 4, 5] <class 'list'>
    None <class 'NoneType'>
ys did get sorted, but then we threw out the whole list by storing a None value into ys.
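Two ways to stay out of the trap, as a sketch. The built-in sorted() is not covered on these slides; it is shown here only as an alternative that returns a new list instead of None.

```python
# Option 1: sort in place and do NOT assign the result.
ys = [2, 5, 4, 1, 3]
ys.sort()
print(ys)            # [1, 2, 3, 4, 5]

# Option 2: sorted() builds and returns a new list, so assigning is fine.
zs = [2, 5, 4, 1, 3]
in_order = sorted(zs)
print(in_order)      # [1, 2, 3, 4, 5]
print(zs)            # [2, 5, 4, 1, 3]  (the original is unchanged)
```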
lists in memory
• So far we've drawn simple boxes next to names for our variables:
    x = 5            x  [ 5 ]
• Now, we will draw an arrow from a variable to the block of values it contains.
    xs = [6,7,8]     xs → [ 6 | 7 | 8 ]
Memory Usage
• These arrows help us understand complex data, such as lists of lists.
• Every variable always stores one value in a box.
• The only new concept is that sometimes the contents of the box is an arrow (a reference) to some other value in memory.

    xs = [4,5,6]
    ys = [7,8,9]
    both = [xs,ys]

    xs   → [ 4 | 5 | 6 ]
    ys   → [ 7 | 8 | 9 ]
    both → [ (arrow to the list xs refers to) | (arrow to the list ys refers to) ]
Poll – 4D
• mutable sequences
Sequences and Loops
Loops are most useful with sequences.Each iteration of the loop can inspect/use/modify one value in the sequence.
    xs = [5, 2, 14, 63]
    sumval = 0
    for x in xs:
        sumval += x
    print(sumval)

"Value" For-Loop
• For-loops assign each value of the supplied sequence to the loop variable.
• We directly traverse the values in the list themselves.

    # print some words out.
    words = ["you", "are", "great"]
    for word in words:
        print(word)

    # sum up some numbers.
    vals = [1.5, 2.25, 10.75, -2.0]
    total = 0
    for curr_val in vals:
        total += curr_val
    print("sum of vals is",total)

    # what is the largest value?
    vals = [17, 10, 99, 14, 50]
    max_val = vals[0]
    for val in vals:
        if val > max_val:
            max_val = val
    print("largest:",max_val)
"Index" For-Loop
We can generate all the valid indexes we'd like to visit, and supply those to a for-loop instead of the values-sequence itself.

We are thus aware of our position (i) as well as the value at the current position (vals[i]).

    # where is the largest value located?
    vals = [2,5,3,6,4,1]
    max_loc = 0
    for i in range(len(vals)):
        if vals[i]>vals[max_loc]:
            max_loc = i
    print("maxval="+str(vals[max_loc]))
    print("max val @"+str(max_loc))
Naming Loop Variables
When we intend to directly supply values of our sequence to the for-loop, we choose a loop variable name that represents one thing of the sequence.
    for word in words:
        …word…

    for val in vals:
        …val…

When we intend to supply indexes of a sequence to the loop (and use them to access values in the actual data sequence), we choose an 'index' name for the loop variable, such as i, j, k, nums_i, etc.

    for i in range(len(xs)):
        …xs[i]…

    for bird_i in range(len(birdSpeciesList)):
        …birdSpeciesList[bird_i]…
more loop examples

    # is v in the list xs?
    found = False
    for x in xs:
        if x==v:
            found = True
    print("found?",found)

    # count occurrences of v
    count = 0
    for x in xs:
        if x==v:
            count += 1
    print("#occurrences:",count)

    # where does v show up in xs?
    loc = None
    for i in range(len(xs)):
        if xs[i]==v:
            loc = i
            break
    if loc==None:
        print("not found!")
    else:
        print("location:", loc)
loops recipe
• We want to get some property/answer based on a list. Example: "I want to print the max value".
• create a variable to hold the answer; give it a safe starting value (sum starts at zero; max starts at first value in list; num_occurrences starts at zero)
• create a loop that inspects each item in the list
• inside the loop, incorporate the current value to improve your answer (found a new max; added to the sum; incremented num_occurrences)
• after loop, answer is ready!

look at previous slide: do you see the recipe in use?
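The recipe can be followed step by step for a new question, for instance "how many values are even?". A minimal sketch, where the list vals and the name num_even are made up for the example:

```python
vals = [17, 10, 99, 14, 50]

# 1. create a variable to hold the answer, with a safe starting value
num_even = 0

# 2. create a loop that inspects each item in the list
for val in vals:
    # 3. incorporate the current value to improve the answer
    if val % 2 == 0:
        num_even += 1

# 4. after the loop, the answer is ready
print("even count:", num_even)    # even count: 3
```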
when do we need index loops?
• when the location matters
• when we need to update the list's contents (updating individual slots)
• when we want to visit locations of the sequence in other orders/patterns than first-to-last (in reverse, every other, all-but-the-last-one, etc)
Indexing in other orders
By constructing a different call to range(), we can index through our sequence in more sophisticated ways than just "in-order, all elements":
watch out! using range(), you must get the indexes exactly right (never out of bounds). Slicing gracefully ignores out-of-bounds issues, indexing does not.
    vals = [10,11,12,13,14,15,16,17]

    # every other element, front to back
    for i in range(0, len(vals), 2):
        print(vals[i])

    # every element, back to front
    for i in range(len(vals)-1, -1, -1):
        print(vals[i])
Nested Value Loops
    xss = [[5,6,7],[8,9,10]]
    total = 0
    for xs in xss:
        for x in xs:
            print("\t+ "+str(x))
            total += x
    print("total:",total)

output when run:

        + 5
        + 6
        + 7
        + 8
        + 9
        + 10
    total: 45
• when we have multiple dimensions to our lists, we can use that many nested loops to access each item individually.
• Note the access pattern, as well as the total calculation.
Nested Index Loops
• Create an index for each dimension of your sequence.
• Nest loops for each dimension.
• Access each element individually (and starting from the entire structure like xss below), no matter how many dimensions.

    xss = [[5,6,7],[8,9,10]]
    for i in range(len(xss)):
        for j in range(len(xss[i])):
            print(xss[i][j])

output when run:

    5
    6
    7
    8
    9
    10
Nested Index Loops
• Our data doesn't have to have multiple dimensions for our algorithm to find use for nested loops.

    # are there any duplicates in the list?
    xs = [2,3,5,4,5,1,7,8]
    has_dupes = False
    for i in range(len(xs)):
        for j in range(len(xs)):
            if (i!=j) and xs[i]==xs[j]:
                has_dupes = True
                break
    print("any dupes?",has_dupes)

    # are there any duplicates in the list?
    xs = [2,3,5,4,5,1,7,8]
    has_dupes = False
    for i in range(len(xs)):
        for j in range(i+1, len(xs)):
            if xs[i]==xs[j]:
                has_dupes = True
                break
    print("any dupes?",has_dupes)

• note: what is different/better about the second version?
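One way to explore that question is to count how many pairs each version compares. This sketch ignores the early break and simply counts every comparison each loop structure could make; checks_v1 and checks_v2 are names invented for the illustration.

```python
xs = [2, 3, 5, 4, 5, 1, 7, 8]

# Version 1: every (i, j) pair with i != j, so each pair is checked twice.
checks_v1 = 0
for i in range(len(xs)):
    for j in range(len(xs)):
        if i != j:
            checks_v1 += 1

# Version 2: only pairs with j > i, so each pair is checked once.
checks_v2 = 0
for i in range(len(xs)):
    for j in range(i + 1, len(xs)):
        checks_v2 += 1

print(checks_v1, checks_v2)    # 56 28
```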
building lists with loops
    n = 42
    divisors = []
    for i in range(1,n+1):
        if n%i==0:
            divisors.append(i)
    print("divisors of %d: %s" % (n, divisors))
We can start with an empty list and .append() to it repeatedly to build up a list with a loop.
output when run:
divisors of 42: [1, 2, 3, 6, 7, 14, 21, 42]
Poll – 4E
• loops and sequences
for-loop with sequences of tuples
- We can dissect each tuple with our for-loop variable(s).
- This is called tuple unpacking. Provide a pattern of variables.

    tups = [('a',1), ('b',2), ('c',3)]
    for (c,n) in tups:
        print(c*n)

output when run:

    a
    bb
    ccc
Modifying Lists
    # make all values in the list non-negative
    xs = [1,-2, 3, -4, -5, 100, 150, -30, 123]
    for i in range(len(xs)):
        if xs[i]<0:
            xs[i] = -xs[i] # make non-negative

• To update spots with a loop, we must use index-loops.
• (A value loop would modify the loop variable only, not the list)
Modifying Lists – can't use value loops
    xs = [1,-2, 3, -4, -5, 100, 150, -30, 123]
    ys = xs[:] # here's a copy of xs' original value.
    for x in xs:
        if x<0:
            x = -x # we try, but fail, to modify part of xs
    if xs==ys:
        print("failed to modify!") # this does print.
This code shows how a value loop won't succeed. You should trace through this code to see why (with the visualizer).
Aliases Example
    xs = [1,2,3]
    ys = [4,5,6]
    both = [xs,ys]
    xs[1] = 7
    print("xs is",xs)
    print("both is", both)
    ys = [8,9]
    print("ys is",ys)
    print("both is", both)

program output:

    xs is [1, 7, 3]
    both is [[1, 7, 3], [4, 5, 6]]
    ys is [8, 9]
    both is [[1, 7, 3], [4, 5, 6]]
What is happening?
• variables are not the same as values.
• alias: when multiple names for the same location exist (such as xs vs both[0]), changing the value by any name is witnessed from all others
• reassigning a variable re-establishes what the variable stores
• updating part of a value doesn't change which variables currently refer to the value
• We draw multiple arrows to the same value in our memory diagrams.
id( ) built-in function
• id(thing) returns an int value that is unique among the values that currently exist.
• detect aliases when id(x)==id(y): the actual int value doesn't matter, only whether they are the same or not
• memory diagrams: two aliases both point to the shared value

    >>> xs = [1,2,3]
    >>> ys = [4,5,6]
    >>> lists = [xs,ys]
    >>> id(xs)
    4302079040
    >>> id(ys)
    4301525288
    >>> id(lists)
    4301525360
    >>> id(lists[0])
    4302079040
    >>> id(lists[1])
    4301525288
    >>> xs = [7,8,9]
    >>> id(xs)
    4301525864
    >>> id(lists[0])
    4302079040
When are aliases Preserved?
• Re-assigning a variable (xs = newExpr) can point it to some different memory location, and can disassociate aliases.
    → id(xs) result will change
• Updating part of a value (xs[0] = newval) reuses the same memory location, so any aliases are preserved.
    → id(xs) result will stay the same
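Both cases can be observed with id(). A small sketch, where the name alias is invented for the example; the actual id numbers will differ from run to run, so only the comparisons are shown.

```python
xs = [1, 2, 3]
alias = xs                    # a second name for the same list

xs[0] = 99                    # update part of the value: alias preserved
print(alias)                  # [99, 2, 3]
print(id(xs) == id(alias))    # True

xs = [7, 8, 9]                # re-assign: xs now refers to a different list
print(alias)                  # [99, 2, 3]
print(id(xs) == id(alias))    # False
```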
Poll – 4F
• aliases and memory
Extra Materials
Practice Problems
Here are some sample tasks you should try, either as functions or simple scripts:
• Ask the user how many numbers they'll enter, then store them all in a list. (what methods will we use?)
• Calculate the sum of a list of numbers.
• Count how many numbers in a list are even.
• Step through a list and make all negative numbers positive (take their absolute value).
• Find the maximum number in a list of positive numbers.
• Find the sum of a list-of-lists-of-numbers. (2D list of nums)
• Find the sum of a 3D list of numbers.
finding 2D indexes
    xss = [[2,5,3],[1,4],[5,7,6,8]]
    val = 7
    row_loc = 0
    col_loc = 0
    found = False
    for i in range(len(xss)):
        for j in range(len(xss[i])):
            if xss[i][j]==val:
                row_loc = i
                col_loc = j
                found = True
                break   # leaves the inner loop only
    if not found:
        print("not found!")
    else:
        print("found at ("+str(row_loc)+","+str(col_loc)+")")
Bizarre Corner Cases of For-Loops
• When we modify the list over which we want to iterate, strange things can happen. Avoid modifying the list's length during the loop.
• Python actually finds out once, at the very start of running a loop, what structure it'll iterate over. This "iterator" can't be changed to some other iterator during the loop's execution.
• Following are some examples where modifying the list we're iterating over causes problems. Don't code in this style!
Example – modifying length, with value-loops
Code

    xs = [6,7,8,9,10]
    for x in xs:
        print(xs.pop( ))

Output

    10
    9
    8
    (doesn't crash, stops 'early')

The loop's iterator is an alias with xs. As xs' length changes, and we grab the next popped value from the list's end each iteration, the end of the list gets closer twice as fast!

→ it turns out Python does a bit of indexing behind the scenes when we write a value loop after all… implementation details are being exposed. boooo.
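If the list really must shrink while we loop, one possible workaround (an illustration, not something the slides recommend) is to iterate over a copy made with xs[:], so the thing being iterated over never changes length. The list popped is invented here just to record the results.

```python
xs = [6, 7, 8, 9, 10]
popped = []
for x in xs[:]:               # xs[:] is a separate copy; its length never changes
    popped.append(xs.pop())   # xs itself shrinks by one each iteration

print(popped)                 # [10, 9, 8, 7, 6]
print(xs)                     # []
```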
Example – modifying length, with index-loops
Code

    xs = [6,7,8,9,10]
    for i in range(len(xs)):
        print(xs.pop( ))

Output

    10
    9
    8
    7
    6

Our loop's iterator is a reference to the sequence of indexes 0,1,2,3,4 produced by range(), which never gets modified (though the list that xs refers to certainly does!).
Example – Iterator Unchanged
Setup

    >>> xs = [1,2,3,4,5]
    >>> id(xs)
    4301695384

Usage

    for x in xs:
        print ("xs @", id(xs))
        print ("\tx = " +str(x))
        xs = [9,9,9]

Output

    xs @ 4301695384
        x = 1
    xs @ 4301695456
        x = 2
    xs @ 4301695528
        x = 3
    xs @ 4301695456
        x = 4
    xs @ 4301695528
        x = 5
    >>> xs
    [9, 9, 9]

The iterator is determined once and for all as we enter the loop. The loop iterates not over the thing named xs, but over the thing that xs referred to the moment we began the loop.
Aliases
• When we have multiple ways to access the same spot in memory, we call these alternate names "aliases."
    → xs and lists[0] are aliases.
    → ys and lists[1] are aliases.

    xs = [1,2,3]
    ys = [4,5,6]
    lists = [xs,ys]

    lists[0][0] = 88
    print ("xs:", xs)

    ys[2] = 77
    print ("lists:", lists)
Practice Problem
What are the final values of xs, ys, and listval? (drawing out memory/arrows to names helps.)

    xs = [1,2,3]
    ys = [4,5,6]
    listval = [xs,ys]
    xs[2] = 7
    ys[:] = ["hi","mom"]
    listval[0][1] = 8
    listval[1] = [True,False]
    ys[1] = 9

Answer:
    xs contains [1,8,7]
    ys contains ['hi',9]
    listval contains [[1,8,7],[True,False]]
Practice Problem
The following code doesn't put the value 99 into biglist anywhere. Why?

    xs = [1,2,3]
    biglist = xs*3
    xs[1] = 99

Because xs*3 looks up the value of xs and builds a new list holding three copies of its contents, without making any alias to xs itself. There are no complex values (e.g. lists) inside biglist.

biglist = [xs]*3 would create the list of lists, so each sub-list is a complex value that can exhibit reference updating. (Of course, the meaning is slightly different too: originally biglist was one-dimensional, now it would be two-dimensional.)
Shallow Copies
code:

    xss = [[1,2,3],[4,5,6]]
    ys = xss + xss
    ys[0][0] = 8
    print(xss)
    print(ys)
    ys[1] = [100,200,300]
    print(xss)
    print(ys)

output:

    [[8, 2, 3], [4, 5, 6]]
    [[8, 2, 3], [4, 5, 6], [8, 2, 3], [4, 5, 6]]
    [[8, 2, 3], [4, 5, 6]]
    [[8, 2, 3], [100, 200, 300], [8, 2, 3], [4, 5, 6]]

Another effect driven by the data references: the "copies" made are simply multiple references to the same objects.

We call these "shallow copies".
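For contrast, here is a sketch comparing a shallow copy with a fully independent one. copy.deepcopy comes from Python's standard copy module and is not part of these slides; the names shallow and deep are made up for the example.

```python
import copy

xss = [[1, 2, 3], [4, 5, 6]]
shallow = xss + xss              # four references to the same two inner lists
deep = copy.deepcopy(xss)        # brand-new inner lists

xss[0][0] = 8                    # update one inner list through xss

print(shallow[0], shallow[2])    # [8, 2, 3] [8, 2, 3]
print(deep[0])                   # [1, 2, 3]
print(id(shallow[0]) == id(xss[0]))    # True
print(id(deep[0]) == id(xss[0]))       # False
```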