
making connections

1 CTA Tables
    general transit feed specification
    stop names and stop times
    storing the connections in a dictionary

2 CTA Schedules
    finding connections between stops
    sparse matrices in SciPy
    visualizing a matrix

3 Adjacency Matrices
    matrices as dictionaries of dictionaries
    searching the adjacency matrix

MCS 275 Lecture 40
Programming Tools and File Management

Jan Verschelde, 19 April 2017

Programming Tools (MCS 275) making connections L-40 19 April 2017 1 / 41


GTFS of our CTA

We can download the schedules of the CTA:
http://www.transitchicago.com/developers/gtfs.aspx

GTFS = General Transit Feed Specification
is an open format for packaging scheduled service data.

A GTFS feed is a series of text files in which each line holds values separated by commas (csv format).

Each file corresponds to a table in a relational database.


some tables

stops.txt: stop locations for bus or train
routes.txt: route list with unique identifiers
trips.txt: information about each trip by a vehicle
stop_times.txt: scheduled arrival and departure times for each stop on each trip.


finding a stop name

    $ python3 ctastopname.py
    opening CTA/stops.txt ...
    give a stop id : 3021
    skipping line 0
    3021 has name "California & Augusta"

The script looks for the line

    3021,3021,"California & Augusta",41.89939053,-87.69688045,0,,1
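Splitting a line on every comma breaks as soon as a quoted stop name itself contains a comma. A minimal sketch of the same lookup with the standard csv module, which also strips the quotes. The header line in SAMPLE is an assumption inferred from the order of the fields in the line above, and the function name stop_name is hypothetical.

```python
import csv
import io

# Assumed header followed by the sample line from stops.txt.
SAMPLE = (
    'stop_id,stop_code,stop_name,stop_lat,stop_lon,'
    'location_type,parent_station,wheelchair_boarding\n'
    '3021,3021,"California & Augusta",41.89939053,-87.69688045,0,,1\n'
)

def stop_name(lines, stop_id):
    """Returns the name of the stop with the given id, or None."""
    for fields in csv.reader(lines):
        try:
            if int(fields[0]) == stop_id:
                return fields[2]
        except (ValueError, IndexError):
            continue  # header line or malformed line
    return None

print(stop_name(io.StringIO(SAMPLE), 3021))  # California & Augusta
```

For the real file, pass the open file object instead of the io.StringIO wrapper.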


ctastopname.py

    FILENAME = 'CTA/stops.txt'
    print('opening', FILENAME, '...')
    DATAFILE = open(FILENAME, 'r')
    STOPID = int(input('give a stop id : '))
    COUNT = 0
    STOPNAME = None
    while True:
        LINE = DATAFILE.readline()
        if LINE == '':
            break
        L = LINE.split(',')
        try:
            if int(L[0]) == STOPID:
                STOPNAME = L[2]
                break
        except (ValueError, IndexError):  # header or malformed line
            print('skipping line', COUNT)
        COUNT = COUNT + 1
    DATAFILE.close()
    print(STOPID, 'has name', STOPNAME)


finding head signs

Given the identification of a stop,
we look for all CTA vehicles that stop there.

    $ python3 ctastoptimes.py
    opening CTA/stop_times.txt ...
    give a stop id : 3021
    skipping line 0
    adding "63rd Pl/Kedzie"
    adding "Kedzie/Van Buren"
    ['"63rd Pl/Kedzie"', '"Kedzie/Van Buren"']

We scan the lines in stop_times.txt for occurrences of the given stop identification
and collect the head signs found on those lines.


ctastoptimes.py

    FILENAME = 'CTA/stop_times.txt'
    print('opening', FILENAME, '...')
    DATAFILE = open(FILENAME, 'r')
    STOPID = int(input('give a stop id : '))
    COUNT = 0
    TIMES = []   # head signs seen at the stop
    while True:
        LINE = DATAFILE.readline()
        if LINE == '':
            break
        L = LINE.split(',')
        try:
            if int(L[3]) == STOPID:
                if not L[5] in TIMES:
                    print('adding', L[5])
                    TIMES.append(L[5])
        except (ValueError, IndexError):  # header or malformed line
            print('skipping line', COUNT)
        COUNT = COUNT + 1
    print(TIMES)


finding connections

The file stop_times.txt has lines

    22043803629,07:38:30,07:38:30,30085,22,"UIC",0,98455
    22043803629,07:40:30,07:40:30,30069,23,"UIC",0,100813

Stops 30085 ("Clinton-Blue") and 30069 ("UIC-Halsted") are connected
via stop head sign "UIC".

In a dictionary D we store D[(30085,30069)] = "UIC".
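A minimal sketch of how two consecutive lines produce such an entry, run on the two sample lines instead of the full file. It is a simplified variant, not the lecture script: it always advances the previous stop, and it uses the csv module, so the stored head sign has no surrounding quotes. The name connections is hypothetical.

```python
import csv

# The two sample lines from stop_times.txt.
LINES = [
    '22043803629,07:38:30,07:38:30,30085,22,"UIC",0,98455',
    '22043803629,07:40:30,07:40:30,30069,23,"UIC",0,100813',
]

def connections(lines):
    """
    Returns a dictionary with keys (previous stop, stop) and the
    head sign as value, for consecutive lines with equal head signs.
    """
    result = {}
    prev_stop, prev_head = None, None
    for fields in csv.reader(lines):
        try:
            stop, head = int(fields[3]), fields[5]
        except (ValueError, IndexError):
            continue  # header line or malformed line
        if prev_head == head:
            result[(prev_stop, stop)] = head
        prev_stop, prev_head = stop, head
    return result

D = connections(LINES)
print(D)  # {(30085, 30069): 'UIC'}
```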


ctaconnections.py

The initialization and start of the loop:

    FILENAME = 'CTA/stop_times.txt'
    print('opening', FILENAME, '...')
    DATAFILE = open(FILENAME, 'r')
    COUNT = 0
    PREV_STOP = -1
    PREV_HEAD = ''
    D = {}
    while True:
        LINE = DATAFILE.readline()
        if LINE == '':
            break
        L = LINE.split(',')


ctaconnections.py

Updating the dictionary D with L:

        try:
            (STOP, HEAD) = (int(L[3]), L[5])
            if PREV_STOP == -1:
                (PREV_STOP, PREV_HEAD) = (STOP, HEAD)
            else:
                if PREV_HEAD == HEAD:
                    D[(PREV_STOP, STOP)] = HEAD
                else:
                    (PREV_STOP, PREV_HEAD) = (STOP, HEAD)
        except (ValueError, IndexError):  # header or malformed line
            print('skipping line', COUNT)
        COUNT = COUNT + 1
    print(D, len(D))


a sparse matrix

There are 11,430 lines in stops.txt.
Except for the first line, every line is a stop.
Viewing each stop as a node in a graph, there are 11,429 nodes.
The adjacency matrix has 11,429 rows and 11,429 columns, or 130,622,041 elements.
The dictionary stores 583,279 elements, less than 0.5% of the total possible 11,429 × 11,429 elements.
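The counts can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones stated above.

```python
lines_in_file = 11430       # lines in stops.txt
nodes = lines_in_file - 1   # the first line is not a stop
total = nodes * nodes       # elements of the full adjacency matrix
stored = 583279             # elements stored in the dictionary

percent = 100 * stored / total
print(nodes, total)         # 11429 130622041
print(round(percent, 2))    # 0.45
```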


connecting the stops

    $ python3 ctaconnectstops.py
    opening CTA/stop_times.txt ...
    loading a big file, be patient ...
    skipping line 0
    573036 connections
    give start stop id : 30085
    give end stop id : 30069
    30085 and 30069 are connected by "UIC"


the function stopdict

    FILENAME = 'CTA/stop_times.txt'

    def stopdict(name):
        """
        Opens the file with given name.
        The file contains scheduled arrival
        and departure times for each stop
        on each trip. On return is a dictionary
        D with keys (i,j) and strings as values,
        where i and j are stop ids and the
        value is the empty string if i and j
        are not connected by a trip, otherwise
        D[(i,j)] contains the trip name.
        """


the function main()

    def main():
        """
        Creates a dictionary from the file
        stop_times.txt and prompts the user
        for a start and end stop id.
        The result of the dictionary query
        tells whether the stops are connected.
        """
        conn = stopdict(FILENAME)
        print(len(conn), 'connections')
        i = int(input('give start stop id : '))
        j = int(input(' give end stop id : '))
        outs = str(i) + ' and ' + str(j)
        if (i, j) not in conn:
            print(outs + ' are not connected')
        else:
            print(outs + ' are connected by ' + conn[(i, j)])


sparse matrices

>>> from scipy import sparse

To store an adjacency matrix similar to D[(i,j)]
we use the COOrdinate format:

    >>> from numpy import array
    >>> from scipy.sparse import coo_matrix
    >>> row = array([0,3,1,0])
    >>> col = array([0,3,1,2])
    >>> data = array([4,5,7,9])
    >>> A = coo_matrix((data,(row,col)),shape=(4,4))
    >>> A.todense()
    matrix([[4, 0, 9, 0],
            [0, 7, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 5]])


SciPy session continued

    >>> B = A*A
    >>> B.todense()
    matrix([[16,  0, 36,  0],
            [ 0, 49,  0,  0],
            [ 0,  0,  0,  0],
            [ 0,  0,  0, 25]])

Property of adjacency matrices A: if $(A^k)_{i,j} \neq 0$,
then nodes i and j are connected by a path of length k.


dictionary of keys sparse matrices

dok_matrix is a dictionary-of-keys based sparse matrix:

it allows for efficient access of individual elements;
it can be efficiently converted to a coo_matrix.

    >>> from scipy import sparse
    >>> A = sparse.dok_matrix((4,4))
    >>> A[1,2] = 1
    >>> B = sparse.coo_matrix(A)
    >>> B.todense()
    matrix([[ 0.,  0.,  0.,  0.],
            [ 0.,  0.,  1.,  0.],
            [ 0.,  0.,  0.,  0.],
            [ 0.,  0.,  0.,  0.]])


session continued

    >>> B.todense()
    matrix([[ 0.,  0.,  0.,  0.],
            [ 0.,  0.,  1.,  0.],
            [ 0.,  0.,  0.,  0.],
            [ 0.,  0.,  0.,  0.]])
    >>> B.row
    array([1], dtype=int32)
    >>> B.col
    array([2], dtype=int32)
    >>> B.data
    array([ 1.])
    >>> B.nnz
    1

The attributes row, col, data, and nnz hold, respectively, the row indices,
the column indices, the corresponding data, and the number of nonzeros.
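The coordinate format is close to the dictionary with keys (i,j) used for the connections. A plain-Python analogue, not SciPy's implementation, shows how such a dictionary splits into the three parallel lists; the function name to_coordinates is hypothetical.

```python
def to_coordinates(dok):
    """
    Splits a dictionary with keys (i, j) into three parallel lists:
    row indices, column indices, and the stored values.
    """
    rows, cols, data = [], [], []
    for (i, j), value in sorted(dok.items()):
        rows.append(i)
        cols.append(j)
        data.append(value)
    return rows, cols, data

dok = {(1, 2): 1.0}          # same single nonzero as in the session
rows, cols, data = to_coordinates(dok)
print(rows, cols, data)      # [1] [2] [1.0]
print(len(data))             # 1, the number of nonzeros
```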


a matrix plot


the script spy_matrixplot.py

    import numpy as np
    from matplotlib.pyplot import spy
    import matplotlib.pyplot as plt
    from scipy import sparse

    r = 0.1   # ratio of nonzeroes
    n = 100   # dimension of the matrix
    A = np.random.rand(n,n)
    A = np.matrix(A < r,int)
    S = sparse.coo_matrix(A)
    x = S.row; y = S.col
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(x,y,'.')
    plt.show()


the matrix plot for the CTA


the script ctamatrixplot.py

    # L-40 MCS 275 Wed 20 Apr 2016 : ctamatrixplot.py

    # This script creates a sparse matrix A,
    # which is the adjacency matrix of the stops:
    # A[i,j] = 1 if stops i and j are connected.

    from scipy import sparse
    import matplotlib.pyplot as plt

    filename = 'CTA/stop_times.txt'
    print('opening', filename, '...')
    file = open(filename,'r')

    n = 12165
    A = sparse.dok_matrix((n,n))


the script continued

    i = 0; prev_id = -1; prev_hd = ''
    while True:
        d = file.readline()
        if d == '': break
        L = d.split(',')
        try:
            id = int(L[3]); hd = L[5]
            if prev_id == -1:
                (prev_id, prev_hd) = (id, hd)
            else:
                if prev_hd == hd:
                    A[prev_id, id] = 1
                else:
                    (prev_id, prev_hd) = (id, hd)
        except:
            pass # print('skipping line', i)
        i = i + 1


making the plot

    B = sparse.coo_matrix(A)
    x = B.row; y = B.col
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.set_xlim(-1,n)
    ax.set_ylim(-1,n)
    ax.plot(x,y,'b.')
    plt.show()


adjacency matrix

An adjacency matrix A is a matrix of zeroes and ones:

    A[row][column] = 1: row and column are connected,
    A[row][column] = 0: row and column are not connected.

For example:

    1 0 1 0 0
    0 1 1 0 1
    0 0 0 0 0
    1 0 1 0 1
    0 1 1 1 0


a random adjacency matrix

    from random import randint

    def random_adjacencies(dim):
        """
        Returns D, a dictionary of dictionaries to
        represent a square matrix of dimension dim.
        D[row][column] is a random bit.
        """
        result = {}
        for row in range(dim):
            result[row] = {}
            for column in range(dim):
                result[row][column] = randint(0, 1)
        return result


writing the matrix

    def write(dim, mat):
        """
        Writes the square matrix of dimension dim
        represented by the dictionary mat.
        """
        for row in range(dim):
            for column in range(dim):
                print(' %d' % mat[row][column], end='')
            print('')


searching the adjacency matrix

Consider again the example:

    1 0 1 0 0
    0 1 1 0 1
    0 0 0 0 0
    1 0 1 0 1
    0 1 1 1 0

Observe (rows and columns are numbered from 0):
There is no direct path from 1 to 3.
We can go from 1 to 4 and from 4 to 3.
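The observation can be checked with a short sketch that stores the example as a dictionary of dictionaries, with nodes numbered from 0, and lists the nodes that link 1 to 3 in two steps. The names ROWS, MAT, and intermediates are illustrative and not part of the lecture scripts.

```python
# Rows of the example adjacency matrix, nodes numbered from 0.
ROWS = [[1, 0, 1, 0, 0],
        [0, 1, 1, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 1, 0, 1],
        [0, 1, 1, 1, 0]]

# Dictionary of dictionaries: MAT[row][column] is 0 or 1.
MAT = {r: dict(enumerate(row)) for r, row in enumerate(ROWS)}

def intermediates(mat, src, dst):
    """Returns the nodes k with a connection src -> k and k -> dst."""
    return [k for k in mat if mat[src][k] == 1 and mat[k][dst] == 1]

print(MAT[1][3])                 # 0: no direct connection
print(intermediates(MAT, 1, 3))  # [4]
```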


matrix-matrix multiplication

    >>> import numpy as np
    >>> A = np.matrix([[1, 0, 1, 0, 0],
    ...                [0, 1, 1, 0, 1],
    ...                [0, 0, 0, 0, 0],
    ...                [1, 0, 1, 0, 1],
    ...                [0, 1, 1, 1, 0]])
    >>> A*A
    matrix([[1, 0, 1, 0, 0],
            [0, 2, 2, 1, 1],
            [0, 0, 0, 0, 0],
            [1, 1, 2, 1, 0],
            [1, 1, 2, 0, 2]])
    >>> _[1, 3]
    1

$(A^2)_{i,j} = 1$: there is a path from i to j with one intermediate stop.
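The entry $(A^2)_{i,j} = \sum_k A_{i,k} A_{k,j}$ adds one for every k with a connection from i to k and from k to j, so it counts the two-step paths from i to j. A plain-Python sketch of the same product without numpy; the helper matmul is introduced here for illustration only.

```python
def matmul(a, b):
    """Multiplies two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)]

A = [[1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1],
     [0, 0, 0, 0, 0],
     [1, 0, 1, 0, 1],
     [0, 1, 1, 1, 0]]

A2 = matmul(A, A)
print(A2[1][3])   # 1: exactly one two-step path from 1 to 3
```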


the main program

    def main():
        """
        Prompts the user for the dimension
        and shows a random adjacency matrix.
        """
        dim = int(input('Give the dimension : '))
        mtx = random_adjacencies(dim)
        write(dim, mtx)
        src = int(input('Give the source : '))
        dst = int(input('Give the destination : '))
        mxt = int(input('Give the maximum number of steps : '))
        pth = search(dim, mtx, dst, 0, mxt, [src])
        print('the path :', pth)


the specification and base case

    def search(dim, mat, destination, level, maxsteps, accu):
        """
        Searches the matrix mat of dimension dim
        for a path between source and destination with
        no more than maxsteps intermediate stops.
        The path is accumulated in accu,
        initialized with source.
        """
        source = accu[-1]
        if mat[source][destination] == 1:
            return accu + [destination]
        else:
            ...


the rest of the definition

            if level < maxsteps:
                for k in range(dim):
                    if k not in accu:
                        if mat[source][k] == 1:
                            path = search(dim, mat, destination,
                                          level+1, maxsteps, accu + [k])
                            if path[-1] == destination:
                                return path
            return accu
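Joining the base case and the recursive part gives a runnable sketch, applied here to the 5-by-5 example matrix. The indentation is reconstructed from the flattened slides and is therefore an assumption; the names ROWS and MAT are illustrative.

```python
def search(dim, mat, destination, level, maxsteps, accu):
    """
    Searches the matrix mat of dimension dim for a path to
    destination, starting at the last node in accu.
    Returns the path if one is found, otherwise accu itself.
    """
    source = accu[-1]
    if mat[source][destination] == 1:
        return accu + [destination]
    if level < maxsteps:
        for k in range(dim):
            # visit only unvisited nodes connected to source
            if k not in accu and mat[source][k] == 1:
                path = search(dim, mat, destination,
                              level + 1, maxsteps, accu + [k])
                if path[-1] == destination:
                    return path
    return accu

ROWS = [[1, 0, 1, 0, 0],
        [0, 1, 1, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 1, 0, 1],
        [0, 1, 1, 1, 0]]
MAT = {r: dict(enumerate(row)) for r, row in enumerate(ROWS)}

print(search(5, MAT, 3, 0, 2, [1]))   # [1, 4, 3]
print(search(5, MAT, 0, 0, 3, [2]))   # [2]: node 2 has no connections
```

A returned list that does not end in the destination means that no path was found within the allowed number of steps.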


Summary + Exercises

Dictionaries are well suited to processing data stored in files.

1 Modify ctastopname.py so the user is prompted for a string instead of a number. The modified script prints all ids and corresponding names that have the given string as a substring.

2 Instead of using numpy and scipy, use turtle to draw the spy plot of a matrix.

3 Instead of using numpy and scipy, use the canvas widget of tkinter to draw the spy plot of a matrix.

4 Apply the search to the adjacency matrix of the data obtained for the CTA.
