ECE 20875: Python for Data Science Fall 2021 Exam #1 ...



PURDUE

Section:

Instructions

You have 60 minutes to complete this 8-question exam (not counting Question 1, which is the honor pledge). That gives about 7 minutes per question, so pace yourself accordingly. The exam is worth 100 points in total, and each question is worth 12 or 13 points. You may consult printed materials from the course website and handwritten notes during the exam. However, you may not use a computer, calculator, or any other resource. It is critical that you follow the exact template provided in the exam packet. This includes:

1) Writing all of your work on the exact sheet of paper provided for each question. Some questions are written on the front and back of the paper. Feel free to use the back of the page if it is blank. An extra sheet of paper is provided at the end for additional space as needed. For each question, you must show any work you used to arrive at your answer.

2) Writing your name and PUID at the top of every page. To facilitate this, the first 5 minutes (before the 60-minute clock starts) will be used solely to fill out your name and PUID on each page.

Good luck!

NAME ____________________________________

PUID ________________________________________

ECE 20875: Python for Data Science Fall 2021 Exam #1

Christopher Brinton, Qiang Qiu, and Mahsa Ghasemi


Question 1: Honor Pledge and Acknowledgment

Please sign the honor pledge below with a signature of your full legal name.

I understand and acknowledge the above instructions and notes. I also affirm that the answers given on this test are mine and mine alone. I did not receive help from any person or material (other than those explicitly allowed).

X___________________________________________


Question 2 [12 points]

Suppose we are given a nested list records that contains the names and grades of students in a class, with each inner list in the format [name, grade]. Fill in the blanks of the following code block to obtain a list max_name that stores the name(s) of the student(s) having the highest grade. For example, if records = [["chi", 85], ["beta", 90], ["alpha", 90]], the highest score is 90 and two students have that score, so max_name = ["beta", "alpha"].

    max_name = []
    max_score = 0
    for s in records:
        if s[1] > max_score:
            # new highest grade: start a fresh list of names
            max_name = []
            max_score = s[1]
            max_name.append(s[0])
        elif s[1] == max_score:
            # tie with the current highest grade
            max_name.append(s[0])
    print(max_name)
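The loop above starts from max_score = 0, which implicitly assumes that every grade is nonnegative. A minimal alternative sketch, assuming records is non-empty, finds the top grade first with max() and then collects every tied name with a list comprehension. The function name names_with_max_grade is hypothetical and not part of the exam.

```python
def names_with_max_grade(records):
    """Return the names of all students whose grade equals the highest grade.

    Assumes records is a non-empty list of [name, grade] pairs.
    """
    # First pass: find the highest grade (also correct for negative grades).
    top = max(grade for _, grade in records)
    # Second pass: keep every name tied at that grade, in input order.
    return [name for name, grade in records if grade == top]


records = [["chi", 85], ["beta", 90], ["alpha", 90]]
print(names_with_max_grade(records))  # ['beta', 'alpha']
```

The trade-off is two passes over records instead of one, in exchange for not depending on a hand-picked starting value for max_score.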


Question 3 [12 points]

A. [6 points] Write a program to create a new dictionary dict_n that contains all key-value pairs in dictionary dict1 that are not in dictionary dict2. For example, if

dict1 = {"Dickinson" : "poet", "Socrates" : "philosopher", "da Vinci" : "painter"}

dict2 = {"Monet" : "painter", "Socrates" : "philosopher"}

then

dict_n = {"Dickinson" : "poet", "da Vinci" : "painter"}

Write your code in the box below using the following steps:

i. Convert dict1 and dict2 into set1 and set2, where each element of each set is a (key, value) tuple from the corresponding dictionary.

ii. Compute the set difference of set1 and set2.

iii. Create the new dictionary dict_n from the set difference.

    # assume dict1 and dict2 are already defined
    # write step i below
    set1 = set()
    for i in dict1.items():
        set1.add(i)
    set2 = set()
    for i in dict2.items():
        set2.add(i)
    # write step ii below
    set_diff = set1.difference(set2)
    # write step iii below
    dict_n = {}
    for i in set_diff:
        dict_n[i[0]] = i[1]
    print(dict_n)
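The same three steps can be written more compactly with the set() constructor, the - operator for set difference, and the dict() constructor. This is a sketch assuming every dictionary value is hashable, which is required for storing (key, value) tuples in a set. Because sets are unordered, the key order of dict_n is not guaranteed, so compare results by dictionary equality rather than by printed order.

```python
dict1 = {"Dickinson": "poet", "Socrates": "philosopher", "da Vinci": "painter"}
dict2 = {"Monet": "painter", "Socrates": "philosopher"}

# Step i: each set holds (key, value) tuples; values must be hashable.
set1 = set(dict1.items())
set2 = set(dict2.items())

# Step ii: pairs in dict1 that do not appear, with the same value, in dict2.
set_diff = set1 - set2

# Step iii: dict() accepts an iterable of (key, value) pairs.
dict_n = dict(set_diff)
print(dict_n)
```

Note that a pair is dropped only when both key and value match; a key that appears in dict2 with a different value is kept.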


B. [6 points] Write a function createAndOfOrs whose input argument funcList is a list of lists, where each inner list contains two functions, and whose output out_func is a function that computes the AND of the ORs of the inner lists. For example, if you receive the following list as the input argument:

funcList = [[f1, f2], [f3, f4], [f5, f6]]

then you should return the function:

out_func(x) = (f1(x) OR f2(x)) AND (f3(x) OR f4(x)) AND (f5(x) OR f6(x))

Write your code in the box below.

    def createAndOfOrs(funcList):
        def out_func(x):
            # evaluate the OR of each pair of functions at x
            p_list = []
            for f_pair in funcList:
                p_list.append(f_pair[0](x) or f_pair[1](x))
            # AND together all of the OR results
            res = True
            for p in p_list:
                res = res and p
            return res
        return out_func
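Python's built-in all() expresses the outer AND directly, while `or` handles each pair. A sketch, assuming each function returns a truth value: unlike the loop version, all() always returns a bool and stops at the first pair that evaluates to False. The predicates is_positive, is_even, is_small, and is_multiple_of_5 are hypothetical examples.

```python
def createAndOfOrs(funcList):
    """Return out_func(x) = AND over pairs [f, g] of (f(x) OR g(x))."""
    def out_func(x):
        # `or` combines each pair; all() ANDs the pairs and stops at the first False.
        return all(f(x) or g(x) for f, g in funcList)
    return out_func


# Hypothetical example predicates.
is_positive = lambda x: x > 0
is_even = lambda x: x % 2 == 0
is_small = lambda x: abs(x) < 10
is_multiple_of_5 = lambda x: x % 5 == 0

check = createAndOfOrs([[is_positive, is_even], [is_small, is_multiple_of_5]])
print(check(3))    # True:  3 > 0, and |3| < 10
print(check(-4))   # True:  -4 is even, and |-4| < 10
print(check(-3))   # False: -3 is neither positive nor even
print(check(15))   # True:  15 > 0, and 15 is a multiple of 5
print(check(13))   # False: 13 is neither small nor a multiple of 5
```

For an empty funcList, all() returns True, which matches the loop version's starting value res = True.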


Question 4 [12 points]

Nikola Jokic is an NBA basketball player for the Denver Nuggets. He scored the following points in his last 10 games of the 2021 NBA season:

[22 32 24 22 36 38 34 21 20 31]

A. [4 points] Draw a count histogram with 3 evenly spaced bins for data points from 15 to 45, i.e., the three bins are [15, 25), [25, 35), and [35, 45). Be sure to mark the height of each bar.

B. [4 points] By normalizing the bin counts, draw the density histogram version of the above histogram. Be sure to mark the height of each bar.

First normalize to probabilities: 5/(5+3+2) = 0.5; 3/(5+3+2) = 0.3; 2/(5+3+2) = 0.2. Then obtain densities by further dividing by the bin width 10: 0.5/10 = 0.05; 0.3/10 = 0.03; 0.2/10 = 0.02.

C. [4 points] Based only on this dataset, use the empirical CDF to estimate how likely it is that Nikola scores 22 points or fewer in his next game.

(Optional step) Sort the scores in ascending order: [20 21 22 22 24 31 32 34 36 38]

Apply the empirical CDF formula; four of the ten scores (20, 21, 22, 22) are at most 22:

\[
\hat{F}(22) = P(X \le 22) = \sum_{x_i \le 22} \frac{1}{n} = 4 \cdot \frac{1}{10} = 0.4
\]

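[Figure: Part A count histogram over bin edges 15, 25, 35, 45 with bar heights 5, 3, 2; Part B density histogram over the same bins with bar heights 0.05, 0.03, 0.02.]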
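All three parts can be reproduced in plain Python. This sketch uses half-open bins [lo, hi) exactly as defined in part A; the helper name ecdf is hypothetical.

```python
scores = [22, 32, 24, 22, 36, 38, 34, 21, 20, 31]
edges = [15, 25, 35, 45]          # bins [15, 25), [25, 35), [35, 45)
n = len(scores)

# Part A: count how many scores fall in each half-open bin [lo, hi).
counts = [sum(lo <= s < hi for s in scores) for lo, hi in zip(edges, edges[1:])]

# Part B: divide by n (probabilities), then by each bin width (densities).
probs = [c / n for c in counts]
densities = [p / (hi - lo) for p, lo, hi in zip(probs, edges, edges[1:])]


# Part C: the empirical CDF at t is the fraction of observations <= t.
def ecdf(data, t):
    return sum(x <= t for x in data) / len(data)


print(counts)             # [5, 3, 2]
print(densities)          # approximately [0.05, 0.03, 0.02]
print(ecdf(scores, 22))   # 0.4
```

Multiplying each density by its bin width and summing gives 1, which is a quick check that the density normalization is correct.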


Question 5 [13 points]

A. An experiment consists of flipping a fair coin two times. Let X be the random variable for the number of tails observed.

i. [4 points] What is the probability mass function (PMF) of X? Fill in the table below.

x_i | P(X = x_i)
0   | 0.25
1   | 0.50
2   | 0.25

The possible outcomes are TT, TH, HT, and HH. (This is a binomial random variable with n = 2 and success probability p = 1/2.)

ii. [3 points] Plot the Cumulative Distribution Function (CDF) of 𝑋 on the graph below. Be sure to label values on both axes clearly.
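The PMF and CDF can be checked by enumerating the four equally likely outcomes. This is a sketch using itertools.product to list the outcomes and fractions.Fraction for exact probabilities; the CDF is computed as a running sum of the PMF, which is why it is a step function with jumps at 0, 1, and 2.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# PMF: P(X = x) = (number of outcomes with x tails) / (number of outcomes).
pmf = {
    x: Fraction(sum(o.count("T") == x for o in outcomes), len(outcomes))
    for x in range(3)
}


# CDF: F(x) = P(X <= x), the sum of the PMF over all values k <= x.
def cdf(x):
    return sum(p for k, p in pmf.items() if k <= x)


print({k: float(v) for k, v in pmf.items()})           # {0: 0.25, 1: 0.5, 2: 0.25}
print([float(cdf(x)) for x in (-1, 0, 0.5, 1, 2, 3)])  # [0.0, 0.25, 0.25, 0.75, 1.0, 1.0]
```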

B. [6 points] The probability density function (PDF) of a continuous random variable Y is defined by

\[
f_Y(y) = \begin{cases} 0 & \text{if } y < 2 \\ k\,(y - 2) & \text{if } 2 \le y \le 4 \\ 0 & \text{if } y > 4 \end{cases}
\]

where k is a constant. For this to be a valid PDF, what must the value of k be? Be sure to show your work.

The PDF of a continuous random variable must integrate to 1:

\[
1 = \int_{-\infty}^{\infty} f_Y(y)\,dy = \int_{2}^{4} k\,(y - 2)\,dy = \frac{k}{2}(y - 2)^2 \Big|_{2}^{4} = 2k \quad\Rightarrow\quad k = \frac{1}{2}
\]
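The normalization condition can also be checked numerically. This sketch uses a hand-written midpoint-rule integrator (the helper names integrate_midpoint and f_Y are hypothetical); the midpoint rule is exact for linear integrands up to floating-point error, so computing the area with k = 1 and inverting it recovers k.

```python
def integrate_midpoint(f, a, b, steps=1000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def f_Y(y, k):
    # Piecewise PDF from part B: k * (y - 2) on [2, 4], zero elsewhere.
    return k * (y - 2) if 2 <= y <= 4 else 0.0


# The PDF is zero outside [2, 4], so integrating over [2, 4] suffices.
# With k = 1 the area is 2, so k must be 1/2 to make the total area 1.
area_k1 = integrate_midpoint(lambda y: f_Y(y, 1.0), 2, 4)
k = 1 / area_k1
print(area_k1, k)   # approximately 2.0 and 0.5
```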

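[Figure for part A.ii: CDF of X drawn as a step function, with x on the horizontal axis (ticks 0, 1, 2) and P(X ≤ x) on the vertical axis, jumping to 0.25 at x = 0, 0.75 at x = 1, and 1.0 at x = 2.]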


Question 6 [13 points]

Using any combination of the functions map, filter (Python built-ins), and reduce (imported from functools in Python 3), complete the block of code below to calculate the variance data_var of a dataset data_list after removing any "outlier" data points. (You may choose to write your own map, reduce, and filter operations if you prefer.) Specifically, discard all data points with absolute values greater than 6, and compute the variance of the remaining data points. Recall the variance formula for N data points with sample mean x̄:

\[
s^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2
\]

    from functools import reduce  # reduce is not a built-in in Python 3

    data_list = [9, 4, -9, 3, 7, 5, 3, 0, 8, -7]
    # remove outliers (keep points with |x| <= 6)
    data_clean = list(filter(lambda x: (x >= -6 and x <= 6), data_list))
    # compute sample mean
    data_mean = reduce(lambda x, y: x + y, data_clean) / len(data_clean)
    # compute sample variance
    diffs = list(map(lambda x: x - data_mean, data_clean))
    sq_diffs = list(map(lambda x: x * x, diffs))
    data_var = reduce(lambda x, y: x + y, sq_diffs) / (len(sq_diffs) - 1)
    print("variance of the outlier-free data : {}".format(data_var))
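As a cross-check, the standard library's statistics.variance computes the sample variance with the same N − 1 denominator. This sketch applies the same outlier rule (keep |x| ≤ 6): the cleaned data are [4, 3, 5, 3, 0], whose mean is 3 and whose squared deviations sum to 1 + 0 + 4 + 0 + 9 = 14, giving 14/4 = 3.5.

```python
import statistics

data_list = [9, 4, -9, 3, 7, 5, 3, 0, 8, -7]

# Same outlier rule as above: keep points with absolute value at most 6.
data_clean = [x for x in data_list if abs(x) <= 6]

# statistics.variance uses the sample (N - 1) denominator, matching s^2.
data_var = statistics.variance(data_clean)
print(data_clean)   # [4, 3, 5, 3, 0]
print(data_var)     # 3.5
```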


Question 7 [12 points]

A. [6 points] Consider two lists p and q, e.g.,

p = [7, 6, 5]
q = [4, 7, 9, 0, 6]

Write a list comprehension to create a list of tuples that combines elements from the two lists p and q if they are not equal, e.g.,

[(7, 4), (7, 9), (7, 0), (7, 6), (6, 4), (6, 7), (6, 9), (6, 0), (5, 4), (5, 7), (5, 9), (5, 0), (5, 6)]

(Fill in the blanks below)

    result = [(x, y) for x in p for y in q if x != y]

B. [6 points] Write a list comprehension to generate the following matrix:

matrix = [[1, 2, 4, 8], [2, 4, 8, 16], [4, 8, 16, 32], [8, 16, 32, 64]]

(Fill in the blanks below)

    matrix = [[2**(x+y) for x in range(4)] for y in range(4)]
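It can help to unroll each comprehension into the equivalent explicit loops. In this sketch, the for-clauses of a comprehension nest from left to right, so the first clause becomes the outer loop and the trailing if becomes a filter inside the innermost loop.

```python
p = [7, 6, 5]
q = [4, 7, 9, 0, 6]

# Part A: the comprehension's for-clauses nest left to right, like these loops.
result = []
for x in p:            # outer loop: first `for` clause
    for y in q:        # inner loop: second `for` clause
        if x != y:     # the trailing `if` filters each (x, y) pair
            result.append((x, y))
assert result == [(x, y) for x in p for y in q if x != y]

# Part B: the inner comprehension builds one row (x varies);
# the outer one stacks the rows (y varies).
matrix = []
for y in range(4):
    matrix.append([2 ** (x + y) for x in range(4)])
assert matrix == [[2**(x+y) for x in range(4)] for y in range(4)]

print(result)
print(matrix)  # [[1, 2, 4, 8], [2, 4, 8, 16], [4, 8, 16, 32], [8, 16, 32, 64]]
```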


Question 8 [13 points]

A consulting service is studying the amount of time after graduation that computer engineers spend before starting their first job. They have found that the random variable X for the amount of time before starting follows an exponential distribution. The PDF of an exponentially distributed variable X is given by

\[
f_X(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}
\]

where λ is the rate parameter, with mean μ_X = 1/λ and variance σ_X² = 1/λ². The consulting service has determined that λ = 2.

A. [7 points] We are interested in modeling the sample mean X̄_n for a large number of engineers n (e.g., a few hundred). What would be a good approximation to the probability density function of the sample mean? Your answer should be in terms of n.

By the central limit theorem, the sampling distribution of the sample mean becomes approximately normal, centered at the population mean, as the number of samples increases:

\[
\bar{X}_n \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right) \quad \text{(approximately)}
\]

In this case, μ_X = 1/λ = 1/2 and σ_X² = 1/λ² = 1/4. Thus,

\[
\bar{X}_n \sim N\left(\frac{1}{2}, \frac{1}{4n}\right)
\]
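A short simulation can make the central limit theorem result concrete. This is a sketch under assumed settings: n = 300 and 2000 repetitions are illustrative choices, not values from the question, and random.expovariate(lam) draws from the exponential distribution with rate lam. The mean and variance of the simulated sample means should land close to the predicted 1/2 and 1/(4n).

```python
import random
import statistics

random.seed(0)      # fixed seed so the run is reproducible
lam = 2.0           # rate parameter from the question
n = 300             # engineers per sample ("a few hundred"); assumed value
trials = 2000       # number of simulated samples; assumed value

# Draw `trials` independent samples of size n and record each sample mean.
means = [
    statistics.fmean(random.expovariate(lam) for _ in range(n))
    for _ in range(trials)
]

# CLT prediction: mean 1/lam = 0.5, variance 1/(lam**2 * n) = 1/(4n).
print(statistics.fmean(means), 1 / lam)
print(statistics.variance(means), 1 / (lam ** 2 * n))
```

Rerunning with a larger n shrinks the simulated variance roughly in proportion to 1/n, which is the behavior part B asks you to sketch.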

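B. [6 points] Draw a sketch below of what happens to the probability density function of X̄_n as n increases. Be sure to indicate the behavior as n approaches infinity.

[Figure: PDFs of X̄_n for increasing n_1, n_2, n_3, …]

Each curve is centered at 1/2 with variance 1/(4n), so the PDF becomes narrower and taller as n increases; as n approaches infinity, it concentrates entirely at 1/2.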


Question 9 [13 points]

Kate is a junior in ECE and has been investing in LuckyCharmsCoin. She is inviting all her friends to invest. Andrew, one of her friends, hears the following assertion: "I assure you that the people who invest $10 in LuckyCharmsCoin today will earn $100 by Fall 2022 on average." Andrew is unsure of this statement, suspecting that the average earning is not $100. Fortunately, he knows 35 friends who have invested $10 in LuckyCharmsCoin, and the mean of their balances after a year is:

X̄ = 103.4

A. [2 points] Form Andrew's null and alternative hypotheses about Kate's statement:

• H₀: µ = 100, or "The average balance (or earning) after a year is $100"

• H₁: µ ≠ 100, or "The average balance (or earning) after a year is not $100"

B. [4 points] Andrew has determined that the variance of people's earnings from investing $10 in LuckyCharmsCoin after a year is σ² = 560. Compute the z-score.

\[
SE = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{560}{35}} = \sqrt{16} = 4
\]

\[
z = \frac{\bar{x} - \mu}{SE} = \frac{103.4 - 100}{4} = 0.85
\]
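The standard-error and z-score arithmetic can be checked in a few lines. This sketch simply plugs in the numbers given in the question, treating σ² = 560 as the known population variance and n = 35 as the sample size.

```python
import math

x_bar = 103.4      # sample mean of Andrew's 35 friends
mu_0 = 100         # hypothesized mean under H0
sigma2 = 560       # known population variance
n = 35             # sample size

se = math.sqrt(sigma2 / n)    # standard error of the mean: sqrt(16) = 4
z = (x_bar - mu_0) / se
print(se, z)   # 4.0 and approximately 0.85
```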

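C. [4 points] Shade the p-value on the plot below corresponding to the z-score from the previous part. Also, circle one of the following options as the correct interval for the p-value:

(a) 0 < p-value < 0.01
(b) 0.01 < p-value < 0.05
(c) 0.05 < p-value < 0.1
(d) p-value > 0.1

Because the alternative hypothesis is two-sided, the p-value is the area in both tails beyond |z| = 0.85: p = 2(1 − Φ(0.85)) ≈ 0.40, so the correct interval is (d).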


D. [3 points] With significance level α = 0.05, would Andrew reject Kate's statement? Explain your rationale in 1-2 sentences.

p > α, so the result is not significant at the given significance level. Hence, Andrew cannot reject Kate's statement.