R assignment
7/24/2019 R assignment
POPULARITY OF MUSIC RECORDS
The music industry has a well-developed market with a global annual revenue around $15 billion. The recording industry is highly competitive and is dominated by three big production companies which make up nearly 82% of the total annual album sales.
Artists are at the core of the music industry and record labels provide them with the necessary resources to sell their music on a large scale. A record label incurs numerous costs (studio recording, marketing, distribution, and touring) in exchange for a percentage of the profits from album sales, singles and concert tickets.
Unfortunately, the success of an artist's release is highly uncertain: a single may be extremely popular, resulting in widespread radio play and digital downloads, while another single may turn out quite unpopular, and therefore unprofitable.
Knowing the competitive nature of the recording industry, record labels face the fundamental decision problem of which musical releases to support to maximize their financial success.
How can we use analytics to predict the popularity of a song? In this assignment, we challenge ourselves to predict whether a song will reach a spot in the Top 10 of the Billboard Hot 100 Chart.
Taking an analytics approach, we aim to use information about a song's properties to predict its popularity. The dataset songs.csv consists of all songs which made it to the Top 10 of the Billboard Hot 100 Chart from 1[...]
timesignature and timesignature_confidence = a variable estimating the time signature of the song, and the confidence in the estimate
loudness = a continuous variable indicating the average amplitude of the audio in decibels
tempo and tempo_confidence = a variable indicating the estimated beats per minute of the song, and the confidence in the estimate
key and key_confidence = a variable with twelve levels indicating the estimated key of the song (C, C#, . . ., B), and the confidence in the estimate
energy = a variable that represents the overall acoustic energy of the song, using a mix of features such as loudness
pitch = a continuous variable that indicates the pitch of the song
timbre_0_min, timbre_0_max, timbre_1_min, timbre_1_max, . . . , timbre_11_min, and timbre_11_max = variables that indicate the minimum/maximum values over all segments for each of the twelve values in the timbre vector (resulting in 24 continuous variables)
Top10 = a binary variable indicating whether or not the song made it to the Top 10 of the Billboard Hot 100 Chart (1 if it was in the top 10, and 0 if it was not)
Use the read.csv function to load the dataset songs.csv into R.
How many observations (songs) are from the year 2010?
Answer: 373
First, navigate to the directory on your computer containing the file songs.csv. You can load the dataset by using the command:
songs = read.csv("songs.csv")
Then, you can count the number of songs from 2010 by using the table function:
table(songs$year)
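As a minimal sketch of the counting step above, the following uses a small made-up data frame (songs_demo is a hypothetical stand-in, not the real songs.csv) to show how table() counts rows per year and how a single year's count can be pulled out.

```r
# Toy stand-in for songs.csv; these rows are invented for illustration only.
songs_demo <- data.frame(
  year  = c(2008, 2009, 2010, 2010, 2010),
  Top10 = c(0, 1, 0, 1, 0)
)

# Frequency table of the year column, analogous to table(songs$year).
year_counts <- table(songs_demo$year)
print(year_counts)

# Two equivalent ways to get the count for one year.
n_2010_from_table <- year_counts[["2010"]]          # look up by name in the table
n_2010_direct     <- sum(songs_demo$year == 2010)   # count matching rows directly
```

Reading the full table is handy when several years are of interest; the sum() form is shorter when only one year matters.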
How many songs does the dataset include for which the artist name is "Michael Jackson"?
Answer: 18
If you look at the structure of the dataset by typing str(songs), you can see that there are 1032 different values of the variable "artistname". So if we create a table of artistname, it will be challenging to find Michael Jackson. Instead, we can use subset:
MichaelJackson = subset(songs, artistname == "Michael Jackson")
Then, by typing str(MichaelJackson) or nrow(MichaelJackson), we can see that there are 18 observations.
Which of these songs by Michael Jackson made it to the Top 10? Select all that apply.
Options: Beat It; You Rock My World; Billie Jean; You Are Not Alone
Answer: You Rock My World, You Are Not Alone
We can answer this question by using our subset MichaelJackson from the previous question. If you output the vector MichaelJackson$songtitle, you can see the row number of each of the songs. Then, you can see whether or not that song made it to the top 10 by outputting the value of Top10 for that row. For example, "Beat It" is the 13th song in our subset. So then if we type:
MichaelJackson$Top10[13]
we get 0, which means that this song did not make it to the Top 10. The song "You Rock My World" is first on the list, so if we type:
MichaelJackson$Top10[1]
we get 1, which means that this song did make it to the Top 10.
As a shortcut, you could just output:
MichaelJackson[c("songtitle", "Top10")]
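The subset-and-inspect steps above can be sketched on a toy data frame. The artist and song names below are hypothetical placeholders, and looking a song up by title (rather than by row position) is shown as an alternative that does not depend on row order.

```r
# Invented example rows; not taken from songs.csv.
songs_demo <- data.frame(
  songtitle  = c("Song A", "Song B", "Song C", "Song D", "Song E"),
  artistname = c("Artist X", "Artist Y", "Artist X", "Artist Z", "Artist X"),
  Top10      = c(1, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Keep only the rows for one artist and count them.
artist_x   <- subset(songs_demo, artistname == "Artist X")
n_artist_x <- nrow(artist_x)

# Show title and outcome side by side, as in the shortcut above.
print(artist_x[c("songtitle", "Top10")])

# Look up the outcome by title instead of by row number.
song_c_top10 <- artist_x$Top10[artist_x$songtitle == "Song C"]

# List all of this artist's titles that reached the Top 10.
top10_titles <- artist_x$songtitle[artist_x$Top10 == 1]
```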
The variable corresponding to the estimated time signature (timesignature) is discrete, meaning that it only takes integer values (0, 1, 2, 3, . . . ). What are the values of this variable that occur in our dataset? Select all that apply.
Options: 0, 1, 2, 3, 4, 5, 6, 7, 8
Answer: 0, 1, 3, 4, 5, 7
Which timesignature value is the most frequent among songs in our dataset?
Options: 0, 1, 2, 3, 4, 5, 6, 7, 8
Answer: 4
You can answer these questions by using the table command:
table(songs$timesignature)
The only values that appear in the table for timesignature are 0, 1, 3, 4, 5, and 7. We can also read from the table that 6787 songs have a value of 4 for the timesignature, which is the highest count out of all of the possible timesignature values.
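Instead of reading the most frequent value off the printed table by eye, it can also be extracted programmatically. The sketch below uses an invented vector (timesig_demo is a hypothetical name) and combines table() with which.max().

```r
# Made-up time signature values for illustration.
timesig_demo <- c(4, 4, 3, 4, 1, 3, 4, 5)

# Count how often each distinct value occurs.
counts <- table(timesig_demo)
print(counts)

# The distinct values that actually occur are the names of the table.
observed_values <- as.integer(names(counts))

# which.max() returns the position of the largest count; its name is the value.
most_frequent <- as.integer(names(which.max(counts)))
```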
Out of all of the songs in our dataset, the song with the highest tempo is one of the following songs. Which one is it?
Options: Until The Day I Die; Wanna Be Startin' Somethin'; My Happy Ending; You Make Me Wanna...
Answer: Wanna Be Startin' Somethin'
You can answer this question by using the which.max function. The output of which.max(songs$tempo) is 6206, meaning that the song with the highest tempo is the row 6206. We can output the song title by typing:
songs$songtitle[6206]
The song title is: Wanna be Startin' Somethin'.
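The two steps above (find the row, then index the title) can be chained so that no row number has to be copied by hand. This is a small sketch on invented data; the titles and tempos are placeholders.

```r
# Invented rows for illustration.
songs_demo <- data.frame(
  songtitle = c("Song A", "Song B", "Song C"),
  tempo     = c(98.5, 244.3, 120.0),
  stringsAsFactors = FALSE
)

# which.max() gives the row index of the largest tempo.
fastest_row <- which.max(songs_demo$tempo)

# Use that index directly to pull out the matching title.
fastest_title <- songs_demo$songtitle[fastest_row]
```

Note that which.max() returns only the first maximum, so ties would need a different approach such as which(x == max(x)).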
We wish to predict whether or not a song will make it to the Top 10. To do this, first use the subset function to split the data into a training set "SongsTrain" consisting of all the observations up to and including 2009 [...]
[...] Notice that the . is used in place of enumerating all the independent variables. (Also, keep in mind that you can choose to put quotes around binomial, or leave out the quotes. R can understand this argument either way.)
However, in our case, we want to exclude some of the variables in our dataset from being used as independent variables ("year", "songtitle", "artistname", "songID", and "artistID"). To do this, we can use the following trick. First define a vector of variable names called nonvars - these are the variables that we won't use in our model.
nonvars = c("year", "songtitle", "artistname", "songID", "artistID")
To remove these variables from your training and testing sets, type the following commands in your R console:
SongsTrain = SongsTrain[ , !(names(SongsTrain) %in% nonvars) ]
SongsTest = SongsTest[ , !(names(SongsTest) %in% nonvars) ]
Now, use the glm function to build a logistic regression model to predict Top10 using all of the other variables as the independent variables. You should use SongsTrain to build the model.
Looking at the summary of your model, what is the value of the Akaike Information Criterion (AIC)?
Answer: 4827.2
To answer this question, you first need to run the three given commands to remove the variables that we won't use in the model from the datasets:
nonvars = c("year", "songtitle", "artistname", "songID", "artistID")
SongsTrain = SongsTrain[ , !(names(SongsTrain) %in% nonvars) ]
SongsTest = SongsTest[ , !(names(SongsTest) %in% nonvars) ]
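To see why the %in% trick removes columns, it helps to look at the logical vector it builds. The sketch below uses a tiny invented data frame (train_demo is a hypothetical name) with only two excluded columns.

```r
# Invented training rows for illustration.
train_demo <- data.frame(
  year      = c(2008, 2009),
  songtitle = c("Song A", "Song B"),
  loudness  = c(-5.2, -7.9),
  Top10     = c(1, 0),
  stringsAsFactors = FALSE
)

# Columns we do not want to use as independent variables.
nonvars <- c("year", "songtitle")

# TRUE for every column whose name is NOT in nonvars.
keep <- !(names(train_demo) %in% nonvars)
print(keep)

# Keep only the columns flagged TRUE.
train_demo <- train_demo[, keep]
remaining  <- names(train_demo)
```

If only one column were left, train_demo[, keep] would return a plain vector; adding drop = FALSE keeps it a data frame.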
Then, you can create the logistic regression model with the following command:
SongsLog1 = glm(Top10 ~ ., data=SongsTrain, family=binomial)
Looking at the bottom of the summary(SongsLog1) output, we can see that the AIC value is 4827.2.
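The AIC does not have to be read off the printed summary. As a rough sketch on simulated data (the data frame, coefficients and seed below are assumptions made up for illustration, not the assignment's data), the value can be extracted with AIC() or from the fitted object's aic component.

```r
# Simulated stand-in for SongsTrain; all values are invented.
set.seed(1)
n <- 200
train_demo <- data.frame(
  loudness         = rnorm(n),
  tempo_confidence = runif(n)
)
p <- plogis(-1 + 0.8 * train_demo$loudness + 1.2 * train_demo$tempo_confidence)
train_demo$Top10 <- rbinom(n, size = 1, prob = p)

# "Top10 ~ ." uses every other column as an independent variable.
model_demo <- glm(Top10 ~ ., data = train_demo, family = binomial)

# The same AIC that summary(model_demo) prints at the bottom.
aic_value <- AIC(model_demo)

# Coefficient names show which variables the "." expanded to.
coef_names <- names(coef(model_demo))
```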
Let's now think about the variables in our dataset related to the confidence of the time signature, key and tempo (timesignature_confidence, key_confidence, and tempo_confidence). Our model seems to indicate that these confidence variables are significant (rather than the variables timesignature, key and tempo themselves). What does the model suggest?
Options:
The lower our confidence about time signature, key and tempo, the more likely the song is to be in the Top 10
The higher our confidence about time signature, key and tempo, the more likely the song is to be in the Top 10
Answer: The higher our confidence about time signature, key and tempo, the more likely the song is to be in the Top 10
If you look at the output summary(model), where model is the name of your logistic regression model, you can see that the coefficient estimates for the confidence variables (timesignature_confidence, key_confidence, and tempo_confidence) are positive. This means that higher confidence leads to a higher predicted probability of a Top 10 hit.
In general, if the confidence is low for the time signature, tempo, and key, then the song is more likely to be complex. What does Model 1 suggest in terms of complexity?
Options:
Mainstream listeners tend to prefer more complex songs
Mainstream listeners tend to prefer less complex songs
Answer: Mainstream listeners tend to prefer less complex songs
Since the coefficient values for timesignature_confidence, tempo_confidence, and key_confidence are all positive, lower confidence leads to a lower predicted probability of a song being a hit. So mainstream listeners tend to prefer less complex songs.
Songs with heavier instrumentation tend to be louder (have higher values in the variable "loudness") and more energetic (have higher values in the variable "energy").
By inspecting the coefficient of the variable "loudness", what does Model 1 suggest?
Options:
Mainstream listeners prefer songs with heavy instrumentation
Mainstream listeners prefer songs with light instrumentation
Answer: Mainstream listeners prefer songs with heavy instrumentation
By inspecting the coefficient of the variable "energy", do we draw the same conclusions as above?
Answer: No
The coefficient estimate for loudness is positive, meaning that mainstream listeners prefer louder songs, which are those with heavier instrumentation. However, the coefficient estimate for energy is negative, meaning that mainstream listeners prefer songs that are less energetic, which are those with light instrumentation. These coefficients lead us to different conclusions.
What is the correlation between the variables "loudness" and "energy" in the training set?
Answer: 0.7399067
The correlation can be computed with the following command:
cor(SongsTrain$loudness, SongsTrain$energy)
Given that these two variables are highly correlated, Model 1 suffers from multicollinearity. To avoid this issue, we will omit one of these two variables and rerun the logistic regression. In the rest of this problem, we'll build two variations of our original model: Model 2, in which we keep "energy" and omit "loudness", and Model 3, in which we keep "loudness" and omit "energy".
Create Model 2, which is Model 1 without the independent variable "loudness". This can be done with the following command:
SongsLog2 = glm(Top10 ~ . - loudness, data=SongsTrain, family=binomial)
We just subtracted the variable loudness. We couldn't do this with the variables "songtitle" and "artistname", because they are not numeric variables, and we might get different values in the test set that the training set has never seen. But this approach (subtracting the variable from the model formula) will always work when you want to remove numeric variables.
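The check-then-drop workflow above can be sketched end to end on simulated data. Everything below (variable values, coefficients, seed) is invented for illustration; only the pattern of cor() followed by a "~ . - variable" formula mirrors the text.

```r
# Simulated data in which energy is built to track loudness closely.
set.seed(2)
n <- 300
loudness <- rnorm(n)
energy   <- 0.8 * loudness + rnorm(n, sd = 0.5)
tempo    <- rnorm(n)
Top10    <- rbinom(n, size = 1, prob = plogis(-1 + 0.7 * loudness))
train_demo <- data.frame(loudness, energy, tempo, Top10)

# Step 1: measure how strongly the two variables move together.
r <- cor(train_demo$loudness, train_demo$energy)

# Step 2: refit twice, each time subtracting one of the correlated variables.
model_no_loudness <- glm(Top10 ~ . - loudness, data = train_demo, family = binomial)
model_no_energy   <- glm(Top10 ~ . - energy,   data = train_demo, family = binomial)

# The dropped variable no longer appears among the coefficients.
print(names(coef(model_no_loudness)))
print(names(coef(model_no_energy)))
```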
Look at the summary of SongsLog2, and inspect the coefficient of the variable "energy". What do you observe?
Options:
Model 2 suggests that songs with high energy levels tend to be more popular. This contradicts our observation in Model 1.
Model 2 suggests that, similarly to Model 1, songs with low energy levels tend to be more popular.
Answer: Model 2 suggests that songs with high energy levels tend to be more popular. This contradicts our observation in Model 1.
The coefficient estimate for energy is positive in Model 2, suggesting that songs with higher energy levels tend to be more popular. However, note that the variable energy is not significant in this model.
Now, create Model 3, which should be exactly like Model 1, but without the variable "energy".
Look at the summary of Model 3 and inspect the coefficient of the variable "loudness". Remembering that higher loudness and energy both occur in songs with heavier instrumentation, do we make the same observation about the popularity of heavy instrumentation as we did with Model 2?
Answer: Yes
Model 3 can be created with the following command:
SongsLog3 = glm(Top10 ~ . - energy, data=SongsTrain, family=binomial)
Looking at the output of summary(SongsLog3), we can see that loudness has a positive coefficient estimate, meaning that our model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion we got from Model 2.
In the remainder of this problem, we'll just use Model 3.
Make predictions on the test set using Model 3. What is the accuracy of Model 3 on the test set, using a threshold of 0.45? (Compute the accuracy as a number between 0 and 1.)
You can make predictions on the test set by using the command:
testPredict = predict(SongsLog3, newdata=SongsTest, type="response")
Then, you can create a confusion matrix with a threshold of 0.45 by using the command:
table(SongsTest$Top10, testPredict >= 0.45)
The accuracy of the model is [...]
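As a worked illustration of turning a confusion matrix into an accuracy, the sketch below uses ten invented outcomes and predicted probabilities (not the assignment's test set). It also computes the accuracy of always predicting the most frequent outcome, which is a common point of comparison.

```r
# Invented outcomes and predicted probabilities for ten songs.
actual         <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 0)
predicted_prob <- c(0.10, 0.20, 0.50, 0.05, 0.30, 0.40, 0.80, 0.30, 0.60, 0.15)
threshold      <- 0.45

# Classify as a hit when the predicted probability reaches the threshold.
predicted_class <- predicted_prob >= threshold

# Rows are actual outcomes, columns are predicted classes.
confusion <- table(actual, predicted_class)
print(confusion)

# Accuracy = correct predictions (the diagonal) over all predictions.
accuracy_from_table <- sum(diag(confusion)) / sum(confusion)

# Equivalent form that does not rely on the table being 2x2.
accuracy_direct <- mean(predicted_class == (actual == 1))

# Accuracy of always predicting the most frequent outcome.
baseline_accuracy <- max(table(actual)) / length(actual)
```

The diagonal shortcut only works when both predicted classes actually occur; the mean() form is safer if every prediction falls on one side of the threshold.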
Let's check if there's any incremental benefit in using Model 3 instead of a baseline model. Given the difficulty of guessing which song is going to be a hit, an easier model would be to pick the most frequent outcome (a song is not a Top 10 hit) for all songs. What would the accuracy of the baseline model be on the test set? (Give your answer as a number between 0 and 1.)
You can compute the baseline accuracy by tabling the outcome variable in the test set:
table(SongsTest$Top10)
The baseline model would get 314 observations correct, and 59 [...]
[...] Northeast, South, or West).
Conservativeness: Self-described level of conservativeness of interviewee, from 1 (very liberal) to 5 (very conservative).
Info.On.Internet: Number of the following items this interviewee believes to be available on the Internet for others to see: (1) Their email address; (2) Their home address; (3) Their home phone number; (4) Their cell phone number; (5) The employer/company they work for; (6) Their political party or political affiliation; (7) Things they've written that have their name on it; (8) A photo of them; [...]
Anonymity.Possible: A binary variable indicating if the interviewee thinks it's possible to use the Internet anonymously, meaning in such a way that online activities can't be traced back to them (equals 1 if he/she believes you can, and equals 0 if he/she believes you can't).
Tried.Masking.Identity: A binary variable indicating if the interviewee has ever tried to mask his/her identity when using the Internet (equals 1 if he/she has tried to mask his/her identity, and equals 0 if he/she has not tried to mask his/her identity).
Privacy.Laws.Effective: A binary variable indicating if the interviewee believes United States law provides reasonable privacy protection for Internet users (equals 1 if he/she believes it does, and equals 0 if he/she believes it doesn't).
Using read.csv(), load the dataset from AnonymityPoll.csv (https://courses.edx.org/c4x/MITx/15.071x_2/asset/AnonymityPoll.csv) into a data frame called poll and summarize it with the summary() and str() functions.
How many people participated in the poll?
Answer: 1002
The number of people who took the poll is equal to the number of rows of the data frame, and can be obtained with nrow(poll) or from the output of str(poll).
Let's look at the breakdown of the number of people with smartphones using the table() and summary() commands on the Smartphone variable. (HINT: These three numbers should sum to 1002.)
How many interviewees responded that they use a smartphone?
Answer: 487
How many interviewees responded that they don't use a smartphone?
Answer: 472
How many interviewees did not respond to the question, resulting in a missing value, or NA, in the summary() output?
Answer: 43
From the output of table(poll$Smartphone), we can read that 487 interviewees use a smartphone and 472 do not. From the summary(poll$Smartphone) output, we see that another 43 had missing values. As a sanity check, 487+472+43=1002, the total number of interviewees.
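By default table() silently leaves out missing values, which is why summary() was needed for the third number. As a small sketch on an invented vector (smartphone_demo is a hypothetical name), the useNA argument makes table() report all three counts at once, and the same sanity check can be written in code.

```r
# Invented responses: 1 = uses a smartphone, 0 = does not, NA = no answer.
smartphone_demo <- c(1, 0, 1, NA, 0, 1, NA, 1)

# Default table() omits the NA responses.
print(table(smartphone_demo))

# useNA = "ifany" adds a column for missing values when there are any.
print(table(smartphone_demo, useNA = "ifany"))

# The three counts computed directly.
n_yes     <- sum(smartphone_demo == 1, na.rm = TRUE)
n_no      <- sum(smartphone_demo == 0, na.rm = TRUE)
n_missing <- sum(is.na(smartphone_demo))

# Sanity check: the three counts must add up to the number of responses.
total <- n_yes + n_no + n_missing
```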
By using the table() function on two variables, we can tell how they are related. To use the table() function on two variables, just put the two variable names inside the parentheses, separated by a comma (don't forget to add poll$ before each variable name). In the output, the possible values of the first variable will be listed on the left, and the possible values of the second variable will be listed on the top. Each entry of the table counts the number of observations in the data set that have the value of the first variable in that row, and the value of the second variable in that column. For example, suppose we want to create a table of the variables "Sex" and "Region". We would type
table(poll$Sex, poll$Region)
in our R Console, and we would get as output
        Midwest Northeast South West
Female  12[...]
[...]
Answer: Texas
From table(poll$State, poll$Region), we can identify the census region of a particular state by looking at the region associated with all its interviewees. We can read that Colorado is in the West region, Kentucky is in the South region, Pennsylvania is in the Northeast region, but the other three states are all in the Midwest region. From the same chart we can read that Texas is the state in the South region with the largest number of interviewees, 72.
Another way to approach these problems would have been to subset the data frame and then use table on the limited data frame. For instance, to find which states are in the Midwest region we could have used:
MidwestInterviewees = subset(poll, Region=="Midwest")
table(MidwestInterviewees$State)
and to find the number of interviewees from each South region state we could have used:
SouthInterviewees = subset(poll, Region=="South")
table(SouthInterviewees$State)
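Both approaches described above, the two-way table and the subset followed by a one-way table, can be compared on a toy example. The rows below are invented (poll_demo is a hypothetical stand-in for poll), and which.max() is added to pick out the largest state automatically.

```r
# Invented interviewees; not taken from AnonymityPoll.csv.
poll_demo <- data.frame(
  State  = c("Texas", "Texas", "Ohio", "Florida", "Ohio", "Texas"),
  Region = c("South", "South", "Midwest", "South", "Midwest", "South"),
  stringsAsFactors = FALSE
)

# Approach 1: two-way table, states in rows and regions in columns.
state_by_region <- table(poll_demo$State, poll_demo$Region)
print(state_by_region)

# Approach 2: restrict to one region first, then count states.
south        <- subset(poll_demo, Region == "South")
south_counts <- table(south$State)
print(south_counts)

# The state with the most interviewees in that region.
top_south_state <- names(which.max(south_counts))
```

If State were stored as a factor, the subset would still carry the unused levels and show them with a count of 0; here the columns are plain character vectors, so only the South states appear.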
As mentioned in the introduction to this problem, many of the response variables (Info.On.Internet, Worry.About.Info, Privacy.Importance, Anonymity.Possible, and Tried.Masking.Identity) were not collected if an interviewee does not use the Internet or a smartphone, meaning the variables will have missing values for these interviewees.
How many interviewees reported not having used the Internet and not having used a smartphone?
Answer: 186
How many interviewees reported having used the Internet and having used a smartphone?
Answer: 470
How many interviewees reported having used the Internet but not having used a smartphone?
Answer: 285
How many interviewees reported having used a smartphone but not having used the Internet?
Answer: 17
These four values can be read from table(poll$Internet.Use, poll$Smartphone)
How many interviewees have a missing value for their Internet use?
Answer: 1
How many interviewees have a missing value for their smartphone use?
Answer: 43
The number of missing values can be read from summary(poll)
PROBLEM 2.3 - INTERNET AND SMARTPHONE USERS
Use the subset function to obtain a data frame called "limited", which is limited to interviewees who reported Internet use or who reported smartphone use. In lecture, we used the & symbol to use two criteria to make a subset of the data. To only take observations that have a certain value in one variable or the other, the | character can be used in place of the & symbol. This is also called a logical "or" operation.
How many interviewees are in the new data frame?
Answer: 792
The new data frame can be constructed with:
limited = subset(poll, Internet.Use == 1 | Smartphone == 1)
The number of rows can be computed with nrow(limited).
Important: For all remaining questions in this assignment please use the limited data frame you created in Problem 2.3.
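The difference between | and & is easiest to see when some answers are missing. The sketch below uses six invented interviewees (poll_demo is a hypothetical name) and relies on the documented behaviour that subset() drops rows whose condition evaluates to NA.

```r
# Invented responses, including missing values.
poll_demo <- data.frame(
  Internet.Use = c(1, 0, 1, 0, NA, 0),
  Smartphone   = c(1, 1, 0, 0, 1, NA)
)

# Logical "or": keep a row if either condition is TRUE.
# Row 5 is kept (NA | TRUE is TRUE); row 6 is dropped (FALSE | NA is NA).
limited_demo <- subset(poll_demo, Internet.Use == 1 | Smartphone == 1)
n_or <- nrow(limited_demo)

# Logical "and": keep a row only if both conditions are TRUE.
both_demo <- subset(poll_demo, Internet.Use == 1 & Smartphone == 1)
n_and <- nrow(both_demo)
```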
PROBLEM 3.1 - SUMMARIZING OPINIONS ABOUT
INTERNET PRIVACY
Which variables have missing values in the limited data frame? (Select all that apply.)
Options: Internet.Use, Smartphone, Sex, Age, State, Region, Conservativeness, Info.On.Internet, Worry.About.Info, Privacy.Importance, Anonymity.Possible, Tried.Masking.Identity, Privacy.Laws.Effective
Answer: Smartphone, Age, Conservativeness, Worry.About.Info, Privacy.Importance, Anonymity.Possible, Tried.Masking.Identity, Privacy.Laws.Effective
EXPLANATION
You can read the number of missing values for each variable from summary(limited)
What is the average number of pieces of personal information on the Internet, according to the Info.On.Internet variable?
Answer: 3.795455
This can be obtained with mean(limited$Info.On.Internet) or summary(limited$Info.On.Internet)
How many interviewees reported a value of 0 for Info.On.Internet?
How many interviewees reported the maximum value of 11 for Info.On.Internet?
EXPLANATION
These can be read from table(limited$Info.On.Internet)
What proportion of interviewees who answered the Worry.About.Info question worry about how much information is available about them on the Internet? Note that to compute this proportion you will be dividing by the number of people who answered the Worry.About.Info question, not the total number of people in the data frame.
From table(limited$Worry.About.Info), we see that 386 of interviewees worry about their info, and 404 do not. Therefore, there were 386+404=790 [...]
Note that we did not divide by 792 [...]
[...]
This can be computed with the command table(limited$Tried.Masking.Identity). The output tells us that of all the respondents who answered the Tried.Masking.Identity question, 128 out of (128+656) have tried masking their identity on the internet.
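The point about dividing by the number of respondents rather than by the number of rows can be checked on a small invented vector (tried_demo is a hypothetical name). For a 0/1 variable, mean() with na.rm = TRUE gives the same proportion as the counts from table().

```r
# Invented 0/1 answers with two non-responses.
tried_demo <- c(1, 0, 0, NA, 1, 0, 0, NA, 0, 0)

# table() ignores NA, so its total is the number of people who answered.
counts <- table(tried_demo)
prop_from_table <- counts[["1"]] / sum(counts)

# For 0/1 data the mean of the non-missing values is the same proportion.
prop_from_mean <- mean(tried_demo, na.rm = TRUE)

# Dividing by all rows (including non-responses) gives a different, smaller number.
prop_over_all_rows <- sum(tried_demo == 1, na.rm = TRUE) / length(tried_demo)
```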
What proportion of interviewees who answered the Privacy.Laws.Effective question find United States privacy laws effective?
Answer: 0.2558459
We can find this number with the command table(limited$Privacy.Laws.Effective). The output tells us that 186 out of (186+541) people who answered the Privacy.Laws.Effective question find US privacy laws effective.
Often, we are interested in whether certain characteristics of interviewees (e.g. their age or political opinions) affect their opinions on the topic of the poll (in this case, opinions on privacy). In this section, we will investigate the relationship between the characteristics Age and Smartphone and outcome variables Info.On.Internet and Tried.Masking.Identity, again using the limited data frame we built in an earlier section of this problem.
Build a histogram of the age of interviewees. What is the best represented age group in the population?
Options: People aged about 20 years old; People aged about 40 years old; People aged about 60 years old; People aged about 80 years old
Answer: People aged about 60 years old
From hist(limited$Age), we see the histogram peaks at around 60 years old.
Both Age and Info.On.Internet are variables that take on many values, so a good way to observe their relationship is through a graph. We learned in lecture that we can plot Age against Info.On.Internet with the command plot(limited$Age, limited$Info.On.Internet). However, because Info.On.Internet takes on a small number of values, multiple points can be plotted in exactly the same location on this graph.
What is the largest number of interviewees that have exactly the same value in their Age variable AND the same value in their Info.On.Internet variable? In other words, what is the largest number of overlapping points in the plot plot(limited$Age, limited$Info.On.Internet)? (HINT: Use the table function to compare the number of observations with different values of Age and Info.On.Internet.)
Answer: 6
By reviewing the output of table(limited$Age, limited$Info.On.Internet), we can see that there are 6 interviewees with age 53 and Info.On.Internet value 0, with age 60 and Info.On.Internet value 0, and with age 60 and Info.On.Internet value 1.
A more efficient way to have obtained the maximum number would have been to run max(table(limited$Age, limited$Info.On.Internet))
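As a small sketch of the max(table(...)) shortcut on invented data (the age and info vectors below are made up), the largest cell of the two-way table is the largest number of overlapping points, and which() with arr.ind = TRUE shows which combinations reach it.

```r
# Invented ages and Info.On.Internet values for six interviewees.
age  <- c(30, 30, 30, 45, 45, 60)
info <- c(0, 0, 1, 2, 2, 2)

# Each cell counts interviewees sharing the same (age, info) pair.
overlap <- table(age, info)
print(overlap)

# The largest cell is the largest number of points plotted on top of each other.
max_overlap <- max(overlap)

# Row and column positions of every cell that reaches the maximum.
max_cells   <- which(overlap == max_overlap, arr.ind = TRUE)
n_max_cells <- nrow(max_cells)
```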
To avoid points covering each other up, we can use the jitter() function on the values we pass to the plot function. Experimenting with the command jitter(c(1, 2, 3)), what appears to be the functionality of the jitter command?
Options:
jitter randomly reorders the values passed to it, and two runs will yield the same result
jitter randomly reorders the values passed to it, and two runs will yield different results
jitter adds or subtracts a small amount of random noise to the values passed to it, and two runs will yield the same result
jitter adds or subtracts a small amount of random noise to the values passed to it, and two runs will yield different results
Answer: jitter adds or subtracts a small amount of random noise to the values passed to it, and two runs will yield different results
By running the command jitter(c(1, 2, 3)) multiple times, we can see that the jitter function randomly adds or subtracts a small value from each number, and two runs will yield different results.
Now, plot Age against Info.On.Internet with plot(jitter(limited$Age), jitter(limited$Info.On.Internet)). What relationship do you observe between Age and Info.On.Internet?
Options:
Older age seems strongly associated with a larger value for Info.On.Internet
Older age seems moderately associated with a larger value for Info.On.Internet
Older age does not seem associated with a change in the value of Info.On.Internet
Older age seems moderately associated with a smaller value for Info.On.Internet
Older age seems strongly associated with a smaller value for Info.On.Internet
Answer: Older age seems moderately associated with a smaller value for Info.On.Internet
For younger people aged 18-30, the average value of Info.On.Internet appears to be roughly 5, while most people aged 60 and older have a value less than 5. Therefore, older age appears to be associated with a smaller value of Info.On.Internet, but from the spread of dots on the image, it's clear the association is not particularly strong.
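The behaviour of jitter() described above can be checked with a short experiment. The seed value below is an arbitrary choice for illustration; fixing it with set.seed() is one way to make a jittered plot reproducible.

```r
x <- c(1, 2, 3)

# Two consecutive calls draw different random noise.
set.seed(42)
run1 <- jitter(x)
run2 <- jitter(x)
print(run1)
print(run2)

# Resetting the seed reproduces the first run exactly.
set.seed(42)
run1_again <- jitter(x)

# The noise is small relative to the spacing of the values.
max_shift <- max(abs(run1 - x))
```

Because the shifts are small, jittered points stay close to their true positions while no longer landing exactly on top of each other.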
Use the tapply() function to obtain the summary of the Info.On.Internet value, broken down by whether an interviewee is a smartphone user.
What is the average Info.On.Internet value for smartphone users?
Answer: 4.367556
What is the average Info.On.Internet value for non-smartphone users?
Answer: 2.922807
The proper application of tapply here is:
tapply(limited$Info.On.Internet, limited$Smartphone, summary)
We can read the average for non-smartphone users from the summary output labeled with "0" and the average for smartphone users from the summary output labeled with "1".
Similarly use tapply to break down the Tried.Masking.Identity variable for smartphone and non-smartphone users.
What proportion of smartphone users who answered the Tried.Masking.Identity question have tried masking their identity when using the Internet?
What proportion of non-smartphone users who answered the Tried.Masking.Identity question have tried masking their identity when using the Internet?
We can get the breakdown for smartphone and non-smartphone users with:
tapply(limited$Tried.Masking.Identity, limited$Smartphone, table)
Among smartphone users, [...]
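The two tapply() breakdowns above can be sketched on a tiny invented data frame (limited_demo is a hypothetical stand-in for limited). Passing mean instead of summary or table returns the group averages and proportions directly, which is an alternative to reading them off the printed output.

```r
# Invented rows; the last interviewee did not answer the smartphone question.
limited_demo <- data.frame(
  Smartphone             = c(1, 1, 1, 0, 0, NA),
  Info.On.Internet       = c(5, 4, 6, 2, 3, 4),
  Tried.Masking.Identity = c(1, 0, NA, 0, 0, 1)
)

# Average Info.On.Internet per group; rows with a missing group are left out.
avg_info <- tapply(limited_demo$Info.On.Internet, limited_demo$Smartphone, mean)
print(avg_info)

# Counts of 0/1 answers within each group, as in the text.
masking_tables <- tapply(limited_demo$Tried.Masking.Identity,
                         limited_demo$Smartphone, table)
print(masking_tables)

# Proportion who tried masking among those who answered, per group.
# Extra arguments such as na.rm are passed on to the function.
masking_prop <- tapply(limited_demo$Tried.Masking.Identity,
                       limited_demo$Smartphone, mean, na.rm = TRUE)
```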
Next week, we will begin to more formally characterize how an outcome variable like Info.On.Internet can be predicted with a variable like Age or Smartphone.