R assignment


  • 7/24/2019 R assignment

    POPULARITY OF MUSIC RECORDS

    The music industry has a well-developed market with a global annual revenue around $15 billion. The recording industry is highly competitive and is dominated by three big production companies which make up nearly 82% of the total annual album sales.

    Artists are at the core of the music industry and record labels provide them with the necessary resources to sell their music on a large scale. A record label incurs numerous costs (studio recording, marketing, distribution, and touring) in exchange for a percentage of the profits from album sales, singles and concert tickets.

    Unfortunately, the success of an artist's release is highly uncertain: a single may be extremely popular, resulting in widespread radio play and digital downloads, while another single may turn out quite unpopular, and therefore unprofitable.

    Knowing the competitive nature of the recording industry, record labels face the fundamental decision problem of which musical releases to support to maximize their financial success.

    How can we use analytics to predict the popularity of a song? In this assignment, we challenge ourselves to predict whether a song will reach a spot in the Top 10 of the Billboard Hot 100 Chart.

    Taking an analytics approach, we aim to use information about a song's properties to predict its popularity. The dataset songs.csv consists of all songs which made it to the Top 10 of the Billboard Hot 100 Chart from 1[...]

    timesignature and timesignature_confidence = a variable estimating the time signature of the song, and the confidence in the estimate

    loudness = a continuous variable indicating the average amplitude of the audio in decibels

    tempo and tempo_confidence = a variable indicating the estimated beats per minute of the song, and the confidence in the estimate

    key and key_confidence = a variable with twelve levels indicating the estimated key of the song (C, C#, . . ., B), and the confidence in the estimate

    energy = a variable that represents the overall acoustic energy of the song, using a mix of features such as loudness

    pitch = a continuous variable that indicates the pitch of the song

    timbre_0_min, timbre_0_max, timbre_1_min, timbre_1_max, . . . , timbre_11_min, and timbre_11_max = variables that indicate the minimum/maximum values over all segments for each of the twelve values in the timbre vector (resulting in 24 continuous variables)

    Top10 = a binary variable indicating whether or not the song made it to the Top 10 of the Billboard Hot 100 Chart (1 if it was in the top 10, and 0 if it was not)

    Use the read.csv function to load the dataset "songs.csv" into R.

    How many observations (songs) are from the year 2010?

    Answer: 373

    First, navigate to the directory on your computer containing the file "songs.csv". You can load the dataset by using the command:

    songs = read.csv("songs.csv")

    Then, you can count the number of songs from 2010 by using the table function:

    table(songs$year)
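The load-and-count step can be tried without the real file. The sketch below is only an illustration: the inline CSV text and the name songs_demo are made up for this example and are not the assignment data.

```r
# Toy stand-in for songs.csv (hypothetical rows, not the real dataset)
csv_text <- "year,songtitle,Top10
2009,Song A,1
2010,Song B,0
2010,Song C,1
2010,Song D,0"

# read.csv() usually takes a file path; text= parses an inline string instead
songs_demo <- read.csv(text = csv_text)

# table() counts how many rows carry each distinct year
year_counts <- table(songs_demo$year)
print(year_counts)

# Pull out one year; the names of a table are character strings
n_2010 <- as.integer(year_counts["2010"])
print(n_2010)
```

With the real data, the same lookup would be applied to table(songs$year).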

    How many songs does the dataset include for which the artist name is "Michael Jackson"?

    Answer: 18

    If you look at the structure of the dataset by typing str(songs), you can see that there are 1032 different values of the variable "artistname". So if we create a table of artistname, it will be challenging to find Michael Jackson. Instead, we can use subset:

    MichaelJackson = subset(songs, artistname == "Michael Jackson")

    Then, by typing str(MichaelJackson) or nrow(MichaelJackson), we can see that there are 18 observations.
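As a small sketch of the same subset-then-count pattern, assuming a made-up data frame (songs_demo and the artist names below are hypothetical):

```r
# Hypothetical miniature of the songs data frame
songs_demo <- data.frame(
  songtitle  = c("Song A", "Song B", "Song C", "Song D"),
  artistname = c("Artist X", "Artist Y", "Artist X", "Artist Z"),
  Top10      = c(1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Keep only the rows whose artistname matches exactly
artist_x <- subset(songs_demo, artistname == "Artist X")

# Number of matching songs
n_artist_x <- nrow(artist_x)
print(n_artist_x)

# Show title and outcome side by side for the matching rows
print(artist_x[c("songtitle", "Top10")])
```

If only the count is needed, sum(songs_demo$artistname == "Artist X") gives the same number without building the subset.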

    Which of these songs by Michael Jackson made it to the Top 10? Select all that apply.

    Options: Beat It; You Rock My World; Billie Jean; You Are Not Alone

    Correct answer: You Rock My World, You Are Not Alone

    We can answer this question by using our subset MichaelJackson from the previous question. If you output the vector MichaelJackson$songtitle, you can see the row number of each of the songs. Then, you can see whether or not that song made it to the top 10 by outputting the value of Top10 for that row. For example, Beat It is the 13th song in our subset. So then if we type:

    MichaelJackson$Top10[13]

    we get 0, which means that this song did not make it to the Top 10.

    The song "You Rock My World" is first on the list, so if we type:

    MichaelJackson$Top10[1]

    we get 1, which means that this song did make it to the Top 10.

    As a shortcut, you could just output:

    MichaelJackson[c("songtitle", "Top10")]

    The variable corresponding to the estimated time signature (timesignature) is discrete, meaning that it only takes integer values (0, 1, 2, 3, . . . ). What are the values of this variable that occur in our dataset? Select all that apply.

    Options: 0; 1; 2; 3; 4; 5; 6; 7; 8

    Correct answer: 0, 1, 3, 4, 5, 7

    Which timesignature value is the most frequent among songs in our dataset?

    Options: 0; 1; 2; 3; 4; 5; 6; 7; 8

    Correct answer: 4

    You can answer these questions by using the table command:

    table(songs$timesignature)

    The only values that appear in the table for timesignature are 0, 1, 3, 4, 5, and 7. We can also read from the table that 6787 songs have a value of 4 for the timesignature, which is the highest count out of all of the possible timesignature values.

    Out of all of the songs in our dataset, the song with the highest tempo is one of the following songs. Which one is it?

    Options: Until The Day I Die; Wanna Be Startin' Somethin'; My Happy Ending; You Make Me Wanna...

    Correct answer: Wanna Be Startin' Somethin'

    You can answer this question by using the which.max function. The output of which.max(songs$tempo) is 6206, meaning that the song with the highest tempo is the row 6206. We can output the song title by typing:

    songs$songtitle[6206]

    The song title is: Wanna be Startin' Somethin'.
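A minimal sketch of the which.max lookup on made-up values (songs_demo and its tempos are hypothetical):

```r
# Hypothetical tempos for three songs
songs_demo <- data.frame(
  songtitle = c("Song A", "Song B", "Song C"),
  tempo     = c(96.5, 181.2, 120.0),
  stringsAsFactors = FALSE
)

# which.max() returns the position (row index) of the largest value,
# not the value itself; with ties it returns the first position
fastest_row <- which.max(songs_demo$tempo)

# Use that index to look up the matching title
fastest_title <- songs_demo$songtitle[fastest_row]
print(fastest_row)
print(fastest_title)
```

Feeding the index straight into the lookup, as in songs_demo$songtitle[which.max(songs_demo$tempo)], avoids copying the row number by hand.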

    We wish to predict whether or not a song will make it to the Top 10. To do this, first use the subset function to split the data into a training set "SongsTrain" consisting of all the observations up to and including 200[...]

    [...] Notice that the . is used in place of enumerating all the independent variables. (Also, keep in mind that you can choose to put quotes around binomial, or leave out the quotes. R can understand this argument either way.)

    However, in our case, we want to exclude some of the variables in our dataset from being used as independent variables ("year", "songtitle", "artistname", "songID", and "artistID"). To do this, we can use the following trick. First define a vector of variable names called nonvars; these are the variables that we won't use in our model.

    nonvars = c("year", "songtitle", "artistname", "songID", "artistID")

    To remove these variables from your training and testing sets, type the following commands in your R console:

    SongsTrain = SongsTrain[ , !(names(SongsTrain) %in% nonvars) ]

    SongsTest = SongsTest[ , !(names(SongsTest) %in% nonvars) ]

    Now, use the glm function to build a logistic regression model to predict Top10 using all of the other variables as the independent variables. You should use SongsTrain to build the model.

    Looking at the summary of your model, what is the value of the Akaike Information Criterion (AIC)?

    Answer: 4827.2

    To answer this question, you first need to run the three given commands to remove the variables that we won't use in the model from the datasets:

    nonvars = c("year", "songtitle", "artistname", "songID", "artistID")

    SongsTrain = SongsTrain[ , !(names(SongsTrain) %in% nonvars) ]

    SongsTest = SongsTest[ , !(names(SongsTest) %in% nonvars) ]
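The column-dropping trick can be checked on a tiny data frame. This is only a sketch: train_demo and its values are invented, and the nonvars vector is shortened to the columns that exist in the toy data.

```r
# Hypothetical training data with two columns we do not want in the model
train_demo <- data.frame(
  year      = c(2008, 2009),
  songtitle = c("Song A", "Song B"),
  loudness  = c(-7.1, -5.3),
  Top10     = c(0, 1),
  stringsAsFactors = FALSE
)

nonvars <- c("year", "songtitle")

# names(...) %in% nonvars is TRUE for the columns to drop;
# ! flips it, so keep is TRUE for the columns to retain
keep <- !(names(train_demo) %in% nonvars)
print(keep)

# Leaving the row slot empty selects all rows, keep selects the columns
train_demo <- train_demo[, keep]
print(names(train_demo))
```

One design note: if only a single column survived, this indexing would return a plain vector; adding drop = FALSE keeps a data frame in that case.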

    Then, you can create the logistic regression model with the following command:

    SongsLog1 = glm(Top10 ~ ., data=SongsTrain, family=binomial)

    Looking at the bottom of the summary(SongsLog1) output, we can see that the AIC value is 4827.2.
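To see where the AIC figure comes from, here is a sketch on simulated data. Everything in it (demo, x1, x2, hit, the coefficients used to simulate) is an assumption made for illustration and has nothing to do with the song data.

```r
set.seed(1)
n <- 200

# Simulated predictors and a binary outcome that depends on x1 only
demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
demo$hit <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * demo$x1))

# "hit ~ ." means: use every other column of demo as a predictor
fit <- glm(hit ~ ., data = demo, family = binomial)

# summary() prints coefficients, significance codes and, at the bottom, the AIC
print(summary(fit))

# The same AIC value can be extracted directly
model_aic <- AIC(fit)
print(model_aic)
```

For a 0/1 outcome the AIC equals the residual deviance plus twice the number of estimated coefficients, which is why adding uninformative variables tends to raise it.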

    Let's now think about the variables in our dataset related to the confidence of the time signature, key and tempo (timesignature_confidence, key_confidence, and tempo_confidence). Our model seems to indicate that these confidence variables are significant (rather than the variables timesignature, key and tempo themselves). What does the model suggest?

    Options: The lower our confidence about time signature, key and tempo, the more likely the song is to be in the Top 10; The higher our confidence about time signature, key and tempo, the more likely the song is to be in the Top 10

    Correct answer: The higher our confidence about time signature, key and tempo, the more likely the song is to be in the Top 10

    If you look at the output summary(model), where model is the name of your logistic regression model, you can see that the coefficient estimates for the confidence variables (timesignature_confidence, key_confidence, and tempo_confidence) are positive. This means that higher confidence leads to a higher predicted probability of a Top 10 hit.

    In general, if the confidence is low for the time signature, tempo, and key, then the song is more likely to be complex. What does Model 1 suggest in terms of complexity?

    Options: Mainstream listeners tend to prefer more complex songs; Mainstream listeners tend to prefer less complex songs

    Correct answer: Mainstream listeners tend to prefer less complex songs

    Since the coefficient values for timesignature_confidence, tempo_confidence, and key_confidence are all positive, lower confidence leads to a lower predicted probability of a song being a hit. So mainstream listeners tend to prefer less complex songs.

    Songs with heavier instrumentation tend to be louder (have higher values in the variable "loudness") and more energetic (have higher values in the variable "energy").

    By inspecting the coefficient of the variable "loudness", what does Model 1 suggest?

    Options: Mainstream listeners prefer songs with heavy instrumentation; Mainstream listeners prefer songs with light instrumentation

    Correct answer: Mainstream listeners prefer songs with heavy instrumentation

    By inspecting the coefficient of the variable "energy", do we draw the same conclusions as above?

    Correct answer: No

    The coefficient estimate for loudness is positive, meaning that mainstream listeners prefer louder songs, which are those with heavier instrumentation. However, the coefficient estimate for energy is negative, meaning that mainstream listeners prefer songs that are less energetic, which are those with light instrumentation. These coefficients lead us to different conclusions!

    What is the correlation between the variables "loudness" and "energy" in the training set?

    Answer: 0.7399067

    The correlation can be computed with the following command:

    cor(SongsTrain$loudness, SongsTrain$energy)

    Given that these two variables are highly correlated, Model 1 suffers from multicollinearity. To avoid this issue, we will omit one of these two variables and rerun the logistic regression. In the rest of this problem, we'll build two variations of our original model: Model 2, in which we keep "energy" and omit "loudness", and Model 3, in which we keep "loudness" and omit "energy".

    Create Model 2, which is Model 1 without the independent variable "loudness". This can be done with the following command:

    SongsLog2 = glm(Top10 ~ . - loudness, data=SongsTrain, family=binomial)

    We just subtracted the variable loudness. We couldn't do this with the variables "songtitle" and "artistname", because they are not numeric variables, and we might get different values in the test set that the training set has never seen. But this approach (subtracting the variable from the model formula) will always work when you want to remove numeric variables.
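The two steps above (checking the correlation, then subtracting one of the correlated variables in the formula) can be sketched on simulated data. The data frame demo, the column other and the outcome hit are made up; only the column names loudness and energy mirror the text.

```r
set.seed(2)
n <- 300

# Simulate two strongly related predictors plus one unrelated predictor
loud <- rnorm(n)
demo <- data.frame(
  loudness = loud,
  energy   = loud + rnorm(n, sd = 0.5),
  other    = rnorm(n)
)
demo$hit <- rbinom(n, 1, plogis(0.8 * demo$other))

# A correlation close to 1 signals potential multicollinearity
r <- cor(demo$loudness, demo$energy)
print(r)

# ". - loudness" means: all other columns except loudness
fit_no_loud <- glm(hit ~ . - loudness, data = demo, family = binomial)
kept_terms <- names(coef(fit_no_loud))
print(kept_terms)
```

The coefficient list shows that energy stays in the model while loudness is gone, which is the effect of the subtraction in the formula.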

    Look at the summary of SongsLog2, and inspect the coefficient of the variable "energy". What do you observe?

    Options: Model 2 suggests that songs with high energy levels tend to be more popular. This contradicts our observation in Model 1; Model 2 suggests that, similarly to Model 1, songs with low energy levels tend to be more popular.

    Correct answer: Model 2 suggests that songs with high energy levels tend to be more popular. This contradicts our observation in Model 1.

    The coefficient estimate for energy is positive in Model 2, suggesting that songs with higher energy levels tend to be more popular. However, note that the variable energy is not significant in this model.

    Now, create Model 3, which should be exactly like Model 1, but without the variable "energy".

    Look at the summary of Model 3 and inspect the coefficient of the variable "loudness". Remembering that higher loudness and energy both occur in songs with heavier instrumentation, do we make the same observation about the popularity of heavy instrumentation as we did with Model 2?

    Correct answer: Yes

    Model 3 can be created with the following command:

    SongsLog3 = glm(Top10 ~ . - energy, data=SongsTrain, family=binomial)

    Looking at the output of summary(SongsLog3), we can see that loudness has a positive coefficient estimate, meaning that our model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion we got from Model 2.

    In the remainder of this problem, we'll just use Model 3.

    Make predictions on the test set using Model 3. What is the accuracy of Model 3 on the test set, using a threshold of 0.45? (Compute the accuracy as a number between 0 and 1.)

    You can make predictions on the test set by using the command:

    testPredict = predict(SongsLog3, newdata=SongsTest, type="response")

    Then, you can create a confusion matrix with a threshold of 0.45 by using the command:

    table(SongsTest$Top10, testPredict >= 0.45)

    The accuracy of the model is ([...]
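As a sketch of how accuracy falls out of the confusion matrix, the block below uses invented outcomes and predicted probabilities (actual and pred_prob are hypothetical, not model output). It also computes, for comparison, the accuracy of always predicting the most frequent class.

```r
# Hypothetical true outcomes and predicted probabilities for 8 songs
actual    <- c(0, 0, 0, 1, 1, 0, 1, 0)
pred_prob <- c(0.10, 0.50, 0.30, 0.80, 0.40, 0.20, 0.90, 0.05)

# Classify as a hit when the probability reaches the threshold
threshold <- 0.45
predicted <- pred_prob >= threshold

# Rows: actual class; columns: predicted class (FALSE / TRUE)
conf_mat <- table(actual, predicted)
print(conf_mat)

# Correct predictions sit on the diagonal; this assumes both classes
# appear among the predictions so that the table is 2 x 2
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)

# Baseline: always predict the most frequent actual class
baseline <- max(table(actual)) / length(actual)
print(baseline)
```

Here 6 of the 8 toy predictions are correct, against 5 of 8 for the baseline.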

    Let's check if there's any incremental benefit in using Model 3 instead of a baseline model. Given the difficulty of guessing which song is going to be a hit, an easier model would be to pick the most frequent outcome (a song is not a Top 10 hit) for all songs. What would the accuracy of the baseline model be on the test set? (Give your answer as a number between 0 and 1.)

    You can compute the baseline accuracy by tabling the outcome variable in the test set:

    table(SongsTest$Top10)

    The baseline model would get 314 observations correct, and 5[...]

    [...] Northeast, South, or West).

    Conservativeness: Self-described level of conservativeness of interviewee, from 1 (very liberal) to 5 (very conservative).

    Info.On.Internet: Number of the following items this interviewee believes to be available on the Internet for others to see: (1) Their email address; (2) Their home address; (3) Their home phone number; (4) Their cell phone number; (5) The employer/company they work for; (6) Their political party or political affiliation; (7) Things they've written that have their name on it; (8) A photo of them; ([...]

    Anonymity.Possible: A binary variable indicating if the interviewee thinks it's possible to use the Internet anonymously, meaning in such a way that online activities can't be traced back to them (equals 1 if he/she believes you can, and equals 0 if he/she believes you can't).

    Tried.Masking.Identity: A binary variable indicating if the interviewee has ever tried to mask his/her identity when using the Internet (equals 1 if he/she has tried to mask his/her identity, and equals 0 if he/she has not tried to mask his/her identity).

    Privacy.Laws.Effective: A binary variable indicating if the interviewee believes United States law provides reasonable privacy protection for Internet users (equals 1 if he/she believes it does, and equals 0 if he/she believes it doesn't).

    Using read.csv(), load the dataset from AnonymityPoll.csv (https://courses.edx.org/c4x/MITx/15.071x_2/asset/AnonymityPoll.csv) into a data frame called poll and summarize it with the summary() and str() functions.

    How many people participated in the poll?

    Answer: 1002

    The number of people who took the poll is equal to the number of rows of the data frame, and can be obtained with nrow(poll) or from the output of str(poll).

    Let's look at the breakdown of the number of people with smartphones using the table() and summary() commands on the Smartphone variable. (HINT: These three numbers should sum to 1002.)

    How many interviewees responded that they use a smartphone?

    Answer: 487

    How many interviewees responded that they don't use a smartphone?

    Answer: 472

    How many interviewees did not respond to the question, resulting in a missing value, or NA, in the summary() output?

    Answer: 43

    From the output of table(poll$Smartphone), we can read that 487 interviewees use a smartphone and 472 do not. From the summary(poll$Smartphone) output, we see that another 43 had missing values. As a sanity check, 487+472+43=1002, the total number of interviewees.
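The reason two commands are needed is that table() leaves out missing values by default. A short sketch with a made-up vector (smartphone below is hypothetical) shows the counts and the sanity check:

```r
# Hypothetical 0/1 answers with two non-responses
smartphone <- c(1, 0, 1, NA, 0, 1, NA)

# By default table() silently drops NA values
counts <- table(smartphone)
print(counts)

# useNA = "ifany" adds an NA column when missing values exist
counts_na <- table(smartphone, useNA = "ifany")
print(counts_na)

# Count the missing values directly
n_missing <- sum(is.na(smartphone))

# Sanity check: users + non-users + missing = total
total <- sum(counts) + n_missing
print(total)
```

The useNA argument is an alternative to reading the NA count from summary(); both routes should agree.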

    By using the table() function on two variables, we can tell how they are related. To use the table() function on two variables, just put the two variable names inside the parentheses, separated by a comma (don't forget to add poll$ before each variable name). In the output, the possible values of the first variable will be listed on the left, and the possible values of the second variable will be listed on the top. Each entry of the table counts the number of observations in the data set that have the value of the first variable in that row, and the value of the second variable in that column. For example, suppose we want to create a table of the variables "Sex" and "Region". We would type

    table(poll$Sex, poll$Region)

    in our R Console, and we would get as output

    Midwest Northeast South West

    Female 12[...]

    Correct answer: Texas

    From table(poll$State, poll$Region), we can identify the census region of a particular state by looking at the region associated with all its interviewees. We can read that Colorado is in the West region, Kentucky is in the South region, Pennsylvania is in the Northeast region, but the other three states are all in the Midwest region. From the same chart we can read that Texas is the state in the South region with the largest number of interviewees, 72.

    Another way to approach these problems would have been to subset the data frame and then use table on the limited data frame. For instance, to find which states are in the Midwest region we could have used:

    MidwestInterviewees = subset(poll, Region=="Midwest")

    table(MidwestInterviewees$State)

    and to find the number of interviewees from each South region state we could have used:

    SouthInterviewees = subset(poll, Region=="South")

    table(SouthInterviewees$State)

    As mentioned in the introduction to this problem, many of the response variables (Info.On.Internet, Worry.About.Info, Privacy.Importance, Anonymity.Possible, and Tried.Masking.Identity) were not collected if an interviewee does not use the Internet or a smartphone, meaning the variables will have missing values for these interviewees.

    How many interviewees reported not having used the Internet and not having used a smartphone?

    Answer: 186

    How many interviewees reported having used the Internet and having used a smartphone?

    Answer: 470

    How many interviewees reported having used the Internet but not having used a smartphone?

    Answer: 285

    How many interviewees reported having used a smartphone but not having used the Internet?

    Answer: 17

    These four values can be read from table(poll$Internet.Use, poll$Smartphone)

    How many interviewees have a missing value for their Internet use?

    Answer: 1

    How many interviewees have a missing value for their smartphone use?

    Answer: 43

    The number of missing values can be read from summary(poll)

    PROBLEM 2.3 - INTERNET AND SMARTPHONE USERS

    Use the subset function to obtain a data frame called "limited", which is limited to interviewees who reported Internet use or who reported smartphone use. In lecture, we used the & symbol to use two criteria to make a subset of the data. To only take observations that have a certain value in one variable or the other, the | character can be used in place of the & symbol. This is also called a logical "or" operation.

    How many interviewees are in the new data frame?

    Answer: 792

    The new data frame can be constructed with:

    limited = subset(poll, Internet.Use == 1 | Smartphone == 1)

    The number of rows can be computed with nrow(limited).
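A sketch of the difference between | and & inside subset(), on a made-up data frame (poll_demo and its values are hypothetical). It also shows how rows with missing answers are treated, which matters for this poll.

```r
# Hypothetical answers; NA marks a missing response
poll_demo <- data.frame(
  Internet.Use = c(1, 0, 0, 1, NA, 0),
  Smartphone   = c(0, 1, 0, 1, 1, NA)
)

# Logical OR: keep rows where at least one condition is TRUE.
# NA | TRUE is TRUE, so row 5 is kept; FALSE | NA is NA, and
# subset() drops rows whose condition evaluates to NA (row 6).
either <- subset(poll_demo, Internet.Use == 1 | Smartphone == 1)
n_either <- nrow(either)
print(n_either)

# Logical AND: keep rows where both conditions are TRUE
both <- subset(poll_demo, Internet.Use == 1 & Smartphone == 1)
n_both <- nrow(both)
print(n_both)
```

In the toy data, four rows satisfy the "or" condition and one row satisfies the "and" condition.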

    Important: For all remaining questions in this assignment please use the limited data frame you created in Problem 2.3.

    PROBLEM 3.1 - SUMMARIZING OPINIONS ABOUT INTERNET PRIVACY

    Which variables have missing values in the limited data frame? (Select all that apply.)

    Options: Internet.Use; Smartphone; Sex; Age; State; Region; Conservativeness; Info.On.Internet; Worry.About.Info; Privacy.Importance; Anonymity.Possible; Tried.Masking.Identity; Privacy.Laws.Effective

    Correct answer: Smartphone, Age, Conservativeness, Worry.About.Info, Privacy.Importance, Anonymity.Possible, Tried.Masking.Identity, Privacy.Laws.Effective

    EXPLANATION

    You can read the number of missing values for each variable from summary(limited)

    What is the average number of pieces of personal information on the Internet, according to the Info.On.Internet variable?

    Answer: 3.795455

    This can be obtained with mean(limited$Info.On.Internet) or summary(limited$Info.On.Internet)

    How many interviewees reported a value of 0 for Info.On.Internet?

    How many interviewees reported the maximum value of 11 for Info.On.Internet?

    EXPLANATION

    These can be read from table(limited$Info.On.Internet)

    What proportion of interviewees who answered the Worry.About.Info question worry about how much information is available about them on the Internet? Note that to compute this proportion you will be dividing by the number of people who answered the Worry.About.Info question, not the total number of people in the data frame.

    From table(limited$Worry.About.Info), we see that 386 interviewees worry about their info, and 404 do not. Therefore, there were 386+404=790 [...]

    Note that we did not divide by 7[...]

    This can be computed with the command table(limited$Tried.Masking.Identity). The output tells us that of all the respondents who answered the Tried.Masking.Identity question, 128 out of (128+656) have tried masking their identity on the internet.

    What proportion of interviewees who answered the Privacy.Laws.Effective question find United States privacy laws effective?

    Answer: 0.2558459

    We can find this number with the command table(limited$Privacy.Laws.Effective). The output tells us that 186 out of (186+541) people who answered the Privacy.Laws.Effective question find US privacy laws effective.
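The "divide by those who answered" rule can be sketched on a made-up 0/1 vector (effective below is hypothetical). Because table() drops NA values, its total is already the number of respondents who answered.

```r
# Hypothetical 0/1 answers with two non-responses
effective <- c(1, 0, 0, NA, 1, 0, 0, NA, 0, 0)

# Counts among those who answered (NA is excluded by table())
counts <- table(effective)
print(counts)

# Proportion answering 1, relative to those who answered
prop_manual <- as.numeric(counts["1"]) / sum(counts)
print(prop_manual)

# For a 0/1 variable the mean with NA removed gives the same proportion
prop_mean <- mean(effective, na.rm = TRUE)
print(prop_mean)

# prop.table() converts every count to a proportion at once
print(prop.table(counts))
```

Dividing by length(effective) instead would count the non-respondents in the denominator and understate the proportion.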

    Often, we are interested in whether certain characteristics of interviewees (e.g. their age or political opinions) affect their opinions on the topic of the poll (in this case, opinions on privacy). In this section, we will investigate the relationship between the characteristics Age and Smartphone and outcome variables Info.On.Internet and Tried.Masking.Identity, again using the limited data frame we built in an earlier section of this problem.

    Build a histogram of the age of interviewees. What is the best represented age group in the population?

    Options: People aged about 20 years old; People aged about 40 years old; People aged about 60 years old; People aged about 80 years old

    Correct answer: People aged about 60 years old

    From hist(limited$Age), we see the histogram peaks at around 60 years old.

    Both Age and Info.On.Internet are variables that take on many values, so a good way to observe their relationship is through a graph. We learned in lecture that we can plot Age against Info.On.Internet with the command plot(limited$Age, limited$Info.On.Internet). However, because Info.On.Internet takes on a small number of values, multiple points can be plotted in exactly the same location on this graph.

    What is the largest number of interviewees that have exactly the same value in their Age variable AND the same value in their Info.On.Internet variable? In other words, what is the largest number of overlapping points in the plot plot(limited$Age, limited$Info.On.Internet)? (HINT: Use the table function to compare the number of observations with different values of Age and Info.On.Internet.)

    Answer: 6

    By reviewing the output of table(limited$Age, limited$Info.On.Internet), we can see that there are 6 interviewees with age 53 and Info.On.Internet value 0, 6 interviewees with age 60 and Info.On.Internet value 0, and 6 interviewees with age 60 and Info.On.Internet value 1.

    A more efficient way to have obtained the maximum number would have been to run max(table(limited$Age, limited$Info.On.Internet))
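A sketch of the max(table(...)) shortcut on made-up vectors (age and info below are hypothetical), including one way to see which combination produces the maximum:

```r
# Hypothetical ages and Info.On.Internet values for six interviewees
age  <- c(30, 30, 45, 45, 45, 60)
info <- c(2, 2, 0, 0, 0, 5)

# Two-way table: one cell per (age, info) combination
tab <- table(age, info)
print(tab)

# Largest cell = largest number of exactly overlapping points
largest <- max(tab)
print(largest)

# Long format makes it easy to see which combination that is
cells <- as.data.frame(tab)
top <- subset(cells, Freq == max(Freq))
print(top)
```

In the toy data the largest cell holds 3 interviewees, all with age 45 and value 0.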

    To avoid points covering each other up, we can use the jitter() function on the values we pass to the plot function. Experimenting with the command jitter(c(1, 2, 3)), what appears to be the functionality of the jitter command?

    Options: jitter randomly reorders the values passed to it, and two runs will yield the same result; jitter randomly reorders the values passed to it, and two runs will yield different results; jitter adds or subtracts a small amount of random noise to the values passed to it, and two runs will yield the same result; jitter adds or subtracts a small amount of random noise to the values passed to it, and two runs will yield different results

    Correct answer: jitter adds or subtracts a small amount of random noise to the values passed to it, and two runs will yield different results

    By running the command jitter(c(1, 2, 3)) multiple times, we can see that the jitter function randomly adds or subtracts a small value from each number, and two runs will yield different results.

    Now, plot Age against Info.On.Internet with plot(jitter(limited$Age), jitter(limited$Info.On.Internet)). What relationship do you observe between Age and Info.On.Internet?

    Options: Older age seems strongly associated with a larger value for Info.On.Internet; Older age seems moderately associated with a larger value for Info.On.Internet; Older age does not seem associated with a change in the value of Info.On.Internet; Older age seems moderately associated with a smaller value for Info.On.Internet; Older age seems strongly associated with a smaller value for Info.On.Internet

    Correct answer: Older age seems moderately associated with a smaller value for Info.On.Internet

    For younger people aged 18-30, the average value of Info.On.Internet appears to be roughly 5, while most people aged 60 and older have a value less than 5. Therefore, older age appears to be associated with a smaller value of Info.On.Internet, but from the spread of dots on the image, it's clear the association is not particularly strong.
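The jitter() behaviour described in this problem can be checked with a short sketch. The seeds below are arbitrary choices made only so the example is repeatable; without set.seed(), every call gives different noise.

```r
x <- c(1, 2, 3)

# Fixing the random seed makes a run repeatable
set.seed(10)
a <- jitter(x)

# A different seed stands in for "a second run"
set.seed(11)
b <- jitter(x)

# Same seed again reproduces the first result exactly
set.seed(10)
a_again <- jitter(x)

print(a)
print(b)

# With default arguments the noise for this input stays within
# one fifth of the smallest gap between values, i.e. within 0.2
max_shift <- max(abs(a - x))
print(max_shift)
```

The values keep their order and stay close to the originals, which is why jittering separates overlapping points in a plot without distorting the overall pattern.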

    Use the tapply() function to obtain the summary of the Info.On.Internet value, broken down by whether an interviewee is a smartphone user.

    What is the average Info.On.Internet value for smartphone users?

    Answer: 4.367556

    What is the average Info.On.Internet value for non-smartphone users?

    Answer: 2.922807

    The proper application of tapply here is:

    tapply(limited$Info.On.Internet, limited$Smartphone, summary)

    We can read the average for non-smartphone users from the summary output labeled with "0" and the average for smartphone users from the summary output labeled with "1".
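A sketch of tapply() on made-up vectors (info and phone are hypothetical). It uses mean instead of summary so that the group averages come back as plain numbers.

```r
# Hypothetical Info.On.Internet values and smartphone flags
info  <- c(5, 3, 4, 1, 2, NA, 6)
phone <- c(1, 1, 1, 0, 0, 0, NA)

# tapply(values, groups, function): apply the function within each group.
# Extra arguments (here na.rm = TRUE) are passed on to the function.
# Observations whose group is NA are left out.
means <- tapply(info, phone, mean, na.rm = TRUE)
print(means)

# Results are labeled by group value: "0" and "1"
mean_users     <- unname(means["1"])
mean_non_users <- unname(means["0"])
print(mean_users)
print(mean_non_users)
```

Swapping mean for summary or table gives the fuller per-group breakdowns used in the text.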

    Similarly use tapply to break down the Tried.Masking.Identity variable for smartphone and non-smartphone users.

    What proportion of smartphone users who answered the Tried.Masking.Identity question have tried masking their identity when using the Internet?

    What proportion of non-smartphone users who answered the Tried.Masking.Identity question have tried masking their identity when using the Internet?

    We can get the breakdown for smartphone and non-smartphone users with:

    tapply(limited$Tried.Masking.Identity, limited$Smartphone, table)

    Among smartphone users, [...]

    Next week, we will begin to more formally characterize how an outcome variable like Info.On.Internet can be predicted with a variable like Age or Smartphone.