Auditing Algorithms : Towards Transparency in the Age of ...

Auditing Algorithms : Towards Transparency in the Age of Big Data Christo Wilson Assistant Professor @ Northeastern University [email protected]

Transcript of Auditing Algorithms : Towards Transparency in the Age of ...

Page 1: Auditing Algorithms : Towards Transparency in the Age of ...

Auditing Algorithms: Towards Transparency in the Age of Big Data

Christo Wilson, Assistant Professor @ Northeastern University [email protected]

Page 2: Auditing Algorithms : Towards Transparency in the Age of ...

Personalization on the Web
Santa Barbara, California / Amherst, Massachusetts

Page 3: Auditing Algorithms : Towards Transparency in the Age of ...

Personalization is Ubiquitous
• Search Results
• Goods and Services
• Music, Movies, Media
• Social Media

Page 4: Auditing Algorithms : Towards Transparency in the Age of ...

Dangers of Personalization?

Page 5: Auditing Algorithms : Towards Transparency in the Age of ...

Racial Discrimination

Chris Wilson
(Ad) Looking for Chris Wilson? Find People Near You! www.yellowpages.com

Trevon Jones
(Ad) Trevon Jones, Arrested? Search Criminal Records, Sex Offender Registry, and More. www.instantcheckmate.com

Racial bias in Google’s AdSense system uncovered by Latanya Sweeney in 2013

Example of unintended consequences of big data
• People exhibit racial bias in their search and click patterns
• The ad-placement algorithm observed and learned these behaviors

Page 6: Auditing Algorithms : Towards Transparency in the Age of ...

Price Discrimination
Showing users different prices
• In econ: differential pricing

Example: Amazon in 2001
• DVDs were sold for $3-4 more to some users

Surprisingly, not illegal in the US
• Anti-Discrimination Act does not protect consumers

Article 20(2) of the Services Directive protects EU residents
• But companies seem to be flouting the regulation :(

Headline: Websites Vary Prices, Deals Based on Users’ Information

Page 7: Auditing Algorithms : Towards Transparency in the Age of ...

Price Steering
Altering the order or composition of products
• E.g. high priced items rank higher for some people

Example: Orbitz in 2012
• Users received hotels in a different order when searching
• Normal users: cheap hotels first; Mac users: expensive hotels first

Headline: On Orbitz, Mac Users Steered to Pricier Hotels

Page 8: Auditing Algorithms : Towards Transparency in the Age of ...

Auditing Algorithms
Governments and regulators are concerned about big data and algorithms
• White House reports: "Big Data: Seizing Opportunities, Preserving Values" and "Big Data and Differential Pricing"
• FTC’s new Office of Technology Research and Investigation, tasked with monitoring the applications of big data and algorithms

How do we measure and understand algorithms?
• Algorithms may be trade secrets, constantly changing
• Access to source code is not enough, data is equally important

Emerging scientific area: Auditing Algorithms

Page 9: Auditing Algorithms : Towards Transparency in the Age of ...

Goals of Our Work

1. Understanding how companies collect and share data about users
• Online and offline retailers
• Advertisers and marketers
• Data brokers like Acxiom, Datalogix, Equifax, Experian, etc…

2. Reverse-engineering online algorithms to assess their impact
• Search engines
• Online advertisements
• E-commerce
• Social networks
• etc…

Page 10: Auditing Algorithms : Towards Transparency in the Age of ...

Measuring Personalization
Case Study: Google Search
Case Study: E-commerce

Page 11: Auditing Algorithms : Towards Transparency in the Age of ...

Measuring Personalization
Case Study: E-commerce

Page 12: Auditing Algorithms : Towards Transparency in the Age of ...

Are All Differences Personalization?

[Figure: two result pages list the same four placeholder products (Product 1 to Product 4) in different orders and are compared side by side. Personalization?]

Not necessarily! It could be:
• Updates to inventory/prices
• Tax/Shipping differences
• Distributed infrastructure
• Load-balancing

How can we reliably identify and quantify personalization?

Page 13: Auditing Algorithms : Towards Transparency in the Age of ...

Controlling for Noise

[Figure: clients at 129.10.115.14, 129.10.115.15, and 129.10.115.16 query the server at 74.125.225.67 and receive lists of placeholder products; a difference between the lists is labeled "Noise".]

• Queries run at the same time
• Same Amazon IP address
• IP addresses in the same /24

Difference – Noise = Personalization
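The rule "Difference – Noise = Personalization" can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and measuring a difference as the fraction of ranks whose result changed is an assumption made for the example.

```python
from itertools import zip_longest


def changed_fraction(results_a, results_b):
    """Fraction of ranks at which two result lists disagree (assumed metric)."""
    pairs = list(zip_longest(results_a, results_b))
    if not pairs:
        return 0.0
    changed = sum(1 for a, b in pairs if a != b)
    return changed / len(pairs)


def personalization(treatment, control_a, control_b):
    """Difference minus noise, floored at zero.

    noise      = disagreement between two identical control clients
    difference = disagreement between the treatment and one control
    """
    noise = changed_fraction(control_a, control_b)
    difference = changed_fraction(treatment, control_a)
    return max(0.0, difference - noise)
```

Two controls that already disagree on half of the ranks set a noise floor of 0.5, so a treatment that disagrees with the control at every rank is credited with 0.5 personalization, not 1.0.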

Page 14: Auditing Algorithms : Towards Transparency in the Age of ...

Dual Methodology

REAL USER ACCOUNTS
• Leverage real user accounts with lots of history
• Measure personalization in real life

SYNTHETIC USER ACCOUNTS
• Create accounts that each vary by one feature
• Measure the impact of specific features

Questions we want to answer:
1. To what extent is content personalized?
2. What user features drive personalization?

Page 15: Auditing Algorithms : Towards Transparency in the Age of ...

Real User Experiment

Task on Amazon Mechanical Turk (AMT)
• 1000s of participants
• Each executed hundreds of search queries

Every query paired with two control queries
• Run from empty accounts, i.e. no history
• Baseline results for comparison

[Figure: an HTTP proxy sends each user query together with two control queries.]

Page 16: Auditing Algorithms : Towards Transparency in the Age of ...

Measuring Personalization
Case Study: Google Search
Case Study: E-commerce

Page 17: Auditing Algorithms : Towards Transparency in the Age of ...

Results from Real Users

[Figure: Results Changed (%), on a 0 to 50 scale, versus Search Result Rank (1 to 10). Two series: Control/Control and Real User/Control.]

• Difference between results is personalization
• Top ranks are less personalized
• Lower ranks are more personalized
• On average, real users have a 12% higher chance of differing than the controls
• Most changes are due to location
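The per-rank comparison plotted on this slide could be computed along the following lines. This is a hedged sketch: `percent_changed_by_rank` is a hypothetical helper, and the input format (a list of result-list pairs, one pair per query) is an assumption.

```python
def percent_changed_by_rank(pairs, depth=10):
    """For each rank, the percentage of query pairs whose result differs.

    `pairs` is a list of (results_a, results_b) tuples, one per query.
    Ranks missing from either list are skipped instead of counted as changed.
    """
    changed = [0] * depth
    total = [0] * depth
    for results_a, results_b in pairs:
        for rank in range(depth):
            if rank < len(results_a) and rank < len(results_b):
                total[rank] += 1
                if results_a[rank] != results_b[rank]:
                    changed[rank] += 1
    return [100.0 * c / t if t else 0.0 for c, t in zip(changed, total)]
```

Running it once over Control/Control pairs and once over Real User/Control pairs gives the two series; the gap between them at each rank is the part attributed to personalization.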

Page 18: Auditing Algorithms : Towards Transparency in the Age of ...

What Causes Personalization?

AMT results reveal extensive personalization
Next question: what user features drive this?

Static Features
• Gender
• Age
• Browser
• Operating System
• Location (IP Address)
• Logged In/Out

Historical Features
• Logged In/Out
• History of Searches
• History of Search Result Clicks
• Browsing History

Methodology: use synthetic (fake) accounts

Page 19: Auditing Algorithms : Towards Transparency in the Age of ...

Logged In/Out to Google

[Figure, two panels over Days 1 to 7: Average Jaccard Index (0.5 to 1) and Average Edit Distance (0 to 5). Series: No Cookies / No Cookies, Logged In / No Cookies, Logged Out / No Cookies.]

Same results… But in a different order
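The two metrics on this slide can be sketched as follows. The Jaccard index ignores order, while edit distance is sensitive to it, which is how "same results, different order" shows up as a high Jaccard index with a non-zero edit distance. The slides do not say which edit-distance variant was used; plain Levenshtein distance over result lists is an assumption here.

```python
def jaccard_index(results_a, results_b):
    """Size of the intersection divided by size of the union (order-insensitive)."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty pages are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(results_a, results_b):
    """Levenshtein distance between two result lists (order-sensitive)."""
    prev = list(range(len(results_b) + 1))
    for i, a in enumerate(results_a, start=1):
        curr = [i]
        for j, b in enumerate(results_b, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # delete a
                            curr[j - 1] + 1,      # insert b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

For example, `["u1", "u2", "u3"]` and `["u2", "u1", "u3"]` contain the same URLs, so their Jaccard index is 1.0, but the swapped pair gives an edit distance of 2 under this variant.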

Page 20: Auditing Algorithms : Towards Transparency in the Age of ...

IP Address Geolocation

[Figure, two panels over Days 1 to 7: Jaccard Index (0.5 to 1) and Average Edit Distance (0 to 5). Series: MA / MA, CA / MA, UT / MA, IL / MA, NC / MA.]

On average, 1 different result
… Plus 1 pair of reordered results

Page 21: Auditing Algorithms : Towards Transparency in the Age of ...

What About Search History?
• Search for ‘healthcare’
• Search for ‘obama,’ then ‘healthcare’

Subsequent queries may “carry-over”

Page 22: Auditing Algorithms : Towards Transparency in the Age of ...

Impact of Search History

[Figure: Average Jaccard Index (0 to 1) versus Time Between Queries (0 to 20 minutes). Overlap in results, searching for ‘healthcare’ and ‘obama’ + ‘healthcare’. Annotation: 10 minute cutoff.]

Page 23: Auditing Algorithms : Towards Transparency in the Age of ...

Measuring Personalization
Case Study: Google Search
Case Study: E-commerce

Page 24: Auditing Algorithms : Towards Transparency in the Age of ...

Measuring Personalization
Case Study: E-commerce

Page 25: Auditing Algorithms : Towards Transparency in the Age of ...

Targeted Retailers

10 general retailers
• Best Buy, CDW, Home Depot, JCPenney, Macy’s, NewEgg, Office Depot, Sears, Staples, Walmart
• Focus on products returned by searches, 20 search terms/site

6 travel sites (hotels & car rental)
• CheapTickets, Expedia, Hotels.com, Priceline, Orbitz, Travelocity

Page 26: Auditing Algorithms : Towards Transparency in the Age of ...

Do Users See the Same Prices for the Same Products?

Many sites show inconsistencies for real users
• Up to 3.6% of all products

[Figure: % of Products with Inconsistent Prices, per site, grouped into Retailers, Hotels, and Rental Cars.]
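One way to compute the plotted quantity is sketched below. It is an illustration under stated assumptions: `percent_inconsistent` is a hypothetical helper, prices are given as integer cents keyed by a product identifier, and only products seen by both the user and the control are compared.

```python
def percent_inconsistent(user_prices, control_prices):
    """Percentage of shared products whose price differs between two views.

    Both arguments map a product identifier to a price in integer cents.
    Products seen by only one side are ignored.
    """
    shared = user_prices.keys() & control_prices.keys()
    if not shared:
        return 0.0
    differing = sum(1 for p in shared if user_prices[p] != control_prices[p])
    return 100.0 * differing / len(shared)
```

Applying the same function to a pair of control runs gives a noise baseline to compare against.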

Page 27: Auditing Algorithms : Towards Transparency in the Age of ...

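How Much Money Are We Talking About?

[Figure: Difference in $ (0 to 1000), per site, showing the 5th percentile, 25th percentile, mean, 75th percentile, and 95th percentile, grouped into Retailers, Hotels, and Rental Cars.]

Inconsistencies can be $100s! (per day/night for hotels/cars)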
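The five statistics in this plot could be summarized as in the sketch below. The helper names are hypothetical, and the nearest-rank percentile definition is an assumption; the slides do not say which definition was used.

```python
from statistics import mean


def nearest_rank_percentile(sorted_values, p):
    """Nearest-rank percentile (assumed definition) of a sorted, non-empty list."""
    n = len(sorted_values)
    rank = (p * n + 99) // 100  # ceil(p * n / 100) using integer arithmetic
    return sorted_values[max(rank, 1) - 1]


def summarize_differences(differences):
    """5th, 25th, mean, 75th, and 95th of absolute price differences in dollars."""
    values = sorted(abs(d) for d in differences)
    if not values:
        raise ValueError("no price differences to summarize")
    return {
        "p5": nearest_rank_percentile(values, 5),
        "p25": nearest_rank_percentile(values, 25),
        "mean": mean(values),
        "p75": nearest_rank_percentile(values, 75),
        "p95": nearest_rank_percentile(values, 95),
    }
```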

Page 28: Auditing Algorithms : Towards Transparency in the Age of ...

What Features Trigger Personalization?

Methodology: use synthetic (fake) accounts
• Give them different features, look for personalization
• Each day for 1 month, run standard set of searches

Category     Feature    Tested Features
Account      Cookie     No Account, Logged In, No Cookies
User-Agent   OS         Win XP, Win 7, OS X, Linux
User-Agent   Browser    Chrome 33, Android Chrome 34, IE 8, Firefox 25, Safari 7, iOS Safari 6
History      Click      Big Spender, Low Spender
History      Purchase   Big Spender, Low Spender
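The idea of synthetic accounts that each differ from a baseline in exactly one feature can be sketched as below. The feature values come from the list above; the choice of baseline profile, the dictionary keys, and the function name are assumptions made for illustration.

```python
# Feature values as listed on the slide; the keys are hypothetical names.
TESTED_FEATURES = {
    "cookie": ["No Account", "Logged In", "No Cookies"],
    "os": ["Win XP", "Win 7", "OS X", "Linux"],
    "browser": ["Chrome 33", "Android Chrome 34", "IE 8",
                "Firefox 25", "Safari 7", "iOS Safari 6"],
}

# Assumed baseline profile; the slides do not state which one was used.
BASELINE = {"cookie": "No Account", "os": "Win 7", "browser": "Chrome 33"}


def single_feature_profiles(baseline, tested_features):
    """Return (feature, profile) pairs that differ from the baseline in one feature."""
    profiles = []
    for feature, values in tested_features.items():
        for value in values:
            if value == baseline[feature]:
                continue  # the baseline itself serves as the control
            profile = dict(baseline)
            profile[feature] = value
            profiles.append((feature, profile))
    return profiles
```

Because every profile changes only one feature, any difference from the baseline that exceeds the noise floor can be attributed to that feature.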

Page 29: Auditing Algorithms : Towards Transparency in the Age of ...

Home Depot
• Smartphone users see totally different products than desktop users
• 7% of products have different prices on Android
• … but the prices only go up by $0.50 on average

Page 30: Auditing Algorithms : Towards Transparency in the Age of ...

Travel Sites

Cheaptickets and Orbitz offer lower prices on hotels for users who log in to the sites
• 1 hotel per page, $12 off per night on average

Travelocity offers discounts on hotels for users on mobile devices
• 1 hotel per page, $15 off per night on average

Priceline changes the order of search results based on click and purchase history

Example of price steering
• 2 accounts click/reserve high price hotels
• 2 accounts click/reserve low price hotels
• 2 accounts do nothing

Page 31: Auditing Algorithms : Towards Transparency in the Age of ...

Cheaptickets/Orbitz

Page 32: Auditing Algorithms : Towards Transparency in the Age of ...

Cheaptickets/Orbitz

Cheaptickets and Orbitz offer lower prices on hotels for users who log in to the sites
• About 1 hotel per page has a lower price
• Prices drop by around $12 per night

[Figure: Avg. Price Difference ($).]

Page 33: Auditing Algorithms : Towards Transparency in the Age of ...

Travelocity

Travelocity offers discounts on hotels for users on mobile devices
• iOS users see different hotels
• About 1 hotel per page has a lower price
• Price drops by around $15/night

Page 34: Auditing Algorithms : Towards Transparency in the Age of ...

Priceline

Priceline changes the order of search results based on click and purchase history
• 2 accounts click/reserve high price hotels
• 2 accounts click/reserve low price hotels
• 2 accounts do nothing
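A simple way to check for price steering between account groups is to compare the prices at the top of each group's result page. The sketch below uses the mean price of the top-k results; that metric, the function names, and the default k are assumptions for illustration, not the measure used in the study.

```python
from statistics import mean


def mean_top_k_price(ranked_prices, k=5):
    """Mean price of the first k results on a page (prices in rank order)."""
    top = ranked_prices[:k]
    if not top:
        raise ValueError("result page is empty")
    return mean(top)


def steering_gap(treatment_prices, control_prices, k=5):
    """Positive when the treatment account sees pricier items at the top."""
    return (mean_top_k_price(treatment_prices, k)
            - mean_top_k_price(control_prices, k))
```

With the three groups above, the "do nothing" accounts would act as the control, and the high-price and low-price accounts would each be compared against them.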

Page 35: Auditing Algorithms : Towards Transparency in the Age of ...

Hotels.com/Expedia

Hotels.com and Expedia are conducting large-scale A/B tests on their users
• When you visit the site, you are randomly placed in a “bucket”
• 2 out of 3 buckets see high-price hotels at the top of search results
• The remaining bucket sees low-price hotels at the top of the page

Exemplifies price steering
• The only way to see the hidden hotel results is to clear your cookies and reload the site

Page 36: Auditing Algorithms : Towards Transparency in the Age of ...

Conclusions and Future Work

Page 37: Auditing Algorithms : Towards Transparency in the Age of ...

The Era of Big Data

Algorithms driven by big data shape your world
• Search results you are given
• Prices and products you are shown
• Movie, music, and book recommendations
• The directions you use to drive
• Eligibility for social services
• Access to credit and banking
• Allocation of police forces

In many cases, these systems are wonderful

In other cases, they may be detrimental
• Unintended consequences
• Intentional manipulation

Page 38: Auditing Algorithms : Towards Transparency in the Age of ...

Our Goal: Transparency

Personalization is problematic when it is not transparent
• How is data being collected and shared?
• How is data being used to alter content?

Use algorithm audits to investigate deployed systems, assess their impact

Our goal is to increase transparency
• Building tools to help users and regulators
• Reverse-engineering systems to understand how they work
• Raising public awareness of these issues

Page 39: Auditing Algorithms : Towards Transparency in the Age of ...

Peeking Beneath the Hood of Uber

Page 40: Auditing Algorithms : Towards Transparency in the Age of ...

Borders on Google Maps

Page 41: Auditing Algorithms : Towards Transparency in the Age of ...

Discrimination in the Gig-economy

Page 42: Auditing Algorithms : Towards Transparency in the Age of ...

All of our code, data, and papers are available at:

http://personalization.ccs.neu.edu