Internet Application Censorship: Studies of Weibo in China and … · 2020-01-17 · User timeline:...

Post on 13-Jul-2020

Internet Application Censorship: Studies of Weibo in China and Twitter in Turkey
Dan S. Wallach, Rice University

Today, two short studies
Weibo: at one point China’s largest social network.
Twitter: broadly used worldwide, particularly popular in the Middle East.
We’re looking at web applications, not at networks or the “Great Firewall of China”.

The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions

Tao Zhu, Independent Researcher
David Phipps, Bowdoin College
Adam Pridgen, Rice University
Jedidiah R. Crandall, University of New Mexico
Dan S. Wallach, Rice University

Microblogging sites in China
[Timeline figure: March 2006, July 2009, August 2009]
http://en.wikipedia.org/wiki/Microblogging_in_China

Sina Weibo
● 503 million registered users as of Dec 2012.
  o More than half are on mobile devices.
● About 100 million messages are posted each day on Sina Weibo.
● Promotes visibility of social issues.

http://en.wikipedia.org/wiki/Sina_Weibo

Weibo’s influence: Wukan incident - 2011

(The village name) vs (Neologism)

Sina Weibo

● Strict controls over the posts.

Introduction of our research
● Detecting a censorship event within 1-2 minutes of its occurrence.
● Identifying three strategies the Weibo system uses to target sensitive content quickly.
● Performing a topical analysis of the deleted posts.

Methodology
1. Identifying the sensitive user group
2. Crawling posts of sensitive user groups
3. Detecting deletions

Identifying the sensitive user group
● Use outdated sensitive keywords from China Digital Times.
● Start with 25 sensitive users.
● Sensitive group reaches 3,567 users after 15 days.
● More than 4,500 deletions daily.
[Figure: repost]

Crawling
● User timeline:
  o Weibo user timeline API returns the most recent 50 posts of the specified user.
  o Query 3,567 sensitive users once per minute
    §  100 accounts for API calls
    §  300 concurrent Tor circuits
  o Four-node cluster running Hadoop and HBase
    §  2.38 million posts from July 20 to September 8, 2012.

Detecting deletions
[Figure: a diff between our database and the latest 50 posts reveals the deleted post]

Detecting deletions
[Figure: timeline of checks t0, t1, t2, ..., tn]
The lifetime of a deleted post = tn - t0
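The diff step above can be sketched roughly as follows. This is a minimal illustration, not the authors' crawler: the `Post` type and the `find_missing` and `lifetime` helpers are hypothetical names, post IDs are assumed to grow with creation time, t0 is taken to be the post's creation time, and tn the time of the check that first finds it missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: int       # assumption: IDs increase with creation time
    created_at: float  # seconds since epoch; plays the role of t0


def find_missing(stored, latest):
    """Return stored posts that should appear in `latest` but do not."""
    if not latest:
        return []  # nothing to compare against in this sketch
    latest_ids = {p.post_id for p in latest}
    oldest_visible = min(latest_ids)
    # Posts older than the oldest returned post may simply have scrolled
    # out of the 50-post window, so they are not counted as deleted.
    return [p for p in stored
            if p.post_id >= oldest_visible and p.post_id not in latest_ids]


def lifetime(post, detected_at):
    """Lifetime of a deleted post: tn - t0."""
    return detected_at - post.created_at


stored = [Post(1, 0.0), Post(2, 60.0), Post(3, 120.0), Post(4, 180.0)]
latest = [Post(2, 60.0), Post(4, 180.0)]
missing = find_missing(stored, latest)
```

Here only post 3 is reported: post 1 is older than anything in the returned window, so the sketch cannot tell whether it was deleted or merely pushed out.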

Detecting deletions
● Permission-denied or system deletion
  o “Permission denied” error.
  o Caused by censorship events.
  o The post still exists but cannot be accessed by users.
● General deletion
  o “Post does not exist” error.
  o May be caused by user self-deletion or censorship events.
  o The post does not exist.
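The two deletion types can be told apart from the error returned when a missing post is requested. A small sketch, assuming the error text contains the phrases quoted on the slide; the `Deletion` enum and `classify_deletion` function are hypothetical names, and the real API's error format is not shown in the slides.

```python
from enum import Enum


class Deletion(Enum):
    PERMISSION_DENIED = "permission-denied (system) deletion"
    GENERAL = "general deletion"
    UNKNOWN = "unknown"


def classify_deletion(error_message: str) -> Deletion:
    """Map an error message to one of the two deletion types."""
    msg = error_message.lower()
    if "permission denied" in msg:
        # Post still exists but users cannot access it: censorship event.
        return Deletion.PERMISSION_DENIED
    if "does not exist" in msg:
        # Could be self-deletion by the user or a censorship event.
        return Deletion.GENERAL
    return Deletion.UNKNOWN
```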

Detecting deletions
[Chart: deleted share of the 2.38 million user timeline posts]
Permission-denied deletion: 4.5%
General deletion: 8.3%

Detecting deletions
● Permission-denied deletion or system deletion
  §  Around 1,500 permission-denied deletions.
  §  Compare with WeiboScope, which tracks around 300,000 users and sees no more than 100 permission-denied deletions daily.
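As a rough check on scale, the percentages can be turned into approximate post counts. The inputs are the rounded figures from the slides, so the results are estimates only; `estimated_counts` is a hypothetical helper.

```python
TOTAL_POSTS = 2_380_000  # "2.38 million user timeline posts"
RATES = {"permission-denied": 0.045, "general": 0.083}


def estimated_counts(total, rates):
    """Convert deletion rates into approximate numbers of posts."""
    return {kind: round(total * rate) for kind, rate in rates.items()}


counts = estimated_counts(TOTAL_POSTS, RATES)
```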

Distribution of deleted posts
[Figure panels: whole lifetime; first two hours]

Strategies to target sensitive content
1. Weibo has filtering mechanisms as a proactive, automated defense.
2. Weibo targets specific users, such as those who frequently post sensitive content.
3. When a sensitive post is found, a moderator will use automated searching tools to find all of its related reposts, and delete them all at once.

1. Keywords list filtering
● Weibo has filtering mechanisms as a proactive, automated defense
  o Explicit filtering
    “Sorry, The content violates the relevant laws and regulations. If need help, please contact customer service.”
  o Implicit filtering
    “Your post has been submitted successfully. Currently, there is a delay caused by server data synchronization. Please wait for 1 to 2 minutes. Thank you very much.”
  o Camouflaged posts
  o Surveillance keywords list?
    §  Without such a list, the cost would be too high
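A toy model of keyword-list filtering, to make the explicit/implicit distinction concrete. This is purely illustrative: Weibo's actual mechanism is not public, the two keyword lists and the `screen_post` function are hypothetical, and camouflaged posts are not modeled.

```python
from enum import Enum


class Verdict(Enum):
    REJECT = "explicit filtering: post refused with an error message"
    HOLD = "implicit filtering: post accepted but held back"
    PUBLISH = "published immediately"


def screen_post(text, blocked, watched):
    """Screen a post against two assumed keyword lists."""
    if any(keyword in text for keyword in blocked):
        return Verdict.REJECT
    if any(keyword in text for keyword in watched):
        return Verdict.HOLD
    return Verdict.PUBLISH


blocked = {"keyword-a"}  # placeholder terms, not real list entries
watched = {"keyword-b"}
```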

2. Targeting specific users
● Weibo targets specific users, such as those who frequently post sensitive content.

3. Finding all related reposts
●  When a sensitive post is found, a moderator can find all of its related reposts, and delete them all at once
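Finding every repost of a sensitive post amounts to walking the repost tree. A sketch under the assumption that repost relationships are available as a mapping from a post ID to the IDs of its direct reposts; `collect_reposts` and `reposts_of` are hypothetical names, and the moderators' real tooling is not described in the slides.

```python
from collections import deque


def collect_reposts(root_id, reposts_of):
    """Breadth-first walk over the repost tree rooted at root_id."""
    found = []
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in reposts_of.get(current, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


reposts_of = {"a": ["b", "c"], "b": ["d"]}
related = collect_reposts("a", reposts_of)
```

Deleting everything in `related` in one batch would produce the simultaneous deletions that the slide describes.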

Censors work in the night

Censors catch up in the morning

Conclusion
[Figure panels: whole lifetime; first two hours]

Known Unknowns: An Analysis of Twitter Censorship in #Turkey

Rima Tanash, Rice University
Zhouhan Chen, Rice University
Tanmay Thakur, University of Houston
Chris Bronk, University of Houston
Devika Subramanian, Rice University
Dan S. Wallach, Rice University

Disclaimer
•  Our research is focused on quantifying the extent of Twitter censorship in a given country.
•  We do not disclose private user information, or share Twitter datasets.

Motivation
•  Social media played a significant role during the recent wave of uprisings in the Middle East (a.k.a. Arab Spring)
   -  Demonstrations were orchestrated via Twitter & Facebook.
   -  Resulted in the overthrow of some Arab dictatorships.

Government censorship
•  Clearly, governments have great interest in controlling social media.

Government censorship in the news

Types of censorship
•  Network level:
   •  Block entire service.
   •  China blocks western social media.
   •  Turkey blocks Twitter, March 2014.
•  Application level:
   •  Service is not blocked, censorship is internal to the service
   •  Example: Sina Weibo (Zhu et al.) – keyword filtering, etc.
•  But what about application censorship in western social media?

Twitter censorship
•  January 2012, Twitter “Country-Withheld Content”
•  Twitter publishes censorship requests on ChillingEffects.org
•  Twitter publishes bi-annual transparency reports

Removal requests
https://transparency.twitter.com/removal-requests/2015/jan-jun

Removal requests USA

Removal requests Turkey

This map raises many questions
•  What’s up with Turkey?
•  Why are there 0 requests from Arabic-speaking countries such as Saudi Arabia and the UAE?
   •  They have a reputation for censorship.
   •  Twitter is very popular in these countries.

Are Twitter transparency reports complete?
•  “NOTE: The data in these reports is as accurate as possible, but may not be 100% comprehensive.”
•  Twitter does not post the notices to Chilling Effects when they “… are legally prohibited from doing so”

Prohibited by whom? And which laws?

Request sample from Chilling Effects
(Redactions by Twitter.)

Why Turkey?
•  It represents the darkest blob on the map.
•  Turkey banned Twitter, in its entirety, in 2014.

Research Questions
•  Can we confirm the number of withheld tweets reported for Turkey in the Transparency Reports?
•  Can we find unreported tweets in Turkey?
•  Can we extract and analyze topics being withheld?

Agenda
•  Methodology
   •  Validating censored tweets
   •  Finding interesting users
   •  Crawling
•  Findings
•  Topic analysis
•  Bypassing censorship
•  Future work

How to validate if tweets are censored in Turkey?
•  View tweets from inside Turkey and check if they are invisible.
   •  Use a free Turkish proxy ✗
   •  Use the PlanetLab network ✗
•  While analyzing a sample of known censored tweets, we observed a special field in the tweet structure, called:

   "id_str" : "51707756700291...
   "in_reply_to_user_id":null,
   "favorited":false,
   "withheld_in_countries":["TR"]
   ...

•  If this field is reliable, then we can crawl from home!

Validating our observation
•  Tor browser bundle with ExitNode {TR}.
•  Result:
   •  We confirmed that all tweets that contained the "withheld_in_countries" field were indeed invisible in Turkey.
   •  Therefore, we can crawl from the USA. ✓
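Checking the field is straightforward once a tweet has been parsed from JSON. A minimal sketch: the helper names are hypothetical and the sample tweet is made up for illustration; only the "withheld_in_countries" field name comes from the slides.

```python
import json


def withheld_countries(tweet: dict) -> list:
    """Country codes in which the tweet is withheld (empty if none)."""
    return list(tweet.get("withheld_in_countries") or [])


def is_withheld_in(tweet: dict, country_code: str) -> bool:
    """True if the tweet is marked as withheld in the given country."""
    return country_code.upper() in withheld_countries(tweet)


# Made-up sample; real tweets carry many more fields.
sample = json.loads(
    '{"id_str": "1", "favorited": false, "withheld_in_countries": ["TR"]}'
)
```

Because the field travels with the tweet object itself, the check gives the same answer wherever the crawler runs, which is what makes crawling from outside Turkey workable.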

Crawling
•  Goal: collect censored tweets and examine their content.
•  Free Twitter public API.
•  Start with 689 seed sensitive users:
   •  Source: Chilling Effects website.
   •  Spider out in the social group for new users.
•  API with geo bounding boxes of three major Turkish cities.
•  Crawl users’ historical timelines.

Revisiting collected tweets
•  Methodology: Collect now, verify later
   •  Censorship doesn’t happen immediately
   •  Inspect if "withheld_in_countries" field is present
•  Tradeoff: # users vs. time granularity
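The "collect now, verify later" loop might look like the sketch below. It assumes tweet IDs were stored at collection time and that some lookup function can re-fetch a tweet by ID later; `verify_later` and `fetch_tweet` are hypothetical names, not part of any real client library.

```python
def verify_later(collected_ids, fetch_tweet, country="TR"):
    """Re-fetch stored tweet IDs and return those now withheld."""
    withheld = []
    for tweet_id in collected_ids:
        tweet = fetch_tweet(tweet_id)  # hypothetical lookup callable
        if tweet is None:
            continue  # tweet is no longer retrievable
        if country in (tweet.get("withheld_in_countries") or []):
            withheld.append(tweet_id)
    return withheld


# Stand-in for a real lookup: a dictionary of previously seen tweets.
fake_store = {
    "1": {"id_str": "1", "withheld_in_countries": ["TR"]},
    "2": {"id_str": "2"},
}
result = verify_later(["1", "2", "3"], fake_store.get)
```

How often this pass runs over how many users is the tradeoff named on the slide: more users means each one is revisited less frequently.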

Results
Overall we collected: 20+ mil tweets
Experiment duration (all methods): 10/2014 - 5/2015
Number of interesting users: 7,642 users
Number of reported withheld tweets by 6/15: 3,981 tweets
Number of reported withheld users by 6/15: 204 users
Number of withheld tweets we collected: 266,407 tweets
Number of withheld users we identified: 46 users
Number of withheld tweets not including those from withheld accounts: 205,451 tweets

There are at least two orders of magnitude more withheld tweets in Turkey than what Twitter reported

Two orders of magnitude
•  “NOTE: The data in these reports is as accurate as possible, but may not be 100% comprehensive.”
•  It is indeed neither 100% comprehensive nor accurate.

Deduplication
•  Maybe Twitter reports censored copies as one event?
•  Note: the tweets we collected are unique by ID.
•  We removed duplications: copy/paste & retweets (details in the paper).
•  Reduced the number of tweets to: 88,276.
•  → One order of magnitude higher than what Twitter reported.
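The comparison with Twitter's reported count can be checked with a little arithmetic. The three counts are taken from the slides; `ratio_and_magnitude` is a hypothetical helper, and the base-10 logarithm is used here as one simple way to express "orders of magnitude".

```python
import math

REPORTED = 3_981        # withheld tweets reported by Twitter
COLLECTED = 266_407     # withheld tweets collected in the study
DEDUPLICATED = 88_276   # after removing copy/paste duplicates and retweets


def ratio_and_magnitude(found, reported):
    """Return (found / reported, log10 of that ratio)."""
    ratio = found / reported
    return ratio, math.log10(ratio)


raw_ratio, raw_log = ratio_and_magnitude(COLLECTED, REPORTED)
dedup_ratio, dedup_log = ratio_and_magnitude(DEDUPLICATED, REPORTED)
```

The raw count comes out at roughly 67 times the reported figure (log10 about 1.8), and the deduplicated count at roughly 22 times (log10 about 1.3).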

Topic analysis
•  Topic analysis is valuable for understanding the political aims of the Turkish censors.
•  Term frequency–inverse document frequency (tf-idf) – standard machine learning algorithm.
•  We extracted the top 5 topics with 10 words for each topic.
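For readers unfamiliar with tf-idf, here is a small pure-Python sketch of the weighting. It shows only the generic formula (term frequency times the log of inverse document frequency); the slides do not say how the weights were turned into topics, so that step is not shown, and `tf_idf` and `top_terms` are hypothetical helpers.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} mapping per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weight_map, k=10):
    """The k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(weight_map.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = ["media freedom media", "minister freedom", "freedom bank"]
weights = tf_idf(docs)
```

A term that appears in every document, like "freedom" here, gets weight zero, so the terms that remain are those distinctive to a subset of the tweets.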

Topic analysis
•  Word: media, dishonest, freedom, anti-Semitic references, minister, etc.
•  We can infer that strongly worded and vulgar political discussions are being targeted by Turkey’s censorship authorities.

Turkish “topic” / English translation

1. Turkish: doğan aydın medya şerefsiz grubu medyası vatan hurriyet değil köpeği
   English: Aydin Dogan (a person who owns the biggest media in Turkey), media, dishonest, group, home, freedom, not, dog

2. Turkish: cort şekerbank şap şikin hikayesi dikkat chp ibrahim karaca
   English: Not meaningful: Sekerbank (a Turkish financial institution), attention, in story, Ibrahim Karaca (a Turkish name)

3. Turkish: koçun vehbi oğlu ın aydın rahmi doğan yahudi nahum
   English: Vehbi Koç (a Turkish entrepreneur/philanthropist), son, enlightened, womb, rising, Jewish

4. Turkish: elvan lütfi bakan bakanı ulaştırma ttt aptal bi www adam
   English: Lütfi Elvan (a Turkish government minister), stupid, man, minister, transportation

5. Turkish: davutoğlu ahmet başbakan lan pic sikeyim yahudi gavat göt vatan
   English: Ahmet Davutoglu (current Turkish prime minister), prime minister, man, bastard, fuck, Jewish, pimp, ass

Bypassing censorship
•  In April 2015, we followed a group of withheld accounts in Turkey, and noticed that some users were still tweeting from inside the country despite their accounts being withheld.
•  How?
   •  VPN & Tor.

Users connecting to Tor in Turkey
[Figure panels:]
1.  Tor metric: Number of users connecting to Tor in Turkey daily.
2.  Distribution of withheld tweets by month (log scale).

Peaks:
1.  5/2013 Taksim square protest.
2.  3/2014 Twitter ban in Turkey.
3.  11/2014 & 12/2014: peak in “withheld content” censorship

Bypassing censorship
•  Turns out, it is even easier!
•  Change the “Location” setting in the Twitter application from “Turkey” to “USA”, for example.
•  We expect many users to start using this method in the future.

What if Twitter becomes vigilant?
•  Users will revert to VPNs or Tor.

Country level censorship is hard
•  If “country withholding” mechanisms don’t work, countries will demand global Twitter censorship

Future work
•  Measuring censorship in other countries.
•  Twitter imposes restrictive rate limits.
•  Political science collaboration ongoing.

Conclusion
1.  We provided methods to find unpublished censored tweets.
2.  We showed that the size of censorship in Turkey is at least two orders of magnitude larger than what Twitter reported.
3.  We introduced a new simple method to bypass Twitter censorship by changing the location setting.
4.  We extracted censored topics using a machine learning clustering algorithm and found that most of the censored topics are political.

FAQ’s:

http://www.cs.rice.edu/~rst5/twitterTurkey/