
Understanding automated copyright takedowns: a YouTube case study

Nicolas Suzor @nicsuzor n.suzor@qut.edu.au

Joanne Gray @jograycy7 j30.gray@qut.edu.au

QUT Faculty of Law Digital Media Research Centre

WARNING: WORK IN PROGRESS

1. Internet users are governed by — and through — intermediaries.
2. We have no real tools to hold intermediaries to account.
3. Real accountability will require new methods and new partnerships to understand complex, partially-automated systems at scale.

Heather Heyer killed in Charlottesville

• SPLC called the Daily Stormer ‘the most popular English-language radical right website in the world’

• Site’s editor posted a story describing Heyer as a ‘Fat Childless 32-Year-Old slut’

• ‘most people are glad she is dead’

• Site continues to spread false claim that Heyer died of a heart attack and wasn’t hit by a car

Heyer’s death celebrated on ‘Daily Stormer’ neo-Nazi site

• Domain name cancelled by GoDaddy, Google, and others

• Hosting dropped by DigitalOcean

• Cloudflare stopped caching and mirroring

Stormer dropped by infrastructure companies

“This was my decision. Our terms of service reserve the right for us to terminate users of our network at our sole discretion. My rationale for making this decision was simple: the people behind the Daily Stormer are assholes and I'd had enough.

Let me be clear: this was an arbitrary decision ...

Literally, I woke up in a bad mood and decided someone shouldn't be allowed on the Internet. No one should have that power.”

(Matthew Prince, CEO of Cloudflare)

1. Intermediaries are the focal points of control in a networked society

Key focus so far has been on improving accuracy (consistency)

Big improvements have been made

But major platforms deal with millions of decisions a week

... which means hundreds of thousands of mistakes a week at >98% accuracy

Google now processes 75 million takedown requests per month
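To make the error arithmetic concrete, a back-of-the-envelope sketch in Python (the weekly decision volume is an illustrative assumption; only the >98% accuracy and 75 million requests/month figures come from the slides):

```python
# Even very high accuracy leaves large absolute error counts at scale.
decisions_per_week = 10_000_000   # assumed: "millions of decisions a week"
accuracy = 0.98                   # ">98% accuracy" from the slide

print(f"{decisions_per_week * (1 - accuracy):,.0f} mistakes/week")  # 200,000

# Google's reported takedown volume: 75M requests/month ~= 17.3M/week
google_per_week = 75_000_000 * 12 / 52
print(f"{google_per_week * (1 - accuracy):,.0f} potential errors/week")  # ~346,000
```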

Almost everything we know is based on anecdotes, leaked evidence, and lobbying docs

Transparency is improving

Telcos and platforms are starting to provide more transparency about how they govern their networks

But we still don’t have good granular information about content moderation and takedowns

How do we improve accountability?

We need new methods and infrastructure to understand online governance:

• at scale
• over time
• across platforms and jurisdictions

Our Digital Observatory infrastructure provides longitudinal random samples of social media posts, tested for availability

We have data for YouTube, Twitter, and Instagram so far
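As a rough illustration of the availability testing (a minimal sketch, not the Observatory's production code; the API key and sample IDs are placeholders), a removed-or-not check against the YouTube Data API might look like:

```python
# Re-test a sampled video ID some weeks after collection and record whether
# it is still retrievable. Placeholder credentials and IDs throughout.
import requests

API_KEY = "YOUR_API_KEY"  # hypothetical credential

def is_available(video_id: str) -> bool:
    """Return True if the YouTube Data API still returns this video."""
    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/videos",
        params={"part": "id", "id": video_id, "key": API_KEY},
        timeout=10,
    )
    resp.raise_for_status()
    # An empty 'items' list means the video was removed or made private.
    return len(resp.json().get("items", [])) > 0

sample = ["dQw4w9WgXcQ"]  # placeholder IDs from an earlier random sample
removed = [v for v in sample if not is_available(v)]
print(f"{len(removed)}/{len(sample)} no longer available")
```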

(aside: moderation is fascinating.)

Using machine learning to help understand moderation at scale

Approx. 0.3% of videos removed

• Extracted a sample of 20,000 random videos that had been removed when tested two weeks after initial collection.

• We used LDA and NMF to cluster similar videos together, based on title and description (a sketch of this step follows below)

• After several tweaks, we found relatively coherent topics (k=15)

• From this we identified discrete categories for further analysis:

• Game play, hacks and cheats, full movies, live sports, and sports highlights.

Unsupervised clustering
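A minimal scikit-learn sketch of this clustering step, assuming a CSV of the sampled removed videos with title and description columns (file and column names are illustrative):

```python
# Cluster removed videos by title + description with LDA and NMF, k=15.
import pandas as pd
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

df = pd.read_csv("removed_videos_sample.csv")   # hypothetical input file
docs = df["title"].fillna("") + " " + df["description"].fillna("")

K = 15  # the topic count that produced relatively coherent clusters

# LDA is conventionally fit on raw term counts; NMF on tf-idf weights.
counts = CountVectorizer(stop_words="english", max_features=20_000).fit_transform(docs)
tfidf = TfidfVectorizer(stop_words="english", max_features=20_000).fit_transform(docs)

lda = LatentDirichletAllocation(n_components=K, random_state=0).fit_transform(counts)
nmf = NMF(n_components=K, random_state=0).fit_transform(tfidf)

# Assign each video its dominant topic for manual inspection and labelling.
df["lda_topic"] = lda.argmax(axis=1)
df["nmf_topic"] = nmf.argmax(axis=1)
```

Comparing the dominant-topic assignments from the two models is one way to check that the clusters are stable enough to turn into hand-labelled categories.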

We then trained a classifier (BERT)

Active training

Reached 94% accuracy after 4 iterations.
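A compressed sketch of one such iteration with Hugging Face transformers: fine-tune BERT on the hand-coded examples, then surface the least-confident unlabelled videos for the next round of coding. The model choice, label set, and uncertainty criterion here are assumptions, not the authors' exact pipeline.

```python
# Fine-tune BERT on coded examples, then pick uncertain items to label next.
import numpy as np
from scipy.special import softmax
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["gameplay", "hacks_cheats", "full_movies",
          "live_sports", "sports_highlights", "other"]   # assumed label set

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))

def encode(batch):
    return tok(batch["text"], truncation=True,
               padding="max_length", max_length=128)

# Hand-coded examples from the previous iteration (placeholder data).
train = Dataset.from_dict(
    {"text": ["FULL MOVIE HD free stream"], "label": [2]}
).map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-out", num_train_epochs=3),
    train_dataset=train,
)
trainer.train()

# Uncertainty sampling: send the least-confident predictions to coders.
pool = Dataset.from_dict(
    {"text": ["unlabelled title + description"]}
).map(encode, batched=True)
logits = trainer.predict(pool).predictions
confidence = softmax(logits, axis=1).max(axis=1)
to_code_next = np.argsort(confidence)[:200]   # 200 most uncertain items
```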

Generally, the most influential users of Content ID blocking are as expected.

Rights management firms still dominate DMCA takedowns

The DMCA is practically useless for categories like live gaming — embedded music is impossible to detect from metadata.

Only Content ID blocks for game streams.

Surprisingly small (~0.5%) proportion overall

Practically no DMCA takedowns

Live sports still one of the biggest complaints from broadcasters

Concerning takedowns for guides (cheats, circumvention)

Relatively high removal rates for sports highlights

With manual coding, we can check rates of false positives and false negatives (Alice Witt’s work)

Detailed metadata on a random sample of videos removed for hate speech

Particularly useful for rich qualitative analysis around controversies

Using machine learning classifiers to track the effect of regulatory and policy changes over time
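A sketch of how that tracking might look in practice (column names and the policy-change date are assumptions, not results from the study): classify each longitudinal sample, then compare per-category removal rates before and after the change.

```python
# Compare classifier-labelled removal rates around an assumed policy change.
import numpy as np
import pandas as pd

df = pd.read_csv("classified_samples.csv", parse_dates=["sampled_at"])
# assumed columns: sampled_at (datetime), category (str), removed (bool)

policy_change = pd.Timestamp("2019-06-01")   # hypothetical change date
df["period"] = np.where(df["sampled_at"] < policy_change, "before", "after")

rates = (df.groupby(["category", "period"])["removed"]
           .mean()
           .unstack("period"))
rates["delta"] = rates["after"] - rates["before"]
print(rates.sort_values("delta", ascending=False))
```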

When governance is decentralised, we need new institutions to understand and monitor private decision-making.

• Better transparency from platforms
• Greater participation in rule-making (multistakeholderism?)
• A set of principles that set out what we expect from providers (particularly due process + respect for substantive human rights)
• New methods to understand decisions at scale and over time
• Better understanding of the mechanics of regulating transnational intermediaries: in what circumstances do different approaches (incl. self-regulation and co-regulation) work well?

Questions? <n.suzor@qut.edu.au>

Lawless: The Secret Rules That Govern Our Digital Lives

Cambridge University Press (2019)

Read it now (free PDF): https://osf.io/preprints/socarxiv/ack26/

• Suzor et al. (forthcoming 2019) What do we mean when we talk about transparency?

• Witt, Suzor, & Huggins (2019) The rule of law on Instagram: An evaluation of the moderation of images depicting women's bodies. University of New South Wales Law Journal.

• Suzor, Nicolas (2018) Digital constitutionalism: Using the rule of law to evaluate the legitimacy of governance by platforms. Social Media + Society, 4(3), pp. 1-11.

• Duguay, Stefanie, Burgess, Jean, & Suzor, Nicolas (2018) Queer women's experiences of patchwork platform governance on Tinder, Instagram, and Vine. Convergence: The International Journal of Research into New Media Technologies. (In press)

• Dragiewicz, Molly, Burgess, Jean, Matamoros-Fernandez, Ariadna, Salter, Michael, Suzor, Nicolas P., Woodlock, Delanie, et al. (2018) Technology facilitated coercive control: Domestic violence and the competing roles of digital media platforms. Feminist Media Studies, 18(4), pp. 609-625.

• Suzor, Nicolas, Van Geelen, Tess, & Myers West, Sarah (2018) Evaluating the legitimacy of platform governance: A review of research and a shared research agenda. International Communication Gazette, 80(4), pp. 385-400.

• Suzor, Nicolas P., Dragiewicz, Molly, Harris, Bridget, Gillett, Rosalie, Burgess, Jean, & Van Geelen, Tess (2018) Human rights by design: The responsibilities of social media platforms to address gender-based violence online. Policy & Internet.

• Pappalardo, Kylie M. & Suzor, Nicolas P. (2018) The liability of Australian online intermediaries. Sydney Law Review, 40(4), pp. 469-498.

• Jay Alammar, The Illustrated BERT: http://jalammar.github.io/illustrated-bert/

References and more
