Understanding automated copyright takedowns: a YouTube ... Suzor... · •Extracted a sample of...

Understanding automated copyright takedowns: a YouTube case study Nicolas Suzor @nicsuzor [email protected] Joanne Gray @jograycy7 [email protected] QUT Faculty of Law Digital Media Research Centre


Page 1:

Understanding automated copyright takedowns: a YouTube case study

Nicolas Suzor @nicsuzor [email protected]

Joanne Gray @jograycy7 [email protected]

QUT Faculty of Law Digital Media Research Centre

Page 2:

WARNING: WORK IN PROGRESS

Page 3:

1. Internet users are governed by, and through, intermediaries.

2. We have no real tools to hold intermediaries to account.

3. Real accountability will require new methods and new partnerships to understand complex, partially-automated systems at scale.

Page 4:

Heather Heyer killed in Charlottesville

Page 5:

• SPLC called the Daily Stormer ‘the most popular English-language radical right website in the world’

• Site’s editor posted a story describing Heyer as a ‘Fat Childless 32-Year-Old slut’

• ‘most people are glad she is dead’

• Site continues to spread the false claim that Heyer died of a heart attack and wasn’t hit by a car

Heyer’s death celebrated on ‘Daily Stormer’ neo-Nazi site

Page 6:

• Domain name cancelled by GoDaddy, Google, and others

• Hosting dropped by DigitalOcean

• Cloudflare stopped caching and mirroring the site

Stormer dropped by infrastructure companies

Page 7:

“This was my decision. Our terms of service reserve the right for us to terminate users of our network at our sole discretion. My rationale for making this decision was simple: the people behind the Daily Stormer are assholes and I'd had enough.”

Stormer dropped by infrastructure companies

Page 8:

Let me be clear: this was an arbitrary decision ...

Literally, I woke up in a bad mood and decided someone shouldn't be allowed on the Internet. No one should have that power.

Stormer dropped by infrastructure companies

Page 9:

1. Intermediaries are the focal points of control in a networked society

Page 10:

Key focus so far has been on improving accuracy (consistency)

Big improvements have been made

But major platforms deal with millions of decisions a week

... which means hundreds of thousands of mistakes a week at >98% accuracy

Google now processes 75 million takedown requests per month
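A rough sketch of the arithmetic behind these figures. The 75 million requests per month and the 98% accuracy level come from the slide; treating each request as one decision, using a 30-day month, and the function name `expected_mistakes_per_week` are assumptions made purely for illustration.

```python
def expected_mistakes_per_week(requests_per_month, accuracy, days_per_month=30.0):
    """Expected number of wrong decisions per week at a given accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # Assumption: requests arrive evenly across the month.
    requests_per_week = requests_per_month * 7.0 / days_per_month
    return requests_per_week * (1.0 - accuracy)


# Figures from the slide: 75 million requests a month, 98% accuracy.
mistakes = expected_mistakes_per_week(75_000_000, 0.98)
print(f"{mistakes:,.0f} expected mistakes per week")
```

Under these assumptions the estimate comes to roughly 350,000 mistaken decisions a week, which is in line with the "hundreds of thousands" on the slide.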

Page 11:

Almost everything we know is based on anecdotes, leaked evidence, and lobbying docs

Page 12:

Transparency is improving

Page 13:

Telcos and platforms are starting to provide more transparency about how they govern their networks

Page 14:

But we still don’t have good granular information about content moderation and takedowns

Page 15:

How do we improve accountability?

We need new methods and infrastructure to understand online governance:

• at scale

• over time

• across platforms and jurisdictions

Page 16:

Our Digital Observatory infrastructure provides longitudinal random samples of social media posts, tested for availability

Page 17:

We have data for YouTube, Twitter, and Instagram so far

Page 18:

(aside: moderation is fascinating.)

Page 19:

Using machine learning to help understand moderation at scale

Page 20:

Approx. 0.3% of videos removed
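A removal share like this is an estimate from a random sample, so it helps to report an interval alongside it. The sketch below computes a Wilson score interval for a sample proportion. The counts (300 unavailable out of 100,000 sampled) are hypothetical, chosen only to match the approximate rate on the slide, and `removal_rate_interval` is an illustrative name rather than part of the Digital Observatory tooling.

```python
import math
from typing import Tuple


def removal_rate_interval(removed: int, sampled: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (rate, low, high): the removal share and its Wilson score interval."""
    if sampled <= 0 or not 0 <= removed <= sampled:
        raise ValueError("need sampled > 0 and 0 <= removed <= sampled")
    rate = removed / sampled
    denom = 1.0 + z * z / sampled
    centre = (rate + z * z / (2.0 * sampled)) / denom
    half_width = z * math.sqrt(
        rate * (1.0 - rate) / sampled + z * z / (4.0 * sampled * sampled)
    ) / denom
    return rate, centre - half_width, centre + half_width


# Hypothetical counts: 300 of 100,000 sampled videos unavailable at re-test.
rate, low, high = removal_rate_interval(300, 100_000)
print(f"removal rate {rate:.2%} (95% interval {low:.2%} to {high:.2%})")
```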

Page 21:

• Extracted a sample of 20,000 random videos that had been removed when tested two weeks after initial collection.

• We used LDA and NMF to cluster similar videos together (based on title and description)

• After several tweaks, we found relatively coherent topics (k=15)

• From this we identified discrete categories for further analysis:

• Game play, hacks and cheats, full movies, live sports, and sports highlights.

Unsupervised clustering
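To make the clustering step concrete, here is a minimal, dependency-free sketch of NMF topic extraction over short texts, using Lee and Seung's multiplicative updates. It is not the study's pipeline: LDA is not shown, the titles are invented, k is 2 rather than 15, and raw word counts stand in for whatever weighting was actually used. In practice a library implementation (for example scikit-learn's `NMF` and `LatentDirichletAllocation`) would be the usual choice.

```python
import random
from collections import Counter


def build_matrix(docs):
    """Turn documents into a documents x vocabulary matrix of word counts."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({word for words in tokenised for word in words})
    column = {word: j for j, word in enumerate(vocab)}
    matrix = []
    for words in tokenised:
        row = [0.0] * len(vocab)
        for word, count in Counter(words).items():
            row[column[word]] = float(count)
        matrix.append(row)
    return matrix, vocab


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def frobenius_error(v, w, h):
    """Reconstruction error ||V - WH|| (Frobenius norm)."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorise V (n x m) into non-negative W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


# Invented titles for illustration only; not taken from the study's data.
docs = [
    "full movie hd free",
    "watch full movie online",
    "full movie english hd",
    "live football match stream",
    "football live stream today",
    "live match stream hd",
]
matrix, vocab = build_matrix(docs)
w, h = nmf(matrix, k=2)

for topic, weights in enumerate(h):
    top = sorted(range(len(vocab)), key=lambda j: weights[j], reverse=True)[:3]
    print(f"topic {topic}:", ", ".join(vocab[j] for j in top))
for doc, weights in zip(docs, w):
    # Assign each document to the topic with the largest weight.
    print(max(range(len(weights)), key=lambda t: weights[t]), doc)
```

Each row of `h` is a topic expressed as word weights, and each row of `w` gives a document's weight on each topic. Reading the top words per topic is how one would judge whether the topics are "relatively coherent".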

Page 22:

We then trained a classifier (BERT)

Active training

Reached 94% accuracy after 4 iterations.
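Assuming "active training" here refers to an active-learning loop (train, find the items the model is least sure about, have a person label them, retrain), the idea can be sketched as follows. Everything in the sketch is a stand-in: a one-dimensional threshold replaces BERT, the `oracle` function replaces the human coder, and the data, the 0.37 boundary, and the batch size are invented. Only the default of four rounds echoes the slide.

```python
import random


def train(labelled):
    """Stand-in classifier: a 1-D threshold halfway between the two classes."""
    highest_negative = max(x for x, y in labelled if y == 0)
    lowest_positive = min(x for x, y in labelled if y == 1)
    return (highest_negative + lowest_positive) / 2.0


def accuracy(threshold, items, oracle):
    """Share of items the threshold classifies the same way as the oracle."""
    return sum((x > threshold) == bool(oracle(x)) for x in items) / len(items)


def active_learning(pool, oracle, seed_labels, rounds=4, batch_size=5):
    """Pool-based active learning with uncertainty sampling."""
    labelled = list(seed_labels)
    unlabelled = list(pool)
    for _ in range(rounds):
        threshold = train(labelled)
        # Uncertainty sampling: items closest to the decision boundary
        # are the ones the current model is least sure about.
        unlabelled.sort(key=lambda x: abs(x - threshold))
        batch, unlabelled = unlabelled[:batch_size], unlabelled[batch_size:]
        # The oracle stands in for a human coder labelling the batch.
        labelled.extend((x, oracle(x)) for x in batch)
    return train(labelled)


def oracle(x):
    # Hypothetical ground truth: class 1 above an invented boundary of 0.37.
    return int(x > 0.37)


rng = random.Random(0)
pool = [rng.random() for _ in range(500)]  # synthetic feature values
seed_labels = [(0.0, 0), (1.0, 1)]         # one labelled example per class

before = accuracy(train(seed_labels), pool, oracle)
after = accuracy(active_learning(pool, oracle, seed_labels), pool, oracle)
print(f"accuracy before: {before:.1%}, after 4 rounds: {after:.1%}")
```

The point of the design is label efficiency: each round spends scarce human coding time on the examples where a label is expected to change the model most.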

Page 23:

Page 24:

Generally the most influential users of Content ID blocking are as expected.

Page 25:

Rights management firms still dominate DMCA takedowns

Page 26:

The DMCA is practically useless for categories like live gaming: embedded music is impossible to detect from metadata.

Only Content ID blocks for game streams.

Surprisingly small (~0.5%) proportion overall

Practically no DMCA takedowns

Page 27:

Live sports still one of the biggest complaints from broadcasters

Page 28:

Concerning takedowns for guides (cheats, circumvention)

Page 29:

Relatively high removal rates for sports highlights

Page 30:

With manual coding, we can check rates of false positives and false negatives (Alice Witt’s work)
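As a sketch of what that check could look like: given a manually coded sample where each item records whether the platform removed it and whether a coder judged it to violate the rules, the two error rates follow from a confusion table. The record format, the function name `error_rates`, and the sample counts are assumptions for illustration. Here "positive" means "removed", so a false positive is a removal of content the coder judged acceptable.

```python
def error_rates(records):
    """records: iterable of (removed_by_platform, violating_per_coder) booleans.

    Returns (false_positive_rate, false_negative_rate); a rate is None
    when its denominator is zero.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for removed, violating in records:
        if removed:
            counts["tp" if violating else "fp"] += 1
        else:
            counts["fn" if violating else "tn"] += 1
    non_violating = counts["fp"] + counts["tn"]
    violating_total = counts["tp"] + counts["fn"]
    # False positive rate: share of acceptable content that was removed.
    fpr = counts["fp"] / non_violating if non_violating else None
    # False negative rate: share of violating content that stayed up.
    fnr = counts["fn"] / violating_total if violating_total else None
    return fpr, fnr


# Hypothetical coded sample: 3 correct removals, 1 wrongful removal,
# 2 missed violations, 4 items correctly left up.
coded = [(True, True)] * 3 + [(True, False)] + [(False, True)] * 2 + [(False, False)] * 4
fpr, fnr = error_rates(coded)
print(f"false positive rate: {fpr:.0%}, false negative rate: {fnr:.0%}")
```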

Page 31:

Detailed metadata on a random sample of videos removed for hate speech

Particularly useful for rich qualitative analysis around controversies

Page 32:

Using machine learning classifiers to track the effect of regulatory and policy changes over time

Page 33:

When governance is decentralised, we need new institutions to understand and monitor private decision-making.

• Better transparency from platforms

• Greater participation in rule making (Multistakeholderism?)

• A set of principles that set out what we expect from providers (particularly due process + respect for substantive human rights)

• New methods to understand decisions at scale and over time

• Better understanding of the mechanics of regulating transnational intermediaries: in what circumstances do different approaches (incl. self-regulation and co-regulation) work well?

Questions? <[email protected]>

Page 34:

Lawless: The Secret Rules That Govern Our Digital Lives

Cambridge University Press (2019)

Read it now (free PDF): https://osf.io/preprints/socarxiv/ack26/

Page 35:

• Suzor et al (forthcoming 2019) What do we mean when we talk about transparency?

• Witt, Suzor, Huggins, 'The Rule of Law on Instagram: An Evaluation of the Moderation of Images Depicting Women’s Bodies' (2019) University of New South Wales Law Journal

• Suzor, Nicolas (2018) Digital constitutionalism: Using the rule of law to evaluate the legitimacy of governance by platforms. Social Media + Society, 4(3), pp. 1-11.

• Duguay, Stefanie, Burgess, Jean, & Suzor, Nicolas (2018) Queer women’s experiences of patchwork platform governance on Tinder, Instagram, and Vine. Convergence: The International Journal of Research into New Media Technologies. (In Press)

• Dragiewicz, Molly, Burgess, Jean, Matamoros-Fernandez, Ariadna, Salter, Michael, Suzor, Nicolas P., Woodlock, Delanie, et al. (2018) Technology facilitated coercive control: Domestic violence and the competing roles of digital media platforms. Feminist Media Studies, 18(4), pp. 609-625.

• Suzor, Nicolas, Van Geelen, Tess, & Myers West, Sarah (2018) Evaluating the legitimacy of platform governance: A review of research and a shared research agenda. International Communication Gazette, 80(4), pp. 385-400.

• Suzor, Nicolas P., Dragiewicz, Molly, Harris, Bridget, Gillett, Rosalie, Burgess, Jean, & Van Geelen, Tess (2018) Human rights by design: The responsibilities of social media platforms to address gender-based violence online. Policy & Internet.

• Pappalardo, Kylie M. & Suzor, Nicolas P. (2018) The liability of Australian online intermediaries. Sydney Law Review, 40(4), pp. 469-498.

• Jay Alammar, The Illustrated BERT: http://jalammar.github.io/illustrated-bert/

References and more
