Content Access Control with Varnish Cache

Content Access Control with Varnish Cache A quick look at some challenges & considerations Carlos Abalde, Roberto Moreda {cabalde,moreda}@allenta.com Stockholm, Varnish Summit 2014

Transcript of Content Access Control with Varnish Cache

Page 1: Content Access Control with Varnish Cache

Content Access Control with Varnish Cache

A quick look at some challenges & considerations

Carlos Abalde, Roberto Moreda

{cabalde,moreda}@allenta.com

Stockholm, Varnish Summit 2014

Page 2: Content Access Control with Varnish Cache

Agenda

๏ Our particular journey designing & deploying access control solutions based on Varnish Plus

๏ Contents

‣ Varnish Paywall

‣ Challenges & considerations

‣ Conclusions

Page 3: Content Access Control with Varnish Cache

Who are we?

๏ Allenta Consulting

‣ http://www.allenta.com

๏ Varnish Software integration partner

๏ Specialized in Varnish Paywall

‣ Paywall projects running in Italy, Ireland & Argentina at the moment

Page 4: Content Access Control with Varnish Cache
Page 5: Content Access Control with Varnish Cache

Johann meets Varnish Paywall

Page 6: Content Access Control with Varnish Cache

Hello, I’m Johann!

Page 7: Content Access Control with Varnish Cache

Who’s Johann?

๏ According to Wikipedia, Johann Carolus is the name of the publisher of the first newspaper

๏ He’s also the hero of this presentation

๏ Johann is yet another publisher worried about the decline of advertising revenue in on-line media

‣ Evolution of traditional ad-based models?

‣ Alternative tool for monetizing on-line contents?

Page 8: Content Access Control with Varnish Cache

Johann has a wish list

๏ Transition to a subscription-based model

‣ Flexible / extensible subscription model

- Metered subscriptions

- Partial subscriptions

๏ Freemium model

๏ Owned contents

Page 9: Content Access Control with Varnish Cache

… a huge wish list!

๏ Separate Plug & Play component

‣ Minimal changes to existing backend

๏ Scalable & high performance solution

‣ Do not degrade current UX

๏ On-premises solution

‣ Full control of the product

Page 10: Content Access Control with Varnish Cache
Page 11: Content Access Control with Varnish Cache
Page 12: Content Access Control with Varnish Cache

What’s VPW?

๏ Part of Varnish Plus

‣ Access control logic moved to the caching edge

‣ Fast & flexible paid content delivery

๏ Win-win toolkit solution

‣ Powerful access control layer

‣ Advanced caching technology

Page 13: Content Access Control with Varnish Cache

What’s VPW, really?

๏ Some VCL subroutines, a few general purpose OSS VMODs, and one access control specific VMOD

๏ Optionally,

‣ Some high performance storage

‣ Some Varnish Custom Statistics counters

‣ Some JavaScript assets

Page 14: Content Access Control with Varnish Cache
Page 15: Content Access Control with Varnish Cache

Beyond newspapers

๏ VPW is not a product specific to traditional media

๏ VPW is about moving access control logic to the caching edge

‣ Execute access control logic at Varnish speed

‣ Improve hit ratio

‣ Simplify backend logic

Page 16: Content Access Control with Varnish Cache

VPW is also for…

๏ Alice, who’s running a trading site and wants to distribute certain reports only to premium users

๏ Bob, who has been asked to speed up a paid music streaming service

๏ Emma, who’s running a slow site of stock images limited to 5 downloads per day per authenticated user

๏ …

Page 17: Content Access Control with Varnish Cache
Page 18: Content Access Control with Varnish Cache

Johann meets Cosme

Page 19: Content Access Control with Varnish Cache

Who’s Cosme?

๏ Cosme is an engineer working at Allenta

๏ He has been working on access control solutions based on Varnish Plus for a few years

๏ Cosme discusses with Johann some common challenges & considerations that come up when adding a paywall layer to an existing website

‣ Anonymous metering, storage options, SEO…

Page 20: Content Access Control with Varnish Cache

Anonymous metering

๏ “I don’t want the paywall to bother casual readers. Let’s do this NYT style. Only require authentication after 10 articles have been accessed during the current month”

๏ “I’ve read the NYT paywall is breakable using a simple bookmarklet. Seriously?”

๏ “What about using browser fingerprinting to identify anonymous users?”

“Let’s do this NYT style”

Page 21: Content Access Control with Varnish Cache

Anonymous metering

๏ Metering based on cookies is breakable

‣ Is this a real issue from a business perspective?

‣ Restrict contents eligible for anonymous access

- Focus on user engagement

๏ Cookie backups in local storage, DOM…

- https://github.com/samyk/evercookie

Metering cookies
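
A minimal sketch of why cookie-based metering is breakable, assuming a hypothetical cookie format of `YYYY-MM:count` and the quota of 10 articles per month from the previous slide. The function names and the cookie layout are illustrative only and are not taken from VPW.

```python
from datetime import date

FREE_ARTICLES_PER_MONTH = 10  # quota from the "NYT style" example


def parse_meter(cookie_value, today):
    """Return the article count stored in the cookie for the current month."""
    month = today.strftime("%Y-%m")
    try:
        stored_month, count = cookie_value.split(":")
        count = int(count)
    except (AttributeError, ValueError):
        return 0  # missing or malformed cookie: start from zero
    # A cookie from a previous month (or a negative count) also starts from zero.
    return count if stored_month == month and count >= 0 else 0


def meter_request(cookie_value, today):
    """Return (allowed, new_cookie_value) for one article request."""
    count = parse_meter(cookie_value, today)
    if count >= FREE_ARTICLES_PER_MONTH:
        return False, cookie_value  # quota exhausted: ask for authentication
    return True, f"{today.strftime('%Y-%m')}:{count + 1}"
```

Because all the state lives in the browser, a reader who deletes the cookie (which is what a bookmarklet can automate) is indistinguishable from a new visitor: `meter_request(None, today)` always grants access.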

Page 22: Content Access Control with Varnish Cache

Anonymous metering

๏ Server side metering

‣ https://github.com/Valve/fingerprintjs

๏ Not a real solution

‣ Also easily breakable

‣ Collisions

- Mobile devices, cloned desktops…

Browser fingerprinting
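
The collision and breakability problems can be illustrated with a toy fingerprint. This sketch simply hashes a dictionary of browser attributes; the attribute names and values are made up, and this is not how fingerprintjs computes its identifiers.

```python
import hashlib


def fingerprint(attributes):
    """Hash a dict of browser attributes into a short identifier."""
    # Sort the keys so the same attributes always produce the same string.
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute sets.
office_pc_1 = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "UTC+1",
}
office_pc_2 = dict(office_pc_1)  # cloned desktop image: identical attributes
spoofed = dict(office_pc_1, user_agent="ExampleBrowser/1.1")  # one value changed
```

Two cloned desktops share one fingerprint and therefore one quota, while changing a single attribute yields a fresh fingerprint and a fresh quota.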

Page 23: Content Access Control with Varnish Cache

Paywall state

๏ “Where is metering data stored?”

๏ “Systems guys are asking about scalability of the storage layer keeping track of the state of the paywall. What about this?”

๏ “And what about HA? What are the options here?”

“Where is metering data stored?”

Page 24: Content Access Control with Varnish Cache

Paywall state

๏ Memcached

‣ https://github.com/varnish/libvmod-memcached

๏ Redis

‣ https://github.com/carlosabalde/libvmod-redis

‣ Persistence

‣ Richer API & power of Lua scripting

Memcached vs. Redis
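
A sketch of a server-side metering counter built on the increment-and-expire pattern that both stores support. To keep the example runnable without a server, `FakeStore` is an in-memory stand-in that mimics the semantics of the Redis `INCR` and `EXPIRE` commands; the key name `meter:<user_id>` and the 30-day window are assumptions.

```python
import time


class FakeStore:
    """In-memory stand-in mimicking the semantics of Redis INCR and EXPIRE."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._data = {}  # key -> [value, expires_at or None]

    def _live(self, key):
        """Return the entry for key, dropping it first if it has expired."""
        entry = self._data.get(key)
        if entry and entry[1] is not None and entry[1] <= self._clock():
            del self._data[key]
            entry = None
        return entry

    def incr(self, key):
        entry = self._live(key)
        if entry is None:
            entry = self._data[key] = [0, None]  # missing keys start at 0
        entry[0] += 1
        return entry[0]

    def expire(self, key, seconds):
        entry = self._live(key)
        if entry is not None:
            entry[1] = self._clock() + seconds


def count_access(store, user_id, window_seconds=30 * 24 * 3600):
    """Increment the user's counter and start the window on first access."""
    key = f"meter:{user_id}"  # hypothetical key layout
    count = store.incr(key)
    if count == 1:
        store.expire(key, window_seconds)
    return count
```

Against a real Redis server the increment and the expiry are two separate commands; wrapping them in a Lua script is one way to make the pair atomic, which is where the scripting support mentioned above becomes useful.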

Page 25: Content Access Control with Varnish Cache

Paywall state

๏ Twemproxy

‣ https://github.com/twitter/twemproxy

‣ Light-weight sharding proxy for MC & Redis

๏ Redis Sentinel

‣ http://redis.io/topics/sentinel

‣ Monitoring, notification & automatic failover

Current scalability & HA options
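
Sharding proxies typically map keys to servers with consistent hashing, so that adding or removing a server only moves a fraction of the keys. The sketch below shows the general technique; it is not Twemproxy's implementation, and the server names are placeholders.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with several points per server."""

    def __init__(self, servers, points_per_server=100):
        # Each server is placed at many pseudo-random positions on the ring.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def server_for(self, key):
        """Return the server owning the first ring point after the key's hash."""
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
shard = ring.server_for("meter:user42")  # always the same shard for this key
```

If one server is removed from the ring, only the keys it owned are remapped; keys owned by the remaining servers stay where they were.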

Page 26: Content Access Control with Varnish Cache

Paywall state

๏ Redis Cluster

‣ http://redis.io/topics/cluster-tutorial

‣ Automatic sharding & replication for Redis

๏ Dynomite

‣ https://github.com/Netflix/dynomite

‣ Dynamo implementation for MC & Redis

Future scalability & HA options

Page 27: Content Access Control with Varnish Cache

SEO

๏ “Googlebot should be able to index all the content on my site, both paywalled and non-paywalled”

๏ “Simply detect the bot by checking the User-Agent HTTP header, verify the source IP address using the DNS VMOD, and let it access all paywalled content”

“Let Googlebot access all paywalled content”
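
The detection step proposed here (User-Agent check plus DNS verification of the source IP) can be sketched as a reverse lookup followed by a forward confirmation. This is an illustration in Python rather than VCL, it does not use the DNS VMOD, and the resolver helpers are injectable so the logic can be exercised without network access. The forward lookup shown only handles IPv4.

```python
import socket


def _reverse(ip):
    """Reverse DNS: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    """Forward DNS: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(user_agent, ip, reverse=_reverse, forward=_forward):
    """Check the User-Agent, then confirm the IP with reverse + forward DNS."""
    if "googlebot" not in user_agent.lower():
        return False
    try:
        hostname = reverse(ip).lower().rstrip(".")
        # The reverse name must belong to one of Google's crawler domains.
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # The forward lookup must resolve back to the same address.
        return ip in forward(hostname)
    except OSError:
        return False  # lookup failed: treat the client as unverified
```

The User-Agent alone is trivially forged, which is why the DNS round trip is part of the proposal. Whether the bot should then see content that regular users cannot is a separate question.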

Page 28: Content Access Control with Varnish Cache

SEO

๏ Google penalizes content cloaking

๏ FCF requires that all users who click a Google search result should be allowed to see the full text of the content they are trying to access

‣ That text must be identical to the content that was shown to Googlebot at indexing time

‣ Publishers are allowed to limit the number of accesses under the FCF policy to 5 accesses per user each day

Google’s First Click Free Policy for Web Search

Page 29: Content Access Control with Varnish Cache

SEO

๏ Users may get access even when their quotas are exhausted or they are not even authenticated

๏ Breakable exclusion based on the Referer header

‣ Well known issue of FT and other newspapers

‣ What about teasers?

- Same URL internally rewritten by Varnish

- Not useful for freemium contents

FCF implications
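
A sketch of the FCF decision described on these slides, assuming the daily FCF count per user is tracked elsewhere. The function names are hypothetical, and the referer check is deliberately simplified: it only recognizes `google.com` and its subdomains, not country-specific Google domains.

```python
from urllib.parse import urlsplit

FCF_DAILY_LIMIT = 5  # accesses per user per day allowed by the FCF policy


def is_google_referer(referer):
    """Return True if the Referer header points at google.com or a subdomain."""
    host = (urlsplit(referer or "").hostname or "").lower()
    return host == "google.com" or host.endswith(".google.com")


def fcf_decision(referer, fcf_used_today, has_access):
    """Return 'full' or 'teaser' for a request to a paywalled article."""
    if has_access:
        return "full"  # subscriber, or quota not exhausted yet
    if is_google_referer(referer) and fcf_used_today < FCF_DAILY_LIMIT:
        return "full"  # first click free
    return "teaser"
```

The weakness is visible in the signature: `referer` is client-supplied, so anyone who sends a Google URL in that header gets the same treatment as a real search click.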

Page 30: Content Access Control with Varnish Cache

And much more…

๏ Access control exclusions

๏ Fraud detection

๏ Testing strategy

๏ Paywall API & Agent

๏ Usage statistics

๏ …

Page 31: Content Access Control with Varnish Cache

Conclusions

Page 32: Content Access Control with Varnish Cache

Conclusions

๏ VPW is a powerful paywalling toolkit

‣ Flexibility

‣ Access control logic running at Varnish speed

๏ Win-win solution

‣ Advanced caching technology

‣ Powerful access control layer

Page 33: Content Access Control with Varnish Cache

Thanks!

Page 34: Content Access Control with Varnish Cache

Bonus slides

Page 35: Content Access Control with Varnish Cache

How does VPW work?

๏ Custom HTTP headers

‣ X-Pw-Access-Control…

๏ API services

‣ Authorization service…

๏ Securely signed cookies

๏ High performance storage
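
As an illustration of the signed cookie idea, the sketch below appends an HMAC-SHA256 tag to a cookie payload and verifies it on the way back. The cookie layout (`payload.signature`), the payload contents and the secret are assumptions for the example, not the format VPW uses.

```python
import hashlib
import hmac

SECRET = b"hypothetical-shared-secret"  # placeholder, never hard-code real keys


def sign_cookie(payload, secret=SECRET):
    """Return 'payload.signature' where signature is a hex HMAC-SHA256."""
    mac = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"


def verify_cookie(cookie, secret=SECRET):
    """Return the payload if the signature is valid, otherwise None."""
    payload, sep, mac = cookie.rpartition(".")
    if not sep:
        return None  # no signature part at all
    expected = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return payload
    return None
```

A signature prevents a client from editing the cookie (for example, upgrading its own plan), but it does not prevent the client from deleting or replaying it.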

Page 36: Content Access Control with Varnish Cache

Exclusions

๏ “The IP ranges of these companies should completely bypass the paywall. We have some B2B agreements with them”

๏ “The web views used by our official mobile apps should also bypass the paywall”

๏ “Any click on paywalled contents linked in Facebook or Twitter should also bypass the paywall”

“And now some exceptions”

Page 37: Content Access Control with Varnish Cache

Exclusions

๏ It’s completely reasonable to bypass the paywall logic based on:

‣ A Varnish ACL

‣ Some ad-hoc HTTP headers including an HMAC signature generated using a secret shared between Varnish and the mobile apps

๏ Bypassing the paywall logic based on the HTTP Referer header is weak and should be carefully analyzed

Beware of fake HTTP headers
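
The two reasonable exclusions can be sketched as follows, in Python rather than VCL. The network ranges are documentation prefixes, and the signed message format (`timestamp:path`), the secret and the allowed clock skew are all assumptions made for the example.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical B2B ranges (reserved documentation prefixes).
B2B_NETWORKS = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "2001:db8::/32")]
APP_SECRET = b"hypothetical-app-secret"  # shared with the mobile apps
MAX_SKEW_SECONDS = 300  # reject signatures older or newer than this


def ip_in_acl(client_ip, networks=B2B_NETWORKS):
    """Return True if the client address falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


def app_signature(timestamp, path, secret=APP_SECRET):
    """Signature the app would send in an ad-hoc header."""
    message = f"{timestamp}:{path}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def app_header_valid(timestamp, path, signature, now, secret=APP_SECRET):
    """Check freshness first, then the signature in constant time."""
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = app_signature(timestamp, path, secret)
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


def bypass_paywall(client_ip, path, timestamp=None, signature=None, now=0):
    """Bypass for allowed networks or for requests carrying a valid signature."""
    if ip_in_acl(client_ip):
        return True
    if timestamp is None or signature is None:
        return False
    return app_header_valid(timestamp, path, signature, now)
```

Including the timestamp in the signed message limits how long a captured header can be replayed. A plain header with no signature would be as easy to fake as the Referer.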

Page 38: Content Access Control with Varnish Cache

Fraud detection

๏ “What if some user purchases an unmetered subscription and then shares his/her credentials with all his/her Facebook friends?”

๏ “What if an office using a NAT proxy buys a single unmetered subscription for all the employees in the building?”

“Sharing unmetered subscriptions”

Page 39: Content Access Control with Varnish Cache

Fraud detection

๏ You may be able to detect fraud in your user management component

‣ Limit number / rate of sessions per user

‣ Force extra validations / block users when suspicious behavior is detected

๏ Paywall may help if you are not able to do that

‣ Redis sorted set restricting number of SIDs & IPs per user during some short time window

Rate limiting
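
A sketch of the sorted-set idea: for each user, keep the distinct session IDs or IPs seen recently, scored by last-seen timestamp, and flag the user when too many are active inside the window. `SortedSetWindow` is an in-memory stand-in; the comments name the Redis sorted-set commands each step corresponds to. The window length and threshold are arbitrary.

```python
class SortedSetWindow:
    """In-memory stand-in for one Redis sorted set per user (member -> score)."""

    def __init__(self, window_seconds, max_members):
        self.window_seconds = window_seconds
        self.max_members = max_members
        self._sets = {}  # user_id -> {member: last_seen_timestamp}

    def allow(self, user_id, member, now):
        """Record member (a SID or IP) and report whether the user is within limits."""
        members = self._sets.setdefault(user_id, {})
        # ZREMRANGEBYSCORE equivalent: drop members not seen within the window.
        cutoff = now - self.window_seconds
        for old in [m for m, ts in members.items() if ts <= cutoff]:
            del members[old]
        # ZADD equivalent: insert the member or refresh its score.
        members[member] = now
        # ZCARD equivalent: count distinct members still inside the window.
        # Note that a member is recorded even when the request is refused.
        return len(members) <= self.max_members


limiter = SortedSetWindow(window_seconds=3600, max_members=2)
```

With these settings a user seen from a third distinct IP within an hour is flagged, while repeated requests from an already known IP are not.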