Content Access Control with Varnish Cache

Post on 17-Jul-2015

Content Access Control with Varnish Cache: A quick look at some challenges & considerations

Carlos Abalde, Roberto Moreda

{cabalde,moreda}@allenta.com

Stockholm, Varnish Summit 2014

Agenda

๏ Our particular journey designing & deploying access control solutions based on Varnish Plus

๏ Contents

‣ Varnish Paywall

‣ Challenges & considerations

‣ Conclusions

Who are we?

๏ Allenta Consulting

‣ http://www.allenta.com

๏ Varnish Software integration partner

๏ Specialized in Varnish Paywall

‣ Paywall projects running in Italy, Ireland & Argentina at the moment

Johann meets Varnish Paywall

Hello, I’m Johann!

Who’s Johann?

๏ According to Wikipedia, Johann Carolus is the name of the publisher of the first newspaper

๏ He’s also the hero of this presentation

๏ Johann is yet another publisher worried about the decline of advertising revenue in online media

‣ Evolution of traditional ad-based models?

‣ Alternative tool for monetizing on-line contents?

Johann has a wish list

๏ Transition to a subscription-based model

‣ Flexible / extensible subscription model

- Metered subscriptions

- Partial subscriptions

๏ Freemium model

๏ Owned contents

… a huge wish list!

๏ Separate Plug & Play component

‣ Minimal changes to existing backend

๏ Scalable & high performance solution

‣ Do not degrade current UX

๏ On-premises solution

‣ Full control of the product

What’s VPW?

๏ Part of Varnish Plus

‣ Access control logic moved to the caching edge

‣ Fast & flexible paid content delivery

๏ Win-win toolkit solution

‣ Powerful access control layer

‣ Advanced caching technology

What’s really VPW?

๏ Some VCL subroutines, a few general purpose OSS VMODs, and one access control specific VMOD

๏ Optionally,

‣ Some high performance storage

‣ Some Varnish Custom Statistics counters

‣ Some JavaScript assets

Beyond newspapers

๏ VPW is not a traditional media specific product

๏ VPW is about moving access control logic to the caching edge

‣ Execute access control logic at Varnish speed

‣ Improve hit ratio

‣ Simplify backend logic

VPW is also for…

๏ Alice, who runs a trading site and wants to distribute certain reports only to premium users

๏ Bob, who has been asked to speed up a paid music streaming service

๏ Emma, who’s running a slow site of stock images limited to 5 downloads per day per authenticated user

๏ …

Johann meets Cosme

Who’s Cosme?

๏ Cosme is an engineer working at Allenta

๏ He has been working on access control solutions based on Varnish Plus for a few years

๏ Cosme discusses with Johann some usual challenges & considerations when adding a paywall layer to an existing website

‣ Anonymous metering, storage options, SEO…

Anonymous metering

๏ “I don’t want the paywall to bother casual readers. Let’s do this NYT style. Only require authentication after 10 articles have been accessed during the current month”

๏ “I’ve read the NYT paywall is breakable using a simple bookmarklet. Seriously?”

๏ “What about using browser fingerprinting to identify anonymous users?”

“Let’s do this NYT style”

Anonymous metering

๏ Metering based on cookies is breakable

‣ Is this a real issue from a business perspective?

‣ Restrict contents eligible for anonymous access

- Focus on user engagement

๏ Cookie backups in local storage, DOM…

- https://github.com/samyk/evercookie

Metering cookies
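
The cookie-based metering described above can be sketched as follows. This is a minimal illustration in Python, not VPW's actual implementation: the cookie layout (`month:count` plus an HMAC-SHA256 signature), the `SECRET` value and the function names are all assumptions made for the example. Only the limit of 10 articles per month comes from the slide.

```python
import hashlib
import hmac

SECRET = b"hypothetical-shared-secret"  # assumption: known only to the caching edge
FREE_ARTICLES_PER_MONTH = 10            # limit quoted in the slide


def sign(payload: str) -> str:
    """Return the payload with an HMAC-SHA256 signature appended."""
    mac = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"


def verify(cookie: str):
    """Return the payload if the signature is valid, otherwise None."""
    payload, _, mac = cookie.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return payload
    return None


def meter(cookie, month: str):
    """Account for one article view; return (allowed, new_cookie)."""
    count = 0
    payload = verify(cookie) if cookie else None
    if payload is not None:
        cookie_month, _, raw_count = payload.partition(":")
        if cookie_month == month:  # a counter from a previous month starts over
            count = int(raw_count)
    if count >= FREE_ARTICLES_PER_MONTH:
        return False, sign(f"{month}:{count}")
    return True, sign(f"{month}:{count + 1}")
```

Signing stops a reader from editing the counter, but not from deleting the cookie: `meter(None, "2014-12")` starts again from zero, which is exactly why this kind of metering is breakable.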

Anonymous metering

๏ Server side metering

‣ https://github.com/Valve/fingerprintjs

๏ Not a real solution

‣ Also easily breakable

‣ Collisions

- Mobile devices, cloned desktops…

Browser fingerprinting
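
To make the collision problem concrete, here is a toy fingerprint in Python. It is only a sketch of the general idea (hash a set of browser attributes into an identifier); it does not reproduce the fingerprintjs API, and the attribute names are hypothetical.

```python
import hashlib


def fingerprint(attrs: dict) -> str:
    """Hash browser attributes (hypothetical names) into a short identifier."""
    canonical = "|".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


# Two cloned desktops expose the same attributes and share one quota.
desktop_a = {"user_agent": "ExampleBrowser/1.0", "screen": "1920x1080", "timezone": "UTC+1"}
desktop_b = dict(desktop_a)
same_id = fingerprint(desktop_a) == fingerprint(desktop_b)

# Changing a single attribute yields a fresh identity and a fresh quota.
spoofed = dict(desktop_a, user_agent="ExampleBrowser/1.1")
new_id = fingerprint(spoofed) != fingerprint(desktop_a)
```

Both weaknesses from the slide show up here: identical devices collide, and any reader able to alter one attribute gets a new identity.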

Paywall state

๏ “Where is metering data stored?”

๏ “Systems guys are asking about scalability of the storage layer keeping track of the state of the paywall. What about this?”

๏ “And what about HA? What are the options here?”

“Where is metering data stored?”

Paywall state

๏ Memcached

‣ https://github.com/varnish/libvmod-memcached

๏ Redis

‣ https://github.com/carlosabalde/libvmod-redis

‣ Persistence

‣ Richer API & power of Lua scripting

Memcached vs. Redis

Paywall state

๏ Twemproxy

‣ https://github.com/twitter/twemproxy

‣ Light-weight sharding proxy for MC & Redis

๏ Redis Sentinel

‣ http://redis.io/topics/sentinel

‣ Monitoring, notification & automatic failover

Current scalability & HA options

Paywall state

๏ Redis Cluster

‣ http://redis.io/topics/cluster-tutorial

‣ Automatic sharding & replication for Redis

๏ Dynomite

‣ https://github.com/Netflix/dynomite

‣ Dynamo implementation for MC & Redis

Future scalability & HA options

SEO

๏ “Googlebot should be able to index all contents in my site, both paywalled and not paywalled ones”

๏ “Simply detect the bot by checking the User-Agent HTTP header, verify the source IP address using the DNS VMOD, and let it access all paywalled contents”

“Let Googlebot access all paywalled contents”
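
The check Johann proposes is usually done as a reverse DNS lookup followed by a confirming forward lookup. The sketch below shows that logic in Python; in a real deployment it would live in VCL using the DNS VMOD. The function name and the injectable `reverse` / `forward` resolvers are assumptions added so the logic can be exercised without network access.

```python
import socket

# Hostnames of genuine Googlebot crawlers end in one of these domains.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, user_agent, reverse=None, forward=None):
    """Return True only if the User-Agent claim is backed by DNS."""
    # Default resolvers use the system DNS (gethostbyname_ex is IPv4 only).
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    if "Googlebot" not in user_agent:
        return False
    try:
        host = reverse(ip)  # PTR lookup
        if not host.endswith(GOOGLEBOT_SUFFIXES):
            return False
        # The PTR record is controlled by the IP owner, so confirm that the
        # hostname resolves back to the same address.
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

The User-Agent header alone proves nothing, since any client can send it; the two DNS lookups are what make the identification trustworthy.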

SEO

๏ Google penalizes content cloaking

๏ FCF requires that all users who click a Google search result should be allowed to see the full text of the content they are trying to access

‣ That text must be identical to the content that was shown to Googlebot at indexing time

‣ Publishers are allowed to limit the number of accesses under the FCF policy to 5 accesses per user each day

Google’s First Click Free Policy for Web Search

SEO

๏ Users may get access even when their quotas are exhausted or they are not even authenticated

๏ Breakable exclusion based on Referer header

‣ Well known issue of FT and other newspapers

‣ What about teasers?

- Same URL internally rewritten by Varnish

- Not useful for freemium contents

FCF implications
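
A rough sketch of an FCF exclusion with the daily cap of 5 accesses per user might look like this. Everything here is an assumption for illustration: the class name, the in-memory counter (a real paywall would keep this in shared storage), and the deliberately simplified host check, which only recognizes `google.com` and its subdomains.

```python
from collections import defaultdict
from urllib.parse import urlsplit

FCF_DAILY_LIMIT = 5  # accesses per user per day allowed under the FCF policy


class FirstClickFree:
    """In-memory sketch of an FCF exclusion keyed by (user, day)."""

    def __init__(self):
        self._counts = defaultdict(int)

    def allow(self, user_id, day, referer):
        """Grant a free access if the click claims to come from Google search."""
        host = urlsplit(referer or "").hostname or ""
        # Simplification: country-specific Google domains are not handled.
        if not (host == "google.com" or host.endswith(".google.com")):
            return False
        key = (user_id, day)
        if self._counts[key] >= FCF_DAILY_LIMIT:
            return False
        self._counts[key] += 1
        return True
```

The decision rests entirely on a client-supplied header, so anyone who forges the Referer value gets the same free access. That is the weakness noted above.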

And much more…

๏ Access control exclusions

๏ Fraud detection

๏ Testing strategy

๏ Paywall API & Agent

๏ Usage statistics

๏ …

Conclusions

๏ VPW is a powerful paywalling toolkit

‣ Flexibility

‣ Access control logic running at Varnish speed

๏ Win-win solution

‣ Advanced caching technology

‣ Powerful access control layer

Thanks!

Bonus slides

How does VPW work?

๏ Custom HTTP headers

‣ X-Pw-Access-Control…

๏ API services

‣ Authorization service…

๏ Securely signed cookies

๏ High performance storage

Exclusions

๏ “The IP ranges of these companies should completely bypass the paywall. We have some B2B agreements with them”

๏ “The web views used by our official mobile apps should also bypass the paywall”

๏ “Any click on paywalled contents linked in Facebook or Twitter should also bypass the paywall”

“And now some exceptions”

Exclusions

๏ It’s completely reasonable to bypass the paywall logic based on:

‣ A Varnish ACL

‣ Some ad-hoc HTTP headers including a HMAC signature generated using a secret shared between Varnish and the mobile apps

๏ Bypassing the paywall logic based on the HTTP Referer header is weak, because clients can forge it, and should be carefully analyzed

Beware of fake HTTP headers
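
The two reasonable exclusions above (an ACL and HMAC-signed headers) can be sketched in Python as follows. The network range, the secret, the signed message layout (`timestamp:path`) and the allowed clock skew are all hypothetical; in Varnish the first check would be a VCL `acl` and the second would rely on a digest VMOD.

```python
import hashlib
import hmac
import ipaddress

B2B_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]  # hypothetical partner range
APP_SECRET = b"hypothetical-app-secret"  # assumption: shared with the mobile apps
MAX_SKEW_SECONDS = 300                   # assumption: limits replay of old signatures


def in_acl(client_ip):
    """True if the client address belongs to a partner network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in B2B_NETWORKS)


def app_signature(path, timestamp):
    """Signature the mobile app would send in an ad-hoc header."""
    msg = f"{timestamp}:{path}".encode()
    return hmac.new(APP_SECRET, msg, hashlib.sha256).hexdigest()


def valid_app_request(path, timestamp, signature, now):
    """Check freshness first, then compare signatures in constant time."""
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = app_signature(path, timestamp)
    return hmac.compare_digest(signature.encode(), expected.encode())


def bypass_paywall(client_ip, path, timestamp=None, signature=None, now=0):
    """Return True if the request qualifies for either exclusion."""
    if in_acl(client_ip):
        return True
    if timestamp is not None and signature is not None:
        return valid_app_request(path, timestamp, signature, now)
    return False
```

Unlike a Referer check, neither test can be passed by inventing a header value: the first depends on the connection's source address, the second on knowledge of the shared secret.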

Fraud detection

๏ “What if some user purchases an unmetered subscription and then shares his/her credentials with all his/her Facebook friends?”

๏ “What if an office using a NAT proxy buys a single unmetered subscription for all the employees in the building?”

“Sharing unmetered subscriptions”

Fraud detection

๏ You may be able to detect fraud in your user management component

‣ Limit number / rate of sessions per user

‣ Force extra validations / block users when suspicious behavior is detected

๏ Paywall may help if you are not able to do that

‣ Redis sorted set restricting number of SIDs & IPs per user during some short time window

Rate limiting
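
The sorted-set idea can be sketched as a sliding window: each session ID is a member scored by the time it was last seen, old members are trimmed, and the remaining cardinality is compared with a limit. The Python below uses a tiny in-memory stand-in whose methods mirror the Redis commands ZADD, ZREMRANGEBYSCORE and ZCARD; the window length, the limit and all names are assumptions, not VPW settings.

```python
WINDOW_SECONDS = 600  # assumption: length of the "short time window"
MAX_DISTINCT = 3      # assumption: distinct SIDs tolerated per user


class SortedSet:
    """In-memory stand-in for one Redis sorted set (member -> score)."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        self._scores[member] = score  # re-adding a member refreshes its score

    def zremrangebyscore(self, low, high):
        expired = [m for m, s in self._scores.items() if low <= s <= high]
        for member in expired:
            del self._scores[member]

    def zcard(self):
        return len(self._scores)


sessions_by_user = {}  # one sorted set per user, as one Redis key per user


def allow_session(user, sid, now):
    """Record the session and report whether the user is within the limit."""
    zset = sessions_by_user.setdefault(user, SortedSet())
    zset.zremrangebyscore(float("-inf"), now - WINDOW_SECONDS)  # drop stale SIDs
    zset.zadd(sid, now)
    return zset.zcard() <= MAX_DISTINCT
```

Note that an over-limit session is still recorded, so the whole account stays flagged until enough sessions age out of the window. The same structure, with IP addresses as members, would cover the per-IP restriction.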