Fighting Spam with Haskell
Simon Marlow
5 Sept 2015
Headlines
▪ Migrated a large service to Haskell
▪ thousands of machines
▪ every action on Facebook (and Instagram) runs some Haskell code
▪ live code updates (<10 min)
▪ This talk
▪ The problem we’re solving: abuse detection & remediation
▪ Why (and how) Haskell?
▪ Tales from the trenches
The problem
▪ There is spam and other types of abuse
▪ Malware attacks, credential stealing
▪ Sites that trick users into liking/sharing things or divulging passwords
▪ Fake accounts that spam people and pages
▪ Spammers can use automation and viral attacks
▪ Want to catch as much as possible in a completely automated way
▪ Squash attacks quickly and safely
Evil?
Yes!
∑ We call this system Sigma
Sigma :: Content -> Bool
▪ Sigma classifies tens of billions of actions per day
▪ Facebook + Instagram
▪ Sigma is a rule engine
▪ For each action type, evaluate a set of rules
▪ Rules can block or take other action
▪ Manual + machine learned rules
▪ Rules can be updated live
▪ Highly effective at eliminating spam, malware, malicious URLs, etc. etc.
How do we define rules?
Example
▪ Fanatics are spamming their friends with posts about Functional
Programming!
▪ Let’s fix it!
Example
▪ We want a rule that says
▪ If the person is posting about Functional Programming
▪ And they have >100 friends
▪ And more than half of their friends like C++
▪ Then block, else allow
Need info about the content
Need to fetch the friend list
Need info about each friend
Our rule, in Haskell
fpSpammer :: Haxl Bool
fpSpammer =
▪ Haxl is a monad
▪ “Haxl Bool” is the type of a computation that may:
▪ do data-fetching
▪ consult input data
▪ maybe throw exceptions
▪ finally, return a Bool
Our rule, in Haskell
fpSpammer :: Haxl Bool
fpSpammer = talkingAboutFP
  where
    talkingAboutFP = strContains "Functional Programming" <$> postContent
▪ postContent is part of the input (say)
postContent :: Haxl Text
Our rule, in Haskell
fpSpammer :: Haxl Bool
fpSpammer = talkingAboutFP .&&
            numFriends .> 100
  where
    talkingAboutFP =
      strContains "Functional Programming" <$> postContent
(.&&) :: Haxl Bool -> Haxl Bool -> Haxl Bool
(.>) :: Ord a => Haxl a -> Haxl a -> Haxl Bool
numFriends :: Haxl Int
Our rule, in Haskell
fpSpammer :: Haxl Bool
fpSpammer = talkingAboutFP .&&
            numFriends .> 100 .&&
            friendsLikeCPlusPlus
  where
    talkingAboutFP =
      strContains "Functional Programming" <$> postContent
    friendsLikeCPlusPlus = do
      friends <- getFriends
      cppFriends <- filterM likesCPlusPlus friends
      return (length cppFriends >= length friends `div` 2)
Observations
▪ Our language is Haskell + libraries
▪ Embedded Domain-Specific Language (EDSL)
▪ Users can pick up a Haskell book and learn about it
▪ Tradeoff: not exactly the syntax we might have chosen, but we get to
take advantage of existing tooling, documentation etc.
▪ Focus on expressing functionality concisely, avoid operational details
▪ “pure” semantics
▪ no side effects – easy to reason about
▪ scope for automatic optimisation
Efficiency
▪ Rules are data + computation
▪ Fetching remote data can be slow
▪ Latency is important!
▪ We’re on the clock: the user is waiting
▪ So what about efficiency?
Fetching data efficiently is all that matters.
1. Fetch only the data you need to make a decision
2. Fetch data concurrently whenever possible
Let’s deal with (1) first.
Example
▪ We want a rule that says
▪ If the person is posting about Functional Programming
▪ And they have >100 friends
▪ And more than half of their friends like C++
▪ Then block, else allow
▪ Avoid slow checks if fast checks already determine the answer
Fast
Slow
Very slow
fpSpammer :: Haxl Bool
fpSpammer = talkingAboutFP .&&
            numFriends .> 100 .&&
            friendsLikeCPlusPlus
  where
    talkingAboutFP =
      strContains "Functional Programming" <$> postContent
    friendsLikeCPlusPlus = do
      friends <- getFriends
      cppFriends <- filterM likesCPlusPlus friends
      return (length cppFriends >= length friends `div` 2)
.&& is short-circuiting
▪ Programmer is responsible for getting the order right
▪ (tooling helps with this)
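The short-circuiting behaviour can be tried out with a small stand-in. This is only a sketch under stated assumptions: `Haxl` is replaced by plain `IO` (so there is no batching), `strContains` is approximated with `isInfixOf`, the literal `100` is written `return 100` because this stand-in has no `Num` instance, and `Env`, `logged`, `runRule` and the "likes C++ if the id is even" predicate are hypothetical helpers that exist only to record which fetches ran.

```haskell
import Control.Applicative (liftA2)
import Control.Monad (filterM)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Data.List (isInfixOf)

-- Stand-in for the real Haxl monad: plain IO, so nothing is batched here.
type Haxl = IO

infixr 3 .&&
infix 4 .>

-- Short-circuiting "and": the right-hand side runs only if the left is True.
(.&&) :: Haxl Bool -> Haxl Bool -> Haxl Bool
ma .&& mb = do
  a <- ma
  if a then mb else return False

-- Comparison lifted over two computations.
(.>) :: Ord a => Haxl a -> Haxl a -> Haxl Bool
(.>) = liftA2 (>)

-- Hypothetical request context: post text, friend ids, and a fetch log.
data Env = Env
  { envPost    :: String
  , envFriends :: [Int]
  , envLog     :: IORef [String]
  }

-- Record that a "fetch" ran, then return the given value.
logged :: Env -> String -> a -> Haxl a
logged env name x = modifyIORef (envLog env) (name :) >> return x

fpSpammer :: Env -> Haxl Bool
fpSpammer env =
  talkingAboutFP .&& numFriends .> return 100 .&& friendsLikeCPlusPlus
  where
    talkingAboutFP = isInfixOf "Functional Programming" <$> postContent
    postContent = logged env "postContent" (envPost env)
    numFriends = logged env "numFriends" (length (envFriends env))
    getFriends = logged env "getFriends" (envFriends env)
    likesCPlusPlus f = logged env "likesCPlusPlus" (even f) -- made-up predicate
    friendsLikeCPlusPlus = do
      friends <- getFriends
      cppFriends <- filterM likesCPlusPlus friends
      return (length cppFriends >= length friends `div` 2)

-- Run the rule; return the verdict and the fetches performed, in order.
runRule :: String -> [Int] -> IO (Bool, [String])
runRule post friends = do
  ref <- newIORef []
  verdict <- fpSpammer (Env post friends ref)
  fetches <- reverse <$> readIORef ref
  return (verdict, fetches)
```

Calling `runRule "I love cats" [1 .. 200]` in this sketch touches only `postContent`: the friend count and the per-friend checks are never requested, which is why the cheap check is placed first.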
We can speculate
fpSpammer :: Haxl Bool
fpSpammer =
  talkingAboutFP .&& do
    a <- numFriends .> 100
    b <- friendsLikeCPlusPlus
    return (a && b)
  where
    talkingAboutFP =
      strContains "Functional Programming" <$> postContent
    friendsLikeCPlusPlus = do
      friends <- getFriends
      cppFriends <- filterM likesCPlusPlus friends
      return (length cppFriends >= length friends `div` 2)
Avoid short-circuiting behaviour by explicitly evaluating both conditions
Concurrency
▪ Multiple independent data-fetch requests must be executed
concurrently and/or batched
▪ Traditional languages and frameworks make the programmer deal with
this
▪ threads, futures/promises, async, callbacks, etc.
▪ Hard to get right
▪ Our users don’t care
▪ Clutters the code
▪ Hard to refactor later
Haxl’s advantage
▪ Because our language has no side effects, the framework can handle
concurrency automatically
▪ We can exploit concurrency as far as data dependencies allow
▪ The programmer doesn’t need to think about it
friendsLikeCPlusPlus = do
  friends <- getFriends
  cppFriends <- filterM likesCPlusPlus friends
  ...
[Diagram: getFriends, followed by a row of five likesCPlusPlus fetches]
numCommonFriends a b = do
  fa <- friendsOf a
  fb <- friendsOf b
  return (length (intersect fa fb))
friendsOf a friendsOf b
length (intersect ...)
How does Haxl work?
Step 1
▪ Haxl is a Monad
▪ The implementation of (>>=) will allow the computation to block,
waiting for data.
data Result a
  = Done a
  | Blocked (Seq BlockedRequest) (Haxl a)
newtype Haxl a = Haxl { unHaxl :: IO (Result a) }
Done indicates that we have finished.
Blocked indicates that the computation requires this data.
Haxl may need to do IO.
This is the result of a computation.
Monad instance
instance Monad Haxl where
  return a = Haxl $ return (Done a)
  Haxl m >>= k = Haxl $ do
    r <- m
    case r of
      Done a -> unHaxl (k a)
      Blocked br c -> return (Blocked br (c >>= k))
If m blocks with continuation c, the continuation for m >>= k is c >>= k
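To experiment with this Monad instance, the sketch below fills in the pieces the slides leave out. Several parts are assumptions rather than the real Haxl library: `BlockedRequest` is given a made-up shape (a user id plus an `IORef` slot for the answer), a plain list stands in for `Seq`, `mockFriends` replaces a remote service, and `runHaxl` is a toy scheduler that counts rounds of fetching. Current GHC also requires `Functor` and `Applicative` instances, which are derived here from the Monad instance so that everything stays sequential.

```haskell
import Control.Monad (ap, liftM)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.List (intersect)

-- A pending fetch: the user id requested and a slot for the answer.
-- (The slides do not show BlockedRequest; this shape is an assumption.)
data BlockedRequest = BlockedRequest Int (IORef [Int])

data Result a
  = Done a
  | Blocked [BlockedRequest] (Haxl a) -- a list stands in for the slide's Seq

newtype Haxl a = Haxl { unHaxl :: IO (Result a) }

instance Functor Haxl where
  fmap = liftM

-- Defined via the Monad instance on purpose, so (<*>) is sequential here.
instance Applicative Haxl where
  pure a = Haxl $ return (Done a)
  (<*>) = ap

instance Monad Haxl where
  Haxl m >>= k = Haxl $ do
    r <- m
    case r of
      Done a       -> unHaxl (k a)
      Blocked br c -> return (Blocked br (c >>= k))

-- Made-up friend lists standing in for a remote service.
mockFriends :: Int -> [Int]
mockFriends uid = [uid + 1 .. uid + 5]

-- Block on first use; the continuation reads the slot once it is filled.
friendsOf :: Int -> Haxl [Int]
friendsOf uid = Haxl $ do
  slot <- newIORef []
  return (Blocked [BlockedRequest uid slot] (Haxl (Done <$> readIORef slot)))

-- Toy scheduler: fill each round's requests, then resume the continuation.
-- Returns the final result and the number of rounds that were needed.
runHaxl :: Haxl a -> IO (a, Int)
runHaxl = go 0
  where
    go n (Haxl m) = do
      r <- m
      case r of
        Done a -> return (a, n)
        Blocked reqs c -> do
          mapM_ (\(BlockedRequest uid slot) -> writeIORef slot (mockFriends uid)) reqs
          go (n + 1) c

numCommonFriends :: Int -> Int -> Haxl Int
numCommonFriends a b = do
  fa <- friendsOf a
  fb <- friendsOf b
  return (length (intersect fa fb))
```

With these definitions, `runHaxl (numCommonFriends 1 3)` needs two rounds: `(>>=)` cannot look at the second `friendsOf` until the first has returned.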
So far we can only block on one data-fetch
• Our example will block on the first friendsOf request:
numCommonFriends a b = do
  fa <- friendsOf a    -- blocks here
  fb <- friendsOf b
  return (length (intersect fa fb))
• How do we allow the Monad to collect multiple data-fetches, so we
can execute them concurrently?
First, rewrite to use Applicative operators
▪ Applicative is a weaker version of Monad
▪ When we use Applicative, Haxl can collect multiple data fetches and
execute them concurrently.
numCommonFriends a b =
  length <$> (intersect <$> friendsOf a <*> friendsOf b)
class Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
class Monad m where
  return :: a -> m a
  (>>=) :: m a -> (a -> m b) -> m b
Applicative instance
instance Applicative Haxl where
  pure = return
  Haxl f <*> Haxl x = Haxl $ do
    f' <- f
    x' <- x
    case (f',x') of
      (Done g, Done y) -> return (Done (g y))
      (Done g, Blocked br c) -> return (Blocked br (g <$> c))
      (Blocked br c, Done y) -> return (Blocked br (c <*> return y))
      (Blocked br1 c, Blocked br2 d) -> return (Blocked (br1 <> br2) (c <*> d))
▪ <*> allows both arguments to block waiting for data
▪ <*> can be nested, letting us collect an arbitrary number of data
fetches to execute concurrently
(Some) Concurrency for free
▪ Applicative is a standard class in Haskell
▪ Lots of library functions are already defined using it
▪ These work concurrently when used with Haxl
▪ e.g.
sequence :: Monad m => [m a] -> m [a]
mapM :: Monad m => (a -> m b) -> [a] -> m [b]
filterM :: Monad m => (a -> m Bool) -> [a] -> m [a]
friendsLikeCPlusPlus = do
  friends <- getFriends
  cppFriends <- filterM likesCPlusPlus friends
  ...
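The batching behaviour of the Applicative instance, and of library functions built on it, can be checked with a toy model. As before, this is a sketch under assumptions and not the real Haxl library: `BlockedRequest` has a made-up shape, a list replaces `Seq`, `mockFriends` stands in for a remote service, and `runHaxl` is a hypothetical scheduler that counts rounds. The Applicative instance itself follows the slide.

```haskell
import Control.Monad (liftM)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.List (intersect)

-- A pending fetch: user id plus a slot for the answer (assumed shape).
data BlockedRequest = BlockedRequest Int (IORef [Int])

data Result a
  = Done a
  | Blocked [BlockedRequest] (Haxl a) -- a list stands in for the slide's Seq

newtype Haxl a = Haxl { unHaxl :: IO (Result a) }

instance Functor Haxl where
  fmap = liftM

-- As on the slide: when both sides block, their requests are merged.
instance Applicative Haxl where
  pure a = Haxl $ return (Done a)
  Haxl f <*> Haxl x = Haxl $ do
    f' <- f
    x' <- x
    case (f', x') of
      (Done g,        Done y)        -> return (Done (g y))
      (Done g,        Blocked br c)  -> return (Blocked br (g <$> c))
      (Blocked br c,  Done y)        -> return (Blocked br (c <*> return y))
      (Blocked br1 c, Blocked br2 d) -> return (Blocked (br1 <> br2) (c <*> d))

instance Monad Haxl where
  Haxl m >>= k = Haxl $ do
    r <- m
    case r of
      Done a       -> unHaxl (k a)
      Blocked br c -> return (Blocked br (c >>= k))

-- Made-up friend lists standing in for a remote service.
mockFriends :: Int -> [Int]
mockFriends uid = [uid + 1 .. uid + 5]

friendsOf :: Int -> Haxl [Int]
friendsOf uid = Haxl $ do
  slot <- newIORef []
  return (Blocked [BlockedRequest uid slot] (Haxl (Done <$> readIORef slot)))

-- Toy scheduler: returns the result and the number of fetch rounds.
runHaxl :: Haxl a -> IO (a, Int)
runHaxl = go 0
  where
    go n (Haxl m) = do
      r <- m
      case r of
        Done a -> return (a, n)
        Blocked reqs c -> do
          mapM_ (\(BlockedRequest uid slot) -> writeIORef slot (mockFriends uid)) reqs
          go (n + 1) c

-- Applicative style: both fetches are independent, so they share a round.
numCommonFriends :: Int -> Int -> Haxl Int
numCommonFriends a b =
  length <$> (intersect <$> friendsOf a <*> friendsOf b)

-- mapM over a list is built from Applicative operations in current base,
-- so all of its fetches land in a single round as well.
friendsOfAll :: [Int] -> Haxl [[Int]]
friendsOfAll = mapM friendsOf
```

In this model both `runHaxl (numCommonFriends 1 3)` and `runHaxl (friendsOfAll [1, 2, 3])` finish in one round, because every blocked request is collected before the scheduler fetches anything.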
Back to our example
▪ These behave the same:
▪ Data dependencies tell us we can translate one into the other
This is the version we want to write:
numCommonFriends a b = do
  fa <- friendsOf a
  fb <- friendsOf b
  return (length (intersect fa fb))
This is the version we want to run:
numCommonFriends a b =
  length <$> (intersect <$> friendsOf a <*> friendsOf b)
Applicative Do
▪ We implemented this transformation in the compiler
▪ Users just turn it on:
{-# LANGUAGE ApplicativeDo #-}
▪ ... and get automatic concurrency/batching when using “do”
▪ Semantics-preserving with existing code (if certain standard properties
hold), but provides better performance in some cases
▪ Extension pushed upstream to GHC (17/9/2015), will be in 8.0.1
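One way to see the extension at work, as a small sketch that is not taken from the talk: the hypothetical `Fetch` type below has `Functor` and `Applicative` instances but no `Monad` instance, so the `do` block compiles only because ApplicativeDo rewrites its independent binds into `<$>` and `<*>`. `Fetch`, its request log, and the mock friend lists are all assumptions for illustration.

```haskell
{-# LANGUAGE ApplicativeDo #-}

import Data.List (intersect)

-- A toy "plan": the requests a computation would issue, plus its result.
-- It has Functor and Applicative instances only; there is no Monad instance.
data Fetch a = Fetch [String] a

instance Functor Fetch where
  fmap f (Fetch reqs a) = Fetch reqs (f a)

instance Applicative Fetch where
  pure = Fetch []
  Fetch r1 f <*> Fetch r2 a = Fetch (r1 ++ r2) (f a)

-- Made-up data source: records the request and returns fixed data.
friendsOf :: Int -> Fetch [Int]
friendsOf uid = Fetch ["friendsOf " ++ show uid] [uid + 1 .. uid + 5]

-- Neither bind depends on the other, so ApplicativeDo needs only Applicative.
numCommonFriends :: Int -> Int -> Fetch Int
numCommonFriends a b = do
  fa <- friendsOf a
  fb <- friendsOf b
  pure (length (intersect fa fb))

-- Accessors for inspecting a plan.
requests :: Fetch a -> [String]
requests (Fetch reqs _) = reqs

result :: Fetch a -> a
result (Fetch _ a) = a
```

Removing the pragma should make `numCommonFriends` fail to type-check for lack of a `Monad Fetch` instance, which is a quick way to confirm that the rewrite is happening.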
Does this work in practice?
▪ In Sigma, our most common request executes
hundreds of fetches in under ten rounds.
▪ Performance problems that come up in code reviews
(and production) tend to be about fetching too much
data, almost never about concurrency
Caching
▪ In addition to concurrency, caching is crucial.
▪ “fetch only the data you need”
▪ e.g. if we have “numFriendsOf x” in two places in the codebase, we
should only fetch it once
▪ programmers are free to fetch data wherever, without fear of
duplication, or having to plumb data around
▪ Means that we could safely common-up identical expressions
friendCondition x y = do
  numFriendsOf x .> 100 .&&
    numFriendsOf x .> numFriendsOf y
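The effect of a per-request cache can be sketched as follows. This is an illustration under assumptions and not how Haxl implements its cache: the cache is a simple association list in an `IORef`, `fetchNumFriends` is a made-up stand-in for a remote call, and a miss counter is added only so the behaviour can be observed.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Per-request cache keyed by user id, plus a counter of real fetches.
data Cache = Cache
  { cacheEntries :: IORef [(Int, Int)]
  , cacheMisses  :: IORef Int
  }

newCache :: IO Cache
newCache = Cache <$> newIORef [] <*> newIORef 0

-- Stand-in for a slow remote call (made-up data).
fetchNumFriends :: Int -> IO Int
fetchNumFriends uid = return (uid * 10)

-- Look in the cache first; fetch and remember the answer only on a miss.
numFriendsOf :: Cache -> Int -> IO Int
numFriendsOf cache uid = do
  entries <- readIORef (cacheEntries cache)
  case lookup uid entries of
    Just n -> return n
    Nothing -> do
      n <- fetchNumFriends uid
      modifyIORef (cacheEntries cache) ((uid, n) :)
      modifyIORef (cacheMisses cache) (+ 1)
      return n

-- The slide's condition, written against the cached fetch.
friendCondition :: Cache -> Int -> Int -> IO Bool
friendCondition cache x y = do
  nx  <- numFriendsOf cache x
  nx' <- numFriendsOf cache x -- second use of x: served from the cache
  ny  <- numFriendsOf cache y
  return (nx > 100 && nx' > ny)
```

Although `numFriendsOf` is called three times, only two real fetches happen per request in this sketch, one for `x` and one for `y`.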
Side benefit of caching: replayability
▪ cache is a complete record of what was fetched.
▪ If we save the cache, we can re-run the request at any time without
fetching any data.
▪ Keep it as a unit test
▪ Debug an error that occurred in production
▪ Analyse performance
But caching by itself isn’t enough.
▪ We often want to re-use decisions in multiple places.
anySpammer =
  fpSpammer .|| otherSpamType .|| ...
▪ Clear and concise... but we’re recomputing fpSpammer
▪ Caching will avoid re-fetching everything that fpSpammer needs
▪ But it might be expensive to recompute
▪ Answer: memoization
▪ Everything top-level gets automatically memoized
▪ Manual memoization available for other things
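The idea behind memoizing a decision can be shown with a small helper. This is a generic sketch in plain `IO`, not Haxl's memoization API: `memo`, `expensiveRule` and `twoUses` are hypothetical names, and the run counter exists only to make the effect visible.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)

-- Wrap a computation so its body runs at most once; later calls reuse the value.
memo :: IO a -> IO (IO a)
memo action = do
  slot <- newIORef Nothing
  return $ do
    cached <- readIORef slot
    case cached of
      Just v  -> return v
      Nothing -> do
        v <- action
        writeIORef slot (Just v)
        return v

-- Hypothetical expensive rule that counts how many times it actually runs.
expensiveRule :: IORef Int -> IO Bool
expensiveRule runs = do
  modifyIORef runs (+ 1)
  return True

-- Use the same decision from two places; report both verdicts and the run count.
twoUses :: IO (Bool, Bool, Int)
twoUses = do
  runs <- newIORef 0
  fpSpammer <- memo (expensiveRule runs)
  a <- fpSpammer
  b <- fpSpammer
  n <- readIORef runs
  return (a, b, n)
```

Here the rule body runs once even though its verdict is consumed twice. The data cache alone would not give this, since it saves fetches but not the computation on top of them.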
Haxl project: experience
The Haxl project
▪ We had an existing system and home-grown DSL, called FXL, and lots
of code written in it
▪ Started April 2013
▪ By July 2015 we had deleted all the FXL code and replaced it with
Haskell, and trained our engineers to use Haskell.
1. Existing source code
▪ hundreds of thousands of lines of existing FXL code
▪ Impractical and error-prone to translate code by hand
▪ Wrote a tool to do the migration
▪ Source code still FXL (during the migration), run the tool each time the
code changes.
FXL Code → automatic translation → Haskell Code
2. Migrating running code to Haskell
▪ hundreds of different requests (one for each write action)
▪ had to ensure that each one:
▪ performed well enough
▪ gave exactly the same answers
▪ (otherwise we introduce false positives/negatives)
▪ As each request type is ready, we want to switch it over to running on
Haskell in production
At scale, the edge cases happen all the time
▪ Invalid Unicode
▪ Invalid arguments to primitives & library functions
▪ Exception behaviour, values of exceptions
▪ Is “NaN” a valid number in JSON? What about “infinity”?
▪ Floating-point:
▪ round 0.5
▪ printing floating-point values
▪ divide-by-zero throws in FXL
▪ semantics of \s in regex with Unicode
▪ etc. etc.
3. Migrating the users
▪ Dozens of FXL users in multiple
geographical locations
▪ Wrote a lot of teaching material
▪ Ran multi-day hands-on workshops
▪ Created internal Facebook group for
questions (“Haxl Therapy”)
▪ Haxl team helped with code reviews
How did users cope with the switch?
▪ Still committing happily
▪ Some struggles with do-notation vs. fmap, <$>, =<<
▪ “How do I convert a Haxl t to t?”
▪ Users started embracing the new features
▪ Started building abstractions, adding types, creating tests
▪ Unblocks some large-scale rewrites and redesigns of subsystems
Does it work?
▪ Is it stable enough?
▪ Is performance good enough?
▪ Did we have to hack the compiler?
▪ What about build systems, packages, cabal hell, etc. etc.
▪ How do we do live updates?
Sigma (C++)
FXLEngine (C++)
Data Sources (C++)
Thrift
Client Code (FXL)
Sigma (C++)
Data Sources (C++)
Thrift
Client Code (Haskell)
Execution layer (Haskell)
Haxl framework, libraries (Haskell)
Haxl Data Sources (Haskell/C++)
Found one bug in the GHC garbage collector
▪ There was a multi-year-old bug in the GC that caused our machines
to crash every few hours under constant load.
▪ This is the only runtime bug we’ve found
▪ (we found one more that only affected shutdown)
Rolled out new release with GC fix
Stability
▪ The Haskell code just doesn’t crash. (*)
▪ which is good, because diagnosing a crash in Haskell is Very Hard
▪ (*) except for FFI code
▪ One FFI-related memory bug was particularly hard to find
Performance
[Chart: Haxl performance vs. FXL. X-axis: request type (1 to 25); y-axis: relative performance (0 to 3.5); series: FXL, Haxl]
Overall: 30% better throughput for typical workload
Good monitoring is essential
▪ Noisy environment:
▪ multiple sources of changes
▪ dependencies on other services
▪ traffic isn’t constant
▪ spam/malware attacks are bursty
▪ Hooked up GHC’s garbage collector stats API to Facebook’s
monitoring infrastructure
Resource limits
▪ Server resources are finite, and we have latency bounds
▪ Our job is to keep the system responsive, so we cannot allow
individual requests to hog resources
▪ Fact of life: individual requests will sometimes hog resources if allowed
to
▪ Usually: doing some innocuous operation on atypically large data
▪ e.g. regex engines sometimes go out to lunch
Allocation limits
▪ So we implemented allocation limits in GHC
▪ Counter is per-thread, ticks down with memory allocation
▪ Triggers an exception when the counter reaches zero
▪ Easy in Haskell, very difficult in C++
setAllocationCounter :: Int64 -> IO ()
getAllocationCounter :: IO Int64
enableAllocationLimit :: IO ()
disableAllocationLimit :: IO ()
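A minimal sketch of how these four functions might be combined to cap one piece of work, assuming a recent GHC where they are exported from `System.Mem` and `AllocationLimitExceeded` from `Control.Exception`. The wrapper `limitAllocation` and the workload `bigWork` are hypothetical names for illustration. The limit applies to the calling thread only.

```haskell
import Control.Exception (AllocationLimitExceeded (..), evaluate, try)
import Data.Int (Int64)
import System.Mem (disableAllocationLimit, enableAllocationLimit, setAllocationCounter)

-- Run an action with a budget of roughly `bytes` of allocation on this thread.
-- If the budget runs out, the runtime raises AllocationLimitExceeded in the
-- thread, which is caught here and returned as a Left.
limitAllocation :: Int64 -> IO a -> IO (Either AllocationLimitExceeded a)
limitAllocation bytes action = do
  setAllocationCounter bytes
  enableAllocationLimit
  outcome <- try action
  disableAllocationLimit -- stop enforcing the limit for whatever runs next
  return outcome

-- Deliberately allocation-heavy: builds a large Integer and its decimal string.
bigWork :: IO Int
bigWork = evaluate (length (show (product [1 .. 5000 :: Integer])))
```

With a budget of about 1 MB, `limitAllocation (1024 * 1024) bigWork` is expected to come back as `Left AllocationLimitExceeded`, while a cheap action under a generous budget returns `Right` with its value. A production version would also need to handle asynchronous exceptions around the enable/disable pair, which this sketch ignores.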
How important are allocation limits?
one heavyweight request was enabled
allocation limits enabled
GC vs. latency
▪ GHC has a stop-the-world parallel collector
▪ obviously to meet our latency goals we cannot GC for too long
GC vs. latency
▪ So how do we manage this?
▪ Fixed number of worker threads + allocation limits
▪ effectively puts a bound on the amount of work we are doing at any given time
▪ Very little persistent state (a few MB).
▪ A handful of GC improvements, all upstreamed
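The "fixed number of worker threads" idea can be sketched as a small pool that serves a shared job queue. This is a minimal sketch using only `base`, not the Sigma implementation; `Pool`, `newPool` and `submit` are hypothetical names. Because the number of workers is fixed, at most that many requests are allocating at once, and each job could additionally be run under an allocation limit to bound the work per request.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)
import Control.Monad (forever, join, replicateM_)

-- | Hypothetical pool: a shared queue of jobs served by a fixed set of threads.
newtype Pool = Pool (Chan (IO ()))

-- | Start exactly n worker threads; at most n jobs run at any moment.
newPool :: Int -> IO Pool
newPool n = do
  jobs <- newChan
  replicateM_ n (forkIO (forever (join (readChan jobs))))
  pure (Pool jobs)

-- | Queue an action, wait for a worker to run it, and return its result.
-- An exception raised by the action is rethrown in the caller.
submit :: Pool -> IO a -> IO a
submit (Pool jobs) action = do
  result <- newEmptyMVar
  writeChan jobs (try action >>= putMVar result)
  takeMVar result >>= rethrow
  where
    rethrow :: Either SomeException b -> IO b
    rethrow = either throwIO pure
```

Catching the exception inside the job keeps a worker alive when a request fails, so the pool size stays constant.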
Hot-swapping code
Fast deployment
▪ The faster we can get new rules into production, the more spam we catch before people see it, the faster we stop viral malware, etc.
▪ (Not all changes need immediate deployment: code review is the norm)
▪ “Code in the repo is what is running in production”
▪ Deployment typically on the order of a few minutes
How can we deploy new code?
▪ Haskell has an optimising compiler, runs native code
▪ Haskell code needs to be compiled and distributed to servers
▪ Servers need to start running the new code somehow
Restarting the process doesn’t work
▪ Takes a while to start up
▪ Caches would be cold
▪ A rolling restart would take too long
Live updates
Sigma (C++)
Data Sources (C++)
Client Code (Haskell)
Execution layer (Haskell)
Haxl framework, libraries (Haskell)
Haxl Data Sources (Haskell/C++)
New Client Code (Haskell)
Live Updates
▪ Main idea
▪ load the new code directly into the running process
▪ needs a dynamic linker
▪ Start taking requests using the new code
▪ When all requests running on the old code have finished, remove it from the process
▪ GHC’s runtime has a built-in linker
▪ We added support for unloading objects with GC integration
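The bookkeeping behind "start taking requests on the new code, remove the old code once its requests have finished" can be sketched as follows. This models only the hand-over logic: loaded code is represented by an ordinary Haskell function, and unloading by a caller-supplied `IO ()` action standing in for the runtime linker. All names (`Version`, `Server`, `handleRequest`, `swapCode`) are hypothetical, and the sketch ignores asynchronous-exception safety.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, modifyMVar_, newMVar, withMVar)
import Control.Exception (finally)
import Control.Monad (when)

-- | One loaded version of the client code (all names here are hypothetical).
data Version req resp = Version
  { runRequest :: req -> IO resp     -- entry point into this version
  , unloadCode :: IO ()              -- stand-in for unloading the object code
  , usage      :: MVar (Int, Bool)   -- (requests in flight, superseded?)
  }

-- | The server only remembers which version new requests should use.
newtype Server req resp = Server (MVar (Version req resp))

newVersion :: (req -> IO resp) -> IO () -> IO (Version req resp)
newVersion run unload = Version run unload <$> newMVar (0, False)

newServer :: (req -> IO resp) -> IO () -> IO (Server req resp)
newServer run unload = Server <$> (newVersion run unload >>= newMVar)

-- | Serve a request on whichever version is current when it arrives.
handleRequest :: Server req resp -> req -> IO resp
handleRequest (Server current) req = do
  v <- withMVar current $ \cur -> do
    modifyMVar_ (usage cur) (\(n, old) -> pure (n + 1, old))
    pure cur
  runRequest v req `finally` release v

-- | Drop one use; the last request on a superseded version unloads it.
release :: Version req resp -> IO ()
release v = do
  lastOne <- modifyMVar (usage v) (\(n, old) -> pure ((n - 1, old), old && n == 1))
  when lastOne (unloadCode v)

-- | Switch new requests to new code; unload the old code once it is idle.
swapCode :: Server req resp -> (req -> IO resp) -> IO () -> IO ()
swapCode (Server current) run unload = do
  new <- newVersion run unload
  old <- modifyMVar current (\v -> pure (new, v))
  idle <- modifyMVar (usage old) (\(n, _) -> pure ((n, True), n == 0))
  when idle (unloadCode old)
```

A request registers itself while holding the `current` lock, so a version that has been swapped out can never gain new users, and its unload action runs exactly once: either in `swapCode` if it is already idle, or in `release` when its last request finishes.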
Other things
Don’t have time to talk about...
▪ Build systems
▪ (we use Stackage LTS + cabal + FB build system)
▪ FFI with C++
▪ Wrote a tool to mangle C++ function names
▪ Don’t forget: catch all your exceptions in C++
▪ Customized GHCi
▪ we built our own customized GHCi that uses the Haxl monad by default, and has Hoogle integration for our codebase
Questions?
The Haxl Team, past and present
Jonathan Coens
Bartosz Nitka
Aaron Roth
Kubo Kováč
Katie Miller
Jon Purdy
Zejun Wu
Jake Lengyel
Andrew Farmer
Louis Brandy
Noam Zilberstein
Simon Marlow