1. when erlang makes sense ( erlang at hover.in )
http://developers.hover.in
Bhasker V Kode, co-founder & CTO at hover.in
at devCamp, Bangalore, Apr 11, 2009
2. brief introduction to hover.in
choose words from your blog, & decide what content / ad you want when you hover* over them
* or other events like click, right click, etc.
3. brief introduction to hover.in
or... the world's first publisher-driven in-text content & ad delivery platform...
4. brief introduction to hover.in
or lets web publishers push client-side event handling to the cloud, to run various rich applications called hoverlets
but bacteria .... around for millions of years ... so this talk is going to be about what we can learn from bacteria, the brain, and memory in a concurrent world
now that they recognize each other, try
rpc:call(TargetNode, Module, Fn, Args).
spawn(Node, Fn).
spawn(Node, Module, Fn, Args).
12. this is what erlang looks like...
lists:foldl(fun(X, Accumulator) ->
                Rem = X rem 2,
                case Rem of
                    0 -> Accumulator ++ [X];
                    _ -> Accumulator
                end
            end, [], lists:seq(1, 10)).
[2,4,6,8,10]
13. bacteria that exhibit group dynamics
bacteria perform group operations much like a lists fold operation
each bacterium spawns its own set of proteins; only when the Accumulator is > some threshold will the group dynamics of making light (bioluminescence) kick in ( eg: deep sea animals )
all bacteria have some sort of presence & replies associated, which are queried
14.
in erlang, you can create a new concurrent process that evaluates a Fun.
Pid = spawn(fun() ->
                %% do something
                ok
            end).
Pid ! Msg.    %% message sending is asynchronous
erlang can be set to utilize smp support
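the spawn and ! lines above can be put together into a tiny round trip. this is only a sketch: the module name echo and the ping/pong message shapes are made up for illustration, not taken from hover.in's code.

```erlang
-module(echo).
-export([start/0, ping/1]).

%% Spawn a process that answers pings forever.
start() ->
    spawn(fun loop/0).

%% Send a ping (asynchronous) and then wait up to one second for the reply.
ping(Pid) ->
    Pid ! {ping, self()},
    receive
        pong -> pong
    after 1000 ->
        timeout
    end.

loop() ->
    receive
        {ping, From} ->
            From ! pong,
            loop()
    end.
```

the send itself never blocks; only the receive in ping/1 waits, and the after clause keeps it from waiting forever.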
15. what does smp mean
SMP stands for symmetric multiprocessing: the erlang VM runs one scheduler thread per cpu core, so processes execute in parallel across all cores
processes still share no memory and communicate only through asynchronous messages
processes can be connected together ( linked ), eg: a parent with the children it spawns
16. pattern matching
each molecule connects to its specific receptor protein to complete the missing piece, to ignite the group behaviours that are only successful when all of the cells participate in unison.
Type = case UserType of
           user -> true;
           admin -> true;
           _Else -> false
       end
17. binary pattern matching
Words = [ , , ],
Suggest = fun(UserTyped) ->
              lists:foldl(fun(X, Acc) ->
                              Size = Get_size(UserTyped), %% 8 bits per char
                              case UserTyped of
                                   -> Acc ++ X;
                                  _Else -> Acc
                              end
                          end, [], Words)
          end,
Suggest() gives [ , ]
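a runnable sketch of the same idea, assuming the goal is prefix-based suggestions over a list of binaries. the module name suggest and the sample words in the usage note are hypothetical; byte_size/1 stands in for the slide's Get_size.

```erlang
-module(suggest).
-export([suggest/2]).

%% Return the words in Words that start with the binary Prefix.
suggest(Prefix, Words) ->
    Size = byte_size(Prefix),
    lists:foldl(
      fun(Word, Acc) ->
              case Word of
                  %% Prefix and Size are already bound, so this clause
                  %% matches only when Word begins with those exact bytes.
                  <<Prefix:Size/binary, _Rest/binary>> -> Acc ++ [Word];
                  _Else -> Acc
              end
      end, [], Words).
```

eg: suggest:suggest(<<"hov">>, [<<"hover">>, <<"hoverlet">>, <<"erlang">>]) gives [<<"hover">>, <<"hoverlet">>].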
18. fault-tolerance
father of DNA, says that all humans have 10 places in our genome, where we have lost one or gained another one
erlang catches timeouts for receiving a reply
once you spawn a process, you can monitor it & link errors, or use the erlang/OTP architecture to supervise processes and set restart strategies
reportedly enabled ericsson to run with nine 9's uptime ( 99.9999999 % )
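monitoring plus a receive timeout can be sketched as below. the module name watch, the reply shape and the one second limit are assumptions for illustration only.

```erlang
-module(watch).
-export([run/1]).

%% Run Fun in a monitored process and wait up to one second for
%% either its result or the news that it died.
run(Fun) ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() -> Parent ! {self(), Fun()} end),
    receive
        {Pid, Result} ->
            %% Got the reply; drop the monitor and any pending 'DOWN'.
            erlang:demonitor(Ref, [flush]),
            {ok, Result};
        {'DOWN', Ref, process, Pid, Reason} ->
            %% The worker exited before replying.
            {error, Reason}
    after 1000 ->
            erlang:demonitor(Ref, [flush]),
            exit(Pid, kill),
            {error, timeout}
    end.
```

the caller survives all three outcomes: a normal reply, a crash in the worker, or a worker that takes too long.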
19. fault tolerance in the real world
for a single google search result, the same request is sent to multiple machines ( ~1000 as of '09 ); whichever replies the quickest wins.
amazon's dynamo architecture uses a (3,2,2) rule, ie: maintain 3 copies of the same data; reads/writes are successful only when 2 concurrent requests succeed. This ratio varies based on SLA, internal vs public service. (more on conflict resolution...)
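the write half of such a rule can be sketched with plain processes standing in for replicas. this is a toy model, not dynamo's actual protocol: the module name quorum, the message shapes and the 500 ms wait are all assumptions.

```erlang
-module(quorum).
-export([write/3, replica/0]).

%% A toy replica: acknowledges every write it receives.
replica() ->
    receive
        {write, From, Ref, _Value} ->
            From ! {ack, Ref},
            replica()
    end.

%% Send Value to every replica and succeed once W acks have arrived.
write(Replicas, Value, W) ->
    Ref = make_ref(),
    [Pid ! {write, self(), Ref, Value} || Pid <- Replicas],
    wait_acks(Ref, W).

wait_acks(_Ref, 0) ->
    ok;
wait_acks(Ref, Needed) ->
    receive
        {ack, Ref} -> wait_acks(Ref, Needed - 1)
    after 500 ->
        {error, quorum_not_reached}
    end.
```

with 3 replicas and W = 2, one slow or dead replica does not fail the write. acks that arrive after the quorum is reached simply stay in the caller's mailbox in this sketch.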
20. inter-species communication
if you look at your skin, it consists of very many different species, but all bacteria are found to communicate using one common chemical language.
21. inter-species communication
if you look at your skin, it consists of very many different species, but all bacteria are found to communicate using one common chemical language.
hmmmmm.......... serialization ?! ....a common protein interpreter ?! ....or perhaps just-in-time protein compilation ?!
22. interspecies comm. in the real world
attempts at serialization, cross-language communication include:
thrift ( by facebook )
protocol buffers ( by google )
base64 en/decoding, port based communication ( erlang <-> python at hover.in )
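a minimal sketch of the base64 route on the erlang side, assuming terms are first serialized with term_to_binary/1. the module name wire is made up, and the python end (which would need a decoder for erlang's external term format) is not shown.

```erlang
-module(wire).
-export([encode/1, decode/1]).

%% Serialize any erlang term to a base64 binary that is safe to send
%% over a text-only channel such as a port's stdin/stdout.
encode(Term) ->
    base64:encode(term_to_binary(Term)).

%% Reverse of encode/1. The [safe] option refuses to create new atoms
%% from untrusted input.
decode(Base64) ->
    binary_to_term(base64:decode(Base64), [safe]).
```

whatever term goes into encode/1 comes back unchanged from decode/1, eg: a tuple holding a binary and a list.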
as for interspecies communication look no further:
23. interspecies comm. in the real world
as for interspecies communication in the real world... look no further: JRuby !!!! IronPython !!! GWT !!!
...& coming to a theatre near you this..... FortScala ??? FlashTheApple++ ??? VisualJavaTranScriptual ????
24. talking about scaling
The brain of the worker honeybee weighs about 1mg; the total number of neurons in its brain is estimated to be 950,000
in essence, the left & right brain vary in:
left -> persistent disk, handles past/future
right -> temporal caches!, handles present
26. what we can learn
why sharding data is critical in concurrent programming ( DB tx'ion locks )
implementing flowcontrol ( eg: memory game )
event based alarms, timeouts, supervised workers
wrt performance/scaling, you can't improve what you can't measure
27. working with data concurrently?
typical web backends keep all user data in one table, then clustering just splits that on an arbitrary basis. Query content table where user=user1,
what if you have N concurrent processes accessing N different user tables? no locks; you can parallelize since they are sufficiently unrelated.
same with mapreduce algos: if the data is sufficiently unrelated, it is easily parallelized, and results can come back asynchronously
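when the pieces of work really are unrelated, a parallel map is enough to spread them over processes. a minimal sketch, with the module name pmap chosen for illustration; Fun could be anything that handles one user's data.

```erlang
-module(pmap).
-export([pmap/2]).

%% Apply Fun to every element of List in its own process and collect
%% the results in the original order.
pmap(Fun, List) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, Fun(X)} end),
                Ref
            end || X <- List],
    %% Replies may arrive in any order; matching on each Ref in turn
    %% puts them back in input order.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

eg: pmap:pmap(fun(X) -> X * 2 end, [1, 2, 3]) gives [2,4,6]. since the workers are linked, a crash in any one of them also takes down the caller, which is the simplest policy for a sketch.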
28. retrieving data concurrently?
replication vs location transparency: are they fragmented, are some nodes read-only ? (rpc...)
need metadata for which node to access for user1 ( or use a hashing fn like memcache )
are tables in-memory ( right brain ), cached from disk, or on disk alone ( left brain )
mnesia, erlang's inbuilt ~database, lets you make highly granular choices
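the hashing option can be sketched with erlang:phash2/2. the module name shard and the node names in the usage note are hypothetical.

```erlang
-module(shard).
-export([node_for/2]).

%% Pick the node responsible for Key from a non-empty list of nodes.
%% erlang:phash2(Term, Range) returns an integer in 0..Range-1.
node_for(Key, Nodes) ->
    Index = erlang:phash2(Key, length(Nodes)),
    lists:nth(Index + 1, Nodes).
```

eg: shard:node_for(<<"user1">>, ['a@host', 'b@host', 'c@host']) always returns the same node for the same key and node list. the trade-off is that changing the node list remaps most keys, which is where a metadata table or consistent hashing would come in.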
29. measurement
you can't improve what you can't measure.
introducing the heat-seeking algo at hover.in
using tsung ( written in erlang again ), a load/performance testing tool, for simulating 100's of concurrent users/requests; great for analysing bottlenecks of your system, benchmarking content delivery networks ( CDNs ), etc
30. built-in datatypes
atoms, integers, floats, tuples, lists
binary
dict
queue, sets, ordsets
gb_trees
and unlike mysql, you can store complex datatypes in mnesia, and pattern match them
31. temporal data
erlang/OTP comes with building blocks for making granular choices in data structures:
gen_server -> client/server architecture for processes
gen_event -> event based
gen_fsm -> finite state machine, etc
move over db, files. A single spawned erlang process can hold state. we use it to build our own set/get based cache workers.
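a set/get cache worker of that kind might look roughly like this. it is a sketch, not hover.in's implementation: the module name cache, the message shapes and the one second read timeout are assumptions, and the state is kept in a dict.

```erlang
-module(cache).
-export([start/0, set/3, get/2]).

%% Start a cache worker; its state lives in the argument of loop/1.
start() ->
    spawn(fun() -> loop(dict:new()) end).

%% Asynchronous write: fire and forget.
set(Pid, Key, Value) ->
    Pid ! {set, Key, Value},
    ok.

%% Synchronous read: returns {ok, Value}, not_found or {error, timeout}.
get(Pid, Key) ->
    Ref = make_ref(),
    Pid ! {get, self(), Ref, Key},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        {error, timeout}
    end.

loop(State) ->
    receive
        {set, Key, Value} ->
            loop(dict:store(Key, Value, State));
        {get, From, Ref, Key} ->
            Reply = case dict:find(Key, State) of
                        {ok, Found} -> {ok, Found};
                        error -> not_found
                    end,
            From ! {Ref, Reply},
            loop(State)
    end.
```

the state is never shared: the only way to read or change it is to send the worker a message, so no locks are needed.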
32. temporal data in the real world
you listen to a phone number in batches of 3 or 4 digits. the part that absorbs it just before writing is temporal, until you write it into your contact book or memorize it ( persistent )
a smart way of building counters that move ultra-fast in erlang would be a gen_server that resides in-memory, accepting requests via a flowcontrol, getting a new state, and then writing to disc/db when convenient.
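such a counter could be sketched as the gen_server below. the module name counter, the 5 second flush interval and the empty persist/1 placeholder are assumptions; the real disc/db write and the flowcontrol in front of it are left out.

```erlang
-module(counter).
-behaviour(gen_server).
-export([start_link/0, incr/1, value/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(FLUSH_EVERY_MS, 5000).  %% assumed flush interval

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Asynchronous cast: the caller never waits for the counter.
incr(Pid) ->
    gen_server:cast(Pid, incr).

%% Synchronous call: blocks until the current count comes back.
value(Pid) ->
    gen_server:call(Pid, value).

init([]) ->
    erlang:send_after(?FLUSH_EVERY_MS, self(), flush),
    {ok, 0}.

handle_cast(incr, Count) ->
    {noreply, Count + 1}.

handle_call(value, _From, Count) ->
    {reply, Count, Count}.

%% Periodically hand the in-memory count to persistent storage.
handle_info(flush, Count) ->
    persist(Count),
    erlang:send_after(?FLUSH_EVERY_MS, self(), flush),
    {noreply, Count}.

%% Placeholder for the real disc/db write.
persist(_Count) ->
    ok.
```

increments only touch memory, so they stay fast; the disc/db sees one write per interval instead of one per request. incr/1 and value/1 also show the cast vs call split.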
33. more on synchronous, asynchronous
gen_servers let you make synchronous calls that are blocking ( wait till it returns with the result ), and can catch timeouts, restarts, etc.
or you can make non-blocking asynchronous casts ( send the instruction to send a mail, and return to the thank-you page immediately; don't wait for the mail to be sent )
of course, you can also use spawn and pids to write your own implementation of the gen_*'s
34. flowcontrol
queues to handle intents of reads/writes, to determine bottlenecks ( eg: your own, rabbitmq, etc )
eg1: when we add jobs to the queue, if it consistently takes greater than X we move it to a high-traffic bracket and do things differently, possibly adding workers or ignoring, based on the task.
eg2: amazon shopping carts are known to be extra resilient to write failures ( don't mind multiple versions of them over time )
35. supervisors, workers
as bacteria grow, they split into two. when muscle tears, it
knows exactly what to replace.
erlang supervisors can decide restart policies: if one worker fails, restart all ( one_for_all ) .... or if one worker fails, restart just that worker ( one_for_one ), more tweaks.
can spawn multiple workers on the fly, much like the need for launching a new ec2 instance
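a restart policy is just data returned from the supervisor's init/1. a minimal sketch, assuming a single stand-in worker; the module name demo_sup, the child id and the restart limits ( 5 restarts in 10 seconds ) are made up for illustration.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: restart only the worker that died.
    %% Use one_for_all to restart every child when any one fails.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% A stand-in worker: a linked process that idles until it is stopped
%% or killed.
start_worker() ->
    Pid = spawn_link(fun() -> receive stop -> ok end end),
    {ok, Pid}.
```

killing the worker makes the supervisor start a fresh one with a new pid. for adding workers on the fly, supervisor:start_child/2 takes a child spec of the same shape.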
36. how do you know where to start/stop
build your own bacteria antidote: stress tests to see, on your typical production server:
how many processes can you create, how many open file sockets can you have ( system limitations, tweak )
how many tables can you store,
how many rows will it take before it gets slow ( time to fragment )
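before stress testing, the VM itself can report some of its limits. a small sketch using erlang:system_info/1; the module name limits is made up, and OS-level limits such as open file descriptors still have to be checked outside erlang.

```erlang
-module(limits).
-export([report/0]).

%% Collect a few VM limits and current counts as a list of pairs.
report() ->
    [{process_limit, erlang:system_info(process_limit)},
     {process_count, erlang:system_info(process_count)},
     {port_limit,    erlang:system_info(port_limit)},
     {port_count,    erlang:system_info(port_count)},
     {schedulers,    erlang:system_info(schedulers)}].
```

comparing process_count against process_limit under load shows how close a node is to the ceiling before it needs tweaking.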
learn from the brain, great videos on ted.com
37. summary of tech at hover.in
3 node cluster ( 64-bit, 4gb ) on the LYME stack
python crawler, associated NLP parsers, mini metadata interpreter in client-side js, maybe moving to spidermonkey
remote node debugger/handler, flowcontrol, heat-seeking, cpu time-slicing algos, headless-firefox for thumbnails queue
touching 500k hovers/month in Apr'09, upwards of 25 million S3 GET requests/month