erlang at hover.in , Devcamp Blr 09

download erlang at hover.in , Devcamp Blr 09

If you can't read please download the document

Transcript of erlang at hover.in , Devcamp Blr 09

  1. 1. when erlang makes sense ( erlang at hover.in ) http://developers.hover.in Bhasker V Kode co-founder& CTO at hover.in at devCamp, Bangalore Apr 11, 2009
  2. 2. brief introduction to hover.in choose words from your blog, & decide what content / ad you want when you hover* over it * or other events like click,right click,etc http://developers.hover.in
  3. 3. brief introduction to hover.in or... the worlds first publisher drivenin-text content & ad delivery platform... http://developers.hover.in
  4. 4. brief introduction to hover.in or lets web publishers push client-side event handling to the cloud ,to run various rich applications calledhoverlets http://developers.hover.in
  5. 5.
    • hover.in founded late 2007
    http://developers.hover.in http://developers.hover.in
  6. 6.
    • hover.in founded late 2007
    • the web ~ 10- 20 years old
    http://developers.hover.in http://developers.hover.in
  7. 7.
    • hover.in founded late 2007
    • the web ~ 10- 20 years old
    • humans 100's of thousands of years
    http://developers.hover.in http://developers.hover.in
  8. 8.
    • hover.in founded late 2007
    • the web ~ 10- 20 years old
    • humans 100's of thousands of years
    • butbacteria .... around for millions of years ... so this talk is going to be about what we can learn from bacteria, the brain , and memory in a concurrent world
    http://developers.hover.in http://developers.hover.in
  9. 9. where are we heading
    • the C10k problem & beyond
    • disk , hosting getting comparitively cheaper
    • horizontal scaling, moving beyond a single node
    • era of multi-core computing, how do we utilize more cpu's packed into each chip
    • concurrent programming
    http://developers.hover.in
  10. 10. erlang
    • functional programming language , ideal for concurrent,distributed, fault-tolerant applications
    • relies on message passing
    • immutable state, .erl files compiled to .beam
    • can explicitly decide if tasks need to done:
      • syncronously or asychronously
      • in parallel (concurrent) , or distributed (multi-node)
    http://developers.hover.in
  11. 11. demo
    • >
    • pong.
    • now that they recognize each other , tryrpc:call (TargetNode,Module,Fn,Args). spawn ( Node , Fn )spawn ( Node, Module, Fn, Args ) .
    http://developers.hover.in
  12. 12. this is how erlang looks like...
    • lists: foldl ( fun( X , Accumulator ) -> Rem = X rem 2,case Rem of 0 ->Accumulator ++ [X]; _ ->Accumulator
    • end,[ ], lists: seq( 1, 10 )).[2,4,6,8,10]
    http://developers.hover.in
  13. 13. bacteria that exhibit group dynamics
    • bacteria peforms group operation like a lists fold operation
    • each bacteria spawns its own set of proteins, when only when theAccumulatoris > some threshhold, will group dynamics of making light (bioluminiscence) kick ( eg: deep sea animals)
    • All bacteria have some sort ofsome presence & replies associated which are queried
    http://developers.hover.in
  14. 14.
    • in erlang, you can create a new concurrent process that evaluates a Fun.
        • Pid = spawn (fun() ->%% do somethingend )
    • Pid !Msg , message sending is asynchronous
    • erlang can be set to utilize smp support
    http://developers.hover.in
  15. 15. what does smp mean
    • SMP (Structured Message Passing) supports the dynamic construction of process families that communicate through asynchronous messages
    • Process families can be connected together
    • Each process can communicate with its parent, its children, and a subset of its siblngs, as specified by the family topology
    http://developers.hover.in
  16. 16. pattern matching
    • each molecule connects to its specific receptor protein to complete the missing piece,to ingite the group behaviour that are only succesful when all of the cells participate in unison.
    • Type = case UserType ofuser -> true; admin -> true; _Else-> false end
    http://developers.hover.in
  17. 17. binary pattern matching
    • Words= [ , , ],
    • Suggest= fun(UserTyped)-> lists:foldl ( fun(X, Acc) -> Size = Get_size(UserTyped), %% 8bits per char case UserTyped of -> Acc ++ X; _Else > Accend end , [] , Words ) end) , Suggest()gives[,]
    http://developers.hover.in
  18. 18. fault-tolerance
    • father of DNA, says that all humans have 10 places in our genome, where we have lost one or gained an another one
    • erlang catches timeouts for receiving a reply once you spawn a process, you can monitor it & link errors, or use the erlang/OTP architecture to supervise process's and set restart strategies
    • enabled ericcson to run with 9 9's uptime (99.999999999 %)
    http://developers.hover.in
  19. 19. fault tolerance in the real world
    • for a singlegoogle search result , the same requests are sent to multiple machines( ~1000 as of 09), which ever replies the quickest wins.
    • inamazon's dynamo architecturethat powers S3, use a (3,2,2) rule . ie Maintain 3 copies of the same data, reads/writes are succesful only when 2 concurrent requests succeed. This ratio varies based on SLA, internal vs public service. (more on conflict resolution...)
    http://developers.hover.in
  20. 20. inter-species communication
    • if you look at your skin consists of very many different species, but all bacteria found to communicate using one common chemical language.
    http://developers.hover.in
  21. 21. inter-species communication
    • if you look at your skin consists of very many different species, but all bacteria found to communicate using one common chemical language.hmmmmmmmmmmmmmmmmmmm.............. ....serialization ?! ....a common protein interpret ?! ....or perhaps just in time protein compilation?!
    http://developers.hover.in
  22. 22. interspecies comm. in the real world
    • attempts atserialization, cross language communication include:
      • thrift( by facebook)
      • protocol buffers( by google)
      • base64 en/decoding , port based communication ( erlangpython at hover.in )
    • as for interspecies communication look no further:
    http://developers.hover.in
  23. 23. interspecies comm. in the real world
    • as for interspecies communication in the real world... look no further: JRuby !!!!IronPython!!!GWT !!!...& coming to a theatre near you this..... FortScala ???FlashTheApple++ ??? VisualJavaTranScriptual ????
    http://developers.hover.in
  24. 24. talking about scaling
    • The brain of the worker honeybee weighs about 1mg, the total number of neurons in its brain is estimated to be 950,000
    • Flies acrobatically , ecognizes patterns, navigates , norages , communicates
    • Energy consumption: 1015 J/op, at least 106 more efficient than digital silicon
    http://developers.hover.in
  25. 25. the human brain
    • 100 billion neurons, stores ~100 TB
    • Differential analysis e.g., we compute color
    • Multiple inputs: sight, sound, taste, smell, touch
    • Facial recognition subcircuits, peripheral vision
    • in essence- the left & right brain vary in: left -> persistent disk , handles past/future right -> temporal caches! , handles present
    http://developers.hover.in
  26. 26. what we can learn
    • whysharding datais critical in concurrent programming ( DB tx'ion locks )
    • implementingflowcontrol ( eg: memory game)
    • event based alarms, timeouts,supervised workers
    • wrt performance/scaling , you can't improve what you cantmeasure
    http://developers.hover.in
  27. 27. working with data concurrently?
    • typical web backends all user data in one table then clustering justsplits that by artibary basis. Query content table where user=user1,
    • what if you have N concurrentprocess's accessing N diff user tables no locks, you can parallize since sufficiently un-related.
    • same with mapreduce algo's, if the data issufficientlyunrelated, then parallelized easy, results can come back asynchronously
    http://developers.hover.in
  28. 28. retrieving data concurrently?
    • replicationvslocation transparency, are they fragmented, are some nodes read-only ? (rpc...)
    • need metadata for which node to acess for user1, (or use hashing fn like memcache)
    • are tables in-memory (right brain ), cached from disk , or on disk alone ( left brain )
    • mnesia, erlang's inbuilt ~database lets you make highly granular choices
    http://developers.hover.in
  29. 29. measurement
    • you can't improve what you can't measure.
    • introducing theheat-seeking algoat hover.in
    • usingtsung(written in erlang again ) load performance testing tool, for simulating 100's of concurrent users/requests , and great for analysing bottlenecks of your system, benchmarking (content delivery networks (CDN's ) , etc
    http://developers.hover.in
  30. 30. built-in datatypes
    • atoms, integers, floats, tuples, lists
    • binary
    • dict
    • queues, sets , ord_sets
    • gb_trees
    • and unlike mysql, you can store complex datatypes into mnesia , and pattern match them
  31. 31. temporal data
    • erlang/OTP comes with building blocks for making granular choices in data structures:
      • gen_servers -> client/server architecture for process's
      • gen_event -> event based
      • get_fsm -> finite state machine , etc
    • move over db, files. A single erlang process spawnedcan hold state. we use it to build own own set/get based cache workers.
    http://developers.hover.in
  32. 32. temporal data in the real world
    • you listen to a phone number in batche of 3 or 4 digits. the part that absorbs just before writing (temporal), until you write into your contact book or memorize it ( persistent)
    • a smart way of building counters that move ultra-fast in erlang, would be a gen_server that resides in-memory, accepting requests via a flowcontrol , getting a new state , and then writing to disc/db when conveneient.
    http://developers.hover.in
  33. 33. more on sychcronous,asynchronous
    • gen_sever's let you makesynchonouscall's that are blocking ( wait till it returns with the result ), and can catch timeouts, restarts,etc.
    • or can be non-blocking asynchronouscasts ( send the instruction of sending a mail,and return to thank you page immediately, dont wait for the mail to be sent.
    • ofcourse or you use the spawn,pid to write your own implementation of gen_*'s
    http://developers.hover.in
  34. 34. flowcontrol
    • queues to handle intents of reads/writes, to determine bottlenecks.(eg ur own,rabbitmq,etc )
    • eg1:when we addjobs to the queue, if it takes greater than X consistently we move it to high traffic bracket, do things differently, possibly add workersorignore based on the task.
    • eg2:amazon shopping carts, are known to be extra resilient to write failures, (dont mind multiple versions of them over time)
    http://developers.hover.in
  35. 35. supervisors, workers
    • as bacteria grow, they split into two. when muscle tears, it knows exactly what to replace.
    • erlang supervisors can decide restart policies: if one worker fails, restart all .... or if one worker fails, restart just that worker, more tweaks.
    • can spawn multiple workers on the fly, much like the need for launching a new ec2 instant
    http://developers.hover.in
  36. 36. how do you know where to start/stop
    • build your ownbacteria antidote ,stress tests to see, on your typical production server :
      • how many process's can u create, how many open file sockets can you have,( system limitations, tweak )
      • how mant tables can you store,
      • how many rows will it take before it getsslow ( time to fragment )
      • learn from the brain, gr8 videos on ted.com
  37. 37. summary of tech at hover.in
    • 3 node cluster (64-bit 4gb )on the LYME stack
    • python crawler, associated NLP parsers, mini metadata interpreter in client-side js , maybe moving to spidermonkey
    • remote node debugger/handler, flowcontrol, heat-seeking, cpu time-splicing algo's, headless-firefox for thumbnails queue
    • touching 500k hovers/month in Apr'09 , upwards of 25million S3 GET requests/month
    http://developers.hover.in
  38. 38. summary of our erlang modules
    • rewrites.erl error.erl frag_mnesia.erl hi_api_response.erlhi_appmods_api_user.erlhi_cache_app.erl , hi_cache_sup.erlhoverlingo.erl hi_cache_worker.erlhi_classes.erlhi_community.erl hi_cron_hoverletupdater_app.erlhi_cron_hoverletupdater.erl hi_cron_hoverletupdater_sup.erlhi_cron_kwebucket.erlhi_crypto.erl hi_flowcontrol_hoverletupdater.erl hi_htmlutils_site.erl hi_hybridq_app.erl hi_hybridq_sup.erl hi_hybridq_worker.erl hi_login.erl hi_mailer.erl hi_messaging_app.erl hi_messaging_sup.erl hi_messaging_worker.erl hi_mgr_crawler.erl hi_mgr_db_console.erl hi_mgr_db.erl hi_mgr_db_mnesia.erl hi_mgr_hoverlet.erl hi_mgr_kw.erl hi_mgr_node.erl hi_mgr_thumbs.erl hi_mgr_traffic.erl hi_nlp.erl hi_normalizer.erl hi_pagination_app.erlhi_pagination_sup.erl, hi_pagination_worker.erl hi_pmap.erlhi_register_app.erl hi_register.erl, hi_register_sup.erl, hi_register_worker.erl hi_render_hoverlet_worker.erl hi_rrd.erl , hi_rrd_worker.erl hi_settings.erlhi_sid.erl hi_site.erl hi_stat.erl hi_stats_distribution.erl hi_stats_overview.erl hi_str.erl hi_trees.erl hi_utf8.erl hi_yaws.erl
    http://developers.hover.in
  39. 39. thank you http://developers.hover.in
  40. 40. references
    • http://hover.in
    • http://erlang.org
    • www.kegel.com/c10k.html
    • http://memcached.org
    • http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
    • all the amazing brain-related talks athttp://ted.com,
    • shoutout to everyone at #erlang !
    http://developers.hover.in