Understanding big data-drupalcamp

1. Understanding Big Data... ...or trying to understand a new awesome paradigm
2. POST /hello-world: Javier Lafora (@eLafo)
3. A small walk through history
4. Data growth: "90% of the data in the world today has been created in the last two years alone." - IBM http://www-01.ibm.com/software/in/data/bigdata/
5. Data growth: "There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days." - Eric Schmidt, Google CEO, 2010
6. Traditional approach: one powerful computer for all the data...
7. Traditional approach: ...until it reaches its storage limit...
8. Traditional approach: ...or until it reaches its processing limit
9. Distribute your data
10. Parallelize your computing
11. Distributing your data
12. Distributed filesystems: A DFS manages files and folders across multiple computers. It serves the same purpose as a traditional file system, but is designed to provide file storage and controlled access to files over local and wide area networks.
13. You don't want to worry about where a file is located
14. You don't want to worry about replicating data
15. You don't want to worry about managing failures
16. Parallelizing computation
17. You don't want to worry about breaking computation into pieces
18. You don't want to worry about scaling your code
19. Hadoop to the rescue
20. Distributed File System
21. Parallelizing algorithms
22. MapReduce
23. Three different problems
24. Volume
25. Streaming / real time
26. Variety
27. Structured data
28. Semi-structured data
29. Unstructured data
30. Questions?
31. aspgems.com. The end. Thanks!
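Slides 12 to 15 describe what a distributed filesystem hides from you: where a file lives, how it is replicated, and how failures are handled. The toy below is a minimal in-memory sketch of those three ideas. It assumes fixed-size blocks, a metadata map from each path to its blocks and their replica nodes (roughly the role HDFS gives its NameNode), and simple round-robin replica placement. `MiniDFS` and all of its methods are hypothetical illustration names, not Hadoop's API. Real HDFS uses much larger blocks and a rack-aware placement policy.

```python
from itertools import cycle


class MiniDFS:
    """Toy in-memory distributed filesystem (hypothetical, for illustration only)."""

    def __init__(self, node_names, block_size=4, replication=3):
        self.block_size = block_size
        # Never ask for more replicas than there are nodes.
        self.replication = min(replication, len(node_names))
        self.nodes = {name: {} for name in node_names}  # node -> {block_id: bytes}
        self.alive = set(node_names)
        self.metadata = {}  # path -> [(block_id, [replica nodes])]
        self._next_node = cycle(node_names)
        self._next_block_id = 0

    def write(self, path, data: bytes):
        blocks = []
        for start in range(0, len(data), self.block_size):
            chunk = data[start:start + self.block_size]
            block_id = self._next_block_id
            self._next_block_id += 1
            # Consecutive picks from the cycle are distinct nodes, because
            # replication <= number of nodes.
            targets = [next(self._next_node) for _ in range(self.replication)]
            for node in targets:
                self.nodes[node][block_id] = chunk
            blocks.append((block_id, targets))
        self.metadata[path] = blocks

    def fail(self, node):
        # Simulate a machine going down. Its data is still stored, but it is unreachable.
        self.alive.discard(node)

    def read(self, path) -> bytes:
        out = []
        for block_id, replicas in self.metadata[path]:
            # The caller never chooses a node: use the first live replica.
            for node in replicas:
                if node in self.alive:
                    out.append(self.nodes[node][block_id])
                    break
            else:
                raise OSError(f"all replicas of block {block_id} are unavailable")
        return b"".join(out)


dfs = MiniDFS(["node1", "node2", "node3", "node4"])
dfs.write("/logs/today.txt", b"hello big data")
dfs.fail("node1")
dfs.fail("node2")
print(dfs.read("/logs/today.txt"))  # b'hello big data'
```

With a replication factor of 3 on four nodes, any two nodes can fail and every block is still readable. Losing a third node makes some blocks unavailable.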
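Slides 21 and 22 name MapReduce as Hadoop's answer to parallelizing an algorithm. The sketch below shows the model's three phases (map, shuffle/group by key, reduce) using the classic word-count example, all in a single process. It is not Hadoop's API: Hadoop jobs are usually written against its Java `Mapper`/`Reducer` classes. The function names here are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input split.
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between the phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Each split is mapped independently, so in a cluster these calls could
    # run on different nodes. Here they run sequentially.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(intermediate)
    # Each key is also reduced independently of the others.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["big data is big", "data is everywhere"]
print(word_count(splits))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

The design choice is what makes this scale. Map tasks share nothing, and reduce tasks only need the values for their own keys. A framework can therefore split both phases across machines without the programmer "breaking computation into pieces" by hand.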