BIG DATA
BIG DATA: a collection of data sets so large and complex that they are difficult to process using on-hand database tools.
The three V's of Big Data: Volume, Velocity, Variety
WHY WAS IT INTRODUCED?
- Lots of data is being collected & warehoused.
- Processing needs exceed the capacity of traditional database systems.
- Handles both structured & unstructured data.
- Separates data from the application.
- Enables understanding through data analytics.
- Faster development, faster runtime.
- Elastic, feature-level scalability.
APACHE HADOOP
- Provides massively scalable storage; it is not a database.
- Serves as a data processing platform.
- HDFS provides fault-tolerant storage.
- Stores data in its native format.
- Reduces cost & lowers risk.
- Extracts business value from data.
- Delivers new insights.
- Automatically handles s/w & h/w failures.
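The "data processing platform" above refers to Hadoop's MapReduce model. As a minimal sketch in plain Python (standing in for Hadoop's actual Java API; the function names and sample data here are illustrative, not Hadoop identifiers), the classic word-count job runs a map step, a shuffle that groups by key, and a reduce step:

```python
from collections import defaultdict

def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is big", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

In real Hadoop, the map and reduce steps run in parallel on many machines, and the framework performs the shuffle over the network.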
HDFS
- Fault-tolerant storage layer.
- Survives failures of disk, network, and network interface.
- Runs Map-Reduce programs over the stored data.
- Creates clusters of machines and coordinates work among them.
- Stores data on the cluster as blocks.
- Requires no special hardware, unlike RAID.
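Block-based storage can be illustrated with a toy sketch (not real HDFS code; the block size, node names, and round-robin placement policy are simplified assumptions; real HDFS defaults to 128 MB blocks and rack-aware placement). A file is split into fixed-size blocks and each block is copied to several nodes, which is why losing one disk loses no data, and also why raw storage is consumed roughly three times over:

```python
BLOCK_SIZE = 8          # bytes per block (real HDFS defaults to 128 MB)
REPLICATION = 3         # HDFS's default replication factor
NODES = ["node1", "node2", "node3", "node4"]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a byte string into fixed-size blocks, HDFS-style."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    placement = {}
    for i, _ in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

data = b"a fault tolerant distributed file"
blocks = split_into_blocks(data)
placement = place_replicas(blocks)
# Every block lives on 3 different nodes: any single failure leaves
# 2 live replicas, at the cost of 3x raw storage.
```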
PROBLEMS WITH BIG DATA
- Systems and teams can be overwhelmed by the sheer volume of data.
- Costs escalate too fast.
- Storage is consumed 3 times over (replication).
- Timeliness of analysis suffers.
- Poor data locality.
- Incompatible & replicated data.
CONCLUSION
- Big Data will replace the approaches, tools, and systems that underpin development work.
- Enables better analysis of large volumes of data.
- Has the potential to advance many scientific disciplines.
- Improves profitability.
- Technical challenges must be addressed dynamically.
THANK YOU