Questions
Uploaded by sandhya-rani-padhy
Transcript of Questions
Given 1 PB of data to migrate into Hadoop, tell me every aspect and task involved in the migration
My humble attempt:
Understand vision - how the data will be used
Get data growth rate
Understand how far ahead the solution must be future-proofed - 2 years, 3 years, etc.
Assume a disk capacity per node on commodity hardware
Assume a replication factor - the HDFS default is 3
Calculate the disk space for the projected years
Add 30% extra to the space calculation to allow for Hadoop's own working space - MapReduce (MR) intermediate output and other overhead
Factor in the NameNode (NN) server requirement and HA of the same
Factor in the JobTracker (JT) server requirement and HA of the same
Factor in Zookeeper requirement and HA of the same
Decide the format of data storage - text file, SequenceFile, Avro, or RCFile - and the compression codec; this choice affects the space requirement, so pick it to minimize space
Factor in HBase master server requirement and HA of the same if HBase is to run
Factor in racks/switches
Factor in the number of environments - dev/test/QA/prod-parallel/production - and the cluster size of each
Plan out users/groups/authentication/authorization and any data encryption needed.
Plan the load tasks and the data integrity checks
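The space calculation in the steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 20% annual growth rate, the compound-growth model, the 24 TB of usable disk per node, and 1 PB taken as 1000 TB are all made-up inputs, not figures from the question. Only the replication factor of 3 and the 30% extra come from the list.

```python
import math


def raw_capacity_tb(initial_tb, annual_growth, years,
                    replication=3, overhead=0.30, compression_ratio=1.0):
    """Estimate the raw disk space (in TB) the cluster needs.

    All parameters are planning assumptions supplied by the caller.
    """
    # Logical data size after compound growth over the planning horizon.
    projected_tb = initial_tb * (1 + annual_growth) ** years
    # Compression shrinks what is stored; a ratio of 1.0 means no compression.
    stored_tb = projected_tb / compression_ratio
    # Every block is stored `replication` times, plus headroom for
    # MapReduce intermediate output and other working space.
    return stored_tb * replication * (1 + overhead)


def data_nodes_needed(raw_tb, usable_tb_per_node):
    """Round up to whole nodes for an assumed usable disk size per node."""
    return math.ceil(raw_tb / usable_tb_per_node)


# Hypothetical inputs: 1 PB (1000 TB) today, 20% growth per year, 3 years.
raw = raw_capacity_tb(1000, annual_growth=0.20, years=3)
nodes = data_nodes_needed(raw, usable_tb_per_node=24)
print(f"raw disk: {raw:.1f} TB, data nodes: {nodes}")
```

The node count covers data nodes only; the NameNode, JobTracker, Zookeeper and HBase master servers from the list would be sized separately, and each extra environment needs its own calculation.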