Unlocking Hadoop for Your Relational DB
Kathleen Ting | @kate_ting Technical Account Manager, Cloudera | Sqoop PMC Member BigData.be April 4, 2014
Who Am I?
• Started 3 years ago as Cloudera’s first Support Engineer
• Now manages Cloudera’s two largest customers
• Sqoop Committer, PMC Member
• Co-Author of the Apache Sqoop Cookbook
What is Sqoop?
• Apache Top-Level Project
• SQl to hadOOP
• Tool to transfer data from relational databases
  • Teradata, MySQL, PostgreSQL, Oracle, Netezza
• To/from the Hadoop ecosystem
  • HDFS (text, sequence file), Hive, HBase, Avro
Why Sqoop?
• Efficient/controlled resource utilization
  • Concurrent connections, time of operation
• Datatype mapping and conversion
  • Automatic, with user override
• Metadata propagation
  • Sqoop Record
  • Hive Metastore
  • Avro
Agenda
Sqoop 1
• Sqoop 1 Architecture
• Sqoop 1 Command Line
• Sqoop 1 Examples
• Sqoop 1 Challenges
• Troubleshooting Sqoop 1
• Common Sqoop 1 Issues
  • Protecting Your Password
  • Sqoop Works on CLI Not in Oozie
  • Choosing Proper Connector
  • Overriding Type Mapping
Sqoop 2
• Sqoop 2 Architecture
• Sqoop 2 Design Goals
• Sqoop 2 UI in Hue
Resources
Sqoop 1 Command Line
sqoop TOOL PROPS ARG [-- EXTRA]
• TOOL: import, export
• PROPS
  • Hadoop (Java) properties
  • -Dwhatever.whenever=yes
• ARG
  • Generic Sqoop arguments
  • --table, --connect, ...
• EXTRA
  • Connector specific
  • --schema (PostgreSQL and Microsoft SQL Server)
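As a rough sketch of how the four parts line up, the snippet below fills each placeholder from the syntax line and prints the resulting command. The queue name `default`, the PostgreSQL host, and the schema name `us` are assumptions chosen for illustration, not values from the deck; `-P` asks for the password interactively.

```shell
#!/bin/sh
# Sketch: one variable per placeholder in "sqoop TOOL PROPS ARG [-- EXTRA]".
TOOL="import"
PROPS="-Dmapreduce.job.queuename=default"   # assumed Hadoop property value
ARG="--connect jdbc:postgresql://postgresql.example.com/sqoop --username sqoop -P --table cities"
EXTRA="--schema us"                         # hypothetical schema name

# Everything after the bare "--" is handed to the connector.
cmd="sqoop $TOOL $PROPS $ARG -- $EXTRA"
echo "$cmd"
```

The script only prints the command, so it can be reviewed before being pasted into a shell on a host where Sqoop is installed.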
Sqoop 1 Example
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop --password sqoop \
  --table cities

sqoop export \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop --password sqoop \
  --table cities \
  --export-dir /temp/cities
Sqoop 1 Challenges
• Cryptic, contextual command line arguments
• Security concerns
• Type mapping is not clearly defined
• Client needs access to Hadoop binaries/configuration and database
• JDBC model is enforced
Troubleshooting Sqoop 1
• Versions: Sqoop, Hadoop, OS, JDBC
• Console log after running with the --verbose flag
  • Capture the entire output via sqoop import … &> sqoop.log
• Entire Sqoop command, including the options-file if applicable
• Expected output and actual output
• Table definition
• Small input data set that triggers the problem
  • Especially with export, malformed data is often the culprit
• Hadoop task logs
  • Often the task logs contain further information describing the problem
• Permissions on input files
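The version items on this checklist can be gathered in one pass. The following is a minimal sketch, assuming `sqoop` and `hadoop` are on the PATH when installed; the report file name `sqoop-report.txt` is made up for this example, and the JDBC driver version still has to be noted by hand.

```shell
#!/bin/sh
# Sketch: collect OS, Sqoop, and Hadoop versions into one file for a support request.
report=sqoop-report.txt
{
  echo "== OS =="
  uname -a
  echo "== Sqoop =="
  if command -v sqoop >/dev/null 2>&1; then
    sqoop version
  else
    echo "sqoop not found on PATH"
  fi
  echo "== Hadoop =="
  if command -v hadoop >/dev/null 2>&1; then
    hadoop version
  else
    echo "hadoop not found on PATH"
  fi
} > "$report" 2>&1
echo "wrote $report"
```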
Troubleshooting Sqoop 1
Imported table has more rows than source table?
• Data contains characters used as Hive’s delimiters
  • Clean up data
  • --hive-drop-import-delims
    • Removes \n, \r, and \01 characters
  • --hive-delims-replacement "SPECIAL"
    • Replaces \n, \r, and \01 characters with the string SPECIAL
• Not restricted to Hive: works for any import job using text files
  • Ensures output files have one line per imported row
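To see why the row count grows, the sketch below reproduces the effect outside Sqoop with plain shell tools. It is an illustration only: `tr` stands in for the stripping that --hive-drop-import-delims performs during import, and the sample row and file names are made up.

```shell
#!/bin/sh
# One source row whose last column contains an embedded newline.
printf '1,Palo Alto,first line\nsecond line' > field_data.txt

# Written as-is plus the row terminator, the single row spans two lines.
{ cat field_data.txt; printf '\n'; } > raw.txt
raw_lines=$(( $(wc -l < raw.txt) ))

# Stripping delimiter characters (here \n and \01) from the column data
# before the row terminator is added keeps one line per row.
{ tr -d '\n\001' < field_data.txt; printf '\n'; } > clean.txt
clean_lines=$(( $(wc -l < clean.txt) ))

echo "raw=$raw_lines clean=$clean_lines"
```

A line-oriented reader such as Hive counts the raw file as two rows, which is how an imported table ends up larger than its source.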
Common Sqoop 1 Issues
• Protecting Your Password
• Sqoop Works on CLI Not in Oozie
• Choosing Proper Connector
• Overriding Type Mapping
Protecting Your Password
• Prompt for the password on the console with -P
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  -P

• Or read it from a file with --password-file
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  --password-file my-sqoop-password
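One way to prepare the file passed to --password-file is sketched below. The password value is a placeholder, and whether the file should live on the local file system or in HDFS depends on the Sqoop version and setup, so treat the location as an assumption. Sqoop reads the entire content of the file as the password, which is why the sketch avoids a trailing newline.

```shell
#!/bin/sh
# Sketch: create a password file that only its owner can read.
umask 077                      # new files start without group/other access
rm -f my-sqoop-password        # allow the sketch to be re-run

# printf '%s' writes the value with no trailing newline.
printf '%s' 'example-password' > my-sqoop-password

# Read-only for the owner, no access for anyone else.
chmod 400 my-sqoop-password
ls -l my-sqoop-password
```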
Sqoop Works on CLI Not in Oozie
Character parameter '|' has multiple characters; only the first will be used.
Got error creating database manager: java.io.IOException: No manager for connect string: "jdbc:teradata..."
Sqoop Works on CLI Not in Oozie
sqoop import --password "spEci@l\$" \
  --connect 'jdbc:x:/yyy;db=sqoop'
• Remove all escaping that you’ve added for the shell
• Use <arg> tags rather than <command>; the content of each <arg> is treated as one parameter
• Put all -D parameters into the configuration section
• Install the JDBC driver into the workflow’s lib/ directory or the shared action library /user/oozie/share/lib/sqoop/
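The first point is easier to see by printing what the shell actually hands to Sqoop. This short sketch uses the two values from the command above; the variable names are only for illustration.

```shell
#!/bin/sh
# The shell consumes the quotes and the backslash before Sqoop sees the values.
password_seen=$(printf '%s' "spEci@l\$")
connect_seen=$(printf '%s' 'jdbc:x:/yyy;db=sqoop')

echo "$password_seen"   # prints spEci@l$
echo "$connect_seen"    # prints jdbc:x:/yyy;db=sqoop
```

Oozie does not pass the arguments through a shell, so the workflow should carry the values in this already-unescaped form.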
Choosing Proper Connector
• JDBC driver is a dependency for all three connectors
• Sqoop automatically chooses the most suitable connector (OraOop, built-in, Generic JDBC Connector)
• Or choose explicitly: --connection-manager com.quest.oraoop.OraOopConnManager
Overriding Type Mapping
--map-column-java parameter
• Comma-separated list of key-value pairs
  • key = exact column name
  • value = target Java type
sqoop import \
--map-column-java \
c1=Float,c2=String,c3=String ...
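A complete command might look like the sketch below, which joins key=value pairs into the comma-separated list and prints the result. The column names `id` and `city` are hypothetical; the connection details reuse the earlier MySQL example.

```shell
#!/bin/sh
# Sketch: build the comma-separated mapping from column=JavaType pairs.
mapping=""
for pair in id=Long city=String; do   # hypothetical columns
  mapping="${mapping:+$mapping,}$pair"
done

# Print the import command with the override applied.
cmd="sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop -P --table cities --map-column-java $mapping"
echo "$cmd"
```

Keeping the pairs in a loop makes it harder to introduce a stray space, which would split the mapping into two arguments.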
Sqoop 2 Design Goals
• Security and Separation of Concerns
  • Role-based access and use
• Ease of extension
  • No low-level Hadoop knowledge needed
  • No functional overlap between Connectors
• Ease of Use
  • Uniform functionality
  • Domain-specific interactions
Sqoop 2 UI in Hue
• Troubleshooting
  • sqoop.log file is located in @LOGDIR@ and the rest should be in server/logs/*
  • Look for catalina.out, catalina.log, localhost-*.log