Building secure NoSQL applications nosqlnow_conf_2014

Post on 15-Jul-2015


+Building Secure Applications With HBase / Accumulo

Sujee Maniyam sujee@elephantscale.com

Nosql now! 2014 Conference

Aug 2014, San Jose, CA

+About This Talk…

n Some practical tips & design patterns on building secure applications using HBase and Accumulo

n A quick demo (fingers crossed!)

n Audience : technical

+Who Invited This Guy?

n  Hi, I am Sujee Maniyam

n  Founder / Principal @ Elephant Scale Consulting & Training in Big Data, NoSQL

n  Co-Author of open source Hadoop book: http://hadoopilluminated.com

n  Founder / Organizer of ‘Big Data Guru’ meetup http://www.meetup.com/BigDataGurus/

n  Open source : http://github.com/sujee

n  http://sujee.net | http://www.linkedin.com/in/sujeemaniyam

+NoSQL eco-system (too many!)

+HBase : Quick Intro

n  Modeled after Google Bigtable

n  Distributed NoSQL store built on Hadoop / HDFS

n  Apache project

n  http://hbase.apache.org/

[Diagram: HBase runs on top of HDFS]

+Accumulo : Quick Intro

n  Developed by the National Security Agency (NSA)!

n  An implementation of Google Bigtable

n  NoSQL store on top of HDFS

n  Security is a first-class concept

[Diagram: Accumulo runs on top of HDFS]

+HBase & Accumulo

n  Both are Bigtable implementations

n  Based on HDFS

n  Written in Java

n  Apache open source projects

[Diagram: HBase and Accumulo both run on top of HDFS]

+Approach to Security in Hadoop Until Recently…

+But Security Picture Has Improved Rapidly…

n  A lot of work is going on in the ecosystem

n  Hadoop vendors (Cloudera, Hortonworks, …) have been actively working on security features

n  The core features are in place

n  Ease of use is improving as well

+Next : Building Secure Applications

+What Does It Mean to be ‘Secure’?

n  1) Control who can get in

n  2) Verify the person’s identity

n  3) Safeguard communications with the user

n  4) Control what this user is allowed to do

n  5) And finally… protect data at rest

+1) Who can get in

n  Control which machines can connect to the NoSQL cluster

n  Don’t expose the cluster to the public
    n  Too many open ports
    n  Too vulnerable

n  Solutions:
    n  Run the cluster behind a firewall
    n  Restrict which machines can connect to the cluster

n  This is Linux / network level security

n  It is configured outside the NoSQL store itself
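The "restrict which machines can connect" rule can be sketched in a few lines. In practice this is enforced by a firewall or similar network control rather than by application code; the subnets below are made-up examples, not values from the talk.

```python
import ipaddress

# Hypothetical allowlist: only these subnets may reach the cluster ports.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.1.0/24"),   # application servers (assumed)
    ipaddress.ip_network("10.0.2.16/28"),  # admin hosts (assumed)
]


def may_connect(client_ip: str) -> bool:
    """Return True if client_ip falls inside one of the allowed subnets."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


print(may_connect("10.0.1.42"))    # an application server inside the allowlist
print(may_connect("203.0.113.7"))  # an outside address
```

Everything not explicitly listed is rejected, which is the same default-deny stance a firewall in front of the cluster would take.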

+Trusted Environment

+2) User Authentication

n  Wolf: Knock… Knock…

n  Pig : Who is there?

n  Wolf : It is me… little pig

n  How can we verify the user?
    n  Username / password (gmail)

n  Or use a third party (referee)
    n  Kerberos

Source : http://1.bp.blogspot.com/

+Kerberos : Quick Primer

n  Kerberos is an authentication protocol for networked machines

n  Validates client to server and vice-versa

n  Strong crypto algorithms (AES, 3DES…)

+Kerberos Protocol for Getting a Beer at a Carnival / Fair :)

+Kerberos Protocol Explained : Getting Beer @ Fair / Party

n  Prove your age (identity) to the wristband issuer
    n  Ticket Granting Ticket

n  Get a wristband → qualifies you to get beer
    n  Service Ticket

n  Go to the bartender and ask for beer using your wristband
    n  Service Request

n  Get beer! :)

n  For technically correct explanation see : http://www.roguelynn.com/words/explain-like-im-5-kerberos/
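The three steps above (ticket granting ticket, service ticket, service request) can be sketched as a toy model. This is not real Kerberos: real tickets are encrypted and carry session keys and authenticators, whereas this sketch only signs them with an HMAC. All class names, keys and users here are hypothetical.

```python
import hashlib
import hmac
import json
import time


def sign(key: bytes, payload: dict) -> dict:
    """Attach an HMAC so the holder cannot alter the ticket undetected."""
    body = json.dumps(payload, sort_keys=True).encode()
    return {"body": payload, "mac": hmac.new(key, body, hashlib.sha256).hexdigest()}


def is_valid(key: bytes, ticket: dict) -> bool:
    """Check the ticket's HMAC and that it has not expired."""
    body = json.dumps(ticket["body"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, ticket["mac"]) and ticket["body"]["expires"] > time.time()


class ToyKDC:
    """Plays the wristband issuer: knows user passwords and each service's key."""

    def __init__(self, passwords: dict, service_keys: dict):
        self._passwords = passwords
        self._service_keys = service_keys
        self._tgt_key = b"known-only-to-the-kdc"  # toy secret

    def login(self, user: str, password: str) -> dict:
        # Step 1: prove identity, receive a ticket granting ticket (TGT).
        if user not in self._passwords or not hmac.compare_digest(
            self._passwords[user].encode(), password.encode()
        ):
            raise PermissionError("bad credentials")
        return sign(self._tgt_key, {"user": user, "expires": time.time() + 3600})

    def service_ticket(self, tgt: dict, service: str) -> dict:
        # Step 2: trade a valid TGT for a ticket for one specific service.
        if not is_valid(self._tgt_key, tgt):
            raise PermissionError("invalid or expired TGT")
        payload = {"user": tgt["body"]["user"], "service": service, "expires": time.time() + 300}
        return sign(self._service_keys[service], payload)


class ToyService:
    """Plays the bartender: trusts tickets signed with the key it shares with the KDC."""

    def __init__(self, name: str, key: bytes):
        self.name = name
        self._key = key

    def handle(self, ticket: dict) -> str:
        # Step 3: the service request, accepted only with a valid service ticket.
        if not is_valid(self._key, ticket) or ticket["body"]["service"] != self.name:
            raise PermissionError("invalid service ticket")
        return "hello " + ticket["body"]["user"]


kdc = ToyKDC({"alice": "s3cret"}, {"hbase": b"hbase-service-key"})
hbase = ToyService("hbase", b"hbase-service-key")
tgt = kdc.login("alice", "s3cret")
ticket = kdc.service_ticket(tgt, "hbase")
print(hbase.handle(ticket))
```

The point to notice is that the service never sees the user's password: it only checks a ticket that the trusted third party issued.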

+Kerberos Integration

| Feature | HBase | Accumulo |
|---|---|---|
| Kerberos integration | Yes | Yes (simple authentication is also built in) |

+3) Secure Client Communication

n  Guard client / server communication (‘on the wire’)

n  Done using SASL or SSL / TLS (certificates)

n  Prevents snooping by third parties

| Feature | HBase | Accumulo |
|---|---|---|
| Secure client communications | Yes | Yes |
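As a generic illustration of what "secure on the wire" asks of a client, here is a minimal sketch using Python's standard `ssl` module. It is not HBase or Accumulo configuration; it only shows the two properties that matter: the channel is encrypted, and the server's certificate is verified.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that encrypts traffic and verifies the server."""
    # The default context requires a valid server certificate and checks
    # that the certificate matches the hostname being contacted.
    ctx = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Without certificate verification the traffic would still be encrypted, but a third party could impersonate the server and read it anyway.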

+4) What Is Allowed For This User?

n  In an unsecured environment, users can read / write to any table
    n  → not very secure!

n  Control which data users can see

+Quick Primer on HBase Storage

n  Tables have many rows

n  Row has multiple columns (or qualifiers)

n  They are grouped into column families

n  Each cell also has a timestamp (not shown here)

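| Column family | Columns |
|---|---|
| (row key) | Customer_id |
| info (Family1) | name, email, phone |
| secure (Family2) | Last 4 social, Full ssn |

Each value at a row / column intersection is a cell.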

+HBase Allows Access Control At Family Level

Row key: Customer_id

| Column family | Columns | Who can access |
|---|---|---|
| info | name, email, phone | First-level CSRs can access only this family |
| secure | Last 4 social, Full ssn | Only supervisors can access this family |
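Family-level access control amounts to filtering a row by column family before returning it. The sketch below models that idea with plain dictionaries; it does not use the HBase API. The role names, the assumption that supervisors can also read `info`, and the sample phone number are all hypothetical.

```python
# Hypothetical mapping of role -> column families that role may read.
FAMILY_ACL = {
    "csr": {"info"},
    "supervisor": {"info", "secure"},  # assumed: supervisors see both families
}

# One row, grouped by column family (sample values are made up).
ROW = {
    "info": {"name": "Joe", "email": "joe@gmail.com", "phone": "555-0100"},
    "secure": {"last4_social": "6789", "full_ssn": "123-45-6789"},
}


def read_row(role: str, row: dict) -> dict:
    """Return only the column families the given role is allowed to read."""
    allowed = FAMILY_ACL.get(role, set())
    return {family: dict(cols) for family, cols in row.items() if family in allowed}


print(sorted(read_row("csr", ROW)))
print(sorted(read_row("supervisor", ROW)))
```

The granularity is the whole family: a first-level CSR either sees every column in `secure` or none of them, which is the limitation the next slide addresses.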

+Need More Fine Grained Access

n  We would like to provide ‘cell level’ access controls

n  Greater flexibility in application development

n  More fine grained access controls

n  Meet Accumulo’s Data Model

+Accumulo Data Model

Family : info

| Column | Visibility token |
|---|---|
| name | Level 1 |
| email | Level 1 |
| Last 4 ssn | Level 1 |
| Ssn | Level 2 OR Top clearance |
| Gmail password | Top clearance |

•  Everything in the HBase data model

•  Plus each cell has a ‘Visibility Token’

+Users Are Assigned ‘Visibility Tokens’

| User id | Visibility levels |
|---|---|
| User 1 | Level 1 |
| User 2 | Level 1 + Level 2 |
| Edward Snowden | Level 1 + Level 2 + Top Clearance |

+Accumulo only returns cells visible to user

Row : person1 (all columns in one family)

| Column | Value | Visibility token |
|---|---|---|
| name | Joe | Level 1 |
| email | joe@gmail.com | Level 1 |
| Last 4 SSN | 6789 | Level 1 |
| Full SSN | 123-45-6789 | Level 2 OR Top clearance |
| Gmail password | JoeSuperMan! | Top clearance |

+What Users Can See…

| User | Visibility privilege | Visible cells |
|---|---|---|
| User 1 | Level 1 | Name, Email, Last 4 SSN |
| User 2 | Level 1 + Level 2 | Name, Email, Last 4 SSN, Full SSN |
| Edward Snowden | Level 1 + Level 2 + Top Clearance | Name, Email, Last 4 SSN, Full SSN, Gmail Password |
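The filtering rule behind this table can be sketched without any Accumulo dependency. The toy below models only OR-style labels, where any one matching token is enough. Real Accumulo visibility expressions can also combine tokens with AND. The short label names (`L1`, `L2`, `TOP`) and user ids are stand-ins chosen for this example.

```python
# Each cell: (column, value, labels). A cell is visible if the reader holds
# ANY of its labels, which models "Level 2 OR Top clearance".
CELLS = [
    ("name", "Joe", {"L1"}),
    ("email", "joe@gmail.com", {"L1"}),
    ("last4_ssn", "6789", {"L1"}),
    ("full_ssn", "123-45-6789", {"L2", "TOP"}),
    ("gmail_password", "JoeSuperMan!", {"TOP"}),
]

# Visibility tokens assigned to each user, as in the slides.
USER_AUTHS = {
    "user1": {"L1"},
    "user2": {"L1", "L2"},
    "esnowden": {"L1", "L2", "TOP"},
}


def visible_columns(user: str) -> list:
    """Return the columns whose labels intersect the user's tokens."""
    auths = USER_AUTHS.get(user, set())
    return [column for column, _value, labels in CELLS if labels & auths]


for user in USER_AUTHS:
    print(user, visible_columns(user))
```

Cells the user is not cleared for are simply absent from the result, so the reader cannot even tell that they exist.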

+Good News For HBase

n  With release 0.98, HBase also allows cell-based access controls

n  Built on a feature called ‘tags’

n  Requires upgrading to the HFile v3 (version 3) format

+Visibility / Access Controls

n  Both HBase and Accumulo allow access control for the data

| Feature | HBase | Accumulo |
|---|---|---|
| Cell-level visibility | Yes (starting with v0.98) | Yes |

+5) Final Step : Encrypt Data At Rest

n  Eventually data ends up on disk

n  We need to protect the ‘raw data’ on disk

n  To prevent:
    n  Users going to the disk directly
    n  Theft of hardware

+Solution : Encrypt Data Transparently

n  Encryption is done via keys
    n  Uses Java Cryptography Extension (JCE)

n  Data is encrypted before it is written to HDFS
    n  Does not rely on HDFS or Linux level encryption

n  Per-family encryption is supported

| Feature | HBase | Accumulo |
|---|---|---|
| Encryption at rest | Yes | Yes |

+HBase & Accumulo : Transparent Encryption

+Encryption : Key Management

n  The keys have to be managed carefully…
    n  Don’t lose them!
    n  Don’t compromise them!!

n  Possible storage mechanisms
    n  Database
    n  Remote file server
    n  Key management server
    n  Local file system

+Summary

| Feature | HBase | Accumulo |
|---|---|---|
| Runs in a trusted environment | Yes (outside configuration) | Yes (outside configuration) |
| User authentication | Kerberos | Kerberos + built-in |
| Secure client communications (via SSL) | Yes | Yes |
| Visibility at cell level | Yes (starting from v0.98) | Yes |
| Encrypt data at rest | Yes | Yes |

+Useful Resources

n  Accumulo
    n  http://www.slideshare.net/DonaldMiner/accumulo-oct2013bofpresentation

n  HBase
    n  http://hbase.apache.org/book/hbase.encryption.server.html

+DEMO

+Demo Explained

The demo shows the cell-level visibility feature of Accumulo. Here is what the data looks like:

Row : Person1

| Column | Value | Visibility level |
|---|---|---|
| Name | Joe Smith | Level 1 |
| email | joe@gmail.com | Level 1 |
| ssn | 123-45-6789 | Level 2 |
| Gmail_password | ‘JoeDaMan!’ | Top |

+Demo : Accumulo Users + Visibility

| Accumulo user | Table1 access | Access level | Visible columns |
|---|---|---|---|
| root | yes | all | all |
| user1 | yes | Level 1 | Name, email |
| user2 | yes | Level 1 + Level 2 | Name, email + SSN |
| esnowden | yes | Level 1 + Level 2 + Top | Name, email + SSN + Gmail password :) |
| user3 | no | N/A | N/A |

+Thanks & Questions!

sujee@ElephantScale.com

http://ElephantScale.com

Expert consulting & training in Big Data (Hadoop, NoSQL, Spark)

Free, online Hadoop book ‘Hadoop illuminated’