Kudu and Rust


© Cloudera, Inc. All rights reserved. 1

Dan Burkert
github.com/danburkert
dcb on Mozilla IRC channels

getkudu.io

Rust and Kudu


Agenda

• Kudu (what is it?)
• Rust and Kudu
• Demo


What’s Kudu?


Kudu Design Goals

• High throughput for big scans
  Goal: within 2x of HDFS + Parquet
• Low-latency for short accesses
  Goal: 1ms read/write on SSD
• Database-like semantics (initially single-row ACID)
• Relational data model
  • SQL queries are easy
  • “NoSQL” style scan/insert/update (Java, C++ and now Rust clients)


Using Kudu

• Table has a SQL-like schema
  • Finite number of columns (unlike HBase/Cassandra)
  • Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
  • Some subset of columns makes up a possibly-composite primary key
• Flexible data distribution policies
• Fast ALTER TABLE
• “NoSQL” style API
  • Insert(), Update(), Delete(), Scan()
• Integrations with higher level compute frameworks (Spark, Map/Reduce)
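
The schema model above can be sketched in a few lines of Rust. This is only an illustration: `DataType`, `Column`, `Schema` and `tweets_schema` are hypothetical names, not types from the Kudu clients, and the column types chosen for the tweets table are an assumption.

```rust
/// The column types listed on the slide (hypothetical enum, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Bool,
    Int8,
    Int16,
    Int32,
    Int64,
    Float,
    Double,
    String,
    Binary,
    Timestamp,
}

/// One column of a table schema.
#[derive(Debug, Clone)]
pub struct Column {
    pub name: &'static str,
    pub data_type: DataType,
    /// True if the column is part of the (possibly composite) primary key.
    pub is_key: bool,
}

/// A fixed, finite list of typed columns.
#[derive(Debug, Clone)]
pub struct Schema {
    pub columns: Vec<Column>,
}

impl Schema {
    /// Names of the primary key columns, in schema order.
    pub fn primary_key(&self) -> Vec<&'static str> {
        self.columns
            .iter()
            .filter(|c| c.is_key)
            .map(|c| c.name)
            .collect()
    }

    /// Look up the type of a column by name.
    pub fn column_type(&self, name: &str) -> Option<DataType> {
        self.columns
            .iter()
            .find(|c| c.name == name)
            .map(|c| c.data_type)
    }
}

/// A guess at a schema for the tweets table used later in the talk.
/// The types are assumed; the slides do not state them.
pub fn tweets_schema() -> Schema {
    Schema {
        columns: vec![
            Column { name: "tweet_id", data_type: DataType::Int64, is_key: true },
            Column { name: "user_name", data_type: DataType::String, is_key: false },
            Column { name: "created_at", data_type: DataType::Timestamp, is_key: false },
            Column { name: "text", data_type: DataType::String, is_key: false },
        ],
    }
}
```

Calling `tweets_schema().primary_key()` yields the single key column `tweet_id`; marking a second column with `is_key: true` would model a composite key.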


Replication


Columnar Storage

tweet_id     {25059873, 22309487, 23059861, 23010982}
user_name    {newsycbot, RideImpala, fastly, llvmorg}
created_at   {1442865158, 1442828307, 1442865156, 1442865155}
text         {Visual exp…, Introducing .., Missing July…, LLVM 3.7….}
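
A minimal sketch of the same four tweets in row-oriented and column-oriented form may help here. `Tweet`, `TweetColumns` and `sample_rows` are hypothetical names for illustration only; this is not Kudu's on-disk format.

```rust
/// Row-oriented layout: one struct per tweet (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Tweet {
    pub tweet_id: i64,
    pub user_name: String,
    pub created_at: i64,
    pub text: String,
}

/// Column-oriented layout: one vector per column, as on the slide.
#[derive(Debug, Default)]
pub struct TweetColumns {
    pub tweet_id: Vec<i64>,
    pub user_name: Vec<String>,
    pub created_at: Vec<i64>,
    pub text: Vec<String>,
}

impl TweetColumns {
    /// Transpose a slice of rows into per-column vectors.
    pub fn from_rows(rows: &[Tweet]) -> Self {
        let mut cols = TweetColumns::default();
        for row in rows {
            cols.tweet_id.push(row.tweet_id);
            cols.user_name.push(row.user_name.clone());
            cols.created_at.push(row.created_at);
            cols.text.push(row.text.clone());
        }
        cols
    }

    /// A scan over one column reads only that column's vector.
    pub fn latest_created_at(&self) -> Option<i64> {
        self.created_at.iter().copied().max()
    }
}

/// The four sample rows from the slide (text truncated as in the source).
pub fn sample_rows() -> Vec<Tweet> {
    let raw = [
        (25059873, "newsycbot", 1442865158, "Visual exp…"),
        (22309487, "RideImpala", 1442828307, "Introducing .."),
        (23059861, "fastly", 1442865156, "Missing July…"),
        (23010982, "llvmorg", 1442865155, "LLVM 3.7…."),
    ];
    raw.iter()
        .map(|&(tweet_id, user_name, created_at, text)| Tweet {
            tweet_id,
            user_name: user_name.to_owned(),
            created_at,
            text: text.to_owned(),
        })
        .collect()
}
```

With this layout, a query such as "latest created_at" walks a single dense `Vec<i64>` and never touches the tweet text, which is the property that makes columnar storage attractive for big scans.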


Using Kudu from Rust

• Experimental client library: github.com/danburkert/kudu-rs

• Depends on a new C client library: github.com/danburkert/kudu/tree/c-api

• Goal is to merge both the Rust and C clients into the Kudu project


Sample API

    struct PartialRow<'a> { .. }

    impl<'a> PartialRow<'a> {
        pub fn set<T>(&mut self, column_name: &str, value: T) -> Result<()>
            where T: ColumnType<'a> { .. }

        pub fn set_copy<'b, T>(&mut self, column_name: &str, value: T) -> Result<()>
            where T: VarLengthColumnType<'b> { .. }

        pub fn get<T>(&'a self, column_name: &str) -> Result<T>
            where T: ColumnType<'a> { .. }
    }

    // with impls for bool, i{8, 16, 32, 64}, f{32, 64}, SystemTime, &str, &[u8]
    trait ColumnType<'a> { .. }

    // with impls for &str, &[u8]
    trait VarLengthColumnType<'a> { .. }
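
To show why `ColumnType` carries a lifetime parameter, here is a small in-memory stand-in. It is a sketch under stated assumptions, not the kudu-rs implementation: `MockRow`, `Value`, `to_value` and `from_value` are hypothetical, errors are plain strings, and this mock always copies on `set`.

```rust
use std::collections::HashMap;

/// Owned storage for one cell (hypothetical; only two types are modelled).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int64(i64),
    Str(String),
}

/// Simplified stand-in for the `ColumnType<'a>` trait on the slide.
/// The lifetime lets `get` hand back data borrowed from the row.
pub trait ColumnType<'a>: Sized {
    fn to_value(self) -> Value;
    fn from_value(value: &'a Value) -> Option<Self>;
}

impl<'a> ColumnType<'a> for i64 {
    fn to_value(self) -> Value {
        Value::Int64(self)
    }
    fn from_value(value: &'a Value) -> Option<Self> {
        match value {
            Value::Int64(v) => Some(*v),
            _ => None,
        }
    }
}

impl<'a> ColumnType<'a> for &'a str {
    fn to_value(self) -> Value {
        Value::Str(self.to_owned())
    }
    fn from_value(value: &'a Value) -> Option<Self> {
        match value {
            // Borrow from the row's storage; no copy is made on read.
            Value::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }
}

/// Hypothetical in-memory row keyed by column name.
#[derive(Debug, Default)]
pub struct MockRow {
    cells: HashMap<String, Value>,
}

impl MockRow {
    /// Store a value under `column_name` (this mock always copies).
    pub fn set<'a, T: ColumnType<'a>>(&mut self, column_name: &str, value: T) {
        self.cells.insert(column_name.to_owned(), value.to_value());
    }

    /// Read a value back as `T`; fails on unknown columns or type mismatch.
    pub fn get<'a, T: ColumnType<'a>>(&'a self, column_name: &str) -> Result<T, String> {
        let cell = self
            .cells
            .get(column_name)
            .ok_or_else(|| format!("no such column: {}", column_name))?;
        T::from_value(cell).ok_or_else(|| format!("type mismatch for column: {}", column_name))
    }
}
```

A caller picks the Rust type at the call site, for example `let name: &str = row.get("user_name")?;`, and the returned `&str` borrows from the row rather than allocating.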


Sample Application: KuduSQL

• A SQL-like shell for Kudu

• Supports limited CRUD and DDL functionality on Kudu tables

• Designed for interactive usage with tab completion, error reporting

• Depends on many community libraries (chrono, term, docopt, others)

• github.com/danburkert/kudusql


Read User Input → Parse Command → Execute Command
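
The three stages can be sketched as a plain loop over lines of input. This is a toy under stated assumptions, not KuduSQL's code: the `Command` enum, `parse`, `execute` and `run_shell` are hypothetical, and only `help` and `quit` are understood.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical command set; the real shell parses SQL-like statements.
#[derive(Debug, PartialEq)]
pub enum Command {
    Help,
    Quit,
}

/// Stage 2: turn one line of input into a command, or an error message.
pub fn parse(line: &str) -> Result<Command, String> {
    match line.trim().to_lowercase().as_str() {
        "help" => Ok(Command::Help),
        "quit" => Ok(Command::Quit),
        other => Err(format!("unrecognized command: {}", other)),
    }
}

/// Stage 3: run a command and return the text to display.
pub fn execute(command: &Command) -> String {
    match command {
        Command::Help => "commands: help, quit".to_owned(),
        Command::Quit => "bye".to_owned(),
    }
}

/// Stage 1 plus the glue: read lines, parse each one, execute or report.
pub fn run_shell<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue; // ignore blank lines
        }
        match parse(&line) {
            Ok(Command::Quit) => break,
            Ok(command) => writeln!(output, "{}", execute(&command))?,
            Err(message) => writeln!(output, "error: {}", message)?,
        }
    }
    Ok(())
}
```

Taking `BufRead` and `Write` as parameters keeps the loop testable; an interactive shell would pass `io::stdin().lock()` and `io::stdout()`.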


    INSERT INTO tweets (tweet_id, user_name, created_at, text)
    VALUES (2344242,
            "rustlang",
            2016-03-21T17:29:42Z,
            "Please welcome erickt to the core team!");

Read User Input → Parse Command → Execute Command


Read User Input → Parse Command → Execute Command

    Command::Insert {
        table: "tweets",
        columns: vec!["tweet_id", "user_name", "created_at", "text"],
        row: vec![
            Literal::Int(2344242),
            Literal::String(Cow::Borrowed("rustlang")),
            Literal::Timestamp(2016-03-21T17:29:42Z),
            Literal::String(Cow::Borrowed("Please…")),
        ],
    }
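
The `Cow::Borrowed` values suggest the parser borrows string literals straight from the input line when it can. A minimal sketch of that idea follows, assuming double-quoted literals with backslash escapes; the type definitions and `parse_string_literal` are hypothetical reconstructions, and the `Timestamp` variant is left out to avoid a dependency on chrono.

```rust
use std::borrow::Cow;

/// Hypothetical subset of the literal type shown on the slide.
#[derive(Debug, Clone, PartialEq)]
pub enum Literal<'a> {
    Bool(bool),
    Int(i64),
    String(Cow<'a, str>),
}

/// Hypothetical parsed command; borrows names from the input line.
#[derive(Debug, Clone, PartialEq)]
pub enum Command<'a> {
    Insert {
        table: &'a str,
        columns: Vec<&'a str>,
        row: Vec<Literal<'a>>,
    },
}

/// Parse a double-quoted string literal.
/// Borrows from `input` when there are no escapes; allocates only
/// when a backslash escape has to be resolved.
pub fn parse_string_literal(input: &str) -> Option<Literal<'_>> {
    let inner = input.strip_prefix('"')?.strip_suffix('"')?;
    if !inner.contains('\\') {
        if inner.contains('"') {
            return None; // unescaped quote inside the literal
        }
        return Some(Literal::String(Cow::Borrowed(inner)));
    }
    let mut unescaped = String::with_capacity(inner.len());
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        match c {
            // Keep the character after a backslash verbatim.
            '\\' => unescaped.push(chars.next()?),
            '"' => return None,
            other => unescaped.push(other),
        }
    }
    Some(Literal::String(Cow::Owned(unescaped)))
}
```

In the common case the literal is a slice of the line the user typed, so parsing an INSERT allocates nothing for its strings; only escaped input pays for a copy.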


Read User Input → Parse Command → Execute Command

    fn insert(client: &mut kudu::Client,
              table_name: &str,
              columns: Vec<&str>,
              row: Vec<Literal>)
              -> kudu::Result<()> {
        let table = try!(client.open_table(table_name));
        let mut session = client.new_session();
        let mut insert = table.new_insert();

        // Set each column on the row, dispatching on the literal's type.
        for (column, literal) in columns.iter().zip(row) {
            // Propagate any error from `set` instead of discarding it.
            try!(match literal {
                Literal::Bool(b) => insert.row().set(column, b),
                ..
                Literal::String(cow) => insert.row().set(column, &cow[..]),
            });
        }

        // Queue the insert on the session, then flush it to the server.
        try!(session.insert(insert));
        session.flush()
    }


When to Use Kudu

• Big, constantly growing and updating datasets (1TiB+)
• Sequential access to many rows per query
• Fast hardware
  • Takes full advantage of SSDs, NVRAM


http://getkudu.io/
github.com/danburkert/kudu-rs