Protocol buffers

Post on 16-Jul-2015

Protocol Buffers Overview

Fabrício Epaminondas - @fabricioepa

Senior Software Engineer, Signove

About me

BSc in Computer Science at Federal University of Campina Grande, UFCG.

Recent activities

• Implementation of IEEE Data Exchange Protocol 11073 part 20601

• Data modeling for Bluetooth services

• Data synchronization using REST services

Agenda

Background

What are Protocol Buffers?

How do they work?

Why use Protocol Buffers?

Techniques

Questions

Quick Links

Background

Data Formats in Information Technology

• Typing/interpretation, transmission, storage

Popular data formats...

CSV

• Simple for applications to read and write

• Tabular data structure

• Flat

• No validation

Name, Age, Phone

Fabricio, 26, +558388000000

Kaka, 28, +558388000001

Cafu, 40, +558388000002

Pele, 70, +558388000003

XML

• Markup language for Documents

• Hierarchical structure

• Data validation

• A common standard with great acceptance

<person>

<name>Fabricio</name>

<age>26</age>

<contacts>

<email>

my@email.com

</email>

<phone>999</phone>

</contacts>

</person>

JSON

• Lightweight data-interchange format

• Browser support

• Alternative to XML

{
  "person": {
    "name": "Fabricio",
    "age": 26,
    "contacts": {
      "email": "my@email.com",
      "phone": "999"
    }
  }
}

Comparison

Formats compared: CSV, XML, JSON

Criteria: parsing efficiency, reusable, model update, hierarchical, small size

Google's Data Interchange Requirements

We use literally thousands of different data formats to represent:

• networked messages between servers

• index records in repositories

• geospatial datasets

Most of these formats are structured, not flat. This raises an important question…

How do we encode it all?

Requirements:

Hierarchical data structure

Small data size

Parsing performance

Model update: add/ignore fields, modify parser code...

Backwards compatible

What are Protocol Buffers?

A language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

It was initially developed at Google to deal with an index server request/response protocol

How do they work?

You define your structured data format in a descriptor file (.proto)

The compiler generates source code that lets you easily write and read your structured data to and from a variety of data streams, in a variety of languages.

You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

Writing some code…

.proto

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

C++

Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("jdoe@example.com");
fstream output("myfile", ios::out | ios::binary);
// Write
person.SerializeToOstream(&output);

// Read
fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

Generated code

Messages

• Immutable

Builders

Enums and Nested Classes

• C++: Person::MOBILE

• Java: Person.PhoneType.MOBILE

Parsing and Serialization

Why use Protocol Buffers?

Simplicity is one of Protocol Buffers’ major design goals

Protocol buffers are flexible and efficient

PB are 3 to 10 times smaller than XML

PB are 20 to 100 times faster than XML

Comparison

Formats compared: CSV, XML, JSON, PB

Criteria: parsing efficiency, reusable, model update, hierarchical, small size

Why use Protocol Buffers?

Using native object serialization (as in Java) causes interoperability problems.

In C/C++ the raw in-memory data structures can be sent/saved in binary form, but this is hard to extend.

Alternatives

Thrift

ASN.1

Java Externalizable

Other IDLs...

• WSDL, XSD, XML

• CORBA, Java-IDL, etc…

Techniques

Backward/Forward compatibility

Updating Message Types

O-O Design

Backward/Forward compatibility

You must not change the tag numbers of any existing fields.

You must not add or delete any required fields.

Consider writing application-specific custom validation routines instead of required fields

You may delete optional or repeated fields.

You may add new optional or repeated fields but you must use fresh tag numbers…

(i.e. tag numbers that were never used in this protocol buffer, not even by deleted fields).
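The advice to prefer application-specific validation over required fields could look roughly like the sketch below. PersonData is a hypothetical stand-in for a generated message whose fields are all optional, and the rules inside ValidatePerson are illustrative assumptions, not rules from the protobuf documentation.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a generated message with only optional fields.
// The has_* flags mimic the presence checks that generated code offers.
struct PersonData {
    bool has_name = false;
    std::string name;
    bool has_id = false;
    int id = 0;
    bool has_email = false;
    std::string email;
};

// Application-level validation. Returns a list of problems; empty means valid.
// Because the rules live in application code, they can change without
// touching the .proto or breaking messages that are already deployed.
std::vector<std::string> ValidatePerson(const PersonData& p) {
    std::vector<std::string> errors;
    if (!p.has_name || p.name.empty()) {
        errors.push_back("name is missing");
    }
    if (!p.has_id) {
        errors.push_back("id is missing");
    } else if (p.id <= 0) {
        errors.push_back("id must be positive");
    }
    if (p.has_email && p.email.find('@') == std::string::npos) {
        errors.push_back("email has no '@'");
    }
    return errors;
}
```

A receiver would call ValidatePerson after parsing and decide for itself whether to reject, log, or repair an incomplete message.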

Backward/Forward compatibility

Old code will simply ignore new fields; for deleted fields it will read default values

Unknown fields are not discarded, and if the message is later serialized, the unknown fields are serialized along with it

Changing a default value is generally OK, but remember default values are never sent over the wire

The receiver will NOT see the default value that was defined in the sender's code; it uses the default from its own copy of the .proto.

New code will also transparently read old messages
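The behaviour described above can be illustrated with a toy reader written directly against the wire encoding, without the protobuf library. OldPerson, ParseOldPerson and SerializeOldPerson are hypothetical names; the reader is assumed to know only fields 1 (name) and 2 (id), and it handles only wire types 0 and 2.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

struct OldPerson {
    std::string name;            // field 1
    std::uint64_t id = 0;        // field 2; 0 is this reader's own default
    std::string unknown_fields;  // raw bytes of fields this reader doesn't know
};

// Reads one varint starting at pos and advances pos. False on truncated input.
bool ReadVarint(const std::string& in, std::size_t& pos, std::uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        std::uint8_t byte = static_cast<std::uint8_t>(in[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}

void AppendVarint(std::string& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Parses known fields and keeps every unknown field as raw bytes.
bool ParseOldPerson(const std::string& in, OldPerson& out) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::size_t field_start = pos;
        std::uint64_t key = 0, value = 0;
        if (!ReadVarint(in, pos, key)) return false;
        std::uint64_t field = key >> 3;
        std::uint64_t wire_type = key & 7;
        if (wire_type == 0) {  // varint
            if (!ReadVarint(in, pos, value)) return false;
            if (field == 2) { out.id = value; continue; }
        } else if (wire_type == 2) {  // length-delimited
            if (!ReadVarint(in, pos, value) || value > in.size() - pos) return false;
            std::size_t len = static_cast<std::size_t>(value);
            if (field == 1) { out.name = in.substr(pos, len); pos += len; continue; }
            pos += len;
        } else {
            return false;  // other wire types are omitted in this sketch
        }
        // Not a field we know: remember its bytes instead of discarding them.
        out.unknown_fields.append(in, field_start, pos - field_start);
    }
    return true;
}

// Writes the known fields, then re-emits the preserved unknown fields.
std::string SerializeOldPerson(const OldPerson& p) {
    std::string out;
    out.push_back(0x0A);  // key: field 1, wire type 2
    AppendVarint(out, p.name.size());
    out += p.name;
    out.push_back(0x10);  // key: field 2, wire type 0
    AppendVarint(out, p.id);
    out += p.unknown_fields;
    return out;
}
```

If a newer writer adds a field with a fresh tag number, ParseOldPerson still fills name and id, and SerializeOldPerson passes the extra field through untouched. When a field is absent from the input, the reader simply keeps its own default.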

Updating Message Types

Don't change the numeric tags for any existing fields.

Although non-required fields can be removed, it’s better to rename the field to something like “DEPRECATED_...” so that its tag number is not accidentally reused.

int32, uint32, int64, uint64, and bool are all compatible: changing a field from one of these types to another does not break forward or backward compatibility.

string and bytes are compatible as long as the bytes are valid UTF-8.

More rules are covered in the protobuf manual
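The integer types listed above are interchangeable because they all travel as the same varint on the wire; only the reader's interpretation differs. A minimal sketch of that idea, using hypothetical helper names (DecodeVarint, AsBool, AsInt32, AsInt64) rather than the protobuf library:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decodes the varint at the start of `in` (assumed to be well formed).
std::uint64_t DecodeVarint(const std::string& in) {
    std::uint64_t value = 0;
    int shift = 0;
    for (std::size_t i = 0; i < in.size() && shift < 64; ++i, shift += 7) {
        std::uint8_t byte = static_cast<std::uint8_t>(in[i]);
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;
    }
    return value;
}

// The same raw varint, read as each of the compatible field types.
bool AsBool(std::uint64_t raw) { return raw != 0; }

std::int64_t AsInt64(std::uint64_t raw) { return static_cast<std::int64_t>(raw); }

// Reading a 64-bit value as int32 keeps only the low 32 bits, like a C++ cast.
// (The unsigned-to-signed conversion is modular on mainstream compilers.)
std::int32_t AsInt32(std::uint64_t raw) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(raw));
}
```

A value of 1 reads back as true, 1 and 1 under all three interpretations. A value that does not fit the narrower type is truncated rather than rejected, so widening a field is safe while narrowing it can silently lose data.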

O-O Design

Generated source code of message objects should not be modified

Use wrappers to encapsulate messages

Do not inherit from message objects
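A sketch of the wrapper approach, assuming a generated class shaped like the Person example. PersonMessage here is a hand-written, hypothetical stand-in for the generated code so the snippet is self-contained, and Contact and DisplayName are illustrative names.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a protoc-generated class. In a real project this
// comes from the compiler and is never edited by hand.
class PersonMessage {
public:
    void set_name(const std::string& v) { name_ = v; }
    const std::string& name() const { return name_; }
    void set_email(const std::string& v) { email_ = v; }
    const std::string& email() const { return email_; }
    bool has_email() const { return !email_.empty(); }

private:
    std::string name_;
    std::string email_;
};

// Domain object that wraps the message by composition, not inheritance.
// Behaviour and invariants live here; the message stays a plain data carrier.
class Contact {
public:
    explicit Contact(PersonMessage msg) : msg_(std::move(msg)) {
        if (msg_.name().empty()) {
            throw std::invalid_argument("contact needs a name");
        }
    }

    // Domain logic that does not belong in generated code.
    std::string DisplayName() const {
        if (msg_.has_email()) {
            return msg_.name() + " <" + msg_.email() + ">";
        }
        return msg_.name();
    }

    // Read-only access for serialization at the system boundary.
    const PersonMessage& message() const { return msg_; }

private:
    PersonMessage msg_;
};
```

Regenerating the message class after a .proto change then touches only the wrapper's internals, and callers keep using Contact.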

Questions…

Quick Links

• API

▫ http://code.google.com/apis/protocolbuffers/

• Post by Kenton Varda, Protocol Buffers Team

▫ http://google-opensource.blogspot.com/2008/07/protocol-buffers-googles-data.html

• Kevin Weil, Analytics Lead, Twitter

▫ http://www.slideshare.net/kevinweil/protocol-buffers-and-hadoop-at-twitter

• Benchmarks

▫ http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking

• Computer World Article

▫ http://www.computerworld.com/s/article/9191098/Twitter_solves_its_data_formatting_challenge