Talk 1: Google App Engine Development: Java, Data Models, and other things you should know (Navin Kumar, CTO of Socialwok.com)

Google App Engine Development

Java, Data Models, and Other Things You Should Know

Navin Kumar, Socialwok

Introduction to Google App Engine

• Google App Engine is an on-demand cloud platform that can be used to rapidly develop and scale web applications.

 • Advantages:

o You are using the same architecture and tools that Google uses to scale their own applications. 

o Easy to develop your own applications using Java and Python

o Free Quotas to get you started immediately.

Java Support on Google App Engine

• Java support was introduced in April 2009
• Remarkable milestone for several reasons:
   o Brought the Java Servlet development model to Google App Engine
   o You can now use your favorite Java IDE to develop your applications (Eclipse, NetBeans, IntelliJ)
   o Database development is easy with JDO and JPA
   o Not limited to the Java language: ANY JVM-supported language can be used (JRuby, Groovy, Scala, even JavaScript (Rhino), PHP, etc.)

Eclipse Support and GWT

• Eclipse is the premier open source Java IDE, and with the Google Plugin for Eclipse, developing Google AppEngine apps is very easy.
• Eclipse will automatically lay out your web application for you, in addition to providing 1-click deployment.
• GWT is also supported by the Eclipse plugin, and can be used along with your Google AppEngine codebase.
   o End-to-end Java development of powerful Java-based web applications.

Google Plugin for Eclipse (GWT and AppEngine)

BigTable: Behind Google's Datastore

• BigTable: A Distributed Storage System for Structured Data (http://labs.google.com/papers/bigtable.html)
   o Built on top of GFS (Google File System) (http://labs.google.com/papers/mapreduce.html)
• Strongly consistent and uses optimistic concurrency control
• But it's not a relational database
   o No joins or true OR queries
   o "!=" is not implemented
   o Limitations on the use of "<" and ">"

Data Models

• DataNucleus (http://www.datanucleus.org) is used to handle the Java persistence frameworks on AppEngine
• 2 choices: JDO (Java Data Objects) or JPA (Java Persistence API) (JPA will be very familiar to those who have used Hibernate or EJB persistence frameworks)
• Both involve very similar coding styles.
• For this talk, we will focus on JDO, but JPA is very similar, so the same concepts can be applied.
• There is also a low-level datastore API that we will touch on as well

Defining Your Data Model

    package com.socialwok.server.data;

    import java.io.Serializable;
    import javax.jdo.annotations.*;
    import com.google.appengine.api.datastore.Text;

    @PersistenceCapable(identityType = IdentityType.APPLICATION)
    public class Post implements Serializable {
        private static final long serialVersionUID = 1L;

        @PrimaryKey
        @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
        @Extension(vendorName = "datanucleus", key = "gae.encoded-pk", value = "true")
        private String id;
        public String getId() { return id; }

        @Persistent
        private String title;
        public String getTitle() { .. }
        public void setTitle(String title) { .. }

        @Persistent
        private Text content;
        public String getContent() { .. }
        public void setContent(String content) { .. }
        ..
    }

Creating, Deleting, and Querying

• At the heart of everything is the PersistenceManager

    PersistenceManager pm = PMF.get().getPersistenceManager();
    Post post = new Post();
    post.setTitle("Title");
    post.setContent("Google AppEngine for Java");
    try {
        pm.makePersistent(post);
    } finally {
        pm.close();   // always release the PersistenceManager
    }
    ...
    Post deleteMe = pm.getObjectById(Post.class, deleteId);
    try {
        pm.deletePersistent(deleteMe);
    }
    ...

• Build queries using JDOQL

    Query query = pm.newQuery(Post.class);
    query.setFilter("title == titleParam");
    query.declareParameters("String titleParam");
    query.setUnique(true);
    Post post = (Post) query.execute("Title");

Relationships

• Owned one-to-one and one-to-many relationships: @Persistent(mappedBy="field") annotation syntax.
• Unowned relationships (one-to-one, one-to-many, many-to-many):
      @Persistent Key otherEntity;
      @Persistent List<Key> otherEntities;
• Owned relationships create a parent-child relationship
   o Parent and child entities are stored in the same entity group
   o Entity group defines a location in the datastore
   o This is important because transactions on the datastore can only be applied over a single entity group

Other APIs you should be aware of

• UsersService
   o Don't write a login, use Google's!
• ImagesService
   o Picasa image manipulation web services
• Memcache
   o Distributed cache for objects
   o Very useful! More on this later...
• URL Fetch
• Mail service
   o Send outbound emails w/ some restrictions
• APIs (except UsersService) subject to quota limitations

And now for the fun stuff...

 

Socialwok

• Enterprise social collaboration application built on Google App Engine.
   o Utilizes a social concept of feeds (also referred to as presence and activity streams)
   o Combines the querying of reasonably complex data with the privacy requirements of social networking.
• Uses tons of Google App Engine APIs, Google APIs, and GWT.
• As we have built it, we have learned several things about Google App Engine that have allowed us to make the app reasonably fast and responsive.

Lesson 1: Utilization of Memcache

• Data structure of each feed is relatively complex
   o At least 3 explicit unowned relationships
        @Persistent Key user
        @Persistent Key network
        @Persistent List<Key> attachments
   o Requires querying for each of these objects explicitly when representing them in the feed.
• Feed is fetched repeatedly by several (hundreds of) concurrent users
   o The feed display needs to be reasonably responsive for all the different users

Lesson 1 (cont.) Solution: Memcache

• Distributed in-memory cache
   o Uses javax.cache.* APIs
   o Also, a low-level API: com.google.appengine.api.memcache.*

• Basic uses:
   o Speed up existing common datastore queries
   o Session data, user preferences

• Cache data is retained as long as possible if no expiration is set

• Data is not stored on any persistent storage, so you must be sure your app can handle a "cache miss"
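
The "cache miss" requirement above can be sketched as a cache-aside lookup. This is a minimal illustration and not Socialwok's code: the class name FeedCache, the loader function, and the in-process map that stands in for Memcache are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache-aside lookup: try the cache first, fall back to the datastore on a miss. */
final class FeedCache {
    // Stand-in for Memcache (assumption): a local map. The real service is
    // distributed and may evict entries at any time, so the miss path must exist.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Hypothetical loader representing the expensive datastore queries.
    private final Function<String, String> datastoreLoader;
    private int datastoreReads = 0;

    FeedCache(Function<String, String> datastoreLoader) {
        this.datastoreLoader = datastoreLoader;
    }

    String getFeed(String userId) {
        String cached = cache.get(userId);
        if (cached != null) {
            return cached;                 // cache hit: no datastore work
        }
        datastoreReads++;                  // cache miss: run the expensive query
        String feed = datastoreLoader.apply(userId);
        cache.put(userId, feed);           // populate the cache for later requests
        return feed;
    }

    /** Simulates the cache dropping an entry. */
    void evict(String userId) {
        cache.remove(userId);
    }

    int datastoreReads() {
        return datastoreReads;
    }

    public static void main(String[] args) {
        FeedCache feeds = new FeedCache(id -> "feed-for-" + id);
        feeds.getFeed("navin");
        feeds.getFeed("navin");            // served from the cache
        System.out.println(feeds.datastoreReads()); // prints 1
    }
}
```

The application never assumes an entry is present, so an eviction only costs one extra datastore read.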

Lesson 1: Memcache conclusions

• Works really well!
   o Responsive requests
   o 2 s => ~800 ms response time (60% decrease)
• Cache data is generally retained for a very long time
• Distributed nature of cache provides benefits to every user on the system.
   o The more people who use your app, the better your app performs
• Even the free quota for Memcache is quite generous:
   o ~8.6 million API calls.

Lesson 2: Message Delivery Fanout

• Adapted from Building Scalable, Complex Apps... from Google I/O by Brett Slatkin
   o http://code.google.com/events/io/sessions/BuildingScalableComplexApps.html
• Basically deals with a problem of fan-out
   o Socialwok has a concept of "following" (which is basically a subscription between users)
   o In our case, one user posts a single message that needs to be "delivered" to all his subscribers
   o How do we show the message efficiently to all his subscribers? We can deliver the message by reference to its recipients.

Lesson 2 (cont.): RDBMS version

2 Primary Tables

   User ID   Name
   1         Navin
   2         John
   3         Vikram

   Message ID   Message           User ID
   1            Hello world       1
   2            Another message   3

2 Join Tables

   Follower ID   Following ID
   1             2
   1             3
   2             1

   Recipient ID   Message ID
   1 34
   1 67

• To get messages to display for the current user:

    SELECT * FROM Messages
    INNER JOIN UserMessages USING (message_id)
    WHERE UserMessages.user_id = 'current_user_id'

• But there aren't any joins on AppEngine!

Lesson 2: List Properties to the Rescue

• A list property is a property in the datastore that has multiple values:
      @Persistent private Collection<String> values;
   o Represented in Java using Collection fields (Set, List, etc.)
   o Indexed in the same way that normal fields are

        values Index
        key=1,values=1
        key=2,values=2
        key=2,values=1

   o Densely pack information
   o Query like you query any single-valued property:
        query.setFilter("values == 2");
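
The index rows above can be modeled in plain Java to show why an equality filter on a list property behaves like one on a single-valued property. This is a conceptual sketch only: the class ListPropertyIndex and its methods are hypothetical, and the real datastore index is a distributed sorted table rather than an in-memory map.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory model of a list-property index: one row per (value, entity key) pair. */
final class ListPropertyIndex {
    // value -> keys of the entities whose list contains that value
    private final TreeMap<String, TreeSet<Long>> rows = new TreeMap<>();

    /** Writing an entity adds one index row for each value in its list. */
    void put(long entityKey, Collection<String> values) {
        for (String v : values) {
            rows.computeIfAbsent(v, k -> new TreeSet<>()).add(entityKey);
        }
    }

    /** Models query.setFilter("values == v"): a single lookup on the index. */
    SortedSet<Long> whereEquals(String value) {
        return rows.getOrDefault(value, new TreeSet<>());
    }

    public static void main(String[] args) {
        ListPropertyIndex idx = new ListPropertyIndex();
        idx.put(1L, Arrays.asList("1"));        // key=1, values=[1]
        idx.put(2L, Arrays.asList("2", "1"));   // key=2, values=[2, 1]
        System.out.println(idx.whereEquals("1")); // prints [1, 2]
        System.out.println(idx.whereEquals("2")); // prints [2]
    }
}
```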

Lesson 2: Our new data definition

• Now we can define a collection field to store the list of recipients

    public class Message {
        @Persistent private String msg;
        @Persistent private List<String> recipients;
        ...
    }

• Query on the collection field:

    Query query = pm.newQuery(Message.class);
    query.setFilter("recipients == recptParam");
    query.declareParameters("String recptParam");
    List<Message> msgs =
        (List<Message>) query.execute(currentUserId);

• But there is one issue with this:
   o Serialization overhead when fetching the messages
   o We don't really care about the contents of this field when displaying the messages
   o So we will take advantage of another trick

Lesson 3: Keys-only Queries and AppEngine Key Structure

• We can perform queries whose return values are restricted to the keys of the entity
   o Currently only supported in the low-level datastore API
• AppEngine keys are structured in a very special way
   o Stored in protocol buffers
   o Consist of an app ID and a series of type-id_or_name pairs
        Each pair is an entity type name and an autogenerated integer ID or user-provided name
   o Root entities have exactly one of these pairs; child entities have one for each parent and their own
• Presents a unique ability to retrieve a parent entity's key from the child entity's key
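
To make the key structure concrete, here is a simplified model of a key as an app ID plus a path of (type, id-or-name) pairs. PathKey is a hypothetical class written for this illustration; it is not the datastore's Key class and ignores the protocol buffer encoding. It only shows why a parent key can be derived from a child key without touching the datastore.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in for a datastore key: app ID plus a path of (kind, id-or-name) pairs. */
final class PathKey {
    private final String appId;
    private final List<String[]> path;   // each element is {kind, idOrName}

    private PathKey(String appId, List<String[]> path) {
        this.appId = appId;
        this.path = Collections.unmodifiableList(path);
    }

    /** A root entity's key has exactly one pair. */
    static PathKey root(String appId, String kind, String idOrName) {
        List<String[]> p = new ArrayList<>();
        p.add(new String[] {kind, idOrName});
        return new PathKey(appId, p);
    }

    /** A child key is the parent's full path plus the child's own pair. */
    PathKey child(String kind, String idOrName) {
        List<String[]> p = new ArrayList<>(path);
        p.add(new String[] {kind, idOrName});
        return new PathKey(appId, p);
    }

    /** Drops the last pair; a root key has no parent, so this returns null. */
    PathKey getParent() {
        if (path.size() == 1) {
            return null;
        }
        return new PathKey(appId, new ArrayList<>(path.subList(0, path.size() - 1)));
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(appId).append(':');
        for (String[] pair : path) {
            sb.append('/').append(pair[0]).append('(').append(pair[1]).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PathKey message = PathKey.root("socialwok", "Message", "42");
        PathKey recipients = message.child("MessageRecipients", "1");
        System.out.println(recipients);             // prints socialwok:/Message(42)/MessageRecipients(1)
        System.out.println(recipients.getParent()); // prints socialwok:/Message(42)
    }
}
```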

Lesson 3: A solution to our Serialization Problem

• Now we can store the irrelevant recipients in a child entity
• Here's the process:
   1. Define a child entity with the recipients field
   2. Store the recipients of the message in the child entity
   3. Create a keys-only query on the child entity that filters on the recipients field.
   4. Get a list of parent keys from the list of child keys
   5. Bulk-fetch the parents from the datastore
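
The process above can be walked through with an in-memory model. This sketch makes several assumptions: FanOutQuery, post, and feedFor are hypothetical names, the maps stand in for the datastore and its index, and a child key is reduced to the parent's ID because the child key embeds the parent key. It is meant to show the data flow, not the actual datastore calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of the child-entity, keys-only fan-out query. */
final class FanOutQuery {
    // Parent entities: message ID -> message body (the part shown in the feed).
    private final Map<Long, String> messages = new LinkedHashMap<>();
    // Index on the child entity's recipients list: recipient -> parent message IDs.
    private final Map<String, List<Long>> recipientIndex = new HashMap<>();

    /** Stores the message as the parent and its recipients in the child's index. */
    void post(long messageId, String body, List<String> recipients) {
        messages.put(messageId, body);
        for (String r : recipients) {
            recipientIndex.computeIfAbsent(r, k -> new ArrayList<>()).add(messageId);
        }
    }

    List<String> feedFor(String userId) {
        // Keys-only query on the child, then child keys mapped to parent keys.
        List<Long> parentKeys =
            recipientIndex.getOrDefault(userId, Collections.<Long>emptyList());
        // Bulk fetch of the parents; the recipient lists are never loaded.
        List<String> feed = new ArrayList<>();
        for (Long key : parentKeys) {
            feed.add(messages.get(key));
        }
        return feed;
    }

    public static void main(String[] args) {
        FanOutQuery q = new FanOutQuery();
        q.post(1L, "Hello world", Arrays.asList("john", "vikram"));
        q.post(2L, "Another message", Arrays.asList("john"));
        System.out.println(q.feedFor("john"));   // prints [Hello world, Another message]
        System.out.println(q.feedFor("vikram")); // prints [Hello world]
    }
}
```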

Lesson 3 (contd.): Solution (Data Def.)

    public class MessageRecipients {
        @PrimaryKey private Key id;
        @Persistent private List<String> recipients;
        @Persistent private Date date;
        @Persistent(mappedBy="msgRecpt") private Message msg;
        ...
    }

    public class Message {
        ...
        @Persistent private Date date;
        @Persistent private String msg;
        @Persistent private MessageRecipients msgRecpt;
        ...
    }

Lesson 3 (contd.): Solution (Querying)

    DatastoreService dataSvc = ...;
    Query query = new Query("MessageRecipients")
        .addFilter("recipients", FilterOperator.EQUAL, userid)
        .addSort("date", SortDirection.DESCENDING)
        .setKeysOnly();   // <-- Only fetch keys!
    List<Entity> msgRecpts = dataSvc.prepare(query)
        .asList(FetchOptions.Builder.withDefaults());
    List<Key> parents = new ArrayList<Key>();
    for (Entity recep : msgRecpts) {
        parents.add(recep.getParent());
    }

    // Bulk fetch parents using key list
    Map<Key,Entity> msgs = dataSvc.get(parents);

Cool Trick: Lite Full Text Search

• Most web applications nowadays need some form of full-text search
• Well, we are on Google AppEngine, aren't we?
• Google actually did release a basic searchable model implementation
   o Limited to Python (google.appengine.ext.search)
   o More info: http://www.billkatz.com/2008/8/A-SearchableModel-for-App-Engine
   o Proper full-text search is on the AppEngine roadmap
• Some of our earlier lessons do apply here.

How do we build it

• First, it helps to understand how a basic full-text search index works
   o First, break up the text into terms using lexical analysis
   o Then store the terms in a lookup table based on the key of the message
        With list fields, Google AppEngine gives us this one.
   o We build queries using the same tricks.
• We also apply the same tricks using child entities and keys-only queries to reduce the serialization overhead.

Live example

• I have deployed a modified version of the Google AppEngine guestbook example:
   o http://searchguestbook.appspot.com
• If anyone wants to "sign" it right now, please go ahead.
• We will now search the data
   o Limited to 1-2 word queries

How it works.

• Applies lessons from list fields and keys-only queries
      @Persistent Set<String> searchTerms;
• Our "lexical analysis": a Java regular expression
      String[] tokens = content.toLowerCase().split("[^\\w]+");
   o Can use a full-text search library like Lucene to improve this part
• Another cool feature of list properties: merge-join
   o Think about organizing your data in a Venn-diagram fashion and finding the intersection of your data.
   o Watch your indexes!
• Can improve this implementation by using Memcache to cache common search queries.
• Code will be made available after the talk, so you can take a good look for yourself!
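
The tokenizing and intersection ideas above can be tried out without App Engine. The sketch below is not the Searchable Guestbook code: LiteSearch and its methods are hypothetical, the map stands in for entities with a searchTerms list property, and the intersection is computed by a plain scan rather than by the datastore's merge-join over its indexes. It reuses the regular expression from the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory model of "lite" full-text search over a set of search terms per entity. */
final class LiteSearch {
    // entity ID -> its searchTerms (models the Set<String> list property)
    private final Map<Long, Set<String>> searchTerms = new LinkedHashMap<>();

    /** Lowercases the text and splits on runs of non-word characters. */
    static Set<String> tokenize(String content) {
        Set<String> terms = new TreeSet<>();
        for (String token : content.toLowerCase().split("[^\\w]+")) {
            if (!token.isEmpty()) {      // split can yield a leading empty token
                terms.add(token);
            }
        }
        return terms;
    }

    void index(long id, String content) {
        searchTerms.put(id, tokenize(content));
    }

    /** Returns entities containing every query term (the Venn-diagram intersection). */
    List<Long> search(String query) {
        Set<String> wanted = tokenize(query);
        List<Long> hits = new ArrayList<>();
        if (wanted.isEmpty()) {
            return hits;
        }
        for (Map.Entry<Long, Set<String>> entry : searchTerms.entrySet()) {
            if (entry.getValue().containsAll(wanted)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        LiteSearch search = new LiteSearch();
        search.index(1L, "Hello App Engine!");
        search.index(2L, "Hello, world");
        System.out.println(search.search("hello"));        // prints [1, 2]
        System.out.println(search.search("hello engine")); // prints [1]
    }
}
```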

 

Conclusions

• Google AppEngine for Java provides a standardized way to build applications for Google AppEngine
• In building Socialwok, we have learned several lessons that apply when building a scalable application on Google App Engine
• Get the Searchable Guestbook code here:
   o http://searchguestbook.appspot.com/searchguestbook.tar.gz
• In short, Google AppEngine development has never been easier and more interesting!
• Get started by visiting: http://code.google.com/appengine

Q & A