Talk 1: Google App Engine Development: Java, Data Models, and other things you should know (Navin...
singapore-google-technology-user-group -
Google App Engine Development
Java, Data Models, and Other Things You Should Know
Navin Kumar, Socialwok
Introduction to Google App Engine
• Google App Engine is an on-demand cloud platform that can be used to rapidly develop and scale web applications.
• Advantages:
o You are using the same architecture and tools that Google uses to scale their own applications.
o Easy to develop your own applications using Java and Python
o Free Quotas to get you started immediately.
Java Support on Google App Engine
• Java support was introduced in April 2009
• Remarkable milestone for several reasons:
o Brought the Java Servlet development model to Google App Engine
o You can use your favorite Java IDE to develop your applications now (Eclipse, NetBeans, IntelliJ)
o Database development is easy with JDO and JPA
o Not limited to the Java language: ANY JVM-supported language can be used (JRuby, Groovy, Scala, even JavaScript (Rhino), PHP, etc.)
Eclipse Support and GWT
• Eclipse is the premier open source Java IDE, and with the Google Plugin for Eclipse, developing Google AppEngine apps can be done very easily.
• Eclipse will automatically lay out your web application for you in addition to providing 1-click deployment.
• GWT is also supported by the Eclipse plugin, and can be used along with your Google AppEngine codebase.
o End-to-end Java development of powerful Java-based web applications.
Google Plugin for Eclipse (GWT and AppEngine)
BigTable: Behind Google's Datastore
• BigTable: A Distributed Storage System for Structured Data (http://labs.google.com/papers/bigtable.html)
o Built on top of GFS (Google File System) (http://labs.google.com/papers/mapreduce.html)
• Strongly consistent and uses optimistic concurrency control
• But it's not a relational database
o No joins or true OR queries
o "!=" is not implemented
o Limitations on the use of "<" and ">"
Data Models
• DataNucleus (http://www.datanucleus.org) is used to handle the Java persistence frameworks on AppEngine
• 2 Choices: JDO (Java Data Objects) or JPA (Java Persistence API) (JPA will be very familiar to those who have used Hibernate or EJB persistence frameworks)
• Both involve very similar coding styles.
• For this talk, we will focus on JDO, but JPA is very similar, so the same concepts can be applied.
• There is also a low-level datastore API that we will touch on as well
Defining Your Data Model
    package com.socialwok.server.data;

    import java.io.Serializable;
    import javax.jdo.annotations.*;
    import com.google.appengine.api.datastore.Text;

    @PersistenceCapable(identityType = IdentityType.APPLICATION)
    public class Post implements Serializable {
        private static final long serialVersionUID = 1L;

        @PrimaryKey
        @Persistent(valueStrategy=IdGeneratorStrategy.IDENTITY)
        @Extension(vendorName="datanucleus", key="gae.encoded-pk", value="true")
        private String id;
        public String getId() { return id; }

        @Persistent private String title;
        public String getTitle() { .. }
        public void setTitle(String title) { .. }

        @Persistent private Text content;
        public String getContent() { .. }
        public void setContent(String content) { .. }
        ..
    }
Creating, Deleting, and Querying
• At the heart of everything is the PersistenceManager
    PersistenceManager pm = PMF.get().getPersistenceManager();
    Post post = new Post();
    post.setTitle("Title");
    post.setContent("Google AppEngine for Java");
    try {
        pm.makePersistent(post);
    } finally {
        pm.close(); // always release the PersistenceManager
    }
    ...
    // later, with a fresh PersistenceManager
    Post deleteMe = pm.getObjectById(Post.class, deleteId);
    try {
        pm.deletePersistent(deleteMe);
    } finally {
        pm.close();
    }
    ...
• Build queries using JDOQL
    Query query = pm.newQuery(Post.class);
    query.setFilter("title == titleParam");
    query.declareParameters("String titleParam");
    query.setUnique(true); // expect a single result instead of a List
    Post post = (Post) query.execute("Title");
Relationships
• Owned one-to-one and one-to-many: @Persistent(mappedBy="field") annotation syntax.
• Unowned relationships (one-to-one, one-to-many, many-to-many)
    @Persistent Key otherEntity;
    @Persistent List<Key> otherEntities;
• Owned relationships create a parent-child relationship
o Parent and child entities are stored in the same entity group
o Entity group defines a location in the datastore
o This is important because Transactions on the datastore can only be applied over a single entity group
Other APIs you should be aware of
• UsersService
o Don't write a login, use Google's!
• ImagesService
o Picasa image manipulation web services
• Memcache
o Distributed cache for objects
o Very useful! More on this later...
• URL Fetch
• Mail service
o Send outbound emails w/ some restrictions
• APIs (except UsersService) subject to quota limitations
And now for the fun stuff...
• Enterprise social collaboration application built on Google App Engine.
o Utilizes a social concept of feeds (also referred to as presence and activity streams)
o Combines the querying of reasonably complex data with privacy requirements of social networking.
• Uses tons of Google App Engine APIs, Google APIs, and GWT.
• As we have built it, we have learned several things about Google App Engine that have allowed us to make the app reasonably fast and responsive.
Lesson 1: Utilization of Memcache
• Data structure of each feed is relatively complex
o At least 3 explicit unowned relationships
    @Persistent Key user;
    @Persistent Key network;
    @Persistent List<Key> attachments;
o Requires querying for each of these objects explicitly when representing in the feed.
• Feed is fetched repeatedly by several (hundreds of) concurrent users
o There is a need for the feed display to be reasonably responsive for all the different users
Lesson 1 (cont.) Solution: Memcache
• Distributed in-memory cache
o Uses javax.cache.* APIs
o Also, a low-level API: com.google.appengine.api.memcache.*
• Basic uses:
o Speed up existing common datastore queries
o Session data, user preferences
• Cache data is retained as long as possible if no expiration is set
• Data is not stored on any persistent storage, so you must be sure your app can handle a "cache miss"
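Handling a "cache miss" usually means a cache-aside lookup: try the cache, fall back to the datastore, then repopulate the cache. The following is a minimal sketch of that pattern only. The class name FeedCache, the loader function, and the read counter are hypothetical, and a ConcurrentHashMap stands in for Memcache so that the code runs without the App Engine SDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Cache-aside lookup. The map below is only a stand-in for Memcache
 * (an assumption made so the sketch is self-contained).
 */
final class FeedCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> datastoreLoader; // hypothetical slow path
    private int datastoreReads = 0; // demo counter only, not thread-safe

    FeedCache(Function<String, String> datastoreLoader) {
        this.datastoreLoader = datastoreLoader;
    }

    /** Returns the cached value, falling back to the datastore on a miss. */
    String get(String key) {
        String value = cache.get(key);
        if (value == null) { // miss: never stored, expired, or evicted
            value = datastoreLoader.apply(key);
            datastoreReads++;
            if (value != null) {
                cache.put(key, value); // repopulate for the next reader
            }
        }
        return value;
    }

    /** Simulates the cache dropping an entry, which it may do at any time. */
    void evict(String key) {
        cache.remove(key);
    }

    int datastoreReads() {
        return datastoreReads;
    }
}
```

The caller never needs to know whether a value came from the cache or the datastore, so an eviction only costs one extra datastore read.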
Lesson 1: Memcache conclusions
• Works really well!
o Responsive requests
o 2 s. => ~800 ms. response time (60% decrease)
• Cache data is generally retained for a very long time
• Distributed nature of cache provides benefits to every user on the system.
o The more people who use your app, the better your app performs
• Even free quota for Memcache is quite generous:
o ~ 8.6 million API calls.
Lesson 2: Message Delivery Fanout
• Adapted from Building Scalable, Complex Apps... from Google I/O by Brett Slatkin
o http://code.google.com/events/io/sessions/BuildingScalableComplexApps.html
• Basically deals with a problem of fan-out
o Socialwok has a concept of "following" (which is basically a subscription between users)
o In our case, one user posts a single message that needs to be "delivered" to all his subscribers
o How do we show the message efficiently to all his subscribers?
  We can deliver the message by reference to its recipients.
Lesson 2 (cont.): RDBMS version
2 Primary Tables
    User ID   Name
    1         Navin
    2         John
    3         Vikram

    Message ID   Message           User ID
    1            Hello world       1
    2            Another message   3

2 Join Tables
    Follower ID   Following ID
    1             2
    1             3
    2             1

    Recipient ID   Message ID
    1              34
    1              67

• To get Messages to display for the current user:
    SELECT * FROM Messages
    INNER JOIN UserMessages USING (message_id)
    WHERE UserMessages.user_id = 'current_user_id'
• But there aren't any joins on AppEngine!
Lesson 2: List Properties to the Rescue
• A list property is a property in the datastore that has multiple values:
    @Persistent private Collection<String> values;
o Represented in Java using Collection fields (Set, List, etc.)
o Indexed in the same way that normal fields are
    values index:
    key=1,values=1
    key=2,values=2
    key=2,values=1
o Densely pack information
o Query like you query any single-valued property:
    query.setFilter("values == 2");
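To make the index picture concrete, here is a toy in-memory model of a list-property index, in which every element of the list gets its own index row and an equality filter matches when any element equals the value. This is only a sketch of the idea, not how the datastore is implemented; the class ListPropertyIndex and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of a list-property index: one row per (value, entity key). */
final class ListPropertyIndex {
    // value -> keys of the entities whose list contains that value
    private final Map<String, Set<Long>> rows = new TreeMap<>();

    /** Indexes an entity: each element of its list becomes an index row. */
    void put(long entityKey, Collection<String> values) {
        for (String v : values) {
            rows.computeIfAbsent(v, k -> new LinkedHashSet<>()).add(entityKey);
        }
    }

    /** Models the filter "values == v": true if ANY element equals v. */
    List<Long> whereEquals(String v) {
        return new ArrayList<>(rows.getOrDefault(v, Collections.<Long>emptySet()));
    }
}
```

With entity 1 holding the values [1] and entity 2 holding [2, 1], the model contains exactly the three rows shown in the index above, and a lookup for the value 1 returns both entities.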
Lesson 2: Our new data definition
• Now we can define a collection field to store the list of recipients
    public class Message {
        @Persistent private String msg;
        @Persistent private List<String> recipients;
        ...
    }
• Query on the collection field:
    Query query = pm.newQuery(Message.class);
    query.setFilter("recipients == recptParam");
    query.declareParameters("String recptParam"); // declare the filter parameter
    List<Message> msgs = (List<Message>) query.execute(currentUserId);
• But there is one issue with this:
o Serialization overhead when fetching the messages
o We don't really care about the contents of this field when displaying the messages
o So we will take advantage of another trick
Lesson 3: Keys-only Queries and AppEngine Key Structure
• We can perform queries whose return values are restricted to the keys of the entity
o Currently only supported in low-level datastore API
• AppEngine keys are structured in a very special way
o Stored in protocol buffers
o Consist of an app ID and a series of type-id_or_name pairs
  Each pair is an entity type name and an autogenerated integer ID or user-provided name
o Root entities have exactly one of these pairs; child entities have one for each parent and their own
• Presents a unique ability to retrieve a parent entity's key from the child entity's key
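The reason a parent key can be recovered without touching the datastore is that the child's key contains the parent's whole path. The sketch below models that structure with plain Java; KeyPath is a hypothetical stand-in for the real com.google.appengine.api.datastore.Key, and the app ID and entity IDs used with it are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in for a datastore key: app ID plus kind/id pairs. */
final class KeyPath {
    final String appId;
    final List<String[]> path; // each element is {kind, idOrName}

    private KeyPath(String appId, List<String[]> path) {
        this.appId = appId;
        this.path = Collections.unmodifiableList(path);
    }

    /** Root entity: exactly one kind/id pair. */
    static KeyPath root(String appId, String kind, String idOrName) {
        List<String[]> p = new ArrayList<>();
        p.add(new String[] {kind, idOrName});
        return new KeyPath(appId, p);
    }

    /** Child key: the parent's whole path followed by the child's own pair. */
    KeyPath child(String kind, String idOrName) {
        List<String[]> p = new ArrayList<>(path);
        p.add(new String[] {kind, idOrName});
        return new KeyPath(appId, p);
    }

    /** Parent key derived purely from this key; null for a root entity. */
    KeyPath parent() {
        if (path.size() == 1) {
            return null;
        }
        return new KeyPath(appId, new ArrayList<>(path.subList(0, path.size() - 1)));
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(appId).append(':');
        for (String[] pair : path) {
            sb.append('/').append(pair[0]).append('(').append(pair[1]).append(')');
        }
        return sb.toString();
    }
}
```

Dropping the last pair of a child's path yields the parent's key, which is what makes a keys-only query on child entities enough to locate their parents.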
Lesson 3: A solution to our Serialization Problem
• Now we can store the irrelevant recipients in a child entity
• Here's the process:
1. Define a child entity with the recipients field
2. Store the recipients of the message in the child entity
3. Create a keys-only query on the child entity that filters on the recipients field.
4. Get a list of parent keys from the list of child keys
5. Bulk-fetch the parents from the datastore
Lesson 3 (contd.): Solution (Data Def.)
    public class MessageRecipients {
        @PrimaryKey private Key id;
        @Persistent private List<String> recipients;
        @Persistent private Date date;
        @Persistent(mappedBy="msgRecpt") private Message msg;
        ...
    }
    public class Message {
        ...
        @Persistent private Date date;
        @Persistent private String msg;
        @Persistent private MessageRecipients msgRecpt;
        ...
    }
Lesson 3 (contd): Solution (Querying)
    import com.google.appengine.api.datastore.*; // low-level API (Query, Entity, Key, ...)
    import java.util.*;

    DatastoreService dataSvc = ...;
    Query query = new Query("MessageRecipients")
        .addFilter("recipients", FilterOperator.EQUAL, userid)
        .addSort("date", SortDirection.DESCENDING)
        .setKeysOnly(); // <-- Only fetch keys!
    // asList needs explicit FetchOptions
    List<Entity> msgRecpts = dataSvc.prepare(query).asList(FetchOptions.Builder.withDefaults());
    List<Key> parents = new ArrayList<Key>();
    for (Entity recep : msgRecpts) {
        parents.add(recep.getParent()); // parent key comes from the child key itself
    }
    // Bulk fetch parents using key list
    Map<Key,Entity> msgs = dataSvc.get(parents);
Cool Trick: Lite Full Text Search
• Most web applications nowadays need some form of full-text search
• Well, we are on Google AppEngine, aren't we!
• Google actually did release a basic searchable model implementation
o Limited to Python (google.appengine.ext.search)
o More info: http://www.billkatz.com/2008/8/A-SearchableModel-for-App-Engine
o Proper full-text search is in the AppEngine roadmap
• Some of our earlier lessons do apply here.
How do we build it
• First, it helps to understand how a basic full-text search index works
o First, break up the text into terms using lexical analysis
o Then store the terms in a lookup table based on the key of the message
  With List fields, Google AppEngine gives us this one.
o We build queries using the same tricks.
• We also apply the same tricks using child entities and keys-only queries to optimize for the serialization overhead.
Live example
• I have deployed a modified version of the Google AppEngine guestbook example:
o http://searchguestbook.appspot.com
• If anyone wants to "sign" it right now, please go ahead.
• We will now search the data
o Limited to 1-2 word queries
How it works.
• Applies lessons from list fields and keys-only queries
    @Persistent Set<String> searchTerms;
• Our "lexical analysis": Java regular expression
    String[] tokens = content.toLowerCase().split("[^\\w]+");
o Can use a full-text search library like Lucene to improve this part
• Another cool feature of list properties: merge-join
o Think about organizing your data in a Venn-diagram fashion and finding the intersection of your data.
o Watch your indexes!
• Can improve this implementation by using Memcache to cache common search queries.
• Code will be made available after the talk, so you can take a good look for yourself!
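Until that code is at hand, the idea can be tried out with the rough in-memory sketch below. It reuses the regular-expression tokenizer from the slide, stores the terms per entry the way a searchTerms list field would, and answers a multi-word query by keeping only entries that contain every term (the "intersection" in the Venn-diagram picture). The class LiteSearch and its methods are hypothetical, and the linear scan is a simplification: on App Engine the datastore would answer the equality filters from its indexes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory sketch of the "lite" full-text search idea. */
final class LiteSearch {
    // entry ID -> search terms, standing in for the searchTerms list field
    private final Map<Long, Set<String>> searchTermsByEntry = new LinkedHashMap<>();

    /** Tokenizer from the slide, with empty tokens filtered out. */
    static Set<String> tokenize(String content) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : content.toLowerCase().split("[^\\w]+")) {
            if (!token.isEmpty()) { // split yields "" when text starts with punctuation
                terms.add(token);
            }
        }
        return terms;
    }

    /** Stores the term set for one guestbook entry. */
    void index(long entryId, String content) {
        searchTermsByEntry.put(entryId, tokenize(content));
    }

    /** Returns entries whose term set contains every term of the query. */
    List<Long> search(String queryText) {
        Set<String> wanted = tokenize(queryText);
        List<Long> hits = new ArrayList<>();
        if (wanted.isEmpty()) {
            return hits; // nothing to search for
        }
        for (Map.Entry<Long, Set<String>> e : searchTermsByEntry.entrySet()) {
            if (e.getValue().containsAll(wanted)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

Storing the terms as a Set also deduplicates them, which keeps the number of index rows per entry down.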
Conclusions
• Google AppEngine for Java provides a standardized way to build applications for Google AppEngine
• In building Socialwok, we have learned several lessons that apply when building a scalable application on Google App Engine
• Get the Searchable Guestbook code here:
o http://searchguestbook.appspot.com/searchguestbook.tar.gz
• In short, Google AppEngine development has never been easier and more interesting!
• Get started by visiting: http://code.google.com/appengine
Q & A