Allura - an Open Source MongoDB Based Document Oriented SourceForge

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeat Geeknet, page 1 Allura – an Open Source MongoDB Based Document Oriented SourceForge Rick Copeland @rick446 [email protected]

description

MongoSF 2011 talk on Allura, the new platform for SourceForge that we released under an Apache license

Transcript of Allura - an Open Source MongoDB Based Document Oriented SourceForge

Page 1: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Allura – an Open Source MongoDB Based Document Oriented SourceForge

Rick Copeland

@rick446

[email protected]

Page 2: Allura - an Open Source MongoDB Based Document Oriented SourceForge

I am not Mark Ramm (sorry)

Page 3: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Allura (SF.net “beta” devtools)

Rewrite developer tools with new architecture

Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come

Single MongoDB replica set

Release early & often

Page 4: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Allura Scaling

SourceForge.net currently handles ~4M pageviews per day

Allura will eventually handle 10% (with lots of writes)

“Consume” currently handles 3M+ pageviews/day on one shard (read-mostly)

Allura can handle ~48k pageviews / day / shard

Add shards & optimize queries as we migrate projects to sf.net

Most data is project-specific; sharding by project is straightforward
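A back-of-the-envelope shard count from the figures above. This is only a sketch: it assumes the 10% refers to SourceForge.net's daily pageviews and that load spreads evenly across shards, neither of which the slide states.

```python
import math

# Figures quoted on the slide.
SF_PAGEVIEWS_PER_DAY = 4_000_000      # ~4M pageviews/day on SourceForge.net
ALLURA_SHARE_PERCENT = 10             # assumed to be a share of those pageviews
PAGEVIEWS_PER_SHARD_PER_DAY = 48_000  # ~48k pageviews/day/shard


def shards_needed(total_per_day, share_percent, per_shard_per_day):
    """Round up the number of shards needed to carry the target load."""
    target = total_per_day * share_percent // 100
    return math.ceil(target / per_shard_per_day)


estimated_shards = shards_needed(
    SF_PAGEVIEWS_PER_DAY, ALLURA_SHARE_PERCENT, PAGEVIEWS_PER_SHARD_PER_DAY)
print(estimated_shards)
```

Query optimization would raise the per-shard figure and lower this estimate, which is presumably why the slide pairs "add shards" with "optimize queries".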

Page 5: Allura - an Open Source MongoDB Based Document Oriented SourceForge

System Architecture

Web-facing App Server

Task Daemon

SMTP Server

FUSE Filesystem (repository hosting)

Page 6: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Ming – an “Object-Document Mapper?”

Your data has a schema

• Your database can define and enforce it

• It can live in your application (as with MongoDB)

• Nice to have the schema defined in one place in the code

Sometimes you need a “migration”

• Changing the structure/meaning of fields

• Adding indexes, particularly unique indexes

• Sometimes lazy, sometimes eager

“Unit of work”: queuing up all your updates can be handy

Python dicts are nice; objects are nicer

Page 7: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Ming Concepts

Inspired by SQLAlchemy

Group of collection objects with schemas defined

Group of classes to which you map your collections

Use collection-level operations for performance

Use class-level operations for abstraction

Convenience methods for loading/saving objects and ensuring indexes are created

Migrations

Unit of Work – great for web applications

MIM – “Mongo in Memory” nice for unit tests
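The unit-of-work idea can be illustrated with a toy in plain Python. This is not Ming's session API; the `UnitOfWork` class and the dict standing in for a collection are hypothetical and only show the "queue updates, then flush once" pattern.

```python
class UnitOfWork:
    """Toy unit of work: collect changed documents, write them in one flush."""

    def __init__(self, store):
        self.store = store   # dict standing in for a collection, keyed by _id
        self.pending = {}    # _id -> document waiting to be written

    def save(self, doc):
        # Queue the write; repeated saves of one document collapse to one entry.
        self.pending[doc['_id']] = doc

    def flush(self):
        # Apply every queued write, then empty the queue.
        written = len(self.pending)
        for _id, doc in self.pending.items():
            self.store[_id] = dict(doc)
        self.pending.clear()
        return written


store = {}
uow = UnitOfWork(store)
page = {'_id': 1, 'title': 'Home', 'text': ''}
uow.save(page)
page['text'] = 'Hello'
uow.save(page)                      # same document, still one pending write
stored_before_flush = dict(store)   # nothing has reached the store yet
written = uow.flush()               # e.g. once at the end of a web request
```

In a web application the flush would typically happen once per request, so a handler that touches the same object several times still produces a single write.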

Page 8: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Ming Example

    from ming import schema, collection, Field
    from ming.orm import (mapper, Mapper, RelationProperty,
                          ForeignIdProperty)

    # Collection-level schemas; `session` is a Ming session defined elsewhere.
    WikiDoc = collection(
        'wiki_page', session,
        Field('_id', schema.ObjectId()),
        Field('title', str, index=True),
        Field('text', str))
    CommentDoc = collection(
        'comment', session,
        Field('_id', schema.ObjectId()),
        Field('page_id', schema.ObjectId(), index=True),
        Field('text', str))

    class WikiPage(object): pass
    class Comment(object): pass

    # Map plain classes onto the collections; `ormsession` is defined elsewhere.
    ormsession.mapper(WikiPage, WikiDoc, properties=dict(
        comments=RelationProperty('Comment')))
    ormsession.mapper(Comment, CommentDoc, properties=dict(
        page_id=ForeignIdProperty('WikiPage'),
        page=RelationProperty('WikiPage')))

    Mapper.compile_all()

Page 9: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Allura Artifacts

Artifacts include tickets, wiki pages, discussions, comments, merge requests, etc.

On artifact change, a session extension:

• Queues a Solr index operation (for full text search support)

• Scans the artifact text for references to other artifacts

• Updates statistics on objects created/modified/deleted

[Diagram: Artifact, VersionedArtifact, Snapshot, Message]

Page 10: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Allura Threaded Discussions

    MessageDoc = collection(
        'message', project_doc_session,
        Field('_id', str, if_missing=h.gen_message_id),
        Field('slug', str, if_missing=h.nonce),
        Field('full_slug', str),
        Field('parent_id', str),
        …)

_id – use an email Message-ID compatible key

slug – threaded path of random 4-digit hex numbers prefixed by parent (e.g. dead, dead/beef, dead/beef/f00d)

full_slug – slug interspersed with ISO-formatted message datetime

Easy queries for hierarchical data

Find all descendants of a message – slug prefix search “dead/.*”

Sort messages by thread, then by date – full_slug sort
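The slug scheme can be tried out in plain Python. This is a sketch, not Allura's code: the helper names are hypothetical, and the exact full_slug layout (timestamp, colon, slug part) is an assumption, since the slide only says the slug is interspersed with ISO-formatted datetimes.

```python
import re
import secrets
from datetime import datetime


def nonce():
    """Random 4-digit hex string such as 'beef'."""
    return secrets.token_hex(2)


def make_slugs(parent, timestamp, part=None):
    """Return (slug, full_slug) for a new message.

    parent is None for a top-level message, otherwise a dict holding the
    parent's 'slug' and 'full_slug'.
    """
    part = part or nonce()
    # Fixed-width ISO timestamp keeps string order equal to time order.
    stamped = timestamp.isoformat(timespec='seconds') + ':' + part
    if parent is None:
        return part, stamped
    return parent['slug'] + '/' + part, parent['full_slug'] + '/' + stamped


def descendants(messages, slug):
    """Prefix search on slug; MongoDB could do this with an anchored regex."""
    pattern = re.compile('^' + re.escape(slug) + '/')
    return [m for m in messages if pattern.match(m['slug'])]


def add(messages, parent, timestamp, part):
    """Append a message with fixed slug part (fixed only to keep the demo stable)."""
    slug, full_slug = make_slugs(parent, timestamp, part)
    msg = {'slug': slug, 'full_slug': full_slug}
    messages.append(msg)
    return msg


messages = []
dead = add(messages, None, datetime(2011, 5, 24, 10, 0), 'dead')
cafe = add(messages, None, datetime(2011, 5, 24, 10, 1), 'cafe')
beef = add(messages, dead, datetime(2011, 5, 24, 10, 5), 'beef')
f00d = add(messages, beef, datetime(2011, 5, 24, 10, 9), 'f00d')

# One string sort yields thread order: each reply follows its parent.
threaded = [m['slug'] for m in sorted(messages, key=lambda m: m['full_slug'])]
under_dead = [m['slug'] for m in descendants(messages, 'dead')]
```

Because a reply's full_slug starts with its parent's full_slug, the reply sorts directly after the parent and before any later top-level message, even though it was posted afterwards.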

Page 11: Allura - an Open Source MongoDB Based Document Oriented SourceForge

MonQ: Async Queueing in MongoDB

    states = ('ready', 'busy', 'error', 'complete')
    result_types = ('keep', 'forget')

    MonQTaskDoc = collection(
        'monq_task', main_doc_session,
        Field('_id', schema.ObjectId()),
        Field('state', schema.OneOf(*states)),
        Field('result_type', schema.OneOf(*result_types)),
        Field('time_queue', datetime),
        Field('time_start', datetime),
        Field('time_stop', datetime),
        # dotted path to function
        Field('task_name', str),
        Field('process', str),  # worker process name: "locks" the task
        Field('context', dict(
            project_id=schema.ObjectId(),
            app_config_id=schema.ObjectId(),
            user_id=schema.ObjectId())),
        Field('args', list),
        Field('kwargs', {None: None}),
        Field('result', None, if_missing=None))
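How a worker might "lock" a task through the state and process fields can be sketched in memory. This is an illustration, not MonQ's implementation: `claim_next_task` and `finish_task` are hypothetical names, a list of dicts stands in for the monq_task collection, and oldest-first ordering is an assumption.

```python
from datetime import datetime, timezone


def claim_next_task(tasks, process_name):
    """Move the oldest 'ready' task to 'busy' and tag it with the worker name.

    Against MongoDB this would have to be one atomic find-and-modify so two
    workers cannot claim the same task; this in-memory loop is not safe for
    concurrent workers.
    """
    ready = [t for t in tasks if t['state'] == 'ready']
    if not ready:
        return None
    task = min(ready, key=lambda t: t['time_queue'])
    task['state'] = 'busy'
    task['process'] = process_name
    task['time_start'] = datetime.now(timezone.utc)
    return task


def finish_task(task, error=False):
    """Record the terminal state once the task function has run."""
    task['state'] = 'error' if error else 'complete'
    task['time_stop'] = datetime.now(timezone.utc)


tasks = [
    {'_id': 1, 'state': 'ready', 'process': None,
     'time_queue': datetime(2011, 5, 24, 10, 5)},
    {'_id': 2, 'state': 'ready', 'process': None,
     'time_queue': datetime(2011, 5, 24, 10, 0)},
]
first = claim_next_task(tasks, 'worker-a')
second = claim_next_task(tasks, 'worker-b')
third = claim_next_task(tasks, 'worker-c')   # nothing left to claim
finish_task(first)
```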

Page 12: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Repository Cache Objects

On commit to a repo (Hg, SVN, or Git)

• Build commit graph in MongoDB for new commits

• Build auxiliary structures

• tree structure, including all trees in a commit & last commit to modify

• linear commit runs (useful for generating history)

• commit difference summary (must be computed in Hg and Git)

• Note references to other artifacts and commits

Repo browser uses cached structure to serve pages

[Diagram: Commit, Tree, Trees, CommitRun, LastCommit, DiffInfo]

Page 13: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Repository Cache Lessons Learned

Using MongoDB to represent graph structures (commit graph, commit trees) requires careful query planning. Pointer-chasing is no fun!

Sometimes Ming validation and ORM overhead can be prohibitively expensive – time to drop down a layer.

Benchmarking and profiling are your friends, as are queries like {'_id': {'$in': […]}} for returning multiple objects
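One way to avoid pointer-chasing is to collect the ids first and fetch them in a few batched queries. The helper below is a hypothetical sketch that only builds the query documents; the batch size is an arbitrary choice, not a figure from the talk.

```python
def batched_id_queries(ids, batch_size=100):
    """Yield MongoDB-style query documents that fetch ids in batches via $in."""
    ids = list(ids)
    for start in range(0, len(ids), batch_size):
        yield {'_id': {'$in': ids[start:start + batch_size]}}


# Five ids become three queries instead of five single-document lookups.
queries = list(batched_id_queries(range(5), batch_size=2))
```

Each yielded document would be passed to a collection's find call, so loading the parents of N commits costs a handful of round trips rather than N.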

Page 14: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Authorization: ProjectRole Objects

    ProjectRoleDoc = collection(
        'project_role', main_doc_session,
        Field('_id', schema.ObjectId()),
        Field('user_id', schema.ObjectId(), index=True),
        Field('project_id', schema.ObjectId(), index=True),
        Field('name', str),
        Field('roles', [schema.ObjectId()]),
        Index('user_id', 'project_id', 'name', unique=True))

    class ProjectRole(object): pass

    main_orm_session.mapper(ProjectRole, ProjectRoleDoc, properties=dict(
        user_id=ForeignIdProperty('User'),
        project_id=ForeignIdProperty('Project'),
        user=RelationProperty('User'),
        project=RelationProperty('Project')))

Page 15: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Authorization: ProjectRole Objects

Roles can be named roles (“Groups”) or user proxies. Roles inherit all permissions of the roles they can “act as”

User membership in a group is stored on the user proxy object (the list of roles for which the user has permission)

Authorization checks all roles transitively for a user. If any role has the required permission, access is granted.

Hierarchical role structures are supported, but not exposed in the UI.
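The transitive check described above amounts to a graph walk over the roles lists. The sketch below is an assumption-laden illustration: role documents are plain dicts keyed by string ids, and the `acl` mapping from permission name to granted role ids is a hypothetical layout, not Allura's actual ACL storage.

```python
def reachable_roles(roles, start_id):
    """Return start_id plus every role it can transitively "act as".

    roles maps a role _id to its document; each document has a 'roles' list
    like the ProjectRoleDoc field of the same name.
    """
    seen = set()
    stack = [start_id]
    while stack:
        role_id = stack.pop()
        if role_id in seen:
            continue  # already visited; also guards against cycles
        seen.add(role_id)
        stack.extend(roles[role_id]['roles'])
    return seen


def has_access(roles, user_role_id, permission, acl):
    """Grant access if any reachable role holds the permission."""
    return bool(reachable_roles(roles, user_role_id) & acl.get(permission, set()))


roles = {
    'user-proxy': {'name': None, 'roles': ['developer']},   # one user's proxy
    'developer': {'name': 'Developer', 'roles': ['member']},
    'member': {'name': 'Member', 'roles': []},
    'admin': {'name': 'Admin', 'roles': ['developer']},
}
acl = {'read': {'member'}, 'configure': {'admin'}}

can_read = has_access(roles, 'user-proxy', 'read', acl)
can_configure = has_access(roles, 'user-proxy', 'configure', acl)
```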

Page 16: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Flyway Migrations

Ming supports “lazy migrations” from one schema version to another automatically

Sometimes you want to explicitly version your DB

Flyway allows you to define various versions of your schema with pre- and post-conditions for running an “up” migration and a “down” migration

With multiple tools with interdependencies and a platform under it all, we thought we needed it

We didn’t, but it’s there and it works….
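The "lazy migration" idea mentioned on this slide can be sketched in plain Python: documents are upgraded as they are read rather than in one explicit pass. This is neither Ming's nor Flyway's API; the version field, the `load` function, and the example schema change are all hypothetical.

```python
CURRENT_VERSION = 2


def v1_to_v2(doc):
    """Hypothetical change: version 2 adds a 'labels' list to every page."""
    upgraded = dict(doc)
    upgraded.setdefault('labels', [])
    upgraded['version'] = 2
    return upgraded


MIGRATIONS = {1: v1_to_v2}  # from-version -> upgrade function


def load(doc):
    """Upgrade a stored document to CURRENT_VERSION as it is read."""
    version = doc.get('version', 1)  # unversioned documents count as v1
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc['version']
    return doc


old = {'_id': 1, 'title': 'Home', 'text': 'Welcome'}
new = load(old)
```

An eager migration would run the same upgrade functions over the whole collection at once; the lazy form spreads that cost across normal reads.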

Page 17: Allura - an Open Source MongoDB Based Document Oriented SourceForge

What We Liked

Performance, performance, performance

• Easily handle 90% of SF.net traffic from 1 DB server, 4 web servers

Schemaless server allows fast schema evolution in development, making many migrations unnecessary

Replication is easy, making scalability and backups easy

• Keep a “backup slave” running

• Kill backup slave, copy off database, bring back up the slave

• Automatic re-sync with master

Query Language

• You mean I can have performance without map-reduce?

GridFS

Page 18: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Pitfalls

Too-large documents

• Store less per document

• Return only a few fields

Ignoring indexing

• Watch your server log; bad queries show up there

Too much denormalization

• Try to use an index if all you need is a backref

Ignoring your data’s schema

Using many databases when one will do

Using too many queries

Page 19: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Open Source

Ming

http://sf.net/projects/merciless/

MIT License

Allura

http://sf.net/p/allura/

Apache License

Page 20: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Future Work

mongos

New Allura Tools

Migrating legacy SF.net projects to Allura

Stats all in MongoDB rather than Hadoop?

Better APIs to access your project data

Page 21: Allura - an Open Source MongoDB Based Document Oriented SourceForge

Rick Copeland

@rick446

[email protected]