File System On Steroids

Presentation at ApacheCon EU 2008 in Amsterdam

Transcript of File System On Steroids

Page 1: File System On Steroids

File system on steroids: an introduction to JCR

Jukka Zitting

Apache Jackrabbit

Page 2: File System On Steroids

Agenda

• Big Picture

• Content Repository

• Repository Features

• Apache Jackrabbit

Page 3: File System On Steroids

The Big Picture

User Interface

Processing

Storage

Page 4: File System On Steroids

Our Focus: Storage

Main requirements

• Persistence

• Consistency

• Scalability

• Performance

Main alternatives

• File system

• Database

• Network

Page 5: File System On Steroids

Introducing The Content Repository

[Diagram: File system, Database, Content Repository]

Features labeled in the diagram:

• read, write

• transactions

• structured, integrity

• query

• hierarchical, streams

• access control

• locking

• observation, versioning

• full text

• unstructured

Page 6: File System On Steroids

JCR, JSR 170, JSR 283

• Content Repository for Java Technology API
– Not just the Java API, but also the content repository semantics
– POSIX file system defined as a C API

• Accessible from other environments
– JVM: Groovy, JRuby, Scala, etc.
– Network: WebDAV, Ajax (JSON)
– Ports planned: .NET, PHP

Page 7: File System On Steroids

Why Something New?

• Goal: Single API for all storage
– Universal access
– No content silos

• Existing systems don't cover all needs
– Reiser: “Storage layers above the FS: A sure symptom the FS developer has failed”

• Solution: Content repository

Page 8: File System On Steroids

Content Repository Semantics

• Everything is content
– Hierarchy of named and typed nodes
– Content in named and typed properties

• Superset of file system semantics
– Can be used to store files and folders, and more
– Can be mounted as a file system

• With many database semantics
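
The "named and typed nodes" with "named and typed properties" can be pictured with a small in-memory model. This is only a sketch to make the shape of the data concrete: the class and method names (ContentTreeSketch, Node, addNode, getNode, sampleTree) and the type labels ("folder", "file") are hypothetical and are not the JCR API, and the Java runtime class of each value stands in for a property type.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a content tree. Illustration only: this is NOT the JCR API.
final class ContentTreeSketch {

    /** A node has a name, a type label, named properties and named children. */
    static final class Node {
        final String name;
        final String type; // hypothetical type label, e.g. "folder" or "file"
        final Map<String, Object> properties = new LinkedHashMap<>();
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name, String type) {
            this.name = name;
            this.type = type;
        }

        /** Adds a child node and returns it, so calls can be chained. */
        Node addNode(String childName, String childType) {
            Node child = new Node(childName, childType);
            children.put(childName, child);
            return child;
        }

        /** Resolves a slash-separated relative path; returns null if absent. */
        Node getNode(String relPath) {
            Node current = this;
            for (String segment : relPath.split("/")) {
                if (segment.isEmpty()) {
                    continue; // tolerate leading or doubled slashes
                }
                current = current.children.get(segment);
                if (current == null) {
                    return null;
                }
            }
            return current;
        }
    }

    /** Builds a small tree: docs/readme.txt with scalar and binary properties. */
    static Node sampleTree() {
        Node root = new Node("", "root");
        Node file = root.addNode("docs", "folder").addNode("readme.txt", "file");
        file.properties.put("mimeType", "text/plain"); // string value
        file.properties.put("size", 11L);              // long value
        file.properties.put("data",
                "hello world".getBytes(StandardCharsets.UTF_8)); // binary value
        return root;
    }

    public static void main(String[] args) {
        Node file = sampleTree().getNode("docs/readme.txt");
        System.out.println(file.type + " " + file.properties.keySet());
    }
}
```

Files and folders fit this model as one particular pair of node types, which is the sense in which the semantics are a superset of a file system.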

Page 9: File System On Steroids

Granularity of Content

Page 10: File System On Steroids

Granularity of Content, 1/2

• File systems are typically best with coarse grained content
– Small files in ReiserFS, NTFS, etc.
– Extended properties in many systems

• XML & co for fine grained content
– DJB: “Don't parse”

Page 11: File System On Steroids

Granularity of Content, 2/2

• Databases are best with fine grained content
– Blobs are becoming better supported
– Often special limitations for search, access, etc.

• Content repository: Uniform interface for both stream and scalar properties

Page 12: File System On Steroids

Structure vs. Flexibility

Page 13: File System On Steroids

Structure vs. Flexibility

• File systems have no constraints
– Any file or directory can go anywhere
– Naming conventions and access control

• Databases have nothing but constraints
– Structure of content is predefined

• Content repository: Both structured and unstructured content

Page 14: File System On Steroids

Search

Page 15: File System On Steroids

Search

• Traditionally no search in file systems

• Custom indexers and search APIs
– Google Desktop Search
– Mac OS X Spotlight
– Lucene in many applications

• Content repository: Built-in search with full text indexing
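
To make "full text indexing" concrete, here is a minimal sketch of an inverted index, the basic structure behind indexers such as Lucene. It is a toy under stated assumptions: the class FullTextIndexSketch and its methods are hypothetical names, tokenization is a simple split on non-alphanumeric characters, and a query matches only paths that contain every query term. It does not show how any particular repository implements search.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: term -> set of paths whose text contains the term.
final class FullTextIndexSketch {
    private static final String NON_WORD = "[^\\p{L}\\p{N}]+";
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Tokenizes the text and records the path under each lower-cased term. */
    void index(String path, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split(NON_WORD)) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(path);
            }
        }
    }

    /** Returns the paths that contain ALL terms of the query (AND semantics). */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split(NON_WORD)) {
            if (term.isEmpty()) {
                continue;
            }
            Set<String> hits = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits); // first term seeds the result
            } else {
                result.retainAll(hits);       // later terms narrow it
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

With the index built into the store, content is searchable as soon as it is saved, instead of each application maintaining its own indexer beside the file system.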

Page 16: File System On Steroids

Transactions

Page 17: File System On Steroids

Transactions

• File systems have limited support for atomic updates
– The copy-and-move trick

• No transactions that cover multiple changes
– Journaling is internal to the system

• Content repository: Change sets, distributed transactions
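
The copy-and-move trick mentioned above can be sketched with the standard java.nio.file API: write the new content to a temporary file in the same directory, then move it over the target in one step. The class and method names (AtomicReplaceSketch, replaceAtomically) are hypothetical. Note the assumptions: whether ATOMIC_MOVE replaces an existing target is implementation specific (it does on typical POSIX file systems), and the move can fail with AtomicMoveNotSupportedException where atomic moves are unavailable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of the "copy-and-move trick" for replacing one file atomically.
final class AtomicReplaceSketch {

    /** Replaces the target's content so readers see the old or the new file, never a partial write. */
    static void replaceAtomically(Path target, String newContent) throws IOException {
        // The temp file lives in the target's directory so the move stays
        // within one file store, which atomic moves generally require.
        Path dir = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "update-", ".tmp");
        try {
            Files.write(temp, newContent.getBytes(StandardCharsets.UTF_8));
            // Assumption: the platform replaces an existing target on atomic move.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up if write or move failed.
            Files.deleteIfExists(temp);
        }
    }
}
```

The trick covers exactly one file. Updating two files together still leaves a window in which only one has changed, which is the gap that change sets and transactions are meant to close.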

Page 18: File System On Steroids

Versioning

Page 19: File System On Steroids

Versioning

• Typically no tracking of previous versions of content
– Snapshots in ZFS & co.
– Version control systems

• Backups for archival vs. restore purpose
– Mac OS X Time Machine

• Content repository: Built-in versioning

Page 20: File System On Steroids

Observation

Page 21: File System On Steroids

Observation

• File system change monitoring
– File Alteration Monitor
– Polling
– Event APIs

• Triggers in databases

• Content repository: Standard observation API
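
As an illustration of the polling approach listed above, the following sketch takes two snapshots of a directory and derives change events from the difference. The names (PollingMonitorSketch, snapshot, diff) and the event strings are hypothetical, and the size-plus-modification-time fingerprint is an assumption that can miss a same-size edit made within the timestamp granularity. It is not the JCR observation API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of change monitoring by polling: compare two directory snapshots.
final class PollingMonitorSketch {

    /** Maps each entry name in dir to a "size@lastModifiedMillis" fingerprint. */
    static Map<String, String> snapshot(Path dir) throws IOException {
        Map<String, String> state = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // An entry deleted between listing and stat would throw here;
                // a real monitor would have to tolerate that race.
                String fingerprint = Files.size(entry) + "@"
                        + Files.getLastModifiedTime(entry).toMillis();
                state.put(entry.getFileName().toString(), fingerprint);
            }
        }
        return state;
    }

    /** Derives ADDED, CHANGED and REMOVED events from two snapshots. */
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        List<String> events = new ArrayList<>();
        for (Map.Entry<String, String> e : after.entrySet()) {
            String old = before.get(e.getKey());
            if (old == null) {
                events.add("ADDED " + e.getKey());
            } else if (!old.equals(e.getValue())) {
                events.add("CHANGED " + e.getKey());
            }
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) {
                events.add("REMOVED " + name);
            }
        }
        return events;
    }
}
```

A caller would take a snapshot on a timer and diff it against the previous one. The cost grows with directory size and changes are only seen at the next poll, which is why event-based APIs are attractive.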

Page 22: File System On Steroids

Apache Jackrabbit

Page 23: File System On Steroids

Apache Jackrabbit

• Fully featured JCR content repository

• Releases
– 1.0 in 2006
– 1.4 available since January 2008
– 1.5 (with explorer) planned for Q2
– 2.0 (with JCR 2.0) planned for 2008

• Focus on conformance and flexibility

Page 24: File System On Steroids

Image credits

Images from the morgueFile archive, used as licensed

– http://morguefile.com/archive/?display=96733, Infographe_Elle

– http://morguefile.com/archive/?display=81906, msxo

– http://morguefile.com/archive/?display=132988, imelenchon

– http://morguefile.com/archive/?display=95446, ronnieb

– http://morguefile.com/archive/?display=175657, seriousfun

– http://morguefile.com/archive/?display=135511, rollingroscoe

– http://morguefile.com/archive/?display=134540, cohdra

– http://morguefile.com/archive/?display=196920, penywise

– http://morguefile.com/archive/?display=48096, bluekdesign

– http://morguefile.com/archive/?display=128133, gracey

Page 25: File System On Steroids

Thank you!

Questions / Comments?