Why EDP Chose MongoDB

Artyom Diky, William Biesty, Mark Velez

description

The NYC Department of Health and Mental Hygiene (DOHMH) uses MongoDB for its internal document management system, DocSpace. This presentation outlines:
- the system
- how we came to adopt MongoDB
- migrating from a relational DB to a document-oriented one
- the advantages and disadvantages we've encountered and how we have managed them
- next steps with MongoDB

Transcript of Why EDP Chose MongoDB

Page 1: Why EDP Chose MongoDB

Why EDP Chose MongoDB

Artyom Diky

William Biesty

Mark Velez

Page 2: Why EDP Chose MongoDB

Agenda

• Who are we?

• Evolution of Document Management

• File system to relational DB

• Relational to document-oriented DB

• Paper to electronic

• Advantages and Challenges

• Questions?

Page 3: Why EDP Chose MongoDB

Who Are We?

• New York City Department of Health and Mental Hygiene

• Environmental Health Services (EHS)

• Environmental Disease Prevention (EDP)

• Lead Poisoning Prevention Program (LPPP)

• MIS Unit (we are here)

• We support many programs within EDP

• Who are our stakeholders?

• Inspectors

• Researchers

• Clinical Staff

• Lawyers (FOIL)

Page 4: Why EDP Chose MongoDB

Evolution of Document Management: Paper

• A lot of legal documents on paper

• Historic - from the '70s and up

• Current (ongoing)

• Problems with Paper

• Time and Labor Intensive

o Locate, Copy, Redact, Copy, Mail (Repeat…)

• Storage Space

• Disaster Recovery

Page 5: Why EDP Chose MongoDB

Evolution of Document Management: eFiles

• VB6

• Scanning utilities

• File-system based storage

• Millions of files

• Identifiers based on child ID

Page 6: Why EDP Chose MongoDB

Evolution of Document Management: eFiles Issues

• Technical

• VB6 phased out

• Outdated 3rd party tools changed API

• License expired

• Security

• Documents have been redacted permanently

• No access control to private information

• Scalability

• New document types

• New indexing (tagging) mechanisms for search

Page 7: Why EDP Chose MongoDB

Evolution of Document Management

• Need for better document management

• Paperless offices mandate

• Expand searchable attributes and document text

• Update technology

• Improved security

• HIPAA compliance

• Platform for future applications

Page 8: Why EDP Chose MongoDB

File System to Relational DB

• Challenges:

• 1M+ historical documents as image files

• Need for document metadata

• Various and evolving schemas

• Security

• Updates and migration

• Fail-safe storage

Page 9: Why EDP Chose MongoDB

Technologies

• We use Microsoft technologies

• SQL Server

• .NET

• We are a small team that develops and supports dozens of data collection apps (forms)

• Risk assessments

• Inspection Reports

• Research

• Case Management

Page 10: Why EDP Chose MongoDB

Example Documents

Fields shown: event_date, child ID, document_type, me_num

Page 11: Why EDP Chose MongoDB

File System to Relational DB

• MSSQL 2008

o Data storage with FileStream

o Metadata with Entity-Attribute-Value

sql_variant

o Data-driven application design

• Rich service-oriented API through WCF

• Search engine

• Added features

o Versioning

Change and revert
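The Entity-Attribute-Value layout mentioned above can be illustrated with a minimal sketch. It is not the DocSpace schema: the rows, values, and the `pivot` helper are hypothetical, and only the field names are taken from the example-documents slide. Each row holds one attribute of one document, with a value column that can hold any type (the role `sql_variant` plays in SQL Server), so the application has to pivot rows back into whole documents.

```python
from collections import defaultdict
from datetime import date

# Hypothetical EAV rows: (document_id, attribute, value).
# In SQL Server the value column would be a sql_variant;
# here it is simply any Python object. All values are made up.
eav_rows = [
    (1, "document_type", "type_a"),
    (1, "child_id", 1001),
    (1, "event_date", date(2012, 5, 14)),
    (2, "document_type", "type_b"),
    (2, "me_num", "ME-0001"),
]


def pivot(rows):
    """Group EAV rows by document id into one attribute dict per document."""
    documents = defaultdict(dict)
    for doc_id, attribute, value in rows:
        documents[doc_id][attribute] = value
    return dict(documents)


documents = pivot(eav_rows)
print(documents[2])
```

Note that the two documents carry different attributes without any table change, which is the flexibility EAV buys, at the cost of reassembling every document in application code.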

Page 12: Why EDP Chose MongoDB

DocSpace SQL Architecture

Page 13: Why EDP Chose MongoDB

Limitations of Relational Model

• Need faster development cycle

• Double effort for development and maintenance

• On application and database level

• Document definition (metadata) first, content later

• Changing schema

• Rigid document structure

o Not amenable to change

• No support for non-primitive values

Page 14: Why EDP Chose MongoDB

Effects on Development Cycle

• SQL: Waterfall-like approach

• Fully develop requirements before implementation

o Gotta get the schema right to avoid hassle

• Change discouraged

• MongoDB: Rapid Application Development

• Prototyping

• Change accommodated

Page 15: Why EDP Chose MongoDB

Document Management System Done Right

• Faster development cycles

• No translation of complex document structure into relational model

• Application driven schema

• Document content first, metadata later

• Flexible document structure driven by user requirements

• GridFS for large documents
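As a rough illustration of the GridFS bullet: GridFS stores a large file as a sequence of chunk documents, each carrying the owning file's id, a sequence number `n`, and a slice of the data, with a default chunk size of 255 KiB. The sketch below mimics that layout in plain Python. It does not call the MongoDB driver, and the helper names and the `scan-001` id are hypothetical.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size, in bytes


def split_into_chunks(file_id, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return chunk records shaped like GridFS chunk documents."""
    return [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Concatenate chunk payloads in sequence order to rebuild the file."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


scan = bytes(600 * 1024)  # stand-in for a 600 KiB scanned image
chunks = split_into_chunks("scan-001", scan)
print(len(chunks), "chunks")
```

In a real deployment the driver's GridFS API performs this splitting and reassembly; the sketch only shows why a scanned document larger than a single database document can still be stored.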

Page 16: Why EDP Chose MongoDB

DocSpace MongoDB Architecture

Page 17: Why EDP Chose MongoDB

Case Study - Traffic Fatalities

• A study of traffic-related fatalities in NYC

• Injury Surveillance and Prevention

• Offline data collection

• 330+ data points

• Multiple weekly changes to schema

o Add/remove fields

o Value types

• Developed in 500 hrs (3 months)

• 1 intermediate developer, 1 novice
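One way an application can absorb weekly schema changes like those listed above (fields added or removed, value types changed) is to normalize records when they are read. The following is a sketch under stated assumptions, not the study's actual code: every field name (`veh_type`, `victim_age`, `legacy_notes`, `borough`) is invented for illustration.

```python
def normalize(record):
    """Return a copy of a record upgraded to the latest (hypothetical) layout."""
    doc = dict(record)
    # A field that was renamed in a later form revision.
    if "veh_type" in doc:
        doc["vehicle_type"] = doc.pop("veh_type")
    # A field whose value type changed from free text to integer.
    if isinstance(doc.get("victim_age"), str):
        doc["victim_age"] = int(doc["victim_age"])
    # A field that was removed from the form.
    doc.pop("legacy_notes", None)
    # A field that was added later; older records get an explicit None.
    doc.setdefault("borough", None)
    return doc


old = {"veh_type": "sedan", "victim_age": "34", "legacy_notes": "n/a"}
new = {"vehicle_type": "truck", "victim_age": 51, "borough": "Queens"}
normalized = [normalize(old), normalize(new)]
print(normalized)
```

Because a document store does not force every record into one fixed table shape, old and new records can sit side by side and be reconciled in application code like this, rather than through a table migration for each change.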

Page 18: Why EDP Chose MongoDB

Evolving Use of MongoDB

• Single Node with Database Security

• Nightly Dump for Backup Archiving

• Master – Slave Nodes

• Replica Sets – 3 Nodes

• Distributed across Metropolitan Area Network

• Bare Iron Primary, VMware ESX and Hyper-V VM Secondaries

• Hurricane Sandy – No downtime, one node failed
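The three-node result is consistent with how replica set elections work: a primary can be elected only while a majority of voting members can reach each other. A small sketch of that arithmetic follows; the function names are hypothetical and this is not driver or server code.

```python
def majority(voting_members):
    """Smallest number of votes that is more than half of the voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, reachable_members):
    """True if enough members remain reachable to elect a primary."""
    return reachable_members >= majority(voting_members)


# Three-node replica set with one node lost: two of three is still a majority.
print(can_elect_primary(3, 2))
# With two nodes lost, the surviving node cannot form a majority.
print(can_elect_primary(3, 1))
```

This is why a three-node set can lose one node and keep accepting writes, while a two-node set has no such margin.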

Page 19: Why EDP Chose MongoDB

Thank you!

Questions