Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016
Confidential and Proprietary
CONNECTING THE DOTS WITH COUCHBASE
Nov 2016
Jay Duraisamy
Gijun Lee
Presenters
Jay Duraisamy – VP Technology
• Currently leading the Data Platforms group within Equifax, a core platform organization that supports the US Consumer Information Solutions group ($1.3 billion). The group is responsible for petabyte-scale infrastructure (5 PB and growing) for both offline (MPP and Big Data) and online (incl. NoSQL).
• 18 years of industry experience in building teams and platforms, leveraging expertise in software architecture and design philosophies. Worked as a developer, lead and architect on B2B, B2C and Big Data technologies. Graduate degree from Indian Institute of Technology and MBA from Goizueta Business School at Emory University, Atlanta.
• Enjoys jogging, reading and spending time with his twin daughters.
Gijun Lee – Application Developer IV
• Currently working on the Equifax B2B data platform that supports US consumer data analytics and processing, both offline and online. Recently developed a B2B REST service in Java that serves the financial history of U.S. consumers from Couchbase.
• 16 years of design and development experience in financial applications, including infrastructure and online/offline data analytics and processing in C/C++ and Java on Linux/Unix platforms. Strong interest in NoSQL and Hadoop platforms in the Big Data space. Master of Science in Computer Science from University of Arkansas at Fayetteville.
• Enjoys hiking, watching movies, and travelling with family.
Equifax & The Business of Big Data
• An information technology company that operates in 24 countries.
• A consumer credit company grown into a leading provider of insights and knowledge that helps its customers make informed decisions.
• The company organizes, assimilates and analyzes data on more than 820 million consumers and more than 91 million businesses worldwide, and its database includes employee data contributed from more than 5,000 employers.
• Big Data before Big Data
• First MPP/Grid Computing in 2003, currently in production
• Focused on high-throughput systems to deliver terabytes of data and insights to FIs and banks
• Petabytes in scale
• Talent that can distinguish between, and gain from, low-latency and high-throughput trade-offs.
Big Data Why NoSQL?
Big Data Online & The Teamwork
Technology Requirements
Potential Timeline (figure): PLAN, EVALUATE, BUILD, INTEGRATE, LAUNCH, spanning Q4 '15 through Q2 '16
Next steps
1. Keep in mind the tight SLA (5 ms) and the timeline for a Q2 '16 launch
2. Evaluate technologies: Redis, MongoDB and Couchbase
3. Grade the technical support from the partners during the evaluation
4. Choose the winning technology partner and negotiate the software agreement
5. Build, integrate, deploy and run
Key Value Store
• Key to retrieve data, no complex queries
• NoSQL document: complex data objects with no normalization
Ever Growing Data
• Current use case is a little over 1 TB, but plan for other use cases
• Scale for multiple terabytes of data expected in the future
High Performance & Availability
• System uptime and replication for fault tolerance
• DR capabilities
Others
• Application development friendly
• Integration with Hadoop, Spark and Elasticsearch
The Winner is …
• In-memory and disk key-value store (ForestDB)
• Distributed document database
• Automatic replication
• Integrated caching
• Primary and secondary indexes
• Spatial querying
• LDAP integration and admin auditing
• Master-master and master-slave replication
• Memcached protocol and RESTful HTTP API
• N1QL, a SQL-like query language
• Multi-dimensional scaling
• Cross data center replication (XDCR) with filtering
NuDB – Architecture
NuDB Development
Storage Format
• 24-month trended credit data in JSON
• App-specific metadata
• Compression with base64 encoding
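The slide does not say which compression codec was used or how the documents were laid out. As a minimal sketch, assuming gzip from the JDK and a made-up repetitive 24-month payload, the "compress, then base64-encode" idea could look like this. The class name `CompressedDoc` and the methods `pack` and `unpack` are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip a JSON document, then Base64-encode it so it can be stored as text. */
class CompressedDoc {

    /** Compresses the JSON text with gzip (an assumed codec) and returns the bytes as Base64 text. */
    static String pack(String json) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    /** Reverses pack(): Base64-decode, then gunzip back to the original JSON text. */
    static String unpack(String packed) {
        byte[] compressed = Base64.getDecoder().decode(packed);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up payload for illustration only; the real document format is not shown in the talk.
        StringBuilder json = new StringBuilder("{\"months\":[");
        for (int m = 1; m <= 24; m++) {
            json.append(m > 1 ? "," : "")
                .append("{\"month\":").append(m).append(",\"balance\":1000,\"status\":\"OK\"}");
        }
        json.append("]}");
        String packed = pack(json.toString());
        System.out.println("raw chars: " + json.length() + ", packed chars: " + packed.length());
    }
}
```

Base64 adds roughly a third to the compressed byte count, so the net saving depends on how repetitive the JSON is.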
Interface
• JSON-based HTTP POST
• Retrieve, Update, Add and Delete operations
• Spring MVC to marshal request and response
App Server
• App server in Tomcat shields Couchbase as the backend
• Simple drop-in installation
• DAO to decouple database transactions
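The DAO bullet can be illustrated with a small sketch. Everything here is an assumption made for illustration: the interface name `CreditDocumentDao`, its four methods (mirroring the Retrieve, Update, Add and Delete operations listed on the slide), the key format, and the in-memory stand-in that takes the place of a real database client.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Small demo: the caller only ever talks to the DAO interface. */
class DaoDemo {
    public static void main(String[] args) {
        CreditDocumentDao dao = new InMemoryCreditDocumentDao();
        dao.add("consumer::123", "{\"months\":24}");   // made-up key and document
        System.out.println(dao.retrieve("consumer::123").orElse("not found"));
    }
}

/** Hypothetical DAO contract: the service layer sees only these four operations. */
interface CreditDocumentDao {
    Optional<String> retrieve(String key);
    boolean add(String key, String json);      // returns false if the key already exists
    boolean update(String key, String json);   // returns false if the key is missing
    boolean delete(String key);                // returns false if the key is missing
}

/** In-memory stand-in; a production implementation would wrap the database client instead. */
class InMemoryCreditDocumentDao implements CreditDocumentDao {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    public Optional<String> retrieve(String key) { return Optional.ofNullable(store.get(key)); }
    public boolean add(String key, String json) { return store.putIfAbsent(key, json) == null; }
    public boolean update(String key, String json) { return store.replace(key, json) != null; }
    public boolean delete(String key) { return store.remove(key) != null; }
}
```

Because callers depend only on the interface, the backing store can be swapped or mocked in tests without touching the request-handling code.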
Data Ingestion
• Online live system; ingest data faster with little downtime
• RxJava, multi-threaded parallel loader
• Programmed in Java
NuDB Deployment
Cluster: 8-node cluster with 2x replication, 100% of data cached in memory, RAID 10 mirroring
App Server: 2 Linux ETL servers as app servers with failover; load balancing with F5
Ingestion: Monthly import and export via Control-M scheduler while the cluster is live; no impact to production
Monitoring: System-generated transactions to monitor health; aggregated transaction time is monitored
Sampling: Regular extraction of transactions to UAT for verification and validation
DR: XDCR handles cluster replication; no coding required
NuDB – Lessons Learned
Data Compression
• Compression-friendly internal data format
• Compression saved 70% in document size
• Compression helps nullify the additional storage needed for replication
• Compression helps data import, which is an I/O-bound operation, at the cost of a 50% increase in CPU time
RxJava
• Hadoop-based import tool was replaced by RxJava
• RxJava utilizes resources better
• 300 million documents (1 TB) in 40 minutes with 2 Java processes
• Exported 50 million transactions in 10 minutes with 1 Java process
Views and Consoles
• Need to identify the most recently updated transactions
• Initial design was to use Kafka asynchronously; switched to Couchbase views
• Operations team uses views to analyze data; no additional coding required
• Couchbase health via console
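The loader code itself is not shown in the talk. As a rough sketch of the multi-threaded batch-loading idea, the example below uses a plain `java.util.concurrent` thread pool rather than RxJava so that it runs without extra dependencies. The class `ParallelLoader`, its `load` method, the batch size, and the `sink` callback (a stand-in for the actual database write) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

/** Hypothetical loader: splits documents into batches and writes the batches in parallel. */
class ParallelLoader {

    /** Writes every key/value pair through the sink on a fixed pool; returns the number written. */
    static int load(List<Map.Entry<String, String>> docs, int batchSize, int threads,
                    BiConsumer<String, String> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                List<Map.Entry<String, String>> batch =
                        docs.subList(start, Math.min(start + batchSize, docs.size()));
                // Each batch becomes one task; the task reports how many documents it wrote.
                pending.add(pool.submit(() -> {
                    for (Map.Entry<String, String> doc : batch) {
                        sink.accept(doc.getKey(), doc.getValue());
                    }
                    return batch.size();
                }));
            }
            int written = 0;
            for (Future<Integer> task : pending) {
                written += task.get();   // blocks until the batch has finished
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            docs.add(Map.entry("doc-" + i, "{\"n\":" + i + "}"));
        }
        // A concurrent map stands in for the database bucket in this sketch.
        Map<String, String> fakeBucket = new ConcurrentHashMap<>();
        int written = load(docs, 500, 4, fakeBucket::put);
        System.out.println("written: " + written + ", stored: " + fakeBucket.size());
    }
}
```

The sink must be safe to call from several threads at once, which is why the demo uses a `ConcurrentHashMap`.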
Performance and Stress Testing
Performance Sample Stats
• 8 external servers with 2 threads per server
• 15 hours of continuous transactions
• Estimated 115 million transactions
• Average transaction time is 60 ms
• Only failure observed was due to logs filling the disk after 15 hours
Stress Testing
• 2133 Ops/Sec in debug mode
• 500K to 1.6 million Ops/Sec with the Couchbase Pillowfight load test tool
• System can support an estimated 250 million transactions/day
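As a quick sanity check on the figures above, the totals can be converted to average rates. This is only arithmetic on the quoted numbers, assuming the load is spread evenly over the stated period; the class and method names are made up for the example.

```java
/** Hypothetical helper that turns the slide's totals into average operations per second. */
class ThroughputCheck {

    /** Average operations per second for a count spread evenly over a duration in seconds. */
    static double opsPerSecond(long operations, long seconds) {
        return (double) operations / seconds;
    }

    public static void main(String[] args) {
        // 115 million transactions over 15 hours of continuous load
        System.out.printf("stress run: %.0f ops/sec%n", opsPerSecond(115_000_000L, 15 * 3600L));
        // 250 million transactions spread evenly over a 24-hour day
        System.out.printf("daily estimate: %.0f ops/sec%n", opsPerSecond(250_000_000L, 24 * 3600L));
    }
}
```

The first rate works out to about 2130 ops/sec, which is close to the 2133 Ops/Sec figure quoted on the slide, although the slide does not say whether the two refer to the same measurement. The daily estimate corresponds to roughly 2894 ops/sec on average.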
IN THE NEWS
Questions?