Posted on 16-Jan-2017
Onofre Profile
• Onofre: CVS's Brazilian operations
• Pharmacy network: 50 stores
• 2,100 employees
• 1 distribution center
• 37% of sales through e-commerce
• 25% through mobile/tablet
• Call center: 201 positions
• No omni-channel process
IT perspective
• SAP ECC (IS-Retail) as the central component
• SAP BusinessObjects: limited user licenses, used only by the finance team
• POS system: COBOL legacy
• Okidata/Itautec
• E-commerce legacy: Vanroy
• Customized .NET solution
• Data center operation 100% in-house
• No outsourcing
• No cloud services
Case: Sales Performance Info
• No mobile access to sales reports: desktop only
• No user-friendly summary dashboard
• 1+ day delay: today's report shows only yesterday's sales
• Slow performance: more than 1 minute per report
• E-commerce
  • No sales results by region
  • No complete conversion-rate report
• Main physical-store needs
  • No report of sales lost to stock-outs
Project ‘WEB Pharma’
• Objectives
  • Build a user-friendly dashboard with the main information for retail business decisions
  • Be mobile: users must be able to use the dashboard remotely from internet-connected devices
  • Summarize e-commerce and physical-store sales together
  • Deliver every report in less than 10 s
• Strategy
  • Export legacy data to an external cloud data server (do not use the internal data center)
  • Data streaming must process sales data from the last hour
• Premises
  • 100% secure connection (SOX compliance)
  • Low CAPEX and limited budget
  • 3-month deadline
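The strategy asks for the last hour of sales to be processed, with store and e-commerce figures summarized together. A minimal sketch of that windowed aggregation in plain Python, assuming a hypothetical `Sale` record (channel, timestamp, amount); it only illustrates the logic and is not how the project's Flume/Hive pipeline implemented it:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sale:
    # Hypothetical record layout: the deck does not describe the real schema.
    channel: str          # e.g. "store" or "ecommerce"
    timestamp: datetime
    amount: float


def last_hour_totals(sales, now):
    """Sum sales per channel over the hour ending at `now`, plus a combined total."""
    window_start = now - timedelta(hours=1)
    totals = {}
    for sale in sales:
        # Keep only records inside the (window_start, now] interval.
        if window_start < sale.timestamp <= now:
            totals[sale.channel] = totals.get(sale.channel, 0.0) + sale.amount
    # Physical stores and e-commerce summarized together.
    totals["all_channels"] = sum(totals.values())
    return totals
```

A sale at 10:00 is ignored when `now` is 12:00, while sales at 11:30 and 11:45 are counted in their own channel and in `all_channels`.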
Big Data Architecture
[Diagram: the Brick&Mortar Store (COBOL, Okidata) and E-commerce (WEB, Vanroy .NET) sources export .csv files over SSH; components shown: AWS Data Pipeline, Data Integrator, Apache Flume, MapReduce/HDFS, Apache Oozie (workflow scheduling), CDH3, HBase, Hive SQL, Sqoop, MySQL/S3, Tableau Connector, and the user interface (Tableau Online, D3 visualization).]
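Both sources enter the architecture as .csv exports, so one early step is mapping each source's columns onto a single sales schema before loading. A minimal sketch with the standard `csv` module; the column names in `STORE_COLUMNS` and `WEB_COLUMNS` are hypothetical, since the deck does not describe the real COBOL POS or Vanroy .NET export layouts:

```python
import csv
import io

# Hypothetical source layouts mapped onto one shared schema.
STORE_COLUMNS = {"store_id": "location", "sale_ts": "timestamp", "total": "amount"}
WEB_COLUMNS = {"region": "location", "order_date": "timestamp", "order_value": "amount"}


def normalize(csv_text, column_map, channel):
    """Rename source-specific columns to the shared schema and tag the channel."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {target: record[source] for source, target in column_map.items()}
        row["amount"] = float(row["amount"])  # amounts arrive as text in CSV
        row["channel"] = channel
        rows.append(row)
    return rows


store_csv = "store_id,sale_ts,total\n12,2013-06-01T11:30,50.00\n"
web_csv = "region,order_date,order_value\nSP,2013-06-01T11:45,30.50\n"
merged = normalize(store_csv, STORE_COLUMNS, "store") + normalize(web_csv, WEB_COLUMNS, "ecommerce")
```

Keeping the mapping in data rather than code means a new export format only needs a new dictionary.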
BI vs. Big Data: Comparison
Aspect | Business Intelligence | Big Data
Volume | Terabytes | Petabytes
Velocity | Batch | Real-time, near-real-time streams
Data source | Internal | External
Value | One single source of truth | Statistical and hypothetical
Variety | Single sources | Probabilistic and multi-factor
Data sharpness | Consistent and reliable | Better to be roughly right than precisely wrong
Frequency | Millions of records per minute | Billions of documents per second
Master data | Important part of results | Not necessary
Server sizing | Planned evolution; could be done internally; elastic cloud considered an alternative | Storage/memory growing faster than ever; elastic cloud is crucial
BI vs. Big Data: Comparison
Aspect | Business Intelligence | Big Data
Main business objective | Business monitoring, internal insights and process optimization | Data monetization, business metamorphosis and new opportunities
Object of analysis | Current business process | Non-existent business process
Data source | Internal | External
Approach | Reactive: "What happened, and what can we do about it?" | Predictive: "What will happen tomorrow, and how do we prepare?"
Mindset | Examine the data, find the root causes of problems and propose process optimizations | "How can we make some REAL money with this data?"
Data sharpness | Consistent and reliable | Better to be roughly right than precisely wrong
Scope | 2 or 3 departments | Entire company, across departments
Business model | Pre-existing benchmark | No benchmark
View modeling | Pre-conceived KPIs, already pre-formatted | No clear idea of the exact objective and business needs
Why AWS?
• Ready-to-go cloud services
• Scalable
• Cost-effective
In this project:
• Ready secure internet connection (SSH)
• S3: simple web-services interface
• EC2: ready-to-go Linux CentOS template
• Cloudera partner
• Data Pipeline: reliably processes and moves data between different AWS compute and storage services
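Because S3 exposes a simple web-services interface, pushing each exported .csv package to the cloud can be a single call. A minimal sketch, assuming a boto3-style S3 client (`boto3.client("s3")` provides `upload_file(Filename, Bucket, Key)`); the client is passed in so the code runs offline, and the key layout, bucket name and function names are hypothetical, not the project's actual setup:

```python
from datetime import date


def package_key(store_id: int, day: date, sequence: int) -> str:
    """Build a hypothetical S3 object key for one daily data package."""
    return f"sales/{day.isoformat()}/store={store_id:03d}/package-{sequence:04d}.csv"


def upload_package(s3_client, local_path: str, bucket: str,
                   store_id: int, day: date, sequence: int) -> str:
    """Upload one exported .csv package and return the key it was stored under.

    `s3_client` is expected to behave like a boto3 S3 client.
    """
    key = package_key(store_id, day, sequence)
    s3_client.upload_file(Filename=local_path, Bucket=bucket, Key=key)
    return key
```

Partitioning keys by day and store keeps each day's packages under one prefix, which makes later batch loads easy to target.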
Server Highlights
• Data
  • 21.5 TB of historical data (3 years)
  • Risk: poor data-transfer network
  • AWS Import/Export Snowball
  • Estimated data transfer: 140 MB per data package
  • 200 packages/day: 28 GB/day
• PRD server configuration
  • Red Hat 6.4, 256 GB of RAM
  • Processor: 4 × 12 cores, 5 GHz
  • 2 × 420 storage (10G)
• Users
  • 350 users
  • 50 stores
  • 40 MB/day each
• Network bandwidth
  • Inbound: 5 Gb
  • Outbound: 10 Gb
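The daily estimate follows from 140 MB × 200 packages = 28,000 MB = 28 GB. The same arithmetic shows why a physical transfer service such as Snowball suits the 21.5 TB history. A minimal sketch; the 100 Mbit/s sustained rate is a hypothetical figure for the on-premise link, not a number from the project:

```python
def daily_volume_gb(mb_per_package: float, packages_per_day: int) -> float:
    """Daily transfer volume in GB (decimal units: 1 GB = 1000 MB)."""
    return mb_per_package * packages_per_day / 1000


def transfer_days(terabytes: float, effective_mbit_per_s: float) -> float:
    """Days needed to move `terabytes` of data over a link at a sustained rate."""
    bits = terabytes * 1e12 * 8                     # TB -> bytes -> bits
    seconds = bits / (effective_mbit_per_s * 1e6)   # Mbit/s -> bit/s
    return seconds / 86_400                         # seconds per day


print(daily_volume_gb(140, 200))            # 28.0, matching the 28 GB/day estimate
print(round(transfer_days(21.5, 100), 1))   # 19.9 days at the assumed 100 Mbit/s
```

Roughly three weeks of saturated uplink for the initial load is the kind of risk the "poor data-transfer network" bullet refers to.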
Hadoop Highlights
• Objective: fast response for end users
• Master nodes: 1 (*)
• Slave nodes: 7
• Sqoop: Hadoop-native connector to MySQL
• Hue: SQL-like user interface for DBAs to validate data
• Oozie: scheduler system to manage Hadoop jobs
  • Triggered by time (frequency) and data availability
• Hive: querying large datasets
  • SQL-like language: HiveQL
(*) Modified after go-live
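Oozie starts a job when both triggers are satisfied: the scheduled time has arrived and the input data is available. Real Oozie coordinators declare this in XML (frequency plus input datasets); the plain-Python sketch below only illustrates the decision rule, and its function and dataset names are hypothetical:

```python
from datetime import datetime, timedelta


def next_run_ready(last_run: datetime, frequency: timedelta, now: datetime,
                   available_inputs: set, required_inputs: set) -> bool:
    """Return True when a job should start under a time + data-availability rule.

    time trigger: the next scheduled slot (last_run + frequency) has arrived;
    data trigger: every required input dataset has landed.
    """
    time_reached = now >= last_run + frequency
    data_ready = required_inputs <= available_inputs   # subset check
    return time_reached and data_ready


last = datetime(2013, 6, 1, 11, 0)
hourly = timedelta(hours=1)
both = {"store.csv", "web.csv"}
print(next_run_ready(last, hourly, datetime(2013, 6, 1, 12, 5), both, both))
print(next_run_ready(last, hourly, datetime(2013, 6, 1, 12, 5), {"store.csv"}, both))
```

The second call waits even though the hour has passed, because the e-commerce file has not arrived yet.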
Why Cloudera?
• Stable Hadoop distribution
• Simple administration: Cloudera Manager
• Integrated
In this project:
• Ready-to-go Tableau connector
• CDH3: open source (cost-effective)
• Fast installation
• Fast tuning
Why Tableau?
• User-friendly, with a high impact on user satisfaction
• Ready-to-go mobile application
• Easy to install as an Android app
In this project:
• Cost-effective solution
• Lowest price per end user
• Ready-to-go retail template
• Brazilian localization done
• Existing business case in retail
Why Not SAP?
• High user-license cost (the project requires 350 new users)
• SAP BusinessObjects retail template with low adherence
• Huge investment in customized reporting
• Hardware processing contention with financial users
• Impact on the monthly results-closing reports
• High investment in hardware instances to reach the expected performance
• 2013: no AWS instance ready for SAP BOBJ
• SAP HANA not yet mature
• Lack of consultants
• No retail business case running in Brazil
Project Methodology
• BI projects: intensive validation with REAL data
• Key users must truly believe in the new indicators (expectations)
• Intense delivery schedule: deliver early to allow validation
• Minimum project scope: 10 reports
  • 7 standard: Tableau
  • 3 customized: D3 visualization
  • 1 dashboard: Tableau
• Project implementation strategy: PoC
  • Consistent validation: 2 stores and 10 users
  • Testing in the real environment: consistent issues log (performance)
Project Schedule
[Gantt chart over 3 months with PoC and Go-Live milestones; activities by phase:]
• AWS: S3, EC2 and Data Pipeline installation
• Cloudera (Hadoop): CDH3 installation; Flume and Hive set-up
• Integrations: CSV data entry; Tableau connector; Sqoop set-up
• Visualization: indicator design; Tableau configuration; D3 configuration
• Testing & QA: load historical data; final dev validation; PoC (2 stores); adjustments and tuning
• Go: final PRD delivery; assisted operation
Project Results
• Response time: 0.4 s
• High adherence from users
• Data visualization triggered several business initiatives
• 2nd wave approved, with 2 additional dashboards and 32 new reports
• WEB reports demonstrated the need to structure an omni-channel process and revealed new business needs