Lessons learned from the proverbial battlefield - Hortonworks roadshow

Post on 15-Feb-2017

Lessons learned from the proverbial battlefield

Suhail Shergill, Scotiabank

The Wisconsin Heights Battlefield is an area in Dane County, Wisconsin where the penultimate battle of the 1832 Black Hawk War occurred. The conflict was fought between the Illinois and Michigan Territory militias and Sauk chief Black Hawk and his band of warriors, who were fleeing their homeland following the Fox Wars. The Wisconsin Heights Battlefield is the only intact battle site from the Indian Wars in the U.S. Midwest. Today, the battlefield is managed and preserved by the state of Wisconsin as part of the Lower Wisconsin State Riverway. In 2002, it was listed on the U.S. National Register of Historic Places.

Who Am I

Suhail Shergill (@suhailshergill)

• Computer Science background (Programming Languages and Machine Learning)

• create and run skunkworks teams focused on data science and technology

• technical advisor to startups

• organizer of a few technical meetups

• leading the Data Science & Model Innovation group in GRM at Scotiabank.

Objective

What’s in scope

• What is “Big Data”

• What are the challenges of “Big Data”

• How can some of these challenges be addressed – lessons learned

• What are we doing in Scotia

“Big Data” and Hadoop

Hadoop

Challenges of Big Data

Feedback loops

• Very “big”

• Getting “bigger” at a faster rate

• Long-term solutions need to have exponential/logarithmic characteristics
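
One way to read the point about exponential data growth and logarithmic solutions is sketched below. This is an illustration only, not something taken from the talk: the starting volume of 1000 TB, the assumption that data doubles every year, and the helper names `projected_size_tb`, `linear_cost`, and `logarithmic_cost` are all hypothetical.

```python
import math


def projected_size_tb(initial_tb: float, annual_growth: float, years: int) -> float:
    """Data volume after `years` of compound growth (annual_growth=1.0 means doubling)."""
    return initial_tb * (1.0 + annual_growth) ** years


def linear_cost(size_tb: float) -> float:
    """Effort proportional to data volume, e.g. scanning everything."""
    return size_tb


def logarithmic_cost(size_tb: float) -> float:
    """Effort proportional to log2 of data volume, e.g. an indexed lookup."""
    return math.log2(size_tb)


if __name__ == "__main__":
    # Hypothetical scenario: 1000 TB today, doubling every year.
    for year in range(6):
        size = projected_size_tb(1000.0, 1.0, year)
        print(
            f"year {year}: {size:>8.0f} TB  "
            f"linear={linear_cost(size):>8.0f}  "
            f"log={logarithmic_cost(size):5.2f}"
        )
```

Under these assumptions the linear cost doubles every year along with the data, while the logarithmic cost rises by only one unit per year, which is why an approach with logarithmic characteristics can keep pace with exponential growth.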

From data to insights

No free lunches / silver bullets

The challenges of “Big Data”

We have a very “big” problem. How do we solve it?

How to solve it

Lessons learned

Data quality is paramount

Build tools

Teach enough to question

Rotations and harmonics

Open doors

Faster and shorter feedback loops

Summary

No silver bullet

Data quality is paramount

Build tools

Teach enough to question

Rotations and harmonics

Open doors

Faster & shorter feedback loops

What we’re doing in Scotia

Scotiabank’s Enterprise Data Lake Initiative

Scotiabank’s 2015 business strategy focuses on these priorities:

• Improving the customer experience;

• Enhancing leadership capabilities throughout the organization; and

• Improving operational efficiency and effectiveness.

• A key component of the digital strategy supporting these priorities is to leverage big data analytics in order to better understand and address customer needs and preferences.

• To this end, Scotiabank is making material investments in the Hadoop technology used to support big data analytics across a wide spectrum of companies and industries.

Scotiabank’s Enterprise Data Lake – Next Steps

1. EDL 1.0:

• Initial cluster 1PB (Jan-2016) rapidly growing to accommodate more tenants

• A very good start with consistent and commoditized stack

• A review of areas we can further optimize and identify gaps

• A review of areas where we require higher level flexibility & portability

• A review of what made sense to be directed where to achieve scale, yet preserve consistency

• A review of where are the limiting factors: agile and repeatable periodically every 2-3 months

2. EDL 2.0:

• Need to drive velocity: refactor engineered infrastructure environment

• Need flexibility on workload: decouple compute & data

• Need workload portability: next gen hybrid architecture & cloud

Scotiabank’s Enterprise Data Lake – Highlights

1. What we got out of EDL 1.0:

• Regulatory & Risk Reporting (RDARR)

• Consolidation of divisional data repositories

• Capability for Anti Money Laundering

• Capability for Asset Liability Management

• Consolidation of International Banking Datawarehouses

• M&A and Credit Card data acquisition and analysis

Thank you