Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
![Page 1: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/1.jpg)
Architecting for change: LinkedIn's new data ecosystem
Sept 28, 2016
Shirshanka Das, Principal Staff Engineer, LinkedIn
Yael Garten, Director of Data Science, LinkedIn
@shirshanka, @yaelgarten
![Page 2: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/2.jpg)
Design for change. Expect it. Embrace it.
![Page 3: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/3.jpg)
Product Change
Technology
Culture & Process
Learnings
![Page 4: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/4.jpg)
The Product Change: Launch a completely rewritten LinkedIn mobile app
![Page 5: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/5.jpg)
What does this impact?
Data driven product
![Page 6: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/6.jpg)
Tracking data records user activity
InvitationClickEvent()
![Page 7: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/7.jpg)
Scale fact: ~1000 tracking event types, double-digit TB per day, hundreds of metrics & data products
![Page 8: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/8.jpg)
user engagement tracking data
metric scripts
production code
Tracking Data Lifecycle
Produce → Transport → Consume
Member facing data products
Business facing decision making
![Page 9: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/9.jpg)
Tracking Data Lifecycle & Teams
Produce → Transport → Consume
Product or App teams: PMs, Developers, TestEng
Infra teams: Hadoop, Kafka, DWH, ...
Data teams: Analytics, Relevance Engineers,...
user engagement tracking data
metric scripts
production code
Member facing data products
Business facing decision making
![Page 10: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/10.jpg)
How do we calculate a metric: ProfileViews
PageViewEvent
Record1: {"header": {"memberId": 12345, "time": 1454745292951, "appName": {"string": "LinkedIn"}, "pageKey": "profile_page"}, "trackingInfo": {"vieweeID": "23456", ...}}
PageViewEvent
Record1: {"header": {"memberId": 12345, "time": 1454745292951, "appName": {"string": "LinkedIn"}, "pageKey": "new_profile_page"}, "trackingInfo": {"vieweeID": "23456", ...}}
Metric: ProfileViews = sum(PageViewEvent) where pageKey = profile_page or new_profile_page
![Page 11: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/11.jpg)
PageViewEvent
Record1: {"header": {"memberId": 12345, "time": 1454745292951, "appName": {"string": "LinkedIn"}, "pageKey": "profile_page"}, "trackingInfo": {"vieweeID": "23456", ...}}
CASE
  WHEN trackingInfo["profileIds"] ...
  WHEN trackingInfo["profileid"] ...
  WHEN trackingInfo["profileId"] ...
  WHEN trackingInfo["url\$profileIds"] ...
  WHEN trackingInfo["11"] LIKE '%profileIds=%' THEN SUBSTRING(trackingInfo["11"], 9, 60)
  WHEN trackingInfo["12"] LIKE '%priceIds=%' THEN SUBSTRING(trackingInfo["12"], 9, 60)
  ELSE NULL
END AS profile_id
Evolution as we mature and grow...
Metric: ProfileViews = sum(PageViewEvent where pagekey = profile_page and memberID != trackinginfo[vieweeID])
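The evolved metric definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record shape follows the sample PageViewEvent on the slides, while the function name `profile_views`, the `PROFILE_PAGE_KEYS` set, and the sample events are hypothetical and not LinkedIn's actual metric code.

```python
from typing import Any, Iterable, Mapping

# Page keys that count as a profile view (both keys named on the slides).
PROFILE_PAGE_KEYS = {"profile_page", "new_profile_page"}


def profile_views(events: Iterable[Mapping[str, Any]]) -> int:
    """Count profile page views, skipping members who view their own profile."""
    count = 0
    for event in events:
        header = event["header"]
        if header["pageKey"] not in PROFILE_PAGE_KEYS:
            continue
        viewee = event.get("trackingInfo", {}).get("vieweeID")
        # memberId is numeric and vieweeID is a string in the sample record,
        # so compare both as strings.
        if viewee is not None and str(header["memberId"]) == str(viewee):
            continue
        count += 1
    return count


# Hypothetical sample data: two real views, one self view, one non-profile page.
events = [
    {"header": {"memberId": 12345, "pageKey": "profile_page"},
     "trackingInfo": {"vieweeID": "23456"}},
    {"header": {"memberId": 12345, "pageKey": "new_profile_page"},
     "trackingInfo": {"vieweeID": "23456"}},
    {"header": {"memberId": 12345, "pageKey": "profile_page"},
     "trackingInfo": {"vieweeID": "12345"}},
    {"header": {"memberId": 12345, "pageKey": "feed"}, "trackingInfo": {}},
]
print(profile_views(events))
```

Every new page key or exclusion rule has to be added to logic like this in each consuming script, which is the maintenance burden the following slides describe.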
![Page 12: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/12.jpg)
Eventually… unmaintainable
get_tracking_codes = foreach get_domain_rolled_up generate ..entry_domain_rollup,
  ((tracking_code matches 'eml-ced.*' or tracking_code matches 'eml-b2\\_content\\_ecosystem\\_digest.*' or (referer is not null and (referer matches '.*touch\\.linkedin\\.com.*trk=eml-ced.*' or referer matches '.*touch\\.linkedin\\.com.*trk=eml-b2\\_content\\_ecosystem\\_digest.*')) ? 'Email-CED' :
  (tracking_code matches 'eml-.*' or (referer is not null and referer matches '.*touch\\.linkedin\\.com.*trk=eml-.*') or entry_domain_rollup == 'Email' ? 'Email-Other' :
  (tracking_code == 'hp-feed-article-title-hpm' and entry_domain_rollup == 'Linkedin' ? 'HomepagePulseModule' :
  ((tracking_code matches 'hp-feed-.*' and entry_domain_rollup == 'Linkedin') or (std_user_interface matches '(phoneapp|tabletapp|phonebrowser|tabletbrowser)' and tracking_code == 'v-feed') or (tracking_code == 'OrganicTraffic' and entry_domain_rollup == 'Linkedin' and (referer == 'https://www.linkedin.com/nhome' or referer == 'http://www.linkedin.com/nhome')) ? 'Feed' :
  (tracking_code matches 'hb_ntf_MEGAPHONE_.*' and entry_domain_rollup == 'Linkedin' ? 'DesktopNotifications' :
  (tracking_code == 'm_sim2_native_reader_swipe_right' ? 'PushNotification' :
  (tracking_code == 'pulse_dexter_stream_scroll' and entry_domain_rollup == 'Linkedin' ? 'Pulse-InfiniteScrollonDexter' : -- infinite scroll on dexter
  ((tracking_code == 'pulse_dexter_nav_click' or tracking_code == 'pulse-det-nav_art') and entry_domain_rollup == 'Linkedin' ? 'Pulse-LeftRailClickonDexter' : -- left rail click on dexter
  (tracking_code == 'OrganicTraffic' and referer is not null and referer matches '.*linkedin\\.com\\/pulse\\/article.*' ? 'PublishingPlatform' : 'NoneFoundYet')))))))))) as entry_point;
Homepage team
Push Notification team
Email team
Long form post team
![Page 13: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/13.jpg)
PageViewEvent
Record1: {"header": {"memberId": 12345, "time": 1454745292951, "appName": {"string": "LinkedIn"}, "pageKey": "profile_page"}, "trackingInfo": {"vieweeID": "23456", ...}}
We wanted to move to better data models
LI_ProfileViewEvent
Record1: {"header": {"memberId": 12345, "time": 4745292951145, "appName": {"string": "LinkedIn"}, "pageKey": "profile_page"}, "entityView": {"viewType": "profile-view", "viewerId": "12345", "vieweeId": "23456"}}
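As a rough sketch of what the cleaner model buys, the explicit `entityView` fields can be expressed as a typed structure, with a small adapter from the old record shape. The field names come from the sample records on the slides; the `EntityView` class and the `to_entity_view` helper are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class EntityView:
    """Explicit viewer/viewee fields, mirroring the entityView block above."""
    view_type: str
    viewer_id: str
    viewee_id: str


def to_entity_view(old_event: Mapping[str, Any]) -> Optional[EntityView]:
    """Map an old-style PageViewEvent to the new model, or None if it is not a profile view."""
    header = old_event["header"]
    if header.get("pageKey") != "profile_page":
        return None
    viewee = old_event.get("trackingInfo", {}).get("vieweeID")
    if viewee is None:
        return None
    return EntityView("profile-view", str(header["memberId"]), str(viewee))


# Hypothetical old-style record shaped like the PageViewEvent sample.
old_event = {
    "header": {"memberId": 12345, "time": 1454745292951, "pageKey": "profile_page"},
    "trackingInfo": {"vieweeID": "23456"},
}
print(to_entity_view(old_event))
```

With named fields, a consumer no longer has to probe a free-form `trackingInfo` map to find out who viewed whom.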
![Page 14: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/14.jpg)
How much work would it be?
Two options:
1. Keep the old tracking:
   a. Cost: producers (try to) replicate it (write bad old code from scratch)
   b. Save: consumers avoid migrating
2. Evolve:
   a. Cost: time on data modeling, and on consumer migration
   b. Save: pays down data modeling tech debt
![Page 15: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/15.jpg)
How much work would it be?
Two options:
1. Keep the old tracking:
   a. Cost: producers (try to) replicate it (write bad old code from scratch)
   b. Save: consumers avoid migrating
   Estimated cost for producers to attempt to replicate old tracking: 5000 days
2. Evolve:
   a. Cost: time on data modeling, and on consumer migration
   b. Save: pays down data modeling tech debt
   Estimated cost to update consumers to new tracking with clean, committee-approved data models: 2000 days
#AnalyticsHappiness
![Page 16: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/16.jpg)
The Task and Opportunity
Must do: So we will do the data modeling, and rewrite all the metrics to account for the changes happening upstream… but…
Extra credit points: How do we make sure that the cost is not this high the next time?
How do we handle evolution in a principled way?
![Page 17: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/17.jpg)
Product Change
Technology
Culture & Process
Learnings
![Page 18: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/18.jpg)
Metrics ecosystem at LinkedIn: 3 yrs ago
Operational challenges
Diminished trust due to multiple sources of truth
![Page 19: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/19.jpg)
Data Stages
Create → Ingest → Process → Serve → Visualize
![Page 20: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/20.jpg)
Create → Ingest → Process → Serve → Visualize
Tracking
Kafka Espresso
…
![Page 21: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/21.jpg)
Tracking Architecture
Components
• SDKs in different frameworks (server, client)
• Tracking front-end
• Monitoring tools
Diagram labels: Client-side Tracking, Tracking Frontend, Services, Kafka, Tools
Create
![Page 22: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/22.jpg)
Data Stages
Create → Ingest → Process → Serve → Visualize
![Page 23: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/23.jpg)
Unified Ingestion with
Hundreds of TB / day
Thousands of datasets
80+% of data ingest
Ingest
![Page 24: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/24.jpg)
In production @ LinkedIn, Intel, Swisscom, NerdWallet, PayPal
Ingest
![Page 25: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/25.jpg)
Create → Ingest → Process → Serve → Visualize
Hadoop
![Page 26: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/26.jpg)
Processing engines @ LinkedIn
Process
![Page 27: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/27.jpg)
Create → Ingest → Process → Serve → Visualize
Pinot
![Page 28: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/28.jpg)
Pinot
Kafka Hadoop
Samza Jobs
Pinot
minutes / hour +
• Distributed multi-dimensional OLAP
• Columnar + indexes
• No joins
• Latency: low ms to sub-second
Serve
![Page 29: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/29.jpg)
Site-facing apps, reporting dashboards, monitoring
In production @ LinkedIn, Uber
Serve
![Page 30: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/30.jpg)
Create → Ingest → Process → Serve → Visualize
Hadoop, Pinot, Raptor
![Page 31: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/31.jpg)
Create → Ingest → Process → Serve → Visualize
Unified Metrics Platform (UMP)
Hadoop, Pinot, Raptor
![Page 32: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/32.jpg)
Unified Metrics Platform
Metrics Logic
Raw Data
Pinot
UMP Harness: Incremental, Aggregate, Backfill, Auto-join
Raptor dashboards
HDFS
Aggregated Data
Experiment Analysis
Relevance
...
HDFS
Ad-hoc
![Page 33: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/33.jpg)
Create → Ingest → Process → Serve → Visualize
Tracking, Kafka, Espresso, …, Hadoop, Pinot, Raptor
Unified Metrics Platform (UMP)
![Page 34: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/34.jpg)
![Page 35: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/35.jpg)
How do we handle old and new?
Producers: PageViewEvent (old), ProfileViewEvent (new)
Consumers: Relevance, Analytics
![Page 36: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/36.jpg)
The Big Challenge
Our scripts were doing...
load '/data/tracking/PageViewEvent' using AvroStorage();
(Pig scripts)
My Raw Data
![Page 37: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/37.jpg)
My Raw Data → My Data API
We need "microservices" for Data
![Page 38: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/38.jpg)
The Database community solved this decades ago...
Views!
![Page 39: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/39.jpg)
We had been working on something that could help...
A Data Access Layer for LinkedIn
Abstract away underlying physical details to allow users to focus solely on the logical concerns
Logical Tables + Views
Logical FileSystem
![Page 40: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/40.jpg)
Solving With Views
Producers: PageViewEvent (old), ProfileViewEvent (new)
LinkedInProfileView = PageViewEvent where pagekey == profile, plus ProfileViewEvent mapped 1:1
Consumers: Relevance, Analytics
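The idea on this slide can be sketched with a plain SQL view, here run through Python's built-in `sqlite3` module rather than Dali or Hive. The table and column names are hypothetical stand-ins for the old and new events; the only parts taken from the slide are the filter on the old event (`pagekey == profile`) and the 1:1 mapping of the new event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Old event: every page view, profile views identified by page_key.
    CREATE TABLE page_view_event (member_id TEXT, page_key TEXT, viewee_id TEXT);
    -- New event: a dedicated profile view record.
    CREATE TABLE profile_view_event (viewer_id TEXT, viewee_id TEXT);

    -- The view hides which physical event a row came from.
    CREATE VIEW linkedin_profile_view AS
        SELECT member_id AS viewer_id, viewee_id
        FROM page_view_event
        WHERE page_key = 'profile'
        UNION ALL
        SELECT viewer_id, viewee_id
        FROM profile_view_event;
""")

# Hypothetical sample rows: one old profile view, one old non-profile page view,
# and one profile view emitted by the new tracking.
conn.executemany(
    "INSERT INTO page_view_event VALUES (?, ?, ?)",
    [("12345", "profile", "23456"), ("12345", "feed", None)],
)
conn.executemany(
    "INSERT INTO profile_view_event VALUES (?, ?)",
    [("34567", "23456")],
)

rows = conn.execute(
    "SELECT viewer_id, viewee_id FROM linkedin_profile_view ORDER BY viewer_id"
).fetchall()
print(rows)
```

Consumers query only `linkedin_profile_view`, so producers can move from the old event to the new one without any consumer script changing.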
![Page 41: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/41.jpg)
Views ecosystem
Producers
LinkedIn App: LinkedInProfileView
Job Seeker App (JSA): JSAProfileView
Consumers
UnifiedProfileView
![Page 42: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/42.jpg)
Dali: Implementation Details in Context
Data Catalog + Discovery (DALI)
DaliFileSystem Client
Data Source (HDFS)
Data Sink (HDFS)
Processing Engine (MapReduce, Spark)
DALI Datasets (Tables + Views)
Query Layers (Hive, Pig, Spark)
View Defs + UDFs (Artifactory, Git)
Dataflow APIs (MR, Spark, Scalding)
DALI CLI
![Page 43: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/43.jpg)
One small step for a script
From
load '/data/tracking/PageViewEvent' using AvroStorage();
To
load 'tracking.UnifiedProfileView' using DaliStorage();
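Why loading by logical name matters can be shown with a toy registry in Python. This is a hypothetical stand-in, not the Dali API: `DatasetRegistry`, `register`, `load`, and `count_profile_views` are invented for the sketch, and only the dataset name `tracking.UnifiedProfileView` comes from the slide.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Loader = Callable[[], List[Record]]


class DatasetRegistry:
    """Toy data access layer: maps a logical dataset name to whatever loads it."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Loader] = {}

    def register(self, logical_name: str, loader: Loader) -> None:
        # Re-registering a name swaps the physical source behind it.
        self._loaders[logical_name] = loader

    def load(self, logical_name: str) -> List[Record]:
        return self._loaders[logical_name]()


def count_profile_views(registry: DatasetRegistry) -> int:
    # Consumer code knows only the logical name, never a physical path.
    return len(registry.load("tracking.UnifiedProfileView"))


registry = DatasetRegistry()

# Initially the logical dataset is backed by rows from the old event only.
old_rows = [{"viewerId": "12345", "vieweeId": "23456"}]
registry.register("tracking.UnifiedProfileView", lambda: old_rows)
before = count_profile_views(registry)

# Later the backing changes to old plus new rows; the consumer is untouched.
new_rows = [{"viewerId": "34567", "vieweeId": "23456"}]
registry.register("tracking.UnifiedProfileView", lambda: old_rows + new_rows)
after = count_profile_views(registry)

print(before, after)
```

The consumer function is identical before and after the switch, which is the insulation from upstream changes that the one-line script change is meant to provide.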
![Page 44: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/44.jpg)
A Few Hard Problems
Versioning
• Views and UDFs
• Mapping to Hive metastore entities
Development lifecycle
• Git as source of truth
• Gradle for build
• LinkedIn tooling integration for deployment
![Page 45: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/45.jpg)
Early experiences with Dali views
How we executed
• Lots of work to get the infra ready
• Closed beta model
• Tons of training and education (hand holding) for all
• Governance body
Feedback from analysts is overwhelmingly positive:
+ Much simpler to share and standardize data cleansing code with peers
+ Provides effective insulation to scripts from upstream changes
- Harder to debug where problems are due to additional layer
![Page 46: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/46.jpg)
State of the world today
~100 producer views
~200 consumer views
~30% of UMP metrics use Dali data sources
~80 unique tracking event data sources
ProfileViews, MessagesSent, Searches, InvitationsSent, ArticlesRead, JobApplications, ...
![Page 47: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/47.jpg)
What’s next for Dali?
Real-time Views on streaming data
Selective materialization
Hive is an implementation detail, not a long term bet
Open source
Data Quality Framework
![Page 48: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/48.jpg)
Product Change
Technology
Culture & Process
Learnings
![Page 49: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/49.jpg)
Infrastructure enables, but culture really preserves
![Page 50: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/50.jpg)
For a great data ecosystem that can handle change:
1. Standardize core data entities
2. Create clear maintainable contracts between data producers & consumers
3. Ensure dialogue between data producers & consumers
![Page 51: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/51.jpg)
1. Standardize core data entities
• Event types and names: Page, Action, Impression
• Framework level client side tracking: views, clicks, flows
• For all else (custom) - guide when to create a new Event or Dali view
Navigation
Page View
Control Interaction
![Page 52: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/52.jpg)
2. Create clear maintainable contracts
1. Tracking specification with monitoring: clear, visual, consistent contract
   Tracking specification Tool
2. Dali dataset specification with data quality rules
Need tooling to support culture shift
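A dataset specification with data quality rules could look roughly like the following Python sketch. The slides do not show the actual specification format, so everything here is an assumption: the rule names, the `RULES` table, the `violations` helper, and the record fields (borrowed from the `entityView` sample earlier in the deck).

```python
from typing import Any, Callable, Dict, List, Mapping

Rule = Callable[[Mapping[str, Any]], bool]

# Hypothetical quality rules for a profile view dataset; each returns True when satisfied.
RULES: Dict[str, Rule] = {
    "viewerId is present": lambda r: bool(r.get("viewerId")),
    "vieweeId is present": lambda r: bool(r.get("vieweeId")),
    "viewType is profile-view": lambda r: r.get("viewType") == "profile-view",
}


def violations(record: Mapping[str, Any]) -> List[str]:
    """Return the names of all rules the record breaks, in declaration order."""
    return [name for name, rule in RULES.items() if not rule(record)]


good = {"viewType": "profile-view", "viewerId": "12345", "vieweeId": "23456"}
bad = {"viewType": "page-view", "viewerId": "12345"}

print(violations(good))
print(violations(bad))
```

Keeping the rules next to the dataset definition turns the contract into something both producers and consumers can read and monitor.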
![Page 53: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/53.jpg)
3. Ensure dialogue between Producers & Consumers
• Awareness: Train about end-to-end data pipeline, data modeling
• Instill communication & collaborative ownership process between all: a step-by-step playbook for who & how to develop and own tracking
PM → Analyst → Engineer → All3 → TestEng → Analyst
user engagement tracking data
metric scripts
production code
Member facing data products
Business facing decision making
![Page 54: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/54.jpg)
Product Change
Technology
Culture & Process
Learnings
![Page 55: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/55.jpg)
Our Learnings
Culture and Process
● Spend time to identify what needs culture & process, and what needs tools & tech
● Big changes can mean big opportunities
● Very hard to massively change things like data culture or data tech debt; never a good time to invest in "invisible" behind-the-scenes change
  → Make it non-invisible: try to clarify or size out the cost of NOT doing it
  → Needed strong leaders, and a village
Tech
● Must build tooling to support that culture change, otherwise culture will revert
● Work hard to make any new layer as frictionless as possible
● Virtual views on Hadoop data can work at scale! (Dali views)
![Page 56: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/56.jpg)
For a great data ecosystem that can handle change:
1. Standardize core data entities
2. Create clear maintainable contracts between data producers & consumers
3. Ensure dialogue between data producers & consumers
Design for change. Expect it. Embrace it.
![Page 57: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/57.jpg)
Did we succeed? We just handled another huge change!
#AnalyticsHappiness
![Page 58: Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem](https://reader031.fdocuments.in/reader031/viewer/2022022201/58890c211a28ab4a5c8b4f0d/html5/thumbnails/58.jpg)
Thank you.
@shirshanka, @yaelgarten