Nyc web perf-final-july-23
Dan Boutin
mPulse
What’s a Beacon?
www.w3.org/TR/Beacon
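The W3C Beacon specification defines navigator.sendBeacon(url, data), which asks the browser to deliver a small payload to a collection URL as an asynchronous HTTP POST, even while the page is unloading. Below is a minimal Python sketch of the receiving end. It assumes the page sends form-encoded data (for example a URLSearchParams object). The field names page and t_done, the port, and the in-memory store are hypothetical, and this is not a description of how mPulse's own collectors work.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_beacon(body: bytes) -> dict[str, str]:
    """Decode a form-encoded beacon body into a flat dict (first value per key)."""
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    return {key: values[0] for key, values in fields.items()}


class BeaconHandler(BaseHTTPRequestHandler):
    # Hypothetical in-memory store shared by all requests; a real collector
    # would hand beacons off to durable storage instead.
    received: list[dict[str, str]] = []

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        beacon = parse_beacon(self.rfile.read(length))
        self.received.append(beacon)
        # The page never sees the response to sendBeacon, so an empty 204 is enough.
        self.send_response(204)
        self.end_headers()


if __name__ == "__main__":
    # Hypothetical local endpoint: http://127.0.0.1:8080/
    HTTPServer(("127.0.0.1", 8080), BeaconHandler).serve_forever()
```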
Total beacons collected since 6/2013: ~85 billion
Run rate: over 3B per week and growing
Projected: ~175B by 1/1/16
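The projection can be roughly checked with a few lines of arithmetic. This sketch assumes the talk took place on July 23, 2015. The year is inferred from the 1/1/16 target and is not stated on the slide. It treats the slide's rounded figures as exact and compares a flat 3B-per-week extrapolation with the average weekly rate that the ~175B figure implies.

```python
from datetime import date

# Figures from the slide; TALK_DATE is an assumption (the deck title says "july-23").
TOTAL_SO_FAR = 85e9           # ~85 billion beacons collected since 6/2013
WEEKLY_RATE = 3e9             # "over 3B per week"
PROJECTED_TOTAL = 175e9       # "~175B by 1/1/16"
TALK_DATE = date(2015, 7, 23)
TARGET_DATE = date(2016, 1, 1)

weeks_left = (TARGET_DATE - TALK_DATE).days / 7

# Extrapolate the current run rate with no growth at all.
flat_projection = TOTAL_SO_FAR + WEEKLY_RATE * weeks_left

# Average weekly rate needed to reach the projected total by the target date.
required_rate = (PROJECTED_TOTAL - TOTAL_SO_FAR) / weeks_left

print(f"{weeks_left:.1f} weeks left")
print(f"flat run rate -> {flat_projection / 1e9:.0f}B beacons")
print(f"rate needed for ~175B -> {required_rate / 1e9:.2f}B per week")
```

Under these assumptions, 162 days (about 23.1 weeks) remain. A flat 3B per week reaches about 154B, so the ~175B projection depends on the "and growing" part: the run rate would need to average roughly 3.9B per week.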
Big Data Challenges
Data scientists spend too much time ‘data wrangling’
“Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.”
NY Times – August 17th, 2014
Big Data Challenges
Building a data science platform is very difficult
Infrastructure
• Choosing big data technologies and setting up a cluster can easily take 9 months or more
Data Pipeline
• Building a high-performing big data schema requires specialized skills
• Extracting, transforming, and loading data (data wrangling) is an enormous time sink and a poor use of data scientists’ time
Analysis and Workflow
• Figuring out how to ask questions of the data and how to visualize the results takes time that data scientists should be spending on generating actionable insights from their studies
Julia Language & IJulia Notebook UI
Julia is a rising star in scientific programming:
• processing speed
• support for parallel processing
• compatibility with 400+ prebuilt statistical packages
• a large and growing number of visualization libraries
Trade-Offs
Why Julia?
R vs Python vs Julia
• Modern compiler technology
• Data connectivity
• Package ecosystem
• Functional programming constructs
• Integration with Python, C, C++, R, …
© 2014 SOASTA. All rights reserved. April 15, 2023 8
Trade-Offs
• Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud.
• Columnar Database
• Extremely fast query times
• Attractive Economics
Hadoop vs BigQuery vs Redshift vs …
Capability: managed big data up to 2 petabytes
Cloud economics: $1,000 per TB per month
Why Redshift?
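A toy table can show why a columnar layout speeds up analytic queries. An aggregate over one column only needs that column's values, while a row-oriented layout reads every field of every row. The Python below is an illustrative model: the column names are hypothetical, and the "values read" counts are modeled rather than measured. It is not a description of how Redshift lays out blocks on disk.

```python
from statistics import mean

# Toy beacon table; column names and values are hypothetical.
COLUMNS = ("page", "country", "load_ms")
rows = [
    ("/home", "US", 1200),
    ("/cart", "DE", 2300),
    ("/home", "FR", 900),
]


def to_columnar(rows, columns):
    """Pivot row tuples into one list per column, the layout a column store keeps."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def avg_load_row_store(rows):
    # Model: a row-oriented store reads each row in full, so count every field.
    values_read = sum(len(row) for row in rows)
    return mean(row[2] for row in rows), values_read


def avg_load_column_store(table):
    # Model: a column store reads only the column the query asks for.
    column = table["load_ms"]
    return mean(column), len(column)


table = to_columnar(rows, COLUMNS)
print(avg_load_row_store(rows))
print(avg_load_column_store(table))
```

Both functions return the same average, but the modeled scan touches 9 values in the row layout and 3 in the columnar one. On wide tables with dozens of columns, that gap is the source of the "extremely fast query times" claim.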
Data Science Workbench
Data science without the data wrangling, and much more
Infrastructure
Data Pipeline
Analysis and Workflow
• Data Science Workbench comes with the state-of-the-art technology you need to analyze your customer experiences
• All of the real user beacon data is loaded into Data Science Workbench in a highly optimized schema, ready for analysis
• Data science is done with Julia, a remarkably fast, in-memory solution for analyzing huge data sets
• Access to an ever-growing library of analysis functions and visualizations based on SOASTA’s and our customers’ expertise
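The following hypothetical Python helper illustrates the kind of reusable analysis function such a library might contain. It computes a nearest-rank percentile of page load time for each page. The Workbench itself uses Julia. The function names, the (page, load_ms) input shape, and the choice of nearest-rank percentiles are all assumptions made for this sketch.

```python
import math
from collections import defaultdict


def nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    # pct * n is computed first so whole-number ranks stay exact before ceil().
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_by_page(beacons, pct=50):
    """Hypothetical helper: group (page, load_ms) pairs and return each page's percentile."""
    groups = defaultdict(list)
    for page, load_ms in beacons:
        groups[page].append(load_ms)
    return {page: nearest_rank(times, pct) for page, times in groups.items()}


sample = [("/home", 1200), ("/home", 900), ("/home", 3000), ("/cart", 2300)]
print(percentile_by_page(sample))       # median load time per page
print(percentile_by_page(sample, 95))   # 95th percentile per page
```

Nearest-rank always returns a value that actually occurred in the data. Interpolating definitions, which many statistics packages use by default, can return a value between two observations.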
The Result!
• Every customer beacon is unpacked, transformed, and loaded nightly by SOASTA into a SOASTA-designed schema in Amazon Redshift. This process is designed, supplied, and supported by SOASTA.
• Amazon Redshift is an extremely inexpensive and powerful big data database that can scale to almost 2 petabytes. Amazon estimates compute and storage costs of $1,000/TB/month for our implementation.
• An online, interactive interface for exploring, discovering, and developing, based on the Julia scientific programming language developed at MIT and the IJulia Notebook UI
• SOASTA-developed functions and statistical models
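The nightly unpack, transform, and load step can be sketched in Python as a small extract-transform-load job. In this sketch, sqlite3 stands in for Amazon Redshift. The table layout and the beacon field names (page, t_done) are hypothetical and do not represent SOASTA's schema. Malformed beacons are dropped during the transform step.

```python
import sqlite3
from urllib.parse import parse_qs

# Hypothetical target schema; sqlite3 stands in for Amazon Redshift here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS beacons (
    beacon_date TEXT    NOT NULL,
    page        TEXT    NOT NULL,
    load_ms     INTEGER NOT NULL
)
"""


def transform(raw: str, beacon_date: str):
    """Unpack one raw form-encoded beacon into a typed row, or None if it is malformed."""
    fields = parse_qs(raw)
    try:
        return (beacon_date, fields["page"][0], int(fields["t_done"][0]))
    except (KeyError, ValueError):
        return None  # missing field or non-numeric timing


def nightly_load(conn: sqlite3.Connection, raw_beacons, beacon_date: str) -> int:
    """Extract, transform, and load one day's beacons; returns the number of rows loaded."""
    rows = [row for raw in raw_beacons if (row := transform(raw, beacon_date)) is not None]
    with conn:  # one transaction per nightly batch; commits on success
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO beacons VALUES (?, ?, ?)", rows)
    return len(rows)
```

On Redshift itself, bulk loads are normally staged in Amazon S3 and ingested with the COPY command rather than with row-by-row INSERT statements. The transform logic would stay the same.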
procedure Traffic is

   type Airplane_ID is range 1..10;             -- 10 airplanes

   task type Airplane (ID: Airplane_ID);        -- task representing airplanes, with ID as initialisation parameter
   type Airplane_Access is access Airplane;     -- reference type to Airplane

   protected type Runway is                     -- the shared runway (protected to allow concurrent access)
      entry Assign_Aircraft (ID: Airplane_ID);  -- all entries are guaranteed mutually exclusive
      entry Cleared_Runway (ID: Airplane_ID);
      entry Wait_For_Clear;
   private
      Clear: Boolean := True;                   -- protected private data - generally more than just a flag...
   end Runway;
   type Runway_Access is access all Runway;
Trivia Time!
@DanBoutinSOASTA
1983 / 1995
Thank You!
Dan Boutin – Senior Product
Mobile (404) 304-9529
@DanBoutinSOASTA