Post on 17-Mar-2021
1@arnon86 S7456
Leveraging Customer Behavioral Data to Drive Revenue
the GPU way
Hi! Arnon Shimoni
Senior Solutions Architect
I like hardware & parallel / concurrent stuff
In my 4th year at SQream Technologies
Send gifs to @arnon86 or arnon@sqream.com
tl;dr
• GPUs are good number crunchers, which makes them good for data processing
• SQream DB with GPUs is fast
• Rethink current solutions, the GPU can help
• Simple hardware is good enough. Let’s avoid throwing lots of hardware at issues; there’s no need to shovel money at the problem!
SQream DB – an SQL database powered by GPUs
Fast
• Columnar storage
• Always-on compression
• 2 TB / hour / GPU ingest speed
Scalable
• 10 TB to 1 PB with ease
SQL Database
• Familiar ANSI SQL
• Standard connectors (ODBC, JDBC)
Extensible for AI
• Python, Jupyter, etc.
• Data science
Powered by GPUs
• Massively parallel engine
• Relies on GPUs for power, not RAM
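To put the quoted ingest speed in perspective, here is a back-of-the-envelope sketch of load times at 2 TB / hour / GPU. The function name is hypothetical, and the linear scaling with GPU count is an assumption made for illustration, not a figure from the slide.

```python
# Rough load-time estimate from the ingest rate quoted on the slide.
INGEST_TB_PER_HOUR_PER_GPU = 2.0  # "2 TB / hour / GPU ingest speed"


def estimated_load_hours(data_tb: float, gpus: int = 1) -> float:
    """Hours needed to ingest `data_tb` terabytes.

    Assumes throughput scales linearly with the number of GPUs,
    which is an illustrative assumption, not a measured result.
    """
    if data_tb < 0 or gpus < 1:
        raise ValueError("data_tb must be >= 0 and gpus must be >= 1")
    return data_tb / (INGEST_TB_PER_HOUR_PER_GPU * gpus)


if __name__ == "__main__":
    # 10 TB, 40 TB and 1 PB (taken here as 1000 TB) on a single GPU.
    for size_tb in (10, 40, 1000):
        hours = estimated_load_hours(size_tb)
        print(f"{size_tb:>5} TB on 1 GPU: about {hours:.1f} hours")
```

Real ingest times will depend on the source format, storage and network, so treat these numbers as an upper-level sanity check only.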
This story starts at MWC last year
That’s my ear!
SQream knows telecoms
We’ve helped operators with
• Better analysis of network events
• Speeding up CDR preparations
• More history with security management (SIEM)
• And now – customer behaviour
There is a lot of data about customers in telecoms
• Where and when they wake up and where they spend their days (daily grinders)
• When and where they were Instagramming (when and where data was used)
• How frustrated they got (what the network experience was in each location)
• What modes of transport they use
• How close they are to competitor locations
But are they actually using this data? Are they getting anything actionable?
Are they looking at the entire customer base, and not just a single customer?
“You know, Telefonica has this multi-million dollar product based on Hadoop for selling this customer behaviour data to 3rd party companies.
Have you thought about maybe getting the same solution for your company, but much simpler?”
“Oh, and we’ll do it for you with a single machine”
Why their current setup wasn’t good enough for this
• Data scientists and BI professionals have only short windows of time to run queries, because of overloaded systems
• Windows cut even shorter due to long overnight loading
• Queries take hours, and iterations become painful
Long queries → Coffee breaks → Bathroom breaks → Unhappy managers → Unhappy everyone
Databases that displease data scientists
• When data scientists or BI professionals want to ask questions that no one has asked before, these systems tend to ‘break’ and not deliver what’s expected
• They’re just not designed for ad-hoc querying
• Legacy databases require indexing and a lot of manual tuning
• Newer databases like Vertica also require creating projections, which is time-consuming and inflexible
• Distributed databases don’t perform well when JOIN operations are necessary
• In-memory databases are very painful on the wallet if you need more than a couple of terabytes
Picking the wrong databases will cause pain!
Just some of what we saw
• Cloudera – for the BI team
• Teradata – for the marketing team
• Oracle Exadata – transactional, for CDR collection and customer records
• Vertica, Netezza – for financial
• Lots of Greenplum – to collect from many sources, for marketing and BI
Chanel says racks are fashionable. Our customers think otherwise
SQream DB software in a standard 2U server
Configured with 96 GB RAM and a single Tesla K80
for a $4,000 total investment.
Designed to handle ~40 TB of telecom data
Sample dashboards generated
Dashboard showing 3G/4G data throughput throughout the day (Morning, Lunch, Evening, Night, …).
Larger circles represent more data throughput.
Colour becomes darker as the day progresses.
Dark-outline circles mean more night-time traffic.
Dashboard aggregates directly off SQream DB, with no intermediate steps.
Represents a 3-table join (3.3B rows ⋈ 40M rows ⋈ 300K rows)
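A dashboard like this boils down to one aggregate query over a three-table join. The sketch below shows the general shape of such a query. The table names, columns and day-part boundaries are all made up for illustration (the real schema is not shown in the deck), and it runs on an in-memory SQLite database only so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: the deck does not show the real tables or columns.
SCHEMA = """
CREATE TABLE cells (cell_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE subscribers (sub_id INTEGER PRIMARY KEY, plan TEXT);
CREATE TABLE usage_events (
    sub_id INTEGER, cell_id INTEGER, hour INTEGER, network TEXT, mb REAL
);
"""

# Large fact table joined to two smaller dimension tables, then
# aggregated per city, plan and part of the day.
QUERY = """
SELECT c.city,
       s.plan,
       CASE
           WHEN e.hour BETWEEN 6 AND 10 THEN 'Morning'
           WHEN e.hour BETWEEN 11 AND 14 THEN 'Lunch'
           WHEN e.hour BETWEEN 15 AND 21 THEN 'Evening'
           ELSE 'Night'
       END AS day_part,
       SUM(e.mb) AS total_mb
FROM usage_events AS e
JOIN subscribers AS s ON s.sub_id = e.sub_id
JOIN cells AS c ON c.cell_id = e.cell_id
WHERE e.network IN ('3G', '4G')
GROUP BY c.city, s.plan, day_part
ORDER BY c.city, s.plan, day_part;
"""


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cells VALUES (?, ?)",
                     [(1, "North"), (2, "South")])
    conn.executemany("INSERT INTO subscribers VALUES (?, ?)",
                     [(10, "prepaid"), (11, "postpaid")])
    conn.executemany(
        "INSERT INTO usage_events VALUES (?, ?, ?, ?, ?)",
        [
            (10, 1, 8, "4G", 120.0),
            (11, 1, 9, "3G", 30.0),
            (10, 2, 13, "4G", 50.0),
            (11, 2, 23, "4G", 200.0),
            (11, 2, 2, "3G", 25.0),
        ],
    )
    return conn


def throughput_by_day_part(conn: sqlite3.Connection) -> list:
    """Run the three-table join and return the aggregated rows."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in throughput_by_day_part(demo_connection()):
        print(row)
```

The point of the slide is that the BI tool issues a query of this kind directly against the database, with no pre-aggregated tables in between.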
Saving hours on reporting with SQream DB
Augmenting legacy MPP with a faster, easier to use GPU-powered analytics database
Data sources: CDR 4G, CDR 3G, Non CDR
• Legacy: ETL process and aggregations on 80 nodes, 5 hours
• SQream DB: direct loading at a 2 TB/h ingest rate, 20 minutes
Output: dozens of reports
15x faster
The cost of performance
Legacy MPP: 80 nodes – 5 full racks, 960 CPU cores, 5.12 TB RAM
SQream DB v1.9.6: HP DL380g9 with NVIDIA Tesla K80, 96 GB RAM + 6 TB storage
• ETL time: 300 m vs 20 m (15x faster)
• Reporting time: 120 m vs 10 m (12x faster)
• TCO w/license: $10,000,000 vs $200,000 (50x more cost effective)
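The headline multipliers on this slide can be checked with simple division. The sketch below does that. The pairing of each number with a system is an assumption read off the chart labels, and the names used in the code are hypothetical.

```python
# Figures as read off the chart: (legacy MPP, SQream DB).
# The pairing of values to systems is an assumption based on the labels.
FIGURES = {
    "ETL time (minutes)": (300, 20),
    "Reporting time (minutes)": (120, 10),
    "TCO with license (USD)": (10_000_000, 200_000),
}


def improvement_factor(legacy: float, sqream: float) -> float:
    """How many times smaller the SQream DB figure is than the legacy one."""
    if sqream <= 0:
        raise ValueError("sqream figure must be positive")
    return legacy / sqream


if __name__ == "__main__":
    for metric, (legacy, sqream) in FIGURES.items():
        factor = improvement_factor(legacy, sqream)
        print(f"{metric}: {legacy:,} vs {sqream:,} = {factor:.0f}x")
```

With these inputs the ratios come out to 15, 12 and 50, matching the "15x", "12x" and "50x" labels.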
That wasn’t an anomaly
We’ve done it against Netezza, Teradata, Oracle, Vertica, and even Hadoop-based systems.
Netezza: 8 full 42U racks, 56 S-Blades, 7 TB RAM
SQream DB v1.9.7: Dell C4130 with 4x NVIDIA Tesla K80, 512 GB RAM + iSCSI JBOD (20 TB)
• Average query time (seconds): 33.70 vs 31.70
• Processing units (S-Blades / GPUs): 56 vs 4
• Compression ratio: 4.0 vs 4.7
• Cost of ownership: $12,000,000 vs $500,000
Find out more about SQream’s high performance
GPU-driven database software
www.sqream.com or arnon@sqream.com