ENTRADA: An Open Source Platform for Network Data...

25
ENTRADA: An Open Source Platform for Network Data Analysis Moritz Müller | 2 nd YETI DNS Workshop 2016-11-12 in Seoul, South Korea

Transcript of ENTRADA: An Open Source Platform for Network Data...

Page 1: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

Title slide

ENTRADA: An Open Source Platform for Network Data Analysis Moritz Müller | 2nd YETI DNS Workshop

2016-11-12 in Seoul, South Korea

Page 2: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

SIDN

•  Domain name registry for .nl ccTLD

•  > 5,6 million domain names

•  2,5 million domain names secured with DNSSEC

•  SIDN Labs is the research team of SIDN

Page 3: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

DNS Data @SIDN

> 3.1 million distinct resolvers

> 1.3 billion queries daily

> 300 GB of PCAP data daily

Page 4: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

ENTRADA

•  Goal: data-driven improved security & stability of .nl

•  Problem: Existing solutions do not work well with large datasets and have limited analytical capabilities.

ENhanced Top-Level Domain Resilience through Advanced Data Analysis

Page 5: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

Requirements •  SQL support

•  Scalability

•  High performance

•  Capacity for >1 year of DNS data

•  Extensibility

•  Stability

•  Don’t spend too much money!

Page 6: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

Query Engine Options

Evaluated SQL and NoSQL solutions

•  Relational SQL (PostgreSQL)

•  MongoDB

•  Cassandra

•  Elasticsearch

•  Hadoop (HBASE + Apache Phoenix or Hive)

à SQL on Hadoop (Impala + Parquet +HDFS)

Engines galore!

Page 7: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel Hadoop

Node N+2 Hadoop

Node N+1 Hadoop Node N

SQL on Hadoop

HDFS

PARQUET

IMPALA

Best fit for our requirements

PARQUET

IMPALA

PARQUET

IMPALA

Page 8: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

Impala Query Engine •  MPP (massively parallel processing)

•  Low latency and high concurrency for BI/analytic queries on Hadoop

•  Excellent performance compared to other Hadoop based query engines

Page 9: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

Impala (2)

Data formats •  Text

•  Hadoop formats

•  Apache Avro

•  Apache Parquet

Interfaces •  Web-based GUI

•  Command line (impala-shell)

•  Python (Impyla)

•  JDBC

Page 10: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

row oriented!

column oriented!

data!

Apache Parquet

•  Why not just use the PCAP files?

•  Reading (compressed) PCAP data is just too slow

•  Analytical engines cannot read PCAP files

Page 11: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

HDFS •  Distributed file system for storing large volumes of data

•  High availability through replication of data blocks

•  Scalable to hundreds of PB’s and thousands of servers

Page 12: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

ENTRADA Architecture

Main components

• Data sources

•  Platform

•  Applications and services

•  Privacy framework

Page 13: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

management node

nano sized

2Gb/s network

location I location II location III data nodes data nodes

Cluster Design

Page 14: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

Workflow

Query data available for analysis within 10 minutes

name server PCAP

staging PCAP

decode

Import

Hadoop

Parquet

Impala Analyst

Enrich

Join

Filter

Monitoring Metrics

Page 15: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

Performance

1 Year of data is 2.2TB Parquet ~ 52TB of PCAP

Select day, month, year, count(1) from dns.queries where ipv=4 group by day, month, year

Example query, count # ipv4 queries per day.

Query response times

Page 16: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

Focussed on increasing the security and stability of .nl Use Cases

•  Visualize DNS patterns

•  Statistics (stats.sidnlabs.nl)

•  Scientific research

•  Support for operators

•  Real-time Phishing detection

•  Detect botnet infections

Page 17: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

Focussed on increasing the security and stability of .nl Use Cases

•  Visualize DNS patterns

•  Statistics (stats.sidnlabs.nl)

•  Scientific research

•  Support for operators

•  Real-time Phishing detection

•  Detect botnet infections

Page 18: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

Malicious Domain Detection with nDEWS

Observation: Phishing domains have unique query patterns

Random Sample Jan−−Mar, 2015

Days After Registration

Dai

ly Q

uerie

s

0 5 10 15 20 25 30

3.5

4.5

5.5

Phishing

Days After Registration

Dai

ly Q

uerie

s

0 5 10 15 20 25 30

050

150

Ref: Moura, G.C. M., M �uller, M., Wullink, M, Hesselman, C.: nDEWS: a new domains early warning system for TLDs . In: IEEE/IFIP AnNet 2016

Page 19: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

Newly Registered Domains

nDEWS Architecture

Get Query Characteristics

Registry DB

Every day workflow

ENTRADA

Cluster Domains

Legit Domains

Share with Registrar

Suspicious Domains

Σ PReq: popularity Σ PIPs: resolver diversity Σ PCC: country diversity Σ PASes: AS diversity

Page 20: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

50% tekst / 50% beeld

VOEG EEN FOTO IN:

1.  Verwijder de bestaande foto en klik op het icoon, om een foto in te voegen:

2.  Zoek de gewenste foto en dubbelklik hierop.

3.  Staat de afbeelding er niet goed in? Selecteer de foto, klik ‘Format’ in het lint en selecteer ‘Crop’.

4.  De afbeelding is nu te verschuiven, door met een linkermuisklik vast te houden op de afbeelding en de muis naar de gewenste richting te bewegen.

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

Resolver Reputation (RESREP)

Goal: Detect malicious activity by assigning reputation scores to resolvers

How: “fingerprinting” resolver behaviour

Page 21: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

50% tekst / 50% beeld

VOEG EEN FOTO IN:

1.  Verwijder de bestaande foto en klik op het icoon, om een foto in te voegen:

2.  Zoek de gewenste foto en dubbelklik hierop.

3.  Staat de afbeelding er niet goed in? Selecteer de foto, klik ‘Format’ in het lint en selecteer ‘Crop’.

4.  De afbeelding is nu te verschuiven, door met een linkermuisklik vast te houden op de afbeelding en de muis naar de gewenste richting te bewegen.

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

RESREP Concept Malicious activity:

• Spam-runs

• Botnets

• DNS-amplification attacks

ISPResolvers

DNSques0onsandresponses

Authorita0ve DNS.nl

.nlRegistry

Page 22: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

50% tekst / 50% beeld

VOEG EEN FOTO IN:

1.  Verwijder de bestaande foto en klik op het icoon, om een foto in te voegen:

2.  Zoek de gewenste foto en dubbelklik hierop.

3.  Staat de afbeelding er niet goed in? Selecteer de foto, klik ‘Format’ in het lint en selecteer ‘Crop’.

4.  De afbeelding is nu te verschuiven, door met een linkermuisklik vast te houden op de afbeelding en de muis naar de gewenste richting te bewegen.

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

RESREP Architecture

Resolvers

Root operator

Child operator

(example.nl)

ISP network

User

www.example.nl

� HTTP

ENTRADA Platform

AbuseHUB Abusedesk

RESREP Privacy Policy

.nl Privacy Board

RESREP service

��

Page 23: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

Conclusions and Future Work •  Hadoop HDFS + Parquet + Impala is a winning combination!

•  Running for over 2 years

•  > 150 billion queries stored

•  Develop more use cases •  DNS Abuse

•  Operational support

•  Increase the number of ENTRADA users

Page 24: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

It’s open source! •  Since January 2016

•  Project site:

entrada.sidnlabs.nl

•  GitHub:

github.com/SIDN/ENTRADA/

•  6 registries are using it

Page 25: ENTRADA: An Open Source Platform for Network Data Analysisyeti-dns.org/resource/2016workshop/...Introduction.pdf · • Elasticsearch • Hadoop (HBASE + Apache Phoenix or Hive) à

100% tekst

TEXT LEVELS

Plain text 1

2

3

• Eerste bullet niveau • Tweede bullet

niveau 4 Titel

Questions? Feedback?

Moritz Müller

Research Engineer

[email protected]

@dhr_moe

www.sidnlabs.nl

entrada.sidnlabs.nl