Querying The Internet With PIER Nitin Khandelwal.

Page 1: Querying The Internet With PIER Nitin Khandelwal.

Querying The Internet With PIER

Nitin Khandelwal

Page 2: Querying The Internet With PIER Nitin Khandelwal.

Motivation

- Inject a degree of distribution into databases
- Internet-scale systems vs. hundred-node systems
- Large-scale applications requiring database functionality

Page 3: Querying The Internet With PIER Nitin Khandelwal.

Applications

P2P Databases
- Highly distributed and available data

Network Monitoring
- Intrusion detection
- Fingerprint queries

Page 4: Querying The Internet With PIER Nitin Khandelwal.

Design Principles 

Relaxed Consistency
- Sacrifice consistency in favor of availability and partition tolerance

Organic Scaling
- Growth with deployment

Natural Habitats for Data
- Data remains in its original format, with a DB interface

Standard Schemas
- Achieved through common software

Page 5: Querying The Internet With PIER Nitin Khandelwal.

DHTs

Implemented with CAN (Content Addressable Network).

- Each node is identified by a hyper-rectangle (zone) in d-dimensional space.
- A key is hashed to a point and stored at the node whose zone contains that point.
- Each node maintains a routing table of its neighbours: O(d) entries.
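The key-to-node mapping described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not CAN's actual implementation: the names `Zone`, `key_to_point` and `owner` are hypothetical, the per-dimension SHA-1 hashing is an arbitrary choice, and the owner is found by a linear scan rather than by neighbour-to-neighbour routing.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Hyper-rectangle owned by one node: per-dimension [low, high) bounds."""
    node: str
    low: tuple
    high: tuple

    def contains(self, point):
        return all(lo <= x < hi for lo, x, hi in zip(self.low, point, self.high))


def key_to_point(key, d):
    """Hash a key to a point in the unit d-cube (hash choice is an assumption)."""
    coords = []
    for dim in range(d):
        digest = hashlib.sha1(f"{dim}:{key}".encode()).digest()
        # 48 bits keep the division exact, so every coordinate is in [0, 1).
        coords.append(int.from_bytes(digest[:6], "big") / 2**48)
    return tuple(coords)


def owner(zones, point):
    """Return the node whose zone contains the point (linear scan, no routing)."""
    for zone in zones:
        if zone.contains(point):
            return zone.node
    raise LookupError("zones do not cover the point")


# Illustrative layout: d = 2, four nodes owning one quadrant each.
zones = [
    Zone("A", (0.0, 0.0), (0.5, 0.5)),
    Zone("B", (0.5, 0.0), (1.0, 0.5)),
    Zone("C", (0.0, 0.5), (0.5, 1.0)),
    Zone("D", (0.5, 0.5), (1.0, 1.0)),
]
print(owner(zones, key_to_point("some-key", 2)))
```

In a real deployment no node sees all zones; each node knows only its neighbours and forwards a lookup towards the target point.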

Page 6: Querying The Internet With PIER Nitin Khandelwal.

DHT Design

Routing Layer
- Mapping for keys (dynamic as nodes leave and join)

Storage Manager
- DHT-based data

Provider
- Storage access interface for higher levels

Page 7: Querying The Internet With PIER Nitin Khandelwal.

Provider

Couples the routing and storage layers

- namespace: relation
- resourceId: primary key (namespace + resourceId >> key)
- instanceId: distinguishes objects with the same namespace and resourceId
- lifetime: item storage duration

Also: LScan, Multicast, Newdata
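How the four naming fields might fit together can be sketched as below. This is a single-node illustration under assumptions, not PIER's provider: `dht_key`, `Item` and `LocalStore` are hypothetical names, SHA-1 over the concatenated fields is an assumed way of deriving the key, and routing to the responsible node is omitted.

```python
import hashlib
import time
from dataclasses import dataclass


@dataclass
class Item:
    namespace: str
    resource_id: str
    instance_id: int
    value: object
    expires_at: float  # soft state: the item disappears after its lifetime


def dht_key(namespace, resource_id):
    """Derive the DHT key from namespace + resourceId (hash choice is an assumption)."""
    return hashlib.sha1(f"{namespace}\x00{resource_id}".encode()).hexdigest()


class LocalStore:
    """Storage held by one node; routing to that node is not modelled."""

    def __init__(self):
        self._items = {}  # (key, instance_id) -> Item

    def put(self, namespace, resource_id, instance_id, value, lifetime, now=None):
        now = time.time() if now is None else now
        key = dht_key(namespace, resource_id)
        self._items[(key, instance_id)] = Item(
            namespace, resource_id, instance_id, value, now + lifetime
        )
        return key

    def get(self, namespace, resource_id, now=None):
        """Return every unexpired instance stored under the same key."""
        now = time.time() if now is None else now
        key = dht_key(namespace, resource_id)
        return [
            item.value
            for (k, _), item in self._items.items()
            if k == key and item.expires_at > now
        ]
```

Two items with the same namespace and resourceId land under the same key and are told apart only by instanceId, which is why `get` returns a list.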

Page 8: Querying The Internet With PIER Nitin Khandelwal.

PIER Query Processor

- Operators: selection, projection, joins, grouping, aggregation
- Operators push and pull data
- Relaxed consistency and reachable snapshot:
  - Ideally, work with the nodes reachable at query issue.
  - Instead, use the arrival of the query multicast message at each node.

Page 9: Querying The Internet With PIER Nitin Khandelwal.

Join Algorithm

- R, S: relations
- Nr, Ns: relation namespaces
- Nq: DHT-based temporary table

Symmetric Hash Join:
- Rehashes the relations
- Scan and copy into the new namespace Nq

Fetch Matches:
- One relation (S) is already hashed on the join attribute
- Selections on non-join attributes of S cannot be pushed into the DHT
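The symmetric hash join above can be sketched as follows. This is an illustration under assumptions rather than PIER's code: `nq_node`, `NqNode` and `symmetric_hash_join` are hypothetical names, the CRC32-based placement stands in for the DHT's hashing of the join value into Nq, and message passing is replaced by a single loop over arrivals.

```python
import zlib
from collections import defaultdict


def nq_node(join_value, num_nodes):
    """Pick the Nq node for a join value (stand-in for the DHT's key hashing)."""
    return zlib.crc32(repr(join_value).encode()) % num_nodes


class NqNode:
    """One node of the temporary namespace Nq, holding a hash table per relation."""

    def __init__(self):
        self.tables = {"R": defaultdict(list), "S": defaultdict(list)}

    def on_arrival(self, side, join_value, tup):
        """Store the rehashed tuple, then probe the opposite relation's table."""
        other = "S" if side == "R" else "R"
        self.tables[side][join_value].append(tup)
        matches = self.tables[other].get(join_value, [])
        return [(tup, m) if side == "R" else (m, tup) for m in matches]


def symmetric_hash_join(arrivals, r_key, s_key, num_nodes=4):
    """arrivals: iterable of ("R" | "S", tuple) in the order they reach Nq."""
    nodes = [NqNode() for _ in range(num_nodes)]
    results = []
    for side, tup in arrivals:
        join_value = r_key(tup) if side == "R" else s_key(tup)
        node = nodes[nq_node(join_value, num_nodes)]
        results.extend(node.on_arrival(side, join_value, tup))
    return results


arrivals = [
    ("R", (1, "a")), ("S", (1, "x")), ("R", (2, "b")),
    ("S", (3, "y")), ("R", (1, "c")),
]
print(symmetric_hash_join(arrivals, lambda t: t[0], lambda t: t[0]))
```

Because both relations are rehashed on the join value, matching R and S tuples always meet at the same Nq node, whichever of them arrives first.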

Page 10: Querying The Internet With PIER Nitin Khandelwal.

Join Rewriting

Aimed at lowering bandwidth utilization

Symmetric semi-join
- Local projections to resource ID + join keys
- Symmetric Hash Join on the two projections
- Global Fetch Matches join using the resource IDs of R and S

Bloom joins (hashed semi-join)
- A Bloom filter is a hashing-based bit vector
- Local Bloom filters are published into temporary namespaces
- Filters are OR-ed and multicast to the opposite relation's nodes
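The Bloom-filter steps above can be sketched as follows. The class is a generic Bloom filter, not PIER's; the bit-vector size, the number of hash functions and the SHA-1-based hashing are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Hashing-based bit vector: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit vector

    def _positions(self, value):
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{value!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))

    def union(self, other):
        """OR two filters built with identical parameters."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


# Two nodes holding fragments of R each build a local filter on the join key.
local_a, local_b = BloomFilter(), BloomFilter()
for join_key in (1, 2):
    local_a.add(join_key)
local_b.add(3)
r_filter = local_a.union(local_b)

# A node holding S rehashes only the tuples whose join key passes R's filter.
s_tuples = [(1, "x"), (99, "z")]
to_rehash = [t for t in s_tuples if r_filter.might_contain(t[0])]
```

A tuple that passes the filter may still fail to join (a false positive), so the filter only reduces what is shipped; the join itself still decides the result.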

Page 11: Querying The Internet With PIER Nitin Khandelwal.

Workload Parameters 

- CAN configuration: d = 4
- R is 10 times larger than S
- Constants provide 50% selectivity
- f(x,y) is evaluated after the join
- 90% of R tuples match a tuple in S
- Result tuples are 1 KB each
- Symmetric hash join is used

Page 12: Querying The Internet With PIER Nitin Khandelwal.

Simulation Setup 

- Up to 10,000 nodes
- Network cross-traffic, CPU and memory utilization are ignored
- Data is shipped from source to computation node for every query operation
- Network models:
  1. Fully connected links with 100 ms latency and 10 Mbps bandwidth
  2. GT-ITM transit-stub topology (similar results)

Page 13: Querying The Internet With PIER Nitin Khandelwal.

Join Algorithms

- Infinite bandwidth (to observe the impact of propagation delay alone)
- 1024 data and computation nodes
- Core join algorithms: perform faster
- Rewrites:
  - Bloom filter: two multicasts
  - Semi-join: two CAN lookups

Page 14: Querying The Internet With PIER Nitin Khandelwal.

Join Algorithms -- 2

Limited bandwidth

Symmetric Hash Join:
- Rehashes both tables

Semi-joins:
- Transfer only matching tuples

At 40% selectivity, the bottleneck switches from the computation nodes to the query sites.

Page 15: Querying The Internet With PIER Nitin Khandelwal.

Conclusions

Scalability of PIER derives from relaxed design principles
- adoption of soft state
- dilated snapshot semantics

Limitation: only equality predicates

Directions:
- Pushdown of selections into the DHT
- Caching and replication of DHT data
- Catalog manager, with stringent consistency and availability requirements

Page 16: Querying The Internet With PIER Nitin Khandelwal.

Sophia: An Information Plane

Nitin Khandelwal

Page 17: Querying The Internet With PIER Nitin Khandelwal.

Shared Information Plane

Distributed system running throughout the network.

- Collects information about network elements
  - Local state (load, memory usage) and local perspective (reachability of other nodes)
- Evaluates statements (questions) about the state
- Reacts according to conclusions
  - e.g. killing a misbehaving service

Page 18: Querying The Internet With PIER Nitin Khandelwal.

Challenges

- Information is widely distributed and dynamic
- Statements are formulated at run-time, not a priori
- Centralized analysis is not practical
  - Push analysis to the nodes (push into the network)

Page 19: Querying The Internet With PIER Nitin Khandelwal.

Approach

Use a logic programming model
- The system is dynamic and distributed, therefore temporal and positional logic

Why?
- Expressivity: intuitive to make statements about the state of the system
- Performance:
  - Logic expression transformation for efficient evaluation
  - Partial results caching

Page 20: Querying The Internet With PIER Nitin Khandelwal.

Time and Position in the Language

Every term in the system has an environment containing time and location

Eval( bandwidth( env (at(node(Node),

time(Time),

Time > 1032445465,

BwVar),

BwVar > 40000))
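One plausible reading of the expression above is "find the nodes and times, later than 1032445465, at which bandwidth exceeded 40000". The sketch below illustrates that reading only; it is not Sophia's evaluator, and `Env`, `Fact` and `eval_bandwidth` are hypothetical names. The sample readings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    """Environment attached to every term: where and when it holds."""
    node: str
    time: int


@dataclass(frozen=True)
class Fact:
    name: str
    env: Env
    value: float


def eval_bandwidth(facts, min_time, min_bw):
    """Return (node, time, bandwidth) bindings satisfying both constraints."""
    return [
        (f.env.node, f.env.time, f.value)
        for f in facts
        if f.name == "bandwidth" and f.env.time > min_time and f.value > min_bw
    ]


# Made-up sample readings from two nodes.
facts = [
    Fact("bandwidth", Env("n1", 1032445400), 90000),  # too old
    Fact("bandwidth", Env("n1", 1032445500), 50000),  # matches
    Fact("bandwidth", Env("n2", 1032445600), 30000),  # too slow
    Fact("load", Env("n2", 1032445600), 99999),       # different predicate
]
print(eval_bandwidth(facts, 1032445465, 40000))
```

Here the unbound variables Node, Time and BwVar correspond to the three fields of each returned binding.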

Page 21: Querying The Internet With PIER Nitin Khandelwal.

Performance

Aggressive caching:
- Evaluation results are cached
- Sometimes latency is more important than freshness
- Time environment is used to control freshness

Scheduling:
- Pre-schedule results to be available when and where they may be needed
- Cache can be refreshed with fresh values
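The trade-off between latency and freshness can be sketched with a cache whose caller states how old a result may be. This is an illustration under assumptions, not Sophia's cache: `FreshnessCache`, `max_age` and the injectable `clock` are hypothetical.

```python
import time


class FreshnessCache:
    """Caches evaluation results; the caller decides how stale is acceptable."""

    def __init__(self, evaluate, clock=time.time):
        self._evaluate = evaluate  # function: term -> value (the slow path)
        self._clock = clock
        self._entries = {}         # term -> (time evaluated, value)

    def get(self, term, max_age):
        """Return a cached value if it is at most max_age seconds old."""
        hit = self._entries.get(term)
        if hit is not None and self._clock() - hit[0] <= max_age:
            return hit[1]
        return self.refresh(term)

    def refresh(self, term):
        """Re-evaluate now; a scheduler could call this ahead of demand."""
        value = self._evaluate(term)
        self._entries[term] = (self._clock(), value)
        return value
```

A caller that cares about latency passes a large `max_age`; one that needs fresh data passes a small one and pays for re-evaluation. Calling `refresh` from a scheduler corresponds to pre-scheduling results before they are requested.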

Page 22: Querying The Internet With PIER Nitin Khandelwal.

Evaluation Planning

Given an expression, plan
- where (close to the data)
- when (the time when dependencies are resolved)
- what to evaluate

Logic expressions can be transformed at runtime

Page 23: Querying The Internet With PIER Nitin Khandelwal.

Extensibility

- Users can add new functionality at run-time
- Capabilities: to protect modules, grant and revoke privileges

  cap569354(Val) :- read sensor.
  cap435456(Val) :- cap569354(Val).
  bandwidth(Val) :- cap435456(Val).

- Module protection: all predicates are transformed into capabilities, shared through a master-key capability
- Danger in caching: different interfaces

Page 24: Querying The Internet With PIER Nitin Khandelwal.

PIER and Sophia

Sophia: the location of code execution is explicit in the language and can itself be computed in the course of evaluation.

PIER: details of query execution left to underlying implementation to optimize.

Consequence: Sophia queries are more sophisticated, since both user and system participate in evaluation planning.