Toward Validation and Control of Network Models


Page 1: Toward Validation and Control of Network Models

1

Toward Validation and Control of Network Models

Michael Mitzenmacher

Harvard University

Page 2: Toward Validation and Control of Network Models

2

Internet Mathematics

The Future of Power Law Research

Articles Related to This Talk

A Brief History of Generative Models for Power Law and Lognormal Distributions

Page 3: Toward Validation and Control of Network Models

3

Motivation: General

• Network Science and Engineering is emerging as its own (sub)field.

– NSF: cross-cutting area starting this year.

– Courses: Cornell (Easley/Kleinberg), U Penn (Kearns), many others.

• For undergrads, not just grads!

– In popular culture: books like Linked by Barabasi or Six Degrees by Watts.

– Other sciences: Economics, biology, physics, ecology, linguistics, etc.

• What has been and what should be the research agenda?

Page 4: Toward Validation and Control of Network Models

4

My (Biased) View

• The 5 stages of networking research.

1) Observe: Gather data to demonstrate a behavior in a system. (Example: power law behavior.)

2) Interpret: Explain the importance of this observation in the system context.

3) Model: Propose an underlying model for the observed behavior of the system.

4) Validate: Find data to validate (and if necessary specialize or modify) the model.

5) Control: Design ways to control and modify the underlying behavior of the system based on the model.

Page 5: Toward Validation and Control of Network Models

5

My (Biased) View

• In networks, we have spent a lot of time observing and interpreting behaviors.

• We are currently very active in modeling.

– Many, many possible models.

– Perhaps easiest to write papers about.

• We need to now put much more focus on validation and control.

– Have been moving in this direction.

– And these are specific areas where computer science has much to contribute!

Page 6: Toward Validation and Control of Network Models

6

Models

• After observation, the natural step is to explain/model the behavior.

• Outcome: lots of modeling papers.

– And many models rediscovered.

• Example : power laws

• Lots of history…

Page 7: Toward Validation and Control of Network Models

7

History

• In the 1990s, the abundance of observed power laws in networks surprised the community.

– Perhaps they shouldn’t have… power laws appear frequently throughout the sciences.

• Pareto: income distribution, 1897

• Zipf-Auerbach: city sizes, 1913/1940s

• Zipf-Estoup: word frequency, 1916/1940s

• Lotka: bibliometrics, 1926

• Yule: species and genera, 1924

• Mandelbrot: economics/information theory, 1950s+

• Observation/interpretation were/are key to initial understanding.

• My claim: but now the mere existence of power laws should not be surprising, or necessarily even noteworthy.

• My (biased) opinion: The bar should now be very high for observation/interpretation.

Page 8: Toward Validation and Control of Network Models

8

So Many Models…

• Preferential Attachment

• Optimization (HOT)

• Monkeys typing randomly (scaling)

• Multiplicative processes

• Kronecker graphs

• Forest fire model (densification)

Page 9: Toward Validation and Control of Network Models

9

What Makes a Good Model…

• New variations coming up all of the time.

• Question: What makes a new network model sufficiently interesting to merit attention and/or publication?

– Strong connection to an observed process.

• Many models claim this, but few demonstrate it convincingly.

– Theory perspective: significant new mathematical insight or sophistication.

• A matter of taste?

• My (biased) opinion: the bar should start being raised on model papers.

Page 10: Toward Validation and Control of Network Models

10

Validation: The Current Stage

• We now have so many models.

• It is important to know the right model, to extrapolate and control future behavior.

• Given a proposed underlying model, we need tools to help us validate it.

• We appear to be entering the validation stage of research… BUT the first steps have focused on invalidation rather than validation.

Page 11: Toward Validation and Control of Network Models

11

Examples: Invalidation

• Lakhina, Byers, Crovella, Xie

– Show that the observed power law in Internet topology might be due to biases in traceroute sampling.

• Pedarsani, Figueiredo, Grossglauser

– Show that densification may also arise from sampling approaches, and is not necessarily intrinsic to the network.

• Chen, Chang, Govindan, Jamin, Shenker, Willinger

– Show that Internet topology has characteristics that do not match preferential-attachment graphs.

– Suggest an alternative mechanism.

• But does this alternative match all characteristics, or are we still missing some?

Page 12: Toward Validation and Control of Network Models

12

My (Biased) View

• Invalidation is an important part of the process! BUT it is inherently different than validating a model.

• Validating seems much harder.

• Indeed, it is arguable what constitutes a validation.

• Question: what should it mean to say “This model is consistent with observed data”?

Page 13: Toward Validation and Control of Network Models

13

An Alternative View

• There is no “right model”.

• A model is the best until some other model comes along and proves better.

– Greedy refinement via invalidation in model space.

– Statistical techniques: compare likelihood ratios for various models.

• My (biased) opinion: this is one useful approach, but not the end of the question.

– Need methods other than comparison for confirming validity of a model.
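
As one illustration of the likelihood-ratio comparison mentioned on this slide, the sketch below fits a power law (Pareto) and a lognormal to the same sample by maximum likelihood and reports the difference in log-likelihood. It is not from the talk: the function names, the fixed lower cutoff x_min, and the synthetic data are assumptions made for illustration, and the lognormal is fit without truncation at x_min, which is a simplification.

```python
import math
import random


def pareto_loglik(xs, x_min):
    """Maximized log-likelihood of p(x) = (a/x_min) * (x/x_min)^-(a+1), x >= x_min."""
    n = len(xs)
    s = sum(math.log(x / x_min) for x in xs)
    a = n / s  # maximum-likelihood estimate of the tail exponent
    return n * math.log(a / x_min) - (a + 1) * s


def lognormal_loglik(xs):
    """Maximized log-likelihood of a lognormal fit (untruncated; a simplification)."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # MLE of the variance of log x
    # With the MLE variance, the quadratic term sums to exactly n/2.
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n


def log_likelihood_ratio(xs, x_min):
    """Positive values favor the power law, negative values the lognormal."""
    return pareto_loglik(xs, x_min) - lognormal_loglik(xs)


if __name__ == "__main__":
    random.seed(0)
    sample = [random.paretovariate(1.5) for _ in range(5000)]  # synthetic data
    print(log_likelihood_ratio(sample, x_min=1.0))
```

A ratio on its own says which of the two candidates fits better, not whether either is a good model, which is the point of the last bullet on the slide.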

Page 14: Toward Validation and Control of Network Models

14

Time-Series/Trace Analysis

• Many models posit some sort of actions.

– New pages linking to pages in the Web.

– New routers joining the network.

– New files appearing in a file system.

• A validation approach: gather traces and see if the traces suitably match the model.

– Trace gathering can be a challenging systems problem.

– Checking the model match requires using appropriate statistical techniques and tests.

– May lead to new, improved, better justified models.

Page 15: Toward Validation and Control of Network Models

15

Sampling and Trace Analysis

• Often, cannot record all actions.

– Internet is too big!

• Sampling

– Global: snapshots of entire system at various times.

– Local: record actions of sample agents in a system.

• Examples:

– Snapshots of file systems: full systems vs. actions of individual users.

– Router topology: Internet maps vs. changes at subset of routers.

• Question: how much/what kind of sampling is sufficient to validate a model appropriately?

– Does this differ among models?

Page 16: Toward Validation and Control of Network Models

16

To Control

• In many systems, intervention can impact the outcome.

– Maybe not for earthquakes, but for computer networks!

– Typical setting: individual agents acting in their own selfish interest. Agents can be given incentives to change behavior.

• General problem: given a good model, determine how to change system behavior to optimize a global performance function.

– Distributed algorithmic mechanism design.

– Mix of economics/game theory and computer science.

Page 17: Toward Validation and Control of Network Models

17

Possible Control Approaches

• Adding constraints: local or global

– Example: total space in a file system.

– Example: preferential attachment but links limited by an underlying metric.

• Add incentives or costs

– Example: charges for exceeding soft disk quotas.

– Example: payments for certain AS level connections.

• Limiting information

– Impact decisions by not letting everyone have true view of the system.

Page 18: Toward Validation and Control of Network Models

18

My Related Work : Hash Algorithms

• On the Internet, we need a measurement and monitoring infrastructure, for validation and control.

– Approximate is fine; speed is key.

– Must be general, multi-purpose.

– Must allow data aggregation.

• Solution: hash-based architecture.

– Eventual goal: every router has a programmable “hash engine”.

Page 19: Toward Validation and Control of Network Models

19

Vision

• Three-pronged research data.

• Low: Efficient hardware implementations of relevant algorithms and data structures.

• Medium: New, improved data structures and algorithms for old and new applications.

• High: Distributed infrastructure supporting monitoring and measurement schemes.

Page 20: Toward Validation and Control of Network Models

20

The High-Level Pitch

• Lots of hash-based schemes being designed for approximate measurement/monitoring tasks.

– But not built into the system to begin with.

• Want a flexible router architecture that allows:

– New methods to be easily added.

– Distributed cooperation using such schemes.

Page 21: Toward Validation and Control of Network Models

21

What We Need

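[Figure: block diagram of the proposed architecture. Component labels: On-Chip Memory, Off-Chip Memory, CAM(s), Hashing Computation Unit, Unit for Other Computation, Programming Language, Control System, Communication Architecture. Group labels: Memory, Computation, Communication + Control.]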

Page 22: Toward Validation and Control of Network Models

22

Lots of Design Questions

• How much space for various memory levels? How to dynamically divide memory among competing applications?

• What hash functions should be included? Openness to new hash functions?

• What programming language and functionality?

• What communication infrastructure?

• Security?

• And so on…

Page 23: Toward Validation and Control of Network Models

23

Which Hash Functions?

• Theorists:

– Want analyzable hash functions.

– Dislike standard assumption of perfectly random hash functions.

– Hard to prove things about actual performance.

• Practitioners:

– Want easy implementation, speed, small space.

– Want simple analysis (back-of-the-envelope).

– Will accept simulated results under right settings.

Page 24: Toward Validation and Control of Network Models

24

Why Do Weak Hash Functions Work So Well?

• In reality, assuming perfectly random hash functions seems to be the right thing to do.

– Easier to analyze.

– Real systems almost always work that way, even with weak hash functions!

• Can Theory explain strong performance of weak hash functions?

Page 25: Toward Validation and Control of Network Models

25

Recent Work

• A new explanation (joint work with Salil Vadhan):

• Choosing a hash function from a pairwise independent family is enough – if data has sufficient entropy.

– Randomness of hash function and data “combine”.

– Behavior matches truly random hash function with high probability.

• Techniques based on theory of randomness extraction.

– Extensions of Leftover Hash Lemma.
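
For readers unfamiliar with the term, here is a minimal sketch of drawing a function from a standard (approximately) pairwise independent family, h(x) = ((a·x + b) mod P) mod m. This is a textbook construction, not code from the talk or the paper; the prime P, the function name make_pairwise_hash, and the restriction to non-negative integer keys below P are assumptions made for illustration.

```python
import random

# A Mersenne prime; keys are assumed to be non-negative integers below P.
P = (1 << 61) - 1


def make_pairwise_hash(num_buckets, rng=random):
    """Draw h(x) = ((a*x + b) mod P) mod num_buckets with random a, b in [0, P).

    Over the integers mod P the map x -> a*x + b is pairwise independent;
    the final reduction mod num_buckets keeps this property only approximately.
    """
    a = rng.randrange(P)
    b = rng.randrange(P)

    def h(x):
        return ((a * x + b) % P) % num_buckets

    return h


if __name__ == "__main__":
    h = make_pairwise_hash(16, random.Random(1))
    # Count how many of 1600 keys land in each of the 16 buckets.
    loads = [0] * 16
    for key in range(1600):
        loads[h(key)] += 1
    print(loads)
```

The family needs only two random numbers per function, which is what makes it attractive in hardware; the result on the slide is about when such a small amount of randomness suffices.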

Page 26: Toward Validation and Control of Network Models

26

What Functionality?

• Hash tables should be a basic primitive.

• “Best” hash tables: cuckoo hashing.

– Worst case constant lookup time.

– Simple to build, design.

• How can we make them even better?

– Move cuckoo hashing from theory to practice!

Page 27: Toward Validation and Control of Network Models

27

Cuckoo Hashing [Pagh,Rodler]

• Basic scheme: each element gets two possible locations.

• To insert x, check both locations for x. If one is empty, insert.

• If both are full, x kicks out an old element y. Then y moves to its other location.

• If that location is full, y kicks out z, and so on, until an empty slot is found.
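
The insertion procedure described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name CuckooTable, the single shared array, the salted use of Python's built-in hash as a stand-in for two hash functions, and the max_kicks cutoff are all assumptions. Keys must be hashable and must not be None.

```python
import random


class CuckooTable:
    """Single-array cuckoo hash table; every key has two candidate slots."""

    def __init__(self, size, max_kicks=100, seed=0):
        self.size = size
        self.max_kicks = max_kicks
        self.slots = [None] * size
        rng = random.Random(seed)
        # Two random salts stand in for two hash functions (an assumption).
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))

    def _locations(self, x):
        return tuple(hash((salt, x)) % self.size for salt in self.salts)

    def contains(self, x):
        # Worst-case constant lookup: only two slots are ever examined.
        return any(self.slots[i] == x for i in self._locations(x))

    def insert(self, x):
        """Insert x. Returns None on success, or the element left without
        a slot after max_kicks evictions (the caller would then rehash)."""
        if self.contains(x):
            return None
        first, second = self._locations(x)
        for loc in (first, second):
            if self.slots[loc] is None:
                self.slots[loc] = x
                return None
        loc = first
        for _ in range(self.max_kicks):
            # x takes the slot and kicks out the old element, which becomes x.
            x, self.slots[loc] = self.slots[loc], x
            a, b = self._locations(x)
            loc = b if loc == a else a  # the evicted element's other location
            if self.slots[loc] is None:
                self.slots[loc] = x
                return None
        return x


if __name__ == "__main__":
    table = CuckooTable(64)
    for key in range(20):
        table.insert(key)
    print(table.contains(7), table.contains(1000))
```

Returning the displaced element on failure, instead of raising, keeps the table consistent and leaves the rehash policy to the caller.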

Page 28: Toward Validation and Control of Network Models

28

Cuckoo Hashing Examples

[Figure: cuckoo hash table. Cell contents as extracted: A B C / E D.]

Page 29: Toward Validation and Control of Network Models

29

Cuckoo Hashing Examples

[Figure: cuckoo hash table. Cell contents as extracted: A B C / E D, with a new element F to be inserted.]

Page 30: Toward Validation and Control of Network Models

30

Cuckoo Hashing Examples

[Figure: cuckoo hash table after F is inserted. Cell contents as extracted: A B F C / E D.]

Page 31: Toward Validation and Control of Network Models

31

Cuckoo Hashing Examples

[Figure: cuckoo hash table. Cell contents as extracted: A B F C / E D, with a new element G to be inserted.]

Page 32: Toward Validation and Control of Network Models

32

Cuckoo Hashing Examples

[Figure: cuckoo hash table after G is inserted. Cell contents as extracted: E G B F C / A D.]

Page 33: Toward Validation and Control of Network Models

33

Cuckoo Hashing Examples

[Figure: cuckoo hash table. Cell contents as extracted: A B C / E D F, with a new element G to be inserted.]

Page 34: Toward Validation and Control of Network Models

34

Cuckoo Hashing Failures

• Bad case 1: inserted element runs into cycles.

• Bad case 2: inserted element has very long path before insertion completes.

– Could be on a long cycle.

• Bad cases occur with small probability when load is sufficiently low, but that probability is not low enough:

• Theoretical solution: re-hash everything if a failure occurs.

• For 2 choices, load less than 50%, n elements gives failure rate of Θ(1/n); maximum insert time O(log n).

– Better space utilization and rate for more choices, more elements per bucket.

Page 35: Toward Validation and Control of Network Models

35

Recent Work : A CAM-Stash

• Use a CAM (Content Addressable Memory) to stash away elements that would cause failure.

– Joint with Kirsch/Wieder.

• Intuition: if failures were independent, probability that s elements cause failures goes to Θ(1/n^s).

– Failures not independent, but nearly so.

– A stash holding a constant number of elements greatly reduces failure probability.

– Implemented as a CAM in hardware, or a cache line in hardware/software.

• Lookup requires also looking at stash.
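
A software sketch of the stash idea, under stated assumptions: a small Python list stands in for the CAM, the class name CuckooWithStash and the parameters stash_size and max_kicks are hypothetical, and salted calls to Python's built-in hash stand in for the two hash functions. It is meant only to show where the stash enters insertion and lookup, not to reproduce the scheme analyzed with Kirsch and Wieder. Keys must be hashable and must not be None.

```python
import random


class CuckooWithStash:
    """Cuckoo hash table with a small constant-size stash for failed inserts."""

    def __init__(self, size, stash_size=4, max_kicks=50, seed=0):
        rng = random.Random(seed)
        self.size = size
        self.max_kicks = max_kicks
        self.slots = [None] * size
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))
        self.stash = []  # stands in for a small CAM or a single cache line
        self.stash_size = stash_size

    def _locations(self, x):
        return [hash((salt, x)) % self.size for salt in self.salts]

    def contains(self, x):
        # Lookup checks the two table locations and then the stash.
        in_table = any(self.slots[i] == x for i in self._locations(x))
        return in_table or x in self.stash

    def insert(self, x):
        """Returns None on success, or the element that fits in neither the
        table nor the stash (the caller would then rehash)."""
        if self.contains(x):
            return None
        loc = self._locations(x)[0]
        for _ in range(self.max_kicks):
            for cand in self._locations(x):
                if self.slots[cand] is None:
                    self.slots[cand] = x
                    return None
            # Both locations full: x kicks out the occupant of loc.
            x, self.slots[loc] = self.slots[loc], x
            a, b = self._locations(x)
            loc = b if loc == a else a  # the evicted element's other location
        if len(self.stash) < self.stash_size:
            self.stash.append(x)  # the insertion failure is absorbed by the stash
            return None
        return x


if __name__ == "__main__":
    table = CuckooWithStash(size=64)
    for key in range(25):
        table.insert(key)
    print(len(table.stash), table.contains(3))
```

A rehash is needed only when the stash itself overflows, which is why a stash of constant size is enough to lower the failure probability substantially.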

Page 36: Toward Validation and Control of Network Models

36

Modeling : Economic Principles

• Joint work with Corbo, Jain, Parkes.

• Exploration: what models make sense for AS connectivity.

– Extending approach of Chang, Jamin, Mao, Willinger.

– Entering nodes link according to business model, utility function.

– Nodes revise their links based on new entrants.

• Like the forest fire model.

• Future considerations: how to validate such models.

Page 37: Toward Validation and Control of Network Models

37

Conclusion: My (Biased) View

• There are 5 stages of networking research.

1) Observe: Gather data to demonstrate power law behavior in a system.

2) Interpret: Explain the import of this observation in the system context.

3) Model: Propose an underlying model for the observed behavior of the system.

4) Validate: Find data to validate (and if necessary specialize or modify) the model.

5) Control: Design ways to control and modify the underlying behavior of the system based on the model.

• We need to focus on validation and control.

– Lots of open research problems.

Page 38: Toward Validation and Control of Network Models

38

A Chance for Collaboration

• The observe/interpret stages of research are dominated by systems; modeling dominated by theory.

– And need new insights, from statistics, control theory, economics!!!

• Validation and control require a strong theoretical foundation.

– Need universal ideas and methods that span different types of systems.

– Need understanding of underlying mathematical models.

• But also a large systems buy-in.

– Getting/analyzing/understanding data.

– Find avenues for real impact.

• Good area for future systems/theory/others collaboration and interaction.

Page 39: Toward Validation and Control of Network Models

39

More About Me

• Website: www.eecs.harvard.edu/~michaelm

– Links to papers

– Link to book

– Link to blog: mybiasedcoin

• mybiasedcoin.blogspot.com