Structure and Uncertainty


Page 1: Structure and Uncertainty

1

Structure and Uncertainty

Graphical modelling and complex stochastic systems

Peter Green (University of Bristol) 2-13 February 2009

Page 2: Structure and Uncertainty

2

What has statistics to say about science and technology?

Page 3: Structure and Uncertainty

3

Statistics and science

“If your experiment needs statistics, you ought to have done a better experiment”

Ernest Rutherford (1871-1937)

Page 4: Structure and Uncertainty

4

What has statistics to say about the complexity of modern science?

Page 5: Structure and Uncertainty

5

Gene networks

Page 6: Structure and Uncertainty

6

Venter et al., Science, 16 February 2001

Functional categories of genes in the human genome

Page 7: Structure and Uncertainty

7

Gene expression using Affymetrix microarrays

[Figure: Affymetrix array schematic. Labels: 1.28cm; 20µm; oligonucleotide element; millions of copies of a specific oligonucleotide sequence element; approx. ½ million different complementary oligonucleotides; single stranded, labeled RNA sample; hybridised spot; image of hybridised array; zoom image of hybridised array; expressed genes; non-expressed genes]

Slide courtesy of Affymetrix

Page 8: Structure and Uncertainty

8

Page 9: Structure and Uncertainty

9

Velocity of recession determines ‘colour’ through redshift effect

z=0.02

z=0.5

Page 10: Structure and Uncertainty

10

Astronomy: redshifts

Page 11: Structure and Uncertainty

11

Probabilistic expert systems

part of expert system for muscle/nerve network

Page 12: Structure and Uncertainty

12

Complex stochastic systems

Problems in these areas – and many others – have been successfully addressed in a modern statistical framework of structured stochastic modelling

Page 13: Structure and Uncertainty

13

Graphical modelling

Modelling

Inference

Mathematics

Algorithms

Page 14: Structure and Uncertainty

14

1. Mathematics

Modelling

Inference

Mathematics

Algorithms

Page 15: Structure and Uncertainty

15

Conditional independence

• X and Z are conditionally independent given Y if, knowing Y, discovering Z tells you nothing more about X: p(X|Y,Z) = p(X|Y)

• Notation: X ⊥ Z | Y

• Graph: X - Y - Z

Page 16: Structure and Uncertainty

16

Coin-tossing

• You take a coin from your pocket, toss it 10 times and get 10 heads

• What is the chance that the next toss gives a head?

Now suppose there are two coins in your pocket – an 80-20 coin and a 20-80 coin – what is the chance now?

Page 17: Structure and Uncertainty

17

Coin-tossing

Choice of coin

Result of first 10 tosses

Result of next toss

‘it must be the 80-20 coin’

‘so another head is much more likely’

(the odds on a head are now 3.999986 to 1)

(conditionally independent given coin)
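The odds quoted on this slide can be reproduced with a short calculation. The sketch below assumes that each of the two coins is equally likely to be drawn from the pocket (the slide does not state the prior); the function name `next_head_odds` and its parameters are illustrative only.

```python
from fractions import Fraction


def next_head_odds(n_heads, p_a=Fraction(4, 5), p_b=Fraction(1, 5),
                   prior_a=Fraction(1, 2)):
    """Odds on a head at the next toss after seeing n_heads heads in a row.

    p_a, p_b: head probabilities of the 80-20 and 20-80 coins.
    prior_a:  prior probability of having picked the 80-20 coin
              (assumed to be 1/2; the slide does not state it).
    """
    # Likelihood of the observed run of heads under each coin.
    like_a = p_a ** n_heads
    like_b = p_b ** n_heads
    # Bayes' theorem: posterior probability that it is the 80-20 coin.
    post_a = prior_a * like_a / (prior_a * like_a + (1 - prior_a) * like_b)
    # Tosses are conditionally independent given the coin, so the next
    # toss depends on the past only through the posterior for the coin.
    p_head = post_a * p_a + (1 - post_a) * p_b
    return p_head / (1 - p_head)


print(float(next_head_odds(10)))
```

With these assumptions the result rounds to 3.999986, matching the slide: ten heads make the 80-20 coin overwhelmingly likely, so the odds approach but never quite reach 4 to 1.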

Page 18: Structure and Uncertainty

18

Conditional independence

as seen in data on perinatal mortality vs. ante-natal care….

Does survival depend on ante-natal care?

.... what if you know the clinic?

Page 19: Structure and Uncertainty

19

Conditional independence

[Figure: graph on nodes ante, clinic, survival]

• survival and clinic are dependent

• and ante and clinic are dependent

• but survival and ante are CI given clinic

Page 20: Structure and Uncertainty

20

Graphical models

Use ideas from graph theory to
• represent structure of a joint probability distribution
• by encoding conditional independencies

[Figure: graph on nodes A, B, C, D, E, F]

Page 21: Structure and Uncertainty

21

Mendelian inheritance - a natural structured model

[Figure: pedigree labelled with blood groups A, O, AB, A, O and genotypes AB, AO, AO, OO, OO; caption: Mendel]

Page 22: Structure and Uncertainty

23

Conditional independence provides a mathematical basis for splitting up a large system into smaller components

[Figure: graph on nodes A, B, C, D, E, F]

Page 23: Structure and Uncertainty

24

[Figure: the graph split into smaller components; node labels B, C, E, D, A, B, F, D, E]

Page 24: Structure and Uncertainty

25

2. Modelling

Modelling

Inference

Mathematics

Algorithms

Page 25: Structure and Uncertainty

26

Structured systems

A framework for building models, especially probabilistic models, for empirical data

Key idea:
– understand complex system
– through global model
– built from small pieces
   • comprehensible
   • each with only a few variables
   • modular

Page 26: Structure and Uncertainty

27

Modular structure

Basis for
• understanding the real system
• capturing important characteristics statistically
• defining appropriate methods
• computation
• inference and interpretation

Page 27: Structure and Uncertainty

28

Building a model, for genetic testing of paternity using DNA probes

child

mother

true father

putative father

Page 28: Structure and Uncertainty

29

Building a model, for genetic testing of paternity

Page 29: Structure and Uncertainty

30

… genes determine genotype

e.g. if the child’s paternal gene is ‘10’ and maternal gene is ‘12’, then its genotype is ‘10-12’

Page 30: Structure and Uncertainty

31

Building a model, for genetic testing of paternity

Page 31: Structure and Uncertainty

32

… Mendel’s law

the gene that the child gets from the father is equally likely to have come from the father’s father or mother

Page 32: Structure and Uncertainty

33

Building a model, for genetic testing of paternity

Page 33: Structure and Uncertainty

34

… with mutation

there is a small probability of a gene mutating

Page 34: Structure and Uncertainty

35

Building a model, for genetic testing of paternity

Page 35: Structure and Uncertainty

36

… using population data

we need gene frequencies relevant to assumed population for ‘founder’ nodes

Page 36: Structure and Uncertainty

37

Building a model, for genetic testing of paternity

Page 37: Structure and Uncertainty

38

Building a model, for genetic testing of paternity

• Having established conditional probabilities within each of these local models….

• We can insert ‘evidence’ (data) and draw probabilistic inferences…
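To make the local models concrete, here is a minimal sketch of the paternity calculation at a single marker. It combines population gene frequencies for founders with Mendel's law, and it ignores mutation for brevity. The allele frequencies in `FREQ`, the function names and the example genotypes are all hypothetical, chosen only for illustration; this is not the Hugin network shown in the talk.

```python
from itertools import product

# Hypothetical population gene frequencies for the 'founder' nodes.
FREQ = {"10": 0.2, "12": 0.3, "14": 0.5}


def transmit(parent):
    """Mendel's law: each of a parent's two genes is passed on with probability 1/2."""
    return {g: parent.count(g) / 2 for g in set(parent)}


def child_prob(child, mother, paternal_gene_dist):
    """P(child's genotype | mother's genotype, distribution of the paternal gene)."""
    total = 0.0
    for (m, pm), (f, pf) in product(transmit(mother).items(),
                                    paternal_gene_dist.items()):
        # Genotypes are unordered pairs of genes.
        if sorted((m, f)) == sorted(child):
            total += pm * pf
    return total


def paternity_lr(child, mother, putative_father):
    """Likelihood ratio: putative father is the true father vs. a random man."""
    numerator = child_prob(child, mother, transmit(putative_father))
    # If the true father is unknown, his transmitted gene is a draw
    # from the population frequencies (no mutation assumed).
    denominator = child_prob(child, mother, FREQ)
    return numerator / denominator


lr = paternity_lr(child=("10", "12"), mother=("12", "14"),
                  putative_father=("10", "14"))
print(lr)
```

Inserting evidence here means fixing the observed genotypes; a likelihood ratio above 1 favours paternity, and a putative father who carries neither of the child's possible paternal genes gives a ratio of 0 under this no-mutation simplification.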

Page 38: Structure and Uncertainty

39

Hugin screenshot

Page 39: Structure and Uncertainty

40

Page 40: Structure and Uncertainty

41

Photometric redshifts

Page 41: Structure and Uncertainty

42

Photometric redshifts

Page 42: Structure and Uncertainty

43

Photometric redshifts

Multiplicative model (on flux scale), involving an unknown mixture of templates

Page 43: Structure and Uncertainty

44

Photometric redshifts

template

redshift

filter response

Page 44: Structure and Uncertainty

45

Photometric redshifts

Page 45: Structure and Uncertainty

46

Photometric redshifts

good agreement with ‘gold-standard’ spectrographic measurement

Page 46: Structure and Uncertainty

47

Gene expression using Affymetrix microarrays

[Figure: Affymetrix array schematic. Labels: 1.28cm; 20µm; oligonucleotide element; millions of copies of a specific oligonucleotide sequence element; approx. ½ million different complementary oligonucleotides; single stranded, labeled RNA sample; hybridised spot; image of hybridised array; zoom image of hybridised array; expressed genes; non-expressed genes]

Slide courtesy of Affymetrix

Page 47: Structure and Uncertainty

48

Variation and uncertainty

Gene expression data (e.g. Affymetrix) is the result of multiple sources of variability:
• condition/treatment
• biological
• array manufacture
• imaging
• technical
• within/between array variation
• gene-specific variability

Structured statistical modelling allows all sources of uncertainty to be considered at once

Page 48: Structure and Uncertainty

53

3. Algorithms

Modelling

Inference

Mathematics

Algorithms

Page 49: Structure and Uncertainty

54

Algorithms for probability and likelihood calculations

Exploiting graphical structure:
• Markov chain Monte Carlo
• Probability propagation (Bayes nets)
• Expectation-Maximisation
• Variational (mean-field) methods

Graph representation used in user interface, data structures and in controlling computation

Page 50: Structure and Uncertainty

55

Markov chain Monte Carlo

• Subgroups of one or more variables updated randomly,
   – maintaining detailed balance with respect to target distribution

• Ensemble converges to equilibrium = target distribution (e.g. the Bayesian posterior)

Page 51: Structure and Uncertainty

56

Markov chain Monte Carlo

[Figure: graph with the node being updated marked ‘?’]

Updating ‘?’: need only look at its neighbours
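The point that an update only needs the neighbours can be sketched with a single-site Gibbs sampler. The example below is an assumption-laden stand-in, not a model from the talk: it uses a small Ising-type model on a hypothetical four-node path graph, with target density proportional to exp(beta × sum over edges of x_i x_j). Each update draws one variable from its full conditional, which satisfies detailed balance with respect to the target.

```python
import math
import random

# Hypothetical graph: a path 0 - 1 - 2 - 3, stored as neighbour lists.
NEIGHBOURS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
BETA = 1.0  # assumed coupling strength


def gibbs_update(x, i, rng):
    """Resample x[i] from its full conditional, which depends only on neighbours."""
    field = BETA * sum(x[j] for j in NEIGHBOURS[i])
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
    x[i] = 1 if rng.random() < p_plus else -1


def estimate_agreement(n_sweeps=20000, seed=0):
    """Estimate P(x0 == x1) under the target by averaging over Gibbs sweeps."""
    rng = random.Random(seed)
    x = {i: rng.choice([-1, 1]) for i in NEIGHBOURS}
    agree = 0
    for _ in range(n_sweeps):
        for i in NEIGHBOURS:
            gibbs_update(x, i, rng)
        agree += x[0] == x[1]
    return agree / n_sweeps


print(estimate_agreement())
```

For this path graph the exact probability that two adjacent variables agree is 1 / (1 + exp(-2 × BETA)), so the estimate can be checked against a known answer. In a larger graph the same `gibbs_update` works unchanged, since it never looks beyond `NEIGHBOURS[i]`.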

Page 52: Structure and Uncertainty

57

Probability propagation

[Figure: graph on nodes 1 to 7, and its junction tree with cliques 12, 267, 236, 3456 and separators 2, 26, 36]

form junction tree

Lauritzen & Spiegelhalter, 1987

Page 53: Structure and Uncertainty

58

Message passing in junction tree

[Figure: junction tree with root marked]

Page 54: Structure and Uncertainty

59

Message passing in junction tree - collect

[Figure: junction tree with root marked]

Page 55: Structure and Uncertainty

60

Message passing in junction tree - distribute

[Figure: junction tree with root marked]
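The collect and distribute passes can be sketched on the smallest possible junction tree: two cliques {A, B} and {B, C} joined by the separator {B}, with {A, B} as root. This is a simplified stand-in for the four-clique tree on the slides, and the potential values are hypothetical. Collect sends messages towards the root; distribute sends them back out, after which every clique holds its own marginal.

```python
from itertools import product

VALS = (0, 1)  # all variables are binary in this sketch

# Hypothetical clique potentials; the joint is proportional to their product.
PSI_AB = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
PSI_BC = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}


def pass_messages(psi_ab, psi_bc):
    """Return the normalised marginals on (A, B) and (B, C)."""
    # Collect: clique BC sums out C and sends the result over separator B.
    sep = {b: sum(psi_bc[b, c] for c in VALS) for b in VALS}
    ab = {(a, b): psi_ab[a, b] * sep[b] for a, b in product(VALS, VALS)}
    # Distribute: the root sums out A and sends the updated separator back;
    # BC multiplies by the ratio of new to old separator tables.
    new_sep = {b: sum(ab[a, b] for a in VALS) for b in VALS}
    bc = {(b, c): psi_bc[b, c] * new_sep[b] / sep[b]
          for b, c in product(VALS, VALS)}
    z = sum(ab.values())  # normalising constant of the joint
    return ({k: v / z for k, v in ab.items()},
            {k: v / z for k, v in bc.items()})


marg_ab, marg_bc = pass_messages(PSI_AB, PSI_BC)
print(marg_ab)
print(marg_bc)
```

The two local passes give the same marginals as summing the full joint table, but only ever touch one clique at a time, which is what makes propagation feasible in large networks.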

Page 56: Structure and Uncertainty

61

4. Inference

Modelling

Inference

Mathematics

Algorithms

Page 57: Structure and Uncertainty

62

Bayesian

Page 58: Structure and Uncertainty

63

or non-Bayesian

Page 59: Structure and Uncertainty

64

Bayesian paradigm in structured modelling

• ‘borrowing strength’
• automatically integrates out all sources of uncertainty
• properly accounting for variability at all levels
• including, in principle, uncertainty in model itself
• avoids over-optimistic claims of certainty
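As a rough illustration of ‘borrowing strength’, consider several groups whose observations y_i estimate group-level parameters drawn from a common distribution. The sketch below assumes a normal model with known within-group variance `sigma2` and between-group variance `tau2`, and plugs in the overall mean for the common level. That plug-in step is a simplification of the fully Bayesian calculation, which would also integrate over the uncertainty in these quantities; all names are illustrative.

```python
def shrink(y, sigma2, tau2):
    """Posterior means of group effects under y_i ~ N(theta_i, sigma2),
    theta_i ~ N(mu, tau2), with mu replaced by the overall mean (plug-in)."""
    mu = sum(y) / len(y)
    # Weight on each group's own observation; the rest goes to the common mean.
    w = tau2 / (tau2 + sigma2)
    return [w * yi + (1 - w) * mu for yi in y]


print(shrink([1.0, 2.0, 6.0], sigma2=1.0, tau2=1.0))
```

Each estimate is pulled towards the overall mean, more strongly when the within-group noise is large relative to the between-group spread, so every group's estimate draws on the data from all the others.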

Page 60: Structure and Uncertainty

65

Bayesian structured modelling

• ‘borrowing strength’
• automatically integrates out all sources of uncertainty
• … for example in forensic statistics with DNA probe data…

Page 61: Structure and Uncertainty

66

Page 62: Structure and Uncertainty

67

Page 63: Structure and Uncertainty

68

Bayesian structured modelling

• ‘borrowing strength’
• automatically integrates out all sources of uncertainty
• … for example in hidden Markov models for disease mapping

Page 64: Structure and Uncertainty

69

John Snow’s 1855 map of cholera cases

Page 65: Structure and Uncertainty

70

Mortality for diseases of the circulatory system in males in 1990/1991

Page 66: Structure and Uncertainty

75

Mapping of rare diseases using Hidden Markov model

Larynx cancer in females in France, 1986-1993 (standardised ratios)

Green & Richardson, 2002

Posterior probability of excess risk

Page 67: Structure and Uncertainty

76

Bayesian structured modelling

• ‘borrowing strength’
• automatically integrates out all sources of uncertainty
• … for example in modelling complex biomedical systems like ion channels…

Page 68: Structure and Uncertainty

77

Ion channel model

[Figure: graphical model with nodes: levels & variances, model indicator, transition rates, hidden state, data, binary signal]

Hodgson and Green, Proc Roy Soc Lond A, 1999

Page 69: Structure and Uncertainty

78

[Figure: graphical model with nodes: levels & variances, model indicator, transition rates, hidden state, data, binary signal; channel states O1, O2, C1, C2, C3]

Page 70: Structure and Uncertainty

79

[Figure: channel states O1, O2, C1, C2, C3]

Unknown physiological states of channel, unknown connections

Continuous-time Markov chain on this graph, with unknown transition rates

Only open/closed status of states is relevant to observation

We observe only in discrete time, with highly correlated noise
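The data-generating side of this model can be sketched as a simulation. Everything specific below is an assumption for illustration: the connections and rates in `RATES` are hypothetical (in the talk they are unknown and are the target of inference), and the correlated noise is represented by a simple AR(1) process with made-up parameters `rho` and `sigma`.

```python
import random

# Hypothetical connections and transition rates between channel states.
RATES = {
    "C1": {"C2": 1.0},
    "C2": {"C1": 0.5, "C3": 0.5, "O1": 2.0},
    "C3": {"C2": 0.2},
    "O1": {"C2": 1.5, "O2": 1.0},
    "O2": {"O1": 0.8},
}


def is_open(state):
    """Only the open/closed status of a state is relevant to observation."""
    return 1 if state.startswith("O") else 0


def simulate(n_obs, dt, rho=0.9, sigma=0.1, seed=0):
    """Simulate the chain in continuous time and observe it every dt time units."""
    rng = random.Random(seed)
    state = "C1"
    # Holding time in a state is exponential with the total exit rate.
    next_jump = rng.expovariate(sum(RATES[state].values()))
    noise = 0.0
    signal, data = [], []
    for k in range(n_obs):
        t_obs = k * dt
        # Carry out every jump that happens before this observation time.
        while next_jump <= t_obs:
            targets = list(RATES[state])
            weights = [RATES[state][s] for s in targets]
            state = rng.choices(targets, weights=weights)[0]
            next_jump += rng.expovariate(sum(RATES[state].values()))
        # AR(1) noise as a stand-in for 'highly correlated noise'.
        noise = rho * noise + sigma * rng.gauss(0.0, 1.0)
        signal.append(is_open(state))
        data.append(signal[-1] + noise)
    return signal, data


signal, data = simulate(n_obs=200, dt=0.05)
print(sum(signal), "of", len(signal), "samples are open")
```

Several hidden states map to the same binary signal, which is why the number of states and their connections cannot be read off directly and have to be inferred.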

Page 71: Structure and Uncertainty

80

Truth and simulated data

Page 72: Structure and Uncertainty

81

Truth and 2 restorations

Page 73: Structure and Uncertainty

82

Ion channel model choice

posterior probabilities: .405, .119, .369, .107

Page 74: Structure and Uncertainty

83

Structured systems’ success stories include...

• Genomics & bioinformatics
   – DNA & protein sequencing, gene mapping, evolutionary genetics

• Spatial statistics
   – image analysis, environmetrics, geographical epidemiology, ecology

• Temporal problems
   – longitudinal data, financial time series, signal processing

Page 75: Structure and Uncertainty

84

Structured systems’ challenges include...

• Very large/high-dimensional data sets
   – genomics, telecommunications, commercial data-mining…

Page 76: Structure and Uncertainty

85

Summary

Structured stochastic modelling (the ‘HSSS’ approach) provides a powerful and flexible approach to the challenges of complex statistical problems
– Applicable in many domains
– Allows exploiting scientific knowledge
– Built on rigorous mathematics
– Principled inferential methods

Page 77: Structure and Uncertainty

86

http://www.stats.bris.ac.uk/~peter
