Perspectives From Shannon and Kolmogorov

16
Capturing the Intangible Concept of Information In Hydrological Simulation —— Perspectives From Shannon and Kolmogorov Pan Baoxiang Cong Zhentao Institute of Hydrology and Water Resources Tsinghua University January 21, 2015

Transcript of Perspectives From Shannon and Kolmogorov

Page 1: Perspectives From Shannon and Kolmogorov

Capturing the Intangible Concept of Information InHydrological Simulation

——Perspectives From Shannon and Kolmogorov

Pan Baoxiang Cong Zhentao

Institute of Hydrology and Water ResourcesTsinghua University

January 21, 2015

Page 2: Perspectives From Shannon and Kolmogorov

Outline

Introduction

Shannon Information & Kolmogorov Complexity

Case Study

Discussion & Conclusion

Page 3: Perspectives From Shannon and Kolmogorov

Introduction

Questions:

Given the p.d.f. of the rainfall amount of a certain year,through how many guesses could we reach its intervalestimation to a fixed accuracy?

Give an efficient description of an annual rainfall series.

Information is Bits + Context.

Page 4: Perspectives From Shannon and Kolmogorov

Introduction

Questions:

Given the p.d.f. of the rainfall amount of a certain year,through how many guesses could we reach its intervalestimation to a fixed accuracy?

Give an efficient description of an annual rainfall series.

Information is Bits + Context.

Page 5: Perspectives From Shannon and Kolmogorov

Measuring Information Contents

Shannon Entropy Kolmogorov Complexity

Definition H(X ) = −Σp(x)logp(x) The length (in bits) of theshortest computer program

h(X ) = −∫f (x)logf (x)dx that prints the sequence

and then halts.

Focus Random Source ObjectObject irrelevant Probability irrelevant

Property H(X ,Y ) ≤ H(X ) + H(Y ) Uncomputability

Estimator p.d.f Estimator Compressor

Dimension Bit

Page 6: Perspectives From Shannon and Kolmogorov

Measuring Information Connections

Mutual Information(S) Mutual Information(K)

Definition I (X ;Y ) =∑

p(x , y)log p(x,y)p(x)p(y)

KC(X ) + KC(Y ) − KC(X ,Y )

I (X ;Y ) =∫f (x , y)log f (x,y)

f (x)f (y)dxdy

Focus Random Source ObjectObject irrelevant Probability irrelevant

Property Symmetry Uncomputability

Estimator KNN+SVR ZIP(X ) + ZIP(Y ) − ZIP(X ,Y )

Dimension Bit

Page 7: Perspectives From Shannon and Kolmogorov

Bits in Hydrological Simulation Context

Hydrological Simulation

observation

{Input Measurements Xi

Output Measurements Xo

Model

Output Simulation Xs

State Variable Simulation S

Parameters

ModelStructure

Page 8: Perspectives From Shannon and Kolmogorov

Bits in Hydrological Simulation Context

Hydrological Simulation

observation

{Input Measurements Xi

Output Measurements Xo

Model

Output Simulation Xs

State Variable Simulation S

Parameters

ModelStructure

Page 9: Perspectives From Shannon and Kolmogorov

Bits in Hydrological Simulation Context

For a time/frequency/feature-space domain base,Xi ,Xo ,Xs ,S are represented by their coordinates,

Entropy / Kolmogorov Complexity of these coordinatesrepresents the complexity of expressing the signals with thatbase:H(Xo) / KC(Xo): Bits are required to depict Xo .Shannon / Kolmogorov Mutual Information betweencoordinates represents the information connection of thesignals at that base:MI (Xi ,Xo): Bits provided by Xi .MI (Xs ,Xo): Bits provided by Model.

Aleatory Uncertainty = H(Xo)/KC (Xo) −MI (Xi ,Xo)

Noise

Epistemic Uncertainty = MI (Xi ,Xo) −MI (Xs ,Xo)

Signal

Page 10: Perspectives From Shannon and Kolmogorov

Bits in Hydrological Simulation Context

For a time/frequency/feature-space domain base,Xi ,Xo ,Xs ,S are represented by their coordinates,

Entropy / Kolmogorov Complexity of these coordinatesrepresents the complexity of expressing the signals with thatbase:H(Xo) / KC(Xo): Bits are required to depict Xo .Shannon / Kolmogorov Mutual Information betweencoordinates represents the information connection of thesignals at that base:MI (Xi ,Xo): Bits provided by Xi .MI (Xs ,Xo): Bits provided by Model.

Aleatory Uncertainty = H(Xo)/KC (Xo) −MI (Xi ,Xo)

Noise

Epistemic Uncertainty = MI (Xi ,Xo) −MI (Xs ,Xo)

Signal

Page 11: Perspectives From Shannon and Kolmogorov

Bits in Hydrological Simulation Context

For a time/frequency/feature-space domain base,Xi ,Xo ,Xs ,S are represented by their coordinates,

Entropy / Kolmogorov Complexity of these coordinatesrepresents the complexity of expressing the signals with thatbase:H(Xo) / KC(Xo): Bits are required to depict Xo .Shannon / Kolmogorov Mutual Information betweencoordinates represents the information connection of thesignals at that base:MI (Xi ,Xo): Bits provided by Xi .MI (Xs ,Xo): Bits provided by Model.

Aleatory Uncertainty = H(Xo)/KC (Xo) −MI (Xi ,Xo) Noise

Epistemic Uncertainty = MI (Xi ,Xo) −MI (Xs ,Xo) Signal

Page 12: Perspectives From Shannon and Kolmogorov

Case Study 1: How water-heat correlation emergesthrough temporal upscaling

Relation of MI,Scale,Previous-Input-Step

Signal-to-noise Ratio Across Temporal Scales

Page 13: Perspectives From Shannon and Kolmogorov

Case Study 2: To what ratio can hydrological data becompressed

Huffman Encoding —— Lisp Implementation

Page 14: Perspectives From Shannon and Kolmogorov

Case Study 2: To what ratio can hydrological data becompressed

Weijs S V, Giesen N, Parlange M B. Data compression to define information content of hydrological time series[J].Hydrology and Earth System Sciences, 2013, 17(8): 3171-3187.

Page 15: Perspectives From Shannon and Kolmogorov

Discussion & Conclusion

Coding is not merely an E.E. trick. It’s hard to tell where datascience stops and coding starts.

Significance Digging

Application Expansion

Page 16: Perspectives From Shannon and Kolmogorov

https://github.com/morepenn

END