  • Displaying your data

    Practical Computing for Scientists

    Physics 91SI

    M. Bellis

    Department of Physics, Stanford University

    April 19th, 2011

    M. Bellis April ’11 Data display 1 / 21

  • Outline

    1 Understanding your data
        Matplotlib
        Displaying your data

    2 Summary

  • Introduction

    • Research is usually about working with new data.
    • Or about working with old data in a new way.
    • It behooves you to understand your data.

  • Data

    • Given some distribution of data, look at:
        • Mean (µ)
        • Standard deviation (σ)

    • Make use of built-in functions in numpy.
    • Try this with the lists of data you were sent.

    legba:~> ipython

    In [1]: from numpy import *

    In [2]: x = [0, 1, 2, 3, 4]

    In [3]: mean(x)

    Out[3]: 2.0

    In [4]: std(x)

    Out[4]: 1.4142135623730951
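
The same calculation can be sketched as a script that uses an explicit `np` namespace instead of `from numpy import *`. One detail worth knowing: `std` defaults to the population form (dividing by N), while `ddof=1` gives the sample form (dividing by N - 1). The variable names below are illustrative and not part of the original session.

```python
import numpy as np

x = [0, 1, 2, 3, 4]

mu = np.mean(x)                   # arithmetic mean
sigma_pop = np.std(x)             # population standard deviation (divides by N)
sigma_sample = np.std(x, ddof=1)  # sample standard deviation (divides by N - 1)

print(mu, sigma_pop, sigma_sample)
```

For a small sample like this one the two forms differ noticeably, so it helps to state which one is being quoted.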

  • Matplotlib

    Figure: http://matplotlib.sourceforge.net/

    • Plotting library for Python.
    • Original author: John Hunter
    • pyplot

    • Included in pylab, along with numpy.
    • pylab aims to be a replacement for MATLAB.
    • So what does that give us?

  • Coding

    Import pylab and use the array object.

    legba:~> ipython

    In [1]: from pylab import *

    In [2]: x = array([0,1,2,3,4])

    In [3]: y = array([0,1,2,3,4])

    In [4]: plot(x,y)

    Out[4]: []

    But we don’t see the plot yet.

    In [5]: show()
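
As a rough script equivalent of the session above, the sketch below imports `pyplot` and `numpy` explicitly rather than through `pylab`. It assumes the non-interactive `Agg` backend so that it runs without a display; in an interactive session you would leave that line out and call `plt.show()` at the end. `plot` returns a list of `Line2D` objects, which is what IPython echoes as the `Out` value.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so draw off-screen
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 2, 3, 4])

lines = plt.plot(x, y)        # list of Line2D objects, one per curve drawn
print(len(lines))             # number of curves created by this call
print(lines[0].get_ydata())   # the y values stored in the first curve
```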

  • plot

    http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.plot

    Default is a blue line connecting points, but there are other plotting options.

    Close the plot window, and try some of these:

    In [6]: plot(x,y,'r--')

    In [6]: plot(x,y,'g-')

    In [6]: plot(x,y,'ks-',linewidth=4)

    In [6]: plot(x,y,'co')

    In [6]: axes().set_xlim(-10,10)  # Set the range on the x-axis

    Note that you do not have to call show() after each one.

    Note also that unless you close the window, these are overlaid on one another.
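
The format strings can also be tried in a script. This sketch uses Matplotlib's object-oriented interface (`plt.subplots`), which makes it explicit which axes each call modifies. The vertical offsets (`y + 1`, `y + 2`, `y + 3`) are an addition for illustration only, so that the overlaid curves do not sit on top of one another, and the `Agg` backend is assumed so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 2, 3, 4])

fig, ax = plt.subplots()
ax.plot(x, y, "r--")                    # red dashed line
ax.plot(x, y + 1, "g-")                 # green solid line
ax.plot(x, y + 2, "ks-", linewidth=4)   # black squares joined by a thick line
ax.plot(x, y + 3, "co")                 # cyan circles, no connecting line
ax.set_xlim(-10, 10)                    # set the range on the x-axis
```

Each format string packs a color code, an optional marker, and an optional line style into one short argument; keyword arguments such as `linewidth` cover the rest.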

  • Your plots

    The figure exists in global memory so it is easy to save it as a file.

    In [9]: savefig('myplot.png')
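
A minimal sketch of saving from a script, assuming the `Agg` backend and a temporary output directory (both are choices made here for illustration). `savefig` picks the output format from the file extension, and `dpi` sets the pixel resolution of raster formats such as PNG.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])

out_dir = tempfile.mkdtemp()                    # hypothetical output location
png_path = os.path.join(out_dir, "myplot.png")
pdf_path = os.path.join(out_dir, "myplot.pdf")

fig.savefig(png_path, dpi=150)  # raster image; dpi controls the pixel resolution
fig.savefig(pdf_path)           # vector output, selected by the .pdf extension
plt.close(fig)                  # release the figure once it has been written
```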

    Use https://afs.stanford.edu to download the image file to your desktop. Upload this image to the Google Doc for this lecture.

    https://docs.google.com/present/edit?id=0AaEmDaJ8A2rAZGhwc3pudzhfOTM1Y21wcDk5bjk&hl=en

    Let’s take a look at your plots!

  • Your data

    • If you are lucky, at some point in your life you will get to work on a problem for which the answer is not known.

    • There’s a reason why people do Sudoku/crossword puzzles/word jumbles: the joy of solving a puzzle.

    • Your research experience will hopefully help you learn how to collect data and glean information from it.

    • You also learn how to present that data to others.
    • Just as importantly, you learn how to present that data to yourself so that you can make accurate and precise statements about what you are measuring.

    • Never forget that the collection of experimental data is a means to an end... not simply an end in itself.

  • Tufte

    • Parts of this lecture are motivated by the following.
        • Tufte, E., The Visual Display of Quantitative Information

    • Can’t recommend this enough.
    • Examples of the good, the bad and the ugly in the world of charts, plots and graphs.
    • Can find Prof. Cabrera’s old monopole data as an example of a well-constructed plot (p.39).

  • Tufte

    Tufte lists his Principles of Graphical Excellence (p.51):

    • Graphical excellence is the well-designed presentation of interesting data - a matter of substance, of statistics, and of design.

    • Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency.

    • Graphical excellence is nearly always multivariate.
    • Graphical excellence requires telling the truth about the data.
    • Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.

  • Binning

    • Even the simplest assumptions about how to display your data can affect the conclusions you or others draw.

    • Histogramming your data.
        • Freedom to choose bin size.
        • Can affect fits to data.

    • The following is the same 100 random events from a Gaussian distribution.
        • µ = 0.0, σ = 1.0

    • Plotted with 3 different bin sizes.
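
The effect of the bin count can be explored numerically before plotting anything. The sketch below draws 100 Gaussian events (µ = 0, σ = 1) and histograms the same events with 10000, 100, and 10 bins over the range -4 to 4. The seed, the range, and the `summary` dictionary are choices made here for illustration; these are not the events used in the lecture plots.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
events = rng.normal(loc=0.0, scale=1.0, size=100)

summary = {}
for n_bins in (10000, 100, 10):
    counts, edges = np.histogram(events, bins=n_bins, range=(-4.0, 4.0))
    summary[n_bins] = {
        "width": edges[1] - edges[0],           # bin width
        "total": int(counts.sum()),             # events that fell inside the range
        "max": int(counts.max()),               # tallest bin
        "empty": int(np.sum(counts == 0)),      # bins with no events at all
    }
    print(n_bins, summary[n_bins])
```

With very fine bins almost every bin is empty and the occupied ones hold one or two events, so the shape of the parent distribution is not visible in the counts.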

  • Binning

    Figure: The same 100 events histogrammed with 10000, 100, and 10 bins. x-axis: Arbitrary measurement (-4 to 4); y-axis: Number of counts. Gaussian fit results for each panel:

    10000 bins: Constant 1.026 ± 0.104, Mean -0.08034 ± 10.68848, Sigma 9.954 ± 8.078

    100 bins: Constant 3.076 ± 0.466, Mean -0.003924 ± 0.209248, Sigma 1.341 ± 0.262

    10 bins: Constant 37.25 ± 4.99, Mean -0.03304 ± 0.13808, Sigma 1.081 ± 0.115

    • Using the same fitting tool, wound up with 3 very different widths.
    • Couldn’t even get a sense of the parent distribution from the first plot.
    • In general the binning should be motivated by the resolution of your measurements.
    • If your detector/ruler/samples have resolution x, you don’t want to plot your data with bins of x/10 width.

    • Don’t not think about your data!
    • Even if it seems to be super trivial!
    • Your data should tell a clear story.
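
One way to act on the resolution guideline is to build the bin edges from the resolution itself rather than from a bin count. The helper `edges_from_resolution` below is hypothetical (it is not a NumPy or Matplotlib function), and the resolution of 0.5 is an arbitrary value chosen for illustration.

```python
import numpy as np

def edges_from_resolution(data, resolution):
    """Return bin edges of width `resolution` that cover all of `data`.

    Hypothetical helper: edges are aligned to multiples of the resolution.
    """
    lo = np.floor(data.min() / resolution) * resolution
    hi = np.ceil(data.max() / resolution) * resolution
    n_bins = max(1, int(round((hi - lo) / resolution)))
    return np.linspace(lo, hi, n_bins + 1)

rng = np.random.default_rng(seed=0)
data = rng.normal(loc=0.0, scale=1.0, size=100)

edges = edges_from_resolution(data, resolution=0.5)  # assumed resolution
counts, _ = np.histogram(data, bins=edges)
print(len(edges) - 1, counts)
```

The resulting `edges` array can be passed as the `bins` argument to a histogram call, so the bin width stays tied to the measurement rather than to a default.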

  • Napoleon’s march

    Figure: Charles Joseph Minard’s display of Napoleon’s excursion into Russia.

  • Train schedule

    Figure: 1880’s French train schedule.

  • John Snow

    Figure: John Snow’s map of cholera outbreak (1854).

  • Cosmic Microwave Background

    Figure: Comparison of COBE data with blackbody prediction.

  • Ratio

    Figure: Ratio of cross-sections for e+e− → hadrons to e+e− → µ+µ−, as a function of center-of-mass energy.
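
Written out as a formula, the ratio described in the caption is the following. The symbols R, σ, and √s are conventional notation assumed here; they do not appear on the slide.

```latex
R(\sqrt{s}) = \frac{\sigma(e^{+}e^{-} \to \mathrm{hadrons})}{\sigma(e^{+}e^{-} \to \mu^{+}\mu^{-})}
```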

  • Household debt

    Figure: David Bein. Ratio of household debt to US GDP. http://www.npr.org/blogs/money/2009/02/household_debt_vs_gdp.html

  • Summary

    • Collecting your data is not an end in itself.
    • Your data tell a story. Visualizations help us see that story.
    • Lots of good tools in Python and Matplotlib!
