7/29/2019 Data Description.pdf
Data Description
MTK3006
Department of Mathematics
Faculty of Science and Technology
Universiti Malaysia Terengganu
MTK3006 Statistics for Chemists Data Description
Part I
Basic Terms
Population
A population is a collection of all subjects or objects of interest.
Sample
A sample is a portion or part of the population of interest.
Variable
A variable is a characteristic or attribute that can assume different values.
Data
The values that a variable can assume are called data.
Data set
A collection of data values or measurements forms a data set.
Types of data
Quantitative data is a numerical measurement expressed in terms of numbers.
Qualitative data is a categorical measurement expressed by means of a natural language description.
Parameter
A parameter is a characteristic or measure obtained by using all the data values from a population.
Statistic
A statistic is a characteristic or measure obtained by using the data values from a sample.
Statistics
Statistics is the science of collecting, organizing, summarizing, analyzing and interpreting data.
Areas of statistics
The branch of statistics devoted to the organization, summarization, description and presentation of data sets is called descriptive statistics.
The branch of statistics concerned with using sample data to draw conclusions about a population is called inferential statistics.
Part II
Describing Data with Tables
Data collected in original form is called raw data.
A frequency distribution is the organization of raw data in table form using classes and frequencies.
There are three types of frequency distributions:
Categorical frequency distributions
Ungrouped frequency distributions
Grouped frequency distributions
Categorical Frequency Distribution
Can be used for data that can be placed in specific categories.
Examples: political affiliation, religious affiliation, blood type, etc.
Example: Blood Type Data
A, B, B, AB, O, O, O, B, AB, B, B, B, O, A, O, A, O, O, O, AB, AB, A, O, B, A
Blood Type Frequency Distribution
Class  Frequency  Percent
A      5          20
B      7          28
O      9          36
AB     4          16
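The blood type table can be reproduced with a short tally. This is a minimal sketch in Python (the slides themselves use R); the function name `categorical_frequency` is an illustrative choice, not part of the course material.

```python
from collections import Counter

# Blood type data from the example above.
blood_types = "A,B,B,AB,O,O,O,B,AB,B,B,B,O,A,O,A,O,O,O,AB,AB,A,O,B,A".split(",")

def categorical_frequency(values):
    """Return {class: (frequency, percent)} for categorical data."""
    counts = Counter(values)
    total = len(values)
    return {cls: (freq, 100 * freq / total) for cls, freq in counts.items()}

table = categorical_frequency(blood_types)
for cls in ("A", "B", "O", "AB"):
    freq, percent = table[cls]
    print(cls, freq, percent)
```

The percent column is simply each frequency divided by the total number of observations (25 here), times 100.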
Ungrouped Frequency Distribution
Can be used for data that can be enumerated and when the range of values in the data set is not large.
Examples: number of kilometers your instructors have to travel from home to campus, number of girls in a 4-child family, etc.
Example: Number of Kilometers Travelled: 8, 5, 6, 5, 5, 7, 7
Number of Kilometers Travelled
Class  Frequency
5      3
6      1
7      2
8      1
Grouped Frequency Distribution
Can be used when the range of values in the data set is very large.
Class limits represent the smallest and largest data values that can be included in a class. The smallest and largest possible data values in a class are the lower and upper class limits.
Class boundaries separate the classes. To find a class boundary, average the upper class limit of one class and the lower class limit of the next class.
The class width is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.
The class midpoint can be calculated by averaging the upper and lower class limits.
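As a small illustration of these definitions, the sketch below computes a boundary, the width and a midpoint for two adjacent classes. The class limits 100-104 and 105-109 are assumed for the example, and the function names are hypothetical.

```python
# Two adjacent classes with assumed limits 100-104 and 105-109.
lower1, upper1 = 100, 104
lower2, upper2 = 105, 109

def class_boundary(upper_limit, next_lower_limit):
    """Average the upper limit of one class and the lower limit of the next."""
    return (upper_limit + next_lower_limit) / 2

def class_width(lower_limit, next_lower_limit):
    """Difference between the lower limits of two consecutive classes."""
    return next_lower_limit - lower_limit

def class_midpoint(lower_limit, upper_limit):
    """Average of the lower and upper limits of one class."""
    return (lower_limit + upper_limit) / 2

boundary = class_boundary(upper1, lower2)  # 104.5
width = class_width(lower1, lower2)        # 5
midpoint = class_midpoint(lower1, upper1)  # 102.0
print(boundary, width, midpoint)
```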
Rules for classes
There should be 5-20 classes.
The class width should be an odd number.
The classes must not overlap.
The classes must not have breaks.
The classes must include all the data values.
The classes must be equal in width.
To construct a grouped frequency distribution:
Find the highest and lowest values.
Find the range.
Choose the number of classes.
Find the class width by dividing the range by the number of classes and rounding up.
Choose a starting point (usually the lowest value); add the class width to get all the lower limits.
Find the upper class limits.
Find the class boundaries.
Find the frequencies and the cumulative frequencies.
Construct a grouped frequency distribution using 7 classes.
112 100 127 120 134 118 105 110 109 112 110
118 117 116 118 122 114 114 105 109 107 112
114 115 118 117 118 122 106 110 116 108 110
121 113 120 119 111 104 111 120 113 120 117
105 110 118 112 114 114
Class Limits  Class Boundaries  Frequency  Cumulative Frequency
100 - 104     99.5 - 104.5      2          2
105 - 109     104.5 - 109.5     8          10
110 - 114     109.5 - 114.5     18         28
115 - 119     114.5 - 119.5     13         41
120 - 124     119.5 - 124.5     7          48
125 - 129     124.5 - 129.5     1          49
130 - 134     129.5 - 134.5     1          50
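The construction steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the course's own procedure: the data are integers, the starting point is the lowest value, and an exact quotient is bumped to the next whole number so the highest value still fits. The function name `grouped_frequency` is hypothetical.

```python
data = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112, 110,
    118, 117, 116, 118, 122, 114, 114, 105, 109, 107, 112,
    114, 115, 118, 117, 118, 122, 106, 110, 116, 108, 110,
    121, 113, 120, 119, 111, 104, 111, 120, 113, 120, 117,
    105, 110, 118, 112, 114, 114,
]

def grouped_frequency(values, num_classes):
    """Return rows of (lower, upper, lower boundary, upper boundary, freq, cum. freq)."""
    low, high = min(values), max(values)
    # Range divided by the number of classes, rounded up (assumption: an exact
    # quotient also moves up by one so that the maximum is covered).
    width = (high - low) // num_classes + 1
    rows = []
    cumulative = 0
    for k in range(num_classes):
        lower = low + k * width
        upper = lower + width - 1  # integer data: one unit below the next lower limit
        freq = sum(lower <= v <= upper for v in values)
        cumulative += freq
        rows.append((lower, upper, lower - 0.5, upper + 0.5, freq, cumulative))
    return rows

rows = grouped_frequency(data, 7)
for row in rows:
    print(row)
```

With 7 classes the range is 134 - 100 = 34, so the width is 34/7 rounded up to 5, which matches the table above.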
Part III
Measures of Central Tendency
Given a set of data, we often would like to have one number that is representative of a population or sample.
There are several standard ways to measure the center:
Mean: the average of the data set
Median: the midpoint of the data set
Mode: the value that occurs most often in the data set
The Mean
Denote by $x_i$ the $i$th observed data value in the population or sample.
Denote by $N$ and $n$ the population and sample sizes respectively.
The population mean is the sum of all the population values divided by the total number of population values:
$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i.$$
The sample mean is the sum of all the sample values divided by the number of sample values:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
Find the sample mean of 20, 26, 40, 36, 23, 42, 35, 24, 30.
Answer: $\bar{x} = 30.67$
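The exercise can be checked with a few lines of Python. This is only a sketch of the sample mean formula; the helper name `mean` is an illustrative choice.

```python
sample = [20, 26, 40, 36, 23, 42, 35, 24, 30]

def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

x_bar = mean(sample)
print(round(x_bar, 2))  # 30.67
```

The sum is 276 and n = 9, so the mean is 276/9, which rounds to 30.67.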
The Median
The median is the middle value, or the average of the middle two values, of a population or sample, when the data values are arranged from smallest to largest.
The median will be one of the data values if there is an odd number of values.
The median will be the average of two data values if there is an even number of values.
Find the median of 684, 764, 656, 702, 856, 1133, 1132, 1303.
Answer: Median = 810
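A minimal Python sketch of the median rule, covering both the odd and the even case; the function name `median` is chosen here for illustration.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: a data value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of two

data = [684, 764, 656, 702, 856, 1133, 1132, 1303]
print(median(data))  # 810.0
```

After sorting, the two middle values are 764 and 856, and their average is 810.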
The Mode
The mode is the value in the population or sample that occurs most frequently.
It is sometimes said to be the most typical case.
There may be no mode, one mode (unimodal), two modes (bimodal), or many modes (multimodal).
Find the mode of 18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10.
Find the mode of 104, 104, 104, 104, 104, 107, 109, 109, 109, 110, 109, 111, 112, 111, 109.
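Both exercises can be worked through with a frequency count. The sketch below assumes the common convention that a data set has no mode when every value occurs exactly once; the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency (empty list if no value repeats)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # assumption: no mode when nothing repeats
    return sorted(v for v, c in counts.items() if c == top)

first = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]
second = [104, 104, 104, 104, 104, 107, 109, 109, 109, 110,
          109, 111, 112, 111, 109]

print(modes(first))   # [10]
print(modes(second))  # [104, 109]
```

The first data set is unimodal (10 occurs three times); the second is bimodal, since 104 and 109 each occur five times.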
Properties of the Mean
Uses all data values.
The sample mean varies less from sample to sample than the sample median or mode.
Used in computing other statistics, such as the variance.
Unique, and usually not one of the data values.
Affected by extremely high or low values, called outliers.
Properties of the Median
Gives the midpoint.
Used when it is necessary to find out whether the data values fall into the upper half or lower half of the data set.
Affected less than the mean by extremely high or extremely low values.
Properties of the Mode
Used when the most typical case is desired.
Easiest to compute.
Not always unique, and may not exist.
Measures of Dispersion
Dispersion refers to the spread or variability in a data set.
Measures of dispersion include range, variance, standard deviation, etc.
The Range
The range is the difference between the highest and lowest values of a population or sample.
Two experimental brands of outdoor paint are tested to see how long each will last before fading. Six cans of each brand constitute a small population. The results (in months) are:
Brand A  Brand B
10       35
60       45
50       30
30       35
40       40
20       25
The population mean for both brands is the same.
Which brand would you buy?
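To compare the two brands numerically, the following Python sketch computes the mean and the range of each population; the helper names are illustrative only.

```python
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

def mean(values):
    return sum(values) / len(values)

def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

print(mean(brand_a), value_range(brand_a))  # 35.0 50
print(mean(brand_b), value_range(brand_b))  # 35.0 20
```

Both brands last 35 months on average, but Brand B has the smaller range, so its results are more consistent.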
The Variance
The variance is the average of the squares of the distance each value is from the mean.
The population variance is
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2.$$
The sample variance is
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$
This formula for $s^2$ makes a better estimator of $\sigma^2$ than if we had divided by $n$.
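The two formulas differ only in the divisor, which the Python sketch below makes explicit. The data are the six Brand A paint durations (10, 60, 50, 30, 40, 20 months) treated first as a population and then, purely for contrast, as if they were a sample; the function names are hypothetical.

```python
def population_variance(values):
    """Average squared distance from the population mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Squared distances from the sample mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

brand_a = [10, 60, 50, 30, 40, 20]
print(round(population_variance(brand_a), 2))  # 291.67
print(sample_variance(brand_a))                # 350.0
```

The squared deviations sum to 1750, so dividing by N = 6 gives about 291.67 and dividing by n - 1 = 5 gives 350.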
The Standard Deviation
The standard deviation is the square root of the variance.
The population standard deviation is $\sigma$.
The sample standard deviation is $s$.
The standard deviation is measured in the same unit as the measurements in the population or sample.
A large standard deviation indicates that the data values are far from the mean, whereas a small standard deviation indicates that they are clustered closely around the mean.
Alternate Formula for the Sample Standard Deviation
$$s = \sqrt{\frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]}$$
Saves time when calculating by hand.
Does not use the sample mean.
Find the sample standard deviation of 11.2, 11.9, 12.0, 12.8, 13.4, 14.3.
Answer: $s = 1.13$
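The shortcut formula can be checked against the definition of the sample standard deviation. The Python sketch below does both for the exercise data; the function names are illustrative, not from the course.

```python
import math

def sample_sd_shortcut(values):
    """Shortcut formula: uses only the sum of x and the sum of x squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

def sample_sd_definition(values):
    """Definition: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

data = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]
s = sample_sd_shortcut(data)
print(round(s, 2))  # 1.13
```

Here the sum of the values is 75.6 and the sum of squares is 958.94, so s is the square root of (958.94 - 75.6^2/6)/5, about 1.13. Both functions agree up to floating-point rounding.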
Measures of Position
Measures of position or location are used to locate the relative position of a data value in the data set.
These measures include:
z-score
quartiles
outlier
The z-score
A z-score or standard score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation.
The formula for the population (or sample) z-score is
$$z = \frac{x - \mu}{\sigma} \quad \text{or} \quad z = \frac{x - \bar{x}}{s}.$$
A z-score represents the number of standard deviations a value is above or below the mean.
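A short Python sketch of the z-score formula follows. The numbers (a mean of 50 and a standard deviation of 5) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical values for illustration only.
print(z_score(65, 50, 5))  # 3.0  (three standard deviations above the mean)
print(z_score(40, 50, 5))  # -2.0 (two standard deviations below the mean)
```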
The Quartiles
Quartiles separate the data set into 4 equal groups.
The first quartile (Q1) is the value that lies 25% of the way up from the smallest value.
The second quartile (Q2) is the value that lies 50% of the way up from the smallest value, and is equivalent to the median.
The third quartile (Q3) is the value that lies 75% of the way up from the smallest value.
The interquartile range (IQR) is the difference between the upper and lower quartiles, i.e., IQR = Q3 - Q1.
The Outlier
An outlier is an extremely high or low data value when compared with the rest of the data values.
A data value less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR can be considered an outlier.
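The outlier rule translates directly into two cut-off values. The Python sketch below takes Q1 = 47 and Q3 = 164 from the box plot example in these slides; the extra value 400 is hypothetical, added only to show a flagged point, and the function names are illustrative.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values lying below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

data = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
print(outlier_fences(47, 164))               # (-128.5, 339.5)
print(find_outliers(data, 47, 164))          # []
print(find_outliers(data + [400], 47, 164))  # [400]
```

With IQR = 164 - 47 = 117, the fences are 47 - 175.5 and 164 + 175.5, so none of the ten original values counts as an outlier.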
Part IV
Describing Data with Graphs
Graphs used for qualitative data
Bar charts
Pareto charts
Graphs used for quantitative data
Histograms
Frequency polygons
Stem and leaf plots
Box plots
Time series plots
Bar Chart
A bar chart is a chart with rectangular bars.
The bars can be plotted vertically or horizontally.
Example: Modes of Transportation to Work
The vertical scale shows frequencies.
The horizontal scale shows categories.
[Figure: bar chart "How people get to work"; vertical axis: People (0 to 30); horizontal axis: Car, Bus, Train, Walk]
Pareto Chart
A Pareto chart can be used to represent a categorical frequency distribution. It is a bar chart arranged in descending order of height from left to right.
[Figure: Pareto chart "How people get to work"; vertical axis: People (0 to 30); horizontal axis: Car, Train, Bus, Walk]
Histogram
The histogram is a graph that displays the quantitative data by using vertical bars of various heights to represent the frequencies of the classes.
The histogram is similar to the bar chart, but it is drawn without gaps between the bars.
The class boundaries are represented on the horizontal axis.
[Figure: histogram "Record High Temperatures"; horizontal axis: Temperature (°F), class boundaries 99.5 to 134.5 in steps of 5; vertical axis: Frequency (0 to 18)]
Frequency Polygon
The frequency polygon is a graph that displays the quantitative data by using lines that connect points plotted for the frequencies at the class midpoints.
The frequencies are represented by the heights of the points.
The class midpoints are represented on the horizontal axis.
[Figure: frequency polygon "Record High Temperatures"; horizontal axis: Temperature (°F), class midpoints 102 to 132 in steps of 5; vertical axis: Frequency (0 to 18)]
Stem and Leaf Plot
A stem and leaf plot is a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes.
In a stem and leaf plot, each data value is split into a stem and a leaf.
The leaf is usually the last digit of the data value and the other digits to the left of the leaf form the stem.
For example, the number 123 would be split as: stem 12, leaf 3.
The stems are listed on the left and the corresponding leaves on the right.
Construct a stem and leaf plot.
25 31 20 32 13
14 43  2 57 23
36 32 33 32 44
32 52 44 51 45

Stem | Leaves
0    | 2
1    | 3 4
2    | 0 3 5
3    | 1 2 2 2 2 3 6
4    | 3 4 4 5
5    | 1 2 7
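The plot can be generated by splitting each value into its tens part (stem) and its last digit (leaf). The Python sketch below assumes non-negative integers; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10)."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    # List every stem between the smallest and largest, even if it has no leaves.
    return {stem: groups.get(stem, []) for stem in range(min(groups), max(groups) + 1)}

data = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
        36, 32, 33, 32, 44, 32, 52, 44, 51, 45]

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The single-digit value 2 gets stem 0, which is why the plot starts with the row for stem 0.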
Box Plot
A box plot is a graph that presents information from a five-number summary.
The five-number summary is composed of the minimum, Q1, median, Q3 and maximum.
The five-number summary can be graphically represented by using a box plot.
To construct a box plot:
Find the five-number summary.
Draw a horizontal axis with a scale that includes the maximum and minimum data values.
Draw a box with vertical sides through Q1 and Q3, and draw a vertical line through the median.
Draw a line from the minimum data value to the left side of the box and a line from the maximum data value to the right side of the box.
Construct a box plot for the data:
89, 47, 164, 296, 30, 215, 138, 78, 48, 39
Five-number summary: 30, 47, 83.5, 164, 296
[Figure: box plot on a horizontal axis from 0 to 300; whiskers from 30 to 296, box from 47 to 164, median line at 83.5]
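The five-number summary above is consistent with taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data. The Python sketch below uses that method; leaving the overall median out of both halves when the count is odd is an assumed convention, and other quartile methods can give slightly different values. The function names are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for an odd count the middle value belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

data = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
summary = five_number_summary(data)
print(summary)  # (30, 47, 83.5, 164, 296)
```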
Time Series Plot
A time series plot represents data that occur over a specific period of time.
It is a line graph where the time is represented on the horizontal axis and the quantity that varies over time is represented on the vertical axis.
[Figure: time series plot "Temperature over a 9-Hour Period"; horizontal axis: Time; vertical axis: Temperature (°F), 35 to 60]
Part V
R and R Commander
R
A language and environment for statistical computing and graphics
Available as free software at http://www.r-project.org/
A command-driven statistical program
R Commander
A graphical user interface for R
Its interface includes menus, buttons and a few other elements