Descriptive Statistics with MATLAB/Octave · 2020. 1. 8.
Lab 1
Intro MATLAB
Data descrip.
Graph. Repr.
Position measures
Dispersion measures
Descrip. of 2 variables
Descriptive Statistics¹ with MATLAB/Octave
Telecommunications Engineering
Department of Statistics
Universidad Carlos III de Madrid
¹ Review the files Descriptive 1 and Descriptive 2 (in Spanish).
Summary
1 Introduction to MATLAB/Octave’s environment
2 Description of a data set
3 Graphical Representation
4 Position measures
5 Dispersion measures
6 Joint description of two variables
Main window in MATLAB
Command Window
Workspace
Command History
Current Directory
Working directory (Current Directory).
Specify the directory: C:/.../
for example: C:/lab1/
Here you can save the *.m function files you create.
Working space (Workspace).
Here you can view all variables and functions loaded.
Window to type commands (Command Window).
Console where you type the commands to execute.
Command History
Here you can see all commands typed in the Command Window.
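The information shown in these panels can also be reached from the Command Window. A minimal sketch, assuming tempdir as a stand-in for the lab folder C:/lab1/ and an illustrative variable x that is not part of the lab:

```matlab
% Sketch: command-line counterparts of the panels described above.
% tempdir is only a stand-in for the lab folder (assumption).
start_dir = pwd;            % current working directory (Current Directory panel)
cd(tempdir);                % change the working directory
x = [1 2 3];                % illustrative variable, now part of the workspace
whos                        % list the variables in the workspace
has_x = exist('x', 'var');  % 1 when x is a workspace variable
cd(start_dir);              % go back to the directory we started from
```

In the lab itself, cd('C:/lab1/') would take the place of cd(tempdir).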
Useful files for Lab 1
Download the following files from the webpage: internet.mat, AlumnosIndustriales.mat and histg.m.
Example: failures
The following table summarizes the number of failures in 40 electric power stations (the table shown is incomplete):
Station   No. of failures
1         1
2         3
3         2
4         2
...       ...
39        1
40        2
Example: failures
In MATLAB, we enter the data in the Command Window:
>> averias=[ 1; 3; 2; 2; 0; 3; 1; 0; 2; 1; 0; 1; 0; 3; 0; 2;
0; 2; 1; 1; 1; 0; 2; 3; 0; 0; 0; 0; 1; 1; 4; 0;
3; 1; 0; 2; 1; 1; 1; 2 ]
We have created the column vector averias
averias =
1
3
2
2
...
Example: failures
The command tabulate calculates the absolute frequencies (Count) and the relative frequencies in % (Percent).
>> tabulate(averias)
Value Count Percent
0 13 32.50%
1 13 32.50%
2 8 20.00%
3 5 12.50%
4 1 2.50%
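The same table can be rebuilt with core functions only, which may help when tabulate is not installed. A minimal sketch, assuming the observed values are the consecutive integers 0 to 4 (as they are here); the variable names values, counts and percent are chosen for this illustration:

```matlab
% Sketch: rebuild the Value/Count/Percent table with core functions.
% Relies on the values being the integers 0..4 (true for this data set).
averias = [1 3 2 2 0 3 1 0 2 1 0 1 0 3 0 2 ...
           0 2 1 1 1 0 2 3 0 0 0 0 1 1 4 0 ...
           3 1 0 2 1 1 1 2]';
values  = unique(averias);                % distinct values, sorted
counts  = accumarray(averias + 1, 1);     % +1 because indices start at 1
percent = 100 * counts / numel(averias);  % relative frequencies in %
disp([values counts percent])
```

Each row of the displayed matrix corresponds to one row of the tabulate output above.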
Example: failures
To handle the table as a matrix, we store it in the variable tabla. Now we can access its elements easily.
>> tabla = tabulate(averias)
For instance, we can express the relative frequencies between 0 and 1, rather than in %.
>> tabla(:,3) = tabla(:,3)/100
tabla =
0 13.0000 0.3250
1.0000 13.0000 0.3250
2.0000 8.0000 0.2000
3.0000 5.0000 0.1250
4.0000 1.0000 0.0250
Example: failures
We can calculate cumulative frequencies with the command cumsum.
>> abs_acum = cumsum(tabla(:,2))
abs_acum =
13
26
34
39
40
>> rel_acum = cumsum(tabla(:,3))
rel_acum =
0.3250
0.6500
0.8500
0.9750
1.0000
Example: failures
The complete frequency table can be built by appending the columns of cumulative frequencies (abs_acum and rel_acum) to the previous table.
>> tabla=[ tabla abs_acum rel_acum ]
tabla =
0 13.0000 0.3250 13.0000 0.3250
1.0000 13.0000 0.3250 26.0000 0.6500
2.0000 8.0000 0.2000 34.0000 0.8500
3.0000 5.0000 0.1250 39.0000 0.9750
4.0000 1.0000 0.0250 40.0000 1.0000
Example: failures
How many stations have fewer than 2 failures?
No. of failures   Abs Freq   Rel Freq   Cum Abs Freq   Cum Rel Freq
0                 13         0.325      13             0.325
1                 13         0.325      26             0.650
2                 8          0.200      34             0.850
3                 5          0.125      39             0.975
4                 1          0.025      40             1
Example: failures
What percentage of stations have 3 or fewer failures?
No. of failures   Abs Freq   Rel Freq   Cum Abs Freq   Cum Rel Freq
0                 13         0.325      13             0.325
1                 13         0.325      26             0.650
2                 8          0.200      34             0.850
3                 5          0.125      39             0.975
4                 1          0.025      40             1
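Both questions can be read off the cumulative columns (26 stations, and 0.975, that is 97.5%), and they can also be checked directly on the raw data. A minimal sketch; the variable names n_less_than_2 and pct_at_most_3 are chosen for this illustration:

```matlab
% Sketch: answer the two questions with logical comparisons on averias.
averias = [1 3 2 2 0 3 1 0 2 1 0 1 0 3 0 2 ...
           0 2 1 1 1 0 2 3 0 0 0 0 1 1 4 0 ...
           3 1 0 2 1 1 1 2]';
n_less_than_2 = sum(averias < 2);          % stations with 0 or 1 failures
pct_at_most_3 = 100 * mean(averias <= 3);  % percentage with 3 or fewer failures
fprintf('%d stations have fewer than 2 failures\n', n_less_than_2);
fprintf('%.1f%% of stations have 3 or fewer failures\n', pct_at_most_3);
```

A comparison such as averias < 2 returns a vector of 0s and 1s, so its sum counts the stations and its mean gives the proportion.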
Graphical Representation
Quantitative variables
The appropriate graphical representation depends on whether the variable is discrete or continuous.
The plots can be drawn with absolute, relative or cumulative frequencies.
Barplot
Pie chart or pictogram
Histogram
Graphical Representation
Quantitative variables
Example: power station failures
Barplot:
In MATLAB we use the command bar
>> bar(tabla(:,2))
We have obtained the barplot of absolute frequencies. The horizontal axis shows the row index of tabla (1 to 5), not the number of failures (0 to 4).
[Figure: barplot of the absolute frequencies.]
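bar also accepts the horizontal positions as a first argument, so the bars can sit on the number of failures rather than on the row index. A minimal sketch, with values and counts copied from the frequency table; the axis label texts and the hidden figure are choices made for this illustration:

```matlab
% Sketch: barplot with the number of failures (0..4) on the horizontal axis.
values = (0:4)';              % first column of the frequency table
counts = [13; 13; 8; 5; 1];   % second column (absolute frequencies)
fig = figure('visible', 'off');   % hidden figure; use figure() to show the window
bar(values, counts);
xlabel('Number of failures');
ylabel('Absolute frequency');
```

With the matrix tabla in the workspace, the equivalent call would be bar(tabla(:,1), tabla(:,2)).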
Graphical Representation
Quantitative variables
Example: power station failures
Pie chart or pictogram:
In MATLAB, we use the command pie
>> pie(tabla(:,2))
[Figure: pie chart with slices labelled 33%, 33%, 20%, 13% and 3%.]
Graphical Representation
Quantitative variables
Example: student heights
Histogram:
The variable alturas contains the heights (in cm) of 28 students in a classroom.
alturas = [ 156; 159; 153; 157; 162; 156; 162;
153; 158; 152; 153; 159; 154; 153;
150; 162; 163; 160; 162; 156; 164;
168; 152; 151; 159; 147; 165; 149 ]
Command hist
>> hist(alturas)
[Figure: histogram of alturas.]
We can choose the number of classes or intervals
>> hist(alturas, 5) % 5 classes or intervals
[Figure: histogram of alturas with 5 classes.]
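When hist is called with output arguments it returns the class counts and class centers instead of drawing the plot, which makes the effect of the number of classes easy to inspect. A minimal sketch; the output names n and centers are chosen for this illustration:

```matlab
% Sketch: class counts and centers behind the 5-class histogram.
alturas = [156 159 153 157 162 156 162 ...
           153 158 152 153 159 154 153 ...
           150 162 163 160 162 156 164 ...
           168 152 151 159 147 165 149]';
[n, centers] = hist(alturas, 5);   % with outputs, no figure is drawn
disp([centers(:) n(:)])            % one row per class: center, count
```

The classes have equal width, (168 - 147)/5 = 4.2 cm, and the counts add up to the 28 students.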
Suppose a new student, 2.10 m tall, joins the class.
We can create a new variable alturas2
>> alturas2 = [alturas; 210];
>> hist(alturas2)
[Figure: histogram of alturas2.]
The new student is an outlier
Diagram for cumulative frequencies
Example: power station failures
Now we consider absolute and cumulative absolute frequencies.
From tabla, we take columns 2 and 4:
>> bar([tabla(:,2) tabla(:,4)])
[Figure: grouped barplot; legend: absolute frequency, cumulative absolute frequency.]
Position measures
Mean, median
Example: student heights
Let’s go back to the heights data, using alturas2 (which includes the outlier).
The mean, $\bar{x}$, is calculated with the command mean:
>> mean(alturas2)
ans =
158.7931
And the median, $x_{med}$, with the command median:
>> median(alturas2)
ans =
157
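Comparing both measures with and without the 210 cm student shows how each one reacts to the outlier. A minimal sketch that reuses the data defined earlier:

```matlab
% Sketch: effect of the outlier on the mean and on the median.
alturas = [156 159 153 157 162 156 162 ...
           153 158 152 153 159 154 153 ...
           150 162 163 160 162 156 164 ...
           168 152 151 159 147 165 149]';
alturas2 = [alturas; 210];   % same data plus the 2.10 m student
fprintf('without outlier: mean = %.4f, median = %.1f\n', ...
        mean(alturas), median(alturas));
fprintf('with outlier:    mean = %.4f, median = %.1f\n', ...
        mean(alturas2), median(alturas2));
```

The mean moves by almost 2 cm when the outlier is added, while the median moves by only half a centimetre.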
Dispersion measures
Variance, standard deviation and range
They measure the “dispersion” of the observations with respect to a position measure.
Variance:
$$\mathrm{VAR}_x = S_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
Standard deviation:
$$\mathrm{DT}_x = S_x = \sqrt{S_x^2}$$
Range or amplitude:
$$\mathrm{Range}_x = \max(x_i) - \min(x_i)$$
In MATLAB
Command var
>> var(x)
Command std
>> std(x)
Command range
>> range(x)
Or also by means of commands max and min: max(x)-min(x).
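Note that var(x) and std(x) divide by n - 1 by default, whereas the variance defined earlier divides by n; passing 1 as the second argument selects the 1/n version. A minimal sketch with a small made-up vector x (illustrative data, not from the lab):

```matlab
% Sketch: 1/n versus 1/(n-1) normalization in var and std.
x = [2 4 4 4 5 5 7 9]';                  % illustrative data (assumption)
n = numel(x);
s2_formula = sum((x - mean(x)).^2) / n;  % variance with the 1/n factor
s2_n       = var(x, 1);                  % same value: second argument 1 means 1/n
s2_default = var(x);                     % default: divides by n - 1
s_n        = std(x, 1);                  % standard deviation with 1/n
rango      = max(x) - min(x);            % range without the range command
fprintf('1/n: %.4f   1/(n-1): %.4f\n', s2_n, s2_default);
```

For large samples the two versions are close, but for small n the difference is noticeable.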
Example: height/weight
We have measured the height (X, in mm) and the weight (Y, in g) of a sample of 117 bolts. The absolute frequencies $n_{ij}$ are presented in the following table:
X / Y     40-60   60-80   80-100   100-200
140-160   4       0       0        0
160-180   14      60      2        0
180-200   0       20      16       1
>> tornillos = [ 4 0 0 0
14 60 2 0
0 20 16 1 ]
We can calculate the relative frequency table by dividing each count by the sample size:
$$f_{ij} = \frac{n_{ij}}{n}$$
>> frec_relat = tornillos/117
X / Y     40-60    60-80    80-100   100-200
140-160   0.0342   0        0        0
160-180   0.1197   0.5128   0.0171   0
180-200   0        0.1709   0.1368   0.0085
We calculate the marginal distributions by adding up by columns (1) and by rows (2):
>> sum(tornillos,1)
ans =
18 80 18 1
>> sum(tornillos,2)
ans =
4
76
37
X / Y     40-60   60-80   80-100   100-200   Total
140-160   4       0       0        0         4
160-180   14      60      2        0         76
180-200   0       20      16       1         37
Total     18      80      18       1         117
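The relative frequencies and their marginals can be obtained in the same way, computing n from the table instead of typing 117. A minimal sketch; the names marg_X and marg_Y are chosen for this illustration:

```matlab
% Sketch: relative frequencies and marginal distributions of the bolts table.
tornillos = [ 4  0  0 0
             14 60  2 0
              0 20 16 1];
n = sum(tornillos(:));          % total number of bolts
frec_relat = tornillos / n;     % joint relative frequencies f_ij
marg_Y = sum(frec_relat, 1);    % marginal of Y (weight): sums by columns
marg_X = sum(frec_relat, 2);    % marginal of X (height): sums by rows
disp(marg_Y), disp(marg_X)
```

Each marginal distribution adds up to 1, which is a quick check that no cell was mistyped.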
Example: internet
Two internet service companies surveyed 95 households with an internet connection. The measured variables were:
MB: megabytes downloaded per day.
Time connection: average daily connection time, in hours.
ISP: Internet Service Provider (provider 1 or 2).
Moment: moment of connection (morning = 1, evening = 2, night = 3).
The data are stored in the file internet.mat.
Example: internet
The data table (incomplete) is shown below:
MB    Time connection   ISP   Moment
163   1.1               1     1
170   3.5               1     1
169   3                 1     3
173   2                 1     2
160   1.1               1     1
...   ...               ...   ...
183   8.1               2     1
175   5.6               2     3
173   3                 2     1
Example: internet
To import the file in MATLAB:
File → Open or File → Import data
The workspace now contains the matrix internet.
Name the variables:
>> MB = internet(:,1);
>> tiempo_conexion = internet(:,2);
>> ISP = internet(:,3);
>> momento = internet(:,4);
Example: internet
The command crosstab² returns the counts of two vectors in a cross table. For example, for the qualitative variables momento and ISP, it creates the following matrix of dimensions 3 × 2, where the rows represent the categories of the variable momento (1 = morning, 2 = evening and 3 = night) and the columns the two internet service providers (ISP).
>> crosstab(momento,ISP)
ans =
3 10
11 25
18 28
MOMENT    ISP 1   ISP 2
morning   3       10
evening   11      25
night     18      28
² Implemented in MATLAB’s stats toolbox.
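Where the toolbox is not available, a cross table of integer-coded categories can be built with the core function accumarray. A minimal sketch, assuming only the eight rows displayed in the incomplete data table (not the full 95-household file), so its counts differ from the output above; the names datos and tabla_cruzada are chosen for this illustration:

```matlab
% Sketch: cross table with accumarray, using only the eight rows shown
% in the incomplete data table (the full internet.mat is not reproduced).
datos = [163 1.1 1 1
         170 3.5 1 1
         169 3.0 1 3
         173 2.0 1 2
         160 1.1 1 1
         183 8.1 2 1
         175 5.6 2 3
         173 3.0 2 1];
ISP     = datos(:, 3);
momento = datos(:, 4);
% Each row [momento ISP] addresses one cell; adding 1 per row counts it.
tabla_cruzada = accumarray([momento ISP], 1, [3 2]);
disp(tabla_cruzada)
```

The third argument fixes the size at 3 × 2 so that empty categories still appear as zeros.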
Example: Internet
Histogram by groups
The file histg.m allows us to represent a histogram by groups.
Save the file in your working directory C:/lab1/
>> histg(MB,ISP)
[Figure: histogram of MB by group; legend: ISP 1, ISP 2.]
Example: Internet
Group analysis
The command grpstats allows us to calculate statistics by groups.
grpstats(X,GROUP) calculates the mean by groups.
[MEANS,SEM,COUNTS,GNAME] = grpstats(X,GROUP) calculates the means (MEANS), the standard errors of the mean (SEM), the number of elements in each group (COUNTS), and the name of each group (GNAME).
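The quantities returned by grpstats can be reproduced with core functions, which shows what each output means. A minimal sketch, assuming integer group codes and using only the eight rows displayed in the incomplete data table, so the numbers differ from those of the full file:

```matlab
% Sketch: group counts, means and standard errors with accumarray.
% Data: the eight rows shown in the incomplete table, not the full file.
MB  = [163 170 169 173 160 183 175 173]';
ISP = [1 1 1 1 1 2 2 2]';
COUNTS = accumarray(ISP, 1);              % number of elements per group
MEANS  = accumarray(ISP, MB) ./ COUNTS;   % group sums divided by group sizes
STDS   = accumarray(ISP, MB, [], @std);   % standard deviation per group
SEM    = STDS ./ sqrt(COUNTS);            % standard error of the mean
disp([COUNTS MEANS STDS SEM])
```

Here std uses its default 1/(n-1) normalization within each group.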
Example: Internet
Group analysis
>> [MEANS,SEM,COUNTS,GNAME] = grpstats(MB,ISP)
MEANS =
165.3125
179.3492
SEM =
0.7834
0.6345
COUNTS =
32
63
GNAME =
'1'
'2'
Example: Internet
Group analysis
The standard deviations by groups ($S_g$) can be obtained from SEM and COUNTS, using the formula:
$$\mathrm{SEM}_g = \frac{S_g}{\sqrt{\mathrm{COUNTS}_g}}$$
In recent versions of MATLAB (R2011a, R2013b, ...), the command grpstats has been updated so that we can obtain the standard deviations by groups directly.
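Solving the formula for $S_g$ gives $S_g = \mathrm{SEM}_g \sqrt{\mathrm{COUNTS}_g}$. A minimal sketch that applies it to the SEM and COUNTS values printed by grpstats(MB,ISP); the name S is chosen for this illustration:

```matlab
% Sketch: recover the group standard deviations from SEM and COUNTS.
SEM    = [0.7834; 0.6345];   % standard errors printed by grpstats(MB,ISP)
COUNTS = [32; 63];           % group sizes printed by grpstats(MB,ISP)
S = SEM .* sqrt(COUNTS);     % S_g = SEM_g * sqrt(COUNTS_g)
disp(S)
```

Because SEM is printed with four decimals, the recovered values agree with the exact standard deviations only up to rounding.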
Example: Internet
Group analysis
>> [MEANS,STDS,COUNTS,GNAME] = ...
grpstats(MB,ISP,{'mean','std','numel','gname'})
MEANS =
165.3125
179.3492
STDS =
4.4318
5.0359
COUNTS =
32
63
GNAME =
'1'
'2'
Example: Internet
Scatter plot
The function plot can draw a scatter plot of two quantitative variables, for instance MB and tiempo_conexion:
>> plot(MB,tiempo_conexion,'.')
[Figure: scatter plot of MB (horizontal axis) against tiempo_conexion (vertical axis).]
Is there a linear relationship between MB and tiempo_conexion? Any linear dependency?
Example: Internet
Covariance and correlation
The command cov calculates the covariance matrix:
>> cov(MB,tiempo_conexion)
ans =
67.6847 13.8172
13.8172 4.7746
$$\mathrm{COV}(X,Y) = \begin{pmatrix} S_x^2 & S_{x,y} \\ S_{x,y} & S_y^2 \end{pmatrix}$$
The command corrcoef calculates the matrix of correlation coefficients:
>> corrcoef(MB,tiempo_conexion)
ans =
1.0000 0.7686
0.7686 1.0000
$$\mathrm{CORR}(X,Y) = \begin{pmatrix} r_{x,x} & r_{x,y} \\ r_{x,y} & r_{y,y} \end{pmatrix}, \quad \text{where } r_{x,y} = \frac{S_{x,y}}{S_x S_y}.$$
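The correlation coefficient can be checked by hand from the covariance matrix printed above, applying $r_{x,y} = S_{x,y}/(S_x S_y)$. A minimal sketch; the names C, s and R are chosen for this illustration:

```matlab
% Sketch: correlation matrix computed from the printed covariance matrix.
C = [67.6847 13.8172
     13.8172  4.7746];   % output of cov(MB,tiempo_conexion)
s = sqrt(diag(C));       % standard deviations S_x and S_y
R = C ./ (s * s');       % divide each S_ij by S_i * S_j
disp(R)
```

cov divides by n - 1 by default, but that factor cancels in the ratio, so the correlation is the same with either normalization.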