R Installation
description
Transcript of R Installation
![Page 1: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/1.jpg)
R INSTALLATIONR is an open source software package for statistical data analysis
![Page 2: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/2.jpg)
R Installation • R download: http://www.r-project.org/
• Germany, Stefan Drees Bonn: http://cran.r-mirror.de/• This is the main program!
• RStudio (Auxiliary program for editing R files): http://www.rstudio.com/ide/download/
• Install the programs (R should be there already).
![Page 3: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/3.jpg)
Working with the R command prompt (without RStudio)
• Practical session
![Page 4: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/4.jpg)
Working with RStudio • Start RStudio
• „File -> New -> R Script“ Lets you edit a R command or R script (= small programme = several consecutive commands)
![Page 5: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/5.jpg)
Working with RStudio
New R files
The command prompt
Select• Files• Plots• Packages (for advanced analyses)• Help
![Page 6: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/6.jpg)
Working with RStudio• Commands and programs can be stored in R files
• Execute one command line:• Ctrl+Enter or• Button „Run“
• Execute several lines:• Mark lines and use „Ctrl+Enter“ or „Run“ button
![Page 7: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/7.jpg)
WHAT IS STATISTICS?
![Page 8: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/8.jpg)
What is statistics?• Statistics is a means to connect empirical knowledge and
theory and is constituted as follows:
• Data representation (Empirics)
• Methods for description, analysis, and interpretation of data, in order to allow predictions, conclusions and decisions (Statistical Theory)
![Page 9: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/9.jpg)
What is statistics?
I. Descriptive Statistics
II. Probability theory
III. Test theory
![Page 10: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/10.jpg)
DESCRIPTIVE STATISTICS
![Page 11: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/11.jpg)
Descriptive Statistics• Basic concepts:
• Population: Collection of objects for which a conclusion shall be made (can be human beings but also a collection of atoms when applied in physics)
• Sample: a representative part/sub-set of the population
• Random sample: elements of the population drawn randomly and independently of each other
• Example: „Mietspiegel“ (= statistics of rents) for the city of Bonn• Population: all rooms, flats etc. for rent in Bonn ( too many to investigate all)• Sample: selected part; all flats from Poppelsdorf• Random sample: Investigation of n = 100, 200,… random objects from Bonn
![Page 12: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/12.jpg)
Descriptive statistics
quantitative
Observational objects
discretecontinuous
Attributes / traits
Values
qualitative
patients, blood samples, DNA samples, houses, atoms
blood pressure, weight, age, blood group, number of siblings,
marital status, rent
Blood group, marital status
Number of siblings
blood pressure, weight, age, rent
![Page 13: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/13.jpg)
Descriptive statistics• Scaling:
• Nominal scale: attribute values that are not directly comparable (sex, subject of studies, country of origin) (qualitative)
• Ordinal scale: attribute values that have a „natural“ order (grades, font sizes: tiny-small-medium-large-huge)
• Interval scale: difference between attribute values is interpretable (temperature in °C) (quantitative)
• To be distinguished:• Discrete attributes: Attribute values can be counted• Continuous attributes: All real numbers, or at least all numbers from an interval, are
possible
![Page 14: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/14.jpg)
Descriptive statistics• Frequencies:
• Absolute frequency ni:• Number of obersvations with attribute value i (counts)
• Relative frequency hi:• Portion of elements with attribute value i • To be computed as absolute frequency devided by total number of
objects N: ni / N• Relative frequencies lie between 0 and 1• Relative frequencies have to add up to 1 (<- can be used to check
computation)
![Page 15: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/15.jpg)
Descriptive statistics
AB0 blood group
1.00N = 50
0
1
2
5
6
19
17
1.00
0.00
0.01
0.03
0.09
0.10
0.38
0.39
Köln
other
A2B
A1B
B
A2
A1
0
value
7
6
5
4
3
2
1
absolute frequency ni
200
0
2
6
18
20
76
78
KölnBonn
relative frequency hi
Bonn
0.00
0.02
0.04
0.10
0.12
0.38
0.34
Nnk
1ii
Nnh i
i
1h0 i
1hk
1ii
tally sheet
![Page 16: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/16.jpg)
Descriptive statistics• Frequencies:
• Cumulative frequency:
• Sum of all frequencies up to a given value i. • Denoted as for absolute frequencies and denoted asi for relative frequencies• Often used when values are subdivided into classes
• Classification:
• Arrangement of attribute values into disjoint groups, so called „classes“
• Classes are disjoint, i.e. non-overlapping, and neighbouring intervals of attribute values, which are defined by a lower and an upper bound. Neighbouring values implies that each value belongs to a class and does not lie outiside (completeness of the classification).
![Page 17: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/17.jpg)
Descriptive statisticsheight
classification: • complete
• disjoint (each value belongs to only one class)
class limits:
• (160; 170] contains all values, that are > 160 but 170.
150 160 170 180 190 200
150 200height [cm]
height [cm]
( ]( ] ( ]( ]( ] (
![Page 18: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/18.jpg)
Descriptive statisticsheight [cm]
relative hi
1,00
0.00
0.05
0.25
0.35
0.30
0.05
0.00
frequency
absolute ni
N=100
0
5
25
35
30
5
0
Cumulative frequency
relative Hi
1.00
1.00
0.95
0.70
0.35
0.05
0.00
absolute Ni
100
100
95
70
35
5
0
7
6
5
4
3
2
1
Class number i
> 200
(190; 200]
(180; 190]
(170; 180]
(160; 170]
(150; 160]
150
Class limits (ai-1; ai] Tally sheet
i
1kki nN
NNH i
i
![Page 19: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/19.jpg)
DESCRIPTIVE STATISTICSGraphical representation
![Page 20: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/20.jpg)
Descriptive statistics• Pie chart (R function: pie() )
• Shows absolute frequencies• Example: blood groups
![Page 21: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/21.jpg)
Descriptive statistics• Bar chart (R function: barplot() )
• Shows relative frequencies• Example: blood groups
![Page 22: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/22.jpg)
Descriptive statistics• Representation of cumulative frequencies with empirical
distribution function F
• Discrete trait: Number of Children
1.00
0.96
0.90
0.80
0.50
0.10
>4
4
3
2
1
0
Number of children
6
5
4
3
2
1
N = 50
2
3
5
15
20
5
relative hi
absoluteni
Frequencies
50
48
45
40
25
5
Cumulative frequencies
absoluteNi
relativeHi
0.04
0.06
0.10
0.30
0.40
0.10
1.00
Tally sheet
![Page 23: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/23.jpg)
Descriptive statistics
H
0.0
0.2
0.4
0.6
0.8
1.0
0 1 2 3 4 >4
i
Number of children
hi
hi
hi
0.0
0.2
0.4
0.6
0.8
1.0
0 1 2 3 4 >4
Bar chart
F
F: Empirical distribution function
Since the attribute is quantitative discrete, we obtain a step
function
![Page 24: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/24.jpg)
Descriptive statistics• Histogramms (R function: hist() )
• Construction:• Data is subdevided into classes
• Surface area of columns is proportional to the respective frequencies
• Columns are neighbouring since classes are neighbouring
![Page 25: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/25.jpg)
Descriptive statisticsExample: Height [cm]
0
0,2
0,4
0,6
0,8
1
150 160 170 180 190 200height [cm]
hiHistogram
![Page 26: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/26.jpg)
Descriptive statistics
••
•
•
••
0
0,2
0,4
0,6
0,8
150 160 170 180 190 200height [cm]
F
0
0,2
0,4
0,6
0,8
150 160 170 180 190 200height [cm]
f
empirical density function f
empirical distribution function F
(for continuous trait)
![Page 27: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/27.jpg)
Descriptive statistics
0
0,2
0,4
0,6
0,8
150 160 170 180 190 200height [cm]
f
empirical density function f
••
•
•
••
0
0,2
0,4
0,6
0,8
150 160 170 180 190 200height [cm]
F
empirical distribution function F
Ff
fF
hi
hi
![Page 28: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/28.jpg)
Descriptive Statistics• Note: Slides 23 and 26 both show empirical distribution
functions. In the first case, we obtain a step function since the trait under investigation is discrete.
![Page 29: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/29.jpg)
DESCRIPTIVE STATISTICSMeasures of central tendency, dispersion and spread
![Page 30: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/30.jpg)
Descriptive statistics• Measures of central tendency:
• A number to characterize the „center“ of the data
• Most important:• Mean• Median
![Page 31: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/31.jpg)
Descriptive statistics• Median (R function: median() )
• Sample: Order according to: Ordered sample: • Median
x8=7
x7=6
x6=4
x5=19
x4=8
x3=3
x2 =9
x1=5
sample ranks
x(8)=19
x(1)=3
x(2)=4
x(3)=5
x(4)=6
x(5)=7
x(6)=8
x(7)=9
x7=6
x6=4
x5=19
x4=8
x3=3
x2 =9
x1=5
sample ranks
x(1)=3
x(2)=4
x(3)=5
x(4)=6
x(5)=8
x(6)=9
x(7)=19
n = 7 odd: n = 8 even:
![Page 32: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/32.jpg)
Descriptive statistics• Mean (R function: mean() )
• Sample: • Sample size: n
• Mean
![Page 33: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/33.jpg)
Descriptive statistics• Comparison of median and
mean:• Both samples have median 2500• and are the mean values• Mean can strongly be influenced by
a single value
Þ Median is more robust against extreme values („outliers“)
Nevertheless, the mean is more often used in practice since it has other desirable properties (see later).
i ordered and 1 2000 2000 1500
2 5000 15000 2000
3 4000 4000 2500
4 1500 1500 4000
5 2500 2500 5000 / 15000
![Page 34: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/34.jpg)
Descriptive statistics
xx outlier
How to treat outliers?
Yes!2) Check value and correct
No!1) Discard
![Page 35: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/35.jpg)
Descriptive statistics
• Measure the amount of variation of the data!
x
sample A
An
21
nx
xx
A
Ax
Bx
sample B
Bn
21
nx
xx
B
x
The mean (or median) is not sufficent to describe a sample
![Page 36: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/36.jpg)
Descriptive statistics• Measures of dispersion and spread:
• Numbers to characterize the amount variation around the center (= mean)
• Most important:• Minimum, maximum, range (dispersion)• Empirical variance (spread)• Empirical standard deviation (spread)
![Page 37: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/37.jpg)
Descriptive statistics
ranks
n=7n=7
x7=6
x6=4
x5=19
x4=8
x3=3
x2 =9
x1=5
samplex(1)=3
x(2)=4
x(3)=5
x(4)=6
x(5)=8
x(6)=9
x(7)=19
minimum: min = x(1)
maximum: max = x(n)
range: R = x(n) – x(1)
• range:
![Page 38: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/38.jpg)
Descriptive statistics• Variance (R function: var() ):
• A measure to express the spread around the center (mean) by a single value
• The squared deviation of each attribute value from the mean is considered.
• Formula for the empirical variance from a sample of elements:
• The empircal standard deviation is just the square root of the variance, . (R function: sd() ).
![Page 39: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/39.jpg)
Why devide by instead of ?
=
=n
=x4
=x3
=x2
=x1
xxnx
xx
n
2
1
Example:
100
4
270
2
75
532702751004
x4 =53 is not free,but given by other values when the mean is known.
s2 has (n-1) degrees of freedom (f)
n
1i
2i
2 xxf1s
![Page 40: R Installation](https://reader035.fdocuments.in/reader035/viewer/2022062815/568168f5550346895de00165/html5/thumbnails/40.jpg)
Data • If you have data you want to analyse, please bring it
along!