R Installation

R INSTALLATIONR is an open source software package for statistical data analysis

R Installation • R download: http://www.r-project.org/

• Germany, Stefan Drees Bonn: http://cran.r-mirror.de/• This is the main program!

• RStudio (Auxiliary program for editing R files): http://www.rstudio.com/ide/download/

• Install the programs (R should be there already).

Working with the R command prompt (without RStudio)

• Practical session

Working with RStudio • Start RStudio

• „File -> New -> R Script“ Lets you edit a R command or R script (= small programme = several consecutive commands)

Working with RStudio

New R files

The command prompt

Select• Files• Plots• Packages (for advanced analyses)• Help

Working with RStudio• Commands and programs can be stored in R files

• Execute one command line:• Ctrl+Enter or• Button „Run“

• Execute several lines:• Mark lines and use „Ctrl+Enter“ or „Run“ button

WHAT IS STATISTICS?

What is statistics?• Statistics is a means to connect empirical knowledge and

theory and is constituted as follows:

• Data representation (Empirics)

• Methods for description, analysis, and interpretation of data, in order to allow predictions, conclusions and decisions (Statistical Theory)

What is statistics?

I. Descriptive Statistics

II. Probability theory

III. Test theory

DESCRIPTIVE STATISTICS

Descriptive Statistics• Basic concepts:

• Population: Collection of objects for which a conclusion shall be made (can be human beings but also a collection of atoms when applied in physics)

• Sample: a representative part/sub-set of the population

• Random sample: elements of the population drawn randomly and independently of each other

• Example: „Mietspiegel“ (= statistics of rents) for the city of Bonn• Population: all rooms, flats etc. for rent in Bonn ( too many to investigate all)• Sample: selected part; all flats from Poppelsdorf• Random sample: Investigation of n = 100, 200,… random objects from Bonn

Descriptive statistics

quantitative

Observational objects

discretecontinuous

Attributes / traits

Values

qualitative

patients, blood samples, DNA samples, houses, atoms

blood pressure, weight, age, blood group, number of siblings,

marital status, rent

Blood group, marital status

Number of siblings

blood pressure, weight, age, rent

Descriptive statistics• Scaling:

• Nominal scale: attribute values that are not directly comparable (sex, subject of studies, country of origin) (qualitative)

• Ordinal scale: attribute values that have a „natural“ order (grades, font sizes: tiny-small-medium-large-huge)

• Interval scale: difference between attribute values is interpretable (temperature in °C) (quantitative)

• To be distinguished:• Discrete attributes: Attribute values can be counted• Continuous attributes: All real numbers, or at least all numbers from an interval, are

possible

Descriptive statistics• Frequencies:

• Absolute frequency ni:• Number of obersvations with attribute value i (counts)

• Relative frequency hi:• Portion of elements with attribute value i • To be computed as absolute frequency devided by total number of

objects N: ni / N• Relative frequencies lie between 0 and 1• Relative frequencies have to add up to 1 (<- can be used to check

computation)

AB0 blood group

1.00N = 50

absolute frequency ni

KölnBonn

relative frequency hi

tally sheet

Descriptive statistics• Frequencies:

• Cumulative frequency:

• Sum of all frequencies up to a given value i. • Denoted as for absolute frequencies and denoted asi for relative frequencies• Often used when values are subdivided into classes

• Classification:

• Arrangement of attribute values into disjoint groups, so called „classes“

• Classes are disjoint, i.e. non-overlapping, and neighbouring intervals of attribute values, which are defined by a lower and an upper bound. Neighbouring values implies that each value belongs to a class and does not lie outiside (completeness of the classification).

Descriptive statisticsheight

classification: • complete

• disjoint (each value belongs to only one class)

class limits:

• (160; 170] contains all values, that are > 160 but 170.

150 160 170 180 190 200

150 200height [cm]

height [cm]

( ]( ] ( ]( ]( ] (

Descriptive statisticsheight [cm]

relative hi

frequency

absolute ni

Cumulative frequency

relative Hi

absolute Ni

Class number i

(190; 200]

(180; 190]

(170; 180]

(160; 170]

(150; 160]

Class limits (ai-1; ai] Tally sheet

1kki nN

DESCRIPTIVE STATISTICSGraphical representation

Descriptive statistics• Pie chart (R function: pie() )

• Shows absolute frequencies• Example: blood groups

Descriptive statistics• Bar chart (R function: barplot() )

• Shows relative frequencies• Example: blood groups

Descriptive statistics• Representation of cumulative frequencies with empirical

distribution function F

• Discrete trait: Number of Children

Number of children

N = 50

relative hi

absoluteni

Frequencies

Cumulative frequencies

absoluteNi

relativeHi

Tally sheet

0 1 2 3 4 >4

Number of children

0 1 2 3 4 >4

Bar chart

F: Empirical distribution function

Since the attribute is quantitative discrete, we obtain a step

function

Descriptive statistics• Histogramms (R function: hist() )

• Construction:• Data is subdevided into classes

• Surface area of columns is proportional to the respective frequencies

• Columns are neighbouring since classes are neighbouring

Descriptive statisticsExample: Height [cm]

150 160 170 180 190 200height [cm]

hiHistogram

••

150 160 170 180 190 200height [cm]

empirical density function f

empirical distribution function F

(for continuous trait)

150 160 170 180 190 200height [cm]

empirical density function f

••

150 160 170 180 190 200height [cm]

empirical distribution function F

Descriptive Statistics• Note: Slides 23 and 26 both show empirical distribution

functions. In the first case, we obtain a step function since the trait under investigation is discrete.

DESCRIPTIVE STATISTICSMeasures of central tendency, dispersion and spread

Descriptive statistics• Measures of central tendency:

• A number to characterize the „center“ of the data

• Most important:• Mean• Median

Descriptive statistics• Median (R function: median() )

• Sample: Order according to: Ordered sample: • Median

sample ranks

x(8)=19

x(1)=3

x(2)=4

x(3)=5

x(4)=6

x(5)=7

x(6)=8

x(7)=9

sample ranks

x(1)=3

x(2)=4

x(3)=5

x(4)=6

x(5)=8

x(6)=9

x(7)=19

n = 7 odd: n = 8 even:

Descriptive statistics• Mean (R function: mean() )

• Sample: • Sample size: n

• Mean

Descriptive statistics• Comparison of median and

mean:• Both samples have median 2500• and are the mean values• Mean can strongly be influenced by

a single value

Þ Median is more robust against extreme values („outliers“)

Nevertheless, the mean is more often used in practice since it has other desirable properties (see later).

i ordered and 1 2000 2000 1500

2 5000 15000 2000

3 4000 4000 2500

4 1500 1500 4000

5 2500 2500 5000 / 15000

xx outlier

How to treat outliers?

Yes!2) Check value and correct

No!1) Discard

• Measure the amount of variation of the data!

sample A

sample B

The mean (or median) is not sufficent to describe a sample

Descriptive statistics• Measures of dispersion and spread:

• Numbers to characterize the amount variation around the center (= mean)

• Most important:• Minimum, maximum, range (dispersion)• Empirical variance (spread)• Empirical standard deviation (spread)

n=7n=7

samplex(1)=3

x(2)=4

x(3)=5

x(4)=6

x(5)=8

x(6)=9

x(7)=19

minimum: min = x(1)

maximum: max = x(n)

range: R = x(n) – x(1)

• range:

Descriptive statistics• Variance (R function: var() ):

• A measure to express the spread around the center (mean) by a single value

• The squared deviation of each attribute value from the mean is considered.

• Formula for the empirical variance from a sample of elements:

• The empircal standard deviation is just the square root of the variance, . (R function: sd() ).

Why devide by instead of ?

Example:

532702751004

x4 =53 is not free,but given by other values when the mean is known.

s2 has (n-1) degrees of freedom (f)

2 xxf1s

Data • If you have data you want to analyse, please bring it

along!

R Installation

Documents

Transcript of R Installation

TELESCOPIC LIFTERS Installation & User Manual - … · TELESCOPIC LIFTERS Installation & User Manual LW 330. INSTALLATION & USER MANUAL LIFTERS LW 330 LW 135 R LW 142 R LW 150 R ...

R INSTALLATION MANUAL VHF RADIOTELEPHONE FM-8500

ADM110+ +SAP+R 3+Enterprise+Installation

INSTALLATION, OPERATION & MAINTENANCE MANUAL KZN (R ... · INSTALLATION, OPERATION & MAINTENANCE MANUAL KZN (R) SERIES . HEAVY DUTY AGITATOR . TOP DISCHARGE . Electric Submersible

IntelliFlow Retrofit Kit KA2-R Installation Instructions

Installation Instructions - R&G Adventure Bars Triumph ...files.twistedthrottle.com.s3.amazonaws.com/Installation... · R&G Racing Unit 1, Shelley’s Lane, East Worldham, Alton,

Operating and installation instructions R-IMC-BUS ...

Installation Oracle r Hel 5

IBM SPSS Statistics - Essentials for R: Installation ...gruener.userpage.fu-berlin.de/Essentials for R Installation... · Contents IBM SPSS Statistics - Essentials for R: Installation

INSTALLATION INSTRUCTIONS R-410A Single Package Rooftop ...

Kurzweil K2000 Compact Flash R/W Installation

R&Mfreenet GOF installation cable / overview

Warmboard-R installation guide

Installation g Windows 2008 r 2

PIPE-R SYSTEM Installation MANUALpipe-r.com/PIPE-R Installation Guide.pdf · PIPE-R™ reservoir system installation guide is only a tool to show the contractor the ... subsurface

Apexi Installation Manual: AVC-R Actuator Valve Control Type-R

INSTALLATION INSTRUCTIONS R 410A Single Package …

INSTALLATION AND MAINTENANCE INSTRUCTIONS r q …

INSTALLATION, OPERATION & MAINTENANCE MANUAL KZE (R ...

Installation instructions Roxtec R EMC Products …...Installation instructions Roxtec R EMC Installation 1 Clean the sleeve to ensure good electric conductivity. 2 Push the front