Scientific reviewers:
Prof.univ.dr. Elena Druică
Universitatea din București
Prof.univ.dr. Tudorel Andrei
Academia de Studii Economice
Prof.univ.dr. Călin Vâlsan
Bishop’s University, Canada
William’s School of Business
© Editura Universității din București
Șos. Panduri nr. 90-92, 050663 București – ROMÂNIA
Tel./Fax: +40 214102384
E-mail: [email protected]
Internet: http://editura-unibuc.ro
Sales center:
Bd. Regina Elisabeta nr. 4-12,
030018 București – ROMÂNIA
Tel. +40 213053703
Typesetting: ADRIAN DUȘA
Cover: MARIUS JULA
CIP description of the National Library of Romania
R cu aplicații în statistică / Adrian Dușa, Bogdan Oancea,
Nicoleta Caragea, … - București : Editura Universității din București,
2015
Includes bibliography
Index
ISBN 978-606-16-0643-6
I. Dușa, Adrian
II. Oancea, Bogdan
III. Caragea, Nicoleta
004:311
F, M, M, F, M, F, F, M, F, M
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
$n$, $\frac{n+1}{2}$, $n$, $n$
$8 - 1 = 7$, $7 - 2 = 5$, $5 - 4 = 1$
$x_i - \bar{x}$, $i = 1, 2, \ldots, n$
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
$n$, $s^2$, $s$
$$s = \sqrt{s^2}$$
$Q_1$, $Q_2$, $Q_3$
$$AIQ = Q_3 - Q_1$$
$1488 - 1225 = 263$
$Q_3 + 1.5 \cdot AIQ = 1488 + 1.5 \cdot 263 = 1882.5$
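The formulas above (mean, sample variance with the $n - 1$ denominator, interquartile range, and the upper outlier limit $Q_3 + 1.5 \cdot AIQ$) can be sketched in a few lines. This is only an illustration: the function names and the sample data are hypothetical, and the quartile method chosen here is an assumption, since the text does not say which one is used.

```python
from statistics import mean, variance, quantiles


def describe(values):
    """Mean, sample variance (n - 1 denominator), quartiles and IQR."""
    # Assumption: "inclusive" quartiles; other quartile definitions exist.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "variance": variance(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": q3 - q1,
    }


def upper_fence(q1, q3, k=1.5):
    """Upper outlier limit Q3 + k * (Q3 - Q1)."""
    return q3 + k * (q3 - q1)


# Hypothetical data, used only to exercise the functions.
summary = describe([2, 4, 4, 5, 7, 8])

# Quartiles quoted in the text: Q1 = 1225, Q3 = 1488.
fence = upper_fence(1225, 1488)
```

With the quartiles from the text, `fence` reproduces the value 1882.5.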
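$$Y = f(X, e)$$
$$M(Y/X) = a_0 + a_1 X$$
$a_1$, $a_0$
$$Y = M(Y/X) + e$$
$$Y = a_0 + a_1 X + e$$
$Y$, $X$, $a_0$, $a_1$
$$\hat{Y} = \hat{a}_0 + \hat{a}_1 X$$
$X_t$, $Y_t$, $\hat{Y}_t$
$$u_t = Y_t - \hat{Y}_t, \quad t = 1, 2, \ldots, n$$
$$F(a_0, a_1) = \sum_{t=1}^{n} u_t^2 = \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2 = \sum_{t=1}^{n} (Y_t - a_0 - a_1 X_t)^2$$
$a_0$, $a_1$, $F(a_0, a_1)$
$$\sum_{t=1}^{n} Y_t = n a_0 + a_1 \sum_{t=1}^{n} X_t$$
$$\sum_{t=1}^{n} X_t Y_t = a_0 \sum_{t=1}^{n} X_t + a_1 \sum_{t=1}^{n} X_t^2$$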
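The two normal equations for $a_0$ and $a_1$ form a 2 x 2 linear system that can be solved in closed form. The sketch below does exactly that; the function name `fit_line` and the data are hypothetical and serve only as an illustration.

```python
def fit_line(x, y):
    """Solve the two normal equations of simple linear regression.

    sum(Y)   = n * a0 + a1 * sum(X)
    sum(X*Y) = a0 * sum(X) + a1 * sum(X^2)
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx  # determinant of the 2 x 2 system
    if det == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    a1 = (n * sxy - sx * sy) / det
    a0 = (sy - a1 * sx) / n
    return a0, a1


# Hypothetical data lying exactly on the line Y = 1 + 2 X.
a0, a1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```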
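$X_1$, $X_2$, $X_3$, $X_4$, $Y$
$$Y_t = a_0 + a_1 X_{1t} + e_t$$
$$\hat{Y}_t = -0.669 + 1.1245 \cdot X_{1t}$$
$R^2 = 0.6752$
10%
100 − 8.74%, 99.99%, $X_1$
$$Y_t = a_0 + a_1 X_{1t} + a_2 X_{2t} + e_t$$
$$\hat{Y}_t = \hat{a}_0 + \hat{a}_1 X_{1t} + \hat{a}_2 X_{2t}$$
$$Y_t = \hat{Y}_t + u_t$$
$$F(a_0, a_1, a_2) = u'u = \sum_{t=1}^{n} u_t^2 = \sum_{t=1}^{n} (Y_t - a_0 - a_1 X_{1t} - a_2 X_{2t})^2$$
$a_0$, $a_1$, $a_2$
$$\hat{Y}_t = 1.9836 + 0.4405 \cdot X_{1t} - 0.6387 \cdot X_{2t}$$
$R^2$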
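$X_{it}$, $Y_t$
$$\begin{cases}
Y_1 = a_0 + a_1 X_{11} + a_2 X_{21} + \cdots + a_k X_{k1} + e_1 \\
Y_2 = a_0 + a_1 X_{12} + a_2 X_{22} + \cdots + a_k X_{k2} + e_2 \\
Y_3 = a_0 + a_1 X_{13} + a_2 X_{23} + \cdots + a_k X_{k3} + e_3 \\
\quad\vdots \\
Y_t = a_0 + a_1 X_{1t} + a_2 X_{2t} + \cdots + a_k X_{kt} + e_t \\
\quad\vdots \\
Y_n = a_0 + a_1 X_{1n} + a_2 X_{2n} + \cdots + a_k X_{kn} + e_n
\end{cases}$$
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & X_{11} & X_{21} & \cdots & X_{k1} \\
1 & X_{12} & X_{22} & \cdots & X_{k2} \\
1 & X_{13} & X_{23} & \cdots & X_{k3} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{1n} & X_{2n} & \cdots & X_{kn}
\end{pmatrix}, \quad
A = \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_k \end{pmatrix}, \quad
e = \begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ \vdots \\ e_n \end{pmatrix}$$
$$Y = XA + e$$
$$\hat{A} = (X'X)^{-1} X'Y$$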
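$$Y_t = a_0 + a_1 X_{1t} + a_2 X_{2t} + a_3 X_{3t} + e_t$$
$$Y_t = a_0 + a_1 X_{1t} + a_2 X_{2t} + a_3 X_{3t} + a_4 X_{4t} + e_t$$
regresie4
$$\hat{Y}_t = 2.1128 - 3.8260 \cdot X_{1t} - 2.5528 \cdot X_{2t} + 3.7555 \cdot X_{3t} + 2.9481 \cdot X_{4t}$$
$X$, $Y$, $a_1$
$P(a_1 = 0)$
$a_0$, $a_0 = 0$
$H_0$, $H_1$
$$t_{\hat{a}_i} = \frac{\hat{a}_i}{s_{\hat{a}_i}}$$
$H_0$: $a_i = 0$; $X_i$, $Y$
$a_0$
$a_1$: 31.79%
$a_2$, $a_3$, $a_4$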
$R^2$; $Y$, $X$
$$R^2 = 1 - \frac{\sum_{t=1}^{n} u_t^2}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$$
$R^2 \cdot 100\%$
$$\bar{R}^2 = 1 - \frac{n - 1}{n - k - 1} (1 - R^2)$$
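The coefficient of determination and its adjusted version follow directly from the residuals, the sample size $n$ and the number of regressors $k$. A minimal sketch, with hypothetical function names and made-up numbers:

```python
def r_squared(y, fitted):
    """R^2 = 1 - sum(u_t^2) / sum((Y_t - mean(Y))^2)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (n - 1) / (n - k - 1) * (1 - R^2)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


# Hypothetical observed and fitted values.
r2 = r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5])

# Hypothetical case: R^2 = 0.75 with n = 11 observations and k = 2 regressors.
r2_adj = adjusted_r_squared(0.75, n=11, k=2)
```

The adjustment always lowers $R^2$ when $k \geq 1$ and $R^2 < 1$, which is why it is preferred when comparing models with different numbers of regressors.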
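$X_2$, $X_3$, $X_4$; $Y$, $X_1$, $X_2$
$$\mu_r = \frac{\sum_{t=1}^{n} u_t^r}{n}$$
$$JB = n \left[ \frac{1}{6} \cdot \frac{\mu_3^2}{\mu_2^3} + \frac{1}{24} \left( \frac{\mu_4}{\mu_2^2} - 3 \right)^2 \right] + n \left( \frac{3}{2} \cdot \frac{\mu_1^2}{\mu_2} - \frac{\mu_3 \cdot \mu_1}{\mu_2^2} \right)$$
$H_0$, $H_1$
$\chi^2$, $\chi^2_2(\alpha)$, $\alpha$
$R^2$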
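The Jarque-Bera statistic can be computed directly from the residual moments $\mu_r = \frac{1}{n}\sum_t u_t^r$. The sketch below follows the two-term form of the formula (including the term in $\mu_1$); the function names and the residual vector are hypothetical.

```python
def moment(u, r):
    """mu_r = sum(u_t ** r) / n, the r-th moment of the residuals."""
    return sum(v ** r for v in u) / len(u)


def jarque_bera(u):
    """Jarque-Bera statistic written with the moments mu_1 ... mu_4."""
    n = len(u)
    m1, m2, m3, m4 = (moment(u, r) for r in (1, 2, 3, 4))
    skew_kurt = m3 ** 2 / (6 * m2 ** 3) + (m4 / m2 ** 2 - 3) ** 2 / 24
    mean_term = 3 * m1 ** 2 / (2 * m2) - m3 * m1 / m2 ** 2
    return n * skew_kurt + n * mean_term


# Hypothetical residuals with mean zero.
jb = jarque_bera([-1, 1, -1, 1])
```

The result is compared with the critical value $\chi^2_2(\alpha)$; when the residuals have mean zero, the second term vanishes.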
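$$dw = \frac{\sum_{t=2}^{n} (u_t - u_{t-1})^2}{\sum_{t=1}^{n} u_t^2}$$
$d_L$, $d_U$, $k$, $n$
$H_0$: $d_U \leq dw \leq 4 - d_U$
$H_0$: $dw \leq d_L$, $dw \geq 4 - d_L$
$d_L \leq dw \leq d_U$, $4 - d_U \leq dw \leq 4 - d_L$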
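A short sketch of the Durbin-Watson statistic and the decision rule based on the bounds $d_L$ and $d_U$. It assumes the usual reading of the rule: $H_0$ (no first-order autocorrelation) is kept when $d_U \leq dw \leq 4 - d_U$, rejected when $dw \leq d_L$ or $dw \geq 4 - d_L$, and the test is inconclusive otherwise. The function names, residuals and bounds below are hypothetical; real bounds come from the Durbin-Watson tables for the given $k$ and $n$.

```python
def durbin_watson(u):
    """dw = sum_{t=2..n} (u_t - u_{t-1})^2 / sum_{t=1..n} u_t^2."""
    num = sum((u[t] - u[t - 1]) ** 2 for t in range(1, len(u)))
    den = sum(v * v for v in u)
    return num / den


def dw_decision(dw, d_l, d_u):
    """Classify dw using the tabulated bounds d_l < d_u."""
    if dw <= d_l or dw >= 4 - d_l:
        return "reject H0"
    if d_u <= dw <= 4 - d_u:
        return "keep H0"
    return "inconclusive"


# Hypothetical residuals that alternate in sign (negative autocorrelation).
dw = durbin_watson([1, -1, 1, -1])

# Hypothetical bounds, not taken from a real table.
decision = dw_decision(dw, d_l=1.0, d_u=1.5)
```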
$\pm$
$(X'X)$, $R^2$
$\lambda_{max}$, $\lambda_{min}$
$$CI_i = \sqrt{\frac{\lambda_{max}}{\lambda_i}}$$
$\lambda_i$
$X_1$, $X_2$
$y$, $x_i$, $x$, $p$
$n = 35$
$y = 1$, $x$
$$\Omega = \frac{p}{1 - p}$$
$p$, $1 - p$
$p = 0.6$, $1 - p = 0.4$, $\Omega = \frac{0.6}{0.4} = 1.5$
$p = 0.99$, $1 - p = 0.01$, $\Omega = \frac{0.99}{0.01} = 99$
$\Omega > 1$
$\Omega < 1$
$\Omega = 1$
$\Omega \in (0, +\infty)$
$\text{logit} \in (-\infty, +\infty)$
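$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x$$
$\beta_0$, $\beta_1$, $\text{logit}$, $p$
$$\text{logit}(p) = \beta_0 + \beta_1 x$$
$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1 x}$$
$e^{\ln(A)} = A$
$$\Omega = e^{\beta_0 + \beta_1 x}$$
$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
$(x, y)$: $p > 0.5$, $y = 1$
$\Omega > 1$, $\text{logit} > 0$
$p(x)$: S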
likelihood
$y$, $\text{logit}$
$\beta_0$, $\beta_1$: $\text{logit}(p) = \beta_0 + \beta_1 x$
$n = 35$
$y = 1$, $x$
$$\ln\left(\frac{p}{1 - p}\right) = 5.495958 - 0.004889 \times \text{Venit}$$
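$\beta_1$, $\text{logit}$, $x$, $e^{\beta_1}$
$x$, $x + 1$
$$\Omega(x) = e^{\beta_0 + \beta_1 x} = e^{\beta_0} \times e^{\beta_1 x}$$
$$\Omega(x + 1) = e^{\beta_0 + \beta_1 (x + 1)} = e^{\beta_0} \times e^{\beta_1 x} \times e^{\beta_1}$$
$$OR = \frac{\Omega(x + 1)}{\Omega(x)} = \frac{e^{\beta_0 + \beta_1 (x + 1)}}{e^{\beta_0 + \beta_1 x}} = \frac{e^{\beta_0} \times e^{\beta_1 x} \times e^{\beta_1}}{e^{\beta_0} \times e^{\beta_1 x}} = e^{\beta_1}$$
$e^{\beta_1} > 1$: $x$, $e^{\beta_1} = 1.5$
$e^{\beta_1} < 1$: $x$, $e^{\beta_1} = 0.5$
$e^{\beta_1} = 1$: $x$, $\beta_1 = 0$
$x$, $y$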
$$p(x) = \frac{e^{5.495958 - 0.004889 \times x}}{1 + e^{5.495958 - 0.004889 \times x}}$$
$\beta_0 = 5.495958$, $\beta_1 = -0.004889$
$e^{\beta_1} = 0.995123 < 1$
$x$
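$n = 30$
$x = 1$, $y = 1$: $\frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$
$x = 0$, $y = 1$: $\frac{e^{\beta_0}}{1 + e^{\beta_0}}$
$x = 1$, $y = 0$: $\frac{1}{1 + e^{\beta_0 + \beta_1 x}}$
$x = 0$, $y = 0$: $\frac{1}{1 + e^{\beta_0}}$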
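$$\ln\left(\frac{p}{1 - p}\right) = -19.57 + 21.60 \times \text{occupational status}$$
$\beta_1$, $x$
$e^{\beta_1} = e^{21.6} = 2.4 \times 10^9 > 1$
$y = 1$, $x$
$$p(x) = \frac{e^{5.495958 - 0.004889 \times 100}}{1 + e^{5.495958 - 0.004889 \times 100}} = 0.993354$$
$y = 1$, $x = 100$
$$p(x) = \frac{e^{-19.57 + 21.60}}{1 + e^{-19.57 + 21.60}}$$
$y = 1$, $x = 1$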
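The conversion between probability, odds and the fitted logit can be checked numerically. The sketch below plugs the coefficients quoted in the text ($\beta_0 = 5.495958$, $\beta_1 = -0.004889$) into the logistic formula; the function names are hypothetical.

```python
import math


def odds(p):
    """Omega = p / (1 - p)."""
    return p / (1 - p)


def logistic_probability(beta0, beta1, x):
    """p = e^(b0 + b1 x) / (1 + e^(b0 + b1 x)), written in a numerically stable form."""
    z = beta0 + beta1 * x
    return 1 / (1 + math.exp(-z))


beta0, beta1 = 5.495958, -0.004889

# Probability that y = 1 when x = 100.
p_100 = logistic_probability(beta0, beta1, 100)

# Odds ratio for a one-unit increase in x.
odds_ratio = math.exp(beta1)
```

Here `p_100` agrees with the value 0.993354 and `odds_ratio` with 0.995123: because the odds ratio is below 1, each additional unit of $x$ slightly lowers the odds of $y = 1$.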
$x$, $y$
likelihood, $y$, $x$
$[0, 1]$
$(-\infty, 0]$
$\beta_0$
$\chi^2$, $-2LL$
$$\chi^2 = -2LL_0 - (-2LL_M) = -2 \ln\left(\frac{L_0}{L_M}\right)$$
$-2LL_0$
$-2LL_M$
$-2LL$
$\chi^2$
$LL_0 = LL_M$
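$SST = SSR + SSE$
$-2LL_0$
$-2LL_M$
$$\ln\left(\frac{p}{1 - p}\right) = 5.495958 - 0.004889 \times \text{average monthly income}$$
$$SSE = \sum_i (y_i - \hat{y}_i)^2$$
$y_i$, $\hat{y}_i$
$$SST = \sum_i (y_i - \bar{y})^2$$
$y_i$, $\bar{y}$
$$SSR = \sum_i (\hat{y}_i - \bar{y})^2$$
$\hat{y}_i$, $\bar{y}$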
$\beta_1 = -0.004889$
$0.00370 < 0.01$
$e^{\beta_1} = 0.995123$
$\beta_1$: $[-0.009456652, -0.0238618]$
$\chi^2$
Pr(>Chi) = 6.453e-08 < 0.05
$-2LL$
29.22
$df$
$k$, $x_k$; $df = n - k - 1$; $n$
$k + 1$; $\chi^2$; $n - 2$
$R^2$
$$R^2 = 1 - \left[\frac{-2LL_0}{-2LL_M}\right]^{2/n}$$
$LL_0$, $LL_M$, $n$
$\chi^2$
$$R^2 = \frac{1 - \left[\frac{-2LL_0}{-2LL_M}\right]^{2/n}}{1 - (2LL_0)^{2/n}}$$
$$AIC = -2LL_k + 2k$$
$$BIC = -2LL_k + k \times \log(n)$$
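$x_i$, $y$
$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k$$
$$\text{logit}(p) = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k$$
$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k}$$
$$\Omega = \frac{p}{1 - p}$$
$$\Omega = e^{\beta_0 + \sum_k \beta_k x_k}$$
$p$, $1 - p$, $y$
$\beta_0$, $\beta_1$, $\beta_k$, $k$
$$p = \frac{e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k}}$$
$x_j$, $x_i$, $i \neq j$
$$OR = \frac{\Omega(x_j + 1)}{\Omega(x_j)} = \frac{e^{\beta_0 + \beta_j (x_j + 1)}}{e^{\beta_0 + \beta_j x_j}} = \frac{e^{\beta_0} \times e^{\beta_j x_j} \times e^{\beta_j}}{e^{\beta_0} \times e^{\beta_j x_j}} = e^{\beta_j}$$
$e^{\beta_j} > 1$: $x_j$, $x_j + 1$, $e^{\beta_j} = 1.5$
$e^{\beta_j} < 1$: $x_j$, $x_j + 1$, $e^{\beta_j} = 0.5$
$e^{\beta_j} = 1$: $x_j$, $\beta_j = 0$
$x_1$, $x_2$, $y$
$x_1$: $x_1 = 1$, $x_1 = 2$
$x_2$: $x_2 = 1$, $x_2 = 2$
$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
$p$
$\beta_0 = -6.202$, $\beta_1 = -2.449$, $\beta_2 = 6.297$
$e^{\beta_1} = e^{-2.449} = 0.08636 < 1$; $x_1$: $x_1 = 1$, $x_1 = 2$
$\beta_1$: $p = 0.02916$
$e^{\beta_2} = e^{6.297} = 542.9406 > 1$; $x_2$: $x_2 = 1$, $x_2 = 2$
$\beta_2$
$x_1$, $x_2$
$e^{\beta_j} = 1$, $\beta_j = 0$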
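$Y$, $J$
$$Y_i = \begin{cases} \text{category 1} \\ \text{category 2} \\ \text{category 3} \\ \quad\vdots \\ \text{category } J \end{cases}$$
$i$
$$p_i = \begin{cases} p_{i1} \\ p_{i2} \\ p_{i3} \\ \quad\vdots \\ p_{ij} \\ \quad\vdots \\ p_{iJ} \end{cases}$$
$j = 1, 2, \ldots, J$; $J > 2$; $J - 1$; $J$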
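$$\ln\left(\frac{p_{i1}}{1 - p_{iJ}}\right) = \beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik} = \beta_1' \times x_i$$
$$\ln\left(\frac{p_{i2}}{1 - p_{iJ}}\right) = \beta_{20} + \beta_{21} \times x_{i1} + \beta_{22} \times x_{i2} + \ldots + \beta_{2k} \times x_{ik} = \beta_2' \times x_i$$
$$\ln\left(\frac{p_{ij}}{1 - p_{iJ}}\right) = \beta_{j0} + \beta_{j1} \times x_{i1} + \beta_{j2} \times x_{i2} + \ldots + \beta_{jk} \times x_{ik} = \beta_j' \times x_i$$
$$\ln\left(\frac{p_{iJ}}{1 - p_{iJ}}\right) = \beta_{J0} + \beta_{J1} \times x_{i1} + \beta_{J2} \times x_{i2} + \ldots + \beta_{Jk} \times x_{ik} = \beta_J' \times x_i$$
$\beta_{ji}$, $j$
$$\ln\left(\frac{p_{ij}}{1 - p_{iJ}}\right) = \beta_{j0} + \beta_{j1} \times x_{i1} + \beta_{j2} \times x_{i2} + \ldots + \beta_{jk} \times x_{ik} = \beta_j' \times x_i$$
$$\Omega = \frac{p_{ij}}{1 - p_{ij}} = e^{\beta_{j0} + \beta_{j1} \times x_{i1} + \beta_{j2} \times x_{i2} + \ldots + \beta_{jk} \times x_{ik}}$$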
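$J = 2$ $(j = 1, 2)$
$$\ln\left(\frac{p_{i1}}{1 - p_{i1}}\right) = \beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik} = \beta_1' \times x_i$$
$i$
$$p_{i1} = \frac{e^{\beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik}}}{1 + e^{\beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik}}} = \frac{e^{\beta_1' \times x_i}}{1 + e^{\beta_1' \times x_i}}$$
$i$
$$p_{i2} = 1 - p_{i1} = 1 - \frac{e^{\beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik}}}{1 + e^{\beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik}}} = \frac{1}{1 + e^{\beta_1' \times x_i}}$$
$$\Omega = \frac{p_{i1}}{1 - p_{i1}} = e^{\beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik}} = e^{\beta_1' \times x_i}$$
$$\Omega = \frac{p}{1 - p} = e^{\beta_0 + \beta_1 \times x_i}$$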
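$J > 2$
$i$, $j < J$:
$$p_{ij} = \frac{e^{\beta_j' \times x_i}}{1 + \sum_{l=1}^{J-1} e^{\beta_l' \times x_i}}, \quad j < J$$
$i$, $j = J$:
$$p_{iJ} = \frac{1}{1 + \sum_{l=1}^{J-1} e^{\beta_l' \times x_i}}, \quad j = J$$
$j$, $J$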
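$x_{i1}, \ldots, x_{ik}$
$x_{ik}$, $x_{ik} + 1$
$$OR = \frac{\Omega(x_{ik} + 1)}{\Omega(x_{ik})} = \frac{e^{\beta_{j0} + \beta_{j1} \times x_{i1} + \beta_{j2} \times x_{i2} + \ldots + \beta_{jk} \times (x_{ik} + 1)}}{e^{\beta_{j0} + \beta_{j1} \times x_{i1} + \beta_{j2} \times x_{i2} + \ldots + \beta_{jk} \times x_{ik}}} = e^{\beta_{jk}}$$
$\beta_{jk}$, $j$, $J$
$x_{ik}$
$y_k$
$x_{1k}$, $x_{2k}$, $x_{3k}$, $x_{4k}$
$-LL$, $-2LL$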
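$j$
$$\ln\left(\frac{p_{casnica}}{1 - p_{salariat}}\right) = \beta_{casnica,0} + \beta_{casnica,NATMaghiar} \times x_{NATMaghiar} + \beta_{casnica,NATRoman} \times x_{NATRoman} + \beta_{casnica,NATRom} \times x_{NATRom} + \beta_{casnica,NIVEscazut} \times x_{NIVEscazut} + \beta_{casnica,NIVEsuperior} \times x_{NIVEsuperior}$$
$\beta_{casnica,NIVEsuperior}$
$j$
$$\ln\left(\frac{p_{elev}}{1 - p_{salariat}}\right) = \beta_{elev,0} + \beta_{elev,NATMaghiar} \times x_{NATMaghiar} + \beta_{elev,NATRoman} \times x_{NATRoman} + \beta_{elev,NATRom} \times x_{NATRom} + \beta_{elev,NIVEscazut} \times x_{NIVEscazut} + \beta_{elev,NIVEsuperior} \times x_{NIVEsuperior}$$
$$\ln\left(\frac{p_{pensionar}}{1 - p_{salariat}}\right) = \beta_{pensionar,0} + \beta_{pensionar,NATMaghiar} \times x_{NATMaghiar} + \beta_{pensionar,NATRoman} \times x_{NATRoman} + \beta_{pensionar,NATRom} \times x_{NATRom} + \beta_{pensionar,NIVEscazut} \times x_{NIVEscazut} + \beta_{pensionar,NIVEsuperior} \times x_{NIVEsuperior}$$
$$\ln\left(\frac{p_{student}}{1 - p_{salariat}}\right) = \beta_{student,0} + \beta_{student,NATMaghiar} \times x_{NATMaghiar} + \beta_{student,NATRoman} \times x_{NATRoman} + \beta_{student,NATRom} \times x_{NATRom} + \beta_{student,NIVEscazut} \times x_{NIVEscazut} + \beta_{student,NIVEsuperior} \times x_{NIVEsuperior}$$
$\beta_{student,NIVEsuperior}$
$e^{\beta_{student,NIVEsuperior}} = 0.2540029$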
$$Volume_i = a \cdot Girth_i + b \cdot Height_i + \epsilon_i$$
$$Perm_i = a \cdot Area_i + b \cdot Peri_i + c \cdot Shape_i + \epsilon_i$$
$$Y_i = a \cdot X_{1i} + b \cdot X_{2i} + \epsilon_i, \quad i = 1, \ldots, 100000000$$
$X$
acest: 12, si: 35, de: 45
acest: 10, si: 40, de: 15
acest: 12, si: 78, de: 12
acest: 34, si: 153, de: 72
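The four sets of counts above read like a word-count illustration: three partial results for the words "acest", "si" and "de", followed by their sum. Assuming that reading, the combining (reduce) step can be sketched as below; the function name `reduce_counts` is hypothetical and the sketch runs on one machine, not on Hadoop.

```python
from collections import Counter

# Partial word counts, as listed in the text (one Counter per data block).
partial_counts = [
    Counter({"acest": 12, "si": 35, "de": 45}),
    Counter({"acest": 10, "si": 40, "de": 15}),
    Counter({"acest": 12, "si": 78, "de": 12}),
]


def reduce_counts(partials):
    """Reduce step: add up the per-block counts key by key."""
    total = Counter()
    for counts in partials:
        total.update(counts)  # Counter.update adds counts, it does not replace them
    return total


total = reduce_counts(partial_counts)
```

The summed result matches the last line of counts: acest 34, si 153, de 72.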
[Figure labels: Hardware (Computer 1, Computer 2, ..., Computer n); Hadoop system (distributed file system HDFS, MapReduce); Middleware level (Pig, Hive, Rhipe, RHadoop, Streaming); Interface; Statistical analysis software (R)]
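$$y_i = \beta_1 \times x_{i1} + \cdots + \beta_p \times x_{ip} + \epsilon_i = x_i^T \times \beta + \epsilon_i, \quad i = 1, \ldots, n$$
$y_i$, $x_i$, $p$
$i$, $1$, $n$
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix} =
\begin{pmatrix} x_{1,1} & \cdots & x_{1,p} \\ x_{2,1} & \cdots & x_{2,p} \\ \vdots & & \vdots \\ x_{n,1} & \cdots & x_{n,p} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
$$y = X \times \beta + \epsilon$$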
$\beta$
$$\hat{\beta} = (X^T X)^{-1} X^T y$$
$X^T X$, $(X^T X)^{-1}$, $X^T y$
$$X^T X \beta = X^T y$$
$X^T X$, $X^T y$, $\beta$
$$\hat{\beta} = (X^T X)^{-1} X^T y$$
solve(XTX, XTy)
$X$, $y$
$n = 20000$; $x_i$; $X$: $(20000, 15)$; $y$: $20000$
$X^T X$: $(15, 20000) \times (20000, 15) = (15, 15)$
$X^T y$: $(15, 20000) \times (20000, 1) = (15, 1)$
solve(XTX, XTy)
$X$: $20000 \times 15 = 300000$
$X$: $1, 2, 3, \ldots, 20000$
$y$
$X^T X$
$X_r$
$X$, $X_r$
$X$, $X^T X$
$m$, $m < n$, $n$
$X^T X$, $X^T y$
$y$, $X_r$
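Because $X^T X$ and $X^T y$ are sums over rows, they can be accumulated one block of rows $X_r$ at a time, and only the small $p \times p$ system $X^T X \beta = X^T y$ is solved at the end. The sketch below illustrates this on made-up data; the helper names `accumulate` and `solve` are hypothetical (the latter only mirrors the `solve(XTX, XTy)` call mentioned above), and the block size and numbers are assumptions chosen to keep the example tiny.

```python
def accumulate(chunks, p):
    """Add up X_r^T X_r and X_r^T y_r over row blocks (X_r, y_r)."""
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for rows, ys in chunks:
        for row, y in zip(rows, ys):
            for i in range(p):
                xty[i] += row[i] * y
                for j in range(p):
                    xtx[i][j] += row[i] * row[j]
    return xtx, xty


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Hypothetical data: y = 1 + 2 x, design matrix rows (1, x), split into blocks of 2 rows.
rows = [[1.0, float(x)] for x in range(6)]
ys = [1.0 + 2.0 * x for x in range(6)]
chunks = [(rows[i:i + 2], ys[i:i + 2]) for i in range(0, 6, 2)]

xtx, xty = accumulate(chunks, p=2)
beta = solve(xtx, xty)
```

Only one block needs to be in memory at a time, and the accumulated matrices stay at size $(p, p)$ and $(p, 1)$ regardless of $n$.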