Quantile Regression - University of Washington
Transcript of Quantile Regression - University of Washington
Quantile Regression
Caren Marzban
Applied Physics Lab., Department of Statistics
Univ. of Washington, Seattle, WA, USA 98195
CAPS, University of Oklahoma, Norman, OK
Abstract
The prediction from most regression models - be it multiple regression, neural networks, trees, etc. -is a point estimate of the conditional mean of a response (i.e., quantity being predicted), given a set ofpredictors. However, the conditional mean measures only the βcenterβ of the conditional distribution ofthe response. A more complete summary of the conditional distribution is provided by its quantiles. The0.5 quantile (i.e., the median) can serve as a measure of the center, and the 0.9 quantile marks the valueof the response below which resides 90quantile and the 0.05 quantile serves as the 90thereby conveyinguncertainty. Quantiles arise naturally in environmental sciences. For example, one may desire to knowthe lowest level (e.g., 0.1 quantile) of a river, given the amount of snowpack; or the highest tempera-ture (e.g., the 0.9 quantile), given cloud cover. Recent advances in computing allow the developmentof regression models for predicting a given quantile of the conditional distribution, both parametricallyand nonparametrically. The general approach is called Quantile Regression, but the methodology (ofconditional quantile estimation) applies to any statistical model, be it multiple regression, support vectormachines, or random forests. In this talk, the principles of quantile regression are reviewed and themethodology is illustrated through several examples. The technique and the examples display many ofthe features common in both machine learning and statistics.
Motivation
Q: Given data yi, i = 1, 2, ..., n, what prediction Ξ± minimizes MSE
E =1
n
nβi
Ξ΅2i =
1
n
nβi(yi β Ξ±)2 ?
1
A: The sample mean of y. Proof:
dE
dΞ±βΌ 1
n
nβi(yi β Ξ±) βΌ 1
n
nβi
yi β Ξ± ββ Ξ± =1
n
nβi
yi .
Q: Given data yi, i = 1, 2, ..., n, what prediction Ξ± minimizes the MAE
E =1
n
nβi|Ξ΅i| =
1
n
nβi|yi β Ξ±| ?
A: The sample median of y. Proof: Similar.
Q: Given data (xi, yi), what prediction y(x) = Ξ±0 + Ξ±1x minimizes MSE
E =1
n
nβi
Ξ΅i =1
n
nβi[yi β (Ξ±0 + Ξ±1 xi)]
2?
A: The conditional sample mean of y. I.e.,
y(x) = Ξ±0 + Ξ±1x = mean of y, given x
2
Moving on ...
E[y|x] = conditional mean = location of conditional distribution.
Width of conditional distribution βΌ uncertainty.
Why conditional mean? Why not conditional median?
And if median (i.e., 0.5 quantile), why not other quantiles?
Why not output = 0.1, ..., 0.5, ..., 0.9 conditional quantile (given x)?
Hence Quantile Regression (QR).
Effectively, QR produces the whole conditional distribution of y.
With it one can estimate and conduct inference on conditional quantile func-
tions. (βΌ Extreme Value Theory).
90% Prediction Interval (not CI) = 0.95 quantile - 0.05 quantile.
3
History
Google counts:
meteorology regression 655,000
meteorology βartificial intelligenceβ 495,000
meteorology βneural networksβ 153,000
meteorology βquantile regressionβ 296
QR: Roger Koenker and Gilbert Bassett (1978), Econometrica.
QR Neural Nets: James W. Taylor (2000), Journal of Forecasting.
QR Forests: Nicolai Meinshausen (2006), Jour. of Machine Learning Research.
Nonparametric QR: Takeuchi, Le, Sears, Smola, (2006), Jour. of Machine
Learning Research.
In meteorology:
QR: John Bjrnar Bremnes (2004): Probabilistic forecasts of precipitation in
terms of quantiles using NWP model output. Mon. Wea. Rev.
Friedrichs and Hense (2007): Statistical Downscaling of Extreme Precipitation
Events Using Censored Quantile Regression. Mon. Wea. Rev.
Jeremie Juban , Lionel Fugon, George Kariniotakis (2007): Probabilistic short-
term wind power forecasting based on kernel density estimators. European
Wind Energy Conference. (QR Forests)
4
Quantile Regression (Forests)
Instead of minimizingβn
i [yi β (Ξ±0 + Ξ±1 xi)]2, minimize
nβi
f (yi β (Ξ±0 + Ξ±1 xi))
where
f (y β q) =
Ξ² (y β q) y β₯ q
(1β Ξ²) (q β y) y < q
to obtain the Ξ²th quantile.
Instead of
E[y|x] = Ξ±0 + Ξ±1x
QR gives, for each quantile
Q[y|x] = Ξ±0 + Ξ±1x
Nonparametric QR gives
Q[y|x] = splines
QR Forestsβ give
Q[y|x] = piece-wise constant
β Breiman (2001): Take a bootstrap sample, develop a regression tree, repeat,
average the outputs over the samples.
5
Examples
1) No. tornadoes/month, 1950-1998 (technically, not regression).
12-month period filtered out with running median filter.
Courtesy of Joseph Schaefer, Storm Prediction Center.
2) This monthβs vs. last monthβs number of tornadoes.
3) Global temperature, 1851-1996 (Ibid).
http://www.image.ucar.edu/GSP/Data/Climate/observed.dat
4) Max water-level (above NAVD88) vs. Max 1-hr change, 23 sites in south-
western Louisiana and southeastern Texas following Hurricane Rita, Sep 2005.
http://pubs.usgs.gov/ds/2006/220/ (Table 3).
5) Six-minute obs. water-level vs. water temperature, San Fransisco, CA,
12/29/07-12/31/07
http://tidesonline.nos.noaa.gov/data read.shtml?station info=9414290+San+Francisco,+CA
6) Annual stream flow vs. prcp, 1948-1997, Big Sioux, Dell Rapids, SD.
Courtesy of Dennis Lettenmaier, Univ. of Washington.
7) Avg. (over 1931-1960) Jan. daily-min. temp. for 56 US cities, vs. lat. lon.
Code: R (packages: splines, quantreg, quantregForest).
Plot 0.05, 0.25, 0.5, 0.75, 0.95 quantiles, and the least-square fit (dashed).
Disclaimer: All substantive conclusions are cursory.
6
Tornado Time Series
0.2 0.4 0.6 0.8
15
25
35
(Intercept)
oo
o
o o
oo
o
o
0.2 0.4 0.6 0.8
0.0
50
.08
x
o
o oo
o oo
o o
No. of tornadoes is increasing with time.
The quantiles fan out slightly.
7
ββββ
β
βββ
ββββββββββ
ββ
β
βββββ
βββββββββββββββ
βββ
ββββββ
ββββββ
βββββ
β
β
ββββ
βββββββ
βββββββ
ββββ
β
βββ
βββββββββββ
β
β
ββ
βββββ
ββββ
βββββ
ββββ
ββββββ
β
ββββ
βββ
βββββββββββ
ββ
βββ
β
β
βββ
βββββββ
βββββββ
ββββ
β
βββββββββββββ
ββββ
β
β
βββββ
β
ββββ
ββββ
ββββ
βββ
βββ
β
ββ
βββ
βββββ
βββ
β
ββ
βββββββ
ββββ
βββββββ
ββ
β
ββββ
βββββββββ
ββ
βββββ
βββ
ββββββββ
βββ
ββ
βββ
β
βββββ
βββββ
β
βββββ
ββββ
βββ
ββββββ
ββββββββββββ
βββββ
β
β
ββββββββββββ
βββββ
β
β
βββββ
βββββ
β
ββββββ
ββββ
ββββ
ββ
β
β
βββββββββ
ββ
ββ
βββ
β
ββββ
ββββββββ
ββ
ββββ
β
ββββ
β
βββββββ
βββββ
β
βββββ
ββ
ββββββ
βββββ
β
ββ
β
β
β
β
β
ββ
β
ββββββββββββ
βββββ
ββ
βββ
ββ
ββββββ
ββ
βββββ
β
ββββ
β
ββ
βββ
β
βββββ
β
β
ββββββββββ
β
β
ββββ
β
ββββ
β
ββ
βββββ
ββββ
βββββββββββ
β
ββ
β
βββββ
β
β
β
0 100 200 300 400 500 600
2040
6080
100
x
y
The max (i.e., 0.95 quantile) number of tornadoes grows faster than the min
(i.e., 0.05 quantile) number of tornadoes.
More realistically, min is constant in time but max gows?
8
This Monthβs vs. Last Monthβs No. of Tornadoes
0.2 0.4 0.6 0.8
15
25
35
(Intercept)
oo
o
o o
oo
o
o
0.2 0.4 0.6 0.8
0.0
50
.08
x
o
o oo
o oo
o o
More/less tornadoes yesterday means more/less tornadoes today.
Center of distribution shifts up, but it spreads out too.
Median below mean, etc., i.e., distribution has a long tail on the high-y side.
9
β ββ
β
ββ
β
βββββ
ββ
β
β
β
ββ
ββ
β ββ
β
ββββ
β
β
βββ β
β β
ββ
β
β
β
β
βββ
β
β
β
β
β
ββ
ββ
ββ
β
β
ββ
β
β
ββ
β
β
β
β β
ββ
β
β
ββ
β
β
β
β
β
ββ
β
β
β
β
β
β
β
βββ
β
β
β
ββ
β
β
ββ
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
ββ
β β
β
β
β
ββ
βββ
ββ
β
β
β
β
β
β
β
β
β
β
ββ
β
ββ
β
β
β
β
β
βββ
β
β
β
β
β
β
β
ββ
ββ
β
β
β
β
β
ββ
β
β
βββββ
β β
β
β
β
β
ββ
β
β
ββ
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
βββ
β
β
β
β
β
ββ
β
β
ββ
β
β
β
β
β
βββ
ββ
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
ββ
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
βββ
β
β
β
β
β
β
β
β
ββββ
β
β
β
β
ββ
β
β
ββ
β
β
β
β
β
β
β
ββ
β
β
β
βββ
β
β
β
β
β
β
ββ
βββ
β
β
β
β
β
β
β
ββ
ββ
β
β
β
β
β
β
β
ββ
β
β
β
β
β
ββ
β
β
β
β
ββ
β
β
β
β
β
ββ
β
β
β
β
β
β
βββ
β
β
β
β
β
β
β
ββ
ββ
β
β
β
β
β
β
ββ
β
β
ββ
β
β
β
ββ
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
βββ
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
βββ
β
β
β
β β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β β
β
β
β
β
β
β
β
β
β
0 100 200 300 400
010
020
030
040
0
x
y
Shape of conditional distribution of y changes with x.
10
Global Temperature
0.2 0.4 0.6 0.8
β0
.5β
0.2
0.1
(Intercept)
o
oo o
oo
oo
o
0.2 0.4 0.6 0.8
0.0
02
50
.00
45
x
o o o
oo
o o o o
Global temperature is increasing with time.
No fanning, i.e., whole conditional distribution shifts up.
See next.
11
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
βββ
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
βββ
β
β
ββββ
β
β
β
β
ββ
β
β
β
β
β
ββ
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
ββ
β
β
β
β
β
β
β
β
0 50 100 150
β0.2
0.00.2
0.40.6
x
y
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
βββ
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
βββ
β
β
ββββ
β
β
β
β
ββ
β
β
β
β
β
ββ
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
ββ
β
β
β
β
β
β
β
β
0 50 100 150
β0.2
0.0
0.2
0.4
0.6
x
y
Global temperature is increasing/undulating.
The central spread (i.e., 0.75 - 0.25 quantile) is constant.
The full spread (i.e., 0.95 - 0.05 quantile) shrinks with time.
12
Hurricane Rita
0.2 0.4 0.6 0.8
02
04
0(Intercept)
o o o o o o o o
o
0.2 0.4 0.6 0.8
β0
.20
.41
.0
x
oo o o o o o
o
o
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
4 6 8 10 12 14
01
23
45
6
x
y
There is a positive relationship between max water-level and max change in
water-level.
Insufficient data for anything else!
13
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β
β
β
β
4 6 8 10 12 14
01
23
45
6
x
y
Crossing Quantiles?
Probably overfitting!
14
Water Level vs. Temperature at San Fransisco
0.2 0.4 0.6 0.8
10
02
00
30
0(Intercept)
o
o
o o oo
o
o
o
0.2 0.4 0.6 0.8
β6
β4
β2
x
o
o
o o oo
o
o
o
βββ
ββ
βββββ
β
βββββ
βββ
βββ
ββββ
βββββββ
ββββββ
βββββββββββββββββββββββββ
ββββ
ββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββ
ββββββββββββββββββββββββββββ
ββ
βββ
β
β
β
β
β
βββ
ββ
β
ββ
β
β
βββ
β
β
ββ
ββ
β
β
ββ
βββββ
βββ
βββββββ β ββββ ββ
ββ ββ β βββ
ββββββ
ββββ
ββ
ββ
ββ
ββ
ββββββ
βββ
ββ
ββββββββ
βββββ
βββ
βββββββββ
ββββββββββββββ
βββββββ βββββ
βββββββ
βββββββββββββ
βββ
ββββββββββ β β
βββ
ββββ
βββ
ββ
ββββ
βββ
βββ
βββββ
ββ
ββ
βββ
ββ
ββββ
βββ
β ββββββ
βββββ
βββββββββ
βββββ
ββ
ββββ
β
β
β
β
β
β
ββ
ββ
ββ
β
β
β
β
β
β
β
ββ
βββ
βββ
βββ β
β
β
βββ
β βββββββββ
50.5 51.0 51.5 52.0
02
46
x
y
Surge inversely related to water temperature.
Reverse-fanning, i.e., less uncertainty in water-level at high temperatures.
15
βββ
ββ
βββββ
β
βββββ
βββ
βββ
ββββ
βββββββ
ββββββ
βββββββββββββββββββββββββ
ββββ
ββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββ
ββββββββββββββββββββββββββββ
ββ
βββ
β
β
β
β
β
βββ
ββ
β
ββ
β
β
βββ
β
β
ββ
ββ
β
β
ββ
βββββ
βββ
βββββββ β ββββ ββ
ββ ββ β βββ
ββββββ
ββββ
ββ
ββ
ββ
ββ
ββββββ
βββ
ββ
ββββββββ
βββββ
βββ
βββββββββ
ββββββββββββββ
βββββββ βββββ
βββββββ
βββββββββββββ
βββ
ββββββββββ β β
βββ
ββββ
βββ
ββ
ββββ
βββ
βββ
βββββ
ββ
ββ
βββ
ββ
ββββ
βββ
β ββββββ
βββββ
βββββββββ
βββββ
ββ
ββββ
β
β
β
β
β
β
ββ
ββ
ββ
β
β
β
β
β
β
β
ββ
βββ
βββ
βββ β
β
β
βββ
β βββββββββ
50.5 51.0 51.5 52.0
02
46
x
y
Nonlinear, but why?
Shape of distribution of y changes with x.
16
Stream Flow vs. Precipitation
x
Fre
quen
cy
4 6 8 10
02
46
810
y
Fre
quen
cy
0 500 1000 1500
02
46
810
log(y)
Fre
quen
cy
3 4 5 6 7
01
23
45
6
Transform stream flow to logarithm of stream flow.
17
0.2 0.4 0.6 0.8
02
46
(Intercept)
oo o o o o o o
o
0.2 0.4 0.6 0.8
β0
.20
.41
.0
x
o o oo o o o o
o
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
4 5 6 7 8 9 10
34
56
7
x
log(
y)
Increasing prcp leads to increasing flow.
But not for the lowest quantile (0.05)! The upper quartile (0.95) doesnβt
increase as fast as the middle portion of the distribution.
18
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
ββ
β
β β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
4 5 6 7 8 9 10
34
56
7
x
log(
y)
The middle portion of the distribution of flow shifts up with prcp.
Beware of extrapolation (0.05 quantile)!
19
Flow vs. Prcp with Quantile Random Forrest
β
ββ
β
β
β
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
4 5 6 7 8 9 10
34
56
7
x
log(
y)
Abrupt shifts in flow when prcp is about 5.5 and 7.5.
Again, lower quantile (0.05) is mostly insensitive to prcp.
20
Jan. Min. Temperatures vs. Lat. and Lon.
60 70 80 90 100 110
2530
3540
45
180βlon
lat
60 70 80 90 100 110
2530
3540
45
180βlon
lat
60 70 80 90 100 110
2530
3540
45
180βlon
lat
Quantiles: 0.99 (top), 0.5 (middle), 0.01 (bottom).
Top vs. median: SE and W mostly same; SW and NE warmer in top.
Bottom vs. median: SW and East of Mississippi cooler in bottom; No gradient
across Mexican border. (Acknowledment: Scott Sandgathe)
21
Conclusions I
βOrdinaryβ regression, linear or not, provides only a single summary measure
for the conditional distribution of response (predictand) - the mean - given the
predictor.
Quantile regression provides a more complete picture of the conditional distri-
bution.
The idea naturally lends itself to all the generalizations that have followed
ordinary regression.
Many problems would actually benefit more from the prediction of extreme
values (rather than the mean), anyway.
Even when the expected value is of interest, QR assays uncertainy via predic-
tion intervals.
22
Juvenile Conclusions
The number of tornadoes is on the rise.
Max grows faster than min, the latter being mostly constant.
The whole shape of the conditional distribution of the number of tornadoes
today varies as a function of the number of tornadoes yesterday.
Global temperature is on the rise.
The variation (between years) is dropping.
Max water level is positively associated with max change in water level.
Beware of overfitting.
Water level is negatively associated with water temperature.
The shape of the conditional distribution of the former varies with the latter.
Increased prcp leads to increased stream flow (for one station).
But not for the lowest quantiles of stream flow.
The strong temperature gradient across the Mexican border exists only in the
median, and not in the tails of the distribution.
23