STATISTICAL ANALYSIS OF THE RELATIONSHIP BETWEEN URBAN PARTICLE SIZE DATA AND ATMOSPHERIC DATA

Vagheggini, A.*1, Scotto, F.2, Bacco, D.3, Ricciardelli, I.2, Trentini, A.2, Cocchi, D.1 and Poluzzi, V.2

* [email protected]

1 Department of Statistical Sciences, University of Bologna
2 Regional Agency for Prevention and Environment of Emilia-Romagna region (ARPA ER)
3 Department of Chemical and Pharmaceutical Sciences, University of Ferrara

DUST 2014, International Conference on Atmospheric Dust
Castellaneta Marina, June 6th, 2014

Alessandro Vagheggini, Urban particle size and atmospheric data



Outline

1 Introduction
    The SUPERSITO project
    The data
    The statistical analyses

2 The cluster analysis
    k-means clustering
    Results interpretation

3 The generalized additive model
    Exploratory analyses
    The model

4 Discussion


The SUPERSITO project

The SUPERSITO project focuses on the detailed study of chemical, physical and toxicological parameters of the atmosphere of Emilia-Romagna (Italy), and on health, epidemiological and environmental assessment by means of interpretative models.

The project arises from the need to improve knowledge of the environmental and health aspects of fine and ultrafine particulate matter, in both its primary and secondary components, in the atmosphere. The project is therefore structured in seven complementary work packages. Among these, work package 7 (WP7) deals with the analysis of environmental data.

There are five sampling sites: three urban background sites (Bologna, Parma and Rimini), one rural background site (San Pietro Capofiume) and one remote site (Monte Cimone).


The data

The dataset consists of the hourly means of the five-minute measurements of the number concentration and size distribution of atmospheric particles, measured by a scanning mobility particle sizer (SMPS model 3936, TSI) located at the urban background site of Bologna. It covers a six-month period from January 1st to June 30th, 2013. The measured particle sizes range from 15 to 600 nm (1 nm = 1 × 10^−9 m).

Hourly measurements of wind speed and direction, average temperature, average relative humidity and carbon monoxide, gathered at a station close to the main site over the same period (January 1st to June 30th), make up the atmospheric dataset.
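The aggregation of five-minute measurements into hourly means can be sketched as follows. This is a minimal illustration, not the procedure actually used for the dataset: the function name `hourly_means`, the input layout (timestamp and value pairs for a single size channel) and the sample readings are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_means(records):
    """Average five-minute readings into hourly means.

    `records` is an iterable of (timestamp, value) pairs; the result maps
    each hour (timestamp truncated to the hour) to the mean of its readings.
    """
    buckets = defaultdict(list)
    for timestamp, value in records:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Hypothetical five-minute readings for a single size channel.
readings = [
    (datetime(2013, 1, 1, 0, 0), 1000.0),
    (datetime(2013, 1, 1, 0, 5), 1200.0),
    (datetime(2013, 1, 1, 0, 10), 1100.0),
    (datetime(2013, 1, 1, 1, 0), 900.0),
    (datetime(2013, 1, 1, 1, 5), 700.0),
]
hourly = hourly_means(readings)
```

Applied channel by channel, the same grouping would give one spectrum per hour, which is the unit of analysis in the rest of the slides.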

Two main questions arose:

(1) whether it is possible to identify similar hourly distributions and to investigate the development, growth and transport of particles;

(2) whether the hourly particle number concentration (PNC) is affected by the atmospheric variables.


The statistical analyses

To answer the previous questions we carried out the following statistical analyses:

(1) first, a cluster analysis was used to investigate the similarities among the hourly measurements of the size distribution. Then, a graphical analysis of the average number concentration in each channel was conducted to identify the size distribution of each cluster. Plotting the ratio of each cluster's hour counts to the dataset's hour counts helped to clarify the interpretation of each cluster;

(2) the non-linear relationship between the atmospheric variables and the PNC data led to the choice of generalized additive models as the framework for investigating the connections between atmospheric variables and ultrafine particulate matter.


Transforming the data

Following Beddows et al. (2009), a k-means cluster analysis has been carried out on the hourly measurement of each channel, standardized with the square root of the sum of the squared hourly spectra as follows:

\[
\mathrm{smps}^{*}_{h,m} = \frac{\mathrm{smps}_{h,m}}{\sqrt{\sum_{m=1}^{M} \mathrm{smps}_{h,m}^{2}}},
\]

where $\mathrm{smps}_{h,m}$ is the $h$th hourly $dN/d\log_{10}(D_p)$ observation for the $m$th channel.
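As a small illustration of the standardization above, each hourly spectrum is divided by its own Euclidean norm, so that every standardized spectrum has unit length and clusters reflect the shape of the distribution rather than its overall level. The function name `normalise_spectrum`, the handling of an all-zero spectrum and the two-channel example are assumptions made for this sketch.

```python
import math


def normalise_spectrum(spectrum):
    """Scale one hourly spectrum (a list of M channel values) to unit norm."""
    norm = math.sqrt(sum(value * value for value in spectrum))
    if norm == 0.0:
        # Assumption: an all-zero spectrum has no shape, so return it unchanged.
        return list(spectrum)
    return [value / norm for value in spectrum]


# Hypothetical hourly spectrum with M = 2 channels.
scaled = normalise_spectrum([3.0, 4.0])
```

After this step the sum of the squared values of each hourly spectrum equals one, so two hours with the same shape but different total concentrations become identical.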

[Figure: Average silhouette width and Dunn index vs. cluster number (2 to 20 clusters).]

The average silhouette width and the Dunn index were used to determine the number of clusters. We chose 10 clusters, since a trade-off between the two indices was needed. Their values, 0.192 and 0.045 respectively, differ little from those obtained for 8 or 12 groups.
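The two indices can be sketched for a given partition as follows. The slides do not say which distance or which variant of the Dunn index was used, so this sketch assumes Euclidean distance and the classic definition (smallest distance between points in different clusters divided by the largest distance between points in the same cluster). The function names and the toy data are hypothetical, and at least two clusters are required.

```python
import math
from itertools import combinations


def average_silhouette_width(points, labels):
    """Mean silhouette width over all points (needs at least two clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    widths = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not change the sum).
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster diameter."""
    min_separation = math.inf
    max_diameter = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        if lp == lq:
            max_diameter = max(max_diameter, d)
        else:
            min_separation = min(min_separation, d)
    return min_separation / max_diameter


# Toy example: two well separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
silhouette = average_silhouette_width(points, labels)
dunn = dunn_index(points, labels)
```

In practice both functions would be evaluated on the k-means partition for each candidate number of clusters, and the resulting curves compared as in the figure. Higher values of either index indicate more compact, better separated clusters.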


Weekdays and weekends

[Figure: for clusters 2, 3, 8, 1 and 6, the average size distribution (dN/dLog(Dp) × 10^3 [cm^−3] vs. particle diameter Dp) and the hours count in % vs. hour of the day, for the whole period, weekdays and weekends.]

Clusters 2, 3 and 8 can be seen as representative of weekdays, since the modes of their distributions range from 20 to 25 nm and may be linked to anthropogenic activities such as road traffic. The hour-count plots for these clusters, which show very different distributions for weekdays and weekends, support this impression.

Clusters 1 and 6 may well represent nighttime hours and weekends, when anthropogenic activities decrease; their distributions are characterized by high mode values, perhaps related to regionally transported particles. The hour-count distributions show a high percentage of nighttime hours, with no appreciable difference between weekdays and weekends.
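The hour-count curves discussed above could be computed along the following lines. The exact definition behind the plots is not given in the slides; this sketch assumes that, for each hour of the day, the value is the number of hours assigned to the cluster divided by the number of hours in the dataset at that hour of the day, optionally restricted to weekdays or weekends. All names and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_share(timestamps, labels, cluster, keep=lambda ts: True):
    """For each hour of the day, the fraction of hours assigned to `cluster`."""
    total = Counter()
    in_cluster = Counter()
    for ts, label in zip(timestamps, labels):
        if not keep(ts):
            continue
        total[ts.hour] += 1
        if label == cluster:
            in_cluster[ts.hour] += 1
    return {hour: in_cluster[hour] / total[hour] for hour in sorted(total)}


def is_weekday(ts):
    return ts.weekday() < 5  # Monday = 0, ..., Friday = 4


def is_weekend(ts):
    return ts.weekday() >= 5  # Saturday = 5, Sunday = 6


# Hypothetical cluster labels for four hourly observations at 8 am.
timestamps = [
    datetime(2013, 1, 7, 8),   # Monday
    datetime(2013, 1, 8, 8),   # Tuesday
    datetime(2013, 1, 12, 8),  # Saturday
    datetime(2013, 1, 13, 8),  # Sunday
]
labels = [2, 2, 1, 2]
whole = hourly_share(timestamps, labels, cluster=2)
weekdays = hourly_share(timestamps, labels, cluster=2, keep=is_weekday)
weekends = hourly_share(timestamps, labels, cluster=2, keep=is_weekend)
```

A cluster whose weekday and weekend curves diverge, as in this toy example, would be read as linked to weekday activity; curves that coincide point to a pattern independent of the day of the week.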


Seasons

[Figure: for clusters 5, 10, 4, 7 and 9, the average size distribution (dN/dLog(Dp) × 10^3 [cm^−3] vs. particle diameter Dp) and the months count in % by month (Jan to Jun).]

Clusters 5 and 10 are associated with the winter months, since their monthly distributions peak in January and February and then decrease. Further analyses of the atmospheric variables showed that these clusters group observations with higher humidity, perhaps suggesting condensation and curling of particles.

Clusters 4, 7 and 9 can be considered representative of the spring months, as their monthly distributions show a trend that increases after April-May.


Spring

The hour-count plots for the "spring" clusters show a peculiar pattern: cluster 9 appears to be mainly composed of morning hours (8 am to noon), cluster 4 of afternoon hours (noon to 6 pm) and cluster 7 of evening and night hours (6 pm to 3 am).

[Figure: Clusters 4, 7 and 9 (spring): hours count in % vs. hour of the day.]

This pattern is closely connected to the events of new particle formation (cluster 9), growth (cluster 4) and ageing (cluster 7).


Exploratory analyses

[Figure: scatter plots of log(PNC) against wind speed, wind direction, average temperature, average relative humidity, CO, and the log(PNC) series lagged by 1, 2, 3 and 24 hours.]

Wind speed does not appear to be linearly correlated with log(PNC); however, log(PNC) seems to decrease as wind speed increases.

The log(PNC) seems fairly uniformly distributed across wind directions, apart from winds blowing from the south, which are associated with a slight decrease in particle concentration.

The log(PNC) does not appear to be strongly affected by the average temperature, although lower values of log(PNC) were measured at temperatures between 10 °C and 25 °C.

Humidity does not seem to affect log(PNC).

Increasing CO values are not associated with low values of log(PNC).

From the three-hour lag onwards, the correlation with the original series seems to weaken.
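The lag comparison can be made concrete with a simple lagged correlation. The sketch below uses the Pearson correlation between a series and its own past values; the function names are hypothetical, and the series is synthetic (a pure 24-hour cycle), not the measured log(PNC).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lag_correlation(series, lag):
    """Correlation between the series and itself `lag` steps earlier."""
    return pearson(series[lag:], series[:-lag])


# Synthetic hourly series with a daily cycle, standing in for log(PNC).
log_pnc = [12.0 + math.sin(2 * math.pi * t / 24) for t in range(240)]
correlations = {lag: lag_correlation(log_pnc, lag) for lag in (1, 2, 3, 24)}
```

On real data the correlations would be expected to decay with the lag, as described above, rather than return to one at 24 hours as they do for this idealised cycle.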


Bases for the covariates

variable    covariate                     basis
wd          wind direction                cp
ws          wind speed                    tp
wd & ws     wind direction and speed      cc
rh          relative humidity             tp
co          CO (carbon monoxide)          tp
temp        hourly average temperature    tp
hrs         hour of the day               cp
day         day of the week               factor
lag1        log(PNC) lag 1 hour           tp
lag2        log(PNC) lag 2 hours          tp
lag3        log(PNC) lag 3 hours          tp
lag24       log(PNC) lag 1 day            tp

cp: cyclic version of a P-spline; tp: thin plate regression spline; cc: cyclic cubic regression spline tensor product; factor: categorical variable.
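Read together, the terms in the table suggest an additive predictor of roughly the following form. This is only a sketch under assumptions the slides do not state: an identity link with an additive error term $\varepsilon_t$, an intercept $\beta_0$, a parametric effect $\gamma$ for the day factor, and the variant in which wind direction and speed enter jointly as a tensor product $te(\cdot,\cdot)$. Here $s(\cdot)$ denotes a univariate smooth on the basis listed for that covariate.

```latex
\[
\log(\mathrm{PNC}_t) = \beta_0
  + te(\mathrm{ws}_t, \mathrm{wd}_t)
  + s(\mathrm{rh}_t) + s(\mathrm{co}_t) + s(\mathrm{temp}_t) + s(\mathrm{hrs}_t)
  + \gamma_{\mathrm{day}_t}
  + \sum_{\ell \in \{1, 2, 3, 24\}} s_{\ell}\bigl(\log(\mathrm{PNC}_{t-\ell})\bigr)
  + \varepsilon_t
\]
```

In the alternative variant, the tensor product term would be replaced by two separate smooths, $s(\mathrm{wd}_t) + s(\mathrm{ws}_t)$.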


Model selection

AIC of the model after removing the listed variable, with wind direction and speed entered as separate terms (wd, ws) or as a tensor product (tensor):

removed variable    wd, ws      tensor
–                   712.283     711.485
lag24               724.043     711.485
lag3                729.196     728.604
lag2                775.589     776.315
lag1                2735.322    2729.217
wd                  714.163     –
ws                  715.327     –
wd & ws             717.959     717.959
rh                  746.739     749.268
co                  737.836     737.017
temp                752.598     754.320
hrs                 837.298     835.961
day                 711.874     711.121
day & wd            713.788     –
day & ws            715.331     –
day, wd & ws        718.264     718.264

Different models were compared using the Akaike information criterion (AIC).

The results show that the models incorporating wind direction and speed as a cyclic cubic regression spline tensor product give slightly better results in terms of AIC.

The categorical variable day does not seem to affect the model: the AIC value decreases slightly once it is removed.

Removing the one-hour lag considerably worsens the model's AIC (over three times higher).
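The comparison can be retraced from the table with a few lines of arithmetic. The AIC values below are transcribed from the column with wind direction and speed as separate terms, restricted to single-variable removals. The helper `aic` states the standard definition for reference only; for a generalized additive model the parameter count would normally be replaced by effective degrees of freedom, which is an assumption about how the values were obtained. All variable names are hypothetical.

```python
# AIC of the full model and of the models with one term removed,
# transcribed from the "wd, ws" column of the table.
AIC_FULL = 712.283
AIC_AFTER_REMOVAL = {
    "lag24": 724.043,
    "lag3": 729.196,
    "lag2": 775.589,
    "lag1": 2735.322,
    "wd": 714.163,
    "ws": 715.327,
    "rh": 746.739,
    "co": 737.836,
    "temp": 752.598,
    "hrs": 837.298,
    "day": 711.874,
}


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_parameters - 2 * log_likelihood


# Change in AIC when each term is dropped; positive means the model got worse.
delta_aic = {term: value - AIC_FULL for term, value in AIC_AFTER_REMOVAL.items()}
ranked = sorted(delta_aic.items(), key=lambda item: item[1], reverse=True)
lag1_ratio = AIC_AFTER_REMOVAL["lag1"] / AIC_FULL
```

Ranking the differences puts lag1 first and day last, with day the only term whose removal lowers the AIC, which matches the reading of the table given above.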


Model addends

[Figure: estimated smooth terms s(rh, 1.26), s(co, 2.53), s(temp, 4.14), s(hrs, 7.66), s(lag1, 5.1), s(lag2, 1.13), s(lag3, 7.12) and s(lag24, 1), each plotted against its covariate.]

Relative humidity, carbon monoxide and temperature show almost linear effects. Humidity does not seem to affect the model, while CO and temperature show slightly increasing and decreasing effects, respectively.

The cyclic P-spline of the hour of the day shows a fluctuating effect, with two local maxima at 7 am and 6 pm.

The one-hour lag has a considerable positive influence on log(PNC) while, surprisingly, the effect of the two-hour lag is negative. The effect of the three-hour lag fluctuates, although it is slightly positive overall. The one-day lag does not seem to influence log(PNC).


Wind direction and speed

[Figure: left-hand panel, s(wd, 1.35) vs. wd; central panel, s(ws, 4.25) vs. ws; right-hand panel, perspective plot of te(ws, wd, 3.99).]

The cyclic P-spline of the wind direction (left-hand panel) and the thin plate regression spline of the wind speed (central panel) do not reveal the non-linear effect of the two variables, which becomes clear when they are considered together.

The perspective plot of the cyclic cubic regression spline tensor product (right-hand panel) shows the combined effect of wind direction and speed.


Conclusions and future developments

Conclusions:

the cluster analysis proved a useful tool for investigating similarities in the hourly size distributions, allowing us to identify peculiarities in the data that are supported by chemical and physical theory;

the generalized additive model framework allowed us to model the particle number concentration as a non-linear function of atmospheric variables and gases; its use in such a context is quite novel.

Future developments:

we would like to fit a generalized additive model of the particle number concentration for each cluster;

the dataset is just a small part of the final one, which will span a five-year period;

data on more gases, such as NO, NO2 and O3, are being collected, which may help to better specify the model and explain more relationships.


Acknowledgements

This research was conducted as part of the Supersito Project, which was supported and financed by the Emilia-Romagna Region and the Regional Agency for Prevention and Environment under Regional Government Deliberation n. 1971/13.


Essential bibliography

Beddows, D.C.S., Dall'Osto, M. and Harrison, R.M. (2009) Cluster analysis of rural, urban and curbside atmospheric particle size data, Environmental Science & Technology 43, 4694–4700.

Clifford, S., Low Choy, S., Hussein, T., Mengersen, K. and Morawska, L. (2011) Using the generalised additive model to model the particle number count of ultrafine particles, Atmospheric Environment 45, 5934–5945.

Hastie, T.J. and Tibshirani, R.J. (1990) Generalized Additive Models, Chapman and Hall/CRC: New York.

Wood, S.N. (2006) Generalized Additive Models: An Introduction with R, Chapman and Hall/CRC: New York.
