BFI & Addiction Risk

38
1 The Five Factor Model of personality and evaluation of drug consumption risk E. Fehrman 1 , A. K. Muhammad 2 , E. M. Mirkes 2 , V. Egan 3 , A. N. Gorban 2 1 Men's Personality Disorder & National Women's Directorate, Rampton Hospital, Retford, Nottinghamshire, DN22 0PD, UK 2 Department of Mathematics, University of Leicester, Leicester, LE1 7RH, UK 3 Department of Psychiatry and Applied Psychology, University of Nottingham, Nottingham, NG8 1BB, UK Abstract The problem of evaluating an individual’s risk of drug consumption and misuse is highly important for health planning. An online survey methodology was employed to collect data including personality traits (NEO-FFI-R), impulsivity (BIS-11), sensation-seeking (ImpSS), and demographic information which influence drug use. The data set contained information on the consumption of 18 central nervous system psychoactive drugs. Correlation analysis using a relative information gain model demonstrates the existence of a group of drugs (amphetamines, cannabis, cocaine, ecstasy, legal highs, LSD, and magic mushrooms) with strongly correlated consumption patterns. An exhaustive search was performed to select the most effective subset of input features and data mining methods to classify users and non-users for each drug type. A number of classification methods were employed (decision tree, random forest, k-nearest neighbors, linear discriminant analysis, Gaussian mixture, probability density function estimation, logistic regression and naïve Bayes) and the most effective method selected for each drug. The quality of classification was surprisingly high. The best results with sensitivity and specificity (evaluated by leave-one-out cross-validation) being greater than 75% were obtained for VSA (volatile substance abuse) and methadone use. Good results with sensitivity and specificity (greater than 70%) were achieved for amphetamines, cannabis, cocaine, crack, ecstasy, heroin, ketamine, legal highs, and nicotine. The poorest result was obtained for prediction of alcohol consumption. Key Words: Drug consumption, data mining, personality factors, NEO-FFI-R, risk evaluation, correlation analysis. 1 Introduction Drug use is a risk behaviour that does not happen in isolation; it constitutes an important factor for increased risk of poor health, along with earlier mortality and morbidity, and has significant consequences for society (McGinnis & Foege, 1993; Sutina, Evans, & Zonderman, 2013). Drug consumption and addiction constitutes a serious problem globally. This includes numerous risk factors, which are defined as any attribute, characteristic, or event in the life of an individual that increases the probability of drug consumption. A number of factors are correlated with initial drug use including psychological, social, individual, environmental, and economic factors (Cleveland, Feinberg, Bontempo, & Greenberg, 2008; Ventura, de Souza, Hayashida, & Ferreira, 2014; World Health Organization, 2004). These factors are likewise associated with a number of personality traits (Dubey, Arora, Gupta, & Kumar, 2010; DordiNejad & Shiran, 2011; Bogg & Roberts, 2004). While legal drugs such as sugar, alcohol and tobacco are probably responsible for far more premature death than illegal recreational drugs (Beaglehole, et al., 2011), the social and personal consequences of recreational drug use can be highly problematic (Bickel, Johnson, Koffarnus, MacKillop, & Murphy, 2014).

Transcript of BFI & Addiction Risk

Page 1: BFI & Addiction Risk

1

The Five Factor Model of personality and evaluation of drug consumption risk

E. Fehrman1, A. K. Muhammad2, E. M. Mirkes2, V. Egan3, A. N. Gorban2

1Men's Personality Disorder & National Women's Directorate, Rampton Hospital, Retford, Nottinghamshire, DN22 0PD, UK

2Department of Mathematics, University of Leicester, Leicester, LE1 7RH, UK 3Department of Psychiatry and Applied Psychology, University of Nottingham, Nottingham, NG8 1BB, UK

Abstract The problem of evaluating an individual’s risk of drug consumption and misuse is highly important for health planning. An online survey methodology was employed to collect data including personality traits (NEO-FFI-R), impulsivity (BIS-11), sensation-seeking (ImpSS), and demographic information which influence drug use. The data set contained information on the consumption of 18 central nervous system psychoactive drugs. Correlation analysis using a relative information gain model demonstrates the existence of a group of drugs (amphetamines, cannabis, cocaine, ecstasy, legal highs, LSD, and magic mushrooms) with strongly correlated consumption patterns. An exhaustive search was performed to select the most effective subset of input features and data mining methods to classify users and non-users for each drug type. A number of classification methods were employed (decision tree, random forest, k-nearest neighbors, linear discriminant analysis, Gaussian mixture, probability density function estimation, logistic regression and naïve Bayes) and the most effective method selected for each drug. The quality of classification was surprisingly high. The best results with sensitivity and specificity (evaluated by leave-one-out cross-validation) being greater than 75% were obtained for VSA (volatile substance abuse) and methadone use. Good results with sensitivity and specificity (greater than 70%) were achieved for amphetamines, cannabis, cocaine, crack, ecstasy, heroin, ketamine, legal highs, and nicotine. The poorest result was obtained for prediction of alcohol consumption.

Key Words: Drug consumption, data mining, personality factors, NEO-FFI-R, risk evaluation, correlation analysis.

1 Introduction

Drug use is a risk behaviour that does not happen in isolation; it constitutes an important factor for increased risk of poor health, along with earlier mortality and morbidity, and has significant consequences for society (McGinnis & Foege, 1993; Sutina, Evans, & Zonderman, 2013). Drug consumption and addiction constitutes a serious problem globally. This includes numerous risk factors, which are defined as any attribute, characteristic, or event in the life of an individual that increases the probability of drug consumption. A number of factors are correlated with initial drug use including psychological, social, individual, environmental, and economic factors (Cleveland, Feinberg, Bontempo, & Greenberg, 2008; Ventura, de Souza, Hayashida, & Ferreira, 2014; World Health Organization, 2004). These factors are likewise associated with a number of personality traits (Dubey, Arora, Gupta, & Kumar, 2010; DordiNejad & Shiran, 2011; Bogg & Roberts, 2004). While legal drugs such as sugar, alcohol and tobacco are probably responsible for far more premature death than illegal recreational drugs (Beaglehole, et al., 2011), the social and personal consequences of recreational drug use can be highly problematic (Bickel, Johnson, Koffarnus, MacKillop, & Murphy, 2014).

Page 2: BFI & Addiction Risk

2

Psychologists have largely agreed that the personality traits of the Five Factor Model (FFM) are the most comprehensive and adaptable system for understanding human individual differences (Costa & MacCrae, 1992). The FFM comprises Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness.

Previous studies demonstrate that high Neuroticism and Openness, and low Agreeableness and Conscientiousness are associated with higher risk of drug use (including cocaine, cannabis, alcohol and marijuana) (Sutina, Evans, & Zonderman, 2013; Flory, Lynam, Milich, Leukefeld, & Clayton, 2002). Roncero et al. (2014) highlight the importance of the relationship between high Neuroticism scores and the presence of psychotic symptoms following cocaine-induced drug consumption. Vollrath & Torgersen (2002) observe that the personality traits of Neuroticism, Extraversion, and Conscientiousness are highly correlated with hazardous health behaviours. A low score of Conscientiousness, and high score of Extraversion or high score of Neuroticism correlate strongly with multiple risky health behaviours.

The statistical characteristics of groups of drug users and non-users have been studied by many authors (see, for example, Terracciano et al. (2008). They found that the personality profile for the users and non-users of tobacco, marijuana, cocaine, and heroin are associated with a higher score on Neuroticism and a very low score for Conscientiousness. The problem of risk evaluation for individuals is much more complex. This was explored very recently by Yasnitskiy et al. (2015), Valeroa et al. (2014), and Bulut & Bucak (2014).

Valeroa et al. (2014) evaluated the individual risk of drug consumption for alcohol, cocaine, opiates, cannabis, ecstasy and amphetamines. Input data were collected using the Spanish version of the Zuckerman–Kuhlman Personality Questionnaire (ZKPQ). Two samples were used in this study. The first one consisted of 336 drug dependent psychiatric patients of one hospital. The second sample included 486 control individuals. The authors used a decision tree as a tool to identify the most informative attributes. The sensitivity of 40% and the specificity of 94% were achieved for the training set. The main purpose of this research was to test if predicting drug consumption was possible and to identify the most informative attributes using data mining methods. The authors applied a decision tree to explore the differential role of personality profiles in drug consumer and control individuals. They found that the two personality factors of Neuroticism and anxiety and the ZKPQ’s Impulsivity were most relevant for drug consumption prediction. Low sensitivity does not provide application of this decision tree for real life problems. The current study seeks to test these associations with personality for different types of drugs separately using the Revised NEO Five-Factor Inventory (NEO-FFI-R) (McCrae & Costa, 2004), the Barratt Impulsiveness Scale Version 11 (BIS-11) (Stanford, Mathias, Dougherty, Lake, Anderson, & Patton, 2009), and the Impulsivity Sensation-Seeking scale (ImpSS) (Zuckerman, 1994) to assess impulsivity and sensation-seeking respectively.

Bulut & Bucak (2014) detected a risk rate for teenagers in terms of percentage who are at high risk without focusing on specific addictions. The attributes were collected by an original questionnaire, which included 25 questions. The form was filled in by 671 students. The first 20 questions asked about the teenagers’ financial situation, temperament type, family and social relations, and cultural preferences. The last five questions were completed by their teachers and concerned the grade point average of the student for the previous semester according to a 5-point grading system, whether the student had been given any disciplinary punishment so far, if the student had alcohol

Page 3: BFI & Addiction Risk

3

problems, if the student smoked cigarettes or used tobacco products, and whether the student misused substances.”

In Bulut et al’s study there are five risk classes as outputs. The authors diagnosed teenagers risk to be a drug abuser using seven types of classification algorithms: k-nearest neighbor, ID3 and C4.5 decision tree based algorithms, naïve Bayes classifier, naïve Bayes/decision trees hybrid approach, one-attribute-rule (OneR), and projective adaptive resonance theory (PART). The classification accuracy of the best classifier was reported as 98%.

Yasnitskiy et al. (2015), attempted to evaluate the individual’s risk of drug consumption and to recommend the most efficient changes in the individual’s social environment to reduce this risk. The input and output features were collected by an original questionnaire. The attributes consisted of: level of education, having friends who use drugs, temperament type, number of children in the family, financial situation, alcohol drinking and smoking, family relations (cases of physical, emotional and psychological abuse, level of trust and happiness in the family). There were 72 participants. A neural network model was used to evaluate the importance of attributes for diagnosis of the tendency to drug addiction. For several test patients (drug users) a series of virtual experiments was performed to evaluate how it is possible to control the propensity for drug addiction. For each patient the most effective change of social environment features was predicted. The recommended changes depended on the personal profile, and significantly varied for different patients. This approach produced individual bespoke advice for decreasing drug dependence.

In the current study we evaluated the individual drug consumption risk separately, for each drug. We also analysed interrelations between the individual drug consumption risks for different drugs. We applied several data mining approaches: decision tree, random forest, k-nearest neighbors, linear discriminant analysis, Gaussian mixture, probability density function estimation, logistic regression and naïve Bayes.

An online survey methodology was employed by Elaine Fehrman from March 2011 to March 2012. The data collected from the 2051 respondents included personality traits (NEO-FFI-R), impulsivity (BIS-11), sensation-seeking (ImpSS), and demographic information including country of location, ethnicity, level of education, gender, and age ranges. The data set contained information on the consumption of 18 central nervous system psychoactive drugs including alcohol, amphetamines, amyl nitrite, benzodiazepines, cannabis, chocolate, cocaine, caffeine, crack, ecstasy, heroin, ketamine, legal highs, LSD, methadone, magic mushrooms, nicotine, and Volatile Substance Abuse (VSA).

Participants were asked about substances, which were classified as central nervous system (CNS) depressants, stimulants, or hallucinogens. The depressant drugs comprised alcohol, amyl nitrite, benzodiazepines and tranquillizers, GHB, solvents and inhalants, and opiates such as heroin and methadone/prescribed opiates. The stimulants consisted of amphetamines, nicotine, cocaine powder, crack cocaine, caffeine, and chocolate. Although chocolate contains caffeine, data for chocolate was measured discretely, given that it may induce parallel psychopharmacological and behavioural effects in individuals congruent to other addictive substances (Bruinsma & Taren, 1999). The hallucinogens included 2CB, cannabis, ecstasy, ketamine, LSD, and magic mushrooms. Legal highs such as methedrone, salvia, and various legal smoking mixtures were also measured.

Page 4: BFI & Addiction Risk

The objecsensation-firstly, to secondly, profiles (contribute

The studybiased whpublishedcohorts (G2008).

Correlatiocaffeine) amphetamcorrelatedsymmetricexample, consumptconsumpt

In order approacheanalysis, GBayes). Fhighest leand Tableconsumpthypothese

The databfrom Surparticularparticipanbeing give

The studydid not reto being iendorsed respondenuseable sa

ctive of the-seeking, anidentify theto predict t

(i.e. NEO-Fe to an impr

y sample when compared by Egan, eGurrera, Ne

on analysis is not corre

mines, cannad (when thc a priori). knowledge

tion is usefution is signif

to predict es were appGaussian m

For each druevel of accue 12). The tion in relates for furthe

base was corvey Gizmoly relevant

nts were reqen.

y recruited 2espond correinattentive tusing a fi

nts who oveample of 18

e study wasnd demograe associationthe risk of dFFI-R, BISrovement of

as created bed with the et al. (2000)stor, & O'D

demonstrateelated with abis, cocain

he correlatioThere are ae of amphful for the eficantly less

the risk oplied (decis

mixture, probug, the mosuracy. Unex

creation oftion to indi

er study (see

ollected by o was empto canvassin

quired to dec

2051 particiectly to a reto the questictitious recer-claim, as 85 participa

s to assess aphic data n of persona

drug consumS-11, ImpSSf knowledge

by an anonygeneral pop) and Costa

Donnel, 2000

es that the cthat of oth

ne, ecstasy, ons are mea number ofetamines, c

evaluation os useful for t

f drug consion tree, rabability denst effective xpectedly gof classifiersividuals. The Section 3.6

2 Mate

Elaine Fehployed to ng respondeclare themse

ipants over esponse-chetions being creational dhave other

ants (male/f

4

the potention drug coality profile

mption for eaS, educatio

e concerning

ymous onlipulation, wha Jr & McC0; Terraccia

consumptionher drugs. T

legal higheasured byf strongly ascocaine, ecof ketaminethe evaluati

nsumption fandom forensity functio

subset of iood classifis provided the risk map6 and Figure

erials and

Data2.1

hrman betwegather datents’ viewselves at leas

an 18-montck built intoasked. Nin

drug, and wstudies of t

female = 943

ial effect oonsumptiones (i.e. NEOach individuon level, gg the pathwa

ne survey. hich is indicarae (2004).

ano, Löcken

n of legal dThe consums, LSD, an

y relative insymmetric c

cstasy, legae consumptiion of usage

for individust, k-neareson estimatioinput attribuiers were fothe capabilips in provie 8 there).

Methods

abase

een 2011 ana with ma, given the st 18 years

th recruitmeo the middlee of these pwhich wasthis kind (H3/942).

f personalit. The study

O-FFI-R) witual accordinender, and ays leading

It was founated from coSuch a bia

nhoff, Crum

drugs (i.e. almption of se

d mushroomnformation correlations

al highs, Lion, but knoe of the drug

uals, a numst neighborson, logistic rutes was seound for allity to evaluide a tool f

nd 2012. Aaximum ansensitive naof age prior

ent period. Oe of the scapersons wer

included pHoare & Mo

ty traits, imy had two th drug conng to their p

age). Theto drug con

nd to be sigomparison t

as is usual fom, Bienvenu,

lcohol, choceven illicit dms) is sym

gain, whics (see FigurLSD, and mowledge of gs listed abo

mber of dats, linear disregression, lected to pr drugs (seeuate the riskfor the gen

An online sunonymity, tature of drur to informe

Of these perale, so were re found to precisely to

oon, 2010).

mpulsivity, purposes:

nsumption; personality e findings nsumption.

gnificantly to the data

for clinical , & Costa,

colate and drugs (i.e. metrically ch is not e 7b). For mushroom f ketamine ove.

ta mining scriminant and naïve rovide the

e Table 11 k of drug eration of

urvey tool this being ug use. All ed consent

rsons, 166 presumed also have

o identify This led a

Page 5: BFI & Addiction Risk

5

The snowball sampling methodology recruited a primarily (93.5%) native English-speaking sample, with participants from the UK (1044; 55.4%), the USA (557; 29.5%), Canada (87; 4.6%), Australia (54; 2.9%), New Zealand (5; 0.3%) and Ireland (n = 20; 1.1%). A total of 118 (6.3%) came from a diversity of other countries, none of whom individually met 1% of the sample or did not declare the country of location. Further optimizing anonymity, persons reported their age band, rather than their exact age; 18-24 years (643; 34.1%), 25-34 years (481; 25.5%), 35-44 years (356; 18.9%), 45-54 years (294; 15.6%), 55-64 (93; 4.9%), and over 65 (18; 1%). This indicates that although the largest age cohort band were 18 to 24, some 40% of the cohort was 35 or above, which are a sample often missed in studies of this kind.

The sample recruited was highly educated, with just under two thirds (59.5%) educated to, at a minimum, degree or professional certificate level: 14.4% (271) reported holding a professional certificate or diploma, 25.5% (n = 481) an undergraduate degree, 15% (n = 284) a master’s degree, and 4.7% (n = 89) a doctorate. Approximately 26.8% (n = 506) of the sample had received some college or university tuition although they did not hold any certificates; lastly, 257 (13.6%) had left school at the age of 18 or younger.

Participants were asked to indicate which racial category was broadly representative of their cultural background. An overwhelming majority (91.2%; 1720) reported being White, 1.8% (33) stated they were Black, and 1.4% (26) Asian. The remainder of the sample (5.6%; 106) described themselves as ‘Other’ or ‘Mixed’ categories. This small number of persons belonging to specific non-white ethnicities precludes any analyses involving racial categories.

2.1.1 Personality measurements

In order to assess personality traits of the sample, the NEO-FFI-R) questionnaire was employed (Costa & MacCrae, 1992). The NEO-FFI-R is a highly reliable measure basic personality domains; internal consistencies are 0.84 (N); 0.78 (E); 0.78 (O); 0.77 (A), and 0.75 (C) (Egan, 2011). The scale is a 60-item inventory comprised of five personality domains or factors. The NEO-FFI-R is a shortened version of the Revised NEO-Personality Inventory (NEO-PI-R) (Costa & MacCrae, 1992). The five factors are: N (Neuroticism), E (Extraversion), O (Openness), A (Agreeableness), and C (Conscientiousness) with 12 items per domain. These traits can be summarized as:

1. Neuroticism – a long-term tendency to experience negative emotions such as nervousness, tension, anxiety and depression;

2. Extraversion – manifested in outgoing, warm, active, assertive, talkative, cheerful, and in search of stimulation characteristics;

3. Openness – a general appreciation for art, unusual ideas, and imaginative, creative, unconventional, and wide interests,

4. Agreeableness – a dimension of interpersonal relations, characterized by altruism, trust, modesty, kindness, compassion and cooperativeness;

5. Conscientiousness – a tendency to be organized and dependable, strong-willed, persistent, reliable, and efficient.

All of these domains are hierarchically defined by specific facets (McCrae & Costa, 1991). Egan et al. (2000) observe that the score Openness and Extraversion domains of the NEO-FFI instrument are less reliable than Neuroticism, Agreeableness, and Conscientiousness.

Page 6: BFI & Addiction Risk

6

Participants in our study were asked to read the 60 NEO-FFI-R statements and indicate on a five-point Likert scale how much a given item applied to them (i.e. 0 = ‘Strongly Disagree’, 1 = ‘Disagree’, 2 = ‘Neutral’, 3 = ‘Agree’, to 4 = ‘Strongly Agree’).

We expected that drug usage is associated with high N, and low A and C. The darker dimension of personality can be described in terms of low A, whereas much of the anti-social behaviour in non-clinical persons appears underpinned by high N and low C (Jakobwitz & Egan, 2006). The so-called ‘negative urgency’ is the tendency to act rashly when distressed, and characterized by high N, low C, and low A (Settles, Fischer, Cyders, Combs, Gunn, & Smith, 2012). The negative urgency is partially proved below for users of most of the illegal drugs. In addition, our findings suggest that O is higher for drug users.

The second measure used was the Barratt Impulsiveness Scale (BIS-11) (Stanford, Mathias, Dougherty, Lake, Anderson, & Patton, 2009). The BIS-11 is a 30-item self-report questionnaire, which measures the behavioural construct of impulsiveness, and comprises three subscales: motor impulsiveness, attentional impulsiveness, and non-planning. The ‘motor’ aspect reflects acting without thinking, the ‘attentional’ component poor concentration and thought intrusions, and the ‘non-planning’ a lack of consideration for consequences (Snowden & Gray, 2011). The scale’s items are scored on a four-point Likert scale. This study modified the response range to make it compatible with previous related studies (García-Montes, Zaldívar-Basurto, López-Ríos, & Molina-Moreno, 2009). A score of 5 usually connotes the most impulsive response although some items are reverse-scored to prevent response bias. Items are aggregated, and the higher BIS-11 scores, the higher the impulsivity level (Fossati, Ergis, & Allilaire, 2001). The BIS-11 is regarded a reliable psychometric instrument with good test-retest reliability (Spearman’s rho is equal to 0.83) and internal consistency (Cronbach’s alpha is equal to 0.83; (Stanford, Mathias, Dougherty, Lake, Anderson, & Patton, 2009; Snowden & Gray, 2011)).

The third measurement tool employed was the Impulsiveness Sensation-Seeking (ImpSS). Although the ImpSS combines the traits of impulsivity and sensation-seeking, it is regarded as a measure of a general sensation-seeking trait (Zuckerman, 1994). The scale consists of 19 statements in true-false format, comprising eight items measuring impulsivity (Imp), and 11 items gauging sensation-seeking (SS). The ImpSS is considered a valid and reliable measure of high risk behavioural correlates such as, substance misuse (McDaniel & Mahan, 2008).

2.1.2 Drug use

Participants were questioned concerning their use of 18 legal and illegal drugs (alcohol, amphetamines, amyl nitrite, benzodiazepine, cannabis, chocolate, cocaine, caffeine, crack, ecstasy, heroin, ketamine, legal highs, LSD, methadone, mushrooms, nicotine and volatile substance abuse (VSA)) and one fictitious drug (Semeron) which was introduced to identify over-claimers.

It was recognised at the outset of this study that drug use research regularly (and spuriously) dichotomises individuals as users or non-users, without due regard to their frequency or duration/desistance of drug use (Ragan & Beaver, 2010). In this study, finer distinctions concerning the measurement of drug use have been deployed, due to the potential for the existence of qualitative differences amongst individuals with varying usage levels. In relation to each drug, respondents were asked to indicate on if they never used the drug, used it over a decade ago, or in

Page 7: BFI & Addiction Risk

the last deand the sp

It can be sin last dayand ‘Usedover a decategoriesstudy we

The propoindividualrelatively Consumptamphetamamyl nitri30%. Finanumbers biased to drugs is ex

The raw sdata, by u

ecade, year, pecific recen

seen that pay’ and also td in last deccade ago’. s into the canalysed thi

ortions of drls without common (o

tion of benmines, mushite is approxally, crack, characterisethe higher

xpected to b

score for eacusage of the

T-s

Never U

Used ovdecade

month, weency of use. T

articipants wto the categ

cade’. ThereThese two

class ‘User’is binary cla

rug users diany missin

over 96%). nzodiazepinhrooms and ximately 19heroin, and

e the groupproportion

be significan

ch factor of equation be

core 10

Used

ver a ago

ek, or day. TThe seven c

Figure 1. C

who had usegories ‘Usede are two spcategories w, as the simassification.

iffered for dng data. CConsumptio

nes, ecstasy cocaine wa

%. Consumd VSA usep of respond

of drug usntly lower (

2.

f the NEO-Felow (McCr

7

This format categories of

Categories of

ed a drug thed in last weeecial categowere placedmplest versi.

different druonsumptionon of cannaand legal

as approximmption of me

is approximdents. It is sers and for(Home Offic

Data a2

FFI-R was crae & Costa,

Used in l

Used in

Used in l

Used in

Used in

captured thf drug users

drug users

e previous dek’, ‘Used inories (see Fid into the clion of binar

ugs. The datn of alcohoabis and nic

highs wasmately 36%ethadone is mately 10%,

worth to mr the populace, UK, 201

analysis

onverted in, 2004).

ast decade

last year

last month

last week

n last day

he breadth os are depicte

day belong tn last monthigure 1): ‘Nlass of ‘Nonry classifica

tabase sampol, caffeinecotine was a

less, at 41%. Consumpt

above 22%, 11%, 12%mention heration consu4).

nto a T-Scor

of a drug-usied in Figure

to the categh’, ‘Used in

Never used’ an-user’, andation. Furth

ple comprisee and chocalso high (o1%. Consumtion of keta and LSD is

%, respectivere that the

umption of t

re based on n

50.

ing career, 1.

gory ‘Used n last year’ and ‘Used d all other her in this

ed of 1885 olate was

over 67%). mption of amine and s less than ely. These sample is the illegal

normative

Page 8: BFI & Addiction Risk

Table 1 pof the sigdifference

F

NeuroticismExtraversionOpenness AgreeablenConscientio

The meanand Figurusual for Fridberg, Bienvenubelow thausers (seethe mean users from

T-scoresam

formula b

The T-scoconcerninlow scoreThe intervgroups (Umeans scothe averag49-51 ind

resents T-scgnificance oes between p

Table 1.

Factors

m n

ess ousness

ns of the NEre 2 illustratclinical samVollmer, O, & Costa,

at the group e Table 5 bevalues of th

m the sample

mple is introdbelow. The r

T-s

ore is categng each factes. The interval 65-80 in

Users and Nores belong ge is introdu

dicates neutr

core statisticof the diffepopulation a

Descriptive s

Samp

5947544743

EO-FFI-R Tte that the samples. For O'Donnell, &2008; Floryin this stud

elow). Howhis deviatione mean. For

Figu

duced for imresulting T-s

score

gorized intoor. The interval 45-55 indicates ver

Non-users) into the averauced as follral (0), and t

cs, sample merence betwand sample

statistics (Me

ple mean s

9.64 57.35 44.04 57.15 43.93 4

T-scores basample is biaexample, fo& Skosnik, y, Lynam, Mdy deviates fever, in thisn. Thereforer this purpos

ure 2. Mean T

mproved visscoressample

10raws

o five categerval 20-35 indicates avry high scornstead of eaage score inlows: the inthe interval

8

means, samween the sa

means are s

ean, SD, 95% 95% CI for

sample mean59.08, 60.20 46.88, 47.83 53.55, 54.52 46.62, 47.69 43.25, 44.61

sed on normased with reor schizoph2011) and

Milich, Leufrom the pos sample, gre, it is convse we introd

T-score NEO

sibility and contains a m

wscore-sasamplestan

gories to suindicates ve

verage scoreres. This stuach individunterval. A sunterval 44-451-56 indic

mple standardample meanstatistically

CI). P-value

SD

12.4110.4810.7511.8811.06

mative data aespect to thehrenia (Gurr

drug use (Tukefeld, & Copulation noroups of dru

venient to studuce the T-s

-FFI-R (red l

simplicity omean of 50 a

amplemeanndarddevia

ummarise anery low scoes. The interudy considerual’s score. ubdivision o9 indicates

cates modera

d deviation n and the psignificant

is calculated Popul

me55555

are depictede populationrera, NestorTerraccianoClayton, 20

orm in the saug users andudy deviatioscore in the

line)

of comparisand a standa

nscoreiation

n individuaores. The intrval 55-65 irs the meanAll values r

of the T-scomoderatelyately high (+

(SD), and epopulation m(p<0.001).

by t-test lation ean

p

0 0 0 0 0

d in Figure 2n. This typer, & O'Donno, Löckenho002). It is hame directiod non-usersons of userssample.

son, calculatard deviatio

50.

al’s personaterval 35-45indicates hi

n T-scoresamp

related to thresample near

y low (−), th+).

evaluation mean. All

p-value

<0.001 <0.001 <0.001 <0.001 <0.001

2. Table 1 e of bias is nel, 2000; off, Crum, ighlighted on as drug s differ for s and non-

ted by the on of 10.

ality score 5 indicates gh scores. ple for two he groups’ r to that of he interval

Page 9: BFI & Addiction Risk

The meanusers) forusually usand groupt-test is egroups ofprobabilitSAS 9.4.

There arecategorica

One of thcorrelationcoefficiencorrelationcontinuoufollows tdrawbackthresholds

Let us hacategoryThe samp

The simp‘average’ the value average p

The polycpolychoriapproach calculatio

n of the T-scr each drugsed as a meps of Users mployed tof users and ty value gre

many data al features to

he widely un (Lee, Poo

nts further in is based o

us random vthe normal

ks: it definess defined ar

ave the ordi. The emp

ple estimatio

plest methovalue in eawith averagrobability is

choric coefc coefficienis the usag

on the catego

coressample fo. Any diffe

easure of disand Non-us

o estimate thnon-users

eater, than 0

mining meo use these

2.3.1

used techniqon, & Bentlis used to con suggestio

values with fdistributio

s the threshe different f

inal feature pirical estimon of thresho

d of ordinaach intervalge probabilis

fficients, cants calculatege of the saories’ value

for all factorrences betwssimilarity isers for eachhe significafor each N

0.1, is consi

Inpu2.3

ethods to womethods.

1 Ordina

ques to analer, 1995; Mcalculate pron that valufixed thresh

on. Unfortuolds of discfor different

with catmation of proolds are eva

al feature q. There sevity: if thresh

Φ

lculated oned by usingame threshos.

9

rs enables coween the mein scores. Th drug is exance of the

NEO-FFI-R idered as no

ut feature t

ork with con

al features

alyse categoMartinson &rincipal comues of ordinholds. Furtheunately, polcretization bt pairs of att

tegories ,obability ofaluating as:

,

quantificatioveral variantholds an

2

n base of qug the maximolds for all

omparisonsean T-score

The relationsamined by cdifferencesscores. The

on-significan

transform

ntinuous da

quantifica

orical data & Hamdan, mponents, etnal feature aermore, thislychoric cobut not the vtributes.

, … , , af category

on is to usts of ‘averagnd t define

.

uantificationmum likeliho

pairs of att

of groups (esample for usship betweecomparing m

s between me differencent. The anal

ation

ata. It is nec

ation

is the calcu1971). The tc. The techare the resus latent contorrelation tevalues for e

and with nu is

e thresholdge’ value. Fe the interva

n (2), haveood approatributes and

(both Users sers and noen NEO-FFImean T-sco

mean T-scores with an alysis is perf

cessary to qu

ulation of pmatrix of p

hnique of plt of discrettinuous randechniques each categor

umber of ca/ , where

ds (1) and For this studal of categor

less likelihch. The me

d explicit fo

and Non-on-users is I-R scores

oresample. A resample for associated formed by

uantify all

polychoric polychoric polychoric tization of dom value have two ry and the

ases of ∑ .

(1)

select the dy we use ry , then

(2)

hood than erit of this ormula for

Page 10: BFI & Addiction Risk

We cannolocation afeatures wThis proce

1.

2. 3. 4.

The proceFigure 3 spoints.

As an alteof nomina0 (if ‘falseEthnicity Mixed-W

Fig

ot use techniand ethnicitwe implemeedure includ

Exclude nprincipal 2010) in Kaiser’s rCalculateCalculateThe numcomponen

ess of nomishows that p

ernative varal variables:e’): UK, Cais transform

White/Black,

gure 3. CatPC

2.3.2

iques descrity because ented the tedes four step

nominal feacomponentspace of r

rule (Guttme the centroie the first pri

merical valunt.

inal featurepoints corre

riant of nom: ‘country’ ianada, USA,med into sevAsian, Blac

CA quantifica

2 Nomina

ibed above categories

echnique ofps:

atures from ts (Pearson, retained inp

man, 1954; Kid of each caincipal com

ue for each

e quantificatesponding t

minal featureis transform, Other (couven binary fck and Mixe

ation of ‘Coun

10

al feature

to quantify of these fea

f nonlinear

the set of i1901; Gorb

put feature. Kaiser, 1960ategory in p

mponent of ccomponen

tion for theo the UK c

e quantificatmed into seveuntry), Austrfeatures: Med-Black/As

ntry’ on the p

quantifica

nominal feaatures are uCatPCA (L

input featurban & ZinovTo select i).

projection oncentroids. nt is the pro

e feature ‘Category are

tion we useen binary feralia, Repubixed-White/sian.

plane of the fir

ation

atures such unordered.

Linting & v

res and calcvyev, 2008; informative

n selected p

ojection of

Country’ is d located ver

dummy coeatures with blic of Irelan/Asian, Wh

rst two princ

as gender, cTo quantify

van der Koo

culate the inGorban & Zcomponen

rincipal com

f its centroi

depicted in ry far from

oding (Gujarvalues 1 (ifnd and New

hite, Other (

ipal compone

country of y nominal oij, 2012).

nformative Zinovyev,

nts we use

mponents.

id on this

Figure 3. any other

rati, 2003) f ‘true’) or

w Zealand; ethnicity),

ents

Page 11: BFI & Addiction Risk

In this stuprincipal feature whtogether wso on.

The seconinformativof an attrimportantdefined thimportancthe attribuminimal vprocedurethe best on

The third the simpleseveral ste

1. De

2. Seco

3. If go

4. Se5. If 6. Se

sea7. Su8. Se

tri1.

For this st

We used quantificaquantificaNeuroticisSensation

udy, we usevariables (

hich explainwith the pre

nd approachve principalibute is deft principal he thresholdce is greaterute is triviavalue of ime stops if thne.

approach uest thresholdeps:

efine the nu/ and

earch usualomponent. T

. the

o to step 8. earch the att

thet the value arch the prin

ubtract projeearch the attvial. If ther

tudy, we app

two generation descriation’. We sm, Extrav

n-Seeking. T

ed three diff(McCabe, 1ns the maximeviously sele

h was doubl componentfined as macomponents

d of importar than the thal. If there mportance. Were are no t

used was spd for sparse

umber of fead the Kaiserl principal The iterative

en all the inf

tribute with hen there is nof the foundncipal compection on foutributes wit

re are trivial

plied severa

ral sets of bed in subcalled this

version, OThe second

2.3.3 Inp

ferent techn984). The

mal fractionected featur

ble Kaiser’sts by Kaiser

aximum of as representeance as1/√hreshold of iare trivial aWe removetrivial attrib

parse PCA (e PCA. The

atures , var threshold f

componente algorithm

formative c

non-zero cono trivial atd coefficienponent undeund componth zero coefl attributes t

Ri2.4

al classificat

f input featsections ‘O

s set of fepenness, Aset was the

11

put feature

niques of inpmain idea o

n of the datares explains

s selection. r’s rule (Guabsolute valed by norm√ , where importance attributes thed the worsbutes. This a

(Naikal, Yasearching o

ariance of dafor coefficiet and calcgives the p

omponents

oefficients wttributes. Gont to zero. Bler this condinent from dfficients in then remove

isk evaluat

tion method

tures. The Ordinal featueature ‘origAgreeablenee set of proj

e ranking

put feature rof this appr

a variance, ths the maxim

Calculate puttman, 1954lue of the c

malized vec is the numthen this at

hen the worst attribute algorithm ra

ang, & Sastrof each spar

ata , the Kents 1/√culate the vrincipal com

are found.

with least abo to step 7. lock changeition. Go to

data and go teach founde it from th

tion metho

ds which pro

first wasures quanti

ginal’. It iness, Conscjections of i

ranking. Throach is to hen select th

mal fraction

principal co4; Kaiser, 19correspondinctors. For amber of attrittribute is inrst attributeand repeat

anks attribu

ry, 2011). Inrse principal

Kaiser thres√ . variancemponents in

Remove the

bsolute valu

es of this attstep 4.

to step 2. d componenhe set of attr

ods

ovide risk ev

the set of ification’, ancludes ageientiousnesinput featur

he first technselect first

he next featof data vari

omponents a960). The imng coordina

attribute seleibutes. If thenformative. O is the attri the proced

utes from the

n this studyl componen

shold for co

explainedn descendin

e last comp

e .

tribute coeff

nt. This attrributes and

valuation as

input featuand ‘Nomine, gender, es, Impulsivres onto the

nique was the input

ture which iance, and

and select mportance ates in the ection we e attribute Otherwise ibute with dure. This e worst to

y, we used nt contains

omponents

d by this g order of

ponent and

ficient and

ributes are go to step

s well.

ures after nal feature education, vity, and

e first four

Page 12: BFI & Addiction Risk

12

principal components. This set of input features we called ‘projected’. Also, we used the subsets of original and projected sets.

2.4.1 K nearest neighbors (KNN)

The basic concept of KNN is the class of object is the class of the majority of its k nearest neighbors (Clarkson, 2005). This algorithm is very sensitive to distance definition. There are several commonly used variants of distance for KNN: Euclidean distance; Minkovsky distance; and distances calculated after some transformation of the input space.

In this study, we used three distances: the Euclidean distance, the Fisher’s transformed distance (Fisher, 1936) and the adaptive distance (Hastie & Tibshirani, 1996). Moreover, we used a weighted voting procedure with weighting of neighbors by one of the standard kernel functions (Li & Racine, 2007).

The KNN algorithm is well known (Clarkson, 2005). The adaptive distance transformations algorithm is described in (Hastie & Tibshirani, 1996). KNN with Fisher’s transformed distance is less known. For them, the following options are defined: k is the number of nearest neighbors, is the kernel function, and kf is the number of neighbors which are used for the distance transformation. To define the risk of drug consumption we have to do the following steps:

1. Find the kf nearest neighbors of the test point. 2. Calculate the covariance matrix of kf neighbors and Fisher’s discriminant direction. 3. Find the k nearest neighbors of the test point using the distance along Fisher’s discriminant

direction among earlier found kf neighbors. 4. Define the maximal distance from the test point to k neighbors. 5. For each class calculate the membership of this class as a sum of the points’ weights. The

weight of a point is the ratio: the value of the kernel function of distance from this point to the test point divided by the maximal distance defined at step 4.

6. Drug consumption risk is defined as the ratio of the positive class membership to the sum of memberships of all classes.

The adaptive distance version implements the same algorithm but uses the other transformation on step 2 and the other distance on step 3. The Euclidean distance version simple defines and omits 2 and 3 steps of algorithm.

We test the KNN versions, which differ by:

The number of nearest neighbors, which is varied between 1 and 20; The set of input features; One of the three distances: Euclidean distance, adaptive distance and Fisher’s distance; The kernel function for adaptive distance transformation; The kernel functions for voting.

2.4.2 Decision tree

The decision tree approach is a method that constructs a tree like structure, which can be used to choose between several courses of action. The binary decision trees are used in this study. The decision tree is comprised of nodes and leaves. Every node can have a child node. If a node has no

Page 13: BFI & Addiction Risk

13

child node, it is called a leaf or a terminal node. Any decision tree contains one root node, which has no parent node. Each non terminal node calculates its own Boolean expression (with the value ‘true’ or ‘false’). According to the result of this calculation, the decision for a given sample would be delegated to the left child node (‘true’) or to the right child node (‘false’). Each leaf (terminal node) has a label which shows how many samples of the training set belong to each class. The probability of each class is estimated as a ratio of the number of samples in this class to the total number of samples in the leaf.

There are many methods for developing a decision tree (Rokach & Maimon, 2010; Quinlan, 1987; Breiman, Friedman, Stone, & Olshen, 1984; Gelfand, Ravishankar, & Delp, 1991; Dietterich, Kearns, & Mansour, 1996; Kearns & Mansour, 1999; Sofeikov, Tyukin, Gorban, Mirkes, Prokhorov, & Romanenko, 2014). We use the methods based on information gain, Gini gain, and DKM gain. Let us consider one node and one binary input attribute which can take values 0 or 1. Let us use notation: is the number of cases in the node, is the number of categories of the target feature, is the number of th category cases with input attribute value in the node, the number of th category cases in the node is . , the number of cases with the input attribute value in the node is . ∑ , , … , is the vector of frequencies with input attribute value , and ., … , . is the vector of frequencies with any input attribute value. To form a tree we select the base function for information criterion among

, ,

, 1 ,

and

, 2 , where is the vector of frequencies and is the sum of the elements of vector . The DKM can be applied to a binary target feature only. The value of the criterion is the gain of base function:

, . , . , .

There are several approaches to use real valued inputs in decision trees. The commonly used approach suggests the binning of real valued attribute before forming the tree. In this study we implemented ‘on the fly’ binning: the best threshold is searched in each node for each real valued attribute and then this threshold is used to bin these feature in this node. The best threshold depends from the split criteria used (information gain, Gini gain, or DKM gain).

Another possibility we employ is usage of Fisher’s discriminant to define the best linear combinations of the real valued features (Fisher, 1936) in each node. The pruning techniques are applied to improve the tree.

The specified minimal number of instances in the tree’s leaf is used as a criterion to stop node splitting. Each leaf of the tree cannot contain fewer instances than a specified number.

For the case study we tested the decision trees, which differ by:

Page 14: BFI & Addiction Risk

14

The three split criterion (information gain, Gini gain or DKM gain); The use of the real-valued features in the splitting criteria separately or in linear combination

by Fisher’s discriminant; The set of the input features; The minimal number of instances in the leaf, which varied between 3 and 30.

2.4.3 Linear discriminant analysis

We used Fisher's linear discriminant for binary version of problem (Fisher, 1936). We calculate the mean of points of th class and covariance matrix of th class for both classes. Then we calculate the discriminating direction as

.

Each point is projected onto discriminating direction by calculation of dot product , . The threshold to separate two classes is calculated by finding the maximum of relative information gain, Gini gain, or DKM gain. This method cannot be used for problems of risk evaluation.

For the case study we tested the LDA which differ by one of the three criteria (information gain, Gini gain or DKM gain) which were used to define the threshold and the set of input feature.

2.4.4 Gaussian mixture

Gaussian mixture is the method to estimate the probability under assumption that each category of target feature has the multivariate normal distribution (Dinov, 2008). For each category we should estimate the covariance matrix and inverse it. The primary probability of belonging to the th category is:

2 | | ,

where is a prior probability of ith category, is dimension of input space, is the mean of point of th category, is tested point, is covariance matrix of ith category and | | is determinant of it. The final probability of belonging to category is calculated as

/ .

The prior probabilities are estimated as fractions of the th category cases among all cases. For the binary problem, we also used the varied multiplier to correct priors.

For the case study, we tested the Gaussian mixtures, which differ by the set of input features and corrections of prior probabilities.

2.4.5 Probability density function estimation

We implemented the radial-basis functions method (Buhmann, 2003) for probability density function estimation (Scott, 1992).

Page 15: BFI & Addiction Risk

15

The number of probability densities to estimate is equal to the number of categories of the target feature. Each probability density function is estimated separately by using nonparametric techniques. The prior probabilities are estimated by database: / where is the number of cases with thcategory of the target feature and is the total number of cases in the database.

For each point, k nearest neighbors from the database is defined. These k points are used to estimate the radius of neighborhood as a maximum of distances from data point to each of k neighbors. The centre of one of the kernel functions is placed in the data point (Li & Racine, 2007). The integral of any kernel function over the whole space is equal to 1. The total probability of th category is the integral of sum of kernel functions and is equal to but the total probability of each category have to be equal to the prior probability . It means that sum of kernel functions has to be divided by . The probability of each category in arbitrary point is estimated as result of division of sum of values of kernel function which are placed in data points that correspond to records of this category by .

The following steps are to be used to evaluate the risk of each category: (i) the probability functions for all categories are estimated and (ii) the risk of each category is defined as a ratio of the probability of this category to the sum of all probabilities.

We tested the PDFE versions which differ by:

The number of the nearest neighbors (it is varied between 5 and 30); The set of the input features; The kernel function which was placed in each data points.

2.4.6 Logistic regression

We implemented the weighted version of logistic regression (Hosmer & Lemeshow, 2004). This method can be used for binary problem only. The log likelihood estimation of the regression coefficients is used. The weights of categories are defined as the fractions of th category cases among all cases. Logistic regression gives only one result because there is no option to customize the method except the set of input features.

2.4.7 Naïve Bayes

We implemented the standard version of naïve Bayes (Russell & Norvig, 1995). All attributes which containd less than or equal to 20 different values were interpreted as categorical and the standard contingency tables were calculated for such attributes. Calculated contingency tables are used to estimate the conditional probability. Attributes which contain more than 20 different values were interpreted as continuous. For continuous attributes the mean and the variance were calculated instead of the contingency tables. For each value of the output attribute we calculated the isolated mean and variance. The conditional probability of a specified outcome and a specified value of the attribute was calculated as the value of probability density function for normal distribution at point with mean and variance, which were calculated for outcome . This method has no customized options and is to be tested on different sets of input features.

Page 16: BFI & Addiction Risk

Random fdecision tclassifier

are most popu

In random(Hastie, Tamong alpredictors

Random fexpectatio

The foresany two trthe strengIncreasingalgorithm2011).

A numbeclassificatthe sensitfrom the 50% were

There are validationused for atechniqueet al. (200

The descrnormative(2004), 95distributioasymmetr

forests are ptrees that groconsisting independenular class at

m forest, eacTibshirani, &ll variables.s randomly c

forests try ton (Hastie, T

st error rate rees in the f

gth of each ig the streng

m builds hun

r of differetion, the senivity and sp‘completely

e not consid

several appn and Leaveall tests in te like decisio09).

ription statie data (mea5% confideon shape cory of the dis

proposed byow in randoof a collect

nt identicallyt input ” (B

ch tree is co& Friedman. In a randochosen at th

to improve Tibshirani, &

depends onforest. Increindividual tgth of the ndreds of d

2.4.9 C

ent criteria nsitivity andpecificity asy random gudered.

proaches to e-One-Out Cthis study. Ton tree and

istics for fians and staence intervaompared to tribution) fo

2.4.8

y Breiman (omly selectetion of tree y distributedBreiman, 20

onstructed un, 2009). Inom forest,

hat node (Li

on bagging& Friedman

n two thingseasing the ctree in the foindividual

decision tree

Criterion o

exist for sed specificitys the integrauess’ classif

test the quaCross ValidThere are sorandom for

3

D3.1

ive factorsandard devials, kurtoses

normal distor NEO-FFI

16

Random

(2000) for bed subspacestructured

d random ve01).

sing a differn standard treach node aw & Wien

g by ‘de-corn, 2009).

s (Breiman,correlation iforest. A tree

trees decrees and com

of the best

election of ty were emplal criterion. fier. Classif

ality of classdation (LOOome problemrest. These p

Results

Descriptiv

is presenteiations in ths (kurtosis itribution) anI-R for the f

forest

building a ps of data (Bclassifiers ectors and e

rent bootstrrees, each nis split usi

ner, 2002)

rrelating’ th

, 2001). Thencreases thee with a low

eases the fombines them

method se

the best claloyed as the

This criterfiers with se

sifier: usageOCV) (Arloms with claproblems ar

ve statistics

ed in Table he populatis a measurend ‘skewnefull sample.

predictor enBiau, 2012).

, ,each tree ca

rap sample fnode is spliing the bes

he trees. Ea

e first is thee forest errow error rate orest error

into a sing

election

assifier. In e primary crion selects ensitivity or

e of isolatedot & Celissassifier qualre considere

s

2: means, ion followine of flatness

esses’ (skew

nsemble wit“Random f

1, . . . sts a unit vo

from the oriit using the t among a

ach tree has

e correlationor rate. Theis a strong rate. Randogle model (

this study friteria, and tthe classifier specificity

d test set, n-e, 2010). Lity estimati

ed in details

standard dng McCraes/‘peakedne

wness is a m

th a set of forests is a where the ote for the

iginal data best split subset of

s the same

n between second is classifier.

om Forest (Williams,

for binary the sum of er furthest y less than

-fold cross LOOCV is on for the by Hastie

deviations, e & Costa ess’ of the measure of

Page 17: BFI & Addiction Risk

Tabl

Fa

NExOpAgCo

No

Pearson's associatiofactors haAgreeablesignifican

Table 4 ddrug. Signrelationshas followswhereas arisk of useon Agreeascore of E

le 2. Descripti

Factors

NeuroticismExtraversionOpenness AgreeableneConscientiou

actors

euroticism xtraversion penness greeableness onscientiousneote: p-value is

correlationon between tave no signeness and Ontly correlate

Com3.2

demonstratesnificant diffhip between s: an increaan increase e. Thus for ableness an

Extraversion

ive statistics (Normatives,

s Mean

m 23.92n 27.58

33.76ess 30.87usness 29.44

Ta

Neurotic

-0.4320.017

-0.215ess -0.398s the probabil

if

n coefficienttwo factors.nificant cor

Openness (r=ed in the sam

mparison o

s the mean ferences of p

personalityase in score

in the scoreach drug,

nd Conscienn is drug spe

(Means, sampKurtoses, Sk

n SD N

Me2 9.14 16.8 6.77 29.6 6.58 31.7 6.44 32.4 6.97 33.

able 3. PCC f

cism Extrav-0.43

2** 7 0.235** 0.158** 0.31lity to observdata uncorre

t (PCC) r i. PCC for alrrelation: (1=0.033 p=0.mple.

f mean pe

T-scoresamp

personality y profile ands of Neurotres of Agredrug users s

ntiousness wecific (non-u

17

ple Standard kewnesses) for

ormative 9

ean SD .83 7.36 .29 6.46 .29 6.12 .41 5.42 .26 6.3

for NEO-FFI-PCC

version Open32** 0.0

0.236** 59** 0.018** -0.0e by chance telated: * p<0.

is employedll pairs of fa1) Neurotici155). Howe

rsonality t

ple NEO-FFIfactor scored risk of drticism and Oeableness ascored high

when compauniversal).

Deviations (Sr NEO-FFI-R

95% Cl for Me

23.51, 24.3327.27, 27.8833.47, 34.0530.58, 31.1629.12, 29.75

-R for raw daC for scales nness Agree017 -0.236** 0. 0.

033 060* 0.the same or gr01; ** p<0.00

d as a meaactors are prism and Opever, all othe

traits for d

I-R factors fes exist betwrug consumpOpenness eand Conscieer on Neuro

ared to drug

SD), ConfidenR for raw data

ean Kurtosis

3 -0.55 8 0.06 5 -0.27 6 0.13 5 -0.17

ata

eableness Co.215** .159** .033

.249** reater correla01.

asure of theresented in Tpenness (r=er pairs of p

drug users

for users anween these gption can gentails an in

entiousness oticism and g non-users.

nce Intervals a

s Skewness

0.11 -0.27 -0.30 -0.26 -0.38

onscientiousne-0.398** 0.318** -0.060* 0.249**

ation coefficie

e strength oTable 3. Tw=0.017 p=0personality f

and non-u

nd non-usergroups. Theenerally be

ncrease in rientails a deOpenness, The influen

(CI),

ess

ent

of a linear wo pairs of 0.471); (2) factors are

users

rs for each e universal

described isk of use, ecrease in and lower nce of the

Page 18: BFI & Addiction Risk

18

Table 4. Mean T-scoresample and 95% CI for T-scoresample mean for groups of Users and Non-users. P-value is the probability to observe by chance the same or greater differences between means if the means are the same. The

green background corresponds to p-value less than 0.001, the yellow background corresponds to p-value between 0.001 and 0.01, the blue background corresponds to p-value between 0.01 and 0.05, the red

background corresponds to p-value between 0.05 and 0.1 and the white background corresponds to p-value greater than 0.1 which is non-significant.

Factors User Non-user

p-value Mean T-score 95% CI for mean Mean T-score 95% CI for mean

Alcohol # 1817 68 N 50.13 49.67,50.59 48.19 45.77, 50.61 0.116 E 50.06 49.60, 50.52 50.04 47.61, 52.42 0.988 O 50.04 49.58, 50.51 48.81 46.45, 51.17 0.318 A 49.93 49.47, 50.39 52.51 50.26, 54.77 0.036 C 49.94 49.48, 50.40 53.31 51.05, 55.56 0.006

Amphetamines # 679 1206 N 51.71 50.95, 52.46 49.14 48.58, 49.69 <0.001 E 49.71 48.89,50.53 50.26 49.72, 50.80 0.251 O 53.05 52.34, 53.77 48.28 47.72, 48.84 <0.001 A 48.39 47.60, 49.18 50.94 50.39, 51.48 <0.001 C 47.04 46.29, 47.80 51.76 51.22, 52.30 <0.001

Amyl nitrite # 370 1515 N 50.78 49.78, 51.79 49.89 49.37, 50.39 0.122 E 50.97 49.95, 51.99 49.84 49.33, 50.35 0.052 O 51.45 50.47, 52.43 49.65 49.14, 50.15 0.002 A 48.69 47.65, 49.72 50.35 49.84, 50.85 0.004 C 48.08 46.10, 49.07 50.54 50.04, 51.05 <0.001

Benzodiazepines # 679 1206 N 52.83 52.12, 53.54 48.15 47.59, 48.72 <0.001 E 49.07 48.31, 49.83 50.74 50.19, 51.30 <0.001 O 52.66 51.97, 53.34 48.17 47.59, 48.75 <0.001 A 48.28 47.53, 49.02 51.22 50.67, 51.78 <0.001 C 47.70 46.98, 48.41 51.69 51.12, 52.25 <0.001

Cannabis # 1265 620 N 51.08 50.52, 51.65 47.98 47.25, 48.71 <0.001 E 49.75 49.17, 50.33 50.70 49.98, 51.41 0.053 O 52.48 51.96, 52.99 44.95 44.20,45.70 <0.001 A 48.84 48.28,49.40 52.42 51.69,53.15 <0.001 C 48.15 47.60,48.70 53.92 53.27,54.65 <0.001

Page 19: BFI & Addiction Risk

19

Factors User Non-user

p-value Mean T-score 95% CI for mean Mean T-score 95% CI for mean

Chocolate # 1850 35 N 50.06 49.60, 50.51 50.29 46.36, 54.21 0.894 E 50.05 49.59, 50.51 50.80 47.71, 53.89 0.660 O 50.05 49.59, 50.51 47.37 44.42, 50.32 0.117 A 50.05 49.59, 50.50 48.66 44.61, 52.70 0.416 C 50.03 49.58, 50.49 51.57 47.87, 55.27 0.366

Cocaine # 687 1198 N 51.85 51.09, 52.60 49.04 48.48, 49.59 <0.001 E 50.33 49.55, 51.12 49.91 49.35, 50.46 0.374 O 52.60 51.89, 53.30 48.51 47.94, 49.08 <0.001 A 47.71 46.93, 48.50 51.34 50.81,51.88 <0.001 C 47.49 46.76, 48.22 51.53 50.98, 52.09 <0.001

Caffeine# 1848 37 N 50.06 49.61, 50.52 50.08 46.59,53.57 0.991 E 50.13 49.67, 50.59 46.76 43.82, 49.69 0.0429 O 50.11 49.65, 50.57 44.59 41.67,47.52 <0.001 A 49.99 49.54, 50.45 51.11 47.71, 54.51 0.504 C 49.99 49.53, 50.44 53.59 50.79, 56.40 0.029

Crack# 191 1694 N 53.08 51.64, 54.52 49.72 49.25, 50.20 <0.001 E 48.80 47.33, 50.27 50.20 49.73, 50.68 0.066 O 52.91 51.60, 54.21 49.67 49.19, 50.15 <0.001 A 47.02 45.46, 48.58 50.36 49.89, 50.83 <0.001 C 46.19 44.70, 47.68 50.50 50.03, 50.96 <0.001

Ecstasy # 751 1134 N 51.30 50.58, 52.02 49.24 48.67, 49.82 <0.001 E 50.56 49.80, 51.31 49.73 49.17, 50.30 0.081 O 53.62 52.96,54.28 47.60 47.03, 48.17 <0.001 A 48.49 47.75, 49.23 51.03 50.47, 51.60 <0.001 C 47.30 46.59, 48.00 51.89 51.33, 52.45 <0.001

Heroin # 212 1673 N 54.60 53.29, 55.92 49.49 49.01, 49.96 <0.001 E 48.42 46.94, 49.90 50.27 49.80, 50.74 0.011 O 54.25 53.04 , 55.47 49.46 48.98, 49.94 <0.001 A 45.53 44.00, 47.06 50.59 50.12, 51.05 <0.001 C 45.91 44.55, 47.26 50.59 50.11, 51.06 <0.001

Ketamine # 350 1535 N 51.42 50.40, 52.43 49.75 49.25, 50.26 0.005 E 50.36 49.23, 51.48 49.99 49.50, 50.49 0.542 O 53.87 52.90, 54.84 49.12 48.62, 49.62 <0.001 A 47.80 46.67, 48.94 50.53 50.04, 51.01 <0.001 C 46.86 45.81, 47.92 50.79 50.30, 51.28 <0.001

Page 20: BFI & Addiction Risk

20

Factors User Non-user

p-value Mean T-score 95% CI for mean Mean T-score 95% CI for mean

Legal highs # 762 1123 N 51.49 50.77,52.22 49.09 48.52, 49.66 <0.001 E 49.71 48.94, 50.47 50.30 49.75, 50.86 0.206 O 54.30 53.67, 54.92 47.08 46.51, 47.65 <0.001 A 48.61 47.86, 49.36 50.98 50.42, 51.53 <0.001 C 46.98 46.27, 47.72 52.14 51.60, 52.68 <0.001

LSD # 557 1328 N 50.87 50.05, 51.69 49.72 49.18, 50.26 0.023 E 50.07 49.16, 50.98 50.06 49.54, 50.58 0.986 O 55.25 54.54, 55.96 47.80 47.27, 48.33 <0.001 A 48.44 47.57, 49.31 50.68 50.16, 51.21 <0.001 C 47.54 46.71, 48.36 51.12 50.59, 51.65 <0.001

Methadone # 417 1468 N 53.41 52.47, 54.35 49.11 48.61, 49.62 <0.001 E 47.97 46.88, 49.05 50.66 50.17, 51.15 <0.001 O 53.77 52.86, 54.69 48.93 48.42, 49.44 <0.001 A 47.07 46.03, 48.10 50.86 50.37, 51.35 <0.001 C 46.21 45.22, 47.20 51.15 50.66, 51.64 <0.001

Magic Mushrooms # 694 1191 N 50.73 49.98,51.47 49.67 49.10, 50.24 0.027 E 50.15 49.35,50.96 50.01 49.47, 50.55 0.765 O 54.36 53.71,55.01 47.46 46.90, 48.02 <0.001 A 48.55 47.78,49.32 50.88 50.33, 51.43 <0.001 C 47.54 46.81,48.27 51.53 50.97, 52.08 <0.001

Nicotine # 1264 621 N 50.97 50.41, 51.52 48.22 47.45, 48.99 <0.001 E 49.98 49.41, 50.54 50.24 49.48, 50.99 0.599 O 51.47 50.92,52.01 47.01 46.26, 47.77 <0.001 A 49.20 48.65, 49.75 51.69 50.92, 52.46 <0.001 C 48.64 48.08, 49.20 52.95 52.24,53.67 <0.001

VSA # 230 1655 N 52.88 51.57, 54.20 49.67 49.19,50.15 <0.001 E 48.96 47.45, 50.47 50.22 49.74,50.69 0.075 O 54.20 53.00, 55.41 49.42 48.93,49.90 <0.001 A 47.30 45.92, 48.68 50.40 49.92,50.87 <0.001 C 45.22 4 3.88, 46.56 50.73 50.26,51.20 <0.001

Page 21: BFI & Addiction Risk

21

The introduction of moderate subcategories of T-scoresample enables the separation of drugs into five groups, as presented in Table 5. The description of each drug can be named by five moderate subcategories with the moderate profile (N, E, O, A, C). Firstly, the group with the profile (0, 0, 0, 0, 0) includes the users of the licit drugs alcohol, chocolate and caffeine. Thus, T-scoresample for all factors for licit drug consumers does not significantly differ from the sample mean. Secondly, the group of drugs with the profile (0, 0, +, −, −) includes the users of amyl nitrite, LSD, and magic mushrooms. Thirdly, nicotine users form their own group with the profile (0, 0, +, 0, −). Fourthly, the largest group of drugs users with the profile (+, 0, +, −, −) includes the users of amphetamines, benzodiazepines, cannabis, cocaine, ecstasy, ketamine and legal highs. Finally, the group with the profile (+, −, +, −, −) includes the users of crack, heroin, VSA and methadone.

Table 5. Moderate subcategories of T-scoresample with respect to the sample mean for groups of users. The white background corresponds to a neutral score (0), the green background corresponds to

moderately high score (+) and a pink background corresponds to moderately low score (−). Neuroticism Extraversion Openness Agreeableness Conscientiousness

Alcohol, Chocolate, and Caffeine 0 0 0 0 0

Amyl nitrite, LSD, and Magic Mushrooms 0 0 + − −

Nicotine 0 0 + 0 −

Amphetamines, Benzodiazepines, Cannabis, Cocaine, Ecstasy, Ketamine, and Legal highs + 0 + − −

Crack, Heroin, VSA, and Methadone + − + − −

Table 5 demonstrates that for all drugs the mean values of Neuroticism and Openness for groups of Users is neutral (0) or moderately high (+). Groups of User of illicit drugs can be seen as moderately low (−) on Agreeableness and Conscientiousness. Groups of licit drug users (alcohol, chocolate, caffeine, and nicotine) have neutral (0) score of Agreeableness and Conscientiousness, apart from nicotine users, who have moderately low (−) scores of Conscientiousness. For the groups of users of crack, heroin, VSA and methadone, the score of Extraversion is moderately low (−). However, for groups of Users of other drugs, the score of Extraversion is neutral (0).

Table 5 is based on the size of differences between the mean T-scoressample for users from the sample mean. Table 6 represents the groups of drugs with the same set of T-scoressample which significantly differ from the sample mean. The difference is considered as significant if the p-value is less than 0.01. Three out of six groups correspond to licit drugs. Chocolate has no significant differences for all factors. Alcohol has a significant difference only for Conscientiousness, and caffeine has a significant difference only for Openness. Amyl nitrite, LSD, and magic mushrooms form a group containing significant differences in relation to three factors: Openness, Agreeableness, and Conscientiousness. The next group contains amphetamines, cannabis, cocaine, crack, ecstasy, heroin, ketamine, legal highs, nicotine, and VSA, and have significant differences in relation to the following four factors: Neuroticism, Openness, Agreeableness, and Conscientiousness. The last groups are formed by two users of two drugs: benzodiazepines and methadone. The groups of benzodiazepines and methadone users form a significantly different sample mean in all factors.

Page 22: BFI & Addiction Risk

a

Fn

GreeNeur

Amph

Graphs ofFigure 5. all drugs group ‘Nimushroomusers withsample m

a)

Figure 4. Avenorm mean ((continued on

Table 6. Pen backgrounroticism

hetamines, Ca

X

X

f mean valuA single drin the sameicotine Userms. Figure 5h respect to

mean (i.e. the

erage personaleft column) a the next page

Profile of signnd and symbo

Extraversi

A

annabis, Coca

X

ues of factorug was choe group are r’ are simila5 represents o the popule right colum

ality profiles fand T-scoress

e)

nificantly diffol ‘X’ correspion

Amyl nitrite, L

aine, Crack, E

Benzodiaz

ors scores foosen to be pvery similaar to the seT-score gra

lation normmn) for alco

for groups ofsample with res

22

ferent of meaponds to signi

Openness Chocolate

Alcohol

Caffeine

X LSD and mag

X Ecstasy, Hero

X zepines and M

X

or groups olotted for ea

ar. Nicotine cond groupaphs of pers

m mean (theohol, LSD, c

b)

f users and nospect to the sa

an for groups ficant deviati

Agree

gic Mushroom

oin, Ketamine

Methadone

f drugs defiach group, dis not plotte

p consisting sonality facte left columcannabis, an

on-users in T-ample means

of User and Nions with p-vaableness

ms

X e, Legal highsX

X

fined in Tabdue to the fed, since thof amyl nit

tors for groumn) and thend heroin.

-scores with r(right column

Non-user. alue less than

Conscientio

X

X s, Nicotine an

X

X

ble 5 are prefact that the he scale scortrite, LSD aups of userse with resp

respect to then) for: a) & b

n 0.01 ousness

d VSA

esented in shapes of

res for the and magic s and non-ect to the

population b) Alcohol.

Page 23: BFI & Addiction Risk

c

e

g

c)

e)

g)Figure 5 (cto the pop

continuation)pulation norm

. Average perm mean (left c

for: c) &

rsonality profcolumn) and T& d) LSD, e) &

23

d)

f)

h)files for groupT-scoressample

& f) Cannabis

ps of users anwith respect

s, and g) & h)

nd non-users to the sample

) Heroin.

in T-scores we means (righ

with respect ht column)

Page 24: BFI & Addiction Risk

The usage

Table 13 correlationfrom a tochance themploy a significantechniqueused to ccorrelationBH step-ucoefficien

However,example, p-value isconsidere| | 0.4.can be covery stron

Figure 6 dhighs, LScorrelationketamine

Crack, bethree otheor weakly

Relative Iattributes RIG is zeri.e. the va(entropy) possible tRIGs is s0.15. FiguRIG Y|Xseen that LSD and Figure 6 from Figu

e (use or non

shows PCCns are signi

otal of 153 he same or g

multi-testinnce of the ce – Bonferrocontrol Falns. 115 corup procedu

nts.

, a significanthe correlat

s equal to 0d as an im. Figure 6 s

onsidered weng if| | 0

demonstrateSD, and magns betweenusage (

enzodiazepiner drugs usay correlated

Information(Mitchell, ro for indepalue of RIGin drug 1

to estimate significant, bure 7a dem|/min RIGin Figure 7amagic musexcept for

ure 6.

C3.3

n-use) of ea

Cs, which aificant, due pairs have pgreater corrng approach

correlation (oni correctise Discove

rrelation coeure with thr

nt correlatiotion coeffici.0005), but

mportant asssets out all seak if0.40.5.

es that the ugic mushroo

n cannabis 0.393).

nes, heroin,ages. Alcohowith all oth

n Gain (RIG1997). The

pendent attriG for drug 1

usage, whicthe significbut contain

monstrates ‘G X|Y ,RIGa a group coshrooms usa

ketamine u

Correlation

ach drug is c

are computeto the fact p-values le

relation coefh when test(Benjamini ion – and Bery Rate (Fefficients arreshold of

on does not ient for alcothe value oocitaion. Wsignificant i| |; mediu

usage of ampoms correlaand ketami

, methadoneol, amyl nitr

her drug use

G) is widelygreater the

ibutes. RIG 1 usage fromch can be rance of RIG

ns small valusymmetric’

G Y|X 0onsisting ofages are corusage. Asym

24

n between

considered a

ed between that the sam

ess than 0.0fficient for ting 153 pa& Hochber

BH step-up FDR), to esre significanFDR equal

necessarilyohol usage aof this coeff

We consideridentified coum if0.45

phetaminesates with aline usage (

e, and nicotrite, chocola.

y used in davalue of RIis not symm

m usage of removed if G, but it isues. Figure RIGs (we

0.2.) Figure f amphetamirrelated eachmmetric RI

n different

as a single B

the drug umple size is1 (p-valueuncorrelate

airs of drug rg, 1995). Wprocedure (stimate thent with Bonls 0.01 def

y imply a strand amyl nificient is eqr correlationorrelations g| | 0.4;

, cannabis, l other drug( 0.302)

tine usagesate, caffeine

ata mining tIG, the strometric. It is drug 2 is ethe value olikely that (7 presents call RIG(X7b demonstines, cannabh with eachIGs illustrat

drug usag

Boolean feat

usage pairs.s 1885. 121is the prob

ed variablesusages in

We apply th(Benjamini

e genuine snferroni corrfines 127 si

rong associatrate usage

qual to 0.074ns with absgreater than; strong if 0

cocaine, ecsgs in the sa) and betw

are correlate and VSA

to measure nger is the a measure o

equal to a frof drug 2 u(as for the Pall pairs w

X|Y) symmtrates asymmbis, cocaineh. This groute pattern s

ge

ture.

The major pairs of drability to o). In realityorder to esthe most con& Hochbe

significancerected p-vaignificant c

ation or cauis significan4, and thus olute value

n 0.4. The c0.5 | |

stasy, ketamame group,

ween legal h

ted with onusage is un

dependenceindicated co

of mutual inraction of uusage is knPCC) the m

with RIG grmetric if |RI

metric RIG, ecstasy, le

up is the samsignificantly

rity of the rug usages observe by y, it has to timate the nservative rg, 1995),

e of these lue 0.001.

correlation

sality. For nt (i.e. the cannot be

es of PCC correlation 0.45; and

mine, legal excluding highs and

ne, two, or ncorrelated

e between orrelation.

nformation uncertainty own. It is

majority of reater than IG X|Y. It can be

egal highs, me that in y different

Page 25: BFI & Addiction Risk

Figure 7

significan

The resuquantificacontain thby the prin

Results ofeature seeffect regUSA. Furuse. To consumptWe calcusubsample

7. Pairs of druntly asymmet

k

lts of the ation, and inhe list of attrncipal varia

f implemenelection we garding counrthermore, inunderstand

tion we comulated the e. For both

F

ug usages wittric RIG. In f

knowledge of

principal vn Table 8 foributes in orables approa

ntation sparcan exclud

ntry of locatncluding co

the reasonmpare the st

p-value ofh divisions i

Figure 6. Stro

th high relativfigure B) arroLSD usage ca

I3.4

variables cor dummy qrder from beach are show

se PCA arede ethnicitytion. Only t

ountry in perns of thesetatistics for f coincidinginto subsam

25

ng drug usag

ve informatioow from LSDan decrease u

Input featu

calculation quantificatioest to worst.wn in the sam

e representey from furthtwo countrirsonality mee two counthe subsam

g distributimples we ob

ge correlation

on gain: A) mD usage to heruncertainty in

ure rankin

are represeon of nomin. For compame table.

ed in Table her consideres are inforeasures addntries’ impo

mples: UK –on of persbtained the

ns.

more or less syroin usage, forn heroin usage

ng

ented in Tnal features.arison the lis

9 and Tabration. Therrmative (in ds not much ortance for

– non-UK ansonality me

same resul

ymmetric RIGr example, me.

Table 7 for Table 7 anst of attribut

ble 10. As are is more our sample)to predictio

r predictionnd USA – neasurementslts: all inpu

G and B) eans that

r CatPCA nd Table 8 tes ranked

a result of intriguing ): UK and on of drug n of drug non-USA. s in each ut features

Page 26: BFI & Addiction Risk

26

have the significantly different distributions with a 99.9% confidence level for UK and non-UK subsamples and likewise for USA – non-USA subsamples. It means that sample UK and non-UK are biased. The same situation is with the USA and non-USA samples. We excluded the country variable from further study.

Table 7. Results of feature ranking. Data table includes Country of location and Ethnicity quantified by CatPCA. FVE is the fraction of explained variance. CFVE is the cumulative FVE. The least informative

features are lower located. Principal variable ranking

Double Kaiser’s ranking Attribute FVE CFVE

Sensation-seeking 0.192 0.192 Extraversion Neuroticism 0.153 0.345 Conscientiousness Agreeableness 0.106 0.451 Sensation-seeking Education 0.104 0.555 Neuroticism Openness 0.092 0.647 Impulsivity Conscientiousness 0.088 0.735 Openness Extraversion 0.076 0.811 Agreeableness Age 0.073 0.884 Age Impulsivity 0.055 0.939 Education Country 0.037 0.976 Country Gender 0.021 0.997 Gender Ethnicity 0.003 1.000 Ethnicity

Table 8. Results of feature ranking. Data table includes dummy coded Country of location and Ethnicity. FVE is the fraction of explained variance. CFVE is the cumulative FVE. The least informative features are lower

located. Principal variable ranking

Double Kaiser’s ranking Attribute FVE CFVE

Sensation-seeking 0.186 0.186 Extraversion Neuroticism 0.149 0.335 Conscientiousness Agreeableness 0.103 0.438 Sensation-seeking Education 0.101 0.539 NeuroticismOpenness 0.089 0.627 Impulsivity Conscientiousness 0.086 0.714 Openness Extraversion 0.074 0.787 Agreeableness Age 0.071 0.858 Age Impulsivity 0.053 0.911 Education UK 0.027 0.938 UK Gender 0.020 0.959 USA USA 0.013 0.972 Gender White 0.010 0.982 Other (country) Other (country) 0.005 0.988 White Canada 0.004 0.991 Other (ethnicity) Other (ethnicity) 0.003 0.994 Canada Black 0.002 0.995 AsianAustralia 0.002 0.997 Mixed-White/Black Asian 0.001 0.998 Australia Mixed-White/Black 0.001 0.999 Black Republic of Ireland 0.000 1.000 Mixed-White/Asian Mixed-White/Asian 0.000 1.000 Republic of Ireland New Zealand 0.000 1.000 New Zealand Mixed-Black/Asian 0.000 1.000 Mixed-Black/Asian

Page 27: BFI & Addiction Risk

Table 9

Step

1

2

Table 10. T

Step

1

2

3

The first described selection first four p

Table 11 and specifour princbenzodiaz

There is nused featuand Consc

If the featfeatures ‘results pre

Table 9, bgender, wby other m

9. The result o

p # ocompo

5

4

The result of s

p # ocompo

8

5

4

step for thein section

are presenteprincipal co

shows that ficity are grcipal compzepines, coc

no single moures is 6 outcientiousnes

ture is consiby fact’. Aesented in T

but is used which is conmethods (se

of sparse PCA

of onents 5 Gen

4

No AgeAgrsee

sparse PCA f

of onents

8 CanZeaWh

5 Gen

4

No AgeAgrsee

Th3.5

e risk evalu‘Risk evalu

ed in Table omponents.

for all drureater than 7onents accu

caine, ketam

ost effectivet of 10 and tss only and

idered to be Age proves Table 7 and

in the bestsidered as ne Table 7).

A feature ranquan

nder and Ethremoved at

e, Educareeablenessking, Count

feature rankin

nada, Otheraland, Mixehite/Black, Ander, UK anremoved at

e, Educareeablenessking

he solution

uation is couation meth11 for the o

ugs except a70%. It is auracy is gr

mine, methad

e classifier the least numprovides se

informativenot to be t

t classifiersnon-informa

27

king. Data tantified by Cat

Re

hnicity ttributes. Thation, Ne, Conscientry

ng. Data table

Re

r (country)ed-White/AAsian, Blacknd USA ttributes. Thation, Ne, Conscien

n to the pr

onstruction oods’. and se

original inpu

alcohol, cocan unexpectreater than done, nicotin

employing mber is 2. Tensitivity of

e when usedthe most in

for 14 druative by Spa

able includes CPCA. emoved attr

he retained seuroticism, ntiousness,

e includes du

emoved attr

), AustraliaAsian, Whitk and Mixed

he retained seuroticism, ntiousness,

roblem of c

of classifierelected the ut space and

caine and medly high ain the origne, and VSA

all input feaThe decision

80.63% and

d for classifnformative m

ugs. The secarse PCA an

Country of lo

ributes

set of attribuExtravers

Impulsivity

mmy coded C

ributes

a, Republic e, Other (ed-Black/Asi

set of attribuExtravers

Impulsivity

classificati

rs. We testebest one. R

d in Table 1

magic mushrccuracy. In

ginal input A.

atures. The n tree for crad specificity

fier formingmeasure in

cond most nd as one of

ocation and E

utes: sion, Opy and Sen

Country and E

of Irelandethnicity), ian

utes: sion, Opy and Sen

ion

ed the eightResults of th

2 for the sp

rooms, the sthe space ospace for s

maximum nack uses Exy of 78.57%

, it is possibaccordance

used input f the least in

Ethnicity

penness, nsation-

Ethnicity.

d, New Mixed-

penness, nsation-

t methods he method pace of the

sensitivity of the first six drugs:

number of xtraversion

%.

ble to rank e with the

feature is nformative

Page 28: BFI & Addiction Risk

28

Table 11. The best results of the drug users classifiers in the original input space. Symbol ‘X’ means used input feature. Results are calculated by LOOCV.

Target feature Method Age

Edu

catio

n

Neu

rotic

ism

Ext

rave

rsio

n

Ope

nnes

s

Agr

eeab

lene

ss

Con

scie

ntio

usne

ss

Impu

lsiv

ity

Sen

satio

n-se

ekin

g

Gen

der

Sen

sitiv

ity

Spe

cifi

city

Sum

Alcohol LDA X X X X X 75.34% 63.24% 138.58% Amphetamines DT X X X X X X 81.30% 71.48% 152.77% Amyl nitrite DT X X X X 73.51% 87.86% 161.37% Benzodiazepines DT X X X X X X 70.87% 71.51% 142.38% Cannabis DT X X X X X X 79.29% 80.00% 159.29% Chocolate KNN X X X X 72.43% 71.43% 143.86% Cocaine DT X X X X X 68.27% 83.06% 151.32% Caffeine KNN X X X X X 70.51% 72.97% 143.48% Crack DT X X 80.63% 78.57% 159.20% Ecstasy DT X X X 76.17% 77.16% 153.33% Heroin DT X X X 82.55% 72.98% 155.53% Ketamine DT X X X X X 72.29% 80.98% 153.26% Legal highs DT X X X X X X 79.53% 82.37% 161.90% LSD DT X X X X X X 85.46% 77.56% 163.02% Methadone DT X X X X X 79.14% 72.48% 151.62% Magic Mushrooms DT X X 65.56% 94.79% 160.36% Nicotine DT X X X X 71.28% 79.07% 150.35% VSA DT X X X X X X 83.48% 77.64% 161.12%

Table 12. The best results of the drug users classifiers in the space of the first four principal components. Symbol ‘X’ means used input feature. Results are calculated by LOOCV.

Target feature Method PC 1 PC 2 PC 3 PC 4 Sensitivity Specificity Sum Alcohol GM X X 54.71% 70.59% 125.29% Amphetamines DT X 74.37% 77.78% 152.15% Amyl nitrite DT X X 61.35% 79.47% 140.82% Benzodiazepines DT X 64.63% 92.20% 156.83% Cannabis DT X X 78.02% 77.26% 155.28% Chocolate LDA X X X 57.35% 62.86% 120.21% Cocaine DT X 72.20% 85.23% 157.42% Caffeine LDA X X X 62.55% 78.38% 140.93% Crack DT X X 78.01% 77.10% 155.11% Ecstasy DT X X X 73.10% 73.46% 146.56% Heroin DT X 76.89% 74.72% 151.60% Ketamine DT X 73.43% 93.55% 166.98% Legal highs PDFE X X X X 75.98% 76.14% 152.12% LSD DT X 99.46% 61.30% 160.76% Methadone DT X X 79.38% 84.47% 163.85% Magic Mushrooms LDA X X X X 76.37% 69.10% 145.47% Nicotine DT X X X 79.67% 74.56% 154.23% VSA DT X X 83.48% 84.23% 167.71%

Results presented in Table 11 and Table 12, were calculated by LOOCV. It should be stressed that different methods of testing give different sensitivity and specificity. The widely used methods

Page 29: BFI & Addiction Risk

include caentire samother.

For examspecificityuse the de71.16%, cpresented

Figure 8.nodes

procedurenon-user c

alculation omple (if it is

mple, a decy different fecision tree calculated uin the Tabl

Decision trees are depicteds described in

class it is 1.66.

f a test set e sufficiently

cision tree from LOOCfor ecstasy,

using the whle 11, show

e for ecstasy. d with dashedn section “Inp. Columns ‘W

errors (the hy large, so-c

formed foCV (Hastie, , depicted inhole samplesensitivity 7

Input featured border. Valuput feature tr

Weighted’ presu

29

holdout metcalled ‘naïve

or entire saTibshirani,

n the Figuree. Results o76.17% and

es are: age, Sues of age, SSransformationesent normalizum of weight

thod), k-folde’ method),

ample can & Friedma

e 8. It has sef LOOCV f

d specificity

ensation-seekS and gender n”. Weight ofzed weights: ts.

d cross-vali random sub

have accuran, 2009). Fensitivity 78for a tree w77.16%.

king (SS) andare calculatedf each case of weight of eac

idation, testibsampling,

racy, sensitor illustratio8.56% and s

with the sam

d gender. Nond by quantific

f user class is ch class is divi

ing on the and many

tivity and on we can specificity

me options,

n-terminal cation 1.91 and of ided by the

Page 30: BFI & Addiction Risk

It is interbasis of aapparent p

Successfudrug consAlexandra2014b). TSensationand Figuraged betwsensation-9D) illusthypothese

Figure 9.

esting that age, genderpsychopatho

ul constructisumption foakis, Slater,The risk mn-seeking anre 9B) it canween 25-34-seeking havtrates qualites for furthe

Risk map of

the risk of r and Sensaology assoc

ion of a clasor each indi, Tuli, & G

map of ecstand gender) in be observe4 years, is sve significatatively the er study.

ecstasy consuand C) and

ecstasy conation-seekingiated with it

3.6

ssifier proviividual, alon

Gorban, 2014asy consumis depicted ed that a cosignificantly

antly less rissame shap

umption for: D) decision t

30

nsumption cg (see Tablts use..

Risk eva6

des us by anng with the4a; Mirkes

mption on thin Figure 9

onsiderable ay less for fsk. Decisione. The risk

A) & C) fematree based ma

can be evalule 11, Figur

aluation

n instrumene creation oE. , Alexanhe basis of. At the PDarea of highfemales, butn tree based

maps prov

ale and B) & ap; E) Legend

uated with hre 8, and F

nt for the evaof a map ofndrakis, Slaf three inpu

DFE based rh risk (indict young marisk maps (

vide a tool

D) male; A) &d of colours.

high accuraFigure 9), an

aluation of tf risk (Mirkater, Tuli, &ut features risk maps (Fcated in blueales with thFigure 9C afor the gen

& B) PDFE b

acy on the nd has no

the risk of kes E. M., & Gorban,

(i.e. age, Figure 9A e) for men he highest and Figure neration of

based map

Page 31: BFI & Addiction Risk

31

4 Discussion

This study, along with other studies, demonstrates a strong correlation between personality profiles and drug use (Sutina, Evans, & Zonderman, 2013; Haider et al., 2002; Vollrath & Torgersen, 2002; Terracciano, Löckenhoff, Crum, Bienvenu, & Costa, 2008; Roncero et al., 2014). The studies indicate that individuals involved in drug usage are more likely to have higher scores for Neuroticism, and low scores for Agreeableness and Conscientiousness.

Sutina et al. (2013) demonstrated that the relationship between low Conscientiousness and drug consumption is moderated by poverty; low Conscientiousness is a stronger risk factor for illicit drug usage among those with relatively higher socio-economic status. Roncero et al. (2014) showed that personality factors (Neuroticism-Anxiety and Aggression-Hostility) have an important impact on the risk of experiencing psychotic symptoms. Vollrath & Torgersen (2002) observe that low Conscientiousness and high Neuroticism (or high Extraversion) are particularly associated with engagement in multiple risky health behaviours.

An individual’s personality profile plays a role in becoming a drug user. Terracciano et al. (2008) demonstrate that the personality profile for the users and non-users of nicotine, cannabis, cocaine, and heroin are associated with a Five-Factor Model of personality samples from different communities. They also highlight the links between the consumption of these drugs and low Conscientiousness. Turiano et al. (2012) found a positive correlation between Neuroticism and Openness, and drug use. On the other hand, increasing scores for Conscientiousness and Agreeableness decreases risk of drug use. Previous studies demonstrate that participants who use drugs including alcohol and nicotine have a strong positive correlation between Agreeableness and Conscientiousness and a strong negative correlation for each of these factors with Neuroticism (Haider, et al., 2002; Stewart & Devine, 2000) (This is common, as is the N/A/C complex for externalising.)

Our study reveals that all five personality factors of Neuroticism, Openness, Agreeableness, Extraversion, and Conscientiousness are relevant traits to be taken into account when assessing risk of drug consumption. This study has established that drug users for all 18 drugs score moderately high (+) or neutral (0) on the personality profile concerning Neuroticism and Openness, and moderately low (−) on both Agreeableness and Conscientiousness. However, when it comes to groups of legal drugs (i.e. alcohol, chocolate, caffeine, and nicotine) users score (0) on Agreeableness and Conscientiousness, apart from nicotine users, whose score of Conscien-tiousness is moderately low (−).

The impact of the Extraversion score is drug specific for two groups: (1) groups of problematic drug users of consuming crack, heroin, VSA, and methadone, who score (−) on Extraversion to (0) their Introversion; (2) for groups of other drugs (alcohol, amphetamines, amyl nitrite, benzodiazepines, cannabis, chocolate, cocaine, caffeine, ecstasy, ketamine, legal highs, LSD, magic mushrooms, nicotine, and VSA), users score (0) on Extraversion (see Table 5).

In addition, higher scores for Neuroticism and Openness lead to increased drug use, whereas lower scores for Conscientiousness and Agreeableness cause an increase of drug consumption. O is marked by curiosity and open-mindedness (and correlated with intelligence), and it is therefore understandable why higher O may be sometimes associated with drug use (Wilmoth, 2012). As a result, it can be seen how personality factors affect drug use. These findings have been confirmed

Page 32: BFI & Addiction Risk

32

by our study. Our results improve the knowledge concerning the pathways leading to drug consumption.

There are, however, limitations to this study. This sample is biased with respect to the general population, but it can still be used for risk evaluation. A further limitation concerns the fact that a number of the findings may be culturally specific.

Unexpectedly successful classifiers have been created for all drugs, thus providing the possibility of evaluating individuals for the risk of drug consumption. The creation of risk maps forms a tool for the generation of hypotheses for further study.

Page 33: BFI & Addiction Risk

33

Table 13. PCCs between drug consumptions

Drug use

Alc

ohol

Am

phet

amin

es

Am

yl n

itrit

e

Ben

zodi

azep

ine

Can

nabi

s

Cho

cola

te

Coc

aine

Caf

fein

e

Cra

ck

Ecs

tasy

Her

oin

Ket

amin

e

Leg

al h

ighs

LS

D

Met

hado

ne

Mus

hroo

ms

Nic

otin

e

VS

A

Alco. 1.000 0.0741 0.0741 0.0512 0.1191 0.0991 0.1111 0.1571 0.027 0.1051 0.033 0.0782 0.0612 0.0692 -0.007 0.0711 0.1131 0.0463 Amp. 0.0741 1.000 0.3721 0.4631 0.4691 0.013 0.5801 0.1061 0.3231 0.5971 0.3591 0.4121 0.4811 0.4901 0.4151 0.4811 0.3431 0.3041 Amy. 0.0741 0.3721 1.000 0.2261 0.2921 0.028 0.3811 0.0602 0.1441 0.3921 0.1371 0.3451 0.2681 0.2131 0.0841 0.2711 0.1961 0.1301 Ben. 0.0512 0.4631 0.2261 1.000 0.3541 0.006 0.4281 0.0552 0.3261 0.3831 0.3951 0.3031 0.3481 0.3521 0.4681 0.3661 0.2601 0.2941 Can. 0.1191 0.4691 0.2921 0.3541 1.000 0.0463 0.4531 0.1131 0.2161 0.5211 0.2171 0.3021 0.5261 0.4211 0.2991 0.4971 0.5331 0.2371 Cho. 0.0991 0.013 0.028 0.006 0.0463 1.000 0.006 0.1221 0.032 0.040 -0.026 0.035 0.017 0.029 0.007 0.024 0.037 -0.021 Coc. 0.1111 0.5801 0.3811 0.4281 0.4531 0.006 1.000 0.0991 0.3961 0.6331 0.4141 0.4541 0.4451 0.4421 0.3541 0.4801 0.3621 0.2771 Cof. 0.1571 0.1061 0.0602 0.0552 0.1131 0.1221 0.0991 1.000 0.035 0.1071 0.026 0.0583 0.0851 0.0751 0.039 0.1001 0.1453 0.0533 Cra. 0.027 0.3231 0.1441 0.3261 0.2161 0.032 0.3961 0.035 1.000 0.2801 0.5091 0.2551 0.2031 0.2681 0.3671 0.2761 0.1911 0.2781 Ecs. 0.1051 0.5971 0.3921 0.3831 0.5211 0.040 0.6331 0.1071 0.2801 1.000 0.3011 0.5111 0.5861 0.5991 0.3151 0.5991 0.3701 0.2891 Her. 0.033 0.3591 0.1371 0.3951 0.2171 -0.026 0.4141 0.026 0.5091 0.3011 1.000 0.2741 0.2371 0.3471 0.4941 0.3061 0.1851 0.2931 Ket. 0.0782 0.4121 0.3451 0.3031 0.3021 0.035 0.4541 0.0583 0.2551 0.5111 0.2741 1.000 0.3931 0.4621 0.2461 0.4361 0.2431 0.1921 Leg. 0.0612 0.4811 0.2681 0.3481 0.5261 0.017 0.4451 0.0851 0.2031 0.5861 0.2371 0.3931 1.000 0.5191 0.3341 0.5751 0.3641 0.3141 LSD 0.0692 0.4901 0.2131 0.3521 0.4211 0.029 0.4421 0.0751 0.2681 0.5991 0.3471 0.4621 0.5191 1.000 0.3431 0.6801 0.2891 0.2991 Met. -0.007 0.4151 0.0841 0.4681 0.2991 0.007 0.3541 0.039 0.3671 0.3151 0.4941 0.2461 0.3341 0.3431 1.000 0.3431 0.2341 0.2771 Mus. 0.0711 0.4811 0.2711 0.3661 0.4971 0.024 0.4801 0.1001 0.2761 0.5991 0.3061 0.4361 0.5751 0.6801 0.3431 1.000 0.3241 0.2531 Nic. 0.1131 0.3431 0.1961 0.2601 0.5331 0.037 0.3621 0.1453 0.1911 0.3701 0.1851 0.2431 0.3641 0.2891 0.2341 0.3241 1.000 0.2211 VSA 0.0463 0.3041 0.1301 0.2941 0.2371 -0.021 0.2771 0.0533 0.2781 0.2891 0.2931 0.1921 0.3141 0.2991 0.2771 0.2531 0.2211 1.000

1 p-value <0.001, 2 p-value <0.01, 3 p-value <0.05

Page 34: BFI & Addiction Risk

34

5 Bibliography

Arlot, S., & Celisse, A. (2010). A survey of cross‐validation procedures for model selection. Statistics 

surveys, 4, 40‐79. 

Beaglehole, R., Bonita, R., Horton, R., Adams, C., Alleyne, G., Asaria, P., et al. (2011). Priority actions for 

the non‐communicable disease crisis. The Lancet, 377 (9775), 1438‐1447. 

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful 

approach to multiple testing. Journal of the Royal Statistical Society, 57(1), 289‐300. 

Biau, G. (2012). Analysis of a random forests model. The Journal of Machine Learning Research, 13(1), 

1063‐1095. 

Bickel, W. K., Johnson, M. W., Koffarnus, M. N., MacKillop, J., & Murphy, J. G.‐6. (2014). The behavioral 

economics of substance use disorders: reinforcement pathologies and their repair. Annual 

review of clinical psychology, 10, 641‐677. 

Bogg, T., & Roberts, B. (2004). Conscientiousness and health‐ related behaviors: A meta‐analysis of the 

leading behavioral contributores to mortality. Psychological Bulletin, 130(6), 887‐919. 

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5‐32. 

Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and Regression Trees. 

Belmont, Calif: Wadsworth International Group. 

Buhmann, M. D. (2003). Radial Basis Functions: Theory and Implementations. Cambridge: University 

Press, Vol. 12. 

Bulut, F., & Bukak, I. O. (2014). An urgent precaution system to detect students at risk of substance 

abuse through classification algorithms. Turkish Journal of Electrical Engineering & Computer 

Sciences, 22(3), 690‐ 707. 

Clarkson, K. L. (2005). Nearest‐neighbor searching and metric space dimensions. Nearest‐neighbor 

methods for learning and vision: theory and practice, 15–59. 

Cleveland, M. J., Feinberg, M. E., Bontempo, D. E., & Greenberg, M. T. (2008). The role of risk and 

protective factors in substance use across adolescence. Journal of Adolescent Health, 43(2), 

157–164. 

Costa, P. T., & MacCrae, R. R. (1992). Revised NEO‐Personality Inventory (NEO‐PI‐R) and the NEO‐Five 

Factor Inventory (NEO‐FFI): Personality manual. Odessa,FL: Psychological assessment Resources. 

Dietterich, T., Kearns, M., & Mansour, Y. (1996). Applying the weak learning framework to understand 

and improve C4.5 In ICML. (pp. 96–104). Proc. of the 13th Int. Conf. on Machine Learning, San 

Francisco: Morgan Kaufmann. 

Dinov, I. D. (2008). Expectation maximization and mixture modeling tutorial. ExStatistics Online 

Computational Resource. 

Page 35: BFI & Addiction Risk

35

DordiNejad, F. G., & Shiran, M. A. (2011). Personality Traits and Drug Usage among Addicts. Literacy 

Information and Computer Education Journal (LICEJ), 2(2), 402‐405. 

Dubey, C., Arora, M., Gupta, S., & Kumar, B. (2010). Five Factor Correlates: A Comparison of Substance 

Abusers and Non‐Substance Abusers. Journal of the Indian Academy of Applied Psychology, 

36(1), 107‐114. 

Egan, V. (2011). Individual differences and antisocial behaviour. In Handbook of Individual Differences 

(ed.) A. Furnham, S. Stumm, and K. Petredies(Oxford: Blackwell‐Wiley), 512‐537. 

Egan, V., Deary, I., & Austin, E. (2000). The NEO‐FFI: Emerging British norms and an item‐level analysis 

suggest N, A and C are more reliable than O and E. Personality and Individual Differences, 29(5), 

907 ‐ 920. 

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of eugenics, 7(2), 

179–188. 

Flory, K., Lynam, D., Milich, R., Leukefeld, C., & Clayton, R. (2002). The relations among personality, 

symptoms of alcohol and marijuana abuse, and symptoms of comorbid psychopathology: 

Results from a community sample. Experimental and Clinical Psychopharmacology, 10(4), 425–

434. 

Fossati, P., Ergis, A. M., & Allilaire, J. F. (2001). Problem‐solving abilities in unipolar depressed patients: 

comparison of performance on the modified version of the Wisconsin and the California sorting 

tests. Psychiatry Research, 104(2), 145–156. 

Fridberg, D. J., Vollmer, J. M., O'Donnell, B. F., & Skosnik, P. D. (2011). Cannabis users differ from non‐

users on measures of personality and schizotypy. Psychiatry research, 186(1), 46–52. 

García‐Montes, J. M., Zaldívar‐Basurto, F., López‐Ríos, F., & Molina‐Moreno, A. (2009). The role of 

personality variables in drug abuse in a Spanish university population. International journal of 

mental health and addiction, 7(3), 475–487. 

Gelfand, S. B., Ravishankar, C. S., & Delp, E. J. (1991). An iterative growing and pruning algorithm for 

classification tree design. IEEE Transaction on Pattern Analysis and Machine Intelligence , 13(2), 

163–174. 

Gorban, A. N., & Zinovyev, A. (2010). Principal manifolds and graphs in practice: from molecular biology 

to dynamical systems. International journal of neural systems, 20(3), 219–232. 

Gorban, A. N., & Zinovyev, A. Y. (2008). Principal graphs and manifolds. In A. Gorban, B. .Kégl, D. 

Wunsch, & A. Zinovyev (Eds.), Principal Manifolds for Data Visualisation and Dimension 

Reduction (pp. 28‐59). Berlin‐Heidelberg‐NewYork: Springer. 

Gujarati, D. N. (2003). Basic econometrics (4 ed.). McGraw‐Hill: Inc.,US. 

Gurrera, R. J., Nestor, P. G., & O'Donnel, B. F. (2000). Personality Traits in Schizophrenia: Comparison 

with a Community Sample. The Journal of nervous and mental disease, 188(1), 31‐35. 

Guttman, L. (1954). Some necessary conditions for common‐factor analysis. Psychometrika, 19(2), 149‐

161. 

Page 36: BFI & Addiction Risk

36

Haider, A. H., Edwin, D. H., MacKenzie, E. J., Bosse, M. J., Castillo, R. C., Travison, T. G., et al. (2002). The 

use of the NEO‐five factor inventory to assess personality in trauma patients: a two‐year 

prospective study. Journal of Orthopaedic trauma, 16(9), 660‐667. 

Hastie, T., & Tibshirani, R. (1996). Discriminant adaptive nearest neighbor classification. Pattern Analysis 

and Machine Intelligence, IEEE Transactions on, 18(6), 607–616. 

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2 ed.). New York: 

Springer. 

Hoare, J., & Moon, D. (2010). Drug misuse declared: Findings from the 2009/10 British Crime Survey. 

London: Home Office. Home Office Statistical Bulletin 13/10. 

Home Office, UK. (2014). Drug misuse: findings from the 2013 to 2014 CSEW second edition. 

https://www.gov.uk/government/statistics/drug‐misuse‐findings‐from‐the‐2013‐to‐2014‐csew. 

Hosmer, D. W., & Lemeshow, S. (2004). Applied Logistic Regression (2 ed.). John: Wiley & Sons. 

Jakobwitz, S., & Egan, V. (2006). The dark triad and normal personality traits. Personality and Individual 

Differences, 40 (2), 331‐339. 

Kaiser, H. (1960). The application of electronic computers to factor analysis. Educational and 

Psychological Measurement, 20, 141‐151. 

Kearns, M., & Mansour, Y. (1999). On the boosting ability of top‐down decision tree learning algorithms. 

Journal of Computer and System Sciences, 58(1), 109–128. 

Lee, S. Y., Poon, W. Y., & Bentler, P. M. (1995). A two‐stage estimation of structural equation models 

with continuous and polytomous variables. British Journal of Mathematical and Statistical 

Psychology, 48(2), 339–358. 

Li, Q., & Racine, J. S. (2007). Nonparametric Econometrics: theory and practice. Princeton : University 

Press. 

Liaw, A., & Wiener, M. R. (2002). Classification and Regression by Random Forest. R news, 2(3), 18‐22. 

Linting, M., & van der Kooij, A. (2012). Nonlinear Principal Components Analysis With CATPCA: A 

Tutorial. Journal of Personality Assessment, 94(1), 12‐25. 

Martinson, E. O., & Hamdan, M. A. (1971). Maximum likelihood and some other asymptotically efficient 

estimators of correlation in two way contingency tables. Journal of Statistical Computation and 

Simulation, 1(1), 45‐54. 

McCabe, G. p. (1984). Principal Variables. Technometrics, 26(2), 137–144. 

McCrae, R. R., & Costa, P. T. (1991). The NEO Personality Inventory: Using the Five‐Factor ModeI in 

Counseling. Journal of Counseling & Development, 69(4), 367‐372. 

McCrae, R. R., & Costa, P. T. (2004). A contemplated revision of the NEO Five‐Factor Inventory. 

Personality and Individual Differences, 36(3), 587‐596. 

Page 37: BFI & Addiction Risk

37

McDaniel, S., & Mahan, J. (2008). An examination of the ImpSS scale as a valid and reliable alternative to 

the SSS‐V in optimum stimulation level research. Personality and Individual Differences, 44(7), 

1528–1538. 

McGinnis, J. M., & Foege, W. H. (1993). Actual causes of death in the United States. Journal of the 

American Medical Association, 270(18), 2207‐2212. 

Mirkes, E. M., Alexandrakis, I., Slater, K., Tuli, R., & Gorban, A. N. (2014a). Computational diagnosis and 

risk evaluation for canine lymphoma. Computers in biology and medicine, 53, 279‐290. 

Mirkes, E., Alexandrakis, I., Slater, K., Tuli, R., & Gorban, A. (2014b). Computational diagnosis of canine 

lymphoma. J.Phys.:Conf.Ser., 490, 012135. 

Mitchell, T. M. (1997). Machine learning.1997. New York: Burr Ridge, IL: McGraw Hill,45. 

Naikal, N., Yang, A., & Sastry, S. (2011). Informative feature selection for object recognition via sparse 

PCA. In Proceedings of the 13th International Conference on Computer Vision (ICCV), 818‐825. 

Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. philosophical 

magazine, 2(6), 559‐572. 

Quinlan, J. R. (1987). Simplifying decision trees. International journal of man‐machine studies, 27(3), 

221–234. 

Ragan, D. T., & Beaver, K. M. (2010). Chronic offenders:A life‐course analysis of marijuana users. Youth 

and Society, 42, 174‐198. 

Rokach, L., & Maimon, O. (2010). Decision trees. In O. Maimon, & L. Rokach (Eds.), Data Mining and 

Knowledge Discovery Handbook (pp. 165–192). Berlin: Springer. 

Roncero, C., Daigre, C., Barral, C., Ros‐Cucurull, E., Grau‐López, L., Rodríguez‐Cintas, L., et al. (2014). 

Neuroticism Associated with Cocaine‐Induced Psychosis in Cocaine‐Dependent Patients: A 

Cross‐Sectional Observational Study. PloS one, 9(9), e106111. 

Russell, S., & Norvig, P. (1995). Artificial Intelligence A Modern Approach. Egnlewood Cliffs: Prentice‐Hall, 

25. 

Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization (1 ed.). New 

York: Wiley. 

Settles, R. E., Fischer, S., Cyders, M. A., Combs, J. L., Gunn, R. L., & Smith, G. T. (2012). Negative urgency: 

A personality predictor of externalizing behavior characterized by neuroticism, low 

conscientiousness, and disagreeableness. Journal of Abnormal Psychology 121 (1), 160‐172. 

Snowden, R., & Gray, N. S. (2011). Impulsivity and psychopathy: Associations between the Barrett 

Impulsivity Scale and the Psychopathy Checklist revised. Psychiatry Research, 187(3), 414‐417. 

Sofeikov, K. I., Tyukin, I. Y., Gorban, A. N., Mirkes, E. M., Prokhorov, D. V., & Romanenko, I. V. (2014). 

Learning optimization for decision tree classification of non‐categorical data with information 

gain impurity criterion. In Neural Networks (IJCNN), 2014 International Joint Conference on, 

3548‐3555. 

Page 38: BFI & Addiction Risk

38

Stanford, M. S., Mathias, C. W., Dougherty, D. M., Lake, S. L., Anderson, N. E., & Patton, J. H. (2009). Fifty 

years of the Barratt Impulsiveness Scale: An update and review. Personality and Individual 

Differences, 47(5), 385‐395. 

Stewart, S. H., & Devine, H. (2000). Relations between personality and drinking motives in young adults. 

Personality and Individual Differences, 29(3), 495‐511. 

Sutina, A. R., Evans, M. K., & Zonderman, A. B. (2013). Personality traits and illicint substances:the 

moderation role of poverty. Drug and Alcohol Dependence, 131, 247‐251. 

Terracciano, A., Löckenhoff, C. E., Crum, R. M., Bienvenu, O. J., & Costa, P. T. (2008). Five Factor Model 

personality profiles of druge users. Bmc Psychiatry, 8(1), 22. 

Turiano, N. A., Whiteman, S. D., Hampson, S. E., Roberts, B. W., & Mroczek, D. K. (2012). Personality and 

substance use in midlife:Conscientiousness as a moderator and the effect of trait change. 

Journal of Research in Personality, 46(3), 295‐305. 

Valeroa, S., Daigre, C., Rodrígu, L., Gomà‐i‐Freixanet, M., Ferrer, M., Casasa, M., et al. (2014). 

Neuroticism and impulsivity: Their hierarchical organization in the personality characterization 

of drug‐dependent patients from a decision tree learning perspective. Comprehensive 

Psychiatry, 55(5), 1227–1233. 

Ventura, C. A., de Souza, J., Hayashida, M., & Ferreira, P. (2014). Risk factors for involvement with illegal 

drugs: opinion of family members or significant others. Journal of Substance Use(0), 1–7. 

Vollrath, M., & Torgersen, S. (2002). Who takes health risks? A probe into eight personality types. 

Personality and Individual Differences, 32(7), 1185‐1197. 

Williams, G. (2011). Data mining with Rattle and R: the art of excavating data for knowledge discovery. 

Springer New York Dordrecht Heidelberg:London: Springer Science & Business Media. 

Wilmoth, D. R. (2012). Intelligence and past use of recreational drugs. Intelligence, 40 (1), 15‐22. 

World Health Organization. (2004). Prevention of mental disorders: Effective interventions and policy 

options. Geneva Universities of Nijmegen and Maastricht: Summary report. 

Yasnitskiy, L., Gratsilev, V., Kulyashova, J., & Cherepanov, F. (2015). Possibilities of artificial intellect in 

detection of predisposition to drug addiction. Perm University Bulletin. Series «Philosophy. 

Psychology. Sociology», 1 (21), 61‐73. 

Zuckerman, M. (1994). Behavioral expressions and biosocial bases of sensation seeking. New York: 

Cambridge University Press.