Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

50
06/14/22 H.S. 1 Stata 3, Regression Hein Stigum Presentation, data and programs at: http://folk.uio.no/heins/

Transcript of Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

Page 1: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

04/19/23 H.S. 1

Stata 3, Regression

Hein Stigum

Presentation, data and programs at:

http://folk.uio.no/heins/

Page 2: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 2

Agenda

• Linear regression

• GLM

• Logistic regression

• Binary regression

• (Conditional logistic)

Page 3: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 3

Linear regression

Birth weight

by

gestational age

Page 4: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 4

Regression idea

residual error,e

xofeffect ,tcoefficienb

covariate =x

outcome=y

:model

1

10

exbby

2500

3000

3500

4000

4500

5000

birt

h w

eigh

t (g

ram

)

240 260 280 300 320gestational age (days)

covariate = x,x

:cofactorsmany with model

21

22110 exbxbby

Page 5: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 5

Model and assumptions

• Model

• Assumptions– Independent errors

– Linear effects

– Constant error variance

),0(, 222110 Nxxy

Page 6: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 6

Association measure: RD

11

1

2210

2210

121

22110

1

211

βRD

β

xβββ

xβββ

yyRD

xβxββy

xx

Model:

Start with:

Hence:

Page 7: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 7

Purpose of regression

• Estimation– Estimate association between outcome and

exposure adjusted for other covariates

• Prediction– Use an estimated model to predict the

outcome given covariates in a new dataset

Page 8: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 8

Adjusting for confounders

• Not adjust– Cofactor is a collider– Cofactor is in causal path

• May or may not adjust– Cofactor has missing– Cofactor has error

True value

True value

Unadjusted estimate

Adjusted estimate

Page 9: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 9

Workflow

• Scatterplots

• Bivariate analysis

• Regression– Model fitting

• Cofactors in/out• Interactions

– Test of assumptions• Independent errors• Linear effects• Constant error variance

– Influence (robustness)

Page 10: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 10

Scatterplot20

0030

0040

0050

0060

00w

eigh

t

200 300 400 500 600 700gest

Page 11: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 11

Syntax

• Estimation– regress y x1 x2 linear regression

– xi: regress y x1 i.c1 categorical c1

• Post estimation– predict yf, xb predict

• Manage models– estimates store m1 save model

Page 12: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 12

Model 1: outcome+exposure

Page 13: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 13

Model 2: Add counfounders

Estimate association:m1=m2

Prediction: m2 is best

Page 14: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

”Dummies”

April 19, 2023 H.S. 14

Assume educ is coded 1, 2, 3 for low, medium and high education

Choose low educ as reference

Make dummies for the two other categories:generate medium=(educ==2) if educ<.generate high =(educ==3) if educ<.

Page 15: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 15

Interaction

sexββRD

sexββ

sexβxβββ

sexβxβββ

yyRD

sexxβxβxββy

xx

311

31

32210

32210

121

1322110

11

2211

Model:

Start with:

Hence:

Page 16: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 16

Model 3: with interaction

Page 17: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 17

Test of assumptions

• Predict y and residuals– predict y, xb

– predict res, resid

• Plot resid vs y– independent?

– linear?

– const. var?

-100

00

1000

2000

3200 3400 3600 3800Linear prediction

twoway (scatter res y )(qfitci res y)

Page 18: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 18

Violations of assumptions

• Dependent residualsMixed models: xtmixed

• Non linear effectsgen gest2=gest^2

regress weigth gest gest2 sex

• Non-constant varianceregress weigth gest sex, robust

-1-.

50

.51

200 220 240 260 280 300gest

-2-1

01

2re

s3400 3500 3600 3700 3800

p

Page 19: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 19

Measures of influence

• Measure change in:– Outcome (y)

– Deviance

– Coefficients (beta)• Delta beta, Cook’s distance

Remove obs 1, see changeremove obs 2, see change

-.6

-.4

-.2

0.2

Influ

ence

1 2 10Id

Page 20: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 20

Points with high influence

lvr2plot, mlabel(id)

1 3456791011121314 161819 20 21232425 2627 282930313233 35 3637383940 414243 444548 49 50515456575859 6061 6264656668697072 74 7576 7778 79 8081 82838485 8687 8889909193 9495969899 100101102103104105 106107108109 110111112 113115116118 119120121122124125126 127128129130 131132133 134 135136137138139141142143144 145146148149150151152153154 155156157 158160161 163164165167168 169170171172 173174175 176177178 179180181182183184185186187 188189190191 192193 194195196197198199 200 201202203205206 208209210 211212 213214215216217218219220 221222223 224225226227228229230231 232233234235 236237 238239240241242244 245246247 248249250251252253254 255256 258259261 262263 264266267269270 271273274275276277278 279280281 283284 285286 287 288289290291

292 293294295296297 298299 300301303304305306307310311312 313314315316318319 320321322323324 325326327328 329 330331332334335 336337338339 340341343344345346347348349 350351 352353354 355357 358359360361363364365 366367368369 370371372374375 376 378379380381383384 385386 387388389390 391 392393394395396397399400401402403404 405406407408409410 412413414416418419420 421422423424 425426427 428 429430431432434 435436437 438439440441443444 445446 447448 449451452453454 455456457460461462463464465466 467468 469470471 472474475476477478479480481482483 484485486487 488489490491492493 495496497498 499500501502503 504505

506507508 509510511512513514515 516517 519520521 522523524 525526527 528 529530 532533 535536 537538

539

540541542543545546 547548551552553554555556557 559560562563 564565566 567568 569570571 572573574575576 5775795805815825830.2

.4.6

.8Le

vera

ge

0 .01 .02 .03Normalized residual squared

Page 21: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

Added variable plot: gestational age

April 19, 2023 H.S. 21

505

291

134564

270

79500

206

180

110

36635

152

211

543

105

313

96

161181

136

565

36

176

125

744925219

7089

45564

191444

340

485

47695

456492

33

355

5406

368

58

359

493

345

8145

221

220

75

480

232

200

289

346

551

111

38

238

209

478

135

466

496297

103

474

416

258

271

580

218

201

353344

315

292497

521301

224

273

403

316

5

174

171

145

48338

370

46465267

121

354205486

305

523

229

132

482

53084

360

522

93210

158

249

146

347

324502

54

477

358

285

425

160

216

538

499

66

331

329

339

208

51

80283

116

427

385

461

259

296

43

107

318

40

11351

379

149335542

277303

168

248

194

535

227511

179

198407115

419

235560

447

14

18376

529

463

163

118

29

555

54832423

321

245

100

127

90278

470

31

275

577

9573

490

290

37108

109

520

536375

432

452426386

559

369

435

579

240566104

57568

460196

412

199

142

552

422

507

310

280

337

567

332

62

469

51499

32612

509

387

261

395

583

454

428

408

553

288

228

506

4

394

388170

312

7

479

120

177

122

189

352

242462223

3468

443

489

393

557

281

420

222

263

188

392

197

233

361

508

169

294

16155

72

94

49157219

13

304

449

371

133

287

430402

516

541

504

380

330

424

348562

528

307

441

483

178284

148

244

40983

20

429

364

503357546

574

414124

26

501

421

391

319

445

510

234

374

328

343

572

264

237

519

568

215300

195

193306

547

487

144

570182

401

410

157

323413349

367365363

262

69513102

431

255

517

438

41

24

241

27

10

143

18739

325

128

183

28

569

322

250

446

293

350

153

515

484

151

384

87

2344

13823198

434

246

42

383

129

77

101

230

52521

126

471

4531

5956

164

397

253

239

475

236

563

203

192

472

390

213

378

156

76202

30

184

247

175

295

495

334

172

451

299

406

276

131

448

320

399576

581

214

141

225

130

533

314

405

25

186465

396

527

60

88

457

571269

91389

173

298

481

436

150

341311

498

113

439

254154554190

86

372

212327

50119

279

167266

545

381437

537

488

440

185

165532

85

524

400

112

137

526

336

274

467

217

251

286

61

40478

418582

556256139

226

106

82

512

539

-10

000

1000

2000

e( w

eigh

t | X

)

-100 0 100 200 300 400e( gest | X )

coef = 5.6762662, se = 1.1303444, t = 5.02

avplot gest, mlabel(id)

Page 22: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 22

Removing outlier

Page 23: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 23

Influence

Outlier

Regression with outlier

Regressionwithout outlier

2000

3000

4000

5000

6000

Birt

h w

eigt

h

200 300 400 500 600 700Gestational age

Page 24: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 24

Final model

),0(, 222110 Nxxy

sum gest /* find smallest value */generate gest2=gest-204 /* smallest gest=204 */generate sex2=sex-1 /* boys=0, girls=1 */regress weight gest2 sex2 /* final model */estimates store m4

Give meaning to constant term:

Page 25: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 25

Logistic regression

Being bullied

Page 26: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 26

Model and assumptions

• Model

• Assumptions– Independent residuals– Linear effects

BinomialxyxyExx ~|),|(,)1

ln( 22110

Page 27: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 27

Association measure, Odds ratio

)(βOR

β

xβββ

xβββ

)(Odds)(Odds

Odds

Odds)(OR

xβxββ -

xx

x

x

11

1

2210

2210

12

1

21

22110

exp

1

2

lnln

lnln

1ln

11

1

1

Model:

Start with:

Hence:

Page 28: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 28

Syntax

• Estimation– logistic y x1 x2 logistic regression

– xi: logistic y x1 i.c1 categorical c1

• Post estimation– predict yf, pr predict probability

• Manage models– estimates store m1 save model

– est table m1, eform show OR

Page 29: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 29

Workflow

• Bivariate analysis

• Regression– Model fitting

• Cofactors in/out• Interactions

– Test of assumptions• Independent errors• Linear effects

– Influence (robustness)

Page 30: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 30

Bivariate

Generate dummiesgen Island= (country==2) if country<.gen Norway= (country==3)gen Finland= (country==4)gen Denmark= (country==5)

Page 31: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 31

Model 1: outcome and exposure

xi:logistic bullied i.country use xi: i.var for categorical variablesxi:logistic bullied i.country , coef coefs instead of OR'sxi:logistic bullied i.country if sex!=. & age!=. do if sex and age not missing

Alternative commands:

Page 32: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 32

Model 2: Add confounders

Estimate associations: m1=m2Predict: m2 best

Page 33: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 33

Interaction

sex)β(βOR

sexββ

sexβsexβββ

sexβsexβββ

)(Odds)(Odds

Odds

Odds)(OR

sexxβsexβxββ -

xx

x

x

311

31

3210

3210

12

1

21

132110

exp

11

22

lnln

lnln

1ln

11

1

1

Model:

Start with:

Hence:

Page 34: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 34

Model 3: interaction

Page 35: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 35

Test of assumptions

• Linear effects (of age)– findit lincheck search and

install

– lincheck xi:logistic bullied age I.country sex

0.2

.4.6

.8co

effic

ient

5 10 15age

Page 36: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 36

Points with high influence

0.2

.4.6

.8P

regi

bon'

s db

eta

.05 .1 .15 .2 .25 .3Pr(bullied)

estimates restore m2 restore best modelpredict p, p probability (mu in our notation)predict db, db delta-beta (one value, not one per estimate)scatter db p delta-beta plot

Page 37: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 37

Removing 2 observations

Conclusion:Robust results

Page 38: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 38

Generalized Linear Models

Being bullied

Page 39: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

Designs and measures

April 19, 2023 H.S. 39

Design Frequency measuresNatural association measures

Cross-sectionalPrevalence Risk Ratio, Risk Difference

CohortIncidence risk Risk Ratio, Risk DifferenceIncidence rate Rate Ratio

Case ControlTraditional CC Odds RatioCase Cohort Rate RatioNested Case Control Rate Ratio

Models   MeasuresGLM RR, RD, ORSurvival   Rate Ratio

Page 40: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

04/19/23 H.S. 40

Generalized Linear Models, GLM25

0030

0035

0040

0045

0050

00bi

rth

wei

ght

(gr

am)

250 270 290 310gestational age (days)

0.2

.4.6

.81

risk

0 20 40 60 80age

Linear regression

Logistic regression

Poisson regression

05

1015

0 20 40 60 80

Page 41: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

04/19/23 H.S. 41

GLM: Distribution and link

• Distribution family– Given by data

– Influence p-value, CI

• Link function– May chose

– Shape (=link-1)

– Scale

– Association measure

Normal Binomial Poisson

Identity Logit Log

Additive Multi. Multi.

RD OR RR

Page 42: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

04/19/23 H.S. 42

Data Distribution Link Measure NameStandard models

Continuous Normal Identity - LinearCount Poisson Log RR Poisson0/1 Binomial Logit OR Logistic

Other models0/1 Binomial Log RR Log binomial0/1 Binomial Identity RD Linear binomial

Distribution and link examples

Link: Identity linear model additive scale

OBS: not for traditional case control data

Page 43: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

04/19/23 H.S. 43

Being bullied, 3 modelsglm bullied Island Norway Finland Denmark sex age, family(binomial) link(logit)

glm bullied Island Norway Finland Denmark sex age, family(binomial) link(log)

glm bullied Island Norway Finland Denmark sex age, family(binomial) link(identity)

Page 44: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 44

Convergence problems

• If glm does not converge, use:– poisson y x1 x2, irr robust RR

– regress y x1 x2, robust RD

Stop

Page 45: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 45

Association measure, RR

)(βRR

β

...xβββ

...xβββ

)()(

)(RR

...xβxββ

xx

x

x

11

1

2210

2210

12

1

21

22110

exp

1

2

lnln

lnln

ln

11

1

1

Model:

Start with:

Hence:

Page 46: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 46

Association measure: RD

11

1

2210

2210

121

22110

1

211

βRD

β

...xβββ

...xβββ

RD

...xβxββ

xx

Model:

Start with:

Hence:

Page 47: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 47

The importance of scale

Males

Females

010

2030

T1 T2

Additive scaleAbsolute increase

Females: 30-20=10Males: 20-10=10

Conclusion:Same increase for males and females

RD

Multiplicative scaleRelative increase

Females: 30/20=1.5Males: 20/10=2.0

Conclusion:More increase for

males

RR

Page 48: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 55

Stata regression commands

Page 49: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 56

• Regression with simple error structure– regress linear regression (also heteroschedastic

errors)– nl non linear least squares

• GLM– logistic logistic regression– poisson Poisson regression– binregbinary outcome, OR, RR, or RD effect measures

• Conditional logistc– clogit for matched case-control data

• Multiple outcome– mlogitmultinomial logit (not ordered)– ologit ordered logit

• Regression with complex error structure– xtmixed linear mixed models– xtlogitrandom effect logistic

Page 50: Jul-15H.S.1 Stata 3, Regression Hein Stigum Presentation, data and programs at:

April 19, 2023 H.S. 57

• Estimation– regress y x1 x2 linear regression– logistic y x1 x2 logistic regression– xi:regress y x1 i.x2 categorical x2

• Manage results– estimates store m1 store results– estimates table m1 m2 table of results– estimates stats m1 m2 statistics of results

• Post estimation– predict y, xb linear prediction– predict res, resid residuals– lincom b0+2*b3 linear combination

• Help– help logistic postestimation

Syntax