Reduced Rank Regression – a powerful statistical method for identifying empirical dietary patterns...

Reduced Rank Regression – a powerful statistical method for identifying empirical dietary patterns Gina Ambrosini PhD Senior Research Scientist MRC Human Nutrition Research, Cambridge EUCCONET International Workshop, Bristol October 2011

Reduced Rank Regression – a powerful statistical method for identifying empirical dietary patterns

Gina Ambrosini PhD, Senior Research Scientist, MRC Human Nutrition Research, Cambridge

EUCCONET International Workshop, Bristol October 2011

Why dietary patterns?

The human diet is complex – we do not eat nutrients or foods in isolation

• Single food/nutrient studies are frequently null, e.g. fat intake and obesity; these do not consider total dietary intake

• Strong collinearity between dietary variables; individual effects are difficult to separate and may be too small to detect

• Numerous dietary variables (foods & nutrients) lead to too many statistical tests

Studies of dietary patterns, i.e. combinations of total food intake, can overcome many of these problems

What nutrition epidemiologists want to know …

[Diagram labels: Dietary Pattern; Dietary Indices (e.g. Healthy Eating Index); PCA or Factor Analysis; Cluster Analysis; Reduced Rank Regression; Disease or Health Outcome; ??]

Empirical Dietary Patterns

E.g. Principal Components Analysis (PCA), Factor Analysis and Cluster Analysis

• Data reduction techniques; identify latent constructs in data = patterns

• Take advantage of co-linearity

• Consider total diet; ‘real-life’ consumption and synergism

• Produce uncorrelated dietary patterns (or clusters) suitable for multivariate models

• Exploratory, data-driven, study specific: reproducibility unknown in different populations

• Explain variation in food intakes but not necessarily nutrients – the end product of diet

• Not disease-specific or hypothesis-based

[Diagram: Food Intakes → Dietary Patterns]
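The data-reduction idea on this slide can be illustrated with a minimal sketch of PCA-derived dietary patterns. The data are simulated, and the four food groups, the sample size and the helper name `pca_patterns` are assumptions made for illustration; none of them come from the talk.

```python
import numpy as np

def pca_patterns(food_intakes, n_patterns=2):
    """Return loadings, scores and share of food-intake variance explained
    for the first n_patterns principal components."""
    X = np.asarray(food_intakes, dtype=float)
    # Standardise each food group so that units do not drive the patterns
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardised data: rows of Vt are the component loadings
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_patterns].T            # foods x patterns
    scores = Z @ loadings                   # participants x patterns
    explained = s[:n_patterns] ** 2 / np.sum(s ** 2)
    return loadings, scores, explained

# Simulated intakes for four hypothetical food groups (not real data)
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)                 # an unobserved eating habit
foods = np.column_stack([
    latent + rng.normal(scale=0.5, size=n),   # hypothetical food group 1
    latent + rng.normal(scale=0.5, size=n),   # hypothetical food group 2
    -latent + rng.normal(scale=0.5, size=n),  # hypothetical food group 3
    rng.normal(size=n),                       # hypothetical food group 4
])

loadings, scores, explained = pca_patterns(foods, n_patterns=2)
# Correlation between the two pattern scores (zero up to rounding error)
score_corr = np.corrcoef(scores, rowvar=False)[0, 1]
```

The pattern scores are uncorrelated by construction, which is what makes them convenient covariates in multivariate models. Note that the patterns are chosen to explain variation in the food intakes only; no outcome or nutrient information enters the calculation.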

Reduced Rank Regression – a novel empirical approach

Reduced Rank Regression (RRR)

• A hypothesis-based empirical method for identifying dietary patterns

• Similar to PCA and factor analysis but requires a 2nd set of data = response variables

• Response variables should be on the pathway between food intake and outcome of interest

RRR dietary patterns are linear combinations of food intake that explain the maximum variation in a set of response variables

[Diagram: Food Intake (Predictors) → Dietary Pattern → Nutrients or Biomarkers (Responses) → Disease or Outcome of Interest]
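One common way to compute RRR patterns is to regress the responses on the food intakes and then take the principal components of the fitted responses. The sketch below follows that formulation on simulated data. It is an assumption that this matches the software used for the analyses in this talk; the function names, food groups and simulated response variables are hypothetical.

```python
import numpy as np

def standardise(A):
    """Centre each column and scale it to unit standard deviation."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

def rrr_patterns(food_intakes, responses, n_patterns=None):
    """Reduced rank regression via PCA of the OLS-fitted responses.

    Returns food weights, pattern scores, and the share of total response
    variation explained by each pattern.
    """
    X = standardise(food_intakes)
    Y = standardise(responses)
    # Step 1: ordinary least squares of every response on all foods
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ B_ols
    # Step 2: principal directions of the fitted responses
    _, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    r = Y.shape[1] if n_patterns is None else n_patterns
    # Step 3: express each pattern as a linear combination of foods
    weights = B_ols @ Vt[:r].T              # foods x patterns
    scores = X @ weights                    # participants x patterns
    explained = s[:r] ** 2 / np.sum(Y ** 2)
    return weights, scores, explained

# Simulated data: six hypothetical food groups, three hypothetical responses
rng = np.random.default_rng(1)
n = 400
foods = rng.normal(size=(n, 6))
energy_density = foods[:, 0] + foods[:, 1] - foods[:, 2] + rng.normal(size=n)
fat = foods[:, 0] + 0.5 * foods[:, 3] + rng.normal(size=n)
fibre = foods[:, 2] + foods[:, 4] + rng.normal(size=n)
responses = np.column_stack([energy_density, fat, fibre])

weights, scores, explained = rrr_patterns(foods, responses)
```

The number of patterns cannot exceed the number of response variables, so three responses give at most three patterns, and the first explains the largest share of response variation.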

Example - ALSPAC

• Measured dietary intake using a 3-day food diary at 7, 10 and 13 years of age

• We hypothesised that a dietary pattern that could explain the variation in dietary energy density, % energy from fat, and fibre at 7, 10 and 13 y would be prospectively associated with body fatness measured at 9, 11, 13 and 15 y

Example RRR - ALSPAC

Each dietary pattern is a linear combination of weighted food intakes that explains the maximum variation in ALL response variables; the 1st pattern often explains the most

For each dietary pattern, a z-score is calculated as: z-score = W1(Food1 Intake) + W2(Food2 Intake) + W3(Food3 Intake) + …

[Diagram: Predictors (food group intakes from the 3-day food diary: Fruit, Veg, F3, F4, F5, F6, F7, F8, …) → Dietary Patterns 1, 2 and 3 → Responses (nutrient intakes: energy density, fat, fibre) → Obesity (fat mass)]

1st Dietary Pattern: energy-dense, high in fat, low in fibre
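The weighted sum W1(Food1 Intake) + W2(Food2 Intake) + … can be sketched as follows. The intakes and weights are made-up numbers, and the sketch assumes that intakes are standardised before weighting and that the resulting score is rescaled to mean 0 and SD 1; the talk does not spell out these steps.

```python
import numpy as np

def pattern_z_score(food_intakes, weights):
    """Weighted sum of standardised food intakes, rescaled to a z-score."""
    X = np.asarray(food_intakes, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Standardise each food group (assumption: weights apply to standardised intakes)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    raw = Z @ w                              # W1*Food1 + W2*Food2 + ...
    # Rescale so the score has mean 0 and SD 1 in this sample
    return (raw - raw.mean()) / raw.std(ddof=1)

# Hypothetical intakes (rows = participants, columns = Food1, Food2, Food3)
intakes = np.array([
    [120.0, 30.0, 200.0],
    [ 80.0, 60.0, 150.0],
    [ 40.0, 90.0, 100.0],
    [100.0, 50.0, 250.0],
])
weights = np.array([0.5, 0.4, -0.3])         # hypothetical pattern weights

z = pattern_z_score(intakes, weights)
```

A positive weight means higher intake of that food raises the pattern score, and a negative weight means it lowers the score.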

ALSPAC energy-dense, high fat, low fibre dietary pattern

ALSPAC – change in Fat Mass Index (z-score) with a 1 SD increase in energy-dense, high fat, low fibre dietary pattern z-score

Girls

Dietary pattern at    FMI @ 9 y        FMI @ 11 y       FMI @ 13 y       FMI @ 15 y
                      n=2868           n=2274           n=2007           n=1556
7 y                   0.08             0.08             0.07             0.07
  (95% CI)            (0.05 - 0.12)    (0.03 - 0.12)    (0.03 - 0.11)    (0.02 - 0.12)
  p-value             <.0001           <0.001           <0.001           <0.001
10 y                                   0.05             0.04             0.05
  (95% CI)                             (0.01 - 0.08)    (0.01 - 0.08)    (0.01 - 0.10)
  p-value                              0.01             0.04             0.02
13 y                                                                     -0.01
  (95% CI)                                                               (-0.04 - 0.03)
  p-value                                                                0.68

Boys

Dietary pattern at    FMI @ 9 y        FMI @ 11 y       FMI @ 13 y       FMI @ 15 y
                      n=2854           n=2118           n=1863           n=1345
7 y                   0.09             0.09             0.06             0.07
  (95% CI)            (0.05 - 0.12)    (0.05 - 0.13)    (0.01 - 0.10)    (0.02 - 0.12)
  p-value             <.0001           <.0001           0.012            0.006
10 y                                   0.01             0.04             0.01
  (95% CI)                             (-0.03 - 0.04)   (0.01 - 0.08)    (-0.03 - 0.06)
  p-value                              0.65             0.04             0.64
13 y                                                                     -0.01
  (95% CI)                                                               (-0.05 - 0.02)
  p-value                                                                0.45

Adjusted for age at fat mass assessment, dietary misreporting, physical activity (cpm)
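As a reading aid for the table, the sketch below shows how a 95% confidence interval relates to a standard error and a two-sided p-value under a normal approximation. Because the published estimates and intervals are rounded to two decimals, the recovered p-values are only approximate and will not match the reported ones exactly. The function name is hypothetical.

```python
import math

def approx_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate SE, z statistic and two-sided p-value from a 95% CI,
    assuming a symmetric interval based on the normal distribution."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal p-value
    return se, z, p

# Girls, dietary pattern at 7 y, FMI at 9 y: 0.08 (0.05 - 0.12)
se7, z7, p7 = approx_p_from_ci(0.08, 0.05, 0.12)

# Girls, dietary pattern at 13 y: -0.01 (-0.04 - 0.03)
se13, z13, p13 = approx_p_from_ci(-0.01, -0.04, 0.03)
```

An interval that excludes zero corresponds to p < 0.05, as in the 7 y row, while an interval that spans zero, as in the 13 y row, corresponds to p > 0.05.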

Cross-cohort comparisons: ALSPAC v Raine Study

PhD project – Geeta Appannah

University of Cambridge and MRC Human Nutrition Research:

• An almost identical energy-dense, high fat, low fibre dietary pattern was seen at 14 and 17 y in the Western Australian Pregnancy Cohort (Raine) Study, a contemporaneous birth cohort.

• Similar factor loadings were found for an energy-dense, high fat, low fibre dietary pattern derived from an FFQ and from a food diary at 14 y of age in the Raine Study

Geeta Appannah, MRC Human Nutrition Research

Comparisons of RRR and PCA patterns

• Although the PCA and RRR patterns in these studies had similar nutrient profiles, the studies reported stronger associations between RRR-based dietary patterns and outcomes

• RRR patterns explain more variation in the response variables

Study | RRR response variables | Outcome
Multi-Ethnic Study of Atherosclerosis (US) | CRP, IL-6, fibrinogen, homocysteine | Sub-clinical atherosclerosis
EPIC Potsdam (Germany) | Fibre, magnesium, alcohol | Type 2 diabetes
EPIC Potsdam (Germany) | % energy from saturated fat, PUFA, MUFA, protein and carbohydrate | All-cause mortality
EPIC Potsdam (Germany) | SFA, MUFA, n-3 PUFA, n-6 PUFA | Breast cancer incidence
Tehran Lipids and Glucose Study | Total fat, PUFA/sat fat, cholesterol, fibre, calcium | Obesity
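The point that RRR patterns explain more variation in the response variables than PCA patterns can be illustrated on simulated data. By construction, the first RRR pattern explains at least as much response variation as any other single linear combination of the same foods, including the first principal component. The data below are invented so that the dominant source of food-intake variation is unrelated to the responses; this is an illustrative assumption and does not reproduce any of the studies listed.

```python
import numpy as np

def standardise(A):
    """Centre each column and scale it to unit standard deviation."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

def share_explained(score, Y):
    """Share of total variation in the columns of Y explained by one score."""
    t = score - score.mean()
    fitted = np.outer(t, t @ Y) / (t @ t)   # regress every response on the score
    return np.sum(fitted ** 2) / np.sum(Y ** 2)

rng = np.random.default_rng(2)
n = 400
foods = rng.normal(size=(n, 6))              # six hypothetical food groups
# Foods 5 and 6 share a strong habit that has nothing to do with the responses
habit = rng.normal(size=n)
foods[:, 4] += 2.0 * habit
foods[:, 5] += 2.0 * habit
# Hypothetical responses driven by foods 1 to 3 only
resp_a = foods[:, 0] + foods[:, 1] + rng.normal(size=n)
resp_b = foods[:, 0] - foods[:, 2] + rng.normal(size=n)
X = standardise(foods)
Y = standardise(np.column_stack([resp_a, resp_b]))

# First PCA pattern: direction of greatest variation in the foods
_, _, Vt_x = np.linalg.svd(X, full_matrices=False)
pca_score = X @ Vt_x[0]

# First RRR pattern: leading principal direction of the OLS-fitted responses
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
_, _, Vt_y = np.linalg.svd(X @ B_ols, full_matrices=False)
rrr_score = X @ B_ols @ Vt_y[0]

pca_share = share_explained(pca_score, Y)
rrr_share = share_explained(rrr_score, Y)
```

In this setup the first principal component mostly captures the shared habit in foods 5 and 6, so `pca_share` is small, while `rrr_share` is much larger because the RRR pattern is built from the foods that actually drive the responses.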

Caution - using biomarkers as response variables

Biomarkers used as response variables should be chosen carefully:

• They should be true intermediates and not a proxy for the outcome of interest

• They should be on the pathway between food intake and the outcome

• They must therefore be responsive to dietary intake; this is especially relevant to more novel biomarkers

[Diagram: Food Intake (Predictors) → Dietary Pattern → Blood Glucose, Insulin Resistance (Responses) → Diabetes]

Generalisability of RRR patterns

• Imamura et al (2010) applied RRR dietary patterns that were associated with type 2 diabetes in three different cohorts to the Framingham Offspring Study

• All patterns were characterised by high intakes of meat products, refined grains and soft drinks

Dietary Pattern | RRR response variables | Risk of T2D in Framingham Offspring Study
EPIC Potsdam (Germany) | Fibre, magnesium, alcohol | 1.14 (0.99 – 1.32)
Nurses Health Study (US) | Inflammatory markers | 1.44 (1.25 – 1.66)
Whitehall II (UK) | Insulin resistance * | 1.16 (1.00 – 1.35)

Imamura F et al. Generalizability of dietary patterns associated with type 2 diabetes mellitus. AJCN 2010; 90(4):1075-83

Limitations

RRR appears to be a robust and powerful method, however:

• Reproducibility, generalisability of patterns – only 1 published study

• RRR depends on existing knowledge in order to choose response variables

• Response variables must be chosen very carefully to avoid circular analysis

• Biomarkers as response variables: must be an intermediate and not a proxy for the outcome/disease

Acknowledgements

Dr Pauline Emmett, Dr Kate Northstone, & the ALSPAC Study Team

Ms Geeta Appannah, PhD scholar, MRC Human Nutrition Research

Mr David Johns, PhD scholar, MRC Human Nutrition Research

Dr Anna Karin Lindroos, Swedish Food Authority, Uppsala (prev. HNR)

Funding from:

[email protected]

MRC Human Nutrition Research, Cambridge, UK

Reported Associations with Other RRR Dietary Patterns

Study | RRR response variables | Outcome
Multi-Ethnic Study of Atherosclerosis (US) | CRP, IL-6, fibrinogen, homocysteine | Sub-clinical atherosclerosis
Insulin Resistance Atherosclerosis Study (US multi-ethnic cohort) | Plasminogen activator inhibitor 1, fibrinogen | Carotid artery atherosclerosis (IMT, CAC)
Coronary Risk Factors for Atherosclerosis in Women (CORA), Germany | LDL and HDL cholesterol, lipoprotein (a), CRP, C-peptide (insulin resist) | Coronary artery disease
Nurses Health Study (US) | Inflammatory markers | Type 2 diabetes
Framingham Offspring Study (US) | BMI, fasting HDL-C, TG, glucose, hypertension (BP residuals) | Type 2 diabetes
EPIC Potsdam (Germany) | Fibre, magnesium, alcohol | Type 2 diabetes
EPIC Potsdam (Germany) | % energy from saturated fat, PUFA, MUFA, protein and carbohydrate | All-cause mortality
EPIC Potsdam (Germany) | SFA, MUFA, n-3 PUFA, n-6 PUFA | Breast cancer incidence
Tehran Lipids and Glucose Study | Total fat, PUFA/sat fat, cholesterol, fibre, calcium | Obesity
ALSPAC | Energy density, % energy from fat, fibre density | Child obesity at 7, 9, 11, 13, 15 y
