BIOMETRIC CONFERENCE 7TH...

7TH NORDIC-BALTIC BIOMETRIC CONFERENCE Final Programme and Abstract Book 3-5 June, 2019 Vilnius, Lithuania


Page 1: BIOMETRIC CONFERENCE 7TH NORDIC-BALTICfiles.razzby.com/95/General/1559646481_NBBC19_Abstract_book.pdf · Chair: Tadas Danielius, Lithuania CS5: Trends and Trajectories Chair: Theis

7TH NORDIC-BALTIC BIOMETRIC CONFERENCE

Final Programme and Abstract Book

3-5 June, 2019 Vilnius, Lithuania


TABLE OF CONTENTS

CONFERENCE ORGANIZERS

GENERAL INFORMATION

SOCIAL PROGRAMME

SCIENTIFIC PROGRAMME

ABSTRACTS


CONFERENCE ORGANIZERS

Local Organizing Committee

Audronė Jakaitienė (Vilnius University) - Chair
Kęstutis Dučinskas (Vilnius University/Klaipėda University)
Rimantas Eidukevičius (Vilnius University)
Roma Puronaitė (Vilnius University)
Daiva Petkevičiūtė (Kaunas University of Technology)
Arvydas Martinkėnas (Klaipėda University)

Scientific Programme Committee

Theis Lange (University of Copenhagen) - Chair
Juha Heikkinen (Natural Resources Institute Finland (Luke))
Mette Langas (Norwegian University of Science and Technology)
Eva Šauriņa (Rīga Stradiņš University)
Aila Särkkä (University of Gothenburg)
Krista Fischer (University of Tartu)
Audronė Jakaitienė (Vilnius University)

Conference Secretariat

Conference & Event Management UAB Kalanis

Rūdninkų str. 18, LT-01135, Vilnius, Lithuania Tel: +370 630 31131

[email protected]


GENERAL INFORMATION

Dates | 3-5 June, 2019
Venue | Vilnius University
Address | Universiteto str. 3, Vilnius, Lithuania

On-site fees:
IBS members: 390 Eur
Non-members: 475 Eur
Student IBS members: 225 Eur
Student non-members: 275 Eur
Pre-conference course: 100 Eur
Gala Dinner: 80 Eur

Payment: Payments on-site shall be made either in cash (Euro) or by credit card. Only registered and paid participants can attend the Conference.

Conference registration fee includes:
Admission to all scientific sessions from 3 June to 5 June, 2019
Conference materials
Lunch and coffee/refreshments from 3 June to 5 June, 2019
Guided tour and Welcome Reception on 3 June, 2019

Registration and hospitality desk opening hours:
2 June, Sunday, 2019 09:00-17:30
3 June, Monday, 2019 08:00-17:30
4 June, Tuesday, 2019 08:30-16:30
5 June, Wednesday, 2019 08:30-12:30

Name tags: Name tags will be provided during registration. All participants are expected to wear the name tags during all Conference-related events.

Official language: The official Conference language is English. No simultaneous translation will be provided.

WIFI: Name: Konferencija Password: Renginys6

Contacts: E-mail: [email protected] Tel: +370 63 031 131


SOCIAL PROGRAMME

GUIDED TOUR: VILNIUS UNIVERSITY

Date | 3 June, 2019
Excursion starts | 17:30
Meeting point | entrance of Vilnius University main building
Duration: 1.5 hours
Fee: free of charge to all registered participants.

The Conference participants are welcome to join the guided tour around the Vilnius University Ensemble and the Library, which will take place just before the Welcome Reception. One of the oldest universities in Central Europe, Vilnius University was founded in the 16th century. The campus buildings were added afterwards and, as a result, feature Gothic, Baroque and Classical styles of architecture; the main building's medieval exterior is a stark contrast to its lively student atmosphere.

WELCOME RECEPTION

Date | 3 June, 2019
Time | 19:30 - 21:00
Venue | Konstantinas Sirvydas Courtyard, Vilnius University (Universiteto str. 3, Vilnius)

The reception will be held on the first day of the Conference in a beautiful courtyard of Vilnius University. The evening is free of charge for all participants, partners and guests. Join the evening and use this opportunity to meet your colleagues and friends from the Nordic and Baltic states!

CONFERENCE DINNER

Date | 4 June, 2019
Time | 19:30 - 23:00
Venue | Trinity Restaurant (Vilniaus str. 30, Vilnius)

All participants are welcome to join the Conference dinner, which will take place at the Trinity Restaurant, just a 10-minute walk from the Conference venue. The admission fee is 80 Euro per person. The restaurant is located in an 18th-century building that used to be a nunnery.


SCIENTIFIC PROGRAMME

Sunday, 2 June, 2019

Room 239

09:00-17:30 REGISTRATION

Mediation analysis using R Pre-conference course

10:00-10:45 Module 1

10:45-11:00 COFFEE BREAK

11:00-12:00 Module 2

12:00-13:00 LUNCH

13:00-14:15 Module 2 - lab

14:15-14:30 COFFEE BREAK

14:30-15:15 Module 3

15:15-16:30 Module 3 - lab

Monday, 3 June, 2019

Aula Parva Room 239

08:00 - 17:30 REGISTRATION

9:00 - 9:30 WELCOME

9:30 - 10:30 Keynote Speaker: Geert Molenberghs, Hasselt University, Belgium The Applied Statistical (Data) Scientist in a High-Profile and Societal Environment

10:30 - 11:00 COFFEE BREAK

IS5: Applied Spatiotemporal Modelling Chair: Juha Heikkinen, Finland

CS7: Robust Methods and Big Data Chair: Andreas Kryger Jensen, Denmark

11:00 - 11:30

David Bolin, Chalmers University of Technology and University of Gothenburg A Bayesian General Linear Modeling Approach to Cortical Surface fMRI Data

11:00 - 11:20 Tsung-Shan Tsou, National Central University, Taiwan A Reproducible Robust Likelihood Approach to Inference about Marginal Characteristics of Binary Data in Paired Settings


11:20 - 11:40

Jelena Liutvinavičienė, Vilnius University, Lithuania Multi-Level Methodology for Massive Data Visualization

11:30 - 12:00 Samuel Soubeyrand, French National Institute for Agricultural Research, France The Mechanistic-Statistical Approach Applied to Spatio-Temporal Populations Dynamics at Different Scales

11:40 - 12:00 Janis Valeinis, University of Latvia, Latvia Inference for Two-Sample Quantile Difference Using Empirical Likelihood

12:00 - 12:30

Timo Adam, Bielefeld University, Germany Statistical Modeling of Animal Telemetry Data at Multiple Temporal Resolutions: Hidden Markov Models and Extensions

12:00 - 12:20 Tadas Danielius, Vilnius University, Lithuania Functional Data Analysis of Neurophysiological Data: Case Study

12:30 - 13:30 LUNCH

IS2: Causal Inference from a Stochastic Process Point of View Chair: Theis Lange, Denmark

CS6: Statistical Genetics Chair: Krista Fischer, Estonia

13:30 - 14:00 Daniel Commenges, Bordeaux University, France Effects of Simple and Adaptive Interventions in the Stochastic System Approach to Causality

13:30 - 13:50

Francesca Azzolini, University of Bergen, Norway Heritability Curves: a Local Measure of Heritability

13:50 - 14:10 Marijus Radavičius, Vilnius University, Lithuania Properties of Noninformative Genetic Sequences

14:00 - 14:30 Niklas Pfister, Swiss Federal Institute of Technology, Switzerland Causal KinetiX: Learning Stable Structures in Kinetic Systems

14:10 - 14:30 Ilva Trapina, University of Latvia, Latvia SNPs of Proteasomal Genes as Possible Biomarkers for Multiple Sclerosis in the Latvian Population

14:30 - 14:50 Märt Möls, University of Tartu, Estonia Identification of Mixtures of Bacterial Strains Using DNA Sequencing Reads with Possibly High Error Rates

15:00 - 15:30 COFFEE BREAK

IS3: Novel Designs in Randomized Clinical Trials: Platform, Umbrella and Basket Trial Chair: Ziad Taib, Sweden

15:30 - 16:00 Lindsay Renfro, University of Southern California, USA Basket and Umbrella Trials for Precision Medicine: Overview, Statistical Considerations, and Examples

16:00 - 16:30 Steve Fox, AstraZeneca R&D, United Kingdom Platform, Umbrella and Basket Designs in Oncology Trials

16:30 - 17:00 Sofia Tapani, AstraZeneca R&D, Sweden Platform Designs Beyond Oncology: Biomarker Finding Design in Patients With Heart Failure With Preserved Ejection Fraction

17:30 - 19:00 EXCURSION AT VILNIUS UNIVERSITY

19:30 - 21:00 WELCOME RECEPTION


Tuesday, 4 June, 2019

Aula Parva Room 239

08:30 - 16:30 REGISTRATION

IS1: Recent Development in Analysis of Omics-Data Chair: Krista Fischer, Estonia

CS2: Equivalence Testing and Survival Analysis Chair: Rimas Eidukevičius, Lithuania

09:00-09:30 Inke König, Universität zu Lübeck, Germany Genome-Wide Association Studies – Looking Beyond Associations

09:00-09:20

Martin Otava, Janssen Pharmaceutical Companies of Johnson & Johnson, Czech Republic Equivalence, Similarity and Comparability: How Equal Do We Want the Outcomes To Be?

09:20 - 09:40

Christian Bressen Pipper, LEO Pharma A/S, Denmark Testing Equivalence of Survival Before but Not After End of Follow-Up

09:30 - 10:00 Marika Kaakinen, Imperial College, United Kingdom Machine Learning In Multi-Omics Data To Assess Longitudinal Predictors Of Glycaemic Health

09:40 - 10:00 Alexandra Jauhiainen, AstraZeneca R&D, Sweden A Novel Joint Modelling Approach to Estimating Treatment Effects on COPD Exacerbations in the Presence of Differential Discontinuations

10:00-10:30 Tanel Kaart, Estonian University of Life Sciences, Estonia Troubles, Challenges And Delights In Area Of Omics-Data Analyses From Statistician Viewpoint

10:00- 10:20 Discussion

10:30- 11:00 COFFEE BREAK

11:00 - 12:00 Keynote Speaker: Richard Cook, University of Waterloo, Canada – SJS lecture Defining and Addressing Dependent Observation Schemes in Life History Studies

12:00 - 13:00 LUNCH

CS3: Getting to the Causes Chair: Daniel Commenges, France

CS4: Temporal Analysis Chair: Eva Šaurina, Latvia

13:00 - 13:20 Benjamin Mayer, Ulm University, Germany A Two-Level Matching Algorithm for a Multi-Center Case-Control Study Using Registry Data

13:00 - 13:20 Jaakko Reinikainen, National Institute for Health and Welfare, Finland Prevalence Forecasting Based on Multiple Imputation

13:20 - 13:40 Inge Christoffer Olsen, Oslo University Hospital, Norway Exploring the Individually-Randomised Stepped Wedge Design for Trials Where all Patients Eventually Receive the Intervention

13:20 - 13:40 Jukka Kontto, National Institute for Health and Welfare, Finland Scenario-Based Projections Using Multiple Imputation: Accounting for Both Risk Factor and Health Outcome Changes in Repeated Measures Data


13:40 - 14:00 Roma Puronaitė, Vilnius University, Lithuania Identifying Patterns of Multimorbidity in Lithuanian National Health Insurance Fund Data: A Comparison of Cross-Sectional and Temporal Phenotyping Approaches

14:00 - 15:00 Keynote Speaker: Stian Lydersen, Norwegian University of Science and Technology, Norway Contingency Tables: How to Choose Appropriate Methods for Analysis

15:00 - 15:30 COFFEE BREAK

CS1: Survival Analysis Chair: Marijus Radavičius, Lithuania

CS8: Spatial and Spatiotemporal Analysis Chair: David Bolin, Sweden

15:30 - 15:50 Tommi Härkänen, National Institute for Health and Welfare, Finland Intensity Model Based on Multidimensional Smoothing of Hazard Functions: Projecting Healthy Life Years

15:30 - 15:50 Hans J. Skaug, University of Bergen, Norway Modeling of Caesarean Section Rates Using Spatio-Temporal Gaussian Random Fields

15:50 - 16:10 Krista Fischer, University of Tartu, Estonia Challenges of Survival Analysis in Population-Based Biobank Cohorts

15:50 - 16:10 Adil Yazigi, University of Eastern Finland, Finland Sequential Models for Forest Inventory: A Self-Interactive Spatial Point Process

16:10 - 16:30 Merli Mändul, University of Tartu, Estonia Combining Parental and Offspring Data for Survival Analysis in Population-Based Biobank Cohorts

19:30 - 22:00 CONFERENCE DINNER


Wednesday, 5 June, 2019

Aula Parva Room 239

08:30 - 12:30 REGISTRATION

IS4: Statistical Methods and Models in Neuroscience Chair: Tadas Danielius, Lithuania

CS5: Trends and Trajectories Chair: Theis Lange, Denmark

09:00-09:30 Brice Ozenne, University of Copenhagen, Denmark Region-Based and Voxel-Wise Analysis of Medical Images Using Latent Variables

09:00-09:20

Viktor Skorniakov, Vilnius University, Lithuania On the P-Wave Segment Model of a Single Electrocardiogram Wave

09:20 - 09:40 Andreas Kryger Jensen, University of Copenhagen, Denmark The Trendiness of Trends

09:30 - 10:00 Karsten Tabelow, Weierstrass Institute for Applied Analysis and Stochastics, Germany Adaptive Smoothing Data from Multi-Parameter Mapping

09:40 - 10:00 Ziad Taib, AstraZeneca R&D, Sweden Modelling Disease Progression Based on Biomarker Observations: A Bayesian Hidden Markov Autoregressive Model Approach

10:00 - 10:30 Julius Kernbach, RWTH Aachen University, Germany Changing Data-Analysis Regimes in Big Biomedical Data

10:00-10:20

Discussion

10:30- 11:00 COFFEE BREAK

11:00 - 12:00 Keynote Speaker: Kęstutis Dučinskas, Klaipėda University, Lithuania Statistical Classification of Spatial Data Based on Discriminant Functions

12:00 - 12:30 Closing remarks and farewell

12:30 - 13:30 LUNCH


ABSTRACTS

Abstracts are displayed in the same order as they are in the Conference programme


DEFINING AND ADDRESSING DEPENDENT OBSERVATION SCHEMES IN LIFE HISTORY STUDIES

Presenting author: Richard Cook University of Waterloo, Canada

Multistate models provide a powerful framework for the analysis of life history processes when the goal is to characterize transition intensities, transition probabilities, state occupancy probabilities, and covariate effects thereon. Data on such processes are typically only available at random visit times occurring over a finite period of time. We formulate a joint multistate model for the life history process, the recurrent visit process, and a random loss to follow up time at which the visit process terminates. This joint model is helpful when discussing the independence conditions necessary to justify the use of standard partial likelihoods involving the life history model alone, and provides a basis for analyses that accommodate dependence. We consider settings with disease-driven visits and routinely scheduled visits and develop likelihoods that accommodate partial information on the types of visits. Simulation studies suggest that suitably constructed joint models can yield consistent estimates of parameters of interest even under dependent visit processes, providing the models are correctly specified; identifiability and estimability issues are also discussed. An application is given to a cohort of individuals attending a rheumatology clinic where interest lies in progression of joint damage.

This is joint work with Jerry Lawless.
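As a rough illustration of the transition and state occupancy probabilities mentioned in this abstract, the following Python sketch evaluates them for a hypothetical three-state progressive model (state 1 to 2 to 3) with constant intensities. The time-homogeneous Markov assumption, the state layout and the values of `lam12` and `lam23` are illustrative assumptions, not taken from the talk, and the sketch ignores the visit process entirely.

```python
from math import exp


def transition_matrix(t, lam12, lam23):
    """Transition probability matrix P(t) for a progressive model 1 -> 2 -> 3.

    Assumes constant intensities (time-homogeneous Markov) and lam12 != lam23.
    """
    p11 = exp(-lam12 * t)
    p22 = exp(-lam23 * t)
    p12 = lam12 / (lam23 - lam12) * (exp(-lam12 * t) - exp(-lam23 * t))
    return [
        [p11, p12, 1.0 - p11 - p12],
        [0.0, p22, 1.0 - p22],
        [0.0, 0.0, 1.0],  # state 3 is absorbing
    ]


def matmul(a, b):
    """Plain 3x3 matrix product, used to check the Chapman-Kolmogorov equations."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


lam12, lam23 = 0.10, 0.25  # hypothetical intensities per unit time
P2 = transition_matrix(2.0, lam12, lam23)
P3 = transition_matrix(3.0, lam12, lam23)
P5 = transition_matrix(5.0, lam12, lam23)

# State occupancy probabilities at t = 5 when everyone starts in state 1.
occupancy = P5[0]
print(occupancy)
```

The first row of `P5` gives the state occupancy probabilities for a cohort that starts in state 1; the product `matmul(P2, P3)` should reproduce `P5`, which is a convenient sanity check for any such closed-form expression.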


THE APPLIED STATISTICAL (DATA) SCIENTIST IN A HIGH-PROFILE AND SOCIETAL ENVIRONMENT

Presenting author: Geert Molenberghs I-BioStat, Hasselt University, Hasselt, Belgium; I-BioStat, KU Leuven, Belgium

A perspective will be offered on the profession of the biometrician, the biostatistician, and more generally the applied statistical scientist, in an ever changing environment. The specifics of working in a multi-disciplinary environment will be discussed, referring to collaboration with agronomists, biologists, epidemiologists, medical professionals, etc. At the same time, interactions with other semi- or fully quantitative fields will be touched upon, such as computational biologists, computer scientists, engineers, etc. The current-day (r)evolution towards data science will be placed against a historical timeline of our field, which saw, over a relatively brief period of just one century, the coming of epidemiology and observational studies, (statistical) genetics, bioinformatics, the omics, big data, data science, data analytics, etc. Historical notes related to the International Biometric Society will be interwoven.

References: Molenberghs, G. (2005). Presidential Address: XXII International Biometric Conference, Cairns, Australia, July 2004: Biometry, biometrics, biostatistics, bioinformatics,… Bio-X. Biometrics, 61, 1-9.


CONTINGENCY TABLES: HOW TO CHOOSE APPROPRIATE METHODS FOR ANALYSIS

Presenting author: Stian Lydersen Regional Centre for Child and Youth Mental Health and Child Welfare, Department of Mental Health, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology (NTNU), Trondheim, Norway

Literally hundreds of methods for hypothesis tests and confidence intervals for contingency tables are described in the literature. This is the case even for the seemingly simple 2 × 2 table. Wald intervals, chi squared tests, and the Fisher exact test are examples of widely used methods. Unfortunately, these methods are also commonly used in situations when they perform poorly, and better alternatives exist. A short description will be given of Wald inference, likelihood ratio inference, and score inference, as well as asymptotic, exact conditional, exact mid-P, and exact unconditional methods. I will describe the actual significance level and power for a test, and coverage probability, expected interval width, and symmetry for a confidence interval, which will be used as evaluation criteria. The talk will focus on tests and confidence intervals for the binomial probability, the 2 × 2 table for independent counts, and the paired 2 × 2 table. Commonly used methods, and other methods which perform well, will be described and evaluated, and illustrated using data from studies in medicine, health and the social sciences.

References: Fagerland MW, Lydersen S, Laake P (2017). Statistical Analysis of Contingency Tables. Chapman and Hall/CRC.
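To make the coverage-probability criterion concrete, here is a minimal Python sketch that computes the exact coverage of two intervals for a binomial probability: the Wald interval and the Wilson score interval. The sample size `n = 20`, the true probability `p_true = 0.1` and the function names are illustrative choices, not examples taken from the talk.

```python
from math import comb, sqrt

Z95 = 1.959963984540054  # 97.5% quantile of the standard normal distribution


def wald_interval(x, n, z=Z95):
    """Wald interval: estimate plus/minus z times the estimated standard error."""
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(x, n, z=Z95):
    """Wilson score interval for a binomial probability."""
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def coverage(interval, n, p_true):
    """Exact coverage: total binomial probability of outcomes whose interval contains p_true."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo <= p_true <= hi:
            total += comb(n, x) * p_true ** x * (1 - p_true) ** (n - x)
    return total


n, p_true = 20, 0.1  # illustrative setting with a small sample and a small probability
wald_cov = coverage(wald_interval, n, p_true)
wilson_cov = coverage(wilson_interval, n, p_true)
print(f"Wald coverage:   {wald_cov:.3f}")
print(f"Wilson coverage: {wilson_cov:.3f}")
```

In this setting the Wald interval falls well short of its nominal 95% coverage while the score interval stays close to it, which is the kind of behaviour the evaluation criteria above are meant to expose. Other settings of `n` and `p_true` can behave differently.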


STATISTICAL CLASSIFICATION OF SPATIAL DATA BASED ON DISCRIMINANT FUNCTIONS

Presenting author: Kęstutis Dučinskas Klaipėda University, Lithuania

Classification and discriminant analysis of spatial data have been regularly mentioned in the biological and ecological literature, but lack a full mathematical treatment and easily available algorithms and software. This study fills this gap by defining a method of statistical classification based on the Bayes discriminant function (BDF) and by providing novel formulas and algorithms that make it possible to evaluate the influence of spatial information on the performance of the proposed classifier. Assuming that the initial or transformed spatial data follow a Gaussian random field (GRF) model, the problem of classifying an observation into one of two or more populations (categories, groups, classes) is considered. Given a training sample, the classifier obtained by substituting the model parameters in the BDF with their ML or REML estimators is of great interest. Closed-form expressions for the actual error rate and the expected error rate associated with this plug-in BDF are derived for both geostatistical and Markov Gaussian models of spatial data. These are used to evaluate classifier performance. Numerical analysis of the performance of the proposed classifier is carried out with simulated and real data. The geoR package for the statistical software R is used to simulate the GRF realizations. A stationary, geometrically anisotropic Gaussian random field with exponential covariance function, sampled on a regular 2-dimensional lattice, is used for illustrative examples. Different spatial sampling designs are compared. In the case study, various types of spatial data models for an invasive species (zebra mussels) distributed in the Curonian Lagoon (Lithuania) are considered and compared using the performance measures of the aforementioned classifiers. Advanced models are proposed for mapping the presence and absence of zebra mussels in the Curonian Lagoon.

This is joint work with Lina Dreižienė.
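The covariance structure named in this abstract, an exponential covariance function with geometric anisotropy on a regular lattice, can be sketched in a few lines of Python. The parameterisation below (variance `sigma2`, range `phi`, anisotropy `angle` and `ratio`) and all numeric values are assumptions made for illustration; they are not the authors' settings and do not necessarily match the conventions of the geoR package.

```python
from math import cos, sin, sqrt, exp, pi


def aniso_exp_cov(s, t, sigma2=1.0, phi=2.0, angle=0.0, ratio=1.0):
    """Exponential covariance between locations s and t with geometric anisotropy.

    Assumed convention: correlation decays `ratio` times more slowly along the
    direction `angle` (radians) than perpendicular to it.
    """
    dx, dy = t[0] - s[0], t[1] - s[1]
    u = dx * cos(angle) + dy * sin(angle)    # lag component along the major axis
    v = -dx * sin(angle) + dy * cos(angle)   # lag component across the major axis
    d = sqrt((u / ratio) ** 2 + v ** 2)      # distance after removing the anisotropy
    return sigma2 * exp(-d / phi)


# Covariance matrix on a regular 3 x 3 lattice with the major axis along the diagonal.
lattice = [(i, j) for i in range(3) for j in range(3)]
cov = [[aniso_exp_cov(s, t, angle=pi / 4, ratio=2.0) for t in lattice]
       for s in lattice]

along = aniso_exp_cov((0, 0), (1, 1), angle=pi / 4, ratio=2.0)   # lag along the major axis
across = aniso_exp_cov((0, 1), (1, 0), angle=pi / 4, ratio=2.0)  # lag across the major axis
print(along, across)
```

A matrix like `cov` is what a simulation of the random field or a plug-in discriminant function would take as input; pairs separated along the major axis are more strongly correlated than pairs at the same Euclidean distance across it.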


GENOME-WIDE ASSOCIATION STUDIES – LOOKING BEYOND ASSOCIATIONS

Presenting author: Inke R. König Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Germany

In the past decade, numerous genome-wide association studies (GWAS) were performed to elucidate the genetic background of common complex diseases. Results from these helped to identify genetic candidates that are involved in disease development or progression or that help to predict treatment response. In addition to that, results from GWAS can also be used for broader purposes within the context of precision medicine. For example, it is assumed that polygenic risk scores building on genome-wide association results can help to predict disease risk. However, how many genetic variants should be used for scoring, and how the information across variants should be aggregated, is still under debate. Another example is the use of genetic association information in Mendelian randomization studies that might help to clarify the causality of assumed epidemiological risk factors. In this presentation, current developments, discussion points and examples for the use of GWAS results in precision medicine will be given.

Keywords: genome-wide association, precision medicine, polygenic risk score, Mendelian randomization.
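One common way to build a polygenic risk score is a weighted sum of allele dosages over the variants that pass a p-value threshold. The Python sketch below shows how the choice of threshold, which controls how many variants enter the score, changes the result. The effect sizes, p-values, dosages and thresholds are invented for illustration, and the abstract does not say which scoring method the talk discusses.

```python
def polygenic_risk_score(dosages, effect_sizes, p_values, p_threshold=5e-8):
    """Weighted sum of allele dosages over variants with p-value <= p_threshold."""
    if not (len(dosages) == len(effect_sizes) == len(p_values)):
        raise ValueError("inputs must have one entry per variant")
    return sum(beta * g
               for g, beta, p in zip(dosages, effect_sizes, p_values)
               if p <= p_threshold)


# Hypothetical GWAS summary statistics for four variants.
effect_sizes = [0.12, -0.05, 0.30, 0.02]   # per-allele log odds ratios
p_values = [1e-9, 3e-8, 2e-6, 0.04]

# Hypothetical genotype of one individual: number of effect alleles (0, 1 or 2).
dosages = [2, 1, 0, 1]

strict = polygenic_risk_score(dosages, effect_sizes, p_values, p_threshold=5e-8)
lenient = polygenic_risk_score(dosages, effect_sizes, p_values, p_threshold=0.05)
print(strict, lenient)
```

The strict score uses only the two genome-wide significant variants, while the lenient score uses all four, so the same individual receives different scores depending on the threshold.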


MACHINE LEARNING IN MULTI-OMICS DATA TO ASSESS LONGITUDINAL PREDICTORS OF GLYCAEMIC HEALTH

Presenting author: Marika Kaakinen School of Biosciences and Medicine, Department of Clinical and Experimental Medicine, University of Surrey, Guildford, United Kingdom

Co-authors: Laurie Prélot¹, Harmen Draisma², Mila Anasanti¹, Zhanna Balkhiyarova², Matthias Wielscher¹, Loic Yengo³, Beverley Balkau⁴, Ronan Roussel⁵, Sylvain Sebert⁶, Mika Ala-Korpela⁷, Philippe Froguel¹, Marjo-Riitta Jarvelin¹, Inga Prokopenko²

¹Imperial College London, London, United Kingdom; ²University of Surrey, Guildford, United Kingdom; ³The University of Queensland, Brisbane, Australia; ⁴Inserm, Villejuif, France; ⁵Inserm U1138, Paris, France; ⁶University of Oulu, Oulu, Finland; ⁷Baker Heart and Diabetes Institute, Melbourne, Australia

Multi-omics data hold enormous potential for personalised medicine; however, analysis of high-dimensional data poses challenges. Type 2 diabetes (T2D) is a global health burden that will benefit from personalised risk prediction and tracking of disease progression. We aimed to identify longitudinal predictors of glycaemic traits relevant for T2D by applying machine learning (ML) approaches to multi-omics, including epigenetic and metabolomic data, from the Northern Finland Birth Cohort 1966 (NFBC1966) at 31 (T1) and 46 (T2) years. We predicted fasting glucose/insulin (FG/FI), glycated haemoglobin (HbA1c) and 2-hour glucose/insulin (2hGlu/2hIns) at T2 in 513 individuals using 1,001 anthropometric, metabolic, metabolomic and epigenetic variables at T1 and T2. We used six ML approaches trained on 80% and tested on 20% of the data: Boosted trees (BT), Random forest (RF) and support vector regression (SVR) with Linear Kernel with L2 regularization and with L1 and L1/L2 loss functions (SVR-L2Linear-L1, SVR-L2Linear-L1L2, respectively), with Polynomial Kernel (SVR-Polynomial) and with Radial Basis function Kernel (SVR-RBF). We further validated our models trained in NFBC1966 in an independent French study with 48 matching predictors (DESIR, N=769, age range 30-65 years at recruitment, interval between data collections: 9 years). RF and BT showed consistent performance while SVMs struggled with higher-dimensional data. The predictions worked best for FG and FI. T2 branched-chain amino acids, HDL-cholesterol and body measurements already at T1 were amongst the most important predictors. Addition of methylation data did not improve the predictions; however, BMI-associated methylation probes T1_cg26361535 at ZC3H3/T1_cg00634542 at SLC11A1 were within the top 5% of predictors of FI/FG variability. With ML we could narrow down hundreds of variables into clinically relevant sets of predictors for each glycaemic trait and demonstrate the importance of longitudinal traits in prediction.

Keywords: multi-omics, machine learning, prediction, type 2 diabetes.
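The 80%/20% train/test protocol described in this abstract can be sketched with the Python standard library alone. The example below uses simulated data and a one-predictor least-squares line as a stand-in for the study's learners (boosted trees, random forest, SVR); the data-generating model, the predictor range and the helper names are assumptions for illustration only, with the sample size of 513 borrowed from the abstract.

```python
import random
from math import sqrt


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test


def fit_line(pairs):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(pairs, slope, intercept):
    """Root mean squared prediction error on the given (x, y) pairs."""
    return sqrt(sum((y - (intercept + slope * x)) ** 2 for x, y in pairs) / len(pairs))


# Simulated stand-in data: one predictor measured earlier, one outcome measured later.
rng = random.Random(1)
data = []
for _ in range(513):
    x = rng.uniform(18.0, 40.0)                  # hypothetical predictor
    y = 4.0 + 0.05 * x + rng.gauss(0.0, 0.5)     # hypothetical outcome
    data.append((x, y))

train, test = train_test_split(data)
slope, intercept = fit_line(train)               # fit on the 80% only
train_rmse = rmse(train, slope, intercept)
test_rmse = rmse(test, slope, intercept)         # evaluate on the held-out 20%
print(len(train), len(test), train_rmse, test_rmse)
```

The point of the design is that `test_rmse` is computed on rows the model never saw during fitting, so it gives a more honest estimate of predictive performance than `train_rmse`.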


TROUBLES, CHALLENGES AND DELIGHTS IN AREA OF OMICS-DATA ANALYSES FROM STATISTICIAN VIEWPOINT

Presenting author: Tanel Kaart Chair of Animal Breeding and Biotechnology, Institute of Veterinary Medicine and Animal Sciences, Estonian University of Life Sciences, Tartu, Estonia

Omics data or omics-like data are nowadays everywhere, at least from the perspective of a statistician or data scientist. The main characteristics of omics data are the huge amount of differently structured data and the complex, unknown relationships in the data. But such data are also collected in finance, linguistics, ecology, food science, neuroscience, etc. Of course, in the biomedical sciences there are many more or less specific problems, which in the present abstract are considered purely technical (bioinformatics) problems. So, let's assume that the game starts when the statistician gets the database and is asked to give some results. The first challenge can be the ability to handle large databases. But if we set this aside, then the next question is the choice of analysis method. You can apply something traditional: principal component analysis, cluster analysis, random forest, etc. And you can be successful, because the database structure and the research question were simple enough and the results are clear and logical enough. However, a statistician should know how easily one can overestimate the effects, relationships and patterns if the dataset is not balanced, if there are strongly related variables and/or dependent observations, if there are many zeros, if the number of variables is much higher than the number of observations, if the accuracy is estimated on the training dataset, etc. And the statistician is usually the only person in the workgroup who can imagine that something in the results may not be real.

So, before the analysis, the statistician should understand the background of the data and the research question, should know the nature of the variables (for example, do they express absolute or relative values?) and of the study objects (are all variables in different tables measured on the same study objects?), and must have an overview of the distribution of the variables (are some variables skewed or censored, what is the frequency of the predicted event?). And then the statistician can start to play, because at least with omics-like data there is not only one correct solution. It is good to have results from different methods and algorithms, as consistent results can indicate something real (something that is robust enough to appear regardless of the analysis method). And there are many other things to consider: what is the research question (to find patterns, or to find patterns distinguishing groups), is accuracy the whole truth, the p-value (why not, but why?), how to visualize the results, etc. Finally, it is only the statistician who is able to draw conservative but still meaningful conclusions and to defend the applied methodology :)

Keywords: omics-data, data structure, machine learning, reliability.
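The warning about estimating accuracy on the training dataset when there are far more variables than observations can be demonstrated with a small simulation. In the Python sketch below, both the variables and the class labels are pure noise, so no real pattern exists, yet a one-nearest-neighbour classifier looks perfect when scored on its own training data. The classifier, the sample sizes and the seed are illustrative choices, not taken from the talk.

```python
import random


def nearest_neighbour_label(x, train_X, train_y):
    """Label of the training row closest to x in squared Euclidean distance."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_X[i])))
    return train_y[best]


def accuracy(X, y, train_X, train_y):
    """Share of rows in X whose predicted label matches y."""
    hits = sum(nearest_neighbour_label(x, train_X, train_y) == label
               for x, label in zip(X, y))
    return hits / len(X)


rng = random.Random(42)
n_train, n_test, n_vars = 40, 200, 500   # many more variables than observations


def noise_rows(n):
    """n rows of independent standard normal noise."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n)]


train_X, test_X = noise_rows(n_train), noise_rows(n_test)
train_y = [rng.randint(0, 1) for _ in range(n_train)]   # labels unrelated to the variables
test_y = [rng.randint(0, 1) for _ in range(n_test)]

train_acc = accuracy(train_X, train_y, train_X, train_y)  # scored on the training data
test_acc = accuracy(test_X, test_y, train_X, train_y)     # scored on fresh data
print(train_acc, test_acc)
```

Each training row is its own nearest neighbour, so the training accuracy is perfect by construction, while accuracy on fresh data stays near chance level. Results of this kind are exactly what "something in the results may not be real" refers to.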


EFFECTS OF SIMPLE AND ADAPTIVE INTERVENTIONS IN THE STOCHASTIC SYSTEM APPROACH TO CAUSALITY

Presenting author: Daniel Commenges INSERM 1219 and INRIA / Bordeaux, France

Co-authors: Mélanie Prague INSERM 1219 and INRIA / Bordeaux, France

We consider the problem of defining the effect of an intervention on a time-varying risk factor or treatment for a disease or a physiological marker; we develop here the latter case. So, the system considered is (Y, A, C), where Y={Y(t)} is the marker process of interest, A={A(t)} the treatment (assumed to take values 0 or 1) and C a potential confounding factor. The marker process Y has a Doob-Meyer decomposition which specifies the "physical law" and cannot be changed (Commenges and Gégout-Petit, 2015). Y lives in continuous time but can be observed only at discrete times with a measurement error. A realistic case is that the treatment can be changed only at discrete times, according to a probability law given the past observations. In an observational study the treatment attribution law is unknown; however, the physical law can be estimated without knowing the treatment attribution law, provided a well specified model is available. An intervention is specified by the treatment attribution law, which is thus known. Simple interventions will simply randomize the attribution of the treatment; interventions that take into account the past history will be called "strategies". The effects of interventions can be defined by risk functions and contrasts between risk functions for different strategies can be formed. Once we can compute effects for any strategy, we can search for optimal or sub-optimal strategies; in particular we can find optimal parametric strategies. We present several ways for designing strategies. As an illustration, we consider the choice of a strategy for containing the HIV load below a certain level while limiting the treatment burden. A simulation study demonstrates the possibility of finding optimal parametric strategies.

Keywords: causality, interventions, stochastic processes, strategies.

References: Commenges D. and Gégout-Petit A. (2015). The stochastic system approach for estimating dynamic treatment effects. Lifetime Data Analysis 21, 561-578.


CAUSAL KINETIX: LEARNING STABLE STRUCTURES IN KINETIC SYSTEMS

Presenting author: Niklas Pfister
Swiss Federal Institute of Technology, Switzerland

Learning kinetic systems from data is one of the core challenges in many fields. Efficient computational methods to identify a robust underlying model from data are essential for the extrapolation and generalization capabilities of data-driven modeling approaches. We introduce Causal KinetiX, a novel framework for causal kinetic models used to identify structure in complex (heterogeneous, noisy and time-based) data. The algorithm only assumes the existence of invariant properties of the kinetic model and is based on a combination of smoothing techniques and model-based structure search. The results on both simulated and real-world examples suggest that learning the structure of kinetic systems indeed benefits from a causal perspective. The talk is based on joint work with Stefan Bauer and Jonas Peters. It does not require prior knowledge of causality or kinetic systems.


BASKET AND UMBRELLA TRIALS FOR PRECISION MEDICINE: OVERVIEW, STATISTICAL CONSIDERATIONS, AND EXAMPLES

Presenting author: Lindsay A. Renfro, PhD
Division of Biostatistics, University of Southern California, USA

Within the field of clinical cancer research, the discovery of biomarkers and genetic mutations that are potentially predictive of treatment benefit is motivating a paradigm shift in how cancer clinical trials are conducted. In this talk, I will provide an overview of basket and umbrella trials, which are increasingly popular clinical trial design solutions for the study of novel targeted agents. For each, I will describe standardized terminology and definitions, discuss advantages and limitations from the statistical and practical viewpoints, and provide a detailed real-world example with statistical details.

Keywords: clinical trial, basket trial, umbrella trial, master protocol, biomarker trial.

References:

1. Renfro, L.A., Mandrekar, S.J. (2018). Definitions and statistical properties of master protocols for personalized medicine in oncology. Journal of Biopharmaceutical Statistics, 28(2), 217–228.

2. Renfro, L.A., Sargent, D.J. (2017). Statistical controversies in clinical research: basket trials, umbrella trials, and other master protocols: a review and examples. Annals of Oncology, 28(1), 34–43.

3. Renfro, L.A., An, M., Mandrekar, S.J. (2017). Precision oncology: a new era of cancer clinical trials. Cancer Letters, 387, 121–126.

4. Renfro, L.A., Mallick, H., An, M., Sargent, D.J., Mandrekar, S.J. (2016). Clinical trial designs incorporating predictive biomarkers. Cancer Treatment Reviews, 43, 74–82.


HUDSON “THE PLATFORM STUDY”

Presenting author: Steven Fox
AstraZeneca R&D, United Kingdom

The talk will present AstraZeneca’s high-profile HUDSON platform. It is an open-label, multi-drug, biomarker-directed, multi-centre phase II platform study in patients with non-small cell lung cancer who progressed on a platinum doublet chemotherapy and Immuno-Oncology therapy. The talk will provide an overview of the internal decision-making framework and a comprehensive discussion of the key challenges experienced, with a focus on those encountered during pre-study set-up, during the study, with statistical reporting and with the interpretation of the results. It will close with a summary of final thoughts regarding HUDSON – the platform study.


PLATFORM DESIGNS BEYOND ONCOLOGY: BIOMARKER FINDING DESIGN IN PATIENTS WITH HEART FAILURE WITH PRESERVED EJECTION FRACTION

Presenting author: Sofia Tapani
Early Clinical Biometrics, IMED, AstraZeneca R&D, Gothenburg, Sweden

Adapting a portfolio approach to the implementation of clinical trials at the early stage has been evaluated within the oncology therapy area. This feature of clinical trial design can also add value to other therapy areas due to its potential exploratory nature. The platform design allows for multi-arm clinical trials to evaluate several experimental treatments, perhaps not all available at the same point in time. At the early clinical development stage, new drugs are rarely at the same stage of development. The alternative, several separate two-arm studies, is time-consuming and can be a bottleneck in development due to budget limitations, in comparison to the more efficient platform study, where arms are added at several different time points after the start of enrollment. Platform designs within the heart failure therapy area in early clinical development are exploratory in nature. Clear prognostic and predictive biomarker profiles for disease are not available and need to be explored and identified for each patient population. As an example, we will look at the HIDMASTER trial design for biomarker identification and compound graduation throughout the platform. All platform trials need to be thoroughly simulated, and simulations should be used as a tool to decide among design options. Simulations of platform trials give the opportunity to investigate many scenarios, including the null scenario, to establish the overall type I error. We can evaluate estimation bias and sensitivity to patient withdrawals, missing data, enrolment rates/patterns, interim analysis timings, data access delays, data cleanliness, analysis delays, etc. Simulations should also comprise decision operating characteristics, so that decisions on the design can be based on the objective of the trial: early stopping of underperforming arms, early go decisions for active arms, prioritisation of arms based on emerging data, or drawing insights from whole-study data analysis. Over time the trial learns about the disease, new endpoints, stratification biomarkers and prognostic vs predictive effects.

Keywords: platform trials, basket trial, innovative trial design, clinical trial simulation, biomarker-finding.
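To make the idea of simulating the null scenario concrete, here is a minimal sketch of how an overall (family-wise) type I error might be estimated for a multi-arm design with a shared control. It is not the HIDMASTER simulation: the function name, the normal endpoint with known unit variance, the one-sided z-test and all default values are assumptions for illustration, and staggered arm entry and interim analyses are deliberately left out.

```python
import random
from statistics import NormalDist


def simulate_fwer(n_arms=3, n_per_arm=50, alpha=0.05, n_sims=2000, seed=1):
    """Estimate the family-wise type I error under the global null.

    Hypothetical setup: every arm, including the shared control, has a
    standard normal endpoint, and each experimental arm is compared with
    control by a one-sided z-test at level alpha (variance assumed known).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    se = (2 / n_per_arm) ** 0.5  # standard error of a difference in means

    def arm_mean():
        return sum(rng.gauss(0, 1) for _ in range(n_per_arm)) / n_per_arm

    trials_with_false_positive = 0
    for _ in range(n_sims):
        control = arm_mean()
        # A simulated trial counts as an error if any arm beats control.
        rejected = any((arm_mean() - control) / se > z_crit for _ in range(n_arms))
        trials_with_false_positive += rejected
    return trials_with_false_positive / n_sims


if __name__ == "__main__":
    print(f"Estimated family-wise type I error: {simulate_fwer():.3f}")
```

Because all arms share one control group, the per-arm tests are positively correlated, so the estimate should fall between the per-arm level and the Bonferroni bound `n_arms * alpha`. The same loop can be extended with dropout, enrolment patterns or interim looks to study the sensitivities listed in the abstract.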


REGION-BASED AND VOXEL-WISE ANALYSIS OF MEDICAL IMAGES USING LATENT VARIABLES

Presenting author: Brice Ozenne
Section of Biostatistics, University of Copenhagen, Denmark; Neurobiology Research Unit, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark.

Co-authors: Esben Budtz-Jørgensen (1), Martin Nørgaard (2)
(1) Section of Biostatistics, University of Copenhagen, Copenhagen, Denmark; (2) Neurobiology Research Unit, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark.

Investigating the relationship between a 3-dimensional signal and one or several exposures is a common problem in neuroscience. For instance, studies on seasonal depression would like to assess how the environment affects the serotonin brain system using Positron Emission Tomography (PET) measurements of the density of serotonin receptors over the whole brain. This type of study is typically characterized by a small sample (n<50), a very large number of measurements per subject, and a complex correlation structure between these measurements. In this talk, we present two strategies that are often used in our research unit (NRU - https://nru.dk/) to perform such studies. Both strategies introduce latent variables as a way to perform dimension reduction, account for the spatial correlation, and possibly identify underlying biological mechanisms. The first strategy is performed at the regional level, i.e. expert knowledge is used to identify a set of relevant regions and the average signal of each region is estimated. Then a Latent Variable Model (LVM) is used to relate the regional signal to the clinical variables. The second strategy performs a voxel-wise analysis using Partial Least Squares (PLS). After briefly describing LVMs and PLS, we will discuss techniques that we use to specify these models and to perform statistical inference.

Keywords: latent variables, multivariate models, dimension reduction, neuroimaging.


ADAPTIVE SMOOTHING DATA FROM MULTI-PARAMETER MAPPING

Presenting author: Karsten Tabelow
Weierstrass Institute for Applied Analysis and Stochastics, Berlin, Germany

Most MRI neuroimaging studies in the last 25 years utilized classical weighted imaging sequences with enhanced contrast between tissues with different T1 or T2 relaxation times. However, the acquired signals in such weighted images are given in arbitrary rather than physical units and strongly depend on the acquisition details. This renders comparison across time points in longitudinal studies or between subjects in multi-site experiments difficult. Thus, in recent years, interest has increased in quantitative MRI, which infers quantities with a physical interpretation. One example is diffusion MRI, which focuses on the directionally dependent water diffusion constant. Another example is the Multi-Parameter Mapping (MPM) sequence, which is able to produce maps of the R1 or R2* relaxation rates, the proton density, and the magnetization transfer. Like all imaging modalities, MPM suffers from noise. In this talk, I will introduce the analysis pipeline of MPM data (Tabelow et al., 2019) and propose a new structural adaptive noise reduction method (Mohammadi et al., 2017) that is able to reduce the noise in the maps without blurring important details in the images. I will also discuss a bias correction method that is necessary due to the very low signal-to-noise ratio in the data (Tabelow et al., 2017).

Keywords: quantitative MRI, Multi-Parameter Mapping, structural adaptive smoothing, bias correction.

References:

1. Tabelow et al. (2019) hMRI – A toolbox for quantitative MRI in neuroscience and clinical research, NeuroImage, 194, 191-201, doi: 10.1016/j.neuroimage.2019.01.029

2. Mohammadi et al. (2017) Simultaneous adaptive smoothing of relaxometry and quantitative magnetization transfer mapping, Preprint no. 2432, WIAS, Berlin, doi: 10.20347/WIAS.PREPRINT.2432

3. Tabelow et al. (2017) Removing the estimation bias due to the noise floor in multi-parameter maps, The International Society for Magnetic Resonance in Medicine (ISMRM) 25th Annual Meeting & Exhibition, Honolulu, USA, April 22 - 27, 2017.


CHANGING DATA-ANALYSIS REGIMES IN BIG BIOMEDICAL DATA

Presenting author: Julius Kernbach
RWTH Aachen University, Germany

Biomedical datasets are rapidly growing in information granularity, sample size, and the complexity of meta-information. These emerging opportunities open ever more biological and medical fields to data-led research programs that leverage advanced machine-learning techniques. In several examples from the area of population brain imaging, I will explore and discuss how long-standing neuroscience questions can be revisited and reformulated as pattern-learning problems, and how new insights can be translated back into the application domain. Special attention will be devoted to i) how discovery of salient structure in high-dimensional data may be directly integrated with achieving accurate predictions at the single-subject level, ii) how such individualized predictions may be complementary to the statistically significant group differences that are foundational for evidence-based medicine, and iii) how recent extensions of classical analysis methods can jointly appreciate data from different levels of observation. Machine learning is a core technology that offers new strategies for generating knowledge from current and future large-scale datasets, to extend our understanding of, and enable rigorous data-guided decisions about, human biology and disease trajectories.


A BAYESIAN GENERAL LINEAR MODELING APPROACH TO CORTICAL SURFACE FMRI DATA ANALYSIS

Presenting author: David Bolin
Department of Mathematical Sciences, University of Gothenburg, Gothenburg, Sweden

Co-authors: Amanda F. Mejia (1), Yu Ryan Yue (2), Finn Lindgren (3), Martin A. Lindquist (4)
(1) Indiana University, Bloomington, IN 47405; (2) Baruch College, The City University of New York, New York, NY 10010; (3) The University of Edinburgh, Edinburgh, UK; (4) Johns Hopkins University, Baltimore, MD 21205.

Cortical surface fMRI (cs-fMRI) has recently grown in popularity versus traditional volumetric fMRI, as it allows for more meaningful spatial smoothing and is more compatible with the common assumptions of isotropy and stationarity in Bayesian spatial models. However, as no Bayesian spatial model has been proposed for cs-fMRI data, most analyses continue to employ the classical, voxel-wise general linear model (GLM). Here, we propose a Bayesian GLM for cs-fMRI, which employs a class of spatial processes based on stochastic partial differential equations to model latent activation fields. Bayesian inference is performed using integrated nested Laplace approximations (INLA), which is a computationally efficient alternative to Markov Chain Monte Carlo. To identify regions of activation, we propose an excursions set method based on the joint posterior distribution of the latent fields, which eliminates the need for multiple comparisons correction. Finally, we address a gap in the existing literature by proposing a Bayesian approach for multi-subject analysis. The methods are validated and compared to the classical GLM through simulation studies and a motor task fMRI study from the Human Connectome Project. The proposed Bayesian approach results in smoother activation estimates, more accurate false positive control, and increased power to detect truly active regions.

Keywords: spatial statistics; smoothing; integrated nested Laplace approximation; stochastic partial differential equation; brain imaging.


THE MECHANISTIC-STATISTICAL APPROACH APPLIED TO SPATIO-TEMPORAL POPULATION DYNAMICS AT DIFFERENT SCALES

Presenting author: Samuel Soubeyrand
French National Institute for Agricultural Research, Avignon, France

Population dynamics can be described as the study of the structure, the pattern and the drivers of populations. Such studies are carried out at scales ranging from the microscopic to the global and are of particular interest in ecology and epidemiology. Numerous approaches have been proposed to mathematically model and statistically infer spatio-temporal population dynamics. In this talk, I will present the mechanistic-statistical approach, which is a framework embedded in state-space modeling and which combines a mechanistic vision of the dynamics, a probabilistic vision of the observation processes and a statistical method for the estimation of parameters and latent processes. I will illustrate the application of this framework with different formalisms (e.g. point processes, partial differential equations…), in diverse settings and at diverse scales and resolutions. One of the applications will deal with the dynamics of Xylella fastidiosa, a quarantine plant pathogen recently discovered in Europe, in Corsica, France.

Keywords: parameter estimation, partial differential equations, population dynamics models, spatio-temporal point processes, Xylella fastidiosa.

References:

1. Abboud C., Bonnefon O., Parent E., Soubeyrand S. (2018). Dating and localizing an invasion from post-introduction data and a coupled reaction-diffusion-absorption model. arXiv preprint arXiv:1808.00868.

2. Abboud C., Senoussi R. and Soubeyrand S. (2018). Piecewise-deterministic Markov processes for spatio-temporal population dynamics. In Azaïs R., Bouguet R. (Eds). Statistical Inference for Piecewise-deterministic Markov Processes. John Wiley & Sons.

3. Soubeyrand S., de Jerphanion P., Martin O., Saussac M., Manceau C., Hendrikx P., Lannou C. (2018). Inferring pathogen dynamics from temporal count data: the emergence of Xylella fastidiosa in France is probably not recent. New Phytologist 219: 824-836.

4. Soubeyrand S., Roques L. (2014). Parameter estimation for reaction-diffusion models of biological invasions. Population Ecology 56: 427-434.


STATISTICAL MODELING OF ANIMAL TELEMETRY DATA AT MULTIPLE TEMPORAL RESOLUTIONS: HIDDEN MARKOV MODELS AND EXTENSIONS

Presenting author: Timo Adam
Bielefeld University, Bielefeld, Germany

Hidden Markov models are prevalent in the field of animal movement modeling, where they are widely used to infer behavioral modes from various types of telemetry data. In its basic form, a hidden Markov model comprises a single observed movement process that is driven by a single hidden state process, the latter of which is typically linked to behavioral modes such as resting, foraging, or traveling. In ecological applications, it is of particular interest to model the effect of individual and environmental variables on state occupancy. To allow for meaningful inference, the observations to be modeled need to be equally spaced in time, implying that all variables need to be collected at the same temporal resolution (e.g. hourly, dive-by-dive, or every second). However, recent advances in bio-logging technology have led to a variety of novel telemetry sensors which often collect data from the same individual at different time scales. Typical examples are step lengths obtained from GPS tags every hour, dive depths obtained from time-depth recorders once per dive, or overall dynamic body accelerations obtained from accelerometers several times per second. This offers the opportunity to jointly model observations from different data sources at multiple time scales, in particular to minimize the effect of the often arbitrarily chosen time intervals between consecutive observations, and ultimately to draw a much more comprehensive picture of an animal's behavior. To account for the differing temporal resolutions across variables, hierarchical hidden Markov models are presented, where the observations are regarded as stemming from several connected hidden state processes, each of which operates at the time scale at which the corresponding variables were observed. The suggested approach is illustrated by jointly modeling daily horizontal and 10-minute vertical movements of an Atlantic cod (Gadus morhua) throughout the English Channel, where it is demonstrated how the proposed framework can be used to infer different kinds of behaviors, some of which operate at relatively crude time scales and others at relatively fine ones.

Keywords: animal movement modeling, statistical ecology, temporal resolution, time series modeling.

References:

1. Adam, T., Griffiths, C.A., Leos-Barajas, V., Meese, E.N., Lowe, C.G., Blackwell, P.G., Righton, D., and Langrock, R. (2019): Joint Modelling of Multi-Scale Animal Movement Data Using Hierarchical Hidden Markov Models. Available on request.

2. Leos-Barajas, V., Gangloff, E.J., Adam, T., Langrock, R., van Beest, F.M., Nabe-Nielsen, F., and Morales, J.M. (2017): Multi-scale Modeling of Animal Movement and General Behavior Data Using Hidden Markov Models with Hierarchical Structures. Journal of Agricultural, Biological, and Environmental Statistics 22 (3), 232-248.
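
As background for the basic (non-hierarchical) hidden Markov model described in this abstract, the following minimal sketch evaluates the likelihood of an observation sequence with the scaled forward algorithm. It is not the authors' implementation: the function name, the Gaussian state-dependent distributions and all parameter values are assumptions chosen for illustration (step lengths, for instance, are often modeled with other distributions), and the hierarchical extension is not shown.

```python
import math


def hmm_log_likelihood(obs, init, trans, means, sds):
    """Log-likelihood of `obs` under an HMM, via the scaled forward algorithm.

    init[k]     : probability of starting in state k
    trans[j][k] : probability of moving from state j to state k
    means, sds  : parameters of the (assumed) Gaussian state-dependent
                  distribution of each state
    """

    def normal_pdf(x, mu, sd):
        return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

    states = range(len(init))
    # Forward probabilities for the first observation.
    alpha = [init[k] * normal_pdf(obs[0], means[k], sds[k]) for k in states]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the
            # density of the current observation in each state.
            alpha = [
                sum(alpha[j] * trans[j][k] for j in states)
                * normal_pdf(obs[t], means[k], sds[k])
                for k in states
            ]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik


if __name__ == "__main__":
    # Hypothetical two-state example: "resting" (small values) vs "traveling".
    obs = [0.1, 0.3, 2.1, 1.8, 0.2]
    print(hmm_log_likelihood(obs, [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [0.0, 2.0], [0.5, 0.5]))
```

Maximizing this function over the parameters gives maximum likelihood estimates; the forward recursion requires equally spaced observations, which is the restriction that motivates the multi-scale extension discussed in the abstract.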


INTENSITY MODEL BASED ON MULTIDIMENSIONAL SMOOTHING OF HAZARD FUNCTIONS: PROJECTING HEALTHY LIFE YEARS

Presenting author: Tommi Härkänen
Department of Public Health Solutions, National Institute for Health and Welfare, Helsinki, Finland

Co-authors: Laura Sares-Jäske, Paul Knekt, Markku Peltonen, Seppo Koskinen
Department of Public Health Solutions, National Institute for Health and Welfare, Helsinki, Finland

Projections of healthy and diseased life years are needed to illustrate the importance of risk factors, and to estimate the disease burden. Often the risk factors of morbidity are also risk factors of mortality. Multiple timescales induced by age and time since onset of a disease can have a joint effect on the risk of death, and possible interactions cannot be handled using the Cox and Poisson regression models assuming multiplicative hazards. We apply Bayesian intensity models based on one-dimensional piecewise constant hazard functions. They are parameterized by jump points and the corresponding hazard levels between them. Multidimensional smoothing is incorporated in the prior distributions of the hazard levels both within and between the hazard functions of neighboring strata (Härkänen et al. 2017). Posterior distributions are calculated numerically using reversible jump Markov chain Monte Carlo methods to allow addition and deletion of jump points. Projected healthy and diseased life years are calculated using the predictive distribution of disease onset and death times. The Finnish Mobile Clinic, Mini-Finland and Health 2000 Surveys provide baseline risk factor data on smoking, alcohol use, body mass index, a diet quality index and physical exercise. These data have been individually linked with register-based follow-up data on selected chronic disease onset times and time of death. Multidimensional smoothing provided a flexible modeling approach to account for possible interactions both between the risk factors and changes in the hazard functions during the follow-up.

Keywords: time-to-event data, cohort study, Bayesian inference, nonparametric methods.

References: Härkänen, T., But, A., & Haukka, J. (2017). Non-parametric Bayesian Intensity Model: Exploring Time-to-Event Data on Two Time Scales. Scandinavian Journal of Statistics, 44(3), 798-814.
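
The parameterization by jump points and hazard levels can be illustrated with a short sketch that turns a one-dimensional piecewise constant hazard into a survival probability. This only shows the deterministic building block; the function names and the numbers in the example are hypothetical, and the Bayesian smoothing priors and reversible jump sampling described in the abstract are not reproduced here.

```python
import math


def cumulative_hazard(t, jump_points, levels):
    """Integrate a piecewise constant hazard from 0 to t.

    jump_points : increasing positive times at which the hazard changes
    levels      : hazard level on each interval, so that
                  len(levels) == len(jump_points) + 1
    """
    edges = [0.0] + list(jump_points) + [math.inf]
    total = 0.0
    for left, right, level in zip(edges[:-1], edges[1:], levels):
        if t <= left:
            break
        # Add the hazard level times the time spent in this interval.
        total += level * (min(t, right) - left)
    return total


def survival(t, jump_points, levels):
    """Survival probability S(t) = exp(-cumulative hazard up to t)."""
    return math.exp(-cumulative_hazard(t, jump_points, levels))


if __name__ == "__main__":
    # Hypothetical hazard: 0.1 on [0, 1), 0.2 on [1, 3), 0.5 from 3 onwards.
    print(survival(4.0, [1.0, 3.0], [0.1, 0.2, 0.5]))
```

In this representation, adding or deleting a jump point changes the dimension of the parameter vector, which is why a trans-dimensional sampler is needed in the full model.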


CHALLENGES OF SURVIVAL ANALYSIS IN POPULATION-BASED BIOBANK COHORTS

Presenting author: Krista Fischer
Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia; Estonian Genome Center, Institute of Genomics, University of Tartu, Estonia

As not only the sample size but also the follow-up time in large-scale population-based biobank cohorts has increased considerably during the past decade, survival analysis methodology has found increased usage for those cohorts. However, there are different challenges due to some specific features of such data as well as the research questions of interest. We discuss some of them and illustrate the issues using the data of the Estonian Biobank. Especially for the analysis of time to death, one should account for left-truncation in the data, as the biobank cohorts are usually not birth cohorts, but individuals at various ages have joined them. We will show that proper adjustment for left-truncation leads to unbiased estimates of hazard ratios (in Cox proportional hazards models) but reduces power compared to approaches that ignore it. We therefore use simulations to understand whether and when a biased estimate could still be recommended for such data. As the sizes of the largest biobank cohorts exceed 100,000, the algorithm for fitting the conventional Cox proportional hazards model is relatively slow. We have proposed a 2-step martingale residual-based approach to reduce the analysis time of genome-wide association studies (Joshi et al. 2016). We study the performance of this approach under different scenarios to understand its limitations. Finally, the challenges of absolute risk prediction using genomic data will be briefly discussed.

Keywords: survival analysis, genome-wide association studies.

References: Joshi, PK; Fischer, K; Schraut, KE; Campbell, H; Esko, T; Wilson, JF (2016). Variants near CHRNA3/5 and APOE have age- and sex-related effects on human lifespan. Nature Communications, 7, 11174. doi: 10.1038/ncomms11174.
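
The left-truncation adjustment amounts to letting each participant enter the risk set only at the age of joining the cohort. The abstract concerns Cox models; as a simpler stand-in, the sketch below applies the same risk-set rule to a Kaplan-Meier estimator. The function name and the four-subject data set are hypothetical.

```python
def kaplan_meier_delayed_entry(entry, exit_time, event):
    """Kaplan-Meier estimate with delayed entry (left truncation).

    Subject i contributes to the risk set at time t only while
    entry[i] < t <= exit_time[i]. event[i] is 1 for a death at exit_time[i]
    and 0 for censoring. Returns a list of (event time, survival) pairs.
    """
    event_times = sorted({t for t, d in zip(exit_time, event) if d})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for a, b in zip(entry, exit_time) if a < t <= b)
        deaths = sum(1 for b, d in zip(exit_time, event) if d and b == t)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


if __name__ == "__main__":
    # Hypothetical ages at entry and exit; the last two subjects join at age 2.
    entry = [0, 0, 2, 2]
    exit_time = [1, 3, 3, 4]
    event = [1, 1, 0, 1]
    print("adjusted:", kaplan_meier_delayed_entry(entry, exit_time, event))
    # Ignoring left truncation treats everyone as followed from time 0.
    print("naive:   ", kaplan_meier_delayed_entry([0] * 4, exit_time, event))
```

In this toy example the naive analysis counts the two late entrants as having been at risk before they joined, which inflates the early risk set (four subjects instead of two at the first event) and overstates early survival. The adjusted risk sets are smaller, which is one way to see the loss of power mentioned in the abstract.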


COMBINING PARENTAL AND OFFSPRING DATA FOR SURVIVAL ANALYSIS IN POPULATION-BASED BIOBANK COHORTS

Presenting author: Merli Mändul (1, 2)

Co-authors: Krista Fischer (1, 2), Märt Möls (1)
(1) Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia; (2) Estonian Genome Center, Institute of Genomics, University of Tartu, Estonia

The search for genomic predictors of lifespan has been the focus of many studies so far. In recent years, this question has been explored by genome-wide association studies of population-based biobank cohorts. Although the biobank cohorts are often very large, those studies may still lack power due to a relatively short follow-up time. Therefore, it has been proposed to use parental survival times as outcomes in such analyses, whereas offspring genotypes are used as covariates. This is justified as individuals inherit their genotypes from their parents: if a risk-increasing genetic variant is present in the genotype of one individual, it should also be present in the genotype of one of the parents. Thus intuitively, if a considerable proportion of the parents of the biobank participants are dead (and their survival times are known), a test for association between parental survival and offspring genotype may have more power than a survival analysis in the offspring alone. We will show, however, that if the Cox proportional hazards model holds for the genotype-survival association in parents, it does not hold when the offspring genotypes are used as covariates. As the approach is still valid for testing the null hypothesis of no association, we use simulations to explore the magnitude of bias in parameter estimates in more detail. We also compare the power of the approach that uses parental survival data with that of the analysis that only uses the data of biobank participants. The methodology will be illustrated using the Estonian Biobank data.

Keywords: survival analysis, genome-wide association studies.
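
One way to see why proportionality is lost is the following sketch, which is not taken from the talk. The notation is assumed here: $G_p$ is the parental genotype, $G_o$ the offspring genotype, $\lambda_0$ and $\Lambda_0$ the baseline hazard and cumulative baseline hazard, and $\pi_{k \mid g} = P(G_p = k \mid G_o = g)$. It also assumes that parental survival depends on the offspring genotype only through the parental genotype. If the Cox model $\lambda(t \mid G_p = k) = \lambda_0(t) e^{\beta k}$ holds in parents, parental survival given the offspring genotype is a mixture:

$$
S(t \mid G_o = g) = \sum_{k} \pi_{k \mid g} \exp\left\{-\Lambda_0(t)\, e^{\beta k}\right\}
$$

$$
\lambda(t \mid G_o = g) = \lambda_0(t)\,
\frac{\sum_{k} \pi_{k \mid g}\, e^{\beta k} \exp\left\{-\Lambda_0(t)\, e^{\beta k}\right\}}
     {\sum_{k} \pi_{k \mid g} \exp\left\{-\Lambda_0(t)\, e^{\beta k}\right\}}
$$

For $\beta \neq 0$ the ratio on the right changes with $t$, so the hazards for two offspring genotypes are not proportional. For $\beta = 0$ it equals 1 for every $g$, which is consistent with the statement that the approach remains valid for testing the null hypothesis of no association.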


EQUIVALENCE, SIMILARITY AND COMPARABILITY: HOW EQUAL DO WE WANT THE OUTCOMES TO BE?

Presenting author: Martin Otava
Manufacturing and Applied Statistics, Janssen Pharmaceutical Companies of Johnson & Johnson, Prague, Czech Republic

Co-authors: Francisca Galindo Garre
Manufacturing and Applied Statistics, Janssen Pharmaceutical Companies of Johnson & Johnson, Leiden, The Netherlands

The task of demonstrating some form of similarity (also called comparability, equivalence, homogeneity, etc.) between two samples of certain properties (e.g. two different measurement methods) appears across a wide range of fields, but the practical solutions are often subject to confusion and misinterpretation. This presentation will start with basic misconceptions, such as the erroneous use of difference testing for equivalence-based scientific questions and disregard for the role of paired observations. The main focus will be on the fundamental difference between population-level statements (mean, SD or interval similarity) and individual-level statements (single measurement similarity). With respect to population-level approaches, the role of prediction and tolerance intervals will be explored, with emphasis on the benefits and pitfalls of the Bayesian representation. In terms of individual-level similarity, besides frequentist evaluation of paired differences, the Bayesian modelling framework will be leveraged to construct relative measures of similarity under various scenarios by translating the posterior individual error distribution into probabilistic statements. The case studies will focus on pharmaceutical manufacturing applications, where similarity-related questions appear in contexts such as demonstrating no impact of a process change, lab or equipment comparability, method validation, etc.

Keywords: equivalence, comparability, similarity, Bayesian modelling, distribution comparison.
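
The contrast between difference testing and equivalence testing for paired observations can be illustrated with the standard two one-sided tests (TOST) procedure. This is a generic textbook technique, not the method of the talk, and the sketch makes simplifying assumptions: the function name and data are hypothetical, and a normal approximation replaces the t distribution to keep the example dependency-free.

```python
from statistics import NormalDist, mean, stdev


def tost_paired(differences, margin, alpha=0.05):
    """Two one-sided tests for equivalence of paired measurements.

    H0: |mean difference| >= margin   vs   H1: |mean difference| < margin.
    Uses a normal approximation (an assumption; a t distribution would be
    more appropriate for small samples). Returns (p_value, is_equivalent).
    """
    n = len(differences)
    d_bar = mean(differences)
    se = stdev(differences) / n ** 0.5
    # Test 1: reject "mean difference <= -margin" for large positive z.
    p_lower = 1 - NormalDist().cdf((d_bar + margin) / se)
    # Test 2: reject "mean difference >= +margin" for large negative z.
    p_upper = NormalDist().cdf((d_bar - margin) / se)
    p_value = max(p_lower, p_upper)  # both one-sided tests must reject
    return p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical paired differences between two measurement methods.
    diffs = [0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 0.0]
    print(tost_paired(diffs, margin=0.5))
```

A non-significant difference test on the same data would not by itself demonstrate similarity. Here equivalence is claimed only when the data rule out mean differences as large as the pre-specified margin, which is a population-level statement in the sense used in the abstract.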


TESTING EQUIVALENCE OF SURVIVAL BEFORE BUT NOT AFTER END OF FOLLOW-UP

Presenting author: Christian Bressen Pipper
Biostatistics 1, LEO Pharma A/S, Ballerup, Denmark

Co-authors: Julie K. Furberg (1), Thomas H. Scheike (2)
(1) Biostatistics semiglutide s.c., Novo Nordisk A/S, Copenhagen, Denmark; (2) Section of Biostatistics, University of Copenhagen, Copenhagen, Denmark

For equivalence trials with survival outcomes, a popular testing approach is the elegant test for equivalence of two survival functions suggested by Wellek (1993). This test evaluates whether or not the difference between the true survival curves is practically irrelevant by specifying an equivalence margin on the hazard ratio under the proportional hazards assumption. However, this approach is based on extrapolating the behavior of the survival curves to the whole time axis, whereas in practice survival times are only observed until the end of follow-up. We propose a modification of Wellek's test that only addresses equivalence until the end of follow-up, and we derive the large sample properties of this test. We compare our suggestion with the traditional Wellek equivalence test by simulation and highlight some of the differences.

Keywords: equivalence testing, survival data, Cox model, proportional hazards assumption.

References: Wellek, S. (1993). A log-rank test for equivalence of two survivor functions. Biometrics 49, 877-881.


A NOVEL JOINT MODELLING APPROACH TO ESTIMATING TREATMENT EFFECTS ON COPD EXACERBATIONS IN THE PRESENCE OF DIFFERENTIAL DISCONTINUATIONS

Presenting author: Alexandra Jauhiainen BioPharma Early Biometrics and Statistical Innovation, Data Science & AI, R&D BioPharmaceuticals, AstraZeneca, Gothenburg, Sweden

Co-authors: Agnieszka Król 1,2, Robert Palmér 2, Virginie Rondeau 3, Ulf Eriksson 2 1 BioPharma Early Biometrics and Statistical Innovation, Data Science & AI, R&D BioPharmaceuticals, AstraZeneca, Gothenburg, Sweden; 2 Quantitative Clinical Pharmacology, Clinical Pharmacology and Safety Sciences, R&D BioPharmaceuticals, AstraZeneca, Gothenburg, Sweden; 3 Biostatistics Team, INSERM CR1219, University of Bordeaux, Bordeaux, France. COPD clinical trials aimed at evaluating long-term treatment effects on exacerbations often suffer from a high rate of patient discontinuations. Discontinuations should be considered in the statistical evaluation of study results, especially when differing between treatment arms. We aimed to quantify the association between COPD exacerbation and discontinuation risks, and to evaluate the impact of this association on exacerbation treatment effect estimates, using a joint frailty model approach [1]. A model describing the hazards of recurrent episodes of exacerbations and early discontinuations was developed. The two risk processes were coupled using a gamma distributed shared random effect (frailty), where the effect of the frailty in the discontinuation hazard is scaled using an association parameter (α):

$$r_{ex,ij}(t \mid u_i) = Y_i^r(t)\, u_i\, r_0(t)\, \exp\left(\boldsymbol{x}_{ex,ij}^T \boldsymbol{\beta}_{ex}\right)$$

$$\lambda_{ed,i}(t \mid u_i) = u_i^{\alpha}\, \lambda_0(t)\, \exp\left(\boldsymbol{x}_{ed,i}^T \boldsymbol{\beta}_{ed}\right)$$

Here, $r_{ex,ij}$ and $\lambda_{ed,i}$ are the patient-specific hazards for recurrent exacerbations (indexed by j) and early discontinuation, respectively. The variable $u_i$ denotes the frailty for patient i, and the $\boldsymbol{x}$s and $\boldsymbol{\beta}$s are the covariates and their related regression coefficients. $r_0$ and $\lambda_0$ denote the population baseline hazards and $Y_i^r$ is the at-risk process for patient i.

The joint frailty model was applied both to simulated data and to data from randomized controlled trials in patients with moderate to severe COPD [2-4] and compared to standard models. Early discontinuations were defined as any discontinuation of investigational product before the predefined end of study, irrespective of reason. For clinical trial data, significant (p<1-4) and similar associations between exacerbations and discontinuations were found in all trials. The differences in treatment effect estimates between the joint frailty model and simpler models ranged from 1-8 percentage points. The use of a joint frailty modelling approach can reduce bias and improve precision in treatment effect estimates in the presence of informative censoring.

Keywords: COPD, early discontinuation, joint frailty model, recurrent events, survival analysis.

References:

1. Król et al., J Stat Softw 2017;81(3):1-52.

2. Rennard et al., Drugs 2009;69(5):549-565.

3. Sharafkhaneh et al., Respir Med 2012;106(2):257-268.

4. Tashkin et al., Drugs 2008;68(14):1975-2000.
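To make the coupling between the two hazards concrete, the following is a minimal simulation sketch rather than the authors' implementation. It assumes constant baseline hazards, a single binary treatment covariate and a gamma frailty with mean 1 and variance `theta`; the function name `simulate_patient` and all parameter values are hypothetical choices for illustration.

```python
import math
import random


def simulate_patient(rng, treated, theta=0.5, alpha=1.0, r0=1.0,
                     lam0=0.3, beta_ex=-0.3, beta_ed=-0.2, tau=1.0):
    """Simulate one patient under a joint frailty model (illustrative values)."""
    # Shared frailty u_i ~ Gamma(shape=1/theta, scale=theta): mean 1, variance theta.
    u = rng.gammavariate(1.0 / theta, theta)
    # Exacerbation hazard: u * r0 * exp(beta_ex * x), constant baseline assumed.
    rate_ex = u * r0 * math.exp(beta_ex * treated)
    # Discontinuation hazard: u^alpha * lambda0 * exp(beta_ed * x).
    rate_ed = (u ** alpha) * lam0 * math.exp(beta_ed * treated)
    t_ed = rng.expovariate(rate_ed)
    follow_up = min(t_ed, tau)  # the at-risk process ends here
    # Recurrent events: homogeneous Poisson process observed while at risk.
    events = []
    t = rng.expovariate(rate_ex)
    while t < follow_up:
        events.append(t)
        t += rng.expovariate(rate_ex)
    return {"frailty": u, "treated": treated, "events": events,
            "follow_up": follow_up, "discontinued": t_ed < tau}


rng = random.Random(2019)
patients = [simulate_patient(rng, treated=i % 2) for i in range(2000)]
early = [p for p in patients if p["discontinued"]]
print(len(early), "of", len(patients), "simulated patients discontinued early")
```

With a positive `alpha`, patients with a high frailty both exacerbate more often and leave the study earlier, which is the informative censoring the abstract refers to; setting `alpha=0` decouples the two processes.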


A TWO-LEVEL MATCHING ALGORITHM FOR A MULTI-CENTER CASE-CONTROL STUDY USING REGISTRY DATA

Presenting author: Benjamin Mayer Institute of Epidemiology and Medical Biometry, Ulm University, Ulm, Germany

Background: A lack of structural equality is a major issue to be addressed in observational studies. On the other hand, their advantages over randomized controlled trials are often reduced effort in data collection as well as more realistic effect estimates due to increased external validity. Numerous approaches have been developed to account for covariates which may be unequally distributed in comparison groups, including multiple regression, subgroup analysis, and matched case-control designs. The latter has often been described as a useful tool if extensive control data sets are available, i.e. the pool of possible controls for each case is comparatively large.

Methods: A two-level matching algorithm is presented which enables a multi-center case-control study to be conducted. In particular, the algorithm makes it possible to define the matching strategy as a combination of an exact matching approach and a subsequent consideration of further matching variables, which are controlled by means of any distance measure, e.g. Euclidean distance or propensity score. The presented algorithm is applied to a case-control-based study on the treatment effect of an anti-leukemic drug using different registries as source data. Furthermore, a concept is presented to evaluate the quality of the applied matching.

Results: Applying the presented matching algorithm to the demographic and clinical data yielded well-balanced comparison groups in a 1:2 ratio (cases:controls). The most important covariates associated with acute myeloid leukemia were used as first-level (exact) matching criteria, and the distribution of further covariates was controlled by means of propensity score matching (second-level matching). In general, the quality of the matching and thus the internal validity were found satisfactory regarding all matching variables used. No statistically significant treatment effect could be demonstrated.

Discussion: This two-level matching algorithm is a very flexible and useful approach to finding comparable cases and controls in observational data. It is able to increase structural equality by balancing the most important covariates, which might be of different importance for the matching process. It has been implemented in an object-oriented manner in the statistical software SAS (version 9.4), which therefore offers high flexibility regarding its application to various data analysis projects. The application of the presented algorithm in the course of a clinical evaluation of the therapeutic effect of an anti-leukemic drug was approved by the European Medicines Agency. This demonstrates the acceptance of the use of observational data in clinical studies, assuming that respective measures are taken towards a maximal structural equality of comparison groups.

Keywords: case-control study, Euclidean distance, matching, propensity score.
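The two matching levels can be illustrated with a short Python sketch. This is not the SAS implementation described in the abstract: it assumes greedy nearest-neighbour selection without replacement at the second level, and the function name `two_level_match` and the record fields (`id`, `sex`, `aml_type`, `ps`) are hypothetical.

```python
def two_level_match(cases, controls, exact_keys, distance, ratio=2):
    """Greedy two-level matching: exact strata first, then nearest by distance."""
    # Level 1: group controls into strata defined by the exact-matching variables.
    pools = {}
    for ctrl in controls:
        pools.setdefault(tuple(ctrl[k] for k in exact_keys), []).append(ctrl)

    matches = {}
    for case in cases:
        pool = pools.get(tuple(case[k] for k in exact_keys), [])
        # Level 2: rank the remaining controls of the stratum by the distance measure.
        pool.sort(key=lambda ctrl: distance(case, ctrl))
        matches[case["id"]] = [ctrl["id"] for ctrl in pool[:ratio]]
        del pool[:ratio]  # matching without replacement
    return matches


cases = [
    {"id": "c1", "sex": "F", "aml_type": "A", "ps": 0.30},
    {"id": "c2", "sex": "M", "aml_type": "A", "ps": 0.60},
]
controls = [
    {"id": "k1", "sex": "F", "aml_type": "A", "ps": 0.10},
    {"id": "k2", "sex": "F", "aml_type": "A", "ps": 0.28},
    {"id": "k3", "sex": "F", "aml_type": "A", "ps": 0.35},
    {"id": "k4", "sex": "M", "aml_type": "A", "ps": 0.90},
    {"id": "k5", "sex": "M", "aml_type": "B", "ps": 0.60},
]


def ps_distance(case, ctrl):
    # Absolute difference in (precomputed) propensity scores.
    return abs(case["ps"] - ctrl["ps"])


result = two_level_match(cases, controls, ["sex", "aml_type"], ps_distance)
print(result)  # {'c1': ['k2', 'k3'], 'c2': ['k4']}
```

Control `k5` has the same propensity score as case `c2` but is never considered because it fails the exact criterion, and `c2` ends up with a single match because its stratum holds only one control. Any other distance, such as a Euclidean distance over several covariates, can be passed in place of `ps_distance`.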


EXPLORING THE INDIVIDUALLY-RANDOMISED STEPPED WEDGE DESIGN FOR TRIALS WHERE ALL PATIENTS EVENTUALLY RECEIVE THE INTERVENTION

Presenting author: Inge Christoffer Olsen Research Support Services CTU/OCBE, Oslo University Hospital, Oslo, Norway

Co-authors: Morten Wang Fagerland, Marissa LeBlanc, Corina Rueegg, Morten Valberg Research Support Services CTU/OCBE, Oslo University Hospital, Oslo, Norway

Context: When planning a randomised controlled trial (RCT) to assess the effect of losartan (the intervention) on patients with glioblastoma, we faced interesting study design challenges. Glioblastoma is a rare disease, resulting in a small target population, and ethical reasons require that all patients eventually receive the intervention during the study. The stepped wedge cluster RCT design is an increasingly used method to evaluate e.g. hospital policy interventions. Each cluster (usually a hospital or medical centre) is randomised to initiate the intervention at a different timepoint. Over time, all clusters contribute observations under both control and intervention.

Objective: To investigate the efficiency of modifying the stepped wedge cluster RCT design to randomise individuals instead of clusters.

Methods: In the case of the losartan-glioblastoma trial, only one treatment centre is available, so cluster randomisation is not possible. We therefore modified the usual stepped wedge design to sequentially randomise individual patients instead of clusters, a so-called individually-randomised stepped wedge design. Simulations were used to compare the power of the new design to that of the standard parallel group design, under different sample-size calculation assumptions and design options.

Results: The required sample size for the new design was 50-60% lower than that of the standard parallel group design, confirming the analytical results of Hooper et al. [2]. The efficiency gain compared to the parallel design depended on the intra-subject correlation coefficient.

Conclusions: The individually-randomised stepped wedge design can be an efficient alternative to other individually-randomised designs. It is particularly useful when there is an ethical requirement that all patients receive the intervention.

Keywords: randomised clinical trials, stepped wedge design.

References:

1. Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting. BMJ. 2015;350:h391. doi:10.1136/bmj.h391.

2. Hooper R, Teerenstra S, de Hoop E, Eldridge S. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials. Statistics in medicine. 2016;35(26):4718-4728. doi:10.1002/sim.7028.


PREVALENCE FORECASTING BASED ON MULTIPLE IMPUTATION Presenting author: Jaakko Reinikainen Department of Public Health Solutions, National Institute for Health and Welfare, Helsinki, Finland Co-authors: Tommi Härkänen, Hanna Tolonen Department of Public Health Solutions, National Institute for Health and Welfare, Helsinki, Finland Information on future development of prevalences of risk factors and health indicators is needed to prepare for the forthcoming burden of disease in the population and to allocate resources properly for prevention (Soyiri & Reidpath, 2013). We present a flexible yet relatively easy-to-use forecasting method based on simulation of future observations by multiple imputation. The proposed approach uses data on repeated cross-sectional surveys from different years. We create future samples with age and sex distributions corresponding to the official national population forecasts. Then, the risk factors are simulated using multiple imputation by chained equations, also known as fully conditional specification (Van Buuren, 2007). Finally, the imputations are pooled to obtain the prevalences of interest. Covariates, such as sociodemographic variables, as well as their possible interactions and non-linear terms can be included in the modeling. The future development of these covariates is also forecast simultaneously. We apply the procedure to data from five Finnish health examination surveys conducted between 1997 and 2017 and forecast the prevalences of obesity, smoking and hypertension to 2020 and 2025. We also assess the accuracy of the forecasts and discuss the strengths and limitations of the proposed approach. Keywords: prevalence, risk factors, forecasting, microsimulation, multiple imputation. References:

1. Soyiri, I. N., & Reidpath, D. D. (2013). An overview of health forecasting. Environmental Health and Preventive Medicine, 18(1), 1.

2. Van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16(3), 219-242.
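The pooling step can be sketched as follows. The abstract does not state which pooling rule is used, so this example assumes Rubin's rules, the standard way of combining estimates across imputed data sets; the function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total


# Hypothetical forecast prevalences of one risk factor from m = 3 imputed samples.
prevalence, total_var = pool_rubin([0.20, 0.22, 0.24], [0.0004, 0.0004, 0.0004])
print(round(prevalence, 4), round(total_var ** 0.5, 4))
```

The between-imputation term is what carries the forecasting uncertainty into the pooled standard error, so more variable imputations yield wider intervals around the forecast prevalence.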


SCENARIO-BASED PROJECTIONS USING MULTIPLE IMPUTATION: ACCOUNTING FOR BOTH RISK FACTOR AND HEALTH OUTCOME CHANGES IN REPEATED MEASURES DATA Presenting author: Jukka Kontto Department of Public Health Solutions, National Institute for Health and Welfare, Helsinki, Finland

Co-authors: Seppo Koskinen1, Laura Paalanen1, Päivi Sainio2, Tommi Härkänen1 1Department of Public Health Solutions, National Institute for Health and Welfare, Helsinki, Finland; 2Department of Welfare, National Institute for Health and Welfare, Helsinki, Finland

Information on the future development of health and functioning is needed to tackle emerging health problems at the population level. Projection methods that take into account potential scenarios for changes in health determinants are needed to design effective interventions that prevent or delay negative development in health and functioning. Our projection method is based on multiple imputation and exploits longitudinal data (Härkänen et al. 2019). We generated bootstrapped data sets to account for sampling uncertainty. Multiple imputation by chained equations, using classification and regression trees (CART), was used sequentially to generate projections for each bootstrapped data set. Finally, the results were combined over the data sets. We used ATHLOS data to project the levels of the health metric, a score derived to evaluate the functioning of individuals (Caballero et al. 2017). We compared projections in which the observed transition probabilities remain the same in the future (the null scenario) with scenarios in which health determinants, e.g. smoking and obesity, are modified.

Keywords: multiple imputation, bootstrap, forecasting, functioning, longitudinal studies.

References:

1. Caballero F.F., Soulis G., Engchuan W., et al. (2017). Advanced analytical methodologies for measuring healthy ageing and its determinants, using factor analysis and machine learning techniques: the ATHLOS project. Scientific Reports, 7, 43955. doi: 10.1038/srep43955.

2. Härkänen T., Sainio P., Stenholm S., et al. (2019). Projecting long-term trends in mobility limitations: impact of excess weight, smoking and physical inactivity. J Epidemiol Community Health Published Online First: 18 February 2019. doi: 10.1136/jech-2017-210413


IDENTIFYING PATTERNS OF MULTIMORBIDITY IN LITHUANIAN NATIONAL HEALTH INSURANCE FUND DATA: A COMPARISON OF CROSS-SECTIONAL AND TEMPORAL PHENOTYPING APPROACHES

Presenting author: Roma Puronaitė Faculty of Mathematics and Informatics, Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania; Department of Information Systems, Centre of Informatics and Development, Vilnius University Hospital Santaros Klinikos, Vilnius, Lithuania; Faculty of Medicine, Vilnius, Lithuania

Co-authors: Audronė Jakaitienė1, 3, Elena Jurevičienė2, 3, Vytautas Kasiulevičius2, 3, Rokas Navickas2, 3, Marijus Radavičius1, Žydrūnė Visockienė2, 3 1Faculty of Mathematics and Informatics, Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania; 2Department of Information Systems, Centre of Informatics and Development, Vilnius University Hospital Santaros Klinikos, Vilnius, Lithuania; 3Faculty of Medicine, Vilnius, Lithuania

Multimorbidity, defined as two or more chronic conditions coexisting in the same individual, has become a burden to the world's healthcare systems. One of the main objectives of multimorbidity research is the identification of multimorbidity patterns. In addition, the trajectories of patient disease progression may help to answer another question: why does the same combination of diseases lead to different health care utilization? We analyzed data from the National Health Insurance Fund (NHIF) administrative database with information about healthcare services used to treat 428 252 subjects with multiple chronic conditions between 01/01/2012 and 30/06/2014. We used hierarchical clustering, exploratory factor analysis and multiple correspondence analysis for cross-sectional phenotype identification and parallel factor analysis-2 (PARAFAC2) for temporal phenotype identification. We present a comparison between these approaches.

Keywords: EFA, MCA, multimorbidity, PARAFAC2, temporal phenotyping.

References:

1. Helwig, N. E. (2018). multiway: Component Models for Multi-Way Data. R package version 1.0-5. https://CRAN.R-project.org/package=multiway

2. Marengoni, A., Vetrano, D. L., & Onder, G. (2019). Target Population for Clinical Trials on Multimorbidity: Is Disease Count Enough?. Journal of the American Medical Directors Association, 20(2), 113-114.

3. Perros, I., Papalexakis, E. E., Vuduc, R., Searles, E., & Sun, J. (2019). Temporal Phenotyping of Medically Complex Children via PARAFAC2 Tensor Factorization. Journal of Biomedical Informatics, 103125.

ON THE P-WAVE SEGMENT MODEL OF A SINGLE ELECTROCARDIOGRAM WAVE


Presenting author: Viktor Skorniakov Faculty of Mathematics and Informatics, Vilnius University, Vilnius, Lithuania Co-authors: Antanas Mainelis1, Petras Navickas2, Albinas Stankus3

1 Faculty of Mathematics and Informatics, Vilnius University, Vilnius, Lithuania; 2 Faculty of Medicine, Vilnius University, Vilnius, Lithuania; 3 State Research Institute Centre for Innovative Medicine, Vilnius, Lithuania

We describe a parametric model for the P-wave segment of a single electrocardiogram (ECG) wave trajectory. The model was previously considered in the bioengineering literature [1], [2]; however, it was not treated in a complete and rigorous parametric fashion. Our results fill the gap by making use of both frequentist and Bayesian techniques. In addition to the model specification, we provide a supporting real-data example illustrating the fit, discuss possible extensions on statistical grounds and point out some potential practical applications.

Keywords: electrocardiography, P-wave, statistical modeling. References:

1. Censi F, Calcagnini G, Ricci C, Ricci RP, Santini M, Grammatico A, et al. P-Wave Morphology Assessment by a Gaussian Functions-Based Model in Atrial Fibrillation Patients. IEEE Transactions on Biomedical Engineering, 2007 April; 54(4):663–672.

2. Suppappola S, Sun Y, Chiaramida SA. Gaussian pulse decomposition: An intuitive model of electrocardiogram wave forms. Annals of Biomedical Engineering, 1997 March; 25(2):252–260.

Acknowledgement:

The authors would like to thank Lithuanian Hypertension League for supporting participation and presentation of the work in the 7th Nordic – Baltic Biometric Conference.

THE TRENDINESS OF TRENDS


Presenting author: Andreas Kryger Jensen Biostatistics, Institute of Public Health, University of Copenhagen, Denmark A statement often seen in the news concerning some kind of public health outcome is that its trend has changed. Such statements are often based on longitudinal data obtained from e.g., national surveys, and the change in the trend is claimed to have occurred at the time of the latest data collection. Statistical assessments of changes in trends are very important as they may potentially influence public health decisions on a national level. But... What exactly is a trend? What constitutes a change in a trend? Can we quantify the trendiness of a trend? In this talk we propose two measures for quantifying the trendiness of a trend. Under the assumption that reality evolves in continuous time we define what constitutes a trend as well as a change in a trend, and we introduce a probabilistic Trend Direction Index (TDI). This index has the intuitive interpretation of the probability that a latent time-varying characteristic changes monotonicity at any given time conditional on observed data. We also define a global index of Expected Trend Instability (ETI) that quantifies the expected number of times that a trend has changed on an interval. We show how the Trend Direction Index and the Expected Trend Instability can be estimated from data under a Bayesian framework and give an application to development of the proportion of smokers in Denmark during the last 20 years. Keywords: functional data analysis, trends, public health, Bayesian data analysis.
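The two indices can be approximated from posterior draws of the derivative of the latent curve on a time grid. The abstract gives no formulas, so the sketch below rests on assumptions: the pointwise index is read as the posterior probability that the derivative is positive at each grid point, and the expected trend instability is approximated by the average number of sign changes of the derivative per draw. The function names and the toy draws are hypothetical.

```python
def trend_direction(draws):
    """Share of posterior draws with a positive derivative at each grid point.

    draws[s][k] is the derivative of posterior draw s at grid point k.
    """
    n = len(draws)
    return [sum(d[k] > 0 for d in draws) / n for k in range(len(draws[0]))]


def expected_trend_instability(draws):
    """Average number of sign changes of the derivative along the grid."""
    def sign_changes(d):
        return sum(1 for a, b in zip(d, d[1:]) if a * b < 0)
    return sum(sign_changes(d) for d in draws) / len(draws)


# Four toy posterior draws of the derivative at four time points.
draws = [
    [1.0, 0.5, -0.2, -1.0],
    [1.0, 1.0, 1.0, 1.0],
    [-1.0, 2.0, -3.0, 4.0],
    [0.5, 0.2, 0.1, -0.1],
]
print(trend_direction(draws))             # [0.75, 1.0, 0.5, 0.5]
print(expected_trend_instability(draws))  # 1.25
```

Counting sign changes on a grid can miss changes that occur between grid points, so a finer grid gives a closer approximation to the continuous-time quantity described in the abstract.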


MODELLING DISEASE PROGRESSION BASED ON BIOMARKER OBSERVATIONS: A BAYESIAN HIDDEN MARKOV AUTOREGRESSIVE MODEL APPROACH Presenting author: Ziad Taib Early Clinical Biometrics, Astrazeneca RD, KC5 - Gothenburg, Sweden Co-authors: Hamid El Maroufy and El Houcine Hibbah1, Abdelmajid Zyad2

1Department of Applied Mathematics, Faculty of Science and Techniques, C.P:523, Sultan Moulay Slimane University, Beni-Mellal, Morocco; 2Biological Engineering Laboratory, Team of Natural Substances, Cell and Molecular Immuno-Pharmacology, Sultan Moulay Slimane University, Beni-Mellal, Morocco

Understanding the progression of a chronic illness is of utmost importance for devising a suitable treatment. Unfortunately, disease progression can only be observed directly in rare cases. However, there are often several observable biomarkers that contain indirect information about disease progression. In this work, we model the stages of the disease as an unobserved Markov chain, X, while the biomarker values follow a first-order autoregressive, AR(1), process, Y. The resulting model generalises the standard Hidden Markov Model (HMM), where the values of Y are assumed to be independent conditionally on the corresponding values of X. We consider two possible applications: Chronic Obstructive Pulmonary Disease (COPD) and breast cancer. In the context of drug development, the model can be used to study the effect on progression that follows, e.g., as a consequence of a treatment. The model supposes that we have biomarker observations related to the hidden disease stages, and the aim is to use this information to learn about the progression of the disease, i.e. to evaluate the transitions between its stages, which we assume to be discrete. We adopt a Bayesian description with a priori/a posteriori distributions for the parameters and a Markov Chain Monte Carlo (MCMC) method for the actual parameter estimation, based on the joint estimation of the hidden states. A simulation experiment is provided to assess the accuracy of the estimates and the speed of the algorithm.

Keywords: autoregressive hidden Markov model, breast cancer progression marker, Gibbs sampler, hidden states joint estimation, Markov Chain Monte Carlo.

References: El Maroufy H, Hibbah EH, Zyad A, Taib Z. Bayesian Estimation of Multivariate Autoregressive Hidden Markov Model with Application to Breast Cancer Biomarker Modeling. In: Bayesian Inference. November 2017. DOI: 10.5772/intechopen.70053.
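The data-generating side of such a model can be sketched in a few lines. This is an illustrative simulation only, not the authors' estimation procedure: it assumes a univariate biomarker, a state-dependent level `mu`, a common autoregressive coefficient `phi` and Gaussian noise, and the function name `simulate_ar_hmm` and all parameter values are hypothetical.

```python
import random


def simulate_ar_hmm(rng, n_steps, trans, mu, phi, sigma, y0=0.0):
    """Simulate hidden disease stages X and an AR(1) biomarker Y driven by X."""
    states, values = [], []
    x, y = 0, y0  # the chain starts in stage 0 before the first observation
    for _ in range(n_steps):
        # Hidden stage: one step of the Markov chain with transition matrix `trans`.
        x = rng.choices(range(len(trans)), weights=trans[x])[0]
        # Biomarker: depends on the current stage and on its own previous value.
        y = mu[x] + phi * y + rng.gauss(0.0, sigma)
        states.append(x)
        values.append(y)
    return states, values


# Progressive disease: stages can only stay the same or advance by one.
trans = [[0.9, 0.1, 0.0],
         [0.0, 0.9, 0.1],
         [0.0, 0.0, 1.0]]
rng = random.Random(7)
states, values = simulate_ar_hmm(rng, 50, trans, mu=[0.0, 1.0, 2.5], phi=0.6, sigma=0.3)
print(states)
```

Setting `phi=0` removes the dependence on the previous biomarker value and recovers the standard HMM, in which the observations are conditionally independent given the hidden stages.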


HERITABILITY CURVES: A LOCAL MEASURE OF HERITABILITY Presenting author: Francesca Azzolini Department of Mathematics, University of Bergen, Bergen, Norway Co-authors: Geir Drage Berentsen1, Håkon Gjessing2, Rolv Terje Lie3, Hans Julius Skaug1 1Department of Mathematics, University of Bergen, Bergen, Norway; 2Department of Genes and Environment, Norwegian Institute of Public Health, Oslo, Norway; 3Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway We introduce a new measure of heritability in quantitative genetics, which allows the degree of heritability to vary with the trait value. This measure can be used in scenarios where the trait dependence structure between family members is non-linear, in which case traditional mixed effect models and covariance (correlation) based methods are inadequate. The idea is to combine the notion of a correlation curve with traditional moment-based estimators of heritability. For estimation purposes we use a multivariate Gaussian mixture, which is able to capture non-linear dependence and possesses necessary symmetry properties (between family members). We derive an analytical expression for the correlation curve under Gaussian mixtures, and investigate its tail behavior (outside the data range). The result is a measure of heritability that varies with trait value, giving rise to statements such as "Low and high birth weight is less heritable than medium birth weight”. We then apply this approach to a dataset consisting of the birth weight of 81,144 mother-father-child trios, collected in the Medical Birth Registry of Norway. Keywords: birth weight, correlation curve, heritability. References:

1. Bjerve S, Doksum K. Correlation curves: measures of association as functions of covariate values. The Annals of Statistics. 1993 Jun 1:890-902.

2. Magnus P, Gjessing HK, Skrondal A, Skjaerven RJ. Paternal contribution to birth weight. Journal of Epidemiology & Community Health. 2001 Dec 1;55(12):873-7.

3. Lunde A, Melve KK, Gjessing HK, Skjærven R, Irgens LM. Genetic and environmental influences on birth weight, birth length, head circumference, and gestational age by use of population-based parent-offspring data. American journal of epidemiology. 2007 Feb 20;165(7):734-41.


PROPERTIES OF NONINFORMATIVE GENETIC SEQUENCES

Presenting author: Marijus Radavičius Faculty of Mathematics and Informatics, Vilnius University, Vilnius, Lithuania Co-authors: Tomas Rekašius

Faculty of Fundamental Sciences, Vilnius Gediminas Technical University, Vilnius, Lithuania

Genome regions whose evolution is not subjected to natural selection pressure, and which hence evolve with a neutral mutation rate, can be viewed as noninformative genetic sequences (genetic noise). Those regions could be parts of non-coding sequences in genomes of primitive species. Thus the probabilistic model of genetic noise should be consistent with observations and empirical findings in the analysis of non-coding DNA sequences (e.g., Markov property, long-range dependence, strand symmetry, CpG content, etc.). Our aim is to build a suitable model of local dependences in genetic noise and to test its goodness-of-fit. In (Radavičius et al.), a model of local strand symmetry has been considered and tested for the longest non-coding sequences of bacterial genomes. For most of the sequences, the hypothesis of first-order local strand symmetry has been rejected. We will discuss extensions of this result.

Keywords: non-coding DNA sequence, probabilistic model, genetic noise, strand symmetry, bacterial genome.

References: Radavičius, M., Rekašius, T., and Židanavičiūtė, J. (2019). Local symmetry of genetic non-coding sequences. Informatica (to appear).


SNPs OF PROTEASOMAL GENES AS POSSIBLE BIOMARKERS FOR MULTIPLE SCLEROSIS IN THE LATVIAN POPULATION

Presenting author: Ilva Trapina Genomics and Bioinformatics, Institute of Biology of the University of Latvia, Miera 3, Salaspils, Latvia Co-authors: Natalia Paramonova1, Kristine Osina1, Nikolajs Sjakste1,2

1Genomics and Bioinformatics, Institute of Biology of the University of Latvia, Salaspils, Latvia; 2Faculty of Medicine, University of Latvia, Riga, Latvia

Proteasomes play a critical role in the degradation of proteins via the ATP/ubiquitin-dependent process, which plays a crucial role in immunity. Their dysregulation or modulation influences the development and progression of different diseases. The possible role of proteasomes in autoimmune diseases was hypothesized after the discovery of the involvement of the LMP2 (PSMB9) and LMP7 (PSMB8) subunits of the immunoproteasome in antigen processing. The proteolytic activities of proteasomes are reduced in the brain tissue of multiple sclerosis patients. Multiple sclerosis (MS) is an autoimmune inflammatory disease of the central nervous system. The aim of the study was to find SNPs of proteasomal genes as possible biomarkers for MS in the Latvian population.

To evaluate potential biomarkers for MS, (1) in a case/control study of 280 MS patients and 305 controls, the following single nucleotide polymorphisms (SNPs) were genotyped: rs2277460 and rs1048990 of PSMA6 and rs2295826 and rs2295827 of PSMC6; (2) for 174 MS patients and 17 controls, PSMA6 and PSMC6 gene expression levels were analysed with qPCR. Data from both parts of the research were analysed with appropriate statistical methods, taking into account the MS type (relapsing-remitting MS (RRMS) or secondary progressive MS (SPMS)), sex, method of treatment, the type of data and the normality of its distribution.

There was no significant association between rs1048990 of PSMA6 and MS in the case/control study, but the rare allele A and genotype CA of rs2277460 were significantly associated with MS in the group of women, with an increased risk of MS and a clinical risk of SPMS. SNPs rs2295826/rs2295827 of PSMC6 are in complete linkage disequilibrium in both groups of the Latvian population. Significant associations were found between rs2295826/rs2295827 and different MS groups; they were more prevalent in male patients, where the rare alleles G/T were found to carry clinical risk at different levels for MS in general and for its subtypes. The association between rs1048990 of PSMA6 and the outcome of interferon (IFN) therapy in the MS patient group was on the border of statistical significance (p ~ 5.00 × 10^-2). The expression level of PSMA6 was statistically significantly increased in the patient group receiving IFN therapy compared with patients without treatment. For patients with genotypes carrying the rare allele of rs1048990 of PSMA6, the gene expression level was significantly lower than for homozygotes of the common allele; in samples from women we saw the same trend, but in men the difference did not reach statistical significance.

We provide evidence that variations of the proteasomal genes PSMA6 and PSMC6 may contribute to the risk of multiple sclerosis in Latvians. The results suggest that IFN therapy increases transcription of genes of the 20S proteasome α type subunits. Our results show that the investigated polymorphisms are potentially usable biomarkers for MS risk in clinical practice.

Keywords: proteasome, association, multiple sclerosis, Latvian population.

References: SAM No 1.1.1.1/16/A/016 project “Determination of proteasome-related genetic, epigenetic and clinical markers for multiple sclerosis”.


IDENTIFICATION OF MIXTURES OF BACTERIAL STRAINS USING DNA SEQUENCING READS WITH POSSIBLY HIGH ERROR RATES Presenting author: Märt Möls Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia Co-authors: Mihkel Vaher Department of Bioinformatics, University of Tartu, Tartu, Estonia

Analysing the data from second-generation sequencing experiments can produce heavy computational loads, as sequencing one environmental sample can produce DNA reads with billions of letters, and this dataset has to be compared against tens of thousands of known bacterial sequences. For a patient waiting for a diagnosis or for a scientist wanting to process thousands of environmental samples, the speed of the algorithm is important. Computationally efficient methods often use k-mer (substrings of length k) counts to identify the bacteria present in the sample. However, due to sequencing errors one might encounter k-mers uniquely attributable to a bacterial strain even if this particular strain is not present in the sample, especially if a closely related strain (with only a few differences in its genome) is highly abundant in the sample. To achieve high detection precision even for strains with sequencing coverage several orders of magnitude lower than that of the most abundant strains in the sample, one has to model the effects of sequencing errors on the observed k-mer counts (estimate the number of excess counts due to the sequencing errors). Different models and estimation approaches, such as non-negative least squares and likelihood-based mixture models, will be compared.

Keywords: genetics, NGS, non-negative least squares, mixture modelling.
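The effect of sequencing errors on strain-specific k-mer counts can be illustrated with a toy example. The sketch below is not the authors' method: the two genomes, the read set and all function names are hypothetical, chosen only to show how a single substitution error in a read from an abundant strain produces k-mers that are "unique" to a closely related strain that is absent from the sample.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def unique_kmers(genomes, k):
    """Map each strain name to the k-mers that occur in that strain only."""
    sets = {name: set(kmers(g, k)) for name, g in genomes.items()}
    unique = {}
    for name, own in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != name))
        unique[name] = own - others
    return unique


def strain_hits(reads, markers, k):
    """Count read k-mers that match each strain's unique k-mers."""
    hits = Counter()
    for read in reads:
        for km in kmers(read, k):
            for name, marks in markers.items():
                if km in marks:
                    hits[name] += 1
    return hits


K = 5
# Hypothetical toy genomes: strain_B differs from strain_A in one base.
genomes = {
    "strain_A": "ACGTTGCAAGGCTTACGATC",
    "strain_B": "ACGTTGCAAGTCTTACGATC",
}
markers = unique_kmers(genomes, K)

# The sample contains strain_A only: nine error-free reads plus one read
# with a single substitution error (G -> T) at the variable position.
clean = genomes["strain_A"][4:16]
erroneous = clean[:6] + "T" + clean[7:]
reads = [clean] * 9 + [erroneous]

hits = strain_hits(reads, markers, K)
print(dict(hits))  # strain_B collects 5 hits although it is absent
```

One erroneous read in ten is enough to give the absent strain a non-zero count, which is the kind of excess count that an error model has to account for before a low-coverage strain can be declared present.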


A REPRODUCIBLE ROBUST LIKELIHOOD APPROACH TO INFERENCE ABOUT MARGINAL CHARACTERISTICS OF BINARY DATA IN PAIRED SETTINGS

Presenting author: Tsung-Shan Tsou
Institute of Statistics, National Central University, Taiwan

We introduce a robust likelihood approach to inference about marginal distributional characteristics of paired data that does not require modeling the correlation or the joint probabilities. The method is reproducible in that it is applicable to paired settings of various sizes. The virtue of the new strategy is illustrated by testing marginal homogeneity in a paired triplet scenario. We use simulations and real data analysis to demonstrate the merit of our robust likelihood methodology.

Keywords: correlated triplet data, paired designs, reproducible, robust likelihood, robust score test.

References:

1. Bennett BM. Note on tests for matched samples. Journal of the Royal Statistical Society Series B 1968; 30: 368–370.

2. Cochran WG. The Comparison of Percentages in Matched Samples. Biometrika 1950; 37: 256–266.

3. Tsou TS. A robust likelihood approach to inference about the difference between two multinomial distributions in paired designs. Statistical Methods in Medical Research 2018; 27: 3077-3091.
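For orientation, the classical benchmark for testing marginal homogeneity of binary responses in matched triplets is Cochran's Q test (reference 2 above). The sketch below implements that classical statistic only; it is not the robust likelihood method of the abstract, and the data and function names are hypothetical.

```python
import math


def cochran_q(table):
    """Cochran's Q for an n x k table of 0/1 responses (rows = matched sets)."""
    n_cols = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    row_totals = [sum(row) for row in table]
    grand = sum(row_totals)
    num = (n_cols - 1) * (n_cols * sum(c * c for c in col_totals) - grand ** 2)
    den = n_cols * grand - sum(r * r for r in row_totals)
    if den == 0:
        raise ValueError("every row is constant, so Q is undefined")
    return num / den


def p_value_triplets(q_stat):
    """Upper tail of the chi-square distribution with 2 degrees of freedom.

    Valid for triplets only (k = 3, hence k - 1 = 2 degrees of freedom),
    where the survival function has the closed form exp(-q / 2).
    """
    return math.exp(-q_stat / 2.0)


# Hypothetical binary outcomes of 8 subjects under 3 matched conditions.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
]

q_stat = cochran_q(responses)
p_value = p_value_triplets(q_stat)
print(q_stat, round(p_value, 4))
```

With two columns instead of three, the same statistic reduces to McNemar's test, which is why Q is a natural reference point when the paired design is extended to triplets.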


MULTI-LEVEL METHODOLOGY FOR MASSIVE DATA VISUALIZATION

Presenting author: Jelena Liutvinavičienė
Vilnius University, Institute of Data Science and Digital Technologies, Vilnius, Lithuania

Co-authors: Olga Kurasova, Marius Liutvinavičius
Vilnius University, Institute of Data Science and Digital Technologies, Vilnius, Lithuania

Data mining is a process that enables the extraction of information and knowledge from the analyzed data sources. Nowadays the main challenge in various areas is to handle big data, which is characterized by volume, speed, volatility, variety and complexity. This research focuses on massive data visualization based on dimensionality reduction methods. We propose a new methodology that divides the whole data visualization process into separate interactive steps. In each step, some part of the data can be selected for further analysis and visualization, and a different dimensionality reduction method can be chosen. The choice of method depends on the desired accuracy measures and on visualization samples. In addition, statistical measures of the identified clusters are provided. We have developed a special tool that implements the proposed methodology. Here we present the possibilities of applying the proposed methodology to omics data visualization.

Keywords: massive data, dimensionality reduction, data visualization, data mining.

References:

1. Diamond, M., Mattia, A. (2017), Data Visualization: An Exploratory Study into the Software Tools Used by Businesses. Journal of Instructional Pedagogies, Vol. 18, 2017, available at: https://eric.ed.gov/?id=EJ1151731.

2. Hassan, A, Elragal, A. (2017), Big Data Visualization Tool: a Best-Practice Selection Model. Institute of Electrical and Electronics Engineers (IEEE), 2017. pp. 59-68, available at: http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1072292&dswid=6232.

3. Rosaria, R. S., Adae, I., Hart, A., Berthold, M. (2014), Seven Techniques for Dimensionality Reduction. Knime, available at: https://www.knime.com/blog/seven-techniques-for-data-dimensionality-reduction.

4. Santoyo, S. (2017), A Brief Overview of Outlier Detection Techniques, available at: https://towardsdatascience.com/a-brief-overview-of-outlier-detection-techniques-1e0b2c19e561.

5. Sorzano, C. O. S., Vargas, J., Montano, A. P. (2014), A survey of dimensionality reduction techniques, available at: https://arxiv.org/abs/1403.2877.


Inference for Two-Sample Quantile Difference Using Empirical Likelihood

Presenting author: Janis Valeinis
Department of Mathematics, Faculty of Physics, Mathematics and Optometry, University of Latvia, Riga, Latvia

Co-authors: Artis Luguzis
Department of Mathematics, Faculty of Physics, Mathematics and Optometry, University of Latvia, Riga, Latvia

Inference for the difference of two central tendency measures, such as means or medians, is a classical topic in statistics. Testing the difference of two quantiles is of great importance, especially when the underlying distributions are not normal. In such cases some classical non-parametric testing procedures are usually performed. More specifically, for the two-sample case the classical Wilcoxon signed-rank test has been widely used. For more than two samples, the popular Friedman and Kruskal-Wallis tests, among others, are commonly performed. In this work we make inference about two-sample quantiles using the empirical likelihood method developed by Owen (1988, 1990). Chen and Hall (1993) first established the smoothed version of the empirical likelihood for one-sample quantile inference and showed some advantages of the smoothed empirical likelihood method. Empirical likelihood has been established for the difference of two quantiles in the one-sample setting by Zhou and Jing (2003) and discussed for the two-sample case by Valeinis (2007) and Molanes Lopez et al. (2009). The purpose of this work is to compare the smoothed two-sample empirical likelihood with its non-smoothed counterpart by performing an extensive simulation study and applying both methods to some biostatistical data.

Keywords: empirical likelihood, two-sample problem, quantile difference.

References:

1. Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), 237-249.

2. Owen, A. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics, 18(1), 90-120.

3. Zhou, W., & Jing, B. Y. (2003). Adjusted empirical likelihood method for quantiles. Annals of the Institute of Statistical Mathematics, 55(4), 689-703.

4. Valeinis, J. (2007). Confidence bands for structural relationship models (Doctoral dissertation). Goettingen, Germany.

5. Molanes Lopez, E. M., Van Keilegom, I., & Veraverbeke, N. (2009). Empirical likelihood for non‐smooth criterion functions. Scandinavian Journal of Statistics, 36(3), 413-4
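As background to the methods compared in the abstract, the non-smoothed empirical likelihood ratio for a single quantile in the one-sample case has a closed form: if k of the n observations are at most θ, the maximizing weights are q/k for those observations and (1 − q)/(n − k) for the rest. The sketch below implements only this one-sample, non-smoothed statistic under Owen's formulation; it does not reproduce the two-sample or smoothed versions studied by the authors, and the function name and data are hypothetical.

```python
import math


def el_quantile_stat(sample, theta, q):
    """-2 log empirical likelihood ratio for H0: the q-quantile equals theta."""
    n = len(sample)
    k = sum(1 for x in sample if x <= theta)
    if k == 0 or k == n:
        # The constraint sum(w_i * 1{x_i <= theta}) = q cannot be met with
        # weights that sum to one, so the likelihood ratio is zero.
        return math.inf
    return 2.0 * (
        k * math.log(k / (n * q))
        + (n - k) * math.log((n - k) / (n * (1.0 - q)))
    )


# 95% critical value of the chi-square distribution with 1 degree of freedom,
# which is the usual asymptotic reference distribution for this statistic.
CHI2_1_95 = 3.841

sample = list(range(1, 21))  # hypothetical data: 1, 2, ..., 20
for theta in (5.5, 6.5, 10.5):
    stat = el_quantile_stat(sample, theta, 0.5)
    verdict = "rejected" if stat > CHI2_1_95 else "not rejected"
    print(theta, round(stat, 3), verdict)
```

Because the statistic depends on θ only through the count k, it is a step function of θ; smoothing the indicator, as in the smoothed empirical likelihood mentioned in the abstract, removes these jumps.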


FUNCTIONAL DATA ANALYSIS OF NEUROPHYSIOLOGICAL DATA: CASE STUDY

Presenting author: Dokt. Tadas Danielius
Department of Statistical Analysis, Institute of Applied Mathematics, Vilnius University, Vilnius, Lithuania

Co-authors: Prof. Habil. Dr. Osvaldas Rukšėnas, Dr. Valenina Vengelienė, Dr. Rokas Buišas, Dokt. Redas Dulinskas
Department of Neurobiology and Biophysics, Institute of Biosciences, Vilnius University, Vilnius, Lithuania

Lamotrigine, which has anti-glutamatergic properties, may have potential utility as a novel treatment for alcoholism. Nevertheless, it is not entirely clear to what extent different neurotransmitter systems contribute to the development of behavioral inflexibility in alcoholism. To understand this better, an experiment employing a behavioral animal model of long-term voluntary alcohol intake was conducted. In this study, functional data analysis of the recorded neurophysiological data revealed that the third functional principal component (PC) changed significantly during a drinking bout when the animal was injected with lamotrigine. Analysis of the PCs showed that a background brain process accounted for roughly 60% of the variability. After VARIMAX rotation, the PCs highlighted time segments that may help to understand the brain dynamics leading to the decision-making process in the brain. Continuous wavelet transform analysis of the third PC displayed power spectrum levels four times stronger when the animal was injected with lamotrigine.

Keywords: functional data analysis, functional principal component analysis, VARIMAX, continuous wavelet transform, nucleus accumbens, rat brains, lamotrigine.


MODELING OF CAESAREAN SECTION RATES USING SPATIO-TEMPORAL GAUSSIAN RANDOM FIELDS

Presenting author: Hans J. Skaug
Department of Mathematics, University of Bergen, Norway

Co-authors: Janne Mannseth², Geir Drage Berentsen¹, Dag Moster²
¹Department of Mathematics, University of Bergen, Norway; ²Department of Global Public Health and Primary Care, University of Bergen, Norway

Caesarean section (CS) is a medical intervention that can be applied in certain cases of known high-risk pregnancies and in acute situations. We study geographical variation in CS rates in Norway. We use the R packages TMB and R-INLA in combination to set up and fit a latent spatio-temporal Gaussian Markov random field (GMRF). The SPDE approximation (Lindgren et al., 2011) to the GMRF precision matrix enables us to calculate the Laplace approximation of the marginal likelihood (Kristensen et al., 2016). The model is fitted by maximum likelihood to data from 840627 births in Norway during the period 2001 to 2014. A number of covariates are accounted for, among them the so-called Robson Classification. An overall increasing trend in CS rates is found over the 14-year period. The estimated spatial variation within a year is attributed to differences in practice between hospitals.

Keywords: spatial model, GMRF, SPDE approximation, Laplace approximation, maximum likelihood.

References:

1. Kristensen, Kasper, et al. "TMB: Automatic differentiation and Laplace approximation." Journal of Statistical Software 70.5 (2016): 1-21.

2. Lindgren, Finn, Håvard Rue, and Johan Lindström. "An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73.4 (2011): 423-498.
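For readers unfamiliar with the Laplace approximation mentioned in the abstract, its general form, as used in TMB (reference 1), can be written as follows. The notation is an assumption made here for illustration: u denotes the n latent random effects (in this setting, the GMRF values), θ the fixed parameters, and f(u, θ) the joint negative log-likelihood of the data and the random effects.

```latex
\begin{align*}
L(\theta) &= \int_{\mathbb{R}^n} \exp\{-f(u,\theta)\}\,\mathrm{d}u, \\
\hat{u}(\theta) &= \arg\min_{u} f(u,\theta), \qquad
H(\theta) = \left.\frac{\partial^2 f(u,\theta)}{\partial u\,\partial u^{\top}}\right|_{u=\hat{u}(\theta)}, \\
L^{*}(\theta) &= (2\pi)^{n/2}\,\det\{H(\theta)\}^{-1/2}\,\exp\{-f(\hat{u}(\theta),\theta)\}, \\
-\log L^{*}(\theta) &= -\tfrac{n}{2}\log(2\pi) + \tfrac{1}{2}\log\det H(\theta) + f(\hat{u}(\theta),\theta).
\end{align*}
```

The sparsity of the GMRF precision matrix obtained from the SPDE approximation carries over to H(θ), which is what makes the determinant and the inner minimization feasible for large spatio-temporal fields.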


SEQUENTIAL MODELS FOR FOREST INVENTORY: A SELF-INTERACTIVE SPATIAL POINT PROCESS

Presenting author: Adil Yazigi
School of Computing, University of Eastern Finland / Joensuu, Finland

We consider a special class of spatio-temporal point processes, the sequential spatial point processes. Sequential spatial point processes differ from spatial point processes in that their realizations are ordered sequences of spatial locations, and the order of the points allows us to describe the evolutionary dynamics of the process in space. This feature is useful for interpreting the long-term dependence and the memory formed by the spatial history of the process. As an illustration, the sequence can be identified as tree locations ordered with respect to time, or to some given mark or covariate. We derive a parametric sequential spatial point process model that is expressed in terms of self-interactions of spatial points and for which likelihood-based inference is tractable. As an application, we fit the resulting model to forest datasets collected from the Kiihtelysvaara site in Eastern Finland.

Keywords: maximum likelihood, long-term dependence, recurrence, self-interaction, sequential spatial point processes.

References:

1. van Lieshout, M. N. M. (2006). Markovianity in space and time. IMS Lecture Notes-Monograph Series, 48, 154-168.

2. Penttinen, A., Ylitalo, A.-K. (2016). Deducing self-interaction in eye movement data using sequential spatial point processes. Spatial Statistics, 17, 1-21.