2019 Conference of Texas Statisticians

April 05 – 06, 2019

The Conference of Texas Statisticians 2019

On behalf of the Office of Undergraduate Research, I want to welcome all of you to Lamar University. We are proud to be hosting the 2019 Conference of Texas Statisticians, the 39th edition of the conference. It has been a long journey for many of you, and I am very glad that you are here today. We have put together a wonderful program for you. Renowned statisticians from the State of Texas will be presenting their research at this conference. We are also excited to see and hear our students' poster presentations. We are certainly hoping to have a successful conference, and it is most likely to be successful if we all participate - not just by presenting papers, but by being active participants in the discussions and taking advantage of the networking opportunities. I want to thank the San Antonio Chapter of the American Statistical Association for sponsoring the Don Owen Award, and I congratulate this year's winner, Dr. Christine Anderson-Cook of Los Alamos National Laboratory. I want to thank my colleagues at Lamar and around the state for supporting the conference. My special thanks go to President Kenneth Evans and Dean Lynn Maurer for their support. I also want to thank Mr. Nirmal Gope of the Office of Undergraduate Research for a job well done! Please make yourselves at home and enjoy the hospitality of Southeast Texas! Have a great conference, everyone.

Dr. Kumer P. Das, Professor and University Scholar, Lamar University
President, The Conference of Texas Statisticians 2019


PROGRAM SUMMARY

Friday, April 5th, 2019

Time Activity

12:00-13:00 Registration

13:00-13:20 Opening Remarks

Lynn Maurer, Ph.D., Dean, College of Arts and Sciences, Lamar University

Kumer Das, Ph.D., Professor/University Scholar, Lamar University

13:20-14:10 The Randomized Probability Integral Transform with Applications

Dennis Cox, Ph.D., Rice University, Houston, TX

14:10-15:00 An adaptive model for genetic association tests with flexible pleiotropy structures

Han Hao, Ph.D. University of North Texas, Denton, TX

15:00-15:30 Coffee Break

15:30-16:20 An investigation of correlation structure misspecification for longitudinal gene expression studies

Jacob Turner, Ph.D., Stephen F. Austin State University, Nacogdoches, TX

16:20-17:10 Degree-correlation, robustness, and vulnerability in finite scale-free networks

Jeremy Alm, Ph.D., Lamar University, Beaumont, TX

17:20-17:40 Poster Presentations Set-up, 8th Floor, John Gray Library

17:25-18:00 COTS Business Meeting, 6th Floor, John Gray Library

17:45-19:00 Poster Session & Social Hour, 8th Floor, John Gray Library

19:00-21:00 Banquet Dinner; Welcome Remarks by President Kenneth Evans, Lamar University

Don Owen Award; Honoring Professors; Poster Awards

8th Floor, John Gray Library


Saturday, April 6th, 2019

Chemistry 108

Time Activity

7:00-8:45 Breakfast, Chemistry 104

8:45-9:15 Quantile Regression for Functional Data

Meng Li, Ph.D., Rice University, Houston, TX

9:15-9:45 Statistical Inference for 3D Rock Images using Persistent Homology

Chul Moon, Ph.D., Southern Methodist University, Dallas, TX

9:15-9:45 New Statistical Approaches in Clustering Financial Time Dependent Information

Doo Young Kim, Ph.D., Sam Houston State University, Huntsville, TX

9:45-10:15 Bayesian Function-on-Scalars Regression for High Dimensional Data

Daniel Kowal, Ph.D., Rice University, Houston, TX

10:15-11:00 Coffee Break and Faculty Poster Presentation

11:00-11:30 Matched case-control data with a misclassified exposure: What can be done with instrumental variables?

Authors: Christopher Manuel, Ph.D., Samiran Sinha, Ph.D., and Suojin Wang, Ph.D.;

Presenting author: Samiran Sinha, Ph.D., Texas A & M University, College Station, TX

11:30-12:00 Bayesian Copula Density Deconvolution for Zero-Inflated Data with Applications in Nutritional Epidemiology

Abhra Sarkar, Ph.D., University of Texas at Austin, Austin, TX

12:00-12:10 Closing Remarks


POSTER DIRECTORY

1 Importance of Reporting Practical Significance Measures with Large Data Samples
Wafa Salem Aljuhani, Department of Mathematics and Statistics, Sam Houston State University
Mentor: Dr. Melinda Holt

2 Effect of the Unfolded Protein Response and Oxidative Stress on Mutagenesis in CSF3R: A Model for Evolution of Severe Congenital Neutropenia to Myelodysplastic Syndrome and Acute Myeloid Leukemia
Sara Biesiadny, Department of Statistics, Rice University
Co-Authors: Adya Sapra, Roman Jaksik, Hrishikesh Mehta, and Seth J. Corey
Mentor: Dr. Marek Kimmel

3 Intervening in Clostridium difficile Infections in California Hospitals
Erik Boonstra, Department of Mathematics and Statistics, Stephen F. Austin State University
Co-Authors: Isaac Slagel and Katherine Rodriguez
Mentor: Dr. Daniel Sewell

4 Role of Local Geometry in Robustness of Power Grid Networks
Asim Kumer Dey, Department of Mathematical Sciences, University of Texas at Dallas
Co-authors: Umar Islambekov and Dr. H. Vincent Poor
Mentor: Dr. Yulia R. Gel

5 Bayesian Longitudinal Study of the Smoking Cessation Treatment in Cocaine/Meth Dependent Patients
Moruf Olalekan Disu, Department of Mathematics and Statistics, Sam Houston State University
Mentor: Dr. Ram C. Kafle

6 A Functional Regression Approach for Studying the Trend of Lung Cancer Incidence Rate Among Females in the United States
Richard Ekem, Department of Mathematics & Statistics, Sam Houston State University
Mentor: Dr. Ram C. Kafle

7 The Role of Statistics in the Fight against Alzheimer's Disease
Grace Granger, Department of Mathematics, Lamar University
Mentor: Dr. Jasdeep Pannu

8 Predicting Bitcoin Return Using Extreme Value Theory
Mohammad Tariquel Islam, Department of Mathematics, Lamar University
Mentor: Dr. Kumer Das

9 Using Shannon's Diversity Index to Discriminate Fiber Sources from Crime Scenes
Iromi Nimna Jayawardena, Department of Mathematics and Statistics, Sam Houston State University
Mentor: Dr. Melinda Holt

10 A Bayesian Approach for Survival Analysis with the Inverse Gaussian Data
Dr. Kalanka P. Jayalath and Dr. Raj S. Chhikara, University of Houston-Clear Lake

11 A Bayesian Zero-Inflated Negative Binomial Regression Model for the Integrative Analysis of Microbiome Data
Shuang Jiang, Department of Statistical Science, Department of Population and Data Sciences, Southern Methodist University and UT Southwestern Medical Center
Co-authors: Guanghua Xiao and Andrew Y. Koh
Mentors: Dr. Xiaowei Zhan and Dr. Qiwei Li

12 Predicting Home Electric Energy Consumption: Comparison between Generalized Regression Neural Network, ARMA and VAR Models
Duwani Wijesinghe Katumullage, Department of Mathematics and Statistics, Sam Houston State University
Mentor: Dr. Ferry Butar

13 Predicting the Outcome of Twenty 20 Cricket while the Game is in Progress: A Statistical Learning Approach
Upeksha Perera, Department of Mathematics and Statistics, Sam Houston State University
Mentor: Dr. Ananda Manage

14 A Critical Look at the Automatic SAS® Forecasting System
Mohammad Afser Uddin, Department of Mathematics and Statistics, Sam Houston State University
Mentor: Dr. Stephen Scariano

15 Sparse function-on-scalar regression using a group bridge approach with application to ECoG/iEEG data
Zhengjia Wang, Department of Statistics, Rice University
Co-authors: John Magnotti and Michael Beauchamp
Mentor: Dr. Meng Li

16 Generalized Regression Neural Network Model: Application of prediction of temperature in Texas
Uthpala Wanigasekara, Department of Mathematics and Statistics, Sam Houston State University
Mentor: Dr. Ferry Butar

17 Sequential Rerandomization
Dr. Quan Zhou, Department of Statistics, Rice University
Co-authors: Dr. Philip Ernst, Dr. Kari Morgan, Dr. Donald Rubin, and Dr. Anru Zhang


ABSTRACTS (Oral Presentations)
All talks are listed alphabetically by last name of the primary presenter

Degree-correlation, robustness, and vulnerability in finite scale-free networks

Jeremy Alm, Ph.D. and Keenan Mack, Ph.D.
Department of Mathematics, Lamar University

Many naturally occurring networks have a power-law degree distribution as well as a non-zero degree-correlation. Despite this, most studies of robustness to random node-deletion and vulnerability to targeted node-deletion have concentrated only on the power-law degree distribution and ignored degree-correlation. We simulate targeted and random deletion in 700 random power-law networks on 1000 nodes built via preferential attachment, and confirm Newman's finding that positive degree-correlation increases robustness and decreases vulnerability. However, we found that networks with sufficiently high positive degree-correlation are more vulnerable to random node-deletion than to targeted deletion methods that use knowledge of initial node-degree only. Targeted deletion sufficiently alters the structure of the network to render this method less effective than uniform random methods unless changes in the network are accounted for. This result indicates the importance of degree-correlation in certain network applications. (Joint work with Keenan Mack, Illinois College)

The Randomized Probability Integral Transform with Applications
Dennis Cox, Ph.D.
Department of Statistics, Rice University

The randomized probability integral transform can be simply stated as follows: given a real-valued random variable X with distribution function F, and an independent random variable U uniformly distributed on [0,1], the random variable UF(X) + (1-U)F(X-0) has a uniform distribution on [0,1]. Here, F(x-0) is the limit of F(y) as y tends to x from below. This motivates consideration of empirical measures that have continuous components from discrete random variables and discrete components from continuous random variables. We present some new types of empirical processes generated from independent random variables and theoretical results about such processes. We also use these empirical measures to motivate graphical diagnostics and goodness-of-fit tests applicable to a wide variety of situations not covered by classical methods.
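The stated transform is easy to check by simulation. The sketch below uses a hypothetical Bernoulli(0.3) variable (our choice for illustration, not from the talk) and verifies that the randomized PIT output behaves like a Uniform(0,1) sample:

```python
import random

def randomized_pit(x, cdf, pmf):
    """Randomized probability integral transform for a discrete variable.

    Returns U*F(x) + (1-U)*F(x-), where F(x-) = F(x) - P(X = x).
    """
    u = random.random()
    return u * cdf(x) + (1.0 - u) * (cdf(x) - pmf(x))

# Bernoulli(0.3): F(0) = 0.7, F(1) = 1.0; F(0-) = 0, F(1-) = 0.7.
def cdf(x):
    return 0.7 if x == 0 else 1.0

def pmf(x):
    return 0.7 if x == 0 else 0.3

random.seed(1)
n = 100_000
sample = [randomized_pit(0 if random.random() < 0.7 else 1, cdf, pmf)
          for _ in range(n)]

# A Uniform(0,1) sample has mean 1/2 and variance 1/12 ~ 0.0833.
mean = sum(sample) / n
var = sum((v - mean) ** 2 for v in sample) / n
print(round(mean, 2), round(var, 2))  # → 0.5 0.08
```

With X = 0 the transform is uniform on [0, 0.7] (probability 0.7) and with X = 1 it is uniform on [0.7, 1] (probability 0.3), so the mixture is exactly Uniform(0,1).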

An adaptive model for genetic association tests with flexible pleiotropy structures
Han Hao, Ph.D.
Department of Mathematics, University of North Texas

Increasing empirical evidence shows the existence of pleiotropy, where genetic variants influence multiple phenotypes related to complex diseases such as glaucoma, hypertension, autism spectrum disorder, major depressive disorder, and schizophrenia. There are two different types of pleiotropy: causal pleiotropy, where genetic variants directly affect multiple phenotypes simultaneously; and mediated pleiotropy, where genetic variants affect certain phenotypes through the mediation of other phenotypes and demographic covariates. Although there are a number of existing multiple-trait association tests, few can deal with both causal and mediated pleiotropy. We propose a novel multiple-trait genetic association test framework that is flexible for various pleiotropy structures by selecting mediators adaptively. This approach will not only increase statistical power by aggregating multiple weak effects, but also improve our understanding of disease etiology.

New Statistical Approaches in Clustering Financial Time Dependent Information
Doo Young Kim, Ph.D.
Department of Mathematics and Statistics, Sam Houston State University

In the age of the information revolution, statistics has taken an essential position in decision-making procedures. A number of statistical models for time-dependent information were developed during the pre-information age, such as ARIMA, VAR, and GARCH, along with several derived models such as FAR and CFAR. These mainly focus on finding the model that represents the given data with the smallest error, yet they all share the same flaw: prediction intervals that fan out over time. This persistent problem, together with the emergence of big data, prompts us to consider new methods for analyzing time-dependent information more efficiently. In the present study, we employ LTTC (Lag Target Time Series Clustering) and MFTC (Multi-Factor Time Series Clustering), which Kim and Tsokos proposed in a previous study. The cross-lag time dependencies with respect to volatility between two financial time series are investigated in order to identify securities that lag a set of leading securities. The arc length of the log-return series on a weighted time line is used as another measure of risk in a stock market, clustering securities based on risk or variability. The concept of the target lag allows a stock trader to build a more flexible portfolio from the clustering output, based on one's trading strategy. Finally, we construct a multivariate statistical forecasting model in each cluster.

Bayesian Function-on-Scalars Regression for High Dimensional Data
Daniel R. Kowal, Ph.D. and Daniel C. Bourgeois, Ph.D.
Department of Statistics, Rice University

We develop a fully Bayesian framework for function-on-scalars regression with many predictors. The functional data response is modeled nonparametrically using unknown basis functions, which produces a flexible and data-adaptive functional basis. We incorporate shrinkage priors that effectively remove unimportant scalar covariates from the model and reduce sensitivity to the number of (unknown) basis functions. For variable selection in functional regression, we propose a decision-theoretic posterior summarization technique, which identifies a subset of covariates that retains nearly the predictive accuracy of the full model. Our approach is broadly applicable to Bayesian functional regression models and, unlike existing methods, provides joint rather than marginal selection of important predictor variables. Computationally scalable posterior inference is achieved using a Gibbs sampler with linear time complexity in the number of predictors. The resulting algorithm is empirically faster than existing frequentist and Bayesian techniques, and provides joint estimation of model parameters, prediction and imputation of functional trajectories, and uncertainty quantification via the posterior distribution. A simulation study demonstrates improvements in estimation accuracy, uncertainty quantification, and variable selection relative to existing alternatives. The methodology is applied to actigraphy data to investigate the association between intraday physical activity and responses to a sleep questionnaire.


Quantile Regression for Functional Data
Meng Li, Ph.D.
Department of Statistics, Rice University

Advances in computation and technology have generated an explosion of data with functional characteristics, triggering rapid growth of the functional data analysis (FDA) field, with an overwhelming emphasis on mean regression. Quantile regression, introduced by Koenker and Bassett Jr (1978), has been widely used in many areas to study the effect of predictor variables on a given quantile level of the response, and can reveal important information about how the entire distribution of the response varies with predictors in ways that might not be captured by mean regression. In this talk, we propose a Bayesian framework to perform function-on-scalar quantile regression, develop theory for the scalar-on-function case, and introduce a Bayesian Median AutoRegressive model (BayesMAR) for time series forecasting. We demonstrate the excellent performance of the proposed methods using simulations and real data applications, including an analysis of mass spectrometry proteomics.

Statistical Inference for 3D Rock Images using Persistent Homology
Chul Moon, Ph.D.
Southern Methodist University

We propose a 3D rock image analysis pipeline using persistent homology. We first compute persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using mature tools for image analysis. Each image is treated as a vector, and we compute its principal components to extract features. We fit a statistical model using the loadings of the principal components to estimate material porosity, permeability, anisotropy, and tortuosity. We also propose an adaptive version of the Structural SIMilarity index (SSIM), a similarity metric for images, as a measure to determine the Statistical Representative Elementary Volumes (sREV) for persistent homology. Thus we provide a capability for making statistical inference about the fluid flow and transport properties of rocks based on their geometry and connectivity.

Bayesian Copula Density Deconvolution for Zero-Inflated Data with Applications in Nutritional Epidemiology
Abhra Sarkar, Ph.D.
University of Texas at Austin

Estimating the marginal and joint densities of the long-term average intakes of different dietary components, X, is an important problem in nutritional epidemiology. Since X cannot be directly measured, data are usually collected in the form of 24-hour recalls of the intakes, Y, which show marked patterns of conditional heteroscedasticity. Significantly compounding the challenges, the recalls for episodically consumed dietary components also include exact zeros. The problem of estimating the density of the latent X from their observed measurement-error-contaminated proxies Y is then a problem of deconvolution of densities with zero-inflated data. We propose a Bayesian semiparametric solution to the problem, building on a novel hierarchical latent variable framework that translates the problem to one involving continuous surrogates only. Crucial to accommodating important aspects of the problem, we then design a copula-based approach to model the involved joint distributions, adopting different modeling strategies for the marginals of the different dietary components. We design efficient Markov chain Monte Carlo algorithms for posterior inference and illustrate the efficacy of the proposed method through simulation experiments. Applied to our motivating nutritional epidemiology problems, our method provides more realistic estimates of the consumption patterns of episodically consumed dietary components.

Matched case-control data with a misclassified exposure: What can be done with instrumental variables?
Christopher Manuel, Ph.D., Samiran Sinha, Ph.D., and Suojin Wang, Ph.D.
Texas A & M University

Matched case-control studies are used for finding the association between a disease and an exposure after controlling for the effect of important confounding variables. It is well known that disease-exposure association parameter estimators are biased when the exposure is misclassified, and a matched case-control study is no exception. Any bias correction method relies on validation data that contain both the true exposure and the misclassified exposure value; in turn, the validation data help to estimate the misclassification probabilities. The question is what can be done when there are no validation data and no prior knowledge of the misclassification probabilities, but some instrumental variables are observed. To answer this unexplored and unanswered question, we propose two methods of reducing the exposure misclassification bias in the analysis of matched case-control data when instrumental variables are measured for each subject of the study. The significance of these approaches is that they are designed to work without any validation data, which often are not available when the true exposure is impossible or too costly to measure. A simulation study explores different types of instrumental variable scenarios and investigates when the proposed methods work and how much bias can be reduced. For the purpose of illustration, we apply the methods to nested case-control data sampled from the 1989 United States birth registry.

An investigation of correlation structure misspecification for longitudinal gene expression studies
Jacob Turner, Ph.D.
Department of Mathematics and Statistics, Stephen F. Austin State University

Gene expression studies such as microarray and RNA sequencing provide researchers with a wealth of biological knowledge at the transcript level. With that wealth of data come interesting statistical challenges to discover and understand. As the cost of the technology continues to decline, more complex study designs such as longitudinal studies are arriving to help researchers understand biological systems in a more kinetic way. Combining longitudinal analyses with gene expression studies has its challenges. One particular challenge is the choice of correlation structure when fitting general linear models to each of the many thousands of genes. It is well known that when the correlation structure is incorrectly specified for a single response variable, standard error estimates can be biased, inflating type-I error. This talk will provide an assessment of how correlation structure misspecification impacts the False Discovery Rate (FDR), introduce an objective diagnostic to help in making an appropriate specification, and provide some real data examples involving immunology-related experiments.


ABSTRACTS (Poster Presentations)
All posters are listed alphabetically by last name of the primary presenter

Importance of Reporting Practical Significance Measures with Large Data Samples

Wafa Salem Aljuhani, Department of Mathematics and Statistics, Sam Houston State University
Mentor: Dr. Melinda Holt

As the ability to collect larger and larger data sets grows, so does the concern that statistical significance can be essentially guaranteed without practical significance. Here we consider a recent study (Xie et al., 2019) in which researchers wished to study children reported as abused or neglected. The FY2013 National Child Abuse and Neglect Data System Child File (NCANDS) dataset was used to compare Hispanic children to non-Hispanic children in terms of mental health access post-abuse, child living arrangements, and housing adequacy. Considering only children with at least one substantiated report of abuse, the sample size was n = 448,171. As expected, Chi-Square tests were all significant (p < 0.001). For practical significance, the study used Cramer's V. Despite the statistically significant Chi-Square results, Cramer's V indicated weak to no association among the data categories and variables. The result emphasizes the importance of evaluating practical significance for large data samples.

Effect of the Unfolded Protein Response and Oxidative Stress on Mutagenesis in CSF3R: A Model for Evolution of Severe Congenital Neutropenia to Myelodysplastic Syndrome and Acute Myeloid Leukemia

Sara Biesiadny, Department of Statistics, Rice University
Co-Authors: Adya Sapra, Roman Jaksik, Hrishikesh Mehta, and Seth J. Corey
Mentor: Dr. Marek Kimmel

Severe congenital neutropenia (SCN) is a rare blood disorder characterized by abnormally low levels of circulating neutrophils. Mutations in multiple genes, including the neutrophil elastase gene (ELANE) and the granulocyte colony-stimulating factor receptor (CSF3R), may cause SCN. The treatment of choice for SCN is administration of granulocyte colony-stimulating factor (G-CSF), which elevates the neutrophil count and hence improves survival and quality of life. Long-term survivorship on G-CSF is, however, linked to the development of MDS (myelodysplastic syndrome) and AML (acute myeloid leukemia). About 70% of MDS/AML patients acquire nonsense mutations affecting the cytoplasmic domain of CSF3R. In this project, we hypothesized that this coding region of CSF3R constitutes a hotspot, vulnerable to mutations resulting from excessive oxidative stress or endoplasmic reticulum (ER) stress. We used the murine Ba/F3 cell line to study the effect of induced oxidative or ER stress on the mutation rate in our hypothesized hotspot of the exogenous human CSF3R, the corresponding region in the endogenous Csf3r, and the leukemia-associated gene Runx1. Ba/F3 cells transduced with the cDNA for the partial C-terminal of CSF3R fused in-frame with a Green Fluorescent Protein (GFP) tag were subjected to cellular-stress-inducing mutagen treatment for a prolonged period of time (30 days). The amplicon-based targeted deep sequencing data for day 15 and day 30 samples show that, although increased mutagenesis was observed in all genes, there were more mutations in the GFP region than in the GC-rich partial CSF3R region. Our findings also indicate no correlation between the stress-inducing chemical treatments and mutagenesis in Ba/F3 cells. Our data suggest that cellular oxidative or ER stress induction does not promote genomic instability affecting CSF3R or Csf3r in Ba/F3 cells that could account for it being a mutational hotspot. Thus, we conclude that there are other mechanisms for acquired mutations of CSF3R that help drive the evolution of SCN to MDS/AML.
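For readers unfamiliar with the practical-significance measure used in the Aljuhani abstract above, Cramer's V can be computed directly from a contingency table. The sketch below uses invented counts; only the grand total is chosen to echo the study's n = 448,171:

```python
from math import sqrt

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (min(r, c) - 1))) for an r x c table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    k = min(len(table), len(table[0]))
    return sqrt(chi2 / (n * (k - 1)))

# Hypothetical 2x2 table with a huge n and a tiny association: the
# chi-square statistic is large enough to be "significant" at this
# sample size, yet V stays near zero.
table = [[112_000, 110_000],
         [113_000, 113_171]]
v = cramers_v(table)
print(round(v, 4))
```

Here V comes out well under the usual "weak association" cutoffs even though the corresponding chi-square test would reject independence, which is exactly the large-sample phenomenon the poster highlights.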


Intervening in Clostridium difficile Infections in California Hospitals

Erik Boonstra, Department of Mathematics and Statistics, Stephen F. Austin State University
Co-Authors: Isaac Slagel and Katherine Rodriguez
Mentor: Dr. Daniel Sewell

Clostridium difficile infection (CDI) poses a serious health threat for hospitalized patients. For example, in 2015 there were approximately 500,000 infections and 29,000 deaths in the United States. Individual risk factors include advanced age, prolonged antibiotic use, and severe illness. Environmental risk factors of CDI include bathroom sharing with infected patients, patient transfers, and seasonal trends. Despite several clinical trials testing various strategies for reducing CDI rates, most have been unsuccessful or have provided only short-term success. We therefore aim to choose hospitals within a region to include in a clinical study, with the goal of maximizing the reduction of CDI cases for a fixed treatment efficacy rate. To achieve this we used a linear mixed effects model to predict CDI rates 24 months out. Using this prediction model, we test various selection strategies, such as targeting hospitals with high CDI rates, larger proportions of patients over 65, and high levels of centrality within the hospital network. Then, using these selection methods, we simulated interventions to determine which selection strategy resulted in the maximum reduction of CDI cases over 24 months. This work was done in collaboration with the University of Iowa Department of Biostatistics through a program called ISIB, which was funded by the National Heart, Lung, and Blood Institute.

Role of Local Geometry in Robustness of Power Grid Networks
Asim Kumer Dey, Department of Mathematical Sciences, University of Texas at Dallas
Co-authors: Umar Islambekov and Dr. H. Vincent Poor
Mentor: Dr. Yulia R. Gel

We introduce a novel approach to studying the robustness of a power grid network employing the tools of topological data analysis (TDA). This approach not only enables one to incorporate intrinsic network properties such as electrical conductance but, more importantly, also offers a systematic and comprehensive framework to study the role of topology in the network's functionality and robustness. This is achieved by viewing the network as a weighted graph, equipping it with a nested simplicial complex structure, and extracting topological summaries in the form of Betti numbers and persistence diagrams. These summaries are then used to characterize network vulnerability under critical conditions such as targeted attacks.
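As a toy illustration of the topological summaries mentioned above: the zeroth Betti number of a network is simply its number of connected components, computable with union-find. The five-node edge list below is invented for illustration, not taken from the poster:

```python
# Betti-0 (number of connected components) of a small graph, computed with
# union-find. Removing a line (edge) can split the network, which raises
# Betti-0 -- a crude proxy for the vulnerability analyses described above.

def betti_0(n_nodes, edges):
    """Count connected components (the zeroth Betti number) of a graph."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)  # union the two components
    return len({find(x) for x in range(n_nodes)})

edges = [(0, 1), (1, 2), (2, 0), (3, 4)]       # hypothetical 5-node grid fragment
print(betti_0(5, edges))                        # → 2 (two components)
print(betti_0(5, [e for e in edges if e != (3, 4)]))  # → 3 (edge removal splits one off)
```

The poster's actual pipeline works with higher-order summaries (persistence diagrams over a nested simplicial complex), but Betti-0 is the simplest member of that family.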

Bayesian Longitudinal Study of the Smoking Cessation Treatment in Cocaine/Meth Dependent Patients

Moruf Olalekan Disu, Department of Mathematics and Statistics, Sam Houston State University
Mentor: Dr. Ram C. Kafle

The high rate of abuse and addiction among smokers of cocaine/methamphetamine (meth) has short- and long-term consequences. The consequences of smoking cocaine/meth range from depression, increased blood pressure, ulcers, and stroke to many others, and might even lead to sudden death. The aim of this study is to evaluate the effect of smoking cessation treatment plus treatment as usual (SCT + TAU) compared to treatment as usual (TAU) in a longitudinal study of cocaine/meth-dependent patients. The data are drawn from the National Drug Abuse Treatment Clinical Trials Network (CTN) of the National Institute on Drug Abuse (NIDA). Average carbon monoxide (CO) level was measured over time for 533 cocaine/meth-dependent patients randomized to either SCT + TAU or TAU. We applied a Bayesian longitudinal approach using PROC MCMC, with the CO levels as the response and other applicable covariates.


A Functional Regression Approach for Studying the Trend of Lung Cancer Incidence Rate Among Females in the United States
Richard Ekem, Department of Mathematics & Statistics, Sam Houston State University
Mentor: Dr. Ram C. Kafle

Lung cancer is the leading cause of cancer death among men and women in the United States. The incidence of lung cancer is greater in males than in females, but studies have shown that the incidence in females is increasing. Investigating the trend line will give some insight into this growing trend of lung cancer in females. The goal of this research is to study the trend line of the lung cancer incidence rate in females from 1973 to 2014. We applied a functional regression approach to fit the trend line. The B-spline basis function is used to build the functional data object in the model. Finally, we fit a log-linear model and compare it to the functional regression model. The comparison reveals that the proposed functional regression model outperforms the log-linear model.

The Role of Statistics in the Fight against Alzheimer's Disease
Grace Granger, Department of Mathematics, Lamar University
Mentor: Dr. Jasdeep Pannu

At some point, on some level, we all have to ask ourselves, "What makes life meaningful?" Our motivations stem from the answer, which for most people usually boils down to this: the people we meet and the experiences we share, held in the library of our memories. If nothing else, we'll always have that, or at least we hope so. Alzheimer's Disease is the leading cause of dementia and is projected to become more common in the coming decades. Although researchers aren't yet able to determine with certainty whether an individual will develop Alzheimer's, we have access to statistical data that has helped determine some factors that affect one's probability of doing so. Even though certain protection is not yet available, what can we do to reduce the risk in ourselves and our loved ones? The answer is largely already available from sources like the Alzheimer's Association's annual Facts and Figures reports. Modifiable risk factors include a healthy diet, physical activity, managing cardiovascular health, and sustained cognitive activity. Other powerful tools include awareness of the disease and increased research funding to uncover further answers. Alzheimer's Disease is about more than memory loss; it slowly kills the brain and is ultimately fatal. Because Alzheimer's is the sixth leading cause of death in the USA, it is important for individuals to be aware of their risk of developing it, something we are able to gauge using the statistical data available.

Predicting Bitcoin Return Using Extreme Value Theory Mohammad Tariquel Islam, Department of Mathematics

Lamar University Mentor: Dr. Kumer Das

Bitcoin is the largest of all cryptocurrencies. It was introduced in 2008, and in a short time its price reached USD 19,345.49 in December 2017. Investors are also considering Bitcoin as an alternative to their current investments. Rather than looking at the price of Bitcoin, we looked at its historical return and studied the future pattern. We found that the historical Bitcoin daily return data are not normally distributed and contain many extreme outliers. Therefore, we analyzed the historical Bitcoin daily returns using Extreme Value Theory to predict the future return level and the probability of exceeding certain return levels. We found that an extremely high return on Bitcoin is highly unlikely.
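A standard Extreme Value Theory workflow for return data is the peaks-over-threshold method: model exceedances above a high threshold with a generalized Pareto distribution (GPD). The sketch below uses simulated heavy-tailed returns as a stand-in for the historical Bitcoin series, which is not reproduced here; it illustrates the technique, not the authors' exact analysis.

```python
import numpy as np
from scipy.stats import genpareto

# Simulated heavy-tailed daily returns (Student-t noise), standing in
# for the historical Bitcoin return series.
rng = np.random.default_rng(1)
returns = rng.standard_t(df=3, size=3000) * 0.02

# Peaks-over-threshold: take exceedances above the 95th percentile
# and fit a GPD to them (location fixed at 0).
u = np.quantile(returns, 0.95)
excess = returns[returns > u] - u
shape, loc, scale = genpareto.fit(excess, floc=0)

# Conditional probability that a return exceeds u + 0.05,
# given that it already exceeds the threshold u.
p_exceed = genpareto.sf(0.05, shape, loc=0, scale=scale)
print(u, shape, scale, p_exceed)
```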

The conference of Texas Statisticians 2019

Using Shannon's Diversity Index to Discriminate Fiber Sources from Crime Scenes Iromi Nimna Jayawardena, Department of Mathematics and Statistics

Sam Houston State University Mentor: Dr. Melinda Holt

Wigs are frequently worn to cover baldness. They can also be worn to conceal one's identity in a crime; therefore, wig fibers may be considered physical evidence at a crime scene. Wigs can be distinguished by their cross-sectional shape and their chemical class. Yogi et al. (2018) applied Simpson's Index, a diversity measure, to show that a combination of these analyses had significantly higher discriminating power than either alone. Because the sampling distribution of Simpson's Index is unknown, the authors employed bootstrap methods to determine significance. Shannon's Index is another diversity measure that puts more weight on richness, the number of different subgroups in the sample, to determine diversity. This study applied Shannon's Index to the wig data of Yogi et al. Laura Pla (2004) proposed a bootstrap interval estimate of Shannon's Index, using a bias correction that incorporates richness. Both the standard bootstrap interval for Simpson's Index and the standard bootstrap interval for Shannon's Index indicate that the combination of cross-sectional shape and chemical class analyses produces significantly better discrimination than either alone.
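The two diversity measures and a standard percentile bootstrap interval can be sketched as below. The fiber labels are hypothetical (not the Yogi et al. wig data), and the bias correction of Pla (2004) is not implemented; this shows only the basic indices and the plain bootstrap.

```python
import numpy as np

def shannon(counts):
    """Shannon's diversity index H = -sum p_i ln p_i."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log(p))

def simpson(counts):
    """Simpson's diversity index D = 1 - sum p_i^2."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

# Hypothetical fiber-category labels for one evidence sample.
rng = np.random.default_rng(2)
labels = rng.choice(["round", "oval", "kidney", "nylon", "polyester"], size=200)

# Standard percentile bootstrap interval for Shannon's index.
boots = []
for _ in range(2000):
    resample = rng.choice(labels, size=labels.size, replace=True)
    _, counts = np.unique(resample, return_counts=True)
    boots.append(shannon(counts))
lo, hi = np.percentile(boots, [2.5, 97.5])
print(lo, hi)
```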

A Bayesian Approach for Survival Analysis with the Inverse Gaussian Data Dr. Kalanka P Jayalath and Dr. Raj S Chhikara

University of Houston Clear-Lake

This work focuses on a comprehensive survival analysis for the inverse Gaussian (IG) distribution employing a Bayesian approach. It is common in the literature to assume that the mean of the IG distribution is known. No such assumption is made in this study, which broadens the applications and further includes a survival analysis of data with randomly right-censored observations. Gibbs sampling is employed in estimation and hypothesis testing of the parameters of interest. A simulation study exhibits the accuracy of the suggested procedure, and a real data set further illustrates the applications.

A Bayesian Zero-Inflated Negative Binomial Regression Model for the Integrative Analysis of Microbiome Data Shuang Jiang, Department of Statistical Science, Department of Population and Data Sciences

Southern Methodist University and UT Southwestern Medical Center Co-authors: Guanghua Xiao, and Andrew Y. Koh

Mentor: Dr. Xiaowei Zhan and Dr. Qiwei Li

Microbiome ‘omics approaches can reveal intriguing relationships between the human microbiome and certain disease states. Along with the identification of specific bacterial taxa associated with diseases, recent scientific advancements provide mounting evidence that metabolism, genetics, and environmental factors can all modulate these microbial effects. However, current methods for integrating microbiome data and other covariates are severely lacking. Hence, we present an integrative Bayesian zero-inflated negative binomial regression model that can both distinguish differentially abundant taxa with distinct phenotypes and quantify covariate-taxa effects. Our model performs well on simulated data. Furthermore, we successfully integrated microbiome taxonomies and metabolomics in two real microbiome datasets to provide biologically interpretable findings. In all, we propose a novel integrative Bayesian regression model that features bacterial differential abundance analysis and quantification of microbiome-covariate effects, suitable for general microbiome studies.
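The full Bayesian regression model is beyond a short sketch, but the zero-inflated negative binomial likelihood it builds on can be illustrated directly: a point mass at zero (probability pi) mixed with an ordinary negative binomial count component. The parameter values below are arbitrary illustrations, not estimates from the paper.

```python
import numpy as np
from scipy.stats import nbinom

def zinb_pmf(k, pi, n, p):
    """Zero-inflated negative binomial pmf: with probability pi the count
    is a structural zero; otherwise it is drawn from NB(n, p)."""
    base = nbinom.pmf(k, n, p)
    return np.where(k == 0, pi + (1 - pi) * base, (1 - pi) * base)

# Illustrative parameters: 30% structural zeros, NB with n=2, p=0.4.
k = np.arange(0, 20)
probs = zinb_pmf(k, pi=0.3, n=2.0, p=0.4)
print(probs[:3])
```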

The conference of Texas Statisticians 2019

Predicting Home Electric Energy Consumption: Comparison between Generalized Regression Neural Network, ARMA and VAR Models

Duwani Wijesinghe Katumullage, Department of Mathematics and Statistics Sam Houston State University

Mentor: Dr. Ferry Butar

This project compares the Generalized Regression Neural Network (GRNN) model with conventional time series models, the Autoregressive Moving Average (ARMA) and Vector Autoregressive (VAR)/Vector Error Correction Model (VECM), for predicting home electric power consumption. The weekly average electric power consumption in apartments in the New England region and corresponding temperature data from October 2014 to December 2016 were used for the analysis. Based on the seasonal patterns observed, the first ninety percent and the latter ten percent of the data set were used for training and testing, respectively. For the comparison between the models, we employ the standard time series assumption that time-sequenced data have serial correlations within the time series and cross-correlations with the explanatory time series. Overall, forecasting with the multivariate models, GRNN and VAR (or VECM), which incorporate the corresponding temperature data as an explanatory time series, shows significant improvement over the univariate ARMA model. Among the multivariate models, GRNN captures the trend and seasonal pattern in its forecasting, while VAR (or VECM) does not. Accurate forecasting of home electric power consumption can facilitate decision-making in energy conservation and research in designing sustainable homes.

Predicting the Outcome of Twenty20 Cricket while the Game is in Progress: A Statistical Learning Approach Upeksha Perera, Department of Mathematics and Statistics

Sam Houston State University Mentor: Dr. Ananda Manage

Predicting the outcome of a game during the second innings of a cricket match is a difficult task. There are several studies in the literature in which the authors have attempted to predict the outcome of cricket matches. Those studies mainly used factors such as winning the toss, home-team advantage, and rank difference as the predictor variables, and most were done for the One Day International (ODI) format. In this study, we have used two modern statistical learning techniques to successfully predict the outcome of T20 cricket matches while the game is in progress during its second innings. The first approach applies the Support Vector Machine (SVM) procedure with both linear and nonlinear kernels. The second is a procedure that computes a similarity score for the given match based on the characteristics of previous matches. Moreover, the amount of pressure exerted on batsmen in each over is also used as a predictor variable, in addition to the predictors used in the previous studies.
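The SVM step with linear and nonlinear kernels can be sketched as follows. The features here (synthetic stand-ins for in-game quantities such as run rate, wickets in hand, and a pressure index) and the labels are simulated; the authors' actual match data and feature definitions are not reproduced.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for second-innings match features and win/loss labels.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 0.5, 400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit SVMs with a linear and a nonlinear (RBF) kernel and compare accuracy.
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, clf.score(X_te, y_te))
```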

A Critical Look at the Automatic SAS® Forecasting System Mohammad Afser Uddin, Department of Mathematics and Statistics

Sam Houston State University Mentor: Dr. Stephen Scariano

The goal of this effort is to study the SAS® automatic model selection and forecasting procedure for time series data. This procedure generates various time series models and suggests the best-choice model based on minimum mean squared error (MSE), minimum root mean square error (RMSE), minimum mean absolute error (MAE), or minimum mean absolute percent error (MAPE), along with several other accuracy measures. This study focuses on the classic time series data sets given in Appendix B of Introduction to Time Series Analysis and Forecasting (2nd ed., p. 581-626) by Douglas C. Montgomery, Cheryl L. Jennings & Murat Kulahci.
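The four selection criteria named above are standard and easy to compute directly; the sketch below uses a toy series and a naive one-step-behind forecast purely for illustration, not any data set from the study.

```python
import numpy as np

def accuracy_measures(actual, forecast):
    """Compute MSE, RMSE, MAE, and MAPE for one candidate model's forecasts."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    e = actual - forecast
    mse = np.mean(e ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(e)),
        "MAPE": 100.0 * np.mean(np.abs(e / actual)),
    }

# Toy series and a naive forecast that repeats the previous observation.
y = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0])
yhat = np.array([110.0, 112.0, 118.0, 132.0, 129.0, 121.0])
print(accuracy_measures(y, yhat))
```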


Sparse function-on-scalar regression using a group bridge approach with application to ECoG/iEEG data Zhengjia Wang, Department of Statistics, Rice University

Co-authors: John Magnotti, and Michael Beauchamp Mentor: Dr. Meng Li

There is a surge of interest in functional data analysis in incorporating shape constraints into regression functions tailored for specific applications with enhanced interpretability. One such example is sparse functions, which arise frequently in neuroscience, where interpretable signals are often zero in most regions and non-zero in some local regions. In this paper, we consider the function-on-scalar setting and propose to model sparse regression coefficient functions using a group bridge approach to capture both global and local sparsity. We use B-splines to transform the sparsity of coefficient functions to its sparse vector counterpart of increasing dimension. We propose a non-convex optimization algorithm to solve the involved penalized least-squares loss function, with theoretically guaranteed numerical convergence and a scalable implementation. Some asymptotic properties are provided. We illustrate the proposed method through simulation and an application to intracranial electroencephalography (iEEG) and electrocorticography (ECoG) data.

Generalized Regression Neural Network Model: Application of prediction of temperature in Texas Uthpala Wanigasekara, Department of Mathematics and Statistics

Sam Houston State University Mentor: Dr. Ferry Butar

Temperature has an effect on all living things on Earth; therefore, predicting temperature plays a very important role in many research fields. Weather forecasting is the application of current technology and science to predict the state of the atmosphere for a future time and a given location. Forecasts are made by collecting as much data as possible about the current state of the atmosphere (temperature, humidity, wind, and precipitation) and using an understanding of atmospheric processes (through meteorology) to determine how the atmosphere will evolve. However, the chaotic nature of the atmosphere and incomplete understanding of its processes mean that forecasts become less accurate as the forecast range increases. In this paper, the temperature of selected cities in Texas is predicted using the Generalized Regression Neural Network (GRNN) method. Data are collected from 70 cities across the USA. There are 9 input variables: latitude (degrees), longitude (degrees), precipitation (mm), surface pressure (kPa), wind speed at 50 meters (m/s), earth skin temperature (C), maximum temperature at 2 meters (C), minimum temperature at 2 meters (C), and wind speed at 10 meters (m/s). Temperature at 2 meters (C) is selected as the output variable. The GRNN method is applied to the data set and the Mean Square Error (MSE) is calculated. The accuracy of the GRNN method is 99.70%, 93.82%, and 93.94% for the training, testing, and prediction phases, respectively.
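A GRNN is essentially Gaussian-kernel-weighted regression: each prediction is a weighted average of the training targets, with weights decaying in the distance to each training point. The sketch below uses synthetic stand-ins for the 9 weather inputs and the 2-meter temperature output; the real city data are not reproduced.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=1.0):
    """GRNN prediction: Gaussian-kernel weighted average of training
    targets (equivalently, Nadaraya-Watson kernel regression)."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

# Synthetic stand-in: 70 "cities", 9 predictors, one temperature output.
rng = np.random.default_rng(4)
X = rng.normal(size=(70, 9))
y = X @ rng.normal(size=9) + rng.normal(0, 0.1, 70)

# Train on 60 cities, predict the remaining 10.
pred = grnn_predict(X[:60], y[:60], X[60:], sigma=2.0)
print(pred[:3])
```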

Sequential Rerandomization Dr. Quan Zhou, Department of Statistics, Rice University

Co-authors: Dr. Philip Ernst, Dr. Kari Morgan, Dr. Donald Rubin, and Dr. Anru Zhang

The seminal work of Morgan and Rubin (2012) considers rerandomization for all the units at one time. In practice, however, experimenters may have to rerandomize units sequentially. For example, a clinician studying a rare disease may be unable to wait to perform an experiment until all the experimental units are recruited. Our work offers a mathematical framework for sequential rerandomization designs, where the experimental units are enrolled in groups. We formulate an adaptive rerandomization procedure for balancing treatment/control assignments over continuous or binary covariates, using Mahalanobis distance as the imbalance measure. We prove in our key result that given the same number of rerandomizations (in expected value), under certain mild assumptions, sequential rerandomization achieves better covariate balance than rerandomization at one time.
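The core rerandomization step for one enrollment group can be sketched as follows: redraw the treatment/control split until the Mahalanobis distance between the group covariate means falls below a chosen threshold. The covariates and the threshold value here are illustrative, not taken from the paper.

```python
import numpy as np

def mahalanobis_imbalance(X, assign):
    """Mahalanobis distance between treatment and control covariate means."""
    xt = X[assign == 1].mean(axis=0)
    xc = X[assign == 0].mean(axis=0)
    n1, n0 = (assign == 1).sum(), (assign == 0).sum()
    cov = np.cov(X, rowvar=False) * (1.0 / n1 + 1.0 / n0)
    diff = xt - xc
    return diff @ np.linalg.solve(cov, diff)

def rerandomize(X, threshold, rng):
    """Redraw a balanced treatment/control split until the imbalance
    measure falls below the acceptance threshold."""
    n = X.shape[0]
    while True:
        assign = np.zeros(n, dtype=int)
        assign[rng.choice(n, n // 2, replace=False)] = 1
        if mahalanobis_imbalance(X, assign) < threshold:
            return assign

# Covariates for one sequentially enrolled group of 40 units.
rng = np.random.default_rng(5)
X = rng.normal(size=(40, 3))
assign = rerandomize(X, threshold=1.0, rng=rng)
print(mahalanobis_imbalance(X, assign))
```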


The Conference of Texas Statisticians (COTS) was established at the annual American Statistical Association (ASA) meeting in Houston, Texas in 1980. Tom Bratcher, Bill Schucany, and Jim Davenport organized the first COTS meeting, which was held in Waco, Texas in February 1981. The conference has been held annually since then. After the successful 1985 meeting in Austin, the Texas Chapters of the ASA were invited to set up the Council of Texas Statisticians with representatives from each Texas Chapter. The COTS meetings afford Texas statisticians the opportunity for social and intellectual exchange on a yearly basis. Senior and junior statisticians as well as students present their research talks or posters at COTS. These meetings also stimulate increased chapter membership as well as visibility of the statistical profession in the state of Texas. A highlight of each meeting is the presentation of the annual Don Owen Award for excellence in research, statistical consultation, and service to the statistical community.

Lamar University, Beaumont, Texas