332 results match your criteria: Annals of Applied Statistics [Journal]


BIDIMENSIONAL LINKED MATRIX FACTORIZATION FOR PAN-OMICS PAN-CANCER ANALYSIS.

Ann Appl Stat 2022 Mar 28;16(1):193-215. Epub 2022 Mar 28.

Department of Genetics, Computational Medicine Program, University of North Carolina.

Several modern applications require the integration of multiple large data matrices that have shared rows and/or columns. For example, cancer studies that integrate multiple omics platforms across multiple types of cancer, pan-omics pan-cancer analysis, have extended our knowledge of molecular heterogeneity beyond what was observed in single tumor and single platform studies. However, these studies have been limited by available statistical methodology.

JOINT AND INDIVIDUAL ANALYSIS OF BREAST CANCER HISTOLOGIC IMAGES AND GENOMIC COVARIATES.

Ann Appl Stat 2021 Dec 21;15(4):1697-1722. Epub 2021 Dec 21.

University of North Carolina at Chapel Hill.

The two main approaches in the study of breast cancer are histopathology (analyzing visual characteristics of tumors) and genomics. While both histopathology and genomics are fundamental to cancer research, the connections between these fields have been relatively superficial. We bridge this gap by investigating the Carolina Breast Cancer Study through the development of an integrative, exploratory analysis framework.

BOUNDING THE LOCAL AVERAGE TREATMENT EFFECT IN AN INSTRUMENTAL VARIABLE ANALYSIS OF ENGAGEMENT WITH A MOBILE INTERVENTION.

Ann Appl Stat 2022 Mar 28;16(1):60-79. Epub 2022 Mar 28.

Department of Medicine, Vanderbilt University Medical Center.

Estimation of local average treatment effects in randomized trials typically relies upon the exclusion restriction assumption in cases where we are unwilling to rule out the possibility of unmeasured confounding. Under this assumption, treatment effects are mediated through the post-randomization variable being conditioned upon, and directly attributable to neither the randomization itself nor its latent descendants. Recently, there has been interest in mobile health interventions to provide healthcare support.
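
For context, here is a minimal Python sketch of the standard point estimator that relies on the exclusion restriction: the Wald (instrumental-variable) ratio, with randomization as the instrument and engagement as the treatment received. The function name, arguments, and toy data are hypothetical, and the usual monotonicity and nonzero-first-stage conditions are assumed as well. The paper itself is about bounding the LATE when the exclusion restriction is in doubt, which this sketch does not attempt.

```python
from statistics import mean


def wald_late(z, d, y):
    """Wald (IV) estimate of the local average treatment effect.

    z: randomized assignment (0/1); d: treatment actually received,
    e.g. engagement (0/1); y: outcome. Assumes the exclusion restriction,
    monotonicity, and that assignment shifts uptake (nonzero first stage).
    """
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)
    d1 = mean(di for zi, di in zip(z, d) if zi == 1)
    d0 = mean(di for zi, di in zip(z, d) if zi == 0)
    first_stage = d1 - d0
    if first_stage == 0:
        raise ValueError("assignment has no effect on treatment uptake")
    # Intention-to-treat effect divided by the difference in uptake
    return (y1 - y0) / first_stage


if __name__ == "__main__":
    z = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical assignment
    d = [1, 1, 1, 0, 0, 0, 0, 0]  # hypothetical engagement
    y = [5, 5, 5, 1, 1, 1, 1, 1]  # hypothetical outcome
    print(wald_late(z, d, y))  # prints 4.0
```

On the toy data, the intention-to-treat difference of 3 is divided by an uptake difference of 0.75, giving 4.0.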

Length-biased semi-competing risks models for cross-sectional data: an application to current duration of pregnancy attempt data.

Ann Appl Stat 2021 Jun 12;15(2):1054-1067. Epub 2021 Jul 12.

Department of Family Health Services, University of Maryland.

Cross-sectional length-biased data arise from questions on the at-risk time for an event of interest from those who are at-risk but have yet to experience the event. For example, in the National Survey on Family Growth (NSFG), women who were currently attempting to become pregnant were asked how long they had been attempting pregnancy. Cross-sectional survival analysis methods use the observed at-risk times to make inference on the distribution of the unobserved time-to-failure.
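
As a worked equation, the classical current-duration relationship may help. Assuming stationarity (attempts begin at a constant rate over calendar time), the total durations of sampled women are length-biased, and the observed elapsed at-risk time T has a density g tied to the survival function S of the unobserved time to pregnancy. This is the basic single-event identity only, not the semi-competing risks model of the paper.

```latex
g(t) = \frac{S(t)}{\mu}, \qquad \mu = \int_0^\infty S(u)\,du,
\qquad\text{hence}\qquad S(t) = \frac{g(t)}{g(0)}, \quad t \ge 0,
```

since S(0) = 1 implies g(0) = 1/μ. Estimating g from the observed current durations therefore identifies S under this assumption.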

PARTITIONING AROUND MEDOIDS CLUSTERING AND RANDOM FOREST CLASSIFICATION FOR GIS-INFORMED IMPUTATION OF FLUORIDE CONCENTRATION DATA.

Ann Appl Stat 2022 Mar 28;16(1):551-572. Epub 2022 Mar 28.

Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill.

Community water fluoridation is an important component of oral health promotion, as fluoride exposure is a well-documented dental caries-preventive agent. Direct measurements of domestic water fluoride content provide valuable information regarding individuals' fluoride exposure and thus caries risk; however, they are logistically challenging to carry out at a large scale in oral health research. This article describes the development and evaluation of a novel method for the imputation of missing domestic water fluoride concentration data informed by spatial autocorrelation.
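
A minimal Python sketch of the partitioning-around-medoids idea from the title: medoids are actual observations, and a SWAP search repeatedly exchanges a medoid for a non-medoid whenever that lowers the total distance from points to their nearest medoid. This is a simplified version under stated assumptions: it starts from the first k points instead of PAM's BUILD step and uses 1-D absolute distance. The GIS features and the random forest stage of the paper are not shown, and the function names are hypothetical.

```python
from itertools import product


def total_cost(points, medoids, dist):
    # Each point is charged the distance to its nearest medoid
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam_swap(points, k, dist=lambda a, b: abs(a - b)):
    """Greedy PAM-style SWAP search; returns (medoid indices, total cost)."""
    n = len(points)
    medoids = list(range(k))  # simplified start instead of PAM's BUILD step
    best = total_cost(points, medoids, dist)
    while True:
        best_swap = None
        # Try replacing each current medoid i with each non-medoid h
        for i, h in product(range(k), range(n)):
            if h in medoids:
                continue
            candidate = medoids[:i] + [h] + medoids[i + 1:]
            cost = total_cost(points, candidate, dist)
            if cost < best:
                best, best_swap = cost, candidate
        if best_swap is None:  # no swap improves the cost: stop
            return sorted(medoids), best
        medoids = best_swap  # apply the best improving swap


if __name__ == "__main__":
    print(pam_swap([1, 2, 3, 10, 11, 12], 2))  # prints ([1, 4], 4)
```

For the toy values it settles on the observations 2 and 11 (indices 1 and 4) with a total cost of 4. Because each accepted swap strictly lowers the cost, the search always terminates.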

INFORMATION CONTENT OF HIGH-ORDER ASSOCIATIONS OF THE HUMAN GUT MICROBIOTA NETWORK.

Ann Appl Stat 2021 Dec 21;15(4):1788-1807. Epub 2021 Dec 21.

Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth.

The human gastrointestinal tract is an environment that hosts an ecosystem of microorganisms essential to human health. Vital biological processes emerge from fundamental inter- and intra-species molecular interactions that influence the assembly and composition of the gut microbiota ecology. Here we quantify the complexity of the ecological relationships within the human infant gut microbiota ecosystem as a function of the information contained in the nonlinear associations of a sequence of increasingly specified maximum entropy representations of the system.
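
To illustrate the first step of such a sequence, here is a small Python sketch for two taxa with discrete (for example presence/absence) states. The maximum entropy distribution matching only the two first-order marginals is their product, so the entropy drop from adding the pairwise constraint equals the mutual information. The function names and the two-taxon setting are hypothetical; this is not the paper's procedure.

```python
from math import log2


def entropy(probs):
    # Shannon entropy in bits; zero-probability cells contribute nothing
    return -sum(p * log2(p) for p in probs if p > 0)


def pairwise_information(joint):
    """Bits gained by constraining a first-order max-entropy model on the pair.

    joint: 2-D list of joint probabilities for the states of two taxa.
    """
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    # Max-entropy model given only the marginals: product of marginals
    independent = [pr * pc for pr in row for pc in col]
    observed = [p for r in joint for p in r]
    return entropy(independent) - entropy(observed)
```

A perfectly co-occurring pair such as [[0.5, 0.0], [0.0, 0.5]] gives 1 bit, while an independent pair gives 0. Higher-order steps (for example adding triplet constraints) generally require iterative fitting such as iterative proportional scaling and are not shown.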

ASSESSING SELECTION BIAS IN REGRESSION COEFFICIENTS ESTIMATED FROM NONPROBABILITY SAMPLES WITH APPLICATIONS TO GENETICS AND DEMOGRAPHIC SURVEYS.

Ann Appl Stat 2021 Sep 23;15(3):1556-1581. Epub 2021 Sep 23.

Michigan Program in Survey and Data Science, Institute for Social Research, University of Michigan.

Selection bias is a serious potential problem for inference about relationships of scientific interest based on samples without well-defined probability sampling mechanisms. Motivated by the potential for selection bias in (a) estimated relationships of polygenic scores (PGSs) with phenotypes in genetic studies of volunteers and (b) estimated differences in subgroup means in surveys of smartphone users, we derive novel measures of selection bias for estimates of the coefficients in linear and probit regression models fitted to nonprobability samples, when aggregate-level auxiliary data are available for the selected sample and the target population. The measures arise from normal pattern-mixture models that allow analysts to examine the sensitivity of their inferences to assumptions about nonignorable selection in these samples.

VCSEL: PRIORITIZING SNP-SET BY PENALIZED VARIANCE COMPONENT SELECTION.

Ann Appl Stat 2021 Dec 21;15(4):1652-1672. Epub 2021 Dec 21.

Department of Biostatistics, University of California, Los Angeles.

Single nucleotide polymorphism (SNP) set analysis aggregates both common and rare variants and tests for association between phenotype(s) of interest and a set. However, multiple SNP-sets, such as genes, pathways, or sliding windows, are usually investigated across the whole genome, with each set tested separately, followed by multiple-testing adjustments. We propose a novel method to prioritize SNP-sets in a joint multivariate variance component model.

ZERO-INFLATED QUANTILE RANK-SCORE BASED TEST (ZIQRANK) WITH APPLICATION TO SCRNA-SEQ DIFFERENTIAL GENE EXPRESSION ANALYSIS.

Ann Appl Stat 2021 Dec 21;15(4):1673-1696. Epub 2021 Dec 21.

Department of Biostatistics, Columbia University.

Differential gene expression analysis based on scRNA-seq data is challenging due to two unique characteristics of scRNA-seq data. First, multimodality and other heterogeneity of the gene expression among different cell conditions lead to divergences in the tail events or crossings of the expression distributions. Second, scRNA-seq data generally have a considerable fraction of dropout events, causing zero inflation in the expression.

Scalable penalized spatiotemporal land-use regression for ground-level nitrogen dioxide.

Ann Appl Stat 2021 Jun 12;15(2):688-710. Epub 2021 Jul 12.

Department of Statistics, Texas A&M University.

Nitrogen dioxide (NO₂) is a primary constituent of traffic-related air pollution and has well-established harmful environmental and human-health impacts. Knowledge of the spatiotemporal distribution of NO₂ is critical for exposure and risk assessment. A common approach for assessing air pollution exposure is linear regression involving spatially referenced covariates, known as land-use regression (LUR).

ESTROGEN RECEPTOR EXPRESSION ON BREAST CANCER PATIENTS' SURVIVAL UNDER SHAPE RESTRICTED COX REGRESSION MODEL.

Ann Appl Stat 2021 Sep;15(3):1291-1307

Department of Biostatistics, University of Texas MD Anderson Cancer Center.

For certain subtypes of breast cancer, study findings show that the level of estrogen receptor expression is associated with the risk of cancer death and also suggest a nonlinear effect on the hazard of death. A flexible form of the proportional hazards model, λ(t ∣ Z, X) = λ₀(t) exp(βᵀZ) ψ(X), is desirable to accommodate a rich class of covariate effects on a survival outcome and provide meaningful insight, where the functional form of ψ(X) is not specified except for its shape. Prior biologic knowledge about the shape of the underlying distribution of the covariate effect in regression models can be used to enhance statistical inference.

A MULTIVARIATE SPATIOTEMPORAL CHANGE-POINT MODEL OF OPIOID OVERDOSE DEATHS IN OHIO.

Ann Appl Stat 2021 Sep 23;15(3):1329-1342. Epub 2021 Sep 23.

Center for Biostatistics, Department of Biomedical Informatics, Ohio State University.

Ohio is one of the states most impacted by the opioid epidemic and experienced the second highest age-adjusted fatal drug overdose rate in 2017. Initially it was believed prescription opioids were driving the opioid crisis in Ohio. However, as the epidemic evolved, opioid overdose deaths due to fentanyl have drastically increased.

IDENTIFYING MAIN EFFECTS AND INTERACTIONS AMONG EXPOSURES USING GAUSSIAN PROCESSES.

Ann Appl Stat 2020 Dec 19;14(4):1743-1758. Epub 2020 Dec 19.

Department of Statistical Science, Duke University.

This article is motivated by the problem of studying the joint effect of different chemical exposures on human health outcomes. This is essentially a nonparametric regression problem, with interest being focused not on a black box for prediction but instead on selection of main effects and interactions. For interpretability we decompose the expected health outcome into a linear main effect, pairwise interactions and a nonlinear deviation.
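
One way to write the decomposition described above, as a sketch: the product form of the pairwise interactions and the Gaussian process prior on the nonlinear deviation h are assumptions suggested by the title, not details stated in this excerpt.

```latex
\mathbb{E}[y_i \mid \mathbf{x}_i] = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}
  + \sum_{1 \le j < k \le p} \beta_{jk}\, x_{ij} x_{ik} + h(\mathbf{x}_i),
\qquad h \sim \mathcal{GP}(0, K)
```

Under this form, selecting main effects and interactions amounts to deciding which β_j and β_jk are nonzero, while h absorbs whatever the linear and pairwise terms miss.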

Identifying the Recurrence of Sleep Apnea Using A Harmonic Hidden Markov Model.

Ann Appl Stat 2021 Sep;15(3):1171-1193

School of Life Sciences, University of Warwick.

We propose to model time-varying periodic and oscillatory processes by means of a hidden Markov model where the states are defined through the spectral properties of a periodic regime. The number of states is unknown along with the relevant periodicities, the role and number of which may vary across states. We address this inference problem by a Bayesian nonparametric hidden Markov model assuming a sticky hierarchical Dirichlet process for the switching dynamics between different states while the periodicities characterizing each state are explored by means of a trans-dimensional Markov chain Monte Carlo sampling step.
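
For reference, the sticky hierarchical Dirichlet process prior on the switching dynamics is commonly written as below, where π_j is the row of transition probabilities out of state j and κ > 0 adds extra mass to self-transitions, discouraging unrealistically rapid switching. This is the generic form; the state-specific periodicities explored by the trans-dimensional MCMC step are not shown.

```latex
\boldsymbol\beta \sim \mathrm{GEM}(\gamma), \qquad
\boldsymbol\pi_j \mid \boldsymbol\beta \sim
  \mathrm{DP}\!\left(\alpha + \kappa,\ \frac{\alpha\boldsymbol\beta + \kappa\,\delta_j}{\alpha + \kappa}\right), \qquad
z_t \mid z_{t-1} \sim \boldsymbol\pi_{z_{t-1}}
```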

TENSOR QUANTILE REGRESSION WITH APPLICATION TO ASSOCIATION BETWEEN NEUROIMAGES AND HUMAN INTELLIGENCE.

Ann Appl Stat 2021 Sep 23;15(3):1455-1477. Epub 2021 Sep 23.

Department of Biostatistics, Yale University.

Human intelligence is usually measured by well-established psychometric tests through a series of problem-solving tasks. The recorded cognitive scores are continuous but usually heavy-tailed, with potential outliers that violate the normality assumption. Meanwhile, magnetic resonance imaging (MRI) provides an unparalleled opportunity to study brain structures and cognitive ability.
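
The robustness argument above rests on the check (pinball) loss that quantile regression minimizes in place of squared error. A minimal Python sketch, assuming scalar data and an illustrative search over observed values only; the tensor-valued coefficients of the paper are not represented, and the function names are hypothetical.

```python
def check_loss(u, tau):
    # rho_tau(u) = u * (tau - 1{u < 0}): residuals below the quantile
    # are weighted by (1 - tau), residuals above it by tau
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_quantile_by_loss(y, tau):
    """Return the observed value minimizing the summed check loss."""
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 100]  # hypothetical heavy-tailed scores
    print(empirical_quantile_by_loss(scores, 0.5))  # prints 3
```

With τ = 0.5 the minimizer is the median (3), which the outlier at 100 barely moves, whereas the sample mean of these values is 22.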

MODEL-BASED FEATURE SELECTION AND CLUSTERING OF RNA-SEQ DATA FOR UNSUPERVISED SUBTYPE DISCOVERY.

Ann Appl Stat 2021 Mar 18;15(1):481-508. Epub 2021 Mar 18.

University of North Carolina at Chapel Hill, NC, USA.

Clustering is a form of unsupervised learning that aims to uncover latent groups within data based on similarity across a set of features. A common application of this in biomedical research is in delineating novel cancer subtypes from patient gene expression data, given a set of informative genes. However, it is typically unknown which genes may be informative in discriminating between clusters, and what the optimal number of clusters is.

A COVARIANCE-ENHANCED APPROACH TO MULTI-TISSUE JOINT EQTL MAPPING WITH APPLICATION TO TRANSCRIPTOME-WIDE ASSOCIATION STUDIES.

Ann Appl Stat 2021 Jun 12;15(2):998-1016. Epub 2021 Jul 12.

Transcriptome-wide association studies based on genetically predicted gene expression have the potential to identify novel regions associated with various complex traits. It has been shown that incorporating expression quantitative trait loci (eQTLs) corresponding to multiple tissue types can improve power for association studies involving complex etiology. In this article, we propose a new multivariate response linear regression model and method for predicting gene expression in multiple tissues simultaneously.

ANALYZING SECOND ORDER STOCHASTICITY OF NEURAL SPIKING UNDER STIMULI-BUNDLE EXPOSURE.

Ann Appl Stat 2021 Mar 18;15(1):41-63. Epub 2021 Mar 18.

Department of Psychology and Neuroscience, Duke University.

Conventional analysis of neuroscience data involves computing average neural activity over a group of trials and/or a period of time. This approach may be particularly problematic when assessing the response patterns of neurons to more than one simultaneously presented stimulus. In such cases the brain must represent each individual component of the stimuli bundle, but trial-and-time-pooled averaging methods are fundamentally unequipped to address the means by which multi-item representation occurs.

MODEL FREE ESTIMATION OF GRAPHICAL MODEL USING GENE EXPRESSION DATA.

Ann Appl Stat 2021 Mar 18;15(1):194-207. Epub 2021 Mar 18.

Fred Hutchinson Cancer Research Center.

Graphical models are a powerful and popular approach to studying high-dimensional omic data, such as genome-wide gene expression data. Nonlinear relations between genes are widely documented. However, partly due to sparsity of data points in high-dimensional space (i.

Inferring a consensus problem list using penalized multistage models for ordered data.

Ann Appl Stat 2020 Sep 18;14(3):1557-1580. Epub 2020 Sep 18.

Division of Hematology Oncology, University of Michigan, USA.

A patient's medical problem list describes his or her current health status and aids in the coordination and transfer of care between providers. Because a problem list is generated once and then subsequently modified or updated, the provider effect is usually not observable. That is, to what extent does a patient's problem list in the electronic medical record actually reflect a consensus communication of that patient's current health status? To that end, we report on and analyze a unique interview-based design in which multiple medical providers independently generate problem lists for each of three patient case abstracts of varying clinical difficulty.

INTEGRATIVE NETWORK LEARNING FOR MULTI-MODALITY BIOMARKER DATA.

Ann Appl Stat 2021 Mar 18;15(1):64-87. Epub 2021 Mar 18.

Department of Biostatistics, Mailman School of Public Health, Columbia University.

The biomarker networks measured by different modalities of data (e.g., structural magnetic resonance imaging (sMRI), diffusion tensor imaging (DTI)) may share the same true underlying biological model.

REGION-REFERENCED SPECTRAL POWER DYNAMICS OF EEG SIGNALS: A HIERARCHICAL MODELING APPROACH.

Ann Appl Stat 2020 Dec 19;14(4):2053-2068. Epub 2020 Dec 19.

Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles.

Functional brain imaging through electroencephalography (EEG) relies upon the analysis and interpretation of high-dimensional, spatially organized time series. We propose to represent time-localized frequency domain characterizations of EEG data as region-referenced functional data. This representation is coupled with a hierarchical regression modeling approach to multivariate functional observations.

A BAYESIAN NONPARAMETRIC MODEL FOR INFERRING SUBCLONAL POPULATIONS FROM STRUCTURED DNA SEQUENCING DATA.

Ann Appl Stat 2021 Jun 12;15(2):925-951. Epub 2021 Jul 12.

Department of Mathematics and Statistics, University of Massachusetts Amherst.

There are distinguishing features or "hallmarks" of cancer that are found across tumors, individuals, and types of cancer, and these hallmarks can be driven by specific genetic mutations. Yet, within a single tumor there is often extensive genetic heterogeneity as evidenced by single-cell and bulk DNA sequencing data. The goal of this work is to jointly infer the underlying genotypes of tumor subpopulations and the distribution of those subpopulations in individual tumors by integrating single-cell and bulk sequencing data.

ESTIMATION AND INFERENCE IN METABOLOMICS WITH NON-RANDOM MISSING DATA AND LATENT FACTORS.

Ann Appl Stat 2020 Jun 29;14(2):789-808. Epub 2020 Jun 29.

University of Chicago.

High-throughput metabolomics data are fraught with both non-ignorable missing observations and unobserved factors that influence a metabolite's measured concentration, and it is well known that ignoring either of these complications can compromise estimators. However, current methods to analyze these data can only account for the missing data or unobserved factors, but not both. We therefore developed MetabMiss, a statistically rigorous method to account for both non-random missing data and latent factors in high-throughput metabolomics data.

LOG-CONTRAST REGRESSION WITH FUNCTIONAL COMPOSITIONAL PREDICTORS: LINKING PRETERM INFANT'S GUT MICROBIOME TRAJECTORIES TO NEUROBEHAVIORAL OUTCOME.

Ann Appl Stat 2020 Sep 18;14(3):1535-1556. Epub 2020 Sep 18.

University of Connecticut.

The neonatal intensive care unit (NICU) experience is known to be one of the most crucial factors driving preterm infants' neurodevelopmental and health outcomes. It is hypothesized that the stressful early-life experience of very preterm neonates imprints the gut microbiome through regulation of the so-called brain-gut axis and, consequently, that certain microbiome markers are predictive of later infant neurodevelopment. To investigate, a preterm infant study was conducted: infant fecal samples were collected during the infants' first month of postnatal age, resulting in functional compositional microbiome data, and neurobehavioral outcomes were measured when infants reached 36-38 weeks of post-menstrual age.
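
The basic building block behind the title's log-contrast regression is a linear model in the log-transformed components whose coefficients sum to zero. A minimal Python sketch, assuming a single (non-functional) composition per infant; the paper's functional, time-varying version is not shown, and the function name is hypothetical. The zero-sum constraint makes the prediction depend only on ratios among taxa, not on the total (for example sequencing depth).

```python
from math import isclose, log


def log_contrast_predict(x, beta, intercept=0.0):
    """Linear predictor of a log-contrast model for one composition x.

    x: positive component abundances (proportions or raw counts);
    beta: coefficients, required to sum to zero.
    """
    if not isclose(sum(beta), 0.0, abs_tol=1e-12):
        raise ValueError("log-contrast coefficients must sum to zero")
    return intercept + sum(b * log(xj) for b, xj in zip(beta, x))


if __name__ == "__main__":
    beta = [1.0, -0.5, -0.5]  # hypothetical coefficients for three taxa
    a = log_contrast_predict([0.2, 0.3, 0.5], beta)  # proportions
    b = log_contrast_predict([2.0, 3.0, 5.0], beta)  # same composition, rescaled
    print(isclose(a, b))  # prints True
```

Rescaling every component by the same constant c adds log(c) times the sum of the coefficients, which is zero, so both inputs yield the same prediction.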

SPATIAL DISTRIBUTED LAG DATA FUSION FOR ESTIMATING AMBIENT AIR POLLUTION.

Ann Appl Stat 2021 Mar 18;15(1):323-342. Epub 2021 Mar 18.

School of Forestry and Environmental Studies, Department of Environmental Health Sciences, Yale University.

We introduce spatial (DLfuse) and spatiotemporal (DLfuseST) distributed lag data fusion methods for predicting point-level ambient air pollution concentrations, using, as input, gridded average pollution estimates from a deterministic numerical air quality model. The methods incorporate predictive information from grid cells surrounding the prediction location of interest and are shown to collapse to existing downscaling approaches when this information adds no benefit. The spatial lagged parameters are allowed to vary spatially/spatiotemporally to accommodate the setting where surrounding geographic information is useful in one area/time but not in another.

GENERALIZED ACCELERATED RECURRENCE TIME MODEL IN THE PRESENCE OF A DEPENDENT TERMINAL EVENT.

Ann Appl Stat 2020 Jun 29;14(2):956-976. Epub 2020 Jun 29.

Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, U.S.A.

Recurrent events are commonly encountered in longitudinal studies. The observation of recurrent events is often stopped by a dependent terminal event in practice. For this data scenario, we propose two sensible adaptations of the generalized accelerated recurrence time (GART) model (Sun et al.

SEQUENTIAL IMPORTANCE SAMPLING FOR MULTIRESOLUTION KINGMAN-TAJIMA COALESCENT COUNTING.

Ann Appl Stat 2020 Jun;14(2):727-751

Stanford University.

Statistical inference of evolutionary parameters from molecular sequence data relies on coalescent models to account for the shared genealogical ancestry of the samples. However, inferential algorithms do not scale to available data sets. A strategy to improve computational efficiency is to rely on simpler coalescent and mutation models, resulting in smaller hidden state spaces.

ESTIMATING CAUSAL EFFECTS IN STUDIES OF HUMAN BRAIN FUNCTION: NEW MODELS, METHODS AND ESTIMANDS.

Ann Appl Stat 2020 Mar 16;14(1):452-472. Epub 2020 Apr 16.

Department of Biostatistics, Johns Hopkins University.

Neuroscientists often use functional magnetic resonance imaging (fMRI) to infer effects of treatments on neural activity in brain regions. In a typical fMRI experiment, each subject is observed at several hundred time points. At each point, the blood oxygenation level dependent (BOLD) response is measured at 100,000 or more locations (voxels).

Accounting for Smoking in Forecasting Mortality and Life Expectancy.

Ann Appl Stat 2021 Mar 18;15(1):437-459. Epub 2021 Mar 18.

University of Washington.

Smoking is one of the main risk factors that have affected human mortality and life expectancy over the past century. Smoking accounts for a large part of the nonlinearities in the growth of life expectancy and of the geographic and sex differences in mortality. As Bongaarts (2006) and Janssen (2018) suggested, accounting for smoking could improve the quality of mortality forecasts due to the predictable nature of the smoking epidemic.
