Nonparametric regression is a fundamental problem in statistics but challenging when the independent variable is measured with error. Among the first approaches was an extension of deconvoluting kernel density estimators for homoscedastic measurement error. The main contribution of this article is to propose a new simulation-based nonparametric regression estimator for the heteroscedastic measurement error case.
Panel-count data arise when each study subject is observed only at discrete time points in a recurrent event study, and only the numbers of occurrences of the event of interest between observation time points are recorded (Sun and Zhao, 2013). However, sometimes the exact number of events between some observation times is unknown, and we know only whether the event of interest has occurred. In this article, we refer to this type of data as mixed panel-count data and propose a likelihood-based semiparametric regression method for their analysis under a nonhomogeneous Poisson process assumption.
Department of Biostatistics, Harvard University, Boston, Massachusetts 02115, U.S.A.
In comparing two treatments with event time observations, the hazard ratio (HR) estimate is routinely used to quantify the treatment difference. However, this model-dependent estimate may be difficult to interpret clinically, especially when the proportional hazards (PH) assumption is violated. An alternative estimation procedure for treatment efficacy based on the restricted mean survival time, or t-year mean survival time (t-MST), has been discussed extensively in the statistical and clinical literature.
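Nonparametrically, the t-MST is the area under the Kaplan-Meier survival curve from 0 to t, so it is estimable without the PH assumption. A minimal sketch of that computation (function name and data are hypothetical, not from the article):

```python
def km_rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier
    survival curve from 0 to tau, for right-censored data.
    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, area, last_t = 1.0, 0.0, 0.0
    for t, d in data:
        if t > tau:
            break
        area += surv * (t - last_t)      # rectangle under the current step
        if d:                            # the KM curve drops only at events
            surv *= 1.0 - 1.0 / at_risk
        at_risk -= 1                     # censored subjects also leave the risk set
        last_t = t
    return area + surv * (tau - last_t)  # final step extended out to tau
```

With no censoring and t beyond the last event, this reduces to the sample mean of the event times; the between-group difference in t-MST is then a model-free summary of treatment effect.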
Division of Epidemiology and Biostatistics, Department of Psychiatry, University of Illinois at Chicago, Chicago, Illinois 60612, U.S.A.
A unified statistical methodology for sample size determination is developed for hierarchical designs that are frequently used in many areas, particularly in medical and health research studies. The solid foundation of the proposed methodology opens a new horizon for power analysis in the presence of various conditions. Important features such as joint significance testing, unequal allocation of clusters across intervention groups, and differential attrition rates over follow-up time points are integrated to address questions that investigators often encounter while conducting such studies.
Department of Mathematics and Statistics, University of Otago, New Zealand.
The standard approach to fitting capture-recapture data collected in continuous time involves arbitrarily forcing the data into a series of distinct discrete capture sessions. We show how continuous-time models can be fitted as easily as discrete-time alternatives. The likelihood is factored so that efficient Markov chain Monte Carlo algorithms can be implemented for Bayesian estimation, available online in the R package ctime.
In clinical studies with time-to-event outcomes, the restricted mean survival time (RMST) has attracted substantial attention as a summary measure for its straightforward clinical interpretation. When the data are subject to length-biased sampling, which is frequently encountered in observational cohort studies, existing methods to estimate the RMST are not applicable. In this article, we consider nonparametric and semiparametric regression methods to estimate the RMST under the setting of length-biased sampling.
Doubly truncated data arise when event times are observed only if they fall within subject-specific, possibly random, intervals. While non-parametric methods for survivor function estimation using doubly truncated data have been intensively studied, only a few methods for fitting regression models have been suggested, and only for a limited number of covariates. In this article, we present a method to fit the Cox regression model to doubly truncated data with multiple discrete and continuous covariates, and describe how to implement it using existing software.
We consider a functional linear Cox regression model for characterizing the association between time-to-event data and a set of functional and scalar predictors. The functional linear Cox regression model incorporates a functional principal component analysis for modeling the functional predictors and a high-dimensional Cox regression model to characterize the joint effects of both functional and scalar predictors on the time-to-event data. We develop an algorithm to calculate the maximum approximate partial likelihood estimates of unknown finite and infinite dimensional parameters.
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
Survival data collected from a prevalent cohort are subject to left truncation and the analysis is challenging. Conditional approaches for left-truncated data could be inefficient as they ignore the information in the marginal likelihood of the truncation times. Length-biased sampling methods may improve the estimation efficiency but only when the underlying truncation time is uniform; otherwise, they may generate biased estimates.
Genotype eigenvectors are widely used as covariates for control of spurious stratification in genetic association studies. Significance testing for the accompanying eigenvalues has typically been based on a standard Tracy-Widom limiting distribution for the largest eigenvalue, derived under white-noise assumptions. However, even modest local correlation among markers is known to inflate the largest eigenvalues, even in the absence of true stratification.
Sightings of previously marked animals can extend a capture-recapture dataset without the added cost of capturing new animals for marking. Combined marking and resighting methods are therefore an attractive option in animal population studies, and there exist various likelihood-based non-spatial models, and some spatial versions fitted by Markov chain Monte Carlo sampling. As implemented to date, the focus has been on modeling sightings only, which requires that the spatial distribution of pre-marked animals is known.
Random-effects meta-analyses are very commonly used in medical statistics. Recent methodological developments include multivariate (multiple outcomes) and network (multiple treatments) meta-analysis. Here, we provide a new model and corresponding estimation procedure for multivariate network meta-analysis, so that multiple outcomes and treatments can be included in a single analysis.
Department of Biostatistics, University of Washington, Seattle, Washington, U.S.A.
Clinical practice may be enhanced by the use of person-level information that could guide treatment choice and lead to better outcomes for both treated individuals and the population. The scientific challenge is to identify and validate those factors that can reliably be used to target treatment, and to accurately quantify the expected treatment benefit as a function of candidate markers. Our proposal is to explicitly focus on smooth non-parametric evaluation of a canonical single index score that estimates the expected treatment benefit associated with patient characteristics.
We propose a C-index (index of concordance) applicable to recurrent event data. The present work addresses the dearth of measures for quantifying a regression model's ability to discriminate with respect to recurrent event risk. The data which motivated the methods arise from the Dialysis Outcomes and Practice Patterns Study (DOPPS), a long-running prospective international study of end-stage renal disease patients on hemodialysis.
Epidemiologic studies and disease prevention trials often seek to relate an exposure variable to a failure time that is subject to interval censoring. When the failure rate is low and the time intervals are wide, a large cohort is often required to yield reliable precision for the exposure-failure-time relationship. However, large cohort studies with simple random sampling can be prohibitively expensive for investigators with a limited budget, especially when the exposure variables are costly to obtain.
The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, U.S.A.
We consider a research scenario motivated by integrating multiple sources of information for better knowledge discovery in diverse dynamic biological processes. Given two longitudinal high-dimensional datasets for a group of subjects, we want to extract shared latent trends and identify relevant features. To solve this problem, we present a new statistical method named joint principal trend analysis (JPTA).
Department of Biostatistics, University of Florida, Gainesville, Florida, U.S.A.
Phosphorylated proteins provide insight into tumor etiology and are used as diagnostic, prognostic, and therapeutic markers of complex diseases. However, pre-analytic variations, such as freezing delay after biopsy acquisition, often occur in real hospital settings and potentially lead to inaccurate results. The objective of this work is to develop statistical methodology to assess the stability of phosphorylated proteins under short-term cold ischemia.
Precision medicine seeks to provide treatment only if, when, to whom, and at the dose it is needed. Thus, precision medicine is a vehicle by which healthcare can be made both more effective and efficient. Individualized treatment rules operationalize precision medicine as a map from current patient information to a recommended treatment.
Structural nested failure time models (SNFTMs) are models for the effect of a time-dependent exposure on a survival outcome. They have been introduced along with so-called G-estimation methods to provide valid adjustment for time-dependent confounding induced by time-varying variables. Adjustment for informative censoring in SNFTMs is possible via inverse probability of censoring weighting (IPCW).
There is an increasing need to construct a risk-prediction scoring system for survival data and identify important risk factors (e.g., biomarkers) for patient screening and treatment recommendation.
Assessing agreement is often of interest in biomedical and clinical research when measurements are obtained on the same subjects by different raters or methods. Most classical agreement methods have focused on global summary statistics, which cannot describe various local agreement patterns. The objective of this work is to study the local agreement pattern between two continuous measurements subject to censoring.
Somatic mutations are the driving forces for tumor development, and recent advances in cancer genome sequencing have made it feasible to evaluate the association between somatic mutations and cancer-related traits in large samples. However, despite increasingly large sample sizes, it remains challenging to conduct statistical analysis for somatic mutations, because the vast majority of somatic mutations occur at very low frequencies. Furthermore, cancer is a complex disease often accompanied by multiple traits that reflect its various aspects; how to combine the information in these traits to identify important somatic mutations poses additional challenges.
Many survival studies have error-contaminated covariates due to the lack of a gold standard of measurement. Furthermore, the error distribution can depend on the true covariates but the structure may be difficult to characterize; heteroscedasticity is a common manifestation. We suggest a novel dependent measurement error model with minimal assumptions on the dependence structure, and propose a new functional modeling method for Cox regression when an instrumental variable is available.
It is often of interest to compare centers or healthcare providers on quality of care delivered. We consider the setting where evaluation of center performance on multiple competing events is of interest. We propose estimating center effects through cause-specific proportional hazards frailty models that allow correlation among a center's cause-specific effects.
We introduce a non-myopic, covariate-adjusted response adaptive (CARA) allocation design for multi-armed clinical trials. The allocation scheme is a computationally tractable procedure based on the Gittins index solution to the classic multi-armed bandit problem and extends the procedure recently proposed in Villar et al. (2015).
N-mixture models describe count data replicated in time and across sites in terms of abundance N and detectability p. They are popular because they allow inference about N while controlling for factors that influence p without the need for marking animals. Using a capture-recapture perspective, we show that the loss of information that results from not marking animals is critical, making reliable statistical modeling of N and p problematic using just count data.
This article focuses on the evaluation of vaccine-induced immune responses as principal surrogate markers for predicting a given vaccine's effect on the clinical endpoint of interest. To address the problem of missing potential outcomes under the principal surrogate framework, we can utilize baseline predictors of the immune biomarker(s) or vaccinate uninfected placebo recipients at the end of the trial and measure their immune biomarkers. Examples of good baseline predictors are baseline immune responses when subjects enrolled in the trial have been previously exposed to the same antigen, as in our motivating application of the Zostavax Efficacy and Safety Trial (ZEST).
In this article, we study the joint testing of associations of a genetic variant with multiple correlated phenotypes using summary statistics from individual-phenotype analyses in Genome-Wide Association Studies (GWASs). We estimate the between-phenotype correlation matrix using the summary statistics of individual-phenotype GWAS analyses, and develop genetic association tests for multiple phenotypes that account for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we propose robust and powerful multiple-phenotype testing procedures that jointly test a common mean and a variance component in linear mixed models for summary statistics.
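The basic ingredient here, combining per-phenotype Z-scores while accounting for their correlation, can be illustrated with a simple Wald-type chi-square statistic. This sketch (function name and inputs hypothetical) is a simplified stand-in, not the authors' mixed-model test:

```python
import numpy as np

def multi_phenotype_chisq(z, R):
    """Joint test of K correlated phenotype Z-scores.
    z : length-K vector of per-phenotype GWAS Z-statistics for one variant
    R : K x K between-phenotype correlation matrix, estimable from
        genome-wide summary statistics under the null
    Returns the Wald-type statistic z' R^{-1} z, which follows a
    chi-square distribution with K degrees of freedom under the
    global null of no association with any phenotype."""
    z = np.asarray(z, dtype=float)
    R = np.asarray(R, dtype=float)
    return float(z @ np.linalg.solve(R, z))
```

With R equal to the identity this reduces to the usual sum of squared Z-scores; a nonzero correlation reweights the phenotypes so the null distribution remains chi-square with K degrees of freedom.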
The efficiency of doubly robust estimators of the average causal effect (ACE) of a treatment can be improved by including in the treatment and outcome models only those covariates which are related to both treatment and outcome (i.e., confounders) or related only to the outcome.
Sparse capture-recapture data from open populations are difficult to analyze using currently available frequentist statistical methods. However, in closed capture-recapture experiments, the Chao sparse estimator (Chao, 1989, Biometrics 45, 427-438) may be used to estimate population sizes when there are few recaptures. Here, we extend the Chao (1989) closed population size estimator to the open population setting by using linear regression and extrapolation techniques.
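For context, a commonly used bias-corrected form of the Chao-type closed-population lower bound, the starting point that the article extends to open populations, can be sketched as follows (function name hypothetical; this is an illustration, not the article's open-population estimator):

```python
def chao_closed(f1, f2, s_obs):
    """Chao-type lower-bound estimate of a closed population size.
    f1    : number of animals captured exactly once
    f2    : number of animals captured exactly twice
    s_obs : total number of distinct animals observed
    The bias-corrected form f1*(f1-1)/(2*(f2+1)) keeps the estimate
    defined even when f2 = 0, the sparse-recapture case."""
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))
```

The estimator adds to the observed count an extrapolation of how many animals were never captured, driven by the singleton/doubleton ratio.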
Glimm et al. (2010) and Tamhane et al. (2010) studied the problem of testing a primary and a secondary endpoint, subject to a gatekeeping constraint, using a group sequential design (GSD) with K=2 looks.
When fitting regression models, measurement error in any of the predictors typically leads to biased coefficients and incorrect inferences. A plethora of methods have been proposed to correct for this. Obtaining standard errors and confidence intervals using the corrected estimators can be challenging and, in addition, there is concern about remaining bias in the corrected estimators.
In practice, both testable and untestable assumptions are generally required to draw inference about the mean outcome measured at the final scheduled visit in a repeated measures study with drop-out. Scharfstein et al. (2014) proposed a sensitivity analysis methodology to determine the robustness of conclusions within a class of untestable assumptions.
Length-biased survival data subject to right-censoring are often collected from a prevalent cohort. However, informative right censoring induced by the sampling design creates challenges in methodological development. While certain conditioning arguments could circumvent the problem of informative censoring, related rank estimation methods are typically inefficient because the marginal likelihood of the backward recurrence time is not ancillary.
While data transformation is a common strategy to satisfy linear modeling assumptions, a theoretical result is used to show that transformation cannot reasonably be expected to stabilize variances for small counts. Under broad assumptions, as counts get smaller, it is shown that the variance becomes proportional to the mean under monotonic transformations g(·) that satisfy g(0)=0, excepting a few pathological cases. A suggested rule-of-thumb is that if many predicted counts are less than one then data transformation cannot reasonably be expected to stabilize variances, even for a well-chosen transformation.
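The phenomenon is easy to check by simulation: under a square-root transform, Poisson counts with a large mean have post-transform variance near the delta-method limit of 1/4, while for small means the variance tracks the mean itself. A minimal sketch (variable names hypothetical, not from the article):

```python
import numpy as np

# Simulated illustration of the claim above (not the article's proof):
# for Poisson counts, sqrt() stabilizes variance only when the mean is
# large; for small means the post-transform variance stays near the mean.
rng = np.random.default_rng(0)
n = 200_000

def var_after_sqrt(lam):
    """Empirical variance of sqrt(X) for X ~ Poisson(lam)."""
    return float(np.sqrt(rng.poisson(lam, size=n)).var())

v_small = var_after_sqrt(0.1)   # roughly lam*(1 - lam), far below 1/4
v_large = var_after_sqrt(10.0)  # close to 1/4, the delta-method limit
```

When many predicted counts fall below one, as the rule-of-thumb warns, v_small-type behavior dominates and no monotone transform with g(0)=0 will equalize the variances.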
Now over 20 years old, functional MRI (fMRI) has a large and growing literature that is best synthesised with meta-analytic tools. As most authors do not share image data, only the peak activation coordinates (foci) reported in the article are available for Coordinate-Based Meta-Analysis (CBMA). Neuroimaging meta-analysis is used to (i) identify areas of consistent activation; and (ii) build a predictive model of task type or cognitive process for new studies (reverse inference).
This article proposes an efficient approach to screening genes associated with a phenotypic variable of interest in genomic studies with subgroups. In order to capture and detect various association profiles across subgroups, we flexibly estimate the underlying effect size distribution across subgroups using a semi-parametric hierarchical mixture model for subgroup-specific summary statistics from independent subgroups. We then perform gene ranking and selection using an optimal discovery procedure based on the fitted model with control of false discovery rate.
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation.
The use of instrumental variables for estimating the effect of an exposure on an outcome is popular in econometrics, and increasingly so in epidemiology. This increasing popularity may be attributed to the natural occurrence of instrumental variables in observational studies that incorporate elements of randomization, either by design or by nature.
Next generation sequencing panels are being used increasingly in cancer research to study tumor evolution. A specific statistical challenge is to compare the mutational profiles in different tumors from a patient to determine the strength of evidence that the tumors are clonally related, that is, derived from a single, founder clonal cell. The presence of identical mutations in each tumor provides evidence of clonal relatedness, although the strength of evidence from a match is related to how commonly the mutation is seen in the tumor type under investigation.
Batch marking provides an important and efficient way to estimate the survival probabilities and population sizes of wild animals. It is particularly useful when dealing with animals that are difficult to mark individually. For the first time, we provide the likelihood for extended batch-marking experiments.
Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases.
A critical component of longitudinal study design involves determining the sampling schedule. Criteria for optimal design often focus on accurate estimation of the mean profile, although capturing the between-subject variance of the longitudinal process is also important since variance patterns may be associated with covariates of interest or predict future outcomes. Existing design approaches have limited applicability when one wishes to optimize sampling schedules to capture between-individual variability.
Advanced hepatocellular carcinoma (HCC) has limited treatment options and poor survival; therefore, early detection is critical to improving the survival of patients with HCC. Current guidelines for high-risk patients include ultrasound screening every six months, but ultrasound is operator dependent and not sensitive for early HCC. Serum α-fetoprotein (AFP) is a widely used diagnostic biomarker, but it has limited sensitivity and is not elevated in all HCC cases, so we incorporate a second blood-based biomarker, des-γ-carboxy prothrombin (DCP), which has shown potential as a screening marker for HCC.
Many studies of biomedical time series signals aim to measure the association between frequency-domain properties of time series and clinical and behavioral covariates. However, the time-varying dynamics of these associations are largely ignored due to a lack of methods that can assess the changing nature of the relationship through time. This article introduces a method for the simultaneous and automatic analysis of the association between the time-varying power spectrum and covariates, which we refer to as conditional adaptive Bayesian spectrum analysis (CABS).