584 results match your criteria Biometrika[Journal]


Model-free approach to quantifying the proportion of treatment effect explained by a surrogate marker.

Biometrika 2020 Mar 24;107(1):107-122. Epub 2019 Dec 24.

Department of Biostatistics, Harvard University, 655 Huntington Avenue, Boston, Massachusetts 02115, U.S.A.

In randomized clinical trials, the primary outcome, Y, often requires long-term follow-up and/or is costly to measure. For such settings, it is desirable to use a surrogate marker, S, to infer the treatment effect on Y, Δ. Identifying such an S and quantifying the proportion of treatment effect on Y explained by the effect on S are thus of great importance.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz065
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7315285

Consistency for the tree bootstrap in respondent-driven sampling.

Biometrika 2020 Jun 24;107(2):497-504. Epub 2020 Jan 24.

Department of Statistics, University of Washington, Seattle, Washington 98195-4322, USA.

Respondent-driven sampling is an approach for estimating features of populations that are difficult to access using standard survey tools, e.g., the fraction of injection drug users who are HIV positive.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz067
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7228542

Ensemble estimation and variable selection with semiparametric regression models.

Biometrika 2020 Jun 15;107(2):433-448. Epub 2020 Apr 15.

Department of Biostatistics, CB# 7420, University of North Carolina, Chapel Hill, North Carolina 27599, U.S.A.

We consider scenarios in which the likelihood function for a semiparametric regression model factors into separate components, with an efficient estimator of the regression parameter available for each component. An optimal weighted combination of the component estimators, named an ensemble estimator, may be employed as an overall estimate of the regression parameter, and may be fully efficient under uncorrelatedness conditions. This approach is useful when the full likelihood function may be difficult to maximize, but the components are easy to maximize.
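
As a rough illustration of the weighted combination described above (not the authors' estimator, whose weights this excerpt does not give), the sketch below assumes a scalar regression parameter and mutually uncorrelated, unbiased component estimators. Under those assumptions the minimum-variance linear combination is the classical inverse-variance weighting. The function name `ensemble_estimate` and its arguments are hypothetical.

```python
def ensemble_estimate(estimates, variances):
    """Combine uncorrelated component estimates by inverse-variance weighting.

    Assumption (not from the source): scalar parameter, unbiased and
    mutually uncorrelated component estimators with known variances.
    Returns the combined estimate and its variance.
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one positive variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    # Each weight is proportional to the precision of its component.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    combined = sum(p * e for p, e in zip(precisions, estimates)) / total_precision
    # The variance of the combination is the inverse of the summed precisions.
    return combined, 1.0 / total_precision
```

With equal variances the combination reduces to the plain average of the component estimates; a component with a larger variance is down-weighted accordingly.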

Source
DOI: http://dx.doi.org/10.1093/biomet/asaa012
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7228544

Adaptive nonparametric regression with the K-nearest neighbour fused lasso.

Biometrika 2020 Jun 29;107(2):293-310. Epub 2020 Jan 29.

Department of Statistics, University of Washington, Seattle, Washington, U.S.A.

The fused lasso, also known as total-variation denoising, is a locally adaptive function estimator over a regular grid of design points. In this article, we extend the fused lasso to settings in which the points do not occur on a regular grid, leading to a method for nonparametric regression. This approach, which we call the K-nearest-neighbours fused lasso, involves computing the K-nearest-neighbours graph of the design points and then performing the fused lasso over this graph.
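
The two steps named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it builds an undirected nearest-neighbour graph with Euclidean distances and evaluates the standard graph fused lasso objective, 0.5 * ||y - theta||^2 + lam * (sum of |theta_i - theta_j| over edges). The function names, the symmetrization rule (an edge is kept if either endpoint selects the other), and the scaling of the objective are assumptions; minimizing the objective would need a separate convex solver.

```python
import numpy as np

def knn_edges(X, k):
    """Undirected K-NN graph as a sorted list of (i, j) pairs with i < j."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between design points.
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    edges = set()
    for i in range(n):
        # Stable sort so that ties are broken by index order.
        for j in np.argsort(d2[i], kind="stable")[:k]:
            j = int(j)
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

def fused_lasso_objective(theta, y, edges, lam):
    """Squared-error fit plus lam times the total variation over the graph."""
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = 0.5 * np.sum((y - theta) ** 2)
    penalty = sum(abs(theta[i] - theta[j]) for i, j in edges)
    return float(fit + lam * penalty)
```

The penalty charges only for differences between fitted values at neighbouring design points, which is what lets the estimate be piecewise constant over regions of the graph.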

Source
DOI: http://dx.doi.org/10.1093/biomet/asz071
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7228543

Bayesian constraint relaxation.

Biometrika 2020 Mar 24;107(1):191-204. Epub 2019 Dec 24.

Department of Statistics, University of California, Los Angeles, 8125 Math Sciences Building, Los Angeles, California 90095, U.S.A.

Prior information often takes the form of parameter constraints. Bayesian methods include such information through prior distributions having constrained support. By using posterior sampling algorithms, one can quantify uncertainty without relying on asymptotic approximations.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz069
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7017863

Integrative linear discriminant analysis with guaranteed error rate improvement.

Authors:
Quefeng Li, Lexin Li

Biometrika 2018 Dec 22;105(4):917-930. Epub 2018 Oct 22.

Division of Biostatistics, University of California at Berkeley, 50 University Hall 7360, Berkeley, California 94720, U.S.A.

Multiple types of data measured on a common set of subjects arise in many areas. Numerous empirical studies have found that integrative analysis of such data can result in better statistical performance in terms of prediction and feature selection. However, the advantages of integrative analysis have mostly been demonstrated empirically.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy047
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6874859
December 2018

On nonparametric maximum likelihood estimation with double truncation.

Authors:
J Xiao, M G Hudgens

Biometrika 2019 Dec 23;106(4):989-996. Epub 2019 Jul 23.

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.

Doubly truncated survival data arise if failure times are observed only within certain time intervals. The nonparametric maximum likelihood estimator is widely used to estimate the underlying failure time distribution. Using a directed graph representation of the data suggested by Vardi (1985), a certain graphical condition holds if and only if the nonparametric maximum likelihood estimate exists and is unique.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz038
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6845852
December 2019

Accounting for unobserved covariates with varying degrees of estimability in high-dimensional biological data.

Biometrika 2019 Dec 16;106(4):823-840. Epub 2019 Sep 16.

Department of Statistics, University of Chicago, 5747 S. Ellis Avenue, Chicago, Illinois, U.S.A.

An important phenomenon in high-throughput biological data is the presence of unobserved covariates that can have a significant impact on the measured response. When these covariates are also correlated with the covariate of interest, ignoring or improperly estimating them can lead to inaccurate estimates of and spurious inference on the corresponding coefficients of interest in a multivariate linear model. We first prove that existing methods to account for these unobserved covariates often inflate Type I error for the null hypothesis that a given coefficient of interest is zero.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz037
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6845853
December 2019

Nonidentifiability in the presence of factorization for truncated data.

Biometrika 2019 Sep 13;106(3):724-731. Epub 2019 May 13.

Department of Biostatistics and Epidemiology, University of Massachusetts, 715 N. Pleasant Street, Amherst, Massachusetts 01003, USA.

A time to event, X, is left-truncated by T if X can be observed only if T < X. This often results in oversampling of large values of X, and necessitates adjustment of estimation procedures to avoid bias. Simple risk-set adjustments can be made to standard risk-set-based estimators to accommodate left truncation when T and X are quasi-independent.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz023
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6690171
September 2019

Optimal designs for frequentist model averaging.

Biometrika 2019 Sep 13;106(3):665-682. Epub 2019 Jul 13.

Fakultät für Mathematik, Ruhr-Universität Bochum, Bochum, Germany.

We consider the problem of designing experiments for estimating a target parameter in regression analysis when there is uncertainty about the parametric form of the regression function. A new optimality criterion is proposed that chooses the experimental design to minimize the asymptotic mean squared error of the frequentist model averaging estimate. Necessary conditions for the optimal solution of a locally and Bayesian optimal design problem are established.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz036
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6690170
September 2019

Statistical inference of genetic pathway analysis in high dimensions.

Biometrika 2019 Sep 13;106(3):651. Epub 2019 Jul 13.

Public Health Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, Washington, U.S.A.

Genetic pathway analysis has become an important tool for investigating the association between a group of genetic variants and traits. With dense genotyping and extensive imputation, the number of genetic variants in biological pathways has increased considerably and sometimes exceeds the sample size n. Conducting genetic pathway analysis and statistical inference in such settings is challenging.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz033
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6690174
September 2019

Homogeneity tests of covariance matrices with high-dimensional longitudinal data.

Biometrika 2019 Sep 24;106(3):619-634. Epub 2019 May 24.

Department of Statistics and Probability, Michigan State University, 619 Red Cedar Road, East Lansing, Michigan 48824, USA.

This paper deals with the detection and identification of changepoints among covariances of high-dimensional longitudinal data, where the number of features is greater than both the sample size and the number of repeated measurements. The proposed methods are applicable under general temporal-spatial dependence. A new test statistic is introduced for changepoint detection, and its asymptotic distribution is established.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz011
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6690172
September 2019

Generalized meta-analysis for multiple regression models across studies with disparate covariate information.

Biometrika 2019 Sep 13;106(3):567-585. Epub 2019 Jul 13.

Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 N. Wolfe Street, Baltimore, Maryland, U.S.A.

Meta-analysis is widely popular for synthesizing information on common parameters of interest across multiple studies because of its logistical convenience and statistical efficiency. We develop a generalized meta-analysis approach to combining information on multivariate regression parameters across multiple studies that have varying levels of covariate information. Using algebraic relationships among regression parameters in different dimensions, we specify a set of moment equations for estimating parameters of a maximal model through information available from sets of parameter estimates for a series of reduced models from the different studies.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz030
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6690173
September 2019

Nonparametric regression with adaptive truncation via a convex hierarchical penalty.

Biometrika 2019 Mar 13;106(1):87-107. Epub 2018 Dec 13.

Department of Biostatistics, University of Washington, 1705 NE Pacific Street, Seattle, Washington, USA.

We consider the problem of nonparametric regression with a potentially large number of covariates. We propose a convex, penalized estimation framework that is particularly well suited to high-dimensional sparse additive models and combines the appealing features of finite basis representation and smoothing penalties. In the case of additive models, a finite basis representation provides a parsimonious representation for fitted functions but is not adaptive when component functions possess different levels of complexity.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy056
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6691776

Identifiability and estimation of structural vector autoregressive models for subsampled and mixed-frequency time series.

Biometrika 2019 Jun 8;106(2):433-452. Epub 2019 Apr 8.

Department of Biostatistics, University of Washington, Box 357232, Seattle, Washington 98195, USA.

Causal inference in multivariate time series is challenging because the sampling rate may not be as fast as the time scale of the causal interactions, so the observed series is a subsampled version of the desired series. Furthermore, series may be observed at different sampling rates, yielding mixed-frequency series. To determine instantaneous and lagged effects between series at the causal scale, we take a model-based approach that relies on structural vector autoregressive models.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz007
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6508036
June 2019

Sufficient direction factor model and its application to gene expression quantitative trait loci discovery.

Authors:
F Jiang, Y Ma, Y Wei

Biometrika 2019 Jun 22;106(2):417-432. Epub 2019 Apr 22.

Department of Biostatistics, Columbia University, 722 West 168th St, New York, New York 10032, USA.

Rapid improvement in technology has made it relatively cheap to collect genetic data; however, statistical analysis of existing data is still much cheaper. Thus, secondary analysis of single-nucleotide polymorphism (SNP) data, i.e. ...

Source
DOI: http://dx.doi.org/10.1093/biomet/asz010
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6508038
June 2019

Differential Markov random field analysis with an application to detecting differential microbial community networks.

Authors:
T T Cai, H Li, J Ma, Y Xia

Biometrika 2019 Jun 22;106(2):401-416. Epub 2019 Apr 22.

Department of Statistics, School of Management, Fudan University, Shanghai 200433, China.

Micro-organisms such as bacteria form complex ecological community networks that can be greatly influenced by diet and other environmental factors. Differential analysis of microbial community structures aims to elucidate systematic changes during an adaptive response to changes in environment. In this paper, we propose a flexible Markov random field model for microbial network structure and introduce a hypothesis testing framework for detecting differences between networks, also known as differential network analysis.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz012
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6508037
June 2019

Pseudo-population bootstrap methods for imputed survey data.

Biometrika 2019 Jun 3;106(2):369-384. Epub 2019 Apr 3.

Department of Mathematics and Statistics, University of Winnipeg, 515 Portage Avenue, Winnipeg, Manitoba R3B 2E9, Canada.

The most common way to treat item nonresponse in surveys is to replace a missing value by a plausible value constructed on the basis of fully observed variables. Treating the imputed values as if they were observed may lead to invalid inferences. Bootstrap variance estimators for various finite population parameters are obtained using two pseudo-population bootstrap schemes.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz001
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6508281
June 2019

Spectral density estimation for random fields via periodic embeddings.

Authors:
Joseph Guinness

Biometrika 2019 Jun 3;106(2):267-286. Epub 2019 Apr 3.

Department of Statistical Science, Cornell University, 1178 Comstock Hall, Ithaca, New York 14853, U.S.A.

We introduce methods for estimating the spectral density of a random field on a d-dimensional lattice from incomplete gridded data. Data are iteratively imputed onto an expanded lattice according to a model with a periodic covariance function. The imputations are convenient computationally, in that circulant embedding and preconditioned conjugate gradient methods can produce imputations in O(n log n) time and O(n) memory.

Source
DOI: http://dx.doi.org/10.1093/biomet/asz004
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6508039

Covariate association eliminating weights: a unified weighting framework for causal effect estimation.

Authors:
Sean Yiu, Li Su

Biometrika 2018 Sep;105(3):709-722

Medical Research Council Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Robinson Way, Cambridge CB2 0SR, U.K.

Weighting methods offer an approach to estimating causal treatment effects in observational studies. However, if weights are estimated by maximum likelihood, misspecification of the treatment assignment model can lead to weighted estimators with substantial bias and variance. In this paper, we propose a unified framework for constructing weights such that a set of measured pretreatment covariates is unassociated with treatment assignment after weighting.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy015
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6481550
September 2018

Counting process-based dimension reduction methods for censored outcomes.

Biometrika 2019 Mar 7;106(1):181-196. Epub 2019 Jan 7.

Department of Biostatistics, University of North Carolina at Chapel Hill, 3101 McGavran-Greenberg Hall, Chapel Hill, North Carolina, USA.

We propose counting process-based dimension reduction methods for right-censored survival data. Semiparametric estimating equations are constructed to estimate the dimension reduction subspace for the failure time model. Our methods address two limitations of existing approaches.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy064
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373420
March 2019

Constrained likelihood for reconstructing a directed acyclic Gaussian graph.

Biometrika 2019 Mar 13;106(1):109-125. Epub 2018 Dec 13.

Department of Industrial and Systems Engineering, University of Minnesota, 111 Church St S.E., Minneapolis, Minnesota, U.S.A.

Directed acyclic graphs are widely used to describe directional pairwise relations. Such relations are estimated by reconstructing a directed acyclic graph's structure, which is challenging when the ordering of nodes of the graph is unknown. In such a situation, existing methods such as the neighbourhood and search-and-score methods have high estimation errors or computational complexities, especially when a local or sequential approach is used to enumerate edge directions by testing or optimizing a criterion locally, as a local method may break down even for moderately sized graphs.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy057
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373419
March 2019

Discussion of 'Gene hunting with hidden Markov model knockoffs'.

Biometrika 2019 Mar 13;106(1):23-26. Epub 2019 Feb 13.

Departments of Statistics and Biostatistics, University of Washington, Seattle, Washington, U.S.A.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy061
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373413
March 2019

Gene hunting with hidden Markov model knockoffs.

Biometrika 2019 Mar 4;106(1):1-18. Epub 2018 Aug 4.

Department of Statistics, Stanford University, 390 Serra Mall, Stanford, California, USA.

Modern scientific studies often require the identification of a subset of explanatory variables. Several statistical methods have been developed to automate this task, and the framework of knockoffs has been proposed as a general solution for variable selection under rigorous Type I error control, without relying on strong modelling assumptions. In this paper, we extend the methodology of knockoffs to problems where the distribution of the covariates can be described by a hidden Markov model.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy033
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373422
March 2019

Targeted learning ensembles for optimal individualized treatment rules with time-to-event outcomes.

Biometrika 2018 Sep 7;105(3):723-738. Epub 2018 May 7.

Division of Biostatistics, Weill Cornell Medicine, 402 East 67th Street, New York, New York, U.S.A.

We consider estimation of an optimal individualized treatment rule when a high-dimensional vector of baseline variables is available. Our optimality criterion is with respect to delaying the expected time to occurrence of an event of interest. We use semiparametric efficiency theory to construct estimators with properties such as double robustness.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy017
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6374011
September 2018

Optimal pseudolikelihood estimation in the analysis of multivariate missing data with nonignorable nonresponse.

Biometrika 2018 Jun 28;105(2):479-486. Epub 2018 Feb 28.

Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, U.S.A.

Tang et al. (2003) considered a regression model with missing response, where the missingness mechanism depends on the value of the response variable and hence is nonignorable. They proposed three pseudolikelihood estimators, based on different treatments of the probability distribution of the completely observed covariates.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy007
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373018
June 2018

Joint testing and false discovery rate control in high-dimensional multivariate regression.

Biometrika 2018 Jun 16;105(2):249-269. Epub 2018 Feb 16.

Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, U.S.A.

Multivariate regression with high-dimensional covariates has many applications in genomic and genetic research, in which some covariates are expected to be associated with multiple responses. This paper considers joint testing for regression coefficients over multiple responses and develops simultaneous testing methods with false discovery rate control. The test statistic is based on inverse regression and bias-corrected group lasso estimates of the regression coefficients and is shown to have an asymptotic chi-squared null distribution.

Source
DOI: http://dx.doi.org/10.1093/biomet/asx085
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6374004
June 2018

Scalar-on-Image Regression via the Soft-Thresholded Gaussian Process.

Biometrika 2018 Mar 19;105(1):165-184. Epub 2018 Jan 19.

Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, U.S.A.

This work concerns spatial variable selection for scalar-on-image regression. We propose a new class of Bayesian nonparametric models and develop an efficient posterior computational algorithm. The proposed soft-thresholded Gaussian process provides large prior support over the class of piecewise-smooth, sparse, and continuous spatially-varying regression coefficient functions.

Source
DOI: http://dx.doi.org/10.1093/biomet/asx075
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6345249
March 2018

The Change-Plane Cox Model.

Biometrika 2018 Dec 17;105(4):891-903. Epub 2018 Oct 17.

Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599, U.S.A.

We propose a projection pursuit technique in survival analysis for finding lower-dimensional projections that exhibit differentiated survival outcome. This idea is formally introduced as the change-plane Cox model, a non-regular Cox model with a change-plane in the covariate space dividing the population into two subgroups whose hazards are proportional. The proposed technique offers a potential framework for principled subgroup discovery.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy050
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6289527
December 2018

Robust estimation of high-dimensional covariance and precision matrices.

Biometrika 2018 Jun 27;105(2):271-284. Epub 2018 Mar 27.

Department of Biostatistics, University of North Carolina at Chapel Hill, 3105D McGavran-Greenberg Hall, Chapel Hill, North Carolina 27599, U.S.A.

High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a sub-Gaussianity assumption. This paper presents robust matrix estimators whose performance is guaranteed for a much richer class of distributions.

Source
Publisher Site: https://academic.oup.com/biomet/article/105/2/271/4955410
DOI: http://dx.doi.org/10.1093/biomet/asy011
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6188670
June 2018

Sequential rerandomization.

Biometrika 2018 Sep 24;105(3):745-752. Epub 2018 Jun 24.

Department of Statistics, University of Wisconsin-Madison, 1300 University Ave., Madison, Wisconsin 53706, U.S.A.

The seminal work of Morgan & Rubin (2012) considers rerandomization for all the units at one time. In practice, however, experimenters may have to rerandomize units sequentially. For example, a clinician studying a rare disease may be unable to wait to perform an experiment until all the experimental units are recruited.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy031
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6109990
September 2018

Theoretical limits of microclustering for record linkage.

Biometrika 2018 Jun 19;105(2):431-446. Epub 2018 Mar 19.

Department of Statistical Science, Duke University, Box 90251, Durham, North Carolina 27708, U.S.A.

There has been substantial recent interest in record linkage, where one attempts to group the records pertaining to the same entities from one or more large databases that lack unique identifiers. This can be viewed as a type of microclustering, with few observations per cluster and a very large number of clusters. We show that the problem is fundamentally hard from a theoretical perspective and, even in idealized cases, accurate entity resolution is effectively impossible unless the number of entities is small relative to the number of records and/or the separation between records from different entities is extremely large.

Source
DOI: http://dx.doi.org/10.1093/biomet/asy003
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5963577
June 2018

Kernel-based covariate functional balancing for observational studies.

Biometrika 2018 Mar 8;105(1):199-213. Epub 2017 Dec 8.

Department of Biostatistics, University of Washington, 1959 NE Pacific St., Seattle, Washington 98195, U.S.A.

Covariate balance is often advocated for objective causal inference since it mimics randomization in observational data. Unlike methods that balance specific moments of covariates, our proposal attains uniform approximate balance for covariate functions in a reproducing-kernel Hilbert space. The corresponding infinite-dimensional optimization problem is shown to have a finite-dimensional representation in terms of an eigenvalue optimization problem.

Source
DOI: http://dx.doi.org/10.1093/biomet/asx069
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5976457
March 2018

Bayesian block-diagonal variable selection and model averaging.

Biometrika 2017 Jun 24;104(2):343-359. Epub 2017 Apr 24.

Department of Economics and Business, Universitat Pompeu Fabra, Ramon Trias Fargas 25-27, Barcelona 08005, Spain.

We propose a scalable algorithmic framework for exact Bayesian variable selection and model averaging in linear models under the assumption that the Gram matrix is block-diagonal, and as a heuristic for exploring the model space for general designs. In block-diagonal designs our approach returns the most probable model of any given size without resorting to numerical integration. The algorithm also provides a novel and efficient solution to the frequentist best subset selection problem for block-diagonal designs.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5975653
June 2017

Partial likelihood estimation of isotonic proportional hazards models.

Biometrika 2018 Mar 5;105(1):133-148. Epub 2017 Dec 5.

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7420, U.S.A.

We consider the estimation of the semiparametric proportional hazards model with an unspecified baseline hazard function where the effect of a continuous covariate is assumed to be monotone. Previous work on nonparametric maximum likelihood estimation for isotonic proportional hazard regression with right-censored data is computationally intensive, lacks theoretical justification, and may be prohibitive in large samples. In this paper, partial likelihood estimation is studied.

Source
DOI: http://dx.doi.org/10.1093/biomet/asx064
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5969539
March 2018

Partition-based ultrahigh-dimensional variable screening.

Biometrika 2017 Nov 9;104(4):785-800. Epub 2017 Oct 9.

Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109, U.S.A.

Traditional variable selection methods are compromised by overlooking useful information on covariates with similar functionality or spatial proximity, and by treating each covariate independently. Leveraging prior grouping information on covariates, we propose partition-based screening methods for ultrahigh-dimensional variables in the framework of generalized linear models. We show that partition-based screening exhibits the sure screening property with a vanishing false selection rate, and we propose a data-driven partition screening framework with unavailable or unreliable prior knowledge on covariate grouping and investigate its theoretical properties.

Source
DOI: http://dx.doi.org/10.1093/biomet/asx052
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5890472
November 2017

On falsification of the binary instrumental variable model.

Biometrika 2017 Mar 23;104(1):229-236. Epub 2017 Jan 23.

Department of Statistics, University of Washington, Box 354322, Washington 98195

Instrumental variables are widely used for estimating causal effects in the presence of unmeasured confounding. The discrete instrumental variable model has testable implications for the law of the observed data. However, current assessments of instrumental validity are typically based solely on subject-matter arguments rather than these testable implications, partly due to a lack of formal statistical tests with known properties.

Source
DOI: http://dx.doi.org/10.1093/biomet/asw064
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5819759
March 2017
2 Reads

Optimal designs for active controlled dose-finding trials with efficacy-toxicity outcomes.

Biometrika 2017 Dec 9;104(4):1003-1010. Epub 2017 Oct 9.

Statistical Methodology, Novartis Pharma AG, 4002 Basel.

We derive optimal designs to estimate efficacy and toxicity in active controlled dose-finding trials when the bivariate continuous outcomes are described using nonlinear regression models. We determine upper bounds on the required number of different doses and provide conditions under which the boundary points of the design space are included in the optimal design. We provide an analytical description of minimally supported optimal designs and show that they do not depend on the correlation between the bivariate outcomes. Read More

Source
DOI Listing: http://dx.doi.org/10.1093/biomet/asx057
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793717
December 2017
1 Read

On two-stage estimation of structural instrumental variable models.

Biometrika 2017 Dec 26;104(4):881-899. Epub 2017 Oct 26.

Department of Epidemiology, University of North Carolina, 2105F McGavran-Greenberg Hall, Chapel Hill, North Carolina 27599.

Two-stage least squares estimation is popular for structural equation models with unmeasured confounders. In such models, both the outcome and the exposure are assumed to follow linear models conditional on the measured confounders and instrumental variable, which is related to the outcome only via its relation with the exposure. We consider data where both the outcome and the exposure may be incompletely observed, with particular attention to the case where both are censored event times. Read More

Source
DOI Listing: http://dx.doi.org/10.1093/biomet/asx056
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793491
December 2017
1 Read

Doubly robust nonparametric inference on the average treatment effect.

Biometrika 2017 Dec 16;104(4):863-880. Epub 2017 Oct 16.

Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, PO Box 19024, Seattle, Washington 98109.

Doubly robust estimators are widely used to draw inference about the average effect of a treatment. Such estimators are consistent for the effect of interest if either one of two nuisance parameters is consistently estimated. However, if flexible, data-adaptive estimators of these nuisance parameters are used, double robustness does not readily extend to inference. Read More

Source
Publisher Site: http://fdslive.oup.com/www.oup.com/pdf/production_in_progres
DOI Listing: http://dx.doi.org/10.1093/biomet/asx053
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793673
December 2017
1 Read

Projection correlation between two random vectors.

Biometrika 2017 Dec 4;104(4):829-843. Epub 2017 Sep 4.

Wang Yanan Institute for Studies in Economics, School of Economics, Xiamen University, Fujian 361005, China.

We propose the use of projection correlation to characterize dependence between two random vectors. Projection correlation has several appealing properties. It equals zero if and only if the two random vectors are independent, it is not sensitive to the dimensions of the two random vectors, it is invariant with respect to the group of orthogonal transformations, and its estimation is free of tuning parameters and does not require moment conditions on the random vectors. Read More

Source
DOI Listing: http://dx.doi.org/10.1093/biomet/asx043
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793497
December 2017
36 Reads

Distribution-free tests of independence in high dimensions.

Biometrika 2017 Dec 3;104(4):813-828. Epub 2017 Oct 3.

Department of Operations Research and Financial Engineering, Princeton University, Sherrerd Hall, Charlton Street, Princeton, New Jersey 08544.

We consider the testing of mutual independence among all entries in a [Formula: see text]-dimensional random vector based on [Formula: see text] independent observations. We study two families of distribution-free test statistics, which include Kendall's tau and Spearman's rho as important examples. We show that under the null hypothesis the test statistics of these two families converge weakly to Gumbel distributions, and we propose tests that control the Type I error in the high-dimensional setting where [Formula: see text]. Read More

Source
DOI Listing: http://dx.doi.org/10.1093/biomet/asx050
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793489
December 2017
22 Reads

Semiparametric analysis of complex polygenic gene-environment interactions in case-control studies.

Biometrika 2017 Dec 15;104(4):801-812. Epub 2017 Sep 15.

Department of Biostatistics, Johns Hopkins University, 615 N. Wolfe Street, Baltimore, Maryland 21205.

Many methods have recently been proposed for efficient analysis of case-control studies of gene-environment interactions using a retrospective likelihood framework that exploits the natural assumption of gene-environment independence in the underlying population. However, for polygenic modelling of gene-environment interactions, which is a topic of increasing scientific interest, applications of retrospective methods have been limited due to a requirement in the literature for parametric modelling of the distribution of the genetic factors. We propose a general, computationally simple, semiparametric method for analysis of case-control studies that allows exploitation of the assumption of gene-environment independence without any further parametric modelling assumptions about the marginal distributions of any of the two sets of factors. Read More

Source
DOI Listing: http://dx.doi.org/10.1093/biomet/asx045
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793684
December 2017
8 Reads

Expandable factor analysis.

Biometrika 2017 Sep 16;104(3):649-663. Epub 2017 Jun 16.

Department of Statistical Science, Duke University, Box 90251, Durham, North Carolina 27708.

Bayesian sparse factor models have proven useful for characterizing dependence in multivariate data, but scaling computation to large numbers of samples and dimensions is problematic. We propose expandable factor analysis for scalable inference in factor models when the number of factors is unknown. The method relies on a continuous shrinkage prior for efficient maximum a posteriori estimation of a low-rank and sparse loadings matrix. Read More

Source
DOI Listing: http://dx.doi.org/10.1093/biomet/asx030
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793687
September 2017
5 Reads

Robust reduced-rank regression.

Authors:
Y She, K Chen

Biometrika 2017 Sep 12;104(3):633-647. Epub 2017 Jul 12.

Department of Statistics, University of Connecticut, 215 Glenbrook Road U-4120, Storrs, Connecticut 06269.

In high-dimensional multivariate regression problems, enforcing low rank in the coefficient matrix offers effective dimension reduction, which greatly facilitates parameter estimation and model interpretation. However, commonly used reduced-rank methods are sensitive to data corruption, as the low-rank dependence structure between response variables and predictors is easily distorted by outliers. We propose a robust reduced-rank regression approach for joint modelling and outlier detection. Read More

Source
DOI Listing: http://dx.doi.org/10.1093/biomet/asx032
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793675
September 2017
3 Reads

Identification and estimation of causal effects with outcomes truncated by death.

Biometrika 2017 Sep 11;104(3):597-612. Epub 2017 Jul 11.

Department of Statistics, University of Washington, Seattle, Washington 98195.

It is common in medical studies that the outcome of interest is truncated by death, meaning that a subject has died before the outcome could be measured. In this case, restricted analysis among survivors may be subject to selection bias. Hence, it is of interest to estimate the survivor average causal effect, defined as the average causal effect among the subgroup consisting of subjects who would survive under either exposure. Read More

Source
DOI Listing: http://dx.doi.org/10.1093/biomet/asx034
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793679
September 2017
2 Reads

Joint sufficient dimension reduction and estimation of conditional and average treatment effects.

Biometrika 2017 Sep 19;104(3):583-596. Epub 2017 May 19.

Department of Biostatistics, University of Washington, Seattle, Washington 98105.

The estimation of treatment effects based on observational data usually involves multiple confounders, and dimension reduction is often desirable and sometimes inevitable. We first clarify the definition of a central subspace that is relevant for the efficient estimation of average treatment effects. A criterion is then proposed to simultaneously estimate the structural dimension, the basis matrix of the joint central subspace, and the optimal bandwidth for estimating the conditional treatment effects. Read More

Source
Publisher Site: https://academic.oup.com/biomet/article/104/3/583/3836906
DOI Listing: http://dx.doi.org/10.1093/biomet/asx028
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793490
September 2017
2 Reads

Multiple robustness in factorized likelihood models.

Biometrika 2017 Sep 15;104(3):561-581. Epub 2017 Jun 15.

Department of Epidemiology, Harvard T. H. Chan School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115.

We consider inference under a nonparametric or semiparametric model with likelihood that factorizes as the product of two or more variation-independent factors. We are interested in a finite-dimensional parameter that depends on only one of the likelihood factors and whose estimation requires the auxiliary estimation of one or several nuisance functions. We investigate general structures conducive to the construction of so-called multiply robust estimating functions, whose computation requires postulating several dimension-reducing models but which have mean zero at the true parameter value provided one of these models is correct. Read More

Source
DOI Listing: http://dx.doi.org/10.1093/biomet/asx027
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793686
September 2017
1 Read

Covariate-assisted spectral clustering.

Biometrika 2017 Jun 19;104(2):361-377. Epub 2017 Mar 19.

Department of Statistics, University of Wisconsin, 1300 University Avenue, Madison, Wisconsin 53706.

Biological and social systems consist of myriad interacting units. The interactions can be represented in the form of a graph or network. Measurements of these graphs can reveal the underlying structure of these interactions, which provides insight into the systems that generated the graphs. Read More

Source
DOI Listing: http://dx.doi.org/10.1093/biomet/asx008
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5793492
June 2017
1 Read

An improved and explicit surrogate variable analysis procedure by coefficient adjustment.

Biometrika 2017 Jun 21;104(2):303-316. Epub 2017 Apr 21.

Department of Biostatistics, University of Florida, 2004 Mowry Rd, Gainesville, Florida 32611.

Unobserved environmental, demographic and technical factors can adversely affect the estimation and testing of the effects of primary variables. Surrogate variable analysis, proposed to tackle this problem, has been widely used in genomic studies. To estimate hidden factors that are correlated with the primary variables, surrogate variable analysis performs principal component analysis either on a subset of features or on all features, but weighting each differently. Read More

Source
DOI Listing: http://dx.doi.org/10.1093/biomet/asx018
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5627626
June 2017
3 Reads