129 results match your criteria: Annals of Statistics [Journal]

ENTRYWISE EIGENVECTOR ANALYSIS OF RANDOM MATRICES WITH LOW EXPECTED RANK.

Ann Stat 2020 Jun 17;48(3):1452-1474. Epub 2020 Jul 17.

Department of ORFE, Princeton University, Princeton, NJ 08544, USA.

Recovering low-rank structures via eigenvector perturbation analysis is a common problem in statistical machine learning, arising in factor analysis, community detection, ranking and matrix completion, among others. While a large variety of bounds are available for average errors between empirical and population statistics of eigenvectors, few results are tight for entrywise analyses, which are critical for a number of problems such as community detection. This paper investigates the entrywise behavior of eigenvectors for a large class of random matrices whose expectations are low-rank, which helps settle the conjecture in Abbe et al. Read More
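
As a quick numerical illustration of the distinction drawn here (not the paper's method), the sketch below compares the average (l2) and entrywise (l_inf) errors of the leading empirical eigenvector for a rank-one expectation plus symmetric noise; all sizes and constants are illustrative.

    # Sketch: l2 vs entrywise (l_inf) error of the leading eigenvector of a
    # random matrix with a rank-one expectation.  Sizes and constants are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    u = np.ones(n) / np.sqrt(n)                    # population eigenvector
    A_mean = 30.0 * np.outer(u, u)                 # rank-one expectation

    noise = rng.normal(scale=1 / np.sqrt(n), size=(n, n))
    A = A_mean + (noise + noise.T) / np.sqrt(2)    # symmetric random perturbation

    u_hat = np.linalg.eigh(A)[1][:, -1]            # leading empirical eigenvector
    u_hat *= np.sign(u_hat @ u)                    # resolve the sign ambiguity

    print("l2 error    :", np.linalg.norm(u_hat - u))
    print("l_inf error :", np.max(np.abs(u_hat - u)))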

CONSISTENT SELECTION OF THE NUMBER OF CHANGE-POINTS VIA SAMPLE-SPLITTING.

Ann Stat 2020 Feb 17;48(1):413-439. Epub 2020 Feb 17.

Department of Statistics, and The Methodology Center, The Pennsylvania State University, University Park, PA 16802-2111, USA

In multiple change-point analysis, one of the major challenges is to estimate the number of change-points. Most existing approaches attempt to minimize a Schwarz information criterion which balances a term quantifying model fit against a penalization term that accounts for model complexity, increases with the number of change-points and limits overfitting. However, different penalization terms are required to adapt to different contexts of multiple change-point problems, and the optimal penalization magnitude usually varies with the model and error distribution. Read More

A UNIFIED STUDY OF NONPARAMETRIC INFERENCE FOR MONOTONE FUNCTIONS.

Ann Stat 2020 Apr 26;48(2):1001-1024. Epub 2020 May 26.

Department of Biostatistics, University of Washington.

The problem of nonparametric inference on a monotone function has been extensively studied in many particular cases. Estimators considered have often been of so-called Grenander type, being representable as the left derivative of the greatest convex minorant or least concave majorant of an estimator of a primitive function. In this paper, we provide general conditions for consistency and pointwise convergence in distribution of a class of generalized Grenander-type estimators of a monotone function. Read More
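
For the classical special case of a Grenander-type estimator (a non-increasing density, estimated by the left derivative of the least concave majorant of the empirical CDF), the fit can be computed with a weighted pool-adjacent-violators pass over the slopes of the empirical CDF. The sketch below is a generic textbook construction, not code from the paper.

    # Sketch: Grenander estimator of a non-increasing density via weighted PAVA.
    import numpy as np

    def pava_decreasing(y, w):
        """Weighted least-squares fit of a non-increasing sequence to y."""
        blocks = []                                  # each block: [value, weight, count]
        for yi, wi in zip(y, w):
            blocks.append([yi, wi, 1])
            while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
                v2, w2, c2 = blocks.pop()            # pool adjacent violators
                v1, w1, c1 = blocks.pop()
                wt = w1 + w2
                blocks.append([(w1 * v1 + w2 * v2) / wt, wt, c1 + c2])
        return np.concatenate([[v] * c for v, _, c in blocks])

    rng = np.random.default_rng(1)
    x = np.sort(rng.exponential(size=200))           # sample from a decreasing density
    gaps = np.diff(np.concatenate(([0.0], x)))       # spacings of the order statistics
    slopes = (1.0 / len(x)) / gaps                   # raw slopes of the empirical CDF
    f_hat = pava_decreasing(slopes, gaps)            # Grenander estimate on each block
    print(np.round(f_hat[:8], 3))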

HYPOTHESIS TESTING ON LINEAR STRUCTURES OF HIGH DIMENSIONAL COVARIANCE MATRIX.

Ann Stat 2019 Dec 31;47(6):3300-3334. Epub 2019 Oct 31.

Department of Statistics, and The Methodology Center, The Pennsylvania State University, University Park, PA 16802-2111, USA.

This paper is concerned with tests of significance for high-dimensional covariance structures, and aims to develop a unified framework for testing commonly used linear covariance structures. We first construct a consistent estimator for the parameters involved in the linear covariance structure, and then develop two tests for linear covariance structures based on the entropy loss and the quadratic loss used for covariance matrix estimation. To study the asymptotic properties of the proposed tests, we study the related high-dimensional random matrix theory and establish several highly useful asymptotic results. Read More
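
For context, the two losses named in the abstract are standard in covariance estimation; in the usual notation (textbook definitions; the paper's normalizations may differ slightly):

    L_E(\widehat{\Sigma}, \Sigma) = \operatorname{tr}\big(\Sigma^{-1}\widehat{\Sigma}\big)
        - \log\det\big(\Sigma^{-1}\widehat{\Sigma}\big) - p   \quad \text{(entropy loss)}

    L_Q(\widehat{\Sigma}, \Sigma) = \operatorname{tr}\big[\big(\Sigma^{-1}\widehat{\Sigma} - I_p\big)^{2}\big]
        \quad \text{(quadratic loss)}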

Distributed estimation of principal eigenspaces.

Ann Stat 2019 Dec 31;47(6):3009-3031. Epub 2019 Oct 31.

Department of Operations Research and Financial Engineering, Princeton University.

Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts latent principal factors that account for the most variation in the data. When data are stored across multiple machines, however, communication cost can prohibit the computation of PCA in a central location, and distributed algorithms for PCA are thus needed. Read More
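
A minimal sketch of one natural aggregation scheme for distributed PCA (each machine computes its local top-d eigenvectors, the projection matrices are averaged, and the top-d eigenspace of the average is extracted); the sizes below are illustrative and this is only a sketch of the idea, not the paper's full procedure.

    # Sketch: distributed estimation of a top-d principal eigenspace by averaging
    # local projection matrices.  All sizes are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    p, d, machines, n_per = 50, 3, 10, 200

    U = np.linalg.qr(rng.normal(size=(p, d)))[0]        # true principal eigenspace
    Sigma = U @ np.diag([10.0, 8.0, 6.0]) @ U.T + np.eye(p)

    P_bar = np.zeros((p, p))
    for _ in range(machines):
        X = rng.multivariate_normal(np.zeros(p), Sigma, size=n_per)
        S_local = np.cov(X, rowvar=False)
        V = np.linalg.eigh(S_local)[1][:, -d:]          # local top-d eigenvectors
        P_bar += V @ V.T / machines                     # average the projection matrices

    V_agg = np.linalg.eigh(P_bar)[1][:, -d:]            # aggregated eigenspace
    err = np.linalg.norm(V_agg @ V_agg.T - U @ U.T) / np.sqrt(2)   # sin-theta distance
    print("subspace estimation error:", round(err, 3))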

SPECTRAL METHOD AND REGULARIZED MLE ARE BOTH OPTIMAL FOR TOP-K RANKING.

Ann Stat 2019 Aug 21;47(4):2204-2235. Epub 2019 May 21.

Department of Operations Research & Financial Engineering, Princeton University, Princeton, New Jersey 08544.

This paper is concerned with the problem of top-K ranking from pairwise comparisons. Given a collection of n items and a few pairwise comparisons across them, one wishes to identify the set of K items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model, the Bradley-Terry-Luce model, where each item is assigned a latent preference score and the outcome of each pairwise comparison depends solely on the relative scores of the two items involved. Read More
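
To make the model concrete, here is a small simulation of Bradley-Terry-Luce comparisons followed by a spectral ranking step (a Rank Centrality-style stationary-distribution estimate). The complete comparison graph and all constants are illustrative choices, not the sampling scheme analyzed in the paper.

    # Sketch: BTL pairwise comparisons and a spectral (stationary-distribution) ranking.
    import numpy as np

    rng = np.random.default_rng(0)
    n, L, K = 30, 20, 5                         # items, comparisons per pair, top-K
    theta = rng.normal(size=n)                  # latent preference scores
    w = np.exp(theta)

    wins = np.zeros((n, n))                     # wins[i, j] = number of times j beats i
    for i in range(n):
        for j in range(i + 1, n):
            y = rng.binomial(L, w[j] / (w[i] + w[j]))
            wins[i, j], wins[j, i] = y, L - y

    P = wins / (L * n)                          # random walk that favors winners
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    pi = np.ones(n) / n
    for _ in range(2000):                       # power iteration for the stationary law
        pi = pi @ P

    print("estimated top-K:", sorted(np.argsort(pi)[-K:]))
    print("true top-K     :", sorted(np.argsort(theta)[-K:]))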

LINEAR HYPOTHESIS TESTING FOR HIGH DIMENSIONAL GENERALIZED LINEAR MODELS.

Ann Stat 2019 Oct 3;47(5):2671-2703. Epub 2019 Aug 3.

Department of Statistics, and The Methodology Center, the Pennsylvania State University, University Park, PA 16802-2111, USA.

This paper is concerned with testing linear hypotheses in high-dimensional generalized linear models. To deal with linear hypotheses, we first propose a constrained partial regularization method and study its statistical properties. We further introduce an algorithm for solving regularization problems with folded-concave penalty functions and linear constraints. Read More

EIGENVALUE DISTRIBUTIONS OF VARIANCE COMPONENTS ESTIMATORS IN HIGH-DIMENSIONAL RANDOM EFFECTS MODELS.

Ann Stat 2019 Oct 3;47(5):2855-2886. Epub 2019 Aug 3.

Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA 94305.

We study the spectra of MANOVA estimators for variance component covariance matrices in multivariate random effects models. When the dimensionality of the observations is large and comparable to the number of realizations of each random effect, we show that the empirical spectra of such estimators are well-approximated by deterministic laws. The Stieltjes transforms of these laws are characterized by systems of fixed-point equations, which are numerically solvable by a simple iterative procedure. Read More
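
To illustrate the kind of fixed-point computation referred to here, consider the classical single-component special case: the Marchenko-Pastur law, whose Stieltjes transform m(z) solves m = 1 / (sigma^2 (1 - gamma - gamma z m) - z). A damped iteration recovers m, and hence the limiting spectral density, numerically. This is a textbook example, not the MANOVA system derived in the paper.

    # Sketch: solve a Stieltjes-transform fixed-point equation by damped iteration
    # (Marchenko-Pastur special case) and read off the limiting spectral density.
    import numpy as np

    gamma, sigma2, eps = 0.5, 1.0, 1e-2          # aspect ratio, variance, Im part of z

    xs = np.linspace(0.01, 3.5, 200)
    density = []
    for x in xs:
        z = x + 1j * eps
        m = -1.0 / z
        for _ in range(2000):                    # damped fixed-point iteration
            m = 0.5 * m + 0.5 / (sigma2 * (1.0 - gamma - gamma * z * m) - z)
        density.append(m.imag / np.pi)           # density ~ Im m(x + i*eps) / pi

    # compare with the closed-form Marchenko-Pastur density in the bulk
    lo, hi = sigma2 * (1 - np.sqrt(gamma)) ** 2, sigma2 * (1 + np.sqrt(gamma)) ** 2
    mp = np.where((xs > lo) & (xs < hi),
                  np.sqrt(np.maximum((hi - xs) * (xs - lo), 0)) / (2 * np.pi * gamma * sigma2 * xs),
                  0.0)
    i = int(np.argmin(np.abs(xs - 1.0)))
    print("density at x = 1.0: iterated %.3f, closed form %.3f" % (density[i], mp[i]))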

TEST FOR HIGH DIMENSIONAL CORRELATION MATRICES.

Ann Stat 2019 Oct 3;47(5):2887-2921. Epub 2019 Aug 3.

Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Testing correlation structures has attracted extensive attention in the literature due to both its importance in real applications and several major theoretical challenges. The aim of this paper is to develop a general framework of testing correlation structures for the one-, two-, and multiple sample testing problems under a high-dimensional setting when both the sample size and data dimension go to infinity. Our test statistics are designed to deal with both the dense and sparse alternatives. Read More

A ROBUST AND EFFICIENT APPROACH TO CAUSAL INFERENCE BASED ON SPARSE SUFFICIENT DIMENSION REDUCTION.

Ann Stat 2019 Jun 13;47(3):1505-1535. Epub 2019 Feb 13.

Department of Statistics, Texas A&M University, College Station, Texas 77843, USA.

A fundamental assumption used in causal inference with observational data is that treatment assignment is ignorable given measured confounding variables. This assumption of no missing confounders is plausible if a large number of baseline covariates are included in the analysis, as we often have no prior knowledge of which variables can be important confounders. Thus, estimation of treatment effects with a large number of covariates has received considerable attention in recent years. Read More

NONPARAMETRIC TESTING FOR MULTIPLE SURVIVAL FUNCTIONS WITH NON-INFERIORITY MARGINS.

Ann Stat 2019 Feb;47(1):205-232. Epub 2018 Nov 30.

Department of Biostatistics, Columbia University, 722 West 168th Street, New York, NY 10032, U.S.A.

New nonparametric tests for the ordering of multiple survival functions are developed with the possibility of right censorship taken into account. The motivation comes from non-inferiority trials with multiple treatments. The proposed tests are based on nonparametric likelihood ratio statistics, which are known to provide more powerful tests than Wald-type procedures, but in this setting have only been studied for pairs of survival functions or in the absence of censoring. Read More

ON TESTING CONDITIONAL QUALITATIVE TREATMENT EFFECTS.

Ann Stat 2019 Aug 21;47(4):2348-2377. Epub 2019 May 21.

Department of Statistics, North Carolina State University, Raleigh, NC 27695.

Precision medicine is an emerging medical paradigm that focuses on finding the most effective treatment strategy tailored for individual patients. In the literature, most existing work has focused on estimating the optimal treatment regime. However, less attention has been devoted to hypothesis testing regarding the optimal treatment regime. Read More

UNIFORMLY VALID POST-REGULARIZATION CONFIDENCE REGIONS FOR MANY FUNCTIONAL PARAMETERS IN Z-ESTIMATION FRAMEWORK.

Ann Stat 2018 Dec 11;46(6B):3643-3675. Epub 2018 Sep 11.

Department of Biostatistics, Columbia University, 722 West 168th St, Rm 633, New York, New York 10032, USA.

In this paper, we develop procedures to construct simultaneous confidence bands for many potentially infinite-dimensional parameters after model selection for general moment condition models in which the number of parameters is potentially much larger than the sample size of the available data. This allows us to cover settings with functional response data where each of the parameters is a function. The procedure is based on the construction of score functions that approximately satisfy the Neyman orthogonality condition. Read More

FEATURE ELIMINATION IN KERNEL MACHINES IN MODERATELY HIGH DIMENSIONS.

Ann Stat 2019 Feb;47(1):497-526

The University of North Carolina at Chapel Hill.

We develop an approach for feature elimination in statistical learning with kernel machines, based on recursive elimination of features. We present theoretical properties of this method and show that it is uniformly consistent in finding the correct feature space under certain generalized assumptions. We present a few case studies to show that the assumptions are met in most practical situations and present simulation results to demonstrate performance of the proposed approach. Read More
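
A minimal backward-elimination sketch in the spirit of recursive feature elimination with a kernel machine: at each step, drop the feature whose removal hurts cross-validated accuracy the least. The elimination criterion, kernel and data below are illustrative choices, not necessarily those analyzed in the paper.

    # Sketch: recursive feature elimination with a kernel SVM; the CV-accuracy
    # criterion is a generic illustrative choice.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, p = 200, 8
    X = rng.normal(size=(n, p))
    y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n) > 0.5).astype(int)

    active = list(range(p))
    while len(active) > 2:
        scores = []
        for j in active:
            cols = [k for k in active if k != j]
            cv = cross_val_score(SVC(kernel="rbf"), X[:, cols], y, cv=5).mean()
            scores.append((cv, j))
        best_cv, drop = max(scores)              # removing `drop` hurts the least
        active.remove(drop)
        print("dropped feature %d (CV accuracy without it: %.3f)" % (drop, best_cv))
    print("retained features:", active)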

ESTIMATION OF A MONOTONE DENSITY IN S-SAMPLE BIASED SAMPLING MODELS.

Ann Stat 2018 Oct 17;46(5):2125-2152. Epub 2018 Aug 17.

Department of Statistics, Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR.

We study the nonparametric estimation of a decreasing density function in a general s-sample biased sampling model with weight (or bias) functions w_i for i = 1, …, s. The determination of the monotone maximum likelihood estimator and its asymptotic distribution, except for the case s = 1, has long been missing in the literature due to certain non-standard structures of the likelihood function, such as non-separability and a lack of strictly positive second-order derivatives of the negative log-likelihood. The existence, uniqueness, self-characterization and consistency of the estimator, and its asymptotic distribution at a fixed point, are established in this article. Read More

Consistency and convergence rate of phylogenetic inference via regularization.

Ann Stat 2018 Aug 27;46(4):1481-1512. Epub 2018 Jun 27.

Program in Computational Biology, Fred Hutchinson Cancer Research Center.

It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct "gene tree." Although the gene tree may deviate from the "species tree" due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. Read More

BALL DIVERGENCE: NONPARAMETRIC TWO SAMPLE TEST.

Ann Stat 2018 Jun;46(3):1109-1137

Sun Yat-sen University.

In this paper, we first introduce Ball Divergence, a novel measure of the difference between two probability measures in separable Banach spaces, and show that the Ball Divergence of two probability measures is zero if and only if the two measures are identical, without any moment assumption. Using Ball Divergence, we present a metric rank test procedure to detect the equality of the distributions underlying independent samples; it is therefore robust to outliers and heavy-tailed data. Read More
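
A rough sketch of a ball-divergence-style two-sample statistic with permutation calibration, as I read the construction: for every ball centered at a sample point with radius equal to its distance to another point of the same sample, compare the fractions of the two samples falling inside. Treat this as an approximation of the published definition rather than a faithful reimplementation.

    # Sketch: a ball-divergence-style two-sample permutation test (illustrative).
    import numpy as np

    def ball_div(X, Y):
        def term(centers, Z1, Z2):
            r = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
            d1 = np.linalg.norm(centers[:, None, :] - Z1[None, :, :], axis=-1)
            d2 = np.linalg.norm(centers[:, None, :] - Z2[None, :, :], axis=-1)
            A = (d1[:, None, :] <= r[:, :, None]).mean(axis=-1)   # mass of Z1 in ball (i, j)
            B = (d2[:, None, :] <= r[:, :, None]).mean(axis=-1)   # mass of Z2 in ball (i, j)
            return ((A - B) ** 2).mean()
        return term(X, X, Y) + term(Y, X, Y)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2))
    Y = rng.normal(loc=0.7, size=(40, 2))

    obs = ball_div(X, Y)
    pooled = np.vstack([X, Y])
    perm = []
    for _ in range(200):                          # permutation null distribution
        idx = rng.permutation(len(pooled))
        perm.append(ball_div(pooled[idx[:40]], pooled[idx[40:]]))
    print("p-value:", np.mean(np.array(perm) >= obs))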

HIGH DIMENSIONAL CENSORED QUANTILE REGRESSION.

Ann Stat 2018 Feb 22;46(1):308-343. Epub 2018 Feb 22.

Department of Statistics University of Michigan, Ann Arbor, MI 48109, USA.

Censored quantile regression (CQR) has emerged as a useful regression tool for survival analysis. Some commonly used CQR methods can be characterized by stochastic integral-based estimating equations in a sequential manner across quantile levels. In this paper, we analyze CQR in a high dimensional setting where the regression functions over a continuum of quantile levels are of interest. Read More

ASSESSING ROBUSTNESS OF CLASSIFICATION USING ANGULAR BREAKDOWN POINT.

Ann Stat 2018 Dec 11;46(6B):3362-3389. Epub 2018 Sep 11.

University of North Carolina at Chapel Hill, USA.

Robustness is a desirable property for many statistical techniques. As an important measure of robustness, breakdown point has been widely used for regression problems and many other settings. Despite the existing development, we observe that the standard breakdown point criterion is not directly applicable for many classification problems. Read More

Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model.

Ann Stat 2018 Aug 27;46(4):1742-1778. Epub 2018 Jun 27.

Department of Statistics, Stanford University.

We show that in a common high-dimensional covariance model, the choice of loss function has a profound effect on optimal estimation. In an asymptotic framework based on the Spiked Covariance model and the use of orthogonally invariant estimators, we show that optimal estimation of the population covariance matrix boils down to designing an optimal shrinker that acts elementwise on the sample eigenvalues. Indeed, to each loss function there corresponds a unique admissible eigenvalue shrinker dominating all other shrinkers. Read More
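
To illustrate elementwise shrinkage of sample eigenvalues, here is the simplest bias-inverting shrinker from standard spiked-model asymptotics (noise variance 1): sample eigenvalues above the bulk edge (1 + sqrt(gamma))^2 are mapped back to the spike values that would generate them, and the rest are collapsed to 1. This is purely illustrative and is not one of the loss-specific optimal shrinkers derived in the paper.

    # Sketch: elementwise shrinkage of sample eigenvalues in a spiked covariance model.
    # The bias-inverting shrinker below is illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    p, n = 200, 400
    gamma = p / n
    spikes = np.array([25.0, 10.0, 4.0])                     # population spiked eigenvalues
    Sigma = np.eye(p)
    Sigma[:3, :3] += np.diag(spikes - 1.0)

    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # sample eigenvalues, descending

    edge = (1 + np.sqrt(gamma)) ** 2                         # bulk edge for noise variance 1
    def shrink(l):
        if l <= edge:
            return 1.0                                       # bulk eigenvalue: noise level
        t = l + 1 - gamma
        return (t + np.sqrt(t ** 2 - 4 * l)) / 2             # invert l = ell * (1 + gamma / (ell - 1))

    shrunk = np.array([shrink(l) for l in lam])
    print("top sample eigenvalues:", np.round(lam[:4], 2))
    print("after shrinkage       :", np.round(shrunk[:4], 2))
    print("population spikes     :", spikes)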

A NEW PERSPECTIVE ON ROBUST M-ESTIMATION: FINITE SAMPLE THEORY AND APPLICATIONS TO DEPENDENCE-ADJUSTED MULTIPLE TESTING.

Ann Stat 2018 Oct 17;46(5):1904-1931. Epub 2018 Aug 17.

Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544, USA.

Heavy-tailed errors impair the accuracy of the least squares estimate, which can be spoiled by a single grossly outlying observation. As argued in the seminal work of Peter Huber in 1973 [Ann. Statist. 1 (1973) 799-821], robust alternatives to the method of least squares are sorely needed. To achieve robustness against heavy-tailed sampling distributions, we revisit the Huber estimator from a new perspective by letting the tuning parameter involved diverge with the sample size. Read More
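
A small sketch of Huber regression with a tuning parameter that diverges with the sample size. The rate tau = sigma_hat * sqrt(n / log n), the MAD-based scale estimate and the optimizer are illustrative choices, not the paper's exact prescriptions.

    # Sketch: Huber regression with a diverging tuning parameter tau.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    n, p = 500, 5
    X = rng.normal(size=(n, p))
    beta_true = np.array([2.0, -1.0, 0.5, 0.0, 1.5])
    y = X @ beta_true + rng.standard_t(df=2, size=n)         # heavy-tailed errors

    ols = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ ols
    sigma_hat = 1.4826 * np.median(np.abs(resid - np.median(resid)))   # MAD scale
    tau = sigma_hat * np.sqrt(n / np.log(n))                 # diverging tuning parameter

    def huber_loss(beta):
        r = y - X @ beta
        return np.where(np.abs(r) <= tau,
                        0.5 * r ** 2,
                        tau * np.abs(r) - 0.5 * tau ** 2).mean()

    beta_huber = minimize(huber_loss, ols, method="BFGS").x
    print("OLS error  :", np.linalg.norm(ols - beta_true))
    print("Huber error:", np.linalg.norm(beta_huber - beta_true))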

LARGE COVARIANCE ESTIMATION THROUGH ELLIPTICAL FACTOR MODELS.

Ann Stat 2018 Aug 27;46(4):1383-1414. Epub 2018 Jun 27.

Dept of Operations Research & Financial Engineering, Sherrerd Hall, Princeton University, Princeton, NJ 08544, USA.

We propose a general Principal Orthogonal complEment Thresholding (POET) framework for large-scale covariance matrix estimation based on the approximate factor model. A set of high level sufficient conditions for the procedure to achieve optimal rates of convergence under different matrix norms is established to better understand how POET works. Such a framework allows us to recover existing results for sub-Gaussian data in a more transparent way that only depends on the concentration properties of the sample covariance matrix. Read More
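
A minimal sketch of the POET recipe described above: keep the leading principal components of the sample covariance and apply (soft) thresholding to the principal orthogonal complement. The number of factors and the threshold constant are illustrative choices.

    # Sketch: POET-style covariance estimate = low-rank part + thresholded residual.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p, K = 300, 100, 3
    B = rng.normal(size=(p, K))                       # factor loadings
    X = rng.normal(size=(n, K)) @ B.T + rng.normal(scale=0.5, size=(n, p))

    S = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    vals, vecs = vals[::-1], vecs[:, ::-1]            # descending order

    low_rank = (vecs[:, :K] * vals[:K]) @ vecs[:, :K].T
    R = S - low_rank                                  # principal orthogonal complement

    thr = 0.5 * np.sqrt(np.log(p) / n) * np.sqrt(np.outer(np.diag(R), np.diag(R)))
    R_thr = np.sign(R) * np.maximum(np.abs(R) - thr, 0.0)    # soft-threshold entries
    np.fill_diagonal(R_thr, np.diag(R))               # keep the diagonal untouched

    Sigma_poet = low_rank + R_thr
    Sigma_true = B @ B.T + 0.25 * np.eye(p)
    print("operator-norm error:", round(np.linalg.norm(Sigma_poet - Sigma_true, 2), 3))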

DISTRIBUTED TESTING AND ESTIMATION UNDER SPARSE HIGH DIMENSIONAL MODELS.

Ann Stat 2018 Jun 3;46(3):1352-1382. Epub 2018 May 3.

Princeton University.

This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low-dimensional and sparse high-dimensional settings, we address the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. Read More
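
The aggregation idea in its simplest form, as a toy sketch (averaging least-squares estimates computed on k disjoint subsamples; not the paper's test statistics):

    # Sketch: divide-and-conquer estimation by averaging k subsample estimators.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p, k = 10000, 10, 20
    X = rng.normal(size=(n, p))
    beta = rng.normal(size=p)
    y = X @ beta + rng.normal(size=n)

    full = np.linalg.lstsq(X, y, rcond=None)[0]              # full-sample estimator

    parts = np.array_split(np.arange(n), k)                  # k subsamples of size ~ n/k
    subs = [np.linalg.lstsq(X[idx], y[idx], rcond=None)[0] for idx in parts]
    dc = np.mean(subs, axis=0)                               # divide-and-conquer aggregate

    print("full-sample error:", np.linalg.norm(full - beta))
    print("aggregated error :", np.linalg.norm(dc - beta))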

ARE DISCOVERIES SPURIOUS? DISTRIBUTIONS OF MAXIMUM SPURIOUS CORRELATIONS AND THEIR APPLICATIONS.

Ann Stat 2018 Jun 3;46(3):989-1017. Epub 2018 May 3.

Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544, USA.

Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries by such data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions on the exogeneity of covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable Y with the best s linear combinations of p covariates X, even when X and Y are independent. When the covariance matrix of X possesses the restricted eigenvalue property, we derive such distributions for both finite s and diverging s, using Gaussian approximation and empirical process techniques. Read More
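
A quick simulation of the phenomenon for the simplest case s = 1 (the maximum absolute sample correlation between a response and the best single covariate out of p, with the response generated independently of the covariates); sizes are illustrative.

    # Sketch: maximum spurious correlation with an independent response (s = 1 case).
    import numpy as np

    rng = np.random.default_rng(0)
    n, p, reps = 100, 1000, 200
    max_cors = []
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        y = rng.normal(size=n)                       # independent of X
        Xc = (X - X.mean(0)) / X.std(0)
        yc = (y - y.mean()) / y.std()
        max_cors.append(np.max(np.abs(Xc.T @ yc / n)))   # best single-covariate correlation

    print("median maximum spurious correlation: %.2f" % np.median(max_cors))
    print("sqrt(2 log p / n)                  : %.2f" % np.sqrt(2 * np.log(p) / n))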

I-LAMM FOR SPARSE LEARNING: SIMULTANEOUS CONTROL OF ALGORITHMIC COMPLEXITY AND STATISTICAL ERROR.

Ann Stat 2018 Apr 3;46(2):814-841. Epub 2018 Apr 3.

Tencent AI Lab, Shennan Ave, Nanshan District, Shenzhen, Guangdong, China.

We propose a computational framework named iterative local adaptive majorize-minimization (I-LAMM) to simultaneously control algorithmic complexity and statistical error when fitting high dimensional models. I-LAMM is a two-stage algorithmic implementation of the local linear approximation to a family of folded concave penalized quasi-likelihood. The first stage solves a convex program with a crude precision tolerance to obtain a coarse initial estimator, which is further refined in the second stage by iteratively solving a sequence of convex programs with smaller precision tolerances. Read More

HIGH-DIMENSIONAL A-LEARNING FOR OPTIMAL DYNAMIC TREATMENT REGIMES.

Ann Stat 2018 Jun 3;46(3):925-957. Epub 2018 May 3.

Department of Statistics, North Carolina State University, Raleigh, NC, U.S.A.

Precision medicine is a medical paradigm that focuses on finding the most effective treatment decision based on individual patient information. For many complex diseases, such as cancer, treatment decisions need to be tailored over time according to patients' responses to previous treatments. Such an adaptive strategy is referred to as a dynamic treatment regime. Read More

Chernoff Index for Cox Test of Separate Parametric Families.

Ann Stat 2018 Feb 22;46(1):1-29. Epub 2018 Feb 22.

Department of Statistics, Columbia University, 1255 Amsterdam Avenue, New York, NY 10027.

The asymptotic efficiency of a generalized likelihood ratio test proposed by Cox is studied under the large deviations framework for error probabilities developed by Chernoff. In particular, two separate parametric families of hypotheses are considered (Cox, 1961, 1962). The significance level is set such that the maximal type I and type II error probabilities for the generalized likelihood ratio test decay exponentially fast with the same rate. Read More

TARGETED SEQUENTIAL DESIGN FOR TARGETED LEARNING INFERENCE OF THE OPTIMAL TREATMENT RULE AND ITS MEAN REWARD.

Ann Stat 2017 Dec 15;45(6):2537-2564. Epub 2017 Dec 15.

University of California, Berkeley.

This article studies the targeted sequential inference of an optimal treatment rule (TR) and its mean reward in the non-exceptional case, i.e., assuming that there is no stratum of the baseline covariates where treatment is neither beneficial nor harmful, and under a companion margin assumption. Our pivotal estimator, whose definition hinges on the targeted minimum loss estimation (TMLE) principle, actually infers the mean reward under the current estimate of the optimal TR. This data-adaptive statistical parameter is worthy of interest on its own. Read More

NONPARAMETRIC GOODNESS-OF-FIT TESTS FOR UNIFORM STOCHASTIC ORDERING.

Ann Stat 2017 Dec 15;45(6):2565-2589. Epub 2017 Dec 15.

Department of Statistics, University of South Carolina.

We propose L^p distance-based goodness-of-fit (GOF) tests for uniform stochastic ordering with two continuous distributions F and G, both of which are unknown. Our tests are motivated by the fact that when F and G are uniformly stochastically ordered, the ordinal dominance curve R = F∘G^(-1) is star-shaped. We derive asymptotic distributions and prove that our testing procedure has a unique least favorable configuration of F and G for p ∈ [1, ∞]. Read More

TENSOR DECOMPOSITIONS AND SPARSE LOG-LINEAR MODELS.

Ann Stat 2017 Feb 21;45(1):1-38. Epub 2017 Feb 21.

Duke University.

Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. Read More
