Publications by authors named "Mark Tygert"

7 Publications

A hierarchical loss and its problems when classifying non-hierarchically.

PLoS One 2019 Dec 19;14(12):e0226222. Epub 2019 Dec 19.

Facebook, New York, NY, United States of America.

Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called "loss" or "win") used in textual or visual classification/recognition via neural networks seldom leverage a-priori information, such as a sheepdog being more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given tree organization into a semantically meaningful hierarchy of a classifier's classes. An ultrametric tree is a tree with a so-called ultrametric distance metric such that all leaves are at the same distance from the root. Unfortunately, extensive numerical experiments indicate that the standard practice of training neural networks via stochastic gradient descent with random starting points often drives down the hierarchical loss nearly as much when minimizing the standard cross-entropy loss as when trying to minimize the hierarchical loss directly. Thus, this hierarchical loss is unreliable as an objective for plain, randomly started stochastic gradient descent to minimize; the main value of the hierarchical loss may be merely as a meaningful metric of success of a classifier.
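
A minimal sketch of how such an ultrametric-tree loss might be computed, using a made-up three-class hierarchy (sheepdog, poodle, skyscraper); the tree, the distance scaling, and the use of an expected distance over the predicted class probabilities are assumptions of this illustration, not the paper's exact definition:

```python
# Illustrative three-class hierarchy; the classes, the tree, and the scaling
# below are assumptions made for this sketch, not taken from the paper.
PATHS = {
    "sheepdog":   ("animal", "dog", "sheepdog"),
    "poodle":     ("animal", "dog", "poodle"),
    "skyscraper": ("building", "skyscraper"),
}
CLASSES = list(PATHS)
DEPTH = max(len(p) for p in PATHS.values())  # ultrametric: treat every leaf as this deep

def ultrametric_distance(a, b):
    """Distance between leaves a and b: proportional to how far above the
    leaves their lowest common ancestor sits in the (padded) tree."""
    shared = 0
    for x, y in zip(PATHS[a], PATHS[b]):
        if x != y:
            break
        shared += 1
    return 2 * (DEPTH - shared)

def hierarchical_loss(probs, true_class):
    """Expected ultrametric distance between the predicted class distribution
    and the true class (one plausible reading of a hierarchical loss; the
    paper's exact definition may differ)."""
    return sum(p * ultrametric_distance(c, true_class)
               for c, p in zip(CLASSES, probs))

# Confusing a sheepdog with a skyscraper is penalized more than with a poodle.
print(hierarchical_loss([0.1, 0.9, 0.0], "sheepdog"))  # mass on "poodle": 1.8
print(hierarchical_loss([0.1, 0.0, 0.9], "sheepdog"))  # mass on "skyscraper": 5.4
```
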
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0226222
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922344
March 2020

Randomized algorithms for distributed computation of principal component analysis and singular value decomposition.

Adv Comput Math 2018 Oct 19;44(5):1651-1672. Epub 2018 Mar 19.

Facebook Artificial Intelligence Research, 1 Facebook Way, Menlo Park, CA 94025.

Randomized algorithms provide solutions to two ubiquitous problems: (1) the distributed calculation of a principal component analysis or singular value decomposition of a highly rectangular matrix, and (2) the distributed calculation of a low-rank approximation (in the form of a singular value decomposition) to an arbitrary matrix. Carefully honed algorithms yield results that are uniformly superior to those of the stock, deterministic implementations in Spark (the popular platform for distributed computation); in particular, whereas the stock software will without warning return left singular vectors that are far from numerically orthonormal, a significantly burnished randomized implementation generates left singular vectors that are numerically orthonormal to nearly the machine precision.
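
A rough, single-machine sketch of the kind of two-pass randomized computation involved, treating the tall matrix as a list of row blocks the way a distributed store would; the actual Spark implementation described in the paper is considerably more elaborate, in particular about keeping the left singular vectors numerically orthonormal:

```python
import numpy as np

def randomized_svd_over_row_blocks(row_blocks, n, k, oversample=10, seed=0):
    """Hedged sketch of a two-pass randomized SVD of a tall, thin matrix
    stored as row blocks (as it would be across workers)."""
    rng = np.random.default_rng(seed)
    l = k + oversample
    omega = rng.standard_normal((n, l))

    # Pass 1 ("map" over row blocks): sketch the range of A via Y = A @ omega.
    Y = np.vstack([block @ omega for block in row_blocks])
    Q, _ = np.linalg.qr(Y)                      # orthonormal basis for the sketch

    # Pass 2: accumulate the small matrix B = Q.T @ A block by block.
    B = np.zeros((l, n))
    offset = 0
    for block in row_blocks:
        rows = block.shape[0]
        B += Q[offset:offset + rows].T @ block
        offset += rows

    # SVD of the small matrix yields A ~ (Q @ U) diag(s) Vt.
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]

# Toy usage: a 20000 x 50 matrix of exact rank 10, split into 20 row blocks.
rng = np.random.default_rng(1)
A = rng.standard_normal((20000, 10)) @ rng.standard_normal((10, 50))
U, s, Vt = randomized_svd_over_row_blocks(np.array_split(A, 20), n=50, k=10)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))   # tiny relative error
```
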
DOI: http://dx.doi.org/10.1007/s10444-018-9600-1
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8415723
October 2018

Algorithm 971: An Implementation of a Randomized Algorithm for Principal Component Analysis.

ACM Trans Math Softw 2017 Jan;43(3)

Facebook, 1 Facebook Way, Menlo Park, CA 94025.

Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis and the calculation of truncated singular value decompositions. The present article presents an essentially black-box, foolproof implementation for MathWorks' MATLAB, a popular software platform for numerical computation. As illustrated via several tests, the randomized algorithms for low-rank approximation outperform or at least match the classical deterministic techniques (such as Lanczos iterations run to convergence) in basically all respects: accuracy, computational efficiency (both speed and memory usage), ease of use, parallelizability, and reliability. However, the classical procedures remain the methods of choice for estimating spectral norms and are far superior for calculating the least singular values and corresponding singular vectors (or singular subspaces).
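
The article's code is a MATLAB package; a loose NumPy transcription of the underlying idea (a random projection sharpened by a few re-orthonormalized power iterations, followed by an SVD of the small projected matrix) might look roughly as follows. The function name, defaults, and centering step here are assumptions of this sketch, not the interface of Algorithm 971:

```python
import numpy as np

def randomized_pca(A, k, n_iter=2, oversample=5, seed=0):
    """Hedged sketch of randomized PCA: sketch the range of the centered
    matrix with a random projection, sharpen it with a few power iterations
    (re-orthonormalizing at each step for stability), and diagonalize the
    small projected matrix. Algorithm 971's MATLAB code handles sparse
    inputs, non-centered modes, and many numerical details omitted here."""
    rng = np.random.default_rng(seed)
    Ac = A - A.mean(axis=0)                    # center the columns, as PCA requires
    Q = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(Ac @ Q)
    for _ in range(n_iter):                    # power iterations
        Q, _ = np.linalg.qr(Ac.T @ Q)
        Q, _ = np.linalg.qr(Ac @ Q)
    U_small, s, Vt = np.linalg.svd(Q.T @ Ac, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]

# Toy usage: principal components of 5000 noisy observations in 100 dimensions.
rng = np.random.default_rng(2)
X = rng.standard_normal((5000, 3)) @ rng.standard_normal((3, 100))
X += 0.01 * rng.standard_normal(X.shape)
U, s, Vt = randomized_pca(X, k=3)
print(s)   # the three dominant singular values of the centered data
```
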
DOI: http://dx.doi.org/10.1145/3004053
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5625842
January 2017

A Mathematical Motivation for Complex-Valued Convolutional Networks.

Neural Comput 2016 May 18;28(5):815-25. Epub 2016 Feb 18.

A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors, followed by (2) taking the absolute value of every entry of the resulting vectors, followed by (3) local averaging. For processing real-valued random vectors, complex-valued convnets can be viewed as data-driven multiscale windowed power spectra, data-driven multiscale windowed absolute spectra, data-driven multiwavelet absolute values, or (in their most general configuration) data-driven nonlinear multiwavelet packets. Indeed, complex-valued convnets can calculate multiscale windowed spectra when the convnet filters are windowed complex-valued exponentials. Standard real-valued convnets, using rectified linear units (ReLUs), sigmoidal (e.g., logistic or tanh) nonlinearities, or max pooling, for example, do not obviously exhibit the same exact correspondence with data-driven wavelets (whereas for complex-valued convnets, the correspondence is much more than just a vague analogy). Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to (complex-valued) convnets.
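
A minimal NumPy sketch of one such layer acting on a 1-D signal; the Hann window, the filter frequencies, and the pooling width below are illustrative choices, not taken from the paper:

```python
import numpy as np

def complex_convnet_layer(x, filters, pool=4):
    """One layer of the composition described above, for a 1-D input:
    (1) convolve with each complex-valued filter, (2) take the absolute
    value of every entry, (3) locally average by non-overlapping mean
    pooling. Filters, pooling width, and input are illustrative."""
    channels = []
    for h in filters:
        y = np.abs(np.convolve(x, h, mode="valid"))      # steps (1) and (2)
        trim = (len(y) // pool) * pool                    # step (3): local averaging
        channels.append(y[:trim].reshape(-1, pool).mean(axis=1))
    return np.stack(channels)                             # one output channel per filter

# Windowed complex exponentials as filters turn the layer into a
# multiscale windowed (absolute) spectrum, as noted in the abstract.
n = 32
window = np.hanning(n)
filters = [window * np.exp(2j * np.pi * f * np.arange(n) / n) for f in (1, 2, 4, 8)]
x = np.abs(np.random.default_rng(0).standard_normal(256))   # nonnegative real input
print(complex_convnet_layer(x, filters).shape)               # (4, 56)
```
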
DOI: http://dx.doi.org/10.1162/NECO_a_00824
May 2016

Statistical tests for whether a given set of independent, identically distributed draws comes from a specified probability density.

Authors:
Mark Tygert

Proc Natl Acad Sci U S A 2010 Sep 7;107(38):16471-6. Epub 2010 Sep 7.

Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA.

We discuss several tests for determining whether a given set of independent and identically distributed (i.i.d.) draws does not come from a specified probability density function. The most commonly used are Kolmogorov-Smirnov tests, particularly Kuiper's variant, which focus on discrepancies between the cumulative distribution function for the specified probability density and the empirical cumulative distribution function for the given set of i.i.d. draws. Unfortunately, variations in the probability density function often get smoothed over in the cumulative distribution function, making it difficult to detect discrepancies in regions where the probability density is small in comparison with its values in surrounding regions. We discuss tests without this deficiency, complementing the classical methods. The tests of the present paper are based on the plain fact that it is unlikely to draw a random number whose probability is small, provided that the draw is taken from the same distribution used in calculating the probability (thus, if we draw a random number whose probability is small, then we can be confident that we did not draw the number from the same distribution used in calculating the probability).
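
One hedged way to turn that plain fact into a concrete test (not the paper's exact statistics) is to score the draws by their log-likelihood under the specified density and calibrate the score by Monte Carlo simulation from that same density:

```python
import numpy as np

def normal_pdf(x):
    """Density of the standard normal (the 'specified' density in this toy)."""
    return np.exp(-0.5 * x * x) / np.sqrt(2.0 * np.pi)

def low_density_test(draws, density, sampler, n_sim=2000, seed=0):
    """Hedged sketch: small p-values indicate that the draws land where the
    specified density is improbably small compared with genuine samples
    from that density."""
    rng = np.random.default_rng(seed)
    n = len(draws)
    observed = np.sum(np.log(density(draws)))
    simulated = np.array([np.sum(np.log(density(sampler(rng, n))))
                          for _ in range(n_sim)])
    return np.mean(simulated <= observed)     # one-sided Monte Carlo p-value

# Toy usage: heavy-tailed draws tested against a standard normal density.
rng = np.random.default_rng(1)
draws = rng.standard_t(df=3, size=200)
print(low_density_test(draws, normal_pdf, lambda r, n: r.standard_normal(n)))
```
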
DOI: http://dx.doi.org/10.1073/pnas.1008446107
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2944726
September 2010

A fast randomized algorithm for overdetermined linear least-squares regression.

Proc Natl Acad Sci U S A 2008 Sep 8;105(36):13212-7. Epub 2008 Sep 8.

Program in Applied Mathematics, Yale University, A. K. Watson Hall, 51 Prospect Street, New Haven, CT 06511, USA.

We introduce a randomized algorithm for overdetermined linear least-squares regression. Given an arbitrary full-rank m × n matrix A with m ≥ n, any m × 1 vector b, and any positive real number ε, the procedure computes an n × 1 vector x that minimizes the Euclidean norm ||Ax - b|| to relative precision ε. The algorithm typically requires O((log(n) + log(1/ε))mn + n³) floating-point operations. This cost is less than the O(mn²) required by the classical schemes based on QR decompositions or bidiagonalization. We present several numerical examples illustrating the performance of the algorithm.
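
For flavor only, a hedged sketch of the general approach of solving a randomly compressed copy of the problem; the sketch below uses a dense Gaussian compression and solves the small problem directly, whereas the paper applies a fast structured random transform and then iterates to the requested relative precision ε:

```python
import numpy as np

def sketched_least_squares(A, b, sketch_rows=None, seed=0):
    """Hedged sketch of randomized least squares: compress the rows of A and b
    with a random matrix and solve the small problem. This Gaussian
    sketch-and-solve version only illustrates the flavor; it costs O(smn)
    to apply and does not reach an arbitrary relative precision."""
    m, n = A.shape
    s = sketch_rows or 4 * n                       # a small multiple of n usually suffices
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((s, m)) / np.sqrt(s)   # dense Gaussian sketch
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

# Toy usage: a 20000 x 50 overdetermined system with mild noise.
rng = np.random.default_rng(3)
A = rng.standard_normal((20000, 50))
b = A @ rng.standard_normal(50) + 0.01 * rng.standard_normal(20000)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_least_squares(A, b)
print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```
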
DOI: http://dx.doi.org/10.1073/pnas.0804869105
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2734343
September 2008

Randomized algorithms for the low-rank approximation of matrices.

Proc Natl Acad Sci U S A 2007 Dec 4;104(51):20167-72. Epub 2007 Dec 4.

Department of Computer Science and Program in Applied Math, Yale University, 51 Prospect Street, New Haven, CT 06511, USA.

We describe two recently proposed randomized algorithms for the construction of low-rank approximations to matrices, and demonstrate their application (inter alia) to the evaluation of the singular value decompositions of numerically low-rank matrices. Being probabilistic, the schemes described here have a finite probability of failure; in most cases, this probability is rather negligible (10⁻¹⁷ is a typical value). In many situations, the new procedures are considerably more efficient and reliable than the classical (deterministic) ones; they also parallelize naturally. We present several numerical examples to illustrate the performance of the schemes.
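
A hedged sketch of one flavor of such an algorithm, using a subsampled randomized Fourier transform to sketch the range of the matrix; the oversampling, constants, and failure-probability analysis below are simplifications for illustration, not the paper's algorithms:

```python
import numpy as np

def srft_low_rank(A, k, oversample=10, seed=0):
    """Hedged sketch of low-rank approximation via a subsampled randomized
    Fourier transform: randomly rescale the columns of A, mix them with an
    FFT, keep a few of the mixed columns, orthonormalize, and project."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = k + oversample
    d = np.exp(2j * np.pi * rng.random(n))            # random unit-modulus column rescaling
    cols = rng.choice(n, size=l, replace=False)       # random subset of the mixed columns
    Y = np.fft.fft(A * d, axis=1)[:, cols]            # Y = A D F restricted to l columns
    Q, _ = np.linalg.qr(Y)                            # orthonormal basis for the range of Y
    U_small, s, Vt = np.linalg.svd(Q.conj().T @ A, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]

# Toy check on a 500 x 400 matrix of exact rank 15.
rng = np.random.default_rng(4)
A = rng.standard_normal((500, 15)) @ rng.standard_normal((15, 400))
U, s, Vt = srft_low_rank(A, k=15)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))   # tiny relative error
```
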
DOI: http://dx.doi.org/10.1073/pnas.0709640104
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2154402
December 2007