SIAM Journal on Mathematics of Data Science
ISSN (Online): 2577-0187. Published by the Society for Industrial and Applied Mathematics.
 A Nonlinear Matrix Decomposition for Mining the Zeros of Sparse Data

Authors: Lawrence K. Saul
Pages: 431–463
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 431–463, June 2022.
We describe a simple iterative solution to a widely recurring problem in multivariate data analysis: given a sparse nonnegative matrix $\mathbf{X}$, how to estimate a low-rank matrix $\Theta$ such that $\mathbf{X} \approx f(\Theta)$, where $f$ is an elementwise nonlinearity? We develop a latent variable model for this problem and consider those sparsifying nonlinearities, popular in neural networks, that map all negative values to zero. The model seeks to explain the variability of sparse high-dimensional data in terms of a smaller number of degrees of freedom. We show that exact inference in this model is tractable and derive an expectation-maximization (EM) algorithm to estimate the low-rank matrix $\Theta$. Notably, we do not parameterize $\Theta$ as a product of smaller matrices to be alternately optimized; instead, we estimate $\Theta$ directly via the singular value decomposition of matrices that are repeatedly inferred (at each iteration of the EM algorithm) from the model's posterior distribution. We use the model to analyze large sparse matrices that arise from data sets of binary, grayscale, and color images. In all of these cases, we find that the model discovers much lower-rank decompositions than purely linear approaches.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-04-07T07:00:00Z
DOI: 10.1137/21M1405769
Issue No: Vol. 4, No. 2 (2022)
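The inference problem in the abstract, $\mathbf{X} \approx \mathrm{ReLU}(\Theta)$ with $\Theta$ estimated through repeated truncated SVDs, can be illustrated with a naive alternating heuristic. This is not the paper's EM algorithm (there is no posterior inference here); the zero-fill rule and the fixed target rank are illustrative assumptions.

```python
import numpy as np

def relu_decompose(X, rank, iters=200):
    """Naive alternating heuristic for X ~= ReLU(Theta) with low-rank Theta.

    NOT the paper's EM algorithm -- just a sketch of the idea: on zero
    entries of X, Theta is free to be nonpositive, so we fill those entries
    with min(Theta, 0) before each truncated SVD.
    """
    M = X.astype(float).copy()
    for _ in range(iters):
        # Best rank-r approximation of the current completed matrix.
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        Theta = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed positives; let zeros be explained by negative Theta.
        M = np.where(X > 0, X, np.minimum(Theta, 0.0))
    return Theta

# Toy check: a sparse nonnegative matrix that is exactly ReLU of rank-2 Theta.
rng = np.random.default_rng(0)
Theta_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
X = np.maximum(Theta_true, 0.0)
Theta_hat = relu_decompose(X, rank=2)
err = np.linalg.norm(np.maximum(Theta_hat, 0.0) - X) / np.linalg.norm(X)
print(round(float(err), 4))
```

On data that is exactly the ReLU of a low-rank matrix, the relative error on the positive entries shrinks steadily; the paper replaces the crude zero-fill step with posterior expectations under its latent variable model.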

 What Kinds of Functions Do Deep Neural Networks Learn? Insights from
Variational Spline Theory
Authors: Rahul Parhi, Robert D. Nowak
Pages: 464–489
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 464–489, June 2022.
We develop a variational framework to understand the properties of functions learned by fitting deep neural networks with rectified linear unit (ReLU) activations to data. We propose a new function space, which is related to classical bounded variation-type spaces, that captures the compositional structure associated with deep neural networks. We derive a representer theorem showing that deep ReLU networks are solutions to regularized data-fitting problems over functions from this space. The function space consists of compositions of functions from the Banach space of second-order bounded variation in the Radon domain. This Banach space has a sparsity-promoting norm, giving insight into the role of sparsity in deep neural networks. The neural network solutions have skip connections and rank-bounded weight matrices, providing new theoretical support for these common architectural choices. The variational problem we study can be recast as a finite-dimensional neural network training problem with regularization schemes related to the notions of weight decay and path-norm regularization. Finally, our analysis builds on techniques from variational spline theory, providing new connections between deep neural networks and splines.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-04-13T07:00:00Z
DOI: 10.1137/21M1418642
Issue No: Vol. 4, No. 2 (2022)

 Statistical Methods for Minimax Estimation in Linear Models with Unknown
Design Over Finite Alphabets
Authors: Merle Behr, Axel Munk
Pages: 490–513
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 490–513, June 2022.
We provide a minimax optimal estimation procedure for $F$ and $\Omega$ in matrix-valued linear models $Y = F \Omega + Z$, where the parameter matrix $\Omega$ and the design matrix $F$ are unknown but the latter takes values in a known finite set. The proposed finite alphabet linear model is justified in a variety of applications, ranging from signal processing to cancer genetics. We show that this allows one to separate $F$ and $\Omega$ uniquely under weak identifiability conditions, a task which is, in general, not possible. To this end, we quantify, in the noiseless case (that is, $Z = 0$), the perturbation range of $Y$ needed to obtain stable recovery of $F$ and $\Omega$. Based on this, we derive an iterative Lloyd's-type estimation procedure that attains minimax estimation rates for $\Omega$ and $F$ for a Gaussian error matrix $Z$. In contrast to the least squares solution, the estimation procedure can be computed efficiently and scales linearly with the total number of observations. We confirm our theoretical results in a simulation study and illustrate them with a genetic sequencing data example.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-04-25T07:00:00Z
DOI: 10.1137/21M1398860
Issue No: Vol. 4, No. 2 (2022)

 Approximation Bounds for Sparse Programs

Authors: Armin Askari, Alexandre d'Aspremont, Laurent El Ghaoui
Pages: 514–530
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 514–530, June 2022.
We show that sparsity-constrained optimization problems over low-dimensional spaces tend to have a small duality gap. We use the Shapley–Folkman theorem to derive both data-driven bounds on the duality gap and an efficient primalization procedure to recover feasible points satisfying these bounds. These error bounds are proportional to the rate of growth of the objective with the target cardinality, which means in particular that the relaxation is nearly tight as soon as the target cardinality is large enough so that only uninformative features are added.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-04-27T07:00:00Z
DOI: 10.1137/21M1398677
Issue No: Vol. 4, No. 2 (2022)

 Stochastic Geometry to Generalize the Mondrian Process

Authors: Eliza O'Reilly, Ngoc Mai Tran
Pages: 531–552
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 531–552, June 2022.
The stable under iteration (STIT) tessellation process is a stochastic process that produces a recursive partition of space with cut directions drawn independently from a distribution over the sphere. The case of random axis-aligned cuts is known as the Mondrian process. Random forests and Laplace kernel approximations built from the Mondrian process have led to efficient online learning methods and Bayesian optimization. In this work, we utilize tools from stochastic geometry to resolve some fundamental questions concerning STIT processes in machine learning. First, we show that STIT processes can be efficiently simulated by lifting to a higher-dimensional axis-aligned Mondrian process. Second, we characterize all possible kernels that STIT processes and their mixtures can approximate. We also give a uniform convergence rate for the approximation error of the STIT kernels to the targeted kernels, completely generalizing the work of Balog et al. [The Mondrian kernel, 2016] from the Mondrian case. Third, we obtain consistency results for STIT forests in density estimation and regression. Finally, we give a precise formula for the density estimator arising from a STIT forest. This allows for precise comparisons between the STIT forest, the STIT kernel, and the targeted kernel in density estimation. Our paper calls for further developments at the novel intersection of stochastic geometry and machine learning.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-05-02T07:00:00Z
DOI: 10.1137/20M1354490
Issue No: Vol. 4, No. 2 (2022)
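The Mondrian case (axis-aligned cuts) can be simulated directly with the standard recursive construction: a cut arrives after an exponential waiting time whose rate is the box's linear dimension, and the split axis is chosen proportionally to side length. A minimal sketch, with the lifetime budget and box chosen arbitrarily for illustration:

```python
import random

def mondrian(box, budget, rng):
    """Sample a Mondrian partition of an axis-aligned box (sketch).

    box: list of (lo, hi) per dimension. A cut arrives after an exponential
    waiting time with rate equal to the sum of side lengths; the split
    dimension is chosen proportionally to its length, with uniform position.
    Returns the list of leaf boxes.
    """
    lengths = [hi - lo for lo, hi in box]
    lam = sum(lengths)                     # linear dimension = cut rate
    t = rng.expovariate(lam)
    if t > budget:
        return [box]                       # no cut within remaining lifetime
    # Pick the split dimension proportionally to side length.
    r = rng.uniform(0, lam)
    d, acc = 0, lengths[0]
    while r > acc:
        d += 1
        acc += lengths[d]
    pos = rng.uniform(*box[d])
    left = [b if i != d else (box[d][0], pos) for i, b in enumerate(box)]
    right = [b if i != d else (pos, box[d][1]) for i, b in enumerate(box)]
    rest = budget - t
    return mondrian(left, rest, rng) + mondrian(right, rest, rng)

rng = random.Random(1)
leaves = mondrian([(0.0, 1.0), (0.0, 1.0)], budget=3.0, rng=rng)
print(len(leaves))  # number of cells in the sampled partition
```

A general STIT process replaces the axis-aligned cut directions with draws from a directional distribution on the sphere, which is what the paper's lifting construction reduces back to the Mondrian case.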

 Quantitative Approximation Results for Complex-Valued Neural Networks

Authors: Andrei Caragea, Dae Gwan Lee, Johannes Maly, Götz Pfander, Felix Voigtlaender
Pages: 553–580
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 553–580, June 2022.
Until recently, applications of neural networks in machine learning have almost exclusively relied on real-valued networks. It was recently observed, however, that complex-valued neural networks (CVNNs) exhibit superior performance in applications in which the input is naturally complex-valued, such as MRI fingerprinting. While the mathematical theory of real-valued networks has, by now, reached some level of maturity, this is far from true for complex-valued networks. In this paper, we analyze the expressivity of complex-valued networks by providing explicit quantitative error bounds for approximating $C^n$ functions on compact subsets of $\mathbb{C}^d$ by CVNNs that employ the modReLU activation function, given by $\sigma(z) = \mathrm{ReLU}(|z| - 1) \cdot \mathrm{sgn}(z)$, which is one of the most popular complex activation functions used in practice. We show that the derived approximation rates are optimal (up to log factors) in the class of modReLU networks with weights of moderate growth.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-05-02T07:00:00Z
DOI: 10.1137/21M1429540
Issue No: Vol. 4, No. 2 (2022)
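The modReLU activation in the abstract, $\sigma(z) = \mathrm{ReLU}(|z| - 1)\,\mathrm{sgn}(z)$ with $\mathrm{sgn}(z) = z/|z|$, is straightforward to implement. The general bias $b$ below is a common parameterization (an assumption beyond the abstract); $b = 1$ recovers the formula above:

```python
import numpy as np

def modrelu(z, b=1.0):
    """modReLU for complex inputs: ReLU(|z| - b) * sgn(z), with sgn(z) =
    z/|z| (and 0 at z = 0). Shrinks magnitudes by b, preserves phases."""
    mag = np.abs(z)
    scale = np.maximum(mag - b, 0.0)
    out = np.zeros_like(z)
    nz = mag > 0
    out[nz] = scale[nz] * z[nz] / mag[nz]
    return out

z = np.array([2.0 + 0.0j, 0.5j, -3.0 + 4.0j])
out = modrelu(z)
print(out)  # [1+0j, 0j, -2.4+3.2j]: |z| shrinks by 1, phase is kept
```

Note that, unlike applying ReLU to real and imaginary parts separately, modReLU is phase-equivariant, which is one reason it is popular for complex-valued networks.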

 Wasserstein-Based Projections with Applications to Inverse Problems

Authors: Howard Heaton, Samy Wu Fung, Alex Tong Lin, Stanley Osher, Wotao Yin
Pages: 581–603
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 581–603, June 2022.
Inverse problems consist of recovering a signal from a collection of noisy measurements. These are typically cast as optimization problems, with classic approaches using a data fidelity term and an analytic regularizer that stabilizes recovery. Recent plug-and-play (PnP) works propose replacing the operator for analytic regularization in optimization methods by a data-driven denoiser. These schemes obtain state-of-the-art results, but at the cost of limited theoretical guarantees. To bridge this gap, we present a new algorithm that takes samples from the manifold of true data as input and outputs an approximation of the projection operator onto this manifold. Under standard assumptions, we prove this algorithm generates a learned operator, called a Wasserstein-based projection (WP), that approximates the true projection with high probability. Thus, WPs can be inserted into optimization methods in the same manner as PnP, but now with theoretical guarantees. Numerical examples show that WPs obtain state-of-the-art results for unsupervised PnP signal recovery. All codes for this work can be found at https://github.com/swufung/WassersteinBasedProjections.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-05-05T07:00:00Z
DOI: 10.1137/20M1376790
Issue No: Vol. 4, No. 2 (2022)

 Tukey Depths and Hamilton–Jacobi Differential Equations

Authors: Martin MolinaFructuoso, Ryan Murray
Pages: 604–633
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 604–633, June 2022.
Widespread application of modern machine learning has increased the need for robust statistical algorithms. This work studies one such fundamental statistical concept known as the Tukey depth. We study the problem in the continuum (population) limit. In particular, we formally derive the associated necessary conditions, which take the form of a first-order partial differential equation that is necessarily satisfied at points where the Tukey depth is smooth. We discuss the interpretation of this formal necessary condition in terms of the viscosity solution of a Hamilton–Jacobi equation, but with a nonclassical Hamiltonian with discontinuous dependence on the gradient at zero. We prove that this equation possesses a unique viscosity solution and that this solution always bounds the Tukey depth from below. In certain cases we prove that the Tukey depth is equal to the viscosity solution, and we give some illustrations of standard numerical methods from the optimal control community which deal directly with the partial differential equation. We conclude by outlining several promising research directions both in terms of new numerical algorithms and theoretical challenges.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-05-10T07:00:00Z
DOI: 10.1137/21M1411998
Issue No: Vol. 4, No. 2 (2022)
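The Tukey (halfspace) depth itself is simple to approximate by sampling directions, which makes the object described by the PDE easy to inspect empirically: the depth of a point $x$ is the minimum, over unit vectors $u$, of the fraction of data in the halfspace $\{p : \langle u, p - x\rangle \ge 0\}$. The direction count and Gaussian test data below are arbitrary choices for illustration:

```python
import numpy as np

def tukey_depth(x, data, n_dirs=500, seed=0):
    """Approximate Tukey depth by minimizing halfspace mass over random
    directions (a Monte Carlo sketch of the exact min over all directions)."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_dirs, data.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = (data - x) @ dirs.T            # shape (n_points, n_dirs)
    frac = (proj >= 0).mean(axis=0)       # halfspace mass per direction
    return frac.min()

rng = np.random.default_rng(1)
data = rng.standard_normal((400, 2))
center = tukey_depth(np.zeros(2), data)
edge = tukey_depth(np.array([3.0, 0.0]), data)
print(center > edge)  # the center of a symmetric cloud is deeper
```

Passing from this empirical minimum over halfspaces to the population limit is exactly the setting in which the abstract's first-order necessary condition is derived.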

 Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization

Authors: Samuel Horváth, Lihua Lei, Peter Richtárik, Michael I. Jordan
Pages: 634–648
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 634–648, June 2022.
Adaptivity is an important yet understudied property in modern optimization theory. The gap between the state-of-the-art theory and current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step size schemes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners to select algorithms that work broadly without tweaking the hyperparameters. In this work, blending the “geometrization” technique introduced by [L. Lei and M. I. Jordan, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017, pp. 148–156] and the SARAH algorithm of [L. M. Nguyen, J. Liu, K. Scheinberg, and M. Takáč, Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 2613–2621], we propose the geometrized SARAH algorithm for nonconvex finite-sum and stochastic optimization. Our algorithm is proved to achieve adaptivity to both the magnitude of the target accuracy and the Polyak–Łojasiewicz (PL) constant, if present. In addition, it simultaneously achieves the best-available convergence rate for non-PL objectives while outperforming existing algorithms for PL objectives.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-05-12T07:00:00Z
DOI: 10.1137/21M1394308
Issue No: Vol. 4, No. 2 (2022)
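Plain SARAH, one of the two ingredients blended in the paper, maintains a recursive gradient estimator $v_t = \nabla f_i(w_t) - \nabla f_i(w_{t-1}) + v_{t-1}$ that is reset to the full gradient at the start of each outer loop. A minimal sketch on a toy least-squares problem (the step size and loop lengths are arbitrary; the geometrization ingredient is not implemented here):

```python
import numpy as np

def sarah(grad_i, n, w0, step=0.05, epochs=20, seed=0):
    """Plain SARAH sketch (not the paper's geometrized variant): the inner
    loop updates a recursive, low-variance gradient estimator."""
    rng = np.random.default_rng(seed)
    w = w0.astype(float)
    for _ in range(epochs):
        # Outer loop: reset the estimator to the exact full gradient.
        v = np.mean([grad_i(w, i) for i in range(n)], axis=0)
        w_prev, w = w, w - step * v
        for _ in range(n):
            # Inner loop: recursive single-sample correction.
            i = rng.integers(n)
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - step * v
    return w

# Toy finite-sum least squares: f_i(w) = 0.5 * (a_i @ w - b_i)^2.
rng = np.random.default_rng(1)
A, x_true = rng.standard_normal((50, 5)), rng.standard_normal(5)
b = A @ x_true
grad_i = lambda w, i: A[i] * (A[i] @ w - b[i])
w = sarah(grad_i, n=50, w0=np.zeros(5))
err = np.linalg.norm(w - x_true)
print(round(float(err), 4))
```

Because the data here are noiseless (an interpolation setting), every per-sample gradient vanishes at the solution, so the SARAH estimator's variance also vanishes and the iterates converge to the exact minimizer.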

 Block Bregman Majorization Minimization with Extrapolation

Authors: Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis, Masoud Ahookhosh, Panagiotis Patrinos
Pages: 1–25
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 1–25, January 2022.
In this paper, we consider a class of nonsmooth nonconvex optimization problems whose objective is the sum of a block relative smooth function and a proper and lower semicontinuous block separable function. Although the analysis of block proximal gradient (BPG) methods for the class of block $L$-smooth functions has been successfully extended to Bregman BPG methods that deal with the class of block relative smooth functions, accelerated Bregman BPG methods are scarce and challenging to design. Taking our inspiration from Nesterov-type acceleration and the majorization-minimization scheme, we propose a block alternating Bregman majorization minimization framework with extrapolation (BMME). We prove subsequential convergence of BMME to a first-order stationary point under mild assumptions and study its global convergence under stronger conditions. We illustrate the effectiveness of BMME on the penalized orthogonal nonnegative matrix factorization problem.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-01-13T08:00:00Z
DOI: 10.1137/21M1432661
Issue No: Vol. 4, No. 1 (2022)

 Sensitivity-Informed Provable Pruning of Neural Networks

Authors: Cenk Baykal, Lucas Liebenwein, Igor Gilitschenski, Dan Feldman, Daniela Rus
Pages: 26–45
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 26–45, January 2022.
We introduce a family of pruning algorithms that sparsifies the parameters of a trained model in a way that approximately preserves the model's predictive accuracy. Our algorithms use a small batch of input points to construct a data-informed importance sampling distribution over the network's parameters and use either a sampling-based or a deterministic pruning procedure, or an adaptive mixture of both, to discard redundant weights. Our methods are simultaneously computationally efficient, provably accurate, and broadly applicable to various network architectures and data distributions. The presented approaches are simple to implement and can be easily integrated into standard prune-retrain pipelines. We present empirical comparisons showing that our algorithms reliably generate highly compressed networks that incur minimal loss in performance, regardless of whether the original network is fully trained or randomly initialized.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-01-13T08:00:00Z
DOI: 10.1137/20M1383239
Issue No: Vol. 4, No. 1 (2022)
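The sampling-based branch can be illustrated with a generic importance-sampling sparsifier: keep coordinates with probability proportional to magnitude and reweight survivors so the result is unbiased. This sketches the general idea only; it does not use the paper's data-informed sensitivity scores:

```python
import numpy as np

def sampled_pruning(w, m, seed=0):
    """Importance-sampling pruning sketch (illustrative, not the paper's
    exact scheme): draw m coordinates with probability proportional to
    |w_i| and reweight hits so that E[pruned] = w."""
    rng = np.random.default_rng(seed)
    q = np.abs(w) / np.abs(w).sum()
    idx = rng.choice(len(w), size=m, p=q)
    pruned = np.zeros_like(w, dtype=float)
    # np.add.at accumulates correctly when the same index is drawn twice.
    np.add.at(pruned, idx, w[idx] / (m * q[idx]))
    return pruned

w = np.array([5.0, -4.0, 0.01, 0.02, 3.0, -0.03])
p = sampled_pruning(w, m=4)
print(np.count_nonzero(p) <= 4)  # at most m surviving coordinates
```

Large weights are kept with high probability while tiny ones are mostly zeroed out, yet the reweighting keeps the pruned vector unbiased in expectation, which is the property that sampling-based compression guarantees exploit.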

 Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor
Approach with Automatic Rank Determination
Authors: Cole Hawkins, Xing Liu, Zheng Zhang
Pages: 46–71
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 46–71, January 2022.
Post-training model compression can reduce the inference costs of deep neural networks, but uncompressed training still consumes enormous hardware resources and energy. To enable low-energy training on edge devices, it is highly desirable to directly train a compact neural network from scratch with a low memory cost. Low-rank tensor decomposition is an effective approach to reduce the memory and computing costs of large neural networks. However, directly training low-rank tensorized neural networks is a very challenging task because it is hard to determine a proper tensor rank a priori, and the tensor rank controls both model complexity and accuracy. This paper presents a novel end-to-end framework for low-rank tensorized training. We first develop a Bayesian model that supports various low-rank tensor formats (e.g., CANDECOMP/PARAFAC, Tucker, tensor-train, and tensor-train matrix) and reduces neural network parameters with automatic rank determination during training. Then we develop a customized Bayesian solver to train large-scale tensorized neural networks. Our training method shows orders-of-magnitude parameter reduction and little accuracy loss (or even better accuracy) in the experiments. On a very large deep learning recommendation system with over $4.2\times 10^9$ model parameters, our method can automatically reduce the parameter number to $1.6\times 10^5$ in the training process (i.e., by a factor of $2.6\times 10^4$) while achieving almost the same accuracy. Code is available at https://github.com/colehawkins/bayesiantensorrankdetermination.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-01-13T08:00:00Z
DOI: 10.1137/21M1391444
Issue No: Vol. 4, No. 1 (2022)

 Balancing Geometry and Density: Path Distances on High-Dimensional Data

Authors: Anna Little, Daniel McKenzie, James M. Murphy
Pages: 72–99
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 72–99, January 2022.
New geometric and computational analyses of power-weighted shortest path distances (PWSPDs) are presented. By illuminating the way these metrics balance geometry and density in the underlying data, we clarify their key parameters and illustrate how they provide multiple perspectives for data analysis. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs, providing high probability guarantees on their equivalence that are near-optimal. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a wide range of data settings. Throughout the paper, our results generally require only that the underlying data is sampled from a compact low-dimensional manifold, and depend most crucially on the intrinsic dimension of this manifold, rather than its ambient dimension.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-01-24T08:00:00Z
DOI: 10.1137/20M1386657
Issue No: Vol. 4, No. 1 (2022)
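The PWSPD construction is compact enough to state in code: raise Euclidean edge lengths to a power $p$, take graph shortest paths, then undo the power. A brute-force sketch on the complete graph (the paper's computational results concern nearest-neighbor approximations of exactly this metric):

```python
import numpy as np

def pwspd(X, p=2.0):
    """Power-weighted shortest path distances (sketch): a path's cost is
    the sum of its Euclidean edge lengths raised to the power p, and the
    metric is the minimal path cost raised to 1/p."""
    diff = X[:, None, :] - X[None, :, :]
    W = np.linalg.norm(diff, axis=2) ** p       # edge costs, complete graph
    D = W.copy()
    for k in range(len(X)):                     # Floyd-Warshall relaxation
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D ** (1.0 / p)

# Three collinear points: for p > 1 the two short hops beat the direct edge.
X = np.array([[0.0], [1.0], [2.0]])
D = pwspd(X, p=2.0)
print(D[0, 2])  # (1^2 + 1^2)^(1/2) = sqrt(2), shorter than the direct 2
```

Larger $p$ penalizes long hops more heavily, so paths are pulled through dense regions of the data; this is the geometry-versus-density balance the abstract refers to.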

 A Unifying Generative Model for Graph Learning Algorithms: Label
Propagation, Graph Convolutions, and Combinations
Authors: Junteng Jia, Austin R. Benson
Pages: 100–125
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 100–125, March 2022.
Semi-supervised learning on graphs is a widely applicable problem in network science and machine learning. Two standard algorithms, label propagation and graph neural networks, both operate by repeatedly passing information along edges, the former by passing labels and the latter by passing node features, modulated by neural networks. These two types of algorithms have largely developed separately, and there is little understanding about the structure of network data that would make one of these approaches work particularly well compared to the other or when the approaches can be meaningfully combined. Here, we develop a Markov random field model for the data generation process of node attributes, based on correlations of attributes on and between vertices, that motivates and unifies these algorithmic approaches. We show that label propagation, a linearized graph convolutional network, and their combination can all be derived as conditional expectations under our model. In addition, the data model highlights problems with existing graph neural networks (and provides solutions), serves as a rigorous statistical framework for understanding issues such as oversmoothing, creates a testbed for evaluating inductive learning performance, and provides a way to sample graph attributes that resemble empirical data. We also find that a new algorithm derived from our data generation model, which we call a linear graph convolution, performs extremely well in practice on empirical data, and we provide theoretical justification for why this is the case.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-08T08:00:00Z
DOI: 10.1137/21M1395351
Issue No: Vol. 4, No. 1 (2022)
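The feature-passing operation common to both algorithm families can be sketched as repeated multiplication by a normalized adjacency matrix. The symmetric normalization and power $k$ below are conventional choices for a linearized graph convolution, not necessarily the paper's exact operator:

```python
import numpy as np

def linear_graph_convolution(A, X, k=2):
    """Linearized graph convolution sketch: propagate node features with k
    powers of the symmetrically normalized adjacency, no nonlinearities."""
    deg = A.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    S = Dinv @ A @ Dinv                 # S = D^{-1/2} A D^{-1/2}
    out = X.copy()
    for _ in range(k):
        out = S @ out                   # one round of feature passing
    return out

# Path graph on 4 nodes; a one-hot feature diffuses to its neighbors.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)[:, [0]]
res = linear_graph_convolution(A, X, k=1)
print(res.ravel())
```

Label propagation applies the same kind of smoothing operator to known labels instead of node features, which is why the paper can derive both (and their combination) as conditional expectations under a single generative model.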

 Dimensionality Reduction, Regularization, and Generalization in
Overparameterized Regressions
Authors: Ningyuan Teresa Huang, David W. Hogg, Soledad Villar
Pages: 126–152
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 126–152, March 2022.
Overparameterization in deep learning is powerful: Very large models fit the training data perfectly and yet often generalize well. This realization brought back the study of linear models for regression, including ordinary least squares (OLS), which, like deep learning, shows a “double-descent” behavior: (1) The risk (expected out-of-sample prediction error) can grow arbitrarily when the number of parameters $p$ approaches the number of samples $n$, and (2) the risk decreases with $p$ for $p > n$, sometimes achieving a lower value than the lowest risk for $p \le n$.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-10T08:00:00Z
DOI: 10.1137/20M1387821
Issue No: Vol. 4, No. 1 (2022)
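The overparameterized regime in the abstract is the $p > n$ case, where OLS has infinitely many solutions and double-descent analyses typically study the minimum-norm interpolator, computable via the pseudoinverse. A quick sketch of that object (the dimensions are arbitrary):

```python
import numpy as np

def min_norm_ols(X, y):
    """Minimum-norm least squares via the pseudoinverse; for p > n this is
    the interpolating solution studied in double-descent analyses."""
    return np.linalg.pinv(X) @ y

rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
w = min_norm_ols(X, y)
print(np.allclose(X @ w, y))  # p > n: the min-norm solution interpolates
```

With $p < n$ the same formula returns the usual unique least-squares fit; the transition between the two regimes at $p \approx n$ is where the risk peak in the abstract's point (1) occurs.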

 Positive Semidefinite Embedding for Dimensionality Reduction and
Out-of-Sample Extensions
Authors: Michaël Fanuel, Antoine Aspeel, JeanCharles Delvenne, Johan A. K. Suykens
Pages: 153–178
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 153–178, March 2022.
In machine learning and statistics, it is often desirable to reduce the dimensionality of a sample of data points in a high-dimensional space $\mathbb{R}^d$. This paper introduces a dimensionality reduction method where the embedding coordinates are the eigenvectors of a positive semidefinite kernel obtained as the solution of an infinite-dimensional analogue of a semidefinite program. This embedding is adaptive and nonlinear. We discuss this problem under both weak and strong smoothness assumptions on the learned kernel. A main feature of our approach is the existence of an out-of-sample extension formula for the embedding coordinates in both cases. This extrapolation formula yields an extension of the kernel matrix to a data-dependent Mercer kernel function. Our empirical results indicate that this embedding method is more robust to the influence of outliers than a spectral embedding method.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-10T08:00:00Z
DOI: 10.1137/20M1370653
Issue No: Vol. 4, No. 1 (2022)

 Wasserstein Barycenters Are NP-Hard to Compute

Authors: Jason M. Altschuler, Enric BoixAdserà
Pages: 179–203
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 179–203, March 2022.
Computing Wasserstein barycenters (a.k.a. optimal transport barycenters) is a fundamental problem in geometry which has recently attracted considerable attention due to many applications in data science. While there exist polynomial-time algorithms in any fixed dimension, all known running times suffer exponentially in the dimension. It is an open question whether this exponential dependence is improvable to a polynomial dependence. This paper proves that unless $\mathrm{P} = \mathrm{NP}$, the answer is no. This uncovers a “curse of dimensionality” for Wasserstein barycenter computation which does not occur for optimal transport computation. Moreover, our hardness results for computing Wasserstein barycenters extend to approximate computation, to seemingly simple cases of the problem, and to averaging probability distributions in other optimal transport metrics.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-10T08:00:00Z
DOI: 10.1137/21M1390062
Issue No: Vol. 4, No. 1 (2022)

 Identifying 3D Genome Organization in Diploid Organisms via Euclidean
Distance Geometry
Authors: Anastasiya Belyaeva, Kaie Kubjas, Lawrence J. Sun, Caroline Uhler
Pages: 204  228
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 204228, March 2022.
The spatial organization of the genome in the cell nucleus plays an important role for gene regulation, replication of the deoxyribonucleic acid (DNA), and genomic integrity. Through the development of chromosome conformation capture experiments (such as 3C, 4C, and HiC) it is now possible to obtain the contact frequencies of the DNA at the wholegenome level. In this paper, we study the problem of reconstructing the threedimensional (3D) organization of the genome from such wholegenome contact frequencies. A standard approach is to transform the contact frequencies into noisy distance measurements and then apply semidefinite programming formulations to obtain the 3D configuration. However, neglected in such reconstructions is the fact that most eukaryotes including humans are diploid and therefore contain two copies of each genomic locus. We prove that the 3D organization of the DNA is not identifiable from the distance measurements derived from contact frequencies in diploid organisms. In fact, there are infinitely many solutions even in the noisefree setting. We then discuss various additional biologically relevant and experimentally measurable constraints (including distances between neighboring genomic loci and higherorder interactions) and prove identifiability under these conditions. Furthermore, we provide semidefinite programming formulations for computing the 3D embedding of the DNA with these additional constraints and show that we can recover the true 3D embedding with high accuracy from both noiseless and noisy measurements. Finally, we apply our algorithm to real pairwise and higherorder contact frequency data and show that we can recover known genome organization patterns.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-28T08:00:00Z
DOI: 10.1137/21M1390372
Issue No: Vol. 4, No. 1 (2022)
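The diploid non-identifiability result above contrasts with the haploid baseline, where a noiseless Euclidean distance matrix does determine the 3D configuration up to rigid motion. A minimal numpy sketch of that baseline (classical multidimensional scaling, not the paper's semidefinite programs; all names are illustrative):

```python
import numpy as np

def classical_mds(D, dim=3):
    """Recover a point configuration (up to rigid motion) from a
    Euclidean distance matrix via double centering (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gram matrix of centered points
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:dim]            # top-`dim` eigenpairs
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# toy check: a noiseless "haploid" instance is identifiable
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                   # ground-truth 3D configuration
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D)
D_rec = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D_rec))                   # → True
```

In the diploid setting each genomic locus contributes two points, the measured quantity mixes distances between both copies, and (as the paper proves) no analogous unique recovery exists without extra constraints.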

 Randomized Wasserstein Barycenter Computation: Resampling with Statistical
Guarantees
Authors: Florian Heinemann, Axel Munk, Yoav Zemel
Pages: 229–259
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 229–259, March 2022.
We propose a hybrid resampling method to approximate finitely supported Wasserstein barycenters on large-scale datasets, which can be combined with any exact solver. Non-asymptotic bounds on the expected error of the objective value as well as the barycenters themselves allow one to calibrate computational cost and statistical accuracy. The rate of these upper bounds is shown to be optimal and independent of the underlying dimension, which appears only in the constants. Using a simple modification of the subgradient descent algorithm of Cuturi and Doucet, we showcase the applicability of our method on myriad simulated datasets, as well as a real-data example from cell microscopy, which are out of reach for state-of-the-art algorithms for computing Wasserstein barycenters.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-28T08:00:00Z
DOI: 10.1137/20M1385263
Issue No: Vol. 4, No. 1 (2022)
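The paper's method runs in general dimension with any exact barycenter solver; the resampling idea alone can be illustrated in the closed-form one-dimensional case, where the W2 barycenter's quantile function is the average of the input quantile functions. A hedged numpy sketch (function names and parameters are ad hoc, not from the paper):

```python
import numpy as np

def resampled_barycenter_1d(datasets, m, rng):
    """Approximate the W2 barycenter of 1D empirical measures by averaging
    the order statistics (quantile functions) of size-m subsamples."""
    quantiles = [np.sort(rng.choice(data, size=m, replace=True))  # resample
                 for data in datasets]
    return np.mean(quantiles, axis=0)        # support points of the barycenter

rng = np.random.default_rng(1)
big1 = rng.normal(-2.0, 1.0, size=100_000)   # two large datasets
big2 = rng.normal(+2.0, 1.0, size=100_000)
bary = resampled_barycenter_1d([big1, big2], m=2_000, rng=rng)
print(bary.shape)                            # → (2000,)
```

The computational cost is driven by the subsample size m rather than the full dataset size, which is the trade-off the paper's non-asymptotic bounds calibrate.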

 Binary Classification of Gaussian Mixtures: Abundance of Support Vectors,
Benign Overfitting, and Regularization
Authors: Ke Wang, Christos Thrampoulidis
Pages: 260–284
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 260–284, March 2022.
Deep neural networks generalize well despite being exceedingly overparameterized and being trained without explicit regularization. This curious phenomenon has inspired extensive research activity in establishing its statistical principles: Under what conditions is it observed? How do these depend on the data and on the training algorithm? When does regularization benefit generalization? While such questions remain wide open for deep neural nets, recent works have attempted to gain insights by studying simpler, often linear, models. Our paper contributes to this growing line of work by examining binary linear classification under a generative Gaussian mixture model in which the feature vectors take the form $x = \pm\eta + q$ for a mean vector $\eta$ and feature noise $q \sim \mathcal{N}(0,\Sigma)$. Motivated by recent results on the implicit bias of gradient descent, we study both max-margin support vector machine (SVM) classifiers (corresponding to logistic loss) and min-norm interpolating classifiers (corresponding to least-squares loss). First, we leverage an idea introduced in [V. Muthukumar et al., arXiv:2005.08054, 2020a] to relate the SVM solution to the min-norm interpolating solution. Second, we derive novel non-asymptotic bounds on the classification error of the latter. Combining the two, we present novel sufficient conditions on the covariance spectrum and on the signal-to-noise ratio $\mathrm{SNR} = \|\eta\|_2^4 / (\eta^T \Sigma \eta)$ under which interpolating estimators achieve asymptotically optimal performance as overparameterization increases. Interestingly, our results extend to a noisy model with constant-probability noise flips. Contrary to previously studied discriminative data models, our results emphasize the crucial role of the SNR and its interplay with the data covariance.
Finally, via a combination of analytical arguments and numerical demonstrations, we identify conditions under which the interpolating estimator performs better than corresponding regularized estimates.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-03T08:00:00Z
DOI: 10.1137/21M1415121
Issue No: Vol. 4, No. 1 (2022)
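A toy numpy instance of the data model $x = \pm\eta + q$ and the min-norm interpolating classifier the abstract studies. Here $\Sigma = I$ and $\mathrm{SNR} = \|\eta\|_2^4/\eta^T\Sigma\eta = 625/25 = 25$ is large, so interpolation should generalize well; the specific numbers are illustrative choices, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 500                                   # heavily overparameterized
eta = np.zeros(d)
eta[0] = 5.0                                     # mean vector; Sigma = I here
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * eta + rng.normal(size=(n, d))   # x_i = y_i * eta + q_i

w = np.linalg.pinv(X) @ y                        # min-norm interpolator
assert np.allclose(X @ w, y)                     # fits training labels exactly

# despite interpolating pure noise directions, test error is near zero
y_new = rng.choice([-1.0, 1.0], size=2_000)
X_new = y_new[:, None] * eta + rng.normal(size=(2_000, d))
err = float(np.mean(np.sign(X_new @ w) != y_new))
print(err < 0.05)
```

Shrinking `eta[0]` (lowering the SNR) degrades the interpolator, which is the interplay between SNR and covariance that the paper quantifies.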

 Fast Cluster Detection in Networks by First Order Optimization

Authors: Immanuel M. Bomze, Francesco Rinaldi, Damiano Zeffiro
Pages: 285–305
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 285–305, March 2022.
Cluster detection plays a fundamental role in the analysis of data. In this paper, we focus on the use of $s$-defective clique models for network-based cluster detection and propose a nonlinear optimization approach that efficiently handles those models in practice. In particular, we introduce an equivalent continuous formulation for the problem under analysis, and we analyze some tailored variants of the Frank–Wolfe algorithm that enable us to quickly find maximal $s$-defective cliques. The good practical behavior of those algorithmic tools, which is closely connected to their support identification properties, makes them very appealing in practical applications. The reported numerical results clearly show the effectiveness of the proposed approach.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-03T08:00:00Z
DOI: 10.1137/21M1408658
Issue No: Vol. 4, No. 1 (2022)
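The paper's tailored Frank–Wolfe variants target $s$-defective cliques; the general idea of a continuous clique formulation can be sketched with vanilla Frank–Wolfe on the classical Motzkin–Straus relaxation (a toy stand-in, not the paper's algorithm), where the iterate's support identifies the clique:

```python
import numpy as np

def frank_wolfe_clique(A, iters=200):
    """Vanilla Frank-Wolfe on the Motzkin-Straus program max x^T A x over
    the probability simplex; the iterate's support indicates a clique."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for t in range(iters):
        grad = 2.0 * A @ x
        s = np.zeros(n)
        s[np.argmax(grad)] = 1.0          # linear oracle over the simplex
        x += (2.0 / (t + 2.0)) * (s - x)  # standard diminishing step size
    return x

# a 4-clique on nodes {0,1,2,3} planted in a 10-node graph
n = 10
A = np.zeros((n, n))
A[:4, :4] = 1.0 - np.eye(4)               # adjacency of the planted clique
A[4, 5] = A[5, 4] = 1.0                   # one stray edge elsewhere
x = frank_wolfe_clique(A)
support = np.where(x > 1e-3)[0]
print(sorted(support.tolist()))           # → [0, 1, 2, 3]
```

The support-identification behavior visible here (mass concentrates on the clique vertices) is exactly the property the paper exploits to make its variants fast.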

 A Generative Variational Model for Inverse Problems in Imaging

Authors: Andreas Habring, Martin Holler
Pages: 306–335
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 306–335, March 2022.
This paper is concerned with the development, analysis, and numerical realization of a novel variational model for the regularization of inverse problems in imaging. The proposed model is inspired by the architecture of generative convolutional neural networks; it aims to generate the unknown from variables in a latent space via multilayer convolutions and nonlinear penalties, and it penalizes an associated cost. In contrast to conventional neural-network-based approaches, however, the convolution kernels are learned directly from the measured data such that no training is required. The present work provides a mathematical analysis of the proposed model in a function space setting, including proofs of regularity and existence/stability of solutions, and convergence for vanishing noise. Moreover, in a discretized setting, a numerical algorithm for solving various types of inverse problems with the proposed model is derived. Numerical results are provided for applications in inpainting, denoising, deblurring under noise, super-resolution, and JPEG decompression with multiple test images.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-03T08:00:00Z
DOI: 10.1137/21M1414978
Issue No: Vol. 4, No. 1 (2022)
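The paper's model uses multiple convolutional layers whose kernels are themselves optimized from the measured data. A much-reduced, one-layer sketch with a fixed (assumed) kernel conveys the variational idea of synthesizing the unknown from a sparse latent code via convolution; every name and parameter here is illustrative, and ISTA stands in for the paper's algorithm:

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding, the proximal map of the l1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# 1D convolution as a banded matrix (fixed kernel; the paper learns kernels)
n = 64
offsets = np.arange(-3, 4)
kern = np.exp(-0.5 * offsets.astype(float) ** 2)
K = sum(kv * np.eye(n, k=int(o)) for o, kv in zip(offsets, kern))

rng = np.random.default_rng(7)
z_true = np.zeros(n)
z_true[[10, 30, 50]] = 2.0                       # sparse latent code
data = K @ z_true + 0.01 * rng.normal(size=n)    # noisy measurement

# ISTA on the variational objective 0.5*||K z - data||^2 + lam*||z||_1
lam = 0.05
step = 1.0 / np.linalg.norm(K, 2) ** 2
z = np.zeros(n)
for _ in range(500):
    z = soft(z - step * (K.T @ (K @ z - data)), step * lam)

print(sorted(np.argsort(-np.abs(z))[:3].tolist()))   # largest latent entries
```

Replacing the fixed `K` with kernels that are jointly optimized, stacking several such layers, and posing everything in function space is, roughly, what the paper develops and analyzes.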

 Overcomplete Order-3 Tensor Decomposition, Blind Deconvolution, and
Gaussian Mixture Models
Authors: Haolin Chen, Luis Rademacher
Pages: 336–361
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 336–361, March 2022.
We propose a new algorithm for tensor decomposition, based on the simultaneous diagonalization algorithm, and apply our new algorithmic ideas to blind deconvolution and Gaussian mixture models. Our first contribution is a simple and efficient algorithm to decompose certain symmetric overcomplete order-3 tensors, that is, three-dimensional arrays of the form $T = \sum_{i=1}^n a_i \otimes a_i \otimes a_i$ where the $a_i$s are not linearly independent. Our algorithm comes with a detailed robustness analysis. Our second contribution builds on top of our tensor decomposition algorithm to expand the family of Gaussian mixture models whose parameters can be estimated efficiently. These ideas are also presented in a more general framework of blind deconvolution that makes them applicable to mixture models of identical but very general distributions, including all centrally symmetric distributions with finite sixth moment.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-09T08:00:00Z
DOI: 10.1137/21M1399415
Issue No: Vol. 4, No. 1 (2022)
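The paper's contribution is the overcomplete case, where the $a_i$ are not linearly independent. The classical simultaneous-diagonalization baseline it builds on (Jennrich's algorithm, which does require linear independence) can be sketched in a few lines; this is the textbook method, not the paper's extension:

```python
import numpy as np

def jennrich(T, n_components):
    """Simultaneous diagonalization (Jennrich): recover linearly independent
    components a_i of T = sum_i a_i x a_i x a_i from two random slices."""
    d = T.shape[0]
    rng = np.random.default_rng(0)
    u, v = rng.normal(size=d), rng.normal(size=d)
    Mu = np.einsum('ijk,k->ij', T, u)          # = A diag(A^T u) A^T
    Mv = np.einsum('ijk,k->ij', T, v)
    vals, vecs = np.linalg.eig(Mu @ np.linalg.pinv(Mv))
    top = np.argsort(-np.abs(vals))[:n_components]
    return np.real(vecs[:, top])               # components, up to sign/scale

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 3))                    # 3 independent components in R^8
T = np.einsum('ir,jr,kr->ijk', A, A, A)
B = jennrich(T, 3)
# each recovered column should be parallel to one true component
cos = np.abs((B / np.linalg.norm(B, axis=0)).T @ (A / np.linalg.norm(A, axis=0)))
print(np.round(cos.max(axis=1), 6))
```

When the number of components exceeds the dimension this eigendecomposition breaks down, which is the regime the paper's algorithm and robustness analysis address.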

 Robust Classification Under $\ell_0$ Attack for the Gaussian Mixture Model

Authors: Payam Delgosha, Hamed Hassani, Ramtin Pedarsani
Pages: 362–385
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 362–385, March 2022.
It is well known that machine learning models are vulnerable to small but cleverly designed adversarial perturbations that can cause misclassification. While there has been major progress in designing attacks and defenses for various adversarial settings, many fundamental and theoretical problems are yet to be resolved. In this paper, we consider classification in the presence of $\ell_0$-bounded adversarial perturbations, also known as sparse attacks. This setting is significantly different from other $\ell_p$-adversarial settings with $p \geq 1$, as the $\ell_0$-ball is nonconvex and highly nonsmooth. Under the assumption that data is distributed according to the Gaussian mixture model, our goal is to characterize the optimal robust classifier and the corresponding robust classification error as well as a variety of trade-offs between robustness, accuracy, and the adversary's budget. To this end, we develop a novel classification algorithm called FilTrun that has two main modules: filtration and truncation. The key idea of our method is to first filter out the nonrobust coordinates of the input and then apply a carefully designed truncated inner product for classification. By analyzing the performance of FilTrun, we derive an upper bound on the optimal robust classification error. We further find a lower bound by designing a specific adversarial strategy that enables us to derive the corresponding robust classifier and its achieved error. For the case that the covariance matrix of the Gaussian mixtures is diagonal, we show that as the input's dimension gets large, the upper and lower bounds converge; i.e., we characterize the asymptotically optimal robust classifier. Throughout, we discuss several examples that illustrate interesting behaviors, such as the existence of a phase transition for the adversary's budget that determines whether the effect of adversarial perturbation can be fully neutralized.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-15T07:00:00Z
DOI: 10.1137/21M1426286
Issue No: Vol. 4, No. 1 (2022)
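The truncation module can be illustrated in isolation: capping each coordinate's contribution to the inner product bounds what an $\ell_0$-adversary gains per corrupted coordinate. This numpy sketch shows only that idea (the paper's FilTrun also filters non-robust coordinates first, and its cap is carefully designed rather than hand-picked):

```python
import numpy as np

def truncated_score(eta, x, cap):
    """Truncated inner product: each coordinate's contribution is capped, so
    an l0-bounded adversary editing a few coordinates has bounded influence."""
    return float(np.clip(eta * x, -cap, cap).sum())

rng = np.random.default_rng(4)
d, k = 200, 5                        # dimension and adversary's l0 budget
eta = np.ones(d)                     # mean vector of the Gaussian mixture
x = eta + rng.normal(size=d)         # a clean sample with label y = +1

x_adv = x.copy()
x_adv[:k] = -1e6                     # sparse attack: corrupt k coordinates

plain = float(eta @ x_adv)           # ordinary inner product is destroyed
robust = truncated_score(eta, x_adv, cap=3.0)
print(plain < 0, robust > 0)         # → True True
```

With the cap, the k corrupted coordinates shift the score by at most 2*k*cap, which the remaining d-k clean coordinates easily outvote; without it, a single coordinate flips the sign.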

 A Generalized CUR Decomposition for Matrix Pairs

Authors: Perfect Y. Gidisu, Michiel E. Hochstenbach
Pages: 386–409
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 386–409, March 2022.
We propose a generalized CUR (GCUR) decomposition for matrix pairs $(A,B)$. Given matrices $A$ and $B$ with the same number of columns, such a decomposition provides low-rank approximations of both matrices simultaneously in terms of some of their rows and columns. We obtain the indices for selecting the subset of rows and columns of the original matrices using the discrete empirical interpolation method (DEIM) on the generalized singular vectors. When $B$ is square and nonsingular, there are close connections between the GCUR of $(A,B)$ and the DEIM-induced CUR of $AB^{-1}$. When $B$ is the identity, the GCUR decomposition of $A$ coincides with the DEIM-induced CUR decomposition of $A$. We also show a similar connection between the GCUR of $(A,B)$ and the CUR of $AB^+$ for a nonsquare but full-rank matrix $B$, where $B^+$ denotes the Moore–Penrose pseudoinverse of $B$. While a CUR decomposition acts on one data set, a GCUR factorization jointly decomposes two data sets. The algorithm may be suitable for applications where one is interested in extracting the most discriminative features from one data set relative to another data set. In numerical experiments, we demonstrate the advantages of the new method over the standard CUR approximation for recovering data perturbed with colored noise and for subgroup discovery.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-17T07:00:00Z
DOI: 10.1137/21M1432119
Issue No: Vol. 4, No. 1 (2022)
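The single-matrix building block, a DEIM-induced CUR decomposition, can be sketched directly: DEIM greedily picks one row index per singular vector, and the chosen rows and columns reproduce a rank-k matrix exactly. This is the standard CUR baseline the abstract generalizes, not the GCUR itself:

```python
import numpy as np

def deim(U):
    """DEIM point selection: one row index per singular vector."""
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, U.shape[1]):
        c = np.linalg.solve(U[idx, :j], U[idx, j])  # interpolation coefficients
        r = U[:, j] - U[:, :j] @ c                  # residual vanishes at idx
        idx.append(int(np.argmax(np.abs(r))))
    return np.array(idx)

rng = np.random.default_rng(5)
A = rng.normal(size=(60, 10)) @ rng.normal(size=(10, 40))  # exactly rank 10
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 10
rows, cols = deim(U[:, :k]), deim(Vt.T[:, :k])
C, R = A[:, cols], A[rows, :]                       # chosen columns and rows
Ucur = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)    # core matrix
print(np.allclose(A, C @ Ucur @ R))                 # exact for a rank-k matrix
```

The GCUR replaces the singular vectors here with generalized singular vectors of the pair $(A,B)$, so the selected rows and columns are discriminative for $A$ relative to $B$.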

 Spectral Discovery of Jointly Smooth Features for Multimodal Data

Authors: Felix Dietrich, Or Yair, Rotem Mulayoff, Ronen Talmon, Ioannis G. Kevrekidis
Pages: 410–430
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 410–430, March 2022.
In this paper, we propose a spectral method for deriving functions that are jointly smooth on multiple observed manifolds. This allows us to register measurements of the same phenomenon by heterogeneous sensors and to reject sensor-specific noise. Our method is unsupervised and primarily consists of two steps. First, using kernels, we obtain a subspace spanning smooth functions on each separate manifold. Then, we apply a spectral method to the obtained subspaces and discover functions that are jointly smooth on all manifolds. We show analytically that our method is guaranteed to provide a set of orthogonal functions that are as jointly smooth as possible, ordered by increasing Dirichlet energy from the smoothest to the least smooth. In addition, we show that the extracted functions can be efficiently extended to unseen data using the Nyström method. We demonstrate the proposed method on both simulated and real measured data and compare the results to nonlinear, kernel-based variants of the seminal canonical correlation analysis. In particular, we show superior results for sleep stage identification. In addition, we show how the proposed method can be leveraged to find minimal realizations of parameter spaces of nonlinear dynamical systems.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-24T07:00:00Z
DOI: 10.1137/21M141590X
Issue No: Vol. 4, No. 1 (2022)
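The two steps in the abstract (per-sensor smooth subspaces from kernel eigenvectors, then a spectral step on the stacked bases) can be sketched on a toy two-sensor example. All kernel and parameter choices below are ad hoc illustrations, not the paper's recommended settings:

```python
import numpy as np

def smooth_subspace(X, k, eps=1.0):
    """Top-k eigenvectors of a Gaussian kernel on X: an orthonormal basis of
    functions that vary smoothly along the observed manifold."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    w, V = np.linalg.eigh(np.exp(-D2 / eps))
    return V[:, np.argsort(w)[::-1][:k]]

rng = np.random.default_rng(6)
t = np.sort(rng.uniform(0.0, 2.0 * np.pi, 300))     # common latent variable
X1 = np.stack([np.cos(t), np.sin(t)], axis=1)       # sensor 1: a circle
X2 = np.stack([t, 0.1 * rng.normal(size=t.size)], axis=1)  # sensor 2: noisy line

U1, U2 = smooth_subspace(X1, 10), smooth_subspace(X2, 10)
# step 2: singular vectors of the stacked bases; singular values near sqrt(2)
# indicate functions lying (almost) in both smooth subspaces
Q, s, _ = np.linalg.svd(np.hstack([U1, U2]), full_matrices=False)
f = Q[:, 0]                             # the most jointly smooth function
print(f.shape, bool(s[0] <= np.sqrt(2) + 1e-9))
```

Functions with singular value close to sqrt(2) are smooth for both sensors and thus depend only on the shared latent variable, which is what enables sensor registration and rejection of sensor-specific noise.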
