SIAM Journal on Mathematics of Data Science
ISSN (Online): 2577-0187. Published by the Society for Industrial and Applied Mathematics.
 A Nonlinear Matrix Decomposition for Mining the Zeros of Sparse Data

Authors: Lawrence K. Saul
Pages: 431–463
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 431–463, June 2022.
We describe a simple iterative solution to a widely recurring problem in multivariate data analysis: given a sparse nonnegative matrix $\mathbf{X}$, how to estimate a low-rank matrix $\Theta$ such that $\mathbf{X} \approx f(\Theta)$, where $f$ is an elementwise nonlinearity? We develop a latent variable model for this problem and consider those sparsifying nonlinearities, popular in neural networks, that map all negative values to zero. The model seeks to explain the variability of sparse high-dimensional data in terms of a smaller number of degrees of freedom. We show that exact inference in this model is tractable and derive an expectation-maximization (EM) algorithm to estimate the low-rank matrix $\Theta$. Notably, we do not parameterize $\Theta$ as a product of smaller matrices to be alternately optimized; instead, we estimate $\Theta$ directly via the singular value decomposition of matrices that are repeatedly inferred (at each iteration of the EM algorithm) from the model's posterior distribution. We use the model to analyze large sparse matrices that arise from data sets of binary, grayscale, and color images. In all of these cases, we find that the model discovers much lower-rank decompositions than purely linear approaches.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-04-07T07:00:00Z
DOI: 10.1137/21M1405769
Issue No: Vol. 4, No. 2 (2022)
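The inference problem in the abstract, $\mathbf{X} \approx \mathrm{ReLU}(\Theta)$ with $\Theta$ estimated through repeated truncated SVDs, can be illustrated with a naive alternating heuristic. This is not the paper's EM algorithm (there is no posterior inference here); the zero-fill rule and the fixed target rank are illustrative assumptions.

```python
import numpy as np

def relu_decompose(X, rank, iters=200):
    """Naive alternating heuristic for X ~= ReLU(Theta) with low-rank Theta.

    NOT the paper's EM algorithm -- just a sketch of the idea: on zero
    entries of X, Theta is free to be nonpositive, so we fill those entries
    with min(Theta, 0) before each truncated SVD.
    """
    M = X.astype(float).copy()
    for _ in range(iters):
        # Best rank-r approximation of the current completed matrix.
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        Theta = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed positives; let zeros be explained by negative Theta.
        M = np.where(X > 0, X, np.minimum(Theta, 0.0))
    return Theta

# Toy check: a sparse nonnegative matrix that is exactly ReLU of rank-2 Theta.
rng = np.random.default_rng(0)
Theta_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
X = np.maximum(Theta_true, 0.0)
Theta_hat = relu_decompose(X, rank=2)
err = np.linalg.norm(np.maximum(Theta_hat, 0.0) - X) / np.linalg.norm(X)
print(round(float(err), 4))
```

On data that is exactly the ReLU of a low-rank matrix, the relative error on the positive entries shrinks steadily; the paper replaces the crude zero-fill step with posterior expectations under its latent variable model.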

 What Kinds of Functions Do Deep Neural Networks Learn? Insights from
Variational Spline Theory
Authors: Rahul Parhi, Robert D. Nowak
Pages: 464–489
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 464–489, June 2022.
We develop a variational framework to understand the properties of functions learned by fitting deep neural networks with rectified linear unit (ReLU) activations to data. We propose a new function space, which is related to classical bounded variation-type spaces, that captures the compositional structure associated with deep neural networks. We derive a representer theorem showing that deep ReLU networks are solutions to regularized data-fitting problems over functions from this space. The function space consists of compositions of functions from the Banach space of second-order bounded variation in the Radon domain. This Banach space has a sparsity-promoting norm, giving insight into the role of sparsity in deep neural networks. The neural network solutions have skip connections and rank-bounded weight matrices, providing new theoretical support for these common architectural choices. The variational problem we study can be recast as a finite-dimensional neural network training problem with regularization schemes related to the notions of weight decay and path-norm regularization. Finally, our analysis builds on techniques from variational spline theory, providing new connections between deep neural networks and splines.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-04-13T07:00:00Z
DOI: 10.1137/21M1418642
Issue No: Vol. 4, No. 2 (2022)

 Statistical Methods for Minimax Estimation in Linear Models with Unknown
Design Over Finite Alphabets
Authors: Merle Behr, Axel Munk
Pages: 490–513
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 490–513, June 2022.
We provide a minimax optimal estimation procedure for $F$ and $\Omega$ in matrix-valued linear models $Y = F \Omega + Z$, where the parameter matrix $\Omega$ and the design matrix $F$ are unknown but the latter takes values in a known finite set. The proposed finite alphabet linear model is justified in a variety of applications, ranging from signal processing to cancer genetics. We show that this allows one to separate $F$ and $\Omega$ uniquely under weak identifiability conditions, a task which is, in general, not possible. To this end, we quantify, in the noiseless case (that is, $Z = 0$), the perturbation range of $Y$ needed to obtain stable recovery of $F$ and $\Omega$. Based on this, we derive an iterative Lloyd's-type estimation procedure that attains minimax estimation rates for $\Omega$ and $F$ for a Gaussian error matrix $Z$. In contrast to the least squares solution, the estimation procedure can be computed efficiently and scales linearly with the total number of observations. We confirm our theoretical results in a simulation study and illustrate them with a genetic sequencing data example.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-04-25T07:00:00Z
DOI: 10.1137/21M1398860
Issue No: Vol. 4, No. 2 (2022)

 Approximation Bounds for Sparse Programs

Authors: Armin Askari, Alexandre d'Aspremont, Laurent El Ghaoui
Pages: 514–530
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 514–530, June 2022.
We show that sparsity-constrained optimization problems over low-dimensional spaces tend to have a small duality gap. We use the Shapley–Folkman theorem to derive both data-driven bounds on the duality gap and an efficient primalization procedure to recover feasible points satisfying these bounds. These error bounds are proportional to the rate of growth of the objective with the target cardinality, which means in particular that the relaxation is nearly tight as soon as the target cardinality is large enough so that only uninformative features are added.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-04-27T07:00:00Z
DOI: 10.1137/21M1398677
Issue No: Vol. 4, No. 2 (2022)

 Stochastic Geometry to Generalize the Mondrian Process

Authors: Eliza O'Reilly, Ngoc Mai Tran
Pages: 531–552
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 531–552, June 2022.
The stable under iteration (STIT) tessellation process is a stochastic process that produces a recursive partition of space with cut directions drawn independently from a distribution over the sphere. The case of random axis-aligned cuts is known as the Mondrian process. Random forests and Laplace kernel approximations built from the Mondrian process have led to efficient online learning methods and Bayesian optimization. In this work, we utilize tools from stochastic geometry to resolve some fundamental questions concerning STIT processes in machine learning. First, we show that STIT processes can be efficiently simulated by lifting to a higher-dimensional axis-aligned Mondrian process. Second, we characterize all possible kernels that STIT processes and their mixtures can approximate. We also give a uniform convergence rate for the approximation error of the STIT kernels to the targeted kernels, completely generalizing the work of Balog et al. [The Mondrian kernel, 2016] from the Mondrian case. Third, we obtain consistency results for STIT forests in density estimation and regression. Finally, we give a precise formula for the density estimator arising from a STIT forest. This allows for precise comparisons between the STIT forest, the STIT kernel, and the targeted kernel in density estimation. Our paper calls for further developments at the novel intersection of stochastic geometry and machine learning.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-05-02T07:00:00Z
DOI: 10.1137/20M1354490
Issue No: Vol. 4, No. 2 (2022)
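The Mondrian case (axis-aligned cuts) can be simulated directly with the standard recursive construction: a cut arrives after an exponential waiting time whose rate is the box's linear dimension, and the split axis is chosen proportionally to side length. A minimal sketch, with the lifetime budget and box chosen arbitrarily for illustration:

```python
import random

def mondrian(box, budget, rng):
    """Sample a Mondrian partition of an axis-aligned box (sketch).

    box: list of (lo, hi) per dimension. A cut arrives after an exponential
    waiting time with rate equal to the sum of side lengths; the split
    dimension is chosen proportionally to its length, with uniform position.
    Returns the list of leaf boxes.
    """
    lengths = [hi - lo for lo, hi in box]
    lam = sum(lengths)                     # linear dimension = cut rate
    t = rng.expovariate(lam)
    if t > budget:
        return [box]                       # no cut within remaining lifetime
    # Pick the split dimension proportionally to side length.
    r = rng.uniform(0, lam)
    d, acc = 0, lengths[0]
    while r > acc:
        d += 1
        acc += lengths[d]
    pos = rng.uniform(*box[d])
    left = [b if i != d else (box[d][0], pos) for i, b in enumerate(box)]
    right = [b if i != d else (pos, box[d][1]) for i, b in enumerate(box)]
    rest = budget - t
    return mondrian(left, rest, rng) + mondrian(right, rest, rng)

rng = random.Random(1)
leaves = mondrian([(0.0, 1.0), (0.0, 1.0)], budget=3.0, rng=rng)
print(len(leaves))  # number of cells in the sampled partition
```

A general STIT process replaces the axis-aligned cut directions with draws from a directional distribution on the sphere, which is what the paper's lifting construction reduces back to the Mondrian case.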

 Quantitative Approximation Results for Complex-Valued Neural Networks

Authors: Andrei Caragea, Dae Gwan Lee, Johannes Maly, Götz Pfander, Felix Voigtlaender
Pages: 553–580
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 553–580, June 2022.
Until recently, applications of neural networks in machine learning have almost exclusively relied on real-valued networks. It was recently observed, however, that complex-valued neural networks (CVNNs) exhibit superior performance in applications in which the input is naturally complex-valued, such as MRI fingerprinting. While the mathematical theory of real-valued networks has, by now, reached some level of maturity, this is far from true for complex-valued networks. In this paper, we analyze the expressivity of complex-valued networks by providing explicit quantitative error bounds for approximating $C^n$ functions on compact subsets of $\mathbb{C}^d$ by CVNNs that employ the modReLU activation function, given by $\sigma(z) = \mathrm{ReLU}(|z| - 1) \cdot \mathrm{sgn}(z)$, which is one of the most popular complex activation functions used in practice. We show that the derived approximation rates are optimal (up to log factors) in the class of modReLU networks with weights of moderate growth.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-05-02T07:00:00Z
DOI: 10.1137/21M1429540
Issue No: Vol. 4, No. 2 (2022)
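The modReLU activation in the abstract, $\sigma(z) = \mathrm{ReLU}(|z| - 1)\,\mathrm{sgn}(z)$ with $\mathrm{sgn}(z) = z/|z|$, is straightforward to implement. The general bias $b$ below is a common parameterization (an assumption beyond the abstract); $b = 1$ recovers the formula above:

```python
import numpy as np

def modrelu(z, b=1.0):
    """modReLU for complex inputs: ReLU(|z| - b) * sgn(z), with sgn(z) =
    z/|z| (and 0 at z = 0). Shrinks magnitudes by b, preserves phases."""
    mag = np.abs(z)
    scale = np.maximum(mag - b, 0.0)
    out = np.zeros_like(z)
    nz = mag > 0
    out[nz] = scale[nz] * z[nz] / mag[nz]
    return out

z = np.array([2.0 + 0.0j, 0.5j, -3.0 + 4.0j])
out = modrelu(z)
print(out)  # [1+0j, 0j, -2.4+3.2j]: |z| shrinks by 1, phase is kept
```

Note that, unlike applying ReLU to real and imaginary parts separately, modReLU is phase-equivariant, which is one reason it is popular for complex-valued networks.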

 Wasserstein-Based Projections with Applications to Inverse Problems

Authors: Howard Heaton, Samy Wu Fung, Alex Tong Lin, Stanley Osher, Wotao Yin
Pages: 581–603
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 581–603, June 2022.
Inverse problems consist of recovering a signal from a collection of noisy measurements. These are typically cast as optimization problems, with classic approaches using a data fidelity term and an analytic regularizer that stabilizes recovery. Recent plug-and-play (PnP) works propose replacing the operator for analytic regularization in optimization methods by a data-driven denoiser. These schemes obtain state-of-the-art results, but at the cost of limited theoretical guarantees. To bridge this gap, we present a new algorithm that takes samples from the manifold of true data as input and outputs an approximation of the projection operator onto this manifold. Under standard assumptions, we prove this algorithm generates a learned operator, called a Wasserstein-based projection (WP), that approximates the true projection with high probability. Thus, WPs can be inserted into optimization methods in the same manner as PnP, but now with theoretical guarantees. Numerical examples show that WPs obtain state-of-the-art results for unsupervised PnP signal recovery. All codes for this work can be found at https://github.com/swufung/WassersteinBasedProjections.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-05-05T07:00:00Z
DOI: 10.1137/20M1376790
Issue No: Vol. 4, No. 2 (2022)

 Tukey Depths and Hamilton–Jacobi Differential Equations

Authors: Martin MolinaFructuoso, Ryan Murray
Pages: 604–633
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 604–633, June 2022.
Widespread application of modern machine learning has increased the need for robust statistical algorithms. This work studies one such fundamental statistical concept known as the Tukey depth. We study the problem in the continuum (population) limit. In particular, we formally derive the associated necessary conditions, which take the form of a first-order partial differential equation that is necessarily satisfied at points where the Tukey depth is smooth. We discuss the interpretation of this formal necessary condition in terms of the viscosity solution of a Hamilton–Jacobi equation, but with a nonclassical Hamiltonian with discontinuous dependence on the gradient at zero. We prove that this equation possesses a unique viscosity solution and that this solution always bounds the Tukey depth from below. In certain cases we prove that the Tukey depth is equal to the viscosity solution, and we give some illustrations of standard numerical methods from the optimal control community which deal directly with the partial differential equation. We conclude by outlining several promising research directions both in terms of new numerical algorithms and theoretical challenges.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-05-10T07:00:00Z
DOI: 10.1137/21M1411998
Issue No: Vol. 4, No. 2 (2022)
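The Tukey (halfspace) depth itself is simple to approximate by sampling directions, which makes the object described by the PDE easy to inspect empirically: the depth of a point $x$ is the minimum, over unit vectors $u$, of the fraction of data in the halfspace $\{p : \langle u, p - x\rangle \ge 0\}$. The direction count and Gaussian test data below are arbitrary choices for illustration:

```python
import numpy as np

def tukey_depth(x, data, n_dirs=500, seed=0):
    """Approximate Tukey depth by minimizing halfspace mass over random
    directions (a Monte Carlo sketch of the exact min over all directions)."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_dirs, data.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = (data - x) @ dirs.T            # shape (n_points, n_dirs)
    frac = (proj >= 0).mean(axis=0)       # halfspace mass per direction
    return frac.min()

rng = np.random.default_rng(1)
data = rng.standard_normal((400, 2))
center = tukey_depth(np.zeros(2), data)
edge = tukey_depth(np.array([3.0, 0.0]), data)
print(center > edge)  # the center of a symmetric cloud is deeper
```

Passing from this empirical minimum over halfspaces to the population limit is exactly the setting in which the abstract's first-order necessary condition is derived.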

 Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization

Authors: Samuel Horváth, Lihua Lei, Peter Richtárik, Michael I. Jordan
Pages: 634–648
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 2, Page 634–648, June 2022.
Adaptivity is an important yet understudied property in modern optimization theory. The gap between the state-of-the-art theory and current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step size schemes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners to select algorithms that work broadly without tweaking the hyperparameters. In this work, blending the “geometrization” technique introduced by [L. Lei and M. I. Jordan, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017, pp. 148–156] and the SARAH algorithm of [L. M. Nguyen, J. Liu, K. Scheinberg, and M. Takáč, Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 2613–2621], we propose the geometrized SARAH algorithm for nonconvex finite-sum and stochastic optimization. Our algorithm is proved to achieve adaptivity to both the magnitude of the target accuracy and the Polyak–Łojasiewicz (PL) constant, if present. In addition, it simultaneously achieves the best-available convergence rate for non-PL objectives while outperforming existing algorithms for PL objectives.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-05-12T07:00:00Z
DOI: 10.1137/21M1394308
Issue No: Vol. 4, No. 2 (2022)
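Plain SARAH, one of the two ingredients blended in the paper, maintains a recursive gradient estimator $v_t = \nabla f_i(w_t) - \nabla f_i(w_{t-1}) + v_{t-1}$ that is reset to the full gradient at the start of each outer loop. A minimal sketch on a toy least-squares problem (the step size and loop lengths are arbitrary; the geometrization ingredient is not implemented here):

```python
import numpy as np

def sarah(grad_i, n, w0, step=0.05, epochs=20, seed=0):
    """Plain SARAH sketch (not the paper's geometrized variant): the inner
    loop updates a recursive, low-variance gradient estimator."""
    rng = np.random.default_rng(seed)
    w = w0.astype(float)
    for _ in range(epochs):
        # Outer loop: reset the estimator to the exact full gradient.
        v = np.mean([grad_i(w, i) for i in range(n)], axis=0)
        w_prev, w = w, w - step * v
        for _ in range(n):
            # Inner loop: recursive single-sample correction.
            i = rng.integers(n)
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - step * v
    return w

# Toy finite-sum least squares: f_i(w) = 0.5 * (a_i @ w - b_i)^2.
rng = np.random.default_rng(1)
A, x_true = rng.standard_normal((50, 5)), rng.standard_normal(5)
b = A @ x_true
grad_i = lambda w, i: A[i] * (A[i] @ w - b[i])
w = sarah(grad_i, n=50, w0=np.zeros(5))
err = np.linalg.norm(w - x_true)
print(round(float(err), 4))
```

Because the data here are noiseless (an interpolation setting), every per-sample gradient vanishes at the solution, so the SARAH estimator's variance also vanishes and the iterates converge to the exact minimizer.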

 Block Bregman Majorization Minimization with Extrapolation

Authors: Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis, Masoud Ahookhosh, Panagiotis Patrinos
Pages: 1–25
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 1–25, January 2022.
In this paper, we consider a class of nonsmooth nonconvex optimization problems whose objective is the sum of a block relative smooth function and a proper and lower semicontinuous block separable function. Although the analysis of block proximal gradient (BPG) methods for the class of block $L$-smooth functions has been successfully extended to Bregman BPG methods that deal with the class of block relative smooth functions, accelerated Bregman BPG methods are scarce and challenging to design. Taking our inspiration from Nesterov-type acceleration and the majorization-minimization scheme, we propose a block alternating Bregman majorization minimization framework with extrapolation (BMME). We prove subsequential convergence of BMME to a first-order stationary point under mild assumptions and study its global convergence under stronger conditions. We illustrate the effectiveness of BMME on the penalized orthogonal nonnegative matrix factorization problem.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-01-13T08:00:00Z
DOI: 10.1137/21M1432661
Issue No: Vol. 4, No. 1 (2022)

 Sensitivity-Informed Provable Pruning of Neural Networks

Authors: Cenk Baykal, Lucas Liebenwein, Igor Gilitschenski, Dan Feldman, Daniela Rus
Pages: 26–45
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 26–45, January 2022.
We introduce a family of pruning algorithms that sparsifies the parameters of a trained model in a way that approximately preserves the model's predictive accuracy. Our algorithms use a small batch of input points to construct a data-informed importance sampling distribution over the network's parameters and use either a sampling-based or a deterministic pruning procedure, or an adaptive mixture of both, to discard redundant weights. Our methods are simultaneously computationally efficient, provably accurate, and broadly applicable to various network architectures and data distributions. The presented approaches are simple to implement and can be easily integrated into standard prune-retrain pipelines. We present empirical comparisons showing that our algorithms reliably generate highly compressed networks that incur minimal loss in performance, regardless of whether the original network is fully trained or randomly initialized.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-01-13T08:00:00Z
DOI: 10.1137/20M1383239
Issue No: Vol. 4, No. 1 (2022)
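The sampling-based branch can be illustrated with a generic importance-sampling sparsifier: keep coordinates with probability proportional to magnitude and reweight survivors so the result is unbiased. This sketches the general idea only; it does not use the paper's data-informed sensitivity scores:

```python
import numpy as np

def sampled_pruning(w, m, seed=0):
    """Importance-sampling pruning sketch (illustrative, not the paper's
    exact scheme): draw m coordinates with probability proportional to
    |w_i| and reweight hits so that E[pruned] = w."""
    rng = np.random.default_rng(seed)
    q = np.abs(w) / np.abs(w).sum()
    idx = rng.choice(len(w), size=m, p=q)
    pruned = np.zeros_like(w, dtype=float)
    # np.add.at accumulates correctly when the same index is drawn twice.
    np.add.at(pruned, idx, w[idx] / (m * q[idx]))
    return pruned

w = np.array([5.0, -4.0, 0.01, 0.02, 3.0, -0.03])
p = sampled_pruning(w, m=4)
print(np.count_nonzero(p) <= 4)  # at most m surviving coordinates
```

Large weights are kept with high probability while tiny ones are mostly zeroed out, yet the reweighting keeps the pruned vector unbiased in expectation, which is the property that sampling-based compression guarantees exploit.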

 Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor
Approach with Automatic Rank Determination
Authors: Cole Hawkins, Xing Liu, Zheng Zhang
Pages: 46–71
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 46–71, January 2022.
Post-training model compression can reduce the inference costs of deep neural networks, but uncompressed training still consumes enormous hardware resources and energy. To enable low-energy training on edge devices, it is highly desirable to directly train a compact neural network from scratch with a low memory cost. Low-rank tensor decomposition is an effective approach to reduce the memory and computing costs of large neural networks. However, directly training low-rank tensorized neural networks is a very challenging task because it is hard to determine a proper tensor rank a priori, and the tensor rank controls both model complexity and accuracy. This paper presents a novel end-to-end framework for low-rank tensorized training. We first develop a Bayesian model that supports various low-rank tensor formats (e.g., CANDECOMP/PARAFAC, Tucker, tensor-train, and tensor-train matrix) and reduces neural network parameters with automatic rank determination during training. Then we develop a customized Bayesian solver to train large-scale tensorized neural networks. Our training method shows orders-of-magnitude parameter reduction and little accuracy loss (or even better accuracy) in the experiments. On a very large deep learning recommendation system with over $4.2\times 10^9$ model parameters, our method can automatically reduce the parameter number to $1.6\times 10^5$ in the training process (i.e., by a factor of $2.6\times 10^4$) while achieving almost the same accuracy. Code is available at https://github.com/colehawkins/bayesiantensorrankdetermination.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-01-13T08:00:00Z
DOI: 10.1137/21M1391444
Issue No: Vol. 4, No. 1 (2022)

 Balancing Geometry and Density: Path Distances on High-Dimensional Data

Authors: Anna Little, Daniel McKenzie, James M. Murphy
Pages: 72–99
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 72–99, January 2022.
New geometric and computational analyses of power-weighted shortest path distances (PWSPDs) are presented. By illuminating the way these metrics balance geometry and density in the underlying data, we clarify their key parameters and illustrate how they provide multiple perspectives for data analysis. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs, providing high probability guarantees on their equivalence that are near-optimal. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a wide range of data settings. Throughout the paper, our results generally require only that the underlying data is sampled from a compact low-dimensional manifold, and depend most crucially on the intrinsic dimension of this manifold, rather than its ambient dimension.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-01-24T08:00:00Z
DOI: 10.1137/20M1386657
Issue No: Vol. 4, No. 1 (2022)
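The PWSPD construction is compact enough to state in code: raise Euclidean edge lengths to a power $p$, take graph shortest paths, then undo the power. A brute-force sketch on the complete graph (the paper's computational results concern nearest-neighbor approximations of exactly this metric):

```python
import numpy as np

def pwspd(X, p=2.0):
    """Power-weighted shortest path distances (sketch): a path's cost is
    the sum of its Euclidean edge lengths raised to the power p, and the
    metric is the minimal path cost raised to 1/p."""
    diff = X[:, None, :] - X[None, :, :]
    W = np.linalg.norm(diff, axis=2) ** p       # edge costs, complete graph
    D = W.copy()
    for k in range(len(X)):                     # Floyd-Warshall relaxation
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D ** (1.0 / p)

# Three collinear points: for p > 1 the two short hops beat the direct edge.
X = np.array([[0.0], [1.0], [2.0]])
D = pwspd(X, p=2.0)
print(D[0, 2])  # (1^2 + 1^2)^(1/2) = sqrt(2), shorter than the direct 2
```

Larger $p$ penalizes long hops more heavily, so paths are pulled through dense regions of the data; this is the geometry-versus-density balance the abstract refers to.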

 A Unifying Generative Model for Graph Learning Algorithms: Label
Propagation, Graph Convolutions, and Combinations
Authors: Junteng Jia, Austin R. Benson
Pages: 100–125
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 100–125, March 2022.
Semi-supervised learning on graphs is a widely applicable problem in network science and machine learning. Two standard algorithms, label propagation and graph neural networks, both operate by repeatedly passing information along edges, the former by passing labels and the latter by passing node features, modulated by neural networks. These two types of algorithms have largely developed separately, and there is little understanding about the structure of network data that would make one of these approaches work particularly well compared to the other or when the approaches can be meaningfully combined. Here, we develop a Markov random field model for the data generation process of node attributes, based on correlations of attributes on and between vertices, that motivates and unifies these algorithmic approaches. We show that label propagation, a linearized graph convolutional network, and their combination can all be derived as conditional expectations under our model. In addition, the data model highlights problems with existing graph neural networks (and provides solutions), serves as a rigorous statistical framework for understanding issues such as oversmoothing, creates a testbed for evaluating inductive learning performance, and provides a way to sample graph attributes that resemble empirical data. We also find that a new algorithm derived from our data generation model, which we call a linear graph convolution, performs extremely well in practice on empirical data, and we provide theoretical justification for why this is the case.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-08T08:00:00Z
DOI: 10.1137/21M1395351
Issue No: Vol. 4, No. 1 (2022)
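The feature-passing operation common to both algorithm families can be sketched as repeated multiplication by a normalized adjacency matrix. The symmetric normalization and power $k$ below are conventional choices for a linearized graph convolution, not necessarily the paper's exact operator:

```python
import numpy as np

def linear_graph_convolution(A, X, k=2):
    """Linearized graph convolution sketch: propagate node features with k
    powers of the symmetrically normalized adjacency, no nonlinearities."""
    deg = A.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    S = Dinv @ A @ Dinv                 # S = D^{-1/2} A D^{-1/2}
    out = X.copy()
    for _ in range(k):
        out = S @ out                   # one round of feature passing
    return out

# Path graph on 4 nodes; a one-hot feature diffuses to its neighbors.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)[:, [0]]
res = linear_graph_convolution(A, X, k=1)
print(res.ravel())
```

Label propagation applies the same kind of smoothing operator to known labels instead of node features, which is why the paper can derive both (and their combination) as conditional expectations under a single generative model.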

 Dimensionality Reduction, Regularization, and Generalization in
Overparameterized Regressions
Authors: Ningyuan Teresa Huang, David W. Hogg, Soledad Villar
Pages: 126–152
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 126–152, March 2022.
Overparameterization in deep learning is powerful: Very large models fit the training data perfectly and yet often generalize well. This realization brought back the study of linear models for regression, including ordinary least squares (OLS), which, like deep learning, shows a “double-descent” behavior: (1) The risk (expected out-of-sample prediction error) can grow arbitrarily when the number of parameters $p$ approaches the number of samples $n$, and (2) the risk decreases with $p$ for $p > n$, sometimes achieving a lower value than the lowest risk for $p \le n$.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-10T08:00:00Z
DOI: 10.1137/20M1387821
Issue No: Vol. 4, No. 1 (2022)
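The overparameterized regime in the abstract is the $p > n$ case, where OLS has infinitely many solutions and double-descent analyses typically study the minimum-norm interpolator, computable via the pseudoinverse. A quick sketch of that object (the dimensions are arbitrary):

```python
import numpy as np

def min_norm_ols(X, y):
    """Minimum-norm least squares via the pseudoinverse; for p > n this is
    the interpolating solution studied in double-descent analyses."""
    return np.linalg.pinv(X) @ y

rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
w = min_norm_ols(X, y)
print(np.allclose(X @ w, y))  # p > n: the min-norm solution interpolates
```

With $p < n$ the same formula returns the usual unique least-squares fit; the transition between the two regimes at $p \approx n$ is where the risk peak in the abstract's point (1) occurs.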

 Positive Semidefinite Embedding for Dimensionality Reduction and
Out-of-Sample Extensions
Authors: Michaël Fanuel, Antoine Aspeel, JeanCharles Delvenne, Johan A. K. Suykens
Pages: 153–178
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 153–178, March 2022.
In machine learning and statistics, it is often desirable to reduce the dimensionality of a sample of data points in a high-dimensional space $\mathbb{R}^d$. This paper introduces a dimensionality reduction method where the embedding coordinates are the eigenvectors of a positive semidefinite kernel obtained as the solution of an infinite-dimensional analogue of a semidefinite program. This embedding is adaptive and nonlinear. We discuss this problem under both weak and strong smoothness assumptions on the learned kernel. A main feature of our approach is the existence of an out-of-sample extension formula for the embedding coordinates in both cases. This extrapolation formula yields an extension of the kernel matrix to a data-dependent Mercer kernel function. Our empirical results indicate that this embedding method is more robust to the influence of outliers than a spectral embedding method.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-10T08:00:00Z
DOI: 10.1137/20M1370653
Issue No: Vol. 4, No. 1 (2022)

 Wasserstein Barycenters Are NP-Hard to Compute

Authors: Jason M. Altschuler, Enric BoixAdserà
Pages: 179–203
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 179–203, March 2022.
Computing Wasserstein barycenters (a.k.a. optimal transport barycenters) is a fundamental problem in geometry which has recently attracted considerable attention due to many applications in data science. While there exist polynomial-time algorithms in any fixed dimension, all known running times suffer exponentially in the dimension. It is an open question whether this exponential dependence is improvable to a polynomial dependence. This paper proves that unless $\mathrm{P} = \mathrm{NP}$, the answer is no. This uncovers a “curse of dimensionality” for Wasserstein barycenter computation which does not occur for optimal transport computation. Moreover, our hardness results for computing Wasserstein barycenters extend to approximate computation, to seemingly simple cases of the problem, and to averaging probability distributions in other optimal transport metrics.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-10T08:00:00Z
DOI: 10.1137/21M1390062
Issue No: Vol. 4, No. 1 (2022)

 Identifying 3D Genome Organization in Diploid Organisms via Euclidean
Distance Geometry
Authors: Anastasiya Belyaeva, Kaie Kubjas, Lawrence J. Sun, Caroline Uhler
Pages: 204  228
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 204228, March 2022.
The spatial organization of the genome in the cell nucleus plays an important role for gene regulation, replication of the deoxyribonucleic acid (DNA), and genomic integrity. Through the development of chromosome conformation capture experiments (such as 3C, 4C, and HiC) it is now possible to obtain the contact frequencies of the DNA at the wholegenome level. In this paper, we study the problem of reconstructing the threedimensional (3D) organization of the genome from such wholegenome contact frequencies. A standard approach is to transform the contact frequencies into noisy distance measurements and then apply semidefinite programming formulations to obtain the 3D configuration. However, neglected in such reconstructions is the fact that most eukaryotes including humans are diploid and therefore contain two copies of each genomic locus. We prove that the 3D organization of the DNA is not identifiable from the distance measurements derived from contact frequencies in diploid organisms. In fact, there are infinitely many solutions even in the noisefree setting. We then discuss various additional biologically relevant and experimentally measurable constraints (including distances between neighboring genomic loci and higherorder interactions) and prove identifiability under these conditions. Furthermore, we provide semidefinite programming formulations for computing the 3D embedding of the DNA with these additional constraints and show that we can recover the true 3D embedding with high accuracy from both noiseless and noisy measurements. Finally, we apply our algorithm to real pairwise and higherorder contact frequency data and show that we can recover known genome organization patterns.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-28T08:00:00Z
DOI: 10.1137/21M1390372
Issue No: Vol. 4, No. 1 (2022)
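The diploid non-identifiability result above contrasts with the haploid baseline, where a noiseless Euclidean distance matrix does determine the 3D configuration up to rigid motion. A minimal numpy sketch of that baseline (classical multidimensional scaling, not the paper's semidefinite programs; all names are illustrative):

```python
import numpy as np

def classical_mds(D, dim=3):
    """Recover a point configuration (up to rigid motion) from a
    Euclidean distance matrix via double centering (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gram matrix of centered points
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:dim]            # top-`dim` eigenpairs
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))

# toy check: a noiseless "haploid" instance is identifiable
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                   # ground-truth 3D configuration
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D)
D_rec = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D_rec))                   # → True
```

In the diploid setting each genomic locus contributes two points, the measured quantity mixes distances between both copies, and (as the paper proves) no analogous unique recovery exists without extra constraints.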

 Randomized Wasserstein Barycenter Computation: Resampling with Statistical
Guarantees
Authors: Florian Heinemann, Axel Munk, Yoav Zemel
Pages: 229–259
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 229–259, March 2022.
We propose a hybrid resampling method to approximate finitely supported Wasserstein barycenters on large-scale datasets, which can be combined with any exact solver. Non-asymptotic bounds on the expected error of the objective value as well as the barycenters themselves allow one to calibrate computational cost and statistical accuracy. The rate of these upper bounds is shown to be optimal and independent of the underlying dimension, which appears only in the constants. Using a simple modification of the subgradient descent algorithm of Cuturi and Doucet, we showcase the applicability of our method on myriad simulated datasets, as well as a real-data example from cell microscopy, which are out of reach for state-of-the-art algorithms for computing Wasserstein barycenters.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-02-28T08:00:00Z
DOI: 10.1137/20M1385263
Issue No: Vol. 4, No. 1 (2022)
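The paper's method runs in general dimension with any exact barycenter solver; the resampling idea alone can be illustrated in the closed-form one-dimensional case, where the W2 barycenter's quantile function is the average of the input quantile functions. A hedged numpy sketch (function names and parameters are ad hoc, not from the paper):

```python
import numpy as np

def resampled_barycenter_1d(datasets, m, rng):
    """Approximate the W2 barycenter of 1D empirical measures by averaging
    the order statistics (quantile functions) of size-m subsamples."""
    quantiles = [np.sort(rng.choice(data, size=m, replace=True))  # resample
                 for data in datasets]
    return np.mean(quantiles, axis=0)        # support points of the barycenter

rng = np.random.default_rng(1)
big1 = rng.normal(-2.0, 1.0, size=100_000)   # two large datasets
big2 = rng.normal(+2.0, 1.0, size=100_000)
bary = resampled_barycenter_1d([big1, big2], m=2_000, rng=rng)
print(bary.shape)                            # → (2000,)
```

The computational cost is driven by the subsample size m rather than the full dataset size, which is the trade-off the paper's non-asymptotic bounds calibrate.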

 Binary Classification of Gaussian Mixtures: Abundance of Support Vectors,
Benign Overfitting, and Regularization
Authors: Ke Wang, Christos Thrampoulidis
Pages: 260–284
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 260–284, March 2022.
Deep neural networks generalize well despite being exceedingly overparameterized and being trained without explicit regularization. This curious phenomenon has inspired extensive research activity in establishing its statistical principles: Under what conditions is it observed? How do these depend on the data and on the training algorithm? When does regularization benefit generalization? While such questions remain wide open for deep neural nets, recent works have attempted to gain insights by studying simpler, often linear, models. Our paper contributes to this growing line of work by examining binary linear classification under a generative Gaussian mixture model in which the feature vectors take the form $x = \pm\eta + q$ for a mean vector $\eta$ and feature noise $q \sim \mathcal{N}(0,\Sigma)$. Motivated by recent results on the implicit bias of gradient descent, we study both max-margin support vector machine (SVM) classifiers (corresponding to logistic loss) and min-norm interpolating classifiers (corresponding to least-squares loss). First, we leverage an idea introduced in [V. Muthukumar et al., arXiv:2005.08054, 2020a] to relate the SVM solution to the min-norm interpolating solution. Second, we derive novel non-asymptotic bounds on the classification error of the latter. Combining the two, we present novel sufficient conditions on the covariance spectrum and on the signal-to-noise ratio $\mathrm{SNR} = \|\eta\|_2^4 / (\eta^T \Sigma \eta)$ under which interpolating estimators achieve asymptotically optimal performance as overparameterization increases. Interestingly, our results extend to a noisy model with constant-probability noise flips. Contrary to previously studied discriminative data models, our results emphasize the crucial role of the SNR and its interplay with the data covariance.
Finally, via a combination of analytical arguments and numerical demonstrations, we identify conditions under which the interpolating estimator performs better than corresponding regularized estimates.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-03T08:00:00Z
DOI: 10.1137/21M1415121
Issue No: Vol. 4, No. 1 (2022)
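A toy numpy instance of the data model $x = \pm\eta + q$ and the min-norm interpolating classifier the abstract studies. Here $\Sigma = I$ and $\mathrm{SNR} = \|\eta\|_2^4/\eta^T\Sigma\eta = 625/25 = 25$ is large, so interpolation should generalize well; the specific numbers are illustrative choices, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 500                                   # heavily overparameterized
eta = np.zeros(d)
eta[0] = 5.0                                     # mean vector; Sigma = I here
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * eta + rng.normal(size=(n, d))   # x_i = y_i * eta + q_i

w = np.linalg.pinv(X) @ y                        # min-norm interpolator
assert np.allclose(X @ w, y)                     # fits training labels exactly

# despite interpolating pure noise directions, test error is near zero
y_new = rng.choice([-1.0, 1.0], size=2_000)
X_new = y_new[:, None] * eta + rng.normal(size=(2_000, d))
err = float(np.mean(np.sign(X_new @ w) != y_new))
print(err < 0.05)
```

Shrinking `eta[0]` (lowering the SNR) degrades the interpolator, which is the interplay between SNR and covariance that the paper quantifies.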

 Fast Cluster Detection in Networks by First Order Optimization

Authors: Immanuel M. Bomze, Francesco Rinaldi, Damiano Zeffiro
Pages: 285–305
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 285–305, March 2022.
Cluster detection plays a fundamental role in the analysis of data. In this paper, we focus on the use of $s$-defective clique models for network-based cluster detection and propose a nonlinear optimization approach that efficiently handles those models in practice. In particular, we introduce an equivalent continuous formulation for the problem under analysis, and we analyze some tailored variants of the Frank–Wolfe algorithm that enable us to quickly find maximal $s$-defective cliques. The good practical behavior of those algorithmic tools, which is closely connected to their support identification properties, makes them very appealing in practical applications. The reported numerical results clearly show the effectiveness of the proposed approach.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-03T08:00:00Z
DOI: 10.1137/21M1408658
Issue No: Vol. 4, No. 1 (2022)
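The paper's tailored Frank–Wolfe variants target $s$-defective cliques; the general idea of a continuous clique formulation can be sketched with vanilla Frank–Wolfe on the classical Motzkin–Straus relaxation (a toy stand-in, not the paper's algorithm), where the iterate's support identifies the clique:

```python
import numpy as np

def frank_wolfe_clique(A, iters=200):
    """Vanilla Frank-Wolfe on the Motzkin-Straus program max x^T A x over
    the probability simplex; the iterate's support indicates a clique."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for t in range(iters):
        grad = 2.0 * A @ x
        s = np.zeros(n)
        s[np.argmax(grad)] = 1.0          # linear oracle over the simplex
        x += (2.0 / (t + 2.0)) * (s - x)  # standard diminishing step size
    return x

# a 4-clique on nodes {0,1,2,3} planted in a 10-node graph
n = 10
A = np.zeros((n, n))
A[:4, :4] = 1.0 - np.eye(4)               # adjacency of the planted clique
A[4, 5] = A[5, 4] = 1.0                   # one stray edge elsewhere
x = frank_wolfe_clique(A)
support = np.where(x > 1e-3)[0]
print(sorted(support.tolist()))           # → [0, 1, 2, 3]
```

The support-identification behavior visible here (mass concentrates on the clique vertices) is exactly the property the paper exploits to make its variants fast.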

 A Generative Variational Model for Inverse Problems in Imaging

Authors: Andreas Habring, Martin Holler
Pages: 306–335
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 306–335, March 2022.
This paper is concerned with the development, analysis, and numerical realization of a novel variational model for the regularization of inverse problems in imaging. The proposed model is inspired by the architecture of generative convolutional neural networks; it aims to generate the unknown from variables in a latent space via multilayer convolutions and nonlinear penalties, and it penalizes an associated cost. In contrast to conventional neural-network-based approaches, however, the convolution kernels are learned directly from the measured data such that no training is required. The present work provides a mathematical analysis of the proposed model in a function space setting, including proofs of regularity and existence/stability of solutions, and convergence for vanishing noise. Moreover, in a discretized setting, a numerical algorithm for solving various types of inverse problems with the proposed model is derived. Numerical results are provided for applications in inpainting, denoising, deblurring under noise, super-resolution, and JPEG decompression with multiple test images.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-03T08:00:00Z
DOI: 10.1137/21M1414978
Issue No: Vol. 4, No. 1 (2022)
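The paper's model uses multiple convolutional layers whose kernels are themselves optimized from the measured data. A much-reduced, one-layer sketch with a fixed (assumed) kernel conveys the variational idea of synthesizing the unknown from a sparse latent code via convolution; every name and parameter here is illustrative, and ISTA stands in for the paper's algorithm:

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding, the proximal map of the l1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# 1D convolution as a banded matrix (fixed kernel; the paper learns kernels)
n = 64
offsets = np.arange(-3, 4)
kern = np.exp(-0.5 * offsets.astype(float) ** 2)
K = sum(kv * np.eye(n, k=int(o)) for o, kv in zip(offsets, kern))

rng = np.random.default_rng(7)
z_true = np.zeros(n)
z_true[[10, 30, 50]] = 2.0                       # sparse latent code
data = K @ z_true + 0.01 * rng.normal(size=n)    # noisy measurement

# ISTA on the variational objective 0.5*||K z - data||^2 + lam*||z||_1
lam = 0.05
step = 1.0 / np.linalg.norm(K, 2) ** 2
z = np.zeros(n)
for _ in range(500):
    z = soft(z - step * (K.T @ (K @ z - data)), step * lam)

print(sorted(np.argsort(-np.abs(z))[:3].tolist()))   # largest latent entries
```

Replacing the fixed `K` with kernels that are jointly optimized, stacking several such layers, and posing everything in function space is, roughly, what the paper develops and analyzes.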

 Overcomplete Order-3 Tensor Decomposition, Blind Deconvolution, and
Gaussian Mixture Models
Authors: Haolin Chen, Luis Rademacher
Pages: 336–361
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 336–361, March 2022.
We propose a new algorithm for tensor decomposition, based on the simultaneous diagonalization algorithm, and apply our new algorithmic ideas to blind deconvolution and Gaussian mixture models. Our first contribution is a simple and efficient algorithm to decompose certain symmetric overcomplete order-3 tensors, that is, three-dimensional arrays of the form $T = \sum_{i=1}^n a_i \otimes a_i \otimes a_i$ where the $a_i$s are not linearly independent. Our algorithm comes with a detailed robustness analysis. Our second contribution builds on top of our tensor decomposition algorithm to expand the family of Gaussian mixture models whose parameters can be estimated efficiently. These ideas are also presented in a more general framework of blind deconvolution that makes them applicable to mixture models of identical but very general distributions, including all centrally symmetric distributions with finite sixth moment.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-09T08:00:00Z
DOI: 10.1137/21M1399415
Issue No: Vol. 4, No. 1 (2022)
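The paper's contribution is the overcomplete case, where the $a_i$ are not linearly independent. The classical simultaneous-diagonalization baseline it builds on (Jennrich's algorithm, which does require linear independence) can be sketched in a few lines; this is the textbook method, not the paper's extension:

```python
import numpy as np

def jennrich(T, n_components):
    """Simultaneous diagonalization (Jennrich): recover linearly independent
    components a_i of T = sum_i a_i x a_i x a_i from two random slices."""
    d = T.shape[0]
    rng = np.random.default_rng(0)
    u, v = rng.normal(size=d), rng.normal(size=d)
    Mu = np.einsum('ijk,k->ij', T, u)          # = A diag(A^T u) A^T
    Mv = np.einsum('ijk,k->ij', T, v)
    vals, vecs = np.linalg.eig(Mu @ np.linalg.pinv(Mv))
    top = np.argsort(-np.abs(vals))[:n_components]
    return np.real(vecs[:, top])               # components, up to sign/scale

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 3))                    # 3 independent components in R^8
T = np.einsum('ir,jr,kr->ijk', A, A, A)
B = jennrich(T, 3)
# each recovered column should be parallel to one true component
cos = np.abs((B / np.linalg.norm(B, axis=0)).T @ (A / np.linalg.norm(A, axis=0)))
print(np.round(cos.max(axis=1), 6))
```

When the number of components exceeds the dimension this eigendecomposition breaks down, which is the regime the paper's algorithm and robustness analysis address.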

 Robust Classification Under $\ell_0$ Attack for the Gaussian Mixture Model

Authors: Payam Delgosha, Hamed Hassani, Ramtin Pedarsani
Pages: 362–385
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 362–385, March 2022.
It is well known that machine learning models are vulnerable to small but cleverly designed adversarial perturbations that can cause misclassification. While there has been major progress in designing attacks and defenses for various adversarial settings, many fundamental and theoretical problems are yet to be resolved. In this paper, we consider classification in the presence of $\ell_0$-bounded adversarial perturbations, also known as sparse attacks. This setting is significantly different from other $\ell_p$-adversarial settings with $p \geq 1$, as the $\ell_0$-ball is nonconvex and highly nonsmooth. Under the assumption that data is distributed according to the Gaussian mixture model, our goal is to characterize the optimal robust classifier and the corresponding robust classification error as well as a variety of trade-offs between robustness, accuracy, and the adversary's budget. To this end, we develop a novel classification algorithm called FilTrun that has two main modules: filtration and truncation. The key idea of our method is to first filter out the nonrobust coordinates of the input and then apply a carefully designed truncated inner product for classification. By analyzing the performance of FilTrun, we derive an upper bound on the optimal robust classification error. We further find a lower bound by designing a specific adversarial strategy that enables us to derive the corresponding robust classifier and its achieved error. For the case that the covariance matrix of the Gaussian mixtures is diagonal, we show that as the input's dimension gets large, the upper and lower bounds converge; i.e., we characterize the asymptotically optimal robust classifier. Throughout, we discuss several examples that illustrate interesting behaviors, such as the existence of a phase transition for the adversary's budget that determines whether the effect of adversarial perturbation can be fully neutralized.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-15T07:00:00Z
DOI: 10.1137/21M1426286
Issue No: Vol. 4, No. 1 (2022)
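The truncation module can be illustrated in isolation: capping each coordinate's contribution to the inner product bounds what an $\ell_0$-adversary gains per corrupted coordinate. This numpy sketch shows only that idea (the paper's FilTrun also filters non-robust coordinates first, and its cap is carefully designed rather than hand-picked):

```python
import numpy as np

def truncated_score(eta, x, cap):
    """Truncated inner product: each coordinate's contribution is capped, so
    an l0-bounded adversary editing a few coordinates has bounded influence."""
    return float(np.clip(eta * x, -cap, cap).sum())

rng = np.random.default_rng(4)
d, k = 200, 5                        # dimension and adversary's l0 budget
eta = np.ones(d)                     # mean vector of the Gaussian mixture
x = eta + rng.normal(size=d)         # a clean sample with label y = +1

x_adv = x.copy()
x_adv[:k] = -1e6                     # sparse attack: corrupt k coordinates

plain = float(eta @ x_adv)           # ordinary inner product is destroyed
robust = truncated_score(eta, x_adv, cap=3.0)
print(plain < 0, robust > 0)         # → True True
```

With the cap, the k corrupted coordinates shift the score by at most 2*k*cap, which the remaining d-k clean coordinates easily outvote; without it, a single coordinate flips the sign.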

 A Generalized CUR Decomposition for Matrix Pairs

Authors: Perfect Y. Gidisu, Michiel E. Hochstenbach
Pages: 386–409
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 386–409, March 2022.
We propose a generalized CUR (GCUR) decomposition for matrix pairs $(A,B)$. Given matrices $A$ and $B$ with the same number of columns, such a decomposition provides low-rank approximations of both matrices simultaneously in terms of some of their rows and columns. We obtain the indices for selecting the subset of rows and columns of the original matrices using the discrete empirical interpolation method (DEIM) on the generalized singular vectors. When $B$ is square and nonsingular, there are close connections between the GCUR of $(A,B)$ and the DEIM-induced CUR of $AB^{-1}$. When $B$ is the identity, the GCUR decomposition of $A$ coincides with the DEIM-induced CUR decomposition of $A$. We also show a similar connection between the GCUR of $(A,B)$ and the CUR of $AB^+$ for a nonsquare but full-rank matrix $B$, where $B^+$ denotes the Moore–Penrose pseudoinverse of $B$. While a CUR decomposition acts on one data set, a GCUR factorization jointly decomposes two data sets. The algorithm may be suitable for applications where one is interested in extracting the most discriminative features from one data set relative to another data set. In numerical experiments, we demonstrate the advantages of the new method over the standard CUR approximation for recovering data perturbed with colored noise and for subgroup discovery.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-17T07:00:00Z
DOI: 10.1137/21M1432119
Issue No: Vol. 4, No. 1 (2022)
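The single-matrix building block, a DEIM-induced CUR decomposition, can be sketched directly: DEIM greedily picks one row index per singular vector, and the chosen rows and columns reproduce a rank-k matrix exactly. This is the standard CUR baseline the abstract generalizes, not the GCUR itself:

```python
import numpy as np

def deim(U):
    """DEIM point selection: one row index per singular vector."""
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, U.shape[1]):
        c = np.linalg.solve(U[idx, :j], U[idx, j])  # interpolation coefficients
        r = U[:, j] - U[:, :j] @ c                  # residual vanishes at idx
        idx.append(int(np.argmax(np.abs(r))))
    return np.array(idx)

rng = np.random.default_rng(5)
A = rng.normal(size=(60, 10)) @ rng.normal(size=(10, 40))  # exactly rank 10
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 10
rows, cols = deim(U[:, :k]), deim(Vt.T[:, :k])
C, R = A[:, cols], A[rows, :]                       # chosen columns and rows
Ucur = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)    # core matrix
print(np.allclose(A, C @ Ucur @ R))                 # exact for a rank-k matrix
```

The GCUR replaces the singular vectors here with generalized singular vectors of the pair $(A,B)$, so the selected rows and columns are discriminative for $A$ relative to $B$.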

 Spectral Discovery of Jointly Smooth Features for Multimodal Data

Authors: Felix Dietrich, Or Yair, Rotem Mulayoff, Ronen Talmon, Ioannis G. Kevrekidis
Pages: 410–430
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 1, Page 410–430, March 2022.
In this paper, we propose a spectral method for deriving functions that are jointly smooth on multiple observed manifolds. This allows us to register measurements of the same phenomenon by heterogeneous sensors and to reject sensor-specific noise. Our method is unsupervised and primarily consists of two steps. First, using kernels, we obtain a subspace spanning smooth functions on each separate manifold. Then, we apply a spectral method to the obtained subspaces and discover functions that are jointly smooth on all manifolds. We show analytically that our method is guaranteed to provide a set of orthogonal functions that are as jointly smooth as possible, ordered by increasing Dirichlet energy from the smoothest to the least smooth. In addition, we show that the extracted functions can be efficiently extended to unseen data using the Nyström method. We demonstrate the proposed method on both simulated and real measured data and compare the results to nonlinear, kernel-based variants of the seminal canonical correlation analysis. In particular, we show superior results for sleep stage identification. In addition, we show how the proposed method can be leveraged to find minimal realizations of parameter spaces of nonlinear dynamical systems.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-03-24T07:00:00Z
DOI: 10.1137/21M141590X
Issue No: Vol. 4, No. 1 (2022)
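The two steps in the abstract (per-sensor smooth subspaces from kernel eigenvectors, then a spectral step on the stacked bases) can be sketched on a toy two-sensor example. All kernel and parameter choices below are ad hoc illustrations, not the paper's recommended settings:

```python
import numpy as np

def smooth_subspace(X, k, eps=1.0):
    """Top-k eigenvectors of a Gaussian kernel on X: an orthonormal basis of
    functions that vary smoothly along the observed manifold."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    w, V = np.linalg.eigh(np.exp(-D2 / eps))
    return V[:, np.argsort(w)[::-1][:k]]

rng = np.random.default_rng(6)
t = np.sort(rng.uniform(0.0, 2.0 * np.pi, 300))     # common latent variable
X1 = np.stack([np.cos(t), np.sin(t)], axis=1)       # sensor 1: a circle
X2 = np.stack([t, 0.1 * rng.normal(size=t.size)], axis=1)  # sensor 2: noisy line

U1, U2 = smooth_subspace(X1, 10), smooth_subspace(X2, 10)
# step 2: singular vectors of the stacked bases; singular values near sqrt(2)
# indicate functions lying (almost) in both smooth subspaces
Q, s, _ = np.linalg.svd(np.hstack([U1, U2]), full_matrices=False)
f = Q[:, 0]                             # the most jointly smooth function
print(f.shape, bool(s[0] <= np.sqrt(2) + 1e-9))
```

Functions with singular value close to sqrt(2) are smooth for both sensors and thus depend only on the shared latent variable, which is what enables sensor registration and rejection of sensor-specific noise.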
