Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Authors: Francesco Tudisco, Desmond J. Higham
Pages: 1-21
Abstract: SIAM Journal on Mathematics of Data Science, Volume 5, Issue 1, Page 1-21, March 2023. Core-periphery detection is a key task in exploratory network analysis where one aims to find a core, a set of nodes well connected internally and with the periphery, and a periphery, a set of nodes connected only (or mostly) with the core. In this work we propose a core-periphery model for higher-order networks modeled as hypergraphs, together with a method for computing a core-score vector that quantifies how close each node is to the core. In particular, we show that this method solves the corresponding nonconvex core-periphery optimization problem globally to an arbitrary precision. The method turns out to coincide with the computation of the Perron eigenvector of a nonlinear hypergraph operator, suitably defined in terms of the incidence matrix of the hypergraph, generalizing recently proposed centrality models for hypergraphs. We perform several experiments on synthetic and real-world hypergraphs showing that the proposed method outperforms alternative core-periphery detection algorithms, in particular those obtained by transferring established graph methods to the hypergraph setting via clique expansion.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2023-01-25T08:00:00Z
DOI: 10.1137/22M1480926
Issue No: Vol. 5, No. 1 (2023)
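The core-score computation described above lends itself to a power-method-style iteration on the incidence matrix. The sketch below is illustrative only: the elementwise exponents `p` and `q` stand in for the paper's nonlinear hypergraph operator, whose exact form is not reproduced here, and the toy hypergraph is an arbitrary example.

```python
import numpy as np

def core_score(B, p=2.0, q=2.0, iters=200, tol=1e-10):
    """Nonlinear power-type iteration on an incidence matrix B (nodes x edges).

    Illustrative only: the elementwise powers p, q stand in for the
    nonlinearities of the hypergraph operator, not the paper's exact mapping.
    """
    n = B.shape[0]
    c = np.ones(n) / np.sqrt(n)
    for _ in range(iters):
        e = (B.T @ c) ** q             # aggregate node scores into edge scores
        c_new = (B @ e) ** (1.0 / p)   # push edge scores back to the nodes
        c_new /= np.linalg.norm(c_new)
        if np.linalg.norm(c_new - c) < tol:
            break
        c = c_new
    return c

# Toy hypergraph: nodes 0-3, hyperedges {0,1,2}, {0,1}, {2,3}.
B = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]], dtype=float)
scores = core_score(B)
```

Nodes that belong to more (and larger) hyperedges accumulate higher scores, so node 0 ends up ranked above node 3 in this toy example.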

Authors: Martin Hanik, Hans-Christian Hege, Christoph von Tycowicz
Pages: 1223-1249
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 4, Page 1223-1249, December 2022. Data sets sampled in Lie groups are widespread, and as with multivariate data, it is important for many applications to assess the differences between the sets in terms of their distributions. Indices for this task are usually derived by considering the Lie group as a Riemannian manifold. Then, however, compatibility with the group operation is guaranteed only if a bi-invariant metric exists, which is not the case for most noncompact and noncommutative groups. We show here that if one considers an affine connection structure instead, one obtains bi-invariant generalizations of well-known dissimilarity measures: the Hotelling [math] statistic, Bhattacharyya distance, and Hellinger distance. Each of the dissimilarity measures matches its multivariate counterpart for Euclidean data and is translation invariant, so that biases, e.g., through an arbitrary choice of reference, are avoided. We further derive nonparametric two-sample tests that are bi-invariant and consistent. We demonstrate the potential of these dissimilarity measures by performing group tests on data of knee configurations and epidemiological shape data. Significant differences are revealed in both cases.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-11-15T08:00:00Z
DOI: 10.1137/21M1410373
Issue No: Vol. 4, No. 4 (2022)
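For context, the classical Euclidean counterpart that the bi-invariant statistic recovers can be sketched directly: the two-sample Hotelling statistic with pooled covariance, which is invariant under translating both samples by the same vector. The sample sizes, dimension, and shift below are arbitrary illustrative choices.

```python
import numpy as np

def hotelling_t2(X, Y):
    """Classical two-sample Hotelling T^2 statistic (Euclidean case).

    X: (n1, d), Y: (n2, d). Uses the pooled sample covariance.
    """
    n1, n2 = len(X), len(Y)
    diff = X.mean(axis=0) - Y.mean(axis=0)
    S = ((n1 - 1) * np.cov(X, rowvar=False) +
         (n2 - 1) * np.cov(Y, rowvar=False)) / (n1 + n2 - 2)
    return (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S, diff)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
Y = rng.normal(loc=0.5, size=(50, 3))
t2 = hotelling_t2(X, Y)

# Translation invariance: shifting both samples by the same vector
# leaves the statistic unchanged.
shift = np.array([10.0, -3.0, 7.0])
t2_shifted = hotelling_t2(X + shift, Y + shift)
```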

Authors: Alexander Dunlap, Jean-Christophe Mourrat
Pages: 1250-1271
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 4, Page 1250-1271, December 2022. Sum-of-norms clustering is a convex optimization problem whose solution can be used for the clustering of multivariate data. We propose and study a localized version of this method, and show in particular that it can separate arbitrarily close balls in the stochastic ball model. More precisely, we prove a quantitative bound on the error incurred in the clustering of disjoint connected sets. Our bound is expressed in terms of the number of datapoints and the localization length of the functional.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-11-15T08:00:00Z
DOI: 10.1137/21M1448732
Issue No: Vol. 4, No. 4 (2022)
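A minimal sketch of the standard (non-localized) sum-of-norms objective, minimized by gradient descent on a smoothed surrogate of the pairwise norms; the regularization strength, smoothing parameter, and step size are hypothetical choices, not values from the paper.

```python
import numpy as np

def son_objective(U, X, lam, eps):
    """0.5 * sum_i ||u_i - x_i||^2 + lam * sum_{i<j} sqrt(||u_i - u_j||^2 + eps)."""
    D = U[:, None, :] - U[None, :, :]
    norms = np.sqrt((D ** 2).sum(-1) + eps)
    return 0.5 * ((U - X) ** 2).sum() + lam * np.triu(norms, 1).sum()

def son_cluster(X, lam=0.1, eps=1e-2, lr=0.02, iters=300):
    """Gradient descent on the smoothed sum-of-norms objective (a sketch,
    not the localized variant studied in the paper)."""
    U = X.copy()
    for _ in range(iters):
        D = U[:, None, :] - U[None, :, :]          # pairwise differences
        norms = np.sqrt((D ** 2).sum(-1) + eps)    # smoothed pairwise norms
        grad = (U - X) + lam * (D / norms[:, :, None]).sum(axis=1)
        U -= lr * grad
    return U

rng = np.random.default_rng(6)
# Two well-separated clusters of 10 points each in the plane.
X = np.vstack([rng.normal(0.0, 0.3, size=(10, 2)),
               rng.normal(5.0, 0.3, size=(10, 2))])
U = son_cluster(X)
obj_start = son_objective(X, X, 0.1, 1e-2)
obj_end = son_objective(U, X, 0.1, 1e-2)
```

The fitted centroids `U` within each cluster are pulled toward one another by the pairwise norm penalty; for large enough regularization they merge, which is what yields a clustering.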

Authors: Tao Luo, Zheng Ma, Zhi-Qin John Xu, Yaoyu Zhang
Pages: 1272-1292
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 4, Page 1272-1292, December 2022. Recent works show the intriguing phenomenon of the frequency principle (F-Principle): deep neural networks (DNNs) fit the target function from low to high frequency during training, which provides insight into the training and generalization behavior of DNNs in complex tasks. In this paper, through analysis of an infinite-width two-layer NN in the neural tangent kernel regime, we derive the exact differential equation, namely the linear frequency-principle (LFP) model, governing the evolution of the NN output function in the frequency domain during training. Our exact computation applies to general activation functions with no assumptions on the size or distribution of the training data. This LFP model reveals that higher frequencies evolve polynomially or exponentially slower than lower frequencies, depending on the smoothness/regularity of the activation function. We further bridge the gap between training dynamics and generalization by proving that the LFP model implicitly minimizes a frequency-principle norm (FP-norm) of the learned function, by which higher frequencies are penalized more severely in proportion to the inverse of their evolution rate. Finally, we derive an a priori generalization error bound controlled by the FP-norm of the target function, which provides a theoretical justification for the empirical observation that DNNs often generalize well on low-frequency functions.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-11-21T08:00:00Z
DOI: 10.1137/21M1444400
Issue No: Vol. 4, No. 4 (2022)

Authors: Florian Maire, Pierre Vandekerkhove
Pages: 1293-1319
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 4, Page 1293-1319, December 2022. A novel strategy that combines a given collection of [math]-reversible Markov kernels is proposed. At each Markov transition, one of the available kernels is selected via a state-dependent probability distribution. In contrast to random-scan approaches, which assume a constant (i.e., state-independent) selection probability distribution, the state-dependent distribution is specified so as to favor moving according to a kernel that is relevant for the local geometry of the target distribution. This approach leverages paths or other low-dimensional manifolds that are typically present in vanishing-noise distributions. We provide examples for which we show (theoretically or empirically) that a locally weighted aggregation converges substantially faster and yields smaller asymptotic variances than an equivalent random-scan algorithm.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-12-20T08:00:00Z
DOI: 10.1137/22M1469626
Issue No: Vol. 4, No. 4 (2022)

Authors: Philip A. Etter, Lexing Ying
Pages: 1320-1346
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 4, Page 1320-1346, December 2022. In the computational sciences, one must often estimate model parameters from data subject to noise and uncertainty, leading to inaccurate results. In order to improve the accuracy of models with noisy parameters, we consider the problem of reducing error in a linear system whose operator is corrupted by noise. Our contribution in this paper is to extend the elliptic operator shifting framework of Etter and Ying (2020) to the general nonsymmetric matrix case. Roughly, the operator shifting technique is a matrix analogue of the James–Stein estimator. The key insight is that a shift of the matrix inverse estimate in an appropriately chosen direction will reduce average error. In our extension, we interrogate a number of questions, namely, whether shifting towards the origin for general matrix inverses always reduces error as it does in the elliptic case. We show that this is usually the case, but that there are three key features of general nonsingular matrices that allow for counterexamples not possible in the symmetric case. We prove that when these possibilities are eliminated by the assumption of noise symmetry and the use of the residual norm as the error metric, the optimal shift is always towards the origin, mirroring the results of Etter and Ying (2020). We also investigate behavior in the small-noise regime and other scenarios. We conclude by presenting numerical experiments (with accompanying source code) inspired by reinforcement learning to demonstrate that operator shifting can yield substantial reductions in error.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-12-20T08:00:00Z
DOI: 10.1137/21M1416849
Issue No: Vol. 4, No. 4 (2022)
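The shift-toward-the-origin idea can be illustrated with a scalar shrinkage of the inverse estimate under the residual-norm metric: the residual error is quadratic in the shrinkage amount, so the best scalar shift has a closed form for a single noisy instance. This is a sketch of the general idea, not the paper's estimator; the matrix sizes and noise levels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = np.eye(n) + 0.1 * rng.normal(size=(n, n))       # true operator (well conditioned)
A_noisy = A + 0.02 * rng.normal(size=(n, n))        # corrupted observation
M = np.linalg.inv(A_noisy)                          # naive inverse estimate

# Residual-norm error of the shrunken estimate (1 - beta) * M:
# f(beta) = ||A @ ((1 - beta) * M) - I||_F^2 is quadratic in beta, so
# the optimal scalar shift toward the origin has a closed form.
C = A @ M
R = C - np.eye(n)
beta_opt = np.sum(C * R) / np.sum(C * C)

err_naive = np.linalg.norm(R, 'fro')
err_shift = np.linalg.norm((1 - beta_opt) * C - np.eye(n), 'fro')
```

Since `beta_opt` minimizes the quadratic residual error, the shifted estimate is never worse than the naive inverse under this metric; whether the optimal shift points toward the origin in general is exactly the question the paper analyzes.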

Authors: Emmanuel Chevallier, Didong Li, Yulong Lu, David Dunson
Pages: 1347-1368
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 4, Page 1347-1368, December 2022. In many applications, the curvature of the space supporting the data makes statistical modeling challenging. In this paper we discuss the construction and use of probability distributions wrapped around manifolds using exponential maps. These distributions have already been used on specific manifolds. We describe their construction in the unifying framework of affine locally symmetric spaces, a broad class of manifolds containing many of the manifolds encountered in the data sciences. We show that on these spaces, exponential-wrapped distributions enjoy interesting properties for practical use. We provide the generic expression of the Jacobian appearing in these distributions and compute it for two particular examples: Grassmannians and pseudohyperboloids. We illustrate the usefulness of such distributions in a classification experiment on simulated data.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-12-20T08:00:00Z
DOI: 10.1137/21M1461551
Issue No: Vol. 4, No. 4 (2022)
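A concrete instance of the exponential-wrapping construction on the unit sphere, which here stands in for the paper's examples (Grassmannians and pseudohyperboloids): sample an isotropic Gaussian in the tangent plane at a base point and push it forward by the exponential map. The base point and spread are arbitrary choices.

```python
import numpy as np

def exp_map_sphere(p, v):
    """Exponential map on the unit sphere at base point p, tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p.copy()
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def sample_wrapped_gaussian(p, sigma, n, rng):
    """Wrap an isotropic Gaussian in the tangent plane at p onto the sphere."""
    samples = []
    for _ in range(n):
        g = sigma * rng.normal(size=p.size)
        v = g - (g @ p) * p            # project onto the tangent plane at p
        samples.append(exp_map_sphere(p, v))
    return np.array(samples)

rng = np.random.default_rng(2)
p = np.array([0.0, 0.0, 1.0])
S = sample_wrapped_gaussian(p, 0.3, 500, rng)
```

Every sample lies exactly on the sphere, and for a small spread the wrapped samples concentrate around the base point.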

Authors: Antonin Chambolle, Juan Pablo Contreras
Pages: 1369-1395
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 4, Page 1369-1395, December 2022. This paper discusses the efficiency of Hybrid Primal-Dual (HPD) type algorithms for approximately solving discrete Optimal Transport (OT) and Wasserstein Barycenter (WB) problems, with and without entropic regularization. Our first contribution is an analysis showing that these methods yield state-of-the-art convergence rates, both theoretically and practically. Next, we extend the HPD algorithm with the linesearch proposed by Malitsky and Pock in 2018 to the setting where the dual space has a Bregman divergence and the dual function is relatively strongly convex with respect to the Bregman kernel. This extension yields a new method for OT and WB problems, based on smoothing of the objective, that also achieves state-of-the-art convergence rates. Finally, we introduce a new Bregman divergence based on a scaled entropy function that makes the algorithm numerically stable and reduces the smoothing, leading to sparse solutions of OT and WB problems. We complement our findings with numerical experiments and comparisons.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-12-20T08:00:00Z
DOI: 10.1137/22M1481865
Issue No: Vol. 4, No. 4 (2022)
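For reference, the entropically regularized OT problem mentioned above is commonly solved with Sinkhorn iterations. The sketch below shows that standard baseline, not the HPD algorithm of the paper; the cost matrix, marginals, and regularization strength are arbitrary illustrative choices.

```python
import numpy as np

def sinkhorn(mu, nu, C, eta=0.1, iters=2000):
    """Standard Sinkhorn iterations for entropic OT (a baseline, not HPD).

    mu, nu: source/target marginals; C: cost matrix; eta: regularization.
    Returns the (approximate) optimal transport plan.
    """
    K = np.exp(-C / eta)
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(iters):
        v = nu / (K.T @ u)   # match the column marginal
        u = mu / (K @ v)     # match the row marginal
    return u[:, None] * K * v[None, :]

n = 20
x = np.linspace(0.0, 1.0, n)
C = (x[:, None] - x[None, :]) ** 2      # squared-distance cost on a grid
mu = np.ones(n) / n
nu = np.ones(n) / n
P = sinkhorn(mu, nu, C)
```

After convergence the plan is nonnegative and its row and column sums match the prescribed marginals, which is the defining feasibility property of a transport plan.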

Authors: Sjoerd Dirksen, Shahar Mendelson, Alexander Stollenwerk
Pages: 1396-1419
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 4, Page 1396-1419, December 2022. We study the problem of generating a hyperplane tessellation of an arbitrary set [math] in [math], ensuring that the Euclidean distance between any two points corresponds to the fraction of hyperplanes separating them up to a prespecified error [math]. We focus on random Gaussian tessellations with uniformly distributed shifts and derive sharp bounds on the number of hyperplanes [math] that are required. This result has natural applications in data dimension reduction, where it yields a binary version of the Johnson–Lindenstrauss embedding, and in signal processing under coarse quantization. Surprisingly, our lower estimates falsify the conjecture that [math], where [math] is the Gaussian width of [math], is optimal. This conjecture is the natural analogue of a conjecture by Plan and Vershynin on random tessellations of subsets of the Euclidean sphere. As it turns out, the true optimal rate is larger by an order of magnitude in the accuracy parameter [math] and depends in an intricate way on the geometry of the set. In particular, we give an explicit example where [math] is required. To the best of our knowledge, the fact that the optimal error decay rate depends on the geometry of the set [math] is a new phenomenon in the general context of random embeddings.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-12-20T08:00:00Z
DOI: 10.1137/22M1485826
Issue No: Vol. 4, No. 4 (2022)
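The tessellation idea can be sketched directly: each random shifted hyperplane contributes one bit per point, and the fraction of disagreeing bits (a Hamming distance) tracks the Euclidean distance between the points. The shift range and number of hyperplanes below are arbitrary illustrative choices, not the sharp values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 16, 20000     # ambient dimension, number of hyperplanes
lam = 4.0            # shift range (hypothetical; must cover the point set)

G = rng.normal(size=(m, n))              # Gaussian hyperplane normals
tau = rng.uniform(-lam, lam, size=m)     # uniformly distributed shifts

def bits(x):
    """One bit per hyperplane: which side of {z : <g, z> + tau = 0} x lies on."""
    return np.signbit(G @ x + tau)

x = rng.normal(size=n)
x /= np.linalg.norm(x)
y_near = x + 0.05 * rng.normal(size=n)   # a nearby point
y_far = x + 1.0 * rng.normal(size=n)     # a distant point

ham_near = np.mean(bits(x) != bits(y_near))
ham_far = np.mean(bits(x) != bits(y_far))
```

With many hyperplanes the nearby pair is separated by a much smaller fraction of hyperplanes than the distant pair, which is the distance-preservation property the paper quantifies.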

Authors: Boris Landa, Thomas T. C. K. Zhang, Yuval Kluger
Pages: 1420-1446
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 4, Page 1420-1446, December 2022. Estimating the rank of a corrupted data matrix is an important task in data analysis, most notably for choosing the number of components in principal component analysis. Significant progress on this task was achieved using random matrix theory by characterizing the spectral properties of large noise matrices. However, utilizing such tools is not straightforward when the data matrix consists of count random variables, e.g., a Poisson matrix, in which case the noise can be heteroskedastic with an unknown variance in each entry. In this work, we focus on a Poisson random matrix with independent entries and propose a simple procedure, termed biwhitening, for estimating the rank of the underlying signal matrix (i.e., the Poisson parameter matrix) without any prior knowledge. Our approach is based on the key observation that one can scale the rows and columns of the data matrix simultaneously so that the spectrum of the corresponding noise agrees with the standard Marchenko–Pastur (MP) law, justifying the use of the MP upper edge as a threshold for rank selection. Importantly, the required scaling factors can be estimated directly from the observations by solving a matrix scaling problem via the Sinkhorn–Knopp algorithm. Beyond the Poisson, our approach extends to families of distributions that satisfy a quadratic relation between the mean and the variance, such as the generalized Poisson, binomial, negative binomial, gamma, and many others. This quadratic relation can also account for missing entries in the data. We conduct numerical experiments that corroborate our theoretical findings and showcase the advantage of our approach for rank estimation in challenging regimes. Furthermore, we demonstrate the favorable performance of our approach on several real datasets of single-cell RNA sequencing, high-throughput chromosome conformation capture, and document topic modeling.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-12-20T08:00:00Z
DOI: 10.1137/21M1456807
Issue No: Vol. 4, No. 4 (2022)
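A minimal sketch of the biwhitening recipe for a Poisson matrix: run Sinkhorn–Knopp on the entrywise variance estimate (which for Poisson data is the data matrix itself), scale the data by the square roots of the resulting factors, and threshold singular values at the MP upper edge. The matrix sizes, signal strength, and the 10% edge margin are ad hoc illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, true_rank = 200, 300, 1

# Rank-1 Poisson parameter matrix with a strong signal.
a = rng.uniform(1.0, 3.0, size=m)
b = rng.uniform(10.0, 30.0, size=n)
X = np.outer(a, b)                   # mean (and variance) of the Poisson entries
Y = rng.poisson(X).astype(float)

# Sinkhorn-Knopp on the entrywise variance estimate (= Y for Poisson):
# find positive e1, e2 so that the scaled variances have row sums n
# and column sums m, i.e., average variance 1 per entry.
e1, e2 = np.ones(m), np.ones(n)
for _ in range(200):
    e1 = n / (Y @ e2)
    e2 = m / (Y.T @ e1)

# Biwhitened matrix: diagonal scaling by the square roots of the factors.
Z = np.sqrt(e1)[:, None] * Y * np.sqrt(e2)[None, :]

# MP upper edge for an m x n matrix with unit-variance noise; the 10%
# margin is an ad hoc finite-size allowance.
edge = (np.sqrt(m) + np.sqrt(n)) * 1.1
rank_est = int(np.sum(np.linalg.svd(Z, compute_uv=False) > edge))
```

With a strong rank-1 signal, exactly one singular value of the biwhitened matrix stands above the MP edge, so the rank estimate recovers the true rank.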

Authors: Yehuda Dar, Richard G. Baraniuk
Pages: 1447-1472
Abstract: SIAM Journal on Mathematics of Data Science, Volume 4, Issue 4, Page 1447-1472, December 2022. We study the transfer learning process between two linear regression problems. An important and timely special case is when the regressors are overparameterized and perfectly interpolate their training data. We examine a parameter transfer mechanism whereby a subset of the parameters of the target task solution are constrained to the values learned for a related source task. We analytically characterize the generalization error of the target task in terms of the salient factors in the transfer learning architecture, i.e., the number of examples available, the number of (free) parameters in each of the tasks, the number of parameters transferred from the source to the target task, and the relation between the two tasks. Our nonasymptotic analysis shows that the generalization error of the target task follows a two-dimensional double descent trend (with respect to the number of free parameters in each of the tasks) that is controlled by the transfer learning factors. Our analysis points to specific cases where the transfer of parameters is beneficial as a substitute for extra overparameterization (i.e., additional free parameters in the target task). Specifically, we show that the usefulness of a transfer learning setting is fragile and depends on a delicate interplay among the set of transferred parameters, the relation between the tasks, and the true solution. We also demonstrate that overparameterized transfer learning is not necessarily more beneficial when the source task is closer or identical to the target task.
Citation: SIAM Journal on Mathematics of Data Science
PubDate: 2022-12-20T08:00:00Z
DOI: 10.1137/22M1469559
Issue No: Vol. 4, No. 4 (2022)
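The parameter transfer mechanism can be sketched in a noiseless linear setting: fit the source task, freeze a subset of coordinates of the target solution to the source values, and fit the remaining free parameters of the underdetermined target task by minimum-norm least squares so that the target training data are interpolated. The dimensions and the transferred-coordinate split below are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n_src, n_tgt, k = 30, 100, 15, 10   # k = number of transferred parameters

beta_true = rng.normal(size=d)
X_src = rng.normal(size=(n_src, d))
y_src = X_src @ beta_true
X_tgt = rng.normal(size=(n_tgt, d))
y_tgt = X_tgt @ beta_true

# Source task: ordinary least squares (n_src > d, unique solution).
beta_src = np.linalg.lstsq(X_src, y_src, rcond=None)[0]

# Target task: freeze the first k coordinates to the source values and fit
# the d - k free parameters; since n_tgt < d - k, lstsq returns the
# minimum-norm solution, which interpolates the residual exactly.
residual = y_tgt - X_tgt[:, :k] @ beta_src[:k]
beta_free = np.linalg.lstsq(X_tgt[:, k:], residual, rcond=None)[0]
beta_tgt = np.concatenate([beta_src[:k], beta_free])
```

The resulting target solution interpolates the target training data while keeping the transferred coordinates fixed, which is the constrained-interpolation regime whose generalization error the paper characterizes.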