Authors: Chiwei Yan, James Johndrow, Dawn Woodard, Yanwei Sun
Pages: 227-253
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 2, Page 227-253, June 2024
Abstract: Modern mobile applications such as navigation services and ride-sharing platforms rely heavily on geospatial technologies, most critically predictions of the time required for a vehicle to traverse a particular route, or the so-called estimated time of arrival (ETA). There are various methods used in practice, which differ in terms of the geographic granularity at which the predictive model is trained—e.g., segment-based methods predict travel time at the level of road segments (or a combination of several adjacent road segments) and then aggregate across the route, whereas route-based methods use generic information about the trip, such as origin and destination, to predict travel time. Though various forms of these methods have been developed, there has been no rigorous theoretical comparison regarding their accuracies, and empirical studies have, in many cases, drawn opposite conclusions. We provide the first theoretical analysis of the predictive accuracy of various ETA prediction methods and argue that maintaining a segment-level architecture in predicting travel time is often of first-order importance. Our work highlights that the accuracy of ETA prediction is driven not just by the sophistication of the model but also by the spatial granularity at which those methods are applied.
PubDate: 2024-04-04T07:00:00Z
DOI: 10.1137/23M155699X
Issue No: Vol. 6, No. 2 (2024)
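
As a toy illustration of the granularity trade-off described in the abstract, the sketch below (all data and parameters hypothetical) fits a segment-level model by least squares and compares it with a route-level model that only sees trip length; it is a minimal caricature of the two method families, not the paper's estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 road segments with unknown mean traversal times;
# trips are short sequences of segments, observed with trip-level noise.
n_segments, n_trips = 50, 2000
true_seg_time = rng.gamma(shape=2.0, scale=30.0, size=n_segments)  # seconds

trips = [rng.choice(n_segments, size=rng.integers(3, 10), replace=False)
         for _ in range(n_trips)]
obs = np.array([true_seg_time[t].sum() + rng.normal(0, 20) for t in trips])

# Segment-based method: estimate each segment's mean time by least squares
# on trip-level observations, then aggregate along the route.
X = np.zeros((n_trips, n_segments))
for i, t in enumerate(trips):
    X[i, t] = 1.0
seg_hat, *_ = np.linalg.lstsq(X, obs, rcond=None)

# Route-based method: ignore segment identity and predict from a generic
# trip feature (here just the number of segments).
lengths = X.sum(axis=1)
slope = (lengths @ obs) / (lengths @ lengths)  # one-feature least squares

test = [rng.choice(n_segments, size=6, replace=False) for _ in range(500)]
truth = np.array([true_seg_time[t].sum() for t in test])
pred_seg = np.array([seg_hat[t].sum() for t in test])
pred_route = np.full(len(test), slope * 6)

print("segment-based RMSE:", np.sqrt(np.mean((pred_seg - truth) ** 2)))
print("route-based RMSE:  ", np.sqrt(np.mean((pred_route - truth) ** 2)))
```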

Authors: Tamir Bendory, Dan Edidin
Pages: 254-282
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 2, Page 254-282, June 2024
Abstract: Multireference alignment (MRA) is the problem of recovering a signal from its multiple noisy copies, each acted upon by a random group element. MRA is mainly motivated by single-particle cryo-electron microscopy (cryo-EM) that has recently joined X-ray crystallography as one of the two leading technologies to reconstruct biological molecular structures. Previous papers have shown that, in the high-noise regime, the sample complexity of MRA and cryo-EM is [math], where [math] is the number of observations, [math] is the variance of the noise, and [math] is the lowest-order moment of the observations that uniquely determines the signal. In particular, it was shown that, in many cases, [math] for generic signals, and thus, the sample complexity is [math]. In this paper, we analyze the second moment of the MRA and cryo-EM models. First, we show that, in both models, the second moment determines the signal up to a set of unitary matrices whose dimension is governed by the decomposition of the space of signals into irreducible representations of the group. Second, we derive sparsity conditions under which a signal can be recovered from the second moment, implying sample complexity of [math]. Notably, we show that the sample complexity of cryo-EM is [math] if at most one-third of the coefficients representing the molecular structure are nonzero; this bound is near-optimal. The analysis is based on tools from representation theory and algebraic geometry. We also derive bounds on recovering a sparse signal from its power spectrum, which is the main computational problem of X-ray crystallography.
PubDate: 2024-04-04T07:00:00Z
DOI: 10.1137/23M155685X
Issue No: Vol. 6, No. 2 (2024)
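
A minimal numerical sketch of the MRA setting: noisy, randomly shifted copies of a 1D signal, together with the power spectrum, a shift-invariant second-order statistic that can be estimated by averaging. The paper's analysis of the second moment and power-spectrum recovery is far more general; the debiasing constant below assumes white Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(1)
L, n, sigma = 32, 20000, 2.0

x = rng.normal(size=L)                            # ground-truth signal
shifts = rng.integers(0, L, size=n)
obs = np.stack([np.roll(x, s) for s in shifts]) + sigma * rng.normal(size=(n, L))

# The power spectrum |FFT(x)|^2 is invariant to cyclic shifts, so it can be
# estimated by averaging over the noisy copies; with white Gaussian noise the
# bias is a known constant: E|FFT(y)|^2 = |FFT(x)|^2 + L * sigma^2.
ps_est = np.mean(np.abs(np.fft.fft(obs, axis=1)) ** 2, axis=0) - L * sigma**2
ps_true = np.abs(np.fft.fft(x)) ** 2

print("relative error:", np.linalg.norm(ps_est - ps_true) / np.linalg.norm(ps_true))
```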

Authors: YoonHaeng Hur, Wenxuan Guo, Tengyuan Liang
Pages: 283-310
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 2, Page 283-310, June 2024
Abstract: This paper introduces a new simulation-based inference procedure to model and sample from multidimensional probability distributions given access to independent and identically distributed samples, circumventing the usual approaches of explicitly modeling the density function or designing Markov chain Monte Carlo. Motivated by the seminal work on distance and isomorphism between metric measure spaces, we develop a new transform sampler to perform simulation-based inference, which estimates a notion of optimal alignments between two heterogeneous metric measure spaces [math] and [math] from empirical data sets, with estimated maps that approximately push forward one measure [math] to the other [math], and vice versa. We introduce a new notion called the reversible Gromov–Monge (RGM) distance, providing mathematical formalism behind the new sampler. We study the statistical rate of convergence of the new transform sampler, along with several analytic properties of the RGM distance and operator viewpoints of transform sampling. Synthetic and real-world examples showcasing the effectiveness of the new sampler are also demonstrated.
PubDate: 2024-04-05T07:00:00Z
DOI: 10.1137/23M1550384
Issue No: Vol. 6, No. 2 (2024)
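
A one-dimensional stand-in for the transform-sampling idea, assuming quantile matching as the alignment. The RGM sampler of the paper aligns general, possibly heterogeneous, metric measure spaces; this sketch only conveys the push-forward mechanism.

```python
import numpy as np

rng = np.random.default_rng(2)

# Reference samples from mu (easy to sample) and data samples from nu (target).
mu_samples = rng.normal(size=5000)
nu_samples = rng.gamma(shape=3.0, scale=1.0, size=5000)

# Estimate a monotone map pushing mu forward to nu by matching empirical
# quantiles; fresh reference draws mapped through it approximate nu.
qs = np.linspace(0.001, 0.999, 999)
mu_q = np.quantile(mu_samples, qs)
nu_q = np.quantile(nu_samples, qs)

def T(z):
    """Approximate push-forward map: interpolate between matched quantiles."""
    return np.interp(z, mu_q, nu_q)

new_samples = T(rng.normal(size=5000))   # fresh draws, approximately from nu
print("target mean/var: ", nu_samples.mean(), nu_samples.var())
print("sampler mean/var:", new_samples.mean(), new_samples.var())
```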

Authors: Adrien Vacher, Boris Muzellec, Francis Bach, François-Xavier Vialard, Alessandro Rudi
Pages: 311-342
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 2, Page 311-342, June 2024
Abstract: Under smoothness conditions, it was recently shown by Vacher et al. [Proceedings of the 34th Conference on Learning Theory, Proc. Mach. Learn. Res. 134, 2021] that the squared Wasserstein distance between two distributions could be approximately computed in polynomial time with appealing statistical error bounds. In this paper, we propose to extend their result to the problem of estimating in [math] distance the transport map between two distributions. Also building upon the kernelized sum-of-squares approach, a way to model smooth positive functions, we derive a computationally tractable estimator of the transport map. Contrary to the aforementioned work, the dual problem that we solve is closer to the so-called semidual formulation of optimal transport that is known to gain convexity with respect to the linear dual formulation. After deriving a new stability result on the semidual and using localization-like techniques through Gagliardo–Nirenberg inequalities, we manage to prove under the same assumptions as in Vacher et al. that our estimator is minimax optimal up to polylog factors. Then we prove that this estimator can be computed in the worst case in [math] time, where [math] is the number of samples, and show how to improve its practical computation with a Nyström approximation scheme, a classical tool in kernel methods. Finally, we showcase several numerical simulations in medium dimension, where we compute our estimator on simple examples.
PubDate: 2024-04-15T07:00:00Z
DOI: 10.1137/22M1528847
Issue No: Vol. 6, No. 2 (2024)
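
The final ingredient mentioned in the abstract, Nyström approximation, is a generic kernel-methods tool. Below is a minimal sketch of the generic scheme (landmark count and kernel are arbitrary choices), not of the paper's transport-map estimator.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 1000, 50           # sample size and number of Nystrom landmarks

X = rng.normal(size=(n, 2))

def rbf(A, B, h=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h**2))

# Nystrom approximation K ~= C W^+ C^T built from m landmark columns,
# replacing the full n x n kernel matrix with O(nm) storage.
idx = rng.choice(n, size=m, replace=False)
C = rbf(X, X[idx])        # n x m
W = rbf(X[idx], X[idx])   # m x m
K_approx = C @ np.linalg.pinv(W) @ C.T

K = rbf(X, X)
print("relative Frobenius error:",
      np.linalg.norm(K - K_approx) / np.linalg.norm(K))
```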

Authors: Hao Liu, Wenjing Liao
Pages: 343-371
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 2, Page 343-371, June 2024
Abstract: Many functions of interest are in a high-dimensional space but exhibit low-dimensional structures. This paper studies regression of an [math]-Hölder function [math] in [math] which varies along a central subspace of dimension [math] while [math]. A direct approximation of [math] in [math] with an [math] accuracy requires the number of samples [math] in the order of [math]. In this paper, we analyze the generalized contour regression (GCR) algorithm for the estimation of the central subspace and use piecewise polynomials for function approximation. GCR is among the best estimators for the central subspace, but its sample complexity is an open question. In this paper, we partially answer this question by proving that if a variance quantity is exactly known, GCR leads to a mean squared estimation error of [math] for the central subspace. The estimation error of this variance quantity is also given in this paper. The mean squared regression error of [math] is proved to be in the order of [math], where the exponent depends on the dimension of the central subspace [math] instead of the ambient space [math]. This result demonstrates that GCR is effective in learning the low-dimensional central subspace. We also propose a modified GCR with improved efficiency. The convergence rate is validated through several numerical experiments.
PubDate: 2024-04-22T07:00:00Z
DOI: 10.1137/23M1557751
Issue No: Vol. 6, No. 2 (2024)
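
Since GCR itself is involved, the sketch below illustrates central-subspace estimation with the classical sliced inverse regression estimator as a simpler stand-in, on a hypothetical single-index example; it is not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
n, D, d = 5000, 10, 1

X = rng.normal(size=(n, D))
beta = np.zeros(D); beta[0] = 1.0          # true central subspace: span{e_1}
y = (X @ beta) ** 3 + 0.1 * rng.normal(size=n)

# Sliced inverse regression (SIR): whiten X, average the whitened
# predictors within slices of y, and extract top eigenvectors of the
# weighted covariance of the slice means.
Xc = X - X.mean(0)
S = np.cov(Xc, rowvar=False)
L = np.linalg.cholesky(S)
Z = np.linalg.solve(L, Xc.T).T             # whitened predictors

H = 10                                     # number of slices
slices = np.array_split(np.argsort(y), H)
slice_means = [Z[c].mean(0) for c in slices]
M = sum(np.outer(m_, m_) * len(c) / n for m_, c in zip(slice_means, slices))

w, V = np.linalg.eigh(M)
dir_hat = np.linalg.solve(L.T, V[:, -d:])  # map back to original coordinates
dir_hat /= np.linalg.norm(dir_hat)
print("|cos angle| to true direction:", abs(dir_hat[:, 0] @ beta))
```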

Authors: Yu Tian, Renaud Lambiotte
Pages: 372-399
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 2, Page 372-399, June 2024
Abstract: Complex numbers define the relationship between entities in many situations. A canonical example would be the off-diagonal terms in a Hamiltonian matrix in quantum physics. Recent years have seen increasing interest in extending the tools of network science to settings where edge weights are complex numbers. Here, we focus on the case where the weight matrix is Hermitian, a reasonable assumption in many applications, and investigate both structural and dynamical properties of networks with complex weights. Building on concepts from signed graphs, we introduce a classification of complex-weighted networks based on the notion of structural balance and illustrate the shared spectral properties within each type. We then apply the results to characterize the dynamics of random walks on complex-weighted networks, where local consensus can be achieved asymptotically when the graph is structurally balanced, while global consensus will be obtained when it is strictly unbalanced. Finally, we explore potential applications of our findings by generalizing the notion of cut and propose an associated spectral clustering algorithm. We also provide further characteristics of the magnetic Laplacian, associating directed networks to complex-weighted ones. The performance of the algorithm is verified on both synthetic and real networks.
PubDate: 2024-05-02T07:00:00Z
DOI: 10.1137/23M1584265
Issue No: Vol. 6, No. 2 (2024)
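
A minimal sketch of the magnetic Laplacian construction mentioned at the end of the abstract, which associates a directed graph with a Hermitian complex-weighted one; the charge parameter q and the tiny example graph are arbitrary illustrative choices.

```python
import numpy as np

# Magnetic Laplacian of a directed graph: a Hermitian matrix that encodes
# edge direction in complex phases.
def magnetic_laplacian(A, q=0.25):
    """A: directed adjacency matrix; q: charge parameter in [0, 1)."""
    A_s = (A + A.T) / 2.0                     # symmetrized weights
    theta = 2 * np.pi * q * (A - A.T)         # antisymmetric phase matrix
    H = A_s * np.exp(1j * theta)              # Hermitian complex weights
    D = np.diag(A_s.sum(axis=1))
    return D - H

# Two directed 3-cycles joined by one edge; inspect the phase pattern of a
# bottom eigenvector (a minimal spectral-clustering-style probe).
A = np.zeros((6, 6))
for (i, j) in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]:
    A[i, j] = 1.0

L = magnetic_laplacian(A)
assert np.allclose(L, L.conj().T)             # Hermitian, real eigenvalues
w, V = np.linalg.eigh(L)
print("smallest eigenvalues:", np.round(w[:3], 3))
print("leading eigenvector phases:", np.round(np.angle(V[:, 0]), 2))
```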

Authors: Cédric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborová
Pages: 400-427
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 2, Page 400-427, June 2024
Abstract: We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods, learning an estimator (e.g., M-estimator, shallow neural network) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory equations from statistical physics when applied to the corresponding gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics and to include nonseparable update functions, allowing datasets with nonidentity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch size and constant learning rates.
PubDate: 2024-05-06T07:00:00Z
DOI: 10.1137/23M1594388
Issue No:Vol. 6, No. 2 (2024)
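
A minimal simulation of the setting the closed-form equations describe, SGD on Gaussian data with empirical risk minimization; this merely runs the dynamics on a toy regression problem and does not implement the DMFT equations themselves.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 4000, 200

# Gaussian data with a planted linear teacher (hypothetical toy problem).
X = rng.normal(size=(n, d)) / np.sqrt(d)
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

# SGD with constant learning rate and an extensive batch size, the regime
# for which the paper provides numerical implementations of its equations.
w = np.zeros(d)
lr, batch, lam = 0.5, 64, 0.01
for step in range(2001):
    idx = rng.choice(n, size=batch, replace=False)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch + lam * w
    w -= lr * grad
    if step % 500 == 0:
        risk = 0.5 * np.mean((X @ w - y) ** 2)
        print(f"step {step:4d}  empirical risk {risk:.4f}")
```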

Authors: Stanislav Minsker, Mohamed Ndaoud, Lang Wang
Pages: 428-453
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 2, Page 428-453, June 2024
Abstract: We consider the high-dimensional linear regression model and assume that a fraction of the measurements are altered by an adversary with complete knowledge of the data and the underlying distribution. We are interested in a scenario where dense additive noise is heavy-tailed, while the measurement vectors follow a sub-Gaussian distribution. Within this framework, we establish minimax lower bounds for the performance of an arbitrary estimator that depend on the fraction of corrupted observations as well as the tail behavior of the additive noise. Moreover, we design a modification of the so-called square-root Slope estimator with several desirable features: (a) It is provably robust to adversarial contamination and satisfies performance guarantees in the form of sub-Gaussian deviation inequalities that match the lower error bounds, up to logarithmic factors; (b) it is fully adaptive with respect to the unknown sparsity level and the variance of the additive noise; and (c) it is computationally tractable as a solution of a convex optimization problem. To analyze the performance of the proposed estimator, we prove several properties of matrices with sub-Gaussian rows that may be of independent interest.
PubDate: 2024-05-31T07:00:00Z
DOI: 10.1137/23M1608690
Issue No: Vol. 6, No. 2 (2024)
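
To illustrate point (c), computational tractability as a convex program, here is a hedged sketch of a square-root Slope objective in cvxpy, expressing the sorted-l1 penalty through sum_largest; the lambda sequence below is a generic choice, not the paper's tuned one, and the problem data are hypothetical.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
n, p, s = 100, 30, 3

X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:s] = 5.0
y = X @ beta_true + rng.standard_t(df=3, size=n)   # heavy-tailed noise

# Square-root Slope objective: ||y - Xb||_2 / sqrt(n) + sum_j lam_j |b|_(j),
# with |b|_(1) >= |b|_(2) >= ... The sorted-l1 (Slope) penalty is written via
# Abel summation as a nonnegative combination of sum_largest terms.
lam = 2.0 * np.sqrt(np.log(2 * p / np.arange(1, p + 1)) / n)
b = cp.Variable(p)
slope_pen = sum(
    (lam[k] - (lam[k + 1] if k + 1 < p else 0.0)) * cp.sum_largest(cp.abs(b), k + 1)
    for k in range(p))
prob = cp.Problem(cp.Minimize(cp.norm(y - X @ b, 2) / np.sqrt(n) + slope_pen))
prob.solve()
print("estimation error:", np.linalg.norm(b.value - beta_true))
```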

Authors: Yehuda Dar, Daniel LeJeune, Richard G. Baraniuk
Pages: 454-480
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 2, Page 454-480, June 2024
Abstract: We study a fundamental transfer learning process from source to target linear regression tasks, including overparameterized settings where there are more learned parameters than data samples. The target task learning is addressed by using its training data together with the parameters previously computed for the source task. We define a transfer learning approach to the target task as a linear regression optimization with a regularization on the distance between the to-be-learned target parameters and the already learned source parameters. We analytically characterize the generalization performance of our transfer learning approach and demonstrate its ability to resolve the peak in generalization errors in double descent phenomena of the minimum [math]-norm solution to linear regression. Moreover, we show that for sufficiently related tasks, the optimally tuned transfer learning approach can outperform the optimally tuned ridge regression method, even when the true parameter vector conforms to an isotropic Gaussian prior distribution. Namely, we demonstrate that transfer learning can beat the minimum mean square error (MMSE) solution of the independent target task. Our results emphasize the ability of transfer learning to extend the solution space to the target task and, in doing so, to attain an improved MMSE solution. We formulate the linear MMSE solution to our transfer learning setting and point out its key differences from the common design philosophy of transfer learning.
PubDate: 2024-05-31T07:00:00Z
DOI: 10.1137/23M1563062
Issue No: Vol. 6, No. 2 (2024)
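
A minimal sketch of the transfer learning approach as defined in the abstract: target regression with a regularizer pulling the parameters toward the already learned source parameters, which admits a ridge-like closed form. The toy task relatedness below is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n_src, n_tgt, lam = 50, 500, 20, 1.0

# Related source and target tasks: target parameters are a small
# perturbation of the source parameters (hypothetical toy setup).
beta_src = rng.normal(size=d)
beta_tgt = beta_src + 0.1 * rng.normal(size=d)

Xs = rng.normal(size=(n_src, d)); ys = Xs @ beta_src + 0.1 * rng.normal(size=n_src)
Xt = rng.normal(size=(n_tgt, d)); yt = Xt @ beta_tgt + 0.1 * rng.normal(size=n_tgt)

# Step 1: solve the source task (here ridge with negligible regularization).
b_src = np.linalg.solve(Xs.T @ Xs + 1e-6 * np.eye(d), Xs.T @ ys)

# Step 2: target regression regularized toward the source parameters:
#   min_b ||yt - Xt b||^2 + lam * ||b - b_src||^2
# with closed form (Xt'Xt + lam I)^{-1} (Xt' yt + lam b_src).
b_transfer = np.linalg.solve(Xt.T @ Xt + lam * np.eye(d), Xt.T @ yt + lam * b_src)
b_ridge = np.linalg.solve(Xt.T @ Xt + lam * np.eye(d), Xt.T @ yt)

print("transfer error:", np.linalg.norm(b_transfer - beta_tgt))
print("ridge error:   ", np.linalg.norm(b_ridge - beta_tgt))
```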

Authors: Hwanwoo Kim, Daniel Sanz-Alonso, Ruiyi Yang
Pages: 1-25
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 1, Page 1-25, March 2024
Abstract: This paper integrates manifold learning techniques within a Gaussian process upper confidence bound algorithm to optimize an objective function on a manifold. Our approach is motivated by applications where a full representation of the manifold is not available and querying the objective is expensive. We rely on a point cloud of manifold samples to define a graph Gaussian process surrogate model for the objective. Query points are sequentially chosen using the posterior distribution of the surrogate model given all previous queries. We establish regret bounds in terms of the number of queries and the size of the point cloud. Several numerical examples complement the theory and illustrate the performance of our method.
PubDate: 2024-02-02T08:00:00Z
DOI: 10.1137/22M1529907
Issue No: Vol. 6, No. 1 (2024)
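
A rough sketch of the ingredients: a graph Gaussian process surrogate built from a point cloud's k-NN Laplacian, queried with an upper confidence bound rule. The spectral filter and all constants are illustrative choices rather than the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(8)

# Point cloud on a circle (the manifold is only seen through samples).
N = 300
t = np.sort(rng.uniform(0, 2 * np.pi, size=N))
P = np.c_[np.cos(t), np.sin(t)]
f = np.sin(3 * t)                              # objective, queried at a cost

# k-NN graph Laplacian from the point cloud.
d2 = ((P[:, None] - P[None, :]) ** 2).sum(-1)
k = 8
W = np.zeros((N, N))
for i in range(N):
    nbrs = np.argsort(d2[i])[1:k + 1]
    W[i, nbrs] = W[nbrs, i] = np.exp(-d2[i, nbrs] / 0.01)
Lap = np.diag(W.sum(1)) - W

# Graph GP prior: covariance as a spectral filter of the Laplacian
# (a Matern-like choice, purely for illustration).
evals, evecs = np.linalg.eigh(Lap)
K = evecs @ np.diag(1.0 / (1.0 + 10.0 * evals) ** 2) @ evecs.T

queried = [int(rng.integers(N))]
ys = [f[queried[0]] + 0.01 * rng.normal()]
for step in range(15):
    idx = np.array(queried)
    Kxx = K[np.ix_(idx, idx)] + 1e-4 * np.eye(len(idx))
    mu = K[:, idx] @ np.linalg.solve(Kxx, np.array(ys))
    var = np.clip(np.diag(K) - np.einsum('ij,ji->i', K[:, idx],
                                         np.linalg.solve(Kxx, K[idx, :])), 0, None)
    nxt = int(np.argmax(mu + 2.0 * np.sqrt(var)))  # upper confidence bound
    queried.append(nxt)
    ys.append(f[nxt] + 0.01 * rng.normal())

print("best value found:", max(f[q] for q in queried), " true max:", f.max())
```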

Authors: Francis Bach
Pages: 26-50
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 1, Page 26-50, March 2024
Abstract: We consider linear regression problems with a varying number of random projections, where we provably exhibit a double descent curve for a fixed prediction problem, with a high-dimensional analysis based on random matrix theory. We first consider the ridge regression estimator and review earlier results using classical notions from nonparametric statistics, namely, degrees of freedom, also known as effective dimensionality. We then compute asymptotic equivalents of the generalization performance (in terms of squared bias and variance) of the minimum norm least-squares fit with random projections, providing simple expressions for the double descent phenomenon.
PubDate: 2024-02-05T08:00:00Z
DOI: 10.1137/23M1558781
Issue No: Vol. 6, No. 1 (2024)
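
A quick simulation of the phenomenon analyzed in the paper: minimum norm least squares on p random feature projections for a fixed prediction problem, with the test error peaking near the interpolation threshold p = n and descending again (all problem sizes are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(9)
n, d = 100, 400

X = rng.normal(size=(n, d)) / np.sqrt(d)
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)
X_test = rng.normal(size=(2000, d)) / np.sqrt(d)
y_test = X_test @ w_star

# Min-norm least squares on p random projections of the features; the
# error peaks near p = n: the double descent curve.
for p in [20, 50, 90, 100, 110, 200, 400]:
    err = 0.0
    for _ in range(10):                       # average over random projections
        S = rng.normal(size=(d, p)) / np.sqrt(d)
        b, *_ = np.linalg.lstsq(X @ S, y, rcond=None)   # min-norm solution
        err += np.mean((X_test @ S @ b - y_test) ** 2) / 10
    print(f"p = {p:3d}   test MSE = {err:.3f}")
```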

Authors: Alec Koppel, Joe Eappen, Sujay Bhatt, Cole Hawkins, Sumitra Ganesh
Pages: 51-75
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 1, Page 51-75, March 2024
Abstract: A fundamental challenge in Bayesian inference is efficient representation of a target distribution. Many nonparametric approaches do so by sampling a large number of points using variants of Markov chain Monte Carlo (MCMC). We propose an MCMC variant that retains only those posterior samples which exceed a kernelized Stein discrepancy (KSD) threshold, which we call KSD thinning. We establish the convergence and complexity trade-offs for several settings of KSD thinning as a function of the KSD threshold parameter, sample size, and other problem parameters. We provide experimental comparisons against other online nonparametric Bayesian methods that generate low-complexity posterior representations. We observe superior consistency/complexity trade-offs across a range of settings including MCMC sampling on two Bayesian inference problems from the biological sciences, and [math] inference speedup and storage reduction for Bayesian neural networks with no loss of accuracy and no increase in training time. Our code is available at https://github.com/colehawkins/KSD-Thinning.
PubDate: 2024-02-05T08:00:00Z
DOI: 10.1137/22M1510108
Issue No: Vol. 6, No. 1 (2024)
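
A simplified sketch of the thinning idea for a 1D standard normal target: samples from a Metropolis chain are retained only when adding them does not increase the squared kernelized Stein discrepancy of the retained set. The paper's threshold schedules are more refined than this acceptance rule; see the authors' repository for the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(10)

def stein_kernel(x, y, h=1.0):
    """Langevin Stein kernel for the N(0,1) target (score s(x) = -x)
    with RBF base kernel k(x,y) = exp(-(x-y)^2 / (2h))."""
    r = x - y
    k = np.exp(-r**2 / (2 * h))
    return k * (1 / h - r**2 / h**2 - r**2 / h + x * y)

# Random-walk Metropolis chain targeting N(0,1).
chain = [0.0]
for _ in range(3000):
    prop = chain[-1] + rng.normal()
    if np.log(rng.uniform()) < (chain[-1] ** 2 - prop ** 2) / 2:
        chain.append(prop)
    else:
        chain.append(chain[-1])

# Simplified thinning rule: keep a sample only if adding it does not
# increase the squared KSD of the retained set.
kept = [chain[500]]                       # discard burn-in
pair_sum = float(stein_kernel(kept[0], kept[0]))
for x in chain[501:]:
    cross = stein_kernel(x, np.array(kept)).sum()
    new_sum = pair_sum + 2 * cross + stein_kernel(x, x)
    if new_sum / (len(kept) + 1) ** 2 <= pair_sum / len(kept) ** 2:
        kept.append(x)
        pair_sum = new_sum

print(f"retained {len(kept)} of {len(chain) - 500} samples")
print("retained mean/var:", np.mean(kept), np.var(kept))
```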

Authors: Anatoli Juditsky, Arkadi Nemirovski
Pages: 76-96
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 1, Page 76-96, March 2024
Abstract: A polyhedral estimate [A. Juditsky and A. Nemirovski, Electron. J. Stat., 14 (2020), pp. 458–502; Statistical Inference via Convex Optimization, Princeton University Press, Princeton, NJ, 2020] is a generic, efficiently computable routine, nonlinear in the observations, for recovering an unknown signal belonging to a given convex compact set from noisy observation of the signal’s linear image. Risk analysis and optimal design of polyhedral estimates may be addressed through efficient bounding of optimal values of optimization problems. Such problems are typically hard, yet it was shown in [A. Juditsky and A. Nemirovski, Electron. J. Stat., 14 (2020), pp. 458–502] that nearly minimax optimal (“up to logarithmic factors”) estimates can be efficiently constructed when the signal set is an ellitope—a member of a wide family of convex and compact sets of special geometry (see, e.g., [A. Juditsky and A. Nemirovski, Math. Stat. Learn., 1 (2018), pp. 171–225]). The subject of this paper is a new risk analysis for a polyhedral estimate in the situation where the signal set is an intersection of an ellitope and an arbitrary polytope, allowing for improved polyhedral estimate design in this situation.
PubDate: 2024-02-20T08:00:00Z
DOI: 10.1137/22M1543331
Issue No: Vol. 6, No. 1 (2024)

Authors: Elisa Negrini, Levon Nurbekyan
Pages: 97-126
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 1, Page 97-126, March 2024
Abstract: In this work, we investigate applications of the no-collision transportation maps introduced by Nurbekyan et al. in 2020 to manifold learning for image data. Recently, there has been a surge in applying transportation-based distances and features for data representing motion-like or deformation-like phenomena. Indeed, comparing intensities at fixed locations often does not reveal the data structure. No-collision maps and distances developed in [L. Nurbekyan, A. Iannantuono, and A. M. Oberman, J. Sci. Comput., 82 (2020), 45] are sensitive to geometric features similar to optimal transportation (OT) maps but much cheaper to compute due to the absence of optimization. In this work, we prove that no-collision distances provide an isometry between translations (respectively, dilations) of a single probability measure and the translation (respectively, dilation) vectors equipped with a Euclidean distance. Furthermore, we prove that no-collision transportation maps, as well as OT and linearized OT maps, do not in general provide an isometry for rotations. The numerical experiments confirm our theoretical findings and show that no-collision distances achieve similar or better performance on several manifold learning tasks compared to other OT and Euclidean-based methods at a fraction of the computational cost.
PubDate: 2024-02-21T08:00:00Z
DOI: 10.1137/23M1567771
Issue No: Vol. 6, No. 1 (2024)
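
A one-dimensional sketch of the no-collision idea: dyadic equal-mass bisection yields matched representative points with no optimization solved (the source of the speedup), and for translated measures the feature difference recovers the translation exactly, mirroring the isometry result. The construction below is a simplified 1D reading, not the paper's full scheme for images.

```python
import numpy as np

rng = np.random.default_rng(11)

def no_collision_features(samples, depth=6):
    """Dyadic equal-mass bisection of a 1D empirical measure: at each level,
    split every cell at its median and return the 2^depth cell centroids."""
    cells = [np.sort(samples)]
    for _ in range(depth):
        cells = [half for c in cells for half in np.array_split(c, 2)]
    return np.array([c.mean() for c in cells])

mu = rng.normal(size=20000)
shift = 1.7
nu = mu + shift                                  # a translate of mu

f_mu, f_nu = no_collision_features(mu), no_collision_features(nu)
# For translated measures, the feature difference recovers the translation
# exactly, reflecting the isometry statement for translations.
print("mean feature difference:", (f_nu - f_mu).mean())   # ~ 1.7
print("max feature difference: ", np.abs(f_nu - f_mu).max())
```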

Authors: Michał Dereziński, Elizaveta Rebrova
Pages: 127-153
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 1, Page 127-153, March 2024
Abstract: Sketch-and-project is a framework which unifies many known iterative methods for solving linear systems and their variants, as well as further extensions to nonlinear optimization problems. It includes popular methods such as randomized Kaczmarz, coordinate descent, variants of the Newton method in convex optimization, and others. In this paper, we develop a theoretical framework for obtaining sharp guarantees on the convergence rate of sketch-and-project methods. Our approach is the first to (1) show that the convergence rate improves at least linearly with the sketch size, and even faster when the data matrix exhibits certain spectral decays, and (2) allow for sparse sketching matrices, which are more efficient than dense sketches and more robust than subsampling methods. In particular, our results explain an observed phenomenon that a radical sparsification of the sketching matrix does not affect the per iteration convergence rate of sketch-and-project. To obtain our results, we develop new nonasymptotic spectral bounds for the expected sketched projection matrix, which are of independent interest; and we establish a connection between the convergence rates of iterative sketch-and-project solvers and the approximation error of randomized singular value decomposition, which is a widely used one-shot sketching algorithm for low-rank approximation. Our experiments support the theory and demonstrate that even extremely sparse sketches exhibit the convergence properties predicted by our framework.
PubDate: 2024-02-21T08:00:00Z
DOI: 10.1137/23M1545537
Issue No: Vol. 6, No. 1 (2024)
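
A minimal sketch-and-project loop for a consistent linear system with a dense Gaussian sketch; the sparse sketches studied in the paper can be swapped in for S, and randomized Kaczmarz corresponds to a one-row sketch. All sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(12)
m, n, k = 500, 200, 20           # rows, cols, sketch size

A = rng.normal(size=(m, n))
x_star = rng.normal(size=n)
b = A @ x_star                   # consistent system A x = b

# Sketch-and-project: at each step, project the iterate onto the solution
# set of the sketched system S^T A x = S^T b.
x = np.zeros(n)
for it in range(200):
    S = rng.normal(size=(m, k))                    # dense Gaussian sketch
    SA = S.T @ A                                   # k x n sketched system
    r = SA @ x - S.T @ b
    x -= SA.T @ np.linalg.solve(SA @ SA.T, r)      # projection step
    if it % 50 == 0:
        print(f"iter {it:3d}   residual {np.linalg.norm(A @ x - b):.2e}")
```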

Authors: Haoyu Zhao, Konstantin Burlachenko, Zhize Li, Peter Richtárik
Pages: 154-175
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 1, Page 154-175, March 2024
Abstract: Due to the communication bottleneck in distributed and federated learning applications, algorithms using communication compression have attracted significant attention and are widely used in practice. Moreover, the huge number, high heterogeneity, and limited availability of clients result in high client variance. This paper addresses these two issues together by proposing the compressed and client-variance reduced methods COFIG and FRECON. We prove an [math] bound on the number of communication rounds of COFIG in the nonconvex setting, where [math] is the total number of clients, [math] is the number of clients participating in each round, [math] is the convergence error, and [math] is the variance parameter associated with the compression operator. In the case of FRECON, we prove an [math] bound on the number of communication rounds. In the convex setting, COFIG converges within [math] communication rounds, which, to the best of our knowledge, is also the first convergence result for compression schemes that do not communicate with all the clients in each round. We stress that neither COFIG nor FRECON needs to communicate with all the clients, and they enjoy the first or faster convergence results for convex and nonconvex federated learning in the regimes considered. Experimental results point to an empirical superiority of COFIG and FRECON over existing baselines.
PubDate: 2024-03-11T07:00:00Z
DOI: 10.1137/23M1553820
Issue No: Vol. 6, No. 1 (2024)
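
The compression operators driving such methods are easy to illustrate; below is a standard unbiased rand-k compressor, whose variance parameter (omega in bounds of this kind) is d/k - 1. This shows the compression primitive only, not the COFIG or FRECON algorithms.

```python
import numpy as np

rng = np.random.default_rng(13)

def rand_k(g, k):
    """Unbiased rand-k compressor: keep k random coordinates of the gradient
    and rescale by d/k so that E[rand_k(g)] = g."""
    d = g.size
    out = np.zeros_like(g)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = g[idx] * (d / k)
    return out

g = rng.normal(size=1000)
avg = np.mean([rand_k(g, 100) for _ in range(5000)], axis=0)
print("unbiasedness check, max deviation:", np.abs(avg - g).max())
print("variance parameter d/k - 1 =", 1000 / 100 - 1)
```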

Authors: Yi Yu, Oscar H. Madrid Padilla, Daren Wang, Alessandro Rinaldo
Pages: 176-198
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 1, Page 176-198, March 2024
Abstract: We study the following online network change point detection setting: a time series of independent, possibly sparse Bernoulli networks whose distributions might change at an unknown time is observed in a sequential manner, and at each time point, a determination has to be made on whether a change has taken place in the near past. The goal is to detect the change point event (if any has occurred) as quickly as possible, subject to prespecified constraints on the probability or number of false alarms. We propose a CUSUM-based procedure and derive two high-probability upper bounds on its detection delay: [math] under a low-rank assumption and [math] under a block-constancy assumption, where [math] denote the normalized jump size, network size, entrywise sparsity, rank sparsity, and overall type I error upper bound. All the model parameters are allowed to vary as [math], the unknown change point, diverges. We further establish a minimax lower bound on the detection delay. Under the low-rank assumption, when the rank is of constant order, or under the block-constancy assumption, when the number of blocks [math], we obtain minimax rates. The above upper bounds are achieved by novel procedures proposed in this paper, designed for quick detection under two different forms of type I error control. The first is based on controlling the overall probability of a false alarm when there are no change points, and the second is based on specifying a lower bound on the expected time of the first false alarm. Extensive experiments show that under different scenarios and the aforementioned forms of type I error control, our proposed approaches well outperform state-of-the-art methods.
PubDate: 2024-03-20T07:00:00Z
DOI: 10.1137/22M1529816
Issue No: Vol. 6, No. 1 (2024)
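
A CUSUM-flavored toy monitor for this setting (sparse Bernoulli networks whose distribution changes at an unknown time): scan candidate split points and flag when a scaled before/after difference of averaged adjacency matrices exceeds a threshold. The statistic and threshold below are illustrative simplifications, not the paper's calibrated procedures.

```python
import numpy as np

rng = np.random.default_rng(14)
n_nodes, T, cp = 30, 200, 120

# Sparse Bernoulli networks whose edge-probability matrix changes at t = cp.
P1 = np.full((n_nodes, n_nodes), 0.05)
P2 = P1.copy(); P2[:10, :10] = 0.35           # planted dense block after change

def sample(P):
    U = rng.uniform(size=P.shape)
    A = np.triu((U < P).astype(float), 1)
    return A + A.T                            # symmetric, no self-loops

# Scan statistic over candidate split points s; running sums keep it cheap.
thresh = 0.32
running, sums = np.zeros((n_nodes, n_nodes)), [np.zeros((n_nodes, n_nodes))]
for t in range(1, T + 1):
    running = running + sample(P1 if t <= cp else P2)
    sums.append(running)
    if t < 20:
        continue
    stat = max(
        np.sqrt(s * (t - s) / t) *
        np.linalg.norm(sums[s] / s - (sums[t] - sums[s]) / (t - s)) / n_nodes
        for s in range(10, t - 9))
    if stat > thresh:
        print(f"change flagged at time {t} (true change point: {cp})")
        break
```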

Authors: Daniel LeJeune, Pratik Patil, Hamid Javadi, Richard G. Baraniuk, Ryan J. Tibshirani
Pages: 199-225
Citation: SIAM Journal on Mathematics of Data Science, Volume 6, Issue 1, Page 199-225, March 2024
Abstract: We take a random matrix theory approach to random sketching and show an asymptotic first-order equivalence of the regularized sketched pseudoinverse of a positive semidefinite matrix to a certain evaluation of the resolvent of the same matrix. We focus on real-valued regularization and extend previous results on an asymptotic equivalence of random matrices to the real setting, providing a precise characterization of the equivalence even under negative regularization, including a precise characterization of the smallest nonzero eigenvalue of the sketched matrix, which may be of independent interest. We then further characterize the second-order equivalence of the sketched pseudoinverse. We also apply our results to the analysis of the sketch-and-project method and to sketched ridge regression. Last, we prove that these results generalize to asymptotically free sketching matrices, obtaining the resulting equivalence for orthogonal sketching matrices and comparing our results to several common sketches used in practice.
PubDate: 2024-03-29T07:00:00Z
DOI: 10.1137/22M1530264
Issue No: Vol. 6, No. 1 (2024)
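
A numerical illustration of the flavor of the first-order equivalence: we match the sketched regularized pseudoinverse against resolvents of the same matrix, locating the shifted regularization level by brute-force search rather than through the paper's fixed-point characterization, and then check an independent functional at the matched level.

```python
import numpy as np

rng = np.random.default_rng(15)
n, m, lam = 400, 100, 1.0

G = rng.normal(size=(n, n))
A = G @ G.T / n                                   # positive semidefinite
S = rng.normal(size=(m, n)) / np.sqrt(m)          # Gaussian sketching matrix

# Regularized sketched pseudoinverse, pulled back to the original space.
sketched = S.T @ np.linalg.inv(S @ A @ S.T + lam * np.eye(m)) @ S

# Locate an equivalent regularization level mu by matching traces against
# resolvents (A + mu I)^{-1}, using one eigendecomposition of A.
w, V = np.linalg.eigh(A)
mus = np.linspace(lam, 50.0, 2000)
traces = (1.0 / (w[None, :] + mus[:, None])).sum(axis=1)
mu = mus[int(np.argmin(np.abs(traces - np.trace(sketched))))]
resolvent = V @ np.diag(1.0 / (w + mu)) @ V.T

print(f"equivalent regularization mu ~ {mu:.2f}")
print("normalized trace gap:",
      abs(np.trace(sketched) - np.trace(resolvent)) / n)
print("entry (0,0) gap:", abs(sketched[0, 0] - resolvent[0, 0]))
```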