Abstract: We provide moment bounds for expressions of the type \((X^{(1)} \otimes \cdots \otimes X^{(d)})^T A (X^{(1)} \otimes \cdots \otimes X^{(d)})\), where \(\otimes\) denotes the Kronecker product and \(X^{(1)}, \ldots , X^{(d)}\) are random vectors with independent, mean 0, variance 1, subgaussian entries. The bounds are tight up to constants depending on d for the case of Gaussian random vectors. Our proof also provides a decoupling inequality for expressions of this type. Using these bounds, we obtain new, improved concentration inequalities for expressions of the form \(\Vert B (X^{(1)} \otimes \cdots \otimes X^{(d)})\Vert _2\). PubDate: 2022-08-03
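As a concrete illustration of the quantity studied in this abstract, the following NumPy sketch evaluates such a Kronecker-product quadratic form for Gaussian factors (the dimensions and the matrix A are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3

# independent factors X^(1), ..., X^(d): mean 0, variance 1 (Gaussian here)
X = [rng.standard_normal(n) for _ in range(d)]

# Kronecker product x = X^(1) ⊗ ... ⊗ X^(d), a vector of length n**d
x = X[0]
for Xi in X[1:]:
    x = np.kron(x, Xi)

A = rng.standard_normal((n**d, n**d))

# the chaos-type quadratic form (X^(1) ⊗ ... ⊗ X^(d))^T A (X^(1) ⊗ ... ⊗ X^(d))
quad = x @ A @ x
```

Note that the ambient dimension grows as \(n^d\), which is why moment and concentration bounds for this scalar are nontrivial.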

Abstract: We prove that if \(I_\ell = [a_\ell ,b_\ell )\), \(\ell =1,\ldots ,L\), are disjoint intervals in [0, 1) with the property that the numbers \(1, a_1, \ldots , a_L, b_1, \ldots , b_L\) are linearly independent over \({\mathbb {Q}}\), then there exist pairwise disjoint sets \(\Lambda _\ell \subset {\mathbb {Z}}\), \(\ell =1, \ldots , L\), such that for every \(J \subset \{ 1, \ldots , L \}\), the system \(\{e^{2\pi i \lambda x} : \lambda \in \cup _{\ell \in J} \, \Lambda _\ell \}\) is a Riesz basis for \(L^2 ( \cup _{\ell \in J} \, I_\ell )\). Also, we show that for any disjoint intervals \(I_\ell \), \(\ell =1, \ldots , L\), contained in [1, N) with \(N \in {\mathbb {N}}\), the orthonormal basis \(\{e^{2\pi i n x} : n \in {\mathbb {Z}}\}\) of \(L^2[0,1)\) can be complemented by a Riesz basis \(\{e^{2\pi i \lambda x}: \lambda \in \Lambda \}\) for \(L^2(\cup _{\ell =1}^L \, I_{\ell })\) with some set \(\Lambda \subset (\frac{1}{N} {\mathbb {Z}}) \backslash {\mathbb {Z}}\), in the sense that their union \(\{e^{2\pi i \lambda x} : \lambda \in {\mathbb {Z}}\cup \Lambda \}\) is a Riesz basis for \(L^2 ( [0,1) \cup I_1 \cup \cdots \cup I_L )\). PubDate: 2022-07-28

Abstract: The main purpose of this paper is to apply the theory of vector lattices and the related abstract modular convergence to the context of Mellin-type kernels and (non)linear vector lattice-valued operators, following the construction of an integral given in earlier papers. PubDate: 2022-07-21

Abstract: Neural collapse is an emergent phenomenon in deep learning that was recently discovered by Papyan, Han and Donoho. We propose a simple unconstrained features model in which neural collapse also emerges empirically. By studying this model, we provide some explanation for the emergence of neural collapse in terms of the landscape of empirical risk. PubDate: 2022-07-19

Abstract: G-equivariant convolutional neural networks (GCNNs) are a geometric deep learning model for data defined on a homogeneous G-space \(\mathcal {M}\). GCNNs are designed to respect the global symmetry in \(\mathcal {M}\), thereby facilitating learning. In this paper, we analyze GCNNs on homogeneous spaces \(\mathcal {M} = G/K\) in the case of unimodular Lie groups G and compact subgroups \(K \le G\). We demonstrate that homogeneous vector bundles are the natural setting for GCNNs. We also use reproducing kernel Hilbert spaces (RKHS) to obtain a sufficient criterion for expressing G-equivariant layers as convolutional layers. Finally, stronger results are obtained for some groups via a connection between RKHS and bandwidth. PubDate: 2022-07-11

Abstract: In this paper we provide a method for constructing new Riesz bases on separable Hilbert spaces and we use it to prove sufficient conditions for the existence of exponential Riesz bases on domains of \({ {\mathbb {R}} }^d\). We also apply our results to the construction of weighted exponential Riesz bases on [0, 1]. PubDate: 2022-06-21

Abstract: We present a converse Kramer-type sampling theorem over semi-inner product reproducing kernel Banach spaces. Assuming that a sampling expansion holds for every f belonging to a semi-inner product reproducing kernel Banach space \(\mathcal {B}\) for a fixed sequence of interpolating functions \(\{a_j ^{-1} S_j (t)\}_j\) and a subset of sampling points \(\{t_j\}_{j}\), it follows that this sequence must be an \(X^*_d\)-Riesz basis and a sampling basis for the space. Moreover, there exists an equivalent (in norm) reproducing kernel Banach space with a reproducing kernel \(G_{samp}\) such that \(\{\overline{a}_j ^{-1} G_{samp}(t_j,\,.\,)\}_j\) and \(\{a_j ^{-1} S_j(\,.\,)\}_j\) are biorthogonal. These results generalize known results over reproducing kernel Hilbert spaces. PubDate: 2022-06-20

Abstract: The paper studies processes defined on time domains structured as oriented spatial graphs (or metric graphs, or oriented branched 1-manifolds). This setting can be used, for example, for forecasting models involving branching scenarios. For these processes, a notion of spectrum degeneracy that takes into account the topology of the graph is introduced. The paper gives sufficient conditions for uniqueness of extrapolation and recovery from observations on a single branch. This also implies an analog of the sampling theorem for branching processes, i.e., criteria for their recovery from a set of equidistant samples, as well as from a set of equidistant samples from a single branch. PubDate: 2022-06-07

Abstract: Semidefinite programming is an important tool to tackle several problems in data science and signal processing, including clustering and community detection. However, semidefinite programs are often slow in practice, so speed-up techniques such as sketching are often considered. In the context of community detection in the stochastic block model, Mixon and Xie (IEEE Trans Inform Theory 67(10): 6832–6840, 2021) have recently proposed a sketching framework in which a semidefinite program is solved only on a subsampled subgraph of the network, giving rise to significant computational savings. In this short paper, we provide a positive answer to a conjecture of Mixon and Xie about the statistical limits of this technique for the stochastic block model with two balanced communities. PubDate: 2022-05-06 DOI: 10.1007/s43670-022-00023-9

Abstract: The notion of generalization in classical Statistical Learning is often attached to the postulate that data points are independent and identically distributed (IID) random variables. While relevant in many applications, this postulate may not hold in general, encouraging the development of learning frameworks that are robust to non-IID data. In this work, we consider the regression problem from an Optimal Recovery perspective. Relying on a model assumption comparable to choosing a hypothesis class, a learner aims at minimizing the worst-case error, without recourse to any probabilistic assumption on the data. We first develop a semidefinite program for calculating the worst-case error of any recovery map in finite-dimensional Hilbert spaces. Then, for any Hilbert space, we show that Optimal Recovery provides a formula which is user-friendly from an algorithmic point of view, as long as the hypothesis class is linear. Interestingly, this formula coincides with kernel ridgeless regression in some cases, proving that minimizing the average error and worst-case error can yield the same solution. We provide numerical experiments in support of our theoretical findings. PubDate: 2022-04-26 DOI: 10.1007/s43670-022-00022-w
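Since the abstract notes that the Optimal Recovery formula coincides with kernel ridgeless regression in some cases, here is a minimal NumPy sketch of kernel ridgeless (minimum-RKHS-norm interpolating) regression; the Gaussian kernel, data, and parameter values are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=10.0):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# training data: noiseless samples of a smooth target
X_train = np.linspace(-1.0, 1.0, 10)[:, None]
y_train = np.sin(3.0 * X_train[:, 0])

# kernel ridgeless regression: the minimum-norm interpolant
#   f(x) = k(x, X_train) K^{-1} y_train      (no regularization term)
K = gaussian_kernel(X_train, X_train)
alpha = np.linalg.solve(K, y_train)

X_test = np.array([[0.0], [0.5]])
y_pred = gaussian_kernel(X_test, X_train) @ alpha
```

By construction the fitted function interpolates the training data exactly, which is the sense in which "ridgeless" differs from ridge regression with a positive regularization parameter.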

Abstract: Inertial algorithms for minimizing nonsmooth and nonconvex functions, such as the inertial proximal alternating linearized minimization algorithm (iPALM), have demonstrated their superiority with respect to computation time over their non-inertial variants. In many problems in imaging and machine learning, the objective functions have a special form involving huge data, which encourages the application of stochastic algorithms. While algorithms based on stochastic gradient descent are still used in the majority of applications, stochastic algorithms for minimizing nonsmooth and nonconvex functions have also recently been proposed. In this paper, we derive an inertial variant of a stochastic PALM algorithm with a variance-reduced gradient estimator, called iSPALM, and prove linear convergence of the algorithm under certain assumptions. Our inertial approach can be seen as a generalization of momentum methods, widely used to speed up and stabilize optimization algorithms, in particular in machine learning, to nonsmooth problems. Numerical experiments for learning the weights of a so-called proximal neural network and the parameters of Student-t mixture models show that our new algorithm outperforms both stochastic PALM and its deterministic counterparts. PubDate: 2022-04-22 DOI: 10.1007/s43670-022-00021-x
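The inertial idea behind iPALM/iSPALM, extrapolating with a momentum term before each proximal gradient step, can be sketched on a simple convex toy problem. This is not the iSPALM algorithm itself; the l1-regularized least-squares objective, step sizes, and all names below are illustrative assumptions:

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inertial_prox_grad(grad, x0, step, beta=0.5, lam=0.05, iters=200):
    """Inertial proximal gradient for F(x) = f(x) + lam * ||x||_1:

        x_ext   = x_k + beta * (x_k - x_{k-1})           # inertial extrapolation
        x_{k+1} = prox_{step*lam*||.||_1}(x_ext - step * grad(x_ext))
    """
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x_ext = x + beta * (x - x_prev)
        x_prev, x = x, soft_threshold(x_ext - step * grad(x_ext), step * lam)
    return x

# toy problem: f(x) = 0.5 * ||A x - b||^2 with a sparse ground truth
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 10))
x_true = np.concatenate([np.ones(3), np.zeros(7)])
b = A @ x_true
x_hat = inertial_prox_grad(lambda x: A.T @ (A @ x - b),
                           np.zeros(10),
                           step=1.0 / np.linalg.norm(A, 2) ** 2)
```

iSPALM additionally alternates over blocks of variables and replaces the full gradient with a variance-reduced stochastic estimator; the sketch keeps only the inertial extrapolation step.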

Abstract: Zuowei Shen, Haizhao Yang, and Shijun Zhang recently proved a sharp estimate in terms of a modulus of continuity for the error of approximating continuous functions with ReLU-activated deep neural networks in their paper “Optimal approximation rate of ReLU networks in terms of width and depth”, Journal de Mathématiques Pures et Appliquées (2021). The sharpness was established via a Vapnik–Chervonenkis dimension estimate, by showing that for each choice of certain fixed width and depth parameters, no general smaller error bound applies. This leads to counterexamples that rule out smaller error bounds. In principle, the obtained counterexamples can be different for different parameters. However, for a given convergence rate (e.g., determined by a Lipschitz class), the counterexamples can be condensed to a single one using a quantitative variant of the uniform boundedness principle. Such theorems were developed at the institute of Paul Butzer at RWTH Aachen. When network width or depth tends to infinity in certain cases, the condensed counterexamples show that the approximation order is not in the little-o class of the modulus of continuity used in the error estimate, i.e., the convergence cannot be faster than stated. PubDate: 2022-03-23 DOI: 10.1007/s43670-022-00020-y

Abstract: Polynomial graph filters and their inverses play important roles in graph signal processing. In this paper, we introduce the concept of multiple commutative graph shifts and polynomial graph filters, which could play similar roles in graph signal processing as the one-order delay and finite impulse response filters do in classical multi-dimensional signal processing. We implement the filtering procedure associated with a polynomial graph filter of multiple shifts at the vertex level in a distributed network, in which each vertex is equipped with a data processing subsystem with limited computation power and data storage, and a communication subsystem for direct data exchange only with its adjacent vertices. We also consider the implementation of the inverse filtering procedure associated with a polynomial graph filter of multiple shifts, and we propose two iterative approximation algorithms applicable in a distributed network and in a central facility. We demonstrate the effectiveness of the proposed algorithms for implementing the inverse filtering procedure on denoising time-varying graph signals and a dataset of US hourly temperatures at 218 locations. PubDate: 2022-01-29 DOI: 10.1007/s43670-021-00019-x
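A minimal NumPy sketch of a polynomial filter in two commuting graph shifts; the cycle graph and coefficients are illustrative assumptions (here the second shift is a power of the first, which trivially commutes, whereas genuinely distinct commuting shifts arise e.g. from the factors of a Cartesian product graph). The locality of such a filter is what makes a distributed vertex-level implementation possible:

```python
import numpy as np

# two commuting shifts on the cycle graph C_8: S1 = adjacency, S2 = S1^2
n = 8
S1 = np.zeros((n, n))
for i in range(n):
    S1[i, (i + 1) % n] = S1[i, (i - 1) % n] = 1.0
S2 = S1 @ S1

# bivariate polynomial filter H = h(S1, S2) = sum_{a,b} c_{ab} S1^a S2^b
c = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.25, (1, 1): 0.1}
H = sum(coef * np.linalg.matrix_power(S1, a) @ np.linalg.matrix_power(S2, b)
        for (a, b), coef in c.items())

# filtering mixes each vertex only with vertices within graph distance 3 here,
# so y = H x can be realized by a few rounds of local data exchange
x = np.random.default_rng(3).standard_normal(n)
y = H @ x
```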

Abstract: This paper considers fast and provably accurate algorithms for approximating smooth functions on the d-dimensional torus, \(f: \mathbb {T}^d \rightarrow \mathbb {C}\), that are sparse (or compressible) in the multidimensional Fourier basis. In particular, suppose that the Fourier series coefficients of f, \(\{c_\mathbf{k} (f) \}_{\mathbf{k} \in \mathbb {Z}^d}\), are concentrated in a given arbitrary finite set \(\mathscr {I} \subset \mathbb {Z}^d\) so that $$\begin{aligned} \min _{\Omega \subset \mathscr {I} ~\text {s.t.}~ |\Omega | = s}\left\| f - \sum _{\mathbf{k} \in \Omega } c_\mathbf{k} (f) \, \mathbb {e}^{ -2 \pi \mathbb {i} {{\mathbf {k}}}\cdot \circ } \right\| _2 < \epsilon \Vert f \Vert _2 \end{aligned}$$ holds for \(s \ll |\mathscr {I}|\) and \(\epsilon \in (0,1)\) small. In such cases we aim to both identify a near-minimizing subset \(\Omega \subset \mathscr {I}\) and accurately approximate its associated Fourier coefficients \(\{ c_\mathbf{k} (f) \}_{\mathbf{k} \in \Omega }\) as rapidly as possible. In this paper we present both deterministic, explicit algorithms and randomized algorithms for solving this problem using \(\mathscr {O}(s^2 d \log ^c (|\mathscr {I}|))\)-time/memory and \(\mathscr {O}(s d \log ^c (|\mathscr {I}|))\)-time/memory, respectively. Most crucially, all of the methods proposed herein achieve these runtimes while simultaneously satisfying theoretical best s-term approximation guarantees which ensure their numerical accuracy and robustness to noise for general functions. These results are achieved by modifying several different one-dimensional Sparse Fourier Transform (SFT) methods to subsample a function along a reconstructing rank-1 lattice for the given frequency set \(\mathscr {I} \subset \mathbb {Z}^d\) in order to rapidly identify a near-minimizing subset \(\Omega \subset \mathscr {I}\) as above, without having to use anything about the lattice beyond its generating vector.
This requires the development of new fast and low-memory frequency identification techniques capable of rapidly recovering vector-valued frequencies in \(\mathbb {Z}^d\), as opposed to the simple integer frequencies of the univariate setting. Two different multivariate frequency identification strategies are proposed, analyzed, and shown to lead to their own best s-term approximation methods herein, each with a different tradeoff between accuracy, computational speed, and memory. PubDate: 2021-12-13 DOI: 10.1007/s43670-021-00018-y
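The rank-1 lattice idea can be sketched as follows: sampling a multivariate trigonometric polynomial along a lattice with generating vector z collapses each frequency \(\mathbf{k}\) to the bin \(\mathbf{k}\cdot z \bmod M\) of a single one-dimensional FFT, and a "reconstructing" lattice keeps these bins distinct on \(\mathscr {I}\). All concrete values below (the frequency set, z, M, and the coefficients) are illustrative assumptions:

```python
import numpy as np

# frequency set I in Z^2 and a rank-1 lattice (generating vector z, size M)
# chosen so that k -> k.z mod M is injective on I ("reconstructing" lattice)
I = np.array([[0, 0], [1, 0], [0, 1], [2, 1], [-1, 2]])
z, M = np.array([1, 7]), 29
addr = (I @ z) % M
assert len(set(addr.tolist())) == len(I)   # the lattice reconstructs I

# a sparse trigonometric polynomial with frequencies in I
coef = np.array([1.0, 0.5, -0.25, 0.3, 0.2])
idx = np.arange(M)
nodes = idx[:, None] * z[None, :] / M              # lattice nodes j*z/M in T^2
samples = (coef[None, :] * np.exp(2j * np.pi * (nodes @ I.T))).sum(axis=1)

# a single length-M FFT of the lattice samples exposes c_k at bin k.z mod M
fhat = np.fft.fft(samples) / M
recovered = fhat[addr].real
```

The paper's algorithms avoid even this single length-M FFT over all bins, identifying the energetic bins (and hence the frequencies) with sparse FFT techniques.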

Abstract: In this paper, we study the error behavior of the nonequispaced fast Fourier transform (NFFT). This approximate algorithm is mainly based on the convenient choice of a compactly supported window function. So far, various window functions have been used, and new window functions have recently been proposed. We present novel error estimates for the NFFT with compactly supported, continuous window functions and derive rules for the convenient choice of the parameters involved in the NFFT. The error constant of a window function depends mainly on the oversampling factor and the truncation parameter. PubDate: 2021-11-03 DOI: 10.1007/s43670-021-00017-z
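For reference, the sum that the NFFT approximates is the nonequispaced discrete Fourier transform, sketched here by direct evaluation (illustrative NumPy code with assumed sizes; the NFFT replaces this \(\mathscr {O}(MN)\) computation with a window-based approximation of cost \(\mathscr {O}(N \log N + M)\)):

```python
import numpy as np

# direct evaluation of the nonequispaced discrete Fourier transform
#   f(x_j) = sum_{k=-N/2}^{N/2-1} fhat_k * exp(-2*pi*i*k*x_j),  j = 1..M,
# in O(M*N) operations; the NFFT approximates these sums by convolving
# fhat with a compactly supported window, applying one oversampled FFT,
# and evaluating the (truncated) window near each node x_j
rng = np.random.default_rng(4)
N, M = 16, 10
k = np.arange(-N // 2, N // 2)
fhat = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x = rng.uniform(-0.5, 0.5, M)              # nonequispaced nodes in [-1/2, 1/2)
f = np.exp(-2j * np.pi * x[:, None] * k[None, :]) @ fhat
```

The oversampling factor and the window truncation parameter mentioned in the abstract are exactly the two knobs of that approximation step.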

Abstract: In this work we consider the problem of identification and reconstruction of doubly-dispersive channel operators which are given by finite linear combinations of time-frequency shifts. Such operators arise as time-varying linear systems, for example in radar and wireless communications. In particular, for information transmission in highly non-stationary environments the channel needs to be estimated quickly with identification signals of short duration, and for vehicular applications simultaneous high-resolution radar is desired as well. We consider the time-continuous setting and prove an exact resampling reformulation of the involved channel operator, when applied to a trigonometric polynomial as identifier, in terms of sparse linear combinations of real-valued atoms. Motivated by recent works of Heckel et al., we present an exact approach for off-the-grid super-resolution which allows one to perform the identification with realizable signals having compact support. Then we show how an alternating descent conditional gradient algorithm can be adapted to solve the reformulated problem. Numerical examples demonstrate the performance of this algorithm, in particular in comparison with a simple adaptive grid refinement strategy and an orthogonal matching pursuit algorithm. PubDate: 2021-10-13 DOI: 10.1007/s43670-021-00016-0

Abstract: The massive size of modern neural networks has motivated substantial recent interest in neural network quantization, especially low-bit quantization. We introduce Stochastic Markov Gradient Descent (SMGD), a discrete optimization method applicable to training quantized neural networks. The SMGD algorithm is designed for settings where memory is highly constrained during training. We provide theoretical guarantees of algorithm performance as well as encouraging numerical results. PubDate: 2021-10-11 DOI: 10.1007/s43670-021-00015-1
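The paper's SMGD update is not reproduced here, but the general flavor of gradient-based training constrained to a low-bit grid can be sketched with unbiased stochastic rounding. Everything below, including the toy least-squares objective and all parameter values, is an illustrative assumption and not the SMGD algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(5)

def stochastic_round(w, delta):
    # unbiased rounding to the grid delta*Z:  E[stochastic_round(w)] = w
    low = np.floor(w / delta) * delta
    p_up = (w - low) / delta
    return low + delta * (rng.random(w.shape) < p_up)

# toy quantized training: least squares with weights kept on a coarse grid,
# so only grid indices (a few bits per weight) need to be stored
A = rng.standard_normal((50, 5))
w_true = rng.standard_normal(5)
b = A @ w_true
delta, w = 0.1, np.zeros(5)
for _ in range(300):
    i = rng.integers(50)                       # single-sample stochastic gradient
    g = (A[i] @ w - b[i]) * A[i]
    w = stochastic_round(w - 0.05 * g, delta)  # step, then snap stochastically to grid
```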

Abstract: Discrete tori are \({\mathbb {Z}}_m^N\) thought of as vertices of graphs \({\mathcal {C}}_m^N\) whose adjacencies encode the Cartesian product structure. Space-limiting refers to truncation to a symmetric path neighborhood of the zero element, and spectrum-limiting in this case refers to the corresponding truncation in the isomorphic Fourier domain. Composite spatio-spectral limiting (SSL) operators are analogues of classical time- and band-limiting operators. Certain adjacency-invariant spaces of vectors defined on \({\mathbb {Z}}_m^N\) are shown to have bases consisting of Fourier transforms of eigenvectors of SSL operators. We show that when \(m=3\) or \(m=4\), all eigenvectors of SSL operators arise in this way. We study the structure of the corresponding invariant spaces when \(m\ge 5\) and give an example indicating that the relationship between eigenvectors of SSL operators and the corresponding adjacency-invariant spaces should extend to \(m\ge 5\). PubDate: 2021-08-03 DOI: 10.1007/s43670-021-00014-2

Abstract: In this paper we study \(L_2\)-norm sampling discretization and sampling recovery of complex-valued functions in RKHS on \(D \subset \mathbb {R}^d\) based on random function samples. We only assume the finite trace of the kernel (Hilbert–Schmidt embedding into \(L_2\)) and provide several concrete estimates with precise constants for the corresponding worst-case errors. In general, our analysis does not need any additional assumptions and also covers the case of non-Mercer kernels as well as non-separable RKHS. The failure probability is controlled and decays polynomially in n, the number of samples. Under the mild additional assumption of separability we observe improved rates of convergence related to the decay of the singular values. Our main tool is a spectral norm concentration inequality for infinite complex random matrices with independent rows, complementing earlier results by Rudelson, Mendelson, Pajor, Oliveira and Rauhut. PubDate: 2021-07-20 DOI: 10.1007/s43670-021-00013-3

Abstract: Memoryless scalar quantization (MSQ) is a common technique to quantize generalized linear samples of signals. The non-linear nature of quantization makes the analysis of the corresponding approximation error challenging, often resulting in the use of a simplifying assumption called the “white noise hypothesis” (WNH), which is useful yet known not to be rigorous and, at least in certain cases, not valid. We obtain rigorous reconstruction error estimates without relying on the WNH in the setting where generalized samples of a fixed deterministic signal are obtained using (the analysis matrix of) a random frame with independent isotropic sub-Gaussian rows, quantized using MSQ, and reconstructed linearly. We establish non-asymptotic error bounds that explain the observed error decay rate as the number of measurements grows, which in the special case of Gaussian random frames show that the error approaches a (small) non-zero constant lower bound. We also extend our methodology to dithered and noisy settings as well as the compressed sensing setting, where we obtain rigorous error bounds that agree with empirical observations, again without resorting to the WNH. PubDate: 2021-06-15 DOI: 10.1007/s43670-021-00012-4
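The measurement model described in this abstract can be sketched in a few lines (NumPy; the dimensions, quantization step, and Gaussian frame are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, delta = 10, 200, 0.1
x = rng.standard_normal(n)              # fixed deterministic signal
A = rng.standard_normal((m, n))         # analysis matrix of a random (Gaussian) frame

y = A @ x                               # generalized linear samples
q = delta * np.round(y / delta)         # memoryless scalar quantization, step delta
x_hat = np.linalg.pinv(A) @ q           # linear (least-squares) reconstruction

err = np.linalg.norm(x - x_hat)
```

The WNH would model the per-sample quantization errors y - q as i.i.d. uniform noise; the paper's point is to bound err rigorously without that modeling assumption.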