Abstract: Niels Neirynck, Willy Govaerts, Yuri A. Kuznetsov, Hil G. E. Meijer

We describe new methods for initializing the computation of homoclinic orbits for maps in a state space with arbitrary dimension and for detecting their bifurcations. The initialization methods build on known and improved methods for computing one-dimensional stable and unstable manifolds. The methods are implemented in MatContM, a freely available toolbox in Matlab for numerical analysis of bifurcations of fixed points, periodic orbits, and connecting orbits of smooth nonlinear maps. The bifurcation analysis of homoclinic connections under variation of one parameter is based on continuation methods and allows us to detect all known codimension 1 and 2 bifurcations in three-dimensional (3D) maps, including tangencies and generalized tangencies. PubDate: Wed, 03 Jan 2018 00:00:00 GMT

The Sparse Matrix-Vector Multiplication (SpMV) kernel ranks among the most important and thoroughly studied linear algebra operations, as it lies at the heart of many iterative methods for the solution of sparse linear systems, and often constitutes a severe performance bottleneck. Its optimization, which is intimately associated with the data structures used to store the sparse matrix, has always been of particular interest to the applied mathematics and computer science communities and has attracted further attention since the advent of multicore architectures. In this article, we present SparseX, an open source software package for SpMV targeting multicore platforms, that employs the state-of-the-art Compressed Sparse eXtended (CSX) sparse matrix storage format to deliver high efficiency through a highly usable “BLAS-like” interface that requires limited or no tuning. PubDate: Wed, 03 Jan 2018 00:00:00 GMT
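
As a point of reference for the kernel discussed above, here is a minimal sketch of SpMV over the textbook Compressed Sparse Row (CSR) layout. It is not the CSX format and not the SparseX interface; the function name `csr_spmv` and the array names are illustrative assumptions.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in plain CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1
    (Illustrative names; plain CSR, not the CSX format from the abstract.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Indirect access x[col_idx[k]] is the irregular memory pattern
        # that makes SpMV bandwidth-bound in practice.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

The index arrays `col_idx` and `row_ptr` are pure metadata that must be streamed alongside the values, which is why the choice of storage format is so closely tied to SpMV performance.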

Abstract: Javad Doliskani, Pascal Giorgi, Romain Lebreton, Eric Schost

We present an algorithm for simultaneous conversions between a given set of integers and their Residue Number System representations based on linear algebra. We provide a highly optimized implementation of the algorithm that exploits the computational features of modern processors. The main application of our algorithm is matrix multiplication over integers. Our speed-up of the conversions to and from the Residue Number System significantly improves the overall running time of matrix multiplication. PubDate: Wed, 03 Jan 2018 00:00:00 GMT
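
For readers unfamiliar with the Residue Number System, the following sketch shows the two conversions in their textbook form, one integer at a time, using the Chinese Remainder Theorem. It does not reproduce the linear-algebra formulation or the optimizations of the abstract; the helper names `to_rns` and `from_rns` and the small moduli are assumptions chosen for illustration.

```python
from math import prod


def to_rns(a, moduli):
    """Map an integer to its residues modulo each (pairwise coprime) modulus."""
    return [a % m for m in moduli]


def from_rns(residues, moduli):
    """Recover the integer in [0, prod(moduli)) via the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M


moduli = [7, 11, 13]          # pairwise coprime, product 1001
a, b = 123, 7
ra, rb = to_rns(a, moduli), to_rns(b, moduli)

# Multiplication happens independently in each residue channel.
rc = [(x * y) % m for x, y, m in zip(ra, rb, moduli)]
c = from_rns(rc, moduli)      # valid because a * b < 1001
```

Arithmetic in the residue channels is cheap and independent, so in applications such as integer matrix multiplication the conversions in and out of the representation can become a significant share of the cost.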

We present “GEMM-like Tensor–Tensor multiplication” (GETT), a novel approach for dense tensor contractions that mirrors the design of a high-performance general matrix–matrix multiplication (GEMM). The critical insight behind GETT is the identification of three index sets, involved in the tensor contraction, which enable us to systematically reduce an arbitrary tensor contraction to loops around a highly tuned “macro-kernel.” This macro-kernel operates on suitably prepared (“packed”) sub-tensors that reside in a specified level of the cache hierarchy. In contrast to previous approaches to tensor contractions, GETT exhibits desirable features such as unit-stride memory accesses, cache-awareness, as well as full vectorization, without requiring auxiliary memory. PubDate: Wed, 03 Jan 2018 00:00:00 GMT

Abstract: Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, Carsten Dachsbacher

We consider the problem of sampling n numbers from the range {1, …, N} without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time O(n/p+log p) on p processors, i.e., scales to massively parallel machines even for moderate values of n. The amount of communication between the processors is very small (at most O(log p)) and independent of the sample size. PubDate: Wed, 03 Jan 2018 00:00:00 GMT
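
One way to picture a divide-and-conquer sampler of this kind is sketched below, under stated assumptions: the range is split in half, a hypergeometric draw decides how many samples fall in the left half, and each half is handled recursively. The function names, the `base` cutoff, and the deliberately naive O(n) hypergeometric generator are illustrative choices, not the authors' implementation, and the sketch is sequential.

```python
import random


def hypergeometric(rng, n_left, n_total, n_draws):
    """How many of n_draws items, drawn without replacement from n_total,
    land among the n_left 'left' items. Naive O(n_draws) simulation used
    as a stand-in for a fast hypergeometric generator."""
    hits = 0
    for _ in range(n_draws):
        if rng.random() * n_total < n_left:
            hits += 1
            n_left -= 1
        n_total -= 1
    return hits


def sample_range(rng, lo, hi, n, base=64):
    """Return n distinct integers from {lo, ..., hi} (requires n <= hi - lo + 1)."""
    size = hi - lo + 1
    if n == 0:
        return []
    if size <= base:
        # Small subproblem: fall back to a standard sequential sampler.
        return rng.sample(range(lo, hi + 1), n)
    left_size = size // 2
    mid = lo + left_size - 1
    k = hypergeometric(rng, left_size, size, n)
    # The two halves are independent once k is fixed, so in a parallel
    # setting they could be processed by different workers.
    return sample_range(rng, lo, mid, k, base) + sample_range(rng, mid + 1, hi, n - k, base)


rng = random.Random(0)
sample = sample_range(rng, 1, 1000, 100)
```

Once the split count is known, the subproblems share no state, which is consistent with the abstract's observation that communication does not grow with the sample size.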

CPUs and operating systems are moving from 32 to 64 bits, and hence it is important to have good pseudorandom number generators designed to fully exploit these word lengths. However, existing 64-bit very long period generators based on linear recurrences modulo 2 are not completely optimized in terms of the equidistribution properties. Here, we develop 64-bit maximally equidistributed pseudorandom number generators that are optimal in this respect and have speeds equivalent to 64-bit Mersenne Twisters. We provide a table of specific parameters with period lengths from 2^607 - 1 to 2^44497 - 1. PubDate: Wed, 03 Jan 2018 00:00:00 GMT

Most numerical ODE solvers require problems to be written as systems of first-order differential equations. This normally requires the user to rewrite higher-order differential equations as coupled first-order systems. Here, we introduce the treeVar class, written in object-oriented Matlab, which is capable of algorithmically reformulating higher-order ODEs to equivalent systems of first-order equations. This allows users to specify problems using a more natural syntax and saves them from having to manually derive the first-order reformulation. The technique works by using operator overloading to build up syntax trees of expressions as mathematical programs are evaluated. It then applies a set of rules to the resulting trees to obtain the first-order reformulation, which is returned as another program. PubDate: Wed, 03 Jan 2018 00:00:00 GMT
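
The reformulation that the abstract describes as automated can be illustrated by hand for a single scalar equation u^(k) = f(t, u, u', ..., u^(k-1)). The sketch below is written in Python rather than Matlab and does not use syntax trees or operator overloading; the helper name `to_first_order` is a hypothetical one introduced only for this example.

```python
def to_first_order(highest_derivative, order):
    """Turn u^(order) = highest_derivative(t, u, u', ..., u^(order-1))
    into a first-order system y' = rhs(t, y) with y = [u, u', ..., u^(order-1)].
    (Hypothetical helper; a manual version of the rewriting step.)"""
    def rhs(t, y):
        # y[i]' = y[i + 1] for the lower derivatives; the last component
        # is given by the original higher-order equation.
        return list(y[1:order]) + [highest_derivative(t, *y)]
    return rhs


# Harmonic oscillator u'' = -u, rewritten as y0' = y1, y1' = -y0.
oscillator = to_first_order(lambda t, u, du: -u, order=2)
derivs = oscillator(0.0, [1.0, 0.0])

# Van der Pol equation u'' = mu * (1 - u^2) * u' - u with mu = 2.
mu = 2.0
van_der_pol = to_first_order(lambda t, u, du: mu * (1 - u * u) * du - u, order=2)
vdp_derivs = van_der_pol(0.0, [2.0, 1.0])
```

The resulting `rhs` has the (t, y) signature that first-order solvers typically expect, which is the form the user would otherwise have to derive manually.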

We propose a set of new Fortran reference implementations, based on an algorithm proposed by Kahan, for the Level 1 BLAS routines *NRM2 that compute the Euclidean norm of a real or complex input vector. The principal advantage of these routines over the current offerings is that, rather than losing accuracy as the length of the vector increases, they generate results that are accurate to almost machine precision for vectors of length N < Nmax where Nmax depends upon the precision of the floating point arithmetic being used. In addition, we make use of intrinsic modules, introduced in the latest Fortran standards, to detect occurrences of non-finite numbers in the input data and return suitable values as well as setting IEEE floating point status flags as appropriate. PubDate: Mon, 18 Dec 2017 00:00:00 GMT
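
The two concerns raised in the abstract, accuracy for long vectors and sensible handling of non-finite inputs, can be illustrated with a short sketch. It is not Kahan's algorithm and not the proposed Fortran routines: it substitutes simple scaling by the largest magnitude and Python's exactly rounded `math.fsum`, and the function name `scaled_norm2` is an assumption.

```python
import math


def scaled_norm2(x):
    """Euclidean norm of a real vector, guarded against overflow.

    Stand-in technique: scale by the largest magnitude, then sum the
    squares with math.fsum. This is NOT the algorithm from the abstract.
    """
    if any(math.isnan(v) for v in x):
        return math.nan              # propagate NaN explicitly
    scale = max((abs(v) for v in x), default=0.0)
    if scale == 0.0 or math.isinf(scale):
        return scale                 # zero vector, empty vector, or infinite entry
    # Every ratio has magnitude <= 1, so squaring cannot overflow.
    return scale * math.sqrt(math.fsum((v / scale) ** 2 for v in x))


big = scaled_norm2([3e200, 4e200])   # squares would overflow if summed directly
plain = scaled_norm2([3.0, 4.0])
```

Squaring 3e200 directly exceeds the double-precision range, so a plain sum of squares cannot return a finite result for this input even though the norm itself is representable.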