To exploit both memory locality and the full performance potential of highly tuned kernels, dense linear algebra libraries, such as linear algebra package (LAPACK), commonly implement operations as blocked algorithms. However, to achieve near-optimal performance with such algorithms, significant tuning is required. In contrast, recursive algorithms are virtually tuning free and attain similar performance. In this article, we first analyze and compare blocked and recursive algorithms in terms of performance and then introduce recursive LAPACK (ReLAPACK), an open-source library of recursive algorithms to seamlessly replace many of LAPACK’s blocked algorithms. PubDate: Mon, 18 Sep 2017 00:00:00 GMT

A toolbox called ADiGator is described for algorithmically differentiating mathematical functions in MATLAB. ADiGator performs source transformation via operator overloading using forward mode algorithmic differentiation and produces a file that can be evaluated to obtain the derivative of the original function at a numeric value of the input. A convenient by-product of the file generation is the sparsity pattern of the derivative function. Moreover, because both the input and output to the algorithm are source codes, the algorithm may be applied recursively to generate derivatives of any order. A key component of the algorithm is its ability to statically exploit derivative sparsity at the MATLAB operation level to improve runtime performance. PubDate: Tue, 29 Aug 2017 00:00:00 GMT

We present a new simple algorithm for efficient, and relatively accurate computation of the Faddeyeva function w(z). The algorithm carefully exploits previous approximations by Hui et al. (1978) and Humlíček (1982) along with asymptotic expressions from Laplace continued fractions. Over a wide and fine grid of the complex argument, z = x + iy, numerical results from the present approximation show a maximum relative error less than 4.0 × 10−5 for both real and imaginary parts of w while running in a relatively shorter execution time than other competitive techniques. PubDate: Tue, 29 Aug 2017 00:00:00 GMT

Abstract: Jonathan Hogg, Jennifer Scott, Sue Thorne

Sparse symmetric indefinite problems arise in a large number of important application areas; they are often solved through the use of an LDLT factorization via a sparse direct solver. While for many problems prescaling the system matrix A is sufficient to maintain stability of the factorization, for a small but important fraction of problems numerical pivoting is required. Pivoting often incurs a significant overhead, and consequently, a number of techniques have been proposed to try and limit the need for pivoting. In particular, numerically aware ordering algorithms may be used, that is, orderings that depend not only on the sparsity pattern of A but also on the values of its (scaled) entries. PubDate: Wed, 16 Aug 2017 00:00:00 GMT

Implicitly described domains are a well-established tool in the simulation of time-dependent problems, for example, using level-set methods. To solve partial differential equations on such domains, a range of numerical methods was developed, for example, the Immersed Boundary method, the Unfitted Finite Element or Unfitted Discontinuous Galerkin methods, and the eXtended or Generalised Finite Element methods, just to name a few. Many of these methods involve integration over cut-cells or their boundaries, as they are described by sub-domains of the original level-set mesh. We present a new algorithm to geometrically evaluate the integrals over domains described by a first-order, conforming level-set function. PubDate: Wed, 16 Aug 2017 00:00:00 GMT

Abstract: Paul Springer, Jeff R. Hammond, Paolo Bientinesi

We present Tensor Transpose Compiler (TTC), an open-source parallel compiler for multidimensional tensor transpositions. To generate high-performance C++ code, TTC explores a number of optimizations, including software prefetching, blocking, loop-reordering, and explicit vectorization. To evaluate the performance of multidimensional transpositions across a range of possible use-cases, we also release a benchmark covering arbitrary transpositions of up to six dimensions. Performance results show that the routines generated by TTC achieve close to peak memory bandwidth on both the Intel Haswell and the AMD Steamroller architectures and yield significant performance gains over modern compilers. By implementing a set of pruning heuristics, TTC allows users to limit the number of potential solutions; this option is especially useful when dealing with high-dimensional tensors, as the search space might become prohibitively large. PubDate: Wed, 16 Aug 2017 00:00:00 GMT

Abstract: Sencer Nuri Yeralan, Timothy A. Davis, Wissam M. Sid-Lakhdar, Sanjay Ranka

Sparse matrix factorization involves a mix of regular and irregular computation, which is a particular challenge when trying to obtain high-performance on the highly parallel general-purpose computing cores available on graphics processing units (GPUs). We present a sparse multifrontal QR factorization method that meets this challenge and is significantly faster than a highly optimized method on a multicore CPU. Our method factorizes many frontal matrices in parallel and keeps all the data transmitted between frontal matrices on the GPU. PubDate: Wed, 16 Aug 2017 00:00:00 GMT

In order to solve a differential problem, the Laplace Transform method, when applicable, replaces the problem with a simpler one; the solution is obtained by solving the new problem and then by computing the inverse Laplace Transform of this function. In a numerical context, since the solution of the transformed problem consists of a sequence of Laplace Transform samples, most of the software for the numerical inversion cannot be used since the transform, among parameters, must be passed as a function. To fill this gap, we present Talbot Suite DE, a C software collection for Laplace Transform inversions, specifically designed for these problems and based on Talbot’s method. PubDate: Wed, 16 Aug 2017 00:00:00 GMT

A method to compute explicit solutions of homogeneous triangular systems of first-order linear initial-value ordinary differential equations with constant coefficients is described. It is suitable for the limited case of well separated eigenvalues, or for multiple zero eigenvalues provided the entire column corresponding to a zero eigenvalue is zero. The solution for the case of constant inhomogeneity is described. The method requires only the computation of a constant matrix using a simple recurrence. Computing the solutions of the system from that matrix, for values of the independent variable, requires one to exponentiate only the diagonal of a matrix. It is not necessary to compute the exponential of a general triangular matrix. PubDate: Wed, 16 Aug 2017 00:00:00 GMT

The detection of heterogeneity among objects (products, treatments, medical studies) assessed on a series of blocks (consumers, patients, methods, pathologists) is critical in numerous areas such as clinical research, cosmetic studies, or survey analysis. The Cochran’s Q test is the most widely used test for identifying heterogeneity on binary data (success vs. failure, cure vs. not cure, 1 vs. 0, etc.) . For a large number of blocks, the Q distribution can be approximated by a χ2 distribution. Unfortunately, this does not hold for limited sample sizes or sparse tables. In such situations, one has to either run Monte Carlo simulations or compute the exact Q distribution to obtain an accurate and reliable result. PubDate: Wed, 16 Aug 2017 00:00:00 GMT