Abstract: Rainer Agelek, Michael Anderson, Wolfgang Bangerth, William L. Barth

Finite element codes typically use data structures that represent unstructured meshes as collections of cells, faces, and edges, each of which require associated coordinate systems. One then needs to store how the coordinate system of each edge relates to that of neighboring cells. However, we can simplify data structures and algorithms if we can a priori orient coordinate systems in such a way that the coordinate systems on the edges follow uniquely from those on the cells by rule. Such rules require that every unstructured mesh allow the assignment of directions to edges that satisfy the convention in adjacent cells. We show that the convention chosen for unstructured quadrilateral meshes in the deal.II library always allows to orient meshes. PubDate: Mon, 24 Jul 2017 00:00:00 GMT

In this article, we explore the implementation of complex matrix multiplication. We begin by briefly identifying various challenges associated with the conventional approach, which calls for a carefully written kernel that implements complex arithmetic at the lowest possible level (i.e., assembly language). We then set out to develop a method of complex matrix multiplication that avoids the need for complex kernels altogether. This constraint promotes code reuse and portability within libraries such as Basic Linear Algebra Subprograms and BLAS-Like Library Instantiation Software (BLIS) and allows kernel developers to focus their efforts on fewer and simpler kernels. We develop two alternative approaches—one based on the 3m method and one that reflects the classic 4m formulation—each with multiple variants, all of which rely only on real matrix multiplication kernels. PubDate: Mon, 24 Jul 2017 00:00:00 GMT

The square root of a sum of squares is well known to be prone to overflow and underflow. Ad hoc scaling of intermediate results, as has been done in numerical software such as the BLAS and LAPACK, mostly avoids the problem, but it can still occur at extreme values in the range of representable numbers. More careful scaling, as has been implemented in recent versions of the standard algorithms, may come at the expense of performance or clarity. This work reimplements the vector 2-norm and the generation of Givens rotations from the Level 1 BLAS to improve their performance and design. PubDate: Mon, 24 Jul 2017 00:00:00 GMT

The 2Sum and Fast2Sum algorithms are important building blocks in numerical computing. They are used (implicitely or explicitely) in many compensated algorithms (such as compensated summation or compensated polynomial evaluation). They are also used for manipulating floating-point expansions. We show that these algorithms are much more robust than it is usually believed: The returned result makes sense even when the rounding function is not round-to-nearest, and they are almost immune to overflow. PubDate: Fri, 14 Jul 2017 00:00:00 GMT

Abstract: Máté Szőke, Tamás István Józsa, Ádám Koleszár, Irene Moulitsas, László Könözsy

The Unified Parallel C (UPC) language from the Partitioned Global Address Space (PGAS) family unifies the advantages of shared and local memory spaces and offers a relatively straightforward code parallelisation with the Central Processing Unit (CPU). In contrast, the Computer Unified Device Architecture (CUDA) development kit gives a tool to make use of the Graphics Processing Unit (GPU). We provide a detailed comparison between these novel techniques through the parallelisation of a two-dimensional lattice Boltzmann method based fluid flow solver. Our comparison between the CUDA and UPC parallelisation takes into account the required conceptual effort, the performance gain, and the limitations of the approaches from the application oriented developers’ point of view. PubDate: Fri, 14 Jul 2017 00:00:00 GMT

The T-matrix (TMAT) of a scatterer fully describes the way the scatterer interacts with incident fields and scatters waves, and is therefore used extensively in several science and engineering applications. The T-matrix is independent of several input parameters in a wave propagation model and hence the offline computation of the T-matrix provides an efficient reduced order model (ROM) framework for performing online scattering simulations for various choices of the input parameters. The authors developed and mathematically analyzed a numerically stable formulation for computing the T-matrix (J. Comput. Appl. Math. 234 (2010), 1702--1709). The TMATROM software package provides an object-oriented implementation of the numerically stable formulation and can be used in conjunction with the user’s preferred forward solver for the two-dimensional Helmholtz model. PubDate: Fri, 14 Jul 2017 00:00:00 GMT

Abstract: Daniel A. Brake, Daniel J. Bates, Wenrui Hao, Jonathan D. Hauenstein, Andrew J. Sommese, Charles W. Wampler

Bertini_real is a compiled command line program for numerically decomposing the real portion of a positive-dimensional complex component of an algebraic set. The software uses homotopy continuation to solve a series of systems via regeneration from a witness set to compute a cell decomposition. The implemented decomposition algorithms are similar to the well-known cylindrical algebraic decomposition (CAD) first established by Collins in that they produce a set of connected cells. In contrast to the CAD, Bertini_real produces cells with midpoints connected to boundary points by homotopies, which can easily be numerically tracked. Furthermore, the implemented decomposition for surfaces naturally yields a triangulation. PubDate: Fri, 14 Jul 2017 00:00:00 GMT

A new software for computing the singular value decomposition (SVD) of real or complex matrices is proposed. The method implemented in the code xGESVDQ is essentially the QR SVD algorithm available as xGESVD in LAPACK. The novelty is an extra step, the QR factorization with column (or complete row and column) pivoting, also already available in LAPACK as xGEQP3. For experts in matrix computations, the combination of the QR factorization and an SVD computation routine is not new. However, what seems to be new and important for applications is that the resulting procedure is numerically superior to xGESVD and that it is capable of reaching the accuracy of the Jacobi SVD. PubDate: Fri, 14 Jul 2017 00:00:00 GMT

A direct-search derivative-free Matlab optimizer for bound-constrained problems is described, whose remarkable features are its ability to handle a mix of continuous and discrete variables, a versatile interface as well as a novel self-training option. Its performance compares favorably with that of NOMAD (Nonsmooth Optimization by Mesh Adaptive Direct Search), a well-known derivative-free optimization package. It is also applicable to multilevel equilibrium- or constrained-type problems. Its easy-to-use interface provides a number of user-oriented features, such as checkpointing and restart, variable scaling, and early termination tools. PubDate: Wed, 28 Jun 2017 00:00:00 GMT

SYM-ILDL is a numerical software package that computes incomplete LDLT (ILDL) factorizations of symmetric indefinite and real skew-symmetric matrices. The core of the algorithm is a Crout variant of incomplete LU (ILU), originally introduced and implemented for symmetric matrices by Li and Saad [2005]. Our code is economical in terms of storage, and it deals with real skew-symmetric matrices as well as symmetric ones. The package is written in C++ and is templated, is open source, and includes a Matlab™ interface. The code includes built-in RCM and AMD reordering, two equilibration strategies, threshold Bunch-Kaufman pivoting, and rook pivoting, as well as a wrapper to MC64, a popular matching-based equilibration and reordering algorithm. PubDate: Tue, 11 Apr 2017 00:00:00 GMT

We introduce a family of implementations of low-order, additive, geometric multilevel solvers for systems of Helmholtz equations arising from Schrödinger equations. Both grid spacing and arithmetics may comprise complex numbers, and we thus can apply complex scaling to the indefinite Helmholtz operator. Our implementations are based on the notion of a spacetree and work exclusively with a finite number of precomputed local element matrices. They are globally matrix-free. Combining various relaxation factors with two grid transfer operators allows us to switch from additive multigrid over a hierarchical basis method into a Bramble-Pasciak-Xu (BPX)-type solver, with several multiscale smoothing variants within one code base. PubDate: Mon, 27 Mar 2017 00:00:00 GMT

Abstract: Fabio Luporini, David A. Ham, Paul H. J. Kelly

We present an algorithm for the optimization of a class of finite-element integration loop nests. This algorithm, which exploits fundamental mathematical properties of finite-element operators, is proven to achieve a locally optimal operation count. In specified circumstances the optimum achieved is global. Extensive numerical experiments demonstrate significant performance improvements over the state of the art in finite-element code generation in almost all cases. This validates the effectiveness of the algorithm presented here and illustrates its limitations. PubDate: Mon, 27 Mar 2017 00:00:00 GMT