Authors:Alberto Policriti; Nicola Prezza Pages: 1986 - 2011 Abstract: Computing the LZ77 factorization is a fundamental task in text compression and indexing, as the size z of this compressed representation is closely related to the self-repetitiveness of the text. A long-standing problem is to compute LZ77 using small working space. Considering that \(\mathcal {O}(z)\) words of space can be significantly (up to exponentially) smaller than the size n of the input text, even succinct and entropy-compressed solutions are often unduly memory-demanding. In this work we focus on an important measure of text repetitiveness: the number \(r\) of equal-letter runs in the Burrows–Wheeler transform of the reversed input text. Like z, the measure \(r\) is closely related to the number of repetitions in the text and can be exponentially smaller than n. We describe two algorithms that compute LZ77 in \(\mathcal {O}(r\log n)\) bits of working space and \(\mathcal {O}(n\log r)\) time. Roughly speaking, our algorithms store a constant number of memory words per BWT run to keep track of first/last run positions, together with a suitable indexing mechanism that samples the runs of the BWT (instead of its positions). Important consequences of our results include (i) the possibility to convert from RLBWT- to LZ77-based compressed formats without first decompressing the text, and (ii) the existence of asymptotically-optimal construction algorithms for repetition-aware self-indexes based on these compression techniques. We finally describe an implementation of our solutions and present extensive experiments on highly repetitive datasets. Our algorithms use working space as small as 1% of the dataset size and are two to three orders of magnitude more space-efficient (albeit slower) than existing solutions based, respectively, on entropy compression and suffix arrays. PubDate: 2018-07-01 DOI: 10.1007/s00453-017-0327-z Issue No:Vol. 80, No. 7 (2018)
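To pin down the factorization this abstract refers to, here is a minimal quadratic-time LZ77 sketch in Python. It is reference code for the definition only (each phrase is the longest earlier-occurring prefix plus one fresh character, overlaps allowed); the paper's contribution is computing the same factorization in \(\mathcal {O}(r\log n)\) bits of working space, which this sketch does not attempt.

```python
def lz77_factorize(text):
    # Greedy LZ77: each phrase = longest prefix of the remaining suffix that
    # also starts at an earlier position (self-overlap allowed), plus one
    # fresh character. Quadratic-time reference code for the definition.
    factors, i, n = [], 0, len(text)
    while i < n:
        best_len, best_pos = 0, -1
        for j in range(i):
            l = 0
            while i + l < n and text[j + l] == text[i + l]:
                l += 1
            if l > best_len:
                best_len, best_pos = l, j
        nxt = text[i + best_len] if i + best_len < n else ''
        factors.append((best_pos, best_len, nxt))
        i += best_len + (1 if nxt else 0)
    return factors

def lz77_decode(factors):
    # Inverse of the factorization; copies may overlap the output being built.
    out = []
    for pos, length, ch in factors:
        for k in range(length):
            out.append(out[pos + k])
        if ch:
            out.append(ch)
    return ''.join(out)
```

The number of factors returned is the measure z discussed above; on `"abababab"` the factorization has only 3 phrases, illustrating how z can be far smaller than n on repetitive input.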

Authors:Julian Arz; Johannes Fischer Pages: 2012 - 2047 Abstract: String dictionaries store a collection \(\left( s_i\right) _{0\le i < m}\) of m variable-length keys (strings) over an alphabet \(\varSigma \) and support the operations lookup (given a string \(s\in \varSigma ^*\) , decide if \(s_i=s\) for some i, and return this i) and access (given an integer \(0\le i < m\) , return the string \(s_i\) ). We show how to modify the Lempel–Ziv-78 data compression algorithm to store the strings space-efficiently and support the operations lookup and access in optimal time. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text. We achieve compression ratios often outperforming the existing alternatives, especially on dictionaries containing many repeated substrings. Our query times remain competitive. PubDate: 2018-07-01 DOI: 10.1007/s00453-017-0348-7 Issue No:Vol. 80, No. 7 (2018)
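As background for the entry above, this is a minimal Python sketch of the classical LZ78 parsing that the paper modifies: phrases are stored in a trie, and each new phrase extends an existing one by a single character. It only illustrates the parsing itself, not the paper's space-efficient dictionary supporting lookup and access.

```python
def lz78_parse(text):
    # LZ78 parsing: each phrase = longest already-seen phrase + one new char.
    # Phrases live in a trie encoded as a dict; phrase 0 is the empty string.
    trie = {}          # (node_id, char) -> node_id
    next_id = 1
    phrases = []       # (parent_phrase_id, char)
    node = 0
    for ch in text:
        if (node, ch) in trie:
            node = trie[(node, ch)]          # extend the current match
        else:
            trie[(node, ch)] = next_id       # register a new phrase
            phrases.append((node, ch))
            next_id += 1
            node = 0
    if node != 0:
        phrases.append((node, ''))           # dangling final phrase
    return phrases

def lz78_decode(phrases):
    # Rebuild each phrase from its parent and concatenate them in order.
    strs = ['']
    for parent, ch in phrases:
        strs.append(strs[parent] + ch)
    return ''.join(strs[1:])
```

Each phrase is a (parent, character) pair, which is exactly the trie structure that makes LZ78 a natural fit for string-dictionary operations.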

Authors:Johannes Fischer; Tomohiro I; Dominik Köppl; Kunihiko Sadakane Pages: 2048 - 2081 Abstract: We show that both the Lempel–Ziv-77 and the Lempel–Ziv-78 factorization of a text of length n on an integer alphabet of size \(\sigma \) can be computed in \(\mathcal {O}(n)\) time with either \(\mathcal {O}(n \lg \sigma)\) bits of working space, or \((1+\epsilon ) n \lg n + \mathcal {O}(n)\) bits (for a constant \(\epsilon >0\) ) of working space (including the space for the output, but not the text). PubDate: 2018-07-01 DOI: 10.1007/s00453-017-0333-1 Issue No:Vol. 80, No. 7 (2018)

Authors:Markus Lohrey; Sebastian Maneth; Carl Philipp Reh Pages: 2082 - 2105 Abstract: A linear-space data structure for grammar-compressed trees is presented that supports tree traversal operations and subtree equality checks in constant time. A traversal step consists of moving to the parent or to the ith child of a node. PubDate: 2018-07-01 DOI: 10.1007/s00453-017-0331-3 Issue No:Vol. 80, No. 7 (2018)

Authors:Shahin Kamali Pages: 2106 - 2131 Abstract: The notion of clique-width for graphs offers many research topics and has received considerable attention in the past decade. A graph has clique-width k if it can be represented as an algebraic expression on k labels associated with its vertices. Many computationally hard problems can be solved in polynomial time for graphs of bounded clique-width. Interestingly, many graph families also have bounded clique-width. In this paper, we consider the problem of preprocessing a graph of size n and clique-width k to build space-efficient data structures that support basic graph navigation queries. First, by way of a counting argument, which is of interest in its own right, we prove that any representation requires \(\varOmega (kn)\) bits of space. Then we present a navigation oracle which answers adjacency queries in constant time and neighborhood queries in constant time per neighbor, using O(kn) bits of space. We also support degree queries, reporting the degree of a given vertex in \(O(k\log ^*(n))\) time using \(O(kn \log ^*(n))\) bits. PubDate: 2018-07-01 DOI: 10.1007/s00453-017-0365-6 Issue No:Vol. 80, No. 7 (2018)

Authors:Yasushi Kawase; Kazuhisa Makino; Kento Seimi Pages: 2134 - 2159 Abstract: In this paper, we introduce maximum composition ordering problems. The input is n real functions \(f_1,\dots ,f_n:\mathbb {R}\rightarrow \mathbb {R}\) and a constant \(c\in \mathbb {R}\) . We consider two settings: total and partial compositions. The maximum total composition ordering problem is to compute a permutation \(\sigma :[n]\rightarrow [n]\) which maximizes \(f_{\sigma (n)}\circ f_{\sigma (n-1)}\circ \dots \circ f_{\sigma (1)}(c)\) , where \([n]=\{1,\dots ,n\}\) . The maximum partial composition ordering problem is to compute a permutation \(\sigma :[n]\rightarrow [n]\) and a nonnegative integer \(k~(0\le k\le n)\) which maximize \(f_{\sigma (k)}\circ f_{\sigma (k-1)}\circ \dots \circ f_{\sigma (1)}(c)\) . We propose \(\mathrm {O}(n\log n)\) time algorithms for the maximum total and partial composition ordering problems for monotone linear functions \(f_i\) , which generalize linear deterioration and shortening models for the time-dependent scheduling problem. We also show that the maximum total composition ordering problem can be solved in polynomial time if \(f_i\) is of the form \(\max \{a_ix+b_i,d_i,x\}\) for some constants \(a_i\,(\ge 0)\) , \(b_i\) and \(d_i\) . As a corollary, we show that the two-valued free-order secretary problem can be solved in polynomial time. We finally prove that there exists no constant-factor approximation algorithm for the problems, even if \(f_i\) ’s are monotone, piecewise linear functions with at most two pieces, unless P \(=\) NP. PubDate: 2018-07-01 DOI: 10.1007/s00453-017-0397-y Issue No:Vol. 80, No. 7 (2018)
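To make the problem statement above concrete, here is an exponential-time brute-force reference in Python for the maximum partial composition ordering problem: it tries every permutation and every prefix length k. It only fixes the semantics of the problem; the paper's \(\mathrm {O}(n\log n)\) algorithms for monotone linear \(f_i\) are far more involved.

```python
from itertools import permutations

def max_partial_composition(funcs, c):
    # Brute force over all (permutation, prefix length) pairs: for each
    # ordering, track the running value after applying each prefix of it.
    # Exponential reference code; not the paper's O(n log n) algorithm.
    best = c  # k = 0: apply no function at all
    for perm in permutations(funcs):
        x = c
        for f in perm:
            x = f(x)
            best = max(best, x)   # value after this prefix of the ordering
    return best
```

For instance, with \(f_1(x)=2x\), \(f_2(x)=x+3\), \(f_3(x)=x/2\) and \(c=1\), the best choice is to apply \(f_2\) then \(f_1\) and stop (value 8), illustrating why both the ordering and the stopping point k matter.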

Authors:Hans L. Bodlaender; Hirotaka Ono; Yota Otachi Pages: 2160 - 2180 Abstract: The problem Max W-Light (Max W-Heavy) for an undirected graph is to assign a direction to each edge so that the number of vertices of outdegree at most W (resp. at least W) is maximized. It is known that these problems are NP-hard even for fixed W. For example, Max 0-Light is equivalent to the problem of finding a maximum independent set. In this paper, we show that for any fixed constant W, Max W-Heavy can be solved in linear time for hereditary graph classes for which treewidth is bounded by a function of degeneracy. We show that such graph classes include chordal graphs, circular-arc graphs, d-trapezoid graphs, chordal bipartite graphs, and graphs of bounded clique-width. To apply the metatheorem of Fomin et al. (SIAM J Comput 44:54–87, 2015) and obtain a polynomial-time algorithm for Max W-Light, we additionally need a polynomial upper bound on the number of potential maximal cliques. The aforementioned graph classes, except graphs of bounded clique-width, satisfy this condition. For graphs of bounded clique-width, we present a dynamic programming approach that does not use the metatheorem, showing that the problem is polynomial-time solvable for this class too. We also study the parameterized complexity of the problems and show some tractability and intractability results. PubDate: 2018-07-01 DOI: 10.1007/s00453-017-0399-9 Issue No:Vol. 80, No. 7 (2018)
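The orientation problem above is easy to state executably. The following Python sketch brute-forces all \(2^{|E|}\) orientations of a tiny graph and maximizes the number of W-light vertices; it is exponential reference code for the definition only, not the structured-class algorithms of the paper.

```python
from itertools import product

def max_w_light(edges, n, w):
    # Try every orientation of the edge list (bit b = 0 orients u -> v,
    # b = 1 orients v -> u) and count vertices of outdegree at most w.
    # Exponential brute force, usable only on very small graphs.
    best = 0
    for bits in product((0, 1), repeat=len(edges)):
        out = [0] * n
        for (u, v), b in zip(edges, bits):
            out[u if b == 0 else v] += 1
        best = max(best, sum(1 for d in out if d <= w))
    return best
```

On a triangle, the best value for W = 0 is 1 (matching the maximum-independent-set equivalence mentioned above), while orienting the edges as a directed cycle makes all three vertices 1-light.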

Authors:T.-H. Hubert Chan; Zhihao Gavin Tang; Xiaowei Wu Pages: 2181 - 2200 Abstract: We study the max–min fair allocation problem in which a set of m indivisible items are to be distributed among n agents such that the minimum utility among all agents is maximized. In the restricted setting, the utility of each item j on agent i is either 0 or some non-negative weight \(w_j\) . For this setting, Asadpour et al. (ACM Trans Algorithms 8(3):24, 2012) showed that a certain configuration-LP can be used to estimate the optimal value to within a factor of \(4+\delta \) , for any \(\delta >0\) , which was recently extended by Annamalai et al. (in: Indyk (ed) Proceedings of the twenty-sixth annual ACM–SIAM symposium on discrete algorithms, SODA 2015, San Diego, CA, USA, January 4–6, 2015) to give a polynomial-time 13-approximation algorithm for the problem. For hardness results, Bezáková and Dani (SIGecom Exch 5(3):11–18, 2005) showed that it is \(\mathsf {NP}\) -hard to approximate the problem within any ratio smaller than 2. In this paper we consider the \((1,\epsilon )\) -restricted max–min fair allocation problem in which each item j is either heavy \((w_j = 1)\) or light \((w_j = \epsilon )\) , for some parameter \(\epsilon \in (0,1)\) . We show that the \((1,\epsilon )\) -restricted case is also \(\mathsf {NP}\) -hard to approximate within any ratio smaller than 2. Using the configuration-LP, we are able to estimate the optimal value of the problem to within a factor of \(3+\delta \) , for any \(\delta >0\) . Extending this idea, we also obtain a quasi-polynomial time \((3+4\epsilon )\) -approximation algorithm and a polynomial time 9-approximation algorithm. Moreover, we show that as \(\epsilon \) tends to 0, the approximation ratio of our polynomial-time algorithm approaches \(3+2\sqrt{2}\approx 5.83\) . PubDate: 2018-07-01 DOI: 10.1007/s00453-018-0407-8 Issue No:Vol. 80, No. 7 (2018)
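For readers new to max–min fair allocation, this Python sketch brute-forces all assignments of items to agents and maximizes the minimum total utility. It is exponential reference code defining the objective, not the configuration-LP techniques of the paper.

```python
from itertools import product

def max_min_allocation(weights, n_agents):
    # weights[i][j] = utility of item j for agent i (in the restricted
    # setting, either 0 or a common item weight w_j). Each item goes to
    # exactly one agent; maximize the minimum total utility over agents.
    m = len(weights[0])
    best = 0
    for assign in product(range(n_agents), repeat=m):
        util = [0] * n_agents
        for j, i in enumerate(assign):
            util[i] += weights[i][j]
        best = max(best, min(util))
    return best
```

With a single item and two agents, the optimum is 0 regardless of weights: someone must go empty-handed, which is the tension the approximation algorithms above navigate.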

Authors:Shahram Ghandeharizadeh; Sandy Irani; Jenny Lam Pages: 2201 - 2220 Abstract: We introduce the subset assignment problem in which items of varying sizes are placed in a set of bins with limited capacity. Items can be replicated and placed in any subset of the bins. Each (item, subset) pair has an associated cost. Not assigning an item to any of the bins is not free in general and can potentially be the most expensive option. The goal is to minimize the total cost of assigning items to subsets without exceeding the bin capacities. The subset assignment problem models the problem of managing a cache composed of banks of memory with varying cost/performance specifications. The ability to replicate a data item in more than one memory bank can benefit the overall performance of the system with a faster recovery time in the event of a memory failure. For this setting, the number n of data objects (items) is very large and the number d of memory banks (bins) is a small constant (on the order of 3 or 4). Therefore, the goal is to determine an optimal assignment in time that minimizes dependence on n. The integral version of this problem is NP-hard since it is a generalization of the knapsack problem. We focus on an efficient solution to the LP relaxation as the number of fractionally assigned items will be at most d. If the data objects are small with respect to the size of the memory banks, the effect of excluding the fractionally assigned data items from the cache will be small. We give an algorithm that solves the LP relaxation and runs in time \(O(\binom{3^d}{d+1} {\text {poly}}(d)\, n \log (n) \log (nC) \log (Z))\) , where Z is the maximum item size and C the maximum storage cost. PubDate: 2018-07-01 DOI: 10.1007/s00453-017-0403-4 Issue No:Vol. 80, No. 7 (2018)

Authors:Mathias Weller; Annie Chateau; Clément Dallard; Rodolphe Giroudeau Pages: 1771 - 1803 Abstract: This paper is devoted to new results on the scaffolding problem, a central problem of genome inference in bioinformatics. The problem consists of finding a collection of disjoint cycles and paths covering a particular graph called the “scaffold graph”. We examine the difficulty and the approximability of the scaffolding problem in special classes of graphs, either close to trees or very dense. We propose negative and positive results, exploring the frontier between hardness and tractability of computing and/or approximating a solution to the problem. We also explore a new direction through related problems that consist of finding a family of edges having a strong effect on the solution weight. PubDate: 2018-06-01 DOI: 10.1007/s00453-018-0405-x Issue No:Vol. 80, No. 6 (2018)

Authors:Gennaro Cordasco; Luisa Gargano; Marco Mecchia; Adele A. Rescigno; Ugo Vaccaro Pages: 1804 - 1833 Abstract: Given a network represented by a graph \(G=(V,E)\) , we consider a dynamical process of influence diffusion in G that evolves as follows: Initially only the nodes of a given \(S\subseteq V\) are influenced; subsequently, at each round, the set of influenced nodes is augmented by all the nodes in the network that have a sufficiently large number of already influenced neighbors. The question is to determine a small subset of nodes S (a target set) that can influence the whole network. This is a widely studied problem that abstracts many phenomena in the social, economic, biological, and physical sciences. It is known that the above optimization problem is hard to approximate within a factor of \(2^{\log ^{1-\epsilon } |V|}\) , for any \(\epsilon >0\) . In this paper, we present a fast and surprisingly simple algorithm that exhibits the following features: (1) when applied to trees, cycles, or complete graphs, it always produces an optimal solution (i.e., a minimum-size target set); (2) when applied to arbitrary networks, it always produces a solution of cardinality which improves on previously known upper bounds; (3) when applied to real-life networks, it always produces solutions that substantially outperform the ones obtained by previously published algorithms (for which no proof of optimality or performance guarantee is known in any class of graphs). PubDate: 2018-06-01 DOI: 10.1007/s00453-017-0390-5 Issue No:Vol. 80, No. 6 (2018)
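The round-based diffusion process described above is straightforward to simulate. The Python sketch below runs the forward process for a given seed set and per-node thresholds; it only illustrates the dynamics, whereas the paper's algorithm addresses the hard part, choosing a small target set S.

```python
def diffuse(adj, thresholds, seeds):
    # Threshold diffusion: a node v joins the influenced set once at least
    # thresholds[v] of its neighbors are influenced. Iterate to a fixpoint.
    influenced = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v not in influenced:
                if sum(1 for u in adj[v] if u in influenced) >= thresholds[v]:
                    influenced.add(v)
                    changed = True
    return influenced
```

On a path with all thresholds equal to 1, a single endpoint seed influences the whole graph; raising every threshold to 2 stops the diffusion immediately, showing how thresholds govern the size of a sufficient target set.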

Authors:Yuichi Asahiro; Yuya Doi; Eiji Miyano; Kazuaki Samizo; Hirotaka Shimizu Pages: 1834 - 1856 Abstract: In this paper we study the (in)approximability of two distance-based relaxed variants of the maximum clique problem (Max Clique), named Max d-Clique and Max d-Club: A d-clique in a graph \(G = (V, E)\) is a subset \(S\subseteq V\) of vertices such that for every pair of vertices \(u, v\in S\) , the distance between u and v is at most d in G. A d-club in a graph \(G = (V, E)\) is a subset \(S'\subseteq V\) of vertices that induces a subgraph of G of diameter at most d. Given a graph G with n vertices, the goal of Max d-Clique (Max d-Club, resp.) is to find a d-clique (d-club, resp.) of maximum cardinality in G. Since Max 1-Clique and Max 1-Club are identical to Max Clique, the inapproximability of Max Clique shown by Zuckerman in 2007 carries over to them. Namely, Max 1-Clique and Max 1-Club cannot be efficiently approximated within a factor of \(n^{1-\varepsilon }\) for any \(\varepsilon > 0\) unless \(\mathcal{P} = \mathcal{NP}\) . Also, in 2002, Marinček and Mohar showed that it is \(\mathcal{NP}\) -hard to approximate Max d-Club to within a factor of \(n^{1/3-\varepsilon }\) for any \(\varepsilon >0\) and any fixed \(d\ge 2\) . In this paper, we strengthen the hardness result; we prove that, for any \(\varepsilon > 0\) and any fixed \(d\ge 2\) , it is \(\mathcal{NP}\) -hard to approximate Max d-Club to within a factor of \(n^{1/2-\varepsilon }\) . Then, we design a polynomial-time algorithm which achieves an optimal approximation ratio of PubDate: 2018-06-01 DOI: 10.1007/s00453-017-0344-y Issue No:Vol. 80, No. 6 (2018)
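The distinction between d-cliques and d-clubs drawn above is subtle and worth a concrete check. The Python sketch below verifies each property by BFS: a d-clique measures distances in the whole graph G, a d-club in the induced subgraph G[S]. Reference code for the definitions only.

```python
from collections import deque

def bfs_dists(adj, source):
    # Unweighted single-source shortest paths by breadth-first search.
    d = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in d:
                d[v] = d[u] + 1
                q.append(v)
    return d

def is_d_clique(adj, s, d):
    # Every pair of vertices of S at distance <= d in the whole graph G.
    return all(bfs_dists(adj, u).get(v, float('inf')) <= d
               for u in s for v in s if u != v)

def is_d_club(adj, s, d):
    # The induced subgraph G[S] itself has diameter <= d.
    sub = {u: [v for v in adj[u] if v in s] for u in s}
    return all(bfs_dists(sub, u).get(v, float('inf')) <= d
               for u in s for v in s if u != v)
```

The classic separating example is a star: its three leaves form a 2-clique (pairwise distance 2 through the center) but not a 2-club, since the induced subgraph on the leaves has no edges at all.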

Authors:Jessica Enright; Kitty Meeks Pages: 1857 - 1889 Abstract: Motivated by applications in network epidemiology, we consider the problem of determining whether it is possible to delete at most k edges from a given input graph (of small treewidth) so that the resulting graph avoids a set \(\mathcal {F}\) of forbidden subgraphs; of particular interest is the problem of determining whether it is possible to delete at most k edges so that the resulting graph has no connected component of more than h vertices, as this bounds the worst-case size of an epidemic. While even this special case of the problem is NP-complete in general (even when \(h=3\) ), we provide evidence that many of the real-world networks of interest are likely to have small treewidth, and we describe an algorithm which solves the general problem in time \(2^{O(|\mathcal {F}| w^r)}n\) on an input graph having n vertices and whose treewidth is bounded by a fixed constant w, if each of the subgraphs we wish to avoid has at most r vertices. For the special case in which we wish only to ensure that no component has more than h vertices, we improve on this to give an algorithm running in time \(O((wh)^{2w}n)\) , which we have implemented and tested on real datasets based on cattle movements. PubDate: 2018-06-01 DOI: 10.1007/s00453-017-0311-7 Issue No:Vol. 80, No. 6 (2018)
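The component-limiting special case above has a simple (exponential) brute-force formulation, sketched here in Python for tiny graphs: try every set of at most k edge deletions and check the largest surviving component via union-find. This only pins down the decision problem; the paper's treewidth-based dynamic program is what makes it tractable on real networks.

```python
from itertools import combinations
from collections import Counter

def can_limit_components(edges, n, k, h):
    # Decide whether deleting at most k edges leaves every connected
    # component with at most h vertices. Brute force over deletion sets.
    def max_comp(kept):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x
        for u, v in kept:
            parent[find(u)] = find(v)
        return max(Counter(find(x) for x in range(n)).values())
    for r in range(k + 1):
        for removed in combinations(edges, r):
            kept = [e for e in edges if e not in removed]
            if max_comp(kept) <= h:
                return True
    return False
```

On a 4-vertex path, cutting the single middle edge splits it into two components of size 2, so one deletion suffices for h = 2 while zero deletions do not.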

Authors:Cristina Bazgan; Janka Chlebikova; Thomas Pontoizeau Pages: 1890 - 1908 Abstract: We investigate the structural and algorithmic properties of 2-community structures in graphs, introduced recently by Olsen (Math Soc Sci 66(3):331–336, 2013). A 2-community structure is a partition of the vertex set into two parts such that, for each vertex, the numbers of neighbours inside and outside its own part and the sizes of the parts are correlated. We show that some well-studied graph classes, such as graphs of maximum degree 3, graphs of minimum degree at least \(|V|-3\) , trees, and others, always have a 2-community structure. Furthermore, a 2-community structure can be found in polynomial time in all these classes, even with the additional requirement that both parts be connected. We introduce the concept of a weak 2-community and prove that in general graphs it is NP-complete to find a balanced weak 2-community structure, with or without the connectivity requirement on both parts. On the other hand, we present a polynomial-time algorithm that solves the problem (without the connectivity condition) in graphs of degree at most 3. PubDate: 2018-06-01 DOI: 10.1007/s00453-017-0283-7 Issue No:Vol. 80, No. 6 (2018)

Authors:David Furcy; Scott M. Summers Pages: 1909 - 1963 Abstract: Tile self-assembly in which tiles may bind in a non-cooperative fashion is often referred to as “temperature 1 self-assembly” or simply “non-cooperative self-assembly”. In this type of self-assembly, a tile may non-cooperatively bind to an assembly via (at least) one of its sides, unlike in cooperative self-assembly, in which some tiles may be required to bind on two or more sides. Cooperative self-assembly leads to highly non-trivial theoretical behavior but two-dimensional non-cooperative self-assembly is conjectured to be only capable of producing highly-regular shapes and patterns, which, in general, cannot be interpreted as complex computation. Remarkably, Cook et al. (Temperature 1 self-assembly: deterministic assembly in 3D and probabilistic assembly in 2D. In: Proceedings of the 22nd Annual ACM–SIAM Symposium on Discrete Algorithms 2011) showed that three-dimensional non-cooperative self-assembly is essentially as powerful as cooperative self-assembly, with respect to Turing universality and self-assembly of \(N \times N\) squares. In this paper, we consider the non-cooperative self-assembly of just-barely 3D shapes. Working in a three-dimensional variant of Winfree’s abstract Tile Assembly Model, we show that, for an arbitrary finite, connected shape \(X \subset {\mathbb {Z}}^2\) , there is a tile set that uniquely self-assembles at temperature 1 into a 3D representation of a scaled-up version of X with optimal program-size complexity, where the “program-size complexity”, also known as “tile complexity”, of a shape is the minimum number of tile types required to uniquely self-assemble it. Moreover, our construction is “just barely” 3D in the sense that it only places tiles in the \(z = 0\) and \(z = 1\) planes. Our result is essentially a just-barely 3D temperature 1 simulation of a similar 2D temperature 2 result by Soloveichik and Winfree (SIAM J Comput 36(6):1544–1569, 2007). 
PubDate: 2018-06-01 DOI: 10.1007/s00453-016-0260-6 Issue No:Vol. 80, No. 6 (2018)

Authors:Minghui Jiang Pages: 1964 - 1982 Abstract: For any \(k \ge 2\) , deciding whether the linear arboricity, star arboricity, caterpillar arboricity, spider arboricity, track number, unit track number, and subchromatic index, respectively, of a bipartite graph are at most k are all NP-complete. PubDate: 2018-06-01 DOI: 10.1007/s00453-016-0237-5 Issue No:Vol. 80, No. 6 (2018)