Authors: J. Ian Munro; Gonzalo Navarro; Jesper Sindahl Nielsen; Rahul Shah; Sharma V. Thankachan Pages: 379-393 Abstract: Let \(\mathcal{D} = \{\mathsf{T}_1, \mathsf{T}_2, \ldots, \mathsf{T}_D\}\) be a collection of D string documents of n characters in total, drawn from an alphabet \(\varSigma = [\sigma]\). The top-k document retrieval problem is to preprocess \(\mathcal{D}\) into a data structure that, given a query \((P[1\ldots p], k)\), can return the k documents of \(\mathcal{D}\) most relevant to the pattern P. Relevance is captured by a predefined ranking function that depends on the set of occurrences of P in \(\mathsf{T}_d\). For example, it can be the term frequency (the number of occurrences of P in \(\mathsf{T}_d\)), the term proximity (the distance between the closest pair of occurrences of P in \(\mathsf{T}_d\)), or a pattern-independent importance score of \(\mathsf{T}_d\) such as PageRank. Linear-space and optimal-query-time solutions already exist for the general top-k document retrieval problem. Compressed and compact space solutions are also known, but only for a few ranking functions such as term frequency and importance. However, space-efficient data structures for term-proximity-based retrieval have been elusive. In this paper we present the first sub-linear space data structure for this relevance function, which uses only o(n) bits on top of any compressed suffix array of \(\mathcal{D}\) and solves queries in \(O((p+k)\,\mathrm{polylog}\,n)\) time. We also show that scores consisting of a weighted combination of term proximity, term frequency, and document importance can be handled using twice the space required to represent the text collection. PubDate: 2017-06-01 DOI: 10.1007/s00453-016-0167-2 Issue No: Vol. 78, No. 2 (2017)

Authors: Paola Bonizzoni; Gianluca Della Vedova; Yuri Pirola; Marco Previtali; Raffaella Rizzi Pages: 394-424 Abstract: Some recent results (Bauer et al. in Algorithms in bioinformatics, Springer, Berlin, pp 326–337, 2012; Cox et al. in Algorithms in bioinformatics, Springer, Berlin, pp 214–224, 2012; Rosone and Sciortino in The nature of computation. Logic, algorithms, applications, Springer, Berlin, pp 353–364, 2013) have introduced external-memory algorithms to compute self-indexes of a set of strings, mainly via computing the Burrows–Wheeler transform of the input strings. The motivation for those results stems from bioinformatics, where a large number of short strings (called reads) are routinely produced and analyzed. In that field, a fundamental problem is to assemble a genome from a large set of much shorter samples extracted from the unknown genome. The approaches currently used to tackle this problem are memory-intensive, which does not bode well given the ongoing increase in the availability of genomic data. A data structure used in genome assembly is the string graph, where vertices correspond to samples and arcs connect two overlapping samples. In this paper we address an open problem of Simpson and Durbin (Bioinformatics 26(12):i367–i373, 2010): to design an external-memory algorithm to compute the string graph. PubDate: 2017-06-01 DOI: 10.1007/s00453-016-0165-4 Issue No: Vol. 78, No. 2 (2017)

Authors:Allan Borodin; Mark Braverman; Brendan Lucier; Joel Oren Pages: 425 - 452 Abstract: Motivated by applications to word-of-mouth advertising, we consider a game-theoretic scenario in which competing advertisers want to target initial adopters in a social network. Each advertiser wishes to maximize the resulting cascade of influence, modeled by a general network diffusion process. However, competition between products may adversely impact the rate of adoption for any given firm. The resulting framework gives rise to complex preferences that depend on the specifics of the stochastic diffusion model and the network topology. We study this model from the perspective of a central mechanism, such as a social networking platform, that can optimize seed placement as a service for the advertisers. We ask: given the reported budgets of the competing firms, how should a mechanism choose seeds to maximize overall efficiency? Beyond the algorithmic problem, competition raises issues of strategic behaviour: rational agents should be incentivized to truthfully report their advertising budget. For a general class of influence spread models, we show that when there are two players, the social welfare can be \(\frac{e}{e-1}\) -approximated by a polynomial-time strategyproof mechanism. Our mechanism uses a dynamic programming procedure to randomize the order in which advertisers are allocated seeds according to a greedy method. For three or more players, we demonstrate that under an additional assumption (satisfied by many existing models of influence spread) there exists a simpler strategyproof \(\frac{e}{e-1}\) -approximation mechanism; notably, this natural greedy mechanism is not necessarily strategyproof when there are only two players. PubDate: 2017-06-01 DOI: 10.1007/s00453-016-0169-0 Issue No:Vol. 78, No. 2 (2017)
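The greedy subroutine underlying the mechanism above is the classic maximum-coverage heuristic: repeatedly pick the set with the largest marginal gain. The sketch below uses a made-up toy instance (sets of integers, not an influence-spread model) and shows only the greedy step, not the strategyproof randomization that is the paper's contribution:

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time taking the one that covers the most
    still-uncovered elements -- the classic (1 - 1/e)-approximate greedy."""
    covered, chosen = set(), []
    for _ in range(k):
        gains = [len(s - covered) for s in sets]
        best = max(range(len(sets)), key=lambda i: gains[i])
        if gains[best] == 0:
            break  # nothing left to cover
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

# toy instance: universe {1..7}, four candidate sets
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 5}]
chosen, covered = greedy_max_coverage(sets, 2)  # picks sets 2 and 0
```

Greedy works here because coverage is monotone and submodular; the same marginal-gain argument underlies the influence-spread models treated in the paper.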

Authors:Jelena Marašević; Clifford Stein; Gil Zussman Pages: 521 - 557 Abstract: This paper considers max-min fair rate allocation and routing in energy harvesting networks where fairness is required among both the nodes and the time slots. Unlike most previous work on fairness, we focus on multihop topologies and consider different routing methods. We assume a predictable energy profile and focus on the design of efficient and optimal algorithms that can serve as benchmarks for distributed and approximate algorithms. We first develop an algorithm that obtains a max-min fair rate assignment for any routing that is specified at the input. We then turn to the problem of determining a “good” routing. For time-invariable unsplittable routing, we develop an algorithm that finds routes that maximize the minimum rate assigned to any node in any slot. For fractional routing, we derive a combinatorial algorithm for the time-invariable case with constant rates. We show that the time-variable case is at least as hard as the 2-commodity feasible flow problem and design an FPTAS to combat the high running time. Finally, we show that finding an unsplittable routing or a routing tree that provides lexicographically maximum rate assignment (i.e., the best in the max-min fairness terms) is NP-hard, even for a time horizon of a single slot. Our analysis provides insights into the problem structure and can be applied to other related fairness problems. PubDate: 2017-06-01 DOI: 10.1007/s00453-016-0171-6 Issue No:Vol. 78, No. 2 (2017)
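The max-min fairness objective can be illustrated on the simplest possible instance, a single shared resource: progressive filling raises all rates together and freezes a flow once its demand is met. This sketch deliberately ignores routing, multihop topology, energy harvesting, and time slots, which are the paper's actual subject:

```python
def max_min_fair(demands, capacity):
    """Progressive filling on one shared resource: give every unsatisfied
    flow an equal share of the remaining capacity, freezing flows whose
    demand fits inside the share; repeat until rates stabilize."""
    alloc = [0.0] * len(demands)
    unsat = list(range(len(demands)))
    cap = float(capacity)
    while unsat:
        share = cap / len(unsat)
        done = [i for i in unsat if demands[i] <= share]
        if not done:
            for i in unsat:  # no one can be fully satisfied: split equally
                alloc[i] = share
            break
        for i in done:
            alloc[i] = float(demands[i])
            cap -= demands[i]
        unsat = [i for i in unsat if i not in done]
    return alloc

rates = max_min_fair([2, 8, 10], 15)  # -> [2.0, 6.5, 6.5]
```

The resulting vector is lexicographically maximal: no flow's rate can be raised without lowering the rate of a flow that is already no better off.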

Authors: Benjamin Doerr; Frank Neumann; Andrew M. Sutton Pages: 561-586 Abstract: We contribute to the theoretical understanding of randomized search heuristics by investigating their optimization behavior on satisfiable random k-satisfiability instances, both in the planted solution model and in the uniform model conditioned on satisfiability. Denoting the number of variables by n, our main technical result is that the simple \((1+1)\) evolutionary algorithm with high probability finds a satisfying assignment in time \(O(n \log n)\) when the clause–variable density is at least logarithmic. For low-density instances, evolutionary algorithms seem to be less effective, and all we can show is a subexponential upper bound on the runtime for densities below \(\frac{1}{k(k-1)}\). We complement these mathematical results with numerical experiments on a broader density spectrum. They indicate that, indeed, the \((1+1)\) EA is less efficient on lower densities. Our experiments also suggest that the implicit constants hidden in our main runtime guarantee are low. Our main result extends and considerably improves the result obtained by Sutton and Neumann (Lect Notes Comput Sci 8672:942–951, 2014) in terms of runtime, minimum density, and clause length. These improvements are made possible by establishing a close fitness-distance correlation in certain parts of the search space. This approach might be of independent interest and could be useful for other average-case analyses of randomized search heuristics. While the notion of a fitness-distance correlation has been around for a long time, to the best of our knowledge, this is the first time that fitness-distance correlation is explicitly used to rigorously prove a performance statement for an evolutionary algorithm. PubDate: 2017-06-01 DOI: 10.1007/s00453-016-0190-3 Issue No: Vol. 78, No. 2 (2017)
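The \((1+1)\) EA analyzed above is easy to state: flip each bit independently with probability 1/n, and keep the offspring if it satisfies at least as many clauses. A runnable sketch on a toy satisfiable 3-SAT instance (the instance, step budget, and seed are illustrative choices, not from the paper):

```python
import random

def one_plus_one_ea(clauses, n, steps=100000, seed=0):
    """(1+1) EA: flip each bit independently with probability 1/n and
    keep the offspring if it satisfies at least as many clauses."""
    rng = random.Random(seed)

    def fitness(x):
        # a clause (tuple of signed 1-indexed literals) is satisfied if
        # some literal evaluates to true under assignment x
        return sum(any(x[abs(l) - 1] == (l > 0) for l in c) for c in clauses)

    x = [rng.random() < 0.5 for _ in range(n)]
    fx = fitness(x)
    for _ in range(steps):
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        fy = fitness(y)
        if fy >= fx:  # elitist acceptance
            x, fx = y, fy
        if fx == len(clauses):
            break
    return x, fx

# toy satisfiable 3-SAT instance over 5 variables (all-true satisfies it)
clauses = [(1, 2, 3), (-1, 4, 5), (2, -3, 4), (1, -4, 5), (-2, 3, 5)]
x, fx = one_plus_one_ea(clauses, 5)
```

On such a tiny instance the EA finds a satisfying assignment almost immediately; the paper's analysis concerns how this scales with n as the clause–variable density varies.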

Authors:Tiago Paixão; Jorge Pérez Heredia; Dirk Sudholt; Barbora Trubenová Pages: 681 - 713 Abstract: Evolutionary algorithms (EAs) form a popular optimisation paradigm inspired by natural evolution. In recent years the field of evolutionary computation has developed a rigorous analytical theory to analyse the runtimes of EAs on many illustrative problems. Here we apply this theory to a simple model of natural evolution. In the Strong Selection Weak Mutation (SSWM) evolutionary regime the time between occurrences of new mutations is much longer than the time it takes for a mutated genotype to take over the population. In this situation, the population only contains copies of one genotype and evolution can be modelled as a stochastic process evolving one genotype by means of mutation and selection between the resident and the mutated genotype. The probability of accepting the mutated genotype then depends on the change in fitness. We study this process, SSWM, from an algorithmic perspective, quantifying its expected optimisation time for various parameters and investigating differences to a similar evolutionary algorithm, the well-known (1+1) EA. We show that SSWM can have a moderate advantage over the (1+1) EA at crossing fitness valleys and study an example where SSWM outperforms the (1+1) EA by taking advantage of information on the fitness gradient. PubDate: 2017-06-01 DOI: 10.1007/s00453-016-0212-1 Issue No:Vol. 78, No. 2 (2017)
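In SSWM the probability of accepting a mutant depends smoothly on the fitness change via the classical Kimura fixation probability; unlike the elitist (1+1) EA, slightly deleterious mutations are accepted with small positive probability. A sketch with illustrative parameters β (selection strength) and N (population size), which are not values taken from the paper's proofs:

```python
import math

def p_fix(delta_f, beta=1.0, N=10):
    """SSWM acceptance probability (Kimura fixation probability) for a
    mutant whose fitness differs from the resident's by delta_f."""
    if delta_f == 0:
        return 1.0 / N  # neutral mutation: fixation probability 1/N
    x = 2.0 * beta * delta_f
    # (1 - e^{-x}) / (1 - e^{-N x}), written with expm1 for stability
    return math.expm1(-x) / math.expm1(-N * x)
```

Beneficial mutations (positive delta_f) are accepted with probability close to 1, while deleterious ones keep a tiny but non-zero acceptance probability; that non-elitism is what lets SSWM cross fitness valleys that trap the (1+1) EA.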

Authors: Dogan Corus; Jun He; Thomas Jansen; Pietro S. Oliveto; Dirk Sudholt; Christine Zarges Pages: 714-740 Abstract: Understanding which function classes are easy and which are hard for a given algorithm is a fundamental question in the analysis and design of bio-inspired search heuristics. A natural starting point is to consider the easiest and hardest functions for an algorithm. For the (1+1) EA using standard bit mutation (SBM) it is well known that OneMax is an easiest function with a unique optimum while Trap is a hardest. In this paper we extend the analysis of easiest function classes to the contiguous somatic hypermutation (CHM) operator used in artificial immune systems. We define a function MinBlocks and prove that it is an easiest function for the (1+1) EA using CHM, presenting both a runtime and a fixed-budget analysis. Since MinBlocks is, up to a factor of 2, a hardest function for standard bit mutations, we consider the effects of combining both operators into a hybrid algorithm. We rigorously prove that by combining the advantages of k operators, several hybrid algorithmic schemes achieve optimal asymptotic performance on the easiest functions for each individual operator. In particular, the hybrid algorithms using CHM and SBM have optimal asymptotic performance on both OneMax and MinBlocks. We then investigate easiest functions for hybrid schemes and show that an easiest function for a hybrid algorithm is not just a trivial weighted combination of the respective easiest functions for each operator. PubDate: 2017-06-01 DOI: 10.1007/s00453-016-0201-4 Issue No: Vol. 78, No. 2 (2017)
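One common formulation of the CHM operator flips a random contiguous (wrapping) block of bits, in contrast to SBM's independent per-bit flips; the paper's exact operator may differ in how the block is chosen, so treat this as an illustrative sketch:

```python
import random

def chm(x, rng):
    """Contiguous somatic hypermutation sketch: choose a random start
    position and a random length, then flip that contiguous (wrapping)
    block of bits. One common formulation, not necessarily the paper's."""
    n = len(x)
    start, length = rng.randrange(n), rng.randrange(n + 1)
    y = list(x)
    for j in range(length):
        y[(start + j) % n] ^= 1  # flip one bit of the block
    return y

rng = random.Random(0)
y = chm([0] * 8, rng)  # flips one contiguous circular block of zeros
```

Because whole blocks flip at once, CHM can cross plateaus that require many coordinated bit changes, which is exactly why its easiest function (MinBlocks) differs from SBM's (OneMax).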

Abstract: Given a set P of n moving points in fixed dimension d, where the trajectory of each point is a polynomial of degree bounded by some constant, we present a kinetic data structure (KDS) for maintenance of the closest pair on P. Assuming the closest pair distance is between 1 and \(\varDelta \) over time, our KDS uses \(O(n \log \varDelta )\) space and processes \(O(n^{2} \beta \log \varDelta \log n + n^{2} \beta \log \varDelta \log \log \varDelta )\) events, each in worst-case time \(O(\log ^2 n + \log ^2 \log \varDelta )\) . Here, \(\beta \) is an extremely slow-growing function. The locality of the KDS is \(O(\log n + \log \log \varDelta )\) . Our closest pair KDS supports insertions and deletions of points. An insertion or deletion takes worst-case time \(O(\log \varDelta \log ^2 n + \log \varDelta \log ^2 \log \varDelta )\) . Also, we use a similar approach to provide a KDS for the all \(\varepsilon \) -nearest neighbors in \(\mathbb {R}^d\) . The complexities of the previous KDSs, for both closest pair and all \(\varepsilon \) -nearest neighbors, have polylogarithmic factors, where the number of logs depends on dimension d. Assuming \(\varDelta \) is polynomial in n, our KDSs obtain improvements on the previous KDSs. Our solutions are based on a kinetic clustering on P. Though we use ideas from the previous clustering KDS by Hershberger, we simplify and improve his work. PubDate: 2017-06-21

Abstract: We present a satisfiability algorithm for k-indexed binary decision diagrams (k-IBDDs). The proposed deterministic, exponential-space algorithm solves the satisfiability of k-IBDDs, i.e., k-IBDD SAT, for instances with n variables and cn nodes in \(O\left( 2^{(1-\mu _k(c))n}\right) \) time, where \(\mu _k(c) = \varOmega \left( \frac{1}{(k^2 2^k\log {c})^{2^{k-1}-1}}\right) \). We also provide a deterministic, polynomial-space algorithm that solves polynomial-size instances of k-IBDD SAT for any constant \(k \ge 2\) in \(O\left( 2^{n - n^{1/2^{k-1}}}\right) \) time. In addition, the proposed algorithm is applicable to equivalence checking of two IBDDs. PubDate: 2017-06-20

Abstract: We study variants of the capacitated vehicle routing problem. In the multiple depot capacitated k-travelling repairmen problem (MD-CkTRP), we have a collection of clients, each to be served by one vehicle from a fleet of k identical vehicles based at given depots. Each client has a given demand that must be satisfied, and each vehicle can carry a total of at most Q demand before it must resupply at its original depot. We wish to route the vehicles in a way that obeys the constraints while minimizing the average time (latency) required to serve a client. This generalizes the multi-depot k-travelling repairman problem (MD-kTRP) (Chaudhuri et al. in 44th IEEE-FOCS, pp 36–45, 2003; Post and Swamy in 26th ACM-SIAM SODA, pp 512–531, 2015) to the capacitated vehicle setting, and while it has been previously studied (Lysgaard and Wøhlk in Eur J Oper Res 236(3):800–810, 2014), no approximation algorithm with a proven ratio is known. We give a 42.49-approximation to this general problem, and refine this constant to 25.49 when clients have unit demands. As far as we are aware, these are the first constant-factor approximations for capacitated vehicle routing problems with a latency objective. We achieve these results by developing a framework that allows us to solve a wider range of latency problems, and by crafting various orienteering-style oracles for use in this framework. We also show that a simple LP rounding algorithm has a better approximation ratio for the maximum coverage problem with groups (MCG), first studied by Chekuri and Kumar (Approximation, randomization, and combinatorial optimization, algorithms and techniques, pp 72–83, 2004), and use it as a subroutine in our framework. Our approximation ratio for MD-CkTRP restricted to the uncapacitated setting matches the best known bound (Post and Swamy in 26th ACM-SIAM SODA, pp 512–531, 2015). With our framework, any improvement to our oracles or to our MCG approximation yields an improved approximation for the corresponding k-TRP problem. PubDate: 2017-06-20

Authors: Marek Chrobak; Kevin P. Costello Abstract: We study information gathering in ad-hoc radio networks. Initially, each node of the network has a piece of information called a rumor, and the overall objective is to gather all these rumors in the designated target node. The ad-hoc property refers to the fact that the topology of the network is unknown when the computation starts. Aggregation of rumors is not allowed, which means that each node may transmit at most one rumor in one step. We focus on networks with tree topologies; that is, we assume that the network is a tree with all edges directed towards the root, but, being ad-hoc, its actual topology is not known. We provide two deterministic algorithms for this problem. For the model that assumes neither collision detection nor acknowledgement mechanisms, we give an \(O(n\log \log n)\)-time algorithm, improving the previous upper bound of \(O(n\log n)\). We then show that this running time can be reduced to O(n), which is asymptotically optimal, if the model allows for acknowledgements of successful transmissions. PubDate: 2017-06-20 DOI: 10.1007/s00453-017-0336-y

Authors: Hiroshi Hirai; Hiroyuki Namba Abstract: Björklund and Husfeldt developed a randomized polynomial-time algorithm to solve the shortest two disjoint paths problem. Their algorithm is based on computation of permanents modulo 4 and the isolation lemma. In this paper, we consider the following generalization of the shortest two disjoint paths problem, and develop a similar algebraic algorithm. The shortest perfect \((A+B)\)-path packing problem is: given an undirected graph G and two disjoint node subsets A, B with even cardinalities, find a shortest set of \(|A|/2 + |B|/2\) disjoint paths whose ends are both in A or both in B. Although this problem is NP-hard in general, we prove that it can be solved in randomized polynomial time if \(|A| + |B|\) is fixed. Our algorithm basically follows the framework of Björklund and Husfeldt but uses a new technique: computation of the hafnian modulo \(2^k\), combined with Gallai’s reduction from T-paths to matchings. We also generalize our technique to other path packing problems, and discuss its limitations. PubDate: 2017-06-14 DOI: 10.1007/s00453-017-0334-0

Authors: Anup Bhattacharya; Davis Issac; Ragesh Jaiswal; Amit Kumar Abstract: Space-efficient algorithms play an important role in dealing with large amounts of data. In such settings, one would like to analyze the large data using a small amount of “working space”. One of the key steps in many algorithms for analyzing large data is to maintain a random sample (or a small number of them) from the data points. In this paper, we consider two space-restricted settings: (i) the streaming model, where data arrives over time and one can use only a small amount of storage, and (ii) the query model, where we can structure the data in low space and answer sampling queries. We prove the following results in these two settings. In the streaming setting, we would like to maintain a random sample from the elements seen so far. We prove that one can maintain a random sample using \(O(\log n)\) random bits and \(O(\log n)\) bits of space, where n is the number of elements seen so far. We can extend this to the case when elements have weights as well. In the query model, there are n elements with weights \(w_1, \ldots , w_n\) (which are w-bit integers) and one would like to sample a random element with probability proportional to its weight. Bringmann and Larsen (STOC 2013) showed how to sample such an element using \(nw + 1\) bits of space (whereas the information-theoretic lower bound is nw). We consider the approximate sampling problem, where we are given an error parameter \(\varepsilon\), and the sampling probability of an element can be off by an \(\varepsilon\) factor. We give matching upper and lower bounds for this problem. PubDate: 2017-06-14 DOI: 10.1007/s00453-017-0335-z
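The streaming primitive in question is classical reservoir sampling (Algorithm R), sketched below. Note that the paper's contribution is maintaining the sample with only \(O(\log n)\) random bits in total, whereas Algorithm R draws fresh randomness for every element:

```python
import random

def reservoir_sample(stream, rng):
    """Algorithm R: maintain one uniform sample from a stream of unknown
    length; the t-th element replaces the sample with probability 1/t."""
    sample = None
    for t, item in enumerate(stream, start=1):
        if rng.randrange(t) == 0:
            sample = item
    return sample

# empirical check: sampling from a 5-element stream is (near-)uniform
rng = random.Random(0)
counts = {v: 0 for v in range(5)}
for _ in range(30000):
    counts[reservoir_sample(range(5), rng)] += 1
```

A simple induction shows each of the t elements seen so far is the current sample with probability exactly 1/t, using only one stored element.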

Authors:Yun Deng; David Fernández-Baca Abstract: We present a new graph-based approach to the following basic problem in phylogenetic tree construction. Let \(\mathcal {P}= \{T_1, \ldots , T_k\}\) be a collection of rooted phylogenetic trees over various subsets of a set of species. The tree compatibility problem asks whether there is a phylogenetic tree T with the following property: for each \(i \in \{1, \dots , k\}\) , \(T_i\) can be obtained from the restriction of T to the species set of \(T_i\) by contracting zero or more edges. If such a tree T exists, we say that \(\mathcal {P}\) is compatible and that T displays \(\mathcal {P}\) . Our approach leads to a \(O(M_\mathcal {P}\log ^2 M_\mathcal {P})\) algorithm for the tree compatibility problem, where \(M_\mathcal {P}\) is the total number of nodes and edges in \(\mathcal {P}\) . Our algorithm either returns a tree that displays \(\mathcal {P}\) or reports that \(\mathcal {P}\) is incompatible. Unlike previous algorithms, the running time of our method does not depend on the degrees of the nodes in the input trees. Thus, our algorithm is equally fast on highly resolved and highly unresolved trees. PubDate: 2017-06-06 DOI: 10.1007/s00453-017-0330-4

Authors: Rinat Ben-Avraham; Matthias Henze; Rafel Jaume; Balázs Keszegh; Orit E. Raz; Micha Sharir; Igor Tubis Abstract: We consider the problem of minimizing the RMS distance (sum of squared distances between pairs of matched points) under translation between two point sets A and B in the plane, with \(m = |B| \ll n = |A|\), in the partial-matching setup, in which each point in B is matched to a distinct point in A. Although the problem is not known to be polynomial, we establish several structural properties of the underlying subdivision \({\mathcal {D}}_{B,A}\) of the plane and derive improved bounds on its complexity. Specifically, we show that this complexity is \(O(n^2 m^{3.5} (e \ln m + e)^m)\), so it is only quadratic in \(|A|\). These results lead to the best known algorithm for finding a translation for which the partial-matching RMS distance between the point sets is minimized. In addition, we show how to compute a local minimum of the partial-matching RMS distance under translation in polynomial time. PubDate: 2017-06-06 DOI: 10.1007/s00453-017-0326-0

Authors: Vadim E. Levit; David Tankus Abstract: A graph G is well-covered if all its maximal independent sets are of the same cardinality. Assume that a weight function w is defined on its vertices. Then G is w-well-covered if all maximal independent sets are of the same weight. For every graph G, the set of weight functions w such that G is w-well-covered is a vector space, denoted WCW(G). Let B be a complete bipartite induced subgraph of G on vertex sets of bipartition \(B_{X}\) and \(B_{Y}\). Then B is generating if there exists an independent set S such that \(S \cup B_{X}\) and \(S \cup B_{Y}\) are both maximal independent sets of G. In the restricted case that a generating subgraph B is isomorphic to \(K_{1,1}\), the unique edge in B is called a relating edge. Deciding whether an input graph G is well-covered is co-NP-complete; therefore, finding WCW(G) is co-NP-hard. Deciding whether an edge is relating is NP-complete; therefore, deciding whether a subgraph is generating is NP-complete as well. In this article we discuss the connections among these problems, provide proofs of NP-completeness for several restricted cases, and present polynomial characterizations for some other cases. PubDate: 2017-06-02 DOI: 10.1007/s00453-017-0325-1
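Since deciding well-coveredness is co-NP-complete, no efficient general check is expected; for tiny graphs, though, the definition can be tested directly by brute force, which makes the notion concrete:

```python
from itertools import combinations

def maximal_mis_sizes(n, edges):
    """Brute force over all vertex subsets of a small graph: collect the
    maximal independent sets and return the set of their cardinalities."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    indep = [set(S)
             for r in range(n + 1)
             for S in combinations(range(n), r)
             if all(v not in adj[u] for u, v in combinations(S, 2))]
    # maximal = not strictly contained in any other independent set
    maximal = [S for S in indep if not any(S < T for T in indep)]
    return {len(S) for S in maximal}

def is_well_covered(n, edges):
    return len(maximal_mis_sizes(n, edges)) == 1

c4 = is_well_covered(4, [(0, 1), (1, 2), (2, 3), (3, 0)])  # True: {0,2} and {1,3}
p3 = is_well_covered(3, [(0, 1), (1, 2)])  # False: {1} and {0,2} differ in size
```

The 4-cycle is well-covered (both maximal independent sets have size 2), while the 3-vertex path is not, since {1} and {0, 2} are both maximal but have different sizes.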

Authors:David Eppstein; Philipp Kindermann; Stephen Kobourov; Giuseppe Liotta; Anna Lubiw; Aude Maignan; Debajyoti Mondal; Hamideh Vosoughpour; Sue Whitesides; Stephen Wismath Abstract: Motivated by applications in graph drawing and information visualization, we examine the planar split thickness of a graph, that is, the smallest k such that the graph is k-splittable into a planar graph. A k-split operation substitutes a vertex v by at most k new vertices such that each neighbor of v is connected to at least one of the new vertices. We first examine the planar split thickness of complete graphs, complete bipartite graphs, multipartite graphs, bounded degree graphs, and genus-1 graphs. We then prove that it is NP-hard to recognize graphs that are 2-splittable into a planar graph, and show that one can approximate the planar split thickness of a graph within a constant factor. If the treewidth is bounded, then we can even verify k-splittability in linear time, for a constant k. PubDate: 2017-06-02 DOI: 10.1007/s00453-017-0328-y

Authors:Sander P. A. Alewijnse; Kevin Buchin; Maike Buchin; Stef Sijben; Michel A. Westenberg Abstract: We present efficient algorithms for segmenting and classifying trajectories based on a movement model parameterised by a single parameter, like the Brownian bridge movement model. Segmentation is the problem of subdividing a trajectory into interior-disjoint parts such that each part is homogeneous in its movement characteristics. We formalise this using the likelihood of the model parameter, and propose a new algorithm for trajectory segmentation based on this. We consider the case where a discrete set of m parameter values is given and present an algorithm to compute an optimal segmentation with respect to an information criterion in O(nm) time for a trajectory with n sampling points. We also present an algorithm that efficiently computes the optimal segmentation if we allow the parameter values to be drawn from a continuous domain. Classification is the problem of assigning trajectories to classes of similar movement characteristics. The set of trajectories might for instance be the subtrajectories resulting from segmenting a trajectory, thus identifying movement phases. We give an algorithm to compute the optimal classification with respect to an information criterion in \(O(m^2 + km\log m)\) time for m parameter values and k trajectories, assuming bitonic likelihood functions. We also show that classification is NP-hard if the parameter values are allowed to vary continuously and present an algorithm that solves the problem in polynomial time under mild assumptions on the input. PubDate: 2017-06-02 DOI: 10.1007/s00453-017-0329-x