
Publisher: Springer-Verlag (Total: 2352 journals)

 Algorithmica   [SJR: 0.898]   [H-I: 56]   Hybrid journal (it can contain Open Access articles)   ISSN (Print) 0178-4617 - ISSN (Online) 1432-0541
• Special Issue: Algorithmic Tools in Cryptography
• Authors: Juan A. Garay; Rafail Ostrovsky
Pages: 985 - 986
PubDate: 2017-12-01
DOI: 10.1007/s00453-017-0368-3
Issue No: Vol. 79, No. 4 (2017)

• On the Information Ratio of Non-perfect Secret Sharing Schemes
• Authors: Oriol Farràs; Torben Brandt Hansen; Tarik Kaced; Carles Padró
Pages: 987 - 1013
Abstract: A secret sharing scheme is non-perfect if some subsets of players that cannot recover the secret value have partial information about it. The information ratio of a secret sharing scheme is the ratio between the maximum length of the shares and the length of the secret. This work is dedicated to the search for bounds on the information ratio of non-perfect secret sharing schemes and the construction of efficient linear non-perfect secret sharing schemes. To this end, we extend the known connections between matroids, polymatroids and perfect secret sharing schemes to the non-perfect case. In order to study non-perfect secret sharing schemes in all generality, we describe their structure through their access function, a real function that measures the amount of information on the secret value that is obtained by each subset of players. We prove that there exists a secret sharing scheme for every access function. Uniform access functions, that is, access functions whose values depend only on the number of players, generalize the threshold access structures. The optimal information ratio of the uniform access functions with rational values has been determined by Yoshida, Fujiwara and Fossorier. By using the tools that are described in our work, we provide a much simpler proof of that result and we extend it to access functions with real values.
PubDate: 2017-12-01
DOI: 10.1007/s00453-016-0217-9
Issue No: Vol. 79, No. 4 (2017)
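The information ratio discussed above has a natural baseline in the perfect case: in Shamir's classical threshold scheme each share is a single field element, exactly as long as the secret, so the ratio is 1; non-perfect schemes trade partial leakage for shorter shares. As an illustration of that baseline (this is standard Shamir sharing, not the paper's construction), a minimal sketch:

```python
import random

P = 2**61 - 1  # a Mersenne prime; secret and shares live in GF(P)

def share(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it (threshold t)."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at 0 over GF(P)."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

shares = share(12345, t=3, n=5)
assert reconstruct(shares[:3]) == 12345
```

Each share `(x, y)` carries one element of GF(P), the same size as the secret, which is what "information ratio 1" means concretely.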

• On Virtual Grey Box Obfuscation for General Circuits
• Authors: Nir Bitansky; Ran Canetti; Yael Tauman Kalai; Omer Paneth
Pages: 1014 - 1051
Abstract: An obfuscator $$\mathcal {O}$$ is Virtual Grey Box (VGB) for a class $$\mathcal {C}$$ of circuits if, for any $$C\in \mathcal {C}$$ and any predicate $$\pi$$ , deducing $$\pi (C)$$ given $$\mathcal {O}(C)$$ is tantamount to deducing $$\pi (C)$$ given unbounded computational resources and polynomially many oracle queries to C. VGB obfuscation is often significantly more meaningful than indistinguishability obfuscation (IO). In fact, for some circuit families of interest VGB is equivalent to full-fledged Virtual Black Box obfuscation. We investigate the feasibility of obtaining VGB obfuscation for general circuits. We first formulate a natural strengthening of IO, called strong IO (SIO). Essentially, $$\mathcal {O}$$ is SIO for class $$\mathcal {C}$$ if $$\mathcal {O}(C_0)\approx \mathcal {O}(C_1)$$ whenever the pair $$(C_0,C_1)$$ is taken from a distribution over $$\mathcal {C}$$ where, for all x, $$C_0(x)\ne C_1(x)$$ only with negligible probability. We then show that an obfuscator is VGB for a class $$\mathcal {C}$$ if and only if it is SIO for $$\mathcal {C}$$ . This result is unconditional and holds for any $$\mathcal {C}$$ . We also show that, for some circuit collections, SIO implies virtual black-box obfuscation. Finally, we formulate a slightly stronger variant of the semantic security property of graded encoding schemes [Pass-Seth-Telang Crypto 14], and show that existing obfuscators, such as the obfuscator of Barak et al. [Eurocrypt 14], are SIO for all circuits in $$NC^1$$ , assuming that the underlying graded encoding scheme satisfies our variant of semantic security. Put together, we obtain VGB obfuscation for all $$NC^1$$ circuits under assumptions that are almost the same as those used by Pass et al. to obtain IO for $$NC^1$$ circuits. We also observe that VGB obfuscation for all polynomial-size circuits implies the existence of semantically-secure graded encoding schemes with limited functionality known as jigsaw puzzles.
PubDate: 2017-12-01
DOI: 10.1007/s00453-016-0218-8
Issue No: Vol. 79, No. 4 (2017)

• On the Impossibility of Cryptography with Tamperable Randomness
• Authors: Per Austrin; Kai-Min Chung; Mohammad Mahmoody; Rafael Pass; Karn Seth
Pages: 1052 - 1101
Abstract: We initiate a study of the security of cryptographic primitives in the presence of efficient tampering attacks on the randomness of honest parties. More precisely, we consider p-tampering attackers that may efficiently tamper with each bit of the honest parties’ random tape with probability p, but have to do so in an “online” fashion. Our main result is a strong negative result: We show that any secure encryption scheme, bit commitment scheme, or zero-knowledge protocol can be “broken” with advantage $$\Omega (p)$$ by a p-tampering attacker. The core of this result is a new algorithm for biasing the output of bounded-value functions, which may be of independent interest. We also show that this result cannot be extended to primitives such as signature schemes and identification protocols: assuming the existence of one-way functions, such primitives can be made resilient to $$p$$ -tampering attacks for $$p=n^{-\alpha }$$ , where n is the security parameter and $$\alpha >0$$ is a constant.
PubDate: 2017-12-01
DOI: 10.1007/s00453-016-0219-7
Issue No: Vol. 79, No. 4 (2017)
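The $$\Omega (p)$$ advantage above comes from biasing a function of the random tape. A toy simulation makes the effect visible; note that the greedy attacker below (overwrite a tampered bit with 1, target the majority function) is a hypothetical illustration, not the paper's general biasing algorithm, which works for any bounded-value function:

```python
import random

def run(trials, n, p):
    """Fraction of trials in which MAJ(tape) = 1. With probability p per bit,
    the online attacker overwrites that bit with 1; otherwise the bit is honest."""
    ones = 0
    for _ in range(trials):
        tape = [1 if random.random() < p else random.randrange(2) for _ in range(n)]
        ones += sum(tape) > n // 2
    return ones / trials

random.seed(0)
honest, tampered = run(4000, 15, 0.0), run(4000, 15, 0.3)
# tampering a p-fraction of bits noticeably biases the majority towards 1
assert tampered - honest > 0.1
```

Each tampered bit shifts its expectation from 1/2 to (1+p)/2, and the majority amplifies this per-bit bias.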

• Scalable Zero Knowledge Via Cycles of Elliptic Curves
• Authors: Eli Ben-Sasson; Alessandro Chiesa; Eran Tromer; Madars Virza
Pages: 1102 - 1160
Abstract: Non-interactive zero-knowledge proofs of knowledge for general NP statements are a powerful cryptographic primitive, both in theory and in practical applications. Recently, much research has focused on achieving an additional property, succinctness, requiring the proof to be very short and easy to verify. Such proof systems are known as zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs), and are desired when communication is expensive, or the verifier is computationally weak. Existing zk-SNARK implementations have severe scalability limitations, in terms of space complexity as a function of the size of the computation being proved (e.g., running time of the NP statement’s decision program). First, the size of the proving key is quasilinear in the upper bound on the computation size. Second, producing a proof requires “writing down” all intermediate values of the entire computation, and then conducting global operations such as FFTs. The bootstrapping technique of Bitansky et al. (STOC ’13), following Valiant (TCC ’08), offers an approach to scalability, by recursively composing proofs: proving statements about acceptance of the proof system’s own verifier (and correctness of the program’s latest step). Alas, recursive composition of known zk-SNARKs has never been realized in practice, due to enormous computational cost. Using new elliptic-curve cryptographic techniques, and methods for exploiting the proof systems’ field structure and nondeterminism, we achieve the first zk-SNARK implementation that practically achieves recursive proof composition. Our zk-SNARK implementation runs random-access machine programs and produces proofs of their correct execution, on today’s hardware, for any program running time. It takes constant time to generate the keys that support all computation sizes. 
Subsequently, the proving process only incurs a constant multiplicative overhead compared to the original computation’s time, and an essentially-constant additive overhead in memory. Thus, our zk-SNARK implementation is the first to have a well-defined, albeit low, clock rate of “verified instructions per second”.
PubDate: 2017-12-01
DOI: 10.1007/s00453-016-0221-0
Issue No: Vol. 79, No. 4 (2017)

• Improved Generic Attacks Against Hash-Based MACs and HAIFA
• Authors: Itai Dinur; Gaëtan Leurent
Pages: 1161 - 1195
Abstract: The security of HMAC (and more general hash-based MACs) against state-recovery and universal forgery attacks was shown to be suboptimal, following a series of results by Leurent et al. and Peyrin et al. These results have shown that such powerful attacks require significantly less than $$2^{\ell }$$ computations, contradicting the common belief (where $$\ell$$ denotes the internal state size). In this work, we revisit and extend these results, with a focus on concrete hash functions that limit the message length, and apply special iteration modes. We begin by devising the first state-recovery attack on HMAC with a HAIFA hash function (using a block counter in every compression function call), with complexity $$2^{4\ell /5}$$ . Then, we describe improved tradeoffs between the message length and the complexity of a state-recovery attack on HMAC with a Merkle–Damgård hash function. Consequently, we obtain improved attacks on several HMAC constructions used in practice, in which the hash function limits the maximal message length (e.g., SHA-1 and SHA-2). Finally, we present the first universal forgery attacks, which can be applied with short message queries to the MAC oracle. In particular, we devise the first universal forgery attacks applicable to SHA-1 and SHA-2. Despite their theoretical interest, our attacks do not seem to threaten the practical security of the analyzed concrete HMAC constructions.
PubDate: 2017-12-01
DOI: 10.1007/s00453-016-0236-6
Issue No: Vol. 79, No. 4 (2017)
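The attacks above target the iterated internal state that HMAC inherits from its underlying hash function. For reference, the nested structure being attacked is the standard HMAC construction, sketched here and checked against Python's own hmac module:

```python
import hashlib
import hmac

def hmac_sha256(key, msg):
    """HMAC(K, m) = H((K' xor opad) || H((K' xor ipad) || m)) with H = SHA-256."""
    block = 64  # SHA-256 block size in bytes
    if len(key) > block:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(block, b"\x00")          # then zero-padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()  # inner hash iterates over the message
    return hashlib.sha256(opad + inner).digest()

assert hmac_sha256(b"k", b"m") == hmac.new(b"k", b"m", hashlib.sha256).digest()
```

State-recovery attacks aim at the chaining value inside the inner hash call, which is why message-length limits and iteration modes (HAIFA's block counter) matter for the attack complexity.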

• How to Eat Your Entropy and Have it Too: Optimal Recovery Strategies for
Compromised RNGs
• Authors: Yevgeniy Dodis; Adi Shamir; Noah Stephens-Davidowitz; Daniel Wichs
Pages: 1196 - 1232
Abstract: Random number generators (RNGs) play a crucial role in many cryptographic schemes and protocols, but their security proof usually assumes that their internal state is initialized with truly random seeds and remains secret at all times. However, in many practical situations these are unrealistic assumptions: The seed is often gathered after a reset/reboot from low entropy external events such as the timing of manual key presses, and the state can be compromised at unknown points in time via side channels or penetration attacks. The usual remedy (used by all the major operating systems, including Windows, Linux, FreeBSD, MacOS, iOS, etc.) is to periodically replenish the internal state through an auxiliary input with additional randomness harvested from the environment. However, recovering from such attacks in a provably correct and computationally optimal way had remained an unsolved challenge so far. In this paper we formalize the problem of designing an efficient recovery mechanism from state compromise, by considering it as an online optimization problem. If we knew the timing of the last compromise and the amount of entropy gathered since then, we could stop producing any outputs until the state becomes truly random again. However, our challenge is to recover within a time proportional to this optimal solution even in the hardest (and most realistic) case in which (a) we know nothing about the timing of the last state compromise, and the amount of new entropy injected since then into the state, and (b) any premature production of outputs leads to the total loss of all the added entropy used by the RNG, since the attacker can use brute force to enumerate all the possible low-entropy states. In other words, the challenge is to develop recovery mechanisms which are guaranteed to save the day as quickly as possible after a compromise we are not even aware of. 
The dilemma that we face is that any entropy used prematurely will be lost, and any entropy which is kept unused will delay the recovery. After developing our formal definitional framework for RNGs with inputs, we show how to construct a nearly optimal RNG which is secure in our model. Our technique is inspired by the design of the Fortuna RNG (which is a heuristic RNG construction that is currently used by Windows and comes without any formal analysis), but we non-trivially adapt it to our much stronger adversarial setting. Along the way, our formal treatment of Fortuna enables us to improve its entropy efficiency by almost a factor of two, and to show that our improved construction is essentially tight, by proving a rigorous lower bound on the possible efficiency of any recovery mechanism in our very general model of the problem.
PubDate: 2017-12-01
DOI: 10.1007/s00453-016-0239-3
Issue No: Vol. 79, No. 4 (2017)
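The Fortuna design the paper analyzes hedges against premature output by spreading incoming entropy round-robin over pools that are consumed at exponentially spaced reseeds, so some pool always accumulates enough entropy before it is spent. A sketch of just that pool schedule (the scheduling rule is Fortuna's; everything else in the RNG is omitted):

```python
def pools_for_reseed(k, num_pools=32):
    """Fortuna's rule: the k-th reseed (k >= 1) draws from pool i iff 2**i divides k.
    Pool 0 is used every reseed, pool 1 every other reseed, pool 2 every fourth, ..."""
    return [i for i in range(num_pools) if k % (2 ** i) == 0]

assert pools_for_reseed(1) == [0]
assert pools_for_reseed(2) == [0, 1]
assert pools_for_reseed(8) == [0, 1, 2, 3]
```

Because pool i is emptied only every $$2^i$$ reseeds, an attacker who can predict fast-draining pools still faces a higher-numbered pool whose contents accumulated untouched; the paper's construction improves the entropy efficiency of this idea by almost a factor of two.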

• Multiparty Key Exchange, Efficient Traitor Tracing, and More from
Indistinguishability Obfuscation
• Authors: Dan Boneh; Mark Zhandry
Pages: 1233 - 1285
Abstract: In this work, we show how to use indistinguishability obfuscation to build multiparty key exchange, efficient broadcast encryption, and efficient traitor tracing. Our schemes enjoy several interesting properties that have not been achievable before: Our multiparty non-interactive key exchange protocol does not require a trusted setup. Moreover, the size of the published value from each user is independent of the total number of users. Our broadcast encryption schemes support distributed setup, where users choose their own secret keys rather than be given secret keys by a trusted entity. The broadcast ciphertext size is independent of the number of users. Our traitor tracing system is fully collusion resistant with short ciphertexts, secret keys, and public key. Ciphertext size is logarithmic in the number of users and secret key size is independent of the number of users. Our public key size is polylogarithmic in the number of users. The recent functional encryption system of Garg, Gentry, Halevi, Raykova, Sahai, and Waters also leads to a traitor tracing scheme with similar ciphertext and secret key size, but the construction in this paper is simpler and more direct. These constructions resolve an open problem relating to differential privacy. Generalizing our traitor tracing system gives a private broadcast encryption scheme (where broadcast ciphertexts reveal minimal information about the recipient set) with optimal size ciphertext. Several of our proofs of security introduce new tools for proving security using indistinguishability obfuscation.
PubDate: 2017-12-01
DOI: 10.1007/s00453-016-0242-8
Issue No: Vol. 79, No. 4 (2017)

• Self-Bilinear Map on Unknown Order Groups from Indistinguishability
Obfuscation and Its Applications
• Authors: Takashi Yamakawa; Shota Yamada; Goichiro Hanaoka; Noboru Kunihiro
Pages: 1286 - 1317
Abstract: A self-bilinear map is a bilinear map where the domain and target groups are identical. In this paper, we introduce a self-bilinear map with auxiliary information which is a weaker variant of a self-bilinear map, construct it based on indistinguishability obfuscation and prove that a useful hardness assumption holds with respect to our construction under the factoring assumption. From our construction, we obtain a multilinear map with interesting properties: the level of multilinearity is not bounded in the setup phase, and representations of group elements are compact, i.e., their size is independent of the level of multilinearity. This is the first construction of a multilinear map with these properties. Note, however, that to evaluate the multilinear map, auxiliary information is required. As applications of our multilinear map, we construct multiparty non-interactive key-exchange and distributed broadcast encryption schemes where the maximum number of users is not fixed in the setup phase. Besides direct applications of our self-bilinear map, we show that our technique can also be used for constructing somewhat homomorphic encryption based on indistinguishability obfuscation and the $$\varPhi$$ -hiding assumption.
PubDate: 2017-12-01
DOI: 10.1007/s00453-016-0250-8
Issue No: Vol. 79, No. 4 (2017)

• Hardness of k -LWE and Applications in Traitor Tracing
• Authors: San Ling; Duong Hieu Phan; Damien Stehlé; Ron Steinfeld
Pages: 1318 - 1352
Abstract: We introduce the k- $$\mathrm {LWE}$$ problem, a Learning With Errors variant of the k-SIS problem. The Boneh-Freeman reduction from SIS to k-SIS suffers from an exponential loss in k. We improve and extend it to an LWE to k-LWE reduction with a polynomial loss in k, by relying on a new technique involving trapdoors for random integer kernel lattices. Based on this hardness result, we present the first algebraic construction of a traitor tracing scheme whose security relies on the worst-case hardness of standard lattice problems. The proposed $$\mathrm {LWE}$$ traitor tracing is almost as efficient as $$\mathrm {LWE}$$ encryption. Further, it achieves public traceability, i.e., allows the authority to delegate the tracing capability to “untrusted” parties. To this aim, we introduce the notion of a projective sampling family in which each sampling function is keyed and, with a projection of the key on a well chosen space, one can simulate the sampling function in a computationally indistinguishable way. The construction of a projective sampling family from k- $$\mathrm {LWE}$$ allows us to achieve public traceability by publishing the projected keys of the users. We believe that the new lattice tools and the projective sampling family are general enough that they may have applications in other areas.
PubDate: 2017-12-01
DOI: 10.1007/s00453-016-0251-7
Issue No: Vol. 79, No. 4 (2017)
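For readers unfamiliar with the base problem, a toy Regev-style LWE encryption scheme shows the structure that k-LWE generalizes: the public key is a batch of noisy linear equations in a secret vector, and a ciphertext is a random subset-sum of them. This is a sketch of plain LWE encryption with deliberately tiny, insecure parameters, not the paper's traitor-tracing construction:

```python
import random

random.seed(1)
n, q, m = 8, 257, 20   # toy parameters, far too small for real security

s = [random.randrange(q) for _ in range(n)]                        # secret
A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]    # public matrix
e = [random.choice([-1, 0, 1]) for _ in range(m)]                  # small noise
b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(m)]

def encrypt(bit):
    """Ciphertext = subset-sum of LWE samples, with the bit encoded in the high half."""
    S = [i for i in range(m) if random.randrange(2)]
    c1 = [sum(A[i][j] for i in S) % q for j in range(n)]
    c2 = (sum(b[i] for i in S) + bit * (q // 2)) % q
    return c1, c2

def decrypt(c1, c2):
    """Subtract <c1, s>; what remains is (noise) or (q/2 + noise)."""
    d = (c2 - sum(c1[j] * s[j] for j in range(n))) % q
    return 1 if q // 4 < d < 3 * q // 4 else 0
```

With per-sample noise in {-1, 0, 1} the accumulated error is at most m = 20 < q/4, so decryption is always correct here; real parameters make recovering s from (A, b) as hard as worst-case lattice problems.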

• On the Implausibility of Differing-Inputs Obfuscation and Extractable
Witness Encryption with Auxiliary Input
• Authors: Sanjam Garg; Craig Gentry; Shai Halevi; Daniel Wichs
Pages: 1353 - 1373
Abstract: The notion of differing-inputs obfuscation (diO) was introduced by Barak et al. (CRYPTO, pp 1–18, 2001). It guarantees that, for any two circuits $$C_0, C_1$$ for which it is difficult to come up with an input x on which $$C_0(x) \ne C_1(x)$$ , it should also be difficult to distinguish the obfuscation of $$C_0$$ from that of $$C_1$$ . This is a strengthening of indistinguishability obfuscation, where the above is only guaranteed for circuits that agree on all inputs. Two recent works of Ananth et al. (Differing-inputs obfuscation and applications, http://eprint.iacr.org/, 2013) and Boyle et al. (Lindell, pp 52–73, 2014) study the notion of diO in the setting where the attacker is also given some auxiliary information related to the circuits, showing that this notion leads to many interesting applications. In this work, we show that the existence of general-purpose diO with general auxiliary input has a surprising consequence: it implies that a specific circuit $$C^*$$ with specific auxiliary input $${\mathsf {aux}}^*$$ cannot be obfuscated in a way that hides some specific information. In other words, under the conjecture that such special-purpose obfuscation exists, we show that general-purpose diO cannot exist. This conjecture is a falsifiable assumption which we do not know how to break for candidate obfuscation schemes. We also show similar implausibility results for extractable witness encryption with auxiliary input and for “output-only dependent” hardcore bits for general one-way functions.
PubDate: 2017-12-01
DOI: 10.1007/s00453-017-0276-6
Issue No: Vol. 79, No. 4 (2017)

• Random and Conditional (t, k)-Diagnosis of Hypercubes
• Authors: Chia-Chen Wei; Sun-Yuan Hsieh
Pages: 625 - 644
Abstract: Under the PMC model, we consider the (t, k)-diagnosability of a hypercube under the random fault model and the conditional fault model separately. A system is randomly (t, k)-diagnosable [or conditionally (t, k)-diagnosable] if at least k faulty vertices can be identified in each iteration under the assumption that there is no restriction on the fault distribution (or that every vertex is adjacent to at least one fault-free vertex), provided that there are at most t faulty vertices, where $$t\ge k$$ . When the remaining number of faulty vertices is lower than k, all of them can be identified. In this paper, under the PMC and random fault models, we show that the sequential t-diagnosis algorithm for hypercubes proposed in [35] can be extended to a (t, k)-diagnosis algorithm for hypercubes, and we show that n-dimensional hypercubes are randomly $$\left( \binom{n}{n/2},n\right)$$ -diagnosable if n is even, and randomly $$\left( 2\binom{n-1}{(n-1)/2},n\right)$$ -diagnosable if n is odd, where $$\binom{p}{q}=\frac{p!}{q!(p-q)!}$$ . Moreover, we propose a conditional (t, k)-diagnosis algorithm for hypercubes by using properties of the conditional fault model and show that n-dimensional hypercubes are conditionally $$\left( \binom{n}{n/2},2n-2\right)$$ -diagnosable if n is even, and conditionally $$\left( 2\binom{n-1}{(n-1)/2},2n-2\right)$$ -diagnosable if n is odd. Furthermore, under the PMC model, our results improve the previous best lower bounds on t under the random and conditional fault models.
PubDate: 2017-11-01
DOI: 10.1007/s00453-016-0210-3
Issue No: Vol. 79, No. 3 (2017)

• Metric Decompositions of Path-Separable Graphs
• Authors: Lior Kamma; Robert Krauthgamer
Pages: 645 - 653
Abstract: A prominent tool in many problems involving metric spaces is a notion of randomized low-diameter decomposition. Loosely speaking, $$\beta$$ -decomposition refers to a probability distribution over partitions of the metric into sets of low diameter, such that nearby points (parameterized by $$\beta >0$$ ) are likely to be “clustered” together. Applying this notion to the shortest-path metric in edge-weighted graphs, it is known that n-vertex graphs admit an $$O(\ln n)$$ -padded decomposition (Bartal, 37th annual symposium on foundations of computer science. IEEE, pp 184–193, 1996), and that excluded-minor graphs admit O(1)-padded decomposition (Klein et al., 25th annual ACM symposium on theory of computing, pp 682–690, 1993; Fakcharoenphol and Talwar, J Comput Syst Sci 69(3), 485–497, 2004; Abraham et al., Proceedings of the 46th annual ACM symposium on theory of computing. STOC ’14, pp 79–88. ACM, New York, NY, USA, 2014). We design decompositions for the family of p-path-separable graphs, which was defined by Abraham and Gavoille (Proceedings of the twenty-fifth annual ACM symposium on principles of distributed computing, PODC ’06, pp 188–197, 2006) and refers to graphs that admit vertex-separators consisting of at most p shortest paths in the graph. Our main result is that every p-path-separable n-vertex graph admits an $$O(\ln (p \ln n))$$ -decomposition, which refines the $$O(\ln n)$$ bound for general graphs, and provides new bounds for families like bounded-treewidth graphs. Technically, our clustering process differs from previous ones by working in (the shortest-path metric of) carefully chosen subgraphs.
PubDate: 2017-11-01
DOI: 10.1007/s00453-016-0213-0
Issue No: Vol. 79, No. 3 (2017)
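The generic clustering step that such decompositions build on can be sketched concretely: pick a random vertex order and a random radius, and let each vertex join the first vertex in the order that lies within that radius. This is a minimal CKR-style sketch of the classical idea (the paper's refinement additionally works inside carefully chosen subgraphs, which is not shown here):

```python
import random
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src in an unweighted graph given as {v: set(neighbours)}."""
    dist, dq = {src: 0}, deque([src])
    while dq:
        u = dq.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                dq.append(v)
    return dist

def decompose(adj, delta, seed=0):
    """Each vertex joins the first vertex, in a random order, within a random
    radius r drawn from [delta/4, delta/2]; clusters have weak diameter <= delta."""
    rng = random.Random(seed)
    order = sorted(adj)
    rng.shuffle(order)
    r = rng.uniform(delta / 4, delta / 2)
    dist = {c: bfs_dist(adj, c) for c in order}
    return {v: next(c for c in order if dist[c].get(v, float("inf")) <= r)
            for v in adj}

# path graph 0-1-2-...-7
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 8} for i in range(8)}
clusters = decompose(path, delta=4)
assert set(clusters) == set(range(8))
```

Every cluster sits inside a ball of radius r around its center, so any two vertices in a cluster are within 2r ≤ delta of each other; the randomness in the order and radius is what bounds the probability that nearby points are separated.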

• On Polynomial Kernelization of $$\mathcal {H}$$ -free Edge Deletion
• Authors: N. R. Aravind; R. B. Sandeep; Naveen Sivadasan
Pages: 654 - 666
Abstract: For a set $$\mathcal {H}$$ of graphs, the $$\mathcal {H}$$ -free Edge Deletion problem is to decide whether there exist at most k edges in the input graph, for some $$k\in \mathbb {N}$$ , whose deletion results in a graph without an induced copy of any of the graphs in $$\mathcal {H}$$ . The problem is known to be fixed-parameter tractable if $$\mathcal {H}$$ is of finite cardinality. In this paper, we present a polynomial kernel for this problem for any fixed finite set $$\mathcal {H}$$ of connected graphs for the case where the input graphs are of bounded degree. We use a single kernelization rule which deletes vertices ‘far away’ from the induced copies of every $$H\in \mathcal {H}$$ in the input graph. With a slightly modified kernelization rule, we obtain polynomial kernels for $$\mathcal {H}$$ -free Edge Deletion under the following three settings: $$\mathcal {H}$$ contains $$K_{1,s}$$ and $$K_t$$ ; $$\mathcal {H}$$ contains $$K_{1,s}$$ and the input graphs are $$K_t$$ -free; $$\mathcal {H}$$ contains $$K_{t}$$ and the input graphs are $$K_{1,s}$$ -free; where $$s>1$$ and $$t>2$$ are any fixed integers. Our result provides the first polynomial kernels for Claw-free Edge Deletion and Line Edge Deletion for $$K_t$$ -free input graphs which are known to be NP-complete even for $$K_4$$ -free graphs.
PubDate: 2017-11-01
DOI: 10.1007/s00453-016-0215-y
Issue No: Vol. 79, No. 3 (2017)
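The kernelization rule above keys off vertices' distance to induced copies of graphs in $$\mathcal {H}$$, so the basic primitive is detecting an induced copy. For the claw $$K_{1,3}$$ (relevant to Claw-free Edge Deletion) that detection is a simple neighbourhood check; a brute-force sketch, not the paper's kernelization itself:

```python
from itertools import combinations

def has_induced_claw(adj):
    """True iff some vertex has three pairwise non-adjacent neighbours,
    i.e. the graph {v: set(neighbours)} contains an induced K_{1,3}."""
    for v, nbrs in adj.items():
        for a, b, c in combinations(sorted(nbrs), 3):
            if b not in adj[a] and c not in adj[a] and c not in adj[b]:
                return True
    return False

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}    # K_{1,3} itself
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
assert has_induced_claw(star) and not has_induced_claw(triangle)
```

The kernel keeps only vertices close to some such induced copy; on bounded-degree inputs there can be only polynomially many relevant copies, which is what bounds the kernel size.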

• Chain Minors are FPT
• Authors: Jarosław Błasiok; Marcin Kamiński
Pages: 698 - 707
Abstract: Given two finite partially ordered sets P and Q, we say that P is a chain minor of Q if there exists a partial function f from the elements of Q to the elements of P such that for every chain C in P there is a chain $$C_Q$$ in Q with the property that f restricted to $$C_Q$$ is an isomorphism of the chains C and $$C_Q$$ . We give an algorithm to decide whether a partially ordered set P is a chain minor of a partially ordered set Q, which runs in time $${\mathcal {O}}\left( |Q| \log |Q| \right)$$ for every fixed partially ordered set P. This solves an open problem from the monograph by Downey and Fellows (Parameterized complexity. Springer, New York, 1999), who asked whether the problem was fixed-parameter tractable.
PubDate: 2017-11-01
DOI: 10.1007/s00453-016-0220-1
Issue No: Vol. 79, No. 3 (2017)

• Decision Trees for Function Evaluation: Simultaneous Optimization of Worst
and Expected Cost
• Authors: Ferdinando Cicalese; Eduardo Laber; Aline Saettler
Pages: 763 - 796
Abstract: In several applications of automatic diagnosis and active learning, a central problem is the evaluation of a discrete function by adaptively querying the values of its variables until the values read uniquely determine the value of the function. In general, the process of reading the value of a variable might involve some cost. This cost should be taken into account when deciding the next variable to read. The goal is to design a strategy for evaluating the function incurring little cost (in the worst case or in expectation according to a prior distribution on the possible variables’ assignments). Our algorithm builds a strategy (decision tree) which attains a logarithmic approximation simultaneously for the expected and worst cost spent. This is best possible under the assumption that $${{{\mathcal {P}}}} \ne \mathcal{NP}$$ .
PubDate: 2017-11-01
DOI: 10.1007/s00453-016-0225-9
Issue No: Vol. 79, No. 3 (2017)
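The "query until the value is determined" setting is easiest to see on a concrete function. For OR, reading cheapest variables first and stopping at the first true variable is a natural cost-aware strategy; note this greedy rule is an illustration of the problem setting, not the paper's algorithm, which handles arbitrary discrete functions with simultaneous worst-case and expected-cost guarantees:

```python
def eval_or(values, costs):
    """Evaluate OR(x_1..x_n), reading cheapest variables first.
    Stops as soon as the value is determined; returns (value, cost spent)."""
    spent = 0
    for i in sorted(range(len(values)), key=lambda i: costs[i]):
        spent += costs[i]
        if values[i]:          # one true variable already determines OR
            return True, spent
    return False, spent        # all variables read: OR is false

assert eval_or([0, 1, 0], [5, 1, 3]) == (True, 1)
```

Even for OR, no single ordering is optimal for both the worst case and a given prior at once, which is why achieving a logarithmic approximation for both simultaneously is the interesting guarantee.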

• Editor’s Note: Special Issue on Combinatorial Pattern Matching
• Pages: 797 - 797
PubDate: 2017-11-01
DOI: 10.1007/s00453-017-0355-8
Issue No: Vol. 79, No. 3 (2017)

• A Framework for Space-Efficient String Kernels
• Authors: Djamal Belazzougui; Fabio Cunial
Pages: 857 - 883
Abstract: String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient, or incur large slowdowns. We show that a number of exact kernels on pairs of strings of total length n, like the k-mer kernel, the substrings kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in O(nd) time and in o(n) bits of space in addition to the input, using just a $$\mathtt {rangeDistinct}$$ data structure on the Burrows–Wheeler transform of the input strings that takes O(d) time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple values of k, like the k-mer profile and the k-th order empirical entropy, and for calibrating the value of k using the data. All such algorithms become O(n) using a suitable implementation of the $$\mathtt {rangeDistinct}$$ data structure, and by concatenating them to a suitable BWT construction algorithm, we can compute all the mentioned kernels and complexity measures, directly from the input strings, in O(n) time and in $$O(n\log {\sigma })$$ bits of space in addition to the input, where $$\sigma$$ is the size of the alphabet. Using similar data structures, we also show how to build a compact representation of the variable-length Markov chain of a string T of length n, that takes just $$3n\log {\sigma }+o(n\log {\sigma })$$ bits of space, and that can be learnt in randomized O(n) time using $$O(n\log {\sigma })$$ bits of space in addition to the input. Such a model can then be used to assign a probability to a query string S of length m in O(m) time and in $$2m+o(m)$$ bits of additional space, thus providing an alternative, compositional measure of the similarity between S and T that does not require alignment.
PubDate: 2017-11-01
DOI: 10.1007/s00453-017-0286-4
Issue No: Vol. 79, No. 3 (2017)
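The k-mer kernel mentioned above is simply an inner product of k-mer count vectors. The paper's contribution is computing such kernels in o(n) bits of extra space via the BWT; the definition itself fits in a few lines (this naive version uses O(n) space, exactly what the paper avoids):

```python
from collections import Counter
from math import sqrt

def kmer_kernel(s, t, k):
    """Cosine-normalised inner product of the k-mer count vectors of s and t."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    dot = sum(cs[w] * ct[w] for w in cs)
    norm = sqrt(sum(v * v for v in cs.values()) * sum(v * v for v in ct.values()))
    return dot / norm

assert kmer_kernel("banana", "banana", 2) == 1.0
```

Identical strings score 1, strings sharing no k-mer score 0, and no alignment is ever computed, which is the point of compositional (alignment-free) comparison.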

• Guest Editors’ Foreword
• Authors: Khaled Elbassioni; Kazuhisa Makino
Pages: 884 - 885
PubDate: 2017-11-01
DOI: 10.1007/s00453-017-0359-4
Issue No: Vol. 79, No. 3 (2017)

• Partial Sorting Problem on Evolving Data
• Authors: Qin Huang; Xingwu Liu; Xiaoming Sun; Jialin Zhang
Pages: 960 - 983
Abstract: In this paper we investigate the top-k-selection problem, i.e. to determine and sort the top k elements, in the dynamic data model. Here dynamic means that the underlying total order evolves over time, and that the order can only be probed by pair-wise comparisons. It is assumed that at each time step, only one pair of elements can be compared. This assumption of restricted access is reasonable in the dynamic model, especially for massive data sets where it is impossible to access all the data before the next change occurs. Previously only two special cases were studied (Anagnostopoulos et al. in 36th international colloquium on automata, languages and programming (ICALP). LNCS, vol 5566, pp 339–350, 2009) in this model: selecting the element of a given rank, and sorting all elements. This paper systematically deals with $$1\le k\le n$$ . Specifically, we identify the critical point $$k^*$$ such that the top-k-selection problem can be solved error-free with probability $$1-o(1)$$ if and only if $$k=o(k^*)$$ . A lower bound of the error when $$k=\varOmega (k^*)$$ is also determined, which actually is tight under some conditions. In contrast, we show that the top-k-set problem, which means finding the top k elements without sorting them, can be solved error-free with probability $$1-o(1)$$ for all $$1\le k\le n$$ . Additionally, we consider some extensions of the dynamic data model and show that most of these results still hold.
PubDate: 2017-11-01
DOI: 10.1007/s00453-017-0295-3
Issue No: Vol. 79, No. 3 (2017)
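The comparison budget is the scarce resource in this model: each time step allows one pairwise comparison while the order drifts. A static baseline makes the accounting concrete; this repeated-maximum sketch ignores evolution entirely (the paper's algorithms must interleave comparisons with the changing order) and just counts the comparisons top-k selection consumes:

```python
def top_k_sorted(items, k, counter):
    """Select and sort the top k using only pairwise comparisons (repeated maximum).
    counter[0] accumulates the number of comparisons performed."""
    pool, out = list(items), []
    for _ in range(min(k, len(pool))):
        best = 0
        for i in range(1, len(pool)):
            counter[0] += 1
            if pool[i] > pool[best]:
                best = i
        out.append(pool.pop(best))
    return out

c = [0]
assert top_k_sorted([3, 1, 4, 1, 5], 3, c) == [5, 4, 3]
```

Selecting the top 3 of 5 elements costs 4 + 3 + 2 = 9 comparisons here; in the evolving model the order may have shifted before those 9 steps are spent, which is exactly why a critical $$k^*$$ separates the solvable from the unsolvable regime.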

JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327
