Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractThe rank metric measures the distance between two matrices by the rank of their difference. Codes designed for the rank metric have attracted considerable attention in recent years, reinforced by network coding and further motivated by a variety of applications. In code-based cryptography, the hardness of the corresponding generic decoding problem can lead to systems with reduced public-key size. In distributed data storage, codes in the rank metric have been used repeatedly to construct codes with locality, and in coded caching, they have been employed for the placement of coded symbols. This survey gives a general introduction to rank-metric codes, explains their most important applications, and highlights their relevance to these areas of research.Suggested CitationHannes Bartz, Lukas Holzbaur, Hedongliang Liu, Sven Puchinger, Julian Renner and Antonia Wachter-Zeh (2022), "Rank-Metric Codes and Their Applications", Foundations and Trends® in Communications and Information Theory: Vol. 19: No. 3, pp 390-546. http://dx.doi.org/10.1561/0100000119 PubDate: Mon, 02 May 2022 00:00:00 +020

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractCommon information is ubiquitous in information theory and related areas such as theoretical computer science and discrete probability. However, because there are multiple notions of common information, a unified understanding of the deep interconnections between them is lacking. This monograph seeks to fill this gap by leveraging a small set of mathematical techniques that are applicable across seemingly disparate problems.In Part I, we review the operational tasks and properties associated with Wyner’s and Gács–Körner–Witsenhausen’s (GKW’s) common information. In Part II, we discuss extensions of the former from the perspective of distributed source simulation. This includes the Rényi common information which forms a bridge between Wyner’s common information and the exact common information. Via a surprising equivalence between the Rényi common information of order ∞ and the exact common information, we demonstrate the existence of a joint source in which the exact common information strictly exceeds Wyner’s common information. Other closely related topics discussed in Part II include the channel synthesis problem and the connection of Wyner’s and exact common information to the nonnegative rank of matrices.In Part III, recognizing that GKW’s common information is zero for most non-degenerate sources, we examine it with a more refined lens via the Non-Interactive Correlation Distillation (NICD) problem in which we quantify the agreement probability of extracted bits from a bivariate source. We extend this to the noise stability problem which includes as special cases the k-user NICD and q-stability problems. This allows us to seamlessly transition to discussing their connections to various conjectures in information theory and discrete probability, such as the Courtade–Kumar, Li– Médard and Mossell–O’Donnell conjectures. Finally, we consider functional inequalities (e.g., the hypercontractivity and Brascamp–Lieb inequalities), which constitute a further generalization of the noise stability problem in which the Boolean functions therein are replaced by nonnnegative functions. We demonstrate that the key ideas behind the proofs in Part III can be presented in a pedagogically coherent manner and unified via information-theoretic and Fourier-analytic methods.Suggested CitationLei Yu and Vincent Y. F. Tan (2022), "Common Information, Noise Stability, and Their Extensions", Foundations and Trends® in Communications and Information Theory: Vol. 19: No. 2, pp 107-389. http://dx.doi.org/10.1561/0100000122 PubDate: Thu, 28 Apr 2022 00:00:00 +020

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractDue to its longevity and enormous information density, DNA is an attractive medium for archival data storage. Natural DNA more than 700.000 years old has been recovered, and about 5 grams of DNA can in principle hold a Zetabyte of digital information, orders of magnitude more than what is achieved on conventional storage media. Thanks to rapid technological advances, DNA storage is becoming practically feasible, as demonstrated by a number of experimental storage systems, making it a promising solution for our society’s increasing need of data storage.While in living things, DNA molecules can consist of millions of nucleotides, due to technological constraints, in practice, data is stored on many short DNA molecules, which are preserved in a DNA pool and cannot be spatially ordered. Moreover, imperfections in sequencing, synthesis, and handling, as well as DNA decay during storage, introduce random noise into the system, making the task of reliably storing and retrieving information in DNA challenging.This unique setup raises a natural information-theoretic question: how much information can be reliably stored on and reconstructed from millions of short noisy sequences' The goal of this monograph is to address this question by discussing the fundamental limits of storing information on DNA. Motivated by current technological constraints on DNA synthesis and sequencing, we propose a probabilistic channel model that captures three key distinctive aspects of the DNA storage systems: (1) the data is written onto many short DNA molecules that are stored in an unordered fashion; (2) the molecules are corrupted by noise and (3) the data is read by randomly sampling from the DNA pool. Our goal is to investigate the impact of each of these key aspects on the capacity of the DNA storage system. Rather than focusing on coding-theoretic considerations and computationally efficient encoding and decoding, we aim to build an information-theoretic foundation for the analysis of these channels, developing tools for achievability and converse arguments.Suggested CitationIlan Shomorony and Reinhard Heckel (2022), "Information-Theoretic Foundations of DNA Data Storage", Foundations and Trends® in Communications and Information Theory: Vol. 19: No. 1, pp 1-106. http://dx.doi.org/10.1561/0100000117 PubDate: Thu, 24 Feb 2022 00:00:00 +010

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractOver-complete systems of vectors, or in short, frames, play the role of analog codes in many areas of communication and signal processing. To name a few, spreading sequences for code-division multiple access (CDMA), over-complete representations for multiple-description (MD) source coding, space-time codes, sensing matrices for compressed sensing (CS), and more recently, codes for unreliable distributed computation. In this survey paper we observe an informationtheoretic random-like behavior of frame subsets. Such subframes arise in setups involving erasures (communication), random user activity (multiple access), or sparsity (signal processing), in addition to channel or quantization noise. The goodness of a frame as an analog code is a function of the eigenvalues of a sub-frame, averaged over all subframes (e.g., harmonic mean of the eigenvalues relates to least-square estimation error, while geometric mean to the Shannon transform, and condition number to the restricted isometry property).Within the highly symmetric class of Equiangular Tight Frames (ETF), as well as other “near ETF” families, we show a universal behavior of the empirical eigenvalue distribution (ESD) of a randomly-selected sub-frame: (i) the ESD is asymptotically indistinguishable from Wachter’s MANOVA distribution; and (ii) it exhibits a convergence rate to this limit that is indistinguishable from that of a matrix sequence drawn from MANOVA (Jacobi) ensembles of corresponding dimensions. Some of these results follow from careful statistical analysis of empirical evidence, and some are proved analytically using random matrix theory arguments of independent interest. The goodness measures of the MANOVA limit distribution are better, in a concrete formal sense, than those of the Marchenko–Pastur distribution at the same aspect ratio, implying that deterministic analog codes are better than random (i.i.d.) analog codes. We further give evidence that the ETF (and near ETF) family is in fact superior to any other frame family in terms of its typical sub-frame goodness.Suggested CitationMarina Haikin, Matan Gavish, Dustin G. Mixon and Ram Zamir (2021), "Asymptotic Frame Theory for Analog Coding", Foundations and Trends® in Communications and Information Theory: Vol. 18: No. 4, pp 526-645. http://dx.doi.org/10.1561/0100000125 PubDate: Thu, 18 Nov 2021 00:00:00 +010

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractAs consumers are increasingly engaged in social networking and E-commerce activities, businesses grow to rely on Big Data analytics for intelligence, and traditional IT infrastructures continue to migrate to the cloud and edge, these trends cause distributed data storage demand to rise at an unprecedented speed. Erasure coding has seen itself quickly emerged as a promising technique to reduce storage cost while providing similar reliability as replicated systems, widely adopted by companies like Facebook, Microsoft and Google. However, it also brings new challenges in characterizing and optimizing the access latency when data objects are erasure coded in distributed storage. The aim of this monograph is to provide a review of recent progress (both theoretical and practical) on systems that employ erasure codes for distributed storage.In this monograph, we will first identify the key challenges and taxonomy of the research problems and then give an overview of different models and approaches that have been developed to quantify latency of erasure-coded storage. This includes recent work leveraging MDS-Reservation, Fork-Join, Probabilistic, and Delayed-Relaunch scheduling policies, as well as their applications to characterizing access latency (e.g., mean, tail, and asymptotic latency) of erasure-coded distributed storage systems. We will also extend the discussions to video streaming from erasure-coded distributed storage systems. Next, we will bridge the gap between theory and practice, and discuss lessons learned from prototype implementations. In particular, we will discuss exemplary implementations of erasure-coded storage, illuminate key design degrees of freedom and tradeoffs, and summarize remaining challenges in real-world storage systems such as in content delivery and caching. Open problems for future research are discussed at the end of each chapter.Suggested CitationVaneet Aggarwal and Tian Lan (2021), "Modeling and Optimization of Latency in Erasure-coded Storage Systems", Foundations and Trends® in Communications and Information Theory: Vol. 18: No. 3, pp 380-525. http://dx.doi.org/10.1561/0100000108 PubDate: Wed, 07 Jul 2021 00:00:00 +020

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractIn this monograph, we develop a mathematical framework based on asymptotically good random structured codes, i.e., codes possessing algebraic properties, for network information theory. We use these codes to propose new strategies for communication in multi-terminal settings. The proposed coding strategies are applicable to arbitrary instances of the multi-terminal communication problems under consideration. In particular, we consider four fundamental problems which can be considered as building blocks of networks: distributed source coding, interference channels, multiple-access channels with distributed states and multiple description source coding. We then develop a systematic framework for characterizing the performance limits of these strategies for these problems from an information-theoretic viewpoint. Lastly, we identify several examples of the multiterminal communication problems studied herein, for which structured codes attain optimality, and provide strictly better performance as compared to classical techniques based on unstructured codes. In summary, we develop an algebraic and probabilistic framework to demonstrate the fundamental role played by structured codes in multiterminal communication problems. This monograph deals exclusively with discrete source and channel coding problems.Suggested CitationS. Sandeep Pradhan, Arun Padakandla and Farhad Shirani (2020), "An Algebraic and Probabilistic Framework for Network Information Theory", Foundations and Trends® in Communications and Information Theory: Vol. 18: No. 2, pp 173-379. http://dx.doi.org/10.1561/0100000083 PubDate: Mon, 21 Dec 2020 00:00:00 +010

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractThe present monograph focuses on the detection problem in adversarial setting. When framed in an adversarial setting, classical detection theory can not be applied any more, since, in order to make a correct decision, the presence of an adversary must be taken into account when designing the detector. In particular, the interplay between the Defender (), wishing to carry out the detection task, and the Attacker (), aiming at impeding it, must be investigated. The purpose of this monograph is to lay out the foundations of a general theory of adversarial detection, taking into account the impact that the presence of the adversary has on the design of the optimal detector. We do so by casting the adversarial detection problem into a game theoretical framework, which is then studied by relying on typical methods of information theory. As a final result, the theory allows to state the conditions under which both the false positive and false negative error probabilities tend to zero exponentially fast, and to relate the error exponents of the two kinds of errors to the distortion the attacker can introduce into the test sequence.Suggested CitationMauro Barni and Benedetta Tondi (2020), "Theoretical Foundations of Adversarial Binary Detection", Foundations and Trends® in Communications and Information Theory: Vol. 18: No. 1, pp 1-172. http://dx.doi.org/10.1561/0100000102 PubDate: Sun, 20 Dec 2020 00:00:00 +010

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractThis survey provides an exposition of a suite of techniques based on the theory of polynomials, collectively referred to as polynomial methods, which have recently been applied to address several challenging problems in statistical inference successfully. Topics including polynomial approximation, polynomial interpolation and majorization, moment space and positive polynomials, orthogonal polynomials and Gaussian quadrature are discussed, with their major probabilistic and statistical applications in property estimation on large domains and learning mixture models. These techniques provide useful tools not only for the design of highly practical algorithms with provable optimality, but also for establishing the fundamental limits of the inference problems through the method of moment matching. The effectiveness of the polynomial method is demonstrated in concrete problems such as entropy and support size estimation, distinct elements problem, and learning Gaussian mixture models.Suggested CitationYihong Wu and Pengkun Yang (2020), "Polynomial Methods in Statistical Inference: Theory and Practice", Foundations and Trends® in Communications and Information Theory: Vol. 17: No. 4, pp 402-586. http://dx.doi.org/10.1561/0100000095 PubDate: Mon, 12 Oct 2020 00:00:00 +020

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractShannon’s channel coding theorem characterizes the maximalrate of information that can be reliably transmittedover a communication channel when optimal encoding anddecoding strategies are used. In many scenarios, however,practical considerations such as channel uncertainty andimplementation constraints rule out the use of an optimaldecoder. The mismatched decoding problem addresses suchscenarios by considering the case that the decoder cannotbe optimized, but is instead fixed as part of the problemstatement. This problem is not only of direct interest inits own right, but also has close connections with otherlong-standing theoretical problems in information theory.In this monograph, we survey both classical literature andrecent developments on the mismatched decoding problem,with an emphasis on achievable random-coding rates formemoryless channels. We present two widely-consideredachievable rates known as the generalized mutual information(GMI) and the LM rate, and overview their derivationsand properties. In addition, we survey several improved ratesvia multi-user coding techniques, as well as recent developmentsand challenges in establishing upper bounds on themismatch capacity, and an analogous mismatched encodingproblem in rate-distortion theory. Throughout the monograph,we highlight a variety of applications and connectionswith other prominent information theory problems.Suggested CitationJonathan Scarlett, Albert Guillén i Fàbregas, Anelia Somekh-Baruch and Alfonso Martinez (2020), "Information-Theoretic Foundations of Mismatched Decoding", Foundations and Trends® in Communications and Information Theory: Vol. 17: No. 2. http://dx.doi.org/10.1561/0100000101 PubDate: Wed, 30 Sep 2020 00:00:00 +020

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractWe introduce the concept of “coded computing”, a novelcomputing paradigm that utilizes coding theory to effectivelyinject and leverage data/computation redundancyto mitigate several fundamental bottlenecks in large-scaledistributed computing, namely communication bandwidth,straggler’s (i.e., slow or failing nodes) delay, privacy andsecurity bottlenecks. More specifically, for MapReduce baseddistributed computing structures, we propose the “CodedDistributed Computing” (CDC) scheme, which injects redundantcomputations across the network in a structuredmanner, such that in-network coding opportunities are enabledto substantially slash the communication load to shufflethe intermediate computation results. We prove thatCDC achieves the optimal tradeoff between computationand communication, and demonstrate its impact on a widerange of distributed computing systems from cloud-baseddatacenters to mobile edge/fog computing platforms.Secondly, to alleviate the straggler effect that prolongs theexecutions of distributed machine learning algorithms, weutilize the ideas from error correcting codes to develop“Polynomial Codes” for computing general matrix algebra,and “Lagrange Coded Computing” (LCC) for computingarbitrary multivariate polynomials. The core idea of theseproposed schemes is to apply coding to create redundantdata/computation scattered across the network, such thatcompleting the overall computation task only requires a subsetof the network nodes returning their local computationresults. We demonstrate the optimality of Polynomial Codesand LCC in minimizing the computation latency, by provingthat they require the least number of nodes to return theirresults.Finally, we illustrate the role of coded computing in providingsecurity and privacy in distributed computing andmachine learning. In particular, we consider the problems ofsecure multiparty computing (MPC) and privacy-preservingmachine learning, and demonstrate how coded computingcan be leveraged to provide efficient solutions to these criticalproblems and enable substantial improvements over thestate of the art.To illustrate the impact of coded computing on real worldapplications and systems, we implement the proposed codingschemes on cloud-based distributed computing systems, andsignificantly improve the run-time performance of importantbenchmarks including distributed sorting, distributed trainingof regression models, and privacy-preserving training forimage classification. Throughout this monograph, we alsohighlight numerous open problems and exciting researchdirections for future work on coded computing.Suggested CitationSongze Li and Salman Avestimehr (2020), "Coded Computing", Foundations and Trends® in Communications and Information Theory: Vol. 17: No. 1, pp 1-148. http://dx.doi.org/10.1561/0100000103 PubDate: Thu, 30 Jul 2020 00:00:00 +020

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractCaching refers to the act of replicating information at afaster (or closer) medium with the purpose of improvingperformance. This deceptively simple idea has given rise tosome of the hardest optimization problems in the fields ofcomputer systems, networking, and the Internet, many ofwhich remain unsolved several years after their conception.While a wealth of research contributions exists from the topicsof memory systems, data centers, Internet traffic, CDNs,and recently wireless networks, the literature is dispersedand overlapping at times. In this monograph, we take a Q1unifying modeling view: by focusing on the fundamental underlyingmathematical models, we re-organize the availablematerial into a powerful framework for performing optimizationof caching systems. This way, we aspire to present asolid background for the anticipated explosion in cachingresearch, but also provide a didactic view into how engineershave managed to infuse mathematical models into the studyof caching over the last 40 years.Suggested CitationGeorgios Paschos, George Iosifidis and Giuseppe Caire (2020), "Cache Optimization Models and Algorithms", Foundations and Trends® in Communications and Information Theory: Vol. 16: No. 3-4, pp 156-343. http://dx.doi.org/10.1561/0100000104 PubDate: Thu, 30 Jul 2020 00:00:00 +020

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractIn this monograph, a tutorial review of lattice-reductionaided(LRA) and integer-forcing (IF) equalization approachesinMIMO communications is given. Both methods have incommon that integer linear combinations are decoded; theremaining integer interference is resolved subsequently. Theaim is to enlighten similarities and differences of both approaches.The various criteria for selecting the integer linearcombinations available in the literature are summarized in aunified way. Thereby, we clearly distinguish between the criteriaaccording to which the non-integer equalization partis optimized and those, which are inherently considered inthe applied lattice algorithms, i.e., constraints on the integerequalization part. The demands on the signal constellationsand coding schemes are discussed in detail. We treatLRA/IF approaches for receiver-side linear equalization anddecision-feedback equalization, as well as transmitter-sidelinear preequalization and precoding.Suggested CitationRobert F. H. Fischer, Sebastian Stern and Johannes B. Huber (2019), "Lattice-Reduction-Aided andInteger-Forcing Equalization: Structures, Criteria, Factorization, and Coding", Foundations and Trends® in Communications and Information Theory: Vol. 16: No. 1-2, pp 1-155. http://dx.doi.org/10.1561/0100000100 PubDate: Mon, 09 Dec 2019 00:00:00 +010

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractThe group testing problem concerns discovering a smallnumber of defective items within a large population byperforming tests on pools of items. A test is positive ifthe pool contains at least one defective, and negative if itcontains no defectives. This is a sparse inference problemwith a combinatorial flavour, with applications in medicaltesting, biology, telecommunications, information technology,data science, and more.In this monograph, we survey recent developments in thegroup testing problem from an information-theoretic perspective.We cover several related developments: efficientalgorithms with practical storage and computation requirements,achievability bounds for optimal decoding methods,and algorithm-independent converse bounds. We assess thetheoretical guarantees not only in terms of scaling laws,but also in terms of the constant factors, leading to thenotion of the rate of group testing, indicating the amountof information learned per test. Considering both noiselessand noisy settings, we identify several regimes where existingalgorithms are provably optimal or near-optimal, aswell as regimes where there remains greater potential forimprovement.In addition, we survey results concerning a number of variationson the standard group testing problem, includingpartial recovery criteria, adaptive algorithms with a limitednumber of stages, constrained test designs, and sublineartimealgorithms.Suggested CitationMatthew Aldridge, Oliver Johnson and Jonathan Scarlett (2019), "Group Testing: An Information Theory Perspective", Foundations and Trends® in Communications and Information Theory: Vol. 15: No. 3-4, pp 196-392. http://dx.doi.org/10.1561/0100000099 PubDate: Thu, 05 Dec 2019 00:00:00 +010

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: AbstractDeveloping computationally-efficient codes that approach the Shannon-theoretic limits for communication and compression has long been one of the major goals of information and coding theory. There have been significant advances towards this goal in the last couple of decades, with the emergence of turbo codes, sparsegraph codes, and polar codes. These codes are designed primarily for discrete-alphabet channels and sources. For Gaussian channels and sources, where the alphabet is inherently continuous, Sparse Superposition Codes or Sparse Regression Codes (SPARCs) are a promising class of codes for achieving the Shannon limits. This monograph provides a unified and comprehensive over-view of sparse regression codes, covering theory, algorithms, and practical implementation aspects. The first part of the monograph focuses on SPARCs for AWGN channel coding, and the second part on SPARCs for lossy compression (with squared error distortion criterion). In the third part, SPARCs are used to construct codes for Gaussian multi-terminal channel and source coding models such as broadcast channels, multiple-access channels, and source and channel coding with side information. The monograph concludes with a discussion of open problems and directions for future work.Suggested CitationRamji Venkataramanan, Sekhar Tatikonda and Andrew Barron (2019), "Sparse Regression Codes", Foundations and Trends® in Communications and Information Theory: Vol. 15: No. 1-2, pp 1-195. http://dx.doi.org/10.1561/0100000092 PubDate: Thu, 20 Jun 2019 00:00:00 +020