Abstract: This paper presents families of bivariate distributions formed by adding a shape parameter to the powers of the hazard and reversed hazard functions in different ways, which provides additional flexibility in applications. Several baseline distributions are used, namely the exponential, inverse exponential, uniform, inverse uniform, inverse Rayleigh, Gompertz and Pareto distributions. Many of the mathematical properties of these families are discussed in detail. Moreover, the new bivariate distributions are observed to model three real data sets appropriately.
PubDate: 2022-06-30
Abstract: We propose a cautious Bayesian variable selection routine by investigating the sensitivity of a hierarchical model, where the regression coefficients are specified by spike and slab priors. We exploit the use of latent variables to understand the importance of the covariates. These latent variables also allow us to obtain the size of the model space, which is an important aspect of high dimensional problems. In our approach, instead of fixing a single prior, we adopt a specific type of robust Bayesian analysis, where we consider a set of priors within the same parametric family to specify the selection probabilities of these latent variables. We achieve that by considering a set of expected prior selection probabilities, which allows us to perform a sensitivity analysis to understand the effect of prior elicitation on the variable selection. The sensitivity analysis provides us with sets of posteriors for the regression coefficients as well as the selection indicators, and we show that the posterior odds of the model selection probabilities are monotone with respect to the prior expectations of the selection probabilities. We also analyse synthetic and real-life datasets to illustrate our cautious variable selection method and compare it with other well-known methods.
PubDate: 2022-06-16
Abstract: Sequences of networks are currently a common form of network data sets. Identification of structural change-points in a network data sequence is a natural problem. The problem of change-point detection can be classified into two main types: offline change-point detection and online or sequential change-point detection. In this paper, we propose three different algorithms for online change-point detection based on certain CUSUM statistics for network data with community structures. For two of the proposed algorithms, we use information theoretic measures to construct the statistic for the estimation of a change-point. In the third algorithm, we use eigenvalues of the Bethe Hessian matrix to construct the statistic for the estimation of a change-point. We show the consistency property of the estimated change-point theoretically under networks generated from the multi-layer stochastic block model and the multi-layer degree-corrected block model. We also conduct an extensive simulation study to demonstrate the key properties of the algorithms as well as their efficacy.
PubDate: 2022-06-01 DOI: 10.1007/s13171-021-00248-1
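As a rough illustration of online CUSUM monitoring in general, and not of the three algorithms proposed in the paper above, the sketch below applies Page's one-sided CUSUM recursion to a scalar summary of each network snapshot (edge density is used here purely as an example). The function name `first_alarm`, the reference mean `mu0`, the slack `k`, the threshold `h`, and the toy data are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def first_alarm(xs: Sequence[float], mu0: float, k: float, h: float) -> Optional[int]:
    """Return the 0-based index of the first CUSUM alarm, or None if none fires.

    xs  : one scalar summary per network snapshot (hypothetical choice: edge density)
    mu0 : assumed pre-change mean of the summary
    k   : slack that absorbs small fluctuations around mu0
    h   : alarm threshold
    """
    s = 0.0
    for t, x in enumerate(xs):
        # Accumulate upward deviations from mu0 + k, resetting at zero.
        s = max(0.0, s + (x - mu0 - k))
        if s > h:
            return t
    return None


# Toy sequence: the summary jumps from 0 to 1 at index 5.
densities = [0.0] * 5 + [1.0] * 5
alarm = first_alarm(densities, mu0=0.0, k=0.25, h=2.0)
```

Larger values of `h` reduce false alarms at the cost of a longer detection delay; the choice used here is arbitrary.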
Abstract: A network may have weak signals and severe degree heterogeneity, and may be very sparse in one occurrence but very dense in another. SCORE (Ann. Statist. 43, 57–89, 2015) is a recent approach to network community detection. It accommodates severe degree heterogeneity and is adaptive to different levels of sparsity, but its performance for networks with weak signals is unclear. In this paper, we show that in a broad class of network settings where we allow for weak signals, severe degree heterogeneity, and a wide range of network sparsity, SCORE achieves perfect clustering and has the so-called “exponential rate” in Hamming clustering errors. The proof uses the most recent advancement on entry-wise bounds for the leading eigenvectors of the network adjacency matrix. The theoretical analysis assures us that SCORE continues to work well in the weak signal settings, but it does not rule out the possibility that SCORE may be further improved to have better performance in real applications, especially for networks with weak signals. As a second contribution of the paper, we propose SCORE+ as an improved version of SCORE. We investigate SCORE+ with 8 network data sets and find that it outperforms several representative approaches. In particular, for the 6 data sets with relatively strong signals, SCORE+ has performance similar to that of SCORE, but for the 2 data sets (Simmons, Caltech) with possibly weak signals, SCORE+ has much lower error rates. SCORE+ proposes several changes to SCORE. We carefully explain the rationale underlying each of these changes, using a mixture of theoretical and numerical study.
PubDate: 2022-06-01 DOI: 10.1007/s13171-020-00240-1
Abstract: Network data often exhibit block structures characterized by clusters of nodes with similar patterns of edge formation. When such relational data are complemented by additional information on exogenous node partitions, these sources of knowledge are typically included in the model to supervise the cluster assignment mechanism or to improve inference on edge probabilities. Although these solutions are routinely implemented, there is a lack of formal approaches to test if a given external node partition is in line with the endogenous clustering structure encoding stochastic equivalence patterns among the nodes in the network. To fill this gap, we develop a formal Bayesian testing procedure which relies on the calculation of the Bayes factor between a stochastic block model with known grouping structure defined by the exogenous node partition and an infinite relational model that allows the endogenous clustering configurations to be unknown, random and fully revealed by the block–connectivity patterns in the network. A simple Markov chain Monte Carlo method for computing the Bayes factor and quantifying uncertainty in the endogenous groups is proposed. This strategy is evaluated in simulations, and in applications studying brain networks of Alzheimer’s patients.
PubDate: 2022-06-01 DOI: 10.1007/s13171-020-00231-2
Abstract: Infectious or contagious diseases can be transmitted from one person to another through social contact networks. In today’s interconnected global society, such contagion processes can cause global public health hazards, as exemplified by the ongoing Covid-19 pandemic. It is therefore of great practical relevance to investigate the network transmission of contagious diseases from the perspective of statistical inference. An important and widely studied boundary condition for contagion processes over networks is the so-called epidemic threshold. The epidemic threshold plays a key role in determining whether a pathogen introduced into a social contact network will cause an epidemic or die out. In this paper, we investigate epidemic thresholds from the perspective of statistical network inference. We identify two major challenges that are caused by high computational and sampling complexity of the epidemic threshold. We develop two statistically accurate and computationally efficient approximation techniques to address these issues under the Chung-Lu modeling framework. The second approximation, which is based on random walk sampling, further enjoys the advantage of requiring data on a vanishingly small fraction of nodes. We establish theoretical guarantees for both methods and demonstrate their empirical superiority.
PubDate: 2022-06-01 DOI: 10.1007/s13171-021-00249-0
Abstract: We describe a novel method for modeling non-stationary multivariate time series, with time-varying conditional dependencies represented through dynamic networks. Our proposed approach combines traditional multi-scale modeling and network based neighborhood selection, aiming at capturing temporally local structure in the data while maintaining sparsity of the potential interactions. Our multi-scale framework is based on recursive dyadic partitioning, which recursively partitions the temporal axis into finer intervals and allows us to detect local network structural changes at varying temporal resolutions. The dynamic neighborhood selection is achieved through penalized likelihood estimation, where the penalty seeks to limit the number of neighbors used to model the data. We present theoretical and numerical results describing the performance of our method, which is motivated and illustrated using task-based magnetoencephalography (MEG) data in neuroscience.
PubDate: 2022-06-01 DOI: 10.1007/s13171-021-00256-1
Abstract: There exist various types of network block models such as the Stochastic Block Model (SBM), the Degree Corrected Block Model (DCBM), and the Popularity Adjusted Block Model (PABM). While this leads to a variety of choices, the block models do not have a nested structure. In addition, there is a substantial jump in the number of parameters from the DCBM to the PABM. The objective of this paper is to formulate a hierarchy of block models which does not rely on arbitrary identifiability conditions. We propose a Nested Block Model (NBM) that treats the SBM, the DCBM and the PABM as its particular cases with specific parameter values and, in addition, allows a multitude of versions that are more complicated than the DCBM but have fewer unknown parameters than the PABM. The latter allows one to carry out clustering and estimation without preliminary testing to see which block model is really true.
PubDate: 2022-06-01 DOI: 10.1007/s13171-021-00247-2
Abstract: Multi-layer networks of multiplex type represent relational data on a set of entities (nodes) with multiple types of relations (edges) among them, where each type of relation is represented as a network layer. A large group of popular community detection methods in networks are based on optimizing a quality function known as the modularity score, which measures the extent to which module or community structure is present in a network compared to a suitable null model. Here we introduce several multi-layer network modularity and model likelihood quality function measures using different null models of the multi-layer network, motivated by empirical observations in networks from diverse fields of application. In particular, we define multi-layer variants of the Chung-Lu expected degree model as null models that differ in their modeling of the multi-layer degrees. We propose simple estimators for the models and prove their consistency properties. A hypothesis testing procedure is also proposed for selecting an appropriate null model for data. These null models are used to define modularity measures as well as model likelihood based quality functions. The proposed measures are then optimized to detect the optimal community assignment of nodes (Code available at: https://u.osu.edu/subhadeep/codes/). We compare the effectiveness of the measures in community detection in simulated networks and then apply them to four real multi-layer networks.
PubDate: 2022-06-01 DOI: 10.1007/s13171-021-00257-0
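For readers unfamiliar with the modularity score mentioned above, the following is a minimal sketch of the classical single-layer Newman-Girvan modularity, whose null model assigns expected weight k_i k_j / (2m) to each node pair. It is not one of the multi-layer measures introduced in the paper; the function name `modularity`, the edge-list input format, and the toy graph are assumptions, and the code assumes a simple undirected graph without self-loops.

```python
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Single-layer modularity Q = sum_c [ e_c / m - (d_c / (2m))^2 ].

    e_c is the number of edges inside community c, d_c is the total degree
    of nodes in c, and m is the number of edges.
    """
    m = len(edges)
    within: Dict[Hashable, int] = {}        # e_c per community
    degree_sum: Dict[Hashable, int] = {}    # d_c per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    return sum(within.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Toy graph: two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, labels)
```

Placing every node in a single community gives a score of zero under this definition, which is why the score is read as structure in excess of the null model.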
Abstract: We consider the problem of estimating overlapping community memberships in a network, where each node can belong to multiple communities. More than a few communities per node are difficult to both estimate and interpret, so we focus on sparse node membership vectors. Our algorithm is based on sparse principal subspace estimation with iterative thresholding. The method is computationally efficient, with computational cost equivalent to estimating the leading eigenvectors of the adjacency matrix, and does not require an additional clustering step, unlike spectral clustering methods. We show that a fixed point of the algorithm corresponds to correct node memberships under a version of the stochastic block model. The methods are evaluated empirically on simulated and real-world networks, showing good statistical performance and computational efficiency.
PubDate: 2022-06-01 DOI: 10.1007/s13171-021-00245-4
Abstract: Differences between biological networks corresponding to disease conditions can help delineate the underlying disease mechanisms. Existing methods for differential network analysis do not account for dependence of networks on covariates. As a result, these approaches may detect spurious differential connections induced by the effect of the covariates on both the disease condition and the network. To address this issue, we propose a general covariate-adjusted test for differential network analysis. Our method assesses differential network connectivity by testing the null hypothesis that the network is the same for individuals who have identical covariates and only differ in disease status. We show empirically in a simulation study that the covariate-adjusted test exhibits improved type-I error control compared with naïve hypothesis testing procedures that do not account for covariates. We additionally show that there are settings in which our proposed methodology provides improved power to detect differential connections. We illustrate our method by applying it to detect differences in breast cancer gene co-expression networks by subtype.
PubDate: 2022-06-01 DOI: 10.1007/s13171-021-00252-5
Abstract: This paper addresses parameter estimation for continuous q-distributions. Parameter estimation and simulation studies are developed for three classical continuous q-distributions: the Lindley, gamma and exponential q-distributions. The method of moments is used for parameter estimation. The effectiveness of the proposed models is highlighted through simulation studies of the Lindley, gamma and exponential q-distributions for different values of the parameter q and different sample sizes.
PubDate: 2022-05-11
Abstract: The negative binomial distribution often adequately fits real datasets, for example RNA sequence data. Furthermore, in the presence of many zeros in the data, it is customary to fit a zero inflated negative binomial distribution. In this note, we study the effect of assuming the Jeffreys prior on the parameters of these two distributions. Under this prior, we derive the closed form expression of the Bayes factor of the zero inflated negative binomial against the negative binomial distribution. We demonstrate the effectiveness of our findings through simulations and real data analyses.
PubDate: 2022-05-10
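For orientation, the two models being compared can be written as follows under one common parameterization. The symbols here (size $r$, success probability $p$, zero-inflation weight $\omega$, prior density $\pi_J$) are assumptions chosen for illustration and may differ from the notation in the paper, and the closed-form expression derived there is not reproduced.

```latex
% Negative binomial pmf with size r > 0 and success probability p in (0, 1)
P_{\mathrm{NB}}(Y = y \mid r, p)
  = \frac{\Gamma(y + r)}{\Gamma(r)\, y!}\, p^{r} (1 - p)^{y},
  \qquad y = 0, 1, 2, \dots

% Zero inflated negative binomial pmf with extra zero mass \omega in [0, 1]
P_{\mathrm{ZINB}}(Y = y \mid \omega, r, p)
  = \omega\, \mathbf{1}\{y = 0\} + (1 - \omega)\, P_{\mathrm{NB}}(Y = y \mid r, p)

% Generic Bayes factor of ZINB against NB for data y_1, ..., y_n
% (ratio of marginal likelihoods under the respective priors)
\mathrm{BF}
  = \frac{\int \prod_{i=1}^{n} P_{\mathrm{ZINB}}(y_i \mid \omega, r, p)\,
          \pi_J(\omega, r, p)\, d\omega\, dr\, dp}
         {\int \prod_{i=1}^{n} P_{\mathrm{NB}}(y_i \mid r, p)\,
          \pi_J(r, p)\, dr\, dp}
```

Setting $\omega = 0$ recovers the negative binomial model, so the comparison is between nested models.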
Abstract: In this article, we consider the problem of inverting the exponential Radon transform of a function in the presence of noise. We propose a kernel estimator to estimate the true function. Such an estimator is closely related to filtered backprojection type inversion formulas in the noiseless setting. For the estimator proposed in this article, we then show that the convergence to the true function is at a minimax optimal rate.
PubDate: 2022-05-06
Abstract: In a two-stage cluster sampling setup for binary data, a sample of clusters such as hospitals is chosen at the first stage from a large number of clusters belonging to a finite population, and in the second stage a random sample of individuals such as nurses is chosen from the selected cluster and the binary responses along with covariates are collected from the selected individuals. Because the hypothetical binary responses from the individuals in a given cluster/hospital under the first stage sample are correlated (as they share a common cluster effect), this correlation plays a complex role in developing the second stage sample based estimating equations for the underlying regression parameters. Moreover, the correlation parameters have to be consistently estimated too. In this paper, unlike the existing studies, we demonstrate how to accommodate (1) the so-called inverse correlation weights arising from a finite population based generalized quasi-likelihood (GQL) estimating function, on top of (2) the sampling weights, to develop a survey sample based doubly weighted (SSDW) estimation approach for consistent estimation of both regression and correlation parameters. For simplicity, we refer to this GQL cum SSDW approach as the SSDW approach only. The method of moments (MM) cum SSDW approach would be simpler but less efficient, and is not included in the paper. The estimating function involved in the proposed SSDW estimating equation has the form of a sample total, which unbiasedly estimates the corresponding finite population total that arises from the aforementioned generalized quasi-likelihood function for the targeted finite population parameter. The resulting SSDW estimators thus become consistent for the respective parameters. This consistency property of the SSDW estimator for both regression and cluster correlation parameters is studied in detail.
PubDate: 2022-05-02
Abstract: The multivariate skew elliptic distributions include the multivariate skew-t distribution, which is represented as a mean- and scale-mixture distribution and is useful for analyzing skewed data with heavy tails. In the estimation of location parameters in the multivariate skew elliptic distributions, we derive minimax shrinkage estimators improving on the minimum risk location equivariant estimator relative to the quadratic loss function. Especially in the skew-t distribution, we suggest specific improved estimators where the conditions for their minimaxity do not depend on the degrees of freedom. We also study the case of a general elliptically symmetrical distribution when the covariance matrix is known up to an unknown multiple, but a residual vector is available to estimate the scale.
PubDate: 2022-04-27
Abstract: In this paper, maximum likelihood estimation for the parameters of single server queues is investigated. The queues are observed over a continuous time interval (0, T], where T is determined by a suitable stopping rule. The existence of the maximum likelihood estimator is proved by applying Rolle’s theorem. We also obtain the limiting distribution of the error of estimation.
PubDate: 2022-04-27
Abstract: Local Structure Graph Models (LSGMs) describe network data by modeling, and thereby controlling, the local structure of networks in a direct and interpretable manner. Specification of such models requires identifying three factors: a saturated, or maximally possible, graph; a neighborhood structure of dependent potential edges; and, lastly, a model form prescribed by full conditional binary distributions with appropriate “centering” steps and dependence parameters. This last aspect particularly distinguishes LSGMs from other model formulations for network data. In this article, we explore the expanded LSGM structure to incorporate dependencies among edges that form potential triangles, thus explicitly representing transitivity in the conditional probabilities that govern edge realization. Two networks previously examined in the literature, the Faux Mesa High friendship network and the 2000 college football network, are analyzed with such models, with a focus on assessing the manner in which terms reflecting two-way and three-way dependencies among potential edges influence the data structures generated by models that incorporate them. One conclusion reached is that explicit modeling of three-way dependencies is not always needed to reflect the observed level of transitivity in an actual graph. Another conclusion is that understanding the manner in which a model represents a given problem is enhanced by examining several aspects of model structure, not just the number of some particular topological structure generated by a fitted model.
PubDate: 2021-11-27 DOI: 10.1007/s13171-021-00264-1
Abstract: We derive the limiting distribution for the outlier eigenvalues of the adjacency matrix for random graphs with independent edges whose edge probability matrices have low-rank structure. We show that when the number of vertices tends to infinity, the leading eigenvalues in magnitude are jointly multivariate Gaussian with bounded covariances. As a special case, this implies a limiting normal distribution for the outlier eigenvalues of stochastic blockmodel graphs and their degree-corrected or mixed-membership variants. Our result extends the classical result of Füredi and Komlós on the fluctuation of the largest eigenvalue for Erdős–Rényi graphs.
PubDate: 2021-11-03 DOI: 10.1007/s13171-021-00268-x
Abstract: In this paper, we consider an expanding sparse dynamic network model where the time evolution is governed by a Markovian structure. Transition of the network from time t to t + 1 involves three components where a new node joins the existing network, some of the existing edges drop out, and new edges are formed with the incoming node. We consider long term behavior of the network density and establish its limit. We also study asymptotic distributions of the maximum likelihood estimators of key model parameters. We report results from a simulation study to investigate finite sample properties of the estimators.
PubDate: 2021-09-27 DOI: 10.1007/s13171-021-00258-z