Abstract: In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly, and guarantees of the generalization capacity of predictive rules learned from such data remain to be established. We analyze here the simple Kriging task, the flagship problem in Geostatistics, from a statistical learning perspective, i.e., by carrying out a nonparametric finite-sample predictive analysis. Given \(d\ge 1\) values taken by a realization of a square integrable random field \(X=\{X_s\}_{s\in S}\), \(S\subset {\mathbb {R}}^2\), with unknown covariance structure, at sites \(s_1,\; \ldots ,\; s_d\) in S, the goal is to predict the unknown values it takes at any other location \(s\in S\) with minimum quadratic risk. The prediction rule is derived from a training spatial dataset: a single realization \(X'\) of X, independent of those to be predicted, observed at \(n\ge 1\) locations \(\sigma _1,\; \ldots ,\; \sigma _n\) in S. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, due to the non-independent and identically distributed nature of the training data \(X'_{\sigma _1},\; \ldots ,\; X'_{\sigma _n}\) involved in the learning procedure. In this article, non-asymptotic bounds of order \(O_{{\mathbb {P}}}(1/\sqrt{n})\) are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid in the learning stage.
These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments, on simulated data and on real-world datasets, and hopefully pave the way for further developments in statistical learning based on spatial data. PubDate: 2023-11-21
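For readers less familiar with simple Kriging, a minimal sketch of the predictor whose risk is analyzed: for a zero-mean field with known covariance function c, the weights solve C w = c₀, where C holds the covariances among the observed sites and c₀ their covariances with the target site, and the prediction is w·x. The isotropic exponential covariance and its `sill`/`rng` parameters are hypothetical choices for illustration; the paper's plug-in rule instead estimates the covariance from the training realization X' observed on a grid.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]

def exp_cov(h: float, sill: float = 1.0, rng: float = 1.0) -> float:
    """Hypothetical isotropic exponential covariance c(h) = sill * exp(-h / rng)."""
    return sill * math.exp(-h / rng)

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def simple_kriging(sites: Sequence[Point], values: Sequence[float], target: Point,
                   cov: Callable[[float], float] = exp_cov) -> float:
    """Simple Kriging for a zero-mean field: w = C^{-1} c0, prediction = w . x."""
    c_mat = [[cov(math.dist(si, sj)) for sj in sites] for si in sites]
    c0 = [cov(math.dist(si, target)) for si in sites]
    w = solve(c_mat, c0)
    return sum(wi * xi for wi, xi in zip(w, values))
```

With a single site, the prediction reduces to c(h)/c(0) times the observed value; at an observed site, the predictor reproduces the observation exactly.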
Abstract: We address the problem of testing for the invariance of a probability measure under the action of a group of linear transformations. We propose a procedure based on consideration of one-dimensional projections, justified using a variant of the Cramér–Wold theorem. Our test procedure is powerful, computationally efficient, and dimension-independent, extending even to the case of infinite-dimensional spaces (multivariate functional data). It includes, as special cases, tests for exchangeability and sign-invariant exchangeability. We compare our procedure with some previous proposals in these cases, in a small simulation study. The paper concludes with two real-data examples. PubDate: 2023-11-18
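The projection idea can be sketched for the exchangeability case: if the distribution of X is invariant under a group element g, then ⟨v, X⟩ and ⟨v, gX⟩ have the same law for every direction v, so their empirical distributions can be compared with a two-sample Kolmogorov–Smirnov statistic. The random Gaussian direction, the single random coordinate permutation g, and the KS statistic are illustrative assumptions; this is not the authors' exact procedure, which also specifies how the test is calibrated.

```python
import bisect
import random
from typing import Sequence

def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic sup_t |F_a(t) - F_b(t)|."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for t in sa + sb:  # the sup of two step functions is attained at a jump point
        fa = bisect.bisect_right(sa, t) / len(sa)
        fb = bisect.bisect_right(sb, t) / len(sb)
        d = max(d, abs(fa - fb))
    return d

def projection_statistic(data: Sequence[Sequence[float]], seed: int = 0) -> float:
    """Compare <v, x> with <v, g x> for a random direction v and a random
    coordinate permutation g (the exchangeability special case)."""
    rng = random.Random(seed)
    p = len(data[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    perm = list(range(p))
    rng.shuffle(perm)
    proj = [sum(vi * xi for vi, xi in zip(v, x)) for x in data]
    proj_g = [sum(v[i] * x[perm[i]] for i in range(p)) for x in data]
    return ks_two_sample(proj, proj_g)
```

Large values suggest a departure from invariance; in practice several directions would be combined and a null distribution obtained by resampling.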
Abstract: Quantile regression continues to increase in usage, providing a useful alternative to customary mean regression. Primary implementation takes the form of so-called multiple quantile regression, creating a separate regression for each quantile of interest. Recently, however, advances have been made in joint quantile regression, supplying a quantile function that avoids crossing of the regression across quantiles. Here, we turn to quantile autoregression (QAR), offering a fully Bayesian version. We extend the initial quantile regression work of Koenker and Xiao (J Am Stat Assoc 101(475):980–990, 2006. https://doi.org/10.1198/016214506000000672) in the spirit of Tokdar and Kadane (Bayesian Anal 7(1):51–72, 2012. https://doi.org/10.1214/12-BA702). We offer a directly interpretable parametric model specification for QAR. Further, we offer a pth-order QAR(p) version, a multivariate QAR(1) version, and a spatial QAR(1) version. We illustrate with simulations as well as with a temperature dataset collected in Aragón, Spain. PubDate: 2023-11-12
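To make the QAR(1) structure concrete, a minimal sketch of the Koenker–Xiao data-generating process y_t = θ₀(U_t) + θ₁(U_t) y_{t−1} with U_t ~ Uniform(0, 1), whose conditional τ-quantile is θ₀(τ) + θ₁(τ) y_{t−1}. The coefficient functions `theta0` and `theta1` are hypothetical choices; the paper's Bayesian specification of these functions is not reproduced here.

```python
import random
from statistics import NormalDist
from typing import List

_STD_NORMAL = NormalDist()

def theta0(u: float) -> float:
    """Hypothetical intercept function: the standard normal quantile."""
    return _STD_NORMAL.inv_cdf(u)

def theta1(u: float) -> float:
    """Hypothetical slope function, increasing in u, with values in [0.4, 0.6]."""
    return 0.5 + 0.2 * (u - 0.5)

def conditional_quantile(tau: float, y_prev: float) -> float:
    """Conditional tau-quantile of y_t given y_{t-1} under QAR(1).
    With these choices it is increasing in tau whenever y_prev > -12.5
    (the slope of inv_cdf is at least sqrt(2*pi) > 2.5), so no crossing occurs there."""
    return theta0(tau) + theta1(tau) * y_prev

def simulate_qar1(n: int, y0: float = 0.0, seed: int = 0) -> List[float]:
    """Draw y_t = theta0(U_t) + theta1(U_t) * y_{t-1} with U_t ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    ys, y = [], y0
    for _ in range(n):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep inv_cdf's argument in (0, 1)
        y = theta0(u) + theta1(u) * y
        ys.append(y)
    return ys
```

Because the slope depends on the quantile level, the lagged value shifts different parts of the conditional distribution by different amounts, which is what a constant-coefficient AR(1) cannot express.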
Abstract: We consider the estimation of a one-dimensional parameter in a linear model with an ultra-high number of independent variables. We argue that the standard assumptions on the design matrix are essentially technical and can be relaxed. Conversely, the assumptions on the sparsity of the nuisance parameters are unverifiable, too strong, and unavoidable. PubDate: 2023-11-07
Abstract: We propose a residual-marked empirical process test to check goodness of fit for generalized partially linear models. The proposed test achieves dimension reduction, is shown to be consistent, and can detect root-n local alternatives. We further establish the asymptotic distribution of the proposed test under the null hypothesis, analyze its asymptotic properties under local and global alternatives, and suggest a bootstrap procedure for calculating the critical value. We investigate its numerical performance through simulation experiments and illustrate its use with two real data examples. PubDate: 2023-11-06
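As a rough illustration of a residual-marked empirical process, the sketch below evaluates R_n(t) = n^{−1/2} Σ ê_i 1{β̂ᵀx_i ≤ t} at each observed index value and averages its square, giving a Cramér–von Mises-type statistic. Marking by a one-dimensional projected index is what yields the dimension reduction. The residuals, the index values, and this particular functional are assumptions for illustration; critical values would come from the bootstrap procedure the abstract describes.

```python
import math
from typing import Sequence

def cvm_statistic(residuals: Sequence[float], index: Sequence[float]) -> float:
    """(1/n) * sum_j R_n(index_j)^2 with
    R_n(t) = n^{-1/2} * sum_i residual_i * 1{index_i <= t}."""
    n = len(residuals)
    total = 0.0
    for t in index:  # evaluate the marked process at each observed index value
        r = sum(e for e, z in zip(residuals, index) if z <= t) / math.sqrt(n)
        total += r * r
    return total / n
```

Under a correctly specified model the residuals fluctuate around zero at every index level, so R_n stays small; systematic misfit accumulates in the partial sums.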
Abstract: The class of \(\alpha \)-stable distributions is widely used in various applications, especially for modeling heavy-tailed data. Although \(\alpha \)-stable distributions have been used in practice for many years, new methods for identification, testing, and estimation are still being refined and new approaches are being proposed. The constant development of new statistical methods is related to the low efficiency of existing algorithms, especially when the underlying sample is small or the distribution is close to Gaussian. In this paper, we propose a new estimation algorithm for the stability index, for samples from the symmetric \(\alpha \)-stable distribution. The proposed approach is based on a quantile conditional variance ratio. We study the statistical properties of the proposed estimation procedure and show empirically that our methodology often outperforms other commonly used estimation algorithms. Moreover, we show that our statistic extracts unique sample characteristics that can be combined with other methods to refine existing methodologies via ensemble methods. Although our focus is on the symmetric \(\alpha \)-stable case, we demonstrate that the considered statistic is insensitive to changes in the skewness parameter, so our method could also be used in a more generic framework. For completeness, we also show how to apply our method to real data from financial markets and plasma physics. PubDate: 2023-10-30
Abstract: We consider Bayesian analysis for testing general linear hypotheses in linear models with spherically symmetric errors. These error distributions not only include some of the classical linear models as special cases, but also reduce the influence of outliers and result in robust statistical inference. Meanwhile, the design matrix is not necessarily of full rank. By appropriately modifying mixtures of g-priors for the regression coefficients under some general linear constraints, we derive closed-form Bayes factors in terms of the ratio between two Gaussian hypergeometric functions. The proposed Bayes factors depend on the data only through the modified coefficients of determination of the two models and are shown to be independent of the error distribution, as long as it is spherically symmetric. Moreover, we establish model selection consistency of the proposed Bayes factors in settings with a full-rank design matrix when the number of parameters increases with the sample size. We carry out simulation studies to assess the finite sample performance of the proposed methodology. The presented results extend some existing Bayesian testing procedures in the literature. PubDate: 2023-10-30
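For intuition about Bayes factors that depend on the data only through R², a minimal sketch of the classical fixed-g case (Zellner's g-prior, normal errors, a model with p predictors against the intercept-only model): BF = (1+g)^{(n−p−1)/2} / [1 + g(1−R²)]^{(n−1)/2}. The paper's Bayes factors use mixtures of g-priors under general linear constraints and lead to ratios of Gaussian hypergeometric functions; this fixed-g formula is only the simpler, well-known special case, not the authors' result.

```python
import math

def log_bf_fixed_g(r2: float, n: int, p: int, g: float) -> float:
    """log Bayes factor of a p-predictor model vs. the intercept-only model
    under Zellner's g-prior with fixed g (normal errors)."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    return 0.5 * (n - p - 1) * math.log1p(g) - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2))
```

Working on the log scale avoids overflow for large n; with R² = 0 the Bayes factor reduces to (1+g)^{−p/2}, which penalizes the larger model.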
Abstract: In this work we introduce the class of Unit-Weibull Autoregressive Moving Average models for continuous random variables taking values in (0, 1). The proposed model is observation-driven: conditionally on a set of covariates and the process’s history, the random component is assumed to follow a Unit-Weibull distribution parameterized through its \(\rho \)th quantile. The systematic component prescribes an ARMA-like structure to model the conditional \(\rho \)th quantile by means of a link function. Parameter estimation in the proposed model is performed using partial maximum likelihood, for which we provide closed formulas for the score vector and the partial information matrix. We also discuss some inferential tools, such as the construction of confidence intervals, hypothesis testing, model selection, and forecasting. A Monte Carlo simulation study is conducted to assess the finite sample performance of the proposed partial maximum likelihood approach. Finally, we examine the predictive power of the model by contrasting our method with others in the literature on US Manufacturing Capacity Utilization data. PubDate: 2023-10-24
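The quantile parameterization can be made explicit. Assuming the unit-Weibull form with CDF F(y) = exp{−a(−log y)^β} on (0, 1), fixing its ρth quantile at μ gives F(y | μ, β) = ρ^{(log y / log μ)^β}, so F(μ) = ρ by construction, with inverse y = μ^{(log p / log ρ)^{1/β}}. The sketch codes this CDF, its inverse, and an inverse-logit link mapping a linear (ARMA-like) predictor to μ; the choice of link and the function names are assumptions for illustration.

```python
import math

def uw_cdf(y: float, mu: float, beta: float, rho: float = 0.5) -> float:
    """Unit-Weibull CDF parameterized by its rho-th quantile mu (0 < y, mu < 1)."""
    return rho ** ((math.log(y) / math.log(mu)) ** beta)

def uw_quantile(p: float, mu: float, beta: float, rho: float = 0.5) -> float:
    """Inverse of uw_cdf: the p-th quantile for 0 < p < 1."""
    return mu ** ((math.log(p) / math.log(rho)) ** (1.0 / beta))

def inv_logit(eta: float) -> float:
    """Hypothetical link: maps the linear predictor eta to mu in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))
```

In an ARMA-like recursion, the linear predictor at time t would combine covariates with past observations and past errors on the link scale, and `inv_logit` would map it back to the conditional ρth quantile μ_t.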
Abstract: Several measures of non-convexity (departures from convexity) have been introduced in the literature, both for sets and functions. Some of them are geometric in nature, while others are more topological. We address the statistical analysis of some of these measures of non-convexity of a set S by dealing with their estimation based on a sample of points in S. We also introduce a new measure of non-convexity. We briefly discuss these different notions of non-convexity, prove consistency, and find the asymptotic distribution of the proposed estimators. We also consider the practical implementation of these estimators and illustrate their applicability to a real data example. PubDate: 2023-10-12
Abstract: We propose an estimator of the conditional tail moment (CTM) when the data are subject to random censorship. The variable of main interest and the censoring variable both follow a Pareto-type distribution. We establish the asymptotic properties of our estimator and discuss bias reduction. Then, the CTM is used to estimate, in the case of censorship, the premium principle for excess-of-loss reinsurance. The finite sample properties of the proposed estimators are investigated with a simulation study, and we illustrate their practical applicability on a motor third-party liability insurance dataset. PubDate: 2023-10-09
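A rough sketch of two ingredients. Under random right-censoring with Pareto-type tails, a standard adaptation of the Hill estimator divides the Hill estimate computed on the observed values Z = min(X, C) by the proportion of uncensored observations among the k largest. For an exact Pareto tail with index γ, E[X^a | X > q] = q^a / (1 − aγ) whenever aγ < 1. Combining them gives a naive plug-in CTM. This is an illustrative construction under those assumptions, without the bias reduction discussed in the paper, and not the authors' estimator.

```python
import math
from typing import Sequence

def censored_hill(z: Sequence[float], delta: Sequence[int], k: int) -> float:
    """Hill estimate on observed values z = min(X, C), divided by the share of
    uncensored points (delta == 1) among the k largest observations."""
    if not 0 < k < len(z):
        raise ValueError("need 0 < k < n")
    order = sorted(range(len(z)), key=lambda i: z[i], reverse=True)
    top, threshold = order[:k], z[order[k]]  # k largest values and the (k+1)-th largest
    hill = sum(math.log(z[i]) for i in top) / k - math.log(threshold)
    p_hat = sum(delta[i] for i in top) / k
    if p_hat == 0:
        raise ValueError("no uncensored observations among the k largest")
    return hill / p_hat

def ctm_pareto(q: float, gamma: float, a: float = 1.0) -> float:
    """E[X^a | X > q] for an exact Pareto tail with index gamma (requires a * gamma < 1)."""
    if a * gamma >= 1.0:
        raise ValueError("moment does not exist: need a * gamma < 1")
    return q ** a / (1.0 - a * gamma)
```

Without censoring (all delta equal to 1) the first function reduces to the ordinary Hill estimator; censoring in the tail inflates the estimate to compensate for the truncated large values.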
Abstract: This paper studies the problem of simultaneously testing that each of k samples, coming from k count variables, was generated by a Poisson law. The means of those populations may differ. The proposed procedure is designed for large k, which can be bigger than the sample sizes. First, a test is proposed for the case of independent samples, and then the obtained results are extended to dependent data. In each case, the asymptotic distribution of the test statistic is stated under the null hypothesis as well as under alternatives, which allows us to study the consistency of the test. Specifically, it is shown that the test statistic is asymptotically distribution-free under the null hypothesis. The finite sample performance of the test is studied via simulation. A real data set application is included. PubDate: 2023-09-29
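To illustrate the large-k setting, a hedged sketch of a simple aggregate statistic: for each sample compute Fisher's dispersion index D_j = Σ(x − x̄_j)² / x̄_j, which under a Poisson law is approximately χ² with n_j − 1 degrees of freedom, then standardize the sum over j. This classical dispersion construction is a swapped-in example for intuition only; it is not the test statistic proposed in the paper, and it treats the samples as independent.

```python
import math
from typing import Sequence

def dispersion_index(x: Sequence[int]) -> float:
    """Fisher's index of dispersion sum((x - mean)^2) / mean for one sample."""
    m = sum(x) / len(x)
    if m == 0:
        return 0.0  # assumed convention for an all-zero sample
    return sum((xi - m) ** 2 for xi in x) / m

def aggregated_dispersion_z(samples: Sequence[Sequence[int]]) -> float:
    """Standardized sum of dispersion indices over k independent samples;
    heuristically close to N(0, 1) under the Poisson null when k is large."""
    num = sum(dispersion_index(x) - (len(x) - 1) for x in samples)
    den = math.sqrt(sum(2 * (len(x) - 1) for x in samples))
    return num / den
```

Because each sample contributes a centered term, the statistic can accumulate evidence across many small samples even when no single sample is informative on its own.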
Abstract: The ongoing replication crisis in science has increased interest in the methodology of replication studies. We propose a novel Bayesian analysis approach using power priors: the likelihood of the original study’s data is raised to the power of \(\alpha \), and then used as the prior distribution in the analysis of the replication data. The posterior distribution and Bayes factor hypothesis tests related to the power parameter \(\alpha \) quantify the degree of compatibility between the original and replication studies. Inferences for other parameters, such as effect sizes, dynamically borrow information from the original study, and the degree of borrowing depends on the conflict between the two studies. The practical value of the approach is illustrated on data from three replication studies, and the connection to hierarchical modeling approaches is explored. We generalize the known connection between normal power priors and normal hierarchical models for fixed parameters and show that normal power prior inferences with a beta prior on the power parameter \(\alpha \) align with normal hierarchical model inferences using a generalized beta prior on the relative heterogeneity variance \(I^2\). The connection illustrates that power prior modeling is unnatural from the perspective of hierarchical modeling, since it corresponds to specifying priors on a relative rather than an absolute heterogeneity scale. PubDate: 2023-09-21
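In the normal case with known standard errors and a flat initial prior, raising the original likelihood N(θ̂_o, σ_o²) to the power α yields a N(θ̂_o, σ_o²/α) prior, so the posterior for the effect size is a precision-weighted average in which α controls how much is borrowed. The sketch assumes a fixed α; the paper additionally places a beta prior on α and uses Bayes factors about it.

```python
from typing import Tuple

def power_prior_posterior(theta_o: float, se_o: float, theta_r: float, se_r: float,
                          alpha: float) -> Tuple[float, float]:
    """Posterior mean and variance of the effect size given the replication estimate,
    with the original likelihood raised to the power alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    prec = alpha / se_o ** 2 + 1.0 / se_r ** 2  # discounted original + replication precision
    mean = (alpha * theta_o / se_o ** 2 + theta_r / se_r ** 2) / prec
    return mean, 1.0 / prec
```

With α = 0 the original study is ignored, and with α = 1 the two studies are pooled as if they measured the same effect.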
Abstract: This paper is motivated by the growing interest in estimating gender wage differences in official statistics. The wage of an employee is hypothetically a reflection of her or his characteristics, such as education level or work experience. It is possible that men and women with the same characteristics earn different wages. Our goal is to estimate the differences between wages at different quantiles, using sample survey data within a superpopulation framework. To do this, we use a parametric approach based on conditional distributions of the wages as a function of some auxiliary information, as well as a counterfactual distribution. We show in our simulation studies that the use of auxiliary information well correlated with the wages reduces the variance of the counterfactual quantile estimates compared to those of the competitors. Since wage distributions are generally heavy-tailed, it is of interest to model wages with heavy-tailed distributions such as the GB2 distribution. We illustrate the approach using this distribution, modeling the wages of men and women in simulated data and in real data from the Swiss Federal Statistical Office. PubDate: 2023-09-19
Abstract: Summary statistics play an important role in network data analysis. They can provide meaningful insight into the structure of a network. The Randić index is one of the most popular network statistics and has been widely used for quantifying information in biological networks, chemical networks, pharmacologic networks, etc. A topic of current interest is to find bounds or limits of the Randić index and its variants. A number of bounds on the indices are available in the literature. Recently, there have been several attempts to study the limits of the indices in the Erdős–Rényi random graph by simulation. In this paper, we derive the limits of the Randić index and its variants for an inhomogeneous Erdős–Rényi random graph. Our results characterize how network heterogeneity affects the indices and provide new insights into the Randić index and its variants. Finally, we apply the indices to several real-world networks. PubDate: 2023-09-15
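The Randić index of a graph is R(G) = Σ_{uv∈E} (d_u d_v)^{−1/2}, and a common generalized variant replaces the exponent −1/2 by a real α. A minimal sketch computing it, together with a sampler for an inhomogeneous Erdős–Rényi graph in which each edge uv appears independently with probability p_uv; the rank-one form p_uv = min(1, w_u w_v / n) is a hypothetical example of heterogeneity, not the model studied in the paper.

```python
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]

def general_randic(n: int, edges: Sequence[Edge], alpha: float = -0.5) -> float:
    """Sum over edges uv of (d_u * d_v) ** alpha; alpha = -0.5 gives the Randic index."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum((deg[u] * deg[v]) ** alpha for u, v in edges)

def inhomogeneous_er(weights: Sequence[float], seed: int = 0) -> List[Edge]:
    """Sample edges independently with hypothetical probabilities min(1, w_u * w_v / n)."""
    rng = random.Random(seed)
    n = len(weights)
    return [(u, v) for u in range(n) for v in range(u + 1, n)
            if rng.random() < min(1.0, weights[u] * weights[v] / n)]
```

For the path 0–1–2, the degrees are 1, 2, 1 and the Randić index is 2/√2 = √2; simulating `general_randic` on increasingly large sampled graphs is the kind of experiment the earlier simulation studies performed.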
Abstract: We address the multiple testing problem under the assumption that the true/false hypotheses are driven by a hidden Markov model (HMM), which has been recognized as a fundamental setting for modeling multiple testing under dependence since the seminal work of Sun and Cai (J R Stat Soc Ser B (Stat Methodol) 71:393–424, 2009). While previous work has concentrated on deriving specific procedures with a controlled false discovery rate under this model, following a recent trend in selective inference, we consider the problem of establishing confidence bounds on the false discovery proportion for a user-selected set of hypotheses that can depend on the observed data in an arbitrary way. We develop a methodology to construct such confidence bounds first when the HMM is known, then when its parameters, including the data distribution under the null and the alternative, are unknown and estimated using a nonparametric approach. In the latter case, we propose a bootstrap-based methodology to take into account the effect of parameter estimation error. We show that taking advantage of the assumed HMM structure allows for a substantial improvement in confidence bound sharpness over existing agnostic (structure-free) methods, as witnessed both in numerical experiments and in real data examples. PubDate: 2023-09-14
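Under an HMM for the hidden null/alternative states, the posterior probability that each hypothesis is null (the local index of significance used by Sun and Cai) follows from the forward–backward recursions. The sketch computes it for a two-state chain from given per-test likelihoods f₀(x_i) and f₁(x_i); the inputs are assumed known, and averaging these probabilities over a selected set gives only a posterior expected FDP, not the confidence bounds constructed in the paper.

```python
from typing import List, Sequence, Tuple

def posterior_null_probs(lik0: Sequence[float], lik1: Sequence[float],
                         init: Tuple[float, float],
                         trans: Tuple[Tuple[float, float], Tuple[float, float]]) -> List[float]:
    """P(H_i = 0 | all data) for a two-state HMM via scaled forward-backward.
    State 0 = null, state 1 = alternative; trans[a][b] = P(next = b | current = a)."""
    n = len(lik0)
    em = [(lik0[i], lik1[i]) for i in range(n)]
    fwd: List[List[float]] = []
    prev = None
    for i in range(n):
        if prev is None:
            a = [init[s] * em[i][s] for s in (0, 1)]
        else:
            a = [sum(prev[r] * trans[r][s] for r in (0, 1)) * em[i][s] for s in (0, 1)]
        c = a[0] + a[1]
        prev = [a[0] / c, a[1] / c]  # rescale at each step to avoid underflow
        fwd.append(prev)
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * em[i + 1][r] * bwd[i + 1][r] for r in (0, 1)) for s in (0, 1)]
        c = b[0] + b[1]
        bwd[i] = [b[0] / c, b[1] / c]  # scaling constants cancel in the final ratio
    out = []
    for i in range(n):
        g0 = fwd[i][0] * bwd[i][0]
        g1 = fwd[i][1] * bwd[i][1]
        out.append(g0 / (g0 + g1))
    return out
```

Neighbouring evidence propagates through the transition matrix, so a weak signal next to strong alternatives receives a lower null probability than it would under independence.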
Abstract: We develop a new method for variable selection in a function-on-scalar single-index model. The proposed method goes beyond the existing additive function-on-scalar regression framework and models dynamic effects of multiple scalar covariates via a varying coefficient single-index model. The unknown bivariate link function is modeled with splines. A computationally efficient algorithm based on the alternating direction method of multipliers (ADMM) is used for simultaneous selection of the influential covariates and estimation of the single-index coefficients and the link function. The proposed method provides a flexible framework for variable selection in function-on-scalar regression, particularly in the presence of nonlinear and interaction effects. Numerical analysis using simulations illustrates satisfactory finite sample performance of the proposed method in terms of selection and estimation accuracy. An application to CD4+ cell count data is demonstrated. Software implementation of the proposed method is provided in R. PubDate: 2023-09-13
Abstract: We study properties of two resampling scenarios, the Conditional Randomisation and Conditional Permutation schemes, which are relevant for testing conditional independence of discrete random variables X and Y given a random variable Z. Namely, we investigate the asymptotic behaviour of estimates of a vector of probabilities in such settings, establish their asymptotic normality, and establish an ordering between their asymptotic covariance matrices. The results are used to derive asymptotic distributions of the empirical Conditional Mutual Information in those set-ups. Somewhat unexpectedly, the distributions coincide for the two scenarios, despite differences in the asymptotic distributions of the estimates of probabilities. We also prove the validity of permutation p-values for the Conditional Permutation scheme. The above results justify consideration of conditional independence tests based on resampled p-values and on the asymptotic chi-square distribution with an adjusted number of degrees of freedom. We show in numerical experiments that when the ratio of the sample size to the number of possible values of the triple exceeds 0.5, the test based on the asymptotic distribution with the adjustment made on a limited number of permutations is a viable alternative to the exact test for both the Conditional Permutation and the Conditional Randomisation scenarios. Moreover, there is no significant difference between the performance of exact tests for the Conditional Permutation and Randomisation schemes, the latter requiring knowledge of the conditional distribution of X given Z, and the same conclusion holds for both adaptive tests. PubDate: 2023-09-12
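To make the Conditional Permutation scheme concrete, a sketch computing the plug-in (empirical) conditional mutual information of discrete samples and a permutation p-value obtained by shuffling X within each stratum of Z, which preserves the empirical conditional distribution of X given Z. The number of permutations and the (1 + #{CMI_b ≥ CMI_obs}) / (1 + B) p-value convention are assumptions; the Conditional Randomisation scheme would instead redraw X from its known conditional distribution given Z.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Hashable, Sequence

def empirical_cmi(x: Sequence[Hashable], y: Sequence[Hashable], z: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y | Z) in nats from joint counts."""
    n = len(x)
    nxyz = Counter(zip(x, y, z))
    nxz = Counter(zip(x, z))
    nyz = Counter(zip(y, z))
    nz = Counter(z)
    return sum(c / n * math.log(nz[zz] * c / (nxz[(xx, zz)] * nyz[(yy, zz)]))
               for (xx, yy, zz), c in nxyz.items())

def conditional_permutation_pvalue(x, y, z, n_perm: int = 200, seed: int = 0) -> float:
    """Permute x within each stratum of z; p-value = (1 + #{CMI_b >= CMI_obs}) / (1 + B)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for i, zz in enumerate(z):
        strata[zz].append(i)
    observed = empirical_cmi(x, y, z)
    hits = 0
    for _ in range(n_perm):
        xb = list(x)
        for idx in strata.values():  # shuffle x only among units sharing the same z
            vals = [x[i] for i in idx]
            rng.shuffle(vals)
            for i, v in zip(idx, vals):
                xb[i] = v
        if empirical_cmi(xb, y, z) >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (1 + hits) / (1 + n_perm)
```

Counting the observed statistic itself (the leading 1) keeps the p-value strictly positive, which is the usual way to make a Monte Carlo permutation test valid at finite B.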
Abstract: We propose assessing the causal effects of a dynamic treatment in a longitudinal observational study, given observed confounders under suitable assumptions. The causal hidden Markov model is based on potential versions of discrete latent variables, and it accounts for the estimated propensity to be assigned to each treatment level over time using inverse probability weighting. Estimation of the model parameters is carried out through a weighted maximum log-likelihood approach. Standard errors for the parameter estimates are provided by nonparametric bootstrap. The proposal is validated through a simulation study aimed at comparing different model specifications. As an illustrative example, we consider a marketing campaign conducted by a large European bank over time on its customers. Findings provide straightforward managerial implications. PubDate: 2023-09-07 DOI: 10.1007/s11749-023-00877-8