Abstract: The paper explores a testing problem involving four hypotheses: based on observations of two random variables X and Y, we wish to discriminate among four possibilities, namely identical survival functions, stochastic dominance of X over Y, stochastic dominance of Y over X, or crossing survival functions. Four-decision testing procedures for repeated measurements data are proposed. The tests are based on a permutation approach and do not rely on distributional assumptions. One-sided versions of the Cramér–von Mises, Anderson–Darling, and Kolmogorov–Smirnov statistics are utilized. The consistency of the tests is proven. A simulation study shows good power properties and control of false-detection errors. The suggested tests are applied to data from a psychophysical experiment. PubDate: 2022-05-11
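A minimal sketch of the permutation idea behind such tests, in Python: a one-sided Cramér–von Mises-type statistic is computed on empirical survival functions and calibrated by permuting group labels. The pooled-grid construction and function names are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def cvm_one_sided(x, y, grid):
    """One-sided Cramér-von Mises-type statistic: accumulates only the
    positive part of S_x(t) - S_y(t), where S is the empirical survival
    function, so large values speak for dominance of X over Y."""
    sx = np.array([(x > t).mean() for t in grid])
    sy = np.array([(y > t).mean() for t in grid])
    return np.sum(np.maximum(sx - sy, 0.0) ** 2)

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Permutation p-value: under the null of identical survival functions
    the group labels are exchangeable, so we reshuffle the pooled sample."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = cvm_one_sided(x, y, pooled)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if cvm_one_sided(perm[:len(x)], perm[len(x):], pooled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

A four-decision procedure would combine two such one-sided tests (X over Y and Y over X) and map the pair of outcomes to one of the four conclusions.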
Abstract: A myriad of physical, biological and other phenomena are better modeled with semi-infinite distribution families, in which case not knowing the population minimum becomes an obstacle to parametric inference. Ad hoc methods to deal with this problem exist, but they are suboptimal and sometimes infeasible. Besides, having the statistician handcraft solutions on a case-by-case basis is counterproductive. In this paper, we propose a framework under which the issue can be analyzed, and perform an extensive search of the literature for methods that could be used to solve the aforementioned problem; we also propose a method of our own. Simulation experiments were then performed to compare some methods from the literature with our proposal. We found that the straightforward method, which is to infer the population minimum by maximum likelihood, has severe difficulty in giving a good estimate of the population minimum, but manages to achieve very good inferred models. The other methods, including our proposal, involve estimating the population minimum directly; among these, our method is superior for the distributions simulated, followed very closely by the endpoint estimator of Alves et al. (Stat Sin 24(4):1811–1835, 2014). Although these two give much more accurate estimates of the population minimum, the straightforward method also displays some advantages, so the choice among these three methods will depend on the problem domain. PubDate: 2022-05-05
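The "straightforward method" can be made concrete on a simple example. Below is a Python sketch for a shifted (two-parameter) exponential family, a distributional choice of ours for illustration, where the maximum likelihood estimate of the unknown minimum has a closed form and its upward bias is easy to see.

```python
import numpy as np

def fit_shifted_exponential(x):
    """ML fit of f(t) = exp(-(t - loc) / scale) / scale for t >= loc.
    The likelihood increases in loc up to the sample minimum, so the MLE
    is loc_hat = min(x); since every observation satisfies x_i >= loc,
    this estimator always overshoots the true population minimum."""
    loc_hat = x.min()
    scale_hat = x.mean() - loc_hat  # MLE of the scale given loc_hat
    return loc_hat, scale_hat

rng = np.random.default_rng(1)
sample = 3.0 + rng.exponential(scale=2.0, size=200)  # true minimum is 3.0
print(fit_shifted_exponential(sample))  # loc_hat slightly above 3.0
```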
Abstract: In this paper, we propose the use of advanced and flexible statistical models to describe the spatial displacement of earthquake data. The paper aims to account for external geological information in the description of complex seismic point processes, through the estimation of models with spatially varying parameters. A local version of the log-Gaussian Cox process (LGCP) is introduced and applied for the first time, exploiting the inferential tools in Baddeley (Spat Stat 22:261–295, 2017) and estimating the model by the local Palm likelihood. We provide methods and approaches accounting for the interaction among points, typically described by LGCP models through the covariance parameters of the Gaussian random field, which in this local version are allowed to vary in space, providing a more realistic description of the clustering feature of seismic events. Furthermore, we contribute to the framework of diagnostics, outlining suitable methods for the local context and proposing a new step-wise approach addressing the particular case of multiple covariates. Overall, we show that local models provide good inferential results and could serve as the basis for future spatio-temporal local model developments suited to the description of the complex seismic phenomenon. PubDate: 2022-04-25
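To fix ideas about the model class itself, here is a Python sketch simulating a plain (stationary, non-local) LGCP on a grid: a Gaussian random field gives a log-intensity surface, and counts are Poisson given that surface. The exponential covariance and all parameter values are illustrative assumptions; in the paper's local version these covariance parameters vary in space.

```python
import numpy as np

n, cell = 20, 1.0 / 20                       # 20x20 grid on the unit square
xs, ys = np.meshgrid(np.arange(n), np.arange(n))
pts = np.column_stack([xs.ravel(), ys.ravel()]) * cell
dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

sigma2, rho = 1.0, 0.1                       # GRF variance and correlation range
cov = sigma2 * np.exp(-dists / rho)          # exponential covariance (illustrative)
rng = np.random.default_rng(0)
log_intensity = rng.multivariate_normal(np.zeros(n * n), cov)

base_rate = 1e4                              # events per unit area (illustrative)
counts = rng.poisson(base_rate * np.exp(log_intensity) * cell ** 2)
print(counts.reshape(n, n).sum(), "events")  # clustered where the GRF is high
```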
Abstract: This paper deals with the estimation of kurtosis on large datasets. It aims at overcoming two frequent limitations in applications: first, Pearson's standardized fourth moment is computed as the sole measure of kurtosis; second, the fact that data might be just samples is neglected, so that suitable inferential tools, such as standard errors and confidence intervals, go unused. In the paper, some recent indexes of kurtosis are reviewed as alternatives to Pearson's standardized fourth moment. The asymptotic distribution of their natural estimators is derived, and it is used as a tool to evaluate efficiency and to build confidence intervals. A simulation study is also conducted to provide practical indications about the choice of a suitable index. As a conclusion, researchers are warned against the use of the classical Pearson's index when the sample size is too small and/or the distribution is skewed and/or heavy-tailed. Specifically, the occurrence of heavy tails can deprive Pearson's index of any meaning or produce unreliable confidence intervals. However, such limitations can be overcome by reverting to the reviewed alternative indexes, which rely just on low-order moments. PubDate: 2022-04-14
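A minimal Python sketch of the inferential point: Pearson's excess kurtosis with its normal-theory confidence interval. The asymptotic variance 24/n holds only under normality, so for heavy-tailed data the interval can be badly miscalibrated, which is exactly the warning above; the reviewed alternative indexes are not reproduced here.

```python
import numpy as np
from scipy import stats

def kurtosis_ci(x, level=0.95):
    """Pearson's excess kurtosis with a normal-theory confidence interval.
    Var(g2) ~ 24/n is valid only under normality; for skewed or
    heavy-tailed data the interval below can be seriously unreliable."""
    n = len(x)
    g2 = stats.kurtosis(x)                  # standardized 4th moment minus 3
    z = stats.norm.ppf(0.5 + level / 2)
    se = np.sqrt(24.0 / n)
    return g2, (g2 - z * se, g2 + z * se)

rng = np.random.default_rng(0)
print(kurtosis_ci(rng.normal(size=1000)))            # interval covers 0
print(kurtosis_ci(rng.standard_t(df=5, size=1000)))  # heavy tails: unreliable
```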
Abstract: External preference mapping is widely used in marketing and R&D divisions to understand consumer behaviour. The most common preference map is obtained through a two-step procedure that combines principal component analysis and least squares regression. The standard approach exploits classical regression and therefore focuses on the conditional mean. This paper proposes the use of quantile regression to enrich the preference map by looking at the whole distribution of consumer preference. The enriched maps highlight possibly different consumer behaviour with respect to the least or most preferred products. This is pursued by exploring the variability of liking along the principal components as well as focusing on the direction of preference. The use of different aesthetics (colours, shapes, size, arrows) equips the standard preference map with additional information and does not force users to change the standard tool they are used to. The proposed methodology is shown in action on a case study pertaining to yogurt preferences. PubDate: 2022-04-12
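A compact Python sketch of the two-step construction, with quantile regression replacing least squares in the second step. The data shapes, the synthetic liking matrix and the quantile levels are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

# Illustrative liking matrix: rows = products, columns = consumers.
rng = np.random.default_rng(0)
liking = rng.integers(1, 10, size=(12, 80)).astype(float)

# Step 1: the product map is given by the first two principal components.
scores = PCA(n_components=2).fit_transform(liking)

# Step 2: per-consumer quantile regressions of liking on the map
# coordinates, at several quantiles instead of only the conditional mean.
X = sm.add_constant(scores)
for tau in (0.25, 0.50, 0.75):
    res = sm.QuantReg(liking[:, 0], X).fit(q=tau)  # first consumer as example
    print(f"tau={tau}: slopes={res.params[1:].round(2)}")
```

Repeating the loop over all consumers yields, for each quantile level, a vector field over the map that can be drawn with the different aesthetics mentioned above.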
Abstract: In this work, we propose a novel group selection method called Group Square-Root Elastic Net. It is based on square-root regularization with a group elastic net penalty, i.e., an \(\ell _{2,1}+\ell _2\) penalty. As a square-root-based procedure, it has the distinct feature that the estimator is independent of the unknown noise level \(\sigma \) , which is non-trivial to estimate under the high-dimensional setting, especially when \(p\gg n\) . In many applications, the estimator is expected to be sparse, not in an irregular way, but rather in a structured manner. This makes the proposed method very attractive for tackling both high-dimensionality and structured sparsity. We study correct subset recovery under a Group Elastic Net Irrepresentable Condition. Both slow-rate and fast-rate bounds are established, the latter under the Restricted Eigenvalue assumption and a Gaussian noise assumption. For implementation, a fast algorithm based on the scaled multivariate thresholding-based iterative selection idea is introduced with proven convergence. A comparative study demonstrates the superiority of our approach over alternatives. PubDate: 2022-04-08
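The criterion can be written down directly. Below is a Python sketch of our reading of the objective: a square-root (Frobenius) loss plus an \(\ell _{2,1}\) term over groups of rows and a squared \(\ell _2\) term; whether the second term is squared, and all tuning constants, are assumptions for illustration.

```python
import numpy as np

def gsren_objective(B, X, Y, groups, lam1=0.1, lam2=0.05):
    """Group square-root elastic net criterion (our reading of the abstract):
    square-root loss + lam1 * sum of group-wise norms (l_{2,1})
    + lam2 * squared Frobenius norm. The square-root loss is what makes
    a good choice of lam1 independent of the unknown noise level sigma."""
    n = X.shape[0]
    loss = np.linalg.norm(Y - X @ B, "fro") / np.sqrt(n)
    l21 = sum(np.linalg.norm(B[g], "fro") for g in groups)
    return loss + lam1 * l21 + lam2 * np.linalg.norm(B, "fro") ** 2

# groups is a partition of the row indices of B, e.g. for p = 5 rows:
# groups = [np.arange(0, 3), np.arange(3, 5)]
```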
Abstract: A pandemic poses particular challenges to decision-making because of the need to continuously adapt decisions to rapidly changing evidence and available data. For example, which countermeasures are appropriate at a particular stage of the pandemic? How can the severity of the pandemic be measured? What is the effect of vaccination in the population and which groups should be vaccinated first? The process of decision-making starts with data collection and modeling and continues to the dissemination of results and the subsequent decisions taken. The goal of this paper is to give an overview of this process and to provide recommendations for the different steps from a statistical perspective. In particular, we discuss a range of modeling techniques including mathematical, statistical and decision-analytic models along with their applications in the COVID-19 context. With this overview, we aim to foster the understanding of the goals of these modeling approaches and the specific data requirements that are essential for the interpretation of results and for successful interdisciplinary collaborations. A special focus is on the role played by data in these different models, and we incorporate into the discussion the importance of statistical literacy and of effective dissemination and communication of findings. PubDate: 2022-04-07
Abstract: In this paper, we consider confidence interval construction for partially nonlinear models with responses missing at random, under the framework of quantile regression. We propose an imputation-based empirical likelihood method to construct statistical inferences for both the unknown parameter vector in the nonlinear function and the nonparametric function, and we show that the proposed empirical log-likelihood ratios are both asymptotically chi-squared. Furthermore, the confidence region for the parameter vector and the pointwise confidence interval for the nonparametric function are constructed. Some simulation studies are implemented to assess the performance of the proposed estimation method, and the results indicate that the proposed method works well. PubDate: 2022-04-06
Abstract: The Riesz probability distribution on symmetric matrices represents an important extension of the Wishart distribution. It is defined by its Laplace transform, which involves the notion of generalized power. Based on the fact that some Wishart distributions can be represented by means of the multivariate Gaussian distribution, it is shown that some Riesz probability distributions which are not necessarily Wishart can also be represented by means of Gaussian samples with missing data. As a corollary, we deduce a Gaussian representation of the inverse Riesz distribution and give its expectation. The results are assessed in simulation studies. PubDate: 2022-03-06
Abstract: Sustainability of agriculture is difficult to measure and assess because it is a multidimensional concept that involves economic, social and environmental aspects and is subject to temporal evolution and geographical differences. Existing studies assessing agricultural sustainability in the European Union (EU) are affected by several shortcomings that limit their relevance for policy makers. Specifically, most of them focus on the farm level or cover a small set of countries, and the few exceptions covering a broad set of countries consider only a subset of the sustainability dimensions or rely on cross-sectional data. In this paper, we consider yearly data on 12 indicators (5 for the economic, 3 for the social and 4 for the environmental dimension) measured on 26 EU countries in the period 2004–2018 (15 years), and apply group-based multivariate trajectory modeling to identify groups of countries with common trends of sustainability objectives. An expectation-maximization algorithm is proposed to perform maximum likelihood estimation from incomplete data without relying on an explicit imputation procedure. Our results highlight three groups of countries with distinct strong and weak sustainability objectives. Strong objectives common to all three groups include improvement of productivity, increase of personal income in rural areas, reduction of poverty in rural areas, increase of production of renewable energy, rise of organic farming and reduction of nitrogen balance. Instead, enhancement of manager turnover and reduction of greenhouse gas emissions are weak objectives common to all three groups of countries. Our findings represent a valuable resource for formulating new schemes for the attribution of subsidies within the Common Agricultural Policy (CAP). PubDate: 2022-03-05
Abstract: We present a data-driven approach to predict the next action in soccer. We focus on passing actions of the ball-possessing player and aim to forecast the pass itself and when, in time, the pass will be played. At the same time, our model estimates the probability that the player loses possession of the ball before she can perform the action. Our approach consists of parameterized exponential rate models for all possible actions that are adapted to historical data with graph recurrent neural networks to account for inter-dependencies of the output space (i.e., the possible actions). We report on empirical results. PubDate: 2022-03-02 DOI: 10.1007/s10182-022-00435-x
Abstract: Multivariate data are collected in many fields, such as chemometrics, econometrics, financial engineering and genetics. In multivariate data, heteroscedasticity and collinearity occur frequently, and selecting material predictors is also a key issue in the analysis. To accomplish these tasks, a multivariate linear regression model is often constructed. We thus propose a row-sparse, elastic-net-regularized multivariate Huber regression model in this paper. For this new model, we prove its grouping-effect property and its property of resisting sample outliers. Based on the KKT condition, an accelerated proximal sub-gradient algorithm is designed to solve the proposed model, and its convergence is established. To demonstrate the accuracy and efficiency, simulation and real-data experiments are carried out. The numerical results show that the new model can deal with heteroscedasticity and collinearity well. PubDate: 2022-03-01 DOI: 10.1007/s10182-021-00403-x
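One way to see the moving parts is a plain proximal-gradient iteration in Python: the smooth part is the Huber loss plus a ridge term, and the prox of the \(\ell _{2,1}\) penalty shrinks whole rows of the coefficient matrix, producing row sparsity. This is a simplified sketch of the model class, not the authors' accelerated algorithm; the step size and penalty values are illustrative.

```python
import numpy as np

def huber_grad(R, delta=1.345):
    """Elementwise gradient of the Huber loss at residuals R; large
    residuals get a bounded gradient, which resists outliers."""
    return np.where(np.abs(R) <= delta, R, delta * np.sign(R))

def row_soft_threshold(B, t):
    """Prox of t * sum_j ||B_j||_2: shrinks entire rows toward zero,
    so a predictor is dropped jointly for all responses (row sparsity)."""
    norms = np.maximum(np.linalg.norm(B, axis=1, keepdims=True), 1e-12)
    return B * np.maximum(1.0 - t / norms, 0.0)

def fit(X, Y, lam1=0.1, lam2=0.01, iters=500):
    """Simplified proximal gradient for Huber loss + lam1 * l_{2,1}
    + lam2 * ||B||_F^2 (a sketch, not the paper's accelerated solver)."""
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + 2 * lam2)  # 1 / Lipschitz
    for _ in range(iters):
        grad = -X.T @ huber_grad(Y - X @ B) / n + 2 * lam2 * B
        B = row_soft_threshold(B - step * grad, step * lam1)
    return B
```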
Abstract: The estimation of the long memory parameter d is a widely discussed issue in the literature. The harmonically weighted (HW) process was recently introduced for long memory time series with an unbounded spectral density at the origin. In contrast to the most famous fractionally integrated process, the HW approach does not require the estimation of the d parameter, yet it may be just as able to capture long memory as the fractionally integrated model, provided the sample size is not too large. Our contribution is a generalization of the HW model, termed the generalized harmonically weighted (GHW) process, which allows for an unbounded spectral density at \(k \ge 1\) frequencies away from the origin. The convergence in probability of the Whittle estimator is provided for the GHW process, along with a discussion of simulation methods. Fit and forecast performance is evaluated via an empirical application to paleoclimatic data. Our main conclusion is that the above generalization is able to model long memory as well as its classical competitor, the fractionally differenced Gegenbauer process, does. In addition, the GHW process does not require the estimation of the memory parameter, simplifying the issue of how to disentangle long memory from a (moderately persistent) short memory component. This leads to a clear advantage of our formulation over the fractional long memory approach. PubDate: 2022-03-01 DOI: 10.1007/s10182-021-00394-9
Abstract: Spatial price comparisons rely to a high degree on the quality of the underlying price data that are collected within or across countries. Below the basic heading level, these price data often exhibit large gaps. Therefore, stochastic index number methods like the Country–Product–Dummy (CPD) method and the Gini–Éltető–Köves–Szulc (GEKS) method are utilised for the aggregation of the price data into higher-level indices. Although the two index number methods produce differing price level estimates when prices are missing, the present paper demonstrates that both can be derived from exactly the same stochastic model. For a specific case of missing prices, it is shown that the formula underlying these price level estimates differs between the two methods only in weighting. The impact of missing prices on the efficiency of the price level estimates is analysed in two simulation studies. It can be shown that the CPD method slightly outperforms the GEKS method. Using micro data from Germany's Consumer Price Index, it can be observed that more narrowly defined products improve estimation efficiency. PubDate: 2022-03-01 DOI: 10.1007/s10182-021-00409-5
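The CPD method itself is a dummy-variable regression on log prices, which makes it easy to sketch in Python; missing prices simply contribute no rows to the design. The country and product labels below are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Long-format price data with gaps: each row is one observed price.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C", "C"],
    "product": ["p1", "p2", "p1", "p3", "p2", "p3", "p1"],
    "price":   [1.00, 2.10, 1.15, 3.00, 2.30, 3.20, 1.05],
})

# CPD: regress log prices on country and product dummies; the missing
# country-product combinations (e.g. A-p3) just drop out of the data.
fit = smf.ols("np.log(price) ~ C(country) + C(product)", data=df).fit()

# Price levels relative to the base country A: exponentiated coefficients.
print(np.exp(fit.params.filter(like="country")))
```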
Abstract: Finite mixtures of generalized linear models are commonly fitted by maximum likelihood via the EM algorithm. The estimation process and the subsequent inferential and classification procedures can be badly affected by the occurrence of outliers: contamination in the sample at hand may lead to severely biased fitted components and poor classification accuracy. In order to take the potential presence of outliers into account, a robust fitting strategy is proposed that is based on the weighted likelihood methodology. The technique exhibits satisfactory behavior in terms of both fitting and classification accuracy, as confirmed by numerical studies and real data examples. PubDate: 2022-03-01 DOI: 10.1007/s10182-021-00402-y
Abstract: Modeling human ratings data subject to raters' decision uncertainty is an attractive problem in applied statistics. In view of the complex interplay between emotion and decision making in rating processes, final raters' choices seldom reflect the true underlying raters' responses. Rather, they are imprecisely observed in the sense that they are subject to a non-random component of uncertainty, namely the decision uncertainty. The purpose of this article is to illustrate a statistical approach to analysing ratings data which integrates both random and non-random components of the rating process. In particular, beta fuzzy numbers are used to model raters' non-random decision uncertainty, and a variable dispersion beta linear model is adopted to model the random counterpart of rating responses. The main idea is to quantify characteristics of latent and non-fuzzy rating responses by means of random observations subject to fuzziness. To do so, a fuzzy version of the Expectation–Maximization algorithm is adopted to both estimate the model's parameters and compute their standard errors. Finally, the characteristics of the proposed fuzzy beta model are investigated by means of a simulation study as well as two case studies from behavioral and social contexts. PubDate: 2022-03-01 DOI: 10.1007/s10182-021-00407-7
Abstract: The development and application of models that take the evolution of network dynamics into account are receiving increasing attention. We contribute to this field and focus on a profile likelihood approach to model time-stamped event data for a large-scale dynamic network. We investigate the collaboration of inventors using EU patent data. As the event we consider the submission of a joint patent, and we explore the driving forces for collaboration between inventors. We propose a flexible semiparametric model which includes external and internal covariates, where the latter are built from the network history. PubDate: 2022-03-01 DOI: 10.1007/s10182-021-00393-w
Abstract: The generalized method of moments (GMM) is an important estimation procedure in many areas of economics and finance, and it is well known that this estimation is highly sensitive to the presence of outliers and influential observations. Case-deletion diagnostics have been studied in GMM estimation; surprisingly, however, local influence analysis remains underexplored. To this end, a local influence method is proposed to assess the effect of minor perturbations on GMM estimation. The local diagnostic measures of GMM estimators under perturbations of the empirical distribution and of the moment conditions are derived to study the joint influence of observations. The obtained results are applied to efficient instrumental variable estimation and to a dynamic panel data model. Two real data sets are used for illustration, and a simulation study is conducted to examine the effectiveness of the proposed methodology. The advantage of the local influence method is analyzed in detail through comparison with the case-deletion method. PubDate: 2022-03-01 DOI: 10.1007/s10182-021-00398-5
Abstract: Prediction of quantiles at extreme tails is of interest in numerous applications. Extreme value modelling provides various competing predictors for this point prediction problem. A common method of assessment of a set of competing predictors is to evaluate their predictive performance in a given situation. However, due to the extreme nature of this inference problem, the predicted quantiles may never have been seen in the historical records, particularly when the sample size is small. This situation poses a problem for validating the prediction against its realization. In this article, we propose two non-parametric scoring approaches to assess extreme quantile prediction mechanisms. The proposed assessment methods are based on predicting a sequence of equally extreme quantiles on different parts of the data. We then use the quantile scoring function to evaluate the competing predictors. The performance of the scoring methods is compared with that of the conventional scoring method, and the superiority of the former is demonstrated in a simulation study. The methods are then applied to analyze cyber Netflow data from Los Alamos National Laboratory and daily precipitation data at a station in California available from the Global Historical Climatology Network. PubDate: 2022-02-14 DOI: 10.1007/s10182-021-00421-9
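The quantile scoring function used to rank predictors is short enough to state in code. A Python sketch of the standard pinball score follows (lower average score is better); the data and the competing predictors below are illustrative.

```python
import numpy as np

def quantile_score(y, q_pred, tau):
    """Pinball (quantile) score: the standard consistent scoring function
    for the tau-quantile; a lower average score means a better predictor."""
    u = y - q_pred
    return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))

# Compare two competing extreme-quantile predictors at tau = 0.99.
rng = np.random.default_rng(0)
y = rng.standard_t(df=3, size=10_000)
true_q = np.quantile(y, 0.99)
print(quantile_score(y, true_q, 0.99))        # well-calibrated predictor
print(quantile_score(y, 0.5 * true_q, 0.99))  # a too-low prediction scores worse
```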