Statistical Papers
Journal Prestige (SJR): 1.004
Citation Impact (citeScore): 1
Hybrid journal (it can contain Open Access articles)
ISSN (Print) 0932-5026 - ISSN (Online) 1613-9798
Published by Springer-Verlag
• Bivariate densities in Bayes spaces: orthogonal decomposition and spline representation

Abstract: A new orthogonal decomposition for bivariate probability densities embedded in Bayes Hilbert spaces is derived. It allows a density to be decomposed into independent and interactive parts, the former being built as the product of revised definitions of marginal densities, and the latter capturing the dependence between the two random variables being studied. The developed framework opens new perspectives for dependence modelling (e.g., through copulas), and allows the analysis of datasets of bivariate densities from a Functional Data Analysis perspective. A spline representation for bivariate densities is also proposed, providing a computational cornerstone for the developed theory.
PubDate: 2022-09-22

• Simultaneous prediction using target function based on principal components estimator with correlated errors

Abstract: Prediction is pivotal in linear regression analysis, especially in applied sciences. Before the target function was defined, the predictions of actual values and/or average values of the dependent variable were obtained individually rather than simultaneously. However, in many applied studies, obtaining simultaneous predictions of both the average values and the actual values is more appropriate. In this paper, simultaneous prediction based on the principal components estimator with correlated errors in the linear regression model is considered by utilizing the target function, under multicollinearity, which has negative effects on prediction. We define three new predictors and compare the proposed predictors theoretically by using the mean squared error of prediction. We also support the theoretical findings with a comprehensive simulation study and two numerical examples.
PubDate: 2022-09-14

• Optimal equivalence testing in exponential families

Abstract: We develop a uniformly most powerful unbiased (UMPU) two-sample equivalence test for a difference of canonical parameters in exponential families. This development involves a non-unique reparametrization. We address this issue via a novel characterization of all possible reparametrizations of interest in terms of a matrix group. Furthermore, our procedure involves an intractable conditional distribution, which we reproduce to a high degree of accuracy using saddlepoint approximations. The development of this saddlepoint-based procedure involves a non-unique reparametrization, but we show that our procedure is invariant under the choice of reparametrization. Our real data example considers the mean-to-variance ratio for normally distributed data. We compare our result to six competing equivalence testing procedures for the mean-to-variance ratio. Only our UMPU method finds evidence of equivalence, which is the expected result. We also perform a Monte Carlo simulation study which shows that our UMPU method outperforms all competing methods by exhibiting an empirical significance level which is not statistically significantly different from the nominal 5% level for all simulation settings.
PubDate: 2022-09-11

• Assessing the effectiveness of indirect questioning techniques by detecting liars

Abstract: In many fields of applied research, mostly in sociological, economic, demographic and medical studies, misreporting due to untruthful responding is a nonsampling error that occurs frequently, especially when survey participants are presented with direct questions about sensitive, highly personal or embarrassing issues. Untruthful responses are likely to affect the overall quality of the collected data and flaw subsequent analyses, including the estimation of salient characteristics of the population under study such as the prevalence of people possessing a sensitive attribute. The problem may be mitigated by adopting indirect questioning techniques which guarantee privacy protection and enhance respondent cooperation. In this paper, making use of direct and indirect questions, we propose a procedure to detect the presence of liars in sensitive surveys which allows researchers to evaluate the impact of untruthful responses on the estimation of the prevalence of a sensitive attribute. We first introduce the theoretical framework, then apply the proposal to the Warner randomized response method, the unrelated question model, the item count technique, the crosswise model and the triangular model. To assess the effectiveness of the procedure, a simulation study is carried out. Finally, the presence and the number of liars are discussed in two real studies concerning racism and workplace mobbing.
PubDate: 2022-09-06
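
As a point of reference for the Warner randomized response method named in the abstract above, the standard textbook estimator of the prevalence can be sketched as follows. This is only an illustration of the classical design, not the liar-detection procedure proposed in the paper; the function name `warner_estimate`, its parameters and the example numbers are assumptions made for this sketch.

```python
def warner_estimate(yes_count, n, p):
    """Estimate the prevalence pi of a sensitive attribute under Warner's design.

    Each respondent answers the sensitive statement with probability p and its
    negation with probability 1 - p, so P(yes) = p*pi + (1 - p)*(1 - pi).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if p == 0.5:
        raise ValueError("p = 0.5 makes pi unidentifiable")
    lam_hat = yes_count / n  # observed proportion of "yes" answers
    # Solve lam = p*pi + (1 - p)*(1 - pi) for pi.
    return (lam_hat - (1 - p)) / (2 * p - 1)


# Hypothetical numbers: 380 "yes" answers out of 1000, design probability 0.7.
print(round(warner_estimate(380, 1000, 0.7), 3))
```

With these hypothetical inputs the estimate is approximately 0.2. The estimate can fall outside [0, 1] in small samples, which is a known feature of this moment-type estimator.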

• Non-asymptotic analysis and inference for an outlyingness induced winsorized mean

Abstract: Robust estimation of a mean vector, a topic regarded as obsolete in the traditional robust statistics community, has surged in the machine learning literature in the last decade. The latest focus is on the sub-Gaussian performance and computability of the estimators in a non-asymptotic setting. Numerous traditional robust estimators are computationally intractable, which partly contributes to the renewal of interest in robust mean estimation. Robust centrality estimators, however, include the trimmed mean and the sample median. The latter has the best robustness but suffers from low efficiency. The trimmed mean and the median of means, which achieve sub-Gaussian performance, have been proposed and studied in the literature. This article investigates the robustness of leading sub-Gaussian estimators of the mean and reveals that none of them can resist more than $$25\%$$ contamination in the data. It consequently introduces an outlyingness induced winsorized mean which has the best possible robustness (it can resist up to $$50\%$$ contamination without breakdown) while achieving high efficiency. Furthermore, it has sub-Gaussian performance for uncontaminated samples and a bounded estimation error for contaminated samples at a given confidence level in a finite sample setting. It can be computed in linear time.
PubDate: 2022-09-05
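
To make two of the estimators mentioned in the abstract above concrete, here is a minimal univariate sketch of a classical winsorized mean and of the median of means. It does not implement the outlyingness induced winsorized mean proposed in the paper; the function names, the symmetric clipping count `k` and the interleaved block split are assumptions chosen for brevity.

```python
import statistics


def winsorized_mean(xs, k):
    """Classical symmetric winsorized mean: clip the k smallest and k largest
    values to the nearest retained order statistic, then average."""
    s = sorted(xs)
    n = len(s)
    if not 0 <= 2 * k < n:
        raise ValueError("need 0 <= 2*k < len(xs)")
    lo, hi = s[k], s[n - 1 - k]
    return statistics.fmean([min(max(x, lo), hi) for x in s])


def median_of_means(xs, n_blocks):
    """Split the sample into n_blocks groups, average each group, and return
    the median of the group means. The interleaved split assumes the order of
    the observations carries no information."""
    if not 1 <= n_blocks <= len(xs):
        raise ValueError("need 1 <= n_blocks <= len(xs)")
    blocks = [xs[i::n_blocks] for i in range(n_blocks)]
    return statistics.median([statistics.fmean(b) for b in blocks])


sample = [1, 2, 3, 4, 5, 6, 7, 8, 1000]  # one gross outlier
print(statistics.fmean(sample), winsorized_mean(sample, 1), median_of_means(sample, 3))
```

In this toy sample the single outlier dominates the ordinary mean, while both robust summaries stay close to the bulk of the data.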

• Divergence-based tests for the bivariate gamma distribution applied to

Abstract: The use of polarimetric synthetic aperture radar (PolSAR) is one of the most successful tools for solving remote sensing problems. The multidimensional speckle noise encountered in the acquisition of these images is the main challenge for PolSAR users. Therefore, tailored processing of PolSAR images is required, especially for the use of hypothesis testing in change detection. In this paper, we use McKay’s bivariate gamma distribution (MBG) to describe a joint distribution resulting from two components of the total scattering power image (SPAN). We derive closed form expressions for the MBG Kullback–Leibler and Rényi divergences between SPAN-based random pairs. We provide new two-sample divergence-based hypothesis tests and evaluate their performance using Monte Carlo experiments. Finally, we apply the new tests to real PolSAR images to evaluate the changes caused by urbanization processes in the Los Angeles and California regions. The results show that our proposals are able to detect changes in PolSAR images.
PubDate: 2022-08-30

• Statistical analysis and first-passage-time applications of a lognormal diffusion process with multi-sigmoidal logistic mean

Abstract: We consider a lognormal diffusion process having a multisigmoidal logistic mean, useful for modelling the evolution of a population which reaches its maximum level of growth after many stages. Regarding statistical inference, two procedures for finding the maximum likelihood estimates of the unknown parameters are described. One is based on solving the system of equations for the critical points of the likelihood function, and the other on maximizing the likelihood function with the simulated annealing algorithm. A simulation study to validate the described strategies for finding the estimates is also presented, with a real application to epidemiological data. Special attention is also devoted to the first-passage-time problem of the considered diffusion process through a fixed boundary.
PubDate: 2022-08-27

• Computing waiting time probabilities related to $$(k_{1},k_{2},\ldots ,k_{l})$$ pattern

Abstract: For a sequence of multi-state trials with l possible outcomes denoted by $$\left\{ 1,2,\ldots ,l\right\}$$, let E be the event that at least $$k_{1}$$ consecutive 1s are followed by at least $$k_{2}$$ consecutive 2s, ..., followed by at least $$k_{l}$$ consecutive $$l$$s. Denote by $$T_{r}$$ the number of trials for the rth occurrence of the event E in a sequence of multi-state trials. This paper studies the distribution of the waiting time random variable $$T_{r}$$ when the sequence consists of independent and identically distributed multi-state trials. In particular, distributional properties of $$T_{r}$$ are examined via matrix-geometric distributions.
PubDate: 2022-08-13
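
The event E described in the abstract above can be illustrated with a small scanner that returns the trial index of its first occurrence, plus a Monte Carlo estimate of the mean of $$T_{1}$$. This is a sketch under one reading of the definition (each of the l runs is a maximal run, and E is counted as soon as the final run reaches length $$k_{l}$$); it does not use the matrix-geometric approach of the paper, and all function names and defaults are assumptions.

```python
import random


def first_occurrence_time(trials, ks):
    """Return the 1-based index of the trial at which the pattern
    (>= ks[0] ones, then >= ks[1] twos, ..., then >= ks[-1] l's) is first
    completed, or None if it never occurs in `trials`."""
    l = len(ks)
    runs = []  # run-length encoding of the sequence seen so far: [symbol, length]
    for n, x in enumerate(trials, start=1):
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1
        else:
            runs.append([x, 1])
        if len(runs) >= l:
            tail = runs[-l:]
            # The last l runs must be the symbols 1..l in order, each long enough.
            if all(sym == j + 1 and length >= ks[j]
                   for j, (sym, length) in enumerate(tail)):
                return n
    return None


def simulate_mean_waiting_time(ks, probs, n_rep=2000, max_len=10000, seed=0):
    """Monte Carlo estimate of E[T_1] for i.i.d. trials with outcome
    probabilities `probs` (one per symbol 1..l)."""
    rng = random.Random(seed)
    symbols = list(range(1, len(ks) + 1))
    times = []
    for _ in range(n_rep):
        stream = (rng.choices(symbols, weights=probs)[0] for _ in range(max_len))
        t = first_occurrence_time(stream, ks)
        if t is not None:
            times.append(t)
    return sum(times) / len(times)


print(first_occurrence_time([2, 1, 1, 2, 2], (2, 2)))
print(simulate_mean_waiting_time((1, 1), (0.5, 0.5)))
```

For the simplest case `ks = (1, 1)` with equal probabilities, the waiting time is the sum of two geometric waiting times, so the simulated mean should be close to 4.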

• Construction of orthogonal general sliced Latin hypercube designs

Abstract: Computer experiments have attracted increasing attention in recent decades. General sliced Latin hypercube design (LHD), which is a sliced LHD with multiple layers and at each layer of which each slice can be further divided into smaller LHDs at the above layer, is widely applied in computer experiments with qualitative and quantitative factors, multiple model experiments, cross-validation, and stochastic optimization. Orthogonality is an important property for LHDs. Methods for constructing orthogonal and nearly orthogonal general sliced LHDs are put forward for the first time in this paper, where orthogonal designs and structural vectors are used in the constructions. The resulting designs not only possess orthogonality in the whole designs, but also achieve orthogonality in each layer before and after being collapsed. Furthermore, based on different structural vectors, the methods can be easily extended to construct orthogonal LHDs with some desired sliced or nested structures.
PubDate: 2022-08-12
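
For readers unfamiliar with the basic objects in the abstract above, the following sketch shows what an ordinary (unsliced) Latin hypercube design and the column-orthogonality property look like. It does not reproduce the paper's construction of general sliced LHDs; the function names and the small 4-run example are assumptions for illustration.

```python
import random


def random_lhd(n_runs, n_factors, seed=None):
    """Random Latin hypercube design: every column is a permutation of 1..n_runs."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        col = list(range(1, n_runs + 1))
        rng.shuffle(col)
        columns.append(col)
    return [list(row) for row in zip(*columns)]  # rows are runs


def is_latin_hypercube(design):
    """Check that each column uses each level 1..n exactly once."""
    n = len(design)
    return all(sorted(col) == list(range(1, n + 1)) for col in zip(*design))


def is_orthogonal(design, tol=1e-12):
    """Check that every pair of centred columns has zero inner product."""
    n = len(design)
    centre = (n + 1) / 2
    cols = [[v - centre for v in col] for col in zip(*design)]
    return all(
        abs(sum(a * b for a, b in zip(cols[i], cols[j]))) < tol
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )


example = [[1, 2], [2, 4], [3, 1], [4, 3]]  # a 4-run, 2-factor orthogonal LHD
print(is_latin_hypercube(example), is_orthogonal(example))
print(is_latin_hypercube(random_lhd(8, 3, seed=1)))
```

A randomly generated LHD always passes the Latin property check but is usually not orthogonal, which is why dedicated constructions are of interest.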

• Compositional cubes: a new concept for multi-factorial compositions

Abstract: Compositional data are commonly known as multivariate observations carrying relative information. Even though the case of vector or even two-factorial compositional data (compositional tables) is already well described in the literature, there is still a need for a comprehensive approach to the analysis of multi-factorial relative-valued data. Therefore, this contribution builds around the current knowledge about compositional data a general theoretical framework for k-factorial compositional data. As a main finding it turns out that, similar to the case of compositional tables, also the multi-factorial structures can be orthogonally decomposed into an independent and several interactive parts and, moreover, a coordinate representation allowing for their separate analysis by standard analytical methods can be constructed. For the sake of simplicity, these features are explained in detail for the case of three-factorial compositions (compositional cubes), followed by an outline covering the general case. The three-dimensional structure is analyzed in depth in two practical examples, dealing with systems of spatial and time dependent compositional cubes. The methodology is implemented in the R package robCompositions.
PubDate: 2022-08-11

• Limiting distributions of the likelihood ratio test statistics for independence of normal random vectors

Abstract: Consider the likelihood ratio test (LRT) statistics for the independence of sub-vectors from a p-variate normal random vector. We derive the limiting distributions of the LRT statistics based on a random sample of size n. It is well known that the limit is a chi-square distribution when the dimension of the data or the number of parameters is fixed. In a recent work by Qi et al. (Ann Inst Stat Math 71:911–946, 2019), it was shown that the LRT statistics are asymptotically normal under the condition that the lengths of the normal random sub-vectors are relatively balanced if the dimension p goes to infinity with the sample size n. In this paper, we investigate the limiting distributions of the LRT statistic under general conditions. We find all types of limiting distributions and obtain the necessary and sufficient conditions for the LRT statistic to converge to a normal distribution when p goes to infinity. We also investigate the limiting distribution of the adjusted LRT test statistic proposed in Qi et al. (2019). Moreover, we present simulation results to compare the performance of the classical chi-square approximation, normal and non-normal approximations to the LRT statistics, the chi-square approximation to the adjusted test statistic, and some other test statistics.
PubDate: 2022-08-09

• Seemingly unrelated clusterwise linear regression for contaminated data

Abstract: Clusterwise regression is an approach to regression analysis based on finite mixtures which is generally employed when sample observations come from a population composed of several unknown sub-populations. Whenever the response is continuous, Gaussian clusterwise linear regression models are usually employed. Such models have been recently robustified with respect to the possible presence of mild outliers in the sub-populations. However, in some fields of research, especially in the modelling of multivariate economic data or data from the social sciences, there may be prior information on the specific covariates to be considered in the linear term employed in the prediction of a certain response. As a consequence, covariates may not be the same for all responses. Thus, a novel class of multivariate Gaussian linear clusterwise regression models is proposed. This class provides an extension to mixture-based regression analysis for modelling multivariate and correlated responses in the presence of mild outliers that leaves the researcher free to use a different vector of covariates for each response. Details about the model identification and maximum likelihood estimation via an expectation-conditional maximisation algorithm are given. The performance of the new models is studied by simulation in comparison with other clusterwise linear regression models. A comparative evaluation of their effectiveness and usefulness is provided through the analysis of a real dataset.
PubDate: 2022-08-06

• Correction to: Testing convexity of the generalised hazard function

Abstract: A Correction to this paper has been published: 10.1007/s00362-021-01273-w
PubDate: 2022-08-01

• Tests for heteroskedasticity in transformation models

Abstract: We consider a model whereby a given response variable Y following a transformation $${{\mathcal {Y}}}:=\mathcal {T}(Y)$$, satisfies some classical regression equation. In this transformation model the form of the transformation is specified analytically but incorporates an unknown transformation parameter. We develop testing procedures for the null hypothesis of homoskedasticity for versions of this model where the regression function is considered either known or unknown. The test statistics are formulated on the basis of Fourier-type conditional contrasts of a variance computed under the null hypothesis against the same quantity computed under alternatives. The limit null distribution of the test statistic is studied, as well as the behaviour of the test criterion under alternatives. Since the limit null distribution is complicated, a bootstrap version is suggested in order to actually carry out the test procedures. Monte Carlo results are included that illustrate the finite-sample properties of the new method. The applicability of the new tests on real data is also illustrated.
PubDate: 2022-08-01

• Portmanteau tests for generalized integer-valued autoregressive time series models

Abstract: In recent years, integer-valued time series have attracted the attention of researchers and found applications in data analysis. Among various models, the integer-valued autoregressive (INAR) ones are of great popularity and are widely applied in practice. This paper develops some portmanteau test statistics to check the adequacy of the fitted model in a wide group of INAR processes, called generalized INAR. For this purpose, the asymptotic distributions of the test statistics are obtained and, using Monte Carlo simulation studies, their finite sample properties are derived. In addition, the results are applied to the analysis of a real data example.
PubDate: 2022-08-01

• Confidence intervals with higher accuracy for short and long-memory linear processes

Abstract: In this paper an easy-to-implement method of stochastically weighting short and long-memory linear processes is introduced. The method renders asymptotically exact size confidence intervals for the population mean which are significantly more accurate than their classic counterparts for each fixed sample size n. It is illustrated both theoretically and numerically that the randomization framework of this paper produces randomized (asymptotic) pivotal quantities for the mean, which admit central limit theorems with smaller magnitudes of error as compared to those of their leading classic counterparts. An Edgeworth expansion result for randomly weighted linear processes whose innovations do not necessarily satisfy the Cramer condition is established. Numerical illustrations and applications to real world data are also included.
PubDate: 2022-08-01

• Truncating the exponential with a uniform distribution

Abstract: For a sample of Exponentially distributed durations we aim at point estimation and a confidence interval for its parameter. A duration is only observed if it has ended within a certain time interval, determined by a Uniform distribution. Hence, the data is a truncated empirical process that we can approximate by a Poisson process when only a small portion of the sample is observed, as is the case for our applications. We derive the likelihood from standard arguments for point processes, acknowledging the size of the latent sample as the second parameter, and derive the maximum likelihood estimator for both. Consistency and asymptotic normality of the estimator for the Exponential parameter are derived from standard results on M-estimation. We compare the design with a simple random sample assumption for the observed durations. Theoretically, the derivative of the log-likelihood is less steep in the truncation design for small parameter values, indicating a larger computational effort for root finding and a larger standard error. In applications from the social and economic sciences and in simulations, we indeed find a moderately increased standard error when acknowledging truncation.
PubDate: 2022-08-01

• Quantile correlation coefficient: a new tail dependence measure

Abstract: A quantile correlation coefficient is newly defined as the geometric mean of two quantile regression slopes (that of X on Y and that of Y on X), in the same way that the Pearson correlation coefficient is related to regression coefficients. The quantile correlation is a measure of overall sensitivity of a conditional quantile of a random variable to changes in the other variable. The proposed quantile correlation can be compared across different tails within a given distribution to provide meaningful interpretations, for example, that there is stronger dependence in the left tail than overall. It can also be compared with the Pearson correlation. Neither of these two kinds of comparability within a given distribution is enabled by the existing tail-dependence correlation measures. Moreover, a test for differences in the quantile correlations at different tails is proposed. The asymptotic normality of the estimated quantile correlation and the null distribution of the proposed test are established and are well supported by a Monte Carlo study. The proposed quantile correlation methods are illustrated by an analysis of stock return price data sets, yielding a clear indication of stronger left-tail dependence than overall dependence and stronger overall dependence than right-tail dependence.
PubDate: 2022-08-01
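
The abstract above builds on the classical fact that the Pearson correlation is, up to sign, the geometric mean of the two least-squares regression slopes. The sketch below checks that identity numerically on made-up data; it covers only this Pearson analogue, not the quantile regression slopes or the quantile correlation defined in the paper, and the function names and data are assumptions.

```python
import math


def _centered_sums(x, y):
    """Return (Sxy, Sxx, Syy), the centred cross-product and sums of squares."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def ols_slope(x, y):
    """Least-squares slope of the regression of y on x."""
    sxy, sxx, _ = _centered_sums(x, y)
    return sxy / sxx


def pearson(x, y):
    """Pearson correlation coefficient."""
    sxy, sxx, syy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def slope_geometric_mean(x, y):
    """Signed geometric mean of the slope of y on x and the slope of x on y."""
    b_yx = ols_slope(x, y)
    b_xy = ols_slope(y, x)
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)


x = [1, 2, 3, 4, 5]  # illustrative data only
y = [2, 1, 4, 3, 6]
print(pearson(x, y), slope_geometric_mean(x, y))
```

The two printed values agree up to floating-point error because both slopes share the same sign and their product equals the squared correlation.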

• Testing high-dimensional mean vector with applications

Abstract: A centered $$L^2$$-norm based test statistic is used for testing if a high-dimensional mean vector equals zero where the data dimension may be much larger than the sample size. Inspired by the fact that under some regularity conditions the asymptotic null distributions of the proposed test are the same as the limiting distributions of a chi-square-mixture, a three-cumulant matched chi-square-approximation is suggested to approximate this null distribution. The asymptotic power of the proposed test under a local alternative is established and the effect of data non-normality is discussed. A simulation study under various settings demonstrates that in terms of size control, the proposed test performs significantly better than some existing competitors. Several real data examples are presented to illustrate the wide applicability of the proposed test to a variety of high-dimensional data analysis problems, including the one-sample problem, paired two-sample problem, and MANOVA for correlated samples or independent samples.
PubDate: 2022-08-01

• Properties of individual differences scaling and its interpretation

Abstract: Indscal models consider symmetric matrices $$\varvec{B}_{k}=\varvec{X}\varvec{W}_{k}\varvec{X}'$$ for $$k = 1, \ldots , K$$, where $$\varvec{X}: n \times R$$ is a compromise matrix termed the group average and $$\varvec{W}_{k}$$ is a diagonal matrix of weights given by the kth individual to the R columns of $$\varvec{X}$$, with R specified in advance; non-negative weights are preferred and usually $$R < n$$. We propose a new two-phase alternating least squares (ALS) algorithm, which emphasizes the two main components (group average and weighting parameters) of the Indscal model and specifically helps with the interpretation of the model. Furthermore, it has thrown new light on the properties of the converged solution that would be satisfied by any algorithm that minimizes the basic Indscal criterion: $$\min \sum _{k=1}^{K}\Vert \varvec{B}_{k}-\varvec{X}\varvec{W}_{k}\varvec{X}'\Vert ^{2}$$, where the minimization is over $$\varvec{X}$$ and the $$\varvec{W}_{k}$$. The new algorithm has also proved to be a useful tool in unravelling the algebraic understanding of the role played by parameter constraints and their interpretation in variants of the Indscal model. The proposed analysis focusses on Indscal but the approach may be of more widespread interest, especially in the field of multidimensional data analysis. A major issue is that simultaneous least-squares estimates of the parameters may be found without imposing constraints. However, group average and individual weighting parameters may not be estimated uniquely without imposing some subjective constraint that could encourage misleading interpretations. We encourage the use of the linear constraints $$\sum _{k=1}^{K}\varvec{1'W}_{k}= \varvec{1'}$$, as they enable a comparison of the weights obtained (i) within group k and (ii) between the same item drawn from two or more groups. However, it is easy to exchange one system of constraints for another in a post- or pre-analysis.
The new two-phase ALS algorithm (i) computes for fixed $$\varvec{X}: n \times R$$ the weights $$\varvec{W}_{k}$$ subject to $$\sum _{k=1}^{K}\varvec{1'W}_{k}= \varvec{1'}$$, and then (ii) keeping $$\varvec{W}_{k}$$ fixed, it updates $$\varvec{X}$$. At convergence, the estimates of $$\varvec{X}: n \times R$$ and the $$\varvec{W}_{k}$$ will apply to all algorithms that minimize the Indscal criterion. Furthermore, we show that only at convergence does an analysis-of-variance property hold on the demarcation region between over- and under-fitting. When the analysis-of-variance is valid, its validity extends over the whole matrix domain, over trace operations, and to individual matrix elements. The optimization process is unusual in that optima and local optima occur on the edges of what seem to be closely related to Heywood cases in factor analysis.
PubDate: 2022-08-01
