Computational Statistics
Journal Prestige (SJR): 0.803 · CiteScore: 1 · Followers: 15
Hybrid journal (may contain Open Access articles)
ISSN: 0943-4062 (Print) · 1613-9658 (Online)
Published by Springer-Verlag
• A new non-archimedean metric on persistent homology

Abstract: In this article, we define a new non-archimedean metric structure, called the cophenetic metric, on persistent homology classes of all degrees. We then show that zeroth persistent homology together with the cophenetic metric, and hierarchical clustering algorithms with a number of different metrics, deliver statistically verifiable and commensurate topological information, based on experimental results obtained on different datasets. We also observe that the clusters obtained from the cophenetic distance perform well in terms of different evaluation measures such as the silhouette score and the Rand index. Moreover, since the cophenetic metric is defined for all homology degrees, the inter-relations of persistent homology classes in all degrees can now be displayed via rooted trees.
PubDate: 2022-09-01
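The cophenetic construction above comes from hierarchical clustering. As background (not the paper's persistent-homology version), here is a minimal SciPy sketch of the classical cophenetic distance, together with a check of the strong (non-archimedean) triangle inequality it satisfies; the dataset and linkage choice are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform

# Two well-separated Gaussian blobs as a toy dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(3.0, 0.3, (10, 2))])

d = pdist(X)                       # pairwise Euclidean distances
Z = linkage(d, method="average")   # agglomerative clustering
corr, coph = cophenet(Z, d)        # cophenetic correlation and distances
D = squareform(coph)

# The cophenetic distance is an ultrametric: it satisfies the strong
# (non-archimedean) triangle inequality d(i,k) <= max(d(i,j), d(j,k)).
m = len(X)
ok = all(D[i, k] <= max(D[i, j], D[j, k]) + 1e-9
         for i in range(m) for j in range(m) for k in range(m))
print(ok)  # True
```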

• Modified minimum distance estimators: definition, properties and
applications

Abstract: Estimating the location and scale parameters of a distribution is one of the most crucial issues in statistics. Various estimators have therefore been proposed, such as maximum likelihood, method of moments and minimum distance estimators (e.g. the Cramér-von Mises (CvM) and Anderson-Darling (AD) estimators). However, in most cases, estimators of the location parameter $$\mu$$ and scale parameter $$\sigma$$ cannot be obtained in closed form because of the nonlinear function(s) included in the corresponding estimating equations. Numerical methods are therefore used to obtain the estimates, but these may suffer from drawbacks such as multiple roots, convergence to wrong roots, and non-convergence of iterations. In this study, we adopt the idea of Tiku (Biometrika 54:155–165, 1967) within the CvM and AD methodologies with the intent of eliminating these difficulties and obtaining closed-form estimators of the parameters $$\mu$$ and $$\sigma$$ . The resulting estimators are called modified CvM (MCvM) and modified AD (MAD), respectively. The proposed estimators are expressed as functions of the sample observations, so their calculation is straightforward; this also avoids the computational cost of iteration. A Monte Carlo simulation study is conducted to compare the efficiencies of the CvM and AD estimators with their modified counterparts, i.e. the MCvM and MAD, for the normal, extreme value and Weibull distributions as an illustration. Real data sets are used to show the implementation of the proposed estimation methodologies.
PubDate: 2022-09-01
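For contrast with the closed-form modified estimators above, a minimal numerical minimum-distance fit can be sketched as follows: it minimizes the classical Cramér-von Mises statistic for a normal location-scale model with a general-purpose optimizer (the iterative approach whose pitfalls the paper aims to avoid). The simulated data and starting values are illustrative.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.sort(rng.normal(loc=5.0, scale=2.0, size=200))
n = len(x)
pp = (2 * np.arange(1, n + 1) - 1) / (2 * n)   # plotting positions

def cvm(params):
    """Cramér-von Mises distance between the model CDF and the EDF."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    return 1 / (12 * n) + np.sum((norm.cdf(x, mu, sigma) - pp) ** 2)

res = minimize(cvm, x0=[np.mean(x), np.std(x)], method="Nelder-Mead")
mu_hat, sigma_hat = res.x
print(round(mu_hat, 2), round(sigma_hat, 2))
```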

• New classes of tests for the Weibull distribution using Stein’s method
in the presence of random right censoring

Abstract: We develop two new classes of tests for the Weibull distribution based on Stein’s method. The proposed tests are applied in the full sample case as well as in the presence of random right censoring. We investigate the finite sample performance of the new tests using a comprehensive Monte Carlo study. In both the absence and presence of censoring, it is found that the newly proposed classes of tests outperform competing tests against the majority of the distributions considered. In the cases where censoring is present we consider various censoring distributions. Some remarks on the asymptotic properties of the proposed tests are included. We present another result of independent interest: a test initially proposed for use with full samples is amended to allow for testing for the Weibull distribution in the presence of censoring. The techniques developed in the paper are illustrated using two practical examples.
PubDate: 2022-09-01
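The Stein-based test statistics themselves are not reproduced here; as a simpler baseline for the same full-sample problem, one can fit a Weibull by maximum likelihood and apply a naive Kolmogorov-Smirnov test (keeping in mind that p-values are anti-conservative when parameters are estimated from the data). All parameter values are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.weibull(1.5, size=300) * 2.0   # Weibull with shape 1.5, scale 2.0

# Maximum likelihood fit of a two-parameter Weibull (location fixed at 0).
shape, loc, scale = stats.weibull_min.fit(x, floc=0)

# Naive KS test against the fitted distribution. With estimated parameters
# the p-value is anti-conservative; a parametric bootstrap would be needed
# for a properly calibrated test.
ks = stats.kstest(x, "weibull_min", args=(shape, loc, scale))
print(round(shape, 2), round(scale, 2))
```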

• A computationally efficient approach to estimating species richness and
rarefaction curve

Abstract: In ecological and educational studies, estimators of the total number of species and of the rarefaction curve based on empirical samples are important tools. We propose a new method to estimate both the rarefaction curve and the number of species using a ready-made numerical approach, namely quadratic optimization. The key idea of the proposed algorithm is nonparametric empirical Bayes estimation incorporating an interpolated rarefaction curve through quadratic optimization with linear constraints, based on the g-modeling of Efron (Stat Sci 29:285–301, 2014). The proposed algorithm is easily implemented and performs better than existing methods in terms of computational speed and accuracy. Furthermore, we provide a model selection criterion for choosing the tuning parameters in the estimation procedure, and construct confidence intervals based on asymptotic theory rather than resampling. We present asymptotic results for our estimator to validate its efficiency theoretically. A broad range of numerical studies, including simulations and real data examples, is also conducted, and the gains the method produces are compared to existing methods.
PubDate: 2022-09-01
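The proposed empirical-Bayes/quadratic-optimization estimator is not reproduced here; for orientation, a sketch of the classical quantities involved: the hypergeometric rarefaction curve and the Chao1 lower bound for species richness, computed on a toy abundance vector.

```python
import numpy as np
from math import comb

# Toy abundance vector: number of individuals sampled per observed species.
counts = np.array([50, 30, 20, 10, 5, 3, 2, 1, 1, 1])
n = int(counts.sum())

def rarefaction(counts, m):
    """Expected number of species in a random subsample of size m
    (classical hypergeometric rarefaction formula)."""
    n = int(counts.sum())
    return sum(1 - comb(n - int(c), m) / comb(n, m) for c in counts)

# Chao1 lower-bound estimator of total species richness:
# S_obs + f1^2 / (2 * f2), with f1 singletons and f2 doubletons.
f1 = int((counts == 1).sum())
f2 = int((counts == 2).sum())
chao1 = len(counts) + f1 ** 2 / (2 * f2)
print(rarefaction(counts, n), chao1)  # 10.0 14.5
```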

• Statistical modeling of directional data using a robust hierarchical von
Mises distribution model: perspectives for wind energy

Abstract: For describing wind direction, a variety of statistical distributions have been suggested that provide information about the wind regime at a particular location and aid the development of efficient wind energy generation. In this paper a systematic approach for data classification, putting special emphasis on von Mises mixtures, is presented. A von Mises mixture model is broad enough to cover, on the one hand, symmetry and asymmetry, and on the other hand, unimodality and multimodality of circular data. We develop an improved mathematical model of the classical von Mises mixture method that rests on a number of principles which give it internal coherence and originality. Our hierarchical model of von Mises distributions is flexible enough to precisely model complex directional data sets. We define a new specific expectation-maximization (S-EM) algorithm for estimating the parameters of the model. Simulations showed that a satisfactory fit of complex directional data could be obtained (error generally < 1%). Furthermore, the Bayesian Information Criterion is used to judge the goodness of fit and the suitability of this model versus common distributions found in the literature. The findings show that our hierarchical model of von Mises distributions is well suited for modeling complex directional data with several modes and/or prevailing directions.
PubDate: 2022-09-01
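As a toy illustration of the kind of data involved (not the paper's hierarchical model or S-EM algorithm), one can simulate a two-component von Mises mixture of wind directions and fit a single von Mises component with SciPy; all parameter values are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Two-component von Mises mixture of wind directions (radians):
# 70% concentrated around 0, 30% around pi/2.
z = rng.random(1000) < 0.7
comp0 = stats.vonmises.rvs(kappa=5.0, loc=0.0, size=1000, random_state=rng)
comp1 = stats.vonmises.rvs(kappa=8.0, loc=np.pi / 2, size=1000, random_state=rng)
theta = np.where(z, comp0, comp1)

# Sanity check: fit a single von Mises to the first component only
# (scale fixed at 1, as is conventional for circular data).
kappa_hat, loc_hat, _ = stats.vonmises.fit(theta[z], fscale=1)
print(round(loc_hat, 2), round(kappa_hat, 1))
```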

• Estimation and inferences for varying coefficient partially nonlinear
quantile models with censoring indicators missing at random

Abstract: In this paper, we focus on the varying coefficient partially nonlinear quantile regression model when the response variable is right censored and the censoring indicator is missing at random. Based on calibration and imputation estimation methods, three-stage approaches are carried out to construct estimators of the parameter vector in the nonlinear function part and of the nonparametric varying-coefficient functions involved in the model. Under appropriate conditions, the asymptotic properties of the proposed estimators are established. A simulation study and a real data analysis are performed to illustrate the performance of our proposed estimators.
PubDate: 2022-09-01
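The censoring and missing-indicator machinery is beyond a short sketch, but the quantile-regression core can be illustrated: minimizing the Koenker-Bassett check (pinball) loss for a simple linear model on fully observed data. The simulated design is illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 500)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 500)

def check_loss(u, tau):
    """Koenker-Bassett check (pinball) function summed over residuals."""
    return np.sum(u * (tau - (u < 0)))

def fit_quantile(tau):
    """Linear quantile regression by direct minimization of the check loss."""
    obj = lambda b: check_loss(y - b[0] - b[1] * x, tau)
    return minimize(obj, x0=[0.0, 0.0], method="Nelder-Mead").x

b_med = fit_quantile(0.5)   # conditional median: intercept near 1, slope near 2
print(np.round(b_med, 2))
```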

• Reliability inference for multicomponent stress–strength model from
Kumaraswamy-G family of distributions based on progressively first failure
censored samples

Abstract: In this article, the problem of reliability inference for a multicomponent stress–strength (MSS) model from the Kumaraswamy-G (Kw-G) family of distributions under progressive first failure censoring is considered. The reliability of the MSS is considered when both the stress and strength variables follow Kw-G distributions with different first shape parameters and a common second shape parameter. The maximum likelihood (ML) and Bayes estimators of reliability are derived when all the parameters are unknown. Also, the ML, uniformly minimum variance unbiased and Bayes estimators of reliability are derived for the case where the common shape parameter is known. Bayesian credible and highest posterior density (HPD) credible intervals of reliability are developed using the Gibbs sampling method. The performance of the various estimators developed is assessed by a Monte Carlo simulation study. Finally, two real life examples are considered for illustrative purposes.
PubDate: 2022-09-01
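A crude Monte Carlo sketch of multicomponent stress-strength reliability under Kumaraswamy distributions (here plain Kumaraswamy, i.e. Kw with a uniform baseline G, so not the general Kw-G estimators of the article): the system works if at least s of k strengths exceed the common stress. Shape parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def kumaraswamy_rvs(a, b, size):
    """Inverse-CDF sampling from Kumaraswamy: F(x) = 1 - (1 - x**a)**b."""
    u = rng.random(size)
    return (1 - (1 - u) ** (1 / b)) ** (1 / a)

def mss_reliability(s, k, a_strength, a_stress, b, n_sim=200_000):
    """P(at least s of k i.i.d. strengths exceed the common stress),
    with strengths and stress sharing the second shape parameter b."""
    strength = kumaraswamy_rvs(a_strength, b, (n_sim, k))
    stress = kumaraswamy_rvs(a_stress, b, (n_sim, 1))
    return np.mean((strength > stress).sum(axis=1) >= s)

r = mss_reliability(s=2, k=3, a_strength=3.0, a_stress=1.0, b=2.0)
print(round(r, 3))
```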

• Gumbel’s bivariate exponential distribution: estimation of the
association parameter using ranked set sampling

PubDate: 2022-09-01

• RMSE-minimizing confidence intervals for the binomial parameter

Abstract: Let X be the number of successes in n mutually independent and identically distributed Bernoulli trials, each with probability of success p. For fixed n and $$\alpha$$ , there are $$n + 1$$ distinct two-sided $$100(1 - \alpha )$$ % confidence intervals for p associated with the outcomes $$X = 0, 1, 2, \ldots , n$$ . There is no known exact non-randomized confidence interval for p. Existing approximate confidence interval procedures use a formula, which often requires numerical methods to implement, to calculate confidence interval bounds. The bounds associated with these confidence intervals correspond to discontinuities in the actual coverage function. The paper does not aim to provide a formula for the confidence interval bounds, but rather to select the confidence interval bounds that minimize the root mean square error of the actual coverage function for sample size n and significance level $$\alpha$$ in the frequentist context.
PubDate: 2022-09-01
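As background, the actual coverage function in question can be computed exactly for any candidate set of bounds; the sketch below does this for the Clopper-Pearson interval (an illustrative choice, not the RMSE-minimizing one) and evaluates the RMSE criterion described above.

```python
import numpy as np
from scipy import stats

n, alpha = 20, 0.05

# Clopper-Pearson bounds for each outcome x = 0..n (exact beta quantiles).
x = np.arange(n + 1)
lo = np.where(x == 0, 0.0, stats.beta.ppf(alpha / 2, x, n - x + 1))
hi = np.where(x == n, 1.0, stats.beta.ppf(1 - alpha / 2, x + 1, n - x))

# Actual coverage at each p: sum of the binomial pmf over covered outcomes.
p_grid = np.linspace(0.001, 0.999, 999)
pmf = stats.binom.pmf(x[:, None], n, p_grid[None, :])
covered = (lo[:, None] <= p_grid) & (p_grid <= hi[:, None])
coverage = np.sum(pmf * covered, axis=0)

# RMSE of actual coverage from the nominal level: the criterion minimized above.
rmse = np.sqrt(np.mean((coverage - (1 - alpha)) ** 2))
print(round(coverage.min(), 3), round(rmse, 4))
```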

• Shapiro-Wilk test for multivariate skew-normality

Abstract: The multivariate skew-normal family of distributions is a flexible class of probability models that includes the multivariate normal distribution as a special case. Two procedures for testing that a multivariate random sample comes from the multivariate skew-normal distribution are proposed here based on the estimated canonical form. Canonical data are transformed into approximately multivariate normal observations and then a multivariate version of the Shapiro-Wilk test is used for testing multivariate normality. Critical values for the tests are approximated without using parametric bootstrap. Monte Carlo simulation results provide evidence that the nominal test level is preserved, in general, under the considered settings. The simulation results also indicate that these tests are in general more powerful than existing tests for the same problem versus the studied alternatives.
PubDate: 2022-09-01

• Accelerated failure time models for recurrent event data analysis and
joint modeling

Abstract: There are two commonly encountered problems in survival analysis: (a) recurrent event data analysis, where an individual may experience an event multiple times over follow-up; and (b) joint modeling, where the event time distribution depends on a longitudinally measured internal covariate. The proportional hazards (PH) family offers an attractive modeling paradigm for recurrent event data analysis and joint modeling. Although there are well-known techniques to test the PH assumption for standard survival data analysis, checking this assumption for joint modeling has received less attention. An alternative framework involves considering an accelerated failure time (AFT) model, which is particularly useful when the PH assumption fails. Note that there are AFT models that can describe data with wide-ranging characteristics but have received far less attention in modeling recurrent event data and in the joint analysis of time-to-event and longitudinal data. In this paper, we develop methodology to analyze these types of data using the AFT family of distributions. Fitting these models is computationally and numerically much more demanding than standard survival data analysis. In particular, fitting a joint model is a computationally intensive task as it requires approximating multiple integrals that have no analytic solution except in very special cases. We propose computational algorithms for statistical inference, and develop a software package to fit these models. The proposed methodology is demonstrated using both simulated and real data.
PubDate: 2022-09-01

• Flexible, non-parametric modeling using regularized neural networks

PubDate: 2022-09-01

• Hybrid MLP-IDW approach based on nearest neighbor for spatial prediction

Abstract: Conventional methods of spatial prediction, such as kriging, require assumptions such as stationarity and isotropy, which are not easy to evaluate and often do not hold for spatial data. For these methods, the spatial dependency structure between data points must be accurately modeled, which requires expert knowledge in spatial statistics. On the other hand, spatial prediction using artificial neural networks (ANNs) has attracted considerable interest due to the ANN's ability to learn from data without the need for complex and specialized assumptions. However, ANN models require suitable input variables for better and more efficient spatial prediction. This paper aims to improve the accuracy of ANN spatial prediction using neighboring information. Given the general principle that "closer spatial data are more dependent", we incorporate data dependency into the network by using neighboring observations. We propose a hybrid model of ANN and inverse distance weighting (IDW), based on nearby observations. We also propose an ANN-based model for spatial prediction based on weighted values of nearby observations. The accuracy of the models is compared through a simulation study. The results show that using neighboring information to train the ANN dramatically increases the prediction accuracy.
PubDate: 2022-09-01
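The IDW half of the hybrid can be sketched in a few lines; this is the generic k-nearest-neighbor inverse distance weighting predictor, not the paper's ANN hybrid, and the simulated surface is illustrative.

```python
import numpy as np

def idw_predict(coords, values, query, k=5, power=2.0):
    """Inverse distance weighting using only the k nearest observations."""
    d = np.linalg.norm(coords - query, axis=1)
    nearest = np.argsort(d)[:k]
    d, v = d[nearest], values[nearest]
    if d[0] == 0:                      # query coincides with a data point
        return v[0]
    w = 1.0 / d ** power
    return np.sum(w * v) / np.sum(w)

rng = np.random.default_rng(6)
coords = rng.uniform(0, 10, (200, 2))
values = np.sin(coords[:, 0]) + np.cos(coords[:, 1])   # smooth spatial surface

pred = idw_predict(coords, values, np.array([5.0, 5.0]))
truth = np.sin(5.0) + np.cos(5.0)
print(round(pred, 2), round(truth, 2))
```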

• Penalized wavelet estimation and robust denoising for irregular spaced
data

Abstract: Nonparametric univariate regression via wavelets is usually implemented under the assumptions of dyadic sample size, equally spaced fixed sample points, and i.i.d. normal errors. In this work, we propose, study and compare some wavelet-based nonparametric estimation methods designed to recover a one-dimensional regression function for data that do not necessarily possess the above requirements. These methods use appropriate regularizations, penalizing the decomposition of the unknown regression function on a wavelet basis of functions evaluated at the sampling design. Exploiting the sparsity of wavelet decompositions for signals belonging to homogeneous Besov spaces, we use efficient proximal gradient descent algorithms, available in the recent literature, to compute the estimates with fast computation times. Our wavelet-based procedures, in both the standard and the robust regression cases, have favorable theoretical properties, thanks in large part to the separable nature of the (non-convex) regularization they are based on. We establish asymptotic global optimal rates of convergence under weak conditions. It is known that such rates are, in general, unattainable by smoothing splines or other linear nonparametric smoothers. Lastly, we present several experiments to examine the empirical performance of our procedures and to compare them with other proposals available in the literature. A regression analysis of some real data applications using these procedures demonstrates their effectiveness.
PubDate: 2022-09-01
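As a minimal wavelet-denoising sketch (an orthonormal Haar transform with universal soft thresholding on a dyadic, equally spaced design, i.e. the classical setting the paper generalizes away from; signal and noise level are illustrative):

```python
import numpy as np

def haar_fwd(x):
    """Full orthonormal Haar decomposition (length must be a power of 2)."""
    coeffs, approx = [], x.astype(float)
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2))   # detail coefficients
        approx = (even + odd) / np.sqrt(2)         # approximation coefficients
    return approx, coeffs

def haar_inv(approx, coeffs):
    """Inverse of haar_fwd."""
    for detail in reversed(coeffs):
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + detail) / np.sqrt(2)
        out[1::2] = (approx - detail) / np.sqrt(2)
        approx = out
    return approx

def soft(c, t):
    """Soft-thresholding operator."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

rng = np.random.default_rng(7)
n = 512
t_grid = np.linspace(0, 1, n)
signal = np.sin(4 * np.pi * t_grid)
noisy = signal + rng.normal(0, 0.3, n)

approx, coeffs = haar_fwd(noisy)
thr = 0.3 * np.sqrt(2 * np.log(n))                 # universal threshold
denoised = haar_inv(approx, [soft(c, thr) for c in coeffs])
print(round(np.mean((noisy - signal) ** 2), 3),
      round(np.mean((denoised - signal) ** 2), 3))
```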

• The truncated g-and-h distribution: estimation and application to loss
modeling

Abstract: The g-and-h distribution is a flexible model for skewed and/or leptokurtic data, which has been shown to be especially effective in actuarial analytics and risk management. Since in these fields data are often recorded only above a certain threshold, we introduce a left-truncated g-and-h distribution. Given the lack of an explicit density, we estimate the parameters via an Approximate Maximum Likelihood approach that uses the empirical characteristic function as summary statistics. Simulation results and an application to fire insurance losses suggest that the method works well and that the explicit consideration of truncation is strongly preferable to the use of the non-truncated g-and-h distribution.
PubDate: 2022-09-01
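Although the g-and-h distribution lacks an explicit density, it has an explicit quantile transform, so sampling from it is straightforward; a sketch of the untruncated case (parameter values are illustrative):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(8)

def g_and_h_rvs(a, b, g, h, size):
    """Sample Tukey's g-and-h distribution via its quantile transform of a
    standard normal: g controls skewness, h controls tail heaviness."""
    z = rng.standard_normal(size)
    core = z if g == 0 else (np.exp(g * z) - 1) / g
    return a + b * core * np.exp(h * z ** 2 / 2)

x = g_and_h_rvs(a=0.0, b=1.0, g=0.5, h=0.1, size=100_000)
print(round(skew(x), 2))  # positive skew for g > 0
```

With g = 0 and h = 0 the transform reduces to the standard normal, which provides a convenient sanity check.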

• Unified mean-variance feature screening for ultrahigh-dimensional
regression

Abstract: Feature screening is a popular and efficient statistical technique for processing ultrahigh-dimensional data. When a regression model contains both categorical and continuous predictors, a unified feature screening procedure is needed. We therefore propose a unified mean-variance sure independence screening (UMV-SIS) for this setup. The mean-variance (MV), an effective utility for measuring the dependence between two random variables, is widely used in feature screening for discriminant analysis. In this paper, we advocate using the kernel smoothing method to estimate the MV between two continuous variables, thereby extending it to screen categorical and continuous predictors simultaneously. Besides its uniformity in screening, UMV-SIS is a model-free procedure that requires no specification of a regression model; this broadens the scope of its application. In theory, we show that the UMV-SIS procedure has the sure screening and ranking consistency properties under mild conditions. To overcome some difficulties in marginal feature screening for linear models and further enhance the screening performance of the proposed method, an iterative UMV-SIS procedure is developed. The promising performance of the new method is supported by extensive numerical examples.
PubDate: 2022-09-01

• Bivariate elliptical regression for modeling interval-valued data

Abstract: This paper introduces a special case of a multivariate regression model with restrictions for interval-valued data in the symbolic data analysis framework. The model is less sensitive to interval outliers since it considers distributions with lighter or heavier tails than the normal. Intervals are obtained from classical data according to a fusion process, and each interval can be represented by its center and range or by its lower and upper bound values. The correlation between the center and range variables, or between the lower and upper bound variables, is a fundamental component in constructing the model. Therefore, a study that provides a suitable choice of representation for intervals in bivariate models is proposed. Simulation studies in the Monte Carlo framework, covering different scenarios of interval data sets with and without outliers, are carried out to validate the proposed model. An application to a real-life interval-valued medical dataset is also performed.
PubDate: 2022-09-01
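The center-and-range representation discussed above can be illustrated with two separate least-squares fits, one on interval centers and one on ranges (a minimal sketch with simulated data, not the paper's restricted bivariate elliptical model):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 150

# Interval-valued predictor and response, each represented by a center
# and a range (the "center and range" representation of symbolic data).
xc = rng.uniform(0, 10, n)                       # predictor centers
xr = rng.uniform(0.5, 1.5, n)                    # predictor ranges
yc = 2.0 + 1.5 * xc + rng.normal(0, 0.3, n)      # response centers
yr = 0.5 + 0.8 * xr + np.abs(rng.normal(0, 0.1, n))  # response ranges (>0)

# Fit separate least-squares regressions on centers and on ranges.
Ac = np.column_stack([np.ones(n), xc])
Ar = np.column_stack([np.ones(n), xr])
beta_c, *_ = np.linalg.lstsq(Ac, yc, rcond=None)
beta_r, *_ = np.linalg.lstsq(Ar, yr, rcond=None)
print(np.round(beta_c, 2), np.round(beta_r, 2))
```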

• Prediction of times to failure of censored units under generalized
progressive hybrid censoring scheme

Abstract: In this paper, the problem of predicting times to failure of units censored in multiple stages of generalized progressive hybrid censoring from exponential and Weibull distributions is discussed. Different classical point predictors, namely the best unbiased, maximum likelihood and conditional median predictors, are derived. Moreover, the problem of interval prediction is investigated. A numerical example as well as two real data sets are used to illustrate the proposed prediction methods. Using a Monte Carlo simulation algorithm, the performance of the point predictors is investigated in terms of the bias and mean squared prediction error criteria. Also, the width and coverage rate of the obtained prediction intervals are studied by simulation.
PubDate: 2022-09-01
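For the exponential case, the conditional median predictor has a simple closed form thanks to memorylessness: a unit still running at the censoring time c is predicted to fail at c + ln(2)/lambda-hat. A sketch under simple type-I censoring (the paper's generalized progressive hybrid scheme is more involved; all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)
lam = 0.5
t = rng.exponential(1 / lam, 5000)
c = 1.0                                  # common censoring time

observed = t[t <= c]
n_cens = np.sum(t > c)
# MLE of the rate from a type-I censored exponential sample:
# lambda_hat = (number of events) / (total time on test).
lam_hat = len(observed) / (observed.sum() + n_cens * c)

# Conditional median predictor of a censored unit's failure time:
# by memorylessness, T | T > c is distributed as c + Exp(lambda),
# so its median is c + ln(2) / lambda_hat.
pred = c + np.log(2) / lam_hat
true_median = c + np.log(2) / lam
print(round(pred, 3), round(true_median, 3))
```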

• Markovchart: an R package for cost-optimal patient monitoring and
treatment using control charts

Abstract: Control charts originate from industrial statistics, but are constantly seeing new areas of application, for example in health care (Thor et al. in BMJ Qual Saf 16(5):387–399, 2007. https://doi.org/10.1136/qshc.2006.022194; Suman and Prajapati in Int J Metrol Qual Eng, 2018. https://doi.org/10.1051/ijmqe/2018003). This paper is about the Markovchart package, an R implementation of generalised Markov chain-based control charts with health care applications in mind and with a focus on cost-effectiveness. The methods are based on Zempléni et al. (Appl Stoch Model Bus Ind 20(3):185–200, 2004. https://doi.org/10.1002/asmb.521) and Dobi and Zempléni (Qual Reliab Eng Int 35(5):1379–1395, 2019a. https://doi.org/10.1002/qre.2518; Ann Univ Sci Budapestinensis Rolando Eötvös Nomin Sect Comput 49:129–146, 2019b). The implemented ideas in the package were motivated by problems encountered by health care professionals and biostatisticians when assessing the effects and costs of different monitoring schemes and therapeutic regimens. However, the implemented generalisations may be useful in other (e.g., engineering) applications too, as they mainly revolve around the loosening of assumptions seen in traditional control chart theory. The Markovchart package is able to model processes with random shift sizes (i.e., the degradation of the patient’s health), random repair (i.e., treatment) and random time between samplings (i.e., visits). The article highlights the flexibility of the methods through the modelling of different disease progression and treatment scenarios and also through an application to real-world data of diabetic patients.
PubDate: 2022-09-01
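The Markov-chain cost model of the package is much richer than this, but the basic run-length logic it generalizes can be sketched: under independent sampling, the run length of a classical Shewhart chart is geometric, so the average run length (ARL) is the reciprocal of the per-sample signal probability.

```python
from scipy.stats import norm

def shewhart_arl(delta, L=3.0):
    """ARL of an L-sigma Shewhart chart for individual normal observations
    under a mean shift of delta standard deviations: each point signals
    with probability p(delta), so the run length is geometric with
    ARL = 1 / p(delta)."""
    p_signal = 1 - (norm.cdf(L - delta) - norm.cdf(-L - delta))
    return 1.0 / p_signal

print(round(shewhart_arl(0.0), 1))  # ~370.4: in-control ARL of a 3-sigma chart
print(round(shewhart_arl(1.0), 1))  # much shorter once the mean shifts
```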

• Models under which random forests perform badly; consequences for
applications

Abstract: We give examples of data-generating models under which Breiman’s random forest may be extremely slow to converge to the optimal predictor or even fail to be consistent. The evidence provided for these properties is based on mostly intuitive arguments, similar to those used earlier with simpler examples, and on numerical experiments. Although one can always choose models under which random forests perform very badly, we show that simple methods based on statistics of ‘variable use’ and ‘variable importance’ can often be used to construct a much better predictor based on a ‘many-armed’ random forest obtained by forcing initial splits on variables which the default version of the algorithm tends to ignore.
PubDate: 2022-09-01

JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762