Abstract: The application of spatial Cliff–Ord models requires reliable information about the spatial coordinates of statistical units, which is usually available in the context of areal data. With micro-geographic point-level data, however, such information is inevitably affected by locational errors, which can be generated intentionally by the data producer for privacy protection or can be due to inaccuracy of the geocoding procedures. This unfortunate circumstance can potentially limit the use of the spatial autoregressive modelling framework for the analysis of micro data, as the presence of locational errors may have a non-negligible impact on the estimates of model parameters. This contribution aims to develop a strategy that reduces the bias and produces more reliable inference for spatial models with location errors. The proposed estimation strategy models both the spatial stochastic process and the coarsening mechanism by means of a marked point process. The model is fitted through the maximisation of a doubly-marginalised likelihood function of the marked point process, which removes the effects of coarsening. The validity of the proposed approach is assessed by means of a Monte Carlo simulation study under different real-case scenarios, and the method is then applied to real data on house prices. PubDate: 2021-03-13
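A minimal base-R sketch of the problem the paper addresses: jittering point coordinates (a simple stand-in for the coarsening mechanism) perturbs a distance-based spatial weight matrix of the kind used in Cliff–Ord models. The noise level and inverse-distance weighting are illustrative assumptions, not the paper's model.

```r
## Sketch: effect of locational error on a row-standardised inverse-distance
## weight matrix (base R; noise scale is an illustrative assumption).
set.seed(42)
n      <- 50
coords <- cbind(runif(n), runif(n))                       # true locations
noisy  <- coords + matrix(rnorm(2 * n, sd = 0.05), n, 2)  # coarsened locations

inv_dist_W <- function(xy) {
  D <- as.matrix(dist(xy))
  W <- 1 / D; diag(W) <- 0
  W / rowSums(W)                                          # row-standardise
}

W_true  <- inv_dist_W(coords)
W_noisy <- inv_dist_W(noisy)
mean(abs(W_true - W_noisy))   # average perturbation induced by location error
```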
Abstract: We consider versions of the Metropolis algorithm which avoid the inefficiency of rejections. We first illustrate that a natural Uniform Selection algorithm might not converge to the correct distribution. We then analyse the use of Markov jump chains which avoid successive repetitions of the same state. After exploring the properties of jump chains, we show how they can exploit parallelism in computer hardware to produce more efficient samples. We apply our results to the Metropolis algorithm, to Parallel Tempering, to a Bayesian model, to a two-dimensional ferromagnetic 4 × 4 Ising model, and to a pseudo-marginal MCMC algorithm. PubDate: 2021-03-13
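A minimal sketch of the jump-chain idea for a standard normal target: each distinct state is stored once, together with the number of iterations the ordinary Metropolis chain would have spent there before moving (one plus the number of rejections). Multiplicity-weighted averages then reproduce ordinary Metropolis estimates. The target and proposal are illustrative choices, not the paper's examples.

```r
## Jump-chain Metropolis sketch for a N(0,1) target.
set.seed(1)
log_target <- function(x) -x^2 / 2
n_jumps <- 5000; sd_prop <- 2
x <- 0; states <- numeric(n_jumps); mult <- numeric(n_jumps)
for (j in seq_len(n_jumps)) {
  m <- 1
  repeat {
    y <- x + rnorm(1, sd = sd_prop)
    if (log(runif(1)) < log_target(y) - log_target(x)) break
    m <- m + 1                       # rejection: ordinary chain repeats x
  }
  states[j] <- x; mult[j] <- m; x <- y
}
weighted.mean(states, mult)          # ~ E[X]   = 0
weighted.mean(states^2, mult)        # ~ E[X^2] = 1
```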
Abstract: In this paper we consider mixture generalized autoregressive conditional heteroskedastic (GARCH) models and propose a new EM-type iterative algorithm for the estimation of the model parameters. The maximum likelihood estimates are shown to be consistent, and their asymptotic properties are investigated. More precisely, we derive simple closed-form expressions for the asymptotic covariance matrix and the expected Fisher information matrix of the ML estimator. Finally, we study model selection and propose testing procedures. A simulation study and an application to real financial series illustrate the results. PubDate: 2021-03-12
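A sketch of the E-step such an EM iteration would use, for a two-component Gaussian mixture GARCH(1,1) with component-wise variance recursions: compute each component's conditional variances, then the posterior mixing responsibilities. The parametrisation and the parameter values below are illustrative assumptions; the M-step and the paper's exact algorithm are not reproduced.

```r
## E-step sketch for a two-component Gaussian mixture GARCH(1,1).
e_step <- function(y, pi1, omega, alpha, beta) {  # omega, alpha, beta: length 2
  n <- length(y)
  h <- matrix(var(y), n, 2)                       # component conditional variances
  for (t in 2:n)
    h[t, ] <- omega + alpha * y[t - 1]^2 + beta * h[t - 1, ]
  d1 <- pi1       * dnorm(y, 0, sqrt(h[, 1]))
  d2 <- (1 - pi1) * dnorm(y, 0, sqrt(h[, 2]))
  list(resp = d1 / (d1 + d2),                     # P(component 1 | y_t)
       loglik = sum(log(d1 + d2)))
}
set.seed(7)
y   <- rnorm(500) * sqrt(0.5)                     # placeholder return series
out <- e_step(y, pi1 = 0.5, omega = c(.05, .1),
              alpha = c(.05, .15), beta = c(.9, .8))
out$loglik
```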
Abstract: This paper proposes an extension of principal component analysis to non-stationary multivariate time series data. A criterion for determining the number of retained components is proposed. An advance correlation matrix is developed to evaluate dynamic relationships among the chosen components. The theoretical properties of the proposed method are given. Extensive simulation experiments show that our approach performs well on both stationary and non-stationary data. Real data examples are also presented as illustrations. We develop four packages in the statistical software R that contain the functions needed to obtain and assess the results of the proposed method. PubDate: 2021-03-07
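For orientation only, a generic stationary-PCA baseline with a cumulative-variance retention rule; the paper's retention criterion and its advance correlation matrix are not reproduced here.

```r
## Generic PCA baseline with a 90%-variance retention rule (an assumption,
## not the paper's criterion).
set.seed(15)
Z <- matrix(rnorm(200 * 6), 200, 6)
Z[, 1] <- cumsum(rnorm(200))                 # one non-stationary series
pc <- prcomp(scale(Z))
cumvar <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
r <- which(cumvar >= 0.9)[1]                 # number of retained components
r; head(pc$x[, 1:r])
```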
Abstract: We propose to extend CART to bivariate marked point processes, providing a segmentation of the space into areas that are homogeneous in terms of the interaction between marks. While the usual CART tree considers the marginal distribution of the response variable at each node, the proposed algorithm, SpatCART, takes into account the spatial locations of the observations in the splitting criterion. We introduce a dissimilarity index based on Ripley's intertype K-function, which quantifies the interaction between two populations. This index, used in the growing step of the CART strategy, leads to a heterogeneity function consistent with the original CART algorithm. The new variant is therefore a way to explore spatial data as a bivariate marked point process using binary classification trees. The proposed procedure is implemented in an R package and illustrated on simulated examples. SpatCART is finally applied to a tropical forest example. PubDate: 2021-03-04
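The intertype K-function underlying the dissimilarity index is available in the spatstat package as Kcross(); a minimal sketch on a simulated bivariate pattern follows. Comparing the estimate with the Poisson benchmark πr² is the usual way to read attraction or repulsion between the two mark types; the SpatCART index itself is not reproduced.

```r
## Intertype K-function between two populations via spatstat::Kcross().
library(spatstat)
set.seed(3)
n <- 200
X <- ppp(runif(n), runif(n), window = owin(c(0, 1), c(0, 1)),
         marks = factor(sample(c("a", "b"), n, replace = TRUE)))
K <- Kcross(X, i = "a", j = "b")
## Departure of K_ab(r) from pi * r^2 signals interaction between the marks.
plot(K, main = "Intertype K-function, marks a vs b")
```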
Abstract: In the original publication of the article, a correction to Eq. (13) was missed: in the exponent, 2v − 1 should be changed to 2v. PubDate: 2021-03-01
Abstract: The paper presents new insight into a recently proposed method named partial possibilistic regression path modeling. This method combines the principles of path modeling with those of possibilistic regression to model the network of relations among blocks of variables, where a weighted composite summarizes each block. It ascribes randomness to the measurement error, i.e. the error in modeling the relations between the observed variables and the corresponding composite, and vagueness to the structural error, i.e. the uncertainty in modeling the relations among the composites behind each block of variables. The comparison of the proposed method with a classical composite-based path model is based on a simulation study. A case study on the use of Wikipedia in higher education illustrates a fruitful context of use for the proposed method. PubDate: 2021-03-01
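The possibilistic ingredient the method builds on is, in its classic Tanaka form, a linear program: interval-valued coefficients (centre, non-negative spread) whose total spread is minimised subject to every observation lying inside the predicted interval. A hedged sketch with lpSolve follows; it is the textbook possibilistic regression, not the paper's path-modeling algorithm.

```r
## Tanaka-style possibilistic regression as a linear program (lpSolve).
library(lpSolve)
set.seed(2)
n <- 30
x <- runif(n, 0, 10)
y <- 2 + 0.8 * x + runif(n, -1, 1)
X <- cbind(1, x)                                        # design with intercept
p <- ncol(X)
## Variables: a+ (p), a- (p), c (p); all >= 0; centre a = a+ - a-.
obj <- c(rep(0, 2 * p), colSums(abs(X)))                # total spread
con <- rbind(cbind(X, -X,  abs(X)),                     # centre + spread >= y
             cbind(X, -X, -abs(X)))                     # centre - spread <= y
dir <- c(rep(">=", n), rep("<=", n))
sol <- lp("min", obj, con, dir, c(y, y))
a  <- sol$solution[1:p] - sol$solution[(p + 1):(2 * p)] # interval centres
c_ <- sol$solution[(2 * p + 1):(3 * p)]                 # interval spreads
rbind(centre = a, spread = c_)
```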
Abstract: In this paper we propose a Dirichlet process mixture model for censored survival data with covariates. This model is suitable in two scenarios. First, this method can be used to identify clusters determined by both the censored survival data and the predictors. Second, this method is suitable for highly correlated predictors, in cases when the usual survival models cannot be implemented because they would be unstable due to multicollinearity. The Dirichlet process mixture model links a response vector to covariate data through cluster membership, and in this paper this model is extended to mixtures of Weibull distributions, which can be used to model survival times and also allow for censoring. We propose two variants of this model, one with a shape parameter common to all clusters (referred to as a global parameter) for the Weibull distributions and one with a cluster-specific shape parameter. The former satisfies the proportional hazards assumption, while the latter is very flexible, as it has the advantage of allowing estimation of the survival curve whether or not the proportional hazards assumption is satisfied. We present a simulation study and, to demonstrate the applicability of the method in practice, a real application to sleep surveys in older women from The Australian Longitudinal Study on Women's Health. The method developed in the paper is available in the R package PReMiuM. PubDate: 2021-03-01
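A minimal sketch of the data-generating structure the model targets: right-censored survival times from two clusters of Weibull distributions with cluster-specific shape parameters. All parameter values are illustrative assumptions; fitting itself would be done with PReMiuM's profRegr, whose arguments we do not reproduce here.

```r
## Right-censored survival data from two clusters of Weibull times with
## cluster-specific shapes (illustrative values).
set.seed(11)
n       <- 200
cluster <- rbinom(n, 1, 0.5) + 1
shape   <- c(0.8, 2.5)[cluster]            # cluster-specific Weibull shapes
scale   <- c(5.0, 1.5)[cluster]
t_event <- rweibull(n, shape = shape, scale = scale)
t_cens  <- rexp(n, rate = 0.1)             # independent censoring times
time    <- pmin(t_event, t_cens)
status  <- as.integer(t_event <= t_cens)   # 1 = event observed, 0 = censored
table(cluster, status)
```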
Abstract: We consider the problem of numerically evaluating the expected value of a smooth bounded function of a chi-distributed random variable, divided by the square root of the number of degrees of freedom. This problem arises in the contexts of simultaneous inference, the selection and ranking of populations and in the evaluation of multivariate t probabilities. It also arises in the assessment of the coverage probability and expected volume properties of some non-standard confidence regions. We use a transformation put forward by Mori, followed by the application of the trapezoidal rule. This rule has the remarkable property that, for suitable integrands, it is exponentially convergent. We use it to create a nested sequence of quadrature rules, for the estimation of the approximation error, so that previous evaluations of the integrand are not wasted. The application of the trapezoidal rule requires the approximation of an infinite sum by a finite sum. We provide a new easily computed upper bound on the error of this approximation. Our overall conclusion is that this method is a very suitable candidate for the computation of the coverage and expected volume properties of non-standard confidence regions. PubDate: 2021-03-01
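A hedged sketch of the computation: a Mori-type double-exponential substitution x = exp((π/2) sinh t) maps the half-line integral to one whose integrand decays doubly exponentially, after which the plain trapezoidal rule converges extremely fast. The test function g and the step/truncation parameters are illustrative assumptions, not the paper's choices; the Monte Carlo line is only a sanity check.

```r
## E[g(X / sqrt(nu))], X ~ chi_nu, by double-exponential substitution plus
## the trapezoidal rule.
g <- function(u) 1 / (1 + u^2)                  # a smooth bounded test function
chi_expect <- function(nu, h = 0.05, Tmax = 4) {
  t <- seq(-Tmax, Tmax, by = h)
  x <- exp(pi / 2 * sinh(t))
  ## chi_nu log-density, kept on the log scale to avoid overflow
  logf <- (1 - nu / 2) * log(2) + (nu - 1) * log(x) - x^2 / 2 - lgamma(nu / 2)
  integrand <- g(x / sqrt(nu)) * exp(logf) * (pi / 2) * cosh(t) * x
  h * sum(integrand)                            # trapezoid (tails are ~0)
}
chi_expect(nu = 5)
mean(g(sqrt(rchisq(1e6, df = 5)) / sqrt(5)))    # Monte Carlo check
```

Halving h reuses all previous abscissae (the new points interleave the old ones), which is the nesting property the abstract exploits for error estimation.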
Abstract: Spatio-temporal change of support methods are designed for statistical analysis on spatial and temporal domains which can differ from those of the observed data. Previous work introduced a parsimonious class of Bayesian hierarchical spatio-temporal models, which we refer to as STCOS, for the case of Gaussian outcomes. Application of STCOS methodology from this literature requires a level of proficiency with spatio-temporal methods and statistical computing which may be a hurdle for potential users. The present work seeks to bridge this gap by guiding readers through STCOS computations. We focus on the R computing environment because of its popularity, free availability, and high-quality contributed packages. The stcos package is introduced to facilitate computations for the STCOS model. A motivating application is the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that measures key socioeconomic and demographic variables for various populations in the United States. The STCOS methodology offers a principled approach to compute model-based estimates and associated measures of uncertainty for ACS variables on customized geographies and/or time periods. We present a detailed case study with ACS data as a guide for change of support analysis in R, and as a foundation which can be customized to other applications. PubDate: 2021-03-01
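To fix ideas, a toy version of the elementary change-of-support step: carrying estimates from a source geography to a target geography via area-overlap weights. This is only the deterministic areal-weighting idea, not the stcos package's API or the Bayesian hierarchical model; all numbers are made up.

```r
## Toy change-of-support step: 3 source areas -> 2 target areas using
## area-overlap proportions (rows of H sum to 1).
source_est <- c(10.2, 13.5, 9.8)         # estimates on the source geography
H <- rbind(c(0.7, 0.3, 0.0),             # target area 1 overlaps sources 1-2
           c(0.0, 0.4, 0.6))             # target area 2 overlaps sources 2-3
target_est <- H %*% source_est
target_est
```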
Abstract: In this paper we study partially linear varying coefficient models with missing covariates. Based on inverse probability weighting and B-spline approximations, we propose a weighted B-spline composite quantile regression method to estimate the non-parametric function and the regression coefficients. Under some mild conditions, we establish the asymptotic normality and Horvitz–Thompson property of the proposed estimators. We further investigate a variable selection procedure by combining the proposed estimation method with adaptive LASSO. The oracle property of the proposed variable selection method is studied. Under a missing covariate scenario, two simulations with various non-normal error distributions and a real-data application are conducted to assess and showcase the finite-sample performance of the proposed estimation and variable selection methods. PubDate: 2021-03-01
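A minimal sketch of the composite quantile regression building block: one slope shared across K quantile levels, with K separate intercepts, fitted by minimising the summed check loss. This is the unweighted linear case solved naively with optim, not the paper's weighted B-spline estimator.

```r
## Composite quantile regression sketch (shared slope, per-level intercepts).
set.seed(5)
n <- 200
x <- runif(n); y <- 1 + 2 * x + rt(n, df = 3)        # heavy-tailed errors
taus  <- c(0.25, 0.5, 0.75)
check <- function(r, tau) r * (tau - (r < 0))        # quantile check loss
cqr_loss <- function(par) {                          # par = (b_1..b_K, slope)
  slope <- par[length(par)]
  sum(vapply(seq_along(taus), function(k)
    sum(check(y - par[k] - slope * x, taus[k])), numeric(1)))
}
fit <- optim(c(quantile(y, taus), 2), cqr_loss, method = "Nelder-Mead")
fit$par                                              # intercepts, then slope
```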
Abstract: As a generalization of ordinary least squares regression, expectile regression, which can predict conditional expectiles, is fitted by minimizing an asymmetric square loss function on the training data. In the literature, the idea of the support vector machine was introduced into expectile regression to increase the flexibility of the model, resulting in support vector expectile regression (SVER). This paper reformulates the Lagrangian function of SVER as a differentiable convex function over the nonnegative orthant, which can be minimized by a simple iterative algorithm. The proposed algorithm is easy to implement, without requiring any particular optimization toolbox besides basic matrix operations. Theoretical and experimental analyses show that the algorithm converges r-linearly to the unique minimum point. The proposed method was compared to alternative algorithms on simulated and real-world data, and we observe that it is much more computationally efficient while yielding similar prediction accuracy. PubDate: 2021-03-01
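As a baseline for what SVER generalises, plain linear expectile regression can be fitted by iteratively reweighted least squares: observations above the current fit get weight τ, those below get 1 − τ. This sketch is the standard asymmetric-least-squares algorithm, not the paper's SVER solver.

```r
## Linear expectile regression via iteratively reweighted least squares.
expectile_fit <- function(X, y, tau = 0.8, tol = 1e-8, maxit = 100) {
  X <- cbind(1, X)
  beta <- qr.solve(X, y)                             # OLS start
  for (it in seq_len(maxit)) {
    w <- ifelse(y > drop(X %*% beta), tau, 1 - tau)  # asymmetric weights
    beta_new <- qr.solve(sqrt(w) * X, sqrt(w) * y)   # weighted least squares
    if (max(abs(beta_new - beta)) < tol) break
    beta <- beta_new
  }
  beta
}
set.seed(9)
x <- runif(200); y <- 1 + 3 * x + rnorm(200)
expectile_fit(x, y, tau = 0.8)
```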
Abstract: Edge detection is the front-end processing stage in most computer vision and image understanding systems. Among the various edge detection techniques, the Canny edge detector is one of the most commonly used. In this paper, a modified Canny edge detection technique is proposed that focuses on changing the Sobel operator: instead of convolution kernels, the weighted least squares method is used to calculate the horizontal and vertical gradients. Experimental results show that the new detector can detect edges that are not found by the standard Canny edge detector. PubDate: 2021-03-01
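A sketch of the replace-Sobel idea: estimate the gradient at a pixel by fitting a plane to its 3×3 neighbourhood via weighted least squares, so the slope coefficients play the role of the Sobel responses. The Gaussian distance weights are an assumption; the paper's exact weighting scheme is not reproduced.

```r
## Pixel gradient by weighted least squares over a 3x3 neighbourhood:
## fit z ~ a + gx*dx + gy*dy; (gx, gy) replaces the Sobel estimate.
wls_gradient <- function(img, i, j, sigma = 1) {
  off <- expand.grid(dx = -1:1, dy = -1:1)
  z <- img[cbind(i + off$dy, j + off$dx)]            # 3x3 patch values
  w <- exp(-(off$dx^2 + off$dy^2) / (2 * sigma^2))   # assumed Gaussian weights
  A <- cbind(1, off$dx, off$dy)
  coef <- qr.solve(sqrt(w) * A, sqrt(w) * z)
  c(gx = coef[2], gy = coef[3])
}
img <- outer(1:10, 1:10, function(r, c) 2 * c + 3 * r)  # plane: gx = 2, gy = 3
wls_gradient(img, 5, 5)
```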
Abstract: In this paper, we discuss a family of robust, high-dimensional regression models for quantile and composite quantile regression, both with and without an adaptive lasso penalty for variable selection. We reformulate these quantile regression problems and obtain estimators by applying the alternating direction method of multipliers (ADMM), majorization-minimization (MM), and coordinate descent (CD) algorithms. Our new approaches address the lack of publicly available methods for (composite) quantile regression, especially for high-dimensional data, both with and without regularization. Through simulation studies, we demonstrate the need for different algorithms applicable to a variety of data settings, which we implement in the cqrReg package for R. For comparison, we also introduce the widely used interior point (IP) formulation and test our methods against the IP algorithms in the existing quantreg package. Our simulation studies show that each of our methods, particularly MM and CD, excel in different settings such as with large or high-dimensional data sets, respectively, and outperform the methods currently implemented in quantreg. The ADMM approach offers specific promise for future developments in its amenability to parallelization and scalability. PubDate: 2021-03-01
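To illustrate one of the three algorithm classes, here is a minimal Hunter–Lange-style MM iteration for a single quantile level: the check loss is majorised by a quadratic, so each update is a weighted least-squares solve. This is a textbook sketch (with an ε-perturbation for stability), not the cqrReg implementation.

```r
## MM sketch for quantile regression at level tau: each update solves
## X'WX beta = X'Wy + (tau - 1/2) X'1 with W = diag(1 / (2(|r| + eps))).
mm_quantreg <- function(X, y, tau = 0.5, eps = 1e-6, maxit = 200) {
  X <- cbind(1, X)
  beta <- qr.solve(X, y)                             # OLS start
  for (it in seq_len(maxit)) {
    r <- drop(y - X %*% beta)
    w <- 1 / (2 * (abs(r) + eps))                    # majoriser weights
    beta_new <- solve(crossprod(X, w * X),
                      crossprod(X, w * y) + (tau - 0.5) * colSums(X))
    if (max(abs(beta_new - beta)) < 1e-9) break
    beta <- beta_new
  }
  drop(beta)
}
set.seed(4)
x <- runif(300); y <- 1 + 2 * x + rt(300, df = 2)
mm_quantreg(x, y, tau = 0.75)   # compare with quantreg::rq(y ~ x, tau = 0.75)
```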
Abstract: In this paper we propose an automatic bandwidth selection for the recursive non-parametric estimation of the kernel classification rule defined by a stochastic approximation algorithm, when the explanatory data are curves and the response is categorical. We establish a central limit theorem for the proposed recursive estimators, which are very competitive with the non-recursive estimator in terms of estimation error and much better in terms of computational cost. The proposed estimators are applied first to simulated waveform curves and then to real phoneme data. PubDate: 2021-03-01
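A simplified sketch of a recursive kernel classification rule for discretised curves: class-wise kernel sums are updated one observation at a time with a per-step bandwidth, and a new curve is assigned to the class with the largest sum. The deterministic bandwidth sequence and the Gaussian kernel are assumptions; the paper's stochastic-approximation form and automatic bandwidth selection are not reproduced.

```r
## Recursive kernel classification of curves (two noisy functional classes).
set.seed(8)
grid <- seq(0, 1, length.out = 50)
make_curve <- function(class)
  (if (class == 1) sin(2 * pi * grid) else cos(2 * pi * grid)) +
    rnorm(50, sd = 0.3)
n   <- 100
ycl <- sample(1:2, n, replace = TRUE)
Xc  <- t(vapply(ycl, make_curve, numeric(50)))
classify <- function(newcurve, h0 = 1) {
  S <- c(0, 0)
  for (i in seq_len(n)) {                    # one-pass recursive update
    h <- h0 * i^(-1/5)                       # assumed bandwidth sequence
    d <- sqrt(mean((Xc[i, ] - newcurve)^2))  # L2 distance between curves
    S[ycl[i]] <- S[ycl[i]] + exp(-(d / h)^2) # Gaussian kernel contribution
  }
  which.max(S)
}
classify(make_curve(1)); classify(make_curve(2))
```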
Abstract: Partial least squares path modeling is a statistical method for analyzing complex dependence relationships among several blocks of observed variables, each represented by a latent variable. The computation of latent variable scores is an essential step of the method, achieved through an iterative procedure referred to here as the Hanafi–Wold procedure. The present paper generalizes properties already known in the literature for this procedure, from which additional convergence results are obtained. PubDate: 2021-03-01
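For readers unfamiliar with the procedure, a compact sketch of the basic Wold iteration in Mode A with the centroid scheme for two connected blocks: alternate between computing block scores from outer weights and updating outer weights from the inner estimates. This is the classic form the paper studies, written from textbook descriptions, not the paper's generalization.

```r
## Basic Wold iteration (Mode A, centroid scheme) for two connected blocks.
set.seed(6)
n <- 100
X <- list(scale(matrix(rnorm(n * 3), n, 3)), scale(matrix(rnorm(n * 2), n, 2)))
C <- matrix(c(0, 1, 1, 0), 2)                  # inner design: blocks 1 <-> 2
w <- lapply(X, function(Xj) rep(1 / sqrt(ncol(Xj)), ncol(Xj)))
for (iter in 1:100) {
  z <- sapply(seq_along(X), function(j) scale(X[[j]] %*% w[[j]]))  # scores
  inner <- z %*% (sign(cor(z)) * C)            # centroid inner estimation
  w_new <- lapply(seq_along(X), function(j) {
    v <- drop(crossprod(X[[j]], inner[, j])) / n   # Mode A outer weights
    v / sqrt(sum(v^2))
  })
  if (max(abs(unlist(w_new) - unlist(w))) < 1e-10) break
  w <- w_new
}
head(z)                                        # converged latent variable scores
```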
Abstract: In fitting data with a spline, finding the optimal placement of knots can significantly improve the quality of the fit. However, the challenging high-dimensional and non-convex optimization problem associated with completely free knot placement has been a major roadblock in using this approach. We present a method that uses particle swarm optimization (PSO) combined with model selection to address this challenge. The problem of overfitting due to knot clustering that accompanies free knot placement is mitigated in this method by explicit regularization, resulting in a significantly improved performance on highly noisy data. The principal design choices available in the method are delineated and a statistically rigorous study of their effect on performance is carried out using simulated data and a wide variety of benchmark functions. Our results demonstrate that PSO-based free knot placement leads to a viable and flexible adaptive spline fitting approach that allows the fitting of both smooth and non-smooth functions. PubDate: 2021-03-01
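A compact sketch of the overall recipe: particles are vectors of interior knot locations, and the fitness of a particle is a penalised criterion of a cubic B-spline least-squares fit. Here a BIC-type penalty stands in for the paper's explicit regularisation, and all PSO constants are conventional defaults, not the paper's tuned choices.

```r
## PSO over interior knot locations, fitness = BIC of a cubic B-spline fit.
library(splines)
set.seed(10)
x <- seq(0, 1, length.out = 200)
y <- sin(8 * pi * x^2) + rnorm(200, sd = 0.2)
fitness <- function(k) tryCatch({
  f <- lm(y ~ bs(x, knots = sort(k), degree = 3))
  n <- length(y)
  n * log(mean(resid(f)^2)) + log(n) * length(coef(f))  # BIC-type penalty
}, error = function(e) Inf)                             # guard degenerate knots
n_part <- 30; n_knots <- 8
P <- matrix(runif(n_part * n_knots), n_part)            # particle positions
V <- matrix(0, n_part, n_knots)                         # particle velocities
pbest <- P; pbest_f <- apply(P, 1, fitness)
gbest <- P[which.min(pbest_f), ]
for (it in 1:100) {
  G <- matrix(gbest, n_part, n_knots, byrow = TRUE)
  V <- 0.7 * V + 1.5 * runif(length(V)) * (pbest - P) +
                 1.5 * runif(length(V)) * (G - P)       # standard PSO update
  P <- pmin(pmax(P + V, 0.01), 0.99)                    # keep knots in (0, 1)
  f <- apply(P, 1, fitness)
  better <- f < pbest_f
  pbest[better, ] <- P[better, ]; pbest_f[better] <- f[better]
  gbest <- pbest[which.min(pbest_f), ]
}
min(pbest_f); sort(gbest)                               # best BIC and knots
```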
Abstract: Count data are becoming ubiquitous in a wide range of applications, with datasets growing both in size and in dimension. In this context, an increasing amount of work is dedicated to the construction of statistical models that directly account for the discrete nature of the data. Moreover, it has been shown that integrating dimension reduction into clustering can drastically improve performance and stability. In this paper, we rely on the mixture of multinomial PCA, a mixture model for the clustering of count data also known in the literature as the probabilistic clustering-projection model. Related to the latent Dirichlet allocation model, it offers the flexibility of topic modeling while being able to assign each observation to a unique cluster. We introduce a greedy clustering algorithm in which inference and clustering are done jointly, by combining a classification variational expectation-maximization algorithm with a branch-and-bound-like strategy on a variational lower bound. An integrated classification likelihood criterion is derived for model selection, and a thorough study with numerical experiments is proposed to assess both the performance and the robustness of the method. Finally, we illustrate the qualitative interest of the method in a real-world application, the clustering of anatomopathological medical reports, in partnership with expert practitioners from the Institut Curie hospital. PubDate: 2021-03-01
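As a point of reference for the clustering backbone, here is EM for a plain mixture of multinomials, i.e. without the dimension-reduction (topic) layer of mixture of multinomial PCA and without the variational/branch-and-bound machinery of the paper.

```r
## EM for a plain multinomial mixture on simulated count data.
set.seed(12)
K <- 2; V <- 20; n <- 300
truth <- sample(1:K, n, replace = TRUE)
theta_true <- rbind(rep(1 / V, V), (1:V) / sum(1:V))       # two word profiles
Xc <- t(sapply(truth, function(k) rmultinom(1, 50, theta_true[k, ])))
pi_k <- rep(1 / K, K)
theta <- matrix(runif(K * V), K); theta <- theta / rowSums(theta)
for (it in 1:100) {
  logR <- Xc %*% t(log(theta)) + rep(log(pi_k), each = n)  # E-step (log scale)
  R <- exp(logR - apply(logR, 1, max)); R <- R / rowSums(R)
  pi_k  <- colMeans(R)                                     # M-step
  theta <- t(R) %*% Xc + 1e-8; theta <- theta / rowSums(theta)
}
table(truth, apply(R, 1, which.max))                       # recovered clusters
```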
Abstract: Doubly-truncated data arise in many fields, including economics, engineering, medicine, and astronomy. This article develops likelihood-based inference methods for lifetime distributions under the log-location-scale model and the accelerated failure time model based on doubly-truncated data. These parametric models are practically useful, but methodology for fitting them to doubly-truncated data has been missing. We develop algorithms for obtaining the maximum likelihood estimator under both models, and propose several types of interval estimation methods. Furthermore, we show that the confidence band for the cumulative distribution function has closed-form expressions. We conduct simulations to examine the accuracy of the proposed methods. We illustrate the proposed methods with real data from a field reliability study, called the Equipment-S data. PubDate: 2021-03-01
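A minimal sketch of the likelihood principle involved, for a lognormal (a log-location-scale member): a unit with window [l_i, u_i] is observed only if l_i ≤ T_i ≤ u_i, so its contribution is f(t_i)/(F(u_i) − F(l_i)). The naive optim maximisation below stands in for the paper's algorithms.

```r
## ML for a lognormal under double truncation, via optim.
set.seed(13)
N <- 3000
t_all <- rlnorm(N, meanlog = 1, sdlog = 0.5)
l <- runif(N, 0, 2); u <- l + runif(N, 2, 6)         # unit-level windows
keep <- t_all >= l & t_all <= u                      # double truncation
t <- t_all[keep]; l <- l[keep]; u <- u[keep]
negll <- function(par) {
  mu <- par[1]; s <- exp(par[2])                     # log-scale keeps sigma > 0
  -sum(dlnorm(t, mu, s, log = TRUE) -
       log(plnorm(u, mu, s) - plnorm(l, mu, s)))
}
fit <- optim(c(0, 0), negll)
c(meanlog = fit$par[1], sdlog = exp(fit$par[2]))     # near the true (1, 0.5)
```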
Abstract: This paper discusses simultaneous parameter estimation and variable selection and presents a new penalized regression method. The method is based on the idea that the coefficient estimates are shrunken towards a predetermined coefficient vector that represents prior information. Depending on the prior information, the method can yield coefficient estimates of smaller norm than the elastic net. In addition to establishing the grouping property, we show that the new method exhibits the grouping effect when the predictors are highly correlated. Simulation studies and a real data example show that the prediction performance of the new method improves over the well-known ridge, lasso, and elastic net regression methods, yielding a lower mean squared error, and that it is competitive in variable selection under both sparse and non-sparse situations. PubDate: 2021-03-01
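The ridge-type special case of shrinking toward a predetermined vector b0 has a closed form, which makes the idea easy to see: min ‖y − Xb‖² + λ‖b − b0‖² is solved by b = (X′X + λI)⁻¹(X′y + λb0), so λ → ∞ recovers b0 and λ = 0 recovers OLS. The paper's method adds a lasso-type component for variable selection, which this sketch does not reproduce.

```r
## Closed-form ridge-type shrinkage toward a prior coefficient vector b0.
set.seed(14)
n <- 100; p <- 4
X <- matrix(rnorm(n * p), n)
beta <- c(2, -1, 0.5, 0)
y <- X %*% beta + rnorm(n)
b0 <- c(1.5, -0.5, 0, 0)                  # assumed prior guess
shrink_to <- function(lambda)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y) + lambda * b0))
rbind(ols = shrink_to(0), mild = shrink_to(5), strong = shrink_to(1e6))
```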