Statistical Methods and Applications
Journal Prestige (SJR): 0.466; Citation Impact (CiteScore): 1; Number of Followers: 6
Hybrid journal (it can contain Open Access articles)
ISSN (Print) 1613-981X; ISSN (Online) 1618-2510
Published by Springer-Verlag
• Correction to: RIF regression via sensitivity curves

PubDate: 2023-03-01

• Heterogeneity in general multinomial choice models

Abstract: Different voters behave differently at the polls, different students make different university choices, or different countries choose different health care systems. Many research questions important to social scientists concern choice behavior, which involves dealing with nominal dependent variables. Drawing on the principle of maximum random utility, we propose applying a flexible and general heterogeneous multinomial logit model to study differences in choice behavior. The model systematically accounts for heterogeneity that classical models do not capture, indicates the strength of heterogeneity, and permits examining which explanatory variables cause heterogeneity. As the proposed approach allows incorporating theoretical expectations about heterogeneity into the analysis of nominal dependent variables, it can be applied to a wide range of research problems. Our empirical example uses individual-level survey data to demonstrate the benefits of the model in studying heterogeneity in electoral decisions.
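The classical multinomial logit that the heterogeneous model above generalizes derives choice probabilities from random utilities. A minimal sketch of that baseline (the function name, covariates, and coefficient values are illustrative, not taken from the paper):

```python
import math

def mnl_probabilities(x, betas):
    """Multinomial logit choice probabilities P(j | x) = exp(x.b_j) / sum_k exp(x.b_k).

    x     -- covariate values for one decision maker
    betas -- one coefficient vector per alternative
    """
    utilities = [sum(xi * bi for xi, bi in zip(x, b)) for b in betas]
    m = max(utilities)                       # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Three alternatives, two covariates; the first alternative is the zero-coefficient baseline.
probs = mnl_probabilities([1.0, 0.5], [[0.0, 0.0], [0.4, -0.2], [-0.1, 0.8]])
```

The heterogeneous model of the paper additionally lets a dispersion component scale these utilities across decision makers; the softmax core stays the same.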
PubDate: 2023-03-01

• Linear approximation of the Threshold AutoRegressive model: an application
to order estimation

Abstract: This paper proposes a linear approximation of the nonlinear Threshold AutoRegressive model. It is shown that there is a relation between the autoregressive order of the threshold model and the order of its autoregressive moving average approximation. The main advantage of this approximation lies in the extension of some theoretical results developed in the linear setting to the nonlinear domain. Among them, a new order estimation procedure for threshold models is proposed, whose performance is compared, through a Monte Carlo study, to other criteria widely employed in the nonlinear threshold context.
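A two-regime self-exciting threshold AR(1), the kind of nonlinear model the paper approximates, can be simulated in a few lines. The coefficients, threshold, and seed below are illustrative choices, not the paper's settings:

```python
import random

def simulate_setar(n, phi_low=0.6, phi_high=-0.4, threshold=0.0, seed=1):
    """Simulate a two-regime SETAR(1): the AR coefficient switches
    depending on which side of the threshold the lagged value falls."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        phi = phi_low if y[-1] <= threshold else phi_high
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y

series = simulate_setar(500)
```

Fitting a linear ARMA model to such a series and reading off its order is the idea behind the order estimation procedure described above.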
PubDate: 2023-03-01

• A Bayesian variable selection approach to longitudinal quantile regression

Abstract: The literature on variable selection for mean regression is quite rich, both in the classical and in the Bayesian setting. However, if the goal is to assess the effects of the predictors at different levels of the response variable, then quantile regression is useful. In this paper, we develop a Bayesian variable selection method for a longitudinal response at prefixed quantile levels. We consider an Asymmetric Laplace Distribution (ALD) for the longitudinal response, and develop a simple Gibbs sampler algorithm for variable selection at each quantile level. We analyze a dataset from the Health and Retirement Study (HRS) conducted by the University of Michigan to understand the relationship between the physical and financial health of aged individuals. We consider out-of-pocket medical expenses as our response variable since they summarize the physical and financial well-being of an aged individual. Our proposed approach efficiently selects the important predictors at different prefixed quantile levels. Simulation studies are performed to assess the practical usefulness of the proposed approach. We also compare the performance of the proposed approach to some other existing methods of variable selection in quantile regression.
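The reason the ALD appears here is that maximizing an ALD likelihood in its location parameter is equivalent to minimizing the quantile check loss. A small illustrative sketch of that equivalence (not the paper's Gibbs sampler; names and data are made up):

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0}).
    Minimising sum_i rho_tau(y_i - q) over q yields the tau-th sample
    quantile; up to constants this is the negative log-likelihood of
    an Asymmetric Laplace Distribution with location q."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def best_quantile(ys, tau):
    # For a finite sample the minimiser can be searched over the data points themselves.
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))

data = [1.0, 2.0, 3.0, 4.0, 100.0]
med = best_quantile(data, 0.5)   # tau = 0.5 recovers the sample median
```

Note how the gross outlier 100.0 leaves the tau = 0.5 solution unaffected, which is exactly why quantile-level analysis complements mean regression.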
PubDate: 2023-03-01

• Trend resistant balanced bipartite block designs

Abstract: Trend-resistant Balanced Bipartite Block (BBPB) designs are used when the experimenter is interested in comparisons between two disjoint sets of treatments and a systematic trend is present within blocks. This paper deals with the bipartite block model incorporating a trend component. A general methodology related to BBPB designs incorporating a trend effect is described, and the conditions for a BBPB design to be trend resistant are obtained. Further, methods of constructing trend-resistant BBPB designs are discussed. The designs so obtained are trend resistant and are more efficient for estimating the contrasts pertaining to treatments from different sets.
PubDate: 2023-03-01

• Automatic robust Box–Cox and extended Yeo–Johnson
transformations in regression

Abstract: The paper introduces an automatic procedure for the parametric transformation of the response in regression models to approximate normality. We consider the Box–Cox transformation and its generalization to the extended Yeo–Johnson transformation, which allows for both positive and negative responses. A simulation study illuminates the superior comparative properties of our automatic procedure for the Box–Cox transformation. The usefulness of our procedure is demonstrated on four sets of data, two including negative observations. An important theoretical development is an extension of the Bayesian Information Criterion (BIC) to the comparison of models following the deletion of observations, the number deleted here depending on the transformation parameter.
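The non-robust, non-automatic core of a Box–Cox analysis is the profile log-likelihood over the transformation parameter. A basic sketch, with the grid and data purely illustrative (the paper's contribution, robust automatic selection with deletion diagnostics, is not reproduced here):

```python
import math
import random

def boxcox(y, lam):
    """Box-Cox transform: (y^lam - 1)/lam for lam != 0, log(y) at lam = 0 (requires y > 0)."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(y, lam):
    """Profile log-likelihood of the normal model for the transformed response,
    including the Jacobian term (lam - 1) * sum(log y)."""
    z = boxcox(y, lam)
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)

def best_lambda(y, grid=None):
    grid = grid or [i / 10.0 for i in range(-20, 21)]   # lambda in [-2, 2]
    return max(grid, key=lambda lam: profile_loglik(y, lam))

# Lognormal-looking data should favour lambda near 0, i.e. the log transform.
rng = random.Random(0)
y = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(200)]
lam_hat = best_lambda(y)
```

The extended Yeo–Johnson variant replaces `boxcox` with a transform defined piecewise for negative responses; the profile-likelihood machinery is unchanged.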
PubDate: 2023-03-01

• 2-step Gradient Boosting approach to selectivity bias correction in tax
audit: an application to the VAT gap in Italy

Abstract: The revenue loss from tax avoidance can undermine the effectiveness and equity of government policies. A standard measure of its magnitude is the tax gap, defined as the difference between the total taxes theoretically collectable and the total taxes actually collected in a given period. Estimation from a micro perspective is usually tackled with bottom-up approaches, where data regularly collected through fiscal audits are analyzed in order to draw inferences about the general population. However, the sampling scheme of fiscal audits performed by revenue agencies is not random but is characterized by a selection bias toward risky taxpayers. The current standard adopted by the Italian Revenue Agency (IRA) for overcoming this issue in the tax audit context is the Heckman model, based on linear models for both the selection and the outcome mechanisms. Here we propose adopting CART-based Gradient Boosting in place of standard linear models to account for the complex patterns often arising in the relationships between covariates and outcome. Selection bias is corrected by a re-weighting scheme based on propensity scores, obtained through the sequential application of a classifier and a regressor. In short, we refer to the method as 2-step Gradient Boosting. We argue that this scheme fits the sampling mechanism of the IRA fiscal audits, and we apply it to a sample of VAT declarations from Italian individual firms in the fiscal year 2011. Results show a marked dominance of the proposed method over the currently adopted Heckman model in terms of predictive performance.
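The bias-correction step can be illustrated in isolation: units selected with unequal probabilities are re-weighted by the inverse of their selection (propensity) probability. In the paper the propensities are estimated by a CART-based Gradient Boosting classifier; in this hypothetical sketch the true selection probabilities are plugged in directly so the mechanism is visible:

```python
import random

rng = random.Random(42)
# A synthetic "population" of firm risk scores with mean zero.
population = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# The audit process over-samples risky firms, mimicking selection bias.
audited, weights = [], []
for risk in population:
    p = 0.8 if risk > 0 else 0.2          # selection probability (propensity)
    if rng.random() < p:
        audited.append(risk)
        weights.append(1.0 / p)           # inverse-probability weight

naive = sum(audited) / len(audited)                        # biased upward
ipw = sum(w * x for w, x in zip(weights, audited)) / sum(weights)
```

The naive audited-sample mean is far from the population mean of zero, while the re-weighted (Hajek-style) estimate recovers it; the 2-step scheme applies the same correction inside a boosted regression.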
PubDate: 2023-03-01

• Influence measures in nonparametric regression model with symmetric random
errors

Abstract: In this paper we present several diagnostic measures for the class of nonparametric regression models with symmetric random errors, which includes all continuous and symmetric distributions. In particular, we derive diagnostic measures of global influence such as residuals, leverage values, Cook’s distance and the measure proposed by Peña (Technometrics 47(1):1–12, 2005), which quantifies how an observation is influenced by the rest of the observations. A simulation study to evaluate the effectiveness of the diagnostic measures is presented. In addition, we develop local influence measures to assess the sensitivity of the maximum penalized likelihood estimator of the smooth function. Finally, an example with real data is given for illustration.
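Of the global influence measures listed, Cook's distance is the most familiar. For an ordinary simple linear regression (the parametric special case, not the paper's nonparametric symmetric-errors model) it can be computed directly from residuals and leverages; data below are made up:

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a simple linear regression y ~ a + b*x:
    D_i = r_i^2 / (p * s^2) * h_i / (1 - h_i)^2, with p = 2 parameters,
    leverage h_i and ordinary residual r_i."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
    a = sum(y) / n - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    p = 2
    s2 = sum(r * r for r in resid) / (n - p)
    lev = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [r * r / (p * s2) * h / (1.0 - h) ** 2 for r, h in zip(resid, lev)]

# A gross outlier at a high-leverage design point dominates the distances.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 20.0]
d = cooks_distance(x, y)
```

In the nonparametric setting of the paper, the hat matrix of the penalized smoother plays the role of the leverages.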
PubDate: 2023-03-01

• Optimal sample size for estimating the mean concentration of invasive
organisms in ballast water via a semiparametric Bayesian analysis

Abstract: We consider the determination of optimal sample sizes to estimate the concentration of organisms in ballast water via a semiparametric Bayesian approach involving a Dirichlet process mixture based on a Poisson model. This semiparametric model provides greater flexibility to model the organism distribution than that allowed by competing parametric models and is robust against misspecification. To obtain the optimal sample size we use a total cost minimization criterion, based on the sum of a Bayes risk and a sampling cost function. Credible intervals obtained via the proposed model may be used to verify compliance of the water with international standards before deballasting.
PubDate: 2023-03-01

• Maximum likelihood estimation of missing data probability for nonmonotone
missing at random data

Abstract: In general, statistical analysis with missing data requires specification of a model for the missing data probability and/or the covariate distribution. For nonmonotone missing data patterns, modeling and practical estimation of the missing data probability are very challenging. Recently, a semiparametric likelihood model was developed to estimate parametric regression models for the missing data mechanism based on all the observed data, which can deal with arbitrary nonmonotone missing data patterns. However, due to the curse of dimensionality in likelihood-based models, this method becomes impractical as the number of variables increases. This research generalizes the semiparametric likelihood model so that it can deal with any number of variables with arbitrary nonmonotone missing data patterns. It further introduces a semiparametric estimator of the missing data probability for the partially observed data, which can be used to assess the model fit. An EM algorithm with closed-form expressions at each step is used to compute the estimates. Simulation studies in various settings indicate that the performance of the new method is acceptable for practical implementation. The missing data mechanism of a case-control study of hip fractures among male veterans is analyzed to illustrate the method.
PubDate: 2023-03-01

• Generalised calibration with latent variables for the treatment of unit
nonresponse in sample surveys

Abstract: Sample surveys may suffer from nonignorable unit nonresponse. This happens when the decision of whether or not to participate in the survey is correlated with variables of interest; in such a case, nonresponse produces biased estimates for parameters related to those variables, even after adjustments that account for auxiliary information. This paper presents a method to deal with nonignorable unit nonresponse that uses generalised calibration and latent variable modelling. Generalised calibration makes it possible to model unit nonresponse using a set of auxiliary variables (instrumental or model variables) that can be different from those used in the calibration constraints (calibration variables). We propose using latent variables to estimate the probability of participating in the survey and to construct a reweighting system incorporating such latent variables. The proposed methodology is illustrated, its properties are discussed, and it is tested in two simulation studies. Finally, it is applied to adjust estimates of the finite population mean wealth from the Italian Survey of Household Income and Wealth.
PubDate: 2023-03-01

• Nonparametric estimation of the distribution of gap times for recurrent
events

Abstract: In many longitudinal studies, information is collected on the times of different kinds of events. Some of these studies involve repeated events, where a subject or sample unit may experience a well-defined event several times throughout their history. Such events are called recurrent events. In this paper, we introduce nonparametric methods for estimating the marginal and joint distribution functions for recurrent event data. New estimators are introduced and their extensions to several gap times are also given. Nonparametric inference conditional on current or past covariate measures is also considered. We study by simulation the behavior of the proposed estimators in finite samples, considering two or three gap times. Our proposed methods are applied to the study of (multiple) recurrence times in patients with bladder tumors. Software in the form of an R package, called survivalREC, has been developed, implementing all methods.
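For a single (possibly right-censored) gap time, the classical nonparametric starting point is the Kaplan–Meier estimator; the paper's estimators extend this idea to joint distributions of successive gap times. A minimal sketch with made-up data (ties and multiple gap times are not handled here):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t) for right-censored
    times; events[i] = 1 if the i-th time is an observed event, 0 if censored.
    Returns (time, S(time)) pairs at the observed event times."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv, s = [], 1.0
    for i in order:
        if events[i] == 1:
            s *= (at_risk - 1) / at_risk   # multiply by conditional survival
            surv.append((times[i], s))
        at_risk -= 1                       # the unit leaves the risk set either way
    return surv

km = kaplan_meier([2.0, 3.0, 4.0, 5.0, 6.0], [1, 1, 0, 1, 1])
```

The censored time at 4.0 drops out of the risk set without a survival step, which is why the later steps (at 5.0 and 6.0) are larger than 1/n.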
PubDate: 2023-03-01

• Does education protect families' well-being in times of crisis?
Measurement issues and empirical findings from IT-SILC data

Abstract: This study analyses the relationship between education and material well-being from a longitudinal perspective using the European Survey on Income and Living Conditions (EU-SILC) data collected in Italy in four waves (2009–2012). It has two main aims: (i) to measure household material well-being on the basis of householders’ responses to multiple survey items (addressed to gather information on the household availability of material resources) by proposing indexes that can account for global and relative divergences in households’ material well-being across survey waves; (ii) to assess how education and other sociodemographic characteristics affect absolute well-being and its variation (i.e. relative well-being) in the time span considered. Both aims are pursued by combining measurement and explanatory modelling approaches. That is, a Multilevel Item Response Theory model makes it possible to measure global household material well-being and its yearly variation (i.e. relative material well-being) in the four waves, while a multivariate (and multivariate multilevel) regression model makes it possible to assess the effects of education and other sociodemographic characteristics on both components (absolute and relative well-being), controlling for the relevant sources of heterogeneity in the data. The added value of the proposed methodologies, together with the main findings and their economic implications, is discussed.
PubDate: 2023-03-01

• RIF regression via sensitivity curves

Abstract: This paper proposes an empirical method to implement the recentered influence function (RIF) regression of Firpo et al. (Econometrica 77(3):953–973, 2009), a relevant method to study the effect of covariates on many statistics beyond the mean. In empirically relevant situations where the influence function is not available or is difficult to compute, we suggest using the sensitivity curve (Tukey, Exploratory Data Analysis, Addison-Wesley, Reading, MA, 1977) as a feasible alternative. This may be computationally cumbersome when the sample size is large. The relevance of the proposed strategy derives from the fact that, under general conditions, the sensitivity curve converges in probability to the influence function. To save computational time, we propose applying a cubic-spline non-parametric method to a random subsample and then interpolating to the remaining cases. Monte Carlo simulations show good finite-sample properties. We illustrate the proposed estimator with an application to the polarization index of Duclos et al. (Econometrica 72(6):1737–1772, 2004).
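The sensitivity curve itself is simple to compute: append one observation at x, recompute the statistic, and scale the change by the sample size. An illustrative sketch on a toy sample (not the paper's application), contrasting the unbounded influence of the mean with the bounded influence of the median:

```python
import statistics

def sensitivity_curve(sample, statistic, x):
    """Empirical sensitivity curve SC_n(x) = (n + 1) * (T(sample + [x]) - T(sample)),
    a finite-sample analogue of the influence function of the statistic T."""
    n = len(sample)
    return (n + 1) * (statistic(sample + [x]) - statistic(sample))

sample = [float(i) for i in range(1, 100)]       # 1..99, mean = median = 50
sc_mean_high = sensitivity_curve(sample, statistics.fmean, 1000.0)
sc_med_high = sensitivity_curve(sample, statistics.median, 1000.0)
```

For the mean, SC_n(x) reduces exactly to x minus the sample mean (here 950), growing without bound in x; for the median it stays bounded (here 50), which is the behaviour the influence function would predict.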
PubDate: 2023-03-01

• Modelling time-varying covariates effect on survival via functional data
analysis: application to the MRC BO06 trial in osteosarcoma

Abstract: Time-varying covariates are of great interest in clinical research since they represent dynamic patterns which reflect disease progression. In cancer studies, biomarker values change as functions of time, and chemotherapy treatment is modified by delaying a course or reducing the dose intensity according to the patient’s toxicity levels. In this work, a Functional covariate Cox Model (FunCM) to study the association between time-varying processes and a time-to-event outcome is proposed. FunCM first exploits functional data analysis techniques to represent time-varying processes in terms of functional data. Then, information related to the evolution of the functions over time is incorporated into functional regression models for survival data through functional principal component analysis. FunCM is compared to a standard time-varying covariate Cox model, commonly used despite its limiting assumptions that covariate values are constant over time and measured without error. Data from the MRC BO06/EORTC 80931 randomised controlled trial for treatment of osteosarcoma are analysed. Time-varying covariates related to alkaline phosphatase levels, white blood cell counts and chemotherapy dose during treatment are investigated. The proposed method makes it possible to detect differences between patients with different biomarker and treatment evolutions, and to include this information in the survival model. These aspects are seldom addressed in the literature and could provide new insights into clinical research.
PubDate: 2023-03-01

• Generalized residuals and outlier detection for ordinal data with
challenging data structures

Abstract: Motivated by the analysis of rating data concerning perceived health status, a crucial variable in biomedical, economic and life insurance models, the paper deals with diagnostic procedures for identifying anomalous and/or influential observations in ordinal response models with challenging data structures. Deviations due to some respondents’ atypical behavior, outlying covariates and gross errors may affect the reliability of likelihood-based inference, especially when non-robust link functions are adopted. The present paper investigates and exploits the properties of the generalized residuals. They appear in the estimating equations of the regression coefficients and have the remarkable characteristic of interacting with the covariates in the same fashion as the linear regression residuals. Statistical units incoherent with the model can be identified through the analysis of the residuals produced by maximum likelihood or robust M-estimation, while inspection of the weights generated by M-estimation makes it possible to identify influential data. Simple guidelines are proposed to this end, which disclose information on the data structure. The purpose is twofold: recognizing statistical units that deserve specific attention for their peculiar features, and being aware of the sensitivity of the fitted model to small changes in the sample. In the analysis of self-perceived health status, extreme design points associated with incoherent responses produce highly influential observations. The diagnostic procedures identify the outliers and assess their influence.
PubDate: 2023-02-28

• Calibrated Bayes factors under flexible priors

Abstract: This article develops and explores a robust Bayes factor derived from a calibration technique that makes it particularly compatible with elicited prior knowledge. Building on previous explorations, the particular robust Bayes factor, dubbed a neutral-data comparison, is adapted for broad comparisons with existing robust Bayes factors, such as the fractional and intrinsic Bayes factors, in configurations defined by informative priors. The calibration technique is further developed for use with flexible parametric priors (that is, mixture prior distributions with components that may be symmetric or skewed) and demonstrated in an example context from forensic science. Throughout the exploration, the neutral-data comparison is shown to exhibit desirable sensitivity properties and to show promise for adaptation to elaborate data-analysis scenarios.
PubDate: 2023-02-06

• A multi-decomposition of Zenga-84 inequality index: an application to the
disparity in CO₂ emissions in European countries

Abstract: The monitoring of CO₂ emissions has become a sensitive topic of discussion in recent years. The adoption of the Kyoto Protocol, and the subsequent activities that different countries have carried out to reduce CO₂ emissions, are factors which push the topic into the spotlight. An interesting issue is how the disparities in such emissions can be analyzed by sources and by subpopulations. In this paper an innovative procedure to jointly decompose the disparity by sources and by subpopulations is proposed. The assessment of the inequality is based on the Zenga-84 index. This new methodology is applied to the analysis of per capita CO₂ emission disparities for European countries, by simultaneously considering their sources (coal, oil, natural gas, and other) and the membership of the country in the OECD.
PubDate: 2023-02-06

• Partial least square based approaches for high-dimensional linear mixed
models

Abstract: To deal with repeated or longitudinal data, linear mixed effects models are commonly used. A classical parameter estimation method is the Expectation–Maximization (EM) algorithm. In this paper, we propose three new Partial Least Squares (PLS) based approaches using the EM algorithm to reduce high-dimensional data to a lower-dimensional representation for the fixed effects in linear mixed models. Unlike the Principal Component Regression approach, the PLS method takes into account the link between the outcome and the independent variables. We compare these approaches on a simulation study and a yeast cell-cycle gene expression data set. We demonstrate the performance of two of them and recommend their use for future analyses of high-dimensional data in the linear mixed effects model context.
PubDate: 2023-02-02

• Estimators for ROC curves with missing biomarker values and informative
covariates

Abstract: In this paper, we present three estimators of the ROC curve when missing observations arise among the biomarkers. Two of the procedures assume that we have covariates that allow us to estimate the propensity; from this information, the estimators are obtained using an inverse probability weighting method or a smoothed version of it. The third one assumes that the covariates are related to the biomarkers through a regression model, which enables us to construct convolution-based estimators of the distribution and quantile functions. Consistency results are obtained under mild conditions. Through a numerical study we evaluate the finite-sample performance of the different proposals. A real data set is also analysed.
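The inverse-probability-weighting idea can be sketched for a single ROC operating point: each observed biomarker contributes with weight 1/p̂, where p̂ is its estimated probability of being observed given the covariates. All numbers below are hypothetical, and the weights are taken as given rather than estimated:

```python
def weighted_roc_point(cases, controls, w_cases, w_controls, c):
    """One point of an IPW-estimated ROC curve: the weighted true- and
    false-positive rates at cutoff c, with each observed biomarker weighted
    by the inverse of its (estimated) probability of being observed."""
    tpr = sum(w for y, w in zip(cases, w_cases) if y > c) / sum(w_cases)
    fpr = sum(w for y, w in zip(controls, w_controls) if y > c) / sum(w_controls)
    return fpr, tpr

cases = [2.1, 1.4, 3.0, 0.9]          # biomarker values among the diseased
controls = [0.2, -0.5, 1.1, 0.4]      # biomarker values among the healthy
wc = [1.25, 2.0, 1.25, 2.0]           # hypothetical 1 / propensity weights
wk = [1.25, 1.25, 2.0, 2.0]
fpr, tpr = weighted_roc_point(cases, controls, wc, wk, 1.0)
```

Sweeping the cutoff c over the observed values traces out the full IPW ROC curve; the smoothed and convolution-based estimators of the paper refine this basic construction.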
PubDate: 2023-01-30

JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762