
Statistical Methods and Applications
Journal Prestige (SJR): 0.466
Citation Impact (citeScore): 1
Number of Followers: 6  
 
  Hybrid journal (it can contain Open Access articles)
ISSN (Print) 1613-981X - ISSN (Online) 1618-2510
Published by Springer-Verlag
  • Partial least square based approaches for high-dimensional linear mixed models

      Abstract: Linear mixed effects models are commonly used to deal with repeated or longitudinal data. A classical parameter estimation method is the Expectation–Maximization (EM) algorithm. In this paper, we propose three new Partial Least Square (PLS) based approaches that use the EM algorithm to reduce high-dimensional data to a lower dimension for the fixed effects in linear mixed models. Unlike the Principal Component Regression approach, the PLS method takes into account the link between the outcome and the independent variables. We compare these approaches using a simulation study and a yeast cell-cycle gene expression data set. We demonstrate the performance of two of them and recommend their use in future analyses of high-dimensional data in the context of linear mixed effects models.
      PubDate: 2023-02-02
       
  • Estimators for ROC curves with missing biomarkers values and informative covariates

      Abstract: In this paper, we present three estimators of the ROC curve when missing observations arise among the biomarkers. Two of the procedures assume that we have covariates that allow the propensity to be estimated; from this information, the estimators are obtained using an inverse probability weighting method or a smoothed version of it. The third one assumes that the covariates are related to the biomarkers through a regression model, which enables us to construct convolution-based estimators of the distribution and quantile functions. Consistency results are obtained under mild conditions. Through a numerical study, we evaluate the finite sample performance of the different proposals. A real data set is also analysed.
      PubDate: 2023-01-30
       
  • Perceived climate change risk and global green activism among young people

      Abstract: In recent years, the increasing number of natural disasters has raised concerns about the sustainability of our planet’s future. As young people comprise the generation that will suffer from the negative effects of climate change, they have become involved in a new climate activism that is also gaining interest in the public debate thanks to the Fridays for Future (FFF) movement. This paper analyses the results of a survey of 1,138 young people in a southern Italian region to explore their perceptions of the extent of environmental problems and their participation in protests of green movements such as the FFF. The statistical analysis uses an ordinal classification tree with an original impurity measure that considers both the ordinal nature of the response variable and the heterogeneity of its ordered categories. The results show that respondents are concerned about the threat of climate change and participate in the FFF to claim their right to a healthier planet and to encourage people to adopt environmentally friendly practices in their lifestyles. Young people feel they are global citizens, connected through the Internet and social media, and show greater sensitivity to the planet’s environmental problems, so they are willing to take effective action to demand sustainable policies from decision-makers. When planning public policies that will affect future generations, it is important for policymakers to know the demands and opinions of key stakeholders, especially young people, in order to plan the most appropriate measures, such as climate change mitigation.
      PubDate: 2023-01-30
       
  • Community informed experimental design

      Abstract: Network information has become a common feature of many modern experiments. From vaccine efficacy studies to marketing for product adoption, stakeholders aim to estimate global treatment effects: what happens if everyone in a network is treated versus if no one is treated. Because individual outcomes are potentially influenced by the treatments or behaviors of others in the network, experimental designs must condition on the underlying network. Social networks frequently exhibit homophilous community structure, meaning that individuals within observed or latent communities are more similar to each other. This observation motivates the development of community-aware experimental design. This design recognizes that information between individuals likely flows along within-community edges rather than across-community edges. We demonstrate that this design reduces the bias of a simple difference-in-means estimator, even when the community structure of the graph needs to be estimated. Further, we show that as the community detection problem gets more difficult, or if the community structure does not affect the causal question, the proposed design maintains its performance.
      PubDate: 2023-01-23
       
  • Bayesian local bandwidths in a flexible semiparametric kernel estimation for multivariate count data with diagnostics

      Abstract: In this paper, we consider a flexible semiparametric approach for estimating multivariate probability mass functions. The corresponding estimator is governed by a parametric starter, for instance a multivariate Poisson distribution with nonnegative cross correlations, which is estimated through an expectation–maximization algorithm, and a nonparametric part, which is an unknown discrete weight function to be smoothed through multiple binomial kernels. Our central focus is the selection of the matrix of bandwidths by the local Bayesian method. We additionally discuss the diagnostic model used to choose appropriately between the parametric, semiparametric and nonparametric approaches. Retaining a purely nonparametric method implies losing parametric benefits in this modelling framework. Practical applications, including a tail probability estimation, on multivariate count datasets are analyzed under several scenarios of correlations and dispersions. This semiparametric approach demonstrates superior performance and better interpretation compared to parametric and nonparametric ones.
      PubDate: 2023-01-23
       
  • Hierarchical Bayes small area estimation for county-level health prevalence to having a personal doctor

      Abstract: The complexity of survey data and the availability of data from auxiliary sources motivate researchers to explore estimation methods that extend beyond traditional survey-based estimation. The U.S. Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System (BRFSS) collects a wide range of health information, including whether respondents have a personal doctor. While the BRFSS focuses on state-level estimation, there is demand for county-level estimation of health indicators using BRFSS data. A hierarchical Bayes small area estimation model is developed to combine county-level BRFSS survey data with county-level data from auxiliary sources, while accounting for various sources of error and nested geographical levels. To mitigate extreme proportions and unstable survey variances, a transformation is applied to the survey data. Model-based county-level predictions are constructed for prevalence of having a personal doctor for all the counties in the U.S., including those where BRFSS survey data were not available. An evaluation study using only the counties with large BRFSS sample sizes to fit the model versus using all the counties with BRFSS data to fit the model is also presented.
      PubDate: 2022-12-22
       
  • A new autoregressive process driven by explanatory variables and past observations: an application to PM 2.5

      Abstract: This paper applies the empirical likelihood (EL) method to a new random coefficient autoregressive process driven by explanatory variables and past observations through a logistic structure (OD-RCAR(1)), and puts forward the penalized maximum empirical likelihood (PMEL) method for parameter estimation and variable selection. Firstly, limiting distributions of the estimating function and log empirical likelihood ratio statistics based on EL are established, and a confidence region and an EL test for the parameters are set up. Secondly, the maximum empirical likelihood estimators and their asymptotic properties are obtained, and the penalized empirical likelihood ratio test statistic is given. Thirdly, it is proved in a high-dimensional setting that the PMEL in our model can solve the problem of order selection and parameter estimation. Finally, both numerical simulations and practical data applications are used to describe the performance of the proposed methods.
      PubDate: 2022-12-13
       
  • Assessing maths learning gaps using Italian longitudinal data

      Abstract: In the educational context, one of the main goals is to reduce the disparities among students, generally at the national level, to allow all individuals to achieve a similar cultural background. Using data from a large-scale standardised test administered by INVALSI (National Institute for the Evaluation of the Educational System), this paper offers a first longitudinal analysis of the performance in the maths test of a cohort of students enrolled in 2013/2014 at grade 8 and observed up to grade 13. The aim is to identify the obstacles that undermine students’ learning, to help adopt informed educational actions. Specific features of these data are their hierarchical structure and the presence of scores that are not vertically scaled. Two approaches have been followed for their analysis: growth models and growth percentiles. Consistent with the literature, our results suggest the presence of a gender gap and a significant impact of the type of school and of social-cultural background. Unlike previous research on the INVALSI data, we evaluate the effects of these time-invariant covariates on students’ performance over different school cycles.
      PubDate: 2022-12-13
       
  • The relative importance of ability, luck and motivation in team sports: a Bayesian model of performance in the English Rugby Premiership

      Abstract: Results in contact sports like rugby are mainly interpreted in terms of the ability and/or luck of teams. But this neglects the important role of the motivation of players, reflected in the effort exerted in the game. Here we present a Bayesian hierarchical model to infer the main features that explain score differences in rugby matches of the English Premiership Rugby 2020/2021 season. The main result is that effort (seen as a ratio between the number of tries and the scoring kick attempts) is indeed highly relevant in explaining outcomes in those matches.
      PubDate: 2022-12-08
       
  • Model-based clustering via mixtures of unrestricted skew normal factor analyzers with complete and incomplete data

      Abstract: Mixtures of factor analyzers (MFA) based on the restricted skew normal distribution (rMSN) have emerged as a flexible tool to handle asymmetrical high-dimensional data with heterogeneity. However, the rMSN distribution is often criticized for lacking sufficient ability to accommodate potential skewness arising from more than one feature space. This paper presents an alternative extension of MFA by assuming the unrestricted skew normal (uMSN) distribution for the component factors. In particular, the proposed mixtures of unrestricted skew normal factor analyzers (MuSNFA) can simultaneously capture multiple directions of skewness and deal with the occurrence of missing values or nonresponses. Under the missing at random (MAR) mechanism, we develop a computationally feasible expectation conditional maximization (ECM) algorithm for computing the maximum likelihood estimates of model parameters. Practical aspects related to model-based clustering, prediction of factor scores and imputation of missing values are also discussed. The utility of the proposed methodology is illustrated with the analysis of simulated and real datasets.
      PubDate: 2022-12-06
       
  • Student’s-t process with spatial deformation for spatio-temporal data

      Abstract: Many models for environmental data that are observed in time and space have been proposed in the literature. The main objective of these models is usually to make predictions in time and to perform interpolations in space. Realistic predictions and interpolations are obtained when the process and its variability are well represented through a model that takes into consideration its peculiarities. In this paper, we propose a spatio-temporal model to handle observations that come from distributions with heavy tails and for which the assumption of isotropy is not realistic. As a natural choice for a heavy-tailed model, we take a Student’s-t distribution. The Student’s-t distribution, while being symmetric, provides greater flexibility in modeling data with kurtosis and shape different from the Gaussian distribution. We handle anisotropy through a spatial deformation method. Under this approach, the original geographic space of observations gets mapped into a new space where isotropy holds. Our main result is, therefore, an anisotropic model based on the heavy-tailed t distribution. A Bayesian approach and the use of MCMC enable us to sample from the posterior distribution of the model parameters. In Sect. 2, we discuss the main properties of the proposed model. In Sect. 3, we present a simulation study, showing its superiority over the traditional isotropic Gaussian model. In Sect. 4, we show the motivation that has led us to propose the t distribution-based anisotropic model: the real dataset of evaporation from the Rio Grande do Sul state of Brazil.
      PubDate: 2022-12-01
       
  • Double-calibration estimators accounting for under-coverage and nonresponse in socio-economic surveys

      Abstract: Under-coverage and nonresponse problems are jointly present in most socio-economic surveys. The purpose of this paper is to propose an estimation strategy that accounts for both problems by performing a two-step calibration. The first calibration exploits a set of auxiliary variables only available for the units in the sampled population to account for nonresponse. The second calibration exploits a different set of auxiliary variables available for the whole population, to account for under-coverage. The two calibrations are then unified in a double-calibration estimator. Mean and variance of the estimator are derived up to the first order of approximation. Conditions ensuring approximate unbiasedness are derived and discussed. The strategy is empirically checked by a simulation study performed on a set of artificial populations. A case study is derived from the European Union Statistics on Income and Living Conditions survey data. The proposed strategy is flexible and suitable in most situations in which both under-coverage and nonresponse are present.
      PubDate: 2022-12-01
       
  • Spare time use: profiles of Italian Millennials (beyond the media hype)

      Abstract: This paper focuses on a particular population segment, that of Millennials, which has attracted much attention over recent years. Beyond the media hype, little is known about the habits of this generation towards spare time use. The present study builds on a previous work devoted to detecting the different ways Italian Millennials interact with spare time, and aims at identifying profiles of Millennials characterized by profile-specific time use habits and styles. In so doing, we (i) account for the multidimensional nature of time use attitude and express it in a reduced number of distinct dimensions and (ii) identify and qualify profiles of Millennials as regards the ascertained time use dimensions. By relying on an extended Item Response Theory model applied to the Italian “Multipurpose survey on households”, our main findings reveal that the way Millennials use spare time and interact with technology is much more complex, varied and multifaceted than the media claim.
      PubDate: 2022-12-01
       
  • The inextricable association of measurement errors and tax evasion as examined through a microanalysis of survey data matched with fiscal data: a case study

      Abstract: Individual records referring to personal interviews conducted for a survey on income in Modena during 2012 and tax year 2011 were matched with the corresponding records in the Italian Ministry of Finance databases containing fiscal income data for tax year 2011. The analysis of the resulting data set suggested that the fiscal income data were generally more reliable than the surveyed income data. Moreover, the obtained data set enabled identification of the factors determining over- and under-reporting, as well as measurement errors, through a comparison of the surveyed income data with the fiscal income data, only for suitable categories of interviewees, that is, taxpayers who are forced to respect the tax laws (the public sector) and taxpayers who have many evasion options (the private sector). The percentage of under-reporters (67.3%) was higher than the percentage of over-reporters (32.7%). Level of income, age, and education were the main regressors affecting measurement errors and the behaviours of tax evaders. Tax evasion and the impacts of personal factors affecting evasion were evaluated using various approaches. The average tax evasion amounted to 26.0% of the fiscal income. About 10% of the sample was made up of possible total tax evaders.
      PubDate: 2022-12-01
       
  • Quantile regression via the EM algorithm for joint modeling of mixed discrete and continuous data based on Gaussian copula

      Abstract: In this paper, we develop a joint quantile regression model for correlated mixed discrete and continuous data using a Gaussian copula. Our approach entails specifying marginal quantile regression models for the responses and combining them via a copula to form a joint model. For modeling the quantiles of the continuous response, an asymmetric Laplace (AL) distribution is assigned to the error terms in both the continuous and discrete models. For modeling the discrete response, an underlying latent variable model and the threshold concept are used. Quantile regression for discrete responses can be fitted using the monotone equivariance property of quantiles. By assuming a latent variable framework to describe discrete responses, the proposed copula still uniquely determines the joint distribution. The likelihood function of the joint model also has a tractable form, but it is not differentiable at some points of the parameter space. However, by using the stochastic representation of the AL distribution, the maximum likelihood estimates of the parameters are obtained using an EM algorithm, and, in order to carry out inference about the parameters, bootstrap confidence intervals are specified using a Monte Carlo technique. Some simulation studies are performed to illustrate the performance of the model. Finally, we illustrate applications of the proposed approach using burn injuries data.
      PubDate: 2022-12-01
       
  • On a bivariate copula for modeling negative dependence: application to New York air quality data

      Abstract: In many practical scenarios, including finance, environmental sciences, system reliability, etc., it is often of interest to study the various notions of negative dependence among the observed variables. A new bivariate copula is proposed for modeling negative dependence between two random variables that complies with most of the popular notions of negative dependence reported in the literature. Specifically, Spearman’s rho and Kendall’s tau for the proposed copula have a simple one-parameter form with negative values in the full range. Some important ordering properties comparing the strength of negative dependence with respect to the parameter involved are considered. Simple examples of the corresponding bivariate distributions with popular marginals are presented. Application of the proposed copula is illustrated using a real data set on air quality in New York City, USA.
      PubDate: 2022-12-01
       
  • Joint modeling for longitudinal covariate and binary outcome via h-likelihood

      Abstract: Joint modeling techniques of longitudinal covariates and binary outcomes have attracted considerable attention in medical research. The basic strategy for estimating the coefficients of joint models is to define a joint likelihood based on two submodels with shared random effects. Numerical integration, however, is required in the estimation step for the joint likelihood, which is computationally expensive due to the complexity of the assumed submodels. To overcome this issue, we propose a joint modeling procedure using the h-likelihood to avoid numerical integration in the estimation algorithm. We conduct Monte Carlo simulations to investigate the effectiveness of our proposed modeling procedures by evaluating both the accuracy of the parameter estimates and computational time. The accuracy of the proposed procedure is compared to the two-stage modeling and numerical integration approaches. We also validate our proposed modeling procedure by applying it to the analysis of real data.
      PubDate: 2022-12-01
       
  • A procedure for testing the hypothesis of weak efficiency in financial markets: a Monte Carlo simulation

      Abstract: The weak form of the efficient market hypothesis is identified with the conditions established by different types of random walks (1–3) on the returns associated with the prices of a financial asset. The methods traditionally applied for testing weak efficiency in a financial market, as stated by the random walk model, test only some necessary, but not sufficient, condition of this model. Thus, a procedure is proposed to detect whether a return series associated with a given price index follows a random walk and, if so, what type it is. The procedure combines methods that test only a necessary, but not sufficient, condition for the fulfilment of the random walk hypothesis and methods that directly test a particular type of random walk. The proposed procedure is evaluated by means of a Monte Carlo experiment, and the results show that this procedure performs better (is more powerful) against linear correlation-only alternatives when starting from the Ljung–Box test. On the other hand, against the random walk type 3 alternative, the procedure is more powerful when it is initiated from the BDS test.
      PubDate: 2022-12-01
       
  • Markov models for duration-dependent transitions: selecting the states using duration values or duration intervals

      Abstract: In a Markov model the transition probabilities between states do not depend on the time spent in the current state. The present paper explores two ways of selecting the states of a discrete-time Markov model for a system partitioned into categories where the duration of stay in a category affects the probability of transition to another category. For a set of panel data, we compare the likelihood fits of the Markov models with states based on duration intervals and with states defined by duration values. For hierarchical systems, we show that the model with states based on duration values has a better maximum likelihood fit than the baseline Markov model where the states are the categories. We also prove that this is not the case for the duration-interval model, under conditions on the data that seem realistic in practice. Furthermore, we use the Akaike and Bayesian information criteria to compare these alternative Markov models. The theoretical findings are illustrated by an analysis of a real-world personnel data set.
      PubDate: 2022-12-01
       
  • A Bayesian approach to model individual differences and to partition individuals: case studies in growth and learning curves

      Abstract: The first objective of the paper is to implement a two-stage Bayesian hierarchical nonlinear model for growth and learning curves, particular cases of longitudinal data with an underlying nonlinear time dependence. The aim is to model simultaneously individual trajectories over time, each with specific and potentially different characteristics, and a time-dependent behavior shared among individuals, including possible covariate effects. At the first stage, inter-individual differences are taken into account, while at the second stage we search for an average model. The second objective is to partition individuals into homogeneous groups when inter-individual parameters present a high level of heterogeneity. A new multivariate partitioning approach is proposed to cluster individuals according to the posterior distributions of the parameters describing the individual time-dependent behaviour. To assess the proposed methods, we present simulated data and two applications to real data, one related to growth curve modeling in agriculture and one related to learning curves for motor skills. Furthermore, a comparison with finite mixture analysis is shown.
      PubDate: 2022-12-01
       
 