Abstract: Network information has become a common feature of many modern experiments. From vaccine efficacy studies to marketing for product adoption, stakeholders aim to estimate global treatment effects: what happens if everyone in a network is treated versus if no one is treated. Because individual outcomes are potentially influenced by the treatments or behaviors of others in the network, experimental designs must condition on the underlying network. Social networks frequently exhibit homophilous community structure, meaning that individuals within observed or latent communities are more similar to each other. This observation motivates the development of community-aware experimental design. This design recognizes that information between individuals likely flows along within-community edges rather than across-community edges. We demonstrate that this design reduces the bias of a simple difference-in-means estimator, even when the community structure of the graph needs to be estimated. Further, we show that as the community detection problem gets more difficult or if the community structure does not affect the causal question, the proposed design maintains its performance. PubDate: 2023-01-23
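As a rough illustration of the two ingredients named in the abstract above, the sketch below randomizes treatment at the community level and then computes a simple difference in means. It is a minimal sketch under stated assumptions, not the paper's design: the abstract does not specify the randomization scheme, so the function names `assign_by_community` and `difference_in_means`, the half-and-half split of communities, and the toy data are all hypothetical.

```python
import random


def assign_by_community(communities, seed=0):
    """Assign treatment per community, so all members of a community share one arm.

    `communities` maps node -> community label (assumed known or already estimated).
    Half of the community labels (rounded down) are treated; this split is an assumption.
    """
    rng = random.Random(seed)
    labels = sorted(set(communities.values()))
    rng.shuffle(labels)
    treated = set(labels[: len(labels) // 2])
    return {node: int(label in treated) for node, label in communities.items()}


def difference_in_means(outcomes, assignment):
    """Mean outcome of treated nodes minus mean outcome of control nodes."""
    treated = [y for node, y in outcomes.items() if assignment[node] == 1]
    control = [y for node, y in outcomes.items() if assignment[node] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Toy data: four nodes in two communities (hypothetical values).
communities = {0: "a", 1: "a", 2: "b", 3: "b"}
assignment = assign_by_community(communities)
outcomes = {0: 1.0, 1: 3.0, 2: 0.0, 3: 2.0}
estimate = difference_in_means(outcomes, assignment)
```

Randomizing whole communities keeps most within-community edges inside a single arm, which is the intuition the abstract gives for the reduced bias.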
Abstract: In this paper, we consider a flexible semiparametric approach for estimating multivariate probability mass functions. The corresponding estimator is governed by a parametric starter, for instance a multivariate Poisson distribution with nonnegative cross correlations estimated through an expectation–maximization algorithm, and a nonparametric part, which is an unknown discrete weight function to be smoothed through multiple binomial kernels. Our central focus is the selection of the bandwidth matrix by the local Bayesian method. We additionally discuss the diagnostic model used to choose appropriately between the parametric, semiparametric and nonparametric approaches. Retaining a purely nonparametric method implies losing the parametric benefits in this modelling framework. Practical applications on multivariate count datasets, including a tail probability estimation, are analyzed under several scenarios of correlations and dispersions. This semiparametric approach demonstrates superior performance and better interpretations compared to the parametric and nonparametric ones. PubDate: 2023-01-23
Abstract: The complexity of survey data and the availability of data from auxiliary sources motivate researchers to explore estimation methods that extend beyond traditional survey-based estimation. The U.S. Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System (BRFSS) collects a wide range of health information, including whether respondents have a personal doctor. While the BRFSS focuses on state-level estimation, there is demand for county-level estimation of health indicators using BRFSS data. A hierarchical Bayes small area estimation model is developed to combine county-level BRFSS survey data with county-level data from auxiliary sources, while accounting for various sources of error and nested geographical levels. To mitigate extreme proportions and unstable survey variances, a transformation is applied to the survey data. Model-based county-level predictions are constructed for prevalence of having a personal doctor for all the counties in the U.S., including those where BRFSS survey data were not available. An evaluation study using only the counties with large BRFSS sample sizes to fit the model versus using all the counties with BRFSS data to fit the model is also presented. PubDate: 2022-12-22
Abstract: This paper uses the empirical likelihood (EL) method for a new random coefficient autoregressive process driven by explanatory variables and past observations through a logistic structure (OD-RCAR(1)), and puts forward the penalized maximum empirical likelihood (PMEL) method for parameter estimation and variable selection. Firstly, limiting distributions of the estimating function and log empirical likelihood ratio statistics based on EL are established. Meanwhile, this paper sets up a confidence region and an EL test for the parameters. Secondly, the maximum empirical likelihood estimators and their asymptotic properties are obtained. At the same time, the penalized empirical likelihood ratio test statistic is given. Thirdly, it is proved in a high-dimensional setting that the PMEL in our model can solve the problem of order selection and parameter estimation. Finally, both real-data applications and numerical simulations are used to assess the performance of the proposed methods. PubDate: 2022-12-13
Abstract: In the educational context, one of the main goals is to reduce the disparities among students, generally at the national level, to allow all individuals to achieve a similar cultural background. Using data from a large-scale standardised test administered by INVALSI (National Institute for the Evaluation of the Educational System), this paper offers a first longitudinal analysis of the performance in the maths test of a cohort of students enrolled in 2013/2014 at grade 8 and observed up to grade 13. The aim is to identify the obstacles that undermine students’ learning, in order to support informed educational actions. Specific features of these data are their hierarchical structure and the presence of scores that are not vertically scaled. Two approaches have been followed for their analysis: growth models and growth percentiles. Consistent with the literature, our results suggest the presence of a gender gap and a significant impact of the type of school and of social-cultural background. Unlike previous research on the INVALSI data, we evaluate the effects of these time-invariant covariates on students’ performance over different school cycles. PubDate: 2022-12-13
Abstract: Results in contact sports like Rugby are mainly interpreted in terms of the ability and/or luck of teams. But this neglects the important role of the motivation of players, reflected in the effort exerted in the game. Here we present a Bayesian hierarchical model to infer the main features that explain score differences in rugby matches of the English Premiership Rugby 2020/2021 season. The main result is that, indeed, effort (seen as a ratio between the number of tries and the scoring kick attempts) is highly relevant to explain outcomes in those matches. PubDate: 2022-12-08
Abstract: Mixtures of factor analyzers (MFA) based on the restricted skew normal distribution (rMSN) have emerged as a flexible tool to handle asymmetrical high-dimensional data with heterogeneity. However, the rMSN distribution is often criticized for lacking sufficient ability to accommodate potential skewness arising from more than one feature space. This paper presents an alternative extension of MFA by assuming the unrestricted skew normal (uMSN) distribution for the component factors. In particular, the proposed mixtures of unrestricted skew normal factor analyzers (MuSNFA) can simultaneously capture multiple directions of skewness and deal with the occurrence of missing values or nonresponses. Under the missing at random (MAR) mechanism, we develop a computationally feasible expectation conditional maximization (ECM) algorithm for computing the maximum likelihood estimates of model parameters. Practical aspects related to model-based clustering, prediction of factor scores and imputation of missing values are also discussed. The utility of the proposed methodology is illustrated with the analysis of simulated and real datasets. PubDate: 2022-12-06
Abstract: Many models for environmental data that are observed in time and space have been proposed in the literature. The main objective of these models is usually to make predictions in time and to perform interpolations in space. Realistic predictions and interpolations are obtained when the process and its variability are well represented through a model that takes into consideration its peculiarities. In this paper, we propose a spatio-temporal model to handle observations that come from distributions with heavy tails and for which the assumption of isotropy is not realistic. As a natural choice for a heavy-tailed model, we take a Student’s-t distribution. The Student’s-t distribution, while being symmetric, provides greater flexibility in modeling data with kurtosis and shape different from the Gaussian distribution. We handle anisotropy through a spatial deformation method. Under this approach, the original geographic space of observations gets mapped into a new space where isotropy holds. Our main result is, therefore, an anisotropic model based on the heavy-tailed t distribution. A Bayesian approach and the use of MCMC enable us to sample from the posterior distribution of the model parameters. In Sect. 2, we discuss the main properties of the proposed model. In Sect. 3, we present a simulation study, showing its superiority over the traditional isotropic Gaussian model. In Sect. 4, we present the application that motivated the t distribution-based anisotropic model: a real dataset of evaporation from the Rio Grande do Sul state of Brazil. PubDate: 2022-12-01
Abstract: Under-coverage and nonresponse problems are jointly present in most socio-economic surveys. The purpose of this paper is to propose an estimation strategy that accounts for both problems by performing a two-step calibration. The first calibration exploits a set of auxiliary variables only available for the units in the sampled population to account for nonresponse. The second calibration exploits a different set of auxiliary variables available for the whole population, to account for under-coverage. The two calibrations are then unified in a double-calibration estimator. Mean and variance of the estimator are derived up to the first order of approximation. Conditions ensuring approximate unbiasedness are derived and discussed. The strategy is empirically checked by a simulation study performed on a set of artificial populations. A case study is derived from the European Union Statistics on Income and Living Conditions survey data. The strategy proposed is flexible and suitable in most situations in which both under-coverage and nonresponse are present. PubDate: 2022-12-01
Abstract: This paper focuses on a particular population segment, that of Millennials, which has attracted much attention over recent years. Beyond the media hype, little is known about the habits of this generation towards spare time use. The present study builds on a previous work devoted to detecting the different ways Italian Millennials interact with spare time, and aims at identifying profiles of Millennials with profile-specific time use habits and styles. In so doing, we (i) account for the multidimensional nature of time use attitude and express it in a reduced number of distinct dimensions and (ii) identify and qualify profiles of Millennials as regards the ascertained time use dimensions. By relying on an extended Item Response Theory model applied to the Italian “Multipurpose survey on households”, our main findings reveal that the way Millennials use spare time and interact with technology is much more complex, varied and multifaceted than what is claimed by the media. PubDate: 2022-12-01
Abstract: Individual records referring to personal interviews conducted for a survey on income in Modena during 2012 and tax year 2011 were matched with the corresponding records in the Italian Ministry of Finance databases containing fiscal income data for tax year 2011. The analysis of the resulting data set suggested that the fiscal income data were generally more reliable than the surveyed income data. Moreover, the obtained data set enabled identification of the factors determining over- and under-reporting, as well as measurement errors, through a comparison of the surveyed income data with the fiscal income data, only for suitable categories of interviewees, that is, taxpayers who are forced to respect the tax laws (the public sector) and taxpayers who have many evasion options (the private sector). The percentage of under-reporters (67.3%) was higher than the percentage of over-reporters (32.7%). Level of income, age, and education were the main regressors affecting measurement errors and the behaviours of tax evaders. Tax evasion and the impacts of personal factors affecting evasion were evaluated using various approaches. The average tax evasion amounted to 26.0% of the fiscal income. About 10% of the sample was made up of possible total tax evaders. PubDate: 2022-12-01
Abstract: In this paper, we develop a joint quantile regression model for correlated mixed discrete and continuous data using a Gaussian copula. Our approach entails specifying marginal quantile regression models for the responses, and combining them via a copula to form a joint model. To model the quantiles of the continuous response, an asymmetric Laplace (AL) distribution is assigned to the error terms in both the continuous and discrete models. To model the discrete response, an underlying latent variable model and the threshold concept are used. Quantile regression for discrete responses can be fitted using the monotone equivariance property of quantiles. By assuming a latent variable framework to describe discrete responses, the proposed copula still uniquely determines the joint distribution. The likelihood function of the joint model also has a tractable form, but it is not differentiable at some points of the parameter space. However, by using the stochastic representation of the AL distribution, the maximum likelihood estimates of the parameters are obtained using an EM algorithm, and bootstrap confidence intervals for inference about the parameters are computed using a Monte Carlo technique. Some simulation studies are performed to illustrate the performance of the model. Finally, we illustrate applications of the proposed approach using burn injuries data. PubDate: 2022-12-01
Abstract: In many practical scenarios, including finance, environmental sciences, system reliability, etc., it is often of interest to study the various notions of negative dependence among the observed variables. A new bivariate copula is proposed for modeling negative dependence between two random variables that complies with most of the popular notions of negative dependence reported in the literature. Specifically, Spearman’s rho and Kendall’s tau for the proposed copula have a simple one-parameter form with negative values in the full range. Some important ordering properties comparing the strength of negative dependence with respect to the parameter involved are considered. Simple examples of the corresponding bivariate distributions with popular marginals are presented. Application of the proposed copula is illustrated using a real data set on air quality in New York City, USA. PubDate: 2022-12-01
Abstract: Joint modeling techniques of longitudinal covariates and binary outcomes have attracted considerable attention in medical research. The basic strategy for estimating the coefficients of joint models is to define a joint likelihood based on two submodels with shared random effects. Numerical integration, however, is required in the estimation step for the joint likelihood, which is computationally expensive due to the complexity of the assumed submodels. To overcome this issue, we propose a joint modeling procedure using the h-likelihood to avoid numerical integration in the estimation algorithm. We conduct Monte Carlo simulations to investigate the effectiveness of our proposed modeling procedures by evaluating both the accuracy of the parameter estimates and computational time. The accuracy of the proposed procedure is compared to the two-stage modeling and numerical integration approaches. We also validate our proposed modeling procedure by applying it to the analysis of real data. PubDate: 2022-12-01
Abstract: The weak form of the efficient market hypothesis is identified with the conditions established by different types of random walks (1–3) on the returns associated with the prices of a financial asset. The methods traditionally applied for testing weak efficiency in a financial market as stated by the random walk model test only some necessary, but not sufficient, condition of this model. Thus, a procedure is proposed to detect if a return series associated with a given price index follows a random walk and, if so, what type it is. The procedure combines methods that test only a necessary, but not sufficient, condition for the fulfilment of the random walk hypothesis and methods that directly test a particular type of random walk. The proposed procedure is evaluated by means of a Monte Carlo experiment, and the results show that this procedure performs better (is more powerful) against linear correlation-only alternatives when starting from the Ljung–Box test. On the other hand, against the random walk type 3 alternative, the procedure is more powerful when it is initiated from the BDS test. PubDate: 2022-12-01
Abstract: In a Markov model the transition probabilities between states do not depend on the time spent in the current state. The present paper explores two ways of selecting the states of a discrete-time Markov model for a system partitioned into categories where the duration of stay in a category affects the probability of transition to another category. For a set of panel data, we compare the likelihood fits of the Markov models with states based on duration intervals and with states defined by duration values. For hierarchical systems, we show that the model with states based on duration values has a better maximum likelihood fit than the baseline Markov model where the states are the categories. We also prove that this is not the case for the duration-interval model, under conditions on the data that seem realistic in practice. Furthermore, we use the Akaike and Bayesian information criteria to compare these alternative Markov models. The theoretical findings are illustrated by an analysis of a real-world personnel data set. PubDate: 2022-12-01
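The comparison of likelihood fits described in the abstract above can be sketched on toy data. This is a minimal sketch under stated assumptions, not the paper's procedure: it assumes the duration-value model uses states of the form (category, periods spent so far), which is one plausible reading of "states defined by duration values", and the helper names `duration_states` and `max_log_likelihood` and the panel sequences are hypothetical.

```python
from collections import Counter
from math import log


def duration_states(seq):
    """Relabel each category as (category, consecutive periods spent in it so far)."""
    out, run = [], 0
    for i, cat in enumerate(seq):
        run = run + 1 if i > 0 and seq[i - 1] == cat else 1
        out.append((cat, run))
    return out


def max_log_likelihood(sequences):
    """Maximized log-likelihood of a first-order Markov chain, conditional on initial states.

    The maximum likelihood estimate of each transition probability is the
    observed transition count divided by the count of departures from that state.
    """
    pair_counts, row_counts = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            row_counts[a] += 1
    return sum(n * log(n / row_counts[a]) for (a, _), n in pair_counts.items())


# Toy panel of category histories (hypothetical data).
panel = [list("AABBB"), list("AAABB"), list("ABBBB")]
baseline_fit = max_log_likelihood(panel)
duration_fit = max_log_likelihood([duration_states(s) for s in panel])
```

Because the baseline model is the special case in which transition probabilities do not vary with duration, `duration_fit` can never be smaller than `baseline_fit` in this sketch; information criteria such as AIC or BIC would then penalize the extra parameters.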
Abstract: The first objective of the paper is to implement a two-stage Bayesian hierarchical nonlinear model for growth and learning curves, particular cases of longitudinal data with an underlying nonlinear time dependence. The aim is to model simultaneously individual trajectories over time, each with specific and potentially different characteristics, and a time-dependent behavior shared among individuals, including the possible effect of covariates. At the first stage, inter-individual differences are taken into account, while at the second stage we search for an average model. The second objective is to partition individuals into homogeneous groups when inter-individual parameters present a high level of heterogeneity. A new multivariate partitioning approach is proposed to cluster individuals according to the posterior distributions of the parameters describing the individual time-dependent behaviour. To assess the proposed methods, we present simulated data and two applications to real data, one related to growth curve modeling in agriculture and one related to learning curves for motor skills. Furthermore, a comparison with finite mixture analysis is shown. PubDate: 2022-12-01
Abstract: Educational researchers have increasingly recognised the importance of school climate as a malleable factor for improving academic performance. From this perspective, we exploit the data collected by the Italian Institute for the Evaluation of the Education System (INVALSI) to assess the effect of some school climate related factors on the academic performance of tenth-grade Italian students. A Multilevel Bayesian Structural Equation Model (MBSEM) is adopted to highlight the effect of some relevant dimensions of school climate (students’ disciplinary behaviour and parents’ involvement) on academic performance and their role in the relationship between student socioeconomic status and achievement. The main findings show that disciplinary behaviour, on the one hand, directly influences the level of competence of the students and, on the other hand, partly mediates the effect of socioeconomic background, whereas parents’ involvement does not appear to exert any significant effect on students’ performance. PubDate: 2022-12-01
Abstract: The link between obesity and hypertension is among the most explored topics in medical research in recent decades. However, it is challenging to establish the relationship comprehensively and accurately because the distributions of BMI and blood pressure are usually fat-tailed and heavily tied. In this paper, we propose a data-driven copula selection approach via penalized likelihood which can deal with tied data by interval censoring estimation. The Minimax Concave Penalty is used to perform unbiased selection of the mixed copula model, owing to its property of converging to the unpenalized solution. Interval censoring with pseudo-likelihood maximization, inspired by survival analysis, is introduced by treating ranks as intervals with upper and lower limits. This paper describes the model and the corresponding iterative algorithm. Simulations comparing the proposed approach with existing methods in different scenarios are presented. Additionally, the proposed method is applied to association modeling on the China Health and Nutrition Survey (CHNS) data. Both numerical studies and real data analysis reveal good performance of the proposed method. PubDate: 2022-12-01
Abstract: The spectral clustering algorithm is a technique based on the properties of the pairwise similarity matrix coming from a suitable kernel function. It is a useful approach for high-dimensional data since the units are clustered in feature space with a reduced number of dimensions. In this paper, we consider a two-step model-based approach within the spectral clustering framework. Based on simulated data, we first discuss criteria for selecting the number of clusters and analyze the robustness of the model-based approach with respect to the choice of the proximity parameters of the kernel functions. Finally, we consider applications of the spectral methods to cluster five real textual datasets and, in this framework, a new kernel function is also proposed. The approach is illustrated on the basis of a large numerical study using both simulated and real datasets. PubDate: 2022-12-01