Abstract: This article proposes to use copulas to characterize the collider bias that concerns the non-substantive change in the causal dependence between variables before and after conditioning on their common effect (collider). This copula-based portrayal allows scholars (1) to capture the sophisticated (e.g., asymmetric or heavy-tail) causal dependence structure that is usually not evidenced by a summative causal effect estimate, such as the regression coefficient based on a well-matched sample; (2) to focus on the causal dependence structure that is insensitive to the influences from the marginal distributions; and (3) to directly and formally test the significance of change in the causal dependence structure using the Cramér–von Mises statistic. Both simulation and real data examples are presented, which suggest that copulas can be a handy tool for practical researchers to describe the collider bias.
PubDate: 2023-11-20
Abstract: Mechanistic models are key to providing reliable information for developing infectious disease control strategies. In general, these models are fitted in Bayesian Markov chain Monte Carlo (MCMC) frameworks that incorporate heterogeneities within a population. However, these frameworks have the major drawback of being computationally expensive. This problem is even more severe when the epidemic history is incomplete, such as unknown infection times. Instead of using the time-consuming Bayesian MCMC methods, this paper explores the use of supervised classification methods to analyze the infectious disease data incorporating infection time uncertainty. The epidemic generating models are classified based on summary statistics of epidemics as inputs. The validity of these methods is investigated by using simulated epidemic data and Tomato Spotted Wilt Virus (TSWV) data, accounting for unknown infectious periods and infection times of individuals. We show that these methods are capable of capturing biological characteristics of disease transmission dynamics when there is infection time uncertainty in infectious disease data.
PubDate: 2023-11-20
Abstract: This paper contributes to the research on the development of comparable composite indicators by introducing a Functional Weighted Malmquist Productive Index that allows for comparative trend analysis. In analogy with entropy-based weighted methods, this novel dynamic indicator is derived by measuring the degree of diversification of the single method through a family of diversity indices. The paper has the merit of proposing a new dynamic composite indicator that supplements the analysis with Functional Data Analysis (FDA) tools that provide us with useful information about the order and dynamics of the composite index trajectories. The simulation study set up in this paper raises doubts about the robustness of the entropy-based weighted methods, while the application of the new index to a well-being dataset highlights its practical appeal.
PubDate: 2023-11-07
Abstract: This paper introduces an area-level Poisson mixed model with SAR(1) spatially correlated random effects. Small area predictors of proportions and counts are derived from the new model and the corresponding mean squared errors are estimated by parametric bootstrap. The behaviour of the introduced predictors is empirically investigated by running model-based simulation experiments. An application to real data from the Spanish living conditions survey of Galicia (Spain) is given. The target is the estimation of domain proportions of women under the poverty line.
PubDate: 2023-10-31
Abstract: The special issue on Advanced Statistical Modeling and Causal Inference with Complex data for Better Decision Making has been inspired by the developments of innovative models and methods to answer challenging substantive questions and support policy choices and decisions. It includes a selection of twelve papers with cutting-edge methodological developments motivated by unique applications with challenging data structures from various fields of science.
PubDate: 2023-10-19
Abstract: Forecasting the volume of emergency events is important for resource utilization in emergency medical services (EMS). This became more evident during the COVID-19 outbreak when emergency event forecasts used by various EMS at that time tended to be inaccurate due to fluctuations in the number, type, and geographical distribution of these events. The motivation for this study was to develop a statistical model capable of predicting the volume of emergency events for Lombardy’s regional EMS called AREU at different time horizons. To accomplish this goal, we propose a negative binomial additive autoregressive model with smoothing splines, which can predict over-dispersed counts of emergency events one, two, five, and seven days ahead. In the model development stage, a large set of covariates was considered, and the final model was selected using a cross-validation procedure that takes into account the observations’ temporal dependence. Comparisons of the forecasting performance using the mean absolute percentage error showed that the proposed model outperformed the model used by AREU, as well as other widely used forecasting models. Consequently, AREU decided to adopt the new model for its forecasting purposes.
PubDate: 2023-10-16
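The forecast comparison mentioned in this abstract relies on the mean absolute percentage error. As a rough illustration only, the standard textbook definition can be sketched as follows; the function name `mape` and the sample numbers are hypothetical and are not taken from the paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent.

    Standard definition: the mean of |(a - f) / a| over all periods, times 100.
    Assumes no observed value is zero (the ratio is undefined otherwise).
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical daily event counts and forecasts, for illustration only.
observed = [100, 200]
predicted = [110, 180]
error_pct = mape(observed, predicted)
```

A lower value indicates forecasts that are closer, in relative terms, to the observed counts.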
Abstract: Sky surveys represent the fundamental data basis for detecting and locating as yet undiscovered celestial objects. Since 2008, the Fermi LAT Collaboration has catalogued thousands of \(\gamma\)-ray sources with the aim of extending our knowledge of the highly energetic physical mechanisms and processes that lie at the core of our Universe. In this article, we present a nonparametric clustering algorithm which identifies high-energy astronomical sources using the spatial information of the \(\gamma\)-ray photons detected by the Large Area Telescope onboard the Fermi spacecraft. In particular, the sources are identified using a von Mises–Fisher kernel estimate of the photon count density on the unit sphere via an adjustment of the mean-shift algorithm which accounts for the directional nature of the collected data and the need for local smoothing. This choice entails a number of desirable benefits. It allows us to bypass the difficulties inherent in the borders of any projection of the photon directions onto a 2-dimensional plane, while guaranteeing high flexibility. The smoothing parameter is chosen adaptively, by combining scientific input with optimal selection guidelines, as known from the literature. Using statistical tools from hypothesis testing and classification, we furthermore present an automatic way to skim off sound candidate sources from the \(\gamma\)-ray emitting diffuse background and to quantify their significance. We calibrate and test our algorithm on simulated count maps provided by the Fermi LAT Collaboration.
PubDate: 2023-10-05
Abstract: Covid-19 vaccination has posed crucial challenges to policymakers and health administrations worldwide. Besides the pressure posed by the pandemic, government administrations have to strive against vaccine hesitancy, which seems to be higher than in previous vaccination rollouts. To increase the vaccinated population, Ohio announced a monetary incentive as a lottery for those who were vaccinated. Eighteen other states followed this first example, with varying results. In this paper, we want to evaluate the effect of such policies within the potential outcome framework using the penalized synthetic control method. In the context of staggered treatment adoption, we estimate the effects at a disaggregated level using a panel dataset. We focused on policy outcomes at the county, state, and supra-state levels, highlighting differences between counties with different social characteristics and time frames for policy introduction. We also studied the treatment effect to see whether the impact of these monetary incentives was permanent or only temporary, accelerating the vaccination of citizens who would have been vaccinated in any case.
PubDate: 2023-10-01
Abstract: The extension of quantile regression to count data raises several issues. We compare the traditional approach, based on transforming the count variable using jittering, with a recently proposed approach in which the coefficients of quantile regression are modelled by parametric functions. We exploit both methods to analyse university students’ data to evaluate the effect of emergency remote teaching due to COVID-19 on the number of credits earned by the students. The coefficients modelling approach performs a smoothing that is especially convenient in the tails of the distribution, preventing abrupt changes in the point estimates and increasing precision. Nonetheless, model selection is challenging because of the wide range of options and the limited availability of diagnostic tools. Thus, the jittering approach remains fundamental to guide the choice of the parametric functions.
PubDate: 2023-10-01
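The jittering idea referred to in this abstract can be illustrated in its simplest, covariate-free form: uniform noise on [0, 1) is added to each count so that the variable becomes continuous, a quantile of the jittered variable is computed, and the result is averaged over several noise draws. The sketch below is a minimal illustration under those assumptions, not the regression procedure used in the paper; the function names, the number of draws, and the seed are all hypothetical.

```python
import math
import random


def jittered_quantile(counts, tau, draws=50, seed=0):
    """Average tau-quantile of Z = Y + U, with U ~ Uniform[0, 1), over several draws.

    Uses a simple order-statistic definition of the sample quantile.
    """
    if not counts or not 0 < tau < 1:
        raise ValueError("need a non-empty sample and 0 < tau < 1")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(counts)
    k = min(n - 1, max(0, math.ceil(tau * n) - 1))  # index of the order statistic
    total = 0.0
    for _ in range(draws):
        jittered = sorted(y + rng.random() for y in counts)
        total += jittered[k]
    return total / draws


def count_quantile(counts, tau, draws=50, seed=0):
    """Map the jittered quantile back to the count scale via ceil(q - 1)."""
    return math.ceil(jittered_quantile(counts, tau, draws, seed) - 1)
```

Averaging over draws reduces the extra variability introduced by the artificial noise; a regression version would replace the sample quantile with a fitted conditional quantile.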
Abstract: The identification of territorial clusters where the population suffers from worse health conditions is an important topic in social epidemiology, in order to identify health inequalities in cities and provide health policy interventions. This objective is particularly challenging because of the mechanism of self-selection of individuals into neighborhoods, which causes selection bias. The aim of this paper is to identify neighborhood clusters where elderly people living in Turin, a city in north-western Italy, are exposed to an increased risk of hospitalized fractures. The study is based on administrative data and is a retrospective, observational cohort study. It comprises a first phase, in which the individual confounding variables are balanced across neighborhoods in order to make them comparable, and a second phase, in which the neighborhoods are aggregated into clusters characterized by significantly higher health risk. In the first phase we exploited a balancing technique based on partially ordered sets (poset), called Matching on poset based Average Rank for Multiple Treatments (MARMoT). On the balanced dataset, we used a spatial scan to identify the presence of clusters and we checked whether the risk of fracture is significantly higher in some contiguous areas. The combination of the MARMoT procedure and the spatial scan makes it possible to highlight two clusters of neighborhoods in Turin where the risk of incurring hospitalized fractures for elderly people is significantly higher than the mean. These results could have important implications for the implementation of health policies.
PubDate: 2023-10-01
Abstract: Motivated by the analysis of rating data concerning perceived health status, a crucial variable in biomedical, economic and life insurance models, the paper deals with diagnostic procedures for identifying anomalous and/or influential observations in ordinal response models with challenging data structures. Deviations due to some respondents’ atypical behavior, outlying covariates and gross errors may affect the reliability of likelihood-based inference, especially when non-robust link functions are adopted. The present paper investigates and exploits the properties of the generalized residuals. They appear in the estimating equations of the regression coefficients and hold the remarkable characteristic of interacting with the covariates in the same fashion as the linear regression residuals. Identification of statistical units incoherent with the model can be achieved by the analysis of the residuals produced by maximum likelihood or robust M-estimation, while the inspection of the weights generated by M-estimation makes it possible to identify influential data. Simple guidelines are proposed to this end, which disclose information on the data structure. The purpose is twofold: recognizing statistical units that deserve specific attention for their peculiar features, and being aware of the sensitivity of the fitted model to small changes in the sample. In the analysis of the self-perceived health status, extreme design points associated with incoherent responses produce highly influential observations. The diagnostic procedures identify the outliers and assess their influence.
PubDate: 2023-10-01
Abstract: In recent years, the increasing number of natural disasters has raised concerns about the sustainability of our planet’s future. As young people comprise the generation that will suffer from the negative effects of climate change, they have become involved in a new climate activism that is also gaining interest in the public debate thanks to the Fridays for Future (FFF) movement. This paper analyses the results of a survey of 1,138 young people in a southern Italian region to explore their perceptions of the extent of environmental problems and their participation in protests of green movements such as the FFF. The statistical analysis fits an ordinal classification tree using an original impurity measure that considers both the ordinal nature of the response variable and the heterogeneity of its ordered categories. The results show that respondents are concerned about the threat of climate change and participate in the FFF to claim their right to a healthier planet and encourage people to adopt environmentally friendly practices in their lifestyles. Young people feel they are global citizens, connected through the Internet and social media, and show greater sensitivity to the planet’s environmental problems, so they are willing to take effective action to demand sustainable policies from decision-makers. When planning public policies that will affect future generations, it is important for policymakers to know the demands and opinions of key stakeholders, especially young people, in order to plan the most appropriate measures, such as climate change mitigation.
PubDate: 2023-10-01
Abstract: Network information has become a common feature of many modern experiments. From vaccine efficacy studies to marketing for product adoption, stakeholders aim to estimate global treatment effects: what happens if everyone in a network is treated versus if no one is treated. Because individual outcomes are potentially influenced by the treatments or behaviors of others in the network, experimental designs must condition on the underlying network. Social networks frequently exhibit homophilous community structure, meaning that individuals within observed or latent communities are more similar to each other. This observation motivates the development of community-aware experimental design. This design recognizes that information between individuals likely flows along within-community edges rather than across-community edges. We demonstrate that this design reduces the bias of a simple difference-in-means estimator, even when the community structure of the graph needs to be estimated. Further, we show that as the community detection problem gets more difficult or if the community structure does not affect the causal question, the proposed design maintains its performance.
PubDate: 2023-10-01
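The simple difference-in-means estimator named in this abstract is the mean outcome among treated units minus the mean outcome among untreated units. The sketch below shows only that generic estimator; it does not implement the community-aware design itself, and the function name and example values are hypothetical.

```python
def difference_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units.

    `outcomes` holds one numeric outcome per unit; `treated` holds one boolean
    per unit (True means the unit received the treatment).
    """
    if len(outcomes) != len(treated):
        raise ValueError("outcomes and treated must have the same length")
    treated_values = [y for y, t in zip(outcomes, treated) if t]
    control_values = [y for y, t in zip(outcomes, treated) if not t]
    if not treated_values or not control_values:
        raise ValueError("need at least one treated and one control unit")
    treated_mean = sum(treated_values) / len(treated_values)
    control_mean = sum(control_values) / len(control_values)
    return treated_mean - control_mean


# Hypothetical outcomes for four units, the first two of which are treated.
estimate = difference_in_means([3, 5, 1, 1], [True, True, False, False])
```

When outcomes spill over along network edges, this estimator can be biased for the global treatment effect, which is the issue the design in the abstract is meant to reduce.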
Abstract: In this paper, we propose a semi-supervised method to cluster unstructured textual data called semi-supervised sentiment clustering on natural language texts. The aim is to identify clusters homogeneous with respect to the overall sentiment of the texts analyzed. The method combines different techniques and methodologies: Sentiment Analysis, Threshold-based Naïve Bayes classifier, and Network-based Semi-supervised Clustering. It involves different steps. In the first step, the unstructured text is transformed into structured text, and it is categorized into positive or negative classes using a sentiment analysis algorithm. In the second step, the Threshold-based Naïve Bayes classifier is applied to identify the overall sentiment of the texts and to define a specific sentiment value for the topics. In the last step, Network-based Semi-supervised Clustering is applied to partition the instances into disjoint groups. The proposed algorithm is tested on a collection of reviews written by customers on Booking.com. The results highlight the capacity of the proposed algorithm to identify clusters that are distinct, non-overlapping, and homogeneous with respect to the overall sentiment. Results are also easily interpretable thanks to the network representation of the instances, which helps to understand the relationship between them.
PubDate: 2023-10-01
Abstract: We study the problem of estimating a regression function when the predictor and/or the response are circular random variables in the presence of measurement errors. We propose estimators whose weight functions are deconvolution kernels defined according to the nature of the involved variables. We derive the asymptotic properties of the proposed estimators and consider possible generalizations and extensions. We provide some simulation results and a real data case study to illustrate and compare the proposed methods.
PubDate: 2023-10-01
Abstract: This paper looks into the relationship between students’ university choices and their secondary school background. The main aim is to assess the role of secondary schools in steering university applications toward local or non-local institutions, also in the light of the tertiary education supply available in students’ areas of residence. With this aim, we classify students’ mobility choices by using a robust definition of local and non-local universities that accounts for the uncertainty in the definition of students’ local areas and their characteristics. In this framework, we apply a multilevel model to jointly consider the high school effect on the probability of students belonging to one specific category of mobility (local, forced non-local, free non-local) conditional upon students’ macro areas of residence, their chosen university and field of study. The findings highlight that high schools have a relevant role in affecting students’ mobility choices, especially when considering local universities. The magnitude of the effect depends on students’ macro area of residence. In particular, this result highlights that schools may pursue specific guidance policies to address students’ choices toward local universities; furthermore, it suggests that their influence on students is stronger in areas hosting the most important universities.
PubDate: 2023-10-01
Abstract: Fuzzy clustering methods allow the objects to belong to several clusters simultaneously, with different degrees of membership. However, a factor that influences the performance of fuzzy algorithms is the value of the fuzzifier parameter. In this paper, we propose a fuzzy clustering procedure for data (time) series that does not depend on the definition of a fuzzifier parameter. It comes from two approaches, theoretically motivated for unsupervised and supervised classification cases, respectively. The first is the Probabilistic Distance clustering procedure. The second is the well-known Boosting philosophy. Our idea is to adopt a boosting perspective for unsupervised learning problems; in particular, we address non-hierarchical clustering problems. The global performance of the proposed method is investigated by various experiments.
PubDate: 2023-10-01
Abstract: When a treatment cannot be enforced, but only encouraged, noncompliance naturally arises. In applied economics, the common empirical strategy for dealing with noncompliance is to rely on Instrumental Variables methods. When the effects are heterogeneous, these methods make it possible, under a set of assumptions, to identify the causal effect for Compliers, i.e., the subset of units whose treatment is affected by the encouragement. One of the identification assumptions is the Exclusion Restriction (ER), which essentially rules out the possibility of a causal effect for Never Takers, i.e., those whose treatment is not affected by the encouragement. In this paper, we show the consequences of violations of this assumption in the impact evaluation of an intervention implemented in Uganda, where targeted households were encouraged to join a community health financing (CHF) scheme through activities of sensitization. We conduct the analyses using Bayesian model-based principal stratification, first assuming and then relaxing the ER for Never Takers. This allows us to show the positive impact of the intervention on the health costs of both Compliers and Never Takers. While the causal effects for the former could be due to the encouragement but also to the actual participation in the scheme, those for the latter are unequivocally attributable to the encouragement. This indicates that sensitization alone is extremely effective in reducing vulnerability against health costs. This finding is of paramount importance for policy-making, as it is much easier and more cost-effective to implement awareness-raising campaigns than CHF schemes.
PubDate: 2023-10-01