Abstract: We consider the determination of optimal sample sizes to estimate the concentration of organisms in ballast water via a semiparametric Bayesian approach involving a Dirichlet process mixture based on a Poisson model. This semiparametric model provides greater flexibility to model the organism distribution than that allowed by competing parametric models and is robust against misspecification. To obtain the optimal sample size we use a total cost minimization criterion, based on the sum of a Bayes risk and a sampling cost function. Credible intervals obtained via the proposed model may be used to verify compliance of the water with international standards before deballasting. PubDate: 2022-05-13
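As a rough illustration of the total-cost criterion, the sketch below substitutes a simplified conjugate Poisson-Gamma model for the paper's Dirichlet process mixture; the prior hyperparameters `a`, `b` and the cost constants `c_risk`, `c0`, `c1` are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of total-cost sample size selection under a conjugate
# Poisson-Gamma stand-in for the semiparametric model of the paper.
import numpy as np

def bayes_risk(n, a=2.0, b=0.5):
    # Preposterior expected squared-error loss for the Poisson rate under a
    # Gamma(a, b) prior: E[Var(lambda | data)] = a / (b * (b + n)).
    return a / (b * (b + n))

def total_cost(n, c_risk=100.0, c0=1.0, c1=0.05):
    # Total cost = weighted Bayes risk + linear sampling cost (all illustrative).
    return c_risk * bayes_risk(n) + c0 + c1 * n

ns = np.arange(1, 500)
n_opt = ns[np.argmin([total_cost(n) for n in ns])]
print("optimal sample size:", n_opt)
```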
Abstract: In many longitudinal studies, information is collected on the times of different kinds of events. Some of these studies involve repeated events, where a subject or sample unit may experience a well-defined event several times over its history; such events are called recurrent events. In this paper, we introduce nonparametric methods for estimating the marginal and joint distribution functions for recurrent event data. New estimators are introduced, and their extensions to several gap times are also given. Nonparametric inference conditional on current or past covariate measures is also considered. We study the finite-sample behavior of the proposed estimators by simulation, considering two or three gap times. Our proposed methods are applied to the study of (multiple) recurrence times in patients with bladder tumors. Software implementing all the methods has been developed in the form of an R package called survivalREC. PubDate: 2022-05-11
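The joint gap-time estimators and the survivalREC package itself are not reproduced here; as a hedged sketch of the marginal building block only, the following computes a Kaplan-Meier estimate of the first gap-time distribution on synthetic right-censored data.

```python
# Minimal Kaplan-Meier sketch for the first gap time, on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
t = rng.exponential(10.0, 200)            # latent first gap times
c = rng.exponential(15.0, 200)            # censoring times
time = np.minimum(t, c)
event = (t <= c).astype(int)              # 1 = recurrence observed, 0 = censored

order = np.argsort(time)
time, event = time[order], event[order]
at_risk = np.arange(len(time), 0, -1)     # risk-set size just before each time
surv = np.cumprod(1.0 - event / at_risk)  # Kaplan-Meier product over sorted times
idx = np.searchsorted(time, 5.0) - 1      # last observed time <= 5
print("estimated S(5):", round(surv[idx], 3))
```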
Abstract: This paper proposes a linear approximation of the nonlinear Threshold AutoRegressive (TAR) model. It is shown that there is a relation between the autoregressive order of the threshold model and the order of its autoregressive moving average approximation. The main advantage of this approximation lies in extending some theoretical results developed in the linear setting to the nonlinear domain. Among them, a new order estimation procedure for threshold models is proposed, whose performance is compared, through a Monte Carlo study, with other criteria widely employed in the nonlinear threshold context. PubDate: 2022-05-10
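A minimal sketch of the idea behind the order relation: simulate a two-regime SETAR(1) process (coefficients and threshold are illustrative assumptions) and select the order of a linear autoregressive approximation by AIC; a pure AR approximation is used here in place of the full ARMA approximation for brevity.

```python
# Simulate a SETAR(1) series and pick the AR order of its linear
# approximation by AIC, fitting each AR(p) by ordinary least squares.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
y = np.zeros(n)
for t in range(1, n):
    phi = 0.6 if y[t - 1] <= 0.0 else -0.4   # regime-dependent AR(1) coefficient
    y[t] = phi * y[t - 1] + rng.normal()

def ar_aic(y, p):
    # Fit AR(p) by OLS and return its AIC.
    X = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(len(X)), X])
    yy = y[p:]
    beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
    rss = np.sum((yy - X @ beta) ** 2)
    return len(yy) * np.log(rss / len(yy)) + 2 * (p + 1)

best_p = min(range(1, 11), key=lambda p: ar_aic(y, p))
print("AIC-selected order of the linear AR approximation:", best_p)
```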
Abstract: Educational researchers have increasingly recognised the importance of school climate as a malleable factor for improving academic performance. From this perspective, we exploit the data collected by the Italian Institute for the Evaluation of the Education System (INVALSI) to assess the effect of some school-climate-related factors on the academic performance of tenth-grade Italian students. A Multilevel Bayesian Structural Equation Model (MBSEM) is adopted to highlight the effect of some relevant dimensions of school climate (students' disciplinary behaviour and parents' involvement) on academic performance and their role in the relationship between student socioeconomic status and achievement. The main findings show that disciplinary behaviour, on the one hand, directly influences the level of competence of the students and, on the other hand, partly mediates the effect of socioeconomic background, whereas parents' involvement does not appear to exert any significant effect on students' performance. PubDate: 2022-05-03
Abstract: In many practical scenarios, including finance, environmental sciences, and system reliability, it is often of interest to study various notions of negative dependence among the observed variables. A new bivariate copula is proposed for modeling negative dependence between two random variables that complies with most of the popular notions of negative dependence reported in the literature. Specifically, Spearman's rho and Kendall's tau for the proposed copula have a simple one-parameter form, with negative values over the full range. Some important ordering properties comparing the strength of negative dependence with respect to the parameter involved are considered. Simple examples of the corresponding bivariate distributions with popular marginals are presented. Application of the proposed copula is illustrated using a real data set on air quality in New York City, USA. PubDate: 2022-04-28
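The abstract does not give the proposed copula's closed form, so the sketch below substitutes a Gaussian copula with negative correlation to illustrate checking negative dependence via empirical Spearman's rho and Kendall's tau on a sample with exponential marginals.

```python
# Sample a negatively dependent Gaussian copula (stand-in for the proposed
# copula), attach exponential marginals, and estimate rank correlations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
rho = -0.7
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=5000)
u = stats.norm.cdf(z)                                    # copula sample on [0,1]^2

x, y = stats.expon.ppf(u[:, 0]), stats.expon.ppf(u[:, 1])
print(f"Spearman rho: {stats.spearmanr(x, y)[0]:.3f}")   # ~ -0.68 in theory
print(f"Kendall tau : {stats.kendalltau(x, y)[0]:.3f}")  # ~ -0.49 in theory
```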
Abstract: Joint modeling techniques for longitudinal covariates and binary outcomes have attracted considerable attention in medical research. The basic strategy for estimating the coefficients of joint models is to define a joint likelihood based on two submodels with shared random effects. Numerical integration, however, is required in the estimation step for the joint likelihood, which is computationally expensive due to the complexity of the assumed submodels. To overcome this issue, we propose a joint modeling procedure based on the h-likelihood, which avoids numerical integration in the estimation algorithm. We conduct Monte Carlo simulations to investigate the effectiveness of the proposed procedure by evaluating both the accuracy of the parameter estimates and the computational time. Its accuracy is compared with that of the two-stage modeling and numerical integration approaches. We also validate the proposed procedure by applying it to the analysis of real data. PubDate: 2022-04-25
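As a sketch of the two-stage comparator mentioned above (not of the h-likelihood procedure itself), the following fits a linear mixed model to a synthetic longitudinal covariate and plugs the predicted random effects into a logistic model for the binary outcome; all data and parameter values are made up.

```python
# Two-stage joint modeling sketch: mixed model, then logistic regression
# on the predicted random effects (synthetic data throughout).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_subj, n_obs = 200, 5
b = rng.normal(0, 1, n_subj)                       # shared random intercepts
df = pd.DataFrame({
    "id": np.repeat(np.arange(n_subj), n_obs),
    "time": np.tile(np.arange(n_obs), n_subj),
})
df["x"] = 1.0 + 0.5 * df["time"] + b[df["id"]] + rng.normal(0, 0.5, len(df))

# Stage 1: linear mixed model with a random intercept per subject.
m1 = smf.mixedlm("x ~ time", df, groups=df["id"]).fit()
b_hat = np.array([m1.random_effects[i].iloc[0] for i in range(n_subj)])

# Binary outcome driven by the true random effect (association parameter 1.5).
y = (rng.random(n_subj) < 1 / (1 + np.exp(-(-0.5 + 1.5 * b)))).astype(int)

# Stage 2: logistic regression on the predicted random effects.
m2 = sm.Logit(y, sm.add_constant(b_hat)).fit(disp=0)
print(m2.params)   # shrinkage in b_hat typically biases the slope
```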
Abstract: The spectral clustering algorithm is a technique based on the properties of the pairwise similarity matrix arising from a suitable kernel function. It is a useful approach for high-dimensional data, since the units are clustered in a feature space with a reduced number of dimensions. In this paper, we consider a two-step model-based approach within the spectral clustering framework. First, based on simulated data, we discuss criteria for selecting the number of clusters and analyze the robustness of the model-based approach with respect to the choice of the proximity parameters of the kernel functions. Finally, we apply the spectral methods to cluster five real textual datasets and, in this framework, also propose a new kernel function. The approach is illustrated through a large numerical study based on both simulated and real datasets. PubDate: 2022-04-20
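A minimal sketch of the spectral embedding step, assuming an RBF kernel with an illustrative proximity parameter `gamma`: build the pairwise similarity matrix, extract the leading eigenvectors of the normalized affinity matrix, and cluster the rows with k-means.

```python
# Spectral clustering sketch on two synthetic Gaussian blobs.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])

gamma = 1.0                                   # proximity parameter of the kernel
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-gamma * d2)                       # pairwise similarity matrix
d_inv_sqrt = 1.0 / np.sqrt(W.sum(1))
L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # normalized affinity

k = 2
vals, vecs = np.linalg.eigh(L)
U = vecs[:, -k:]                              # top-k eigenvectors as features
U /= np.linalg.norm(U, axis=1, keepdims=True) # row-normalize the embedding
_, labels = kmeans2(U, k, seed=0)
print(np.bincount(labels))                    # roughly 50 / 50
```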
Abstract: In a Markov model the transition probabilities between states do not depend on the time spent in the current state. The present paper explores two ways of selecting the states of a discrete-time Markov model for a system partitioned into categories where the duration of stay in a category affects the probability of transition to another category. For a set of panel data, we compare the likelihood fits of the Markov models with states based on duration intervals and with states defined by duration values. For hierarchical systems, we show that the model with states based on duration values has a better maximum likelihood fit than the baseline Markov model where the states are the categories. We also prove that this is not the case for the duration-interval model, under conditions on the data that seem realistic in practice. Furthermore, we use the Akaike and Bayesian information criteria to compare these alternative Markov models. The theoretical findings are illustrated by an analysis of a real-world personnel data set. PubDate: 2022-04-15
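The sketch below illustrates, on synthetic panel sequences, why the duration-value model can fit better: expanding each state with a (capped) duration value raises the maximized log-likelihood of the first-order chain relative to the baseline category-only model. The duration cap and transition probabilities are illustrative.

```python
# Compare maximized log-likelihoods of a category-only Markov model and a
# duration-augmented one on synthetic duration-dependent sequences.
import numpy as np
from collections import Counter

rng = np.random.default_rng(5)

def simulate(T=50):
    # Two categories {0, 1}; leaving 0 gets likelier the longer the stay.
    seq, dur = [0], 1
    for _ in range(T - 1):
        if seq[-1] == 0:
            p_leave = min(0.1 * dur, 0.9)
            seq.append(1 if rng.random() < p_leave else 0)
            dur = 1 if seq[-1] == 1 else dur + 1
        else:
            seq.append(0 if rng.random() < 0.3 else 1)
            dur = 1
    return seq

def loglik(seqs, state_fn):
    # Plug-in ML log-likelihood of a first-order chain on transformed states.
    counts = Counter()
    for s in seqs:
        states = state_fn(s)
        for a, b in zip(states, states[1:]):
            counts[(a, b)] += 1
    row = Counter()
    for (a, _), n in counts.items():
        row[a] += n
    return sum(n * np.log(n / row[a]) for (a, _), n in counts.items())

def with_duration(s):
    # State = (category, duration of stay), duration values capped at 5.
    out, dur = [], 0
    for i, c in enumerate(s):
        dur = dur + 1 if i > 0 and s[i - 1] == c else 1
        out.append((c, min(dur, 5)))
    return out

seqs = [simulate() for _ in range(200)]
print("baseline:", round(loglik(seqs, lambda s: list(s)), 1))
print("duration:", round(loglik(seqs, with_duration), 1))   # higher fit
```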
Abstract: Individual records from personal interviews conducted for a survey on income in Modena during 2012, referring to tax year 2011, were matched with the corresponding records in the Italian Ministry of Finance databases containing fiscal income data for the same tax year. The analysis of the resulting data set suggested that the fiscal income data were generally more reliable than the surveyed income data. Moreover, the obtained data set enabled identification of the factors determining over- and under-reporting, as well as measurement errors, through a comparison of the surveyed income data with the fiscal income data, restricted to suitable categories of interviewees: taxpayers who are forced to comply with the tax laws (the public sector) and taxpayers who have many evasion options (the private sector). The percentage of under-reporters (67.3%) was higher than the percentage of over-reporters (32.7%). Level of income, age, and education were the main regressors affecting measurement errors and the behaviour of tax evaders. Tax evasion and the impacts of personal factors affecting evasion were evaluated using various approaches. The average tax evasion amounted to 26.0% of the fiscal income. About 10% of the sample consisted of possible total tax evaders. PubDate: 2022-04-02
Abstract: This paper focuses on a particular population segment, that of Millennials, which has attracted much attention in recent years. Beyond the media hype, little is known about the spare-time habits of this generation. The present study builds on a previous work devoted to detecting the different ways Italian Millennials interact with spare time, and aims at identifying profiles of Millennials characterized by profile-specific time-use habits and styles. In so doing, we (i) account for the multidimensional nature of time-use attitudes and express them in a reduced number of distinct dimensions, and (ii) identify and qualify profiles of Millennials with regard to the ascertained time-use dimensions. By relying on an extended Item Response Theory model applied to the Italian "Multipurpose survey on households", our main findings reveal that the way Millennials use spare time and interact with technology is much more complex, varied and multifaceted than what is claimed by the media. PubDate: 2022-03-31
Abstract: The weak form of the efficient market hypothesis is identified with the conditions established by different types of random walks (types 1–3) on the returns associated with the prices of a financial asset. The methods traditionally applied to test weak efficiency in a financial market, as stated by the random walk model, examine only some necessary, but not sufficient, conditions of this model. Thus, a procedure is proposed to detect whether a return series associated with a given price index follows a random walk and, if so, of what type. The procedure combines methods that test only a necessary, but not sufficient, condition for the random walk hypothesis with methods that directly test a particular type of random walk. The proposed procedure is evaluated by means of a Monte Carlo experiment; the results show that it is more powerful against linear correlation-only alternatives when started from the Ljung–Box test, whereas against random walk type 3 alternatives it is more powerful when initiated from the BDS test. PubDate: 2022-03-31
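A minimal sketch of the procedure's two entry points, applied to returns from a simulated type-1 random walk: the Ljung-Box test for linear serial correlation and the BDS test for more general dependence, both available in statsmodels.

```python
# Ljung-Box and BDS tests on returns from a simulated random walk.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.stattools import bds

rng = np.random.default_rng(6)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000)))  # iid log-returns
returns = np.diff(np.log(prices))

lb = acorr_ljungbox(returns, lags=[10])
print("Ljung-Box p-value:", float(lb["lb_pvalue"].iloc[0]))  # large: no linear correlation

stat, pval = bds(returns, max_dim=3)
print("BDS p-values (dims 2-3):", pval)                      # large: no general dependence
```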
Abstract: The link between obesity and hypertension is among the most popular topics explored in medical research in recent decades. However, it is challenging to establish the relationship comprehensively and accurately, because the distributions of BMI and blood pressure are usually fat-tailed and heavily tied. In this paper, we propose a data-driven copula selection approach via penalized likelihood that can deal with tied data through interval-censoring estimation. The Minimax Concave Penalty is employed to perform unbiased selection of the mixed copula model, owing to its convergence to the unpenalized solution. Interval censoring and pseudo-likelihood maximization, inspired by survival analysis, are introduced by treating ranks as intervals with upper and lower limits. This paper describes the model and the corresponding iterative algorithm. Simulations comparing the proposed approach with existing methods in different scenarios are presented. Additionally, the proposed method is applied to association modeling on the China Health and Nutrition Survey (CHNS) data. Both the numerical studies and the real data analysis reveal good performance of the proposed method. PubDate: 2022-03-21
Abstract: In this paper, we develop a joint quantile regression model for correlated mixed discrete and continuous data using a Gaussian copula. Our approach entails specifying marginal quantile regression models for the responses and combining them via a copula to form a joint model. To model the quantiles, an asymmetric Laplace (AL) distribution is assigned to the error terms in both the continuous and discrete models. For the discrete response, an underlying latent variable model and the threshold concept are used; quantile regression for discrete responses can then be fitted using the monotone equivariance property of quantiles. By assuming a latent variable framework to describe the discrete responses, the proposed copula still uniquely determines the joint distribution. The likelihood function of the joint model also has a tractable form, but it is not differentiable at some points of the parameter space. However, by using the stochastic representation of the AL distribution, the maximum likelihood estimates of the parameters are obtained via an EM algorithm, and bootstrap confidence intervals for inference about the parameters are constructed using a Monte Carlo technique. Simulation studies are performed to illustrate the performance of the model. Finally, we illustrate applications of the proposed approach using burn injuries data. PubDate: 2022-03-12
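A sketch of the continuous margin only: the check-loss fit of quantile regression coincides with maximum likelihood under AL errors, so the marginal submodel can be illustrated with standard quantile regression; the copula step and the latent-variable discrete margin are omitted, and the data are synthetic.

```python
# Marginal quantile regression at several quantile levels (synthetic data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1 + 0.2 * x)   # heteroscedastic errors

X = sm.add_constant(x)
for tau in (0.25, 0.5, 0.75):
    fit = sm.QuantReg(y, X).fit(q=tau)           # equivalent to AL maximum likelihood
    print(f"tau={tau}: intercept={fit.params[0]:.2f}, slope={fit.params[1]:.2f}")
```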
Abstract: Under-coverage and nonresponse problems are jointly present in most socio-economic surveys. The purpose of this paper is to propose an estimation strategy that accounts for both problems by performing a two-step calibration. The first calibration exploits a set of auxiliary variables only available for the units in the sampled population to account for nonresponse. The second calibration exploits a different set of auxiliary variables available for the whole population, to account for under-coverage. The two calibrations are then unified in a double-calibration estimator. Mean and variance of the estimator are derived up to the first order of approximation. Conditions ensuring approximate unbiasedness are derived and discussed. The strategy is empirically checked by a simulation study performed on a set of artificial populations. A case study is derived from the European Union Statistics on Income and Living Conditions survey data. The strategy proposed is flexible and suitable in most situations in which both under-coverage and nonresponse are present. PubDate: 2022-03-10
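A minimal sketch of a single calibration step under the linear (GREG-type) distance, on synthetic data: design weights d are adjusted to w = d(1 + x'λ) so that known auxiliary totals are reproduced exactly; the paper chains two such steps, one for nonresponse and one for under-coverage.

```python
# One linear calibration step: adjust design weights to match known totals.
import numpy as np

rng = np.random.default_rng(8)
n = 300
d = np.full(n, 10.0)                                        # design weights
x = np.column_stack([np.ones(n), rng.uniform(20, 60, n)])   # aux: count, age
totals = np.array([3200.0, 130000.0])                       # known population totals

# Solve (sum d_i x_i x_i') lambda = totals - sum d_i x_i, then w = d(1 + x'lambda).
T = (d[:, None] * x).T @ x
lam = np.linalg.solve(T, totals - x.T @ d)
w = d * (1.0 + x @ lam)

print("calibrated totals:", x.T @ w)                        # matches `totals` exactly
```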
Abstract: The first objective of the paper is to implement a two-stage Bayesian hierarchical nonlinear model for growth and learning curves, particular cases of longitudinal data with an underlying nonlinear time dependence. The aim is to model simultaneously individual trajectories over time, each with specific and potentially different characteristics, and a time-dependent behavior shared among individuals, including possible effects of covariates. At the first stage, inter-individual differences are taken into account, while at the second stage we search for an average model. The second objective is to partition individuals into homogeneous groups when the inter-individual parameters present a high level of heterogeneity. A new multivariate partitioning approach is proposed to cluster individuals according to the posterior distributions of the parameters describing the individual time-dependent behaviour. To assess the proposed methods, we present simulated data and two applications to real data, one related to growth curve modeling in agriculture and one related to learning curves for motor skills. Furthermore, a comparison with finite mixture analysis is shown. PubDate: 2022-03-07 DOI: 10.1007/s10260-022-00625-6
Abstract: Many models have been proposed in the literature for environmental data observed in time and space. The main objective of these models is usually to make predictions in time and to perform interpolations in space. Realistic predictions and interpolations are obtained when the process and its variability are well represented through a model that takes its peculiarities into consideration. In this paper, we propose a spatio-temporal model to handle observations that come from distributions with heavy tails and for which the assumption of isotropy is not realistic. As a natural choice for a heavy-tailed model, we take the Student's t distribution, which, while symmetric, provides greater flexibility than the Gaussian distribution in modeling data with different kurtosis and shape. We handle anisotropy through a spatial deformation method, under which the original geographic space of observations is mapped into a new space where isotropy holds. Our main result is, therefore, an anisotropic model based on the heavy-tailed t distribution. A Bayesian approach and the use of MCMC enable us to sample from the posterior distribution of the model parameters. In Sect. 2, we discuss the main properties of the proposed model. In Sect. 3, we present a simulation study showing its superiority over the traditional isotropic Gaussian model. In Sect. 4, we present the motivation that led us to propose the t-distribution-based anisotropic model: a real dataset of evaporation from the Rio Grande do Sul state of Brazil. PubDate: 2022-03-01 DOI: 10.1007/s10260-022-00623-8
Abstract: In ecology, the concept of predation describes the interaction in which one species (the predator) kills and consumes another (the prey). Specifying the so-called functional response of prey populations to predation is an important matter of debate, which is typically addressed by means of continuous-time models. Empirical regression or autoregression models applied to discrete predator-prey population data promise feasible steady-state approximations of often complicated dynamic patterns of population growth and interaction. Ewing et al. (Ecol Econ 60:605–612, 2007) argue in favour of the informational content of so-called vector autoregressive models for the dynamic analysis of predator-prey systems. In this work we reconsider their analysis of the dynamic interaction of two freshwater organisms, and design a structural model that makes it possible to approximate the functional response in causal form. Results from an unrestricted structural model are in line with core axiomatic assumptions of predator-prey models. Conditional on population growth lagged up to three periods (i.e., 36 h), the semi-daily population growth of the prey Paramecium aurelia diminishes, on average, by 1.2 percentage points in response to a one-percentage-point increase in the population growth of the predator Didinium nasutum. PubDate: 2022-03-01 DOI: 10.1007/s10260-021-00564-8
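A hedged sketch of the modeling device, not of the original data analysis: fit a vector autoregression with three lags (36 h at semi-daily sampling, as in the abstract) to synthetic predator-prey growth-rate series with statsmodels, and read off the prey equation's response to lagged predator growth.

```python
# VAR(3) fit to synthetic predator-prey growth rates.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(9)
n = 200
prey, pred = np.zeros(n), np.zeros(n)
for t in range(1, n):
    prey[t] = 0.4 * prey[t - 1] - 0.3 * pred[t - 1] + rng.normal(0, 0.5)
    pred[t] = 0.5 * prey[t - 1] + 0.3 * pred[t - 1] + rng.normal(0, 0.5)

data = pd.DataFrame({"prey_growth": prey, "pred_growth": pred})
res = VAR(data).fit(3)
# A negative coefficient on lagged pred_growth in the prey equation mirrors
# the predation effect reported in the abstract.
print(res.params["prey_growth"])
```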
Abstract: With reference to causal mediation analysis, a parametric expression for natural direct and indirect effects is derived for the setting of a binary outcome with a binary mediator, both modelled via logistic regression. The proposed effect decomposition operates on the odds ratio scale and does not require the outcome to be rare. It generalizes existing decompositions by allowing the confounding covariates to interact with both the exposure and the mediator. The derived parametric formulae are flexible, in that they readily adapt to the two different natural effect decompositions defined in the mediation literature. In parallel with results derived under the rare-outcome assumption, they also outline the relationship between the causal effects and the corresponding pathway-specific logistic regression parameters, isolating the controlled direct effect in the natural direct effect expressions. Formulae for standard errors, obtained via the delta method, are also given. An empirical application to data from a microfinance experiment performed in Bosnia and Herzegovina is presented. PubDate: 2022-03-01 DOI: 10.1007/s10260-021-00562-w
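The exact parametric formulae are in the paper; as a numerical stand-in, the sketch below fits logistic models to a synthetic binary mediator and outcome and computes odds-ratio-scale natural effects by averaging over the fitted mediator distribution at a fixed covariate value, without invoking the rare-outcome approximation.

```python
# Odds-ratio-scale natural direct and indirect effects via standardization.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 5000
expit = lambda t: 1.0 / (1.0 + np.exp(-t))
c = rng.normal(size=n)                          # baseline confounder
a = rng.integers(0, 2, n)                       # binary exposure
m = (rng.random(n) < expit(-0.3 + 0.8 * a + 0.4 * c)).astype(int)            # mediator
y = (rng.random(n) < expit(-0.2 + 0.5 * a + 0.7 * m + 0.3 * c)).astype(int)  # outcome

fm = sm.Logit(m, sm.add_constant(np.column_stack([a, c]))).fit(disp=0)
fy = sm.Logit(y, sm.add_constant(np.column_stack([a, m, c]))).fit(disp=0)

def p_y(a_out, a_med, c0=0.0):
    # P(Y(a_out, M(a_med)) = 1) at covariate value c0, averaging the outcome
    # model over the fitted mediator distribution.
    pm1 = expit(fm.params @ np.array([1.0, a_med, c0]))
    py = lambda mm: expit(fy.params @ np.array([1.0, a_out, mm, c0]))
    return py(1) * pm1 + py(0) * (1.0 - pm1)

odds = lambda p: p / (1.0 - p)
print("natural direct OR  :", round(odds(p_y(1, 0)) / odds(p_y(0, 0)), 3))
print("natural indirect OR:", round(odds(p_y(1, 1)) / odds(p_y(1, 0)), 3))
```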
Abstract: An analysis of crashes occurring in 252 unidirectional Italian motorway tunnels over a 4-year monitoring period is provided to identify the main causes of crashes in tunnels. In this paper, we propose a full Bayesian bivariate Poisson lognormal hierarchical model with correlated parameters for the joint analysis of crashes of two levels of severity, namely severe (fatality and injury accidents only) and non-severe (property damage only), providing better insight into the available data than an analysis based on independent univariate models for the two severity levels. In particular, the proposed model shows that for both severity levels the crash frequency increases with the average annual daily traffic per lane, the tunnel length, and the percentage of trucks, while the presence of a sidewalk and the presence of a third lane each reduce severe accidents. Moreover, the frequency of both crash types decreases over the years. The correlation between the parameters may offer additional insight into how some combinations of factors affect safety in tunnels. The results are critically discussed, highlighting the strengths and weaknesses of the proposed methodology. PubDate: 2022-03-01 DOI: 10.1007/s10260-021-00567-5
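A minimal sketch of the model's generative structure only, with all parameter values illustrative: severe and non-severe counts are Poisson draws whose log-means share correlated lognormal random effects; the full Bayesian hierarchical estimation is not reproduced here.

```python
# Simulate from a bivariate Poisson lognormal structure for two severity levels.
import numpy as np

rng = np.random.default_rng(11)
n_tunnels = 252
aadt = rng.uniform(5, 40, n_tunnels)        # avg annual daily traffic per lane (x1000)
length = rng.uniform(0.2, 3.0, n_tunnels)   # tunnel length (km)

# Correlated lognormal random effects across the two severity levels.
cov = np.array([[0.30, 0.15], [0.15, 0.30]])
eps = rng.multivariate_normal([0.0, 0.0], cov, n_tunnels)

mu_sev = np.exp(-2.0 + 0.03 * aadt + 0.4 * np.log(length) + eps[:, 0])
mu_non = np.exp(-0.5 + 0.04 * aadt + 0.5 * np.log(length) + eps[:, 1])
severe = rng.poisson(mu_sev)
non_severe = rng.poisson(mu_non)
print("mean counts:", severe.mean().round(2), non_severe.mean().round(2))
print("count correlation:", np.corrcoef(severe, non_severe)[0, 1].round(2))
```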