Abstract: In genome-wide association studies, hundreds of thousands of genetic features (genes, proteins, etc.) in a given case-control population are tested for an association between each genetic marker and a specific disease. A popular approach in this regard is to estimate the local false discovery rate (LFDR), the posterior probability that the null hypothesis is true given an observed test statistic. However, the existing LFDR estimation methods in the literature are usually complicated. Assuming a chi-square model with one degree of freedom, which covers many situations in genome-wide association studies, we use the method of moments to introduce a simple, fast and efficient approach to LFDR estimation. We use two different simulation strategies to compare the performance of the proposed approach with three popular LFDR estimation methods. We also examine the practical utility of the proposed method by analyzing a comprehensive 1000 Genomes-based genome-wide association data set containing approximately 9.4 million single nucleotide polymorphisms, and a microarray data set consisting of gene expression levels for 6033 genes from prostate cancer patients. The R package implementing the proposed method is available on CRAN https://cran.r-project.org/web/packages/LFDR.MME. PubDate: 2021-04-30
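
A minimal Python sketch of the moment idea, under an assumed two-component model that is not necessarily the authors' exact formulation: null statistics follow a central chi-square(1) and non-null statistics a noncentral chi-square(1) with noncentrality lam. Matching the first two sample moments then gives closed-form estimates of pi0 and lam, and the LFDR follows from Bayes' rule. The function names fit_mme and lfdr are hypothetical and are not the API of the LFDR.MME package.

```python
import numpy as np


def fit_mme(x):
    """Method-of-moments estimates (pi0, lam) for an ASSUMED chi-square(1) mixture.

    Assumed model: X ~ pi0 * chi2_1 + (1 - pi0) * noncentral chi2_1(lam).
    Moment identities used:
        E[X]   - 1 = (1 - pi0) * lam
        E[X^2] - 3 = (1 - pi0) * (6 * lam + lam**2)
    """
    x = np.asarray(x, dtype=float)
    a = x.mean() - 1.0            # estimates (1 - pi0) * lam
    b = np.mean(x ** 2) - 3.0     # estimates (1 - pi0) * (6 lam + lam^2)
    excess = b - 6.0 * a          # estimates (1 - pi0) * lam^2
    if a <= 0.0 or excess <= 0.0:
        return 1.0, 0.0           # no detectable signal: treat everything as null
    lam = excess / a
    pi0 = float(np.clip(1.0 - a * a / excess, 0.0, 1.0))
    return pi0, lam


def lfdr(x, pi0, lam):
    """LFDR(x) = pi0 * f0(x) / f(x) under the assumed mixture."""
    x = np.asarray(x, dtype=float)
    s = np.sqrt(lam * x)
    # log f1(x)/f0(x) = -lam/2 + log cosh(sqrt(lam * x)), computed stably
    log_ratio = -lam / 2.0 + np.logaddexp(s, -s) - np.log(2.0)
    ratio = np.exp(np.minimum(log_ratio, 700.0))  # cap to avoid overflow
    return pi0 / (pi0 + (1.0 - pi0) * ratio)


# Usage on simulated data: 90% nulls, 10% non-nulls with noncentrality 9.
rng = np.random.default_rng(2021)
x = np.concatenate([rng.chisquare(1, 180_000),
                    rng.noncentral_chisquare(1, 9.0, 20_000)])
pi0_hat, lam_hat = fit_mme(x)
grid_lfdr = lfdr(np.linspace(0.0, 30.0, 31), pi0_hat, lam_hat)
```

Because the noncentral-to-central chi-square(1) density ratio reduces to exp(-lam/2) cosh(sqrt(lam x)), no special functions are needed; the cap on the log-ratio only guards against overflow for very large statistics.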

Abstract: Providing support outside the household can be considered a genuine sign of an active social life among the elderly. Adopting an ego-network perspective, we study the support that Italian elders provide to kin and non-kin. More specifically, using Italian survey data, we build the ego-centered networks of the social contacts that elders maintain, and the ego-networks of the support that elders provide to other non-cohabitant kin or non-kin. Since ego-network data are inherently multilevel, we use Bayesian multilevel models to analyze variation in support ties, controlling for the characteristics of elders and their contacts. This modeling strategy makes it possible to deal with sparseness and alter-alter overlap in the ego support network data, and to disentangle the effects related to the ego (the elder), the ego-alter dyad, the kind of support provided, and social contacts and contextual variables. The results suggest that, compared with all elders in the sample, the elderly in Italy who provide support outside their household are younger, healthier, more educated, and embedded in a more diversified ego-network of social contacts. The latter is also related to both the type and the recipient of the support, with elders who maintain few relationships with kin being more prone to provide aid to non-kin. Further, a “peer homophily” effect in directing elder support to non-kin is also found. PubDate: 2021-04-20

Abstract: An analysis of crashes occurring in 252 unidirectional Italian motorway tunnels over a 4-year monitoring period is provided to identify the main causes of crashes in tunnels. In this paper, we propose a full Bayesian bivariate Poisson lognormal hierarchical model with correlated parameters for the joint analysis of crashes at two levels of severity, namely severe (fatal and injury crashes only) and non-severe (property damage only), which provides better insight into the available data than an analysis based on independent univariate models for severe and non-severe crashes. In particular, the proposed model shows that, for both severity levels, crash frequency increases with the average annual daily traffic per lane, the tunnel length and the percentage of trucks, while the presence of a sidewalk reduces severe crashes. The presence of a third lane also reduces severe crashes. Moreover, the frequency of both crash types decreases over the years. The correlation between the parameters may offer additional insight into how certain combinations of factors affect safety in tunnels. The results are critically discussed by highlighting the strengths and weaknesses of the proposed methodology. PubDate: 2021-04-13
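
To illustrate why a joint model can be more informative than two independent ones, the sketch below simulates severe and non-severe counts from a bivariate Poisson-lognormal structure in which correlated normal random effects enter both log-means. The function name, the means, the common standard deviation and the correlation values are arbitrary illustrative assumptions, not estimates from the paper, and the sketch omits covariates and the full Bayesian fitting step.

```python
import numpy as np


def simulate_bpln(n, mu, sigma, rho, rng):
    """Simulate n (severe, non-severe) count pairs from a bivariate Poisson-lognormal model.

    log(lambda_k) = mu_k + eps_k, where (eps_1, eps_2) is bivariate normal with
    common standard deviation sigma and correlation rho (illustrative values only).
    """
    cov = sigma ** 2 * np.array([[1.0, rho], [rho, 1.0]])
    eps = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    lam = np.exp(np.asarray(mu) + eps)   # Poisson means, shape (n, 2)
    return rng.poisson(lam)              # counts, shape (n, 2)


rng = np.random.default_rng(0)
mu = [np.log(2.0), np.log(6.0)]   # hypothetical mean counts: 2 severe, 6 non-severe
corr = {}
for rho in (0.0, 0.8):
    y = simulate_bpln(20_000, mu, sigma=0.5, rho=rho, rng=rng)
    corr[rho] = np.corrcoef(y.T)[0, 1]   # correlation between the two count series
```

With rho = 0 the two count series are independent; a positive rho induces positive correlation between them, which two separate univariate models cannot represent.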

Abstract: Despite the evidence of correlation between environmental impact factors, this correlation has mostly been neglected in econometric environmental models, or treated with traditional methodologies such as ridge regression, which are recommended when the goal is prediction and the estimated parameters are not interpreted as causal effects. This paper addresses the existing collinearity with alternative methodologies, not only to mitigate the problem mechanically but also to isolate the effects of the environmental impact factors, with the main objective of designing better policies for countries. The methodologies are applied to analyze the CO\(_2\) emissions of 114 countries over the thirteen most recent years with available data, and the results are compared from both the empirical and the methodological perspectives. Treating collinearity with the residualization or raise regression procedures gives the researcher a global view of the relationship between the different factors affecting CO\(_2\) emissions, and leads to conclusions that differ from those obtained with traditional methodologies. PubDate: 2021-04-13
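
Residualization can be sketched in a few lines of Python. In this toy example (all variable names and coefficients are hypothetical), a second regressor is replaced by its residuals after regressing it on the first. By the Frisch-Waugh-Lovell logic, the coefficient of the residualized regressor equals its coefficient in the full model, while the coefficient of the first regressor becomes its total (simple-regression) effect. Raise regression, the other procedure mentioned, is not shown.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def residualize(x_target, x_base):
    """Residuals of x_target after regressing it on [1, x_base]."""
    Z = np.column_stack([np.ones_like(x_base), x_base])
    return x_target - Z @ ols(Z, x_target)


rng = np.random.default_rng(1)
n = 500
energy = rng.normal(size=n)                      # hypothetical impact factor 1
gdp = 0.9 * energy + 0.3 * rng.normal(size=n)    # hypothetical factor 2, collinear with 1
co2 = 1.0 + 2.0 * energy + 0.5 * gdp + rng.normal(size=n)

ones = np.ones(n)
beta_full = ols(np.column_stack([ones, energy, gdp]), co2)
beta_simple = ols(np.column_stack([ones, energy]), co2)
gdp_res = residualize(gdp, energy)               # part of gdp not explained by energy
beta_resid = ols(np.column_stack([ones, energy, gdp_res]), co2)
```

The residualized regressors are orthogonal by construction, so each coefficient has a distinct interpretation: the effect of energy including the part transmitted through gdp, and the effect of the part of gdp unrelated to energy.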

Abstract: In the context of causal mediation analysis, parametric expressions for the natural direct and indirect effects are derived for the setting of a binary outcome with a binary mediator, both modelled via logistic regression. The proposed effect decomposition operates on the odds ratio scale and does not require the outcome to be rare. It generalizes existing decompositions by allowing for interactions involving the exposure, the mediator and the confounding covariates. The derived parametric formulae are flexible, in that they readily adapt to the two different natural effect decompositions defined in the mediation literature. In parallel with results derived under the rare outcome assumption, they also outline the relationship between the causal effects and the corresponding pathway-specific logistic regression parameters, isolating the controlled direct effect in the natural direct effect expressions. Formulae for standard errors, obtained via the delta method, are also given. An empirical application to data from a microfinance experiment performed in Bosnia and Herzegovina is illustrated. PubDate: 2021-04-10
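
The natural effects on the odds-ratio scale can be checked numerically from their definitions, without any rare-outcome approximation, by averaging the outcome model over the mediator distribution. The sketch below does this for one covariate value under a hypothetical parameterisation of the two logistic models (the specific interaction terms are an assumption) and under the usual no-unmeasured-confounding assumptions. It evaluates the definitions directly rather than the paper's closed-form expressions, and it shows only one of the two natural effect decompositions.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    return p / (1.0 - p)


def natural_effects_or(beta, theta, c=0.0):
    """Natural direct/indirect effects on the odds-ratio scale for binary A, M, Y.

    Hypothetical parameterisation:
      beta  = (b0, b1, b2, b3):     logit P(M=1|a,c)   = b0 + b1*a + b2*c + b3*a*c
      theta = (t0, t1, t2, t3, t4): logit P(Y=1|a,m,c) = t0 + t1*a + t2*m + t3*a*m + t4*c
    """
    b0, b1, b2, b3 = beta
    t0, t1, t2, t3, t4 = theta

    def p_m(a):
        return expit(b0 + b1 * a + b2 * c + b3 * a * c)

    def p_y(a, m):
        return expit(t0 + t1 * a + t2 * m + t3 * a * m + t4 * c)

    def p_cf(a, a_star):
        # P(Y(a, M(a_star)) = 1 | c): outcome model averaged over M under a_star
        pm = p_m(a_star)
        return p_y(a, 1) * pm + p_y(a, 0) * (1.0 - pm)

    or_nde = odds(p_cf(1, 0)) / odds(p_cf(0, 0))   # direct effect, mediator at A=0
    or_nie = odds(p_cf(1, 1)) / odds(p_cf(1, 0))   # indirect effect, exposure at A=1
    or_te = odds(p_cf(1, 1)) / odds(p_cf(0, 0))    # total effect
    return or_nde, or_nie, or_te


nde, nie, te = natural_effects_or(beta=(-0.5, 1.0, 0.2, 0.3),
                                  theta=(-1.0, 0.5, 0.8, 0.4, 0.1), c=1.0)
```

On the odds-ratio scale the total effect factorises exactly as the product of the two natural effects, whatever the outcome prevalence.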

Abstract: In ecology, predation describes the interaction in which one species (the predator) kills and consumes another (the prey). Specifying the so-called functional response of prey populations to predation is an important matter of debate, typically addressed by means of continuous-time models. Empirical regression or autoregression models applied to discrete predator-prey population data promise feasible steady-state approximations of often complicated dynamic patterns of population growth and interaction. Ewing et al. (Ecol Econ 60:605–612, 2007) argue in favour of the informational content of so-called vector autoregressive models for the dynamic analysis of predator-prey systems. In this work we reconsider their analysis of the dynamic interaction of two freshwater organisms and design a structural model that allows the functional response to be approximated in causal form. Results from an unrestricted structural model are in line with core axiomatic assumptions of predator-prey models. Conditional on population growth lagged up to three periods (i.e., 36 h), the semi-daily population growth of the prey Paramecium aurelia diminishes, on average, by 1.2 percentage points in response to an increase of one percentage point in the population growth of the predator Didinium nasutum. PubDate: 2021-04-04
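
As a reference point for the reduced-form analysis of Ewing et al., a VAR(p) can be fitted equation by equation with least squares. The sketch below (hypothetical function name fit_var, simulated data, arbitrary coefficient values) fits a VAR(1) to two growth-rate series, with column 0 standing in for prey growth and column 1 for predator growth; it is not the structural model proposed in the paper.

```python
import numpy as np


def fit_var(y, p):
    """Least-squares fit of y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p} + u_t.

    y is a (T, k) array; returns (c, [A_1, ..., A_p]) with each A_j of shape (k, k).
    """
    y = np.asarray(y, dtype=float)
    T, k = y.shape
    # Regressor rows [1, y_{t-1}, ..., y_{t-p}] for t = p, ..., T-1
    X = np.column_stack([np.ones(T - p)] + [y[p - j:T - j] for j in range(1, p + 1)])
    B = np.linalg.lstsq(X, y[p:], rcond=None)[0]   # shape (1 + k*p, k)
    c = B[0]
    A = [B[1 + (j - 1) * k: 1 + j * k].T for j in range(1, p + 1)]
    return c, A


# Simulated growth rates: predator growth lowers prey growth, prey growth raises predator growth.
rng = np.random.default_rng(3)
A_true = np.array([[0.5, -0.3],
                   [0.4, 0.2]])
T = 5000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.normal(scale=0.1, size=2)

c_hat, (A_hat,) = fit_var(y, 1)
```

Row i of A_hat holds the lagged effects on series i, so the off-diagonal entries carry the predator-prey interaction in this reduced form.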

Abstract: This work aims at jointly modelling longitudinal and survival HIV data by considering a shared set of parameters of interest. For the longitudinal CD4 stochastic process, we propose a regression model in which individual heterogeneity is allowed in both the mean and the variance, relaxing the usual assumption of a common variance for the longitudinal residuals. In addition, we consider a hazard regression model to analyse the time between HIV/AIDS diagnosis and death. To introduce sufficient flexibility in the structure linking the longitudinal and survival processes, we use time-varying coefficients, estimated with penalized splines, which allow the relationship to vary over time. The standard deviation of the CD4 residuals is included as a covariate in the hazard model, making it possible to study the effect of the stability of CD4 counts on survival. The proposed framework outperforms the most “traditional” joint models, which generally assume a common variance and a time-invariant link. PubDate: 2021-04-02

Abstract: We propose two novel ways of introducing dependence among Poisson counts through the use of latent variables in a three-level hierarchical model. The marginal distributions of the random variables of interest are Poisson, with strict stationarity as a special case. Order-p dependence is described in detail for a temporal sequence of random variables. Full Bayesian inference for the models is described, and their performance is illustrated with a numerical analysis of maternal mortality in Mexico. Extensions to seasonal, periodic, spatial or spatio-temporal dependence, as well as ways of coping with overdispersion, are also discussed. PubDate: 2021-03-22
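
For intuition about how a latent variable can induce dependence while keeping Poisson marginals, the sketch below uses the classical Poisson INAR(1) construction based on binomial thinning. This is a well-known alternative construction, not one of the two proposed in the paper, and it is limited to first-order dependence; the function name and parameter values are arbitrary.

```python
import numpy as np


def simulate_inar1(n, lam, alpha, rng):
    """Poisson INAR(1): X_t = Binomial(X_{t-1}, alpha) + Poisson(lam * (1 - alpha)).

    Starting from X_0 ~ Poisson(lam), every X_t is marginally Poisson(lam)
    and corr(X_t, X_{t-1}) = alpha.
    """
    x = np.empty(n, dtype=np.int64)
    x[0] = rng.poisson(lam)
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)   # latent thinned count, Poisson(alpha*lam)
        x[t] = survivors + rng.poisson(lam * (1.0 - alpha))
    return x


rng = np.random.default_rng(7)
x = simulate_inar1(50_000, lam=4.0, alpha=0.6, rng=rng)
lag1_corr = np.corrcoef(x[1:], x[:-1])[0, 1]
```

Since thinning a Poisson(lam) count gives a Poisson(alpha lam) count, adding an independent Poisson(lam (1 - alpha)) term restores the Poisson(lam) marginal at every step, so mean and variance should both be close to lam.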

Abstract: Using the Programme for International Student Assessment (PISA) 2015 data for Italy, this paper offers a complete overview of the relationship between test anxiety and school performance by studying how anxiety affects the performance of students along the overall conditional distribution of mathematics, literature and science scores. We aim to indirectly measure whether higher goals increase test anxiety, starting from the hypothesis that high-skilled students generally set themselves high goals. We use an M-quantile regression approach that allows us to take into account the hierarchical structure and sampling weights of the PISA data. There is evidence of a negative and statistically significant relationship between test anxiety and school performance. The size of the estimated association is greater at the upper tail of the distribution of each score than at the lower tail. Therefore, our results suggest that high-performing students are more affected than low-performing students by emotional reactions to tests and school-work anxiety. PubDate: 2021-03-15
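
M-quantile regression replaces the check loss of quantile regression with an asymmetrically weighted Huber loss, and it can be fitted by iteratively reweighted least squares. The sketch below is a simplified, single-level version (hypothetical function name, MAD residual scale, tuning constant c = 1.345, optional sampling weights); it does not model the hierarchical structure of the PISA data.

```python
import numpy as np


def mquantile_fit(X, y, q=0.5, c=1.345, weights=None, tol=1e-8, max_iter=200):
    """Huber-type M-quantile regression fitted by iteratively reweighted least squares.

    Targets sum_i w_i * |q - 1{r_i < 0}| * rho_c(r_i / s), with rho_c the Huber loss,
    s a MAD residual scale (re-estimated each iteration) and w_i optional sampling weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]            # OLS starting values
    for _ in range(max_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745    # robust residual scale
        u = r / s
        huber_w = np.where(np.abs(u) <= c, 1.0, c / np.maximum(np.abs(u), c))
        tilt = np.where(u >= 0, q, 1.0 - q)                 # asymmetric M-quantile weight
        w = sw * huber_w * tilt
        XtW = (X * w[:, None]).T
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


rng = np.random.default_rng(5)
n = 5000
xv = rng.uniform(0.0, 10.0, n)
Xd = np.column_stack([np.ones(n), xv])
yv = 1.0 + 2.0 * xv + rng.normal(size=n)
b10, b50, b90 = (mquantile_fit(Xd, yv, q) for q in (0.1, 0.5, 0.9))
```

Comparing coefficients across values of q is what lets an analysis report a different association at the lower and upper tails of the score distribution.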

Abstract: A flexible semiparametric class of models is introduced that offers an alternative to classical regression models for count data, such as the Poisson and negative binomial models, as well as to more general models accounting for excess zeros that are also based on fixed distributional assumptions. The model allows the data themselves to determine the distribution of the response variable but, in its basic form, uses a parametric term to specify the effect of the explanatory variables. In addition, an extended version is considered in which the effects of covariates are specified nonparametrically. The proposed model and traditional models are compared in simulations and in several real-data applications from health and social science. PubDate: 2021-03-01 DOI: 10.1007/s10260-021-00558-6

Abstract: We introduce a new class of robust M-estimators for simultaneous parameter estimation and variable selection in high-dimensional regression models. We first explain the motivation for the key ingredient of our procedures, which is inspired by the regularization methods used for wavelet thresholding in noisy signal processing. The derived penalized estimation procedures are shown theoretically to enjoy the oracle property, both in the classical finite-dimensional case and in the high-dimensional case where the number of variables p is not fixed but can grow with the sample size n, and to achieve optimal asymptotic rates of convergence. A fast accelerated proximal gradient algorithm, of coordinate descent type, is proposed and implemented for computing the estimates; it appears to be surprisingly efficient in solving the corresponding regularization problems, including for ultra-high-dimensional data where \(p \gg n\). Finally, an extensive simulation study and some real-data analyses compare several recent M-estimation procedures with those proposed in the paper and demonstrate the utility and advantages of the latter. PubDate: 2021-03-01 DOI: 10.1007/s10260-020-00511-z
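
The thresholding connection can be illustrated with an accelerated proximal gradient (FISTA) solver for a Huber loss plus an L1 penalty, whose proximal step is exactly the soft-thresholding rule familiar from wavelet denoising. This is a stand-in: the penalties proposed in the paper and its coordinate-descent-type algorithm are not reproduced here, and the function names, tuning constants and iteration count are assumptions.

```python
import numpy as np


def huber_psi(r, c):
    """Derivative of the Huber loss evaluated at residuals r."""
    return np.clip(r, -c, c)


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (wavelet-style soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def robust_lasso_fista(X, y, lam, c=1.345, n_iter=500):
    """Minimise (1/n) * sum Huber_c(y - X b) + lam * ||b||_1 with FISTA."""
    n, p = X.shape
    # Step 1/L with L = ||X||_2^2 / n, valid because the Huber loss has second derivative <= 1
    step = n / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    z = b.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = -X.T @ huber_psi(y - X @ z, c) / n          # gradient of the smooth part at z
        b_new = soft_threshold(z - step * grad, step * lam)  # proximal (thresholding) step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = b_new + ((t - 1.0) / t_new) * (b_new - b)        # Nesterov momentum
        b, t = b_new, t_new
    return b


rng = np.random.default_rng(11)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.standard_t(2, size=n)   # heavy-tailed noise
b_hat = robust_lasso_fista(X, y, lam=0.15)
```

The bounded Huber derivative limits the influence of the heavy-tailed errors, while the thresholding step sets most irrelevant coefficients exactly to zero.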

Abstract: In linear time series analysis, incorporating a moving-average term into autoregressive models yields parsimony while retaining flexibility; in particular, the first-order autoregressive moving-average model, ARMA(1,1), is notable because it retains a good approximating capability with just two parameters. In the same spirit, we assess empirically whether a similar result holds for threshold processes. First, we show that the first-order threshold autoregressive moving-average process, TARMA(1,1), exhibits complex, high-dimensional behaviour with parsimony, by comparing it with threshold autoregressive processes, TAR(p), with possibly large autoregressive order p. Second, we study the descriptive power of the TARMA(1,1) model with respect to the class of autoregressive models, seen as universal approximators: in several situations, the TARMA(1,1) model outperforms AR(p) models even when p is large. Lastly, we analyze two real-world data sets: the sunspot number and the male US unemployment rate time series. In both cases, we show that TARMA models provide a better fit than the best TAR models proposed in the literature. PubDate: 2021-03-01 DOI: 10.1007/s10260-020-00516-8
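
A two-regime TARMA(1,1) is straightforward to simulate, which is a convenient way to explore the behaviour described above. The sketch assumes delay 1, a single threshold r and the parameterisation shown in the docstring; the function name and the parameter values used with it are illustrative only.

```python
import numpy as np


def simulate_tarma11(eps, low, high, r=0.0, x0=0.0):
    """Simulate a two-regime TARMA(1,1) with delay 1, driven by innovations eps.

    Each regime is a tuple (intercept, phi, theta):
        X_t = a_j + phi_j * X_{t-1} + theta_j * eps_{t-1} + eps_t,
    with regime j = low if X_{t-1} <= r, otherwise j = high.
    """
    x = np.empty(len(eps))
    x_prev, e_prev = x0, 0.0
    for t, e in enumerate(eps):
        a, phi, theta = low if x_prev <= r else high   # select regime by lagged value
        x[t] = a + phi * x_prev + theta * e_prev + e
        x_prev, e_prev = x[t], e
    return x


rng = np.random.default_rng(4)
eps = rng.normal(size=1000)
x_thr = simulate_tarma11(eps, low=(0.5, -0.6, 0.4), high=(-0.5, 0.7, -0.3))
```

With identical parameters in both regimes the recursion collapses to an ordinary ARMA(1,1), which makes a convenient sanity check.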

Abstract: The expectation-maximisation algorithm is employed to perform maximum likelihood estimation in a wide range of situations, including regression analysis based on clusterwise regression models. A disadvantage of this algorithm is that it does not provide an assessment of the sampling variability of the maximum likelihood estimator. This is because the algorithm does not require an analytical expression for the Hessian matrix, which prevents a direct evaluation of the asymptotic covariance matrix of the estimator. A solution to this problem is developed for linear regression analysis through a multivariate Gaussian clusterwise regression model. Two estimators of the asymptotic covariance matrix of the maximum likelihood estimator are proposed. In practical applications, their use makes it possible to avoid resorting to bootstrap techniques and general-purpose mathematical optimisers. The performance of these estimators is evaluated by analysing small simulated and real datasets; the results illustrate their usefulness and effectiveness in practical applications. From a theoretical point of view, under suitable conditions, the proposed estimators are shown to be consistent. PubDate: 2021-03-01 DOI: 10.1007/s10260-020-00523-9

Abstract: Several methods have been devised to mitigate the effects of outlier values on survey estimates. If outliers are a concern for the estimation of population quantities, it is even more necessary to pay attention to them in a small area estimation (SAE) context, where sample sizes are usually very small and estimation is often model based. In this paper we have two goals. The first is to review recent developments in outlier-robust SAE. In particular, we focus on the use of partial bias corrections when outlier-robust fitted values under a working model generate biased predictions from sample data containing representative outliers. The second is to propose an outlier-robust bootstrap MSE estimator for M-quantile-based small area predictors, which uses a bounded-block-bootstrap approach. We illustrate these methods through model-based and design-based simulations, and in the context of a particular survey data set that has many of the outlier characteristics observed in business surveys. PubDate: 2021-03-01 DOI: 10.1007/s10260-020-00514-w

Abstract: Multistage ranking models, including the popular Plackett–Luce distribution (PL), rely on the assumption that the ranking process is performed sequentially, by assigning positions from the top to the bottom (forward order). A recent contribution to the ranking literature relaxed this assumption by adding a discrete-valued reference order parameter, yielding the novel Extended Plackett–Luce model (EPL). Inference on the EPL and its generalization into a finite mixture framework was originally addressed from the frequentist perspective. In this work, we propose Bayesian estimation of the EPL in order to address more directly and efficiently the inference on the additional discrete-valued parameter and the assessment of its estimation uncertainty, possibly uncovering idiosyncratic drivers in the formation of preferences. We overcome initial difficulties in employing a standard Gibbs sampling strategy to approximate the posterior distribution of the EPL by combining data augmentation and the conjugacy of the Gamma prior distribution with a tuned joint Metropolis–Hastings algorithm within Gibbs. The effectiveness and usefulness of the proposal are illustrated with applications to simulated and real datasets. PubDate: 2021-03-01 DOI: 10.1007/s10260-020-00519-5
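
The role of the reference order can be made concrete by computing the probability of a complete ranking: at stage t, the rank position given by the reference order is filled by choosing among the items not yet placed, with probability proportional to their support parameters. The sketch below follows that description; the function name and the 0-based conventions are assumptions, and it does not implement the Bayesian estimation procedure.

```python
import itertools


def epl_prob(ranking, support, ref_order):
    """Probability of a complete ranking under an Extended Plackett-Luce model (sketch).

    ranking[j]   = item placed in rank position j (0-based, position 0 is the top)
    support[i]   = positive support parameter of item i
    ref_order[t] = rank position filled at stage t (a permutation of 0..K-1)
    With ref_order = (0, 1, ..., K-1) this reduces to the standard forward PL model.
    """
    # Items listed in the order in which they are assigned under the reference order
    assigned = [ranking[pos] for pos in ref_order]
    prob = 1.0
    for t, chosen in enumerate(assigned):
        # Items still available at stage t are those assigned at stages t, t+1, ...
        prob *= support[chosen] / sum(support[i] for i in assigned[t:])
    return prob


support = [3.0, 2.0, 1.0]
backward = [2, 1, 0]   # fill the bottom position first
total = sum(epl_prob(perm, support, backward)
            for perm in itertools.permutations(range(3)))
```

Summing over all K! rankings for any fixed reference order gives 1, since each ranking corresponds to exactly one sequence of stage-wise choices.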

Abstract: This paper introduces a temporal bivariate area-level linear mixed model with independent time effects for estimating small area socioeconomic indicators. The model is fitted using the residual maximum likelihood method. Empirical best linear unbiased predictors of these indicators are derived. An approximation to the matrix of mean squared errors (MSE) is given, and four MSE estimators are proposed. The first MSE estimator is a plug-in version of the MSE approximation; the remaining three rely on parametric bootstrap procedures. Three simulation experiments are carried out to analyze the behavior of the fitting algorithm, the predictors and the MSE estimators. An application to real data from the 2005 and 2006 Spanish living conditions surveys illustrates the introduced statistical methodology. The target is the estimation of 2006 poverty proportions and gaps by province and sex. PubDate: 2021-03-01 DOI: 10.1007/s10260-020-00521-x
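
The shrinkage idea behind such predictors can be seen in a much simpler setting: a univariate, single-period Fay-Herriot model with known sampling variances psi_d and a known area-effect variance sigma2_u. The sketch below is that simplification (hypothetical function name), not the temporal bivariate model of the paper, and it omits REML fitting and MSE estimation.

```python
import numpy as np


def fay_herriot_blup(y, X, psi, sigma2_u):
    """BLUP of area means under a univariate Fay-Herriot model with known sigma2_u.

    y_d = x_d' beta + u_d + e_d,  u_d ~ N(0, sigma2_u),  e_d ~ N(0, psi_d), psi_d known.
    Returns (predictions, shrinkage factors gamma_d).
    """
    y = np.asarray(y, dtype=float)
    v = sigma2_u + np.asarray(psi, dtype=float)   # marginal variances of the direct estimates
    W = X / v[:, None]
    beta = np.linalg.solve(X.T @ W, W.T @ y)      # GLS estimate of beta
    gamma = sigma2_u / v                          # weight on the direct estimate, in [0, 1]
    synthetic = X @ beta                          # regression-synthetic estimates
    return gamma * y + (1.0 - gamma) * synthetic, gamma


y_direct = np.array([1.0, 2.0, 3.0, 4.0])
X_area = np.ones((4, 1))                          # intercept-only auxiliary model
pred, gamma = fay_herriot_blup(y_direct, X_area, psi=np.ones(4), sigma2_u=1.0)
```

Areas with noisier direct estimates (larger psi_d) receive smaller gamma_d and are pulled more strongly towards the synthetic regression estimate. Plugging in an estimate of sigma2_u, for example by REML, turns the BLUP into an EBLUP.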

Abstract: This paper deals with simultaneous prediction for time series models. In particular, it presents a simple procedure that gives well-calibrated simultaneous prediction intervals with coverage probability close to the target nominal value. Although the exact computation of the proposed intervals is usually not feasible, an approximation can easily be obtained by means of a suitable bootstrap simulation procedure. This new predictive solution is much simpler to compute than those already proposed in the literature, which are based on asymptotic calculations. Applications of the bootstrap calibrated procedure to AR, MA and ARCH models are presented. PubDate: 2021-03-01 DOI: 10.1007/s10260-020-00526-6

Abstract: In this work we propose a new class of long-memory models with time-varying fractional parameter. In particular, the dynamics of the long-memory coefficient, d, is specified through a stochastic recurrence equation driven by the score of the predictive likelihood, as suggested by Creal et al. (J Appl Econom 28:777–795, 2013) and Harvey (Dynamic models for volatility and heavy tails: with applications to financial and economic time series, Cambridge University Press, Cambridge, 2013). We demonstrate the validity of the proposed model by a Monte Carlo experiment and an application to two real time series. PubDate: 2021-03-01 DOI: 10.1007/s10260-020-00517-7
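
Long-memory models of this kind rest on the fractional differencing filter (1 - L)^d, whose coefficients satisfy the recursion pi_k = pi_{k-1} (k - 1 - d) / k with pi_0 = 1. The sketch below computes truncated fractional differences in which d may change over time. The path of d_t is simply taken as given, so the score-driven recursion that generates it in the paper is not implemented, and the function names are hypothetical.

```python
import numpy as np


def frac_diff_weights(d, n):
    """First n coefficients pi_k of (1 - L)^d = sum_k pi_k L^k."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w


def frac_diff_tv(x, d_path):
    """Truncated fractional differencing with a time-varying memory parameter d_t."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for t in range(len(x)):
        w = frac_diff_weights(d_path[t], t + 1)
        out[t] = w @ x[t::-1]   # sum_{k=0}^{t} pi_k(d_t) * x_{t-k}
    return out


x = np.arange(10.0) ** 2
first_diff = frac_diff_tv(x, np.ones(10))         # d = 1 everywhere
weights_04 = frac_diff_weights(0.4, 3)
```

With d = 1 the filter reduces to ordinary first differencing, and with d = 0 it leaves the series unchanged; values of d strictly between 0 and 1 give slowly decaying weights, which is the source of long memory.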

Abstract: In this paper, we propose a model to describe the mutual interactions among the lifecycles of three substitute products acting simultaneously in a common market, thus competing for the same customers or cooperating to supply demand. To date, the literature has only described models for two competitors; the present work therefore represents the first attempt at creating and implementing a model for three actors. The new model is applied to real data in the energy context, and its performance is compared with that of current models for two competitors. For the datasets examined, the new model shows a substantial improvement in forecasting performance, in terms of both forecasting accuracy and the width of the prediction confidence bands. PubDate: 2021-03-01 DOI: 10.1007/s10260-020-00524-8

Abstract: This article is concerned with the Bayesian optimal design problem for multi-factor nonlinear models. In particular, the Bayesian \(\varPsi_q\)-optimality criterion proposed by Dette et al. (Stat Sinica 17:463–480, 2007) is considered. It is shown that product-type designs are optimal for additive multi-factor nonlinear models, with or without a constant term, when the proposed sufficient conditions are satisfied. Examples of application using exponential growth models with several variables are presented to illustrate optimal designs based on the Bayesian \(\varPsi_q\)-optimality criterion. PubDate: 2021-03-01 DOI: 10.1007/s10260-020-00522-w