Authors: Ranjit Lall; Thomas Robinson Abstract: This paper introduces software packages for efficiently imputing missing data using deep learning methods in Python (MIDASpy) and R (rMIDAS). The packages implement a recently developed approach to multiple imputation known as MIDAS, which involves introducing additional missing values into the dataset, attempting to reconstruct these values with a type of unsupervised neural network known as a denoising autoencoder, and using the resulting model to draw imputations of the originally missing data. These steps are executed by a fast and flexible algorithm that expands both the quantity and the range of data that can be analyzed with multiple imputation. To help users optimize the algorithm for their particular application, MIDASpy and rMIDAS offer a host of user-friendly tools for calibrating and validating the imputation model. We provide a detailed guide to these functionalities and demonstrate their usage on a large real dataset. PubDate: Tue, 17 Oct 2023 00:00:00 +000
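The first MIDAS step, introducing extra missingness and scoring reconstructions only where values were actually observed, can be illustrated without a neural network. The Python sketch below is a minimal illustration under stated assumptions: rows are lists with `None` marking missing cells, and the helper names `corrupt` and `observed_mse` are hypothetical, not part of MIDASpy or rMIDAS. The denoising autoencoder that would consume the corrupted data is omitted.

```python
import random
from typing import List, Optional, Tuple

Row = List[Optional[float]]


def corrupt(data: List[Row], drop_rate: float, seed: int = 0) -> Tuple[List[Row], List[List[bool]]]:
    """Hide a random share of the *observed* entries (hypothetical helper).

    Returns a corrupted copy plus a mask marking which entries were artificially
    removed. Entries that were already missing (None) stay None and are never marked.
    """
    rng = random.Random(seed)
    corrupted: List[Row] = []
    mask: List[List[bool]] = []
    for row in data:
        new_row: Row = []
        row_mask: List[bool] = []
        for value in row:
            # only observed values are eligible for artificial removal
            hide = value is not None and rng.random() < drop_rate
            new_row.append(None if hide else value)
            row_mask.append(hide)
        corrupted.append(new_row)
        mask.append(row_mask)
    return corrupted, mask


def observed_mse(original: List[Row], reconstruction: List[List[float]]) -> float:
    """Mean squared reconstruction error over cells observed in the original data."""
    errors = [
        (x - x_hat) ** 2
        for row, rec in zip(original, reconstruction)
        for x, x_hat in zip(row, rec)
        if x is not None  # originally missing cells have no ground truth
    ]
    return sum(errors) / len(errors)
```

Scoring only originally observed cells is the key design point: the network is trained to recover values it has seen hidden, and the fitted model is then used to draw values for cells that were never observed.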
Authors: Diogo Ferrari Abstract: The existence of latent clusters with different responses to a treatment is a major concern in scientific research, as latent effect heterogeneity often emerges from unobserved features of the subjects (e.g., genetic characteristics, personality traits, or hidden motivations). Conventional random- and fixed-effects methods cannot address this heterogeneity when the group markers associated with it are latent or unobserved. Alternative methods that combine regression models and clustering procedures using Dirichlet processes are available, but these methods are complex to implement, especially for non-linear regression models with discrete or binary outcomes. This article discusses the R package hdpGLM, which implements the novel hierarchical Dirichlet process approach to estimating mixtures of generalized linear models outlined in Ferrari (2020). The methods implemented make it easy for researchers to investigate heterogeneity in the effect of treatment or background variables and to identify clusters of subjects with differential effects. The package provides several features for out-of-the-box estimation and for generating numerical summaries and visualizations of the results. A comparison with other similar R packages is provided. PubDate: Mon, 16 Oct 2023 00:00:00 +000
Authors: Gianfranco Piras; Mauricio Sarrias Abstract: Despite the wide availability of software for estimating cross-sectional spatial models, only a few functions exist for estimating spatial models with limited dependent variables. This paper fills this gap by introducing the new R package spldv. The package is based on generalized method of moments (GMM) estimators and includes a series of one- and two-step estimators based on different choices of the weighting matrix for the moment conditions in the first step, as well as different estimators for the variance-covariance matrix of the estimated coefficients. An important feature of spldv is that users can estimate the spatial Durbin model and compute the direct, indirect, and total effects in a user-friendly and flexible way. PubDate: Thu, 12 Oct 2023 00:00:00 +000
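For the linear spatial Durbin model y = ρWy + Xβ + WXθ + ε, the direct, indirect, and total effects of a covariate are commonly summarized from the matrix S = (I − ρW)^(−1)(βI + θW): the direct effect is the average diagonal element, the total effect is the average row sum, and the indirect effect is their difference. The Python sketch below computes these summaries for one covariate, assuming a row-standardized W and |ρ| < 1 so that (I − ρW)^(−1) can be approximated by the power series Σ ρ^p W^p. It is an illustration only, not spldv's interface; for limited dependent variable models such as a spatial probit, the marginal effects additionally involve the density of the link function.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sdm_effects(W: Matrix, rho: float, beta: float, theta: float,
                terms: int = 200) -> Tuple[float, float, float]:
    """Average direct, indirect and total effects for one covariate of a linear SDM.

    Assumes W is row-standardized and |rho| < 1, so the power series converges.
    """
    if abs(rho) >= 1:
        raise ValueError("need |rho| < 1")
    n = len(W)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # (I - rho W)^{-1} approximated by sum_{p < terms} rho^p W^p
    inverse = [[0.0] * n for _ in range(n)]
    power = identity
    for p in range(terms):
        for i in range(n):
            for j in range(n):
                inverse[i][j] += rho ** p * power[i][j]
        power = matmul(power, W)
    # S = (I - rho W)^{-1} (beta I + theta W)
    inner = [[beta * identity[i][j] + theta * W[i][j] for j in range(n)] for i in range(n)]
    S = matmul(inverse, inner)
    direct = sum(S[i][i] for i in range(n)) / n
    total = sum(sum(row) for row in S) / n
    return direct, total - direct, total
```

For any row-standardized W the total effect reduces to (β + θ)/(1 − ρ); with ρ = 0.4, β = 1 and θ = 0.5 that is 2.5, which makes a convenient sanity check for the numerical inverse.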
Authors: Hongyu Mou; Licheng Liu; Yiqing Xu Abstract: We develop an R package panelView and a Stata package panelview for panel data visualization. They are designed to assist causal analysis with panel data and have three main functionalities: (1) they plot the treatment status and missing values in a panel dataset; (2) they visualize the temporal dynamics of the main variables of interest; and (3) they depict the bivariate relationships between a treatment variable and an outcome variable either by unit or in aggregate. These tools can help researchers better understand their panel datasets before conducting statistical analysis. PubDate: Mon, 25 Sep 2023 00:00:00 +000
Authors: Charlotte Baey; Estelle Kuhn Abstract: Testing variance components arises naturally when building mixed-effects models, either to decide which effects should be modeled as fixed or random or to build parsimonious models. While tests for fixed effects are available in R for models fitted with lme4, tools are missing when it comes to random effects. The varTestnlme package for R aims to fill this gap. It allows users to test whether a subset of the variances and covariances corresponding to a subset of the random effects are equal to zero, using the asymptotic properties of the likelihood ratio test statistic. It can also test fixed effects and variance components simultaneously. It can be used for linear, generalized linear, or nonlinear mixed-effects models fitted via lme4, nlme, or saemix. The numerical methods used to implement the test procedure are detailed, and examples based on several real datasets and different mixed models are provided. The theoretical properties of the likelihood ratio test are also recalled. PubDate: Sun, 24 Sep 2023 00:00:00 +000
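Why variance components need special asymptotics can be seen in the simplest case: testing that a single random-effect variance is zero, with no covariances involved. Because the null value lies on the boundary of the parameter space, the likelihood ratio statistic is asymptotically an equal mixture of a point mass at zero and a chi-square with one degree of freedom, not a plain chi-square(1). The Python sketch below computes that p-value; the function names are hypothetical, and the more general hypotheses the abstract describes lead to chi-square mixtures with other weights, which this sketch does not handle.

```python
import math


def chi2_1_sf(t: float) -> float:
    """Upper-tail probability P(X > t) for X ~ chi-square with 1 degree of freedom."""
    if t <= 0:
        return 1.0
    # X = Z**2 with Z standard normal, so P(X > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2))
    return math.erfc(math.sqrt(t / 2.0))


def boundary_lrt_pvalue(lrt_stat: float) -> float:
    """p-value for H0: one variance component equals zero.

    Under H0 the variance sits on the boundary of the parameter space, so the
    LRT statistic is asymptotically 0.5 * chi2_0 + 0.5 * chi2_1 (chi2_0 being a
    point mass at zero) instead of chi2_1.
    """
    if lrt_stat <= 0:
        return 1.0
    return 0.5 * chi2_1_sf(lrt_stat)
```

For a statistic of about 2.71, the naive chi-square(1) reference gives p of about 0.10 while the mixture gives about 0.05, so ignoring the boundary makes the test conservative.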
Authors: Milan Bouchet-Valat; Bogumił Kamiński Abstract: DataFrames.jl is a package written for and in the Julia language that offers flexible and efficient handling of tabular data sets in memory. Thanks to Julia's unique strengths, it provides an appealing set of features: rich support for standard data processing tasks and excellent flexibility and efficiency for more advanced and non-standard operations. We present the fundamental design of the package, compare it with data frame implementations in other languages, and describe its main features, performance, and possible extensions. We conclude with a practical illustration of typical data processing operations. PubDate: Sat, 23 Sep 2023 00:00:00 +000
Authors: Simon A. Broda; Marc S. Paolella Abstract: This paper introduces ARCHModels.jl, a package for the Julia programming language that implements a number of univariate and multivariate autoregressive conditional heteroskedasticity (ARCH) models. This model class is the workhorse tool for modeling the conditional volatility of financial assets. The distinguishing feature of these models is that they model the latent volatility as a (deterministic) function of past returns and volatilities. This recursive structure results in loop-heavy code, which Julia is well equipped to handle thanks to its just-in-time compiler. As such, the entire package is written in Julia, without any binary dependencies. We benchmark the performance of ARCHModels.jl against popular implementations in MATLAB, R, and Python, and illustrate its use in a detailed case study. PubDate: Sat, 23 Sep 2023 00:00:00 +000
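The recursion that makes ARCH-type code loop-heavy is easiest to see in the standard GARCH(1,1) model, where σ²_t = ω + α r²_(t−1) + β σ²_(t−1). The Python sketch below (an illustration of the model class, not ARCHModels.jl's API; the function name is hypothetical) filters conditional variances for mean-zero returns. Each step depends on the previous one, so the loop cannot be replaced by a single vectorized operation, and a likelihood optimizer must rerun it for every trial parameter vector.

```python
from typing import List


def garch11_variances(returns: List[float], omega: float, alpha: float,
                      beta: float) -> List[float]:
    """Conditional variances sigma_t^2 of a GARCH(1,1) model for mean-zero returns."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    if not returns:
        return []
    # start the recursion at the unconditional variance omega / (1 - alpha - beta)
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        # sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2
```

Initializing at the unconditional variance is one common convention; starting from the sample variance of the returns is another, and the choice only affects the first few observations.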
Authors: Alberto Garre; Jeroen Koomen; Heidy M. W. den Besten; Marcel H. Zwietering Abstract: The growth of populations is of interest in a broad variety of fields, such as epidemiology, economics, or biology. Although a large variety of growth models are available in the scientific literature, their application usually requires advanced knowledge of mathematical programming and statistical inference, especially when modelling growth under dynamic environmental conditions. This article presents the biogrowth package for R, which implements functions for modelling the growth of populations. It can predict growth under static or dynamic environments, considering the effect of an arbitrary number of environmental factors. Moreover, it can be used to fit growth models to data gathered under static or dynamic environmental conditions. The package allows the user to fix any model parameter prior to the fit, an approach that can mitigate identifiability issues associated with growth models. The package includes common S3 methods for visualization and statistical analysis (summary of the fit, predictions, etc.), easing result interpretation. It also includes functions for model comparison and selection. We illustrate the functions in biogrowth using examples from food science and economics. PubDate: Sat, 09 Sep 2023 00:00:00 +000
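As a concrete example of a growth model under a static environment, the reparameterized (modified) Gompertz model expresses the log increase y(t) = ln(N(t)/N0) through three interpretable parameters: the asymptote A, the maximum specific growth rate μ_max, and the lag time λ, as y(t) = A·exp(−exp(μ_max·e/A·(λ − t) + 1)). The Python sketch below evaluates it; the function name is hypothetical, and this is not a claim about how biogrowth names or parameterizes its models. Dynamic environments would instead require a differential-equation formulation, which is not shown.

```python
import math


def gompertz_log_increase(t: float, A: float, mu_max: float, lam: float) -> float:
    """Modified Gompertz model: y(t) = ln(N(t)/N0) under a static environment.

    A      -- asymptotic log increase
    mu_max -- maximum specific growth rate (slope at the inflection point)
    lam    -- lag time
    """
    return A * math.exp(-math.exp(mu_max * math.e / A * (lam - t) + 1.0))
```

The parameterization is chosen so that the tangent at the inflection point, t = λ + A/(μ_max·e), has slope μ_max and crosses y = 0 at t = λ, which is what gives the parameters their biological meaning.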
Authors: Wei Ma; Xiaoqing Ye; Fuyi Tu; Feifang Hu Abstract: Covariate-adaptive randomization is gaining popularity in clinical trials because it enables the generation of balanced allocations with respect to covariates. Over the past decade, substantial progress has been made both in new randomization procedures and in the theoretical properties of the associated inferences. However, these results are scattered across the literature, and no single toolkit exists that clinical trial practitioners and researchers can use to conduct and evaluate these methods. The R package carat is proposed to address this need. It supports a broad range of covariate-adaptive randomization and testing procedures, covering the most common classical methods as well as recent developments in the field. The package contains comprehensive evaluation and comparison tools for both randomization procedures and tests, which enable power analyses that assist in planning a covariate-adaptive clinical trial. The package also implements a command-line interface for interactive allocation, as is typical in real-world applications. In this paper, the features and functionalities of carat are presented. PubDate: Sat, 09 Sep 2023 00:00:00 +000
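One classical covariate-adaptive procedure is Pocock–Simon minimization: for each incoming patient, compute how unbalanced the marginal counts for that patient's covariate levels would become under each arm, then assign the arm that minimizes this imbalance with a high probability. The two-arm Python sketch below uses the range of counts as the imbalance measure; the function names, the count layout, and the default `p_best = 0.85` are illustrative assumptions and do not reflect carat's functions or defaults.

```python
import random
from typing import Dict, List, Tuple

# Marginal counts: (factor index, level) -> [patients in arm 0, patients in arm 1]
Counts = Dict[Tuple[int, str], List[int]]


def imbalance(counts: Counts, profile: List[str], arm: int) -> int:
    """Total range-based imbalance over the new patient's levels if assigned to `arm`."""
    total = 0
    for factor, level in enumerate(profile):
        c = list(counts.get((factor, level), [0, 0]))  # copy, so counts is not modified
        c[arm] += 1
        total += max(c) - min(c)
    return total


def assign(counts: Counts, profile: List[str], rng: random.Random,
           p_best: float = 0.85) -> int:
    """Assign one patient by Pocock-Simon-style minimization and update the counts."""
    scores = [imbalance(counts, profile, arm) for arm in (0, 1)]
    if scores[0] == scores[1]:
        arm = rng.randrange(2)  # tie: fair coin flip
    else:
        best = 0 if scores[0] < scores[1] else 1
        # favour the imbalance-minimizing arm with probability p_best
        arm = best if rng.random() < p_best else 1 - best
    for factor, level in enumerate(profile):
        counts.setdefault((factor, level), [0, 0])[arm] += 1
    return arm
```

Keeping `p_best` below 1 preserves some randomness, so the next assignment cannot be predicted with certainty from the current counts; setting it to 1 gives deterministic minimization.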
Authors: Raluca Gui; Markus Meierer; Patrik Schilter; René Algesheimer Abstract: Endogeneity is a common problem in causal analysis. It arises when the assumption of independence between an explanatory variable and the error term of a statistical model is violated. The causes of endogeneity are manifold and include response bias in surveys, omission of important explanatory variables, and simultaneity between explanatory and response variables. Instrumental variable estimation provides a possible solution. However, valid and strong external instruments are difficult to find. Consequently, internal instrumental variable approaches have been proposed to correct for endogeneity without relying on external instruments. The R package REndo implements various internal instrumental variable approaches, namely latent instrumental variables estimation (Ebbes, Wedel, Boeckenholt, and Steerneman 2005), higher moments estimation (Lewbel 1997), heteroscedastic error estimation (Lewbel 2012), joint estimation using copulas (Park and Gupta 2012), and multilevel generalized method of moments estimation (Kim and Frees 2007). Package usage is illustrated on simulated and real-world data. PubDate: Sat, 09 Sep 2023 00:00:00 +000
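The endogeneity problem can be made concrete with a small simulation in which an unobserved error u enters both the regressor x and the outcome y. Under that design OLS overstates the slope, while the simple instrumental variable ratio cov(z, y)/cov(z, x) with a valid external instrument z recovers it. The Python sketch below is illustrative: the data-generating process and coefficients are assumptions, and it implements none of REndo's internal-instrument methods, which target exactly the case where no such z is available.

```python
import random
from statistics import fmean
from typing import List, Tuple


def cov(a: List[float], b: List[float]) -> float:
    """Sample covariance."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n: int = 50_000, seed: int = 1) -> Tuple[float, float]:
    """Return (OLS slope, IV slope) for y = 1 + 2x + u with x correlated with u."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # external instrument, independent of u
    u = [rng.gauss(0, 1) for _ in range(n)]  # structural error
    # x depends on the instrument and on u, which makes x endogenous
    x = [zi + 0.8 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    ols = cov(x, y) / cov(x, x)  # biased: picks up cov(x, u) / var(x)
    iv = cov(z, y) / cov(z, x)   # consistent when z is relevant and exogenous
    return ols, iv
```

Under this design the OLS slope converges to 2 + 0.8/2.64, roughly 2.30, instead of the true value 2, while the IV estimate stays close to 2.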