Abstract: A new class of survival frailty models based on the generalized inverse-Gaussian (GIG) distributions is proposed. We show that the GIG frailty models are flexible and mathematically convenient like the popular gamma frailty model. A piecewise-exponential baseline hazard function is employed, yielding flexibility for the proposed class. Although a closed-form observed log-likelihood function is available, simulation studies show that employing an EM-algorithm is advantageous compared with direct maximization of this function. Further simulated results address the comparison of different methods for obtaining standard errors of the estimates and confidence intervals for the parameters. Additionally, the finite-sample behavior of the EM-estimators is investigated and the performance of the GIG models under misspecification is assessed. We apply our methodology to TARGET (Therapeutically Applicable Research to Generate Effective Treatments) data on the survival time of patients with neuroblastoma cancer and show some advantages of the GIG frailties over existing models in the literature. PubDate: 2021-10-01
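
As a rough illustration of the piecewise-exponential baseline hazard mentioned in this abstract, the sketch below computes the cumulative hazard and survival function for a hazard that is constant on each interval. The function names, cut points, and rates are hypothetical, and no frailty term is included; this is not the authors' implementation.

```python
import math

def cumulative_hazard(t, cuts, rates):
    """Cumulative hazard H(t) for a piecewise-constant hazard.

    cuts  -- increasing interior cut points (hypothetical values)
    rates -- hazard level on each interval; len(rates) == len(cuts) + 1
    """
    edges = [0.0] + list(cuts) + [math.inf]
    total = 0.0
    for lo, hi, lam in zip(edges[:-1], edges[1:], rates):
        if t <= lo:
            break  # t lies before this interval, nothing more to add
        # Add the hazard level times the time spent in this interval.
        total += lam * (min(t, hi) - lo)
    return total

def survival(t, cuts, rates):
    """Baseline survival S(t) = exp(-H(t)), without any frailty term."""
    return math.exp(-cumulative_hazard(t, cuts, rates))
```

For example, with cut points `[1, 2]` and rates `[0.5, 1.0, 2.0]`, the cumulative hazard at `t = 3` adds one unit of time at each rate.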

Abstract: Kernel density estimation is a nonparametric procedure making use of the smoothing power of the convolution operation. Yet, it performs poorly when the density of a positive variable is estimated, due to boundary issues. So, various extensions of the kernel estimator allegedly suitable for \({\mathbb {R}}^+\)-supported densities, such as those using asymmetric kernels, abound in the literature. Those, however, are not based on any valid smoothing operation. By contrast, in this paper a kernel density estimator is defined through the Mellin convolution, the natural analogue on \({\mathbb {R}}^+\) of the usual convolution. From there, a class of asymmetric kernels related to Meijer G-functions is suggested, and asymptotic properties of this ‘Mellin–Meijer kernel density estimator’ are presented. In particular, its pointwise- and \(L_2\)-consistency (with optimal rate of convergence) are established for a large class of densities, including densities unbounded at 0 and showing power-law decay in their right tail. PubDate: 2021-10-01

Abstract: The paper presents a novel approach to solve a classical two-sample problem with right-censored data. As a result, an efficient procedure for verifying equality of the two survival curves is developed. It generalizes, in a natural manner, a well-known standard, that is, the log-rank test. Under the null hypothesis, the new test statistic has an asymptotic Chi-square distribution with one degree of freedom, while the corresponding test is consistent for a wide range of alternatives. On the other hand, to control the actual Type I error rate when sample sizes are finite, a permutation approach is employed for the inference. An extensive simulation study shows that the new test procedure improves upon classical solutions and popular recent developments in the field. An analysis of real datasets is included. A routine, written in R, is attached as Supplementary Material. PubDate: 2021-10-01
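
For orientation, the classical log-rank statistic that this abstract takes as its starting point can be sketched as follows. This is only the standard two-sample test, not the generalized statistic proposed in the paper, and the function name and argument layout are assumptions made for illustration.

```python
def logrank_statistic(times1, events1, times2, events2):
    """Classical two-sample log-rank chi-square statistic (1 degree of freedom).

    times*  -- observed times per group
    events* -- 1 if the event was observed, 0 if right-censored
    """
    data = [(t, e, 0) for t, e in zip(times1, events1)]
    data += [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # sum of (observed - expected) events in group 1
    variance = 0.0       # hypergeometric variance accumulated over event times
    for t in event_times:
        n1 = sum(1 for s, _, g in data if s >= t and g == 0)  # at risk, group 1
        n2 = sum(1 for s, _, g in data if s >= t and g == 1)  # at risk, group 2
        d1 = sum(1 for s, e, g in data if s == t and e and g == 0)
        d2 = sum(1 for s, e, g in data if s == t and e and g == 1)
        n, d = n1 + n2, d1 + d2
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0
```

Under the null hypothesis of equal survival curves, the returned value is compared against a Chi-square distribution with one degree of freedom; a permutation version would instead recompute the statistic over random relabelings of the two groups.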

Abstract: We study asymptotic properties of Bayesian multiple testing procedures and provide sufficient conditions for strong consistency under general dependence structure. We also consider a novel Bayesian multiple testing procedure and associated error measures that coherently accounts for the dependence structure present in the model. We advocate posterior versions of FDR and FNR as appropriate error rates and show that their asymptotic convergence rates are directly associated with the Kullback–Leibler divergence from the true model. The theories hold regardless of the class of postulated models being misspecified. We illustrate our results in a variable selection problem with autoregressive response variables and compare our procedure with some existing methods through simulation studies. Superior performance of the new procedure compared to the others indicates that proper exploitation of the dependence structure by multiple testing methods is indeed important. Moreover, we obtain encouraging results in a maize dataset, where we select influential marker variables. PubDate: 2021-10-01

Abstract: The mean density estimation of a random closed set in \(\mathbb {R}^d\), based on a single observation, is a crucial problem in several application areas. In the case of stationary random sets, a common practice to estimate the mean density is to take the n-dimensional volume fraction with observation window as large as possible. In the present paper, we provide large and moderate deviation results for these estimators when the random closed set \(\Theta _n\) belongs to the quite general class of stationary Boolean models with Hausdorff dimension \(n<d\). Moreover, we establish a central limit theorem and a Berry–Esseen bound for the family of estimators under study. Our findings allow us to recover some well-known results in the literature on Boolean models. Finally, we also provide a guideline for the estimation of the mean density of non-stationary Boolean models characterized by high intensity of the underlying Poisson point process. PubDate: 2021-10-01

Abstract: The aim of this paper is to introduce an adaptive penalized estimator for identifying the true reduced parametric model under the sparsity assumption. In particular, we deal with the framework where the unpenalized estimator of the structural parameters simultaneously requires multiple rates of convergence (i.e., the so-called mixed-rates asymptotic behavior). We introduce a bridge-type estimator by taking into account penalty functions involving \(\ell ^q\) norms (0 < q ≤ 1). We prove that the proposed regularized estimator satisfies the oracle properties. Our approach is useful for the estimation of stochastic differential equations in the parametric sparse setting. More precisely, under the high-frequency observation scheme, we apply our methodology to an ergodic diffusion and introduce a procedure for the selection of the tuning parameters. Furthermore, the paper contains a simulation study as well as a real data prediction in order to assess the performance of the proposed bridge estimator. PubDate: 2021-10-01

Abstract: This article studies the problem of whether two convex (concave) regression functions modelling the relation between a response and covariate in two samples differ by a shift in the horizontal and/or vertical axis. We consider a nonparametric situation assuming only smoothness of the regression functions. A graphical tool based on the derivatives of the regression functions and their inverses is proposed to answer this question and studied in several examples. We also formalize this question in a corresponding hypothesis and develop a statistical test. The asymptotic properties of the corresponding test statistic are investigated under the null hypothesis and local alternatives. In contrast to most of the literature on comparing shape invariant models, which requires independent data, the procedure is applicable for dependent and non-stationary data. We also illustrate the finite sample properties of the new test by means of a small simulation study and two real data examples. PubDate: 2021-10-01

Abstract: The segmentation of a time series into piecewise stationary segments is an important problem both in time series analysis and signal processing. In the presence of multiscale change points with both large jumps over short intervals and small jumps over long intervals, multiscale methods achieve good adaptivity but require a model selection step for removing false positives and duplicate estimators. We propose a localised application of the Schwarz criterion, which is applicable with any multiscale candidate generating procedure fulfilling mild assumptions, and establish its theoretical consistency in estimating the number and locations of multiple change points under general assumptions permitting heavy tails and dependence. In particular, combined with a MOSUM-based candidate generating procedure, it attains minimax rate optimality in both detection lower bound and localisation for i.i.d. sub-Gaussian errors. Overall competitiveness of the proposed methodology compared to existing methods is shown through its theoretical and numerical performance. PubDate: 2021-09-25

Abstract: We investigate Bayesian variable selection in models driven by Gaussian processes, which allows us to treat linear, nonlinear and nonparametric models, in conjunction with even dependent setups, in the same vein. We consider the Bayes factor route to variable selection, and develop a general asymptotic theory for the Gaussian process framework in the “large p, large n” settings even with \(p\gg n\), establishing almost sure exponential convergence of the Bayes factor under appropriately mild conditions. The fixed p setup is included as a special case. To illustrate, we apply our result to variable selection in linear regression, Gaussian process model with squared exponential covariance function accommodating the covariates, and a first-order autoregressive process with time-varying covariates. We also follow up our theoretical investigations with ample simulation experiments in the above regression contexts and variable selection in a real riboflavin dataset consisting of 71 observations and 4088 covariates. For implementation of variable selection using Bayes factors, we develop a novel and effective general-purpose transdimensional, transformation-based Markov chain Monte Carlo algorithm, which has played a crucial role in simulated and real data applications. PubDate: 2021-09-20

Abstract: In high-dimensional data analysis, bi-level sparsity is often assumed when covariates function group-wise and sparsity can appear either at the group level or within certain groups. In such cases, an ideal model should be able to encourage the bi-level variable selection consistently. Bi-level variable selection becomes even more challenging when data have a heavy-tailed distribution or outliers exist in random errors and covariates. In this paper, we study a framework of high-dimensional M-estimation for bi-level variable selection. This framework encourages bi-level sparsity through a computationally efficient two-stage procedure. In theory, we provide sufficient conditions under which our two-stage penalized M-estimator possesses simultaneous local estimation consistency and the bi-level variable selection consistency if certain non-convex penalty functions are used at the group level. Both our simulation studies and real data analysis demonstrate satisfactory finite sample performance of the proposed estimators under different irregular settings. PubDate: 2021-09-09

Abstract: In general, the solution to a regression problem is the minimizer of a given loss criterion and depends on the specified loss function. The nonparametric isotonic regression problem is special, in that optimal solutions can be found by solely specifying a functional. These solutions will then be minimizers under all loss functions simultaneously as long as the loss functions have the requested functional as the Bayes act. For the functional, the only requirement is that it can be defined via an identification function, with examples including the expectation, quantile, and expectile functionals. Generalizing classical results, we characterize the optimal solutions to the isotonic regression problem for identifiable functionals by rigorously treating these functionals as set-valued. The results hold in the case of totally or partially ordered explanatory variables. For total orders, we show that any solution resulting from the pool-adjacent-violators algorithm is optimal. PubDate: 2021-09-03
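
A minimal sketch of the pool-adjacent-violators algorithm for a total order is given below, restricted to the expectation functional (weighted means), which is the classical special case. The general identifiable functionals treated in the paper are not covered, and the function name `pava_mean` is hypothetical.

```python
def pava_mean(y, w=None):
    """Non-decreasing isotonic fit for the mean via pool-adjacent-violators.

    y -- responses, already sorted by the (totally ordered) covariate
    w -- optional positive weights; defaults to equal weights
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of pooled points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand pooled blocks back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

Replacing the weighted mean in the pooling step with another functional of the pooled observations (for instance a quantile) gives the corresponding variant, under the assumption that the functional is applied to the pooled block as a whole.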

Abstract: The proportional hazards model proposed by D. R. Cox in a high-dimensional and sparse setting is discussed. The regression parameter is estimated by the Dantzig selector, which is proved to possess variable selection consistency. This fact enables us to reduce the dimension of the parameter and to construct asymptotically normal estimators for the regression parameter and the cumulative baseline hazard function. PubDate: 2021-08-31

Abstract: Let \(f_{Y|X,Z}(y|x,z)\) be the conditional probability function of Y given (X, Z), where Y is the scalar response variable, while (X, Z) is the covariate vector. This paper proposes a robust model selection criterion for \(f_{Y|X,Z}(y|x,z)\) with X missing at random. The proposed method is developed based on a set of assumed models for the selection probability function. However, the consistency of model selection by our proposal does not require these models to be correctly specified, while it only requires that the selection probability function is a function of these assumed selective probability functions. Under some conditions, it is proved that the model selection by the proposed method is consistent and the estimator for population parameter vector is consistent and asymptotically normal. A Monte Carlo study was conducted to evaluate the finite-sample performance of our proposal. A real data analysis was used to illustrate the practical application of our proposal. PubDate: 2021-08-25

Abstract: It is well known that the independence of the sample mean and the sample variance characterizes the normal distribution. By using Anosov’s theorem, we further investigate the analogous characteristic properties in terms of the sample mean and some feasible definite statistics. The latter statistics introduced in this paper for the first time are based on nonnegative, definite and continuous functions of ordered arguments with positive degree of homogeneity. The proposed approach seems to be natural and can be used to derive easily characterization results for many feasible definite statistics, such as known characterizations involving the sample variance, sample range as well as Gini’s mean difference. PubDate: 2021-08-10

Abstract: We develop a general class of noise-robust estimators based on the existing estimators in the non-noisy high-frequency data literature. The microstructure noise is a parametric function of the limit order book. The noise-robust estimators are constructed as plug-in versions of their counterparts, where we replace the efficient price, which is non-observable, by an estimator based on the raw price and limit order book data. We show that the technology can be applied to five leading examples where, depending on the problem, price possibly includes infinite jump activity and sampling times encompass asynchronicity and endogeneity. PubDate: 2021-08-01

Abstract: Gaussian graphical models are semi-algebraic subsets of the cone of positive definite covariance matrices. They are widely used throughout natural sciences, computational biology and many other fields. Computing the vanishing ideal of the model gives us an implicit description of the model. In this paper, we resolve two conjectures given by Sturmfels and Uhler. In particular, we characterize those graphs for which the vanishing ideal of the Gaussian graphical model is generated in degree 1 and 2. These turn out to be the Gaussian graphical models whose ideals are toric ideals, and the resulting graphs are the 1-clique sums of complete graphs. PubDate: 2021-08-01

Abstract: We consider the problem of identifying the position of a source from observations made by K detectors receiving signals from this source. The arrival time of the signal at the k-th detector depends on the distance between this detector and the source. The signals are observed in the presence of small Gaussian noise. The properties of the MLE and Bayesian estimators are studied in the small-noise asymptotic regime. PubDate: 2021-08-01

Abstract: In this paper, we consider a robust test for structural breaks in dynamic factor models. The proposed framework considers structural changes when the underlying high-dimensional time series is contaminated by outlying observations, which are often observed in many real applications such as fMRI, economics and finance. We propose a test based on the robust estimation of a vector autoregressive model for principal component factors using minimum density power divergence. The simulation study shows excellent finite sample performance, with higher power while achieving good size in all cases considered. Our method is illustrated on resting-state fMRI series to detect brain connectivity changes. PubDate: 2021-08-01

Abstract: In this paper, we propose a non-negative feature selection/feature grouping (nnFSG) method for general sign-constrained high-dimensional regression problems that allows regression coefficients to be disjointly homogeneous, with sparsity as a special case. To solve the resulting non-convex optimization problem, we provide an algorithm that incorporates the difference of convex programming, augmented Lagrangian and coordinate descent methods. Furthermore, we show that the aforementioned nnFSG method recovers the oracle estimate consistently, and that the mean-squared errors are bounded. Additionally, we examine the performance of our method using finite sample simulations and by applying it to a real protein mass spectrum dataset. PubDate: 2021-08-01

Abstract: The accuracy of response variables is crucially important for training regression models. In some situations, including the high-dimensional case, response observations tend to be inaccurate, which would lead to biased estimators when a conventional model is fitted directly. For analyzing data with anomalous responses in the high-dimensional case, in this work, we adopt γ-divergence to conduct variable selection and estimation. The proposed method possesses good robustness to anomalous responses, and the proportion of abnormal data does not need to be modeled. It is implemented by an efficient coordinate descent algorithm. In the setting where the dimensionality p can grow exponentially fast with the sample size n, we rigorously establish variable selection consistency and estimation bounds. Numerical simulations and an application to real data are presented to demonstrate the performance of the proposed method. PubDate: 2021-08-01