Abstract: Among several variable selection methods, LASSO is a widely used estimation procedure for handling regularization and variable selection simultaneously in high-dimensional linear regression models when multicollinearity exists among the predictor variables. Since LASSO is unstable under high multicollinearity, the elastic-net (Enet) estimator has been used to overcome this issue. According to the literature, the estimation of regression parameters can be improved by adding prior information about the regression coefficients to the model, which is available in the form of exact or stochastic linear restrictions. In this article, we propose a stochastic restricted LASSO-type estimator (SRLASSO) by incorporating stochastic linear restrictions. Furthermore, we compare the performance of SRLASSO with LASSO and Enet under the root mean square error (RMSE) and mean absolute prediction error (MAPE) criteria based on a Monte Carlo simulation study. Finally, a real-world example is used to demonstrate the performance of SRLASSO. PubDate: Mon, 30 Mar 2020 07:35:09 +000
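LASSO-type estimators such as those above perform variable selection through soft-thresholding of coefficients. As an illustrative sketch (not the authors' SRLASSO implementation), the elementary operator inside LASSO coordinate descent is:

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0),
    the elementary coefficient update inside LASSO coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Coefficients shrink toward zero; small ones are set exactly to zero,
# which is how LASSO performs variable selection.
print(soft_threshold(3.0, 1.0))   # 2.0
print(soft_threshold(-0.5, 1.0))  # 0.0
```

Setting small coefficients exactly to zero, rather than merely shrinking them, is what distinguishes LASSO-type penalties from ridge-type penalties.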

Abstract: A risk measure commonly used in financial risk management, namely, Value-at-Risk (VaR), is studied. In particular, we find a VaR forecast for heteroscedastic processes such that its (conditional) coverage probability is close to the nominal level. To do so, we pay attention to the effect of estimator variability, such as asymptotic bias and mean square error. Numerical analysis is carried out to illustrate this calculation for the Autoregressive Conditional Heteroscedastic (ARCH) model, an observable-volatility-type model. For comparison, we find the VaR for a latent volatility model, i.e., the Stochastic Volatility Autoregressive (SVAR) model. It is found that the effect of estimator variability is significant for obtaining a VaR forecast with better coverage. In addition, we may only be able to assess the unconditional coverage probability for the VaR forecast of the SVAR model, because the volatility process of that model is unobservable. PubDate: Tue, 10 Mar 2020 07:20:05 +000
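As a concrete illustration of the ARCH setting above, a one-step-ahead VaR forecast for an ARCH(1) model with Gaussian innovations can be sketched as follows; the parameter values in the usage line are hypothetical, and this ignores the estimator-variability corrections the paper studies:

```python
from statistics import NormalDist

def arch1_var_forecast(r_t, omega, alpha, coverage=0.95):
    """One-step-ahead VaR for an ARCH(1) model: the conditional variance is
    sigma^2_{t+1} = omega + alpha * r_t^2, and the VaR forecast is the
    lower-tail quantile of the conditional N(0, sigma^2_{t+1}) return."""
    sigma = (omega + alpha * r_t ** 2) ** 0.5
    z = NormalDist().inv_cdf(1 - coverage)  # e.g. about -1.645 at 95%
    return z * sigma  # returns fall below this threshold w.p. 1 - coverage

# Illustrative parameters only: last return 2%, omega = 1e-5, alpha = 0.3.
var = arch1_var_forecast(0.02, 1e-5, 0.3, coverage=0.95)
```

A larger realized return `r_t` feeds directly into a wider conditional variance and hence a more conservative (more negative) VaR forecast.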

Abstract: In this paper, two control chart methods are proposed to monitor a process based on the two-parameter Gompertz distribution: the Gompertz Shewhart approach and the Gompertz skewness correction method. A simulation study was conducted to compare the performance of the proposed charts with that of the skewness correction approach for various sample sizes. Furthermore, real-life data on the thickness of paint on refrigerators, which are nonnormal and well described by a Gompertz distribution, were used to illustrate the proposed control charts. The coverage probability (CP), control limit interval (CLI), and average run length (ARL) were used to measure the performance of the two methods. It was found that the Gompertz exact method, where the control limits are calculated from the percentiles of the underlying distribution, has the highest coverage probability, while the Gompertz Shewhart approach and the Gompertz skewness correction method have the smallest CLI and ARL. Hence, the two-parameter Gompertz-based methods detect out-of-control conditions faster for Gompertz-based charts. PubDate: Tue, 25 Feb 2020 04:05:11 +000
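The ARL criterion used above has a simple closed form for a classical Shewhart chart on normal data; the following minimal sketch (not the paper's Gompertz-based charts, which use percentile-based limits) shows how ARL follows from the per-point signal probability:

```python
from statistics import NormalDist

def shewhart_arl(shift=0.0, L=3.0):
    """Average run length of a Shewhart chart with +/- L sigma limits,
    for a process mean shifted by `shift` standard deviations:
    ARL = 1 / P(point plots outside the limits)."""
    nd = NormalDist()
    p_out = nd.cdf(-L - shift) + (1.0 - nd.cdf(L - shift))
    return 1.0 / p_out

print(round(shewhart_arl(), 1))     # in-control ARL0, about 370.4
print(round(shewhart_arl(1.0), 1))  # ARL drops sharply once the mean shifts
```

For skewed distributions like the Gompertz, the paper replaces the symmetric normal limits with distribution-specific percentiles, but the ARL = 1/p relationship is the same.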

Abstract: The Weibull growth model is important especially for describing growth instability; therefore, in this paper, three methods, namely, generalized maximum entropy, Bayes, and maximum a posteriori, for estimating the four-parameter Weibull growth model are presented and compared. To achieve this aim, a simulation technique is used to generate the samples and perform the required comparisons, using varying sample sizes (10, 12, 15, 20, 25, and 30) and models with a fixed standard deviation (0.5). The computational results show that the Bayes method gives the best estimates. PubDate: Tue, 14 Jan 2020 07:50:04 +000

Abstract: In this paper, comparison results for parametric change-point methodologies, applied to maximum temperature records from the municipality of Tlaxco, Tlaxcala, México, are presented. The methodologies considered are the likelihood ratio test, the score test, binary segmentation (BS), pruned exact linear time (PELT), and segment neighborhood (SN). In order to compare these methodologies, a quality analysis of the data was performed; in addition, missing data were imputed with linear regression, and finally, SARIMA models were fitted. PubDate: Mon, 16 Dec 2019 04:35:13 +000
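Binary segmentation, one of the methods compared above, works by repeatedly applying a single-split search. A minimal sketch of that core step, using a mean-shift cost and illustrative data (not the paper's temperature series):

```python
def best_split(x):
    """One step of binary segmentation: find the split point minimizing
    the total within-segment sum of squared errors around segment means."""
    def sse(seg):
        if not seg:
            return 0.0
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)
    costs = [sse(x[:k]) + sse(x[k:]) for k in range(1, len(x))]
    return 1 + costs.index(min(costs))  # index where the second segment starts

series = [0.0] * 10 + [5.0] * 10  # synthetic series with one mean shift
print(best_split(series))  # 10: the change point is recovered exactly
```

BS recurses on the two resulting segments; PELT and SN instead optimize a penalized cost over all segmentations at once, which is what makes them exact.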

Abstract: Partial least squares (PLS) regression is an alternative to ordinary least squares (OLS) regression, used in the presence of multicollinearity. As with any other modelling method, PLS regression requires a reliable model selection tool. Cross validation (CV) is the most commonly used tool, with many advantages in both precision and accuracy, but it also has some drawbacks; therefore, we use the L-curve criterion as an alternative, given that it takes into consideration the shrinking nature of PLS. A theoretical justification for the use of the L-curve criterion is presented, as well as an application on both simulated and real data. The application shows how this criterion generally outperforms cross validation and generalized cross validation (GCV) in terms of mean squared prediction error and computational efficiency. PubDate: Wed, 30 Oct 2019 09:05:29 +000

Abstract: A simple solution for determining the distributions of queue lengths at different observation epochs for the GIX/Geo/c model is presented. In the past, various discrete-time queueing models, particularly multiserver bulk-arrival queues, have been solved using complicated methods that lead to incomplete results. The purpose of this paper is to use the roots method to solve the GIX/Geo/c model, leading to results that are analytically elegant and computationally efficient. This method works well even when the inter-batch-arrival times follow heavy-tailed distributions. The roots of the underlying characteristic equation form the basis for all distributions of queue lengths at the different time epochs. PubDate: Tue, 03 Sep 2019 13:30:03 +000

Abstract: This paper presents analytically explicit results for the distribution of the number of customers served during a busy period for special cases of queues initiated with m customers. The functional equation for the Laplace transform of the number of customers served during a busy period is widely known, but several researchers state that, in general, it is not easy to invert except in a few simple cases. Using the Lagrange inversion theorem, we give an elegant solution to this equation. We obtain the distribution of the number of customers served during a busy period for various service-time distributions, such as exponential, deterministic, Erlang-k, gamma, chi-square, inverse Gaussian, generalized Erlang, matrix exponential, hyperexponential, uniform, Coxian, phase-type, Markov-modulated Poisson process, and interrupted Poisson process. Further, we provide computational results using our method. The computations are fast and robust owing to the lucidity of the expressions. PubDate: Tue, 27 Aug 2019 07:05:12 +000

Abstract: We obtain weak convergence and optimal scaling results for the random walk Metropolis algorithm with a Gaussian proposal distribution. The sampler is applied to hierarchical target distributions, which form the building block of many Bayesian analyses. The globally asymptotically optimal proposal variance derived here can be computed as a function of the specific target distribution considered. We also introduce the concept of locally optimal tunings, i.e., tunings that depend on the current position of the Markov chain. The theorems are proved by studying the generators of the first and second components of the algorithm and verifying their convergence to the generators of a modified RWM algorithm and a diffusion process, respectively. The rate at which the algorithm explores its state space is optimized by studying the speed measure of the limiting diffusion process. We illustrate the theory with two examples. Applications of these results to simulated and real data are also presented. PubDate: Mon, 26 Aug 2019 14:05:08 +000
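A bare-bones random walk Metropolis sampler makes concrete the role of the proposal variance, the quantity whose optimal value the theory above characterizes. This is a generic one-dimensional sketch with an illustrative tuning, not the paper's hierarchical-target analysis:

```python
import math
import random

def rwm(logpi, x0, proposal_sd, n, seed=1):
    """Random walk Metropolis with a Gaussian proposal. proposal_sd is the
    tuning parameter: too small and the chain creeps, too large and most
    proposals are rejected; optimal scaling theory balances the two."""
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n):
        y = x + rng.gauss(0.0, proposal_sd)
        # Accept with probability min(1, pi(y)/pi(x)), computed on log scale.
        if math.log(rng.random()) < logpi(y) - logpi(x):
            x, accepted = y, accepted + 1
        samples.append(x)
    return samples, accepted / n

# Standard normal target; proposal_sd = 2.4 is an illustrative tuning.
samples, acc = rwm(lambda x: -0.5 * x * x, 0.0, 2.4, 20000)
```

The acceptance rate is a cheap observable proxy for the tuning quality, which is why optimal scaling results are often quoted as target acceptance rates.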

Abstract: A new approach toward a probabilistic proof of the convergence of the Collatz conjecture is described, based on identifying a sequential correlation among the even natural numbers under division by 2 that follows a recurrent pattern, in which some even values admit division by 2 more than once. Over the even natural numbers, the sequence exhibits a 50:50 probability of division by 2 more than once as opposed to exactly once. The sequence gives the same 50:50 probability for consecutive even elements of a Collatz trajectory when counted for division by 2 more than once as opposed to exactly once, together with a ratio of 3:1. Treating the Collatz function as producing pseudorandom numbers, over a sufficient number of iterations this probability distribution produces numbers in descending order, leading to the convergence of the Collatz function to 1, assuming that the only cycle of the function is 1-4-2-1. PubDate: Thu, 01 Aug 2019 01:05:58 +000

Abstract: We propose a novel modeling framework to study the effect of covariates of various types on the conditional distribution of the response. The methodology accommodates flexible model structure, allows for joint estimation of the quantiles at all levels, and provides a computationally efficient estimation algorithm. Extensive numerical investigation confirms good performance of the proposed method. The methodology is motivated by and applied to a lactating sow study, where the primary interest is to understand how the dynamic change of minute-by-minute temperature in the farrowing rooms within a day (functional covariate) is associated with low quantiles of feed intake of lactating sows, while accounting for other sow-specific information (vector covariate). PubDate: Tue, 16 Jul 2019 09:05:26 +000

Abstract: In any longitudinal study, dropout before the final timepoint can rarely be avoided. The chosen dropout model is commonly one of these types: Missing Completely at Random (MCAR), Missing at Random (MAR), Missing Not at Random (MNAR), and Shared Parameter (SP). In this paper, we estimate the parameters of the longitudinal model for simulated and real data using the Linear Mixed Effect (LME) method. We investigate the consequences of misspecifying the missingness mechanism by deriving the so-called least false values. These are the values to which the parameter estimates converge when the assumed missingness mechanism may be wrong. Knowledge of the least false values allows us to conduct a sensitivity analysis, which is illustrated. This method provides an alternative to a local misspecification sensitivity procedure, which has been developed for likelihood-based analysis. We compare the results obtained by the proposed method with those found using the local misspecification method. We apply the local misspecification and least false methods to estimate the bias and sensitivity of parameter estimates in a clinical trial example. PubDate: Mon, 01 Jul 2019 09:05:38 +000

Abstract: Different versions of the control chart are available under various ranked set sampling strategies. In these control charts, computation of the performance measures has been carried out through the Monte Carlo simulation method (MCSM). In this article, we define a generalized structure for control charts under variant sampling strategies and derive their different performance measures. For the derivation of these performance measures, we propose a pivotal quantity. For comparative analysis, we present results for the generalized performance measures computed by a numerical method (NM). We find that the values of the generalized performance measures based on the NM are almost identical to those based on the MCSM. Moreover, the NM is time efficient and can be considered an alternative to the MCSM. PubDate: Tue, 04 Jun 2019 11:06:17 +000

Abstract: Owing to boundary effects, kernel density estimators are often not consistent when estimating a density near a finite endpoint of the support of the density to be estimated. To address this, researchers have proposed applying an optimal bandwidth to balance the bias-variance trade-off in estimation of a finite population mean. This, however, does not eliminate the boundary bias. In this paper, a weighting method for compensating for nonresponse is proposed. Asymptotic properties of the proposed estimator of the population mean are derived. Under mild assumptions, the estimator is shown to be asymptotically consistent. PubDate: Sun, 02 Jun 2019 00:08:11 +000
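The boundary bias that motivates this abstract is easy to exhibit. The sketch below uses the classical reflection device at a known boundary (a standard textbook correction, not the paper's nonresponse-weighting estimator), with an illustrative positive-valued sample:

```python
import math

def gauss_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h, reflect_at=None):
    """Kernel density estimate at x; optionally add copies of the data
    reflected about a known boundary to reduce boundary bias."""
    pts = list(data)
    if reflect_at is not None:
        pts += [2 * reflect_at - d for d in data]
    return sum(gauss_kernel((x - d) / h) for d in pts) / (len(data) * h)

data = [0.1, 0.3, 0.5, 0.9, 1.4, 2.2]  # illustrative sample on [0, inf)
naive = kde(0.0, data, 0.5)
refl = kde(0.0, data, 0.5, reflect_at=0.0)
# At the boundary point itself, reflection about 0 exactly doubles the
# naive estimate for a symmetric kernel, since K(-u) = K(u); the naive
# estimator loses roughly half its mass beyond the boundary there.
```

This illustrates why naive kernel estimates systematically underestimate a density at a support endpoint, the starting point of the paper's argument.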

Abstract: In this paper, we are interested in estimating several quantiles simultaneously in a regression context via the Bayesian approach. Assuming that the error term has an asymmetric Laplace distribution and using the relation between two distinct quantiles of this distribution, we propose a simple, fully Bayesian method that satisfies the noncrossing property of quantiles. For implementation, we use a Metropolis-Hastings-within-Gibbs algorithm to sample the unknown parameters from their full conditional distributions. The performance and competitiveness of the proposed method relative to alternatives are shown in simulated examples. PubDate: Sun, 02 Jun 2019 00:00:00 +000
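The asymmetric Laplace working likelihood used above corresponds to the quantile check (pinball) loss, whose minimizer over constants is the tau-th quantile. A minimal sketch:

```python
def check_loss(u, tau):
    """Quantile (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}).
    Minimizing its expectation yields the tau-th quantile, and it is the
    negative log-kernel of the asymmetric Laplace distribution."""
    return u * (tau - (1.0 if u < 0 else 0.0))

# For tau = 0.9, under-prediction (positive residual) is penalized
# heavily, over-prediction only lightly:
print(check_loss(2.0, 0.9))   # 1.8
print(check_loss(-2.0, 0.9))  # 0.2
```

This asymmetry is what lets a single loss family target any quantile level, and is why the asymmetric Laplace serves as a convenient working likelihood in Bayesian quantile regression.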

Abstract: In the usual quantile regression setting, the distribution of the response given the explanatory variables is unspecified. In this work, the distribution is specified, and we introduce new link functions to directly model specified quantiles of seven one-parameter continuous distributions. Using the vector generalized linear and additive model (VGLM/VGAM) framework, we transform certain prespecified quantiles to become linear or additive predictors. Our parametric quantile regression approach adopts VGLMs/VGAMs because they can handle multiple linear predictors and encompass many distributions beyond the exponential family. Coupled with the ability to fit smoothers, the underlying strong distributional assumption can be relaxed so as to offer a semiparametric-type analysis. By allowing multiple linear and additive predictors simultaneously, the quantile crossing problem can be avoided by enforcing parallelism constraint matrices. This article gives details of a software implementation called the VGAMextra package for R. Both the data and the recently developed software used in this paper are freely downloadable from the internet. PubDate: Tue, 07 May 2019 09:05:22 +000

Abstract: Using the bootstrap method, we construct nonparametric prediction intervals for the Conditional Value-at-Risk of returns that admit a heteroscedastic location-scale model, where the location and scale functions are smooth and the error term has an unknown distribution and is assumed to be uncorrelated with the independent variable. The prediction interval performs well for large sample sizes and is relatively narrow, which is consistent with results in the literature. PubDate: Tue, 07 May 2019 08:05:13 +000
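As background for the interval above, the empirical Conditional Value-at-Risk itself takes only a few lines. This is a generic plug-in estimator sketch (loss convention: larger values are worse), not the authors' bootstrap procedure:

```python
def empirical_cvar(losses, alpha=0.95):
    """Empirical Conditional Value-at-Risk (expected shortfall): the
    average of the worst (1 - alpha) fraction of the losses."""
    srt = sorted(losses)
    k = max(1, round(len(srt) * (1 - alpha)))  # size of the tail to average
    tail = srt[-k:]
    return sum(tail) / len(tail)

losses = list(range(1, 11))  # illustrative losses 1..10
print(empirical_cvar(losses, alpha=0.80))  # mean of the worst 2 losses: 9.5
```

A bootstrap prediction interval would resample the data, recompute this statistic on each resample, and take percentiles of the resulting distribution.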

Abstract: The paper addresses the issue of identifying the maximum safe dose in the context of noninferiority trials where several doses of toxicological compounds exist. Statistical methodology for identifying the maximum safe dose is available for three-arm noninferiority designs with only one experimental drug treatment. An extension of this methodology to several experimental groups exists, but it requires multiplicity adjustment. However, if the experimental or treatment groups can be ordered a priori according to their treatment effects, then multiplicity adjustment is unnecessary. Assuming homogeneity of variances across dose groups under normality, we employ the generalized Fieller's confidence interval method in a stepwise multiple comparison procedure that incorporates the partitioning principle in order to control the familywise error rate (FWER). Simulation results reveal that the procedure properly controls the FWER in the strong sense. Moreover, the power of our procedure increases with increasing sample size and ratio of mean differences. We illustrate our procedure with a mutagenicity dataset from a clinical study. PubDate: Sun, 14 Apr 2019 07:05:20 +000

Abstract: Using the Pairwise Absolute Clustering and Sparsity (PACS) penalty, we propose a regularized quantile regression (QR) method, QR-PACS. The PACS penalty achieves both the elimination of insignificant predictors and the combination of predictors with indistinguishable coefficients (IC), which are two key issues in the search for the true model. QR-PACS extends PACS from mean regression settings to QR settings. The paper shows that QR-PACS can yield promising predictive precision as well as identify related groups in both simulated and real data. PubDate: Wed, 10 Apr 2019 14:05:26 +000

Abstract: This article introduces a generalization of the inverse Rayleigh distribution, known as the exponentiated inverse Rayleigh distribution (EIRD), which provides a more flexible distribution for modeling life data. Some statistical properties of the EIRD are investigated, such as the mode, quantiles, moments, reliability, and hazard function. We describe different methods of parametric estimation for the EIRD using maximum likelihood estimators, percentile-based estimators, least squares estimators, and weighted least squares estimators. The performances of these estimation methods are compared by Monte Carlo simulations for both small and large samples. To illustrate the methods in a practical application, we analyze real-world data on coating weights of iron sheets obtained from the ALAF industry, Tanzania, during January-March 2018; ALAF uses aluminum-zinc galvanization technology in its coating process. This application identifies the EIRD as a better model than other well-known distributions for modeling lifetime data. PubDate: Mon, 01 Apr 2019 13:05:35 +000

Abstract: In this paper, we use a statistical mechanical model as a paradigm for educational choices when the reference population is partitioned according to the socioeconomic attributes of gender and residence. We study how educational attainment is influenced by these attributes for five selected developing countries. The model has a social incentive part and a private incentive part, with coefficients measuring, respectively, the influence individuals have on each other and the external influence on individuals. The methods of partial least squares and ordinary least squares are used, respectively, to estimate the parameters of the interacting and noninteracting models. This work differs from the earlier work that motivated it in the following respects: (a) the reference population is divided into subgroups of unequal sizes, (b) the proportion of individuals in each subgroup may depend on the population size, and (c) the method of partial least squares is used to estimate the parameters of the model with social interaction, as opposed to the least squares method used in the earlier work. PubDate: Mon, 04 Mar 2019 09:05:34 +000

Abstract: Without the ability to use research tools and procedures that yield consistent measurements, researchers would be unable to draw conclusions, formulate theories, or make claims about the generalizability of their results. In statistics, the coefficient of variation is commonly used as an index of the reliability of measurements; thus, comparing coefficients of variation is of special interest. Moreover, the lognormal distribution has frequently been used for modeling data from many fields, such as health and medical research. In this paper, we propose a simulated Bartlett-corrected likelihood ratio approach to obtain inference concerning the ratio of two coefficients of variation for the lognormal distribution. Simulation studies show that the proposed method is extremely accurate even when the sample size is small. PubDate: Sun, 03 Mar 2019 09:05:34 +000
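A useful fact behind this abstract is that the lognormal coefficient of variation depends only on the log-scale parameter, so the ratio of two CVs reduces to a function of two sigmas. A minimal sketch (illustrative parameter values, not the paper's Bartlett-corrected inference):

```python
import math

def lognormal_cv(sigma):
    """Coefficient of variation of a lognormal(mu, sigma^2) distribution:
    CV = sqrt(exp(sigma^2) - 1), notably free of the location mu."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)

# The ratio of two CVs therefore depends only on the two log-scale sigmas:
ratio = lognormal_cv(1.0) / lognormal_cv(0.5)
```

Inference on the CV ratio is thus equivalent to inference on a function of the two log-scale variances, which is what likelihood ratio approaches exploit.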

Abstract: In mean-based approaches to dietary data analysis, potentially important associations at the tails of the intake distribution, where inadequacy or excess is greatest, can be obscured by unobserved heterogeneity. Participants in the upper or lower tails of the dietary intake distribution will potentially show the greatest change in behavior when presented with a health behavior intervention; thus, alternative statistical methods for modeling these relationships are needed to fully describe the impact of an intervention. Using data from the Tu Salud ¡Si Cuenta! (Your Health Matters!) at Home Intervention, we aimed to compare traditional mean-based regression to quantile regression for describing the impact of a health behavior intervention on healthy and unhealthy eating indices. The mean-based regression model identified no differences in dietary intake between the intervention and standard care groups. In contrast, the quantile regression indicated a nonconstant relationship between the unhealthy eating index and study group at the upper tail of the unhealthy eating index distribution. The traditional mean-based linear regression was unable to fully describe the intervention effect on healthy and unhealthy eating, resulting in a limited understanding of the association. PubDate: Sun, 03 Mar 2019 07:05:53 +000

Abstract: The one-sided and two-sided Shewhart w-of-w standard and improved runs-rules monitoring schemes for the mean of normally distributed observations from independent and identically distributed (iid) samples are investigated from an overall performance perspective, i.e., the expected weighted run-length (EWRL), for every possible positive integer value of w. The main objective of this work is to use the Markov chain methodology to formulate a unified theoretical approach for designing and evaluating Shewhart w-of-w standard and improved runs-rules for one-sided and two-sided schemes in both the zero-state and steady-state modes. The main findings of this paper are as follows: (i) the zero-state and steady-state ARL and initial probability vectors of some of the one-sided and two-sided Shewhart w-of-w standard and improved runs-rules schemes are theoretically similar in design; however, their empirical performances differ; and (ii) unlike previous studies that use the ARL only, we base our recommendations on the zero-state and steady-state EWRL metrics, and we observe that the steady-state improved runs-rules schemes tend to yield better performance than the other competing schemes considered, separately for one-sided and two-sided schemes. Finally, the zero-state and steady-state unified run-length equations derived here can easily be used to evaluate other monitoring schemes based on a variety of parametric and nonparametric distributions. PubDate: Tue, 19 Feb 2019 14:05:11 +000

Abstract: Background. Evaluation of diagnostic assays and of the predictive performance of biomarkers based on the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) is vital in diagnostic and targeted medicine. The partial area under the curve (pAUC) is an alternative metric focusing on a range of practical and clinical relevance of the diagnostic assay. In this article, we adopt and extend the min-max method to the estimation of the pAUC when multiple continuous-scaled biomarkers are available, and compare the performance of our proposed approach with existing approaches via simulations. Methods. We conducted extensive simulation studies to investigate the performance of different methods for the combination of biomarkers based on their abilities to produce the largest pAUC estimates. Data were generated from different multivariate distributions with equal and unequal variance-covariance matrices. Different shapes of the ROC curves, false positive fraction ranges, and sample size configurations were considered. We obtained the mean and standard deviation of the pAUC estimates through re-substitution and leave-one-pair-out cross-validation. Results. Our results demonstrate that the proposed method provides the largest pAUC estimates under the following three important practical scenarios: (1) multivariate normally distributed data for nondiseased and diseased participants have unequal variance-covariance matrices; or (2) the ROC curves generated from individual biomarkers are relatively close, regardless of the latent normality distributional assumption; or (3) the ROC curves generated from individual biomarkers have straight-line shapes. Conclusions. The proposed method is robust, and investigators are encouraged to use this approach in the estimation of the pAUC for many practical scenarios. PubDate: Sun, 03 Feb 2019 00:14:54 +000
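The AUC underlying the pAUC analysis above is the Mann-Whitney probability that a diseased score exceeds a nondiseased one. A minimal empirical sketch with toy scores (not the paper's min-max biomarker combination):

```python
def empirical_auc(diseased, nondiseased):
    """Empirical AUC as the Mann-Whitney probability that a diseased score
    exceeds a nondiseased score, counting ties as one half."""
    wins = 0.0
    for d in diseased:
        for n in nondiseased:
            if d > n:
                wins += 1.0
            elif d == n:
                wins += 0.5
    return wins / (len(diseased) * len(nondiseased))

print(empirical_auc([3, 4, 5], [1, 2, 3]))  # 8.5 / 9, about 0.944
```

The pAUC restricts the same construction to a clinically relevant false positive fraction range rather than integrating over the whole ROC curve.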

Abstract: The Berry-Esseen bound for a random variable based on the sum of squared sample correlation coefficients, used to test complete independence in high dimensions, is derived by Stein's method. Although the Berry-Esseen bound applies uniformly over all real numbers, a nonuniform bound at a fixed real number usually provides a sharper bound. In this paper, we present the first nonuniform bound on the normal approximation for this random variable, with an optimal rate, obtained by using Stein's method. PubDate: Sun, 03 Feb 2019 00:00:00 +000

Abstract: We define a new four-parameter model called the odd log-logistic generalized inverse Gaussian distribution, which extends the generalized inverse Gaussian and inverse Gaussian distributions. We obtain some structural properties of the new distribution. We construct an extended regression model based on this distribution with two systematic structures, which can provide more realistic fits to real data than other special regression models. We adopt the method of maximum likelihood to estimate the model parameters. In addition, various simulations are performed for different parameter settings and sample sizes to check the accuracy of the maximum likelihood estimators. We provide a diagnostic analysis based on case deletion and quantile residuals. Finally, the potential of the new regression model to predict the price of urban property is illustrated by means of real data. PubDate: Thu, 10 Jan 2019 12:05:50 +000

Abstract: In this paper, an R-wave peak interval-independent atrial fibrillation detection algorithm is proposed, based on analyzing the synchronization features of the electrocardiogram signal with a deep neural network. First, the synchronization feature of each heartbeat of the electrocardiogram signal is constructed by a Recurrence Complex Network. Then, a convolutional neural network is used to detect atrial fibrillation by analyzing the eigenvalues of the Recurrence Complex Network. Finally, a voting algorithm is developed to improve the performance of beat-wise atrial fibrillation detection. The MIT-BIH atrial fibrillation database is used to evaluate the performance of the proposed method. Experimental results show that the sensitivity, specificity, and accuracy of the algorithm reach 94.28%, 94.91%, and 94.59%, respectively. Remarkably, the proposed method is more effective than traditional algorithms in handling individual variation in atrial fibrillation detection. PubDate: Thu, 03 Jan 2019 09:05:34 +000
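Recurrence-network features like those above start from a recurrence matrix over the signal samples. A minimal sketch with an illustrative threshold `eps` (not the paper's Recurrence Complex Network construction):

```python
def recurrence_matrix(x, eps):
    """Binary recurrence matrix R[i][j] = 1 if |x[i] - x[j]| < eps,
    the basic building block of recurrence-based signal analysis."""
    n = len(x)
    return [[1 if abs(x[i] - x[j]) < eps else 0 for j in range(n)]
            for i in range(n)]

sig = [0.0, 0.1, 1.0, 0.05]  # illustrative signal samples
R = recurrence_matrix(sig, 0.2)
# R[0][1] == 1: nearby samples recur; R[0][2] == 0: distant samples do not.
```

Interpreting R as the adjacency matrix of a network turns signal regularity into graph structure, which is what downstream eigenvalue features summarize.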