American Journal of Biostatistics
Open Access journal
ISSN (Print) 1948-9889 - ISSN (Online) 1948-9897
Published by Science Publications
- IS POISSON DISPERSION DILUTED OR OVER-SATURATED? AN INDEX IS CREATED
Abstract: A prelude to interpreting a pattern in repeated incidences is identifying the underlying frequency distribution of the collected data. A case in point is the Poisson distribution, which is often selected for medical count data such as gene mutations, medication errors and the number of ambulatory pickups in a day. The Poisson distribution requires the variance to equal the mean. The variance signifies the volatility in the occurrences, so an implication is that volatility is greater when the average incidence is higher. When this requirement of functional equivalence between the Poisson mean and variance is breached, the data deviate from a Poisson distribution. How could a data analyst recognize, and point out to the medical team, the degree to which this requirement is diluted in their data? For this purpose, a simple geometric approach is developed in this article and illustrated with several historical data sets from the literature.
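The article's geometric index is not reproduced here. For comparison, a minimal sketch of the classical index of dispersion (sample variance divided by sample mean), which should be close to 1 for Poisson counts, is shown below; the function names are illustrative and this is a standard check, not the authors' method.

```python
import statistics


def dispersion_index(counts):
    """Sample variance divided by sample mean (close to 1 for Poisson counts)."""
    mean = statistics.fmean(counts)
    variance = statistics.variance(counts)  # uses the n - 1 denominator
    return variance / mean


def dispersion_statistic(counts):
    """Classical dispersion statistic (n - 1) * variance / mean and its df.

    Under a Poisson model the statistic is approximately chi-square with
    n - 1 degrees of freedom; values far above n - 1 suggest over-dispersion
    and values far below suggest under-dispersion.
    """
    n = len(counts)
    return (n - 1) * dispersion_index(counts), n - 1
```

An index well above 1 points toward over-dispersion (saturation) and one well below 1 toward under-dispersion (dilution).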
- CONFIDENCE INTERVALS FOR SIGNAL TO NOISE RATIO OF A POISSON DISTRIBUTION
Abstract: The Poisson distribution is one of the most useful probability distributions for fitting rare-event data. Confidence intervals for the signal-to-noise ratio (SNR) are an important issue among researchers in image processing. This study considers several confidence intervals for the SNR of a Poisson distribution. Different confidence intervals available in the literature are reviewed and compared based on coverage probability and average interval width. Since a theoretical comparison is not possible, a simulation study was conducted to compare the performance of the interval estimators. The simulation study showed that most of the proposed intervals, except the Wald, Waldz and bootstrap methods, perform well in the sense of attaining the nominal size, and these are recommended to researchers. The exact method performed best, followed by VSS, Wald B and Bayes, in the sense of attaining the nominal size with shorter width when the SNR is large.
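For a Poisson distribution with mean λ the standard deviation is √λ, so the SNR (mean divided by standard deviation) is √λ. Because √λ is monotone in λ, one way to obtain an exact interval for the SNR is to transform the exact (Garwood) interval for λ based on the Poisson total of n observations. The sketch below assumes that definition of SNR and uses only the standard library; it is illustrative and not necessarily identical to the exact method compared in the study.

```python
import math


def poisson_cdf(k, mu):
    """P(T <= k) for T ~ Poisson(mu), summed term by term.

    Adequate for moderate totals; exp(-mu) underflows for mu above about 700.
    """
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _bisect_increasing(f, lo, hi, tol=1e-10):
    """Root of an increasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_snr_interval(counts, alpha=0.05):
    """Exact interval for SNR = sqrt(lambda) from i.i.d. Poisson counts.

    The total T = sum(counts) is Poisson(n * lambda); the Garwood limits for
    n * lambda are found by bisection and then mapped to sqrt(limit / n).
    """
    n, t = len(counts), sum(counts)
    search_hi = t + 20 * math.sqrt(t + 1) + 20  # comfortably above the upper limit
    if t == 0:
        mu_lo = 0.0
    else:
        # Lower limit: P(T >= t) = alpha / 2, which increases with the mean.
        mu_lo = _bisect_increasing(
            lambda m: (1 - poisson_cdf(t - 1, m)) - alpha / 2, 0.0, search_hi)
    # Upper limit: P(T <= t) = alpha / 2; this tail decreases with the mean.
    mu_hi = _bisect_increasing(
        lambda m: alpha / 2 - poisson_cdf(t, m), 0.0, search_hi)
    return math.sqrt(mu_lo / n), math.sqrt(mu_hi / n)
```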
- CORRELATION BETWEEN DISPERSION AND MEAN TO ASSESS HEALTHCARE SERVICE
Abstract: Motivation for this research started while helping a hospital administrator assess whether the patient-oriented activity duration, X ≥ 0, reflects service efficiency. A higher sample mean duration, x̄, implies lower productivity in the hospital and higher healthcare cost. Likewise, a larger sample dispersion, s_x², in the service durations indicates lower reliability and inefficiency. Of course, the dispersion s_x² in a hospital operation could be due to diverse medical complications among patients or to operational inefficiency. Assuming that it is not due to diverse medical complications, how should the pertinent information be extracted from the data, quantified and interpreted to address inefficient operation? This is the problem statement of this article. Specifically, in an inefficient hospital operation the sample dispersion and mean of service durations are likely to be highly correlated, so their correlation is a clue for identifying an inefficient operation. Currently there is no appropriate formula in the literature for computing this correlation. The aim of this article is therefore to derive a working formula for the correlation between the sample dispersion and mean. The dispersion is too valuable a statistical measure to dismiss quickly, not only in healthcare operations but also in engineering, economics, business, social and sports applications. The approach starts by quantifying a general relationship between the dispersion and mean in a given data set. This relationship might be linear, quadratic, cubic or of higher degree. Suppose that the dispersion, σ², is a function, f(μ), of the mean, μ, of patient-oriented activity durations. The specific functional form depends on the frequency pattern of the data.
The tangent at a point on the curve relating the two is either a declining or an inclining line with an angle θ whose cosine is the correlation between the mean, x̄, and the dispersion, s_x². An expression for computing this angle does not appear in the literature. Therefore, this article derives a general expression based on geometric concepts and then obtains specific formulas for several count and continuous distributions. These expressions are foundations for further data analyses. To initiate, promote or maintain an efficient service operation for patients in a hospital, practical strategies have to be formulated based on the clue given by the correlation value. For this purpose, a one-to-one relationship between the sample dispersion and mean could be used to improve service efficiency. In this process, a formula is developed to check whether the model parameters are orthogonal. The curvature and the shifting angle in the relationship between dispersion and mean are captured when the mean changes by one unit. Both the Poisson and exponential distributions are used to illustrate the concepts and the derived expressions. Efficient healthcare service is a necessity not only in the USA but also in other nations because of escalating demand from medical tourists in this era of globalized medical treatment. A reformation of the entire healthcare field could be achievable with the help of biostatistical concepts and tools. The correlation is a tool for extracting and comprehending the pertinent information in patient-oriented activity durations, and this information holds the key to the much-needed reformation and operational efficiency. This article illustrates with data that the correlation between the data mean and dispersion provides clues for assessing healthcare service efficiency. Similar applications occur in engineering, business and science.
- THE DEVELOPMENT OF PARAMETER ESTIMATION ON HAZARD RATE OF TRIVARIATE
Abstract: In this study, the interrelated concepts of the trivariate distribution function, trivariate survival function, trivariate probability density function and trivariate hazard rate function of the trivariate Weibull distribution are presented. The goal of this contribution is to estimate the trivariate Weibull hazard rate parameters. To reach this goal, we use an analytical estimation approach, the Maximum Likelihood Estimation (MLE) method. The scale parameters, the shape parameters and the power parameter of the trivariate Weibull hazard rate are obtained by a numerical iterative procedure. The MLE technique estimates the trivariate Weibull hazard rate parameters accurately.
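The trivariate likelihood is not reproduced here. As a simplified illustration of the numerical iterative MLE step, the sketch below fits a univariate Weibull distribution (shape k, scale λ) by solving the standard profile-likelihood equation for k with bisection, then evaluates the hazard h(t) = (k/λ)(t/λ)^(k−1). Treating a single margin in isolation is an assumption made for brevity; the power parameter of the trivariate model has no counterpart here, and the function names are illustrative.

```python
import math
import random
import statistics


def simulate_weibull(n, scale, shape, seed=0):
    """Draw n Weibull variates; random.weibullvariate takes (scale, shape)."""
    rng = random.Random(seed)
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


def weibull_mle(data, lo=0.01, hi=50.0, tol=1e-10):
    """Univariate Weibull MLE (shape k, scale lam).

    Solves the profile-likelihood equation
        sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0
    for k by bisection (the left side increases in k), then sets
    lam = (mean(x^k))^(1/k).
    """
    top = max(data)
    z = [x / top for x in data]  # the equation is scale-invariant; rescaling avoids overflow
    mean_log = statistics.fmean(math.log(v) for v in z)

    def score(k):
        w = [v ** k for v in z]
        return sum(wi * math.log(v) for wi, v in zip(w, z)) / sum(w) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    lam = top * statistics.fmean(v ** k for v in z) ** (1.0 / k)
    return k, lam


def weibull_hazard(t, k, lam):
    """Weibull hazard h(t) = (k / lam) * (t / lam)^(k - 1)."""
    return (k / lam) * (t / lam) ** (k - 1)
```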
- Whether Gaussian Nucleus Entropy Helps? Case in Point is Prediction of Number of Cesarean Births
Abstract: In this article, the entropy in the collected data about the Gaussian population mean is traced from its embryonic stage as new data are periodically collected. Traditional Shannon entropy has shortcomings from a data analytics point of view, which creates the need to refine it. The refined version is named Gaussian Nucleus Entropy in this article, and its advantages are pointed out. The prior, likelihood, posterior and predictive nucleus entropies are derived, interconnected and interpreted. The results are illustrated using data on cesarean births in thirteen countries over the period 1987-2007. Medical communities and families are alarmed because cesarean births are increasing not on an emergency or necessity basis but rather for monetary or convenience reasons. Nucleus entropy based data analysis answers whether their alarm is baseless.
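Gaussian Nucleus Entropy itself is not reproduced here. As a baseline for tracing entropy from its embryonic stage, the sketch below tracks the ordinary Shannon (differential) entropy, ½ ln(2πeσ²), of the posterior for a Gaussian mean as batches of data arrive, assuming a normal prior and a known data variance (assumptions, not the article's setup); the function names are illustrative.

```python
import math


def gaussian_entropy(variance):
    """Shannon (differential) entropy of a normal distribution, in nats."""
    return 0.5 * math.log(2 * math.pi * math.e * variance)


def posterior_entropies(batches, prior_var, sigma2):
    """Entropy of the posterior for a Gaussian mean after each batch of data.

    Assumes a normal prior with variance prior_var and a known data variance
    sigma2, so each observation adds 1 / sigma2 to the posterior precision.
    Returns the prior entropy followed by one value per batch.
    """
    precision = 1.0 / prior_var
    entropies = [gaussian_entropy(prior_var)]
    for batch in batches:
        precision += len(batch) / sigma2
        entropies.append(gaussian_entropy(1.0 / precision))
    return entropies
```

In this conjugate setting the posterior variance, and therefore the Shannon entropy, depends only on how many observations have arrived, not on their values.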
- A Bayesian Adaptive Design for Combination of Three Drugs in Cancer Phase I Clinical Trials
Abstract: We describe a Bayesian adaptive design for early-phase cancer trials of a combination of three agents. This extends earlier work by the authors by allowing all three agents to vary during the trial and by assigning different drug combinations to cohorts of three patients. The primary objective is to estimate the Maximum Tolerated Dose (MTD) surface in three-dimensional Cartesian space. A class of linear models on the logit of the probability of Dose Limiting Toxicity (DLT) is used to describe the relationship between the doses of the three drugs and the probability of DLT. The trial proceeds using conditional escalation with overdose control: at each stage, we seek a dose of one agent using the current posterior distribution of the MTD of this agent given the current doses of the other two agents. At the end of the trial, the MTD surface is estimated as a function of the Bayes estimates of the model parameters. Operating characteristics are evaluated with respect to trial safety and the percentage of dose recommendations in dose-combination neighborhoods around the true MTD surface.
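A minimal sketch of one conditional escalation step, assuming (for illustration only) the main-effects model logit P(DLT) = b0 + b1·x1 + b2·x2 + b3·x3 with b1 > 0 and a set of posterior draws of (b0, b1, b2, b3) obtained elsewhere, for example by MCMC. Solving the model for x1 at the target toxicity θ gives the conditional MTD of agent 1; in escalation with overdose control the next dose is the α-quantile of its posterior distribution. The model form, α = 0.25 and the function names are assumptions, not the authors' specification.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def conditional_mtd(beta, theta, x2, x3):
    """Dose of agent 1 giving P(DLT) = theta, given the doses of agents 2 and 3.

    Assumes logit P(DLT) = b0 + b1*x1 + b2*x2 + b3*x3 with b1 > 0.
    """
    b0, b1, b2, b3 = beta
    return (logit(theta) - b0 - b2 * x2 - b3 * x3) / b1


def ewoc_next_dose(posterior_draws, theta, x2, x3, alpha=0.25):
    """Overdose-control recommendation: the alpha-quantile (lower empirical
    quantile) of the conditional MTD computed for each posterior draw."""
    mtds = sorted(conditional_mtd(b, theta, x2, x3) for b in posterior_draws)
    idx = max(0, math.ceil(alpha * len(mtds)) - 1)
    return mtds[idx]
```

Using a low quantile rather than the posterior mean is what limits the predicted chance of exceeding the MTD to about α.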
- Determination of Predictors Associated With HIV/AIDS Patients on ART Using Accelerated Failure Time Model for Interval Censored Survival Data
Abstract: The main objective of this paper is to identify the independent predictors affecting the survival of HIV/AIDS-infected patients on Antiretroviral Therapy (ART), an interval-censored event time outcome. A total of 2052 HIV/AIDS patients who were on ART at Ram Manohar Lohia Hospital, New Delhi, India, between April 2004 and December 2010 were included in the analysis. Accelerated Failure Time Models (AFTM), viz., exponential, Weibull, lognormal and loglogistic models for interval-censored survival data, were used to determine the significant predictors for HIV/AIDS-infected patients. The best model was selected on the basis of Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values. Of the 2052 HIV/AIDS patients, 65.4% were male and 34.6% were female. A majority (93.7%) of patients had CD4 cell counts below 350 cells/mm3 at the initiation of ART. The mean age of patients at diagnosis was 34.28±8.19 years. The prognostic factors, viz., age, sex, CD4 cell count, past smoking, baseline hemoglobin and baseline BMI, were found to be statistically significant (p < 0.001) for HIV/AIDS patients on ART. Hence, special attention is needed for patients with low CD4 cell counts, low BMI and low hemoglobin. The lognormal AFT model was found to be the best model for identifying the independent predictors of survival in the HIV population.
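To illustrate how interval-censored data enter such models, the sketch below computes the lognormal log-likelihood in which each patient contributes log(F(R) − F(L)) for an event known only to lie in (L, R], together with the usual AIC = 2k − 2 log L and BIC = k log n − 2 log L used for model selection. In an AFT model μ would be a linear predictor in the covariates; a single μ is assumed here for brevity, and the function names are illustrative.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_interval_loglik(intervals, mu, sigma):
    """Log-likelihood of interval-censored times (L, R] under a lognormal model.

    Each subject contributes log(F(R) - F(L)); L = 0 means left-censored and
    R = math.inf means right-censored. Intervals must have positive probability.
    """
    def cdf(t):
        if t <= 0:
            return 0.0
        if math.isinf(t):
            return 1.0
        return norm_cdf((math.log(t) - mu) / sigma)

    return sum(math.log(cdf(r) - cdf(l)) for l, r in intervals)


def aic(loglik, k):
    """Akaike Information Criterion for k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian Information Criterion for k parameters and n subjects."""
    return k * math.log(n) - 2 * loglik
```

The model with the smallest AIC or BIC among the exponential, Weibull, lognormal and loglogistic fits would be preferred.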
- Trial Design and Analysis with Incomplete Paired Data
Abstract: Clinical trials with paired data often involve missing observations, in which case the trial data become a mixture of paired and unpaired data. A commonly used approach to analyzing such data is to ignore the incomplete pairs, which is not statistically efficient. We propose a simple method that uses all data, including the incomplete pairs. The method is optimal in the sense that it minimizes the variance. We show how to design classical and adaptive trials with the proposed method. The proposed method can also be used for meta-analysis in which some trials have paired data and some do not.
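The abstract does not spell out the estimator. One standard way to use incomplete pairs, shown here only as an illustration, is to compute the mean paired difference from complete pairs and a two-sample difference from the unpaired observations (independent, because they come from different subjects) and combine them with inverse-variance weights; among linear combinations of two independent unbiased estimators, these weights minimize the variance. The weights are estimated from the data, so the stated variance is approximate, and the function name is hypothetical.

```python
import statistics


def combined_estimate(pairs, x_only, y_only):
    """Inverse-variance combination of paired and unpaired estimates.

    pairs: list of (x, y) complete pairs; x_only, y_only: observations whose
    partner is missing. Returns (estimate of mean(x) - mean(y), its variance).
    """
    diffs = [x - y for x, y in pairs]
    est_paired = statistics.fmean(diffs)
    var_paired = statistics.variance(diffs) / len(diffs)
    est_unpaired = statistics.fmean(x_only) - statistics.fmean(y_only)
    var_unpaired = (statistics.variance(x_only) / len(x_only)
                    + statistics.variance(y_only) / len(y_only))
    w1, w2 = 1 / var_paired, 1 / var_unpaired
    return (w1 * est_paired + w2 * est_unpaired) / (w1 + w2), 1 / (w1 + w2)
```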
- Interface between the Ratio β with Area Under the ROC Curve and Kullback-Leibler Divergence Under the Combination of Half Normal and
Abstract: Classifying objects or individuals is a common problem of interest. The Receiver Operating Characteristic (ROC) curve is one tool that helps classify objects or individuals into one of two known groups or populations. The present work proposes a hybrid version of the ROC model. The test scores of the two populations, normal and abnormal, usually follow some particular distribution; in this study, the test scores of the normal population are assumed to follow a half-normal distribution and those of the abnormal population a Rayleigh distribution. The characteristics of the proposed ROC model, along with measures such as the area under the curve (AUC) and the Kullback-Leibler divergence (KLD), are derived and demonstrated using a real data set and simulated data sets.
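Assume higher scores indicate the abnormal population, normal scores X follow a half-normal distribution with scale σ₁ (the absolute value of a N(0, σ₁²) variable) and abnormal scores Y follow a Rayleigh distribution with scale σ₂. Then the AUC is P(Y > X). Since P(Y > x) = exp(−x²/(2σ₂²)), integrating against the half-normal density gives AUC = σ₂/√(σ₁² + σ₂²), which depends only on the ratio of the scales. This derivation is a sketch under those parameterization assumptions, checked below by simulation; it is not necessarily the form used in the article, and the function names are illustrative.

```python
import math
import random


def auc_halfnormal_rayleigh(sigma_normal, sigma_abnormal):
    """AUC = P(Y > X), X ~ half-normal(sigma_normal), Y ~ Rayleigh(sigma_abnormal)."""
    return sigma_abnormal / math.hypot(sigma_normal, sigma_abnormal)


def auc_monte_carlo(sigma_normal, sigma_abnormal, n=100000, seed=7):
    """Fraction of simulated (X, Y) pairs with Y > X."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        x = abs(rng.gauss(0.0, sigma_normal))  # half-normal draw
        # Rayleigh draw by inversion: sigma * sqrt(-2 ln U), U in (0, 1]
        y = sigma_abnormal * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        wins += y > x
    return wins / n
```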
- Adaptive Superiority and Noninferiority Trial Design with Paired Binary
Abstract: Noninferiority of a diagnostic test to the standard is a common issue in medical research. For instance, we may be interested in determining whether a new diagnostic test is noninferior to the standard reference test because the new test might be inexpensive enough that a small inferiority margin in sensitivity or specificity is acceptable. Noninferiority trials are also useful in clinical studies, such as imaging studies, where the data are collected in pairs. Conventional noninferiority trials for paired binary data are designed with a fixed sample size, and no interim analysis is allowed. Adaptive designs, which allow interim modifications of the trial, have become very popular in recent years and are widely used in clinical trials because of their efficiency. However, to our knowledge, no adaptive design method is available for noninferiority trials with paired binary data. In this study, we developed an adaptive design method for noninferiority trials with paired binary data, which can also be used for superiority trials when the noninferiority margin is set to zero. We include a trial example and provide the SAS program for the design simulations.
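The abstract does not give the test statistic, and the adaptive machinery and SAS program are not reproduced. A minimal sketch of a Wald-type statistic for a fixed-sample comparison of two correlated proportions (for example, the sensitivities of the new and standard tests in the same diseased subjects) shows how a noninferiority margin δ enters and how δ = 0 reduces to a superiority test. The discordant counts n10 and n01 and the function names are illustrative; score-type statistics are often preferred in small samples.

```python
import math


def paired_noninferiority_z(n10, n01, n, margin):
    """Wald-type z for H0: p_new - p_std <= -margin with paired binary data.

    n10: pairs positive on the new test only; n01: pairs positive on the
    standard test only; n: total number of pairs. margin = 0 gives an
    ordinary one-sided superiority test.
    """
    diff = (n10 - n01) / n                               # estimated p_new - p_std
    var = ((n10 + n01) - (n10 - n01) ** 2 / n) / n ** 2  # estimated Var(diff)
    return (diff + margin) / math.sqrt(var)


def one_sided_p(z):
    """Upper-tail standard normal p-value."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```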
- General Linear Models in a Missing Outcome Environment of Clinical Trials Incorporating with Splines for Time-Invariant Continuous Adjustment
Abstract: Missing data are common in longitudinal studies of health care research. Many studies have shown the potential usefulness of current missing-data analyses, e.g., (1) Complete Case (CC) analysis; (2) imputation methods such as Last Observation Carried Forward (LOCF), multiple imputation and the Expectation-Maximization algorithm; and (3) methods using all available data, such as the linear mixed model and generalized estimating equations. Nevertheless, CC analysis and LOCF imputation have remained popular because of their simplicity of execution, despite some critical drawbacks. The proposed approach employs the generalized least squares method using all available data, without deletion or imputation of missing outcomes, producing the best linear unbiased estimate. A simulation study compared the proposed approach with commonly used missing-data analyses under each missing data mechanism and showed the validity of the proposed approach, especially with the first-order autoregressive correlation structure. B-splines are applied in the proposed model to handle non-linear relationships between the outcome and a continuous covariate. An application to a cell therapy clinical trial is presented.
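A minimal sketch of the all-available-data idea: each subject contributes only the visits actually observed, with an AR(1) covariance built for those time points, and the generalized least squares estimate solves (Σ Xᵢ'Vᵢ⁻¹Xᵢ)β = Σ Xᵢ'Vᵢ⁻¹yᵢ. The linear-in-time mean, the known variance σ² and correlation ρ, and the function names are simplifying assumptions (in practice σ² and ρ are estimated, e.g. by REML), and the B-spline adjustment is omitted.

```python
def ar1_matrix(times, sigma2, rho):
    """AR(1) covariance for the visit times a subject actually attended."""
    return [[sigma2 * rho ** abs(s - t) for t in times] for s in times]


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    a is an n x n list of rows; b is an n x m list of rows. Returns X as rows.
    """
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]  # augmented matrix [A | B]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def gls_linear_trend(subjects, sigma2, rho):
    """GLS fit of y = b0 + b1 * t using every observed visit.

    subjects: list of (times, outcomes); missing visits are simply absent,
    so nothing is deleted or imputed. Returns [b0, b1].
    """
    xtvx = [[0.0, 0.0], [0.0, 0.0]]  # accumulates sum of X' V^-1 X
    xtvy = [[0.0], [0.0]]            # accumulates sum of X' V^-1 y
    for times, y in subjects:
        x = [[1.0, float(t)] for t in times]
        v = ar1_matrix(times, sigma2, rho)
        vinv = solve(v, [xi + [yi] for xi, yi in zip(x, y)])  # V^-1 [X | y]
        for i in range(2):
            for j in range(2):
                xtvx[i][j] += sum(x[k][i] * vinv[k][j] for k in range(len(times)))
            xtvy[i][0] += sum(x[k][i] * vinv[k][2] for k in range(len(times)))
    return [row[0] for row in solve(xtvx, xtvy)]
```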
- Methods for Computing Missing Item Response in Psychometric Scale
Abstract: The therapeutic potential of a new antidepressant drug is frequently evaluated with multi-item psychometric scales. The total score of a psychometric scale is calculated from the responses to multiple items, each scored on a Likert scale. Missing responses to some items are inevitable, which makes calculating the total score of a scale a problem. Different approaches can be used to handle missing item responses when constructing the total score of a psychometric scale. One approach is that if a patient has missing responses to one or more items, his/her total score is set to missing; another is that missing item responses are imputed before the scale total score is calculated. Different imputation methods can be used, each with some drawbacks. This paper compares six methods commonly used to impute missing item responses when one or more items, but no more than 50% of the items of the scale, are missing. Simulation studies indicate that substituting the mean of the completed items of the scale for a given patient is generally the most desirable method for imputing both randomly and non-randomly missing items in psychometric scale construction.
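A minimal sketch of the person-mean method favored by the simulations: missing items (None) are replaced by the mean of the patient's completed items, and the total is left missing when more than half of the items are missing. The function name and the None convention are illustrative.

```python
def total_score_person_mean(items):
    """Scale total with missing items (None) replaced by the patient's item mean.

    Returns None when no items were answered or more than 50% are missing.
    """
    observed = [v for v in items if v is not None]
    n_missing = len(items) - len(observed)
    if not observed or n_missing > len(items) / 2:
        return None
    person_mean = sum(observed) / len(observed)
    return sum(observed) + n_missing * person_mean
```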
- APPLICATION OF THE MEDIATION ANALYSIS APPROACH TO CANCER PREVENTION TRIALS RELATING TO CANCER SEVERITY
Abstract: In a cancer prevention trial, an outcome such as cancer severity cannot be evaluated in individuals who do not develop cancer. In such a situation, the principal stratification approach has been applied. Under this approach, the Principal Strata Effect (PSE) is considered, defined as the effect of treatment on the outcome in the subpopulation of individuals who would have developed cancer under either treatment arm. In this study, however, the author does not apply this approach; instead, the author discusses the mediation analysis approach, in which Natural Direct and Indirect Effects (NDE and NIE) are considered. This approach has the advantage of considering two possible mechanisms by which treatment controls cancer severity: first, the treatment may prevent an individual from getting cancer, which can be regarded as control of cancer severity; second, even if the treatment does not prevent an individual from getting cancer, it may still reduce cancer severity. The former mechanism corresponds to the NIE and the latter to the NDE, whereas the PSE can consider only the latter mechanism. Methodologies proposed in the context of vaccine trials are applied to data from a randomized prostate cancer prevention trial.
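A minimal sketch of the decomposition with a binary mediator (cancer or not), under two assumptions that are not taken from the article: severity is set to 0 for individuals without cancer, and the observed mean severity among cases in each arm can stand in for E[Y | arm, cancer] (no unmeasured mediator-outcome confounding). With the mediation formula, NDE = P(cancer | control)·(E[Y | treated, cancer] − E[Y | control, cancer]) and NIE = E[Y | treated, cancer]·(P(cancer | treated) − P(cancer | control)), and the two add up to the total effect. The inputs are hypothetical.

```python
def mediation_effects(p_cancer_treat, p_cancer_ctrl, sev_treat, sev_ctrl):
    """NDE, NIE and total effect on a composite severity outcome.

    Assumes severity = 0 without cancer; sev_* is the mean severity among
    cancer cases in each arm, used as E[Y | arm, cancer].
    """
    nde = p_cancer_ctrl * (sev_treat - sev_ctrl)           # severity reduced among cases
    nie = sev_treat * (p_cancer_treat - p_cancer_ctrl)     # cancers prevented
    total = p_cancer_treat * sev_treat - p_cancer_ctrl * sev_ctrl
    return nde, nie, total
```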
- COLLABORATION OF STATISTICAL METHODS IN SELECTING THE CORRECT MULTIPLE
Abstract: This article considers the analysis of Multiple Linear Regressions (MLRs), an essential statistical method for analyzing medical data in various fields of medical research, such as prognostic studies, epidemiological risk factor studies, experimental studies, diagnostic studies and observational studies. An approach is used to select the “true” regression model with different sample sizes. A simulation study was used to evaluate the approach in terms of its ability to identify the “true” model with two options of distance measure: Ward's minimum variance approach and the single linkage approach. The two options were compared by the percentage of times each identified the “true” model. The simulation results indicate that, overall, the approach performed excellently, with the second option providing the best performance for both sample sizes considered. The main conclusion of our article is that we recommend using the approach with the second option as a standard procedure for selecting the “true” model.
- “BIVARIATE DISTRIBUTION” FOR INFRASTRUCTURES AMONG OPERATIVE, NATURAL AND NO MENOPAUSES
Abstract: Menopause is not an illness but rather an important event, as it changes body physiology and mental cognition via hormonal changes. During the analysis of menopause incidence data, a new bivariate distribution was discovered. Its marginal and conditional distributions and statistical properties, including the inter and partial correlations, are explored and used to interpret menopause data. A likelihood ratio hypothesis testing procedure is constructed to test the statistical significance of the sample estimates of the chance of menopause and of the chance of operative menopause. The menopause data are analyzed and interpreted in the illustration. Directions for future research are pointed out.
- MULTI-STATE MODELS OF HIV/AIDS BY HOMOGENEOUS SEMI-MARKOV PROCESS
Abstract: Multi-state stochastic models are useful tools for studying complex dynamics such as chronic diseases. The purpose of this study is to determine factors associated with progression between different stages of the disease and to model the progression of HIV/AIDS in an individual patient under ART follow-up using semi-Markov processes. A sample of 1456 patients was taken from hospital records at Amhara Referral Hospitals, Amhara Region, Ethiopia; the patients were under ART follow-up from June 2006 to August 2013. The states of disease progression adopted in the multi-state model were defined based on the following CD4 cell counts: ≥500 (SI); 200 to 499 (SII);
- COX'S PROPORTIONAL HAZARD MODEL AND CONSTRUCTION OF LIFE TABLE FOR UNDER-FIVE
Abstract: Primary data on 836 eligible women aged 15-49 years are used to determine the causal effects of covariates on under-five mortality. Eight covariates, viz., Number of family Members (NHM), Type of Toilet Facility (TTF), Total Children ever Born (TCB), Parity (PAR), Duration of Breastfeeding (DBF), use of Contraceptive (CMT), DPT and Ideal Number of Girls (ING), are considered in the study. Cox's regression analysis shows that six covariates, viz., TTF, NHM, CMT, DBF, DPT and ING, have a substantial and significant effect on under-five mortality. Further, a life table of the under-five children under study is constructed using the estimate of the survival function obtained from Cox's regression model.
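A minimal sketch of turning an estimated survival curve into life-table columns: under the Cox model the survival function for covariate vector x is S(t | x) = S₀(t)^exp(x'β), and from S at the start of each age interval one obtains survivors lₓ, deaths dₓ and the conditional probability of dying qₓ per radix of 100,000. The age grid, the radix and the function names are illustrative assumptions, not the article's construction.

```python
import math


def cox_survival(baseline_survival, linear_predictor):
    """S(t | x) = S0(t) ** exp(x'beta) under proportional hazards."""
    return baseline_survival ** math.exp(linear_predictor)


def life_table(ages, survival, radix=100_000):
    """Rows (age, l_x, d_x, q_x) for each interval [ages[i], ages[i+1]).

    survival[i] is the estimated S at ages[i] (e.g. from a Cox model).
    """
    rows = []
    for i in range(len(ages) - 1):
        s_now, s_next = survival[i], survival[i + 1]
        lx = radix * s_now             # survivors at the start of the interval
        dx = radix * (s_now - s_next)  # deaths within the interval
        qx = 1 - s_next / s_now        # conditional probability of dying in it
        rows.append((ages[i], lx, dx, qx))
    return rows
```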
- COMPARISON OF FIVE EXACT CONFIDENCE INTERVALS FOR THE BINOMIAL PROPORTION
Abstract: The Wald interval is easy to calculate and is often used as the confidence interval for a binomial proportion. However, with this interval the actual coverage probability often falls below the nominal coverage probability for small samples. On the other hand, several confidence intervals whose actual coverage probability does not fall below the nominal coverage probability have been suggested. In this study, we introduce five such exact confidence intervals, calculate their expected lengths and compare/verify the accuracy of their coverage probabilities. Further, we examine the characteristics of these five exact confidence intervals at length. The coverage probability of the Sterne interval was significantly closer to 0.95 than those of the other intervals, and stable. Its expected lengths were also less scattered than those of the other methods. We therefore conclude that the interval based on the Sterne test is well suited to small samples.
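A minimal sketch of the exact coverage calculation behind such comparisons: for fixed n and p, the coverage of an interval method is the sum of binomial probabilities P(X = x) over the outcomes x whose interval contains p. It compares the Wald interval with the Clopper-Pearson interval, one well-known exact interval, computed here by bisection on binomial tail probabilities. The Sterne interval is not implemented, whether Clopper-Pearson is among the article's five methods is not stated in the abstract, and the function names are illustrative.

```python
import math


def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def wald_interval(x, n, z=1.959963984540054):
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def clopper_pearson(x, n, alpha=0.05, tol=1e-12):
    """Exact (Clopper-Pearson) interval by bisection on binomial tails."""
    def upper_tail(p):  # P(X >= x), increasing in p
        return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

    def lower_tail(p):  # P(X <= x), decreasing in p
        return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

    def bisect(f, target, increasing):
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if (f(mid) < target) == increasing:
                lo = mid  # root lies above mid
            else:
                hi = mid  # root lies below mid
        return 0.5 * (lo + hi)

    lower = 0.0 if x == 0 else bisect(upper_tail, alpha / 2, True)
    upper = 1.0 if x == n else bisect(lower_tail, alpha / 2, False)
    return lower, upper


def coverage(interval_fn, n, p):
    """Exact coverage: sum of P(X = x) over x whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval_fn(x, n)
        if lo <= p <= hi:
            total += binom_pmf(x, n, p)
    return total
```

For n = 10 and p = 0.1, the Wald interval misses p whenever x = 0 (its interval collapses to [0, 0]), which alone costs about 35% coverage.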
- COMBINATION OF STATISTICAL TECHNIQUES FOR SUBMERGED FERMENTATION FOR EXTRACELLULAR POLYSACCHARIDE AND BIOMASS OF GANODERMA TSUGAE
Abstract: The biomass and extracellular polysaccharide of Ganoderma tsugae have various biological activities, including anti-inflammatory, antioxidant and antitumor activities. However, the growth rate of G. tsugae in nature is very slow, so many studies have attempted to develop mass culture systems for G. tsugae using laboratory techniques. In this study, many parameters of submerged fermentation of G. tsugae were studied to optimize the process by a combination of statistical techniques. Ten parameters chosen from preliminary results and literature reviews (maltose, skim milk, KH2PO4+K2HPO4, MgSO4·7H2O, CaCO3, vitamin B5+B6, olive oil, ethanol, pH and shaking speed) were screened by a Plackett-Burman design. The optimal ranges of the significant parameters were determined by the path of steepest ascent method, and the optimal process conditions were then determined by response surface methodology. Maltose, skim milk and pH are significant parameters for G. tsugae cultivation. The conditions of 31.031 g L-1 maltose, 14.055 g L-1 skim milk and an initial pH of 7.12 resulted in the maximum extracellular polysaccharide content of 415 mg L-1, and the same fermentation broth at an initial pH of 6.46 gave the most biomass, 15.776 g L-1. Finally, the optimal condition was compared with an unoptimized condition; the results indicate that the combination of statistical techniques enhanced the production of biomass and extracellular polysaccharide (13X and 1.5X of the control, respectively). These strategies are therefore useful for improving the submerged fermentation of G. tsugae and could be applied in the pharmaceutical industry.
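A minimal sketch of the screening step: the 12-run Plackett-Burman design is built from cyclic shifts of the standard generator row plus a row of all −1, which accommodates up to 11 two-level factors (enough for the ten listed parameters, with one column left over as a dummy). Main effects are estimated as the mean response at +1 minus the mean response at −1. The run size actually used in the study is not stated in the abstract, so the 12-run choice and the function names are assumptions.

```python
# Standard generator row for the 12-run Plackett-Burman design.
GENERATOR_12 = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12(n_factors):
    """12-run Plackett-Burman design (coded +1/-1) for up to 11 factors:
    11 cyclic shifts of the generator row plus a final row of all -1."""
    if not 1 <= n_factors <= 11:
        raise ValueError("the 12-run design screens at most 11 factors")
    rows = [GENERATOR_12[-i:] + GENERATOR_12[:-i] for i in range(11)]
    rows.append([-1] * 11)
    return [row[:n_factors] for row in rows]


def main_effects(design, response):
    """Effect of each factor: mean response at +1 minus mean response at -1."""
    effects = []
    for j in range(len(design[0])):
        high = [y for row, y in zip(design, response) if row[j] == 1]
        low = [y for row, y in zip(design, response) if row[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects
```

Because the columns are balanced and mutually orthogonal, each main effect is estimated free of the other main effects, which is what makes a 12-run screen of ten factors possible.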
- DOES OVER OR UNDER DISPERSION IN INVERSE BINOMIAL DATA SUGGEST ANYTHING? A CASE IN POINT IS THE WAITING TIME FOR BOTH HEART-LUNG
Abstract: A model is an abstraction of reality. The selection of the usual inverse binomial as the underlying model for the number of patients waiting in months for heart and lung transplants is questionable because the data do not exhibit the required balance between the dispersion and its functional equivalent in terms of the mean but rather show over- or under-dispersion. This phenomenon of over/under-dispersion has made it challenging to find an appropriate underlying model for the data. This article offers an innovative approach with a new model to resolve this methodological breakdown. The new model is named the Imbalanced Inverse Binomial Model (IIBM). A statistical methodology based on the IIBM is devised to analyze the collected data and is illustrated with real-life data on the number of patients waiting in months for heart and lung transplants together. The results of the illustration suggest that the new approach is quite powerful and brings out much more information that would otherwise have been missed. Specifically, the odds of receiving the organs are higher under the estimated imbalance in the data than under an ideal zero imbalance in all the states except Alabama. The odds are also consistently higher under the estimated imbalance than under an ideal zero imbalance across all the age groups waiting in months. Further research is needed to identify and explain the factors that might have caused the imbalance between the observed dispersion in the data and its functionally equivalent amount under the inverse binomial model. The contents of this article remain the foundation on which that future research will be built.
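The IIBM itself is not reproduced here. As a simple illustration of the imbalance the abstract describes, the sketch assumes the inverse binomial counts the failures before the r-th success with r known, so that the variance implied by the mean m is m(1 + m/r), and reports the observed sample variance minus that implied amount. The parameterization, the known r and the function name are assumptions.

```python
import statistics


def inverse_binomial_imbalance(counts, r):
    """Observed sample variance minus the variance implied by the sample mean
    under an inverse (negative) binomial model with known r:
    Var = mean * (1 + mean / r). Positive means over-dispersion,
    negative means under-dispersion, zero means balance."""
    m = statistics.fmean(counts)
    implied = m * (1 + m / r)
    return statistics.variance(counts) - implied
```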
- INFORMATICS ABOUT FEAR TO REPORT RAPES USING BUMPED-UP POISSON MODEL
Abstract: Rape victims are often afraid to report because they fear retaliation or humiliation. Consequently, the number of rapes is underestimated by the reported figures. This article discusses how the number of unreported rapes can be identified. For this purpose, the Poisson distribution is modified, and the result is named the Bumped-up Poisson distribution. Related probability informatics are derived to estimate the unreported rapes and the proportion fearing to report. A hypothesis testing procedure is developed to assess the significance of an estimated proportion fearing to report. The approach is applied to the reported rapes during 2007 and 2008 in a random sample of nations from all continents, and proximities among the nations in rape incidence are identified.
- MULTILEVEL ORDINAL RESPONSE MODELING OF TREND OF BREASTFEEDING INITIATION
Abstract: The health benefits derived from breastfeeding are influenced by the age of the child at initiation of the first breast milk, the duration and intensity of breastfeeding and the age at which the child is introduced to supplementary foods and other liquids. In this study, the general trend in the timing of breastfeeding initiation among nursing mothers in Nigeria between 1990 and 2003 is examined. The timing of initiation of the first breast milk to a child by her mother is measured on a three-level ordinal scale (immediately, within 24 h and days after birth), and the impacts of some socio-economic and maternal factors on it are determined. Results from this study revealed a significant improvement in the trend of early initiation of breast milk among Nigerian mothers between 1990 and 2003 (p
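A minimal sketch of how a cumulative-logit (proportional-odds) model turns a linear predictor η into probabilities for the three ordered categories (immediately, within 24 h, days after birth), with P(Y ≤ j) = logistic(c_j − η) for increasing cut points c_j. In a multilevel version, η would add random intercepts for the higher levels (for example, community) to the fixed effects; the model form, the cut points and the function name are illustrative assumptions, not the study's fitted model.

```python
import math


def cumulative_logit_probs(cutpoints, eta):
    """Category probabilities under a proportional-odds model.

    cutpoints must be increasing; P(Y <= j) = logistic(cutpoints[j] - eta),
    and the last category takes the remaining probability.
    """
    def logistic(u):
        return 1.0 / (1.0 + math.exp(-u))

    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1]
                              for j in range(1, len(cumulative))]
```

A larger η shifts probability toward later categories, so covariates with positive coefficients would be associated with later initiation under this sign convention.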
- SIMULATIONS ON SINGLE VACANCY DEFECT TRANSIENTS FOR FACE-CENTER-CUBIC
Abstract: Simulations of single vacancy defect transients in the FCC structure were conducted to study the change in its final structure, especially the average atomic volume. The numerical code “ALINE” was employed for this purpose. The results showed that when a single vacancy defect occurred in a perfect FCC crystal structure, the average atomic volume suddenly increased and then gradually decreased to a value close to the initial value. This suggests that the FCC structure was able to expand and fill the volume originally occupied by the missing atom.
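A back-of-the-envelope sketch, not a reproduction of the ALINE simulation, of why the average atomic volume jumps when a vacancy appears: an FCC cubic cell of lattice constant a holds 4 atoms, so a block of m × m × m cells has volume (ma)³ shared by 4m³ atoms, and removing one atom before the lattice relaxes spreads the same volume over 4m³ − 1 atoms. The block size and function name are assumptions for illustration.

```python
def fcc_average_atomic_volume(a, n_cells, vacancies=0):
    """Average volume per atom in an n_cells^3 block of FCC cubic cells with
    lattice constant a (4 atoms per cell), before any relaxation."""
    n_atoms = 4 * n_cells ** 3 - vacancies
    return (n_cells * a) ** 3 / n_atoms
```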
- Truncated Estimate in Log-Binomial Model: Algorithm and Simulation
Abstract: Problem statement: The relative risk has a concrete meaning when comparing two groups and measuring the association between exposures and outcomes in medical and public health studies. The log-binomial model, which uses a log link function on binary outcomes, estimates risk ratios straightforwardly but is prone to boundary problems. When the estimates lie near the boundary of the constrained parameter space, common procedures in software such as R or SAS fail to converge. Approach: In this study we proposed a truncated algorithm to estimate the relative risk using the log-binomial model. We used simulation studies of both single- and multiple-covariate models to investigate its performance and compare it with other similar methods. Results: Our algorithm outperformed other methods in precision, especially in high-dimensional predictor spaces. Conclusion: The truncated iteratively weighted least squares (IWLS) method solves the slow convergence problem and provides valid estimates when previously proposed methods fail.
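The study's truncated IWLS algorithm is not specified in the abstract. As an illustration of where truncation can enter, the sketch below fits a single-covariate log-binomial model by iteratively reweighted least squares (weights μ/(1 − μ) and working response η + (y − μ)/μ for the log link) and caps the linear predictor at log(1 − ε) so that fitted probabilities stay below 1. The cap, ε and the function name are assumptions; this is not the authors' algorithm.

```python
import math


def log_binomial_irls(x, y, eps=1e-6, max_iter=100, tol=1e-10):
    """IRLS for log P(y = 1) = b0 + b1 * x with a capped linear predictor.

    Returns (b0, b1); exp(b1) is the relative risk per unit of x.
    Requires at least one y = 1 for the starting value.
    """
    cap = math.log(1 - eps)  # truncation keeps fitted probabilities below 1
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x, y):
            eta = min(b0 + b1 * xi, cap)
            mu = math.exp(eta)
            w = mu / (1 - mu)          # IRLS weight for the log link
            z = eta + (yi - mu) / mu   # working response
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        det = s00 * s11 - s01 ** 2     # solve the 2 x 2 weighted normal equations
        new_b0 = (s11 * t0 - s01 * t1) / det
        new_b1 = (s00 * t1 - s01 * t0) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1
```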
- Measures of Explained Variation and the Base-Rate Problem for Logistic
Abstract: Problem statement: Logistic regression, perhaps the most frequently used regression model after the General Linear Model (GLM), is used extensively in medical science to analyze prognostic factors in studies of dichotomous outcomes. Unlike for the GLM, many different proposals have been made to measure explained variation in logistic regression analysis. One limitation of these measures is their dependence on the incidence of the event of interest in the population. This is a clear disadvantage, especially when one seeks to compare the predictive ability of a set of prognostic factors in two subgroups of a population. Approach: The purpose of this article is to study the base-rate sensitivity of several R² measures proposed for use in logistic regression. We compared the base-rate sensitivity of thirteen R²-type parametric and nonparametric statistics. Since a theoretical comparison is not possible, a simulation study was conducted. We used results from an existing dataset to simulate populations with different base rates, and logistic models were generated using the covariate values from the dataset. Results: We found nonparametric R² measures to be less sensitive to the base rate than their parametric counterparts. However, logistic regression is a parametric tool, and use of a nonparametric R² may give inconsistent results. Among the parametric R² measures, the likelihood ratio R² appears to be least dependent on the base rate and has relatively superior interpretability as a measure of explained variation. Conclusion/Recommendations: Some potential measures of explained variation are identified that tolerate fluctuations in the base rate reasonably well while providing a good estimate of the explained variation on an underlying continuous variable. However, it would be misleading to draw strong conclusions based on this research alone.
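The abstract does not define its likelihood ratio R². One common form, McFadden's, is 1 − log L(model)/log L(null), where the null model predicts the base rate for everyone. A minimal sketch, assuming fitted probabilities strictly between 0 and 1 are already available from a logistic fit; the function names are illustrative.

```python
import math


def loglik_binary(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y, p_hat):
    """Likelihood-ratio (McFadden) R^2 = 1 - lnL(model) / lnL(intercept-only)."""
    base_rate = sum(y) / len(y)
    ll_null = loglik_binary(y, [base_rate] * len(y))
    return 1 - loglik_binary(y, p_hat) / ll_null
```

Because the null log-likelihood itself depends on the base rate, comparing this measure across subgroups with different incidences is exactly the kind of situation the simulation study examines.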