Advances in Bioinformatics
[SJR: 0.421] [H-I: 8]
Open Access journal
ISSN (Print) 1687-8027 - ISSN (Online) 1687-8035
Published by Hindawi [333 journals]
- An Efficient Approach in Analysis of DNA Base Calling Using Neural Fuzzy
Abstract: This paper addresses the need for a true representation and a reliable measure in the analysis of DNA base calling. The method deals with data set quality in DNA sequencing analysis and investigates the use of neurofuzzy techniques to predict a confidence value for each base in DNA base calling, based on data collected for each base. The simulation model of the ANFIS design contains three subsystems and a main system: three features are obtained from the subsystems, and the main system uses these three features to predict the confidence value for each base. The approach achieves effective results with high performance.
PubDate: Tue, 31 Jan 2017 05:55:29 +000
- Multiple Linear Regression for Reconstruction of Gene Regulatory Networks
in Solving Cascade Error Problems
Abstract: Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods has been the inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite active research on various GRN prediction methods, discussion of specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted in past studies were not specifically geared towards proving the ability of GRN prediction methods to avoid cascade errors. Hence, this research proposes Multiple Linear Regression (MLR) to infer GRNs from gene expression data and to avoid wrongly inferring an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations in the real experiment datasets was far less than the number of predictors, some predictors were eliminated by extracting random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure proposed in this work. The experiment revealed that the number of cascade errors was minimal. Apart from that, the Belsley collinearity test proved that multicollinearity did affect the datasets used in this experiment greatly. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.
PubDate: Sun, 29 Jan 2017 11:42:15 +000
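Illustrative note on the cascade-error idea in the abstract above. This is a minimal sketch, not the authors' procedure: it simulates a hypothetical cascade A → B → C with invented coefficients and noise, then fits ordinary least squares in pure Python. All function names and parameters are assumptions made for illustration. When C is regressed on A alone, A looks like a regulator; when B is included as a second predictor, the coefficient of A shrinks towards zero, which is the behaviour that lets a multiple regression avoid reporting A → C as a direct edge.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(predictors, target):
    """Least-squares fit with intercept; returns [intercept, coef_1, ...]."""
    n = len(target)
    cols = [[1.0] * n] + predictors
    k = len(cols)
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(cols[i][t] * target[t] for t in range(n)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical cascade A -> B -> C (coefficients and noise are invented).
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(1000)]
b = [0.9 * x + rng.gauss(0, 0.3) for x in a]
c = [0.8 * y + rng.gauss(0, 0.3) for y in b]

_, marginal_a = fit_mlr([a], c)          # A alone: looks like a regulator of C
_, coef_a, coef_b = fit_mlr([a, b], c)   # A and B together: A's effect vanishes
print(round(marginal_a, 2), round(coef_a, 2), round(coef_b, 2))
```

The normal-equation solver is adequate for a handful of predictors; with many correlated genes a numerically more stable solver would be preferable.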
- Combined Docking with Classical Force Field and Quantum Chemical
Semiempirical Method PM7
Abstract: Results of the combined use of the classical force field and the recent quantum chemical PM7 method for docking are presented. Initially the gridless docking of a flexible low molecular weight ligand into the rigid target protein is performed with the energy function calculated in the MMFF94 force field with implicit water solvent in the PCM model. Among several hundred thousand local minima found in the docking procedure, about eight thousand lowest energy minima are chosen, and the energies of these minima are then recalculated with the recent quantum chemical semiempirical PM7 method. This procedure is applied to 16 test complexes with different proteins and ligands. For almost all test complexes such energy recalculation results in the global energy minimum configuration corresponding to the ligand pose near the native ligand position in the crystallized protein-ligand complex. A significant improvement of the ligand positioning accuracy compared with MMFF94 energy calculations is demonstrated.
PubDate: Mon, 16 Jan 2017 00:00:00 +000
- In Silico Analysis of SNPs in PARK2 and PINK1 Genes That Potentially Cause
Autosomal Recessive Parkinson Disease
Abstract: Introduction. Parkinson’s disease (PD) is a common neurodegenerative disorder. Mutations in PINK1 are the second most common agents causing autosomal recessive, early onset PD. We aimed to identify the pathogenic SNPs in PARK2 and PINK1 using in silico prediction software and their effect on the structure, function, and regulation of the proteins. Materials and Methods. We carried out in silico prediction of structural effect of each SNP using different bioinformatics tools to predict substitution influence on protein structure and function. Result. Twenty-one SNPs in PARK2 gene were found to affect transcription factor binding activity. 185 SNPs were found to affect splicing. Ten SNPs were found to affect the miRNA binding site. Two SNPs rs55961220 and rs56092260 affected the structure, function, and stability of Parkin protein. In PINK1 gene only one SNP (rs7349186) was found to affect the structure, function, and stability of the PINK1 protein. Ten SNPs were found to affect the microRNA binding site. Conclusion. Better understanding of Parkinson’s disease caused by mutations in PARK2 and PINK1 genes was achieved using in silico prediction. Further studies should be conducted with a special consideration of the ethnic diversity of the different populations.
PubDate: Thu, 29 Dec 2016 09:06:35 +000
- Prediction and In Silico Identification of Novel B-Cells and T-Cells
Epitopes in the S1-Spike Glycoprotein of M41 and CR88 (793/B) Infectious
Bronchitis Virus Serotypes for Application in Peptide Vaccines
Abstract: Bioinformatic analysis was used to predict antigenic B-cell and T-cell epitopes within the S1 glycoprotein of M41 and CR88 IBV strains. A conserved linear B-cell epitope peptide, , was identified in M41 IBV strains while three such epitopes types namely, , , and , were predicted in CR88 IBV strains. Analysis of MHCI binding peptides in M41 IBV strains revealed the presence of 15 antigenic peptides out of which 12 were highly conserved in 96–100% of the total M41 strains analysed. Interestingly three of these peptides, GGPITYKVM208, WFNSLSVSI356, and YLADAGLAI472, relatively had high antigenicity index (>1.0). On the other hand, 11 MHCI binding epitope peptides were identified in CR88 IBV strains. Of these, five peptides were found to be highly conserved with a range between 90% and 97%. However, WFNSLSVSL358, SYNISAASV88, and YNISAASVA89 peptides comparably showed high antigenicity scores (>1.0). Combination of antigenic B-cells and T-cells peptides that are conserved across many strains as approach to evoke humoral and CTL immune response will potentially lead to a broad-based vaccine that could reduce the challenges in using live attenuated vaccine technology in the control of IBV infection in poultry.
PubDate: Wed, 07 Sep 2016 07:57:19 +000
- In Silico Analysis of Gene Expression Network Components Underlying
Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved
Clusters of Transcription Factor Binding Sites
Abstract: Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons.
PubDate: Tue, 06 Sep 2016 13:53:23 +000
- Systematic Bioinformatic Approach for Prediction of Linear B-Cell Epitopes
on Dengue E and prM Protein
Abstract: B-cell epitopes on the envelope (E) and premembrane (prM) proteins of dengue virus (DENV) were predicted using bioinformatics tools, BepiPred, Ellipro, and SVMTriP. Predicted epitopes, 32 and 17 for E and prM proteins, respectively, were then characterized for their level of conservations. The epitopes, EP4/E (48–55), epitope number 4 of E protein at amino acids 48–55, EP9/E (165–182), EP11/E (218–233), EP20/E (322–349), EP21/E (326–353), EP23/E (356–365), and EP25/E (380–386), showed a high intraserotype conservancy with very low pan-serotype conservancy, demonstrating a potential target as serotype specific diagnostic markers. EP3 (30–41) located in domain-I and EP26/E (393–409), EP27/E (416–435), EP28/E (417–430) located in the stem region of E protein, and EP8/prM (93–112) from the prM protein have a pan-serotype conservancy higher than 70%. These epitopes indicate a potential use as universal vaccine candidates, subjected to verification of their potential in viral neutralization. EP2/E (16–21), EP5/E (62–123), EP6/E (63–89), EP19/E (310–329), and EP24/E (371–402), which have more than 50% pan-serotype conservancies, were found on E protein regions that are important in host cell attachment. Previous studies further show evidence for some of these epitopes to generate cross-reactive neutralizing antibodies, indicating their importance in antiviral strategies for DENV. This study suggests that bioinformatic approaches are attractive first line of screening for identification of linear B-cell epitopes.
PubDate: Thu, 01 Sep 2016 13:22:58 +000
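Illustrative note on the conservancy measure in the abstract above. The sketch below assumes the simplest possible definition: an epitope counts as conserved in a sequence only if it occurs there as an exact substring. The serotype labels, sequences, and epitope are invented toy data, and the function names are hypothetical; the study's own tools and thresholds are not reproduced here.

```python
def conservancy(epitope, sequences):
    """Fraction of sequences that contain the epitope as an exact substring."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    return sum(epitope in seq for seq in sequences) / len(sequences)


def serotype_conservancy(epitope, by_serotype):
    """Return (per-serotype conservancy, pan-serotype conservancy)."""
    per_serotype = {name: conservancy(epitope, seqs)
                    for name, seqs in by_serotype.items()}
    pooled = [seq for seqs in by_serotype.values() for seq in seqs]
    return per_serotype, conservancy(epitope, pooled)


# Toy data (invented): the epitope is present in every "DENV1" sequence
# and absent from every "DENV2" sequence.
toy = {
    "DENV1": ["AAKLMNPQ", "TTKLMNPA"],
    "DENV2": ["AAKLMTPQ", "GGKLMTPQ"],
}
per_serotype, pan = serotype_conservancy("KLMNP", toy)
print(per_serotype, pan)
```

An epitope with high conservancy inside one serotype and low pooled conservancy, as in this toy case, matches the profile the abstract associates with serotype-specific markers.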
- An Optimal Seed Based Compression Algorithm for DNA Sequences
Abstract: This paper proposes a seed based lossless compression algorithm to compress a DNA sequence which uses a substitution method that is similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures that are inherent in DNA sequences by creating an offline dictionary which contains all such repeats along with the details of mismatches. By ensuring that only promising mismatches are allowed, the method achieves a compression ratio that is on par with or better than the existing lossless DNA sequence compression algorithms.
PubDate: Sun, 31 Jul 2016 13:15:42 +000
- Bioinformatics Approach for Prediction of Functional Coding/Noncoding
Simple Polymorphisms (SNPs/Indels) in Human BRAF Gene
Abstract: This study examined simple polymorphisms (SNPs/indels) in the coding and noncoding regions of the human BRAF gene. Variant data were obtained from the database of SNPs as of its last update in November 2015. Several bioinformatics tools were used to identify SNPs and indels that affect protein function, structure, and expression. For coding polymorphisms, 111 SNPs were predicted to be highly damaging and six others less damaging. In the UTRs, five SNPs and one indel altered microRNA binding sites (3′ UTR), whereas no SNP or indel functionally altered transcription factor binding sites (5′ UTR). For the 5′/3′ splice sites, one SNP within the 5′ splice site and one indel in the 3′ splice site showed potential alteration of splicing. In conclusion, these functional SNPs and indels could lead to gene alteration, which may contribute directly or indirectly to the occurrence of many diseases.
PubDate: Sun, 10 Jul 2016 09:07:07 +000
- Multiphase Simulated Annealing Based on Boltzmann and Bose-Einstein
Distribution Applied to Protein Folding Problem
Abstract: A new hybrid Multiphase Simulated Annealing Algorithm using Boltzmann and Bose-Einstein distributions (MPSABBE) is proposed. MPSABBE was designed for solving the Protein Folding Problem (PFP) instances. This new approach has four phases: (i) Multiquenching Phase (MQP), (ii) Boltzmann Annealing Phase (BAP), (iii) Bose-Einstein Annealing Phase (BEAP), and (iv) Dynamical Equilibrium Phase (DEP). BAP and BEAP are simulated annealing searching procedures based on Boltzmann and Bose-Einstein distributions, respectively. DEP is also a simulated annealing search procedure, which is applied at the final temperature of the fourth phase, which can be seen as a second Bose-Einstein phase. MQP is a search process that ranges from extremely high to high temperatures, applying a very fast cooling process, and is not very restrictive to accept new solutions. However, BAP and BEAP range from high to low and from low to very low temperatures, respectively. They are more restrictive for accepting new solutions. DEP uses a particular heuristic to detect the stochastic equilibrium by applying a least squares method during its execution. MPSABBE parameters are tuned with an analytical method, which considers the maximal and minimal deterioration of problem instances. MPSABBE was tested with several instances of PFP, showing that the use of both distributions is better than using only the Boltzmann distribution on the classical SA.
PubDate: Mon, 20 Jun 2016 09:55:17 +000
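Illustrative note on the Boltzmann annealing phase mentioned in the abstract above. This is a sketch of classical simulated annealing with the standard Metropolis (Boltzmann) acceptance rule only. The Bose-Einstein phase, the multiquenching phase, the equilibrium detection, and the analytical parameter tuning of MPSABBE are not reproduced. The toy energy function, the neighbour move, and the geometric cooling factor are all assumptions chosen for illustration.

```python
import math
import random


def boltzmann_accept(delta, temperature, rng):
    """Metropolis rule: always accept improvements, otherwise accept
    a deterioration `delta` with probability exp(-delta / T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def anneal(energy, neighbour, start, t_start, t_end, alpha, rng):
    """Simulated annealing with geometric cooling T <- alpha * T."""
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    t = t_start
    while t > t_end:
        candidate = neighbour(current, rng)
        candidate_e = energy(candidate)
        if boltzmann_accept(candidate_e - current_e, t, rng):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
        t *= alpha
    return best, best_e


# Toy problem (hypothetical): minimise (x - 3)^2 over the integers.
def toy_energy(x):
    return (x - 3) ** 2


def toy_neighbour(x, rng):
    return x + rng.choice((-1, 1))


rng = random.Random(1)
best, best_e = anneal(toy_energy, toy_neighbour, start=20,
                      t_start=10.0, t_end=0.001, alpha=0.99, rng=rng)
print(best, best_e)
```

High temperatures accept most deteriorations, which corresponds to the permissive early search the abstract describes; as T falls the rule becomes more restrictive.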
- FullSSR: Microsatellite Finder and Primer Designer
Abstract: Microsatellites are genomic sequences comprised of tandem repeats of short nucleotide motifs widely used as molecular markers in population genetics. FullSSR is a new bioinformatic tool for microsatellite (SSR) loci detection and primer design using genomic data from NGS assay. The software was tested with 2000 sequences of Oryza sativa shotgun sequencing project from the National Center of Biotechnology Information Trace Archive and with partial genome sequencing with ROCHE 454® from Caiman latirostris, Salvator merianae, Aegla platensis, and Zilchiopsis collastinensis. FullSSR performance was compared against other similar SSR search programs. The results of the use of this kind of approach depend on the parameters set by the user. In addition, results can be affected by the analyzed sequences because of differences among the genomes. FullSSR simplifies the detection of SSRs and primer design on a big data set. The command line interface of FullSSR was intended to be used as part of genomic analysis tools pipeline; however, it can be used as a stand-alone program because the results are easily interpreted for a nonexpert user.
PubDate: Mon, 06 Jun 2016 06:51:05 +000
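Illustrative note on microsatellite detection as described in the abstract above. The sketch uses a back-referencing regular expression to find perfect tandem repeats. It is not FullSSR's algorithm and does not attempt primer design. The motif length range, the minimum repeat count, and the example sequence are assumptions; as the abstract notes, results of this kind of search depend on such user-set parameters.

```python
import re


def find_ssrs(sequence, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first, so
    'ACACACAC' is reported as (AC)4 rather than (ACAC)2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # Skip homopolymer runs such as AAAAAAAA.
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Invented example: an (AC)6 repeat and an (AGC)5 repeat.
example = "GG" + "AC" * 6 + "TT" + "AGC" * 5 + "GG"
print(find_ssrs(example))
# → [(2, 'AC', 6), (16, 'AGC', 5)]
```

Raising `min_repeats` or narrowing the motif range trades sensitivity for fewer spurious loci.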
- Agent-Based Deterministic Modeling of the Bone Marrow Homeostasis
Abstract: Modeling of stem cells not only describes but also predicts how a stem cell’s environment can control its fate. The first stem cell populations discovered were hematopoietic stem cells (HSCs). In this paper, we present a deterministic model of bone marrow (that hosts HSCs) that is consistent with several of the qualitative biological observations. This model incorporates stem cell death (apoptosis) after a certain number of cell divisions and also demonstrates that a single HSC can potentially populate the entire bone marrow. It also demonstrates that there is a production of sufficient number of differentiated cells (RBCs, WBCs, etc.). We prove that our model of bone marrow is biologically consistent and it overcomes the biological feasibility limitations of previously reported models. The major contribution of our model is the flexibility it allows in choosing model parameters which permits several different simulations to be carried out in silico without affecting the homeostatic properties of the model. We have also performed agent-based simulation of the model of bone marrow system proposed in this paper. We have also included parameter details and the results obtained from the simulation. The program of the agent-based simulation of the proposed model is made available on a publicly accessible website.
PubDate: Thu, 02 Jun 2016 14:24:41 +000
- Evaluation of Bioinformatic Programmes for the Analysis of Variants within
Splice Site Consensus Regions
Abstract: The increasing diagnostic use of gene sequencing has led to an expanding dataset of novel variants that lie within consensus splice junctions. The challenge for diagnostic laboratories is the evaluation of these variants in order to determine if they affect splicing or are merely benign. A common evaluation strategy is to use in silico analysis, and it is here that a number of programmes are available online; however, currently, there are no consensus guidelines on the selection of programmes or protocols to interpret the prediction results. Using a collection of 222 pathogenic mutations and 50 benign polymorphisms, we evaluated the sensitivity and specificity of four in silico programmes in predicting the effect of each variant on splicing. The programmes comprised Human Splice Finder (HSF), Max Entropy Scan (MES), NNSplice, and ASSP. The MES and ASSP programmes gave the highest performance based on Receiver Operator Curve analysis, with an optimal cut-off of score reduction of 10%. The study also showed that the sensitivity of prediction is affected by the level of conservation of individual positions, with in silico predictions for variants at positions 4 and +7 within consensus splice sites being largely uninformative.
PubDate: Tue, 24 May 2016 11:20:45 +000
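Illustrative note on the evaluation described in the abstract above. The sketch shows how sensitivity and specificity follow from applying a relative score-reduction cut-off (10% in the abstract) to labelled variants. The score values below are invented toy numbers, not data from the study, and the sketch assumes positive wild-type scores; the function names are hypothetical.

```python
def score_reduction(wild_type, variant):
    """Relative drop in splice-site score caused by the variant.

    Assumes wild_type > 0; scores that can be negative would need a
    different normalisation.
    """
    return (wild_type - variant) / wild_type


def sensitivity_specificity(records, cutoff=0.10):
    """records: iterable of (wild_type_score, variant_score, is_pathogenic)."""
    tp = fn = tn = fp = 0
    for wild_type, variant, is_pathogenic in records:
        predicted_disruptive = score_reduction(wild_type, variant) >= cutoff
        if is_pathogenic and predicted_disruptive:
            tp += 1
        elif is_pathogenic:
            fn += 1
        elif predicted_disruptive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy records (invented): four pathogenic and four benign variants.
toy_records = [
    (10.0, 5.0, True),
    (8.0, 7.6, True),    # only a 5% drop: missed at the 10% cut-off
    (9.0, 2.0, True),
    (6.0, 3.0, True),
    (10.0, 9.8, False),
    (7.0, 5.6, False),   # 20% drop in a benign variant: false positive
    (8.0, 8.0, False),
    (5.0, 4.9, False),
]
sens, spec = sensitivity_specificity(toy_records)
print(sens, spec)
# → 0.75 0.75
```

Sweeping `cutoff` over a range and plotting sensitivity against 1 − specificity gives the receiver operating curve used to choose an optimal threshold.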
- A Support Vector Machine Classification of Thyroid Bioptic Specimens Using
Abstract: Biomarkers able to characterise and predict multifactorial diseases are still one of the most important targets for all the “omics” investigations. In this context, Matrix-Assisted Laser Desorption/Ionisation-Mass Spectrometry Imaging (MALDI-MSI) has gained considerable attention in recent years, but it also led to a huge amount of complex data to be elaborated and interpreted. For this reason, computational and machine learning procedures for biomarker discovery are important tools to consider, both to reduce data dimension and to provide predictive markers for specific diseases. For instance, the availability of protein and genetic markers to support thyroid lesion diagnoses would impact deeply on society due to the high presence of undetermined reports (THY3) that are generally treated as malignant patients. In this paper we show how an accurate classification of thyroid bioptic specimens can be obtained through the application of a state-of-the-art machine learning approach (i.e., Support Vector Machines) on MALDI-MSI data, together with a particular wrapper feature selection algorithm (i.e., recursive feature elimination). The model is able to provide an accurate discriminatory capability using only 20 out of 144 features, resulting in an increase of the model performances, reliability, and computational efficiency. Finally, tissue areas rather than average proteomic profiles are classified, highlighting potential discriminating areas of clinical interest.
PubDate: Tue, 17 May 2016 13:58:40 +000
- Expressing Redundancy among Linear-Epitope Sequence Data Based on
Residue-Level Physicochemical Similarity in the Context of Antigenic
Abstract: Epitope-based design of vaccines, immunotherapeutics, and immunodiagnostics is complicated by structural changes that radically alter immunological outcomes. This is obscured by expressing redundancy among linear-epitope data as fractional sequence-alignment identity, which fails to account for potentially drastic loss of binding affinity due to single-residue substitutions even where these might be considered conservative in the context of classical sequence analysis. From the perspective of immune function based on molecular recognition of epitopes, functional redundancy of epitope data (FRED) thus may be defined in a biologically more meaningful way based on residue-level physicochemical similarity in the context of antigenic cross-reaction, with functional similarity between epitopes expressed as the Shannon information entropy for differential epitope binding. Such similarity may be estimated in terms of structural differences between an immunogen epitope and an antigen epitope with reference to an idealized binding site of high complementarity to the immunogen epitope, by analogy between protein folding and ligand-receptor binding; but this underestimates potential for cross-reactivity, suggesting that epitope-binding site complementarity is typically suboptimal as regards immunologic specificity. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes.
PubDate: Wed, 04 May 2016 09:57:04 +000
- Ebolavirus Database: Gene and Protein Information Resource for
Abstract: Ebola Virus Disease (EVD) is a life-threatening haemorrhagic fever in humans. Even though there are many reports on EVD, the protein precursor functions and virulent factors of ebolaviruses remain poorly understood. Comparative analyses of Ebolavirus genomes will help in the identification of these important features. This prompted us to develop the Ebolavirus Database (EDB) and we have provided links to various tools that will aid researchers to locate important regions in both the genomes and proteomes of Ebolavirus. The genomic analyses of ebolaviruses will provide important clues for locating the essential and core functional genes. The aim of EDB is to act as an integrated resource for ebolaviruses and we strongly believe that the database will be a useful tool for clinicians, microbiologists, health care workers, and bioscience researchers.
PubDate: Thu, 14 Apr 2016 11:54:54 +000
- Feature Selection Has a Large Impact on One-Class Classification Accuracy
for MicroRNAs in Plants
Abstract: MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems prerogative to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6% thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.
PubDate: Tue, 12 Apr 2016 14:00:26 +000
- Molecular Docking and In Silico ADMET Study Reveals Acylguanidine 7a as a
Potential Inhibitor of β-Secretase
Abstract: Amyloidogenic pathway in Alzheimer’s disease (AD) involves breakdown of APP by β-secretase followed by γ-secretase and results in formation of amyloid beta plaque. β-secretase has been a promising target for developing novel anti-Alzheimer drugs. To test different molecules for this purpose, test ligands like acylguanidine 7a, rosiglitazone, pioglitazone, and tartaric acid were docked against our target protein β-secretase enzyme retrieved from Protein Data Bank, considering MK-8931 (phase III trial, Merck) as the positive control. Docking revealed that, with respect to their free binding energy, acylguanidine 7a has the lowest binding energy followed by MK-8931 and pioglitazone and binds significantly to β-secretase. In silico ADMET predictions revealed that except tartaric acid all other compounds had minimal toxic effects and had good absorption as well as solubility characteristics. These compounds may serve as potential lead compound for developing new anti-Alzheimer drug.
PubDate: Sun, 10 Apr 2016 09:49:34 +000
- Robust Feature Selection from Microarray Data Based on Cooperative Game
Theory and Qualitative Mutual Information
Abstract: High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to high dimension of microarray data sets, the features are reduced using one of the two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. The idea of Qualitative Mutual Information causes the selected features to have more stability and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.
PubDate: Sun, 20 Mar 2016 09:47:50 +000
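Illustrative note on the Shapley index used in the second phase of the abstract above. The sketch computes exact Shapley values by averaging each feature's marginal contribution over all orderings, which is feasible only for a handful of features. The value function here is an invented toy, not the Qualitative Mutual Information measure the paper employs, and the feature names are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all orderings.

    `value` maps a frozenset of players to a number, with value(empty) as
    the baseline. The cost grows factorially with the number of players.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            totals[p] += value(extended) - value(coalition)
            coalition = extended
    return {p: totals[p] / len(orderings) for p in players}


# Toy game (invented): f1 and f2 are redundant with each other, so a
# coalition gains 0.6 from having either one; f3 adds 0.2 independently.
def toy_value(coalition):
    score = 0.0
    if "f1" in coalition or "f2" in coalition:
        score += 0.6
    if "f3" in coalition:
        score += 0.2
    return score


values = shapley_values(["f1", "f2", "f3"], toy_value)
print(values)
```

In this toy game the two redundant features share their joint contribution equally (0.3 each) and the independent feature keeps its full 0.2, so the values sum to the worth of the full feature set.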
- Efficacy and Toxicity Assessment of Different Antibody Based
Antiangiogenic Drugs by Computational Docking Method
Abstract: Bevacizumab and trastuzumab are two antibody based antiangiogenic drugs that are in clinical practice for the treatment of different cancers. At present, the use of these drugs rests on the empirical choice of clinical experts, guided by population based clinical trials; hence, their molecular efficacies have not been explored in terms of quantitative estimates. Moreover, different clinical trials with these drugs showed different toxicity symptoms in patients. Here, using a molecular docking study, we attempt to reveal the molecular rationale for their efficacy and off-target toxicity. Our study reinforces their antiangiogenic potential and indicates that, of the two, trastuzumab has much higher efficacy; however, it also reveals that, compared to bevacizumab, trastuzumab has a higher toxicity effect, especially on the cardiovascular system. This study also reveals the molecular rationale of ocular dysfunction by antiangiogenic drugs. The molecular rationale of toxicity as revealed in this study may help in the judicious choice as well as therapeutic scheduling of these drugs in different cancers.
PubDate: Mon, 07 Mar 2016 12:07:42 +000
- Structural Dynamics of Human Argonaute2 and Its Interaction with siRNAs
Designed to Target Mutant tdp43
Abstract: The human Argonaute2 protein (Ago2) is a key player in RNA interference pathway and small RNA recognition by Ago2 is the crucial step in siRNA mediated gene silencing mechanism. The present study highlights the structural and functional dynamics of human Ago2 and the interaction mechanism of Ago2 with a set of seven siRNAs for the first time. The human Ago2 protein adopts two conformations such as “open” and “close” during the simulation of 25 ns. One of the domains named as PAZ, which is responsible for anchoring the 3′-end of siRNA guide strand, is observed as a highly flexible region. The interaction between Ago2 and siRNA, analyzed using a set of siRNAs (targeting at positions 128, 251, 341, 383, 537, 1113, and 1115 of mRNA) designed to target tdp43 mutants causing Amyotrophic Lateral Sclerosis (ALS) disease, revealed the stable and strong recognition of siRNA by the Ago2 protein during dynamics. Among the studied siRNAs, the siRNA341 is identified as a potent siRNA to recognize Ago2 and hence could be used further as a possible siRNA candidate to target the mutant tdp43 protein for the treatment of ALS patients.
PubDate: Sun, 06 Mar 2016 12:52:41 +000
- Random versus Deterministic Descent in RNA Energy Landscape Analysis
Abstract: Identifying sets of metastable conformations is a major research topic in RNA energy landscape analysis, and recently several methods have been proposed for finding local minima in landscapes spawned by RNA secondary structures. An important and time-critical component of such methods is steepest, or gradient, descent in attraction basins of local minima. We analyse the speed-up achievable by randomised descent in attraction basins in the context of large sample sets where the size has an order of magnitude in the region of ~10^6. While the gain for each individual sample might be marginal, the overall run-time improvement can be significant. Moreover, for the two nongradient methods we analysed for partial energy landscapes induced by ten different RNA sequences, we obtained that the number of observed local minima is on average larger by 7.3% and 3.5%, respectively. The run-time improvement is approximately 16.6% and 6.8% on average over the ten partial energy landscapes. For the large sample size we selected for descent procedures, the coverage of local minima is very high up to energy values of the region where the samples were randomly selected from the partial energy landscapes; that is, the difference to the total set of local minima is mainly due to the upper area of the energy landscapes.
PubDate: Wed, 02 Mar 2016 06:49:32 +000
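Illustrative note on the two descent styles contrasted in the abstract above. The sketch uses a toy landscape over bit tuples with single-bit-flip neighbours, not RNA secondary structures, and the energy function is invented. Steepest descent evaluates every neighbour and moves to the lowest one; the randomised variant shown here takes the first lower-energy neighbour found in a shuffled order, which is one plausible reading of "randomised descent" and not necessarily either of the two nongradient methods the paper analysed.

```python
import random


def neighbours(state):
    """All states reachable by flipping exactly one bit."""
    return [state[:i] + (1 - state[i],) + state[i + 1:]
            for i in range(len(state))]


def gradient_descent(state, energy):
    """Steepest descent: always move to the lowest-energy neighbour."""
    while True:
        best = min(neighbours(state), key=energy)
        if energy(best) >= energy(state):
            return state
        state = best


def random_descent(state, energy, rng):
    """Move to the first lower-energy neighbour found in random order,
    so a step usually needs fewer energy evaluations than steepest descent."""
    while True:
        candidates = neighbours(state)
        rng.shuffle(candidates)
        for candidate in candidates:
            if energy(candidate) < energy(state):
                state = candidate
                break
        else:
            return state  # no neighbour is lower: local minimum reached


def is_local_minimum(state, energy):
    return all(energy(n) >= energy(state) for n in neighbours(state))


# Toy energy (invented): two basins, at 2 set bits and at 6 set bits.
def toy_energy(state):
    k = sum(state)
    return (k - 2) ** 2 * (k - 6) ** 2 - k


start = (1, 1, 1, 1, 0, 0, 0, 0)
steepest_end = gradient_descent(start, toy_energy)
random_end = random_descent(start, toy_energy, random.Random(0))
print(sum(steepest_end), sum(random_end))
```

From the same start, the randomised walk can end in a different basin than steepest descent, which is consistent with the abstract's observation that nongradient methods found more distinct local minima.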
- BacHbpred: Support Vector Machine Methods for the Prediction of Bacterial
Abstract: The recent upsurge in microbial genome data has revealed that hemoglobin-like (HbL) proteins may be widely distributed among bacteria and that some organisms may carry more than one HbL encoding gene. However, the discovery of HbL proteins has been limited to a small number of bacteria only. This study describes the prediction of HbL proteins and their domain classification using a machine learning approach. Support vector machine (SVM) models were developed for predicting HbL proteins based upon amino acid composition (AC), dipeptide composition (DC), hybrid method (AC + DC), and position specific scoring matrix (PSSM). In addition, we introduce for the first time a new prediction method based on max to min amino acid residue (MM) profiles. The average accuracy, standard deviation (SD), false positive rate (FPR), confusion matrix, and receiver operating characteristic (ROC) were analyzed. We also compared the performance of our proposed models in homology detection databases. The performance of the different approaches was estimated using fivefold cross-validation techniques. Prediction accuracy was further investigated through confusion matrix and ROC curve analysis. All experimental results indicate that the proposed BacHbpred can be a perspective predictor for determination of HbL related proteins. BacHbpred, a web tool, has been developed for HbL prediction.
PubDate: Mon, 29 Feb 2016 17:23:34 +000
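Illustrative note on the sequence features named in the abstract above. The sketch computes amino acid composition (AC, 20 values), dipeptide composition (DC, 400 values), and their concatenation as a hybrid vector (AC + DC, 420 values), expressed as fractions. The SVM training, the PSSM features, and the max-to-min (MM) profile are not reproduced, and the example sequence is invented.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """Fraction of each of the 20 standard residues (length-20 vector)."""
    sequence = sequence.upper()
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Fraction of each ordered residue pair among overlapping dipeptides
    (length-400 vector). Pairs with non-standard residues are ignored."""
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = len(sequence) - 1
    for i in range(total):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[dp] / total for dp in DIPEPTIDES]


def hybrid_features(sequence):
    """Concatenated AC + DC vector (length 420)."""
    return amino_acid_composition(sequence) + dipeptide_composition(sequence)


# Invented 10-residue example sequence.
example = "MKTAYIAKQR"
ac = amino_acid_composition(example)
dc = dipeptide_composition(example)
print(len(ac), len(dc), len(hybrid_features(example)))
# → 20 400 420
```

Because every protein maps to a vector of the same length regardless of sequence length, features of this kind can be passed directly to a fixed-input classifier.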
- In Silico Approach for SAR Analysis of the Predicted Model of DEPDC1B: A
Novel Target for Oral Cancer
Abstract: With the incidence rate of oral carcinogenesis increasing in the Southeast-Asian countries, due to increase in the consumption of tobacco and betel quid as well as infection from human papillomavirus, specifically type 16, it becomes crucial to predict the transition of premalignant lesion to cancerous tissue at an initial stage in order to control the process of oncogenesis. DEPDC1B, downregulated in the presence of E2 protein, was recently found to be overexpressed in oral cancer, which can possibly be explained by the disruption of the E2 open reading frame upon the integration of viral genome into the host genome. DEPDC1B mediates its effect by directly interacting with Rac1 protein, which is known to regulate important cell signaling pathways. Therefore, DEPDC1B can be a potential biomarker as well as a therapeutic target for diagnosing and curing the disease. However, the lack of 3D model of the structure makes the utilization of DEPDC1B as a therapeutic target difficult. The present study focuses on the prediction of a suitable 3D model of the protein as well as the analysis of protein-protein interaction between DEPDC1B and Rac1 protein using PatchDock web server along with the identification of allosteric or regulatory sites of DEPDC1B.
PubDate: Mon, 29 Feb 2016 14:19:51 +000
- Large-Scale Recurrent Neural Network Based Modelling of Gene Regulatory
Network Using Cuckoo Search-Flower Pollination Algorithm
Abstract: The accurate prediction of genetic networks using computational tools is one of the greatest challenges in the postgenomic era. The Recurrent Neural Network is one of the most popular yet simple approaches to modelling network dynamics from time-series microarray data. To date, it has been successfully applied to computationally derive small-scale artificial and real-world genetic networks with high accuracy. However, it underperforms for large-scale genetic networks. Here, a new methodology is proposed in which a hybrid Cuckoo Search-Flower Pollination Algorithm is implemented with a Recurrent Neural Network. Cuckoo Search is used to search for the best combination of regulators, and the Flower Pollination Algorithm is applied to optimize the model parameters of the Recurrent Neural Network formalism. First, the proposed method is tested on a benchmark large-scale artificial network for both noiseless and noisy data. The results obtained show that the proposed methodology is capable of increasing the inference of correct regulations and decreasing false regulations to a high degree. Second, the proposed methodology has been validated against the real-world dataset of the DNA SOS repair network of Escherichia coli. However, the proposed method incurs a higher computational time in both cases due to the hybrid optimization process.
PubDate: Tue, 16 Feb 2016 12:51:16 +000
- Inhibition of Mycobacterium-RmlA by Molecular Modeling, Dynamics
Simulation, and Docking
Abstract: The increasing resistance to anti-TB drugs has prompted strategies for finding new drug targets against Mycobacterium tuberculosis (Mtb). In recent years, enzymes associated with the rhamnose pathway in Mtb have attracted attention as drug targets. The present work concerns α-D-glucose-1-phosphate thymidylyltransferase (RmlA), the first enzyme involved in the biosynthesis of L-rhamnose, a component of the Mtb cell wall. This study aims to derive a 3D structure of RmlA by using a comparative modeling approach. Structural refinement and energy minimization of the built model were done with molecular dynamics. The reliability of the built model was assessed with various protein checking tools such as Procheck, Whatif, ProsA, Errat, and Verify 3D. The obtained model was used to investigate the relation between structure and function. Molecular docking of Mtb-RmlA with modified EMB (ethambutol) ligands and the natural substrate revealed specific key residues, Arg13, Lys23, Asn109, and Thr223, which play an important role in ligand binding and selection. Of all the EMB ligands, EMB-1 showed the best interaction with the Mtb-RmlA model. The information discussed above will be useful for the rational design of safe and effective inhibitors specific to the RmlA enzyme for the treatment of tuberculosis.
PubDate: Sun, 14 Feb 2016 09:30:49 +000
- In Silico Phylogenetic Analysis and Molecular Modelling Study of
Abstract: 2-Haloalkanoic acid dehalogenase enzymes, which are widely distributed in fungi and bacteria, have a broad range of applications, from bioremediation to chemical synthesis of useful compounds. In the present study, a total of 81 full-length protein sequences of 2-haloalkanoic acid dehalogenase from bacteria and fungi were retrieved from the NCBI database. Sequence analyses such as multiple sequence alignment (MSA), conserved motif identification, computation of amino acid composition, and phylogenetic tree construction were performed on these primary sequences. From the MSA analysis, it was observed that the sequences share conserved lysine (K) and aspartate (D) residues. Also, the phylogenetic tree indicated a subcluster comprising both fungal and bacterial species. Because no experimental 3D structure for fungal 2-haloalkanoic acid dehalogenase is available in the PDB, a molecular modelling study was performed for both fungal and bacterial enzymes present in the subcluster. Further structural analysis revealed a common evolutionary topology shared between fungal and bacterial enzymes. Studies on the buried amino acids showed highly conserved Leu and Ser in the core, despite variation in their amino acid percentage. Additionally, a surface-exposed tryptophan was conserved in all of the selected models.
PubDate: Wed, 06 Jan 2016 11:58:08 +000
- FN-Identify: Novel Restriction Enzymes-Based Method for Bacterial
Identification in Absence of Genome Sequencing
Abstract: Sequencing and restriction analysis of genes such as 16S rRNA and HSP60 are intensively used for molecular identification in microbial communities. With the aid of rapid progress in bioinformatics, genome sequencing became the method of choice for bacterial identification. However, genome sequencing technology is still out of reach in developing countries. In this paper, we propose FN-Identify, a sequencing-free method for bacterial identification. FN-Identify exploits the gene sequence data available in GenBank and other databases, together with two algorithms that we developed, CreateScheme and GeneIdentify, to create a restriction enzyme-based identification scheme. FN-Identify was tested on three different and diverse bacterial populations (members of the Lactobacillus, Pseudomonas, and Mycobacterium groups) in an in silico analysis using restriction enzymes and sequences of the 16S rRNA gene. Analysis of the restriction maps of the members of the three groups, using the fragment number information either alone or along with fragment sizes, successfully identified all of the members of the three groups using a minimum of four and a maximum of eight restriction enzymes. Our results demonstrate the utility and accuracy of the FN-Identify method and its two algorithms as an alternative that uses standard microbiology laboratory techniques when genome sequencing is not available.
PubDate: Thu, 31 Dec 2015 06:00:15 +000
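The idea of identifying a gene sequence from the number of restriction fragments per enzyme can be illustrated with a minimal in silico digest. This is only a sketch of the general principle, not the CreateScheme or GeneIdentify algorithms, whose details are not given in the abstract. It assumes a linear sequence, exact recognition sites without ambiguity codes, and a hypothetical `ENZYMES` table holding each recognition site and the cut offset within it.

```python
# Hypothetical enzyme table for illustration: name -> (recognition site, cut offset).
# EcoRI cuts G^AATTC and HaeIII cuts GG^CC.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HaeIII": ("GGCC", 2),
}


def cut_positions(seq, site, offset):
    """Return every cut position for one enzyme on a linear sequence."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        # Advance by one so overlapping occurrences of the site are also found.
        start = seq.find(site, start + 1)
    return positions


def fragment_sizes(seq, site, offset):
    """Return fragment lengths, in sequence order, after complete digestion."""
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def fragment_number_profile(seq, enzymes=ENZYMES):
    """Return the number of fragments per enzyme, in the table's order."""
    return tuple(len(fragment_sizes(seq, site, offset))
                 for site, offset in enzymes.values())
```

Two sequences with different profiles can be told apart by fragment counts alone. When profiles collide, the fragment sizes from `fragment_sizes` give a second level of discrimination, mirroring the "fragment numbers only or along with fragment sizes" distinction in the abstract.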
- Local Mutational Pressures in Genomes of Zaire Ebolavirus and Marburg
Abstract: Heterogeneities in nucleotide content distribution along the length of the Zaire ebolavirus and Marburg virus genomes have been analyzed. The results showed that there is asymmetric mutational A-pressure in the majority of Zaire ebolavirus genes; there is mutational AC-pressure in the coding region of the matrix protein VP40, probably caused by its high expression at the end of the infection process; there is also AC-pressure in the 3′-part of the nucleoprotein (NP) coding gene, associated with the low amount of secondary structure formed by the 3′-part of its mRNA; in the middle of the glycoprotein (GP) coding gene, that kind of mutational bias is linked with the high amount of secondary structure formed by the corresponding fragment of the RNA negative (−) strand; and there is relatively symmetric mutational AU-pressure in the polymerase (Pol) coding gene, caused by its low expression level. In Marburg virus, all genes, including the C-rich fragment of the GP coding region, demonstrate asymmetric mutational A-bias, while the last gene (Pol) demonstrates more symmetric mutational AU-pressure. This work describes the hypothesis that the newly synthesized RNA negative (−) strand is shielded by complementary fragments of mRNAs: shielded fragments of the RNA negative (−) strand should be better protected from oxidative damage and prone to ADAR-editing.
PubDate: Sun, 20 Dec 2015 12:55:49 +000
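One common way to look at heterogeneity in nucleotide content along a genome is a sliding-window count. The abstract does not state which procedure the authors used, so the sketch below is an assumption for illustration only; the function name, window size, and step are hypothetical.

```python
def sliding_nucleotide_content(seq, window, step):
    """Return (start, {base: fraction}) for each full window along an RNA sequence.

    DNA input is accepted as well: T is rewritten as U before counting.
    """
    seq = seq.upper().replace("T", "U")
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        content = {base: chunk.count(base) / window for base in "ACGU"}
        rows.append((start, content))
    return rows
```

Plotting the A, C, G, and U fractions against the window start position makes regional biases, such as an A-rich or C-rich stretch within one gene, visible at a glance.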
- HBS-Tools for Hairpin Bisulfite Sequencing Data Processing and Analysis
Abstract: The emerging genome-wide hairpin bisulfite sequencing (hairpin-BS-Seq) technique enables the methylation pattern of both DNA strands to be determined simultaneously. Compared with traditional bisulfite sequencing (BS-Seq) techniques, hairpin-BS-Seq can determine methylation fidelity and increase mapping efficiency. However, no computational tool has yet been designed for the analysis of hairpin-BS-Seq data. Here we present HBS-tools, a set of command line based tools for the preprocessing, mapping, methylation calling, and summarizing of genome-wide hairpin-BS-Seq data. It accepts paired-end hairpin-BS-Seq reads, recovers the original (pre-bisulfite-converted) sequences using global alignment, and then calls the methylation statuses of cytosines on both DNA strands after mapping the original sequences to the reference genome. After applying HBS-tools to hairpin-BS-Seq datasets, we found that it has a reduced mapping time and improved mapping efficiency compared with state-of-the-art mapping tools. The HBS-tools source scripts, along with a user guide and testing data, are freely available for download.
PubDate: Sun, 20 Dec 2015 08:55:29 +000
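The recovery of the original sequence from a hairpin read pair, as outlined in the abstract above, can be sketched as follows. This is a simplified illustration and not the HBS-tools implementation: it assumes the two reads are already trimmed, of equal length, and gap-free, so the global alignment step is skipped. It also assumes `read1` is the converted top strand and `read2` is the converted bottom strand, both written 5′ to 3′. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def recover_original(read1, read2):
    """Rebuild the pre-conversion top-strand sequence from a hairpin read pair."""
    mate = reverse_complement(read2)  # bottom strand expressed in top-strand terms
    if len(mate) != len(read1):
        raise ValueError("this sketch requires reads of equal length")
    original = []
    for top, bottom in zip(read1, mate):
        if top == bottom:
            original.append(top)   # no conversion visible at this position
        elif top == "T" and bottom == "C":
            original.append("C")   # unmethylated C converted on the top strand
        elif top == "G" and bottom == "A":
            original.append("G")   # unmethylated C converted on the bottom strand
        else:
            original.append("N")   # mismatch not explained by bisulfite conversion
    return "".join(original)


def call_methylation(read1, read2):
    """Return (position, strand, is_methylated) for every cytosine on both strands."""
    mate = reverse_complement(read2)
    calls = []
    for pos, base in enumerate(recover_original(read1, read2)):
        if base == "C":
            calls.append((pos, "+", read1[pos] == "C"))  # C kept on top strand
        elif base == "G":
            calls.append((pos, "-", mate[pos] == "G"))   # C kept on bottom strand
    return calls
```

Because the two strands of the same molecule are read together, a T/C or G/A disagreement can be attributed to conversion rather than to a sequence variant, which is why the original sequence can be recovered before mapping.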