Hybrid journal (may contain Open Access articles). ISSN (Print) 1467-5463; ISSN (Online) 1477-4054. Published by Oxford University Press.
Authors:Niu S; Yang J, McDermaid A, et al. First page: 360 Abstract: Briefings in Bioinformatics, 2017. https://doi.org/10.1093/bib/bbx051 PubDate: Thu, 22 Feb 2018 00:00:00 GMT DOI: 10.1093/bib/bby012
Authors:Aghaee-Bakhtiari S; Arefian E, Lau P. First page: 254 Abstract: The recent discovery of thousands of small and large noncoding RNAs, in parallel with technical improvements enabling scientists to study the transcriptome at much greater depth, has resulted in massive data generation. This burst of information has prompted the development of easily accessible resources for storage, retrieval and analysis of raw and processed data, and hundreds of Web-based tools dedicated to these tasks have been made available. However, the increasing number and diversity of bioinformatics tools, each covering a specific and specialized area, as well as their redundancies, represent potential sources of complication for end users. To overcome these issues, we introduce an easy-to-follow classification of microRNA (miRNA)-related bioinformatics tools for biologists interested in studying this important class of small noncoding RNAs. We also developed the miRNA algorithmic network database (miRandb), a meta-database that surveys > 180 Web-based miRNA databases. These cover miRNA sequence, discovery, target prediction, target validation, expression and regulation, functions and roles in diseases, interactions in cellular pathways and networks, and deep sequencing. miRandb recapitulates these diverse possibilities and facilitates access to the different categories of miRNA resources: researchers select a category of miRNA information and the desired organism, and the eligible databases are then presented along with their features, making it straightforward to choose appropriate resources. Finally, we describe current shortcomings and future needs, to help researchers use these tools effectively. Our database is accessible at http://mirandb.ir. PubDate: Tue, 03 Jan 2017 00:00:00 GMT DOI: 10.1093/bib/bbw109
Authors:Churkin A; Retwitzer M, Reinharz V, et al. First page: 350 Abstract: Computational programs for predicting RNA sequences with desired folding properties have been extensively developed and expanded in the past several years. Given a secondary structure, these programs aim to predict sequences that fold into a target minimum free energy secondary structure, while considering various constraints. This procedure is called inverse RNA folding. Inverse RNA folding has traditionally been used to design optimized RNAs with favorable properties, an application that is expected to grow considerably in the future in light of advances in the expanding new fields of synthetic biology and RNA nanostructures. Moreover, it was recently demonstrated that inverse RNA folding can successfully be used as a valuable preprocessing step in the computational detection of novel noncoding RNAs. This review describes the most popular freeware programs that have been developed for such purposes, starting from RNAinverse, which was devised when the inverse RNA folding problem was first formulated. The most recently published ones that take an RNA secondary structure as input are antaRNA, RNAiFold and incaRNAfbinv, each having different features that could be beneficial to specific biological problems in practice. The various programs also use distinct approaches, ranging from ant colony optimization to constraint programming, in addition to adaptive walk, simulated annealing and Boltzmann sampling. This review compares the various programs and provides a simple description of the possibilities, to help practitioners select the most suitable program. It is geared for specific tasks requiring RNA design based on an input secondary structure, with an outlook toward the future of RNA design programs. PubDate: Tue, 03 Jan 2017 00:00:00 GMT DOI: 10.1093/bib/bbw120
Authors:van Gelder C; Hooft R, van Rijswijk M, et al. First page: 359 Abstract: Briefings in Bioinformatics, 2017. https://doi.org/10.1093/bib/bbx087 PubDate: Fri, 15 Dec 2017 00:00:00 GMT DOI: 10.1093/bib/bbx171
Authors:Bauer D; Zadoorian A, Wilson L, et al. First page: 179 Abstract: Motivation: Despite being essential for numerous clinical and research applications, high-resolution human leukocyte antigen (HLA) typing remains challenging, and laboratory tests are time-consuming and labour-intensive. With next-generation sequencing data becoming widely accessible, on-demand in silico HLA typing offers an economical and efficient alternative. Results: In this study we evaluate the typing accuracy and efficiency of five computational HLA typing methods by comparing their predictions against a curated set of > 1000 published polymerase chain reaction-derived HLA genotypes on three different data sets (whole genome sequencing, whole exome sequencing and transcriptomic sequencing data). The highest accuracy at clinically relevant resolution (four digits) we observe is 81% on RNAseq data by PHLAT, and 99% accuracy by OptiType when limited to Class I genes only. We also observed variability between the tools in resource consumption, with runtime ranging from an average of 5 h (HLAminer) to 7 min (seq2HLA) and memory from 12.8 GB (HLA-VBSeq) to 0.46 GB (HLAminer) per sample. While a minimal coverage is required, other factors also determine prediction accuracy, and the results between tools do not correlate well. Therefore, by combining tools, there is the potential to develop a highly accurate ensemble method that is able to deliver fast, economical HLA typing from existing sequencing data. PubDate: Mon, 31 Oct 2016 00:00:00 GMT DOI: 10.1093/bib/bbw097
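The ensemble idea in the closing sentence of this abstract could be prototyped as a simple majority vote over per-tool allele calls. The sketch below is illustrative only; the per-tool calls are invented placeholders, not results from the study:

```python
from collections import Counter

def consensus_hla_call(calls):
    """Majority vote over per-tool HLA allele calls for one locus.

    `calls` maps tool name -> four-digit allele string; returns the
    most common call and the fraction of tools supporting it.
    """
    votes = Counter(calls.values())
    allele, count = votes.most_common(1)[0]
    return allele, count / len(calls)

# Hypothetical per-tool calls for HLA-A on one sample
calls = {
    "OptiType": "A*02:01",
    "PHLAT": "A*02:01",
    "seq2HLA": "A*02:01",
    "HLAminer": "A*02:06",
    "HLA-VBSeq": "A*02:01",
}
allele, support = consensus_hla_call(calls)
print(allele, support)  # → A*02:01 0.8
```

A real ensemble would also weight tools by per-gene accuracy and handle heterozygous two-allele calls, but the voting core stays this simple.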
Authors:Guo S; Zhou Y, Zeng P, et al. First page: 188 Abstract: Tremendous differences between the human sexes are universally observed, so identifying and analyzing sex-biased genes is fundamentally important for uncovering the basis of sex differences and for personalized medicine. Here, we present a computational method to identify sex-biased genes from public gene expression databases. We obtained 1407 female-biased genes (FGs) and 1096 male-biased genes (MGs) across 14 different tissues. Bioinformatics analysis revealed that, compared with MGs, FGs have a higher evolutionary rate, higher single-nucleotide polymorphism density, fewer homologous genes and a smaller phyletic age. FGs have lower expression levels, higher tissue specificity and are expressed at later stages of body development. Moreover, FGs are highly involved in immune-related functions, whereas MGs are more enriched in metabolic processes. In addition, cellular network analysis revealed that MGs have higher degree, more cellular activating signaling and tend to be located in the cellular inner space, whereas FGs have lower degree, more cellular repressing signaling and tend to be located in the cellular outer space. Finally, the identified sex-biased genes and the discovered biological insights together form a valuable resource for investigating sex-biased physiology and medicine, for example sex-biased disease diagnosis and therapy, which represents one important aspect of personalized and precision medicine. PubDate: Wed, 21 Dec 2016 00:00:00 GMT DOI: 10.1093/bib/bbw125
Authors:Weirick T; Militello G, Ponomareva Y, et al. First page: 199 Abstract: To meet the increasing demand in the field, numerous long noncoding RNA (lncRNA) databases are available. Given that many lncRNAs are specifically expressed in certain cell types and/or in a time-dependent manner, most lncRNA databases fall short of providing such profiles. We developed a strategy using logic programming to handle the complex organization of organs, their tissues and cell types, as well as gender and developmental time points. To showcase this strategy, we introduce ‘RenalDB’ (http://renaldb.uni-frankfurt.de), a database providing expression profiles of RNAs in major organs with a focus on kidney tissues and cells. RenalDB uses logic programming to describe complex anatomy, sample metadata and logical relationships defining expression, enrichment or specificity. We validated the content of RenalDB with biological experiments and functionally characterized two long intergenic noncoding RNAs: LOC440173 is important for cell growth or cell survival, whereas PAXIP1-AS1 is a regulator of cell death. We anticipate RenalDB will be used as a first step toward functional studies of lncRNAs in the kidney. PubDate: Wed, 07 Dec 2016 00:00:00 GMT DOI: 10.1093/bib/bbw117
Authors:Mohammed Y; Palmblad M. First page: 210 Abstract: In mass spectrometry-based proteomics, peptides are typically identified from tandem mass spectra using spectrum comparison. A sequence search engine compares experimentally obtained spectra with those predicted from protein sequences, applying enzyme cleavage and fragmentation rules. There are two main alternatives to this: spectral libraries and de novo sequencing. The former compares measured spectra with a collection of previously acquired and identified spectra in a library. De novo sequencing attempts to derive peptide sequences from the tandem mass spectra alone. We here present a theoretical framework and a data processing workflow for visualizing and comparing the results of these different types of algorithms. The method considers the three search strategies as different dimensions, identifies distinct agreement classes and visualizes the complementarity of the search strategies. We have included X! Tandem, SpectraST and PepNovo, as they are in common use and representative of algorithms of each type. Our method allows advanced investigation of how the three search methods perform relative to each other and shows the impact of the currently used decoy sequences for evaluating false discovery rates. PubDate: Mon, 05 Dec 2016 00:00:00 GMT DOI: 10.1093/bib/bbw115
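The agreement classes this abstract describes can be illustrated with plain set logic over per-spectrum identifications. The sketch below is a toy reconstruction with invented spectrum identifiers and peptides, not the paper's actual workflow:

```python
from collections import defaultdict

def agreement_classes(ids_by_method):
    """Group spectra by which search methods agree on the peptide.

    `ids_by_method` maps method name -> {spectrum_id: peptide}.
    For each spectrum, the largest group of methods reporting the
    same peptide forms its agreement class; returns class -> count.
    """
    classes = defaultdict(int)
    spectra = set().union(*(d.keys() for d in ids_by_method.values()))
    for s in sorted(spectra):
        by_pep = defaultdict(set)  # peptide -> methods reporting it
        for method, ids in ids_by_method.items():
            if s in ids:
                by_pep[ids[s]].add(method)
        largest = max(by_pep.values(), key=len)
        classes[frozenset(largest)] += 1
    return dict(classes)

# Toy identifications: three spectra, three engines
xtandem   = {"s1": "PEPTIDER", "s2": "AAAGGGK"}
spectrast = {"s1": "PEPTIDER", "s2": "AAAGGGK"}
pepnovo   = {"s1": "PEPTIDER", "s2": "CCCGGGK", "s3": "DDDK"}
print(agreement_classes({"X!Tandem": xtandem,
                         "SpectraST": spectrast,
                         "PepNovo": pepnovo}))
```

Here "s1" lands in the full three-way agreement class, "s2" in the X!Tandem/SpectraST class, and "s3" (identifiable only de novo) in the PepNovo-only class, mirroring the kind of complementarity the paper visualizes.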
Authors:Wuyun Q; Zheng W, Peng Z, et al. First page: 219 Abstract: Sequence-based prediction of residue–residue contacts in proteins is becoming increasingly important for improving protein structure prediction in the big data era. In this study, we performed a large-scale comparative assessment of 15 locally installed contact predictors. To assess these methods, we collected a large data set consisting of 680 nonredundant proteins covering different structural classes and target difficulties. We investigated a wide range of factors that may influence the precision of contact prediction, including target difficulty, structural class, alignment depth and the distribution of contact pairs in a protein structure. We found that: (1) the machine learning-based methods outperform the direct-coupling-based methods for short-range contact prediction, while the latter are significantly better for long-range contact prediction; the consensus-based methods, which combine machine learning and direct-coupling methods, perform the best. (2) Target difficulty has no clear influence on the machine learning-based methods, while it significantly affects the direct-coupling- and consensus-based methods. (3) Alignment depth has a relatively weak effect on the machine learning-based methods; however, for the direct-coupling- and consensus-based methods, the predicted contacts for targets with deeper alignments tend to be more accurate. (4) All methods perform relatively better on β and α + β proteins than on α proteins. (5) Residues buried in the core of a protein structure are more prone to be in contact than residues on the surface (22% versus 6%). We believe these results are useful for guiding the future development of new approaches to contact prediction. PubDate: Mon, 31 Oct 2016 00:00:00 GMT DOI: 10.1093/bib/bbw106
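Contact-prediction precision of the kind assessed in this benchmark is conventionally computed over the top-L scored pairs at a minimum sequence separation. A minimal pure-Python sketch (the 4-residue example is invented purely for illustration):

```python
def topk_precision(scores, contacts, seq_sep=24, k=None):
    """Precision of the k highest-scoring residue pairs with j - i >= seq_sep.

    scores:   L x L matrix (list of lists) of predicted contact scores.
    contacts: L x L boolean matrix of true contacts.
    k defaults to L, the common 'top-L' convention; seq_sep=24 is a
    typical long-range cutoff.
    """
    L = len(scores)
    k = k or L
    pairs = [(scores[i][j], contacts[i][j])
             for i in range(L) for j in range(i + seq_sep, L)]
    pairs.sort(key=lambda p: -p[0])  # highest score first
    top = pairs[:k]
    return sum(1 for _, true in top if true) / len(top)

# Tiny illustration: of the two highest-scoring eligible pairs,
# one is a true contact -> precision 0.5
scores = [[0.0] * 4 for _ in range(4)]
contacts = [[False] * 4 for _ in range(4)]
scores[0][2], contacts[0][2] = 0.9, True
scores[0][3], contacts[0][3] = 0.8, False
scores[1][3], contacts[1][3] = 0.1, True
print(topk_precision(scores, contacts, seq_sep=2, k=2))  # → 0.5
```

Varying `seq_sep` separates short-, medium- and long-range evaluation, which is how benchmarks like this one distinguish where machine learning and direct-coupling methods each excel.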
Authors:Chen J; Guo M, Wang X, et al. First page: 231 Abstract: Protein remote homology detection is one of the most fundamental and central problems in the study of protein structures and functions, aiming to detect distant evolutionary relationships among proteins via computational methods. During the past decades, many computational approaches have been proposed for this important task, and they have made a substantial contribution to protein remote homology detection. It is therefore necessary to give a comprehensive review and comparison of these methods. In this article, we divide the computational approaches into three categories: alignment methods, discriminative methods and ranking methods. Their advantages and disadvantages are discussed from a comprehensive perspective, and their performance is compared on widely used benchmark data sets. Finally, some open questions in this field are further explored and discussed. PubDate: Sun, 13 Nov 2016 00:00:00 GMT DOI: 10.1093/bib/bbw108
Authors:Guo L; Liang T. First page: 245 Abstract: Multiple microRNA (miRNA) variant (isomiR) sequences have been identified at miRNA loci, suggesting that a miRNA locus yields not a single sequence but a series of isomiR sequences with sequence and expression heterogeneity. These isomiRs may be considered a large gene family with diverse expression patterns or a mini-gene cluster with high levels of sequence similarity. Although the isomiRs are diverse and have potentially coordinated relationships in regulatory networks via isomiR–isomiR interactions, they remain largely unstudied. External interactions with other RNAs also enrich the cross-talk across different RNA molecules. In the present study, we describe miRNAs/isomiRs, their interactions, and the relevant methods and platforms. Interactions with small RNAs may be an internal regulatory pattern and an effective means of achieving synergistic regulation, which provides a new angle from which to explore the small RNA world. PubDate: Fri, 23 Dec 2016 00:00:00 GMT DOI: 10.1093/bib/bbw124
Authors:Madani Tonekaboni S; Soltan Ghoraie L, Manem V, et al. First page: 263 Abstract: Drug combinations have been proposed as a promising therapeutic strategy to overcome drug resistance and improve the efficacy of monotherapy regimens in cancer. This strategy aims at targeting multiple components of this complex disease. Despite the increasing number of drug combinations in use, many of them were found empirically in the clinic, and the molecular mechanisms underlying these drug combinations are often unclear. These challenges call for rational, systematic approaches to drug combination discovery. Although high-throughput screening of single-agent therapeutics has been successfully implemented, it is not feasible to test all possible drug combinations, even for a reduced subset of anticancer drugs. Hence, in vitro and in vivo screening of a large number of drug combinations is not practical. Therefore, devising computational methods to efficiently explore the space of drug combinations and to discover efficacious combinations has attracted a lot of attention from the scientific community in the past few years. Nevertheless, in the absence of consensus regarding the computational approaches used to predict efficacious drug combinations, a plethora of methods, techniques and hypotheses have been developed to date, while the research field lacks an elaborate categorization of the existing computational methods and the available data sources. In this manuscript, we review and categorize the state-of-the-art computational approaches for drug combination prediction, and elaborate on the limitations of these methods and the existing challenges. We also discuss the recent pan-cancer drug combination data sets and their importance in revising the available methods or developing better-performing approaches. PubDate: Tue, 15 Nov 2016 00:00:00 GMT DOI: 10.1093/bib/bbw104
Authors:Paricharak S; Méndez-Lucio O, Chavan Ravindranath A, et al. First page: 277 Abstract: High-throughput screening (HTS) campaigns are routinely performed in pharmaceutical companies to explore activity profiles of chemical libraries for the identification of promising candidates for further investigation. With the aim of improving hit rates in these campaigns, data-driven approaches have been used to design relevant compound screening collections, enable effective hit triage and perform activity modeling for compound prioritization. Remarkable progress has been made in the activity modeling area since the recent introduction of large-scale bioactivity-based compound similarity metrics. This is evidenced by increased hit rates in iterative screening strategies and novel insights into compound mode of action obtained through activity modeling. Here, we provide an overview of the developments in data-driven approaches, elaborate on novel activity modeling techniques and screening paradigms explored and outline their significance in HTS. PubDate: Thu, 27 Oct 2016 00:00:00 GMT DOI: 10.1093/bib/bbw105
Authors:Manzoni C; Kia D, Vandrovcova J, et al. First page: 286 Abstract: Advances in the technologies and informatics used to generate and process large biological data sets (omics data) are promoting a critical shift in the study of biomedical sciences. While genomics, transcriptomics and proteomics, coupled with bioinformatics and biostatistics, are gaining momentum, they are still, for the most part, assessed individually with distinct approaches generating monothematic rather than integrated knowledge. As other areas of biomedical sciences, including metabolomics, epigenomics and pharmacogenomics, are moving towards the omics scale, we are witnessing the rise of inter-disciplinary data integration strategies to support a better understanding of biological systems and eventually the development of successful precision medicine. This review cuts across the boundaries between genomics, transcriptomics and proteomics, summarizing how omics data are generated, analysed and shared, and provides an overview of the current strengths and weaknesses of this global approach. This work targets students and researchers seeking knowledge outside of their field of expertise and fosters a leap from the reductionist to the global-integrative analytical approach in research. PubDate: Tue, 22 Nov 2016 00:00:00 GMT DOI: 10.1093/bib/bbw114
Authors:Mc Auley M; Mooney K, Salcedo-Sora J. First page: 303 Abstract: Dietary folates have a key role to play in health, as deficiencies in the intake of these B vitamins have been implicated in a wide variety of clinical conditions. The reason is that folates function as single-carbon donors in the synthesis of methionine and nucleotides. Moreover, folates have a vital role to play in the epigenetics of mammalian cells by supplying methyl groups for DNA methylation reactions. Intriguingly, a growing body of experimental evidence suggests that DNA methylation status could be a central modulator of the ageing process. This has important health implications because the methylation status of the human genome could be used to infer age-related disease risk. Thus, it is imperative that we further our understanding of the processes which underpin DNA methylation and how these intersect with folate metabolism and ageing. The biochemical and molecular mechanisms underpinning these processes are complex, but computational modelling offers an ideal framework for handling this complexity. A number of computational models have been assembled over the years, but to date, no model has represented the full scope of the interaction between the folate cycle and the reactions that govern the DNA methylation cycle. In this review, we discuss several of the models that have been developed to represent these systems. In addition, we present a rationale for developing a combined model of folate metabolism and the DNA methylation cycle. PubDate: Wed, 21 Dec 2016 00:00:00 GMT DOI: 10.1093/bib/bbw116
Authors:Pappalardo F; Rajput A, Motta S. First page: 318 Abstract: The central nervous system is the most complex network of the human body. The existence and functionality of a large number of molecular species in the human brain are still ambiguous and mostly unknown, posing a challenge to science and medicine. Neurological diseases inherit the same level of complexity, making effective treatments difficult to find. Multiple sclerosis (MS) is a major neurological disease that causes severe disabilities and imposes a significant social burden on health care systems: between 2 and 2.5 million people are affected by it, and the associated cost is significantly higher than for other neurological diseases because of the chronic nature of the disease and the partial efficacy of current therapies. Despite the difficulties in understanding and treating MS, many computational models have been developed to help neurologists. In the present work, we briefly review the main characteristics of MS and present selection criteria for modeling approaches. PubDate: Thu, 22 Dec 2016 00:00:00 GMT DOI: 10.1093/bib/bbw123
Authors:Li Y; Wu F, Ngom A. First page: 325 Abstract: Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in strong need of integrative machine learning models to make better use of vast volumes of heterogeneous information for the deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated in a learning system is a key step for successful analysis. In this article, we provide a comprehensive review of omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views into a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have the potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanisms of biological systems. PubDate: Fri, 09 Dec 2016 00:00:00 GMT DOI: 10.1093/bib/bbw113
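The similarity-fusion idea mentioned in this abstract (combining per-view similarity matrices into one) can be sketched in its crudest form as simple averaging; real methods such as similarity network fusion instead use iterative message passing, and the matrices below are invented toy data:

```python
def fuse_similarities(views):
    """Fuse per-view n x n similarity matrices by elementwise averaging.

    `views` is a list of similarity matrices (lists of lists), one per
    data view (e.g. expression, methylation, clinical). This is the
    simplest possible stand-in for kernel fusion.
    """
    n = len(views[0])
    return [[sum(v[i][j] for v in views) / len(views)
             for j in range(n)] for i in range(n)]

# Two hypothetical views over the same two samples
expr_sim = [[1.0, 0.0], [0.0, 1.0]]
meth_sim = [[1.0, 0.5], [0.5, 1.0]]
print(fuse_similarities([expr_sim, meth_sim]))  # → [[1.0, 0.25], [0.25, 1.0]]
```

The fused matrix can then feed any similarity-based learner (spectral clustering, kernel SVM), which is the appeal of this family of integration methods: the downstream algorithm never needs to know how many views there were.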
Authors:Blagus R; Goeman J. First page: 341 Abstract: When building classifiers, it is natural to require that the classifier correctly estimates the event probability (Constraint 1), that it has equal sensitivity and specificity (Constraint 2) or that it has equal positive and negative predictive values (Constraint 3). We prove that in the balanced case, where there are equal proportions of events and non-events, any classifier that satisfies one of these constraints will always satisfy all three. Such unbiasedness with respect to events and non-events is much more difficult to achieve in the case of rare events, i.e. the situation in which the proportion of events is (much) smaller than 0.5. Here, we prove that it is impossible to meet all three constraints unless the classifier achieves perfect predictions. Any non-perfect classifier can satisfy at most one constraint, and satisfying one constraint implies violating the other two in a specific direction. Our results have implications for classifiers optimized using g-means or the F1-measure, which tend to satisfy Constraints 2 and 1, respectively. Our results are derived from basic probability theory and illustrated with simulations based on some frequently used classifiers. PubDate: Wed, 16 Nov 2016 00:00:00 GMT DOI: 10.1093/bib/bbw107
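The tension between Constraints 2 and 3 under rare events can be checked numerically from Bayes' rule. A minimal sketch (the 0.9 sensitivity/specificity values are illustrative, not taken from the paper):

```python
def ppv_npv(sens, spec, prev):
    """Positive/negative predictive value from sensitivity,
    specificity and event prevalence, via Bayes' rule."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# Balanced case: equal sensitivity and specificity (Constraint 2)
# forces equal PPV and NPV (Constraint 3) -- both come out 0.9
print(ppv_npv(0.9, 0.9, 0.5))

# Rare events: Constraint 2 still holds, but PPV collapses to
# about 0.083 while NPV stays near 0.999, so Constraint 3 fails
print(ppv_npv(0.9, 0.9, 0.01))
```

This matches the paper's message: for a non-perfect classifier with prevalence far from 0.5, enforcing one constraint necessarily pushes the others off in a predictable direction.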