Bioinformatics
  [SJR: 4.643]   [H-I: 271]
   Hybrid journal (it can contain Open Access articles)
   ISSN (Print) 1367-4803 - ISSN (Online) 1460-2059
   Published by Oxford University Press  [370 journals]
  • Identifying simultaneous rearrangements in cancer genomes
    • Authors: Oesper L; Dantas S, Raphael B.
      Abstract: Motivation: The traditional view of cancer evolution states that a cancer genome accumulates a sequential ordering of mutations over a long period of time. However, in recent years it has been suggested that a cancer genome may instead undergo a one-time catastrophic event, such as chromothripsis, where a large number of mutations instead occur simultaneously. A number of potential signatures of chromothripsis have been proposed. In this work, we provide a rigorous formulation and analysis of the ‘ability to walk the derivative chromosome’ signature originally proposed by Korbel and Campbell. In particular, we show that this signature, as originally envisioned, may not always be present in a chromothripsis genome and we provide a precise quantification of under what circumstances it would be present. We also propose a variation on this signature, the H/T alternating fraction, which allows us to overcome some of the limitations of the original signature. Results: We apply our measure to both simulated data and a previously analyzed real cancer dataset and find that the H/T alternating fraction may provide useful signal for distinguishing genomes having acquired mutations simultaneously from those acquired in a sequential fashion. Availability and implementation: An implementation of the H/T alternating fraction is available at […]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 27 Nov 2017 00:00:00 GMT
  • Identifying structural variants using linked-read sequencing data
    • Authors: Elyanow R; Wu H, Raphael B.
      Abstract: Motivation: Structural variation, including large deletions, duplications, inversions, translocations and other rearrangements, is common in human and cancer genomes. A number of methods have been developed to identify structural variants from Illumina short-read sequencing data. However, reliable identification of structural variants remains challenging because many variants have breakpoints in repetitive regions of the genome and thus are difficult to identify with short reads. The recently developed linked-read sequencing technology from 10X Genomics combines a novel barcoding strategy with Illumina sequencing. This technology labels all reads that originate from a small number (∼5 to 10) of DNA molecules ∼50 Kbp in length with the same molecular barcode. These barcoded reads contain long-range sequence information that is advantageous for identification of structural variants. Results: We present Novel Adjacency Identification with Barcoded Reads (NAIBR), an algorithm to identify structural variants in linked-read sequencing data. NAIBR predicts novel adjacencies in an individual genome resulting from structural variants using a probabilistic model that combines multiple signals in barcoded reads. We show that NAIBR outperforms several existing methods for structural variant identification (including two recent methods that also analyze linked-reads) on simulated sequencing data and 10X whole-genome sequencing data from the NA12878 human genome and the HCC1954 breast cancer cell line. Several of the novel somatic structural variants identified in HCC1954 overlap known cancer genes. Availability and implementation: Software is available at […]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 03 Nov 2017 00:00:00 GMT
  • Identification of copy number variations and translocations in cancer
           cells from Hi-C data
    • Authors: Chakraborty A; Ay F.
      Abstract: Motivation: Eukaryotic chromosomes adopt a complex and highly dynamic three-dimensional (3D) structure, which profoundly affects different cellular functions and outcomes including changes in epigenetic landscape and in gene expression. Making the scenario even more complex, cancer cells harbor chromosomal abnormalities [e.g. copy number variations (CNVs) and translocations] altering their genomes both at the sequence level and at the level of 3D organization. High-throughput chromosome conformation capture techniques (e.g. Hi-C), which were originally developed for decoding the 3D structure of the chromatin, provide a great opportunity to simultaneously identify the locations of genomic rearrangements and to investigate the 3D genome organization in cancer cells. Even though Hi-C data has been used for validating known rearrangements, computational methods that can distinguish rearrangement signals from the inherent biases of Hi-C data and from the actual 3D conformation of chromatin, and can precisely detect rearrangement locations de novo, have been missing. Results: In this work, we characterize how intra- and inter-chromosomal Hi-C contacts are distributed for normal and rearranged chromosomes to devise a new set of algorithms (i) to identify genomic segments that correspond to CNV regions such as amplifications and deletions (HiCnv), (ii) to call inter-chromosomal translocations and their boundaries (HiCtrans) from Hi-C experiments and (iii) to simulate Hi-C data from genomes with desired rearrangements and abnormalities (AveSim) in order to select optimal parameters for and to benchmark the accuracy of our methods. Our results on 10 different cancer cell lines with Hi-C data show that we identify a total of 105 amplifications and 45 deletions together with 90 translocations, whereas we identify virtually no such events for two karyotypically normal cell lines. Our CNV predictions correlate very well with whole genome sequencing data among chromosomes with CNV events for a breast cancer cell line (r = 0.89) and capture most of the CNVs we simulate using AveSim. For HiCtrans predictions, we report evidence from the literature for 30 out of 90 translocations for eight of our cancer cell lines. Furthermore, we show that our tools identify and correctly classify relatively understudied rearrangements such as double minutes and homogeneously staining regions. Considering the inherent limitations of existing techniques for karyotyping (i.e. missing balanced rearrangements and those near repetitive regions), the accurate identification of CNVs and translocations in a cost-effective and high-throughput setting is still a challenge. Our results show that the set of tools we develop effectively utilizes moderately sequenced Hi-C libraries (100–300 million reads) to identify known and de novo chromosomal rearrangements/abnormalities in well-established cancer cell lines. With the decrease in required number of cells and the increase in attainable resolution, we believe that our framework will pave the way towards comprehensive mapping of genomic rearrangements in primary cells from cancer patients using Hi-C. Availability and implementation: CNV calling: […], Translocation calling: […] and Hi-C simulation: […]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 18 Oct 2017 00:00:00 GMT
  • CCmiR: a computational approach for competitive and cooperative microRNA
           binding prediction
    • Authors: Ding J; Li X, Hu H.
      Abstract: Motivation: The identification of microRNA (miRNA) target sites is important. In the past decade, dozens of computational methods have been developed to predict miRNA target sites. Despite their existence, rarely does a method consider the well-known competition and cooperation among miRNAs when attempting to discover target sites. To fill this gap, we developed a new approach called CCmiR, which takes the cooperation and competition of multiple miRNAs into account in a statistical model to predict their target sites. Results: Tested on four different datasets, CCmiR predicted miRNA target sites with a high recall and a reasonable precision, and identified known and new cooperative and competitive miRNAs supported by literature. Compared with three state-of-the-art computational methods, CCmiR had a higher recall and a higher precision. Availability and implementation: CCmiR is freely available at […]. Contact: […] or haihu@cs.ucf.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 25 Sep 2017 00:00:00 GMT
  • Phandango: an interactive viewer for bacterial population genomics
    • Authors: Hadfield J; Croucher N, Goater R, et al.
      Abstract: Summary: Fully exploiting the wealth of data in current bacterial population genomics datasets requires synthesizing and integrating different types of analysis across millions of base pairs in hundreds or thousands of isolates. Current approaches often use static representations of phylogenetic, epidemiological, statistical and evolutionary analysis results that are difficult to relate to one another. Phandango is an interactive application running in a web browser allowing fast exploration of large-scale population genomics datasets combining the output from multiple genomic analysis methods in an intuitive and interactive manner. Availability and implementation: Phandango is a web application freely available for use at […] and includes a diverse collection of datasets as examples. Source code together with a detailed wiki page is available on GitHub at […].
      PubDate: Mon, 25 Sep 2017 00:00:00 GMT
  • RINspector: a Cytoscape app for centrality analyses and DynaMine
           flexibility prediction
    • Authors: Brysbaert G; Lorgouilloux K, Vranken W, et al.
      Abstract: Motivation: Protein function is directly related to amino acid residue composition and the dynamics of these residues. Centrality analyses based on residue interaction networks make it possible to identify key residues in a protein that are important for its fold or function. Such central residues and their environment constitute suitable targets for mutagenesis experiments. Predicted flexibility and changes in flexibility upon mutation provide valuable additional information for the design of such experiments. Results: We combined centrality analyses with DynaMine flexibility predictions in a Cytoscape app called RINspector. The app performs centrality analyses and directly visualizes the results on a graph of predicted residue flexibility. In addition, the effect of mutations on local flexibility can be calculated. Availability and implementation: The app is publicly available in the Cytoscape app store. Contact: guillaume.brysbaert@univ-lille1.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 22 Sep 2017 00:00:00 GMT
  • TROVE: a user-friendly tool for visualizing and analyzing cancer hallmarks
           in signaling networks
    • Authors: Chua H; Bhowmick S, Zheng J.
      Abstract: Summary: Cancer hallmarks, a concept that seeks to explain the complexity of cancer initiation and development, provide a new perspective for studying cancer signaling, which could lead to a greater understanding of this complex disease. However, to the best of our knowledge, there is currently a lack of tools that support such hallmark-based study of the cancer signaling network, thereby impeding the gain of knowledge in this area. We present TROVE, a user-friendly software tool that facilitates hallmark annotation, visualization and analysis in cancer signaling networks. In particular, TROVE facilitates hallmark analysis specific to particular cancer types. Availability and implementation: Available under the Eclipse Public License from: […] and […].
      PubDate: Fri, 22 Sep 2017 00:00:00 GMT
  • HoTResDB: host transcriptional response database for viral hemorrhagic
    • Authors: Lo J; Zhang D, Speranza E, et al.
      Abstract: Summary: High-throughput screening of the host transcriptional response to various viral infections provides a wealth of data, but utilization of microarray and next generation sequencing (NGS) data for analysis can be difficult. The Host Transcriptional Response DataBase (HoTResDB) allows visitors to access already processed microarray and NGS data from non-human primate models of viral hemorrhagic fever to better understand the host transcriptional response. Availability: HoTResDB is freely available at […].
      PubDate: Fri, 22 Sep 2017 00:00:00 GMT
  • Detecting presence of mutational signatures in cancer with confidence
    • Authors: Huang X; Wojtowicz D, Przytycka T.
      Abstract: Motivation: Cancers arise as the result of somatically acquired changes in the DNA of cancer cells. However, in addition to the mutations that confer a growth advantage, cancer genomes accumulate a large number of somatic mutations resulting from normal DNA damage and repair processes as well as carcinogenic exposures or cancer related aberrations of DNA maintenance machinery. These mutagenic processes often produce characteristic mutational patterns called mutational signatures. The decomposition of a cancer genome’s mutation catalog into mutations consistent with such signatures can provide valuable information about cancer etiology. However, the results from different decomposition methods are not always consistent. Hence, one needs to be able to not only decompose a patient’s mutational profile into signatures but also establish the accuracy of such decomposition. Results: We proposed two complementary ways of measuring confidence and stability of decomposition results and applied them to analyze mutational signatures in breast cancer genomes. We identified both very stable and highly unstable signatures, as well as signatures that previously have not been associated with breast cancer. We also provided additional support for the novel signatures. Our results emphasize the importance of assessing the confidence and stability of inferred signature contributions. Availability and implementation: All tools developed in this paper have been implemented in an R package, called SignatureEstimation, which is available from […]. Contact: […] or przytyck@ncbi.nlm.nih.gov. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 22 Sep 2017 00:00:00 GMT
  • IntPred: a structure-based predictor of protein–protein interaction
    • Authors: Northey T; Barešić A, Martin A.
      Abstract: Motivation: Protein–protein interactions are vital for protein function, with the average protein having between three and ten interacting partners. Knowledge of precise protein–protein interfaces comes from crystal structures deposited in the Protein Data Bank (PDB), but only 50% of structures in the PDB are complexes. There is therefore a need to predict protein–protein interfaces in silico, and various methods exist for this purpose. Here we explore the use of a predictor that is based on structural features and exploits random forest machine learning, comparing its performance with a number of popular established methods. Results: On an independent test set of obligate and transient complexes, our IntPred predictor performs well (MCC = 0.370, ACC = 0.811, SPEC = 0.916, SENS = 0.411) and compares favourably with other methods. Overall, IntPred ranks second of six methods tested, with SPPIDER having slightly better overall performance (MCC = 0.410, ACC = 0.759, SPEC = 0.783, SENS = 0.676) but considerably worse specificity than IntPred. As with SPPIDER, using an independent test set of obligate complexes enhanced performance (MCC = 0.381), while performance is somewhat reduced on a dataset of transient complexes (MCC = 0.303). The trade-off between sensitivity and specificity compared with SPPIDER suggests that the choice of the appropriate tool is application-dependent. Availability and implementation: IntPred is implemented in Perl and may be downloaded for local use or run via a web server at […]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 18 Sep 2017 00:00:00 GMT
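The four figures quoted for IntPred and SPPIDER (MCC, ACC, SPEC, SENS) all derive from a single confusion matrix. As a minimal sketch using the standard textbook definitions, and not the authors' evaluation code, they can be computed as below; the function name `binary_metrics` and the example counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return MCC, accuracy, specificity and sensitivity from confusion counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives (e.g. interface vs non-interface residues).
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    sens = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate (recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"MCC": mcc, "ACC": acc, "SPEC": spec, "SENS": sens}


if __name__ == "__main__":
    # Hypothetical counts: a conservative predictor with high specificity
    # and low sensitivity, the same qualitative profile reported for IntPred.
    print(binary_metrics(tp=40, fp=10, tn=90, fn=60))
```

Because MCC uses all four cells of the matrix, it can rank two predictors differently from accuracy alone, which is consistent with SPPIDER having the higher MCC but the lower ACC in the abstract above.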
  • A novel SCCA approach via truncated ℓ1-norm and truncated group lasso
           for brain imaging genetics
    • Authors: Du L; Liu K, Zhang T, et al.
      Abstract: Motivation: Brain imaging genetics, which studies the linkage between genetic variations and structural or functional measures of the human brain, has become increasingly important in recent years. Discovering the bi-multivariate relationship between genetic markers such as single-nucleotide polymorphisms (SNPs) and neuroimaging quantitative traits (QTs) is one major task in imaging genetics. Sparse Canonical Correlation Analysis (SCCA) has been a popular technique in this area for its powerful capability in identifying bi-multivariate relationships coupled with feature selection. The existing SCCA methods impose either the ℓ1-norm or its variants to induce sparsity. The ℓ0-norm penalty is a perfect sparsity-inducing tool which, however, leads to an NP-hard problem. Results: In this paper, we propose the truncated ℓ1-norm penalized SCCA to improve the performance and effectiveness of the ℓ1-norm based SCCA methods. In addition, we propose an efficient optimization algorithm to solve this novel SCCA problem. The proposed method is an adaptive shrinkage method via tuning τ. It can avoid time-intensive parameter tuning if given a reasonably small τ. Furthermore, we extend it to the truncated group-lasso (TGL), and propose the TGL-SCCA model to improve the group-lasso-based SCCA methods. The experimental results, compared with four benchmark methods, show that our SCCA methods identify better or similar correlation coefficients, and better canonical loading profiles than the competing methods. This demonstrates the effectiveness and efficiency of our methods in discovering interesting imaging genetic associations. Availability and implementation: The Matlab code and sample data are freely available at […]∼shenlab/tools/tlpscca/. Contact: […] or shenli@iu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 18 Sep 2017 00:00:00 GMT
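To illustrate why a truncated ℓ1 penalty sits between the ℓ1-norm and the ℓ0-norm, the sketch below evaluates one commonly used form, J_τ(w) = Σ min(|w_i|/τ, 1). This form is an assumption made for illustration; the abstract does not state the exact penalty used in the paper, and the function name `truncated_l1` is hypothetical.

```python
def truncated_l1(weights, tau):
    """Truncated l1 penalty: sum over i of min(|w_i| / tau, 1).

    Assumed form for illustration only. Entries with |w_i| >= tau each
    contribute exactly 1 (as in an l0 count); smaller entries are
    penalized linearly (as in a scaled l1-norm).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    return sum(min(abs(w) / tau, 1.0) for w in weights)


if __name__ == "__main__":
    w = [0.0, 0.05, 2.0]
    # With a moderate tau, the small weight is only partly penalized.
    print(truncated_l1(w, tau=0.1))
    # With a very small tau, the penalty simply counts non-zero entries.
    print(truncated_l1(w, tau=1e-6))
```

As τ shrinks, every non-zero weight saturates at 1 and the penalty approaches the number of non-zero entries, which is the sense in which a small τ approximates the ℓ0-norm.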
  • Meta-server for automatic analysis, scoring and ranking of docking models
    • Authors: Anashkina A; Kravatsky Y, Kuznetsov E, et al.
      Abstract: Motivation: Modelling with multiple servers that use different algorithms for docking results in more reliable predictions of interaction sites. However, the scoring and comparison of all models by an expert is time-consuming and is not feasible for large volumes of data generated by such modelling. Results: Quality ASsessment of DOcking Models (QASDOM) Server is a simple and efficient tool for real-time simultaneous analysis, scoring and ranking of data sets of receptor–ligand complexes built by a range of docking techniques. This meta-server is designed to analyse large data sets of docking models and rank them by scoring criteria developed in this study. It produces two types of output showing the likelihood of specific residues and clusters of residues to be involved in receptor–ligand interactions and the ranking of models. The server also allows visualizing residues that form interaction sites in the receptor and ligand sequence and displays 3D model structures of the receptor–ligand complexes. Availability: […]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 18 Sep 2017 00:00:00 GMT
  • MemBrain-contact 2.0: a new two-stage machine learning model for the
           prediction enhancement of transmembrane protein residue contacts in the
           full chain
    • Authors: Yang J; Shen H.
      Abstract: Motivation: Inter-residue contacts in proteins have been widely acknowledged to be valuable for protein 3D structure prediction. Accurate prediction of long-range transmembrane inter-helix residue contacts can significantly improve the quality of simulated membrane protein models. Results: In this paper, we present an updated MemBrain predictor, which aims to predict transmembrane protein residue contacts. Our new model benefits from an efficient learning algorithm that can mine latent structural features that exist in the original feature space. The new MemBrain is a two-stage inter-helix contact predictor. The first stage takes sequence-based features as inputs and outputs coarse contact probabilities for each residue pair. In the second stage, these probabilities are fed into a convolutional neural network together with predictions from three direct-coupling analysis approaches. Experimental results on the training dataset show that our method achieves an average accuracy of 81.6% for the top L/5 predictions using a strict sequence-based jackknife cross-validation. Evaluated on the test dataset, MemBrain can achieve 79.4% prediction accuracy. Moreover, for the top L/5 predicted long-range loop contacts, the prediction performance can reach an accuracy of 56.4%. These results demonstrate that the new MemBrain is promising for transmembrane protein contact map prediction. Availability and implementation: […]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 15 Sep 2017 00:00:00 GMT
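The "top L/5" accuracy quoted above is a standard contact-prediction metric: rank all predicted residue pairs by score, keep the best L/5 (L being the sequence length), and report the fraction that are true contacts. The following is a minimal sketch of that metric under the usual definition, not the MemBrain evaluation code; the function name, the dictionary layout and the toy data are hypothetical.

```python
def top_l_over_k_accuracy(scores, true_contacts, seq_len, k=5):
    """Fraction of the top (seq_len // k) scored residue pairs that are true contacts.

    scores        -- dict mapping (i, j) residue pairs to predicted contact probability
    true_contacts -- set of (i, j) pairs observed in the reference structure
    seq_len       -- sequence length L
    """
    n_top = max(1, seq_len // k)
    # Highest-scoring pairs first, truncated to the top L/k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_top]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


if __name__ == "__main__":
    # Toy example: L = 10, so the top L/5 = 2 predictions are evaluated.
    predicted = {(1, 8): 0.9, (2, 7): 0.8, (3, 6): 0.1}
    observed = {(1, 8), (3, 6)}
    print(top_l_over_k_accuracy(predicted, observed, seq_len=10))
```

Only the highest-confidence predictions are scored, so the metric rewards a predictor for ranking true contacts first rather than for recovering every contact.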
  • MetExploreViz: web component for interactive metabolic network
    • Authors: Chazalviel M; Frainay C, Poupin N, et al.
      Abstract: Summary: MetExploreViz is an open source web component that can be easily embedded in any web site. It provides features dedicated to the visualization of metabolic networks and pathways and thus offers a flexible solution to analyse omics data in a biochemical context. Availability and implementation: Documentation and link to GIT code repository (GPL 3.0 license) are available at this URL: […].
      PubDate: Fri, 15 Sep 2017 00:00:00 GMT
  • SINCERITIES: inferring gene regulatory networks from time-stamped single
           cell transcriptional expression profiles
    • Authors: Papili Gao N; Ud-Dean S, Gandrillon O, et al.
      Abstract: Motivation: Single cell transcriptional profiling opens up a new avenue in studying the functional role of cell-to-cell variability in physiological processes. The analysis of single cell expression profiles creates new challenges due to the distributive nature of the data and the stochastic dynamics of the gene transcription process. The reconstruction of gene regulatory networks (GRNs) using single cell transcriptional profiles is particularly challenging, especially when directed gene-gene relationships are desired. Results: We developed SINCERITIES (SINgle CEll Regularized Inference using TIme-stamped Expression profileS) for the inference of GRNs from single cell transcriptional profiles. We focused on time-stamped cross-sectional expression data, commonly generated from transcriptional profiling of single cells collected at multiple time points after cell stimulation. SINCERITIES recovers directed regulatory relationships among genes by employing regularized linear regression (ridge regression), using temporal changes in the distributions of gene expressions. Meanwhile, the modes of the gene regulations (activation and repression) come from partial correlation analyses between pairs of genes. We demonstrated the efficacy of SINCERITIES in inferring GRNs using in silico time-stamped single cell expression data and single cell transcriptional profiles of THP-1 monocytic human leukemia cells. The case studies showed that SINCERITIES could provide accurate GRN predictions, significantly better than other GRN inference algorithms such as TSNI, GENIE3 and JUMP3. Moreover, SINCERITIES has a low computational complexity and is amenable to problems of extremely large dimensionality. Finally, an application of SINCERITIES to single cell expression data of T2EC chicken erythrocytes pointed to BATF as a candidate novel regulator of erythroid development. Availability and implementation: MATLAB and R versions of SINCERITIES are freely available from the following websites: […] and […]. The single cell THP-1 and T2EC transcriptional profiles are available from the original publications (Kouno et al., 2013; Richard et al., 2016). The in silico single cell data are available on the SINCERITIES websites. Contact: rudi.gunawan@chem.ethz.ch. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 14 Sep 2017 00:00:00 GMT
  • JEPEGMIX2: improved gene-level joint analysis of eQTLs in cosmopolitan
    • Authors: Chatzinakos C; Lee D, Webb B, et al.
      Abstract: Motivation: To increase detection power, researchers use gene-level analysis methods to aggregate weak marker signals. Because gene expression controls biological processes, researchers have proposed aggregating signals for expression Quantitative Trait Loci (eQTL). Most gene-level eQTL methods make statistical inferences based on (i) summary statistics from genome-wide association studies (GWAS) and (ii) linkage disequilibrium patterns from a relevant reference panel. While most such tools assume homogeneous cohorts, our Gene-level Joint Analysis of functional SNPs in Cosmopolitan Cohorts (JEPEGMIX) method accommodates cosmopolitan cohorts by using heterogeneous panels. However, JEPEGMIX relies on brain eQTLs from older gene expression studies and does not adjust for background enrichment in GWAS signals. Results: We propose JEPEGMIX2, an extension of JEPEGMIX. Compared with JEPEGMIX, it uses (i) cis-eQTL SNPs from the latest expression studies and (ii) brain-specific (sub)tissues as well as tissues other than brain. JEPEGMIX2 also (i) avoids accumulating averagely enriched polygenic information by adjusting for background enrichment and (ii) outputs gene q-values based on Holm adjustment of P-values, to avoid an increase in false positive rates for studies with numerous highly enriched (above the background) genes. Availability and implementation: […]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 14 Sep 2017 00:00:00 GMT
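The Holm adjustment mentioned above is the standard step-down procedure: sort the P-values, multiply the i-th smallest by (m - i + 1), cap at 1, and enforce monotonicity. The sketch below shows only that generic procedure; it is not taken from JEPEGMIX2, and the function name `holm_adjust` and the example values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment, returned in the original input order.

    For the P-value of rank r (0-based, ascending) among m tests, the adjusted
    value is the running maximum of min(1, (m - r) * p) over ranks 0..r.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values non-decreasing
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Three hypothetical gene-level P-values.
    print(holm_adjust([0.01, 0.04, 0.03]))
```

Holm controls the family-wise error rate and is never less powerful than the plain Bonferroni correction, since only the smallest P-value is multiplied by the full number of tests.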
  • SNPDelScore: combining multiple methods to score deleterious effects of
           noncoding mutations in the human genome
    • Authors: Alvarez R; Li S, Landsman D, et al.
      Abstract: Summary: Addressing deleterious effects of noncoding mutations is an essential step towards the identification of disease-causal mutations of gene regulatory elements. Several methods for quantifying the deleteriousness of noncoding mutations using artificial intelligence, deep learning and other approaches have been recently proposed. Although the majority of the proposed methods have demonstrated excellent accuracy on different test sets, there is rarely a consensus. In addition, advanced statistical and artificial learning approaches used by these methods make it difficult to port these methods outside of the labs that have developed them. To address these challenges and to transform the methodological advances in predicting deleterious noncoding mutations into a practical resource available for the broader functional genomics and population genetics communities, we developed SNPDelScore, which uses a panel of proposed methods for quantifying deleterious effects of noncoding mutations to precompute and compare the deleteriousness scores of all common SNPs in the human genome in 44 cell lines. The panel of deleteriousness scores of a SNP computed using different methods is supplemented by functional information from the GWAS Catalog, libraries of transcription factor-binding sites, and genic characteristics of mutations. SNPDelScore comes with a genome browser capable of displaying and comparing large sets of SNPs in a genomic locus and rapidly identifying consensus SNPs with the highest deleteriousness scores, making those prime candidates for phenotype-causal polymorphisms. Availability and implementation: […]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 14 Sep 2017 00:00:00 GMT
  • SiNoPsis: Single Nucleotide Polymorphisms selection and promoter profiling
    • Authors: Boloc D; Rodríguez N, Gassó P, et al.
      Abstract: Motivation: The selection of a single nucleotide polymorphism (SNP) using bibliographic methods can be a very time-consuming task. Moreover, a SNP selected in this way may not be easily visualized in its genomic context by a standard user hoping to correlate it with other valuable information. Here we propose a web form built on top of Circos that can assist SNP-centered screening, based on their location in the genome and the regulatory modules they can disrupt. Its use may allow researchers to prioritize SNPs in genotyping and disease studies. Results: SiNoPsis is bundled as a web portal. It focuses on the different structures involved in the genomic expression of a gene, especially those found in the core promoter upstream region. These structures include transcription factor binding sites (for promoter and enhancer signals), histones and promoter flanking regions. Additionally, the tool provides eQTL and linkage disequilibrium (LD) properties for a given SNP query, yielding further clues about other indirectly associated SNPs. Possible disruptions of the aforementioned structures affecting gene transcription are reported using multiple resource databases. SiNoPsis has a simple user-friendly interface, which allows single queries by gene symbol, genomic coordinates, Ensembl gene identifiers, RefSeq transcript identifiers and SNPs. It is the only portal providing useful SNP selection based on regulatory modules and LD with functional variants in both textual and graphic modes (by properly defining the arguments and parameters needed to run Circos). Availability and implementation: SiNoPsis is freely available at […]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 14 Sep 2017 00:00:00 GMT
  • web-based topic modelling for substructure discovery in mass
    • Authors: Wandy J; Zhu Y, van der Hooft J, et al.
      Abstract: Motivation: We recently published MS2LDA, a method for the decomposition of sets of molecular fragment data derived from large metabolomics experiments. To make the method more widely available to the community, here we present […], a web application that allows users to upload their data, run MS2LDA analyses and explore the results through interactive […]. […] takes tandem mass spectrometry data in many standard formats and allows the user to infer the sets of fragment and neutral loss features that co-occur (Mass2Motifs). As an alternative workflow, the user can also decompose a data set onto predefined Mass2Motifs. This is accomplished through the web interface or programmatically from our web service. Availability and implementation: The website can be found at […], while the source code is available at […] under the MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 14 Sep 2017 00:00:00 GMT
  • LinkageMapView—rendering high-resolution linkage and QTL maps
    • Authors: Ouellette L; Reid R, Blanchard S, et al.
      Abstract: Motivation: Linkage and quantitative trait loci (QTL) maps are critical tools for the study of the genetic basis of complex traits. With the advances in sequencing technology over the past decade, linkage map densities have been increasing dramatically, while the visualization tools have not kept pace. LinkageMapView is a free add-on package written in R that produces high resolution, publication-ready visualizations of linkage and QTL maps. While there is software available to generate linkage map graphics, none is at once freely available, open source, able to produce publication-quality figures and able to run on all platforms. LinkageMapView can be integrated into map building pipelines as it seamlessly incorporates output from R/qtl and also accepts simple text or comma delimited files. There are numerous options within the package to build highly customizable maps, allow for linkage group comparisons, and annotate QTL regions. Availability and implementation: […].
      PubDate: Wed, 13 Sep 2017 00:00:00 GMT
  • A utility maximizing and privacy preserving approach for protecting
           kinship in genomic databases
    • Authors: Kale G; Ayday E, Tastan O.
      Abstract: Motivation: Rapid and low-cost sequencing of genomes has enabled widespread use of genomic data in research studies and personalized customer applications, where genomic data is shared in public databases. Although the identities of the participants are anonymized in these databases, sensitive information about individuals can still be inferred. One such piece of information is kinship. Results: We define two routes through which kinship privacy can leak and propose a technique to protect kinship privacy against these risks while maximizing the utility of shared data. The method involves systematic identification of minimal portions of genomic data to mask as new participants are added to the database. Choosing the proper positions to hide is cast as an optimization problem in which the number of positions to mask is minimized subject to privacy constraints that ensure the familial relationships are not revealed. We evaluate the proposed technique on real genomic data. Results indicate that concurrent sharing of data pertaining to a parent and an offspring results in high risks to kinship privacy, whereas sharing data from more distant relatives together is often safer. We also show that the arrival order of family members has a high impact on the level of privacy risks and on the utility of sharing data. Availability and implementation: [...] Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 12 Sep 2017 00:00:00 GMT
  • NetProphet 2.0: mapping transcription factor networks by exploiting
           scalable data resources
    • Authors: Kang Y; Liow H, Maier E, et al.
      Abstract: Motivation: Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and ‘integrative’ algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types. Results: We present NetProphet 2.0, a ‘data light’ algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map. Availability and implementation: Source code and comprehensive documentation are freely available at [...]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 12 Sep 2017 00:00:00 GMT
  • Omics AnalySIs System for PRecision Oncology (OASISPRO): a web-based omics
           analysis tool for clinical phenotype prediction
    • Authors: Yu K; Fitzpatrick M, Pappas L, et al.
      Abstract: Summary: Precision oncology is an approach that accounts for individual differences to guide cancer management. Omics signatures have been shown to predict clinical traits for cancer patients. However, the vast amount of omics information poses an informatics challenge in systematically identifying patterns associated with health outcomes, and no general purpose data mining tool exists for physicians, medical researchers and citizen scientists without significant training in programming and bioinformatics. To bridge this gap, we built the Omics AnalySIs System for PRecision Oncology (OASISPRO), a web-based system to mine the quantitative omics information from The Cancer Genome Atlas (TCGA). This system effectively visualizes patients’ clinical profiles, executes machine-learning algorithms of choice on the omics data and evaluates the prediction performance using held-out test sets. With this tool, we successfully identified genes strongly associated with tumor stage, and accurately predicted patients’ survival outcomes in many cancer types, including adrenocortical carcinoma. By identifying the links between omics and clinical phenotypes, this system will facilitate omics studies on precision cancer medicine and contribute to establishing personalized cancer treatment plans. Availability and implementation: This web-based tool is available at [...]; source codes are available at [...]. Contact: [...] or mpsnyder@stanford.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 12 Sep 2017 00:00:00 GMT
  • MAJIQ-SPEL: web-tool to interrogate classical and complex splicing
           variations from RNA-Seq data
    • Authors: Green C; Gazzara M, Barash Y.
      Abstract: Summary: Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret and experimentally validate. To address these challenges we developed MAJIQ-SPEL, a web-tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of the gene isoforms associated with them. Importantly, MAJIQ-SPEL is able to handle both classical (binary) and complex (non-binary) splicing variations. Using a matching primer design algorithm, it also suggests possible primers for experimental validation by RT-PCR and displays them, along with the matching protein domains affected by the LSV, on the UCSC Genome Browser for further downstream analysis. Availability and implementation: Program and code will be available at [...]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 11 Sep 2017 00:00:00 GMT
  • DART: a fast and accurate RNA-seq mapper with a partitioning strategy
    • Authors: Lin H; Hsu W.
      Abstract: Motivation: In recent years, massively parallel cDNA sequencing (RNA-Seq) technologies have become a powerful tool for high-resolution measurement of expression and for detecting low-abundance transcripts with high sensitivity. However, analyzing RNA-seq data requires a huge amount of computational effort. The most fundamental and critical step is to align each sequence fragment against the reference genome. Various de novo spliced RNA aligners have been developed in recent years. Though these aligners can handle spliced alignment and detect splice junctions, some challenges remain to be solved. With the advances in sequencing technologies and the ongoing collection of sequencing data in the ENCODE project, more efficient alignment algorithms are in high demand. Most read mappers follow the conventional seed-and-extend strategy to deal with inexact matches in sequence alignment. However, the extension step is much more time-consuming than the seeding step. Results: We propose a novel RNA-seq de novo mapping algorithm, called DART, which adopts a partitioning strategy to avoid the extension step. Experimental results on synthetic datasets and real NGS datasets showed that DART is a highly efficient aligner that yields the highest or comparable sensitivity and accuracy compared to most state-of-the-art aligners and, more importantly, spends the least amount of time among the selected aligners. Availability and implementation: [...] Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 05 Sep 2017 00:00:00 GMT
  • CCFold: rapid and accurate prediction of coiled-coil structures and
           application to modelling intermediate filaments
    • Authors: Guzenko D; Strelkov S.
      Abstract: Motivation: Accurate molecular structure of the protein dimer representing the elementary building block of intermediate filaments (IFs) is essential towards the understanding of the filament assembly, rationalizing their mechanical properties and explaining the effect of disease-related IF mutations. The dimer contains a ∼300-residue long α-helical coiled coil which cannot be assessed by either direct experimental structure determination or modelling using standard approaches. At the same time, coiled coils are well-represented in structural databases. Results: Here we present CCFold, a generally applicable threading-based algorithm which produces coiled-coil models from protein sequence only. The algorithm is based on a statistical analysis of experimentally determined structures and can handle any hydrophobic repeat patterns in addition to the most common heptads. We demonstrate that CCFold outperforms general-purpose computational folding in terms of accuracy, while being faster by orders of magnitude. By combining the CCFold algorithm and Rosetta folding we generate representative dimer models for all IF protein classes. Availability and implementation: The source code is freely available at [...]; a web server to run the program is at [...]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 04 Sep 2017 00:00:00 GMT
  • A graph regularized non-negative matrix factorization method for
           identifying microRNA-disease associations
    • Authors: Xiao Q; Luo J, Liang C, et al.
      Abstract: Motivation: MicroRNAs (miRNAs) play crucial roles in post-transcriptional regulation and various cellular processes. The identification of disease-related miRNAs provides great insights into the underlying pathogenesis of diseases at a system level. However, most existing computational approaches are biased towards known miRNA-disease associations, which makes them unsuitable for new diseases or miRNAs without any known association information. Results: In this study, we propose a new method with graph regularized non-negative matrix factorization in heterogeneous omics data, called GRNMF, to discover potential associations between miRNAs and diseases, especially for new diseases and miRNAs or those diseases and miRNAs with sparse known associations. First, we integrate the disease semantic information and miRNA functional information to estimate disease similarity and miRNA similarity, respectively. Considering that there is no available interaction observed for new diseases or miRNAs, a preprocessing step is developed to construct the interaction score profiles that will assist in prediction. Next, a graph regularized non-negative matrix factorization framework is utilized to simultaneously identify potential associations for all diseases. The results indicated that our proposed method can effectively prioritize disease-associated miRNAs with higher accuracy compared with other recent approaches. Moreover, case studies also demonstrated the effectiveness of GRNMF in inferring unknown miRNA-disease associations for novel diseases and miRNAs. Availability and implementation: The code of GRNMF is freely available at [...]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 01 Sep 2017 00:00:00 GMT
  • A non-negative matrix factorization based method for predicting
           disease-associated miRNAs in miRNA-disease bilayer network
    • Authors: Zhong Y; Xuan P, Wang X, et al.
      Abstract: Motivation: Identification of disease-associated miRNAs (disease miRNAs) is critical for understanding disease etiology and pathogenesis. Since miRNAs exert their functions by regulating the expression of their target mRNAs, several methods based on the target genes were proposed to predict disease miRNA candidates. They achieved only limited success as they all suffered from the high false-positive rate of target prediction results. Alternatively, other prediction methods were based on the observation that miRNAs with similar functions tend to be associated with similar diseases and vice versa. These methods exploited the information about miRNAs and diseases, including the functional similarities between miRNAs, the similarities between diseases, and the associations between miRNAs and diseases. However, how to integrate the multiple kinds of information completely and consider the biological characteristics of disease miRNAs is a challenging problem. Results: We constructed a bilayer network to represent the complex relationships among miRNAs, among diseases and between miRNAs and diseases. We proposed a non-negative matrix factorization based method to rank, so as to predict, the disease miRNA candidates. The method integrated the miRNA functional similarity, the disease similarity and the miRNA-disease associations seamlessly, which exploited the complex relationships within the bilayer network and the consensus relationship between multiple kinds of information. Considering the correlation between the candidates related to various diseases, it predicted their respective candidates for all the diseases simultaneously. In addition, the sparseness characteristic of disease miRNAs was introduced to generate a more reliable prediction model that excludes noisy candidates. The results on 15 common diseases showed a superior performance of the new method for not only well-characterized diseases but also new ones. A detailed case study on breast neoplasms, colorectal neoplasms, lung neoplasms and 32 other diseases demonstrated the ability of the method to discover potential disease miRNAs. Availability and implementation: The web service for the new method and the list of predicted candidates for all the diseases are available at [...]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 01 Sep 2017 00:00:00 GMT
  • regNet: an R package for network-based propagation of gene expression
    • Authors: Seifert M; Beyer A.
      Abstract: Summary: Gene expression alterations and potentially underlying gene copy number mutations can be measured routinely in the wet lab, but it is still extremely challenging to quantify impacts of altered genes on clinically relevant characteristics to predict putative driver genes. We developed the R package regNet that utilizes gene expression and copy number data to learn regulatory networks for the quantification of potential impacts of individual gene expression alterations on user-defined target genes via network propagation. We demonstrate the value of regNet by identifying putative major regulators that distinguish pilocytic from diffuse astrocytomas and by predicting putative impacts of glioblastoma-specific gene copy number alterations on cell cycle pathway genes and patient survival. Availability and implementation: regNet is available for download at [...] under GNU GPL-3. Contact: michael.seifert@tu-dresden.de. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 31 Aug 2017 00:00:00 GMT
  • Gearing up to handle the mosaic nature of life in the quest for orthologs
    • Authors: Forslund K; Pereira C, Capella-Gutierrez S, et al.
      Abstract: Summary: The Quest for Orthologs (QfO) is an open collaboration framework for experts in comparative phylogenomics and related research areas who have an interest in highly accurate orthology predictions and their applications. We here report highlights and discussion points from the QfO meeting 2015 held in Barcelona. Achievements in recent years have established a basis to support developments for improved orthology prediction and to explore new approaches. Central to the QfO effort is proper benchmarking of methods and services, as well as design of standardized datasets and standardized formats to allow sharing and comparison of results. Simultaneously, analysis pipelines have been improved, evaluated and adapted to handle large datasets. All this would not have occurred without the long-term collaboration of Consortium members. Meeting regularly to review and coordinate complementary activities from a broad spectrum of innovative researchers clearly benefits the community. Highlights of the meeting include addressing the sources and legitimacy of disagreements between orthology calls, the context dependency of orthology definitions, special challenges encountered when analyzing very anciently rooted orthologies, orthology in the light of whole-genome duplications, and the concept of orthologous versus paralogous relationships at different levels, including domain-level orthology. Furthermore, particular needs for different applications (e.g. plant genomics, ancient gene families and others) and the infrastructure for making orthology inferences available (e.g. interfaces with model organism databases) were discussed, with several ongoing efforts that are expected to be reported on during the upcoming 2017 QfO meeting.
      PubDate: Wed, 30 Aug 2017 00:00:00 GMT
  • In silico identification of rescue sites by double force scanning
    • Authors: Tiberti M; Pandini A, Fraternali F, et al.
      Abstract: Motivation: A deleterious amino acid change in a protein can be compensated by a second-site rescue mutation. These compensatory mechanisms can be mimicked by drugs. In particular, the location of rescue mutations can be used to identify protein regions that can be targeted by small molecules to reactivate a damaged mutant. Results: We present the first general computational method to detect rescue sites. By mimicking the effect of mutations through the application of forces, the double force scanning (DFS) method identifies the second-site residues that make the protein structure most resilient to the effect of pathogenic mutations. We tested DFS predictions against two datasets containing experimentally validated and putative evolutionary-related rescue sites. A remarkably good agreement was found between predictions and experimental data. Indeed, almost half of the rescue sites in p53 were correctly predicted by DFS, with 65% of the remaining sites in contact with DFS predictions. Similar results were found for other proteins in the evolutionary dataset. Availability and implementation: The DFS code is available under GPL at [...]. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 14 Aug 2017 00:00:00 GMT