Journal Prestige (SJR): 6.14
Citation Impact (citeScore): 8
Number of Followers: 399  
Hybrid journal (it can contain Open Access articles)
ISSN (Print) 1367-4803 - ISSN (Online) 1460-2059
Published by Oxford University Press [412 journals]
  • SOMM4mC: a second-order Markov model for DNA N4-methylcytosine site
           prediction in six species
    • Authors: Yang J; Lang K, Zhang G, et al.
      Pages: 4103 - 4105
      Abstract:
        Motivation: DNA N4-methylcytosine (4mC) modification is an important epigenetic modification in prokaryotic DNA due to its role in regulating DNA replication and protecting the host DNA against degradation. An efficient algorithm to identify 4mC sites is needed for downstream analyses.
        Results: In this study, we propose a new prediction method named SOMM4mC based on a second-order Markov model, which makes use of the transition probability between adjacent nucleotides to identify 4mC sites. The results show that the first-order and second-order Markov models are superior to the three existing algorithms in all six species (Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Geoalkalibacter subterraneus and Geobacter pickeringii) where benchmark datasets are available. However, the classification performance of SOMM4mC is better than that of the first-order Markov model. In particular, for E.coli and C.elegans, the overall accuracies of SOMM4mC are 91.8% and 87.6%, which are 8.5% and 6.1% higher than those of the latest method 4mcPred-SVM, respectively. This shows that more discriminant sequence information is captured by SOMM4mC through the dependency between adjacent nucleotides.
        Availability and implementation: The web server of SOMM4mC is freely accessible at or
      PubDate: Fri, 15 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa507
      Issue No: Vol. 36, No. 14 (2020)
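The idea of scoring a sequence with second-order transition probabilities can be illustrated with a minimal sketch. This is not the SOMM4mC implementation: the function names, the add-one smoothing and the position-independent (homogeneous) model are all assumptions made for illustration, and the published method may differ in each of these respects.

```python
import math
from itertools import product

ALPHABET = "ACGT"

def train_second_order(seqs, pseudocount=1.0):
    """Estimate P(x_i | x_{i-2} x_{i-1}) from training sequences.

    Hypothetical helper: counts are smoothed with a pseudocount so that
    unseen transitions keep a non-zero probability.
    """
    counts = {"".join(ctx): {b: pseudocount for b in ALPHABET}
              for ctx in product(ALPHABET, repeat=2)}
    for s in seqs:
        for i in range(2, len(s)):
            ctx, base = s[i - 2:i], s[i]
            if ctx in counts and base in counts[ctx]:
                counts[ctx][base] += 1
    probs = {}
    for ctx, row in counts.items():
        total = sum(row.values())
        probs[ctx] = {b: c / total for b, c in row.items()}
    return probs

def log_likelihood(seq, probs):
    """Sum of log transition probabilities; non-ACGT symbols are skipped."""
    total = 0.0
    for i in range(2, len(seq)):
        ctx, base = seq[i - 2:i], seq[i]
        if ctx in probs and base in probs[ctx]:
            total += math.log(probs[ctx][base])
    return total

def log_odds(seq, pos_model, neg_model):
    """Positive score: seq looks more like the positive training set."""
    return log_likelihood(seq, pos_model) - log_likelihood(seq, neg_model)

# Toy data, invented for this sketch only.
pos_model = train_second_order(["ACGACGACGACG"])
neg_model = train_second_order(["AAAAAAAAAAAA", "TTTTTTTT"])
score_pos = log_odds("ACGACG", pos_model, neg_model)
score_neg = log_odds("AAAAAA", pos_model, neg_model)
```

A sequence would be called a predicted site when its log-odds score exceeds a threshold chosen on held-out data; the threshold of zero implied here is only the simplest choice.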
  • Testing hypotheses about the microbiome using the linear decomposition
           model (LDM)
    • Authors: Hu Y; Satten G, Valencia A.
      Pages: 4106 - 4115
      Abstract:
        Motivation: Methods for analyzing microbiome data generally fall into one of two groups: tests of the global hypothesis of any microbiome effect, which do not provide any information on the contribution of individual operational taxonomic units (OTUs); and tests for individual OTUs, which do not typically provide a global test of microbiome effect. Without a unified approach, the findings of a global test may be hard to resolve with the findings at the individual OTU level. Further, many tests of individual OTU effects do not preserve the false discovery rate (FDR).
        Results: We introduce the linear decomposition model (LDM), that provides a single analysis path that includes global tests of any effect of the microbiome, tests of the effects of individual OTUs while accounting for multiple testing by controlling the FDR, and a connection to distance-based ordination. The LDM accommodates both continuous and discrete variables (e.g. clinical outcomes, environmental factors) as well as interaction terms to be tested either singly or in combination, allows for adjustment of confounding covariates, and uses permutation-based P-values that can control for sample correlation. The LDM can also be applied to transformed data, and an ‘omnibus’ test can easily combine results from analyses conducted on different transformation scales. We also provide a new implementation of PERMANOVA based on our approach. For global testing, our simulations indicate the LDM provided correct type I error and can have comparable power to existing distance-based methods. For testing individual OTUs, our simulations indicate the LDM controlled the FDR well. In contrast, DESeq2 often had inflated FDR; MetagenomeSeq generally had the lowest sensitivity. The flexibility of the LDM for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies. We also show that our implementation of PERMANOVA can outperform existing implementations.
        Availability and implementation: The R package LDM is available on GitHub at in formats appropriate for Macintosh or
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 21 Apr 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa260
      Issue No: Vol. 36, No. 14 (2020)
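Controlling the FDR across many per-OTU tests can be illustrated with the standard Benjamini-Hochberg step-up procedure. The sketch below is generic and is not taken from the LDM package, whose permutation-based procedure may differ; the function name and the example P-values are assumptions made for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, P-values sorted
    ascending) with p_(k) <= k / m * alpha, then reject ranks 1..k.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Invented per-OTU P-values, in the original OTU order.
calls = benjamini_hochberg([0.03, 0.02, 0.035, 0.9], alpha=0.05)
```

Because the rule is step-up, a P-value that misses its own threshold can still be rejected when a larger P-value further down the sorted list meets its threshold, as happens for the first two values in this example.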
  • Optimization of co-evolution analysis through phylogenetic profiling
           reveals pathway-specific signals
    • Authors: Bloch I; Sherill-Rofe D, Stupp D, et al.
      Pages: 4116 - 4125
      Abstract:
        Summary: The exponential growth in available genomic data is expected to reach full sequencing of a million genomes in the coming decade. Improving and developing methods to analyze these genomes and to reveal their utility is of major interest in a wide variety of fields, such as comparative and functional genomics, evolution and bioinformatics. Phylogenetic profiling is an established method for predicting functional interactions between proteins based on similarities in their evolutionary patterns across species. Proteins that function together (i.e. generate complexes, interact in the same pathways or improve adaptation to environmental niches) tend to show coordinated evolution across the tree of life. The normalized phylogenetic profiling (NPP) method takes into account minute changes in proteins across species to identify protein co-evolution. Despite the success of this method, it is still not clear what set of parameters is required for optimal use of co-evolution in predicting functional interactions. Moreover, it is not clear if pathway evolution or function should direct parameter choice. Here, we create a reliable and usable NPP construction pipeline. We explore the effect of parameter selection on functional interaction prediction using NPP from 1028 genomes, both separately and in various value combinations. We identify several parameter sets that optimize performance for pathways with certain biological annotation. This work reveals the importance of choosing the right parameters for optimized function prediction based on a biological context.
        Availability and implementation: Source code and documentation are available on GitHub:
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 30 Apr 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa281
      Issue No: Vol. 36, No. 14 (2020)
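The core of phylogenetic profiling, comparing how two proteins vary across the same set of species, can be sketched as follows. This is a plain correlation-based illustration, not the NPP pipeline: the profile values, gene names and the choice of Pearson correlation are assumptions, and NPP's normalization of conservation scores is not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def top_coevolving(profiles, query, k=3):
    """Rank all other genes by profile similarity to the query gene."""
    scores = [(gene, pearson(profiles[query], prof))
              for gene, prof in profiles.items() if gene != query]
    return sorted(scores, key=lambda t: t[1], reverse=True)[:k]

# Hypothetical profiles: one conservation score per species, same species order.
profiles = {
    "geneA": [1, 0, 1, 1, 0],
    "geneB": [1, 0, 1, 1, 0],   # same pattern as geneA
    "geneC": [0, 1, 0, 0, 1],   # opposite pattern
    "geneD": [1, 1, 1, 1, 1],   # present everywhere, uninformative
}
ranked = top_coevolving(profiles, "geneA", k=3)
```

Genes at the top of the ranking share the query's pattern of conservation and loss across species, which is the signal interpreted as possible functional interaction.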
  • MetaviralSPAdes: assembly of viruses from metagenomic data
    • Authors: Antipov D; Raiko M, Lapidus A, et al.
      Pages: 4126 - 4129
      Abstract:
        Motivation: Although the set of currently known viruses has been steadily expanding, only a tiny fraction of the Earth’s virome has been sequenced so far. Shotgun metagenomic sequencing provides an opportunity to reveal novel viruses but faces the computational challenge of identifying viral genomes that are often difficult to detect in metagenomic assemblies.
        Results: We describe a MetaviralSPAdes tool for identifying viral genomes in metagenomic assembly graphs that is based on analyzing variations in the coverage depth between viruses and bacterial chromosomes. We benchmarked MetaviralSPAdes on diverse metagenomic datasets, verified our predictions using a set of virus-specific Hidden Markov Models and demonstrated that it improves on the state-of-the-art viral identification pipelines.
        Availability and implementation: MetaviralSPAdes includes ViralAssembly, ViralVerify and ViralComplete modules that are available as standalone packages:, and
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 15 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa490
      Issue No: Vol. 36, No. 14 (2020)
  • Higher-order Markov models for metagenomic sequence classification
    • Authors: Burks D; Azad R, Xu J.
      Pages: 4130 - 4136
      Abstract:
        Motivation: Alignment-free, stochastic models derived from k-mer distributions representing reference genome sequences have a rich history in the classification of DNA sequences. In particular, the variants of Markov models have previously been used extensively. Higher-order Markov models have been used with caution, perhaps sparingly, primarily because of the lack of enough training data and computational power. Advances in sequencing technology and computation have enabled exploitation of the predictive power of higher-order models. We, therefore, revisited higher-order Markov models and assessed their performance in classifying metagenomic sequences.
        Results: Comparative assessment of higher-order models (HOMs, 9th order or higher) with interpolated Markov model, interpolated context model and lower-order models (8th order or lower) was performed on metagenomic datasets constructed using sequenced prokaryotic genomes. Our results show that HOMs outperform other models in classifying metagenomic fragments as short as 100 nt at all taxonomic ranks, and at lower ranks when the fragment size was increased to 250 nt. HOMs were also found to be significantly more accurate than local alignment which is widely relied upon for taxonomic classification of metagenomic sequences. A novel software implementation written in C++ performs classification faster than the existing Markovian metagenomic classifiers and can therefore be used as a standalone classifier or in conjunction with existing taxonomic classifiers for more robust classification of metagenomic sequences.
        Availability and implementation: The software has been made available at
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 09 Jun 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa562
      Issue No: Vol. 36, No. 14 (2020)
  • CiteFuse enables multi-modal analysis of CITE-seq data
    • Authors: Kim H; Lin Y, Geddes T, et al.
      Pages: 4137 - 4143
      Abstract:
        Motivation: Multi-modal profiling of single cells represents one of the latest technological advancements in molecular biology. Among various single-cell multi-modal strategies, cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) allows simultaneous quantification of two distinct species: RNA and cell-surface proteins. Here, we introduce CiteFuse, a streamlined package consisting of a suite of tools for doublet detection, modality integration, clustering, differential RNA and protein expression analysis, antibody-derived tag evaluation, ligand–receptor interaction analysis and interactive web-based visualization of CITE-seq data.
        Results: We demonstrate the capacity of CiteFuse to integrate the two data modalities and its relative advantage against data generated from single-modality profiling using both simulations and real-world CITE-seq data. Furthermore, we illustrate a novel doublet detection method based on a combined index of cell hashing and transcriptome data. Finally, we demonstrate CiteFuse for predicting ligand–receptor interactions by using multi-modal CITE-seq data. Collectively, we demonstrate the utility and effectiveness of CiteFuse for the integrative analysis of transcriptome and epitope profiles from CITE-seq data.
        Availability and implementation: CiteFuse is freely available at as an online web service and at as an R
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 30 Apr 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa282
      Issue No: Vol. 36, No. 14 (2020)
  • An in silico model of LINE-1-mediated neoplastic evolution
    • Authors: LeBien J; McCollam G, Atallah J, et al.
      Pages: 4144 - 4153
      Abstract:
        Motivation: Recent research has uncovered roles for transposable elements (TEs) in multiple evolutionary processes, ranging from somatic evolution in cancer to putatively adaptive germline evolution across species. Most models of TE population dynamics, however, have not incorporated actual genome sequence data. The effect of site integration preferences of specific TEs on evolutionary outcomes and the effects of different selection regimes on TE dynamics in a specific genome are unknown. We present a stochastic model of LINE-1 (L1) transposition in human cancer. This system was chosen because the transposition of L1 elements is well understood, the population dynamics of cancer tumors has been modeled extensively, and the role of L1 elements in cancer progression has garnered interest in recent years.
        Results: Our model predicts that L1 retrotransposition (RT) can play either advantageous or deleterious roles in tumor progression, depending on the initial lesion size, L1 insertion rate and tumor driver genes. Small changes in the RT rate or set of driver tumor-suppressor genes (TSGs) were observed to alter the dynamics of tumorigenesis. We found high variation in the density of L1 target sites across human protein-coding genes. We also present an analysis, across three cancer types, of the frequency of homozygous TSG disruption in wild-type hosts compared to those with an inherited driver allele.
        Availability and implementation: Source code is available at
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 04 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa279
      Issue No: Vol. 36, No. 14 (2020)
  • Deshrinking ridge regression for genome-wide association studies
    • Authors: Wang M; Li R, Xu S, et al.
      Pages: 4154 - 4162
      Abstract:
        Motivation: Genome-wide association studies (GWAS) are still the primary steps toward gene discovery. The urgency is more obvious in the big data era when GWAS are conducted simultaneously for thousands of traits, e.g. transcriptomic and metabolomic traits. Efficient mixed model association (EMMA) and genome-wide efficient mixed model association (GEMMA) are the widely used methods for GWAS. An algorithm with high computational efficiency is badly needed. It is interesting to note that the test statistics of the ordinary ridge regression (ORR) have the same patterns across the genome as those obtained from the EMMA method. However, ORR has never been used for GWAS due to its severe shrinkage on the estimated effects and the test statistics.
        Results: We introduce a degree of freedom for each marker effect obtained from ORR and use it to deshrink both the estimated effect and the standard error so that the Wald test of ORR is brought back to the same level as that of EMMA. The new method is called deshrinking ridge regression (DRR). By evaluating the methods under three different model sizes (small, medium and large), we demonstrate that DRR is more generalized for all model sizes than EMMA, which only works for medium and large models. Furthermore, DRR detects all markers in a simultaneous manner instead of scanning one marker at a time. As a result, the computational time complexity of DRR is much simpler than that of EMMA and about m (number of genetic variants) times simpler than that of GEMMA when the sample size is much smaller than the number of markers.
        Contact: shizhong.xu@ucr.edu
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 07 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa345
      Issue No: Vol. 36, No. 14 (2020)
  • Boosting the extraction of elementary flux modes in genome-scale metabolic
           networks using the linear programming approach
    • Authors: Guil F; Hidalgo J, García J, et al.
      Pages: 4163 - 4170
      Abstract:
        Motivation: Elementary flux modes (EFMs) are a key tool for analyzing genome-scale metabolic networks, and several methods have been proposed to compute them. Among them, those based on solving linear programming (LP) problems are known to be very efficient if the main interest lies in computing large enough sets of EFMs.
        Results: Here, we propose a new method called EFM-Ta that boosts the efficiency rate by analyzing the information provided by the LP solver. We base our method on a further study of the final tableau of the simplex method. By performing additional elementary steps and avoiding trivial solutions consisting of two cycles, we obtain many more EFMs for each LP problem posed, improving the efficiency rate of previously proposed methods by more than one order of magnitude.
        Availability and implementation: Software is freely available at
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 10 Jul 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa280
      Issue No: Vol. 36, No. 14 (2020)
  • pepFunk: a tool for peptide-centric functional analysis of metaproteomic
           human gut microbiome studies
    • Authors: Simopoulos C; Ning Z, Zhang X, et al.
      Pages: 4171 - 4179
      Abstract:
        Motivation: Enzymatic digestion of proteins before mass spectrometry analysis is a key process in metaproteomic workflows. Canonical metaproteomic data processing pipelines typically involve matching spectra produced by the mass spectrometer to a theoretical spectra database, followed by matching the identified peptides back to parent-proteins. However, the nature of enzymatic digestion produces peptides that can be found in multiple proteins due to conservation or chance, presenting difficulties with protein and functional assignment.
        Results: To combat this challenge, we developed pepFunk, a peptide-centric metaproteomic workflow focused on the analysis of human gut microbiome samples. Our workflow includes a curated peptide database annotated with Kyoto Encyclopedia of Genes and Genomes (KEGG) terms and a gene set variation analysis-inspired pathway enrichment adapted for peptide-level data. Analysis using our peptide-centric workflow is fast and highly correlated to a protein-centric analysis, and can identify more enriched KEGG pathways than analysis using protein-level data. Our workflow is open source and available as a web application or source code to be run locally.
        Availability and implementation: pepFunk is available online as a web application at with open-source code available from
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 05 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa289
      Issue No: Vol. 36, No. 14 (2020)
  • HPOLabeler: improving prediction of human protein–phenotype
           associations by learning to rank
    • Authors: Liu L; Huang X, Mamitsuka H, et al.
      Pages: 4180 - 4188
      Abstract:
        Motivation: Annotating human proteins by abnormal phenotypes has become an important topic. Human Phenotype Ontology (HPO) is a standardized vocabulary of phenotypic abnormalities encountered in human diseases. As of November 2019, only <4000 proteins have been annotated with HPO. Thus, a computational approach for accurately predicting protein–HPO associations would be important, whereas no methods have outperformed a simple Naive approach in the second Critical Assessment of Functional Annotation, 2013–2014 (CAFA2).
        Results: We present HPOLabeler, which is able to use a wide variety of evidence, such as protein–protein interaction (PPI) networks, Gene Ontology, InterPro, trigram frequency and HPO term frequency, in the framework of learning to rank (LTR). LTR has been proved to be powerful for solving large-scale, multi-label ranking problems in bioinformatics. Given an input protein, LTR outputs the ranked list of HPO terms from a series of input scores given to the candidate HPO terms by component learning models (logistic regression, nearest neighbor and a Naive method), which are trained from given multiple evidence. We empirically evaluate HPOLabeler extensively through mainly two experiments of cross validation and temporal validation, for which HPOLabeler significantly outperformed all component models and competing methods including the current state-of-the-art method. We further found that (i) PPI is most informative for prediction among diverse data sources and (ii) low prediction performance of temporal validation might be caused by incomplete annotation of new proteins.
        Availability and implementation:
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 07 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa284
      Issue No: Vol. 36, No. 14 (2020)
  • Stereo3D: using stereo images to enrich 3D visualization
    • Authors: Liu Y; Singh V, Zheng D, et al.
      Pages: 4189 - 4190
      Abstract:
        Summary: Visualization in 3D space is a standard but critical process for examining the complex structure of high-dimensional data. Stereoscopic imaging technology can be adopted to enhance 3D representation of many complex data, especially those consisting of points and lines. We illustrate the simple steps that are involved and strongly recommend that others implement it when designing visualization software. To facilitate its application, we created new software that can convert a regular 3D scatterplot or network figure to a pair of stereo images.
        Availability and implementation: Stereo3D is freely available as an open source R package released under an MIT license at Others can integrate the codes and implement the method in academic software.
        Contact: deyou.zheng@einsteinmed.org
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Sat, 16 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa521
      Issue No: Vol. 36, No. 14 (2020)
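One common way to turn a 3D scatterplot into a pair of stereo images is to render the same points from two viewpoints that differ by a small rotation about the vertical axis. The sketch below illustrates that general idea only; it is not the Stereo3D code (which is an R package), and the function names, the 5-degree separation and the orthographic projection are assumptions.

```python
import math

def rotate_y(points, angle_deg):
    """Rotate (x, y, z) points about the vertical (y) axis by angle_deg."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def stereo_pair(points, separation_deg=5.0):
    """Return (left, right) lists of 2D points for the two eye views.

    Each view rotates the scene by half the separation in opposite
    directions, then drops the depth coordinate (orthographic projection).
    """
    half = separation_deg / 2.0
    left = [(x, y) for x, y, _ in rotate_y(points, -half)]
    right = [(x, y) for x, y, _ in rotate_y(points, +half)]
    return left, right

# Two example points: one on the rotation axis, one offset in depth.
pts = [(0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
left_view, right_view = stereo_pair(pts, separation_deg=5.0)
```

Points on the rotation axis land in the same place in both images, while points nearer or farther in depth shift horizontally in opposite directions; that horizontal disparity is what produces the depth impression when the two images are viewed side by side.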
  • DeepNano-blitz: a fast base caller for MinION nanopore sequencers
    • Authors: Boža V; Perešíni P, Brejová B, et al.
      Pages: 4191 - 4192
      Abstract:
        Motivation: Oxford Nanopore MinION is a portable DNA sequencer that is marketed as a device that can be deployed anywhere. Current base callers, however, require a powerful GPU to analyze data produced by MinION in real time, which hampers field applications.
        Results: We have developed a fast base caller DeepNano-blitz that can analyze stream from up to two MinION runs in real time using a common laptop CPU (i7-7700HQ), with no GPU requirements. The base caller settings allow trading accuracy for speed and the results can be used for real time run monitoring (i.e. sample composition, barcode balance, species identification, etc.) or prefiltering of results for more detailed analysis (i.e. filtering out human DNA from human–pathogen runs).
        Availability and implementation: DeepNano-blitz has been developed and tested on Linux and Intel processors and is available under MIT license at
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 06 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa297
      Issue No: Vol. 36, No. 14 (2020)
  • ipcoal: an interactive Python package for simulating and analyzing
           genealogies and sequences on a species tree or network
    • Authors: McKenzie P; Eaton D, Schwartz R.
      Pages: 4193 - 4196
      Abstract:
        Summary: ipcoal is a free and open source Python package for simulating and analyzing genealogies and sequences. It automates the task of describing complex demographic models (e.g. with divergence times, effective population sizes, migration events) to the msprime coalescent simulator by parsing a user-supplied species tree or network. Genealogies, sequences and metadata are returned in tabular format allowing for easy downstream analyses. ipcoal includes phylogenetic inference tools to automate gene tree inference from simulated sequence data, and visualization tools for analyzing results and verifying model accuracy. The ipcoal package is a powerful tool for posterior predictive data analysis, for methods validation and for teaching coalescent methods in an interactive and visual environment.
        Availability and implementation: Source code is available from the GitHub repository ( and is distributed for packaged installation with conda. Complete documentation and interactive notebooks prepared for teaching purposes, including an empirical example, are available at
      PubDate: Tue, 12 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa486
      Issue No: Vol. 36, No. 14 (2020)
  • Capybara: equivalence ClAss enumeration of coPhylogenY event-BAsed
    • Authors: Wang Y; Mary A, Sagot M, et al.
      Pages: 4197 - 4199
      Abstract:
        Motivation: Phylogenetic tree reconciliation is the method of choice in analyzing host-symbiont systems. Despite the many reconciliation tools that have been proposed in the literature, two main issues remain unresolved: (i) listing suboptimal solutions (i.e. whose score is ‘close’ to the optimal ones) and (ii) listing only solutions that are biologically different ‘enough’. The first issue arises because the optimal solutions are not always the ones biologically most significant; providing many suboptimal solutions as alternatives for the optimal ones is thus very useful. The second one is related to the difficulty to analyze an often huge number of optimal solutions. In this article, we propose Capybara that addresses both of these problems in an efficient way. Furthermore, it includes a tool for visualizing the solutions that significantly helps the user in the process of analyzing the results.
        Availability and implementation: The source code, documentation and binaries for all platforms are freely available at or blerina.sinaimeri@inria.fr
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 18 Jun 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa498
      Issue No: Vol. 36, No. 14 (2020)
  • EasyVS: a user-friendly web-based tool for molecule library selection and
           structure-based virtual screening
    • Authors: Pires D; Veloso W, Myung Y, et al.
      Pages: 4200 - 4202
      Abstract:
        Summary: EasyVS is a web-based platform built to simplify molecule library selection and virtual screening. With an intuitive interface, the tool allows users to go from selecting a protein target with a known structure and tailoring a purchasable molecule library to performing and visualizing docking in a few clicks. Our system also allows users to filter screening libraries based on molecule properties, cluster molecules by similarity and personalize docking parameters.
        Availability and implementation: EasyVS is freely available as an easy-to-use web interface at or
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 12 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa480
      Issue No: Vol. 36, No. 14 (2020)
  • PLIDflow: an open-source workflow for the online analysis of
           protein–ligand docking using galaxy
    • Authors: Ulzurrun E; Duarte Y, Perez-Wohlfeil E, et al.
      Pages: 4203 - 4205
      Abstract:
        Motivation: Molecular docking is aimed at predicting the conformation of small molecules (ligands) within an identified binding site (BS) in a target protein (receptor). Protein–ligand docking plays an important role in modern drug discovery and biochemistry for protein engineering. However, efficient docking analysis of proteins requires prior knowledge of the BS, which is not always known. The process which covers BS identification and protein–ligand docking usually requires the combination of different programs, which require several input parameters. This is further aggravated when factoring in computational demands, such as CPU-time. Therefore, these types of simulation experiments can become a complex process for researchers without a background in computer science.
        Results: To overcome these problems, we have designed an automatic computational workflow (WF) to process protein–ligand complexes, which runs from the identification of the possible BSs positions to the prediction of the experimental binding modes and affinities of the ligand. This open-access WF runs under the Galaxy platform that integrates public domain software. The results of the proposed method are in close agreement with state-of-the-art docking software.
        Availability and implementation: Software is available at:
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Sat, 16 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa481
      Issue No: Vol. 36, No. 14 (2020)
  • BioStructures.jl: read, write and manipulate macromolecular structures in
    • Authors: Greener J; Selvaraj J, Ward B, et al.
      Pages: 4206 - 4207
      Abstract:
        Summary: Robust, flexible and fast software to read, write and manipulate macromolecular structures is a prerequisite for productively doing structural bioinformatics. We present BioStructures.jl, the first dedicated package in the Julia programming language for dealing with macromolecular structures and the Protein Data Bank. BioStructures.jl builds on the lessons learned with similar packages to provide a large feature set, a flexible object representation and high performance.
        Availability and implementation: BioStructures.jl is freely available under the MIT license. Source code and documentation are available at BioStructures.jl is compatible with Julia versions 0.6 and later and is
      PubDate: Thu, 14 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa502
      Issue No: Vol. 36, No. 14 (2020)
  • ProteinFishing: a protein complex generator within the ModelX toolsuite
    • Authors: Cianferoni D; Radusky L, Head S, et al.
      Pages: 4208 - 4210
      Abstract:
        Summary: Accurate 3D modelling of protein–protein interactions (PPI) is essential to compensate for the absence of experimentally determined complex structures. Here, we present a new set of commands within the ModelX toolsuite capable of generating atomic-level protein complexes suitable for interface design. Among these commands, the new tool ProteinFishing proposes known and/or putative alternative 3D PPI for a given protein complex. The algorithm exploits backbone compatibility of protein fragments to generate mutually exclusive protein interfaces that are quickly evaluated with a knowledge-based statistical force field. Using interleukin-10-R2 co-crystalized with interferon-lambda-3, and a database of X-ray structures containing interleukin-10, this algorithm was able to generate interleukin-10-R2/interleukin-10 structural models in agreement with experimental data.
        Availability and implementation: ProteinFishing is a portable command-line tool included in the ModelX toolsuite, written in C++, that makes use of an SQL (tested for MySQL and MariaDB) relational database delivered with a template SQL dump called FishXDB. FishXDB contains the empty tables of ModelX fragments and the data used by the embedded statistical force field. ProteinFishing is compiled for Linux-64bit, MacOS-64bit and Windows-32bit operating systems. This software is a proprietary license and is distributed as an executable with its correspondent database dumps. It can be downloaded publicly at Licenses are freely available for academic users after registration on the website and are available under commercial license for for-profit organizations or or luis.serrano@crg.eu
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 21 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa533
      Issue No: Vol. 36, No. 14 (2020)
  • GDASC: a GPU parallel-based web server for detecting hidden batch factors
    • Authors: Wang X; Yi H, Wang J, et al.
      Pages: 4211 - 4213
      Abstract: Summary: We developed GDASC, a web version of our former DASC algorithm implemented with GPU. It provides a user-friendly web interface for detecting batch factors. Based on the good performance of the DASC algorithm, it is able to give the most accurate results. For the two steps of DASC, data-adaptive shrinkage and semi-non-negative matrix factorization, we designed parallelization strategies targeting the convex clustering solution and the decomposition process. It runs more than 50 times faster than the original version on the representative RNA sequencing quality control dataset. With its accuracy and high speed, this server will be a useful tool for batch effects analysis. Availability and implementation: Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 05 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa427
      Issue No: Vol. 36, No. 14 (2020)
  • ThETA: transcriptome-driven efficacy estimates for gene-based TArget discovery
    • Authors: Failli M; Paananen J, Fortino V, et al.
      Pages: 4214 - 4216
      Abstract: Summary: Estimating efficacy of gene–target-disease associations is a fundamental step in drug discovery. An important data source for this laborious task is RNA expression, which can provide gene–disease associations on the basis of expression fold change and statistical significance. However, the simple use of the log-fold change can lead to numerous false-positive associations. On the other hand, more sophisticated methods that utilize gene co-expression networks do not consider tissue specificity. Here, we introduce Transcriptome-driven Efficacy estimates for gene-based TArget discovery (ThETA), an R package that enables non-expert users to use novel efficacy scoring methods for drug–target discovery. In particular, ThETA allows users to search for gene perturbations (therapeutics) that reverse disease-gene expression and genes that are closely related to disease-genes in tissue-specific networks. ThETA also provides functions to integrate efficacy evaluations obtained with different approaches and to build an overall efficacy score, which can be used to identify and prioritize gene(target)–disease associations. Finally, ThETA implements visualizations to show tissue-specific interconnections between target and disease-genes, and to indicate biological annotations associated with the top selected genes. Availability and implementation: ThETA is freely available for academic use at Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 21 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa518
      Issue No: Vol. 36, No. 14 (2020)
  • scTPA: a web tool for single-cell transcriptome analysis of pathway
           activation signatures
    • Authors: Zhang Y; Zhang Y, Hu J, et al.
      Pages: 4217 - 4219
      Abstract: Motivation: At present, a fundamental challenge in single-cell RNA-sequencing data analysis is functional interpretation and annotation of cell clusters. Biological pathways in distinct cell types have different activation patterns, which facilitates the understanding of cell functions using single-cell transcriptomics. However, no effective web tool has been implemented for single-cell transcriptome data analysis based on prior biological pathway knowledge. Results: Here, we present scTPA, a web-based platform for pathway-based analysis of single-cell RNA-seq data in human and mouse. scTPA incorporates four widely-used gene set enrichment methods to estimate the pathway activation scores of single cells based on a collection of available biological pathways with different functional and taxonomic classifications. The clustering analysis and cell-type-specific activation pathway identification were provided for the functional interpretation of cell types from a pathway-oriented perspective. An intuitive interface allows users to conveniently visualize and download single-cell pathway signatures. Overall, scTPA is a comprehensive tool for the identification of pathway activation signatures for the analysis of single cell heterogeneity. Availability and implementation: or or Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 21 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa532
      Issue No: Vol. 36, No. 14 (2020)
  • Unipept CLI 2.0: adding support for visualizations and functional
    • Authors: Verschaffelt P; Van Thienen P, Van Den Bossche T, et al.
      Pages: 4220 - 4221
      Abstract: Summary: Unipept is an ecosystem of tools developed for fast metaproteomics data-analysis consisting of a web application, a set of web services (application programming interface, API) and a command-line interface (CLI). After the successful introduction of version 4 of the Unipept web application, we here introduce version 2.0 of the API and CLI. Next to the existing taxonomic analysis, version 2.0 of the API and CLI provides access to Unipept’s powerful functional analysis for metaproteomics samples. The functional analysis pipeline supports retrieval of Enzyme Commission numbers, Gene Ontology terms and InterPro entries for the individual peptides in a metaproteomics sample. This paves the way for other applications and developers to integrate these new information sources into their data processing pipelines, which greatly increases insight into the functions performed by the organisms in a specific environment. Both the API and CLI have also been expanded with the ability to render interactive visualizations from a list of taxon ids. These visualizations are automatically made available on a dedicated website and can easily be shared by users. Availability and implementation: The API is available at Information regarding the CLI can be found at Both interfaces are freely available and open-source under the MIT license. Contact: pieter.verschaffelt@ugent.be Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 03 Jun 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa553
      Issue No: Vol. 36, No. 14 (2020)
  • HiGwas: how to compute longitudinal GWAS data in population designs
    • Authors: Wang Z; Wang N, Wang Z, et al.
      Pages: 4222 - 4224
      Abstract: Summary: Genome-wide association studies (GWAS), particularly designed with thousands and thousands of single-nucleotide polymorphisms (SNPs) (big p) genotyped on tens of thousands of subjects (small n), face a major challenge of p ≫ n. Although the integration of longitudinal information can significantly enhance a GWAS’s power to comprehend the genetic architecture of complex traits and diseases, an additional challenge is generated by an autocorrelative process. We have developed several statistical models for addressing these two challenges by implementing dimension reduction methods and longitudinal data analysis. To make these models computationally accessible to applied geneticists, we wrote an R package of computer software, HiGwas, designed to analyze longitudinal GWAS datasets. Functions in the package encompass single SNP analyses, significance-level adjustment, preconditioning and model selection for a high-dimensional set of SNPs. HiGwas provides the estimates of genetic parameters and the confidence intervals of these estimates. We demonstrate the features of HiGwas through real data analysis and a vignette document in the package. Availability and implementation: Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 05 Jun 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa294
      Issue No: Vol. 36, No. 14 (2020)
  • FastTargetPred: a program enabling the fast prediction of putative protein
           targets for input chemical databases
    • Authors: Chaput L; Guillaume V, Singh N, et al.
      Pages: 4225 - 4226
      Abstract: Summary: Several web‐based tools predict the putative targets of a small molecule query compound by similarity to molecules with known bioactivity data using molecular fingerprints. In numerous situations, however, it would be valuable to be able to run such computations on a local computer. We present FastTargetPred, a new program for the prediction of protein targets for small molecule queries. Structural similarity computations rely on a large collection of confirmed protein–ligand activities extracted from the curated ChEMBL 25 database. The program can annotate an input chemical library of ∼100k compounds within a few hours on a simple personal computer. Availability and implementation: FastTargetPred is written in Python 3 (≥3.7) and C. Python code depends only on the Python Standard Library. The program can be run on Linux, MacOS and Windows operating systems. Pre-compiled versions are available at FastTargetPred is licensed under the GNU GPLv3. The program calls some scripts from the free chemistry toolkit MayaChemTools. Contact: bruno.villoutreix@inserm.fr Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Sat, 27 Jun 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa494
      Issue No: Vol. 36, No. 14 (2020)
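The fingerprint-similarity idea summarized in the FastTargetPred abstract above can be sketched in a few lines of standard-library Python. This is only an illustration, not the program's actual code: the abstract does not name the similarity measure, so the Tanimoto coefficient is an assumption, and the function names, the 0.6 threshold, the target labels and the toy fingerprints are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def predict_targets(query_fp, known_actives, threshold=0.6):
    """Rank targets by the best similarity between the query and any known ligand.

    known_actives is an iterable of (target_name, ligand_fingerprint) pairs.
    Only targets with at least one ligand at or above the threshold are kept.
    """
    best = {}
    for target, ligand_fp in known_actives:
        score = tanimoto(query_fp, ligand_fp)
        if score >= threshold and score > best.get(target, 0.0):
            best[target] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: each fingerprint is the set of bits that are switched on.
known_actives = [
    ("TARGET_A", frozenset({1, 2, 3, 4})),
    ("TARGET_A", frozenset({1, 2})),
    ("TARGET_B", frozenset({7, 8, 9})),
]
query = frozenset({1, 2, 3, 5})

for target, score in predict_targets(query, known_actives):
    print(target, round(score, 2))
```

Keeping only the best-scoring ligand per target is one simple ranking choice; a real tool could equally report every ligand above the threshold.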
  • iBioProVis: interactive visualization and analysis of compound bioactivity
    • Authors: Donmez A; Rifaioglu A, Acar A, et al.
      Pages: 4227 - 4230
      Abstract: Summary: iBioProVis is an interactive tool for visual analysis of the compound bioactivity space in the context of target proteins, drugs and drug candidate compounds. The iBioProVis tool takes target protein identifiers and, optionally, compound SMILES as input, and uses the state-of-the-art non-linear dimensionality reduction method t-Distributed Stochastic Neighbor Embedding (t-SNE) to plot the distribution of compounds embedded in a 2D map, based on the similarity of structural properties of compounds and in the context of compounds’ cognate targets. Similar compounds, which are embedded at proximate points on the 2D map, may bind the same or similar target proteins. Thus, iBioProVis can be used to easily observe the structural distribution of one or two target proteins’ known ligands on the 2D compound space, and to infer new binders to the same protein, or to infer new potential target(s) for a compound of interest, based on this distribution. Principal component analysis (PCA) projection of the input compounds is also provided. Hence the user can interactively observe the same compound, or a group of selected compounds, both projected by PCA and embedded by t-SNE. iBioProVis also provides detailed information about drugs and drug candidate compounds through cross-references to widely used and well-known databases, in the form of linked table views. Two use-case studies were demonstrated, one being on the angiotensin-converting enzyme 2 (ACE2) protein, which is the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Spike protein receptor. ACE2-binding compounds and seven antiviral drugs were closely embedded, two of which have been under clinical trial for Coronavirus disease 19 (COVID-19). Availability and implementation: iBioProVis and its carefully filtered dataset are available at for public Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 14 May 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa496
      Issue No: Vol. 36, No. 14 (2020)
  • RNAIndel: discovering somatic coding indels from tumor RNA-Seq data
    • Authors: Hagiwara K; Ding L, Edmonson M, et al.
      Pages: 4231 - 4231
      Abstract: Bioinformatics (2019) doi: 10.1093/bioinformatics/btz753
      PubDate: Thu, 18 Jun 2020 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btaa247
      Issue No: Vol. 36, No. 14 (2020)