Subjects -> LIBRARY AND INFORMATION SCIENCES (Total: 392 journals)
    - DIGITAL CURATION AND PRESERVATION (13 journals)
    - LIBRARY ADMINISTRATION (1 journal)
    - LIBRARY AND INFORMATION SCIENCES (378 journals)

LIBRARY AND INFORMATION SCIENCES (378 journals)

Showing 1-200 of 379 journals, sorted alphabetically
027.7 Zeitschrift für Bibliothekskultur / Journal for Library Culture     Open Access   (Followers: 61)
Access     Full-text available via subscription   (Followers: 23)
Acervo : Revista do Arquivo Nacional     Open Access   (Followers: 1)
African Journal of Library, Archives and Information Science     Full-text available via subscription   (Followers: 67)
Against the Grain     Partially Free   (Followers: 119)
AIB Studi     Full-text available via subscription   (Followers: 10)
Alexandría : Revista de Ciencias de la Información     Open Access   (Followers: 11)
Alexandria : The Journal of National and International Library and Information Issues     Full-text available via subscription   (Followers: 55)
Alsic : Apprentissage des Langues et Systèmes d'Information et de Communication     Open Access   (Followers: 12)
American Archivist     Hybrid Journal   (Followers: 129)
American Libraries     Partially Free   (Followers: 187)
Anales de Documentacion     Open Access   (Followers: 13)
Anuari de l'Observatori de Biblioteques, Llibres i Lectura     Open Access   (Followers: 2)
ANZTLA EJournal     Full-text available via subscription  
Archeion Online     Open Access   (Followers: 3)
Archimag     Full-text available via subscription   (Followers: 3)
Archival Science     Hybrid Journal   (Followers: 64)
Archivaria     Open Access   (Followers: 32)
Archives     Full-text available via subscription   (Followers: 6)
Archives and Manuscripts     Hybrid Journal   (Followers: 50)
Archives and Museum Informatics     Hybrid Journal   (Followers: 97)
Ariadne Magazine     Open Access   (Followers: 145)
Art Libraries Journal     Hybrid Journal   (Followers: 10)
Aslib Journal of Information Management     Hybrid Journal   (Followers: 32)
Aslib Proceedings     Hybrid Journal   (Followers: 148)
AtoZ : novas práticas em informação e conhecimento     Open Access  
Australasian Journal of Information Systems     Open Access   (Followers: 16)
Australasian Public Libraries and Information Services     Full-text available via subscription   (Followers: 31)
Australian Academic & Research Libraries     Full-text available via subscription   (Followers: 93)
Australian Library Journal     Full-text available via subscription   (Followers: 146)
Baca : Jurnal Dokumentasi dan Informasi     Open Access   (Followers: 1)
Bangladesh Journal of Library and Information Science     Open Access   (Followers: 44)
Behavioral & Social Sciences Librarian     Hybrid Journal   (Followers: 143)
Berkala Ilmu Perpustakaan dan Informasi     Open Access  
Biblios     Open Access   (Followers: 11)
Biblioteca Escolar em Revista     Open Access  
Biblioteca Universitaria     Open Access   (Followers: 14)
Bibliotecas : Revista de la Escuela de Bibliotecología, Documentación e Información     Open Access   (Followers: 3)
Bibliotecas Universitárias : pesquisas, experiências e perspectivas     Open Access   (Followers: 1)
Bibliotecas. Anales de Investigacion     Open Access  
Biblioteka     Open Access   (Followers: 2)
Biblioteka i Edukacja     Open Access   (Followers: 4)
Bibliotheca Orientalis     Full-text available via subscription   (Followers: 14)
BIBLIOTIKA : Jurnal Kajian Perpustakaan dan Informasi     Open Access  
BIBLOS - Revista do Departamento de Biblioteconomia e História     Open Access   (Followers: 6)
BiD : textos universitaris de biblioteconomia i documentació     Open Access   (Followers: 6)
Bilgi Dünyası     Open Access   (Followers: 5)
Biodiversity Information Science and Standards     Open Access   (Followers: 1)
Bioinformatics     Hybrid Journal   (Followers: 216)
Biuletyn EBIB     Open Access  
Boletín Cultural y Bibliográfico     Open Access   (Followers: 2)
Book History     Full-text available via subscription   (Followers: 113)
Bridgewater Review     Open Access   (Followers: 4)
Bulletin des bibliotheques de France     Full-text available via subscription   (Followers: 7)
Bulletin of the Association for Information Science and Technology     Open Access   (Followers: 22)
Bulletin of the John Rylands Library     Hybrid Journal   (Followers: 21)
Canadian Journal of Academic Librarianship     Open Access   (Followers: 20)
Canadian Journal of Information and Library Science     Full-text available via subscription   (Followers: 245)
Cataloging & Classification Quarterly     Hybrid Journal   (Followers: 169)
CERN IdeaSquare Journal of Experimental Innovation     Open Access  
Children and Libraries : The Journal of the Association for Library Service to Children     Full-text available via subscription   (Followers: 16)
CIC. Cuadernos de Informacion y Comunicacion     Open Access   (Followers: 4)
Ciência da Informação em Revista     Open Access   (Followers: 1)
Code4Lib Journal     Open Access   (Followers: 172)
Collaborative Librarianship     Open Access   (Followers: 52)
Collection and Curation     Hybrid Journal   (Followers: 11)
College & Research Libraries     Open Access   (Followers: 453)
College & Research Libraries News     Partially Free   (Followers: 244)
College & Undergraduate Libraries     Hybrid Journal   (Followers: 220)
Communicate : Journal of Library and Information Science     Full-text available via subscription   (Followers: 63)
Communication Booknotes Quarterly     Hybrid Journal   (Followers: 15)
Communications in Information Literacy     Open Access   (Followers: 194)
Community & Junior College Libraries     Hybrid Journal   (Followers: 42)
Cuadernos de Gestión de Información     Open Access  
Data Curation Profiles Directory     Open Access   (Followers: 5)
Data Technologies and Applications     Hybrid Journal   (Followers: 208)
DESIDOC Journal of Library & Information Technology     Open Access   (Followers: 95)
Digital Library Perspectives     Hybrid Journal   (Followers: 39)
Digital Platform: Information Technologies in Sociocultural Sphere     Open Access   (Followers: 1)
Documentación de las Ciencias de la Información     Open Access  
Documentation et bibliothèques     Full-text available via subscription   (Followers: 9)
e & i Elektrotechnik und Informationstechnik     Hybrid Journal   (Followers: 8)
e-Ciencias de la Información     Open Access   (Followers: 1)
Eastern Librarian     Open Access   (Followers: 11)
Edulib : Journal of Library and Information Science     Open Access   (Followers: 26)
Egyptian Informatics Journal     Open Access   (Followers: 5)
El Profesional de la Informacion     Full-text available via subscription   (Followers: 17)
eLucidate     Open Access   (Followers: 7)
Emerging Library & Information Perspectives     Open Access   (Followers: 29)
Encontros Bibli : revista eletrônica de biblioteconomia e ciência da informação     Open Access   (Followers: 3)
Ethics and Information Technology     Hybrid Journal   (Followers: 64)
European Journal of Information Systems     Hybrid Journal   (Followers: 85)
European Science Editing     Open Access  
Evidence Based Library and Information Practice     Open Access   (Followers: 386)
Florida Libraries     Open Access   (Followers: 1)
Folia Bibliologica     Open Access  
Forensic Science International: Digital Investigation     Full-text available via subscription   (Followers: 317)
Foundations and Trends® in Information Retrieval     Full-text available via subscription   (Followers: 30)
Georgia Library Quarterly     Open Access   (Followers: 21)
Ghana Library Journal     Full-text available via subscription   (Followers: 16)
Global Knowledge, Memory and Communication     Hybrid Journal   (Followers: 806)
GSI Journals Serie C : Advancements in Information Sciences and Technologies     Open Access   (Followers: 1)
Health Information Management Journal     Hybrid Journal   (Followers: 23)
Hipertext.net : Anuario Académico sobre Documentación Digital y Comunicación Interactiva     Open Access  
HLA News     Full-text available via subscription   (Followers: 2)
IASSIST Quarterly     Open Access  
Idaho Librarian     Free   (Followers: 8)
IFLA Journal     Hybrid Journal   (Followers: 218)
In Monte Artium     Full-text available via subscription   (Followers: 1)
In the Library with the Lead Pipe     Open Access   (Followers: 122)
InCID : Revista de Ciência da Informação e Documentação     Open Access  
InCite     Full-text available via subscription   (Followers: 19)
Informaatiotutkimus     Open Access   (Followers: 3)
Informação & Informação     Open Access   (Followers: 2)
Informação em Pauta     Open Access  
Informacijos mokslai     Open Access  
Información, Cultura y Sociedad     Open Access   (Followers: 2)
Informatio. Revista del Instituto de Información de la Facultad de Información y Comunicación     Open Access  
Information     Open Access   (Followers: 30)
Information & Culture : A Journal of History     Full-text available via subscription   (Followers: 31)
Information Discovery and Delivery     Hybrid Journal   (Followers: 43)
Information Manager (The)     Open Access   (Followers: 29)
Information Processing & Management     Hybrid Journal   (Followers: 124)
Information Retrieval     Hybrid Journal   (Followers: 187)
Information Sciences     Hybrid Journal   (Followers: 168)
Information Systems Frontiers     Hybrid Journal   (Followers: 27)
Information Systems Research     Full-text available via subscription   (Followers: 127)
Information Technologies & International Development     Open Access   (Followers: 81)
Information Technologist (The)     Full-text available via subscription   (Followers: 17)
Information Technology and Libraries     Open Access   (Followers: 292)
Information Today     Full-text available via subscription   (Followers: 34)
Informationspraxis     Open Access   (Followers: 12)
Informationswissenschaft : Theorie, Methode und Praxis     Open Access   (Followers: 4)
iNFOTEZY     Open Access  
Insaniyat : Journal of Islam and Humanities     Open Access   (Followers: 1)
Insights : the UKSG journal     Open Access   (Followers: 62)
InterActions: UCLA Journal of Education and Information     Open Access   (Followers: 11)
Interdisciplinary Journal of e-Skills and Lifelong Learning     Open Access   (Followers: 3)
Interdisciplinary Journal of Information, Knowledge, and Management     Open Access   (Followers: 12)
International Association of School Librarianship Conference Proceedings     Open Access  
International Information & Library Review     Hybrid Journal   (Followers: 395)
International Journal of Bibliometrics in Business and Management     Hybrid Journal   (Followers: 2)
International Journal of Business Information Systems     Hybrid Journal   (Followers: 14)
International Journal of Cooperative Information Systems     Hybrid Journal   (Followers: 4)
International Journal of Digital Curation     Open Access   (Followers: 82)
International Journal of Digital Library Systems     Full-text available via subscription   (Followers: 73)
International Journal of Doctoral Studies     Open Access   (Followers: 6)
International Journal of Information and Decision Sciences     Hybrid Journal   (Followers: 10)
International Journal of Information Management     Hybrid Journal   (Followers: 154)
International Journal of Information Privacy, Security and Integrity     Hybrid Journal   (Followers: 25)
International Journal of Information Retrieval Research     Full-text available via subscription   (Followers: 28)
International Journal of Information Science and Management     Open Access   (Followers: 5)
International Journal of Information Technology, Communications and Convergence     Hybrid Journal   (Followers: 14)
International Journal of Information, Diversity, & Inclusion     Open Access   (Followers: 3)
International Journal of Intellectual Property Management     Hybrid Journal   (Followers: 26)
International Journal of Intercultural Information Management     Hybrid Journal   (Followers: 12)
International Journal of Legal Information     Full-text available via subscription   (Followers: 48)
International Journal of Librarianship     Open Access   (Followers: 25)
International Journal of Library and Information Science     Open Access   (Followers: 229)
International Journal of Library Science     Open Access   (Followers: 262)
International Journal of Library Science     Full-text available via subscription   (Followers: 55)
International Journal of Multicriteria Decision Making     Hybrid Journal   (Followers: 8)
International Journal of Multimedia Information Retrieval     Partially Free   (Followers: 8)
International Journal of Organisational Design and Engineering     Hybrid Journal   (Followers: 3)
International Journal of Web Portals     Full-text available via subscription   (Followers: 16)
International Journal on Digital Libraries     Hybrid Journal   (Followers: 544)
InULA Notes : Indiana University Librarians Association     Open Access  
Investigación Bibliotecológica     Open Access   (Followers: 4)
IRIS - Revista de Informação, Memória e Tecnologia     Open Access  
Issues in Informing Science and Information Technology     Open Access   (Followers: 2)
Issues in Science and Technology Librarianship     Open Access   (Followers: 2)
JISTEM : Journal of Information Systems and Technology Management     Open Access   (Followers: 6)
JLIS.it     Open Access   (Followers: 7)
JMIR Medical Informatics     Open Access   (Followers: 9)
Journal of Academic Librarianship     Hybrid Journal   (Followers: 1012)
Journal of Access Services     Hybrid Journal   (Followers: 39)
Journal of Advancements in Library Sciences     Open Access   (Followers: 47)
Journal of Adventist Libraries and Archives     Open Access  
Journal of Altmetrics     Open Access   (Followers: 7)
Journal of Archival Organization     Hybrid Journal   (Followers: 28)
Journal of Copyright in Education & Librarianship     Open Access   (Followers: 29)
Journal of Creative Library Practice     Open Access   (Followers: 98)
Journal of Data Mining and Digital Humanities     Open Access   (Followers: 39)
Journal of Documentation     Hybrid Journal   (Followers: 160)
Journal of East Asian Libraries     Open Access   (Followers: 7)
Journal of Education in Library and Information Science - JELIS     Full-text available via subscription   (Followers: 71)
Journal of Educational Media & Library Sciences     Open Access   (Followers: 9)
Journal of Educational Media, Memory, and Society     Full-text available via subscription   (Followers: 12)
Journal of Electronic Publishing     Open Access   (Followers: 76)
Journal of Electronic Resources Librarianship     Hybrid Journal   (Followers: 225)
Journal of eScience Librarianship     Open Access   (Followers: 113)
Journal of Global Information Management     Full-text available via subscription   (Followers: 9)
Journal of Health & Medical Informatics     Open Access   (Followers: 49)
Journal of Hospital Librarianship     Hybrid Journal   (Followers: 152)
Journal of Information & Knowledge Management     Hybrid Journal   (Followers: 141)
Journal of Information and Data Management     Open Access   (Followers: 14)
Journal of Information Engineering and Applications     Open Access   (Followers: 10)
Journal of Information Literacy     Open Access   (Followers: 773)
Journal of Information Science     Hybrid Journal   (Followers: 1013)
Journal of Information Studies & Technology     Open Access   (Followers: 1)

Similar Journals
Bioinformatics
Journal Prestige (SJR): 6.14
Citation Impact (CiteScore): 8
Number of Followers: 216  
 
  Hybrid journal (it can contain Open Access articles)
ISSN (Print) 1367-4803 - ISSN (Online) 1460-2059
Published by Oxford University Press [419 journals]
  • RegScaf: a regression approach to scaffolding

      Authors: Li M; Li L, Alkan C.
      Pages: 2675 - 2682
      Abstract:
        Motivation: Crucial to the correctness of a genome assembly is the accuracy of the underlying scaffolds, which specify the orders and orientations of contigs together with the gap distances between contigs. Current methods construct scaffolds from the alignments of ‘linking’ reads against contigs. We found that some ‘optimal’ alignments are mistaken owing to factors such as the contig boundary effect, particularly in the presence of repeats. Occasionally, the incorrect alignments can even overwhelm the correct ones. Detecting such incorrect linking information is challenging for existing methods.
        Results: In this study, we present a novel scaffolding method, RegScaf. It first examines the distribution of distances between contigs from read alignments by kernel density estimation. When a density shows multiple modes, orientation-supported links are grouped into clusters, each of which defines a linking distance corresponding to a mode. The linear model parameterizes contigs by their positions on the genome; each linking distance between a pair of contigs is then taken as an observation of the difference of their positions. The parameters are estimated by minimizing a global loss function, a version of the trimmed sum of squares. The least trimmed squares estimate has such a high breakdown value that it can automatically remove the mistaken linking distances. Results on both synthetic and real datasets demonstrate that RegScaf outperforms some popular scaffolders, especially in the accuracy of gap estimates, by substantially reducing extremely abnormal errors. Its strength in resolving repeat regions is exemplified by a real case. Its adaptability to large genomes and TGS (third-generation sequencing) long reads is validated as well.
        Availability and implementation: RegScaf is publicly available at https://github.com/lemontealala/RegScaf.git.
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 25 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac174
      Issue No: Vol. 38, No. 10 (2022)
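The regression idea in the RegScaf abstract, treating each linking distance as a noisy observation of the difference between two contig positions and fitting by least trimmed squares, can be sketched in a few lines. This is not RegScaf's code: the function names `fit_positions`, `residual` and `lts_positions`, the anchoring of contig 0 at position 0, the Gauss-Seidel solver, and the concentration-step (C-step) search for the trimmed subset are all assumptions made for illustration.

```python
from typing import List, Tuple

# Each observation (i, j, d) says: position[j] - position[i] is roughly d.
Obs = Tuple[int, int, float]


def fit_positions(n: int, obs: List[Obs], sweeps: int = 500) -> List[float]:
    """Ordinary least squares for contig positions, contig 0 anchored at 0.

    Gauss-Seidel sweeps on the normal equations: each free position is set
    to the mean of the values its incident observations predict for it.
    """
    x = [0.0] * n
    for _ in range(sweeps):
        for k in range(1, n):  # contig 0 stays fixed as the anchor
            preds = [x[i] + d for i, j, d in obs if j == k]
            preds += [x[j] - d for i, j, d in obs if i == k]
            if preds:  # a contig with no remaining links keeps its old value
                x[k] = sum(preds) / len(preds)
    return x


def residual(x: List[float], o: Obs) -> float:
    i, j, d = o
    return (x[j] - x[i]) - d


def lts_positions(n: int, obs: List[Obs], h: int,
                  steps: int = 20) -> Tuple[List[float], List[Obs]]:
    """Least trimmed squares: fit using only the h best-fitting observations.

    Starts from the full least-squares fit, then repeats concentration steps:
    keep the h observations with the smallest squared residuals and refit.
    """
    subset = list(obs)
    x = fit_positions(n, subset)
    for _ in range(steps):
        ranked = sorted(obs, key=lambda o: residual(x, o) ** 2)
        new_subset = ranked[:h]
        if set(new_subset) == set(subset):
            break  # the trimmed subset is stable
        subset = new_subset
        x = fit_positions(n, subset)
    return x, subset


# Toy data: true positions 0, 100, 250, 400 plus one grossly wrong link.
links = [(0, 1, 100.0), (1, 2, 150.0), (2, 3, 150.0), (0, 2, 250.0),
         (1, 3, 300.0), (0, 3, 400.0), (0, 3, -500.0)]
pos, kept = lts_positions(4, links, h=6)
```

With one observation trimmed (h = 6 of 7), the inconsistent link (0, 3, -500.0) has by far the largest residual under the full fit, is dropped, and the refit recovers the consistent positions.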
       
  • Deep learning identifies and quantifies recombination hotspot determinants

      Authors: Li Y; Chen S, Rapakoulia T, et al.
      Pages: 2683 - 2691
      Abstract:
        Motivation: Recombination is one of the essential genetic processes in sexually reproducing organisms, and it happens more frequently in some regions, called recombination hotspots. Although several factors, such as PRDM9 binding motifs, are known to be related to the hotspots, their contributions to recombination hotspots have not been quantified, and other determinants are yet to be elucidated. Here, we propose a computational method, RHSNet, based on deep learning and signal processing, to identify and quantify the hotspot determinants in a purely data-driven manner, using datasets from various studies, populations, sexes and species.
        Results: RHSNet can significantly outperform other sequence-based methods on multiple datasets across different species, sexes and studies. Beyond accurately identifying hotspot regions and the well-known determinants, RHSNet, more importantly, can quantify the determinants that contribute significantly to recombination hotspot formation, in terms of the relationship among PRDM9 binding motifs, histone modifications and GC content. Further cross-sex, cross-population and cross-species studies suggest that the proposed method generalizes well and has the potential to identify and quantify evolutionary determinant motifs.
        Availability and implementation: https://github.com/frankchen121212/RHSNet.
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 12 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac234
      Issue No: Vol. 38, No. 10 (2022)
       
  • EDClust: an EM–MM hybrid method for cell clustering in multiple-subject single-cell RNA sequencing

      Authors: Wei X; Li Z, Ji H, et al.
      Pages: 2692 - 2699
      Abstract:
        Motivation: Single-cell RNA sequencing (scRNA-seq) has revolutionized biological research by enabling the measurement of transcriptomic profiles at the single-cell level. With the increasing application of scRNA-seq in larger-scale studies, the problem of appropriately clustering cells emerges when the scRNA-seq data are from multiple subjects. One challenge is the subject-specific variation; systematic heterogeneity from multiple subjects may have a significant impact on clustering accuracy. Existing methods seeking to address such effects suffer from several limitations.
        Results: We develop a novel statistical method, EDClust, for multi-subject scRNA-seq cell clustering. EDClust models the sequence read counts by a mixture of Dirichlet-multinomial distributions and explicitly accounts for cell-type heterogeneity, subject heterogeneity and clustering uncertainty. An EM-MM hybrid algorithm is derived for maximizing the data likelihood and clustering the cells. We perform a series of simulation studies to evaluate the proposed method and demonstrate the outstanding performance of EDClust. Comprehensive benchmarking on four real scRNA-seq datasets with various tissue types and species demonstrates the substantial accuracy improvement of EDClust compared to existing methods.
        Availability and implementation: The R package is freely available at https://github.com/weix21/EDClust.
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 22 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac168
      Issue No: Vol. 38, No. 10 (2022)
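The EDClust abstract models each cell's read counts with a mixture of Dirichlet-multinomial distributions. As a hedged illustration of the likelihood behind such a model, the sketch below computes the Dirichlet-multinomial log-probability of a count vector with `math.lgamma` and uses it for the E-step of a generic mixture, i.e. the posterior cluster membership of one cell. It omits EDClust's subject-specific terms and its MM parameter update; the function names `dm_logpmf` and `e_step` and the toy parameters are hypothetical.

```python
import math
from typing import List, Sequence


def dm_logpmf(x: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of counts x under a Dirichlet-multinomial(alpha)."""
    n = sum(x)
    a = sum(alpha)
    # log[ n! * Gamma(A) / Gamma(n + A) ], with A = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for xk, ak in zip(x, alpha):
        # log[ Gamma(x_k + alpha_k) / (x_k! * Gamma(alpha_k)) ]
        out += math.lgamma(xk + ak) - math.lgamma(xk + 1) - math.lgamma(ak)
    return out


def e_step(x: Sequence[int], weights: Sequence[float],
           alphas: Sequence[Sequence[float]]) -> List[float]:
    """Posterior probability that cell x belongs to each mixture component."""
    logs = [math.log(w) + dm_logpmf(x, a) for w, a in zip(weights, alphas)]
    m = max(logs)  # log-sum-exp trick for numerical stability
    total = sum(math.exp(v - m) for v in logs)
    return [math.exp(v - m) / total for v in logs]


# Toy example: two hypothetical cell types over three genes.
alphas = [[20.0, 2.0, 2.0], [2.0, 2.0, 20.0]]
weights = [0.5, 0.5]
cell = [30, 3, 1]
post = e_step(cell, weights, alphas)
```

Working on the log scale keeps the computation stable for realistic read counts, where the raw probabilities would underflow.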
       
  • EDGE COVID-19: a web platform to generate submission-ready genomes from SARS-CoV-2 sequencing efforts

      Authors: Lo C; Shakya M, Connor R, et al.
      Pages: 2700 - 2704
      Abstract:
        Summary: Genomics has become an essential technology for surveilling emerging infectious disease outbreaks. Laboratories worldwide use a range of technologies and strategies for pathogen genome enrichment and sequencing, together with different, sometimes ad hoc, analytical procedures for generating genome sequences. A fully integrated analytical process from raw sequence to consensus genome, suited to outbreaks such as the ongoing COVID-19 pandemic, is critical to provide a solid genomic basis for epidemiological analyses and well-informed decision making. We have developed a web-based platform, EDGE COVID-19 (EC-19), with integrated bioinformatic workflows that help provide consistent, high-quality analysis of SARS-CoV-2 sequencing data generated with either Illumina or Oxford Nanopore Technologies (ONT) platforms. Through an intuitive web-based interface, this workflow automates data quality control, SARS-CoV-2 reference-based genome variant and consensus calling, and lineage determination, and it can submit the consensus sequence and necessary metadata to GenBank, GISAID and INSDC raw data repositories. We tested workflow usability using real-world data, validated the accuracy of variant and lineage analysis using several test datasets, and performed detailed comparisons with results from the COVID-19 Galaxy Project workflow. Our analyses indicate that EC-19 workflows generate high-quality SARS-CoV-2 genomes. Finally, we share a perspective on the patterns and impact observed with Illumina versus ONT technologies on workflow congruence and differences.
        Availability and implementation: https://edge-covid19.edgebioinformatics.org and https://github.com/LANL-Bioinformatics/EDGE/tree/SARS-CoV2.
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 24 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac176
      Issue No: Vol. 38, No. 10 (2022)
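One step the EDGE COVID-19 abstract lists, reference-based consensus calling, can be illustrated with a generic majority rule over per-position base counts: emit the dominant base when depth and allele frequency clear a threshold, otherwise emit `N`. This is a hypothetical sketch, not the EC-19 implementation; the function name `call_consensus`, the thresholds `min_depth` and `min_freq`, and the pileup representation are assumptions.

```python
from collections import Counter
from typing import List


def call_consensus(pileup: List[Counter], min_depth: int = 10,
                   min_freq: float = 0.75) -> str:
    """Build a consensus string from per-position base counts.

    A position gets its most frequent base only if coverage is at least
    min_depth and that base makes up at least min_freq of the reads;
    otherwise it is masked as 'N'.
    """
    out = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            out.append("N")  # too little coverage to make a call
            continue
        base, top = counts.most_common(1)[0]
        out.append(base if top / depth >= min_freq else "N")
    return "".join(out)


pileup = [
    Counter({"A": 40}),           # confident call
    Counter({"C": 18, "T": 2}),   # 90% C, above min_freq
    Counter({"G": 5}),            # depth 5 is below min_depth
    Counter({"A": 10, "G": 10}),  # 50/50, ambiguous
]
consensus = call_consensus(pileup)
```

Masking low-confidence positions with `N`, rather than forcing a call, is the conservative choice for sequences destined for public repositories.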
       
  • TransPPMP: predicting pathogenicity of frameshift and non-sense mutations by a Transformer based on protein features

      Authors: Nie L; Quan L, Wu T, et al.
      Pages: 2705 - 2711
      Abstract:
        Motivation: Protein structure can be severely disrupted by frameshift and non-sense mutations at specific positions in the protein sequence. Frameshift and non-sense mutations can also be found in healthy individuals. A method to distinguish neutral from potentially disease-associated frameshift and non-sense mutations is of practical and fundamental importance. It would allow researchers to rapidly screen out potentially pathogenic sites from a large number of mutated genes and then use these sites as drug targets to speed up diagnosis and improve access to treatment. How to distinguish neutral from potentially disease-associated frameshift and non-sense mutations remains under-researched.
        Results: We built a Transformer-based neural network model, named TransPPMP, to predict the pathogenicity of frameshift and non-sense mutations from protein features. The feature matrix of contextual sequences computed by the ESM pre-training model, the type of mutated residue and auxiliary features, including structure and function information, are combined as input features, and a focal loss function is used to address sample imbalance during training. In 10-fold cross-validation and on an independent blind test set, TransPPMP showed robust performance and outperformed four other advanced methods, namely ENTPRISE-X, VEST-indel, DDIG-in and CADD, on all evaluation metrics. In addition, we demonstrate the usefulness of the multi-head attention mechanism in the Transformer for predicting the pathogenicity of mutations: not only can multiple self-attention heads learn local and global interactions, but functional sites with a large influence on the mutated residue can also be captured by attention focus. These could offer useful clues for studying the pathogenicity mechanisms of complex human diseases, where traditional machine learning methods fall short.
        Availability and implementation: TransPPMP is available at https://github.com/lennylv/TransPPMP.
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 28 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac188
      Issue No: Vol. 38, No. 10 (2022)
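The TransPPMP abstract mentions a focal loss for handling sample imbalance during training. For reference, the standard binary focal loss (Lin et al., 2017) is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The sketch below implements that standard form in plain Python; the exact variant and hyperparameters used by TransPPMP are not stated here, so the defaults γ = 2 and α = 0.25 and the function name `binary_focal_loss` are assumptions.

```python
import math
from typing import Sequence


def binary_focal_loss(probs: Sequence[float], labels: Sequence[int],
                      gamma: float = 2.0, alpha: float = 0.25,
                      eps: float = 1e-12) -> float:
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs are predicted probabilities of the positive class; labels are 0/1.
    gamma > 0 down-weights easy, well-classified examples; alpha weights the
    positive class and (1 - alpha) the negative class.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# An easy positive (p = 0.95) contributes far less than a hard one (p = 0.30).
easy = binary_focal_loss([0.95], [1])
hard = binary_focal_loss([0.30], [1])
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check.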
       
  • TPpred-ATMV: therapeutic peptide prediction by adaptive multi-view tensor learning model

      Authors: Yan K; Lv H, Guo Y, et al.
      Pages: 2712 - 2718
      Abstract:
        Motivation: Therapeutic peptide prediction is important for the discovery of efficient therapeutic peptides and for drug development. Researchers have developed several computational methods to identify different types of therapeutic peptides. However, these methods focus on identifying specific types of therapeutic peptides and fail to predict the full range of types. Moreover, it is still challenging to use different properties to predict therapeutic peptides.
        Results: In this study, we propose TPpred-ATMV, an adaptive multi-view tensor learning framework for predicting different types of therapeutic peptides. TPpred-ATMV constructs class and probability information based on various sequence features. We constructed a latent subspace among the multi-view features and built an auto-weighted multi-view tensor learning model to exploit the high correlation among these features. Experimental results showed that TPpred-ATMV is better than or highly comparable with other state-of-the-art methods for predicting eight types of therapeutic peptides.
        Availability and implementation: The code of TPpred-ATMV is available at https://github.com/cokeyk/TPpred-ATMV.
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 07 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac200
      Issue No: Vol. 38, No. 10 (2022)
       
  • TopHap: rapid inference of key phylogenetic structures from common haplotypes in large genome collections with limited diversity

      Authors: Caraballo-Ortiz M; Miura S, Sanderford M, et al.
      Pages: 2719 - 2726
      Abstract:
        Motivation: Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts to sequence genomes and reconstruct the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties, since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants, improving the signal-to-noise ratio for more accurate and faster inference of resolvable phylogenetic features.
        Results: We present the TopHap approach, which determines spatiotemporally common haplotypes of common variants and builds their phylogeny in a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. Applying TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred from the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations from the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided the evolutionary origins of major and recent variants of concern.
        Availability and implementation: TopHap is available at https://github.com/SayakaMiura/TopHap.
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 24 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac186
      Issue No: Vol. 38, No. 10 (2022)
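The data reduction described in the TopHap abstract, keeping only positions that carry common variants and then counting the haplotypes those positions form, can be sketched as below, together with a plain bootstrap over genomes. This is an illustrative sketch under stated assumptions, not TopHap's code: the thresholds, the function names, and the simple uniform (not spatiotemporal) resampling are hypothetical, and the phylogeny-building step is omitted. The informative sites are chosen once from the full data so that haplotype strings are comparable across bootstrap replicates.

```python
import random
from collections import Counter
from typing import Dict, List


def common_sites(seqs: List[str], min_maf: float) -> List[int]:
    """Alignment columns whose minor-allele frequency is at least min_maf."""
    sites = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        if len(counts) > 1:
            # minor allele = second most common base in this column
            minor = counts.most_common(2)[1][1]
            if minor / len(seqs) >= min_maf:
                sites.append(pos)
    return sites


def common_haplotypes(seqs: List[str], sites: List[int],
                      min_hap_freq: float) -> Dict[str, int]:
    """Haplotypes over the chosen sites whose frequency reaches min_hap_freq."""
    counts = Counter("".join(s[p] for p in sites) for s in seqs)
    return {h: c for h, c in counts.items() if c / len(seqs) >= min_hap_freq}


def bootstrap_support(seqs: List[str], sites: List[int], min_hap_freq: float,
                      reps: int = 200, seed: int = 0) -> Dict[str, float]:
    """Fraction of resamples (genomes drawn with replacement) in which each
    haplotype is still common."""
    rng = random.Random(seed)
    support: Counter = Counter()
    for _ in range(reps):
        sample = [rng.choice(seqs) for _ in seqs]
        support.update(common_haplotypes(sample, sites, min_hap_freq).keys())
    return {h: n / reps for h, n in support.items()}


genomes = ["AACGTA", "AACGTA", "AACCTA", "TACCTA", "TACCTA", "TACCTG"]
sites = common_sites(genomes, min_maf=0.3)
haps = common_haplotypes(genomes, sites, min_hap_freq=0.3)
support = bootstrap_support(genomes, sites, min_hap_freq=0.3)
```

In this toy alignment the singleton variant in the last column is filtered out as rare, leaving two informative sites and two common haplotypes.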
       
  • Coarse-graining protein structures into their dynamic communities with DCI, a dynamic community identifier

      Authors: Kumar A; Khade P, Dorman K, et al.
      Pages: 2727 - 2733
      Abstract:
        Summary: A new dynamic community identifier (DCI) is presented that relies on protein residue dynamic cross-correlations, generated by Gaussian elastic network models, to identify the residue clusters that move together within a protein. Examples of communities are shown for diverse proteins, including GPCRs. The tool can immediately simplify and clarify the most essential functional moving parts of any given protein. Proteins can usually be subdivided into groups of residues that move as communities. These are usually densely packed local substructures, but in some cases physically distant residues are identified as belonging to the same community. The set of these communities for each protein constitutes its moving parts. The way these are organized overall can aid in understanding many aspects of functional dynamics and allostery. DCI enables a more direct understanding of functions including enzyme activity, action across membranes, and changes in community structure caused by mutations or ligand binding. The DCI server is freely available at https://dci.bb.iastate.edu/.
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 17 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac159
      Issue No: Vol. 38, No. 10 (2022)
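DCI derives communities from residue cross-correlations computed with a Gaussian network model. The following is a minimal sketch of the grouping step only, assuming the cross-correlation matrix has already been computed: two residues are linked when their correlation reaches a hypothetical `threshold`, and communities are the connected components of the resulting graph. This thresholding rule is an illustrative stand-in, not DCI's actual clustering procedure.

```python
from typing import Dict, List


def threshold_communities(corr: List[List[float]], threshold: float) -> List[List[int]]:
    """Link residues i and j when corr[i][j] >= threshold and return the
    connected components of that graph (found with union-find)."""
    n = len(corr)
    parent = list(range(n))

    def find(i: int) -> int:
        # Walk to the root of i's component, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical 4-residue matrix: residues 0 and 3 move together even though
# they are not sequence neighbours, mirroring "physically distant" members.
corr = [
    [1.0, 0.2, 0.1, 0.9],
    [0.2, 1.0, 0.8, 0.1],
    [0.1, 0.8, 1.0, 0.2],
    [0.9, 0.1, 0.2, 1.0],
]
print(threshold_communities(corr, 0.5))  # [[0, 3], [1, 2]]
```

Lowering the threshold merges communities; at 0.05 every residue in this toy matrix ends up in one group.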
       
  • LPTD: a novel linear programming-based topology determination method for
           cryo-EM maps

      Authors: Behkamal B; Naghibzadeh M, Pagnani A, et al.
      Pages: 2734 - 2741
      Abstract:
      Summary: Topology determination is one of the most important intermediate steps toward building the atomic structure of proteins from their medium-resolution cryo-electron microscopy (cryo-EM) maps. The main goal of topology determination is to identify correct matches (i.e. assignment and direction) between the secondary structure elements (SSEs) (α-helices and β-sheets) detected in the protein sequence and those detected in the cryo-EM density map. Despite many recent advances in molecular biology technologies, the problem remains challenging. To address it, this article proposes a linear programming-based topology determination (LPTD) method that solves the secondary structure topology problem in three-dimensional geometric space. By modeling the protein's sequence through the extraction of highly reliable features and a distance-based scoring function, the secondary structure matching problem is transformed into a complete weighted bipartite graph matching problem. An algorithm based on linear programming is then developed as a decision-making strategy to extract the true (native) topology from among all possible topologies. The proposed automatic framework is verified using 12 experimental and 15 simulated α–β proteins. The results demonstrate that LPTD is highly efficient and fast: for 77% of the cases in the dataset, the native topology was ranked first in <2 s. Moreover, the method can successfully handle large complex proteins with as many as 65 SSEs; such a large number of SSEs has not been solved with current tools/methods.
      Availability and implementation: The LPTD package (source code and data) is publicly available at https://github.com/B-Behkamal/LPTD. Two test samples and instructions for using the graphical user interface are provided in the shared readme file.
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 21 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac170
      Issue No: Vol. 38, No. 10 (2022)
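To make "assignment and direction" concrete, here is a brute-force sketch that scores every possible topology and ranks them. It assumes a hypothetical cost table `cost[i][j][d]` (lower is better) for matching sequence SSE `i` to density-map SSE `j` in direction `d`; LPTD derives its scores from its own features and uses linear programming precisely to avoid this exhaustive enumeration.

```python
from itertools import permutations, product
from typing import List, Tuple

# cost[i][j][d]: hypothetical cost of matching sequence SSE i to density-map
# SSE j in direction d (0 = forward, 1 = reversed).
Cost = List[List[List[float]]]
Ranked = Tuple[float, Tuple[int, ...], Tuple[int, ...]]


def rank_topologies(cost: Cost, k: int = 3) -> List[Ranked]:
    """Enumerate every assignment (permutation) and every direction choice,
    returning the k cheapest as (total cost, assignment, directions)."""
    n = len(cost)
    scored: List[Ranked] = []
    for assignment in permutations(range(n)):
        for directions in product((0, 1), repeat=n):
            total = sum(cost[i][assignment[i]][directions[i]] for i in range(n))
            scored.append((total, assignment, directions))
    scored.sort()
    return scored[:k]


cost = [
    [[1.0, 5.0], [4.0, 4.0]],
    [[3.0, 3.0], [6.0, 2.0]],
]
print(rank_topologies(cost, k=1))  # [(3.0, (0, 1), (0, 1))]
```

The search space grows as n!·2^n, which is why this only works for toy inputs. Because each pair's direction can be chosen independently, taking the cheaper direction per pair reduces finding the single best topology to a standard bipartite assignment problem, the kind of problem a linear programming formulation handles efficiently.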
       
  • Impact of protein conformational diversity on AlphaFold predictions

      Authors: Saldaño T; Escobedo N, Marchetti J, et al.
      Pages: 2742 - 2748
      Abstract:
      Motivation: After the outstanding breakthrough of AlphaFold in predicting protein 3D models, new questions have emerged that remain unanswered. The ensemble nature of proteins, for example, challenges structure prediction methods because the models should represent a set of conformers instead of single structures. The evolutionary and structural features captured by effective deep learning techniques may unveil the information needed to generate several diverse conformations from a single sequence. Here, we address the performance of AlphaFold2 predictions obtained through ColabFold under this ensemble paradigm.
      Results: Using a curated collection of apo–holo pairs of conformers, we found that AlphaFold2 predicts the holo form of a protein in ∼70% of the cases and is unable to reproduce the observed conformational diversity with the same error for both conformers. More importantly, we found that AlphaFold2's performance worsens with increasing conformational diversity of the studied protein. This impairment is related to the heterogeneity in the degree of conformational diversity found between different members of the homologous family of the protein under study. Finally, we found that main-chain flexibility associated with apo–holo pairs of conformers negatively correlates with the predicted local model quality score pLDDT, indicating that pLDDT values in a single 3D model could be used to infer local conformational changes linked to ligand binding transitions.
      Availability and implementation: Data and code used in this manuscript are publicly available at https://gitlab.com/sbgunq/publications/af2confdiv-oct2021.
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 05 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac202
      Issue No: Vol. 38, No. 10 (2022)
       
  • Non-negative Independent Factor Analysis disentangles discrete and
           continuous sources of variation in scRNA-seq data

      Authors: Mao W; Pouyan M, Kostka D, et al.
      Pages: 2749 - 2756
      Abstract:
      Motivation: Single-cell RNA-seq analysis has emerged as a powerful tool for understanding inter-cellular heterogeneity. Due to the inherent noise of the data, computational techniques often rely on dimensionality reduction (DR) as both a pre-processing step and an analysis tool. Ideally, DR should preserve the biological information while discarding the noise. However, if DR is to be used directly to gain biological insight, it must also be interpretable; that is, the individual dimensions of the reduction should correspond to specific biological variables such as cell-type identity or pathway activity. Maximizing biological interpretability requires making assumptions about the data structure, so the choice of model is critical.
      Results: We present a new probabilistic single-cell factor analysis model, Non-negative Independent Factor Analysis (NIFA), that incorporates different interpretability-inducing assumptions into a single modeling framework. The key advantage of NIFA is that it simultaneously models uni- and multi-modal latent factors and thus isolates discrete cell-type identity and continuous pathway activity into separate components. We apply our approach to a range of datasets where cell-type identity is known, and we show that NIFA-derived factors outperform ICA, PCA, NMF and scCoGAPS (an NMF method designed for single-cell data) in disentangling biological sources of variation. Studying an immunotherapy dataset in detail, we show that NIFA reproduces and refines previous findings in a single analysis framework and enables the discovery of new clinically relevant cell states.
      Availability and implementation: NIFA is an R package freely available on GitHub (https://github.com/wgmao/NIFA). The test dataset is archived at https://zenodo.org/record/6286646.
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 18 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac136
      Issue No: Vol. 38, No. 10 (2022)
       
  • VIQoR: a web service for visually supervised protein inference and protein
           quantification

      Authors: Tsiamis V; Schwämmle V, Vitek O.
      Pages: 2757 - 2764
      Abstract:
      Motivation: In quantitative bottom-up mass spectrometry (MS)-based proteomics, the reliable estimation of protein concentration changes from peptide quantifications between different biological samples is essential. This estimation is not a single task but comprises two processes: protein inference and protein abundance summarization. Furthermore, due to the high complexity of proteomics data and the associated uncertainty about the performance of these processes, there is a demand for comprehensive visualization methods able to integrate protein and peptide quantitative data, including their post-translational modifications. Yet a suitable tool that provides post-identification quantitative analysis of proteins with simultaneous interactive visualization is lacking.
      Results: In this article, we present VIQoR, a user-friendly web service that accepts peptide quantitative data from both labeled and label-free experiments and performs the crucial steps of protein inference and summarization, together with interactive visualization modules that include the novel VIQoR plot. We implemented two different parsimonious algorithms to solve the protein inference problem, while protein summarization is performed by a well-established factor analysis algorithm called fast-FARMS, followed by a weighted average summarization function that minimizes the effect of missing values. In addition, summarization is optimized by the so-called Global Correlation Indicator (GCI). We test the tool on three publicly available ground truth datasets and demonstrate the ability of the protein inference algorithms to handle shared peptides. We furthermore show that GCI increases the accuracy of the quantitative analysis in datasets with a replicated design.
      Availability and implementation: VIQoR is accessible at http://computproteomics.bmb.sdu.dk/Apps/VIQoR/. The source code is available at https://bitbucket.org/veitveit/viqor/.
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 23 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac182
      Issue No: Vol. 38, No. 10 (2022)
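Parsimonious protein inference is commonly framed as finding the smallest set of proteins that explains every observed peptide (a set-cover problem). The abstract does not say which two parsimonious algorithms VIQoR implements, so the following greedy set-cover heuristic is only an illustrative sketch, and the peptide and protein identifiers are hypothetical.

```python
from typing import Dict, List, Set


def parsimonious_proteins(peptide_to_proteins: Dict[str, Set[str]]) -> List[str]:
    """Greedy approximation to the minimal protein set explaining all peptides."""
    protein_to_peptides: Dict[str, Set[str]] = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)

    uncovered = set(peptide_to_proteins)
    chosen: List[str] = []
    while uncovered and protein_to_peptides:
        # Pick the protein explaining the most still-unexplained peptides;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & uncovered))
        gained = protein_to_peptides[best] & uncovered
        if not gained:  # leftover peptides map to no candidate protein
            break
        chosen.append(best)
        uncovered -= gained
    return chosen


# "pep1" is a shared peptide: it could come from P1 or P2.
peptides = {"pep1": {"P1", "P2"}, "pep2": {"P1"}, "pep3": {"P3"}}
print(parsimonious_proteins(peptides))  # ['P1', 'P3']
```

The shared peptide is attributed to P1 because P1 also explains `pep2`, so P2 is not needed. Greedy set cover is not guaranteed to be minimal; exact solvers (e.g. integer programming) trade speed for optimality.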
       
  • BSDE: barycenter single-cell differential expression for
           case–control studies

      Authors: Zhang M; Guo F, Mathelier A.
      Pages: 2765 - 2772
      Abstract:
      Motivation: Single-cell sequencing brings revolutionarily high resolution to finding differentially expressed genes (DEGs) by disentangling highly heterogeneous cell tissues. Yet such analysis has so far mostly focused on comparing different cell types from the same individual. As single-cell sequencing becomes cheaper and easier to use, an increasing number of datasets from case–control studies are becoming available, which call for new methods for identifying differential expression between case and control individuals.
      Results: To bridge this gap, we propose barycenter single-cell differential expression (BSDE), a nonparametric method for finding DEGs in case–control studies. By using optimal transport to aggregate distributions and compute their distances, our method overcomes the restrictive parametric assumptions imposed by standard mixed-effect modeling approaches. Through simulations, we show that BSDE can accurately detect a variety of differential expression patterns while maintaining the type-I error at a prescribed level. Further, BSDE identifies 1345 and 1568 cell type-specific DEGs from datasets on pulmonary fibrosis and multiple sclerosis, among which the top findings are supported by previous results in the literature.
      Availability and implementation: The R package BSDE is freely available from doi.org/10.5281/zenodo.6332254. For real data analysis with the R package, see doi.org/10.5281/zenodo.6332566. These can also be accessed through GitHub at github.com/mqzhanglab/BSDE and github.com/mqzhanglab/BSDE_pipeline. The two single-cell sequencing datasets can be downloaded with the UCSC Cell Browser from cells.ucsc.edu/?ds=ms and cells.ucsc.edu/?ds=lung-pf-control.
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 25 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac171
      Issue No: Vol. 38, No. 10 (2022)
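For one-dimensional distributions such as a gene's expression values, optimal transport has a convenient closed form. The equal-weight Wasserstein-2 barycenter's quantile function is the pointwise mean of the input quantile functions, and the W2 distance is the L2 distance between quantile functions. The sketch below uses this fact to aggregate individuals per group and compare groups. The data and function names are hypothetical, and BSDE's actual test statistic and inference procedure are not reproduced here.

```python
from typing import List, Sequence


def quantile(sorted_x: Sequence[float], q: float) -> float:
    """Empirical quantile of an already-sorted sample, with linear interpolation."""
    pos = q * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    frac = pos - lo
    return sorted_x[lo] * (1 - frac) + sorted_x[hi] * frac


def barycenter(samples: List[Sequence[float]], grid: int = 101) -> List[float]:
    """Equal-weight 1D W2 barycenter, returned as a quantile function on a
    uniform grid of `grid` (>= 2) probability levels from 0 to 1."""
    qs = [i / (grid - 1) for i in range(grid)]
    sorted_samples = [sorted(s) for s in samples]
    return [sum(quantile(s, q) for s in sorted_samples) / len(sorted_samples) for q in qs]


def w2_distance(qf_a: Sequence[float], qf_b: Sequence[float]) -> float:
    """Approximate W2 distance between two quantile functions on the same grid."""
    return (sum((a - b) ** 2 for a, b in zip(qf_a, qf_b)) / len(qf_a)) ** 0.5


# Hypothetical expression values of one gene in one cell type, per individual.
cases = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]
controls = [[0.0, 0.0, 1.0], [1.0, 2.0, 3.0]]
case_bar = barycenter(cases, grid=3)        # [1.0, 2.0, 3.0]
control_bar = barycenter(controls, grid=3)  # [0.5, 1.0, 2.0]
print(w2_distance(case_bar, control_bar))
```

Averaging quantile functions, rather than pooling all cells, keeps each individual's distribution shape intact, which is the intuition behind aggregating per individual before comparing case and control groups.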
       
  • Airpart: interpretable statistical models for analyzing allelic imbalance
           in single-cell datasets

      Authors: Mu W; Sarkar H, Srivastava A, et al.
      Pages: 2773 - 2780
      Abstract:
      Motivation: Allelic expression analysis aids in the detection of cis-regulatory mechanisms of genetic variation, which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking temporal or spatial resolution has the limitation that cell-type-specific (CTS), spatial- or time-dependent AI signals may be dampened or go undetected.
      Results: We introduce airpart, a statistical method for identifying differential CTS AI from single-cell RNA-sequencing data, or dynamic AI from other spatially or time-resolved datasets. airpart outputs discrete partitions of the data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. To account for low counts in single-cell data, our method uses a Generalized Fused Lasso with a Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had a lower Root Mean Square Error (RMSE) of allelic ratio estimates than existing methods. In real data, airpart identified differential allelic imbalance patterns across cell states and could be used to define trends of AI signal over spatial or time axes.
      Availability and implementation: The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart.
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 06 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac212
      Issue No: Vol. 38, No. 10 (2022)
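A generalized fused lasso with a binomial likelihood penalizes absolute differences between the parameters of linked groups, so groups with similar allelic ratios tend to fuse into one partition. The sketch below only evaluates such an objective for given logit-scale parameters. The group layout, edge list and function names are hypothetical, and airpart's actual parameterization, optimizer and tuning of the penalty are not shown.

```python
import math
from typing import List, Tuple


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def fused_lasso_objective(theta: List[float], counts: List[Tuple[int, int]],
                          edges: List[Tuple[int, int]], lam: float) -> float:
    """Binomial negative log-likelihood plus a fused (pairwise L1) penalty.

    theta[g]  : logit of the allelic ratio for cell group g (hypothetical layout)
    counts[g] : (allele-1 reads, total reads) pooled over group g
    edges     : pairs of groups whose estimates are encouraged to fuse
    """
    nll = 0.0
    for t, (y, n) in zip(theta, counts):
        # log p = log_sigmoid(t) and log(1 - p) = log_sigmoid(-t)
        nll -= y * log_sigmoid(t) + (n - y) * log_sigmoid(-t)
    penalty = lam * sum(abs(theta[a] - theta[b]) for a, b in edges)
    return nll + penalty


counts = [(5, 10), (5, 10)]
print(fused_lasso_objective([0.0, 0.0], counts, [(0, 1)], lam=1.0))  # equals 20 * ln 2
print(fused_lasso_objective([0.0, 1.0], counts, [(0, 1)], lam=1.0))  # larger: worse fit plus penalty
```

Setting `lam` to zero recovers independent per-group binomial fits; increasing it makes equal values of `theta`, and therefore merged partitions, increasingly favorable.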
       
  • Driver gene detection through Bayesian network integration of mutation and
           expression profiles

      Authors: Chen Z; Lu Y, Cao B, et al.
      Pages: 2781 - 2790
      Abstract:
      Motivation: The identification of mutated driver genes and the corresponding pathways is one of the primary goals in understanding tumorigenesis at the patient level. Integrating multi-dimensional genomic data from existing repositories, e.g. The Cancer Genome Atlas (TCGA), offers an effective way to tackle this issue. In this study, we aimed to leverage the complementary genomic information of individuals and create an integrative framework to identify cancer-related driver genes. Specifically, based on pinpointed differentially expressed genes, variants in somatic mutations and a gene interaction network, we propose an unsupervised Bayesian network integration (BNI) method to detect driver genes and estimate disease propagation at the patient and/or cohort levels. The method first captures inherent structural information to construct a functional gene mutation network and then extracts the driver genes and their controlled downstream modules using the minimum cover subset method.
      Results: Using other credible sources (e.g. Cancer Gene Census and Network of Cancer Genes), we validated the driver genes predicted by the BNI method in three TCGA pan-cancer cohorts. The proposed method provides an effective approach to addressing the tumor heterogeneity faced by personalized medicine. The pinpointed drivers warrant further wet-laboratory validation.
      Availability and implementation: The supplementary tables and source code can be obtained from https://xavieruniversityoflouisiana.sharefile.com/d-se6df2c8d0ebe4800a3030311efddafe5.
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 07 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac203
      Issue No: Vol. 38, No. 10 (2022)
       
  • BFF and cellhashR: analysis tools for accurate demultiplexing of cell
           hashing data

      Authors: Boggy G; McElfresh G, Mahyari E, et al.
      Pages: 2791 - 2801
      Abstract:
      Motivation: Single-cell sequencing methods provide previously impossible resolution into the transcriptome of individual cells. Cell hashing reduces single-cell sequencing costs by increasing capacity on droplet-based platforms. Cell hashing methods rely on demultiplexing algorithms to accurately classify droplets; however, the assumptions underlying these algorithms limit demultiplexing accuracy, ultimately affecting the quality of single-cell sequencing analyses.
      Results: We present the Bimodal Flexible Fitting (BFF) demultiplexing algorithms BFFcluster and BFFraw, a novel class of algorithms that rely on the single inviolable assumption that barcode count distributions are bimodal. We integrated these and other algorithms into cellhashR, a new R package that provides integrated QC and a single command to execute and compare multiple demultiplexing algorithms. We demonstrate that BFFcluster demultiplexing is both tunable and insensitive to issues with poorly behaved data that can confound other algorithms. Using two well-characterized reference datasets, we demonstrate that demultiplexing with BFF algorithms is accurate and consistent for both well-behaved and poorly behaved input data.
      Availability and implementation: cellhashR is available as an R package at https://github.com/BimberLab/cellhashR. cellhashR version 1.0.3 was used for the analyses in this manuscript and is archived on Zenodo at https://www.doi.org/10.5281/zenodo.6402477.
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 08 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac213
      Issue No: Vol. 38, No. 10 (2022)
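The key assumption in BFF is that each hashtag barcode's counts are bimodal: a low background mode and a high mode for cells that truly carry the tag. The sketch below illustrates that idea with a substitute technique. It splits each barcode's log counts with Otsu's between-class-variance threshold and then calls each droplet as a singlet, doublet or negative. Otsu thresholding is a stand-in, not the BFF fitting procedure, and the barcode names and counts are hypothetical.

```python
import math
from typing import Dict, List


def otsu_threshold(values: List[float]) -> float:
    """Cut point between two modes that maximizes between-class variance."""
    xs = sorted(values)
    n = len(xs)
    best_t, best_var = xs[0], -1.0
    for i in range(1, n):
        lo, hi = xs[:i], xs[i:]
        w0, w1 = i / n, (n - i) / n
        m0, m1 = sum(lo) / len(lo), sum(hi) / len(hi)
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, (xs[i - 1] + xs[i]) / 2
    return best_t


def demultiplex(counts: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """counts: cell -> {barcode -> count}. Returns a barcode, 'Doublet' or 'Negative'."""
    barcodes = sorted({b for c in counts.values() for b in c})
    thresholds = {
        b: otsu_threshold([math.log1p(c.get(b, 0)) for c in counts.values()])
        for b in barcodes
    }
    calls: Dict[str, str] = {}
    for cell, c in counts.items():
        positive = [b for b in barcodes if math.log1p(c.get(b, 0)) > thresholds[b]]
        if len(positive) == 1:
            calls[cell] = positive[0]
        elif not positive:
            calls[cell] = "Negative"
        else:
            calls[cell] = "Doublet"
    return calls


counts = {
    "c1": {"HTO1": 500, "HTO2": 5},
    "c2": {"HTO1": 400, "HTO2": 8},
    "c3": {"HTO1": 6, "HTO2": 600},
    "c4": {"HTO1": 450, "HTO2": 550},
    "c5": {"HTO1": 3, "HTO2": 4},
}
print(demultiplex(counts))
```

Thresholds are set per barcode because each tag can have its own background level, which is one of the data issues that a single global cut-off handles poorly.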
       
  • Fast and accurate inference of gene regulatory networks through robust
           precision matrix estimation

      Authors: Passemiers A; Moreau Y, Raimondi D, et al.
      Pages: 2802 - 2809
      Abstract:
      Motivation: Transcriptional regulation mechanisms allow cells to adapt and respond to external stimuli by altering gene expression. The possible cell transcriptional states are determined by the underlying gene regulatory network (GRN), and reliably inferring such a network would be invaluable for understanding biological processes and disease progression.
      Results: In this article, we present PORTIA, a novel method for the inference of GRNs based on robust precision matrix estimation, and we show that it compares favorably with state-of-the-art methods while being orders of magnitude faster. We extensively validated PORTIA using the DREAM and MERLIN+P datasets as benchmarks. In addition, we propose a novel scoring metric that builds on graph-theoretical concepts.
      Availability and implementation: The code and instructions for data acquisition and full reproduction of our results are available at https://github.com/AntoinePassemiers/PORTIA-Manuscript. PORTIA is available on PyPI as a Python package (portia-grn).
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 23 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac178
      Issue No: Vol. 38, No. 10 (2022)
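Precision-matrix methods score a regulatory edge by how strongly two genes remain associated after conditioning on all other genes. For a precision matrix P, the partial correlation between genes i and j is -P_ij / sqrt(P_ii · P_jj). The sketch below inverts a covariance matrix with plain Gauss-Jordan elimination and converts the result to partial correlations. It deliberately omits the robust estimation and the additional processing that PORTIA performs, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def partial_correlations(cov: Matrix) -> Matrix:
    """Partial correlations from the (non-robust) inverse of a covariance matrix."""
    prec = invert(cov)
    n = len(prec)
    return [[1.0 if i == j else -prec[i][j] / (prec[i][i] * prec[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


# A chain network 0 - 1 - 2: genes 0 and 2 are correlated only through gene 1.
precision = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
cov = invert(precision)
pc = partial_correlations(cov)
print(round(pc[0][1], 6), round(pc[0][2], 6))  # direct edge vs. indirect pair
```

Here genes 0 and 2 have a nonzero marginal covariance but a zero partial correlation, which is why precision-based scores help separate direct regulatory links from indirect ones.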
       
  • A comprehensive evaluation of regression-based drug responsiveness
           prediction models, using cell viability inhibitory concentrations (IC50
           values)

      Authors: Park A; Joo M, Kim K, et al.
      Pages: 2810 - 2817
      Abstract:
      Motivation: Predicting drug response is critical for precision medicine. Diverse methods have been used to predict drug responsiveness in cultured cells, as measured by the half-maximal inhibitory concentration (IC50). Although IC50 values are continuous, traditional prediction models have dealt mainly with binary classification of responsiveness. Because regression-based IC50 predictions are rare, comprehensive evaluations of regression-based IC50 prediction models, including machine learning (ML) and deep learning (DL) models, across diverse data types and dataset sizes have not been carried out.
      Results: Here, we constructed 11 input data settings, including multi-omics settings, with varying dataset sizes, and then evaluated the performance of regression-based ML and DL models in predicting IC50 values. The DL models considered two convolutional neural network architectures: CDRScan and residual neural network (ResNet). This is the first time ResNet has been introduced into regression-based DL models for predicting drug response. Overall, DL models performed better than ML models in all settings, and ResNet performed better than or comparably to CDRScan and ML models in all settings.
      Availability and implementation: The data underlying this article are available on GitHub at https://github.com/labnams/IC50evaluation.
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 23 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac177
      Issue No: Vol. 38, No. 10 (2022)
       
  • Overcoming biases in causal inference of molecular interactions

      Authors: Kumar S; Song M, Robinson P.
      Pages: 2818 - 2825
      Abstract:
      Motivation: Computer inference of biological mechanisms is increasingly approachable due to dynamically rich data sources such as single-cell genomics. Inferred molecular interactions can prioritize hypotheses for wet-lab experiments to expedite biological discovery. However, complex data often come with unwanted biological or technical variation, which exposes biases in current methods, with respect to marginal distributions and sample size, that favor spurious causal relationships.
      Results: Considering function direction and strength as evidence for causality, we present an adapted functional chi-squared test (AdpFunChisq) that rewards functional patterns over non-functional or independent patterns. On synthetic data and three biological datasets, we demonstrate the advantages of AdpFunChisq over 10 methods in overcoming biases that give rise to wide fluctuations in the performance of alternative approaches. On single-cell multiomics data of mixed phenotype acute leukemia, we found that the T-cell surface glycoprotein CD3 delta chain may causally mediate specific genes in the viral carcinogenesis pathway. Using the causality-by-functionality principle, AdpFunChisq offers a viable option for robust causal inference in dynamical systems.
      Availability and implementation: The AdpFunChisq test is implemented in the R package 'FunChisq' (2.5.2 or above) at https://cran.r-project.org/package=FunChisq. All other source code, along with pre-processed data, is available at Code Ocean https://doi.org/10.24433/CO.2907738.v1.
      Supplementary information: Supplementary materials are available at Bioinformatics online.
      PubDate: Wed, 06 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac206
      Issue No: Vol. 38, No. 10 (2022)
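To show what "rewarding functional patterns" means on a contingency table, here is a sketch of the unadapted functional chi-squared statistic as we understand it: the sum of per-row chi-squared statistics of Y against a uniform distribution, minus the chi-squared statistic of Y's marginal against uniform. The adaptation that defines AdpFunChisq, as well as p-value computation, is not reproduced here. Use the FunChisq package for real analyses.

```python
from typing import List


def functional_chisq(table: List[List[int]]) -> float:
    """Functional chi-squared statistic for X -> Y.

    table[i][j] counts samples with X at level i and Y at level j.
    """
    s = len(table[0])                   # number of Y levels
    n = sum(sum(row) for row in table)  # total sample size
    row_term = 0.0
    for row in table:
        n_i = sum(row)
        if n_i == 0:
            continue                    # empty rows contribute nothing
        expected = n_i / s
        row_term += sum((n_ij - expected) ** 2 / expected for n_ij in row)
    col_sums = [sum(col) for col in zip(*table)]
    expected = n / s
    col_term = sum((n_j - expected) ** 2 / expected for n_j in col_sums)
    return row_term - col_term


print(functional_chisq([[5, 0], [0, 5]]))  # 10.0: Y is a function of X
print(functional_chisq([[5, 5], [5, 5]]))  # 0.0: X and Y are independent
print(functional_chisq([[5, 0], [5, 0]]))  # 0.0: Y is constant, so X explains nothing
```

Subtracting the marginal term is what keeps a skewed but constant Y from scoring as a strong function of X.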
       
  • Relational graph convolutional networks for predicting blood–brain
           barrier penetration of drug molecules

      Authors: Ding Y; Jiang X, Kim Y, et al.
      Pages: 2826 - 2831
      Abstract:
      Motivation: Evaluating the blood–brain barrier (BBB) permeability of drug molecules is a critical step in brain drug development. Traditional methods for this evaluation require complicated in vitro or in vivo testing. Alternatively, in silico predictions based on machine learning have proved to be a cost-efficient way to complement in vitro and in vivo methods. However, the performance of established models has been limited by their inability to handle interactions between drugs and proteins, which play an important role in the mechanism behind BBB penetration. To address this limitation, we employed a relational graph convolutional network (RGCN) to handle drug–protein interactions as well as the properties of each individual drug.
      Results: The RGCN model achieved an overall accuracy of 0.872, an area under the receiver operating characteristic curve (AUROC) of 0.919 and an area under the precision-recall curve (AUPRC) of 0.838 on the testing dataset, with drug–protein interactions and Mordred descriptors as input. Introducing drug–drug similarity to connect structurally similar drugs in the data graph further improved the testing results, giving an overall accuracy of 0.876, an AUROC of 0.926 and an AUPRC of 0.865. In particular, the RGCN model greatly outperformed the LightGBM base model when evaluated on drugs whose BBB penetration depends on drug–protein interactions. Our model is expected to provide high-confidence predictions of BBB permeability for drug prioritization in the experimental screening of BBB-penetrating drugs.
      Availability and implementation: The data and code are freely available at https://github.com/dingyan20/BBB-Penetration-Prediction.
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 07 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac211
      Issue No: Vol. 38, No. 10 (2022)
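In a relational graph convolutional layer, each edge type (for example a drug–protein interaction or a drug–drug similarity link) has its own weight matrix. A node's new representation is a self-connection term plus the mean-normalized messages from its neighbours under each relation, passed through a nonlinearity. The pure-Python sketch below implements one such layer. The node names, relation names and weights are hypothetical, and it is not the model or feature set used in the paper.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rgcn_layer(h: Dict[str, Vector], edges: List[Tuple[str, str, str]],
               w_self: Matrix, w_rel: Dict[str, Matrix]) -> Dict[str, Vector]:
    """One RGCN layer. edges are (source, relation, target); messages flow
    source -> target, so add both directions for undirected links."""
    out: Dict[str, Vector] = {}
    for node, x in h.items():
        total = matvec(w_self, x)  # self-connection
        for rel, w in w_rel.items():
            neighbours = [src for src, r, dst in edges if r == rel and dst == node]
            for src in neighbours:
                msg = matvec(w, h[src])
                # normalize by the number of neighbours under this relation
                total = [t + m / len(neighbours) for t, m in zip(total, msg)]
        out[node] = [max(0.0, v) for v in total]  # ReLU
    return out


h = {"drugA": [1.0, 0.0], "drugB": [0.0, 1.0], "protP": [1.0, 1.0]}
edges = [("protP", "interacts", "drugA"), ("drugB", "similar", "drugA")]
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_rel = {"interacts": [[0.5, 0.0], [0.0, 0.5]], "similar": [[0.0, 1.0], [1.0, 0.0]]}
print(rgcn_layer(h, edges, w_self, w_rel))
```

Keeping a separate weight matrix per relation is what lets the model treat protein-interaction evidence differently from structural-similarity evidence.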
       
  • Boost-RS: boosted embeddings for recommender systems and its application
           to enzyme–substrate interaction prediction

      Authors: Li X; Liu L, Hassoun S, et al.
      Pages: 2832 - 2838
      Abstract:
      Motivation: Despite experimental and curation efforts, the extent of enzyme promiscuity on substrates remains largely unexplored and underdocumented. Computational tools for exploring the enzyme–substrate interaction space can expedite experimentation and benefit applications such as constructing synthesis pathways for novel biomolecules, identifying metabolic products of ingested compounds, and elucidating xenobiotic metabolism. Recommender systems (RS), which have not yet been explored for the enzyme–substrate interaction prediction problem, can be used to recommend enzymes for substrates, and vice versa. The performance of Collaborative-Filtering (CF) RSs, however, hinges on the quality of the embedding vectors of users and items (enzymes and substrates in our case). Importantly, enhancing CF embeddings with heterogeneous auxiliary data, especially relational data (e.g. hierarchical, pairwise or groupings), remains a challenge.
      Results: We propose an innovative general RS framework, termed Boost-RS, that enhances RS performance by 'boosting' embedding vectors through auxiliary data. Specifically, Boost-RS is trained and dynamically tuned on multiple relevant auxiliary learning tasks, and it uses contrastive learning tasks to exploit relational data. To show the efficacy of Boost-RS for the enzyme–substrate interaction prediction problem, we apply the Boost-RS framework to several baseline CF models. We show that each of our auxiliary tasks boosts learning of the embedding vectors, and that contrastive learning using Boost-RS outperforms attribute concatenation and multi-label learning. We also show that Boost-RS outperforms similarity-based models. Ablation studies and visualization of learned representations highlight the importance of using contrastive learning on some of the auxiliary data in boosting the embedding vectors.
      Availability and implementation: A Python implementation of Boost-RS is provided at https://github.com/HassounLab/Boost-RS. The enzyme–substrate interaction data are available from the KEGG database (https://www.genome.jp/kegg/).
      PubDate: Tue, 12 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac201
      Issue No: Vol. 38, No. 10 (2022)
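Two ingredients from the abstract can be sketched in a few lines. First, a collaborative-filtering recommender scores an enzyme–substrate pair by the dot product of their embeddings. Second, a contrastive auxiliary term pulls together embeddings that share a relation and pushes apart those that do not. The triplet margin loss below is one common contrastive objective, chosen here as an assumption. It is not necessarily the loss Boost-RS uses, and the embeddings and names are hypothetical.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def recommend(substrate: Sequence[float], enzymes: Dict[str, Sequence[float]],
              k: int = 2) -> List[str]:
    """Rank enzymes for a substrate by embedding dot product (CF scoring);
    ties are broken alphabetically."""
    scores = {name: dot(substrate, vec) for name, vec in enzymes.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Contrastive (triplet) loss: the anchor should be closer to an item that
    shares a relation (e.g. the same enzyme class) than to one that does not."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


enzymes = {"E1": [0.9, 0.1], "E2": [0.1, 0.9], "E3": [0.5, 0.5]}
print(recommend([1.0, 0.0], enzymes))                  # ['E1', 'E3']
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 0.0]))  # 0.0: already well separated
```

During training, a term like this would be added to the main CF loss with a task weight, so that relational auxiliary data shapes the same embeddings used for recommendation.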
       
  • TRANSDIRE: data-driven direct reprogramming by a pioneer factor-guided
           trans-omics approach

      Authors: Eguchi R; Hamano M, Iwata M, et al.
      Pages: 2839 - 2846
      Abstract:
      Motivation: Direct reprogramming is the conversion of fully differentiated mature cell types into various other cell types while bypassing an intermediate pluripotent state (e.g. induced pluripotent stem cells). Cell differentiation by direct reprogramming is determined by two types of transcription factors (TFs): pioneer factors (PFs) and cooperative TFs. PFs have the distinct ability to open compacted chromatin, assemble a collective of cooperative TFs and activate gene expression. Experimentally determining these two types of TFs is extremely difficult and costly.
      Results: In this study, we developed a novel computational method, TRANSDIRE (TRANS-omics-based approach for DIrect REprogramming), to predict the TFs that induce direct reprogramming in various human cell types using multiple omics data. In the algorithm, potential PFs were predicted based on low-signal chromatin regions, and cooperative TFs were predicted through a trans-omics analysis of genomic data (e.g. enhancers), transcriptome data (e.g. gene expression profiles in human cells), epigenome data (e.g. chromatin immunoprecipitation sequencing data) and interactome data. We applied the proposed method to reconstruct the TFs that induce direct reprogramming from fibroblasts to six other cell types: hepatocytes, cartilaginous cells, neurons, cardiomyocytes, pancreatic cells and Paneth cells. We demonstrated that the method successfully predicted TFs for most cell conversions with high accuracy. Thus, the proposed method is expected to be useful for various practical applications in regenerative medicine.
      Availability and implementation: The source code and data are available at http://figshare.com/s/b653781a5b9e6639972b.
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 12 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac209
      Issue No: Vol. 38, No. 10 (2022)
       
  • Supervised graph co-contrastive learning for drug–target interaction
           prediction

      Authors: Li Y; Qiao G, Gao X, et al.
      Pages: 2847 - 2854
      Abstract:
      Motivation: Identification of drug–target interactions (DTIs) is an essential step in drug discovery and repositioning. DTI prediction based on biological experiments is time-consuming and expensive. In recent years, graph learning-based methods have attracted widespread interest and shown certain advantages on this task, where DTI prediction is often modeled as a binary classification problem on nodes composed of drug–protein pairs (DPPs). Nevertheless, in many real applications, labeled data are very limited and expensive to obtain. With only a few thousand labeled samples, models can hardly recognize comprehensive patterns of DPP node representations and cannot capture enough of the commonsense knowledge that DTI prediction requires. Supervised contrastive learning aligns the representations of DPP nodes that share the same class label: in the embedding space, DPP node representations with the same label are pulled together, and those with different labels are pushed apart.
      Results: We propose SGCL-DTI, an end-to-end supervised graph co-contrastive learning model for DTI prediction directly from heterogeneous networks. By contrasting the topology structures and semantic features of the drug–protein-pair network, and by using a new selection strategy for positive and negative samples, SGCL-DTI generates a contrastive loss that guides model optimization in a supervised manner. Comprehensive experiments on three public datasets demonstrate that our model significantly outperforms state-of-the-art (SOTA) methods on DTI prediction, especially in the cold-start setting. Furthermore, SGCL-DTI provides a new research perspective on contrastive learning for DTI prediction.
      Availability and implementation: The research shows that this method has potential applicability in drug discovery, the identification of drug–target pairs and related tasks.
      PubDate: Mon, 21 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac164
      Issue No: Vol. 38, No. 10 (2022)
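The pull-together/push-apart behaviour described in the abstract corresponds to a supervised contrastive loss. For each anchor, the loss averages -log(exp(sim(anchor, positive)/τ) / Σ_other exp(sim(anchor, other)/τ)) over all same-label positives, using cosine similarity and a temperature τ. The sketch below implements that standard formulation on toy 2D embeddings. SGCL-DTI's graph encoders, co-contrastive views and sample-selection strategy are not reproduced, and the temperature value is an assumption.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings: List[Sequence[float]],
                                labels: List[int], tau: float = 0.5) -> float:
    """Mean supervised contrastive loss over anchors with at least one positive."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / tau for a in range(n) if a != i}
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy DPP embeddings: two point along x, two along y.
embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
aligned = supervised_contrastive_loss(embeddings, [1, 1, 0, 0])
mixed = supervised_contrastive_loss(embeddings, [1, 0, 1, 0])
print(aligned < mixed)  # True: labels that match the geometry give a lower loss
```

Minimizing this loss moves same-label embeddings toward each other, which is how supervised contrastive training can make better use of a small labeled set.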
       
  • Network-based cancer heterogeneity analysis incorporating multi-view of
           prior information

      Authors: Li Y; Xu S, Ma S, et al.
      Pages: 2855 - 2862
      Abstract: Motivation: Cancer genetic heterogeneity analysis has critical implications for tumour classification, response to therapy and choice of biomarkers to guide personalized cancer medicine. However, existing heterogeneity analyses based solely on molecular profiling data usually suffer from a lack of information and have limited effectiveness. Many biomedical and life sciences databases have accumulated a substantial volume of meaningful biological information. They can provide information beyond molecular profiling data, yet pose challenges arising from potential noise and uncertainty. Results: In this study, we aim to develop a more effective heterogeneity analysis method with the help of prior information. We propose a network-based penalization technique that incorporates a multi-view of prior information from multiple databases and accommodates heterogeneity attributed to both differential genes and gene relationships. Because the prior information might not be fully credible, we propose a weighted strategy in which the weights are determined by the data, ensuring that the model is not excessively disturbed by incorrect information. Simulation and analysis of The Cancer Genome Atlas glioblastoma multiforme data demonstrate the practical applicability of the proposed method. Availability and implementation: R code implementing the proposed method is available at https://github.com/mengyunwu2020/PECM. The data that support the findings in this paper are openly available in TCGA (The Cancer Genome Atlas) at https://portal.gdc.cancer.gov/. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 23 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac183
      Issue No: Vol. 38, No. 10 (2022)
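The idea of a network penalty whose edge weights depend on the data can be illustrated with a generic graph-Laplacian-style term, in which each prior-network edge encourages the coefficients of connected genes to be similar. This is a minimal sketch under stated assumptions: using the absolute Pearson correlation as the data-driven edge weight, and the toy expression values, are illustrative choices, not the PECM weighting scheme.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def data_weighted_edges(prior_edges, expression):
    """Attach an assumed data-driven weight |corr| to each prior-network edge (j, k).

    Database edges that the observed data do not support receive small weights,
    so possibly incorrect prior information is down-weighted.
    """
    return [(j, k, abs(pearson(expression[j], expression[k]))) for j, k in prior_edges]


def network_penalty(beta, weighted_edges, lam=1.0):
    """Weighted network penalty: lam * sum over edges of w_jk * (beta_j - beta_k)**2."""
    return lam * sum(w * (beta[j] - beta[k]) ** 2 for j, k, w in weighted_edges)
```

An edge between two genes whose profiles are unrelated in the data contributes less to the penalty than an edge between strongly co-varying genes.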
       
  • Interpretable-ADMET: a web service for ADMET prediction and optimization
           based on deep neural representation

      Authors: Wei Y; Li S, Li Z, et al.
      Pages: 2863 - 2871
      Abstract: Motivation: In the discovery and optimization of lead compounds, it is difficult for non-expert pharmacologists to intuitively determine the contribution of a substructure to a particular property of a molecule. Results: In this work, we develop a user-friendly web service, interpretable-absorption, distribution, metabolism, excretion and toxicity (interpretable-ADMET), which predicts 59 ADMET-associated properties using 90 qualitative classification models and 28 quantitative regression models based on graph convolutional neural network and graph attention network algorithms. Interpretable-ADMET contains 250 729 entries associated with 59 kinds of ADMET-associated properties for 80 167 chemical compounds. In addition to making predictions, interpretable-ADMET provides interpretation models based on gradient-weighted class activation maps to identify the substructures that are important to a particular property. It also provides an optimization module that automatically generates a set of novel virtual candidates based on matched molecular pair rules. We believe that interpretable-ADMET could serve as a useful tool for lead optimization in drug discovery. Availability and implementation: Interpretable-ADMET is available at http://cadd.pharmacy.nankai.edu.cn/interpretableadmet/. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 29 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac192
      Issue No: Vol. 38, No. 10 (2022)
       
  • Improving confidence in lipidomic annotations by incorporating empirical
           ion mobility regression analysis and chemical class prediction

      Authors: Rose B; May J, Picache J, et al.
      Pages: 2872 - 2879
      Abstract: Motivation: Mass spectrometry-based untargeted lipidomics aims to globally characterize the lipids and lipid-like molecules in biological systems. Ion mobility increases coverage and confidence by offering an additional dimension of separation and a highly reproducible metric for feature annotation, the collision cross-section (CCS). Results: We present a data processing workflow to increase confidence in molecular class annotations based on CCS values. This approach uses class-specific regression models built from a standardized CCS repository (the Unified CCS Compendium) in a parallel scheme that combines a new annotation filtering approach with a machine learning class prediction strategy. In a proof-of-concept study using murine brain lipid extracts, 883 lipids were assigned higher confidence identifications using the filtering approach, which reduced the tentative candidate lists by over 50% on average. An additional 192 unannotated compounds were assigned a predicted chemical class. Availability and implementation: All relevant source code is available at https://github.com/McLeanResearchGroup/CCS-filter. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 31 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac197
      Issue No: Vol. 38, No. 10 (2022)
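Filtering candidate annotations by comparing a measured CCS with a class-specific regression prediction can be sketched as follows. Everything here is hypothetical: the linear CCS-versus-m/z fit, the 3% tolerance, the class and candidate names and all numbers are illustrative, and the published workflow builds its class-specific models from the Unified CCS Compendium, possibly with a different functional form.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def filter_candidates(mz, measured_ccs, candidates, models, tol_pct=3.0):
    """Keep candidates whose class model predicts the measured CCS within tol_pct percent."""
    kept = []
    for name, lipid_class in candidates:
        slope, intercept = models[lipid_class]
        predicted = slope * mz + intercept
        error_pct = abs(measured_ccs - predicted) / predicted * 100.0
        if error_pct <= tol_pct:
            kept.append(name)
    return kept
```

For example, with a model fitted to made-up reference points for "class_A" and another for "class_B", a feature at m/z 760 with a measured CCS of 283 keeps only the candidate whose class model predicts a CCS within 3%.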
       
  • Design and application of a knowledge network for automatic prioritization
           of drug mechanisms

      Authors: Mayers M; Tu R, Steinecke D, et al.
      Pages: 2880 - 2891
      Abstract: Motivation: Drug repositioning is an attractive alternative to de novo drug discovery because it reduces the time and cost of bringing drugs to market. Computational repositioning methods, particularly non-black-box methods that can account for and predict a drug’s mechanism, may greatly benefit the direction of future development. By tuning both data and algorithm to utilize relationships important to drug mechanisms, a computational repositioning algorithm can be trained to both predict and explain mechanistically novel indications. Results: In this work, we examined the 123 curated drug mechanism paths in the drug mechanism database (DrugMechDB) and, after identifying the most important relationships, integrated 18 data sources to produce MechRepoNet, a heterogeneous knowledge graph capable of capturing the information in these paths. We applied the Rephetio repurposing algorithm to MechRepoNet using only a subset of relationships known to be mechanistic in nature and found adequate predictive ability on an evaluation set, with an AUROC of 0.83. The resulting repurposing model allowed us to prioritize paths in our knowledge graph to produce a predicted treatment mechanism. We found that DrugMechDB paths, when present in the network, were rated highly among predicted mechanisms. We then demonstrated MechRepoNet’s ability to use mechanistic insight to identify a drug’s mechanistic target, with a mean reciprocal rank of 0.525 on a test set of known drug–target interactions. Finally, we walked through examples of repurposing the anti-cancer drug imatinib for the treatment of asthma and metolazone for the treatment of osteoporosis, to demonstrate the method’s utility in providing mechanistic insight into its repurposing predictions. Availability and implementation: The Python code to reproduce the entirety of this analysis is available at https://github.com/SuLab/MechRepoNet (archived at https://doi.org/10.5281/zenodo.6456335). Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 06 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac205
      Issue No: Vol. 38, No. 10 (2022)
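The mean reciprocal rank (MRR) reported above averages, over test queries, the reciprocal of the rank at which the first correct target appears. A minimal sketch of this standard definition; the target identifiers are hypothetical, and a query whose ranking contains no correct target contributes 0.

```python
def mean_reciprocal_rank(rankings, true_targets):
    """Average of 1/rank of the first correct item in each ranked list (0 if none)."""
    total = 0.0
    for ranking, truth in zip(rankings, true_targets):
        for rank, item in enumerate(ranking, start=1):
            if item in truth:
                total += 1.0 / rank
                break
    return total / len(rankings)
```

For instance, correct targets found at ranks 1 and 3 in two queries, plus one query with no hit, give an MRR of (1 + 1/3 + 0) / 3 ≈ 0.444.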
       
  • Context-aware learning for cancer cell nucleus recognition in pathology
           images

      Authors: Bai T; Xu J, Zhang Z, et al.
      Pages: 2892 - 2898
      Abstract: Motivation: Nucleus identification supports many quantitative analysis studies that rely on nuclei positions or categories. Contextual information in pathology images refers to information near the to-be-recognized cell, which can be very helpful for nucleus subtyping. Current CNN-based methods do not explicitly encode contextual information within the input images and point annotations. Results: In this article, we propose a novel context-aware framework to locate and classify nuclei in microscopy image data. Specifically, we first use state-of-the-art network architectures to extract multi-scale feature representations from multi-field-of-view, multi-resolution input images and then aggregate features on the fly with stacked convolutional operations. Two auxiliary tasks are then added to the model to make effective use of contextual information: one predicts the frequencies of nuclei, and the other extracts the regional distribution of nuclei of the same kind. The entire framework is trained in an end-to-end, pixel-to-pixel fashion. We evaluate our method on two histopathological image datasets with different tissue and stain preparations, and experimental results demonstrate that our method outperforms other recent state-of-the-art models in nucleus identification. Availability and implementation: The source code of our method is freely available at https://github.com/qjxjy123/DonRabbit. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 21 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac167
      Issue No: Vol. 38, No. 10 (2022)
       
  • Annotating regulatory elements by heterogeneous network embedding

      Authors: Lu Y; Feng Z, Zhang S, et al.
      Pages: 2899 - 2911
      Abstract: Motivation: Regulatory elements (REs), such as enhancers and promoters, are regulatory sequences that function in a heterogeneous regulatory network to control gene expression by recruiting transcription regulators and carrying genetic variants in a context-specific way. Annotating these REs relies on costly and labor-intensive next-generation sequencing and RNA-guided editing technologies in many cellular contexts. Results: We propose RE-GOA, a systematic Gene Ontology Annotation method for Regulatory Elements that leverages the powerful word embedding techniques of natural language processing. We first assemble a heterogeneous network by integrating context-specific regulations, protein–protein interactions and gene ontology (GO) terms. We then perform network embedding and associate regulatory elements with GO terms by assessing their similarity in a low-dimensional vector space. In three applications, we show that RE-GOA outperforms existing methods in annotating TF binding sites from ChIP-seq data, in functional enrichment analysis of differentially accessible peaks from ATAC-seq data and in revealing genetic correlation among phenotypes from their GWAS summary statistics. Availability and implementation: The source code and the systematic RE annotation for human and mouse are available at https://github.com/AMSSwanglab/RE-GOA. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 24 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac185
      Issue No: Vol. 38, No. 10 (2022)
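Associating an RE with GO terms by similarity in a low-dimensional space can be illustrated by ranking GO-term vectors by cosine similarity to the RE's vector. This sketch assumes the embeddings already exist; the vectors are toy values, the term labels are placeholders, and RE-GOA's actual embedding and scoring procedure may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_go_terms(re_vector, go_vectors, top_k=3):
    """Return the top_k GO-term labels most similar to an RE embedding."""
    ranked = sorted(go_vectors, key=lambda term: cosine(re_vector, go_vectors[term]), reverse=True)
    return ranked[:top_k]
```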
       
  • SimSCSnTree: a simulator of single-cell DNA sequencing data

      Authors: Mallory X; Nakhleh L, Robinson P.
      Pages: 2912 - 2914
      Abstract: Summary: We report on a new single-cell DNA sequence simulator, SimSCSnTree, which generates an evolutionary tree of cells and evolves single nucleotide variants (SNVs) and copy number aberrations (CNAs) along its branches. Data generated by the simulator can be used to benchmark tools for single-cell genomic analyses, particularly in cancer, where SNVs and CNAs are ubiquitous. Availability and implementation: SimSCSnTree is available on Bioconda and is freely available for download, with detailed documentation, at https://github.com/compbiofan/SimSCSnTree.git.
      PubDate: Mon, 21 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac169
      Issue No: Vol. 38, No. 10 (2022)
       
  • Integrative analysis of relative abundance data and presence–absence
           data of the microbiome using the LDM

      Authors: Zhu Z; Satten G, Hu Y, et al.
      Pages: 2915 - 2917
      Abstract: Summary: We previously developed the LDM, a method for testing hypotheses about the microbiome at both the community level and the individual taxon level. The LDM can be applied separately to relative abundance data and to presence–absence data; these analyses work well when the associated taxa are abundant and rare, respectively. Here, we propose LDM-omni3, which combines LDM analyses at the relative abundance and presence–absence data scales, thereby offering optimal power across scenarios with different association mechanisms. The new LDM-omni3 test is available for the wide range of data types and analyses supported by the LDM. Availability and implementation: The LDM-omni3 test has been added to the R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 25 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac181
      Issue No: Vol. 38, No. 10 (2022)
       
  • InParanoid-DIAMOND: faster orthology analysis with the InParanoid
           algorithm

      Authors: Persson E; Sonnhammer E, Marschall T.
      Pages: 2918 - 2919
      Abstract: Summary: Predicting orthologs, genes in different species that share ancestry, is an important task in bioinformatics. Orthology prediction tools must make accurate and fast predictions in order to analyze large amounts of data within a feasible time frame. InParanoid is a well-known algorithm for orthology analysis that has been shown to perform well in benchmarks, but its major limitation is long runtimes on large datasets. Here, we present an update to the InParanoid algorithm that can use the faster tool DIAMOND instead of BLAST for the homolog search step. We show that it reduces the runtime by 94% while obtaining similar performance in the Quest for Orthologs benchmark. Availability and implementation: The source code is available at https://bitbucket.org/sonnhammergroup/inparanoid. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 31 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac194
      Issue No: Vol. 38, No. 10 (2022)
       
  • SASpector: analysis of missing genomic regions in draft genomes of
           prokaryotes

      Authors: Lood C; Correa Rojo A, Sinar D, et al.
      Pages: 2920 - 2921
      Abstract: Summary: Missing regions in short-read assemblies of prokaryote genomes are often attributed to biases in sequencing technologies and to repetitive elements, the former resulting in low sequencing coverage of certain loci and the latter in unresolved loops in the de novo assembly graph. We developed SASpector, a command-line tool that compares short-read assemblies (draft genomes) to their corresponding closed assemblies and extracts missing regions to analyze them at the sequence and functional level. SASpector can be used to benchmark the need for resolved genomes, can be integrated into pipelines to control the quality of assemblies, and could be used for comparative investigations of missingness in assemblies for which both short-read and long-read data are available in public databases. Availability and implementation: SASpector is available at https://github.com/LoGT-KULeuven/SASpector. The tool is implemented in Python 3 and available through pip and Docker (0mician/saspector). Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 06 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac208
      Issue No: Vol. 38, No. 10 (2022)
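Once draft contigs have been aligned to the closed genome, the missing regions are the stretches of the closed genome that no alignment covers. A minimal interval-complement sketch, assuming the alignments are already available as 0-based, half-open (start, end) coordinates on a single closed chromosome; the alignment step itself is not shown, and the `min_len` cut-off is a hypothetical parameter.

```python
def missing_regions(genome_length, aligned, min_len=1):
    """Return uncovered (start, end) intervals of a closed genome, 0-based half-open."""
    # Merge overlapping or touching alignment intervals
    merged = []
    for start, end in sorted(aligned):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Walk the merged intervals and report the gaps between them
    gaps, pos = [], 0
    for start, end in merged:
        if start - pos >= min_len:
            gaps.append((pos, start))
        pos = max(pos, end)
    if genome_length - pos >= min_len:
        gaps.append((pos, genome_length))
    return gaps
```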
       
  • plotsr: visualizing structural similarities and rearrangements between
           multiple genomes

      Authors: Goel M; Schneeberger K, Robinson P.
      Pages: 2922 - 2926
      Abstract: Summary: Third-generation genome sequencing technologies have led to a sharp increase in the number of high-quality genome assemblies. This allows the comparison of multiple assembled genomes of individual species and demands new tools for visualizing their structural properties. Here, we present plotsr, an efficient tool to visualize structural similarities and rearrangements between genomes. It can be used to compare genomes at the chromosome level or to zoom in on any selected region. In addition, plotsr can augment the visualization with regional identifiers (e.g. genes or genomic markers) or histogram tracks for continuous features (e.g. GC content or polymorphism density). Availability and implementation: plotsr is implemented as a Python package and uses the standard matplotlib library for plotting. It is freely available under the MIT license on GitHub (https://github.com/schneebergerlab/plotsr) and Bioconda (https://anaconda.org/bioconda/plotsr). Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 15 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac196
      Issue No: Vol. 38, No. 10 (2022)
       
  • PacRAT: a program to improve barcode-variant mapping from PacBio long
           reads using multiple sequence alignment

      Authors: Yeh C; Amorosi C, Showman S, et al.
      Pages: 2927 - 2929
      Abstract: Summary: The use of PacBio sequencing for characterizing barcoded libraries of genetic variants is on the rise. However, current approaches to resolving PacBio sequencing artifacts can result in a high number of incorrectly identified or unusable reads. Here, we developed the PacBio Read Alignment Tool (PacRAT), which improves the accuracy of barcode-variant mapping through several steps of read alignment and consensus calling. To quantify the performance of our approach, we simulated PacBio reads from eight variant libraries of various lengths and showed that PacRAT improves the accuracy of pairing barcodes and variants across these libraries. Analysis of real (non-simulated) libraries also showed an increase in the number of reads that can be used for downstream analyses when using PacRAT. Availability and implementation: PacRAT is written in Python and is freely available (https://github.com/dunhamlab/PacRAT). Supplementary information: Supplemental data are available at Bioinformatics online.
      PubDate: Mon, 21 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac165
      Issue No: Vol. 38, No. 10 (2022)
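Consensus calling over the reads that share a barcode can be illustrated as a column-wise majority vote over an existing multiple sequence alignment. This is a simplified sketch, assuming the reads are already aligned to equal length with '-' as the gap character; PacRAT's own alignment and consensus steps may apply additional rules, and the barcode label is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Column-wise majority vote over equal-length aligned reads; gap columns are dropped."""
    consensus = []
    for column in zip(*aligned_reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


def consensus_per_barcode(reads_by_barcode):
    """Map each barcode to the consensus variant sequence of its aligned reads."""
    return {barcode: majority_consensus(reads) for barcode, reads in reads_by_barcode.items()}
```

A single read carrying an error (here a C at the third position) is outvoted by the other reads with the same barcode.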
       
  • MetaSquare: an integrated metadatabase of 16S rRNA gene amplicon for
           microbiome taxonomic classification

      Authors: Liao C; Fu P, Huang C, et al.
      Pages: 2930 - 2931
      Abstract: Motivation: Taxonomic classification of 16S ribosomal RNA gene amplicons is an efficient and economical approach in microbiome analysis. 16S rRNA sequence databases such as SILVA, RDP, EzBioCloud and HOMD, which are used in downstream bioinformatic pipelines, are limited by either sequence redundancy or delays in recruiting new sequences. To improve 16S rRNA gene-based taxonomic classification, we systematically merged these widely used databases and a collection of novel sequences into an integrated resource. Results: MetaSquare version 1.0 is an integrated 16S rRNA sequence database. It is composed of more than 6 million sequences and improves taxonomic classification resolution for both long-read and short-read methods. Availability and implementation: Accessible at https://hub.docker.com/r/lsbnb/metasquare_db and https://github.com/lsbnb/MetaSquare. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 23 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac184
      Issue No: Vol. 38, No. 10 (2022)
       
  • RabbitV: fast detection of viruses and microorganisms in sequencing data
           on multi-core architectures

      Authors: Zhang H; Chang Q, Yin Z, et al.
      Pages: 2932 - 2933
      Abstract: Motivation: Detection and identification of viruses and microorganisms in sequencing data play an important role in pathogen diagnosis and research. However, existing tools for this problem often suffer from long runtimes and high memory consumption. Results: We present RabbitV, a tool for rapid detection of viruses and microorganisms in Illumina sequencing datasets based on fast identification of unique k-mers. It exploits the power of modern multi-core CPUs through multi-threading, vectorization and fast data parsing. Experiments show that RabbitV outperforms fastv by factors of at least 42.5 and 14.4 in unique k-mer generation (RabbitUniq) and pathogen identification (RabbitV), respectively. Furthermore, RabbitV is able to detect COVID-19 from 40 samples of sequencing data (255 GB in FASTQ format) in only 320 s. Availability and implementation: RabbitUniq and RabbitV are available at https://github.com/RabbitBio/RabbitUniq and https://github.com/RabbitBio/RabbitV. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 25 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac187
      Issue No: Vol. 38, No. 10 (2022)
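Detection based on unique k-mers can be sketched in two steps: index the k-mers that occur in exactly one reference genome, then count how many read k-mers hit each genome. This is a single-threaded, stdlib-only illustration of the idea, not RabbitV's multi-threaded, vectorized implementation; it ignores reverse complements by assumption, and the genome names and sequences are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmer_index(genomes, k):
    """Map each k-mer that occurs in exactly one reference genome to that genome's name."""
    owners = {}
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners.setdefault(kmer, set()).add(name)
    return {kmer: next(iter(names)) for kmer, names in owners.items() if len(names) == 1}


def identify(reads, index, k):
    """Count unique-k-mer hits per reference genome across all reads."""
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in index:
                hits[index[kmer]] += 1
    return dict(hits)
```

K-mers shared by several references (such as a common prefix) are excluded from the index, so a hit is evidence for one specific organism.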
       
  • tRNAstudio: facilitating the study of human mature tRNAs from deep
           sequencing datasets

      Authors: Murillo-Recio M; Martínez de Lejarza Samper I, Tuñí i Domínguez C, et al.
      Pages: 2934 - 2936
      Abstract: Summary: High-throughput sequencing of transfer RNAs (tRNA-Seq) is a powerful approach to characterize the cellular tRNA pool. Currently, however, analyzing tRNA-Seq datasets requires strong bioinformatics and programming skills. tRNAstudio facilitates the analysis of tRNA-Seq datasets and extracts information on tRNA gene expression, post-transcriptional tRNA modification levels and tRNA processing steps. Users need only run a few simple bash commands to launch a graphical user interface that allows easy processing of tRNA-Seq datasets in local mode. Output files include extensive graphical representations with associated numerical tables, and an interactive HTML summary report to help interpret the data. We have validated tRNAstudio using datasets generated by different experimental methods and derived from human cell lines and tissues that present distinct patterns of tRNA expression, modification and processing. Availability and implementation: Freely available at https://github.com/GeneTranslationLab-IRB/tRNAstudio under an open-source GNU GPL v3.0 license. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 07 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac198
      Issue No: Vol. 38, No. 10 (2022)
       
  • AMIGOS III: pseudo-torsion angle visualization and motif-based structure
           comparison of nucleic acids

      Authors: Shine M; Zhang C, Pyle A, et al.
      Pages: 2937 - 2939
      Abstract: Motivation: The full description of nucleic acid conformation involves eight torsion angles per nucleotide. To simplify this description, we previously developed a representation of the nucleic acid backbone that assigns each nucleotide a pair of pseudo-torsion angles (eta and theta defined by P and C4ʹ atoms; or etaʹ and thetaʹ defined by P and C1ʹ atoms). A Java program, AMIGOS II, is currently available for calculating eta and theta angles for RNA and for performing motif searches based on eta and theta angles. However, AMIGOS II lacks the ability to parse DNA structures and to calculate etaʹ and thetaʹ angles. It also has little visualization capacity for 3D structure, making it difficult for users to interpret the computational results. Results: We present AMIGOS III, a PyMOL plugin that calculates the pseudo-torsion angles eta, theta, etaʹ and thetaʹ for both DNA and RNA structures and performs motif searching based on these angles. Compared to AMIGOS II, AMIGOS III offers improved pseudo-torsion angle visualization for RNA and faster nucleic acid worm database generation; it also introduces pseudo-torsion angle visualization for DNA and nucleic acid worm visualization. Its integration into PyMOL enables easy preparation of tertiary structure inputs and intuitive visualization of involved structures. Availability and implementation: https://github.com/pylelab/AMIGOSIII. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Wed, 06 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac207
      Issue No: Vol. 38, No. 10 (2022)
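Pseudo-torsion angles are ordinary dihedral angles computed over backbone atoms. In the commonly used definition, eta is the dihedral C4ʹ(i−1)–P(i)–C4ʹ(i)–P(i+1) and theta is P(i)–C4ʹ(i)–P(i+1)–C4ʹ(i+1); etaʹ and thetaʹ follow the same pattern with C1ʹ in place of C4ʹ. The sketch below assumes per-nucleotide coordinates have already been extracted (for example from a PDB file) and reports angles in the 0–360° range; it is an illustration, not AMIGOS III's code.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (0-360) defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project b0 and b2 onto the plane perpendicular to the central bond b1
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = [x - d0 * y for x, y in zip(b0, b1)]
    w = [x - d2 * y for x, y in zip(b2, b1)]
    angle = math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))
    return angle % 360.0


def eta_theta(p_atoms, c4_atoms):
    """(eta, theta) for nucleotides 1..n-2 from per-nucleotide P and C4' coordinates."""
    angles = []
    for i in range(1, len(p_atoms) - 1):
        eta = dihedral(c4_atoms[i - 1], p_atoms[i], c4_atoms[i], p_atoms[i + 1])
        theta = dihedral(p_atoms[i], c4_atoms[i], p_atoms[i + 1], c4_atoms[i + 1])
        angles.append((eta, theta))
    return angles
```

Passing C1ʹ coordinates instead of C4ʹ to `eta_theta` would give etaʹ and thetaʹ under the same definition.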
       
  • hacksig: a unified and tidy R framework to easily compute gene expression
           signature scores

      Authors: Carenzo A; Pistore F, Serafini M, et al.
      Pages: 2940 - 2942
      Abstract: Summary: Hundreds of gene expression signatures have been developed during the last two decades. However, due to the multitude of development procedures and sometimes a lack of explanation for their implementation, it can become challenging to apply the original method on custom data. Moreover, at present, there is no unified and tidy interface to compute signature scores with different single sample enrichment methods. For these reasons, we developed hacksig, an R package intended as a unified framework to obtain single sample scores with a tidy output as well as a collection of manually curated gene signatures and methods from cancer transcriptomics literature. Availability and implementation: The hacksig R package is freely available on CRAN (https://CRAN.R-project.org/package=hacksig) under the MIT license. The source code can be found on GitHub at https://github.com/Acare/hacksig. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 18 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac161
      Issue No: Vol. 38, No. 10 (2022)
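To show what a single sample score is, one common approach averages per-gene z-scores over the genes of a signature. The sketch below illustrates that generic method in plain Python; it does not use or reproduce hacksig's API, the gene names are hypothetical, and genes that are absent from the data or have zero variance are skipped by assumption.

```python
import statistics


def zscore_signature_scores(expression, signature):
    """Per-sample score = mean z-score of the usable signature genes.

    expression: dict mapping gene -> list of values, one per sample.
    """
    usable = [g for g in signature if g in expression and statistics.pstdev(expression[g]) > 0]
    if not usable:
        raise ValueError("no usable signature genes")
    n_samples = len(expression[usable[0]])
    zscores = {}
    for gene in usable:
        values = expression[gene]
        mu, sd = statistics.mean(values), statistics.pstdev(values)
        zscores[gene] = [(x - mu) / sd for x in values]
    return [statistics.mean(zscores[g][s] for g in usable) for s in range(n_samples)]
```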
       
  • Analysing high-throughput sequencing data in Python with HTSeq 2.0

      Authors: Putri G; Anders S, Pyl P, et al.
      Pages: 2943 - 2945
      Abstract: Summary: HTSeq 2.0 provides a more extensive application programming interface, including a new representation for sparse genomic data, enhancements to htseq-count to suit single-cell omics, a new script for data using cell and molecular barcodes, improved documentation, testing and deployment, bug fixes and Python 3 support. Availability and implementation: HTSeq 2.0 is released as open-source software under the GNU General Public License and is available from the Python Package Index at https://pypi.python.org/pypi/HTSeq. The source code is available on GitHub at https://github.com/htseq/htseq. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 21 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac166
      Issue No: Vol. 38, No. 10 (2022)
       
  • PyLiger: scalable single-cell multi-omic data integration in Python

      Authors: Lu L; Welch J, Mathelier A.
      Pages: 2946 - 2948
      Abstract: Motivation: LIGER (Linked Inference of Genomic Experimental Relationships) is a widely used R package for single-cell multi-omic data integration. However, many users prefer to analyze their single-cell datasets in Python, which offers an attractive syntax and highly optimized scientific computing libraries for increased efficiency. Results: We developed PyLiger, a Python package for integrating single-cell multi-omic datasets. PyLiger offers faster performance than the previous R implementation (2–5× speedup), interoperability with the AnnData format, flexible on-disk or in-memory analysis capability and new functionality for gene ontology enrichment analysis. The on-disk capability enables analysis of arbitrarily large single-cell datasets using fixed memory. Availability and implementation: PyLiger is available on GitHub at https://github.com/welch-lab/pyliger and on the Python Package Index. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 31 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac190
      Issue No: Vol. 38, No. 10 (2022)
       
  • findPC: An R package to automatically select the number of principal
           components in single-cell analysis

      Authors: Zhuang H; Wang H, Ji Z, et al.
      Pages: 2949 - 2951
      Abstract: Summary: Principal component analysis is widely used in analyzing single-cell genomic data. Selecting the optimal number of principal components (PCs) is a crucial step for downstream analyses. The elbow method is most commonly used for this task, but it requires one to visually inspect the elbow plot and manually choose the elbow point. To address this limitation, we developed six methods to automatically select the optimal number of PCs based on the elbow method. We evaluated the performance of these methods on real single-cell RNA-seq data from multiple human and mouse tissues and cell types. The perpendicular line method with 30 PCs has the best overall performance, and its results are highly consistent with the numbers of PCs identified manually. We implemented the six methods in an R package, findPC, that objectively selects the number of PCs and can be easily incorporated into any automatic analysis pipeline. Availability and implementation: The findPC R package is freely available at https://github.com/haotian-zhuang/findPC. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 08 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac235
      Issue No: Vol. 38, No. 10 (2022)
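The perpendicular line method can be sketched as a geometric rule: draw a straight line from the first to the last point of the elbow plot (for example, the standard deviations of the first 30 PCs) and choose the PC whose point lies farthest from that line. This is a generic stdlib sketch of that rule; it does not reproduce findPC's function names or defaults, and the example values are made up.

```python
import math


def perpendicular_line_elbow(values):
    """Return the 1-based index of the point farthest from the first-to-last line."""
    n = len(values)
    x1, y1, x2, y2 = 1, values[0], n, values[-1]
    length = math.hypot(x2 - x1, y2 - y1)
    best_index, best_distance = 1, -1.0
    for x, y in enumerate(values, start=1):
        # Point-to-line distance for the line through (x1, y1) and (x2, y2)
        distance = abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / length
        if distance > best_distance:
            best_index, best_distance = x, distance
    return best_index
```

For the made-up curve [10, 6, 3, 2.5, 2.2, 2.0], the third point is farthest from the line joining the endpoints, so 3 PCs would be selected.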
       
  • rbioapi: user-friendly R interface to biologic web services’ API

      Authors: Rezwani M; Pourfathollah A, Noorbakhsh F, et al.
      Pages: 2952 - 2953
      Abstract: Summary: Many packages serve as interfaces between the R language and the Application Programming Interfaces (APIs) of databases and web services. There is usually a ‘one-package to one-service’ correspondence, which poses challenges of consistency for users and scalability for developers. This, among other issues, motivated us to develop a package that serves as a framework to facilitate the implementation of API resources in R. This R package, rbioapi, is a consistent, user-friendly and scalable interface to biological and medical databases and web services. To date, rbioapi fully supports Enrichr, JASPAR, miEAA, PANTHER, Reactome, STRING and UniProt. We aim to expand this list through collaborations and contributions and gradually make rbioapi as comprehensive as possible. Availability and implementation: rbioapi is available on CRAN at https://cran.r-project.org/package=rbioapi. The source code is publicly available in a GitHub repository at https://github.com/moosa-r/rbioapi/. The documentation website is available at https://rbioapi.moosa-r.com. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 22 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac172
      Issue No: Vol. 38, No. 10 (2022)
       
  • MorphoTools2: an R package for multivariate morphometric analysis

      Authors: Šlenker M; Koutecký P, Marhold K, et al.
      Pages: 2954 - 2955
      Abstract: Summary: The package MorphoTools2 is intended for multivariate analyses of morphological data. Commonly used tools are missing or scattered across several R packages. To make the workflow convenient and fast, the new package wraps available statistical and graphical tools and provides a comprehensive framework for checking and manipulating input data, core statistical analyses and a wide palette of functions designed to visualize results. Availability and implementation: The stable version is available from CRAN: https://cran.r-project.org/package=MorphoTools2. The development version is available from the following GitHub repository: https://github.com/MarekSlenker/MorphoTools2. The software is distributed under the GNU General Public Licence (v.3). Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 22 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac173
      Issue No: Vol. 38, No. 10 (2022)
       
  • MOSS: multi-omic integration with sparse value decomposition

      Authors: Gonzalez-Reymundez A; Grueneberg A, Lu G, et al.
      Pages: 2956 - 2958
      Abstract: Summary: This article presents multi-omic integration with sparse value decomposition (MOSS), a free and open-source R package for integration and feature selection in multiple large omics datasets. This package is computationally efficient and offers biological insight through capabilities such as cluster analysis and identification of informative omic features. Availability and implementation: https://CRAN.R-project.org/package=MOSS. Supplementary information: Supplementary information can be found at https://github.com/agugonrey/GonzalezReymundez2021.
      PubDate: Thu, 24 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac179
      Issue No: Vol. 38, No. 10 (2022)
       
  • CBNplot: Bayesian network plots for enrichment analysis

      Authors: Sato N; Tamada Y, Yu G, et al.
      Pages: 2959 - 2960
      Abstract: Summary: When investigating gene expression profiles, determining important directed edges between genes can provide valuable insights in addition to identifying differentially expressed genes. In the subsequent functional enrichment analysis (EA), understanding how enriched pathways or genes in the pathway interact with one another can help infer the gene regulatory network (GRN), important for studying the underlying molecular mechanisms. However, packages for easy inference of the GRN based on EA are scarce. Here, we developed an R package, CBNplot, which infers the Bayesian network (BN) from gene expression data, explicitly utilizing EA results obtained from curated biological pathway databases. The core features include convenient wrapping for structure learning, visualization of the BN from EA results, comparison with reference networks, and reflection of gene-related information on the plot. As an example, we demonstrate the analysis of bladder cancer-related datasets using CBNplot, including probabilistic reasoning, which is a unique aspect of BN analysis. We display the transformability of results obtained from one dataset to another, the validity of the analysis as assessed using established knowledge and literature, and the possibility of facilitating knowledge discovery from gene expression datasets. Availability and implementation: The library, documentation and web server are available at https://github.com/noriakis/CBNplot. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 25 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac175
      Issue No: Vol. 38, No. 10 (2022)
       
  • kc-hits: a tool to aid in the evaluation and classification of chemical carcinogens

      Authors: Reisfeld B; de Conti A, El Ghissassi F, et al.
      Pages: 2961 - 2962
      Abstract: Motivation: The evaluation of chemicals for their carcinogenic hazard requires the analysis of a wide range of data and the characterization of these results relative to the key characteristics of carcinogens. The workflow used historically requires many manual steps that are labor-intensive and can introduce errors, bias and inconsistencies. Results: The automation of parts of the evaluation workflow using the kc-hits software has led to significant improvements in process efficiency, as well as more consistent and comprehensive results. Availability and implementation: https://gitlab.com/i1650/kc-hits.git. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Mon, 28 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac189
      Issue No: Vol. 38, No. 10 (2022)
       
  • BIODICA: a computational environment for Independent Component Analysis of omics data

      Authors: Captier N; Merlevede J, Molkenov A, et al.
      Pages: 2963 - 2964
      Abstract: Summary: We developed BIODICA, an integrated computational environment for applying independent component analysis (ICA) to bulk and single-cell molecular profiles, interpreting the results in terms of biological functions and correlating them with metadata. The computational core is the novel Python package stabilized-ica, which provides an interface to several ICA algorithms, a stabilization procedure, meta-analysis and component interpretation tools. BIODICA is equipped with a user-friendly graphical user interface, allowing inexperienced users to perform ICA-based omics data analysis. The results are presented interactively, facilitating communication with biology experts. Availability and implementation: BIODICA is implemented in Java, Python and JavaScript. The source code is freely available on GitHub under the MIT and the GNU LGPL licenses. BIODICA is supported on all major operating systems. URL: https://sysbio-curie.github.io/biodica-environment/.
      PubDate: Wed, 06 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac204
      Issue No: Vol. 38, No. 10 (2022)
       
  • OMAMO: orthology-based alternative model organism selection

      Authors: Nicheperovich A; Altenhoff A, Dessimoz C, et al.
      Pages: 2965 - 2966
      Abstract: Summary: The conservation of pathways and genes across species has allowed scientists to use non-human model organisms to gain a deeper understanding of human biology. However, the use of traditional model systems such as mice, rats and zebrafish is costly, time-consuming and increasingly raises ethical concerns, which highlights the need to search for less complex model organisms. Existing tools only focus on the few well-studied model systems, most of which are complex animals. To address these issues, we have developed Orthologous Matrix and Alternative Model Organism (OMAMO), a software tool and web service that provides the user with the best non-complex organism for research into a biological process of interest, based on orthologous relationships between human and the candidate species. The outputs provided by OMAMO were supported by a systematic literature review. Availability and implementation: https://omabrowser.org/omamo/, https://github.com/DessimozLab/omamo. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Fri, 18 Mar 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac163
      Issue No: Vol. 38, No. 10 (2022)
       
  • MitoVisualize: a resource for analysis of variants in human mitochondrial RNAs and DNA

      Authors: Lake N; Zhou L, Xu J, et al.
      Pages: 2967 - 2969
      Abstract: Summary: We present MitoVisualize, a new tool for analysis of the human mitochondrial DNA (mtDNA). MitoVisualize enables visualization of: (i) the position and effect of variants in mitochondrial transfer RNA and ribosomal RNA secondary structures alongside curated variant annotations, (ii) data across RNA structures, such as to show all positions with disease-associated variants or with post-transcriptional modifications and (iii) the position of a base, gene or region in the circular mtDNA map, such as to show the location of a large deletion. All visualizations can be easily downloaded as figures for reuse. MitoVisualize can be useful for anyone interested in exploring mtDNA variation, though it is designed in particular to facilitate mtDNA variant interpretation. Availability and implementation: MitoVisualize can be accessed via https://www.mitovisualize.org/. The source code is available at https://github.com/leklab/mito_visualize/. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Tue, 12 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac216
      Issue No: Vol. 38, No. 10 (2022)
       
  • VisuStatR: visualizing motility and morphology statistics on images in R

      Authors: Harmel C; Sid Ahmed S, Koch R, et al.
      Pages: 2970 - 2972
      Abstract: Motivation: Live-cell microscopy has become an essential tool for analyzing dynamic processes in various biological applications. High-throughput and automated tracking analyses allow the simultaneous evaluation of large numbers of objects. However, to critically assess the influence of individual objects on calculated summary statistics, and to detect heterogeneous dynamics or possible artifacts, such as misclassified or mistracked objects, a direct mapping of the derived statistical information onto the actual image data would be necessary. Results: We present VisuStatR, a platform-independent software package that allows the direct visualization of time-resolved summary statistics of morphological characteristics or motility dynamics on raw images. The software contains several display modes to compare user-defined summary statistics and the underlying image data at various levels of detail. Availability and implementation: VisuStatR is a free and open-source R package with a user-friendly graphical user interface and is available via GitHub at https://github.com/grrchrr/VisuStatR/ under the MIT+ license. Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: Thu, 21 Apr 2022 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btac191
      Issue No: Vol. 38, No. 10 (2022)
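      The findPC abstract reports that a "perpendicular line" elbow method applied to the first 30 PCs performed best. The Python sketch below shows one common form of that heuristic, assuming the input is a decreasing per-PC statistic such as standard deviations: it draws a straight line between the first and last points of the scree curve and picks the PC farthest from that line. The function name `perpendicular_line_elbow` and its `max_pcs` parameter are hypothetical illustrations, not the findPC R API, and findPC's exact computation may differ.

```python
import math
from typing import Sequence


def perpendicular_line_elbow(values: Sequence[float], max_pcs: int = 30) -> int:
    """Return the 1-based index of the elbow point of a scree curve.

    Hypothetical helper, not the findPC API. `values` is assumed to hold a
    decreasing per-PC statistic (for example, standard deviations). The curve
    is truncated to the first `max_pcs` points, a line is drawn from the first
    to the last retained point, and the point farthest from that line wins.
    """
    ys = [float(v) for v in values[:max_pcs]]
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three points to locate an elbow")
    # Endpoints of the chord, using 1-based PC indices as x coordinates.
    x1, y1 = 1.0, ys[0]
    x2, y2 = float(n), ys[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy)  # nonzero because dx >= 2
    best_k, best_d = 1, -1.0
    for k, y in enumerate(ys, start=1):
        # Perpendicular distance from (k, y) to the line through the endpoints.
        d = abs(dy * k - dx * y + x2 * y1 - y2 * x1) / norm
        if d > best_d:
            best_k, best_d = k, d
    return best_k


if __name__ == "__main__":
    print(perpendicular_line_elbow([10, 5, 2, 1.5, 1.2, 1.0]))  # → 3
```

      On the toy curve [10, 5, 2, 1.5, 1.2, 1.0] the sketch returns 3, the point where the steep drop gives way to a flat tail. Measuring distance to the chord, rather than thresholding the slope, avoids any tuning parameter beyond the truncation length.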
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 



JournalTOCs © 2009-