Subjects -> LIBRARY AND INFORMATION SCIENCES (Total: 392 journals)
    - DIGITAL CURATION AND PRESERVATION (13 journals)
    - LIBRARY ADMINISTRATION (1 journal)
    - LIBRARY AND INFORMATION SCIENCES (378 journals)

LIBRARY AND INFORMATION SCIENCES (378 journals)

Showing 1 - 200 of 379 Journals sorted by number of followers
Library & Information Science Research     Hybrid Journal   (Followers: 1821)
Journal of Librarianship and Information Science     Hybrid Journal   (Followers: 1337)
Library Hi Tech     Hybrid Journal   (Followers: 1140)
Journal of Information Science     Hybrid Journal   (Followers: 1112)
Journal of Academic Librarianship     Hybrid Journal   (Followers: 1100)
Library Management     Hybrid Journal   (Followers: 977)
The Electronic Library     Hybrid Journal   (Followers: 976)
Library Quarterly     Full-text available via subscription   (Followers: 941)
Global Knowledge, Memory and Communication     Hybrid Journal   (Followers: 882)
Journal of Information Literacy     Open Access   (Followers: 858)
Library Hi Tech News     Hybrid Journal   (Followers: 788)
Information Technology and Libraries     Open Access   (Followers: 736)
New Library World     Hybrid Journal   (Followers: 684)
Journal of Library & Information Services in Distance Learning     Hybrid Journal   (Followers: 635)
Information Retrieval     Hybrid Journal   (Followers: 616)
Information Sciences     Hybrid Journal   (Followers: 602)
International Journal on Digital Libraries     Hybrid Journal   (Followers: 580)
Information Processing & Management     Hybrid Journal   (Followers: 567)
Information Systems Research     Full-text available via subscription   (Followers: 557)
College & Research Libraries     Open Access   (Followers: 528)
Evidence Based Library and Information Practice     Open Access   (Followers: 461)
Journal of Library and Information Science     Open Access   (Followers: 444)
International Information & Library Review     Hybrid Journal   (Followers: 437)
The Information Society: An International Journal     Hybrid Journal   (Followers: 406)
Library Trends     Full-text available via subscription   (Followers: 390)
Library and Information Research     Open Access   (Followers: 364)
Forensic Science International: Digital Investigation     Full-text available via subscription   (Followers: 344)
Annals of Library and Information Studies (ALIS)     Open Access   (Followers: 337)
International Journal of Library Science     Open Access   (Followers: 303)
Canadian Journal of Information and Library Science     Full-text available via subscription   (Followers: 289)
College & Research Libraries News     Partially Free   (Followers: 286)
Bioinformatics     Hybrid Journal   (Followers: 283)
The Reference Librarian     Hybrid Journal   (Followers: 267)
College & Undergraduate Libraries     Hybrid Journal   (Followers: 261)
IFLA Journal     Hybrid Journal   (Followers: 261)
Library Leadership & Management     Open Access   (Followers: 261)
Journal of Electronic Resources Librarianship     Hybrid Journal   (Followers: 259)
Journal of Library Administration     Hybrid Journal   (Followers: 254)
Library Collections, Acquisitions, and Technical Services     Hybrid Journal   (Followers: 253)
Communications in Information Literacy     Open Access   (Followers: 244)
Data Technologies and Applications     Hybrid Journal   (Followers: 236)
American Libraries     Partially Free   (Followers: 223)
Journal of the Medical Library Association     Open Access   (Followers: 222)
Code4Lib Journal     Open Access   (Followers: 218)
Journal of Information & Knowledge Management     Hybrid Journal   (Followers: 214)
International Journal of Information Management     Hybrid Journal   (Followers: 212)
Cataloging & Classification Quarterly     Hybrid Journal   (Followers: 207)
Journal of Library Metadata     Hybrid Journal   (Followers: 206)
Australian Library Journal     Full-text available via subscription   (Followers: 198)
Journal of Documentation     Hybrid Journal   (Followers: 195)
portal: Libraries and the Academy     Full-text available via subscription   (Followers: 190)
Ariadne Magazine     Open Access   (Followers: 185)
Journal of Hospital Librarianship     Hybrid Journal   (Followers: 184)
Behavioral & Social Sciences Librarian     Hybrid Journal   (Followers: 179)
Aslib Proceedings     Hybrid Journal   (Followers: 172)
Library & Information History     Hybrid Journal   (Followers: 165)
American Archivist     Hybrid Journal   (Followers: 161)
EDUCAUSE Review     Full-text available via subscription   (Followers: 161)
Research Library Issues     Free   (Followers: 159)
The Serials Librarian     Hybrid Journal   (Followers: 156)
The Library : The Transactions of the Bibliographical Society     Hybrid Journal   (Followers: 154)
New Review of Academic Librarianship     Hybrid Journal   (Followers: 151)
Book History     Full-text available via subscription   (Followers: 149)
Against the Grain     Partially Free   (Followers: 143)
Library Technology Reports     Full-text available via subscription   (Followers: 141)
Journal of eScience Librarianship     Open Access   (Followers: 134)
DESIDOC Journal of Library & Information Technology     Open Access   (Followers: 105)
Archives and Museum Informatics     Hybrid Journal   (Followers: 99)
Australian Academic & Research Libraries     Full-text available via subscription   (Followers: 99)
European Journal of Information Systems     Hybrid Journal   (Followers: 95)
Online Information Review     Hybrid Journal   (Followers: 91)
Journal of Librarianship and Scholarly Communication     Open Access   (Followers: 88)
International Journal of Digital Curation     Open Access   (Followers: 85)
Information Technologies & International Development     Open Access   (Followers: 84)
Journal of Electronic Publishing     Open Access   (Followers: 77)
Serials Review     Hybrid Journal   (Followers: 75)
Journal of Education in Library and Information Science - JELIS     Full-text available via subscription   (Followers: 74)
International Journal of Digital Library Systems     Full-text available via subscription   (Followers: 74)
Journal of Interlibrary Loan Document Delivery & Electronic Reserve     Hybrid Journal   (Followers: 69)
LIBER Quarterly : The Journal of the Association of European Research Libraries     Open Access   (Followers: 68)
Archival Science     Hybrid Journal   (Followers: 66)
Ethics and Information Technology     Hybrid Journal   (Followers: 66)
Journal of the Canadian Health Libraries Association / Journal de l'Association des bibliothèques de la santé du Canada     Open Access   (Followers: 66)
Library Philosophy and Practice     Open Access   (Followers: 66)
Insights : the UKSG journal     Open Access   (Followers: 65)
Practical Academic Librarianship : The International Journal of the SLA Academic Division     Open Access   (Followers: 65)
MIS Quarterly : Management Information Systems Quarterly     Hybrid Journal   (Followers: 63)
Journal of Management Information Systems     Full-text available via subscription   (Followers: 60)
Science & Technology Libraries     Hybrid Journal   (Followers: 59)
Journal of Information Technology     Hybrid Journal   (Followers: 56)
The Bottom Line: Managing Library Finances     Hybrid Journal   (Followers: 56)
Alexandria : The Journal of National and International Library and Information Issues     Full-text available via subscription   (Followers: 56)
Journal of Health & Medical Informatics     Open Access   (Followers: 54)
Partnership : the Canadian Journal of Library and Information Practice and Research     Open Access   (Followers: 54)
Archives and Manuscripts     Hybrid Journal   (Followers: 52)
International Journal of Legal Information     Full-text available via subscription   (Followers: 51)
Library & Archival Security     Hybrid Journal   (Followers: 49)
Bangladesh Journal of Library and Information Science     Open Access   (Followers: 47)
OCLC Systems & Services     Hybrid Journal   (Followers: 46)
Community & Junior College Libraries     Hybrid Journal   (Followers: 45)
Information Discovery and Delivery     Hybrid Journal   (Followers: 44)
Journal of Access Services     Hybrid Journal   (Followers: 40)
Medical Reference Services Quarterly     Hybrid Journal   (Followers: 40)
VINE Journal of Information and Knowledge Management Systems     Hybrid Journal   (Followers: 40)
Journal of the Society of Archivists     Hybrid Journal   (Followers: 36)
Scholarly and Research Communication     Open Access   (Followers: 36)
Public Library Quarterly     Hybrid Journal   (Followers: 32)
Journal of Archival Organization     Hybrid Journal   (Followers: 31)
Information & Culture : A Journal of History     Full-text available via subscription   (Followers: 31)
Australasian Public Libraries and Information Services     Full-text available via subscription   (Followers: 31)
Journal of the Association for Information Systems     Open Access   (Followers: 31)
Research Evaluation     Hybrid Journal   (Followers: 30)
Foundations and Trends® in Information Retrieval     Full-text available via subscription   (Followers: 30)
Information     Open Access   (Followers: 29)
International Journal of Information Retrieval Research     Full-text available via subscription   (Followers: 29)
Information Systems Frontiers     Hybrid Journal   (Followers: 27)
International Journal of Intellectual Property Management     Hybrid Journal   (Followers: 26)
International Journal of Information Privacy, Security and Integrity     Hybrid Journal   (Followers: 26)
Proceedings of the American Society for Information Science and Technology     Hybrid Journal   (Followers: 26)
Health Information Management Journal     Hybrid Journal   (Followers: 26)
Journal of the Institute of Conservation     Hybrid Journal   (Followers: 25)
Access     Full-text available via subscription   (Followers: 24)
Nordic Journal of Information Literacy in Higher Education     Open Access   (Followers: 24)
South African Journal of Libraries and Information Science     Open Access   (Followers: 23)
Sci-Tech News     Open Access   (Followers: 23)
LASIE : Library Automated Systems Information Exchange     Free   (Followers: 22)
Journal of Information, Communication and Ethics in Society     Hybrid Journal   (Followers: 22)
NASIG Newsletter     Open Access   (Followers: 21)
InCite     Full-text available via subscription   (Followers: 20)
Georgia Library Quarterly     Open Access   (Followers: 20)
LOEX Quarterly     Full-text available via subscription   (Followers: 20)
RBM : A Journal of Rare Books, Manuscripts, and Cultural Heritage     Open Access   (Followers: 20)
Urban Library Journal     Open Access   (Followers: 19)
El Profesional de la Informacion     Full-text available via subscription   (Followers: 18)
Journal of Research on Libraries and Young Adults     Open Access   (Followers: 18)
International Journal of Web Portals     Full-text available via subscription   (Followers: 17)
Communication Booknotes Quarterly     Hybrid Journal   (Followers: 16)
Theological Librarianship : An Online Journal of the American Theological Library Association     Open Access   (Followers: 16)
Perspectives in International Librarianship     Open Access   (Followers: 16)
Biblioteca Universitaria     Open Access   (Followers: 16)
Collection and Curation     Hybrid Journal   (Followers: 15)
Manuscripta     Full-text available via subscription   (Followers: 15)
Bibliotheca Orientalis     Full-text available via subscription   (Followers: 14)
International Journal of Business Information Systems     Hybrid Journal   (Followers: 14)
International Journal of Information Technology, Communications and Convergence     Hybrid Journal   (Followers: 14)
Notes     Full-text available via subscription   (Followers: 14)
Online Journal of Public Health Informatics     Open Access   (Followers: 14)
Alexandría : Revista de Ciencias de la Información     Open Access   (Followers: 14)
Anales de Documentacion     Open Access   (Followers: 14)
Journal of Educational Media, Memory, and Society     Full-text available via subscription   (Followers: 13)
Biblios     Open Access   (Followers: 13)
International Journal of Intercultural Information Management     Hybrid Journal   (Followers: 12)
Alsic : Apprentissage des Langues et Systèmes d'Information et de Communication     Open Access   (Followers: 12)
Journal of Information Technology Teaching Cases     Hybrid Journal   (Followers: 12)
Journal of Religious & Theological Information     Hybrid Journal   (Followers: 11)
Universal Access in the Information Society     Hybrid Journal   (Followers: 11)
InterActions: UCLA Journal of Education and Information     Open Access   (Followers: 11)
International Journal of Information and Decision Sciences     Hybrid Journal   (Followers: 11)
Journal of Information Systems     Full-text available via subscription   (Followers: 11)
Kansas Library Association College & University Libraries Section Proceedings     Open Access   (Followers: 11)
Journal of Information Engineering and Applications     Open Access   (Followers: 10)
Journal of Global Information Management     Full-text available via subscription   (Followers: 9)
Southeastern Librarian     Open Access   (Followers: 9)
e & i Elektrotechnik und Informationstechnik     Hybrid Journal   (Followers: 8)
JLIS.it     Open Access   (Followers: 8)
International Journal of Multicriteria Decision Making     Hybrid Journal   (Followers: 8)
JISTEM : Journal of Information Systems and Technology Management     Open Access   (Followers: 8)
International Journal of Multimedia Information Retrieval     Partially Free   (Followers: 8)
BIBLOS - Revista do Departamento de Biblioteconomia e História     Open Access   (Followers: 7)
New Review of Information Networking     Hybrid Journal   (Followers: 7)
Idaho Librarian     Free   (Followers: 7)
Slavic & East European Information Resources     Hybrid Journal   (Followers: 6)
Egyptian Informatics Journal     Open Access   (Followers: 6)
Informaatiotutkimus     Open Access   (Followers: 5)
Revista Interamericana de Bibliotecología     Open Access   (Followers: 5)
CIC. Cuadernos de Informacion y Comunicacion     Open Access   (Followers: 5)
Bridgewater Review     Open Access   (Followers: 5)
Bilgi Dünyası     Open Access   (Followers: 5)
Open Systems & Information Dynamics     Hybrid Journal   (Followers: 4)
ProInflow : Journal for Information Sciences     Open Access   (Followers: 4)
Nordic Journal of Library and Information Studies     Open Access   (Followers: 4)
International Journal of Cooperative Information Systems     Hybrid Journal   (Followers: 4)
OJS på dansk     Open Access   (Followers: 4)
Investigación Bibliotecológica     Open Access   (Followers: 4)
Revista Española de Documentación Científica     Open Access   (Followers: 4)
International Journal of Organisational Design and Engineering     Hybrid Journal   (Followers: 3)
Journal of Information Systems Teaching Notes     Hybrid Journal   (Followers: 3)
HLA News     Full-text available via subscription   (Followers: 3)
Encontros Bibli : revista eletrônica de biblioteconomia e ciência da informação     Open Access   (Followers: 3)
SLIS Student Research Journal     Open Access   (Followers: 3)
VRA Bulletin     Open Access   (Followers: 3)
Türk Kütüphaneciliği : Turkish Librarianship     Open Access   (Followers: 2)
Información, Cultura y Sociedad     Open Access   (Followers: 2)
Revista General de Información y Documentación     Open Access   (Followers: 2)
Informação & Informação     Open Access   (Followers: 2)
In Monte Artium     Full-text available via subscription   (Followers: 1)
Knjižnica : Revija za Področje Bibliotekarstva in Informacijske Znanosti     Open Access   (Followers: 1)
Documentación de las Ciencias de la Información     Open Access   (Followers: 1)
Palabra Clave (La Plata)     Open Access  
Liinc em Revista     Open Access  


Similar Journals
Bioinformatics
Journal Prestige (SJR): 6.14
Citation Impact (CiteScore): 8
Number of Followers: 283  
 
  Hybrid Journal (can contain Open Access articles)
ISSN (Print) 1367-4803 - ISSN (Online) 1460-2059
Published by Oxford University Press [425 journals]
  • Correction to: Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data

      First page: btad321
      Abstract: This is a correction to: Mohammadamin Edrisi, Monica V Valecha, Sunkara B V Chowdary, Sergio Robledo, Huw A Ogilvie, David Posada, Hamim Zafar, Luay Nakhleh, Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data, Bioinformatics, Volume 38, Issue Supplement_1, July 2022, Pages i195–i202, https://doi.org/10.1093/bioinformatics/btac254
      PubDate: Tue, 23 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad321
      Issue No: Vol. 39, No. 5 (2023)
       
  • Correction to: Integrative analysis of individual-level data and high-dimensional summary statistics

      First page: btad324
      Abstract: This is a correction to: Sheng Fu, Lu Deng, Han Zhang, William Wheeler, Jing Qin, Kai Yu, Integrative analysis of individual-level data and high-dimensional summary statistics, Bioinformatics, Volume 39, Issue 4, April 2023, btad156, https://doi.org/10.1093/bioinformatics/btad156
      PubDate: Fri, 19 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad324
      Issue No: Vol. 39, No. 5 (2023)
       
  • Correction to: wpLogicNet: logic gate and structure inference in gene regulatory networks

      First page: btad304
      Abstract: This is a correction to: Seyed Amir Malekpour, Maryam Shahdoust, Rosa Aghdam, Mehdi Sadeghi, wpLogicNet: logic gate and structure inference in gene regulatory networks, Bioinformatics, Volume 39, Issue 2, February 2023, https://doi.org/10.1093/bioinformatics/btad072
      PubDate: Wed, 17 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad304
      Issue No: Vol. 39, No. 5 (2023)
       
  • NanoPack2: population-scale evaluation of long-read sequencing data

      First page: btad311
      Abstract: Summary: Increases in cohort size in long-read sequencing projects necessitate more efficient software for quality assessment and processing of sequencing data from Oxford Nanopore Technologies and Pacific Biosciences. Here, we describe novel tools for summarizing experiments, filtering datasets, and visualizing phased alignment results, as well as updates to the NanoPack software suite. Availability and implementation: The cramino, chopper, kyber, and phasius tools are written in Rust and available as executable binaries that require no installation or dependency management. Binaries built on musl are available for broad compatibility. NanoPlot and NanoComp are written in Python 3. Links to the separate tools and their documentation can be found at https://github.com/wdecoster/nanopack. All tools are compatible with Linux, macOS, and the Windows Subsystem for Linux and are released under the MIT license. The repositories include test data; the tools are continuously tested using GitHub Actions and can be installed with the conda package manager.
      PubDate: Fri, 12 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad311
      Issue No: Vol. 39, No. 5 (2023)
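The summary above mentions tools for filtering datasets. As a generic illustration only (this is not chopper's code or command-line interface), here is a minimal Python sketch of length and quality filtering over FASTQ records. It assumes 4-line FASTQ records with Phred+33 quality strings; the helper names and thresholds are hypothetical. Averaging in error-probability space, rather than averaging Phred scores directly, keeps a few very poor bases from being masked by many good ones.

```python
import math
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from consecutive 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean read quality: average per-base error probabilities, then convert back to Phred."""
    if not qual:
        return 0.0
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in qual]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def filter_reads(records: List[Record], min_len: int, min_q: float) -> List[Record]:
    """Keep reads that are at least min_len bases long and have mean quality >= min_q."""
    return [r for r in records if len(r[1]) >= min_len and mean_phred(r[2]) >= min_q]
```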
       
  • CscoreTool-M infers 3D sub-compartment probabilities within cell population

      First page: btad314
      Abstract: Motivation: Computational inference of genome organization based on Hi-C sequencing has greatly aided the understanding of chromatin and nuclear organization in three dimensions (3D). However, existing computational methods fail to address cell population heterogeneity. Here we describe a probabilistic-modeling-based method called CscoreTool-M that infers multiple 3D genome sub-compartments from Hi-C data. Results: The compartment scores inferred using CscoreTool-M represent the probability of a genomic region being located in a specific sub-compartment. Compared to published methods, CscoreTool-M is more accurate in inferring sub-compartments corresponding to both active and repressed chromatin. The compartment scores calculated by CscoreTool-M also help quantify the levels of heterogeneity in sub-compartment localization within cell populations. By comparing proliferating cells with terminally differentiated non-proliferating cells, we show that proliferating cells have higher genome organization heterogeneity, which is likely caused by cells being at different cell-cycle stages. By analyzing 10 sub-compartments, we found a sub-compartment containing chromatin potentially related to the early-G1 chromatin regions proximal to the nuclear lamina in HCT116 cells, suggesting that the method can deconvolve cell-cycle-stage-specific genome organization among asynchronously dividing cells. Finally, we show that CscoreTool-M can identify sub-compartments that contain genes enriched in housekeeping or cell-type-specific functions. Availability and implementation: https://github.com/scoutzxb/CscoreTool-M.
      PubDate: Thu, 11 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad314
      Issue No: Vol. 39, No. 5 (2023)
       
  • High-quality, customizable heuristics for RNA 3D structure alignment

      First page: btad315
      Abstract: Motivation: Tertiary structure alignment is one of the main challenges in the computer-aided comparative study of molecular structures. Its aim is to optimally overlay the 3D shapes of two or more molecules in space to find the correspondence between their nucleotides. Alignment is the starting point for most algorithms that assess structural similarity or find common substructures. Thus, it has applications in a variety of bioinformatics problems, e.g. the search for structural patterns, structure clustering, identifying structural redundancy, and evaluating the prediction accuracy of 3D models. To date, several tools have been developed to align 3D structures of RNA. However, most of them are not applicable to arbitrarily large structures and do not allow users to parameterize the optimization algorithm. Results: We present two customizable heuristics for flexible alignment of 3D RNA structures: geometric search (GEOS) and a genetic algorithm (GENS). They work in sequence-dependent and sequence-independent modes and find a suboptimal alignment of expected quality (below a predefined RMSD threshold). We compare their performance with that of state-of-the-art methods for aligning RNA structures, and show the results of quantitative and qualitative tests run for all of these algorithms on benchmark sets of RNA structures. Availability and implementation: Source code for both heuristics is hosted at https://github.com/RNApolis/rnahugs.
      PubDate: Thu, 11 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad315
      Issue No: Vol. 39, No. 5 (2023)
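The results above refer to a predefined RMSD threshold. For readers unfamiliar with the measure, here is a minimal Python sketch of root-mean-square deviation over paired atoms. It assumes the two coordinate lists are already superimposed and listed in corresponding order; finding that superposition and correspondence is the hard part that GEOS and GENS address, and it is not shown here. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between paired atoms; assumes a and b are already superimposed and in matching order."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty coordinate lists of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def within_threshold(a: Sequence[Coord], b: Sequence[Coord], threshold: float) -> bool:
    """True if the paired coordinates meet an RMSD cutoff (same units as the coordinates)."""
    return rmsd(a, b) <= threshold
```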
       
  • TRASH: Tandem Repeat Annotation and Structural Hierarchy

      First page: btad308
      Abstract: Motivation: The advent of long-read DNA sequencing is allowing complete assembly of highly repetitive genomic regions for the first time, including the megabase-scale satellite repeat arrays found in many eukaryotic centromeres. The assembly of such repetitive regions creates a need for their de novo annotation, including patterns of higher-order repetition. To annotate tandem repeats, methods are required that can be widely applied to diverse genome sequences without prior knowledge of monomer sequences. Results: Tandem Repeat Annotation and Structural Hierarchy (TRASH) is a tool that identifies and maps tandem repeats in nucleotide sequences without prior knowledge of repeat composition. TRASH analyses a FASTA assembly file, identifies regions occupied by repeats, and then precisely maps them and their higher-order structures. To demonstrate the applicability and scalability of TRASH for centromere research, we apply our method to the recently published Col-CEN genome of Arabidopsis thaliana and the complete human CHM13 genome. Availability and implementation: TRASH is freely available at https://github.com/vlothec/TRASH and supported on Linux.
      PubDate: Wed, 10 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad308
      Issue No: Vol. 39, No. 5 (2023)
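The abstract does not describe TRASH's own algorithm. To illustrate what identifying a repeat "without prior knowledge of repeat composition" can mean, the following naive Python sketch estimates the period of a tandem array by comparing a sequence with shifted copies of itself; the monomer is then the first `period` bases. This is a toy under stated assumptions (quadratic time, no tolerance for indels, a single array per input), not TRASH's method, and the function name is hypothetical.

```python
def best_period(seq: str, max_period: int) -> int:
    """Return the shift p in 1..max_period at which seq best matches itself shifted by p.

    The score for p is the fraction of positions i with seq[i] == seq[i + p].
    A strict comparison keeps the smallest period when multiples of it tie.
    """
    best_p, best_score = 1, -1.0
    for p in range(1, min(max_period, len(seq) - 1) + 1):
        compared = len(seq) - p
        matches = sum(1 for i in range(compared) if seq[i] == seq[i + p])
        score = matches / compared
        if score > best_score:
            best_p, best_score = p, score
    return best_p
```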
       
  • Genome mining for anti-CRISPR operons using machine learning

      First page: btad309
      Abstract: Motivation: Encoded by (pro-)viruses, anti-CRISPR (Acr) proteins inhibit the CRISPR-Cas immune system of their prokaryotic hosts. As a result, Acr proteins can be employed to develop more controllable CRISPR-Cas genome editing tools. Recent studies revealed that known acr genes often coexist with other acr genes and with phage structural genes within the same operon. For example, we found that 47 of 98 known acr genes (or their homologs) coexist in the same operons. None of the current Acr prediction tools has considered this important genomic context feature. We have developed a new software tool, AOminer, to facilitate the discovery of new Acrs by fully exploiting the genomic context of known acr genes and their homologs. Results: AOminer is the first machine-learning-based tool focused on the discovery of Acr operons (AOs). A two-state hidden Markov model (HMM) was trained to learn the conserved genomic context of operons that contain known acr genes or their homologs, and the learned features can distinguish AOs from non-AOs. AOminer allows automated mining for potential AOs from query genomes or operons. AOminer outperformed all existing Acr prediction tools, with an accuracy of 0.85. AOminer will facilitate the discovery of novel anti-CRISPR operons. Availability and implementation: The webserver is available at http://aca.unl.edu/AOminer/AOminer_APP/. The Python program is at https://github.com/boweny920/AOminer.
      PubDate: Tue, 09 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad309
      Issue No: Vol. 39, No. 5 (2023)
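AOminer's two-state HMM is described only at a high level above. As a generic illustration of how a two-state HMM labels a run of genes, here is a Viterbi decoder in Python, computed in log space. The states ("bg" for background, "ao" for Acr-operon-like), the gene symbols, and all probabilities are hypothetical toy values, not AOminer's trained model or features; every probability must be strictly positive.

```python
import math
from typing import Dict, List, Sequence


def viterbi(obs: Sequence[str], states: List[str], start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]]) -> List[str]:
    """Most likely hidden-state path for obs under a discrete HMM (log-space Viterbi)."""
    log = math.log
    # V[t][s]: best log-probability of any path ending in state s at position t.
    V = [{s: log(start[s]) + log(emit[s][obs[0]]) for s in states}]
    back: List[Dict[str, str]] = []  # back[t-1][s]: best predecessor of s at position t
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: V[-1][r] + log(trans[r][s]))
            col[s] = V[-1][prev] + log(trans[prev][s]) + log(emit[s][o])
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With the toy parameters used in the test, the run of "acr_like" and "structural" genes in the middle is labelled "ao" and the flanking "other" genes "bg".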
       
  • GTExVisualizer: a web platform for supporting ageing studies

      First page: btad303
      Abstract: Motivation: Studying ageing effects on molecules is an important new topic in life science. Such studies require data, models, algorithms, and tools to elucidate molecular mechanisms. The GTEx (Genotype-Tissue Expression) portal is a web-based data source for retrieving patients’ transcriptomics data annotated with tissue, gender, and age information. It is the most complete data source for studies of ageing effects. Nevertheless, it lacks functionality to query data at the sex/age level, as well as tools for protein interaction studies, which limits ageing studies. As a result, users need to download query results to carry out further analysis, such as retrieving the expression of a given gene across different age (or sex) classes in many tissues. Results: We present GTExVisualizer, a platform to query and analyse GTEx data. This tool contains a web interface able to: (i) graphically represent and study query results; (ii) analyse genes using sex/age expression patterns, also integrated with network-based modules; and (iii) report results as plots as well as (gene) networks. Finally, it allows the user to obtain basic statistics that highlight differences in gene expression among sex/age groups. Conclusion: The novelty of GTExVisualizer lies in providing a tool for studying ageing/sex-related effects on molecular processes. Availability and implementation: GTExVisualizer is available at http://gtexvisualizer.herokuapp.com. The source code and data are available at https://github.com/UgoLomoio/gtex_visualizer.
      PubDate: Mon, 08 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad303
      Issue No: Vol. 39, No. 5 (2023)
       
  • STEMSIM: a simulator of within-strain short-term evolutionary mutations for longitudinal metagenomic data

      First page: btad302
      Abstract: Motivation: As the resolution of metagenomic analysis increases, the evolution of microbial genomes in longitudinal metagenomic data has become a research focus. Some software has been developed for simulating complex microbial communities at the strain level. However, a tool for simulating within-strain evolutionary signals in longitudinal samples is still lacking. Results: In this study, we introduce STEMSIM, a user-friendly command-line simulator of short-term evolutionary mutations for longitudinal metagenomic data. The input is simulated longitudinal raw sequencing reads of microbial communities or single species. The output is the modified reads with within-strain evolutionary mutations and the relevant information about these mutations. STEMSIM will be of great use for evaluating analytic tools that detect short-term evolutionary mutations in metagenomic data. Availability and implementation: STEMSIM and its tutorial are freely available online at https://github.com/BoyanZhou/STEMSim.
      PubDate: Mon, 08 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad302
      Issue No: Vol. 39, No. 5 (2023)
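The abstract does not detail STEMSIM's interface. To illustrate the basic idea of writing a within-strain mutation into simulated reads, here is a minimal Python sketch that injects a single-nucleotide variant into the reads covering a given reference position, with a chosen per-read probability. It assumes gapless reads with known 0-based reference start positions; the function name and the read representation are hypothetical and are not STEMSIM's input format.

```python
import random
from typing import List, Tuple

Read = Tuple[int, str]  # (0-based reference start, sequence); assumes a gapless alignment


def inject_snv(reads: List[Read], pos: int, alt: str, freq: float,
               seed: int = 0) -> Tuple[List[Read], int]:
    """Write alt at reference position pos into each covering read with probability freq.

    Returns the modified reads and the number of reads that were mutated.
    """
    rng = random.Random(seed)
    out: List[Read] = []
    mutated = 0
    for start, seq in reads:
        offset = pos - start
        # Only reads that cover pos are candidates; each is mutated with probability freq.
        if 0 <= offset < len(seq) and rng.random() < freq:
            seq = seq[:offset] + alt + seq[offset + 1:]
            mutated += 1
        out.append((start, seq))
    return out, mutated
```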
       
  • Atomic protein structure refinement using all-atom graph representations and SE(3)-equivariant graph transformer

      First page: btad298
      Abstract: Motivation: State-of-the-art protein structure prediction methods such as AlphaFold are widely used to predict structures of uncharacterized proteins in biomedical research. There is a significant need to further improve the quality and nativeness of the predicted structures to enhance their usability. In this work, we develop ATOMRefine, a deep-learning-based, end-to-end, all-atom protein structural model refinement method. It uses an SE(3)-equivariant graph transformer network to directly refine protein atomic coordinates in a predicted tertiary structure represented as a molecular graph. Results: The method is first trained and tested on the structural models in AlphaFoldDB whose experimental structures are known, and then blindly tested on 69 CASP14 regular targets and 7 CASP14 refinement targets. ATOMRefine improves the quality of both backbone atoms and all-atom conformation of the initial structural models generated by AlphaFold. It also performs better than two state-of-the-art refinement methods on multiple evaluation metrics, including an all-atom model quality score: the MolProbity score, based on the analysis of all-atom contacts, bond lengths, atom clashes, torsion angles, and side-chain rotamers. Because ATOMRefine can refine a protein structure quickly, it provides a viable, fast solution for improving protein geometry and fixing structural errors of predicted structures through direct coordinate refinement. Availability and implementation: The source code of ATOMRefine is available in the GitHub repository (https://github.com/BioinfoMachineLearning/ATOMRefine). All the required data for training and testing are available at https://doi.org/10.5281/zenodo.6944368.
      PubDate: Fri, 05 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad298
      Issue No: Vol. 39, No. 5 (2023)
       
  • ROptimus: a parallel general-purpose adaptive optimization engine

      First page: btad292
      Abstract: Motivation: Various computational biology calculations require a probabilistic optimization protocol to determine the parameters that capture the system at a desired state in the configurational space. Many existing methods excel in certain scenarios but fail in others, due in part to inefficient exploration of the parameter space and a tendency to become trapped in local minima. Here, we developed a general-purpose optimization engine in R that can be plugged into any modelling initiative, simple or complex, through a few lucid interfacing functions, to perform seamless optimization with rigorous parameter sampling. Results: ROptimus features simulated annealing and replica exchange implementations equipped with adaptive thermoregulation to drive the Monte Carlo optimization process flexibly, through constrained acceptance frequency but unconstrained adaptive pseudo-temperature regimens. We exemplify the applicability of our R optimizer to a diverse set of problems spanning data analyses and computational biology tasks. Availability and implementation: ROptimus is written and implemented in R, and is freely available from CRAN (http://cran.r-project.org/web/packages/ROptimus/index.html) and GitHub (http://github.com/SahakyanLab/ROptimus).
      PubDate: Thu, 04 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad292
      Issue No: Vol. 39, No. 5 (2023)
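ROptimus itself is an R package; the Python sketch below only illustrates the general idea of adaptive thermoregulation described in the abstract: a Metropolis Monte Carlo loop whose pseudo-temperature is nudged up or down so that the observed acceptance frequency tracks a target. The target rate, window size, scaling factor, and function name are illustrative assumptions, not ROptimus's interface or defaults.

```python
import math
import random

def adaptive_anneal(objective, x0, step=0.5, n_iter=20000, target_acc=0.3,
                    window=100, factor=1.1, temp=1.0, seed=0):
    """Minimize a 1-D `objective` with Metropolis moves whose pseudo-temperature is
    rescaled every `window` steps to keep the acceptance rate near `target_acc`."""
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    accepted = 0
    for it in range(1, n_iter + 1):
        cand = x + rng.gauss(0.0, step)
        fc = objective(cand)
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            accepted += 1
            if fx < best_f:
                best_x, best_f = x, fx
        if it % window == 0:
            rate = accepted / window
            # Too few acceptances -> heat up; too many -> cool down.
            temp = temp * factor if rate < target_acc else temp / factor
            accepted = 0
    return best_x, best_f

best_x, best_f = adaptive_anneal(lambda x: (x - 3.0) ** 2, x0=-5.0)
print(round(best_x, 2))
```

The acceptance frequency is the constrained quantity while the temperature is free to drift, which mirrors the "constrained acceptance frequency but unconstrained adaptive pseudo-temperature" wording; replica exchange is not shown.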
       
  • HAMPLE: deciphering TF-DNA binding mechanism in different cellular
           environments by characterizing higher-order nucleotide dependency

      First page: btad299
      Abstract: Motivation: Transcription factors (TFs) bind to conserved DNA binding sites in different cellular environments and developmental stages through physical interaction with interdependent nucleotides. However, systematic computational characterization of the relationship between higher-order nucleotide dependency and the TF-DNA binding mechanism in diverse cell types remains challenging. Results: Here, we propose HAMPLE, a novel multi-task learning framework that simultaneously predicts TF binding sites (TFBSs) in distinct cell types by characterizing higher-order nucleotide dependencies. Specifically, HAMPLE first represents a DNA sequence through three higher-order nucleotide dependencies: k-mer encoding, DNA shape, and histone modification. Then, HAMPLE uses a customized gate control and a channel attention convolutional architecture to further capture cell-type-specific and cell-type-shared DNA binding motifs and epigenomic languages. Finally, HAMPLE exploits a joint loss function to optimize TFBS prediction for different cell types in an end-to-end manner. Extensive experimental results on seven datasets demonstrate that HAMPLE significantly outperforms state-of-the-art approaches in terms of auROC. In addition, feature importance analysis illustrates that k-mer encoding, DNA shape, and histone modification have predictive power for TF-DNA binding in different cellular environments and are complementary to each other. Furthermore, ablation studies and interpretability analyses validate the effectiveness of the customized gate control and the channel attention convolutional architecture in characterizing higher-order nucleotide dependencies. Availability and implementation: The source code is available at https://github.com/ZhangLab312/Hample.
      PubDate: Thu, 04 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad299
      Issue No: Vol. 39, No. 5 (2023)
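HAMPLE's first input view is a k-mer encoding of the DNA sequence. The abstract does not specify the exact encoding, so the sketch below shows one common variant under that assumption: a sliding window that turns each position into a one-hot vector over all 4^k possible k-mers, with k = 3 chosen purely for illustration. Function names are hypothetical.

```python
from itertools import product

def kmer_vocab(k: int) -> dict[str, int]:
    """Map every DNA k-mer (4**k of them) to a column index in lexicographic order."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}

def kmer_one_hot(seq: str, k: int = 3) -> list[list[int]]:
    """Slide a window of width k along `seq`; each row is a one-hot vector of
    length 4**k marking the k-mer that starts at that position."""
    vocab = kmer_vocab(k)
    rows = []
    for i in range(len(seq) - k + 1):
        row = [0] * len(vocab)
        kmer = seq[i:i + k].upper()
        if kmer in vocab:  # windows containing N or other symbols stay all-zero
            row[vocab[kmer]] = 1
        rows.append(row)
    return rows

enc = kmer_one_hot("ACGTAC", k=3)
print(len(enc), len(enc[0]))  # 4 windows, 64 possible 3-mers
```

The resulting position-by-k-mer matrix is the kind of input a convolutional branch can scan; the DNA-shape and histone-modification views would be separate channels.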
       
  • ppBAM: ProteinPaint BAM track for read alignment visualization and variant
           genotyping

      First page: btad300
      Abstract: Summary: ProteinPaint BAM track (ppBAM) is designed to assist variant review for cancer research and clinical genomics. With performant server-side computing and rendering, ppBAM supports on-the-fly variant genotyping of thousands of reads using Smith–Waterman alignment. To better visualize support for complex variants, reads are realigned against the mutated reference sequence using ClustalO. ppBAM also supports the BAM slicing API of the NCI Genomic Data Commons (GDC) portal, letting researchers conveniently examine genomic details of vast amounts of cancer sequencing data and reinterpret variant calls. Availability and implementation: BAM track examples, tutorial, and GDC file access links are available at https://proteinpaint.stjude.org/bam/. Source code is available at https://github.com/stjude/proteinpaint.
      PubDate: Thu, 04 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad300
      Issue No: Vol. 39, No. 5 (2023)
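ppBAM genotypes reads on the fly with Smith–Waterman alignment. To make that step concrete, here is a minimal score-only Smith–Waterman sketch with a linear gap penalty, plus a toy rule that assigns a read to whichever haplotype it aligns to better. The scoring values (match +2, mismatch -1, gap -2), the haplotype-comparison rule, and the function names are illustrative assumptions, not ppBAM's parameters or code.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero so an alignment can restart anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def genotype_read(read: str, ref_hap: str, alt_hap: str) -> str:
    """Assign a read to the haplotype it aligns to better; ties are 'ambiguous'."""
    s_ref, s_alt = smith_waterman(read, ref_hap), smith_waterman(read, alt_hap)
    if s_ref == s_alt:
        return "ambiguous"
    return "ref" if s_ref > s_alt else "alt"

ref = "ACGTACGTTAGC"
alt = "ACGTACCTTAGC"  # hypothetical SNV: G -> C at index 6
print(genotype_read("TACCTTA", ref, alt))
```

Only the score is computed; recovering the alignment itself would need a traceback over the full matrix instead of the two rolling rows used here.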
       
  • Kimma: flexible linear mixed effects modeling with kinship covariance for
           RNA-seq data

      First page: btad279
      Abstract: Motivation: The identification of differentially expressed genes (DEGs) from transcriptomic datasets is a major avenue of research across diverse disciplines. However, current bioinformatic tools do not support covariance matrices in DEG modeling. Here, we introduce kimma (Kinship In Mixed Model Analysis), an open-source R package for flexible linear mixed effects modeling including covariates, weights, random effects, covariance matrices, and fit metrics. Results: In simulated datasets, kimma detects DEGs with specificity, sensitivity, and computational time similar to those of limma unpaired and dream paired models. Unlike other software, kimma supports covariance matrices as well as fit metrics such as the Akaike information criterion (AIC). Using genetic kinship covariance, kimma revealed that kinship impacts model fit and DEG detection in a related cohort. Thus, kimma equals or outcompetes current DEG pipelines in sensitivity, computational time, and model complexity. Availability and implementation: Kimma is freely available on GitHub (https://github.com/BIGslu/kimma) with an instructional vignette at https://bigslu.github.io/kimma_vignette/kimma_vignette.html.
      PubDate: Thu, 04 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad279
      Issue No: Vol. 39, No. 5 (2023)
       
  • PascalX: a Python library for GWAS gene and pathway enrichment tests

      First page: btad296
      Abstract: Summary: ‘PascalX’ is a Python library providing fast and accurate tools for mapping SNP-wise GWAS summary statistics. Specifically, it allows for scoring genes and annotated gene sets for enrichment signals based on data from both single GWAS and pairs of GWAS. The gene scores take into account the correlation pattern between SNPs. They are based on the cumulative distribution function of a linear combination of χ2 distributed random variables, which can be calculated either approximately or exactly to high precision. Acceleration via multithreading and GPU is supported. The code of PascalX is fully open source and well suited as a base for method development in the GWAS enrichment test context. Availability and implementation: The source code is available at https://github.com/BergmannLab/PascalX and archived under doi://10.5281/zenodo.4429922. A user manual with usage examples is available at https://bergmannlab.github.io/PascalX/.
      PubDate: Wed, 03 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad296
      Issue No: Vol. 39, No. 5 (2023)
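PascalX refers a gene's summed SNP χ² statistic to the distribution of a linear combination of χ² variables, Q = Σ λ_i χ²_1. In Pascal-style sum statistics the weights λ_i are typically the eigenvalues of the SNPs' LD correlation matrix; here they are simply taken as given inputs. PascalX evaluates that distribution approximately or exactly; the sketch below deliberately swaps in plain Monte Carlo simulation, which is far slower and less precise, only to make the tail probability concrete. The function name is hypothetical.

```python
import math
import random

def tail_prob_mc(q: float, lambdas: list[float], n_sim: int = 200_000, seed: int = 1) -> float:
    """Estimate P(sum_i lambda_i * Z_i**2 >= q) with Z_i ~ N(0, 1) i.i.d.,
    i.e. the upper tail of a weighted sum of chi-square(1) variables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        stat = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lambdas)
        if stat >= q:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_sim + 1)

# Sanity check: two unit weights give a chi-square(2) variable, whose tail is exp(-q/2).
print(tail_prob_mc(4.0, [1.0, 1.0], n_sim=50_000), math.exp(-2.0))
```

Monte Carlo resolution is limited to roughly 1/n_sim, which is exactly why exact or high-precision numerical methods matter for the very small p-values typical of GWAS gene scores.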
       
  • pyGOMoDo: GPCRs modeling and docking with python

      First page: btad294
      Abstract: Motivation: We present pyGOMoDo, a Python library to perform homology modeling and docking, specifically designed for human GPCRs. pyGOMoDo is a Python wrapper for the updated functionalities of the GOMoDo web server (https://molsim.sci.univr.it/gomodo). It was developed with usage through Jupyter notebooks in mind, where users can create their own protocols for modeling and docking GPCRs. In this article, we focus on the internal structure and general capabilities of pyGOMoDo and on how it can be useful for carrying out structural biology studies of GPCRs. Results: The source code is freely available at https://github.com/rribeiro-sci/pygomodo under the Apache 2.0 license. Tutorial notebooks containing minimal working examples can be found at https://github.com/rribeiro-sci/pygomodo/tree/main/examples.
      PubDate: Wed, 03 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad294
      Issue No: Vol. 39, No. 5 (2023)
       
  • Signed Distance Correlation (SiDCo): an online implementation of distance
           correlation and partial distance correlation for data-driven network
           analysis

      First page: btad210
      Abstract: Motivation: There is a need for easily accessible implementations that measure the strength of both linear and non-linear relationships between metabolites in biological systems as an approach for data-driven network development. While multiple tools implement linear Pearson and Spearman methods, no such tools assess distance correlation. Results: We present here SIgned Distance COrrelation (SiDCo). SiDCo is a GUI platform for calculating distance correlation in omics data, measuring linear and non-linear dependencies between variables, as well as correlation between vectors of different lengths, e.g. different sample sizes. By combining the sign of the overall trend from Pearson’s correlation with distance correlation values, we further provide a novel “signed distance correlation” of particular use in metabolomic and lipidomic analyses. Distance correlations can be selected as one-to-one or one-to-all correlations, showing relationships between each feature and all other features one at a time or in combination. Additionally, we implement “partial distance correlation,” calculated using the Gaussian graphical model approach adapted to distance covariance. Our platform provides an easy-to-use software implementation that can be applied to the investigation of any dataset. Availability and implementation: The SiDCo software application is freely available at https://complimet.ca/sidco. Supplementary help pages are provided at https://complimet.ca/sidco. Supplementary material: The Supplementary Material shows an example of an application of SiDCo in metabolomics.
      PubDate: Wed, 03 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad210
      Issue No: Vol. 39, No. 5 (2023)
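For readers unfamiliar with distance correlation, the sketch below computes the sample distance correlation of Székely and co-authors (biased V-statistic form) from double-centred distance matrices of two 1-D variables, then attaches the sign of Pearson's r, which is how the abstract describes SiDCo's "signed distance correlation". It is not SiDCo's implementation, the function names are hypothetical, and partial distance correlation is not covered.

```python
import math

def _centered_dist(x: list[float]) -> list[list[float]]:
    """|x_i - x_j| matrix, double-centred: subtract row and column means, add the grand mean."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # d is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def distance_correlation(x: list[float], y: list[float]) -> float:
    n = len(x)
    a, b = _centered_dist(x), _centered_dist(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for r in a for v in r) / n**2
    dvar_y = sum(v * v for r in b for v in r) / n**2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))

def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signed_distance_correlation(x: list[float], y: list[float]) -> float:
    """Distance correlation carrying the sign of the overall linear (Pearson) trend."""
    return math.copysign(distance_correlation(x, y), pearson(x, y))

x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]  # purely non-linear (quadratic) dependence
print(round(pearson(x, y), 3), round(distance_correlation(x, y), 3))
```

The quadratic example shows the motivation for the method: Pearson's r is zero, yet distance correlation is clearly positive because the variables are dependent.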
       
  • PepGM: a probabilistic graphical model for taxonomic inference of viral
           proteome samples with associated confidence scores

      First page: btad289
      Abstract: Motivation: Inferring taxonomy in mass spectrometry-based shotgun proteomics is a complex task. In multi-species or viral samples of unknown taxonomic origin, the presence of proteins and corresponding taxa must be inferred from a list of identified peptides, which is often complicated by protein homology: many proteins share peptides not only within a taxon but also between taxa. However, correct taxonomic inference is crucial when identifying different viral strains with high sequence homology, considering, e.g., the different epidemiological characteristics of the various strains of severe acute respiratory syndrome coronavirus 2. Additionally, many viruses mutate frequently, further complicating the correct identification of viral proteomic samples. Results: We present PepGM, a probabilistic graphical model for the taxonomic assignment of virus proteomic samples with strain-level resolution and associated confidence scores. PepGM combines the results of a standard proteomic database search algorithm with belief propagation to calculate the marginal distributions, and thus confidence scores, for potential taxonomic assignments. We demonstrate the performance of PepGM using several publicly available virus proteomic datasets, showing its strain-level resolution performance. In two out of eight cases, the taxonomic assignments were correct only at the species level, which PepGM clearly indicates by lower confidence scores. Availability and implementation: PepGM is written in Python and embedded into a Snakemake workflow. It is available at https://github.com/BAMeScience/PepGM.
      PubDate: Tue, 02 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad289
      Issue No: Vol. 39, No. 5 (2023)
       
  • BUSZ: compressed BUS files

      First page: btad295
      Abstract: Summary: We describe a compression scheme for BUS files and an implementation of the algorithm in the BUStools software. Our compression algorithm yields smaller file sizes than gzip, at significantly faster compression and decompression speeds. We evaluated our algorithm on 533 BUS files from scRNA-seq experiments with a total size of 1 TB. Our compression is 2.2× faster than the fastest gzip option and 35% slower than the fastest zstd option, and it yields files 1.5× smaller than both methods. This amounts to an 8.3× reduction in file size, resulting in a compressed size of 122 GB for the dataset. Availability and implementation: A complete description of the format is available at https://github.com/BUStools/BUSZ-format and an implementation at https://github.com/BUStools/bustools. The code to reproduce the results of this article is available at https://github.com/pmelsted/BUSZ_paper.
      PubDate: Tue, 02 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad295
      Issue No: Vol. 39, No. 5 (2023)
       
  • VirPipe: an easy-to-use and customizable pipeline for detecting viral
           genomes from Nanopore sequencing

      First page: btad293
      Abstract: Summary: Detection and analysis of viral genomes with Nanopore sequencing have shown great promise in the surveillance of pathogen outbreaks. However, the number of virus detection pipelines supporting Nanopore sequencing is very limited. Here, we present VirPipe, a new pipeline for the detection of viral genomes from Nanopore or Illumina sequencing input, featuring streamlined installation and customization. Availability and implementation: VirPipe is implemented in Python and Nextflow; its source code and documentation are freely available for download at https://github.com/KijinKims/VirPipe.
      PubDate: Tue, 02 May 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad293
      Issue No: Vol. 39, No. 5 (2023)
       
  • Predicting allosteric pockets in protein biological assemblages

      First page: btad275
      Abstract: Motivation: Allostery enables binding-induced changes in the dynamic behavior of a protein at distant positions. Here, we present APOP, a new allosteric pocket prediction method, which perturbs the pockets formed in the structure by stiffening pairwise interactions in the elastic network across the pocket, to emulate ligand binding. Ranking the pockets based on the shifts in the global mode frequencies, as well as their mean local hydrophobicities, leads to high prediction success when tested on a dataset of allosteric proteins composed of both monomers and multimeric assemblages. Results: For 92 of the 104 test cases, APOP ranks known allosteric pockets within the top 3 of the multiple pockets available in the protein. In addition, we demonstrate that APOP can also find new alternative allosteric pockets in proteins. Particularly interesting findings are the discovery of previously overlooked large pockets located in the centers of many protein biological assemblages; binding of ligands at these sites would likely be particularly effective in changing the protein’s global dynamics. Availability and implementation: APOP is freely available as open-source code (https://github.com/Ambuj-UF/APOP) and as a web server at https://apop.bb.iastate.edu/.
      PubDate: Fri, 28 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad275
      Issue No: Vol. 39, No. 5 (2023)
       
  • CNV-ClinViewer: enhancing the clinical interpretation of large copy-number
           variants online

      First page: btad290
      Abstract: Motivation: Pathogenic copy-number variants (CNVs) can cause a heterogeneous spectrum of rare and severe disorders. However, most CNVs are benign and are part of natural variation in human genomes. CNV pathogenicity classification, genotype–phenotype analyses, and therapeutic target identification are challenging and time-consuming tasks that require experts to integrate and analyze information from multiple scattered sources. Results: Here, we introduce the CNV-ClinViewer, an open-source web application for clinical evaluation and visual exploration of CNVs. The application enables real-time interactive exploration of large CNV datasets in a user-friendly interface and facilitates semi-automated clinical CNV interpretation following the ACMG guidelines by integrating the ClassifCNV tool. In combination with clinical judgment, the application enables clinicians and researchers to formulate novel hypotheses and guide their decision-making process. Consequently, the CNV-ClinViewer enhances clinical investigators’ patient care and basic scientists’ translational genomic research. Availability and implementation: The web application is freely available at https://cnv-ClinViewer.broadinstitute.org and the open-source code can be found at https://github.com/LalResearchGroup/CNV-clinviewer.
      PubDate: Thu, 27 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad290
      Issue No: Vol. 39, No. 5 (2023)
       
  • A maximum kernel-based association test to detect the pleiotropic genetic
           effects on multiple phenotypes

      First page: btad291
      Abstract: Motivation: Testing the association between multiple phenotypes and a set of genetic variants simultaneously, rather than analyzing one trait at a time, is receiving increasing attention for its high statistical power and easy interpretation of pleiotropic effects. The kernel-based association test (KAT), which is not restricted by data dimensions or structures, has proven to be a good alternative method for genetic association analysis with multiple phenotypes. However, KAT suffers from substantial power loss when multiple phenotypes have moderate to strong correlations. To handle this issue, we propose a maximum KAT (MaxKAT) and suggest using the generalized extreme value distribution to calculate its statistical significance under the null hypothesis. Results: We show that MaxKAT greatly reduces computational intensity while maintaining high accuracy. Extensive simulations demonstrate that MaxKAT can properly control type I error rates and obtain remarkably higher power than KAT in most of the considered scenarios. Application to a porcine dataset used in biomedical experiments on human disease further illustrates its practical utility. Availability and implementation: The R package MaxKAT, which implements the proposed method, is available on GitHub (https://github.com/WangJJ-xrk/MaxKAT).
      PubDate: Thu, 27 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad291
      Issue No: Vol. 39, No. 5 (2023)
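MaxKAT refers its maximum statistic to a generalized extreme value (GEV) distribution. Given a location mu, scale sigma and shape xi (how MaxKAT fits them is not described in the abstract, so they are plain inputs here), the p-value is the GEV upper tail 1 - F(x), where F(x) = exp(-[1 + xi(x - mu)/sigma]^(-1/xi)) for xi != 0 and the Gumbel limit F(x) = exp(-exp(-(x - mu)/sigma)) for xi = 0. A minimal sketch with a hypothetical function name, not the MaxKAT R package's API:

```python
import math

def gev_pvalue(x: float, mu: float, sigma: float, xi: float) -> float:
    """Upper-tail probability P(X >= x) for X ~ GEV(mu, sigma, xi)
    (convention: xi > 0 is the heavy-tailed Frechet case)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:  # Gumbel limit
        cdf = math.exp(-math.exp(-z))
    else:
        t = 1.0 + xi * z
        if t <= 0:
            # Outside the support: below the lower end point (xi > 0),
            # or above the upper end point (xi < 0).
            return 1.0 if xi > 0 else 0.0
        cdf = math.exp(-t ** (-1.0 / xi))
    return 1.0 - cdf

print(round(gev_pvalue(0.0, 0.0, 1.0, 0.0), 4))
```

Note that some libraries parameterize the shape with the opposite sign (for example SciPy's `genextreme` uses c = -xi), so fitted parameters should be checked against the convention used here.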
       
  • DeepMicroGen: a generative adversarial network-based method for
           longitudinal microbiome data imputation

      First page: btad286
      Abstract: Motivation: The human microbiome, which growing evidence links to various diseases, has a profound impact on human health. Since changes in the composition of the microbiome over time are associated with disease and clinical outcomes, microbiome analysis should be performed in a longitudinal study. However, due to limited sample sizes and differing numbers of timepoints for different subjects, a significant amount of data cannot be utilized, directly affecting the quality of analysis results. Deep generative models have been proposed to address this lack of data. Specifically, a generative adversarial network (GAN) has been successfully utilized for data augmentation to improve prediction tasks. Recent studies have also shown improved performance of GAN-based models for missing value imputation in multivariate time series datasets compared with traditional imputation methods. Results: This work proposes DeepMicroGen, a bidirectional recurrent neural network-based GAN model, trained on the temporal relationship between the observations, to impute the missing microbiome samples in longitudinal studies. DeepMicroGen outperforms standard baseline imputation methods, showing the lowest mean absolute error for both simulated and real datasets. Finally, the proposed model improved the prediction of the clinical outcome for allergies by imputing an incomplete longitudinal dataset used to train the classifier. Availability and implementation: DeepMicroGen is publicly available at https://github.com/joungmin-choi/DeepMicroGen.
      PubDate: Wed, 26 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad286
      Issue No: Vol. 39, No. 5 (2023)
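DeepMicroGen is compared with standard baseline imputation methods by mean absolute error (MAE). The sketch below is only a hypothetical baseline of that kind, not DeepMicroGen and not necessarily one of the paper's baselines: it fills missing time points of one taxon's abundance series by linear interpolation between the nearest observed neighbours (copying the edge value for leading or trailing gaps) and scores the result by MAE on held-out positions.

```python
from typing import Optional

def interpolate_missing(series: list[Optional[float]]) -> list[float]:
    """Linearly interpolate None entries between observed neighbours;
    leading/trailing gaps copy the nearest observed value."""
    obs = [i for i, v in enumerate(series) if v is not None]
    if not obs:
        raise ValueError("series has no observed values")
    out = []
    for i, v in enumerate(series):
        if v is not None:
            out.append(float(v))
            continue
        left = max((j for j in obs if j < i), default=None)
        right = min((j for j in obs if j > i), default=None)
        if left is None:
            out.append(float(series[right]))
        elif right is None:
            out.append(float(series[left]))
        else:
            w = (i - left) / (right - left)
            out.append((1 - w) * series[left] + w * series[right])
    return out

def mae(truth: list[float], pred: list[float], mask: list[int]) -> float:
    """Mean absolute error restricted to the held-out positions in `mask`."""
    return sum(abs(truth[i] - pred[i]) for i in mask) / len(mask)

truth = [0.1, 0.2, 0.3, 0.4, 0.5]
filled = interpolate_missing([0.1, None, 0.3, None, None])
print(round(mae(truth, filled, [1, 3, 4]), 3))
```

Such a baseline ignores both cross-taxon structure and cross-subject information, which is the gap a learned generative model like DeepMicroGen aims to fill.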
       
  • twas_sim, a Python-based tool for simulation and power analysis of
           transcriptome-wide association analysis

      First page: btad288
      Abstract: Summary: Genome-wide association studies (GWASs) have identified numerous genetic variants associated with complex disease risk; however, most of these associations are non-coding, which complicates the identification of their proximal target genes. Transcriptome-wide association studies (TWASs) have been proposed to bridge this gap by integrating expression quantitative trait loci (eQTL) data with GWAS data. Numerous methodological advancements have been made for TWAS, yet each approach requires ad hoc simulations to demonstrate feasibility. Here, we present twas_sim, a computationally scalable and easily extendable tool for simplified performance evaluation and power analysis of TWAS methods. Availability and implementation: Software and documentation are available at https://github.com/mancusolab/twas_sim.
      PubDate: Wed, 26 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad288
      Issue No: Vol. 39, No. 5 (2023)
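A common summary-statistic form of the TWAS test (as used by FUSION) combines eQTL weights w with GWAS z-scores z and the LD matrix R into z_TWAS = wᵀz / sqrt(wᵀRw). The sketch below computes that statistic for given inputs; it is not twas_sim's code, which additionally simulates genotypes, expression and phenotypes to evaluate power, and the function name is hypothetical.

```python
import math

def twas_z(weights: list[float], gwas_z: list[float], ld: list[list[float]]) -> float:
    """z_TWAS = w^T z / sqrt(w^T R w) for SNP weights w, GWAS z-scores z and LD matrix R."""
    num = sum(w * z for w, z in zip(weights, gwas_z))
    var = sum(weights[i] * ld[i][j] * weights[j]
              for i in range(len(weights)) for j in range(len(weights)))
    if var <= 0:
        raise ValueError("w^T R w must be positive")
    return num / math.sqrt(var)

w, z = [1.0, 1.0], [2.0, 2.0]
print(round(twas_z(w, z, [[1.0, 0.0], [0.0, 1.0]]), 3))  # independent SNPs
print(round(twas_z(w, z, [[1.0, 1.0], [1.0, 1.0]]), 3))  # perfect LD
```

The two calls show why the LD term matters: the same weights and z-scores give a weaker statistic when the SNPs are perfectly correlated, because their signals are then redundant.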
       
  • Fixing molecular complexes in BioPAX standards to enrich interactions and
           detect redundancies using semantic web technologies

      First page: btad257
      Abstract: Motivation: Molecular complexes play a major role in the regulation of biological pathways. The Biological Pathway Exchange format (BioPAX) facilitates the integration of data sources describing interactions, some of which involve complexes. The BioPAX specification explicitly prevents complexes from having any component that is another complex (unless this component is a black-box complex whose composition is unknown). However, we observed that the well-curated Reactome pathway database contains such recursive complexes of complexes. We propose reproducible and semantically rich SPARQL queries for identifying and fixing invalid complexes in BioPAX databases, and evaluate the consequences of fixing these nonconformities in the Reactome database. Results: For the Homo sapiens version of Reactome, we identify 5833 recursively defined complexes out of 14 987 complexes (39%). This situation is not specific to the human dataset, as all tested Reactome species have between 30% (Plasmodium falciparum) and 40% (Sus scrofa, Bos taurus, Canis familiaris, and Gallus gallus) recursive complexes. As an additional consequence, the procedure also allows the detection of complex redundancies. Overall, this method improves the conformity and the automated analysis of the graph by repairing the topology of the complexes in the graph. This will allow further reasoning methods to be applied to more consistent data. Availability and implementation: We provide a Jupyter notebook detailing the analysis at https://github.com/cjuigne/non_conformities_detection_biopax.
      PubDate: Tue, 25 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad257
      Issue No: Vol. 39, No. 5 (2023)
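The authors detect and fix these complexes with SPARQL queries over the BioPAX RDF graph. Independently of SPARQL, the repair can be pictured as flattening: each component that is itself a (non-black-box) complex is replaced by that complex's own leaf components, and complexes whose flattened component sets coincide become candidate redundancies. The dictionary representation, IDs and function names below are illustrative assumptions, not the authors' notebook, and stoichiometry is ignored.

```python
from collections import defaultdict

def flatten(cid: str, components: dict[str, list[str]]) -> frozenset[str]:
    """Leaf entities reachable from complex `cid`. `components` maps each complex
    to its direct components; IDs missing from the map are leaf entities, and
    complexes with an empty list are black boxes kept as leaves (allowed by BioPAX)."""
    leaves, stack, visited = set(), [cid], set()
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # guards against cycles and shared sub-complexes
        visited.add(node)
        for part in components.get(node, []):
            if components.get(part):  # non-empty composition: a nested complex
                stack.append(part)
            else:
                leaves.add(part)
    return frozenset(leaves)

def recursive_complexes(components: dict[str, list[str]]) -> list[str]:
    """Complexes that break the rule: some component is a non-black-box complex."""
    return sorted(c for c, parts in components.items()
                  if any(components.get(p) for p in parts))

def redundant_groups(components: dict[str, list[str]]) -> list[list[str]]:
    """Groups of complexes whose flattened leaf sets are identical."""
    groups = defaultdict(list)
    for c in components:
        groups[flatten(c, components)].append(c)
    return [sorted(g) for g in groups.values() if len(g) > 1]

components = {
    "C1": ["P1", "C2"],        # complex of complexes: non-conformant
    "C2": ["P2", "P3"],
    "C3": ["P1", "P2", "P3"],  # same leaves as C1 once flattened
    "C4": ["P4", "BB"],
    "BB": [],                  # black-box complex, composition unknown
}
print(recursive_complexes(components), redundant_groups(components))
```
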
       
  • pyInfinityFlow: optimized imputation and analysis of high-dimensional flow
           cytometry data for millions of cells

      First page: btad287
      Abstract: Motivation: While conventional flow cytometry is limited to dozens of markers, new experimental and computational strategies, such as Infinity Flow, allow for the generation and imputation of hundreds of cell surface protein markers in millions of cells. Here, we describe an end-to-end analysis workflow for Infinity Flow data in Python. Results: pyInfinityFlow enables the efficient analysis of millions of cells, without down-sampling, through direct integration with well-established Python packages for single-cell genomics analysis. pyInfinityFlow accurately identifies both common and extremely rare cell populations that are challenging to define from single-cell genomics studies alone. We demonstrate that this workflow can nominate novel markers to design new flow cytometry gating strategies for predicted cell populations. pyInfinityFlow can be extended to diverse cell discovery analyses and is flexible enough to adapt to diverse Infinity Flow experimental designs. Availability and implementation: pyInfinityFlow is freely available on GitHub (https://github.com/KyleFerchen/pyInfinityFlow) and on PyPI (https://pypi.org/project/pyInfinityFlow/). Package documentation with tutorials on a test dataset is available on Read the Docs (pyinfinityflow.readthedocs.io). The scripts and data for reproducing the results are available at https://github.com/KyleFerchen/pyInfinityFlow/tree/main/analysis_scripts, along with the raw flow cytometry input data.
      PubDate: Tue, 25 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad287
      Issue No: Vol. 39, No. 5 (2023)
       
  • epiTCR: a highly sensitive predictor for TCR–peptide binding

      First page: btad284
      Abstract: Motivation: Predicting the binding between T-cell receptors (TCRs) and peptides presented by human leucocyte antigen molecules is a highly challenging task and a key bottleneck in the development of immunotherapy. Existing prediction tools, despite exhibiting good performance on the datasets they were built with, suffer from low true positive rates when used to predict epitopes capable of eliciting T-cell responses in patients. Therefore, an improved tool for TCR–peptide prediction built upon a large dataset combining existing publicly available data is still needed. Results: We collected data from five public databases (IEDB, TBAdb, VDJdb, McPAS-TCR, and 10X) to form a dataset of >3 million TCR–peptide pairs, 3.27% of which were binding interactions. We proposed epiTCR, a Random Forest-based method dedicated to predicting TCR–peptide interactions. epiTCR used simple inputs, TCR CDR3β sequences and antigen sequences, which are encoded by flattened BLOSUM62. epiTCR achieved an area under the curve of 0.98 and a higher sensitivity (0.94) than other existing tools (NetTCR, Imrex, ATM-TCR, and pMTnet), while maintaining comparable prediction specificity (0.9). We identified seven epitopes that contributed to 98.67% of false positives predicted by epiTCR and exerted similar effects on other tools. We also demonstrated a considerable influence of peptide sequences on prediction, highlighting the need for more diverse peptides in a more balanced dataset. In conclusion, epiTCR is among the best-performing tools, thanks to its use of combined data from public sources, and its use will contribute to the quest to identify neoantigens for precision cancer immunotherapy. Availability and implementation: epiTCR is available on GitHub (https://github.com/ddiem-ri-4D/epiTCR).
      PubDate: Mon, 24 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad284
      Issue No: Vol. 39, No. 5 (2023)
       
  • DIGGER-Bac: prediction of seed regions for high-fidelity construction of
           synthetic small RNAs in bacteria

      First page: btad285
      Abstract: Summary: Synthetic small RNAs (sRNAs) are gaining increasing attention in the fields of synthetic biology and bioengineering for efficient post-transcriptional regulation of gene expression. However, the optimal design of synthetic sRNAs is challenging because alterations may impair function or cause off-target effects. Here, we introduce DIGGER-Bac, a toolbox for Design and Identification of seed regions for Golden Gate assembly and Expression of synthetic sRNAs in Bacteria. The SEEDling tool predicts optimal sRNA seed regions in combination with user-defined sRNA scaffolds for efficient regulation of specified mRNA targets. Results are passed on to the G-GArden tool, which assists with primer design for high-fidelity Golden Gate assembly of the desired synthetic sRNA constructs.
      PubDate: Sat, 22 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad285
      Issue No: Vol. 39, No. 5 (2023)
       
  • Digital PCR cluster predictor: a universal R-package and shiny app for the
           automated analysis of multiplex digital PCR data

      First page: btad282
      Abstract: Summary: Digital polymerase chain reaction (dPCR) is an emerging technology that enables accurate and sensitive quantification of nucleic acids. Most available dPCR systems have two-channel optics, with ad hoc software limited to the analysis of single and duplex assays. Although multiplexing strategies have been developed, variable assay designs, differing dPCR systems, and the analysis of low-DNA-input data have hindered a universal automated clustering approach. To overcome these issues, we developed dPCR Cluster Predictor (dPCP), an R package and a Shiny app for the automated analysis of up to 4-plex dPCR data. dPCP can analyse and visualize data generated by multiple dPCR systems, carrying out accurate and fast clustering that is not influenced by the amount and integrity of the input nucleic acids. With the companion Shiny app, the functionalities of dPCP can be accessed through a web browser.
      PubDate: Sat, 22 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad282
      Issue No: Vol. 39, No. 5 (2023)
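dPCP's clustering is more elaborate than fixed thresholds and its internals are not described in the abstract. To illustrate what cluster labels feed into, the sketch below assigns each partition of a duplex (two-channel) run to one of four clusters using hypothetical per-channel fluorescence thresholds, then applies the standard dPCR Poisson correction, lambda = -ln(1 - p), where p is the fraction of partitions positive for a target and lambda is the mean number of copies per partition. It is a simplified baseline with hypothetical names, not dPCP's algorithm.

```python
import math
from collections import Counter

def classify(partitions: list[tuple[float, float]], thr1: float, thr2: float) -> list[str]:
    """Label each (channel 1, channel 2) fluorescence pair by which channels exceed
    their thresholds: '--', '+-', '-+' or '++'."""
    return ["{}{}".format("+" if a > thr1 else "-", "+" if b > thr2 else "-")
            for a, b in partitions]

def copies_per_partition(labels: list[str], channel: int) -> float:
    """Poisson-corrected mean copies per partition for channel 0 or 1."""
    positives = sum(1 for lab in labels if lab[channel] == "+")
    p = positives / len(labels)
    if p >= 1.0:
        raise ValueError("all partitions positive: concentration not estimable")
    return -math.log(1.0 - p)

# Toy run: 100 partitions with well-separated fluorescence levels.
partitions = [(100, 100)] * 50 + [(900, 100)] * 30 + [(100, 900)] * 15 + [(900, 900)] * 5
labels = classify(partitions, thr1=500, thr2=500)
print(Counter(labels))
print(copies_per_partition(labels, 0), copies_per_partition(labels, 1))
```

Real data rarely separate this cleanly; rain between clusters, low input and partition dropout are exactly what make fixed thresholds fragile and motivate automated clustering.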
       
  • ICAT: a novel algorithm to robustly identify cell states following
           perturbations in single-cell transcriptomes

      First page: btad278
      Abstract: Motivation: The detection of distinct cellular identities is central to the analysis of single-cell RNA sequencing (scRNA-seq) experiments. However, in perturbation experiments, current methods typically fail to correctly match cell states between conditions or erroneously remove population substructure. Here, we present the novel, unsupervised algorithm Identify Cell states Across Treatments (ICAT), which employs self-supervised feature weighting and control-guided clustering to accurately resolve cell states across heterogeneous conditions. Results: Using simulated and real datasets, we show that ICAT is superior in identifying and resolving cell states compared with current integration workflows. While requiring no a priori knowledge of extant cell states or discriminatory marker genes, ICAT is robust to low signal strength, high perturbation severity, and disparate cell type proportions. We empirically validate ICAT in a developmental model and find that only ICAT identifies a perturbation-unique cellular response. Taken together, our results demonstrate that ICAT offers a significant improvement in defining cellular responses to perturbation in scRNA-seq data. Availability and implementation: https://github.com/BradhamLab/icat.
      PubDate: Sat, 22 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad278
      Issue No: Vol. 39, No. 5 (2023)
       
  • HOTSPOT: hierarchical host prediction for assembled plasmid contigs with
           transformer

      First page: btad283
      Abstract: Motivation: As prevalent extrachromosomal replicons in many bacteria, plasmids play an essential role in their hosts’ evolution and adaptation. The host range of a plasmid refers to the taxonomic range of bacteria in which it can replicate and thrive. Understanding the host ranges of plasmids sheds light on the roles of plasmids in bacterial evolution and adaptation. Metagenomic sequencing has become a major means to obtain new plasmids and derive their hosts. However, host prediction for assembled plasmid contigs still needs to tackle several challenges: different sequence compositions and copy numbers between plasmids and their hosts, high diversity in plasmids, and limited plasmid annotations. Existing tools have not yet achieved an ideal tradeoff between sensitivity and precision on metagenomic assembled contigs. Results: In this work, we construct a hierarchical classification tool named HOTSPOT, whose backbone is a phylogenetic tree of the bacterial hosts from phylum to species. By incorporating the state-of-the-art language model, Transformer, in each node’s taxon classifier, the top-down tree search achieves accurate host taxonomy prediction for the input plasmid contigs. We rigorously tested HOTSPOT on multiple datasets, including RefSeq complete plasmids, artificial contigs, simulated metagenomic data, mock metagenomic data, the Hi-C dataset, and the CAMI2 marine dataset. All experiments show that HOTSPOT outperforms other popular methods. Availability and implementation: The source code of HOTSPOT is available at https://github.com/Orin-beep/HOTSPOT.
      PubDate: Sat, 22 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad283
      Issue No: Vol. 39, No. 5 (2023)
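
      A top-down hierarchical classifier of the kind described for HOTSPOT can be pictured with a short Python sketch. The taxonomy, the GC-content "classifiers", and the `min_confidence` stopping rule are hypothetical stand-ins. HOTSPOT uses a trained Transformer at each node, and its actual stopping criterion is not given in the abstract. The sketch only shows how a prediction descends the tree one rank at a time.

```python
from typing import Callable, Dict, List

# A node classifier maps a contig sequence to a probability for each child taxon.
Classifier = Callable[[str], Dict[str, float]]


def top_down_predict(
    contig: str,
    classifiers: Dict[str, Classifier],
    root: str = "Bacteria",
    min_confidence: float = 0.5,
) -> List[str]:
    """Walk from the root towards the leaves, following the most probable child.

    The walk stops at a leaf (a taxon with no classifier) or as soon as the best
    child's probability drops below min_confidence, so uncertain contigs are
    assigned only down to the deepest rank that was predicted confidently.
    """
    path = [root]
    node = root
    while node in classifiers:
        probs = classifiers[node](contig)
        child, p = max(probs.items(), key=lambda kv: kv[1])
        if p < min_confidence:
            break
        path.append(child)
        node = child
    return path


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / max(len(seq), 1)


# Hypothetical toy classifiers: a GC-content rule at the root, fixed scores below it.
toy_classifiers: Dict[str, Classifier] = {
    "Bacteria": lambda s: (
        {"Proteobacteria": 0.9, "Firmicutes": 0.1}
        if gc_fraction(s) > 0.5
        else {"Proteobacteria": 0.2, "Firmicutes": 0.8}
    ),
    "Proteobacteria": lambda s: {"Gammaproteobacteria": 0.55, "Alphaproteobacteria": 0.45},
    "Gammaproteobacteria": lambda s: {
        "Enterobacterales": 0.4,
        "Pseudomonadales": 0.35,
        "Vibrionales": 0.25,
    },
}

if __name__ == "__main__":
    print(top_down_predict("GCGCGCATGC", toy_classifiers))
```

      For the GC-rich toy contig, the walk stops at the class level (Gammaproteobacteria) because no order-level child reaches the 0.5 threshold.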
       
  • Predicting the pathogenicity of missense variants using features derived
           from AlphaFold2

      First page: btad280
      Abstract:
        Motivation: Missense variants are a frequent class of variation within the coding genome, and some of them cause Mendelian diseases. Despite advances in computational prediction, classifying missense variants as pathogenic or benign remains a major challenge in the context of personalized medicine. Recently, the structure of the human proteome was derived with unprecedented accuracy using the artificial intelligence system AlphaFold2. This raises the question of whether AlphaFold2 wild-type structures can improve the accuracy of computational pathogenicity prediction for missense variants.
        Results: To address this, we first engineered a set of features for each amino acid from these structures. We then trained a random forest to distinguish between relatively common (proxy-benign) and singleton (proxy-pathogenic) missense variants from gnomAD v3.1. This yielded a novel AlphaFold2-based pathogenicity prediction score, termed AlphScore. Important feature classes used by AlphScore are solvent accessibility, amino acid network-related features, features describing the physicochemical environment, and AlphaFold2’s quality parameter (predicted local distance difference test). AlphScore alone showed lower performance than existing in silico scores used for missense prediction, such as CADD or REVEL. However, when AlphScore was added to those scores, the performance increased, as measured by the approximation of deep mutational scan data, as well as the prediction of expert-curated missense variants from the ClinVar database. Overall, our data indicate that the integration of AlphaFold2-predicted structures can improve pathogenicity prediction of missense variants.
        Availability and implementation: AlphScore, combinations of AlphScore with existing scores, as well as variants used for training and testing are publicly available.
      PubDate: Fri, 21 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad280
      Issue No: Vol. 39, No. 5 (2023)
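
      The proxy labelling used to train AlphScore can be sketched as follows. The `common_af` cutoff of 0.1% and the field names are assumptions for illustration, since the abstract says only "relatively common" versus "singleton". The resulting labels, paired with per-residue structural features, would be the training targets for a random forest. This is not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Variant:
    variant_id: str
    allele_count: int   # AC: copies of the alternate allele observed in the population
    allele_number: int  # AN: total number of alleles called at the site

    @property
    def allele_frequency(self) -> float:
        return self.allele_count / self.allele_number if self.allele_number else 0.0


def proxy_label(variant: Variant, common_af: float = 1e-3) -> Optional[int]:
    """Return 1 (proxy-pathogenic), 0 (proxy-benign) or None (not used for training).

    Singletons (AC == 1) are treated as proxy-pathogenic; variants at or above the
    hypothetical common_af cutoff are treated as proxy-benign.
    """
    if variant.allele_count == 1:
        return 1
    if variant.allele_frequency >= common_af:
        return 0
    return None


def build_training_labels(variants: List[Variant]) -> List[Tuple[str, int]]:
    """Keep only the variants that receive a proxy label."""
    labels = []
    for v in variants:
        label = proxy_label(v)
        if label is not None:
            labels.append((v.variant_id, label))
    return labels
```

      Rare but non-singleton variants fall between the two classes and are left out. This keeps the proxy labels as clean as the allele-frequency data allow.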
       
  • Effective design and inference for cell sorting and sequencing based
           massively parallel reporter assays

      First page: btad277
      Abstract:
        Motivation: The ability to measure the phenotype of millions of different genetic designs using Massively Parallel Reporter Assays (MPRAs) has revolutionized our understanding of genotype-to-phenotype relationships and opened avenues for data-centric approaches to biological design. However, our knowledge of how best to design these costly experiments, and of the effect that our choices have on the quality of the data produced, is lacking.
        Results: In this article, we tackle the issues of data quality and experimental design by developing FORECAST, a Python package that supports the accurate simulation of cell-sorting and sequencing-based MPRAs and robust maximum likelihood-based inference of genetic design function from MPRA data. We use FORECAST’s capabilities to reveal rules for MPRA experimental design that help ensure accurate genotype-to-phenotype links, and show how the simulation of MPRA experiments can help us better understand the limits of prediction accuracy when these data are used for training deep learning-based classifiers. As the scale and scope of MPRAs grow, tools like FORECAST will help ensure that we make informed decisions during their development and make the most of the data produced.
        Availability and implementation: The FORECAST package is available at https://gitlab.com/Pierre-Aurelien/forecast. Code for the deep learning analysis performed in this study is available at https://gitlab.com/Pierre-Aurelien/rebeca.
      PubDate: Fri, 21 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad277
      Issue No: Vol. 39, No. 5 (2023)
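
      Here is a minimal sketch of maximum-likelihood inference from cell-sorting MPRA data. It rests on assumptions that are ours, not necessarily FORECAST's model:

      - each design's log-fluorescence is normal with a known spread `sigma`;
      - read counts per bin are proportional to the number of sorted cells;
      - the gate boundaries `edges` are known.

      The sketch does not use FORECAST's API.

```python
import math
from typing import List, Sequence


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bin_probabilities(mu: float, sigma: float, edges: Sequence[float]) -> List[float]:
    """Probability that a cell's log-fluorescence lands in each sorting bin.

    `edges` are the interior gate boundaries on the log scale, so there are
    len(edges) + 1 bins and the outermost bins are open-ended.
    """
    cdfs = [0.0] + [normal_cdf((e - mu) / sigma) for e in edges] + [1.0]
    # Floor tiny probabilities so log() never sees zero.
    return [max(hi - lo, 1e-300) for lo, hi in zip(cdfs, cdfs[1:])]


def log_likelihood(mu: float, sigma: float, edges: Sequence[float], counts: Sequence[int]) -> float:
    """Multinomial log-likelihood of per-bin counts (constant terms dropped)."""
    probs = bin_probabilities(mu, sigma, edges)
    return sum(c * math.log(p) for c, p in zip(counts, probs))


def mle_mu(
    counts: Sequence[int],
    edges: Sequence[float],
    sigma: float = 1.0,
    lo: float = -5.0,
    hi: float = 10.0,
    steps: int = 3001,
) -> float:
    """Grid-search maximum-likelihood estimate of the mean log-fluorescence."""
    best_mu, best_ll = lo, -math.inf
    for i in range(steps):
        mu = lo + (hi - lo) * i / (steps - 1)
        ll = log_likelihood(mu, sigma, edges, counts)
        if ll > best_ll:
            best_mu, best_ll = mu, ll
    return best_mu
```

      A grid search keeps the example dependency-free. A real tool would more likely use a numerical optimizer and also estimate `sigma`.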
       
  • AcrNET: predicting anti-CRISPR with deep learning

      First page: btad259
      Abstract:
        Motivation: As an important group of proteins discovered in phages, anti-CRISPR proteins inhibit the activity of the bacterial immune system (i.e. CRISPR-Cas), offering promise for gene editing and phage therapy. However, the prediction and discovery of anti-CRISPR proteins are challenging due to their high variability and fast evolution. Existing biological studies rely on known CRISPR and anti-CRISPR pairs, which may not be practical given their huge number. Computational methods struggle with prediction performance. To address these issues, we propose a novel deep neural network for anti-CRISPR analysis (AcrNET), which achieves strong prediction performance.
        Results: In both cross-fold and cross-dataset validation, our method outperforms the state-of-the-art methods. Notably, AcrNET improves prediction performance by at least 15% in F1 score on the cross-dataset test compared with the state-of-the-art deep learning method. Moreover, AcrNET is the first computational method to predict detailed anti-CRISPR classes, which may help illustrate the anti-CRISPR mechanism. By taking advantage of ESM-1b, a Transformer protein language model pre-trained on 250 million protein sequences, AcrNET overcomes the data scarcity problem. Extensive experiments and analysis suggest that the Transformer model feature, evolutionary feature, and local structure feature complement each other, which indicates the critical properties of anti-CRISPR proteins. AlphaFold prediction, motif analysis, and docking experiments further demonstrate that AcrNET implicitly captures the evolutionarily conserved pattern and the interaction between anti-CRISPR proteins and their targets.
        Availability and implementation: Web server: https://proj.cse.cuhk.edu.hk/aihlab/AcrNET/. Training code and pre-trained model are available at.
      PubDate: Fri, 21 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad259
      Issue No: Vol. 39, No. 5 (2023)
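
      AcrNET's headline result is stated as an F1 improvement, so a small Python helper may clarify the metric. The abstract does not say whether "at least 15%" is an absolute or a relative gain, so the sketch computes both readings. The example numbers are hypothetical.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def absolute_gain(new: float, old: float) -> float:
    """Improvement in F1 points, e.g. 0.60 -> 0.75 is +0.15."""
    return new - old


def relative_gain(new: float, old: float) -> float:
    """Improvement as a fraction of the baseline, e.g. 0.60 -> 0.69 is +15%."""
    return (new - old) / old
```

      With a baseline F1 of 0.60, a 15% relative gain means 0.69, while a 15-point absolute gain means 0.75. The two readings differ noticeably.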
       
  • FAS: assessing the similarity between proteins using multi-layered feature
           architectures

      First page: btad226
      Abstract:
        Motivation: Protein sequence comparison is a fundamental element in the bioinformatics toolkit. When sequences are annotated with features such as functional domains, transmembrane domains, low complexity regions or secondary structure elements, the resulting feature architectures allow better-informed comparisons. However, many existing schemes for scoring architecture similarities cannot cope with features arising from multiple annotation sources. Those that can fall short in resolving overlapping and redundant feature annotations.
        Results: Here, we introduce FAS, a scoring method that integrates features from multiple annotation sources in a directed acyclic architecture graph. Redundancies are resolved as part of the architecture comparison by finding the paths through the graphs that maximize the pair-wise architecture similarity. In a large-scale evaluation on more than 10 000 human-yeast ortholog pairs, architecture similarities assessed with FAS are consistently more plausible than those obtained using e-values to resolve overlaps or leaving overlaps unresolved. Three case studies demonstrate the utility of FAS on architecture comparison tasks: benchmarking of orthology assignment software, identification of functionally diverged orthologs, and diagnosing protein architecture changes stemming from faulty gene predictions. With the help of FAS, feature architecture comparisons can now be routinely integrated into these and many other applications.
        Availability and implementation: FAS is available as a Python package: https://pypi.org/project/greedyFAS/.
      PubDate: Fri, 21 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad226
      Issue No: Vol. 39, No. 5 (2023)
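
      FAS resolves overlapping annotations by searching paths through a directed acyclic architecture graph. A simpler relative of that idea is choosing the heaviest set of non-overlapping features on a single protein (weighted interval scheduling). It is sketched below in Python as an illustration, not as FAS's scoring, which maximizes similarity between two architectures. Feature names, coordinates, and weights are made up.

```python
from bisect import bisect_right
from typing import List, Tuple

# (name, start, end, weight); coordinates are 1-based and inclusive.
Feature = Tuple[str, int, int, float]


def best_non_overlapping(features: List[Feature]) -> Tuple[float, List[str]]:
    """Select non-overlapping features with maximum total weight.

    Sorting by end position makes the "can follow" relation a DAG, and dynamic
    programming over that order finds its heaviest path.
    """
    feats = sorted(features, key=lambda f: f[2])
    ends = [f[2] for f in feats]
    n = len(feats)
    best = [0.0] * (n + 1)   # best[i]: optimum using only the first i features
    take = [False] * (n + 1)
    prev = [0] * (n + 1)     # prev[i]: number of features that end before feature i starts
    for i in range(1, n + 1):
        _, start, _, weight = feats[i - 1]
        prev[i] = bisect_right(ends, start - 1, 0, i - 1)
        with_i = weight + best[prev[i]]
        if with_i > best[i - 1]:
            best[i], take[i] = with_i, True
        else:
            best[i] = best[i - 1]
    # Trace back which features were chosen.
    chosen, i = [], n
    while i > 0:
        if take[i]:
            chosen.append(feats[i - 1][0])
            i = prev[i]
        else:
            i -= 1
    return best[n], chosen[::-1]


features = [
    ("Pfam_kinase", 10, 250, 5.0),
    ("SMART_kinase", 15, 240, 4.0),  # redundant annotation of the same domain
    ("TM1", 260, 280, 1.0),
    ("LCR", 270, 300, 0.5),          # overlaps TM1
]
```

      On this toy input, the heavier of the two redundant kinase annotations is kept together with the transmembrane segment, for a total weight of 6.0.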
       
  • Deciphering associations between gut microbiota and clinical factors using
           microbial modules

      First page: btad213
      Abstract:
        Motivation: Human gut microbiota plays a vital role in maintaining body health. The dysbiosis of gut microbiota is associated with a variety of diseases. It is critical to uncover the associations between gut microbiota and disease states as well as other intrinsic or environmental factors. However, inferring alterations of individual microbial taxa based on relative abundance data likely leads to false associations and conflicting discoveries in different studies. Moreover, the effects of underlying factors and microbe–microbe interactions could lead to the alteration of larger sets of taxa. It might be more robust to investigate gut microbiota using groups of related taxa instead of the composition of individual taxa.
        Results: We proposed a novel method to identify underlying microbial modules, i.e. groups of taxa with similar abundance patterns affected by a common latent factor, from longitudinal gut microbiota and applied it to inflammatory bowel disease (IBD). The identified modules demonstrated closer intragroup relationships, indicating potential microbe–microbe interactions and influences of underlying factors. Associations between the modules and several clinical factors were investigated, especially disease states. The IBD-associated modules performed better in stratifying the subjects compared with the relative abundance of individual taxa. The modules were further validated in external cohorts, demonstrating the efficacy of the proposed method in identifying general and robust microbial modules. The study reveals the benefit of considering the ecological effects in gut microbiota analysis and the great promise of linking clinical factors with underlying microbial modules.
        Availability and implementation: https://github.com/rwang-z/microbial_module.git.
      PubDate: Fri, 21 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad213
      Issue No: Vol. 39, No. 5 (2023)
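
      Why can relative abundance data produce false associations? A tiny Python example makes the compositional effect concrete. The absolute loads are hypothetical, and the sketch does not reproduce the paper's latent-factor module method. It only shows that a bloom in one taxon makes unchanged taxa look depleted, while ratios between the unchanged taxa stay the same.

```python
import math
from typing import Dict


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Normalize absolute abundances so they sum to 1 (what sequencing reports)."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def log_ratio(rel: Dict[str, float], a: str, b: str) -> float:
    """Log-ratio of two taxa; unaffected by the normalizing total."""
    return math.log(rel[a] / rel[b])


# Hypothetical absolute loads: only Escherichia changes between the two samples.
before = {"Bacteroides": 400.0, "Faecalibacterium": 300.0, "Escherichia": 300.0}
after = {"Bacteroides": 400.0, "Faecalibacterium": 300.0, "Escherichia": 1300.0}

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

if __name__ == "__main__":
    for taxon in before:
        print(f"{taxon}: {rel_before[taxon]:.2f} -> {rel_after[taxon]:.2f}")
```

      In relative terms, Bacteroides and Faecalibacterium appear to halve (0.40 to 0.20 and 0.30 to 0.15), although their absolute loads are unchanged. Their log-ratio is identical in both samples.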
       
  • A functional analysis of omic network embedding spaces reveals key altered
           functions in cancer

      First page: btad281
      Abstract:
        Motivation: Advances in omics technologies have revolutionized cancer research by producing massive datasets. A common approach to deciphering these complex data is to apply embedding algorithms to molecular interaction networks. These algorithms find a low-dimensional space in which similarities between the network nodes are best preserved. Currently available embedding approaches mine the gene embeddings directly to uncover new cancer-related knowledge. However, these gene-centric approaches produce incomplete knowledge, since they do not account for the functional implications of genomic alterations. We propose a new, function-centric perspective and approach to complement the knowledge obtained from omic data.
        Results: We introduce our Functional Mapping Matrix (FMM) to explore the functional organization of different tissue-specific and species-specific embedding spaces generated by a Non-negative Matrix Tri-Factorization algorithm. We also use our FMM to define the optimal dimensionality of these molecular interaction network embedding spaces. For this optimal dimensionality, we compare the FMMs of the most prevalent cancers in humans to the FMMs of their corresponding control tissues. We find that cancer alters the positions of cancer-related functions in the embedding space, while it keeps the positions of the noncancer-related ones. We exploit this spatial ‘movement’ to predict novel cancer-related functions. Finally, we predict novel cancer-related genes that the currently available methods for gene-centric analyses cannot identify; we validate these predictions by literature curation and retrospective analyses of patient survival data.
        Availability and implementation: Data and source code can be accessed at https://github.com/gaiac/FMM.
      PubDate: Fri, 21 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad281
      Issue No: Vol. 39, No. 5 (2023)
       
  • matchRanges: generating null hypothesis genomic ranges via
           covariate-matched sampling

      First page: btad197
      Abstract:
        Motivation: Deriving biological insights from genomic data commonly requires comparing attributes of selected genomic loci to a null set of loci. The selection of this null set is non-trivial, as it requires careful consideration of potential covariates, a problem that is exacerbated by the non-uniform distribution of genomic features, including genes, enhancers, and transcription factor binding sites. Propensity score-based covariate matching methods allow the selection of null sets from a pool of possible items while controlling for multiple covariates; however, existing packages do not operate on genomic data classes and can be slow for large data sets, making them difficult to integrate into genomic workflows.
        Results: To address this, we developed matchRanges, a propensity score-based covariate matching method for the efficient and convenient generation of matched null ranges from a set of background ranges within the Bioconductor framework.
        Availability and implementation: Package: https://bioconductor.org/packages/nullranges, Code: https://github.com/nullranges, Documentation: https://nullranges.github.io/nullranges.
      PubDate: Fri, 21 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad197
      Issue No: Vol. 39, No. 5 (2023)
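
      matchRanges itself is an R/Bioconductor function operating on genomic range objects. The Python below is only a language-neutral sketch of the underlying idea, nearest-neighbour propensity-score matching without replacement, using a hand-rolled logistic regression. The covariate values are hypothetical; think of them as, for example, a standardized GC content per range.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit P(focal | covariates) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def propensity_match(focal: Sequence[Sequence[float]], pool: Sequence[Sequence[float]]) -> List[int]:
    """Return one pool index per focal item: nearest propensity score, without replacement."""
    X = list(focal) + list(pool)
    y = [1] * len(focal) + [0] * len(pool)
    w, b = fit_logistic(X, y)

    def score(x: Sequence[float]) -> float:
        return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

    pool_ps = [score(x) for x in pool]
    available = set(range(len(pool)))
    matches = []
    for x in focal:
        ps = score(x)
        # sorted() makes tie-breaking deterministic.
        best = min(sorted(available), key=lambda k: abs(pool_ps[k] - ps))
        matches.append(best)
        available.remove(best)
    return matches
```

      The matched subset of the pool should resemble the focal set on the covariate much more closely than the pool as a whole does, which is the property a null set needs.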
       
  • DFHiC: a dilated full convolution model to enhance the resolution of Hi-C
           data

      First page: btad211
      Abstract:
        Motivation: Hi-C is the most widely used chromosome conformation capture (3C) technique; it measures the frequency of all pairwise interactions across the entire genome and is a powerful tool for studying the 3D structure of the genome. The fineness of the constructed genome structure depends on the resolution of Hi-C data. However, because high-resolution Hi-C data require deep sequencing and thus high experimental cost, most available Hi-C data are low-resolution. Hence, it is essential to enhance the quality of Hi-C data by developing effective computational methods.
        Results: In this work, we propose a novel method, called DFHiC, which generates the high-resolution Hi-C matrix from the low-resolution Hi-C matrix in the framework of the dilated convolutional neural network. The dilated convolution is able to effectively explore the global patterns in the overall Hi-C matrix by taking advantage of Hi-C matrix information over longer genomic distances. Consequently, DFHiC can improve the resolution of the Hi-C matrix reliably and accurately. More importantly, the super-resolution Hi-C data enhanced by DFHiC are more in line with the real high-resolution Hi-C data than those enhanced by other existing methods, in terms of both significant chromatin interactions and the identification of topologically associating domains.
        Availability and implementation: https://github.com/BinWangCSU/DFHiC.
      PubDate: Fri, 21 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad211
      Issue No: Vol. 39, No. 5 (2023)
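
      The point about "longer genomic distance" comes down to receptive field. In a Hi-C contact matrix each row and column is a genomic bin, so a wider receptive field lets each output pixel draw on contacts between loci that are further apart. The Python sketch below uses a hypothetical kernel size and dilation schedule, not DFHiC's actual architecture. It computes the receptive field of stacked stride-1 dilated convolutions and shows a 1D dilated convolution. In 2D, the same formula applies along each axis.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in bins) of stacked stride-1 dilated convolutions along one axis."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """'Valid' 1D dilated cross-correlation (the 'convolution' of deep-learning libraries).

    Output i combines x[i], x[i + dilation], x[i + 2 * dilation], ...
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * x[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(x) - span)
    ]


if __name__ == "__main__":
    # Hypothetical size-3 kernels: three plain layers vs three layers dilated 1, 2, 4.
    print(receptive_field(3, [1, 1, 1]), receptive_field(3, [1, 2, 4]))  # 7 15
```

      Three dilated layers see 15 bins instead of 7 with the same number of weights. This is how dilation extends the genomic distance each prediction can draw on.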
       
  • Molecular property prediction by contrastive learning with
           attention-guided positive sample selection

      First page: btad258
      Abstract:
        Motivation: Predicting molecular properties is one of the fundamental problems in drug design and discovery. In recent years, self-supervised learning (SSL) has shown promising performance in image recognition, natural language processing, and single-cell data analysis. Contrastive learning (CL) is a typical SSL method used to learn features of data so that the trained model can more effectively distinguish the data. One important issue in CL is how to select positive samples for each training example, which significantly impacts the performance of CL.
        Results: In this article, we propose a new method for molecular property prediction (MPP) by Contrastive Learning with Attention-guided Positive-sample Selection (CLAPS). First, we generate positive samples for each training example based on an attention-guided selection scheme. Second, we employ a Transformer encoder to extract latent feature vectors and compute the contrastive loss, aiming to distinguish positive and negative sample pairs. Finally, we use the trained encoder for predicting molecular properties. Experiments on various benchmark datasets show that our approach outperforms the state-of-the-art (SOTA) methods in most cases.
        Availability and implementation: The code is publicly available at https://github.com/wangjx22/CLAPS.
      PubDate: Thu, 20 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad258
      Issue No: Vol. 39, No. 5 (2023)
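
      The abstract does not give the exact form of CLAPS's contrastive loss. A common choice in contrastive learning is an InfoNCE (NT-Xent-style) objective, sketched here in plain Python for one anchor, one positive, and a list of negatives. The embeddings and `temperature` are hypothetical, and attention-guided positive selection is not shown.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]], temperature: float = 0.1) -> float:
    """-log softmax probability of the positive among {positive} + negatives."""
    logits = [cosine(anchor, positive) / temperature] + [
        cosine(anchor, n) / temperature for n in negatives
    ]
    m = max(logits)  # log-sum-exp with max subtraction for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]
```

      Minimizing this loss pulls the anchor towards its positive and pushes it away from the negatives. The loss is near zero when the positive is by far the most similar embedding.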
       
  • LogBTF: gene regulatory network inference using Boolean threshold network
           model from single-cell gene expression data

      First page: btad256
      Abstract:
        Motivation: From a systematic perspective, it is crucial to infer and analyze gene regulatory networks (GRNs) from high-throughput single-cell RNA sequencing data. However, most existing GRN inference methods focus mainly on the network topology; only a few of them consider how to explicitly describe the logical update rules of regulation in GRNs to obtain their dynamics. Moreover, some inference methods also fail to deal with the over-fitting problem caused by noise in time series data.
        Results: In this article, we propose a novel embedded Boolean threshold network method called LogBTF, which effectively infers GRNs by integrating regularized logistic regression and a Boolean threshold function. First, the continuous gene expression values are converted into Boolean values, and the elastic net regression model is adopted to fit the binarized time series data. Then, the estimated regression coefficients are used to represent the unknown Boolean threshold function of the candidate Boolean threshold network as the dynamical equations. To overcome the multi-collinearity and over-fitting problems, a new and effective approach is designed to optimize the network topology by adding a perturbation design matrix to the input data and thereafter setting sufficiently small elements of the output coefficient vector to zero. In addition, a cross-validation procedure is incorporated into the Boolean threshold network model framework to strengthen the inference capability. Finally, extensive experiments on one simulated Boolean value dataset, dozens of simulation datasets, and three real single-cell RNA sequencing datasets demonstrate that the LogBTF method can infer GRNs from time series data more accurately than several alternative GRN inference methods.
        Availability and implementation: The source data and code are available at https://github.com/zpliulab/LogBTF.
      PubDate: Thu, 20 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad256
      Issue No: Vol. 39, No. 5 (2023)
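
      The Boolean threshold update that LogBTF builds on can be sketched in a few lines of Python. Median-based binarization and the hand-set weights and biases are assumptions for illustration. In LogBTF, the weights come from elastic-net logistic regression fitted to binarized time series. Thresholding the linear term at zero is equivalent to thresholding the logistic probability at 0.5. This is why regression coefficients can serve as threshold-function parameters.

```python
from statistics import median
from typing import Dict, List, Sequence


def binarize(series: Sequence[float]) -> List[int]:
    """Map one gene's expression time series to 0/1 around its median (one simple rule)."""
    m = median(series)
    return [1 if v > m else 0 for v in series]


def threshold_step(state: Dict[str, int],
                   weights: Dict[str, Dict[str, float]],
                   bias: Dict[str, float]) -> Dict[str, int]:
    """Synchronous update: x_i(t+1) = 1 if sum_j w_ij * x_j(t) + b_i > 0, else 0."""
    return {
        gene: int(sum(w * state[reg] for reg, w in regulators.items()) + bias[gene] > 0)
        for gene, regulators in weights.items()
    }


# Hypothetical 3-gene network: A activates B, B represses C, C activates A.
weights = {"A": {"C": 1.0}, "B": {"A": 1.0}, "C": {"B": -1.0}}
bias = {"A": -0.5, "B": -0.5, "C": 0.5}
```

      Starting from A on and B and C off, this toy network alternates between two states. That is the kind of dynamics an explicit update rule exposes and a topology-only network does not.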
       
  • RING-PyMOL: residue interaction networks of structural ensembles and
           molecular dynamics

      First page: btad260
      Abstract:
        RING-PyMOL is a plugin for PyMOL providing a set of analysis tools for structural ensembles and molecular dynamics simulations. RING-PyMOL combines residue interaction networks, as provided by the RING software, with structural clustering to enhance the analysis and visualization of conformational complexity. It couples precise calculation of non-covalent interactions with the power of PyMOL to manipulate and visualize protein structures. The plugin identifies and highlights correlating contacts and interaction patterns that can explain structural allostery, active sites, and structural heterogeneity connected with molecular function. It is easy to use and extremely fast, processing and rendering hundreds of models and long trajectories in seconds. RING-PyMOL generates a number of interactive plots and output files for use with external tools. The underlying RING software has been improved extensively: it is 10 times faster, can process mmCIF files, and also identifies typed interactions for nucleic acids.
        Availability and implementation: https://github.com/BioComputingUP/ring-pymol
      PubDate: Thu, 20 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad260
      Issue No: Vol. 39, No. 5 (2023)
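
      RING computes typed non-covalent interactions (such as hydrogen bonds) with interaction-specific geometric rules. The Python sketch below instead uses a single generic distance cutoff. It shows how a residue interaction network, and per-contact frequencies across an ensemble of models, could be derived. The 4.5 Å cutoff, the sequence-separation filter, and the data layout are assumptions, not RING's algorithm or the PyMOL API.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
Model = Dict[int, List[Coord]]  # residue number -> atom coordinates (angstroms)


def contact_edges(model: Model, cutoff: float = 4.5, min_seq_sep: int = 3) -> List[Tuple[int, int]]:
    """Residue pairs with any inter-atomic distance <= cutoff.

    Pairs closer than min_seq_sep in sequence are skipped to ignore trivial chain neighbours.
    """
    edges = []
    for i, j in combinations(sorted(model), 2):
        if j - i < min_seq_sep:
            continue
        if any(math.dist(a, b) <= cutoff for a in model[i] for b in model[j]):
            edges.append((i, j))
    return edges


def contact_frequencies(models: List[Model], **kwargs) -> Dict[Tuple[int, int], float]:
    """Fraction of ensemble models in which each contact is present."""
    counts = Counter(e for m in models for e in contact_edges(m, **kwargs))
    return {edge: c / len(models) for edge, c in counts.items()}
```

      Contacts present in only part of an ensemble (frequency between 0 and 1) are the ones worth inspecting for conformational heterogeneity.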
       
  • CONNECTOR, fitting and clustering of longitudinal data to reveal a new
           risk stratification system

      First page: btad201
      Abstract:
        Motivation: The transition from evaluating a single time point to examining the entire dynamic evolution of a system is possible only in the presence of the proper framework. The strong variability of dynamic evolution makes the definition of an explanatory procedure for data fitting and clustering challenging.
        Results: We developed CONNECTOR, a data-driven framework able to analyze and inspect longitudinal data in a straightforward and revealing way. When used to analyze tumor growth kinetics over time in 1599 patient-derived xenograft growth curves from ovarian and colorectal cancers, CONNECTOR allowed the aggregation of time-series data into informative clusters through an unsupervised approach. We give a new perspective on mechanism interpretation; specifically, we define novel model aggregations and identify unanticipated molecular associations with response to clinically approved therapies.
        Availability and implementation: CONNECTOR is freely available under the GNU GPL license at https://qbioturin.github.io/connector and https://doi.org/10.17504/protocols.io.8epv56e74g1b/v1.
      PubDate: Thu, 20 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad201
      Issue No: Vol. 39, No. 5 (2023)
       
  • ConsAlign: simultaneous RNA structural aligner based on rich transfer
           learning and thermodynamic ensemble model of alignment scoring

      First page: btad255
      Abstract:
        Motivation: To capture structural homology in RNAs, alignment and folding (AF) of RNA homologs has been a fundamental framework in RNA science. Learning sufficient scoring parameters for simultaneous AF (SAF) remains an undeveloped subject because evaluating them is computationally expensive.
        Results: We developed ConsTrain, a gradient-based machine learning method for rich SAF scoring. We also implemented ConsAlign, a SAF tool composed of ConsTrain’s learned scoring parameters. To improve AF quality, ConsAlign employs (1) transfer learning from well-defined scoring models and (2) an ensemble model between the ConsTrain model and a well-established thermodynamic scoring model. With comparable running time, ConsAlign demonstrated competitive AF prediction quality among current AF tools.
        Availability and implementation: Our code and data are freely available at https://github.com/heartsh/consalign and https://github.com/heartsh/consprob-trained.
      PubDate: Wed, 19 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad255
      Issue No: Vol. 39, No. 5 (2023)
       
  • Evolink: a phylogenetic approach for rapid identification of
           genotype–phenotype associations in large-scale microbial multispecies
           data

      First page: btad215
      Abstract:
        Motivation: The discovery of the genetic features that underlie a phenotype is a fundamental task in microbial genomics. With the growing number of microbial genomes that are paired with phenotypic data, new challenges and opportunities are arising for genotype-phenotype inference. Phylogenetic approaches are frequently used to adjust for the population structure of microbes, but scaling them to trees with thousands of leaves representing heterogeneous populations is highly challenging. This greatly hinders the identification of prevalent genetic features that contribute to phenotypes observed in a wide diversity of species.
        Results: In this study, Evolink was developed as an approach to rapidly identify genotypes associated with phenotypes in large-scale multispecies microbial datasets. Compared with other similar tools, Evolink was consistently among the top-performing methods in terms of precision and sensitivity when applied to simulated and real-world flagella datasets. In addition, Evolink significantly outperformed all other approaches in terms of computation time. Application of Evolink to flagella and gram-staining datasets revealed findings that are consistent with known markers and supported by the literature. In conclusion, Evolink can rapidly detect phenotype-associated genotypes across multiple species, demonstrating its potential to be broadly utilized to identify gene families associated with traits of interest.
        Availability and implementation: The source code, docker container, and web server for Evolink are freely available at https://github.com/nlm-irp-jianglab/Evolink.
      PubDate: Wed, 19 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad215
      Issue No: Vol. 39, No. 5 (2023)
       
  • PyHMMER: a Python library binding to HMMER for efficient sequence analysis

      First page: btad214
      Abstract:
        Summary: PyHMMER provides Python integration of the popular profile Hidden Markov Model software HMMER via Cython bindings. This allows annotating protein sequences with profile HMMs, and building new ones, directly from Python. PyHMMER increases flexibility of use, allowing users to create queries directly from Python code, launch searches, and obtain results without I/O, or to access previously unavailable statistics such as uncorrected P-values. A new parallelization model greatly improves performance when running multithreaded searches, while producing exactly the same results as HMMER.
        Availability and implementation: PyHMMER supports all modern Python versions (Python 3.6+) and platforms similar to those supported by HMMER (x86 or PowerPC UNIX systems). Pre-compiled packages are released via PyPI (https://pypi.org/project/pyhmmer/) and Bioconda (https://anaconda.org/bioconda/pyhmmer). The PyHMMER source code is available under the terms of the open-source MIT licence and hosted on GitHub (https://github.com/althonos/pyhmmer); its documentation is available on ReadTheDocs (https://pyhmmer.readthedocs.io).
      PubDate: Wed, 19 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad214
      Issue No: Vol. 39, No. 5 (2023)
       
  • 3D-MSNet: a point cloud-based deep learning model for untargeted feature
           detection and quantification in profile LC-HRMS data

      First page: btad195
      Abstract:
        Motivation: Liquid chromatography coupled with high-resolution mass spectrometry is widely used for composition profiling in untargeted metabolomics research. While retaining complete sample information, mass spectrometry (MS) data naturally have the characteristics of high dimensionality, high complexity, and huge data volume. Among mainstream quantification methods, none can perform direct 3D analysis on lossless profile MS signals. All existing software simplifies calculations by dimensionality reduction or lossy grid transformation, ignoring the full 3D signal distribution of MS data and resulting in inaccurate feature detection and quantification.
        Results: Because neural networks are effective for high-dimensional data analysis and can discover implicit features in large amounts of complex data, in this work we propose 3D-MSNet, a novel deep learning-based model for untargeted feature extraction. 3D-MSNet performs direct feature detection on 3D MS point clouds as an instance segmentation task. After training on a self-annotated 3D feature dataset, we compared our model with nine popular software tools (MS-DIAL, MZmine 2, XCMS Online, MarkerView, Compound Discoverer, MaxQuant, Dinosaur, DeepIso, PointIso) on two metabolomics and one proteomics public benchmark datasets. Our 3D-MSNet model outperformed the other software with significant improvement in feature detection and quantification accuracy on all evaluation datasets. Furthermore, 3D-MSNet has high feature extraction robustness and can be widely applied to profile MS data acquired with various high-resolution mass spectrometers at various resolutions.
        Availability and implementation: 3D-MSNet is an open-source model and is freely available at https://github.com/CSi-Studio/3D-MSNet under a permissive license. Benchmark datasets, training dataset, evaluation methods, and results are available at https://doi.org/10.5281/zenodo.6582912.
      PubDate: Tue, 18 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad195
      Issue No: Vol. 39, No. 5 (2023)
       
  • Neither random nor censored: estimating intensity-dependent probabilities
           for missing values in label-free proteomics

      First page: btad200
      Abstract:
        Motivation: Mass spectrometry proteomics is a powerful tool in biomedical research, but its usefulness is limited by the frequent occurrence of missing values in peptides that cannot be reliably quantified (detected) for particular samples. Many analysis strategies have been proposed for missing values, and the discussion often focuses on distinguishing whether values are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR).
        Results: Statistical models and algorithms are proposed for estimating the detection probabilities and for evaluating how much statistical information can or cannot be recovered from the missing value pattern. The probability that an intensity is detected is shown to be accurately modeled as a logit-linear function of the underlying intensity, showing that the missing value process is intermediate between MAR and censoring. The detection probability asymptotes to 100% for high intensities, showing that missing values unrelated to intensity are rare. The rule applies globally to each dataset and is appropriate for both highly and lowly expressed peptides. A probability model is developed that allows the distribution of unobserved intensities to be inferred from the observed values. The detection probability model is incorporated into a likelihood-based approach for assessing differential expression and successfully recovers statistical power compared with omitting the missing values from the analysis. In contrast, imputation methods are shown to perform poorly, either reducing statistical power or increasing the false discovery rate to unacceptable levels.
        Availability and implementation: Data and code to reproduce the results shown in this article are available from https://mengbo-li.github.io/protDP/.
      PubDate: Mon, 17 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad200
      Issue No: Vol. 39, No. 5 (2023)
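
      A logit-linear detection model, and the range between MAR and censoring that it spans, can be illustrated numerically. The Python sketch below uses hypothetical coefficients `beta0` and `beta1` and a normal distribution of log-intensities. protDP estimates its parameters from data; this is not its code.

      - With `beta1 = 0`, detection does not depend on intensity (MAR).
      - A very large `beta1` approaches a hard censoring threshold at `-beta0 / beta1`.
      - Intermediate values of `beta1` sit between these two extremes.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def detection_probability(log_intensity: float, beta0: float, beta1: float) -> float:
    """Logit-linear model: logit P(detected) = beta0 + beta1 * log_intensity."""
    return sigmoid(beta0 + beta1 * log_intensity)


def missing_fraction(mu: float, sigma: float, beta0: float, beta1: float,
                     n: int = 4001, width: float = 8.0) -> float:
    """Expected fraction of missing values when log-intensities ~ Normal(mu, sigma).

    Integrates (1 - P(detected | y)) * density(y) with a Riemann sum over mu +/- width*sigma.
    """
    lo = mu - width * sigma
    step = 2.0 * width * sigma / (n - 1)
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(n):
        y = lo + i * step
        density = math.exp(-0.5 * ((y - mu) / sigma) ** 2) / norm
        total += (1.0 - detection_probability(y, beta0, beta1)) * density * step
    return total
```

      With these toy coefficients, half of the peptides centred at the 50% detection point are missing. Shifting the mean intensity upwards reduces missingness, and detection approaches 100% at high intensities.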
       
  • EBD: an eye biomarker database

      First page: btad194
      Abstract:
        Motivation: Many ophthalmic disease biomarkers have been identified through comprehensive multiomics profiling and hold significant potential for advancing the diagnosis, prognosis, and management of diseases. Meanwhile, the eye itself serves as a natural biomarker for several systemic diseases, including diseases of the neurological, renal, and cardiovascular systems. We aimed to collect and standardize this eye biomarker information and construct the eye biomarker database (EBD) to provide ophthalmologists with a platform to search, analyze, and download these eye biomarker data.
        Results: In this study, we present the EBD (http://www.eyeseeworld.com/ebd/index.html), a world-first online compilation comprising 889 biomarkers for 26 ocular diseases and 939 eye biomarkers for 181 systemic diseases. The EBD also includes information on 78 “nonbiomarkers”, that is, candidates that have been shown not to be biomarkers. Biological function and network analyses were conducted for these ocular disease biomarkers, and several hub pathways and common network topology characteristics were newly identified, which may promote future ocular disease biomarker discovery and characterize the landscape of biomarkers for eye diseases at the pathway and network levels. The EBD is expected to be broadly useful to developmental biologists and clinical scientists in and outside of the eye field by assisting in the identification of biomarkers linked to eye disorders and related systemic diseases.
        Availability and implementation: EBD is available at http://www.eyeseeworld.com/ebd/index.html.
      PubDate: Thu, 13 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad194
      Issue No: Vol. 39, No. 5 (2023)
       
  • bootRanges: flexible generation of null sets of genomic ranges for
           hypothesis testing

      First page: btad190
      Abstract: Motivation: Enrichment analysis is a widely used technique in genomic analysis that aims to determine whether there is a statistically significant association between two sets of genomic features. To conduct this type of hypothesis testing, an appropriate null model is typically required. However, the commonly used null distribution can be overly simplistic and may lead to inaccurate conclusions. Results: bootRanges provides fast functions for generating block-bootstrapped genomic ranges that represent the null hypothesis in enrichment analysis. As part of a modular workflow, bootRanges offers greater flexibility for computing various test statistics by leveraging other Bioconductor packages. We show that shuffling or permutation schemes may result in overly narrow test statistic null distributions and overestimation of statistical significance, whereas creating new range sets with a block bootstrap preserves the local genomic correlation structure and generates more reliable null distributions. bootRanges can also be used in more complex analyses, such as assessing correlations between cis-regulatory elements (CREs) and genes across cell types, or providing optimized thresholds, e.g. for the log fold change (logFC) from differential analysis. Availability and implementation: bootRanges is freely available in the R/Bioconductor package nullranges, hosted at https://bioconductor.org/packages/nullranges.
      PubDate: Wed, 12 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad190
      Issue No: Vol. 39, No. 5 (2023)
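      The abstract contrasts shuffling, which places each range independently, with a block bootstrap, which moves whole blocks of the genome and so keeps nearby ranges together. bootRanges itself is an R/Bioconductor function in nullranges; the Python sketch below only illustrates the general block bootstrap idea under simplifying assumptions (a single chromosome, fixed-length blocks, ranges assigned to a block by their start coordinate). The function name and toy data are hypothetical.

```python
import bisect
import random
from typing import List, Tuple

Range = Tuple[int, int]  # half-open (start, end) on one chromosome


def block_bootstrap_ranges(ranges: List[Range], seq_len: int, block_len: int,
                           seed: int = 0) -> List[Range]:
    """Build one bootstrap null set of ranges for a single chromosome.

    The chromosome is tiled into consecutive target blocks of length block_len.
    Each target block is filled with the ranges whose start lies in a randomly
    chosen source block (sampled with replacement), shifted by a common offset,
    so spacing and clustering of ranges within a block are preserved.
    """
    if not 0 < block_len <= seq_len:
        raise ValueError("block_len must satisfy 0 < block_len <= seq_len")
    rng = random.Random(seed)
    ordered = sorted(ranges)
    starts = [s for s, _ in ordered]
    result: List[Range] = []
    for target in range(0, seq_len, block_len):
        src = rng.randrange(0, seq_len - block_len + 1)  # source block start
        shift = target - src
        lo = bisect.bisect_left(starts, src)
        hi = bisect.bisect_left(starts, src + block_len)
        for start, end in ordered[lo:hi]:
            # Drop ranges that would run past the chromosome end.
            if end + shift <= seq_len:
                result.append((start + shift, end + shift))
    return sorted(result)


if __name__ == "__main__":
    # A toy chromosome with two tight clusters of peaks (hypothetical data).
    peaks = [(100, 120), (130, 150), (160, 180), (700, 720), (730, 750)]
    null_set = block_bootstrap_ranges(peaks, seq_len=1000, block_len=250, seed=1)
    print(null_set)
```

      Repeating the call with different seeds yields a collection of null range sets; a test statistic (for example, overlap with another feature set) computed on each one gives an empirical null distribution. The block length controls how much local structure survives: blocks shorter than typical cluster spacing behave more like shuffling.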
       
  • FISHFactor: a probabilistic factor model for spatial transcriptomics data
           with subcellular resolution

      First page: btad183
      Abstract: Motivation: Factor analysis is a widely used tool for unsupervised dimensionality reduction of high-throughput datasets in molecular biology, with recently proposed extensions designed specifically for spatial transcriptomics data. However, these methods expect (count) matrices as input and are therefore not directly applicable to single molecule resolution data, which come in the form of coordinate lists annotated with genes and provide insight into subcellular spatial expression patterns. To address this, here we propose FISHFactor, a probabilistic factor model that combines the benefits of spatial, non-negative factor analysis with a Poisson point process likelihood to explicitly model and account for the nature of single molecule resolution data. In addition, FISHFactor shares information across a potentially large number of cells in a common weight matrix, allowing consistent interpretation of factors across cells and yielding improved latent variable estimates. Results: We compare FISHFactor with existing methods, which rely on aggregating information through spatial binning and cannot combine information from multiple cells, and show that our method leads to more accurate results on simulated data. We show that our method is scalable and can be readily applied to large datasets. Finally, we demonstrate on a real dataset that FISHFactor is able to identify major subcellular expression patterns and spatial gene clusters in a data-driven manner. Availability and implementation: The model implementation, data simulation and experiment scripts are available at https://www.github.com/bioFAM/FISHFactor.
      PubDate: Tue, 11 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad183
      Issue No: Vol. 39, No. 5 (2023)
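      FISHFactor scores molecule coordinates with a Poisson point process likelihood instead of binning them into count matrices. To illustrate what such a likelihood measures, here is a sketch of the standard log-likelihood of an inhomogeneous Poisson point process on the unit square, log L = Σᵢ log λ(xᵢ) − ∫ λ(x) dx. The intensity functions, molecule coordinates and function name are hypothetical; in FISHFactor the intensity would come from the factor model, which is not reproduced here.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def ppp_log_likelihood(points: Sequence[Point],
                       intensity: Callable[[float, float], float],
                       grid: int = 200) -> float:
    """Log-likelihood of an inhomogeneous Poisson point process on [0, 1]^2.

    log L = sum_i log(lambda(x_i, y_i)) - integral of lambda over [0, 1]^2,
    with the integral approximated by the midpoint rule on a grid x grid mesh.
    """
    log_term = sum(math.log(intensity(x, y)) for x, y in points)
    h = 1.0 / grid
    integral = h * h * sum(
        intensity((i + 0.5) * h, (j + 0.5) * h)
        for i in range(grid)
        for j in range(grid)
    )
    return log_term - integral


if __name__ == "__main__":
    # Hypothetical molecule coordinates for one gene, clustered near x = 0.2.
    molecules = [(0.18, 0.40), (0.21, 0.55), (0.25, 0.47), (0.19, 0.62)]

    def flat(x: float, y: float) -> float:
        return 4.0  # uniform intensity; expected molecule count = 4

    def peaked(x: float, y: float) -> float:
        return 8.0 * (1.0 - x)  # denser toward x = 0; also integrates to 4

    print("flat  :", ppp_log_likelihood(molecules, flat))
    print("peaked:", ppp_log_likelihood(molecules, peaked))
```

      The integral term penalizes intensity placed where no molecules fell, so with the same expected count the intensity concentrated near the observed molecules scores higher than the flat one. Working directly on coordinates this way avoids choosing a bin size.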
       
  • Accurate flux predictions using tissue-specific gene expression in plant
           metabolic modeling

      First page: btad186
      Abstract: Motivation: The accurate prediction of complex phenotypes such as metabolic fluxes in living systems is a grand challenge for systems biology and central to efficiently identifying biotechnological interventions that can address pressing industrial needs. The use of gene expression data to improve the accuracy of metabolic flux predictions from mechanistic modeling methods such as flux balance analysis (FBA) has not previously been demonstrated in multi-tissue systems, despite their biotechnological importance. We hypothesized that a method for generating metabolic flux predictions informed by relative expression levels between tissues would improve prediction accuracy. Results: Relative gene expression levels derived from multiple transcriptomic and proteomic datasets were integrated into FBA predictions of a multi-tissue, diel model of Arabidopsis thaliana’s central metabolism. This integration dramatically improved the agreement of flux predictions with experimentally based flux maps from ¹³C metabolic flux analysis (MFA) compared with a standard parsimonious FBA approach. Disagreement between FBA predictions and MFA flux maps was measured using weighted averaged percent error values; for parsimonious FBA this was 169%–180% for high light conditions and 94%–103% for low light conditions, depending on the gene expression dataset used. This fell to 10%–13% and 9%–11%, respectively, upon incorporating expression data into the modeling process, which also substantially altered the predicted carbon and energy economy of the plant. Availability and implementation: Code and data generated as part of this study are available from https://github.com/Gibberella/ArabidopsisGeneExpressionWeights.
      PubDate: Tue, 11 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad186
      Issue No: Vol. 39, No. 5 (2023)
       
  • Finite mixtures of matrix variate Poisson-log normal distributions for
           three-way count data

      First page: btad167
      Abstract: Motivation: Three-way data structures, characterized by three entities (the units, the variables and the occasions), are frequent in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes across p conditions at r occasions. Matrix variate distributions offer a natural way to model three-way data, and mixtures of matrix variate distributions can be used to cluster three-way data. Clustering of gene expression data is carried out as a means of discovering gene co-expression networks. Results: In this work, a mixture of matrix variate Poisson-log normal distributions is proposed for clustering read counts from RNA sequencing. By considering the matrix variate structure, full information on the conditions and occasions of the RNA sequencing dataset is considered simultaneously, and the number of covariance parameters to be estimated is reduced. We propose three different frameworks for parameter estimation: a Markov chain Monte Carlo-based approach, a variational Gaussian approximation-based approach, and a hybrid approach. Various information criteria are used for model selection. The models are applied to both real and simulated data, and we demonstrate that the proposed approaches can recover the underlying cluster structure in both cases. In simulation studies where the true model parameters are known, our proposed approach shows good parameter recovery. Availability and implementation: The R package for this work is available on GitHub at https://github.com/anjalisilva/mixMVPLN and is released under the open source MIT license.
      PubDate: Wed, 05 Apr 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad167
      Issue No: Vol. 39, No. 5 (2023)
       
  • A framework for high-throughput sequence alignment using real
           processing-in-memory systems

      First page: btad155
      Abstract: Motivation: Sequence alignment is a memory-bound computation whose performance in modern systems is limited by the memory bandwidth bottleneck. Processing-in-memory (PIM) architectures alleviate this bottleneck by providing the memory with computing capabilities. We propose Alignment-in-Memory (AIM), a framework for high-throughput sequence alignment using PIM, and evaluate it on UPMEM, the first publicly available general-purpose programmable PIM system. Results: Our evaluation shows that a real PIM system can substantially outperform server-grade multi-threaded CPU systems running at full scale when performing sequence alignment for a variety of algorithms, read lengths, and edit distance thresholds. We hope that our findings inspire more work on creating and accelerating bioinformatics algorithms for such real PIM systems. Availability and implementation: Our code is available at https://github.com/safaad/aim.
      PubDate: Mon, 27 Mar 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad155
      Issue No: Vol. 39, No. 5 (2023)
       
  • nf-core/isoseq: simple gene and isoform annotation with PacBio Iso-Seq
           long-read sequencing

      First page: btad150
      Abstract: Motivation: Iso-Seq RNA long-read sequencing enables the identification of full-length transcripts and isoforms, removing the need for complex analyses such as transcriptome assembly. However, the raw sequencing data must be processed in a series of steps before annotation is complete. Here, we present nf-core/isoseq, a pipeline for automatic read processing and genome annotation. Following nf-core guidelines, the pipeline has few dependencies and can be run on any platform. Availability and implementation: The pipeline is freely available online on the nf-core website (https://nf-co.re/isoseq) and on GitHub (https://github.com/nf-core/isoseq) under the MIT License (DOI: 10.5281/zenodo.7116979).
      PubDate: Fri, 24 Mar 2023 00:00:00 GMT
      Issue No: Vol. 39, No. 5 (2023)
       
  • Scrooge: a fast and memory-frugal genomic sequence aligner for CPUs, GPUs,
           and ASICs

      First page: btad151
      Abstract: Motivation: Pairwise sequence alignment is a very time-consuming step in common bioinformatics pipelines. Speeding up this step requires heuristics, efficient implementations, and/or hardware acceleration. A promising candidate for all of the above is the recently proposed GenASM algorithm. We identify and address three inefficiencies in the GenASM algorithm: it has a high amount of data movement, it has a large memory footprint, and it does some unnecessary work. Results: We propose Scrooge, a fast and memory-frugal genomic sequence aligner. Scrooge includes three novel algorithmic improvements that reduce the data movement, memory footprint, and number of operations in the GenASM algorithm. We provide efficient open-source implementations of the Scrooge algorithm for CPUs and GPUs, which demonstrate the significant benefits of our algorithmic improvements. For long reads, the CPU version of Scrooge achieves a 20.1×, 1.7×, and 2.1× speedup over KSW2, Edlib, and a CPU implementation of GenASM, respectively. The GPU version of Scrooge achieves a 4.0×, 80.4×, 6.8×, 12.6×, and 5.9× speedup over the CPU version of Scrooge, KSW2, Edlib, Darwin-GPU, and a GPU implementation of GenASM, respectively. We estimate that an ASIC implementation of Scrooge would use 3.6× less chip area and 2.1× less power than a GenASM ASIC while maintaining the same throughput. Further, we systematically analyze the throughput and accuracy behavior of GenASM and Scrooge under various configurations. As the best configuration of Scrooge depends on the computing platform, we make several observations that can help guide future implementations of Scrooge. Availability and implementation: https://github.com/CMU-SAFARI/Scrooge.
      PubDate: Fri, 24 Mar 2023 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btad151
      Issue No: Vol. 39, No. 5 (2023)
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 



JournalTOCs © 2009-