Subjects -> LIBRARY AND INFORMATION SCIENCES (Total: 392 journals)
    - DIGITAL CURATION AND PRESERVATION (13 journals)
    - LIBRARY ADMINISTRATION (1 journal)
    - LIBRARY AND INFORMATION SCIENCES (378 journals)

LIBRARY AND INFORMATION SCIENCES (378 journals)

Showing 1 - 200 of 379 Journals sorted by number of followers
Library & Information Science Research     Hybrid Journal   (Followers: 2037)
Journal of Librarianship and Information Science     Hybrid Journal   (Followers: 1522)
Library Hi Tech     Hybrid Journal   (Followers: 1197)
Journal of Information Science     Hybrid Journal   (Followers: 1191)
Journal of Academic Librarianship     Hybrid Journal   (Followers: 1134)
Journal of Library & Information Services in Distance Learning     Hybrid Journal   (Followers: 1048)
Library Management     Hybrid Journal   (Followers: 1040)
The Electronic Library     Hybrid Journal   (Followers: 1004)
Library Quarterly     Full-text available via subscription   (Followers: 977)
Journal of Information Literacy     Open Access   (Followers: 935)
Global Knowledge, Memory and Communication     Hybrid Journal   (Followers: 933)
Information Technology and Libraries     Open Access   (Followers: 860)
Library Hi Tech News     Hybrid Journal   (Followers: 828)
International Journal of Library and Information Science     Open Access   (Followers: 781)
Information Retrieval     Hybrid Journal   (Followers: 768)
Information Sciences     Hybrid Journal   (Followers: 762)
New Library World     Hybrid Journal   (Followers: 718)
Information Systems Research     Full-text available via subscription   (Followers: 705)
Information Processing & Management     Hybrid Journal   (Followers: 701)
International Journal on Digital Libraries     Hybrid Journal   (Followers: 601)
College & Research Libraries     Open Access   (Followers: 577)
Evidence Based Library and Information Practice     Open Access   (Followers: 529)
Journal of Library and Information Science     Open Access   (Followers: 493)
International Information & Library Review     Hybrid Journal   (Followers: 484)
The Information Society: An International Journal     Hybrid Journal   (Followers: 452)
Library and Information Research     Open Access   (Followers: 415)
Library Trends     Full-text available via subscription   (Followers: 399)
Forensic Science International: Digital Investigation     Full-text available via subscription   (Followers: 364)
Canadian Journal of Information and Library Science     Full-text available via subscription   (Followers: 332)
International Journal of Library Science     Open Access   (Followers: 317)
Bioinformatics     Hybrid Journal   (Followers: 307)
College & Research Libraries News     Partially Free   (Followers: 303)
Journal of Information & Knowledge Management     Hybrid Journal   (Followers: 301)
portal: Libraries and the Academy     Full-text available via subscription   (Followers: 290)
Communications in Information Literacy     Open Access   (Followers: 285)
Library Leadership & Management     Open Access   (Followers: 283)
Journal of Electronic Resources Librarianship     Hybrid Journal   (Followers: 282)
The Reference Librarian     Hybrid Journal   (Followers: 281)
College & Undergraduate Libraries     Hybrid Journal   (Followers: 278)
Data Technologies and Applications     Hybrid Journal   (Followers: 275)
IFLA Journal     Hybrid Journal   (Followers: 273)
Journal of Library Administration     Hybrid Journal   (Followers: 271)
International Journal of Information Management     Hybrid Journal   (Followers: 268)
Library Collections, Acquisitions, and Technical Services     Hybrid Journal   (Followers: 260)
American Libraries     Partially Free   (Followers: 244)
Code4Lib Journal     Open Access   (Followers: 229)
Journal of the Medical Library Association     Open Access   (Followers: 226)
Australian Library Journal     Full-text available via subscription   (Followers: 224)
Cataloging & Classification Quarterly     Hybrid Journal   (Followers: 222)
Journal of Library Metadata     Hybrid Journal   (Followers: 222)
Journal of Documentation     Hybrid Journal   (Followers: 204)
Journal of Hospital Librarianship     Hybrid Journal   (Followers: 199)
Ariadne Magazine     Open Access   (Followers: 190)
Behavioral & Social Sciences Librarian     Hybrid Journal   (Followers: 189)
Aslib Proceedings     Hybrid Journal   (Followers: 188)
Library & Information History     Hybrid Journal   (Followers: 184)
Book History     Full-text available via subscription   (Followers: 180)
In the Library with the Lead Pipe     Open Access   (Followers: 177)
EDUCAUSE Review     Full-text available via subscription   (Followers: 175)
The Serials Librarian     Hybrid Journal   (Followers: 165)
Research Library Issues     Free   (Followers: 164)
New Review of Academic Librarianship     Hybrid Journal   (Followers: 158)
The Library : The Transactions of the Bibliographical Society     Hybrid Journal   (Followers: 156)
Library Technology Reports     Full-text available via subscription   (Followers: 155)
Against the Grain     Partially Free   (Followers: 153)
Journal of Creative Library Practice     Open Access   (Followers: 111)
DESIDOC Journal of Library & Information Technology     Open Access   (Followers: 108)
Australian Academic & Research Libraries     Full-text available via subscription   (Followers: 106)
Archives and Museum Informatics     Hybrid Journal   (Followers: 101)
European Journal of Information Systems     Hybrid Journal   (Followers: 99)
Online Information Review     Hybrid Journal   (Followers: 95)
Journal of Librarianship and Scholarly Communication     Open Access   (Followers: 89)
International Journal of Digital Curation     Open Access   (Followers: 87)
Information Technologies & International Development     Open Access   (Followers: 86)
Serials Review     Hybrid Journal   (Followers: 80)
Journal of Electronic Publishing     Open Access   (Followers: 80)
International Journal of Digital Library Systems     Full-text available via subscription   (Followers: 77)
Journal of Education in Library and Information Science - JELIS     Full-text available via subscription   (Followers: 75)
Library Resources & Technical Services     Full-text available via subscription   (Followers: 73)
African Journal of Library, Archives and Information Science     Full-text available via subscription   (Followers: 72)
Archival Science     Hybrid Journal   (Followers: 70)
Communicate : Journal of Library and Information Science     Full-text available via subscription   (Followers: 70)
LIBER Quarterly : The Journal of the Association of European Research Libraries     Open Access   (Followers: 69)
027.7 Zeitschrift für Bibliothekskultur / Journal for Library Culture     Open Access   (Followers: 69)
Journal of Interlibrary Loan Document Delivery & Electronic Reserve     Hybrid Journal   (Followers: 68)
Ethics and Information Technology     Hybrid Journal   (Followers: 66)
Journal of the Canadian Health Libraries Association / Journal de l'Association des bibliothèques de la santé du Canada     Open Access   (Followers: 66)
Practical Academic Librarianship : The International Journal of the SLA Academic Division     Open Access   (Followers: 65)
Library Philosophy and Practice     Open Access   (Followers: 65)
MIS Quarterly : Management Information Systems Quarterly     Hybrid Journal   (Followers: 62)
International Journal of Library Science     Full-text available via subscription   (Followers: 62)
Journal of Management Information Systems     Full-text available via subscription   (Followers: 59)
Science & Technology Libraries     Hybrid Journal   (Followers: 59)
Alexandria : The Journal of National and International Library and Information Issues     Full-text available via subscription   (Followers: 57)
Journal of Information Technology     Hybrid Journal   (Followers: 56)
The Bottom Line: Managing Library Finances     Hybrid Journal   (Followers: 56)
International Journal of Legal Information     Full-text available via subscription   (Followers: 56)
Journal of Health & Medical Informatics     Open Access   (Followers: 55)
Archives and Manuscripts     Hybrid Journal   (Followers: 55)
Partnership : the Canadian Journal of Library and Information Practice and Research     Open Access   (Followers: 54)
Library & Archival Security     Hybrid Journal   (Followers: 50)
Bangladesh Journal of Library and Information Science     Open Access   (Followers: 48)
OCLC Systems & Services     Hybrid Journal   (Followers: 47)
Community & Junior College Libraries     Hybrid Journal   (Followers: 45)
Information Discovery and Delivery     Hybrid Journal   (Followers: 44)
Medical Reference Services Quarterly     Hybrid Journal   (Followers: 41)
VINE Journal of Information and Knowledge Management Systems     Hybrid Journal   (Followers: 40)
Journal of Access Services     Hybrid Journal   (Followers: 39)
Journal of the Society of Archivists     Hybrid Journal   (Followers: 36)
Scholarly and Research Communication     Open Access   (Followers: 36)
Journal of Archival Organization     Hybrid Journal   (Followers: 33)
Public Library Quarterly     Hybrid Journal   (Followers: 33)
Information & Culture : A Journal of History     Full-text available via subscription   (Followers: 32)
Australasian Public Libraries and Information Services     Full-text available via subscription   (Followers: 32)
Journal of the Association for Information Systems     Open Access   (Followers: 31)
Research Evaluation     Hybrid Journal   (Followers: 30)
Foundations and Trends® in Information Retrieval     Full-text available via subscription   (Followers: 30)
International Journal of Information Retrieval Research     Full-text available via subscription   (Followers: 30)
Information     Open Access   (Followers: 29)
Health Information Management Journal     Hybrid Journal   (Followers: 28)
Information Manager (The)     Open Access   (Followers: 28)
Information Systems Frontiers     Hybrid Journal   (Followers: 27)
Access     Full-text available via subscription   (Followers: 27)
International Journal of Intellectual Property Management     Hybrid Journal   (Followers: 26)
International Journal of Information Privacy, Security and Integrity     Hybrid Journal   (Followers: 26)
Proceedings of the American Society for Information Science and Technology     Hybrid Journal   (Followers: 26)
Journal of the Institute of Conservation     Hybrid Journal   (Followers: 25)
Nordic Journal of Information Literacy in Higher Education     Open Access   (Followers: 25)
South African Journal of Libraries and Information Science     Open Access   (Followers: 23)
Journal of Information, Communication and Ethics in Society     Hybrid Journal   (Followers: 23)
LASIE : Library Automated Systems Information Exchange     Free   (Followers: 22)
InCite     Full-text available via subscription   (Followers: 21)
Georgia Library Quarterly     Open Access   (Followers: 21)
RBM : A Journal of Rare Books, Manuscripts, and Cultural Heritage     Open Access   (Followers: 21)
NASIG Newsletter     Open Access   (Followers: 21)
LOEX Quarterly     Full-text available via subscription   (Followers: 20)
Urban Library Journal     Open Access   (Followers: 19)
El Profesional de la Informacion     Full-text available via subscription   (Followers: 18)
Alexandría : Revista de Ciencias de la Información     Open Access   (Followers: 17)
Anales de Documentacion     Open Access   (Followers: 17)
International Journal of Web Portals     Full-text available via subscription   (Followers: 17)
Communication Booknotes Quarterly     Hybrid Journal   (Followers: 16)
Manuscripta     Full-text available via subscription   (Followers: 16)
International Journal of Information Technology, Communications and Convergence     Hybrid Journal   (Followers: 16)
Theological Librarianship : An Online Journal of the American Theological Library Association     Open Access   (Followers: 16)
Perspectives in International Librarianship     Open Access   (Followers: 16)
Ghana Library Journal     Full-text available via subscription   (Followers: 16)
Information Technologist (The)     Full-text available via subscription   (Followers: 16)
Bibliotheca Orientalis     Full-text available via subscription   (Followers: 15)
Collection and Curation     Hybrid Journal   (Followers: 15)
International Journal of Business Information Systems     Hybrid Journal   (Followers: 15)
Biblios     Open Access   (Followers: 15)
Notes     Full-text available via subscription   (Followers: 14)
Journal of Educational Media, Memory, and Society     Full-text available via subscription   (Followers: 14)
Alsic : Apprentissage des Langues et Systèmes d'Information et de Communication     Open Access   (Followers: 13)
InterActions: UCLA Journal of Education and Information     Open Access   (Followers: 13)
International Journal of Intercultural Information Management     Hybrid Journal   (Followers: 12)
Journal of Information Technology Teaching Cases     Hybrid Journal   (Followers: 12)
Eastern Librarian     Open Access   (Followers: 12)
Journal of Religious & Theological Information     Hybrid Journal   (Followers: 11)
Universal Access in the Information Society     Hybrid Journal   (Followers: 11)
International Journal of Information and Decision Sciences     Hybrid Journal   (Followers: 11)
Kansas Library Association College & University Libraries Section Proceedings     Open Access   (Followers: 11)
Journal of Global Information Management     Full-text available via subscription   (Followers: 10)
AIB Studi     Full-text available via subscription   (Followers: 10)
Southeastern Librarian     Open Access   (Followers: 9)
e & i Elektrotechnik und Informationstechnik     Hybrid Journal   (Followers: 8)
BIBLOS - Revista do Departamento de Biblioteconomia e História     Open Access   (Followers: 8)
International Journal of Multicriteria Decision Making     Hybrid Journal   (Followers: 8)
JISTEM : Journal of Information Systems and Technology Management     Open Access   (Followers: 8)
International Journal of Multimedia Information Retrieval     Partially Free   (Followers: 8)
eLucidate     Open Access   (Followers: 8)
Judaica Librarianship     Open Access   (Followers: 8)
New Review of Information Networking     Hybrid Journal   (Followers: 7)
Idaho Librarian     Free   (Followers: 7)
Journal of the South African Society of Archivists     Full-text available via subscription   (Followers: 7)
Slavic & East European Information Resources     Hybrid Journal   (Followers: 6)
Egyptian Informatics Journal     Open Access   (Followers: 6)
Nordic Journal of Library and Information Studies     Open Access   (Followers: 6)
Informaatiotutkimus     Open Access   (Followers: 5)
Revista Interamericana de Bibliotecología     Open Access   (Followers: 5)
CIC. Cuadernos de Informacion y Comunicacion     Open Access   (Followers: 5)
Bridgewater Review     Open Access   (Followers: 5)
Open Systems & Information Dynamics     Hybrid Journal   (Followers: 4)
International Journal of Cooperative Information Systems     Hybrid Journal   (Followers: 4)
OJS på dansk     Open Access   (Followers: 4)
Revista Española de Documentación Científica     Open Access   (Followers: 4)
International Journal of Organisational Design and Engineering     Hybrid Journal   (Followers: 3)
HLA News     Full-text available via subscription   (Followers: 3)
SLIS Student Research Journal     Open Access   (Followers: 3)
VRA Bulletin     Open Access   (Followers: 3)
SLIS Connecting     Open Access   (Followers: 3)
Información, Cultura y Sociedad     Open Access   (Followers: 2)
Revista General de Información y Documentación     Open Access   (Followers: 2)
Revue française des sciences de l’information et de la communication     Open Access   (Followers: 2)
Journal of the Southern Association for Information Systems     Open Access   (Followers: 2)
In Monte Artium     Full-text available via subscription   (Followers: 1)
Documentación de las Ciencias de la Información     Open Access   (Followers: 1)
RUIDERAe : Revista de Unidades de Información. Descripción de Experiencias y Resultados Aplicados     Open Access  
Palabra Clave (La Plata)     Open Access  

Bioinformatics
Journal Prestige (SJR): 6.14
Citation Impact (citeScore): 8
Number of Followers: 307  
 
  Hybrid Journal (it can contain Open Access articles)
ISSN (Print) 1367-4803 - ISSN (Online) 1460-2059
Published by Oxford University Press  [425 journals]
  • A curated rotamer library for common post-translational modifications of proteins

      First page: btae444
      Abstract: Motivation: Sidechain rotamer libraries of the common amino acids of a protein are useful for folded protein structure determination and for generating ensembles of intrinsically disordered proteins (IDPs). However, much of protein function is modulated beyond the translated sequence through the introduction of post-translational modifications (PTMs). Results: In this work, we have provided a curated set of side chain rotamers for the most common PTMs derived from the RCSB PDB database, including phosphorylated, methylated, and acetylated sidechains. Our rotamer libraries improve upon existing methods such as SIDEpro, Rosetta, and AlphaFold3 in predicting the experimental structures for PTMs in folded proteins. In addition, we showcase our PTM libraries in full use by generating ensembles with the Monte Carlo Side Chain Entropy (MCSCE) for folded proteins, and combining MCSCE with the Local Disordered Region Sampling algorithms within IDPConformerGenerator for proteins with intrinsically disordered regions. Availability and implementation: The codes for dihedral angle computations and library creation are available at https://github.com/THGLab/ptm_sc.git.
      PubDate: Fri, 12 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae444
      Issue No: Vol. 40, No. 7 (2024)
       
  • isolateR: an R package for generating microbial libraries from Sanger sequencing data

      First page: btae448
      Abstract: Motivation: Sanger sequencing of taxonomic marker genes (e.g. 16S/18S/ITS/rpoB/cpn60) represents the leading method for identifying a wide range of microorganisms including bacteria, archaea, and fungi. However, the manual processing of sequence data and limitations associated with conventional BLAST searches impede the efficient generation of strain libraries essential for cataloging microbial diversity and discovering novel species. Results: isolateR addresses these challenges by implementing a standardized and scalable three-step pipeline that includes: (1) automated batch processing of Sanger sequence files, (2) taxonomic classification via global alignment to type strain databases in accordance with the latest international nomenclature standards, and (3) straightforward creation of strain libraries and handling of clonal isolates, with the ability to set customizable sequence dereplication thresholds and combine data from multiple sequencing runs into a single library. The tool’s user-friendly design also features interactive HTML outputs that simplify data exploration and analysis. Additionally, in silico benchmarking done on two comprehensive human gut genome catalogues (IMGG and Hadza hunter-gatherer populations) showcases the proficiency of isolateR in uncovering and cataloging the nuanced spectrum of microbial diversity, advocating for a more targeted and granular exploration within individual hosts to achieve the highest strain-level resolution possible when generating culture collections. Availability and implementation: isolateR is available at: https://github.com/bdaisley/isolateR.
      PubDate: Thu, 11 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae448
      Issue No: Vol. 40, No. 7 (2024)
       
  • Geometric epitope and paratope prediction

      First page: btae405
      Abstract: Motivation: Identifying the binding sites of antibodies is essential for developing vaccines and synthetic antibodies. In this article, we investigate the optimal representation for predicting the binding sites in the two molecules and emphasize the importance of geometric information. Results: Specifically, we compare different geometric deep learning methods applied to proteins’ inner (I-GEP) and outer (O-GEP) structures. We incorporate 3D coordinates and spectral geometric descriptors as input features to fully leverage the geometric information. Our research suggests that different geometrical representation information is useful for different tasks. Surface-based models are more efficient in predicting the binding of the epitope, while graph models are better in paratope prediction, both achieving significant performance improvements. Moreover, we analyze the impact of structural changes in antibodies and antigens resulting from conformational rearrangements or reconstruction errors. Through this investigation, we showcase the robustness of geometric deep learning methods and spectral geometric descriptors to such perturbations. Availability and implementation: The Python code for the models, together with the data and the processing pipeline, is open-source and available at https://github.com/Marco-Peg/GEP.
      PubDate: Wed, 10 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae405
      Issue No: Vol. 40, No. 7 (2024)
       
  • Weighted centroid trees: a general approach to summarize phylogenies in single-labeled tumor mutation tree inference

      First page: btae120
      Abstract: Motivation: Tumor trees, which depict the evolutionary process of cancer, provide a backbone for discovering recurring evolutionary processes in cancer. While they are not the primary information extracted from genomic data, they are valuable for this purpose. One such extraction method involves summarizing multiple trees into a single representative tree, such as consensus trees or supertrees. Results: We define the “weighted centroid tree problem” to find the centroid tree of a set of single-labeled rooted trees through the following steps: (i) mapping the given trees into the Euclidean space, (ii) computing the weighted centroid matrix of the mapped trees, and (iii) finding the nearest mapped tree (NMTP) to the centroid matrix. We show that this setup encompasses previously studied parent–child and ancestor–descendent metrics as well as the GraPhyC and TuELiP consensus tree algorithms. Moreover, we show that, while the NMTP problem is polynomial-time solvable for the adjacency embedding, it is NP-hard for ancestry and distance mappings. We introduce integer linear programs for NMTP in different setups where we also provide a new algorithm for the case of ancestry embedding called 2-AncL2, that uses a novel weighting scheme for ancestry signals. Our experimental results show that 2-AncL2 has a superior performance compared to available consensus tree algorithms. We also illustrate our setup’s application on providing representative trees for a large real breast cancer dataset, deducing that the cluster centroid trees summarize reliable evolutionary information about the original dataset. Availability and implementation: https://github.com/vasei/WAncILP.
      PubDate: Wed, 10 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae120
      Issue No: Vol. 40, No. 7 (2024)
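
Steps (i) to (iii) of the weighted centroid tree abstract can be illustrated with a minimal Python sketch. It is not the authors' WAncILP code: it assumes the adjacency (parent-child) embedding, represents each tree as a hypothetical child-to-parent dictionary, and, as a deliberate simplification, searches for the nearest tree only among the input trees rather than over all possible trees as the NMTP formulation does. All function names are illustrative.

```python
from typing import Dict, List, Sequence

# Assumed representation: a rooted tree as {child_label: parent_label};
# the root is the one label that never appears as a key.
Tree = Dict[str, str]
Matrix = List[List[float]]


def adjacency_embedding(tree: Tree, labels: Sequence[str]) -> Matrix:
    """Step (i): map a tree to a 0/1 parent-child matrix (a point in R^(n*n))."""
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    mat = [[0.0] * n for _ in range(n)]
    for child, parent in tree.items():
        mat[index[parent]][index[child]] = 1.0
    return mat


def weighted_centroid(mats: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Step (ii): entry-wise weighted mean of the mapped trees."""
    total = sum(weights)
    n = len(mats[0])
    return [
        [sum(w * m[i][j] for m, w in zip(mats, weights)) / total for j in range(n)]
        for i in range(n)
    ]


def squared_distance(a: Matrix, b: Matrix) -> float:
    """Squared Euclidean distance between two matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def nearest_input_tree(trees: Sequence[Tree], weights: Sequence[float],
                       labels: Sequence[str]) -> int:
    """Step (iii), simplified: index of the input tree closest to the centroid."""
    mats = [adjacency_embedding(t, labels) for t in trees]
    centroid = weighted_centroid(mats, weights)
    dists = [squared_distance(m, centroid) for m in mats]
    return min(range(len(trees)), key=dists.__getitem__)


labels = ["r", "a", "b"]
star = {"a": "r", "b": "r"}   # r is the parent of both a and b
chain = {"a": "r", "b": "a"}  # r -> a -> b
print(nearest_input_tree([star, chain], [1.0, 3.0], labels))
```

With the chain tree weighted three times as heavily as the star tree, the centroid lies closer to the chain, so the chain is selected. Restricting the search to the input trees avoids any optimization solver, at the cost of possibly missing a closer tree that is not in the input set.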
       
  • Efficient protein structure archiving using ProteStAr

      First page: btae428
      Abstract: Motivation: The introduction of DeepMind’s AlphaFold 2 enabled the prediction of protein structures at an unprecedented scale. AlphaFold Protein Structure Database and ESM Metagenomic Atlas contain hundreds of millions of structures stored in CIF and/or PDB formats. When compressed with a general-purpose utility like gzip, this translates to tens of terabytes of data, which hinders the effective use of predicted structures in large-scale analyses. Results: Here, we present ProteStAr, a compressor dedicated to CIF/PDB, as well as supplementary PAE files. Its main contribution is a novel approach to predicting atom coordinates on the basis of the previously analyzed atoms. This allows efficient encoding of the coordinates, the largest component of the protein structure files. The compression is lossless by default, though a lossy mode with a controlled maximum error of coordinate reconstruction is also available. Compared to the competing packages, i.e. BinaryCIF, Foldcomp, PDC, our approach offers a superior compression ratio at established reconstruction accuracy. By the efficient use of threads at both compression and decompression stages, the algorithm takes advantage of the multicore architecture of current central processing units and operates with speeds of about 1 GB/s. The presence of Python and C++ APIs further increases the usability of the presented method. Availability and implementation: The source code of ProteStAr is available at https://github.com/refresh-bio/protestar.
      PubDate: Wed, 10 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae428
      Issue No: Vol. 40, No. 7 (2024)
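
The general idea of predicting each coordinate from previously seen values and storing only the residual, with a bounded error in lossy mode, can be sketched as follows. This is a generic predictive-coding illustration, not ProteStAr's predictor or file format: it assumes the simplest possible predictor (the previous quantized value) and a uniform quantization grid, and the function names are hypothetical.

```python
from typing import List, Sequence


def encode(coords: Sequence[float], max_error: float) -> List[int]:
    """Quantize to a grid of width 2*max_error, then store differences
    between consecutive quantized values (previous-value prediction)."""
    step = 2.0 * max_error
    residuals = []
    prev = 0
    for x in coords:
        q = round(x / step)        # nearest grid point, so |x - q*step| <= max_error
        residuals.append(q - prev)  # small integers when neighbours are close
        prev = q
    return residuals


def decode(residuals: Sequence[int], max_error: float) -> List[float]:
    """Invert encode(): accumulate residuals and rescale to coordinates."""
    step = 2.0 * max_error
    out = []
    q = 0
    for r in residuals:
        q += r
        out.append(q * step)
    return out


xs = [12.345, 12.901, 13.377, 11.020]
codes = encode(xs, max_error=0.001)
restored = decode(codes, max_error=0.001)
print(codes)
print(max(abs(a - b) for a, b in zip(xs, restored)))
```

Because consecutive atoms tend to be spatially close, the residuals are typically much smaller than the raw quantized values, which is what makes them cheaper for a downstream entropy coder to store. A structure-aware predictor, as the abstract describes, would aim to shrink the residuals further.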
       
  • GALEON: a comprehensive bioinformatic tool to analyse and visualize gene clusters in complete genomes

      First page: btae439
      Abstract: Motivation: Gene clusters, defined as a set of genes encoding functionally related proteins, are abundant in eukaryotic genomes. Despite the increasing availability of chromosome-level genomes, the comprehensive analysis of gene family evolution remains largely unexplored, particularly for large and highly dynamic gene families or those including very recent family members. These challenges stem from limitations in genome assembly contiguity, particularly in repetitive regions such as large gene clusters. Recent advancements in sequencing technology, such as long reads and chromatin contact mapping, hold promise in addressing these challenges. Results: To facilitate the identification, analysis, and visualization of physically clustered gene family members within chromosome-level genomes, we introduce GALEON, a user-friendly bioinformatic tool. GALEON identifies gene clusters by studying the spatial distribution of pairwise physical distances among gene family members along with the genome-wide gene density. The pipeline also enables the simultaneous analysis and comparison of two gene families and allows the exploration of the relationship between physical and evolutionary distances. This tool offers a novel approach for studying the origin and evolution of gene families. Availability and implementation: GALEON is freely available from https://www.ub.edu/softevol/galeon and https://github.com/molevol-ub/galeon.
      PubDate: Mon, 08 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae439
      Issue No: Vol. 40, No. 7 (2024)
       
  • InstaPrism: an R package for fast implementation of BayesPrism

      First page: btae440
      Abstract: Summary: Computational cell-type deconvolution is an important analytic technique for modeling the compositional heterogeneity of bulk gene expression data. A conceptually new Bayesian approach to this problem, BayesPrism, has recently been proposed and has subsequently been shown to be superior in accuracy and robustness against model misspecifications by independent studies; however, given that BayesPrism relies on Gibbs sampling, it is orders of magnitude more computationally expensive than standard approaches. Here, we introduce the InstaPrism package which re-implements BayesPrism in a derandomized framework by replacing the time-consuming Gibbs sampling step with a fixed-point algorithm. We demonstrate that the new algorithm is effectively equivalent to BayesPrism while providing a considerable speed and memory advantage. Furthermore, the InstaPrism package is equipped with a precompiled, curated set of references tailored for a variety of cancer types, streamlining the deconvolution process. Availability and implementation: The package InstaPrism is freely available at: https://github.com/humengying0907/InstaPrism. The source code and evaluation pipeline used in this paper can be found at: https://github.com/humengying0907/InstaPrismSourceCode.
      PubDate: Fri, 05 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae440
      Issue No: Vol. 40, No. 7 (2024)
       
  • GraphADT: empowering interpretable predictions of acute dermal toxicity with multi-view graph pooling and structure remapping

      First page: btae438
      Abstract: Motivation: Accurate prediction of acute dermal toxicity (ADT) is essential for the safe and effective development of contact drugs. Currently, graph neural networks, a form of deep learning technology, accurately model the structure of compound molecules, enhancing predictions of their ADT. However, many existing methods emphasize atom-level information transfer and overlook crucial data conveyed by molecular bonds and their interrelationships. Additionally, these methods often generate “equal” node representations across the entire graph, failing to accentuate “important” substructures like functional groups, pharmacophores, and toxicophores, thereby reducing interpretability. Results: We introduce a novel model, GraphADT, utilizing structure remapping and multi-view graph pooling (MVPool) technologies to accurately predict compound ADT. Initially, our model applies structure remapping to better delineate bonds, transforming “bonds” into new nodes and “bond-atom-bond” interactions into new edges, thereby reconstructing the compound molecular graph. Subsequently, we use MVPool to amalgamate data from various perspectives, minimizing biases inherent to single-view analyses. Following this, the model generates a robust node ranking collaboratively, emphasizing critical nodes or substructures to enhance model interpretability. Lastly, we apply a graph comparison learning strategy to train both the original and structure remapped molecular graphs, deriving the final molecular representation. Experimental results on public datasets indicate that the GraphADT model outperforms existing state-of-the-art models. The GraphADT model has been demonstrated to effectively predict compound ADT, offering potential guidance for the development of contact drugs and related treatments. Availability and implementation: Our code and data are accessible at: https://github.com/mxqmxqmxq/GraphADT.git.
      PubDate: Thu, 04 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae438
      Issue No: Vol. 40, No. 7 (2024)
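
The structure remapping described in the GraphADT abstract, where bonds become nodes and bond-atom-bond interactions become edges, corresponds to what graph theory calls a line graph. The sketch below shows that transformation on a plain bond list. It is an assumption-laden illustration, not the GraphADT implementation: atoms are integer indices, bond features are omitted, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def remap_bonds_to_nodes(
    bonds: Sequence[Tuple[int, int]],
) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    """Build the line graph of a molecular graph.

    bonds: list of (atom_u, atom_v) pairs; the position in the list is the bond id.
    Returns (nodes, edges): nodes are bond ids, and each edge is
    (bond_i, bond_j, shared_atom) for two bonds that meet at an atom.
    """
    # Collect, for every atom, the bonds incident to it.
    incident: Dict[int, List[int]] = {}
    for bond_id, (u, v) in enumerate(bonds):
        incident.setdefault(u, []).append(bond_id)
        incident.setdefault(v, []).append(bond_id)

    # Every pair of bonds sharing an atom yields one bond-atom-bond edge.
    edges = []
    for atom, bond_ids in incident.items():
        for i, j in combinations(bond_ids, 2):
            edges.append((i, j, atom))
    return list(range(len(bonds))), sorted(edges)


# Heavy-atom skeleton of ethanol: C0-C1 and C1-O2.
nodes, edges = remap_bonds_to_nodes([(0, 1), (1, 2)])
print(nodes)  # [0, 1]
print(edges)  # [(0, 1, 1)]
```

In the ethanol example the two bonds meet at atom 1, so the remapped graph has two nodes joined by a single edge. Keeping the shared atom on each edge preserves which atom mediates the interaction, which a model could use as an edge feature.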
       
  • DualNetGO: a dual network model for protein function prediction via effective feature selection

      First page: btae437
      Abstract: Motivation: Protein–protein interaction (PPI) networks are crucial for automatically annotating protein functions. As multiple PPI networks exist for the same set of proteins that capture properties from different aspects, it is a challenging task to effectively utilize these heterogeneous networks. Recently, several deep learning models have combined PPI networks from all evidence, or concatenated all graph embeddings for protein function prediction. However, the lack of a judicious selection procedure prevents the effective harnessing of information from different PPI networks, as these networks vary in densities, structures, and noise levels. Consequently, combining protein features indiscriminately could increase the noise level, leading to decreased model performance. Results: We develop DualNetGO, a dual-network model comprised of a Classifier and a Selector, to predict protein functions by effectively selecting features from different sources including graph embeddings of PPI networks, protein domain, and subcellular location information. Evaluation of DualNetGO on human and mouse datasets in comparison with other network-based models shows at least 4.5%, 6.2%, and 14.2% improvement on Fmax in BP, MF, and CC gene ontology categories, respectively, for human, and 3.3%, 10.6%, and 7.7% improvement on Fmax for mouse. We demonstrate the generalization capability of our model by training and testing on the CAFA3 data, and show its versatility by incorporating Esm2 embeddings. We further show that our model is insensitive to the choice of graph embedding method and is time- and memory-saving. These results demonstrate that combining a subset of features including PPI networks and protein attributes selected by our model is more effective in utilizing PPI network information than only using one kind of or concatenating graph embeddings from all kinds of PPI networks. Availability and implementation: The source code of DualNetGO and some of the experiment data are available at: https://github.com/georgedashen/DualNetGO.
      PubDate: Thu, 04 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae437
      Issue No: Vol. 40, No. 7 (2024)
       
  • mLiftOver: harmonizing data across Infinium DNA methylation platforms

      First page: btae423
      Abstract: Motivation: Infinium DNA methylation BeadChips are widely used for genome-wide DNA methylation profiling at the population scale. Recent updates to probe content and naming conventions in the EPIC version 2 (EPICv2) arrays have complicated integrating new data with previous Infinium array platforms, such as the MethylationEPIC (EPIC) and the HumanMethylation450 (HM450) BeadChip. Results: We present mLiftOver, a user-friendly tool that harmonizes probe ID, methylation level, and signal intensity data across different Infinium platforms. It manages probe replicates, missing data imputation, and platform-specific bias for accurate data conversion. We validated the tool by applying HM450-based cancer classifiers to EPICv2 cancer data, achieving high accuracy. Additionally, we successfully integrated EPICv2 healthy tissue data with legacy HM450 data for tissue identity analysis and produced consistent copy number profiles in cancer cells. Availability and implementation: mLiftOver is implemented in R and available in the Bioconductor package SeSAMe (version 1.21.13+): https://bioconductor.org/packages/release/bioc/html/sesame.html. Analysis of EPIC and EPICv2 platform-specific bias and high-confidence mapping is available at https://github.com/zhou-lab/InfiniumAnnotationV1/raw/main/Anno/EPICv2/EPICv2ToEPIC_conversion.tsv.gz. The source code is available at https://github.com/zwdzwd/sesame/blob/devel/R/mLiftOver.R under the MIT license.
      PubDate: Thu, 04 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae423
      Issue No: Vol. 40, No. 7 (2024)
       
  • Reverse network diffusion to remove indirect noise for better inference of
           gene regulatory networks

      First page: btae435
      Abstract: Motivation: Gene regulatory networks (GRNs) are vital tools for delineating regulatory relationships between transcription factors and their target genes. The boom in computational biology and various biotechnologies has made inferring GRNs from multi-omics data a hot topic. However, when networks are constructed from gene expression data, they often suffer from a false-positive problem due to the transitive effects of correlation. The presence of spurious noise edges obscures the real gene interactions, which makes downstream analyses, such as detecting gene function modules and predicting disease-related genes, difficult and inefficient. Therefore, there is an urgent and compelling need to develop network denoising methods to improve the accuracy of GRN inference. Results: In this study, we proposed a novel network denoising method named REverse Network Diffusion On Random walks (RENDOR). RENDOR is designed to enhance the accuracy of GRNs afflicted by indirect effects. RENDOR takes noisy networks as input, models higher-order indirect interactions between genes by transitive closure, eliminates false-positive effects using the inverse network diffusion method, and produces refined networks as output. We conducted a comparative assessment of GRN inference accuracy before and after denoising on simulated networks and real GRNs. Our results emphasized that the network derived from RENDOR more accurately and effectively captures gene interactions. This study demonstrates the significance of removing network indirect noise and highlights the effectiveness of the proposed method in enhancing the signal-to-noise ratio of noisy networks. Availability and implementation: The R package RENDOR is provided at https://github.com/Wu-Lab/RENDOR and other source code and data are available at https://github.com/Wu-Lab/RENDOR-reproduce.
      PubDate: Thu, 04 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae435
      Issue No: Vol. 40, No. 7 (2024)
       
  • Unravelling reference bias in ancient DNA datasets

      First page: btae436
      Abstract: Motivation: The alignment of sequencing reads is a critical step in the characterization of ancient genomes. However, reference bias and spurious mappings pose a significant challenge, particularly as cutting-edge wet lab methods generate datasets that push the boundaries of alignment tools. Reference bias occurs when reference alleles are favoured over alternative alleles during mapping, whereas spurious mappings stem from either contamination or when endogenous reads fail to align to their correct position. Previous work has shown that these phenomena are correlated with read length but a more thorough investigation of reference bias and spurious mappings for ancient DNA has been lacking. Here, we use a range of empirical and simulated palaeogenomic datasets to investigate the impacts of mapping tools, quality thresholds, and reference genome on mismatch rates across read lengths. Results: For these analyses, we introduce AMBER, a new bioinformatics tool for assessing the quality of ancient DNA mapping directly from BAM-files and informing on reference bias, read length cut-offs and reference selection. AMBER rapidly and simultaneously computes the sequence read mapping bias in the form of the mismatch rates per read length, cytosine deamination profiles at both CpG and non-CpG sites, fragment length distributions, and genomic breadth and depth of coverage. Using AMBER, we find that mapping algorithms and quality threshold choices dictate reference bias and rates of spurious alignment at different read lengths in a predictable manner, suggesting that optimized mapping parameters for each read length will be a key step in alleviating reference bias and spurious mappings. Availability and implementation: AMBER is available for noncommercial use on GitHub (https://github.com/tvandervalk/AMBER.git). Scripts used to generate and analyse simulated datasets are available on GitHub (https://github.com/sdolenz/refbias_scripts).
      PubDate: Wed, 03 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae436
      Issue No: Vol. 40, No. 7 (2024)
       
  • Pangenome graph layout by Path-Guided Stochastic Gradient Descent

      First page: btae363
      Abstract: Motivation: The increasing availability of complete genomes demands models to study genomic variability within entire populations. Pangenome graphs capture the full genomic similarity and diversity between multiple genomes. In order to understand them, we need to see them. For visualization, we need a human-readable graph layout: a graph embedding in low (e.g. two) dimensional depictions. Due to a pangenome graph’s potentially excessive size, this is a significant challenge. Results: In response, we introduce a novel graph layout algorithm: the Path-Guided Stochastic Gradient Descent (PG-SGD). PG-SGD uses the genomes, represented in the pangenome graph as paths, as an embedded positional system to sample genomic distances between pairs of nodes. This avoids the quadratic cost seen in previous versions of graph drawing by SGD. We show that our implementation efficiently computes the low-dimensional layouts of gigabase-scale pangenome graphs, unveiling their biological features. Availability and implementation: We integrated PG-SGD in ODGI which is released as free software under the MIT open source license. Source code is available at https://github.com/pangenome/odgi.
      PubDate: Wed, 03 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae363
      Issue No: Vol. 40, No. 7 (2024)
       
  • Calib-RT: an open source python package for peptide retention time
           calibration in DIA mass spectrometry data

      First page: btae417
      Abstract: Motivation: The data independent acquisition (DIA) mass spectrometry (MS) method is increasingly popular in the field of proteomics. But the loss of the correspondence between peptide ions and their spectra in DIA makes the identification challenging. One effective approach to reduce false positive identification is to calculate the deviation between the peptide’s estimated retention time (RT) and measured RT. During this process, scaling the spectral library RT into the estimated RT, known as the RT calibration, is a prerequisite for calculating the deviation. Currently, within the DIA algorithm ecosystem, there is a lack of engine-independent and readily usable RT calibration toolkits. Results: In this work, we introduce Calib-RT, an RT calibration method tailored to the characteristics of RT data. This method can achieve nonlinear calibration across various data scales and tolerate a certain level of noise interference. Calib-RT is expected to enrich the open source DIA algorithm toolchain and assist in the development of DIA identification algorithms. Availability and implementation: Calib-RT is released as open source software under the MIT license and can be installed from PyPI as a Python module. The source code is available on GitHub at https://github.com/chenghui03/Calib_RT.
      PubDate: Wed, 03 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae417
      Issue No: Vol. 40, No. 7 (2024)
       
  • MiPRIME: an integrated and intelligent platform for mining primer and
           probe sequences of microbial species

      First page: btae429
      Abstract: Motivation: Accurately detecting pathogenic microorganisms requires effective primer and probe designs. Literature-derived primers are a valuable resource as they have been tested and proven effective in previous research. However, manually mining primers from published texts is time-consuming and limited in species scope. Results: To address these challenges, we have developed MiPRIME, a real-time Microbial Primer Mining platform for primer/probe sequences extraction of pathogenic microorganisms with three highlights: (i) comprehensive integration. Covering >40 million articles and 548 942 organisms, the platform enables high-frequency microbial gene discovery from a global perspective, facilitating user-defined primer design and advancing microbial research. (ii) Using a BioBERT-based text mining model with 98.02% accuracy, greatly reducing information processing time. (iii) Using a primer ranking score, PRscore, for intelligent recommendation of species-specific primers. Overall, MiPRIME is a practical tool for primer mining in the pan-microbial field, saving time and cost of trial-and-error experiments. Availability and implementation: The web platform is available at https://www.ai-bt.com.
      PubDate: Tue, 02 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae429
      Issue No: Vol. 40, No. 7 (2024)
       
  • DODGE: automated point source bacterial outbreak detection using
           cumulative long term genomic surveillance

      First page: btae427
      Abstract: Summary: The reliable and timely recognition of outbreaks is a key component of public health surveillance for foodborne diseases. Whole genome sequencing (WGS) offers high resolution typing of foodborne bacterial pathogens and facilitates the accurate detection of outbreaks. This detection relies on grouping WGS data into clusters at an appropriate genetic threshold. However, methods and tools for selecting and adjusting such thresholds according to the required resolution of surveillance and epidemiological context are lacking. Here we present DODGE (Dynamic Outbreak Detection for Genomic Epidemiology), an algorithm to dynamically select and compare these genetic thresholds. DODGE can analyse expanding datasets over time and clusters that are predicted to correspond to outbreaks (or “investigation clusters”) can be named with established genomic nomenclature systems to facilitate integrated analysis across jurisdictions. DODGE was tested in two real-world Salmonella genomic surveillance datasets of different duration, 2 months from Australia and 9 years from the United Kingdom. In both cases only a minority of isolates were identified as investigation clusters. Two known outbreaks in the United Kingdom dataset were detected by DODGE and were recognized at an earlier timepoint than the outbreaks were reported. These findings demonstrated the potential of the DODGE approach to improve the effectiveness and timeliness of genomic surveillance for foodborne diseases and the effectiveness of the algorithm developed. Availability and implementation: DODGE is freely available at https://github.com/LanLab/dodge and can easily be installed using Conda.
      PubDate: Tue, 02 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae427
      Issue No: Vol. 40, No. 7 (2024)
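
The DODGE abstract notes that outbreak detection relies on grouping isolates into clusters at a genetic threshold. As a generic illustration of why the threshold matters, the sketch below applies single-linkage clustering to pairwise genetic distances. This is a swapped-in textbook technique, not DODGE's algorithm; the function name `cluster_at_threshold`, the isolate labels, and the distances are all hypothetical.

```python
def cluster_at_threshold(distances, threshold):
    """Single-linkage clustering at a distance cutoff (illustrative only).

    `distances` maps isolate pairs (a, b) to a genetic distance such as an
    allele or SNP difference count. Pairs absent from the mapping are
    treated as unlinked.
    """
    isolates = sorted({i for pair in distances for i in pair})
    parent = {i: i for i in isolates}

    def find(i):
        # Follow parent pointers to the cluster root, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair that lies within the threshold.
    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in isolates:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical pairwise distances between four isolates.
distances = {("A", "B"): 2, ("A", "C"): 9, ("B", "C"): 8, ("C", "D"): 4}
for t in (1, 5, 8):
    print(t, cluster_at_threshold(distances, t))
```

With these toy distances, a strict threshold leaves every isolate on its own, a moderate one yields two clusters, and a loose one chains everything together, which is the sensitivity to threshold choice that motivates selecting thresholds dynamically.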
       
  • hopsy — a methods marketplace for convex polytope sampling in Python

      First page: btae430
      Abstract: Summary: Effective collaboration between developers of Bayesian inference methods and users is key to advance our quantitative understanding of biosystems. We here present hopsy, a versatile open-source platform designed to provide convenient access to powerful Markov chain Monte Carlo sampling algorithms tailored to models defined on convex polytopes (CP). Based on the high-performance C++ sampling library HOPS, hopsy inherits its strengths and extends its functionalities with the accessibility of the Python programming language. A versatile plugin-mechanism enables seamless integration with domain-specific models, providing method developers with a framework for testing, benchmarking, and distributing CP samplers to approach real-world inference tasks. We showcase hopsy by solving common and newly composed domain-specific sampling problems, highlighting important design choices. By likening hopsy to a marketplace, we emphasize its role in bringing together users and developers, where users get access to state-of-the-art methods, and developers contribute their own innovative solutions for challenging domain-specific inference problems. Availability and implementation: Sources, documentation and a continuously updated list of sampling algorithms are available at https://jugit.fz-juelich.de/IBG-1/ModSim/hopsy, with Linux, Windows and MacOS binaries at https://pypi.org/project/hopsy/.
      PubDate: Mon, 01 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae430
      Issue No: Vol. 40, No. 7 (2024)
       
  • A multi-bin rarefying method for evaluating alpha diversities in TCR
           sequencing data

      First page: btae431
      Abstract: Motivation: T cell receptors (TCRs) constitute a major component of our adaptive immune system, governing the recognition and response to internal and external antigens. Studying the TCR diversity via sequencing technology is critical for a deeper understanding of immune dynamics. However, library sizes differ substantially across samples, hindering the accurate estimation/comparisons of alpha diversities. To address this, researchers frequently use an overall rarefying approach in which all samples are sub-sampled to an even depth. Despite its pervasive application, its efficacy has never been rigorously assessed. Results: In this paper, we develop an innovative “multi-bin” rarefying approach that partitions samples into multiple bins according to their library sizes, conducts rarefying within each bin for alpha diversity calculations, and performs meta-analysis across bins. Extensive simulations using real-world data highlight the inadequacy of the overall rarefying approach in controlling the confounding effect of library size. Our method proves robust in addressing library size confounding, outperforming competing normalization strategies by achieving better-controlled type-I error rates and enhanced statistical power in association tests. Availability and implementation: The code is available at https://github.com/mli171/MultibinAlpha. The datasets are freely available at https://doi.org/10.21417/B7001Z and https://doi.org/10.21417/AR2019NC.
      PubDate: Mon, 01 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae431
      Issue No: Vol. 40, No. 7 (2024)
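
The first two steps of the “multi-bin” approach in the abstract above (partition samples by library size, then rarefy within each bin before computing alpha diversity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: equal-count binning, rarefying to each bin's smallest library, Shannon entropy as the alpha diversity, and all function names are choices made here, and the meta-analysis across bins is omitted.

```python
import math
import random
from collections import Counter

def shannon(counts):
    """Shannon entropy (natural log) of a collection of clonotype counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def rarefy(clone_counts, depth, rng):
    """Draw `depth` reads without replacement from a clonotype count dict."""
    reads = [clone for clone, n in clone_counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))

def multi_bin_alpha(samples, n_bins, seed=0):
    """Return {sample: (bin index, rarefying depth, Shannon diversity)}."""
    rng = random.Random(seed)
    # Order samples by library size and cut them into groups of similar depth
    # (equal-count bins are an assumption of this sketch).
    ordered = sorted(samples, key=lambda s: sum(samples[s].values()))
    size = math.ceil(len(ordered) / n_bins)
    bins = [ordered[i:i + size] for i in range(0, len(ordered), size)]

    results = {}
    for b, members in enumerate(bins):
        # Rarefy every sample in the bin to the bin's smallest library size.
        depth = min(sum(samples[s].values()) for s in members)
        for s in members:
            sub = rarefy(samples[s], depth, rng)
            results[s] = (b, depth, shannon(sub.values()))
    return results

# Hypothetical TCR clonotype counts for four samples of very different depth.
samples = {
    "s1": {"c1": 6, "c2": 4},
    "s2": {"c1": 5, "c2": 4, "c3": 3},
    "s3": {"c1": 50, "c2": 30, "c3": 20},
    "s4": {"c1": 40, "c2": 40, "c3": 30, "c4": 10},
}
for name, (bin_index, depth, alpha) in multi_bin_alpha(samples, n_bins=2).items():
    print(name, bin_index, depth, round(alpha, 3))
```

Rarefying within bins means the deep samples here are compared at 100 reads rather than being cut down to the 10 reads of the shallowest sample, as a single overall rarefying depth would require.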
       
  • DeepGSEA: explainable deep gene set enrichment analysis for single-cell
           transcriptomic data

      First page: btae434
      Abstract: Motivation: Gene set enrichment (GSE) analysis allows for an interpretation of gene expression through pre-defined gene set databases and is a critical step in understanding different phenotypes. With the rapid development of single-cell RNA sequencing (scRNA-seq) technology, GSE analysis can be performed on fine-grained gene expression data to gain a nuanced understanding of phenotypes of interest. However, with the cellular heterogeneity in single-cell gene profiles, current statistical GSE analysis methods sometimes fail to identify enriched gene sets. Meanwhile, deep learning has gained traction in applications like clustering and trajectory inference in single-cell studies due to its prowess in capturing complex data patterns. However, its use in GSE analysis remains limited, due to interpretability challenges. Results: In this paper, we present DeepGSEA, an explainable deep gene set enrichment analysis approach which leverages the expressiveness of interpretable, prototype-based neural networks to provide an in-depth analysis of GSE. DeepGSEA learns the ability to capture GSE information through our designed classification tasks, and significance tests can be performed on each gene set, enabling the identification of enriched sets. The underlying distribution of a gene set learned by DeepGSEA can be explicitly visualized using the encoded cell and cellular prototype embeddings. We demonstrate the performance of DeepGSEA over commonly used GSE analysis methods by examining their sensitivity and specificity with four simulation studies. In addition, we test our model on three real scRNA-seq datasets and illustrate the interpretability of DeepGSEA by showing how its results can be explained. Availability and implementation: https://github.com/Teddy-XiongGZ/DeepGSEA
      PubDate: Mon, 01 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae434
      Issue No: Vol. 40, No. 7 (2024)
       
  • SFINN: inferring gene regulatory network from single-cell and spatial
           transcriptomic data with shared factor neighborhood and integrated neural
           network

      First page: btae433
      Abstract: Motivation: The rise of single-cell RNA sequencing (scRNA-seq) technology presents new opportunities for constructing detailed cell type-specific gene regulatory networks (GRNs) to study cell heterogeneity. However, challenges caused by noise, technical errors, and dropout phenomena in scRNA-seq data pose significant obstacles to GRN inference, making the design of accurate GRN inference algorithms still essential. The recent growth of both single-cell and spatial transcriptomic sequencing data enables the development of supervised deep learning methods to infer GRNs on these diverse single-cell datasets. Results: In this study, we introduce a novel deep learning framework based on shared factor neighborhood and integrated neural network (SFINN) for inferring potential interactions and causalities between transcription factors and target genes from single-cell and spatial transcriptomic data. SFINN utilizes shared factor neighborhood to construct a cellular neighborhood network based on gene expression data and additionally integrates a cellular network generated from spatial location information. Subsequently, the cell adjacency matrix and gene pair expression are fed into an integrated neural network framework consisting of a graph convolutional neural network and a fully-connected neural network to determine whether the genes interact. Performance evaluation in the tasks of gene interaction and causality prediction against the existing GRN reconstruction algorithms demonstrates the usability and competitiveness of SFINN across different kinds of data. SFINN can be applied to infer GRNs from conventional single-cell sequencing data and spatial transcriptomic data. Availability and implementation: SFINN can be accessed at GitHub: https://github.com/JGuan-lab/SFINN.
      PubDate: Mon, 01 Jul 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae433
      Issue No: Vol. 40, No. 7 (2024)
       
  • msproteomics sitereport: reporting DIA-MS phosphoproteomics experiments at
           site level with ease

      First page: btae432
      Abstract: Summary: Identification and quantification of phosphorylation sites are essential for biological interpretation of a phosphoproteomics experiment. For data independent acquisition mass spectrometry-based (DIA-MS) phosphoproteomics, extracting a site-level report from the output of current processing software is not straightforward as multiple peptides might contribute to a single site, multiple phosphorylation sites can occur on the same peptides, and protein isoforms complicate site specification. Currently only limited support is available from a commercial software package via a platform-specific solution with a rather simple site quantification method. Here, we present sitereport, a software tool implemented in an extendable Python package called msproteomics to report phosphosites and phosphopeptides from a DIA-MS phosphoproteomics experiment with a proven quantification method called MaxLFQ. We demonstrate the use of sitereport for downstream data analysis at site level, allowing benchmarking different DIA-MS processing software tools. Availability and implementation: sitereport is available as a command line tool in the Python package msproteomics, released under the Apache License 2.0 and available from the Python Package Index (PyPI) at https://pypi.org/project/msproteomics and GitHub at https://github.com/tvpham/msproteomics.
      PubDate: Sat, 29 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae432
      Issue No: Vol. 40, No. 7 (2024)
       
  • deTELpy: Python package for high-throughput detection of amino acid
           substitutions in mass spectrometry datasets

      First page: btae424
      Abstract: Motivation: Errors in the processing of genetic information during protein synthesis can lead to phenotypic mutations, such as amino acid substitutions, e.g. by transcription or translation errors. While genetic mutations can be readily identified using DNA sequencing, and mutations due to transcription errors by RNA sequencing, translation errors can only be identified proteome-wide using mass spectrometry. Results: Here, we provide a Python package implementation of a high-throughput pipeline to detect amino acid substitutions in mass spectrometry datasets. Our tools enable users to process hundreds of mass spectrometry datasets in batch mode to detect amino acid substitutions and calculate codon-specific and site-specific translation error rates. deTELpy will facilitate the systematic understanding of amino acid misincorporation rates (translation error rates), and the inference of error models across organisms and under stress conditions, such as drug treatment or disease conditions. Availability and implementation: deTELpy is implemented in Python 3 and is freely available with detailed documentation and practical examples at https://git.mpi-cbg.de/tothpetroczylab/detelpy and https://pypi.org/project/deTELpy/ and can be easily installed via pip install deTELpy.
      PubDate: Fri, 28 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae424
      Issue No: Vol. 40, No. 7 (2024)
       
  • ScType enables fast and accurate cell type identification from spatial
           transcriptomics data

      First page: btae426
      Abstract: Summary: The limited resolution of spatial transcriptomics (ST) assays in the past has led to the development of cell type annotation methods that separate the convolved signal based on available external atlas data. In light of the rapidly increasing resolution of the ST assay technologies, we made available and investigated the performance of a deconvolution-free marker-based cell annotation method called scType. In contrast to existing methods, the spatial application of scType does not require computationally strenuous deconvolution, nor large single-cell reference atlases. We show that scType enables ultra-fast and accurate identification of abundant cell types from ST data, especially when a large enough panel of genes is detected. Examples of such assays are Visium and Slide-seq, which currently offer the best trade-off between high resolution and number of genes detected by the assay for cell type annotation. Availability and implementation: scType source code in R and Python for spatial data is openly available on GitHub (https://github.com/kris-nader/sp-type or https://github.com/kris-nader/sc-type-py). Step-by-step tutorials for R and Python spatial data analysis can be found at https://github.com/kris-nader/sp-type and https://github.com/kris-nader/sc-type-py/blob/main/spatial_tutorial.md, respectively.
      PubDate: Thu, 27 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae426
      Issue No: Vol. 40, No. 7 (2024)
       
  • Assessing citation integrity in biomedical publications: corpus annotation
           and NLP models

      First page: btae420
      Abstract: Motivation: Citations have a fundamental role in scholarly communication and assessment. Citation accuracy and transparency are crucial for the integrity of scientific evidence. In this work, we focus on quotation errors, errors in citation content that can distort the scientific evidence and that are hard to detect for humans. We construct a corpus and propose natural language processing (NLP) methods to identify such errors in biomedical publications. Results: We manually annotated 100 highly-cited biomedical publications (reference articles) and citations to them. The annotation involved labeling citation context in the citing article, relevant evidence sentences in the reference article, and the accuracy of the citation. A total of 3063 citation instances were annotated (39.18% with accuracy errors). For NLP, we combined a sentence retriever with a fine-tuned claim verification model to label citations as ACCURATE, NOT_ACCURATE, or IRRELEVANT. We also explored few-shot in-context learning with generative large language models. The best performing model (which uses citation sentences as citation context, the BM25 model with MonoT5 reranker for retrieving top-20 sentences, and a fine-tuned MultiVerS model for accuracy label classification) yielded 0.59 micro-F1 and 0.52 macro-F1 score. GPT-4 in-context learning performed better in identifying accurate citations, but it lagged for erroneous citations (0.65 micro-F1, 0.45 macro-F1). Citation quotation errors are often subtle, and it is currently challenging for NLP models to identify erroneous citations. With further improvements, the models could serve to improve citation quality and accuracy. Availability and implementation: We make the corpus and the best-performing NLP model publicly available at https://github.com/ScienceNLP-Lab/Citation-Integrity/.
      PubDate: Wed, 26 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae420
      Issue No: Vol. 40, No. 7 (2024)
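
The abstract above reports both micro-F1 and macro-F1, and notes that one system scored higher on the former but lower on the latter. The sketch below shows how the two averages can diverge when a classifier favours the majority label. The gold and predicted labels are invented for illustration and are not drawn from the paper's corpus; only the three label names come from the abstract.

```python
def f1_scores(gold, pred, labels):
    """Return (micro-F1, macro-F1) for single-label classification."""
    per_label = []
    tp_all = fp_all = fn_all = 0
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
        denom = 2 * tp + fp + fn
        # A label that is never gold and never predicted contributes 0 here.
        per_label.append(2 * tp / denom if denom else 0.0)
    # Micro-F1 pools the counts over labels; macro-F1 averages per-label F1.
    denom_all = 2 * tp_all + fp_all + fn_all
    micro = 2 * tp_all / denom_all if denom_all else 0.0
    macro = sum(per_label) / len(per_label)
    return micro, macro

labels = ["ACCURATE", "NOT_ACCURATE", "IRRELEVANT"]
# Invented example: the classifier over-predicts the majority label.
gold = ["ACCURATE"] * 6 + ["NOT_ACCURATE"] * 3 + ["IRRELEVANT"]
pred = ["ACCURATE"] * 6 + ["ACCURATE", "ACCURATE", "NOT_ACCURATE"] + ["ACCURATE"]

micro, macro = f1_scores(gold, pred, labels)
print(f"micro-F1 = {micro:.2f}, macro-F1 = {macro:.2f}")
# prints "micro-F1 = 0.70, macro-F1 = 0.43"
```

In this single-label setting micro-F1 equals plain accuracy (7 of 10 correct), while macro-F1 is pulled down by the two minority labels, which is why the two numbers are usually reported together for imbalanced label sets.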
       
  • PredGCN: a Pruning-enabled Gene-Cell Net for automatic cell annotation of
           single cell transcriptome data

      First page: btae421
      Abstract: Motivation: The annotation of cell types from single-cell transcriptomics is essential for understanding the biological identity and functionality of cellular populations. Although manual annotation remains the gold standard, the advent of automatic pipelines has become crucial for scalable, unbiased, and cost-effective annotations. Nonetheless, the effectiveness of these automatic methods, particularly those employing deep learning, significantly depends on the architecture of the classifier and the quality and diversity of the training datasets. Results: To address these limitations, we present a Pruning-enabled Gene-Cell Net (PredGCN) incorporating a Coupled Gene-Cell Net (CGCN) to enable representation learning and information storage. PredGCN integrates a Gene Splicing Net (GSN) and a Cell Stratification Net (CSN), employing a pruning operation (PrO) to dynamically tackle the complexity of heterogeneous cell identification. Among them, GSN leverages multiple statistical and hypothesis-driven feature extraction methods to selectively assemble genes with specificity for scRNA-seq data while CSN unifies elements based on diverse region demarcation principles, exploiting the representations from GSN and precise identification from different regional homogeneity perspectives. Furthermore, we develop a multi-objective Pareto pruning operation (Pareto PrO) to expand the dynamic capabilities of CGCN, optimizing the sub-network structure for accurate cell type annotation. Multiple comparison experiments on real scRNA-seq datasets from various species have demonstrated that PredGCN surpasses existing state-of-the-art methods, including its scalability to cross-species datasets. Moreover, PredGCN can uncover unknown cell types and provide functional genomic analysis by quantifying the influence of genes on cell clusters, bringing new insights into cell type identification and characterizing scRNA-seq data from different perspectives. Availability and implementation: The source code is available at https://github.com/IrisQi7/PredGCN and test data is available at https://figshare.com/articles/dataset/PredGCN/25251163.
      PubDate: Wed, 26 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae421
      Issue No: Vol. 40, No. 7 (2024)
       
  • Detecting gene–environment interactions from multiple continuous
           traits

      First page: btae419
      Abstract: Motivation: Genetic variants present differential effects on humans according to various environmental exposures, the so-called “gene–environment interactions” (GxE). Many diseases can be diagnosed with multiple traits, such as obesity, diabetes, and dyslipidemia. I developed a multivariate scale test (MST) for detecting the GxE of a disease with several continuous traits. Given a significant MST result, I continued to search for which trait and which E enriched the GxE signals. Simulation studies were performed to compare MST with the univariate scale test (UST). Results: MST can gain more power than UST because of (1) integrating more traits with GxE information and (2) the less harsh penalty on multiple testing. However, if only a few traits account for GxE, MST may lose power due to aggregating non-informative traits into the test statistic. As an example, MST was applied to a discovery set of 93 708 Taiwan Biobank (TWB) individuals and a replication set of 25 200 TWB individuals. From among 2 570 487 SNPs with minor allele frequencies ≥5%, MST identified 18 independent variance quantitative trait loci (P < 2.4E−9 in the discovery cohort and P < 2.8E−5 in the replication cohort) and 41 GxE signals (P < .00027) based on eight trait domains (including 29 traits). Availability and implementation: https://github.com/WanYuLin/Multivariate-scale-test-MST-
      PubDate: Tue, 25 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae419
      Issue No: Vol. 40, No. 7 (2024)
       
  • Document-level biomedical relation extraction via hierarchical tree graph
           and relation segmentation module

      First page: btae418
      Abstract: Motivation: Biomedical relation extraction at the document level (Bio-DocRE) involves extracting relation instances from biomedical texts that span multiple sentences, often containing various entity concepts such as genes, diseases, chemicals, variants, etc. Currently, this task is usually implemented based on graphs or transformers. However, most work directly models entity features to relation prediction, ignoring the effectiveness of entity pair information as an intermediate state for relation prediction. In this article, we decouple this task into a three-stage process to capture sufficient information for improving relation prediction. Results: We propose an innovative framework HTGRS for Bio-DocRE, which constructs a hierarchical tree graph (HTG) to integrate key information sources in the document, achieving relation reasoning based on entity. In addition, inspired by the idea of semantic segmentation, we conceptualize the task as a table-filling problem and develop a relation segmentation (RS) module to enhance relation reasoning based on the entity pair. Extensive experiments on three datasets show that the proposed framework outperforms the state-of-the-art methods and achieves superior performance. Availability and implementation: Our source code is available at https://github.com/passengeryjy/HTGRS.
      PubDate: Tue, 25 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae418
      Issue No: Vol. 40, No. 7 (2024)
       
  • Protein interaction explorer (PIE): a comprehensive platform for
           navigating protein–protein interactions and ligand binding pockets

      First page: btae414
      Abstract: Summary: Protein Interaction Explorer (PIE) is a new web-based tool integrated to our database iPPI-DB, specifically crafted to support structure-based drug discovery initiatives focused on protein–protein interactions (PPIs). Drawing upon extensive structural data encompassing thousands of heterodimer complexes, including those with successful ligands, PIE provides a comprehensive suite of tools dedicated to aid decision-making in PPI drug discovery. PIE enables researchers/bioinformaticians to identify and characterize crucial factors such as the presence of binding pockets or functional binding sites at the interface, predicting hot spots, and foreseeing similar protein-embedded pockets for potential repurposing efforts. Availability and implementation: PIE is user-friendly and readily accessible at https://ippidb.pasteur.fr/targetcentric/. It relies on the NGL visualizer.
      PubDate: Tue, 25 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae414
      Issue No: Vol. 40, No. 7 (2024)
       
  • ADMET-AI: a machine learning ADMET platform for evaluation of large-scale
           chemical libraries

      First page: btae416
      Abstract: Motivation: The emergence of large chemical repositories and combinatorial chemical spaces, coupled with high-throughput docking and generative AI, have greatly expanded the chemical diversity of small molecules for drug discovery. Selecting compounds for experimental validation requires filtering these molecules based on favourable druglike properties, such as Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET). Results: We developed ADMET-AI, a machine learning platform that provides fast and accurate ADMET predictions both as a website and as a Python package. ADMET-AI has the highest average rank on the TDC ADMET Leaderboard, and it is currently the fastest web-based ADMET predictor, with a 45% reduction in time compared to the next fastest public ADMET web server. ADMET-AI can also be run locally with predictions for one million molecules taking just 3.1 h. Availability and implementation: The ADMET-AI platform is freely available both as a web server at admet.ai.greenstonebio.com and as an open-source Python package for local batch prediction at github.com/swansonk14/admet_ai (also archived on Zenodo at doi.org/10.5281/zenodo.10372930). All data and models are archived on Zenodo at doi.org/10.5281/zenodo.10372418.
      PubDate: Mon, 24 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae416
      Issue No: Vol. 40, No. 7 (2024)
       
  • Optimizing data integration improves gene regulatory network inference in
           Arabidopsis thaliana

      First page: btae415
      Abstract: Motivation: Gene regulatory networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior data integration in the inference process. Results: We address this issue for two regression-based GRN inference models, a weighted random forest (weightedRF) and a generalized linear model estimated under a weighted LASSO penalty with stability selection (weightedLASSO). These approaches are applied to data from the root response to nitrate induction in Arabidopsis thaliana. For each gene, we measure how the integration of transcription factor binding motifs influences model prediction. We propose a new approach, DIOgene, that uses model prediction error and a simulated null hypothesis in order to optimize data integration strength in a hypothesis-driven, gene-specific manner. This integration scheme reveals a strong diversity of optimal integration intensities between genes, and offers good performance in minimizing prediction error as well as retrieving experimental interactions. Experimental results show that DIOgene compares favorably against state-of-the-art approaches and allows the recovery of master regulators of nitrate induction. Availability and implementation: The R code and notebooks demonstrating the use of the proposed approaches are available in the repository https://github.com/OceaneCsn/integrative_GRN_N_induction
      PubDate: Mon, 24 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae415
      Issue No: Vol. 40, No. 7 (2024)
       
  • FastHPOCR: pragmatic, fast, and accurate concept recognition using the
           human phenotype ontology

      First page: btae406
      Abstract: Motivation: Human Phenotype Ontology (HPO)-based phenotype concept recognition (CR) underpins a faster and more effective mechanism to create patient phenotype profiles or to document novel phenotype-centred knowledge statements. While the increasing adoption of large language models (LLMs) for natural language understanding has led to several LLM-based solutions, we argue that their intrinsic resource-intensive nature is not suitable for realistic management of the phenotype CR lifecycle. Consequently, we propose to go back to the basics and adopt a dictionary-based approach that enables both an immediate refresh of the ontological concepts as well as efficient re-analysis of past data. Results: We developed a dictionary-based approach that uses a pre-built large collection of clusters of morphologically equivalent tokens to address lexical variability, together with a more effective CR step that reduces entity boundary detection strictly to candidates consisting of tokens belonging to ontology concepts. Our method achieves state-of-the-art results (0.76 F1 on the GSC+ corpus) and a processing efficiency of 10 000 publication abstracts in 5 s. Availability and implementation: FastHPOCR is available as a Python package installable via pip. The source code is available at https://github.com/tudorgroza/fast_hpo_cr. A Java implementation of FastHPOCR will be made available as part of the Fenominal Java library available at https://github.com/monarch-initiative/fenominal. The up-to-date GSC-2024 corpus is available at https://github.com/tudorgroza/code-for-papers/tree/main/gsc-2024.
      PubDate: Mon, 24 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae406
      Issue No: Vol. 40, No. 7 (2024)
       
  • hictk: blazing fast toolkit to work with .hic and .cool files

      First page: btae408
      Abstract: Motivation: Hi-C is gaining prominence as a method for mapping genome organization. With declining sequencing costs and a growing demand for higher-resolution data, efficient tools for processing Hi-C datasets at different resolutions are crucial. Over the past decade, the .hic and Cooler file formats have become the de-facto standard to store interaction matrices produced by Hi-C experiments in binary format. Interoperability issues make it unnecessarily difficult to convert between the two formats and to develop applications that can process each format natively. Results: We developed hictk, a toolkit that can transparently operate on .hic and .cool files with excellent performance. The toolkit is written in C++ and consists of a C++ library with Python and R bindings as well as CLI tools to perform common operations directly from the shell, including converting between .hic and .mcool formats. We benchmark the performance of hictk and compare it with other popular tools and libraries. We conclude that hictk significantly outperforms existing tools while providing the flexibility of natively working with both file formats without code duplication. Availability and implementation: The hictk library, Python bindings and CLI tools are released under the MIT license as a multi-platform application available at github.com/paulsengroup/hictk. Pre-built binaries for Linux and macOS are available on bioconda. Python bindings for hictk are available on GitHub at github.com/paulsengroup/hictkpy, while R bindings are available on GitHub at github.com/paulsengroup/hictkR.
      PubDate: Mon, 24 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae408
      Issue No: Vol. 40, No. 7 (2024)
       
  • HGTDR: Advancing drug repurposing with heterogeneous graph transformers

      First page: btae349
      Abstract: Motivation: Drug repurposing is a viable solution for reducing the time and cost associated with drug development. However, thus far, the proposed drug repurposing approaches still need to meet expectations. Therefore, it is crucial to offer a systematic approach for drug repurposing to achieve cost savings and enhance human lives. In recent years, using biological network-based methods for drug repurposing has generated promising results. Nevertheless, these methods have limitations. Primarily, the scope of these methods is generally limited concerning the size and variety of data they can effectively handle. Another issue arises from the treatment of heterogeneous data, which needs to be addressed or converted into homogeneous data, leading to a loss of information. A significant drawback is that most of these approaches lack end-to-end functionality, necessitating manual implementation and expert knowledge in certain stages. Results: We propose a new solution, Heterogeneous Graph Transformer for Drug Repurposing (HGTDR), to address the challenges associated with drug repurposing. HGTDR is a three-step approach for knowledge graph-based drug repurposing: (1) constructing a heterogeneous knowledge graph, (2) utilizing a heterogeneous graph transformer network, and (3) computing relationship scores using a fully connected network. By leveraging HGTDR, users gain the ability to manipulate input graphs, extract information from diverse entities, and obtain their desired output. In the evaluation step, we demonstrate that HGTDR performs comparably to previous methods. Furthermore, we review medical studies to validate our method’s top 10 drug repurposing suggestions, which have exhibited promising results. We also demonstrated HGTDR’s capability to predict other types of relations through numerical and experimental validation, such as drug–protein and disease–protein inter-relations. Availability and implementation: The source code and data are available at https://github.com/bcb-sut/HGTDR and http://git.dml.ir/BCB/HGTDR
      PubDate: Mon, 24 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae349
      Issue No: Vol. 40, No. 7 (2024)
       
  • GENTANGLE: integrated computational design of gene entanglements

      First page: btae380
      Abstract: Summary: The design of two overlapping genes in a microbial genome is an emerging technique for adding more reliable control mechanisms in engineered organisms for increased stability. The design of functional overlapping gene pairs is a challenging procedure, and computational design tools are used to improve the efficiency to deploy successful designs in genetically engineered systems. GENTANGLE (Gene Tuples ArraNGed in overLapping Elements) is a high-performance containerized pipeline for the computational design of two overlapping genes translated in different reading frames of the genome. This new software package can be used to design and test gene entanglements for microbial engineering projects using arbitrary sets of user-specified gene pairs. Availability and implementation: The GENTANGLE source code and its submodules are freely available on GitHub at https://github.com/BiosecSFA/gentangle. The DATANGLE (DATA for genTANGLE) repository contains related data and results and is freely available on GitHub at https://github.com/BiosecSFA/datangle. The GENTANGLE container is freely available on Singularity Cloud Library at https://cloud.sylabs.io/library/khyox/gentangle/gentangle.sif. The GENTANGLE repository wiki (https://github.com/BiosecSFA/gentangle/wiki), website (https://biosecsfa.github.io/gentangle/), and user manual contain detailed instructions on how to use the different components of software and data, including examples and reproducing the results. The code is licensed under the GNU Affero General Public License version 3 (https://www.gnu.org/licenses/agpl.html).
      PubDate: Fri, 21 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae380
      Issue No: Vol. 40, No. 7 (2024)
       
  • Surface-based multimodal protein–ligand binding affinity prediction

      First page: btae413
      Abstract: Motivation: In the field of drug discovery, accurately and effectively predicting the binding affinity between proteins and ligands is crucial for drug screening and optimization. However, current research primarily utilizes representations based on sequence or structure to predict protein–ligand binding affinity, with relatively less study on protein surface information, which is crucial for protein–ligand interactions. Moreover, when dealing with multimodal information of proteins, traditional approaches typically concatenate features from different modalities in a straightforward manner without considering the heterogeneity among them, which results in an inability to effectively exploit the complementarity between modalities. Results: We introduce a novel multimodal feature extraction (MFE) framework that, for the first time, incorporates information from protein surfaces, 3D structures, and sequences, and uses a cross-attention mechanism for feature alignment between different modalities. Experimental results show that our method achieves state-of-the-art performance in predicting protein–ligand binding affinity. Furthermore, we conduct ablation studies that demonstrate the effectiveness and necessity of protein surface information and multimodal feature alignment within the framework. Availability and implementation: The source code and data are available at https://github.com/Sultans0fSwing/MFE.
      PubDate: Fri, 21 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae413
      Issue No: Vol. 40, No. 7 (2024)
       
  • BAllC and BAllCools: efficient formatting and operating for single-cell
           DNA methylation data

      First page: btae404
      Abstract: Motivation: With single-cell DNA methylation studies yielding vast datasets, existing data formats struggle with the unique challenges of storage and efficient operations, highlighting a need for improved solutions. Results: BAllC (Binary All Cytosines) emerges as a tailored format for methylation data, addressing these challenges. BAllCools, its complementary software toolkit, enhances parsing, indexing, and querying capabilities, promising superior operational speeds and reduced storage needs. Availability and implementation: https://github.com/jksr/ballcools
      PubDate: Fri, 21 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae404
      Issue No: Vol. 40, No. 7 (2024)
       
  • SEraster: a rasterization preprocessing framework for scalable spatial
           omics data analysis

      First page: btae412
      Abstract: Motivation: Spatial omics data demand computational analysis but many analysis tools have computational resource requirements that increase with the number of cells analyzed. This presents scalability challenges as researchers use spatial omics technologies to profile millions of cells. Results: To enhance the scalability of spatial omics data analysis, we developed a rasterization preprocessing framework called SEraster that aggregates cellular information into spatial pixels. We apply SEraster to both real and simulated spatial omics data prior to spatial variable gene expression analysis to demonstrate that such preprocessing can reduce computational resource requirements while maintaining high performance, including as compared to other down-sampling approaches. We further integrate SEraster with existing analysis tools to characterize cell-type spatial co-enrichment across length scales. Finally, we apply SEraster to enable analysis of a mouse pup spatial omics dataset with over a million cells to identify tissue-level and cell-type-specific spatially variable genes as well as spatially co-enriched cell types that recapitulate expected organ structures. Availability and implementation: SEraster is implemented as an R package on GitHub (https://github.com/JEFworks-Lab/SEraster) with additional tutorials at https://JEF.works/SEraster.
      PubDate: Thu, 20 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae412
      Issue No: Vol. 40, No. 7 (2024)
       
  • MethParquet: an R package for rapid and efficient DNA methylation
           association analysis adopting Apache Parquet

      First page: btae410
      Abstract: Summary: Genome-wide DNA methylation (DNAm) profiling is indispensable for unveiling how DNAm regulates biological pathways and individual phenotypes. However, managing and analyzing extensive DNAm data generated from large cohort studies present computational obstacles. Apache Parquet is a data file format that allows for efficient data storage, retrieval, and manipulation, alleviating computational hurdles associated with conventional row-based formats. We here introduce MethParquet, the first R package leveraging the columnar Parquet format for efficient DNAm data analysis. It can be used for data extraction, methylation risk score calculation, epigenome-wide association analyses, and other standard post-quality control tasks. The package flexibly implements diverse regression models. Via a public methylation dataset, we show the efficiency of this package in reducing running time and RAM usage in large-scale EWAS. Availability and implementation: The MethParquet R package is publicly available on the GitHub repository https://github.com/ZWangTen/MethParquet. It includes a vignette and a toy dataset derived from a public resource.
      PubDate: Wed, 19 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae410
      Issue No: Vol. 40, No. 7 (2024)
       
  • A deep learning method to predict bacterial ADP-ribosyltransferase toxins

      First page: btae378
      Abstract: Motivation: ADP-ribosylation is a critical modification involved in regulating diverse cellular processes, including chromatin structure regulation, RNA transcription, and cell death. Bacterial ADP-ribosyltransferase toxins (bARTTs) serve as potent virulence factors that orchestrate the manipulation of host cell functions to facilitate bacterial pathogenesis. Despite their pivotal role, the bioinformatic identification of novel bARTTs poses a formidable challenge due to limited verified data and the inherent sequence diversity among bARTT members. Results: We proposed a deep learning-based model, ARTNet, specifically engineered to predict bARTTs from bacterial genomes. Initially, we introduced an effective data augmentation method to address the issue of data scarcity in training ARTNet. Subsequently, we employed a data optimization strategy by utilizing ART-related domain subsequences instead of the primary full sequences, thereby significantly enhancing the performance of ARTNet. ARTNet achieved a Matthews correlation coefficient (MCC) of 0.9351 and an F1-score (macro) of 0.9666 on repeated independent test datasets, outperforming three other deep learning models and six traditional machine learning models in terms of time efficiency and accuracy. Furthermore, we empirically demonstrated the ability of ARTNet to predict novel bARTTs across domain superfamilies without sequence similarity. We anticipate that ARTNet will greatly facilitate the screening and identification of novel bARTTs from bacterial genomes. Availability and implementation: ARTNet is publicly accessible at http://www.mgc.ac.cn/ARTNet/. The source code of ARTNet is freely available at https://github.com/zhengdd0422/ARTNet/.
      PubDate: Mon, 17 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae378
      Issue No: Vol. 40, No. 7 (2024)
       
  • D’or: deep orienter of protein–protein interaction networks

      First page: btae355
      Abstract: Motivation: Protein–protein interactions (PPIs) provide the skeleton for signal transduction in the cell. Current PPI measurement techniques do not provide information on their directionality which is critical for elucidating signaling pathways. To date, there are hundreds of thousands of known PPIs in public databases, yet only a small fraction of them have an assigned direction. This information gap calls for computational approaches for inferring the directionality of PPIs, aka network orientation. Results: In this work, we propose a novel deep learning approach for PPI network orientation. Our method first generates a set of proximity scores between a protein interaction and sets of cause and effect proteins using a network propagation procedure. Each of these score sets is fed, one at a time, to a deep set encoder whose outputs are used as features for predicting the interaction’s orientation. On a comprehensive dataset of oriented PPIs taken from five different sources, we achieve an area under the precision–recall curve of 0.89–0.92, outperforming previous methods. We further demonstrate the utility of the oriented network in prioritizing cancer driver genes and disease genes. Availability and implementation: D’or is implemented in Python and is publicly available at https://github.com/pirakd/DeepOrienter.
      PubDate: Tue, 11 Jun 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae355
      Issue No: Vol. 40, No. 7 (2024)
       
  • Representations of lipid nanoparticles using large language models for
           transfection efficiency prediction

      First page: btae342
      Abstract: Motivation: Lipid nanoparticles (LNPs) are the most widely used vehicles for mRNA vaccine delivery. The structure of the lipids composing the LNPs can have a major impact on the effectiveness of the mRNA payload. Several properties should be optimized to improve delivery and expression including biodegradability, synthetic accessibility, and transfection efficiency. Results: To optimize LNPs, we developed and tested models that enable the virtual screening of LNPs with high transfection efficiency. Our best method uses the lipid Simplified Molecular-Input Line-Entry System (SMILES) as inputs to a large language model. Large language model-generated embeddings are then used by a downstream gradient-boosting classifier. As we show, our method can more accurately predict lipid properties, which could lead to higher efficiency and reduced experimental time and costs. Availability and implementation: Code and data links available at: https://github.com/Sanofi-Public/LipoBART.
      PubDate: Wed, 29 May 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae342
      Issue No: Vol. 40, No. 7 (2024)
       
  • CodonBERT: a BERT-based architecture tailored for codon optimization using
           the cross-attention mechanism

      First page: btae330
      Abstract: Motivation: Due to the varying delivery methods of mRNA vaccines, codon optimization plays a critical role in vaccine design to improve the stability and expression of proteins in specific tissues. Considering the many-to-one relationship between synonymous codons and amino acids, the number of mRNA sequences encoding the same amino acid sequence could be enormous. Finding stable and highly expressed mRNA sequences from the vast sequence space using in silico methods can generally be viewed as a path-search problem or a machine translation problem. However, current deep learning-based methods inspired by machine translation may have some limitations, such as recurrent neural networks, which have a weak ability to capture the long-term dependencies of codon preferences. Results: We develop a BERT-based architecture that uses the cross-attention mechanism for codon optimization. In CodonBERT, the codon sequence is randomly masked with each codon serving as a key and a value. In the meantime, the amino acid sequence is used as the query. CodonBERT was trained on high-expression transcripts from Human Protein Atlas mixed with different proportions of high codon adaptation index codon sequences. The result showed that CodonBERT can effectively capture the long-term dependencies between codons and amino acids, suggesting that it can be used as a customized training framework for specific optimization targets. Availability and implementation: CodonBERT is freely available on https://github.com/FPPGroup/CodonBERT.
      PubDate: Fri, 24 May 2024 00:00:00 GMT
      DOI: 10.1093/bioinformatics/btae330
      Issue No: Vol. 40, No. 7 (2024)
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 


African Journal of Library, Archives and Information Science     Full-text available via subscription   (Followers: 72)
Archival Science     Hybrid Journal   (Followers: 70)
Communicate : Journal of Library and Information Science     Full-text available via subscription   (Followers: 70)
LIBER Quarterly : The Journal of the Association of European Research Libraries     Open Access   (Followers: 69)
027.7 Zeitschrift für Bibliothekskultur / Journal for Library Culture     Open Access   (Followers: 69)
Journal of Interlibrary Loan Document Delivery & Electronic Reserve     Hybrid Journal   (Followers: 68)
Ethics and Information Technology     Hybrid Journal   (Followers: 66)
Journal of the Canadian Health Libraries Association / Journal de l'Association des bibliothèques de la santé du Canada     Open Access   (Followers: 66)
Practical Academic Librarianship : The International Journal of the SLA Academic Division     Open Access   (Followers: 65)
Library Philosophy and Practice     Open Access   (Followers: 65)
MIS Quarterly : Management Information Systems Quarterly     Hybrid Journal   (Followers: 62)
International Journal of Library Science     Full-text available via subscription   (Followers: 62)
Journal of Management Information Systems     Full-text available via subscription   (Followers: 59)
Science & Technology Libraries     Hybrid Journal   (Followers: 59)
Alexandria : The Journal of National and International Library and Information Issues     Full-text available via subscription   (Followers: 57)
Journal of Information Technology     Hybrid Journal   (Followers: 56)
The Bottom Line: Managing Library Finances     Hybrid Journal   (Followers: 56)
International Journal of Legal Information     Full-text available via subscription   (Followers: 56)
Journal of Health & Medical Informatics     Open Access   (Followers: 55)
Archives and Manuscripts     Hybrid Journal   (Followers: 55)
Partnership : the Canadian Journal of Library and Information Practice and Research     Open Access   (Followers: 54)
Library & Archival Security     Hybrid Journal   (Followers: 50)
Bangladesh Journal of Library and Information Science     Open Access   (Followers: 48)
OCLC Systems & Services     Hybrid Journal   (Followers: 47)
Community & Junior College Libraries     Hybrid Journal   (Followers: 45)
Information Discovery and Delivery     Hybrid Journal   (Followers: 44)
Medical Reference Services Quarterly     Hybrid Journal   (Followers: 41)
VINE Journal of Information and Knowledge Management Systems     Hybrid Journal   (Followers: 40)
Journal of Access Services     Hybrid Journal   (Followers: 39)
Journal of the Society of Archivists     Hybrid Journal   (Followers: 36)
Scholarly and Research Communication     Open Access   (Followers: 36)
Journal of Archival Organization     Hybrid Journal   (Followers: 33)
Public Library Quarterly     Hybrid Journal   (Followers: 33)
Information & Culture : A Journal of History     Full-text available via subscription   (Followers: 32)
Australasian Public Libraries and Information Services     Full-text available via subscription   (Followers: 32)
Journal of the Association for Information Systems     Open Access   (Followers: 31)
Research Evaluation     Hybrid Journal   (Followers: 30)
Foundations and Trends® in Information Retrieval     Full-text available via subscription   (Followers: 30)
International Journal of Information Retrieval Research     Full-text available via subscription   (Followers: 30)
Information     Open Access   (Followers: 29)
Health Information Management Journal     Hybrid Journal   (Followers: 28)
Information Manager (The)     Open Access   (Followers: 28)
Information Systems Frontiers     Hybrid Journal   (Followers: 27)
Access     Full-text available via subscription   (Followers: 27)
International Journal of Intellectual Property Management     Hybrid Journal   (Followers: 26)
International Journal of Information Privacy, Security and Integrity     Hybrid Journal   (Followers: 26)
Proceedings of the American Society for Information Science and Technology     Hybrid Journal   (Followers: 26)
Journal of the Institute of Conservation     Hybrid Journal   (Followers: 25)
Nordic Journal of Information Literacy in Higher Education     Open Access   (Followers: 25)
South African Journal of Libraries and Information Science     Open Access   (Followers: 23)
Journal of Information, Communication and Ethics in Society     Hybrid Journal   (Followers: 23)
LASIE : Library Automated Systems Information Exchange     Free   (Followers: 22)
InCite     Full-text available via subscription   (Followers: 21)
Georgia Library Quarterly     Open Access   (Followers: 21)
RBM : A Journal of Rare Books, Manuscripts, and Cultural Heritage     Open Access   (Followers: 21)
NASIG Newsletter     Open Access   (Followers: 21)
LOEX Quarterly     Full-text available via subscription   (Followers: 20)
Urban Library Journal     Open Access   (Followers: 19)
El Profesional de la Informacion     Full-text available via subscription   (Followers: 18)
Alexandría : Revista de Ciencias de la Información     Open Access   (Followers: 17)
Anales de Documentacion     Open Access   (Followers: 17)
International Journal of Web Portals     Full-text available via subscription   (Followers: 17)
Communication Booknotes Quarterly     Hybrid Journal   (Followers: 16)
Manuscripta     Full-text available via subscription   (Followers: 16)
International Journal of Information Technology, Communications and Convergence     Hybrid Journal   (Followers: 16)
Theological Librarianship : An Online Journal of the American Theological Library Association     Open Access   (Followers: 16)
Perspectives in International Librarianship     Open Access   (Followers: 16)
Ghana Library Journal     Full-text available via subscription   (Followers: 16)
Information Technologist (The)     Full-text available via subscription   (Followers: 16)
Bibliotheca Orientalis     Full-text available via subscription   (Followers: 15)
Collection and Curation     Hybrid Journal   (Followers: 15)
International Journal of Business Information Systems     Hybrid Journal   (Followers: 15)
Biblios     Open Access   (Followers: 15)
Notes     Full-text available via subscription   (Followers: 14)
Journal of Educational Media, Memory, and Society     Full-text available via subscription   (Followers: 14)
Alsic : Apprentissage des Langues et Systèmes d'Information et de Communication     Open Access   (Followers: 13)
InterActions: UCLA Journal of Education and Information     Open Access   (Followers: 13)
International Journal of Intercultural Information Management     Hybrid Journal   (Followers: 12)
Journal of Information Technology Teaching Cases     Hybrid Journal   (Followers: 12)
Eastern Librarian     Open Access   (Followers: 12)
Journal of Religious & Theological Information     Hybrid Journal   (Followers: 11)
Universal Access in the Information Society     Hybrid Journal   (Followers: 11)
International Journal of Information and Decision Sciences     Hybrid Journal   (Followers: 11)
Kansas Library Association College & University Libraries Section Proceedings     Open Access   (Followers: 11)
Journal of Global Information Management     Full-text available via subscription   (Followers: 10)
AIB Studi     Full-text available via subscription   (Followers: 10)
Southeastern Librarian     Open Access   (Followers: 9)
e & i Elektrotechnik und Informationstechnik     Hybrid Journal   (Followers: 8)
BIBLOS - Revista do Departamento de Biblioteconomia e História     Open Access   (Followers: 8)
International Journal of Multicriteria Decision Making     Hybrid Journal   (Followers: 8)
JISTEM : Journal of Information Systems and Technology Management     Open Access   (Followers: 8)
International Journal of Multimedia Information Retrieval     Partially Free   (Followers: 8)
eLucidate     Open Access   (Followers: 8)
Judaica Librarianship     Open Access   (Followers: 8)
New Review of Information Networking     Hybrid Journal   (Followers: 7)
Idaho Librarian     Free   (Followers: 7)
Journal of the South African Society of Archivists     Full-text available via subscription   (Followers: 7)
Slavic & East European Information Resources     Hybrid Journal   (Followers: 6)
Egyptian Informatics Journal     Open Access   (Followers: 6)
Nordic Journal of Library and Information Studies     Open Access   (Followers: 6)
Informaatiotutkimus     Open Access   (Followers: 5)
Revista Interamericana de Bibliotecología     Open Access   (Followers: 5)
CIC. Cuadernos de Informacion y Comunicacion     Open Access   (Followers: 5)
Bridgewater Review     Open Access   (Followers: 5)
Open Systems & Information Dynamics     Hybrid Journal   (Followers: 4)
International Journal of Cooperative Information Systems     Hybrid Journal   (Followers: 4)
OJS på dansk     Open Access   (Followers: 4)
Revista Española de Documentación Científica     Open Access   (Followers: 4)
International Journal of Organisational Design and Engineering     Hybrid Journal   (Followers: 3)
HLA News     Full-text available via subscription   (Followers: 3)
SLIS Student Research Journal     Open Access   (Followers: 3)
VRA Bulletin     Open Access   (Followers: 3)
SLIS Connecting     Open Access   (Followers: 3)
Información, Cultura y Sociedad     Open Access   (Followers: 2)
Revista General de Información y Documentación     Open Access   (Followers: 2)
Revue française des sciences de l’information et de la communication     Open Access   (Followers: 2)
Journal of the Southern Association for Information Systems     Open Access   (Followers: 2)
In Monte Artium     Full-text available via subscription   (Followers: 1)
Documentación de las Ciencias de la Información     Open Access   (Followers: 1)
RUIDERAe : Revista de Unidades de Información. Descripción de Experiencias y Resultados Aplicados     Open Access  
Palabra Clave (La Plata)     Open Access  


JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 


 

JournalTOCs © 2009-