Publisher: Oxford University Press   (Total: 413 journals)

Showing 1 - 200 of 413 Journals sorted alphabetically
ACS Symposium Series     Full-text available via subscription   (Followers: 3, SJR: 0.189, CiteScore: 0)
Acta Biochimica et Biophysica Sinica     Hybrid Journal   (Followers: 5, SJR: 0.79, CiteScore: 2)
Adaptation     Hybrid Journal   (Followers: 9, SJR: 0.143, CiteScore: 0)
Advances in Nutrition     Hybrid Journal   (Followers: 61, SJR: 2.196, CiteScore: 5)
Aesthetic Surgery J.     Hybrid Journal   (Followers: 8, SJR: 1.434, CiteScore: 1)
Aesthetic Surgery J. Open Forum     Open Access   (Followers: 1)
African Affairs     Hybrid Journal   (Followers: 74, SJR: 1.869, CiteScore: 2)
Age and Ageing     Hybrid Journal   (Followers: 95, SJR: 1.989, CiteScore: 4)
Alcohol and Alcoholism     Hybrid Journal   (Followers: 21, SJR: 1.376, CiteScore: 3)
American Entomologist     Hybrid Journal   (Followers: 8)
American Historical Review     Hybrid Journal   (Followers: 217, SJR: 0.467, CiteScore: 1)
American J. of Agricultural Economics     Hybrid Journal   (Followers: 54, SJR: 2.113, CiteScore: 3)
American J. of Clinical Nutrition     Hybrid Journal   (Followers: 233, SJR: 3.438, CiteScore: 6)
American J. of Epidemiology     Hybrid Journal   (Followers: 229, SJR: 2.713, CiteScore: 3)
American J. of Health-System Pharmacy     Full-text available via subscription   (Followers: 64, SJR: 0.595, CiteScore: 1)
American J. of Hypertension     Hybrid Journal   (Followers: 29, SJR: 1.322, CiteScore: 3)
American J. of Jurisprudence     Hybrid Journal   (Followers: 19, SJR: 0.281, CiteScore: 1)
American J. of Legal History     Full-text available via subscription   (Followers: 11, SJR: 0.116, CiteScore: 0)
American Law and Economics Review     Hybrid Journal   (Followers: 31, SJR: 1.053, CiteScore: 1)
American Literary History     Hybrid Journal   (Followers: 19, SJR: 0.391, CiteScore: 0)
Analysis     Hybrid Journal   (Followers: 25, SJR: 1.038, CiteScore: 1)
Animal Frontiers     Hybrid Journal   (Followers: 2)
Annals of Behavioral Medicine     Hybrid Journal   (Followers: 15, SJR: 1.423, CiteScore: 3)
Annals of Botany     Hybrid Journal   (Followers: 38, SJR: 1.721, CiteScore: 4)
Annals of Oncology     Hybrid Journal   (Followers: 62, SJR: 5.599, CiteScore: 9)
Annals of the Entomological Society of America     Full-text available via subscription   (Followers: 11, SJR: 0.722, CiteScore: 1)
Annals of Work Exposures and Health     Hybrid Journal   (Followers: 11, SJR: 0.728, CiteScore: 2)
Antibody Therapeutics     Open Access   (Followers: 1)
AoB Plants     Open Access   (Followers: 4, SJR: 1.28, CiteScore: 3)
Applied Economic Perspectives and Policy     Hybrid Journal   (Followers: 18, SJR: 0.858, CiteScore: 2)
Applied Linguistics     Hybrid Journal   (Followers: 66, SJR: 2.987, CiteScore: 3)
Applied Mathematics Research eXpress     Hybrid Journal   (Followers: 1, SJR: 1.241, CiteScore: 1)
Arbitration Intl.     Full-text available via subscription   (Followers: 20)
Arbitration Law Reports and Review     Hybrid Journal   (Followers: 14)
Archives of Clinical Neuropsychology     Hybrid Journal   (Followers: 32, SJR: 0.731, CiteScore: 2)
Aristotelian Society Supplementary Volume     Hybrid Journal   (Followers: 2)
Arthropod Management Tests     Hybrid Journal   (Followers: 2)
Astronomy & Geophysics     Hybrid Journal   (Followers: 47, SJR: 0.146, CiteScore: 0)
Behavioral Ecology     Hybrid Journal   (Followers: 58, SJR: 1.871, CiteScore: 3)
Bioinformatics     Hybrid Journal   (Followers: 397, SJR: 6.14, CiteScore: 8)
Biology Methods and Protocols     Open Access   (Followers: 1)
Biology of Reproduction     Full-text available via subscription   (Followers: 11, SJR: 1.446, CiteScore: 3)
Biometrika     Hybrid Journal   (Followers: 20, SJR: 3.485, CiteScore: 2)
BioScience     Hybrid Journal   (Followers: 30, SJR: 2.754, CiteScore: 4)
Bioscience Horizons : The National Undergraduate Research J.     Open Access   (Followers: 3, SJR: 0.146, CiteScore: 0)
Biostatistics     Hybrid Journal   (Followers: 17, SJR: 1.553, CiteScore: 2)
BJA : British J. of Anaesthesia     Hybrid Journal   (Followers: 235, SJR: 2.115, CiteScore: 3)
BJA Education     Hybrid Journal   (Followers: 69)
Brain     Hybrid Journal   (Followers: 78, SJR: 5.858, CiteScore: 7)
Brain Communications     Open Access   (Followers: 2)
Briefings in Bioinformatics     Hybrid Journal   (Followers: 53, SJR: 2.505, CiteScore: 5)
Briefings in Functional Genomics     Hybrid Journal   (Followers: 3, SJR: 2.15, CiteScore: 3)
British J. for the Philosophy of Science     Hybrid Journal   (Followers: 42, SJR: 2.161, CiteScore: 2)
British J. of Aesthetics     Hybrid Journal   (Followers: 24, SJR: 0.508, CiteScore: 1)
British J. of Criminology     Hybrid Journal   (Followers: 623, SJR: 1.828, CiteScore: 3)
British J. of Social Work     Hybrid Journal   (Followers: 99, SJR: 1.019, CiteScore: 2)
British Medical Bulletin     Hybrid Journal   (Followers: 6, SJR: 1.355, CiteScore: 3)
British Yearbook of Intl. Law     Hybrid Journal   (Followers: 35)
Bulletin of the London Mathematical Society     Hybrid Journal   (Followers: 3, SJR: 1.376, CiteScore: 1)
Cambridge J. of Economics     Hybrid Journal   (Followers: 76, SJR: 0.764, CiteScore: 2)
Cambridge J. of Regions, Economy and Society     Hybrid Journal   (Followers: 13, SJR: 2.438, CiteScore: 4)
Cambridge Quarterly     Hybrid Journal   (Followers: 11, SJR: 0.104, CiteScore: 0)
Capital Markets Law J.     Hybrid Journal   (Followers: 3, SJR: 0.222, CiteScore: 0)
Carcinogenesis     Hybrid Journal   (Followers: 2, SJR: 2.135, CiteScore: 5)
Cardiovascular Research     Hybrid Journal   (Followers: 16, SJR: 3.002, CiteScore: 5)
Cerebral Cortex     Hybrid Journal   (Followers: 56, SJR: 3.892, CiteScore: 6)
CESifo Economic Studies     Hybrid Journal   (Followers: 24, SJR: 0.483, CiteScore: 1)
Chemical Senses     Hybrid Journal   (Followers: 1, SJR: 1.42, CiteScore: 3)
Children and Schools     Hybrid Journal   (Followers: 8, SJR: 0.246, CiteScore: 0)
Chinese J. of Comparative Law     Hybrid Journal   (Followers: 5, SJR: 0.412, CiteScore: 0)
Chinese J. of Intl. Law     Hybrid Journal   (Followers: 24, SJR: 0.329, CiteScore: 0)
Chinese J. of Intl. Politics     Hybrid Journal   (Followers: 11, SJR: 1.392, CiteScore: 2)
Christian Bioethics: Non-Ecumenical Studies in Medical Morality     Hybrid Journal   (Followers: 10, SJR: 0.183, CiteScore: 0)
Classical Receptions J.     Hybrid Journal   (Followers: 29, SJR: 0.123, CiteScore: 0)
Clean Energy     Open Access   (Followers: 3)
Clinical Infectious Diseases     Hybrid Journal   (Followers: 79, SJR: 5.051, CiteScore: 5)
Communication Theory     Hybrid Journal   (Followers: 29, SJR: 2.424, CiteScore: 3)
Communication, Culture & Critique     Hybrid Journal   (Followers: 29, SJR: 0.222, CiteScore: 1)
Community Development J.     Hybrid Journal   (Followers: 28, SJR: 0.268, CiteScore: 1)
Computer J.     Hybrid Journal   (Followers: 9, SJR: 0.319, CiteScore: 1)
Conservation Physiology     Open Access   (Followers: 3, SJR: 1.818, CiteScore: 3)
Contemporary Women's Writing     Hybrid Journal   (Followers: 12, SJR: 0.121, CiteScore: 0)
Contributions to Political Economy     Hybrid Journal   (Followers: 8, SJR: 0.906, CiteScore: 1)
Critical Values     Full-text available via subscription  
Current Developments in Nutrition     Open Access   (Followers: 5)
Current Legal Problems     Hybrid Journal   (Followers: 29)
Current Zoology     Full-text available via subscription   (Followers: 6, SJR: 1.164, CiteScore: 2)
Database : The J. of Biological Databases and Curation     Open Access   (Followers: 10, SJR: 1.791, CiteScore: 3)
Digital Scholarship in the Humanities     Hybrid Journal   (Followers: 15, SJR: 0.259, CiteScore: 1)
Diplomatic History     Hybrid Journal   (Followers: 25, SJR: 0.45, CiteScore: 1)
DNA Research     Open Access   (Followers: 6, SJR: 2.866, CiteScore: 6)
Dynamics and Statistics of the Climate System     Open Access   (Followers: 4)
Early Music     Hybrid Journal   (Followers: 17, SJR: 0.139, CiteScore: 0)
Econometrics J.     Hybrid Journal   (Followers: 34, SJR: 2.926, CiteScore: 1)
Economic J.     Hybrid Journal   (Followers: 124, SJR: 5.161, CiteScore: 3)
Economic Policy     Hybrid Journal   (Followers: 51, SJR: 3.584, CiteScore: 3)
ELT J.     Hybrid Journal   (Followers: 27, SJR: 0.942, CiteScore: 1)
English Historical Review     Hybrid Journal   (Followers: 60, SJR: 0.612, CiteScore: 1)
English: J. of the English Association     Hybrid Journal   (Followers: 23, SJR: 0.1, CiteScore: 0)
Environmental Entomology     Full-text available via subscription   (Followers: 12, SJR: 0.818, CiteScore: 2)
Environmental Epigenetics     Open Access   (Followers: 2)
Environmental History     Hybrid Journal   (Followers: 28, SJR: 0.408, CiteScore: 1)
EP-Europace     Hybrid Journal   (Followers: 3, SJR: 2.748, CiteScore: 4)
Epidemiologic Reviews     Hybrid Journal   (Followers: 9, SJR: 4.505, CiteScore: 8)
ESHRE Monographs     Hybrid Journal  
Essays in Criticism     Hybrid Journal   (Followers: 23, SJR: 0.113, CiteScore: 0)
European Heart J.     Hybrid Journal   (Followers: 67, SJR: 9.315, CiteScore: 9)
European Heart J. - Cardiovascular Imaging     Hybrid Journal   (Followers: 10, SJR: 3.625, CiteScore: 3)
European Heart J. - Cardiovascular Pharmacotherapy     Full-text available via subscription   (Followers: 2)
European Heart J. - Quality of Care and Clinical Outcomes     Hybrid Journal  
European Heart J. : Case Reports     Open Access   (Followers: 1)
European Heart J. Supplements     Hybrid Journal   (Followers: 7, SJR: 0.223, CiteScore: 0)
European J. of Cardio-Thoracic Surgery     Hybrid Journal   (Followers: 9, SJR: 1.681, CiteScore: 2)
European J. of Intl. Law     Hybrid Journal   (Followers: 240, SJR: 0.694, CiteScore: 1)
European J. of Orthodontics     Hybrid Journal   (Followers: 5, SJR: 1.279, CiteScore: 2)
European J. of Public Health     Hybrid Journal   (Followers: 23, SJR: 1.36, CiteScore: 2)
European Review of Agricultural Economics     Hybrid Journal   (Followers: 12, SJR: 1.172, CiteScore: 2)
European Review of Economic History     Hybrid Journal   (Followers: 31, SJR: 0.702, CiteScore: 1)
European Sociological Review     Hybrid Journal   (Followers: 46, SJR: 2.728, CiteScore: 3)
Evolution, Medicine, and Public Health     Open Access   (Followers: 12)
Family Practice     Hybrid Journal   (Followers: 16, SJR: 1.018, CiteScore: 2)
Fems Microbiology Ecology     Hybrid Journal   (Followers: 19, SJR: 1.492, CiteScore: 4)
Fems Microbiology Letters     Hybrid Journal   (Followers: 29, SJR: 0.79, CiteScore: 2)
Fems Microbiology Reviews     Hybrid Journal   (Followers: 38, SJR: 7.063, CiteScore: 13)
Fems Yeast Research     Hybrid Journal   (Followers: 14, SJR: 1.308, CiteScore: 3)
Food Quality and Safety     Open Access   (Followers: 1)
Foreign Policy Analysis     Hybrid Journal   (Followers: 26, SJR: 1.425, CiteScore: 1)
Forest Science     Hybrid Journal   (Followers: 8, SJR: 0.89, CiteScore: 2)
Forestry: An Intl. J. of Forest Research     Hybrid Journal   (Followers: 16, SJR: 1.133, CiteScore: 3)
Forum for Modern Language Studies     Hybrid Journal   (Followers: 6, SJR: 0.104, CiteScore: 0)
French History     Hybrid Journal   (Followers: 36, SJR: 0.118, CiteScore: 0)
French Studies     Hybrid Journal   (Followers: 21, SJR: 0.148, CiteScore: 0)
French Studies Bulletin     Hybrid Journal   (Followers: 10, SJR: 0.152, CiteScore: 0)
Gastroenterology Report     Open Access   (Followers: 2)
Genome Biology and Evolution     Open Access   (Followers: 17, SJR: 2.578, CiteScore: 4)
Geophysical J. Intl.     Hybrid Journal   (Followers: 39, SJR: 1.506, CiteScore: 3)
German History     Hybrid Journal   (Followers: 27, SJR: 0.161, CiteScore: 0)
GigaScience     Open Access   (Followers: 6, SJR: 5.022, CiteScore: 7)
Global Summitry     Hybrid Journal   (Followers: 1)
Glycobiology     Hybrid Journal   (Followers: 10, SJR: 1.493, CiteScore: 3)
Health and Social Work     Hybrid Journal   (Followers: 68, SJR: 0.388, CiteScore: 1)
Health Education Research     Hybrid Journal   (Followers: 19, SJR: 0.854, CiteScore: 2)
Health Policy and Planning     Hybrid Journal   (Followers: 26, SJR: 1.512, CiteScore: 2)
Health Promotion Intl.     Hybrid Journal   (Followers: 27, SJR: 0.812, CiteScore: 2)
History Workshop J.     Hybrid Journal   (Followers: 33, SJR: 1.278, CiteScore: 1)
Holocaust and Genocide Studies     Hybrid Journal   (Followers: 30, SJR: 0.105, CiteScore: 0)
Human Communication Research     Hybrid Journal   (Followers: 16, SJR: 2.146, CiteScore: 3)
Human Molecular Genetics     Hybrid Journal   (Followers: 11, SJR: 3.555, CiteScore: 5)
Human Reproduction     Hybrid Journal   (Followers: 76, SJR: 2.643, CiteScore: 5)
Human Reproduction Open     Open Access   (Followers: 1)
Human Reproduction Update     Hybrid Journal   (Followers: 18, SJR: 5.317, CiteScore: 10)
Human Rights Law Review     Hybrid Journal   (Followers: 66, SJR: 0.756, CiteScore: 1)
ICES J. of Marine Science: J. du Conseil     Hybrid Journal   (Followers: 59, SJR: 1.591, CiteScore: 3)
ICSID Review : Foreign Investment Law J.     Hybrid Journal   (Followers: 11)
ILAR J.     Hybrid Journal   (Followers: 3, SJR: 1.732, CiteScore: 4)
IMA J. of Applied Mathematics     Hybrid Journal   (SJR: 0.679, CiteScore: 1)
IMA J. of Management Mathematics     Hybrid Journal   (SJR: 0.538, CiteScore: 1)
IMA J. of Mathematical Control and Information     Hybrid Journal   (Followers: 2, SJR: 0.496, CiteScore: 1)
IMA J. of Numerical Analysis - advance access     Hybrid Journal   (SJR: 1.987, CiteScore: 2)
Industrial and Corporate Change     Hybrid Journal   (Followers: 12, SJR: 1.792, CiteScore: 2)
Industrial Law J.     Hybrid Journal   (Followers: 29, SJR: 0.249, CiteScore: 1)
Inflammatory Bowel Diseases     Hybrid Journal   (Followers: 45, SJR: 2.511, CiteScore: 4)
Information and Inference     Free  
Innovation in Aging     Open Access   (Followers: 1)
Insect Systematics and Diversity     Hybrid Journal  
Integrative and Comparative Biology     Hybrid Journal   (Followers: 10, SJR: 1.319, CiteScore: 2)
Integrative Biology     Full-text available via subscription   (Followers: 5, SJR: 1.36, CiteScore: 3)
Integrative Organismal Biology     Open Access  
Interacting with Computers     Hybrid Journal   (Followers: 10, SJR: 0.292, CiteScore: 1)
Interactive CardioVascular and Thoracic Surgery     Hybrid Journal   (Followers: 7, SJR: 0.762, CiteScore: 1)
Intl. Affairs     Hybrid Journal   (Followers: 72, SJR: 1.505, CiteScore: 3)
Intl. Data Privacy Law     Hybrid Journal   (Followers: 22)
Intl. Health     Hybrid Journal   (Followers: 7, SJR: 0.851, CiteScore: 2)
Intl. Immunology     Hybrid Journal   (Followers: 4, SJR: 2.167, CiteScore: 4)
Intl. J. for Quality in Health Care     Hybrid Journal   (Followers: 40, SJR: 1.348, CiteScore: 2)
Intl. J. of Constitutional Law     Hybrid Journal   (Followers: 57, SJR: 0.601, CiteScore: 1)
Intl. J. of Epidemiology     Hybrid Journal   (Followers: 293, SJR: 3.969, CiteScore: 5)
Intl. J. of Law and Information Technology     Hybrid Journal   (Followers: 5, SJR: 0.202, CiteScore: 1)
Intl. J. of Law, Policy and the Family     Hybrid Journal   (Followers: 21, SJR: 0.223, CiteScore: 1)
Intl. J. of Lexicography     Hybrid Journal   (Followers: 9, SJR: 0.285, CiteScore: 1)
Intl. J. of Low-Carbon Technologies     Open Access   (Followers: 1, SJR: 0.403, CiteScore: 1)
Intl. J. of Neuropsychopharmacology     Open Access   (Followers: 3, SJR: 1.808, CiteScore: 4)
Intl. J. of Public Opinion Research     Hybrid Journal   (Followers: 11, SJR: 1.545, CiteScore: 1)
Intl. J. of Refugee Law     Hybrid Journal   (Followers: 38, SJR: 0.389, CiteScore: 1)
Intl. J. of Transitional Justice     Hybrid Journal   (Followers: 14, SJR: 0.724, CiteScore: 2)
Intl. Mathematics Research Notices     Hybrid Journal   (Followers: 1, SJR: 2.168, CiteScore: 1)
Intl. Political Sociology     Hybrid Journal   (Followers: 41, SJR: 1.465, CiteScore: 3)
Intl. Relations of the Asia-Pacific     Hybrid Journal   (Followers: 25, SJR: 0.401, CiteScore: 1)
Intl. Studies Perspectives     Hybrid Journal   (Followers: 9, SJR: 0.983, CiteScore: 1)
Intl. Studies Quarterly     Hybrid Journal   (Followers: 55, SJR: 2.581, CiteScore: 2)
Intl. Studies Review     Hybrid Journal   (Followers: 24, SJR: 1.201, CiteScore: 1)
ISLE: Interdisciplinary Studies in Literature and Environment     Hybrid Journal   (Followers: 2, SJR: 0.15, CiteScore: 0)
ITNOW     Hybrid Journal   (Followers: 1, SJR: 0.103, CiteScore: 0)
J. of African Economies     Hybrid Journal   (Followers: 18, SJR: 0.533, CiteScore: 1)
J. of American History     Hybrid Journal   (Followers: 55, SJR: 0.297, CiteScore: 1)
J. of Analytical Toxicology     Hybrid Journal   (Followers: 15, SJR: 1.065, CiteScore: 2)
J. of Antimicrobial Chemotherapy     Hybrid Journal   (Followers: 16, SJR: 2.419, CiteScore: 4)
J. of Antitrust Enforcement     Hybrid Journal   (Followers: 1)
J. of Applied Poultry Research     Hybrid Journal   (Followers: 5, SJR: 0.585, CiteScore: 1)
J. of Biochemistry     Hybrid Journal   (Followers: 46, SJR: 1.226, CiteScore: 2)

Database : The Journal of Biological Databases and Curation
Journal Prestige (SJR): 1.791
Citation Impact (citeScore): 3
Number of Followers: 10  

  This is an Open Access journal
ISSN (Online) 1758-0463
Published by Oxford University Press  [413 journals]
  • SAGER: a database of Symbiodiniaceae and Algal Genomic Resource

    • Authors: Yu L; Li T, Li L, et al.
      Abstract: Symbiodiniaceae dinoflagellates are essential endosymbionts of reef building corals and some other invertebrates. Information of their genome structure and function is critical for understanding coral symbiosis and bleaching. With the rapid development of sequencing technology, genome draft assemblies of several Symbiodiniaceae species and diverse marine algal genomes have become publicly available but spread in multiple separate locations. Here, we present a Symbiodiniaceae and Algal Genomic Resource Database (SAGER), a user-friendly online repository for integrating existing genomic data of Symbiodiniaceae species and diverse marine algal gene sets from MMETSP and PhyloDB databases. Relevant algal data are included to facilitate comparative analyses. The database is freely accessible at http://sampgr.org.cn. It provides comprehensive tools for studying gene function, expression and comparative genomics, including search tools to identify gene information from Symbiodiniaceae species, and BLAST tool to find orthologs from marine algae and protists. Moreover, SAGER integrates transcriptome datasets derived from diverse culture conditions of corresponding Symbiodiniaceae species. SAGER was developed with the capacity to incorporate future Symbiodiniaceae and algal genome and transcriptome data, and will serve as an open-access and sustained platform providing genomic and molecular tools that can be conveniently used to study Symbiodiniaceae and other marine algae. Database URL: http://sampgr.org.cn
      PubDate: Sat, 04 Jul 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa051
      Issue No: Vol. 2020 (2020)
       
  • RNAWRE: a resource of writers, readers and erasers of RNA modifications

    • Authors: Nie F; Feng P, Song X, et al.
      Abstract: RNA modifications are involved in various kinds of cellular biological processes. Accumulated evidences have demonstrated that the functions of RNA modifications are determined by the effectors that can catalyze, recognize and remove RNA modifications. They are called ‘writers’, ‘readers’ and ‘erasers’. The identification of RNA modification effectors will be helpful for understanding the regulatory mechanisms and biological functions of RNA modifications. In this work, we developed a database called RNAWRE that specially deposits RNA modification effectors. The current version of RNAWRE stored 2045 manually curated writers, readers and erasers for the six major kinds of RNA modifications, namely Cap, m1A, m6A, m5C, ψ and Poly A. The main modules of RNAWRE not only allow browsing and downloading the RNA modification effectors but also support the BLAST search of the potential RNA modification effectors in other species. We hope that RNAWRE will be helpful for the researches on RNA modifications. Database URL: http://rnawre.bio2db.com
      PubDate: Wed, 01 Jul 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa049
      Issue No: Vol. 2020 (2020)
       
  • CHDGKB: a knowledgebase for systematic understanding of genetic variations
           associated with non-syndromic congenital heart disease

    • Authors: Yang L; Yang Y, Liu X, et al.
      Abstract: Congenital heart disease (CHD) is one of the most common birth defects, with complex genetic and environmental etiologies. The reports of genetic variation associated with CHD have increased dramatically in recent years due to the revolutionary development of molecular technology. However, CHD is a heterogeneous disease, and its genetic origins remain inconclusive in most patients. Here we present a database of genetic variations for non-syndromic CHD (NS-CHD). By manually literature extraction and analyses, 5345 NS-CHD-associated genetic variations were collected, curated and stored in the public online database. The objective of our database is to provide the most comprehensive updates on NS-CHD genetic research and to aid systematic analyses of pathogenesis of NS-CHD in molecular level and the correlation between NS-CHD genotypes and phenotypes. Database URL: http://www.sysbio.org.cn/CHDGKB/
      PubDate: Wed, 01 Jul 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa048
      Issue No: Vol. 2020 (2020)
       
  • mAML: an automated machine learning pipeline with a microbiome repository
           for human disease classification

    • Authors: Yang F; Zou Q.
      Abstract: Due to the concerted efforts to utilize the microbial features to improve disease prediction capabilities, automated machine learning (AutoML) systems aiming to get rid of the tediousness in manually performing ML tasks are in great demand. Here we developed mAML, an ML model-building pipeline, which can automatically and rapidly generate optimized and interpretable models for personalized microbiome-based classification tasks in a reproducible way. The pipeline is deployed on a web-based platform, while the server is user-friendly and flexible and has been designed to be scalable according to the specific requirements. This pipeline exhibits high performance for 13 benchmark datasets including both binary and multi-class classification tasks. In addition, to facilitate the application of mAML and expand the human disease-related microbiome learning repository, we developed GMrepo ML repository (GMrepo Microbiome Learning repository) from the GMrepo database. The repository involves 120 microbiome-based classification tasks for 85 human-disease phenotypes referring to 12 429 metagenomic samples and 38 643 amplicon samples. The mAML pipeline and the GMrepo ML repository are expected to be important resources for researches in microbiology and algorithm developments. Database URL: http://lab.malab.cn/soft/mAML
      PubDate: Thu, 25 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa050
      Issue No: Vol. 2020 (2020)
       
  • Automated generation of gene summaries at the Alliance of Genome Resources

    • Authors: Kishore R; Arnaboldi V, Van Slyke C, et al.
      Abstract: Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed an algorithm that uses curated, structured gene data at the Alliance of Genome Resources (Alliance; www.alliancegenome.org) to automatically generate gene summaries that simulate natural language. The gene data used for this purpose include curated associations (annotations) to ontology terms from the Gene Ontology, Disease Ontology, model organism knowledgebase (MOK)-specific anatomy ontologies and Alliance orthology data. The method uses sentence templates for each data category included in the gene summary in order to build a natural language sentence from the list of terms associated with each gene. To improve readability of the summaries when numerous gene annotations are present, we developed a new algorithm that traverses ontology graphs in order to group terms by their common ancestors. The algorithm optimizes the coverage of the initial set of terms and limits the length of the final summary, using measures of information content of each ontology term as a criterion for inclusion in the summary. The automated gene summaries are generated with each Alliance release, ensuring that they reflect current data at the Alliance. Our method effectively leverages category-specific curation efforts of the Alliance member databases to create modular, structured and standardized gene summaries for seven member species of the Alliance. These automatically generated gene summaries make cross-species gene function comparisons tenable and increase discoverability of potential models of human disease. In addition to being displayed on Alliance gene pages, these summaries are also included on several MOK gene pages.
      PubDate: Fri, 19 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa037
      Issue No: Vol. 2020 (2020)
       
  • FOBI: an ontology to represent food intake data and associate it with
           metabolomic data

    • Authors: Castellano-Escuder P; González-Domínguez R, Wishart D, et al.
      Abstract: Nutrition research can be conducted by using two complementary approaches: (i) traditional self-reporting methods or (ii) via metabolomics techniques to analyze food intake biomarkers in biofluids. However, the complexity and heterogeneity of these two very different types of data often hinder their analysis and integration. To manage this challenge, we have developed a novel ontology that describes food and their associated metabolite entities in a hierarchical way. This ontology uses a formal naming system, category definitions, properties and relations between both types of data. The ontology presented is called FOBI (Food-Biomarker Ontology) and it is composed of two interconnected sub-ontologies. One is a ‘Food Ontology’ consisting of raw foods and ‘multi-component foods’ while the second is a ‘Biomarker Ontology’ containing food intake biomarkers classified by their chemical classes. These two sub-ontologies are conceptually independent but interconnected by different properties. This allows data and information regarding foods and food biomarkers to be visualized in a bidirectional way, going from metabolomics to nutritional data or vice versa. Potential applications of this ontology include the annotation of foods and biomarkers using a well-defined and consistent nomenclature, the standardized reporting of metabolomics workflows (e.g. metabolite identification, experimental design) or the application of different enrichment analysis approaches to analyze nutrimetabolomic data. Availability: FOBI is freely available in both OWL (Web Ontology Language) and OBO (Open Biomedical Ontologies) formats at the project’s Github repository (https://github.com/pcastellanoescuder/FoodBiomarkerOntology) and FOBI visualization tool is available in https://polcastellano.shinyapps.io/FOBI_Visualization_Tool/.
      PubDate: Wed, 17 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa033
      Issue No: Vol. 2020 (2020)
       
  • MaizeCUBIC: a comprehensive variation database for a maize synthetic
           population

    • Authors: Luo J; Wei C, Liu H, et al.
      Abstract: MaizeCUBIC is a free database that describes genomic variations, gene expression, phenotypes and quantitative trait locus (QTLs) for a maize CUBIC population (24 founders and 1404 inbred offspring). The database not only includes information for over 14M single nucleotide polymorphism (SNPs) and 43K indels previously identified but also contains 660K structure variations (SVs) and 600M novel sequences newly identified in the present study, which represents a comprehensive high-density variant map for a diverse population. Based on these genomic variations, the database would demonstrate the mosaic structure for each progeny, reflecting a high-resolution reshuffle across parental genomes. A total of 23 agronomic traits measured on parents and progeny in five locations, where are representative of the maize main growing regions in China, were also included in the database. To further explore the genotype–phenotype relationships, two different methods of genome-wide association studies (GWAS) were employed for dissecting the genetic architecture of 23 agronomic traits. Additionally, the Basic Local Alignment Search Tool and primer design tools are developed to promote follow-up analysis and experimental verification. All the original data and corresponding analytical results can be accessed through user-friendly online queries and web interface dynamic visualization, as well as downloadable files. These data and tools provide valuable resources on genetic and genomic studies of maize and other crops.
      PubDate: Tue, 16 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa044
      Issue No: Vol. 2020 (2020)
       
  • GTDB: an integrated resource for glycosyltransferase sequences and
           annotations

    • Authors: Zhou C; Xu Q, He S, et al.
      Abstract: Glycosyltransferases (GTs), a large class of carbohydrate-active enzymes, adds glycosyl moieties to various substrates to generate multiple bioactive compounds, including natural products with pharmaceutical or agrochemical values. Here, we first collected comprehensive information on GTs, including amino acid sequences, coding region sequences, available tertiary structures, protein classification families, catalytic reactions and metabolic pathways. Then, we developed sequence search and molecular docking processes for GTs, resulting in a GTs database (GTDB). In the present study, 520 179 GTs from approximately 21 647 species that involved in 394 kinds of different reactions were deposited in GTDB. GTDB has the following useful features: (i) text search is provided for retrieving the complete details of a query by combining multiple identifiers and data sources; (ii) a convenient browser allows users to browse data by different classifications and download data in batches; (iii) BLAST is offered for searching against pre-defined sequences, which can facilitate the annotation of the biological functions of query GTs; and lastly, (iv) GTdock using AutoDock Vina performs docking simulations of several GTs with the same single acceptor and displays the results based on 3Dmol.js allowing easy view of models.
      PubDate: Tue, 16 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa047
      Issue No: Vol. 2020 (2020)
       
  • PvP01-DB: computational structural and functional characterization of
           soluble proteome of PvP01 strain of Plasmodium vivax

    • Authors: Singh A; Kaushik R, Chaurasia D, et al.
      Abstract: Despite Plasmodium vivax being the main offender in the majority of malarial infections, very little information is available about its adaptation and development in humans. Its capability for activating relapsing infections through its dormant liver stage and resistance to antimalarial drugs makes it as one of the major challenges in eradicating malaria. Noting the immediate necessity for the availability of a comprehensive and reliable structural and functional repository for P. vivax proteome, here we developed a web resource for the new reference genome, PvP01, furnishing information on sequence, structure, functions, active sites and metabolic pathways compiled and predicted using some of the state-of-the-art methods in respective fields. The PvP01 web resource comprises organized data on the soluble proteome consisting of 3664 proteins in blood and liver stages of malarial cycle. The current public resources represent only 163 proteins of soluble proteome of PvP01, with complete information about their molecular function, biological process and cellular components. Also, only 46 proteins of P. vivax have experimentally determined structures. In this milieu of extreme scarcity of structural and functional information, PvP01 web resource offers meticulously validated structures of 3664 soluble proteins. The sequence and structure-based functional characterization led to a quantum leap from 163 proteins available presently to whole soluble proteome offered through PvP01 web resource. We believe PvP01 web resource will serve the researchers in identifying novel protein drug targets and in accelerating the development of structure-based new drug candidates to combat malaria. Database Availability: http://www.scfbio-iitd.res.in/PvP01
      PubDate: Tue, 16 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa036
      Issue No: Vol. 2020 (2020)
       
  • Curation of cancer hallmark-based genes and pathways for in silico
           characterization of chemical carcinogenesis

    • Authors: Liang P; Wang C, Cheng H, et al.
      Abstract: Exposure to toxic substances in the environment is one of the most important causes of cancer. However, the time-consuming process for the identification and characterization of carcinogens is not applicable to the huge number of chemicals awaiting testing. The data gaps make the carcinogenic risk uncontrollable. An efficient and effective way of prioritizing chemicals of carcinogenic concern with interpretable mechanism information is highly desirable. This study presents a curation work for genes and pathways associated with 11 hallmarks of cancer (HOCs) reported by the Halifax Project. To demonstrate the usefulness of the curated HOC data, the interacting HOC genes and affected HOC pathways of chemicals of the three carcinogen lists from IARC, NTP and EPA were analyzed using the in silico toxicogenomics ChemDIS system. Results showed that a higher number of affected HOCs were observed for known carcinogens than the other chemicals. The curated HOC data is expected to be useful for prioritizing chemicals of carcinogenic concern. Database URL: The HOC database is available at https://github.com/hocdb-KMU-TMU/hocdb and the website of Database journal as Supplementary Data.
      PubDate: Mon, 15 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa045
      Issue No: Vol. 2020 (2020)
       
  • ForageGrassBase: molecular resource for the forage grass meadow fescue
           (Festuca pratensis Huds.)

    • Authors: Samy J; Rognli O, Kovi M.
      Abstract: Meadow fescue (Festuca pratensis Huds.) is one of the most important forage grasses in temperate regions. It is a diploid (2n = 14) outbreeding species that belongs to the genus Festuca. Together with Lolium perenne, they are the most important genera of forage grasses. Meadow fescue has very high quality of yield with good winter survival and persistency. However, extensive genomic resources for meadow fescue have not become available so far. To address this lack of comprehensive publicly available datasets, we have developed functionally annotated draft genome sequences of two meadow fescue genotypes, ‘HF7/2’ and ‘B14/16’, and constructed the platform ForageGrassBase, available at http://foragegrass.org/, for data visualization, download and querying. This is the first open-access platform that provides extensive genomic resources related to this forage grass species. The current database provides the most up-to-date draft genome sequence along with structural and functional annotations for genes that can be accessed using Genome Browser (GBrowse), along with comparative genomic alignments to Arabidopsis, L. perenne, barley, rice, Brachypodium and maize genomes. We have also integrated the homology search tool BLAST for users to analyze their data. Combined, GBrowse, BLAST and downloadable data give user-friendly access to meadow fescue genomic resources. To our knowledge, ForageGrassBase is the first genome database dedicated to forage grasses. The current forage grass database provides valuable resources for a range of research fields related to meadow fescue and other forage crop species, as well as for plant research communities in general. The genome database can be accessed at http://foragegrass.org.
      PubDate: Mon, 15 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa046
      Issue No: Vol. 2020 (2020)
       
  • Identifying main finding sentences in clinical case reports

    • Authors: Luo M; Cohen A, Addepalli S, et al.
      Abstract: Clinical case reports are the ‘eyewitness reports’ of medicine and provide a valuable, unique, albeit noisy and underutilized type of evidence. Generally, a case report has a single main finding that represents the reason for writing up the report in the first place. However, no one has previously created an automatic way of identifying main finding sentences in case reports. We previously created a manual corpus of main finding sentences extracted from the abstracts and full text of clinical case reports. Here, we have utilized the corpus to create a machine learning-based model that automatically predicts which sentence(s) from abstracts state the main finding. The model has been evaluated on a separate manual corpus of clinical case reports and found to have good performance. This is a step toward setting up a retrieval system in which, given one case report, one can find other case reports that report the same or very similar main findings. The code and necessary files to run the main finding model can be downloaded from https://github.com/qi29/main_finding_recognition, released under the Apache License, Version 2.0.
      PubDate: Thu, 11 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa041
      Issue No: Vol. 2020 (2020)
       
  • Effective biomedical document classification for identifying publications
           relevant to the mouse Gene Expression Database (GXD)

    • Authors: Jiang X; Ringwald M, Blake J, et al.
      Abstract: The authors thank Dr. Kathleen F. McCoy for pointing out an error in the formulae used for calculating the Utility measures in the original version of the publication “Effective biomedical document classification for identifying publications relevant to the mouse Gene Expression Database (GXD)” by Xiangying Jiang, Martin Ringwald, Judith Blake and Hagit Shatkay (Database, Volume 2017, bax017). As a consequence, two of the tables and two of the figures are also corrected.
      PubDate: Thu, 11 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa043
      Issue No: Vol. 2020 (2020)
       
  • GreenCircRNA: a database for plant circRNAs that act as miRNA decoys

    • Authors: Zhang J; Hao Z, Yin S, et al.
      Abstract: Circular RNAs (circRNAs) are endogenous non-coding RNAs that form a covalently closed continuous loop, are widely distributed and play important roles in a series of developmental processes. In plants, an increasing number of studies have found that circRNAs can regulate plant metabolism and are involved in plant responses to biotic or abiotic stress. Acting as miRNA decoys is a critical way for circRNAs to perform their functions. Therefore, we developed GreenCircRNA—a database for plant circRNAs acting as miRNA decoys that is dedicated to providing a plant-based platform for detailed exploration of plant circRNAs and their potential decoy functions. This database includes over 210 000 circRNAs from 69 species of plants; the main data sources of circRNAs in this database are NCBI, EMBL-EBI and Phytozome. To investigate the function of circRNAs as competitive endogenous RNAs, the possibility of circRNAs from 38 plants to act as miRNA decoys was predicted. Moreover, we provide basic information for the circRNAs in the database, including their locations, host genes and relative expression levels, as well as full-length sequences, host gene GO (Gene Ontology) numbers and circRNA visualization. GreenCircRNA is the first database for the prediction of circRNAs that act as miRNA decoys and contains the largest number of plant species. Database URL: http://greencirc.cn
      PubDate: Mon, 08 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa039
      Issue No: Vol. 2020 (2020)
       
  • The articles.ELM resource: simplifying access to protein linear motif
           literature by annotation, text-mining and classification

    • Authors: Palopoli N; Iserte J, Chemes L, et al.
      Abstract: Modern biology produces data at a staggering rate. Yet, much of this biological data is still isolated in the text, figures, tables and supplementary materials of articles. As a result, biological information created at great expense is significantly underutilised. The protein motif biology field does not have sufficient resources to curate the corpus of motif-related literature and, to date, only a fraction of the available articles have been curated. In this study, we develop a set of tools and a web resource, ‘articles.ELM’, to rapidly identify the motif literature articles pertinent to a researcher’s interest. At the core of the resource is a manually curated set of about 8000 motif-related articles. These articles are automatically annotated with a range of relevant biological data allowing in-depth search functionality. Machine-learning article classification is used to group articles based on their similarity to manually curated motif classes in the Eukaryotic Linear Motif resource. Articles can also be manually classified within the resource. The ‘articles.ELM’ resource permits the rapid and accurate discovery of relevant motif articles thereby improving the visibility of motif literature and simplifying the recovery of valuable biological insights sequestered within scientific articles. Consequently, this web resource removes a critical bottleneck in scientific productivity for the motif biology field. Database URL: http://slim.icr.ac.uk/articles/
      PubDate: Mon, 08 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa040
      Issue No: Vol. 2020 (2020)
       
  • Obtaining extremely large and accurate protein multiple sequence
           alignments from curated hierarchical alignments

    • Authors: Neuwald A; Lanczycki C, Hodges T, et al.
      Abstract: For optimal performance, machine learning methods for protein sequence/structural analysis typically require as input a large multiple sequence alignment (MSA), which is often created using query-based iterative programs, such as PSI-BLAST or JackHMMER. However, because these programs align database sequences using a query sequence as a template, they may fail to detect or may tend to misalign sequences distantly related to the query. More generally, automated MSA programs often fail to align sequences correctly due to the unpredictable nature of protein evolution. Addressing this problem typically requires manual curation in the light of structural data. However, curated MSAs tend to contain too few sequences to serve as input for statistically based methods. We address these shortcomings by making publicly available a set of 252 curated hierarchical MSAs (hiMSAs), containing a total of 26 212 066 sequences, along with programs for generating extremely large MSAs from them. Each hiMSA consists of a set of hierarchically arranged MSAs representing individual subgroups within a superfamily along with template MSAs specifying how to align each subgroup MSA against MSAs higher up the hierarchy. Central to this approach is the MAPGAPS search program, which uses a hiMSA as a query to align (potentially vast numbers of) matching database sequences with accuracy comparable to that of the curated hiMSA. We illustrate this process for the exonuclease–endonuclease–phosphatase superfamily and for pleckstrin homology domains. A set of extremely large MSAs generated from the hiMSAs in this way is available as input for deep learning, big data analyses. MAPGAPS, auxiliary programs CDD2MGS, AddPhylum, PurgeMSA and ConvertMSA and links to National Center for Biotechnology Information data files are available at https://www.igs.umaryland.edu/labs/neuwald/software/mapgaps/.
      PubDate: Fri, 05 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa042
      Issue No: Vol. 2020 (2020)
       
  • RGPDB: database of root-associated genes and promoters in maize, soybean,
           and sorghum

    • Authors: Moisseyev G; Park K, Cui A, et al.
      Abstract: Root-associated genes play an important role in plants. Despite the fact that there have been studies on root biology, information on genes that are specifically expressed or upregulated in roots is poorly collected. There exist very few databases dedicated to genes and promoters associated with root biology, preventing effective root-related studies. Therefore, we analyzed multiple types of omics data to identify root-associated genes in maize, soybean, and sorghum and constructed a comprehensive online database of these genes and their promoter sequences. This database creates a pivotal platform capable of stimulating and facilitating further studies on manipulating root growth and development.
      PubDate: Fri, 05 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa038
      Issue No: Vol. 2020 (2020)
       
  • MIPPIE: the mouse integrated protein–protein interaction reference

    • Authors: Alanis-Lobato G; Möllmann J, Schaefer M, et al.
      Abstract: Cells operate and react to environmental signals thanks to a complex network of protein–protein interactions (PPIs), the malfunction of which can severely disrupt cellular homeostasis. As a result, mapping and analyzing protein networks are key to advancing our understanding of biological processes and diseases. An invaluable part of these endeavors has been the house mouse (Mus musculus), the mammalian model organism par excellence, which has provided insights into human biology and disorders. The importance of investigating PPI networks in the context of mouse prompted us to develop the Mouse Integrated Protein–Protein Interaction rEference (MIPPIE). MIPPIE inherits a robust infrastructure from HIPPIE, its sister database of human PPIs, allowing for the assembly of reliable networks supported by different evidence sources and high-quality experimental techniques. MIPPIE networks can be further refined with tissue, directionality and effect information through a user-friendly web interface. Moreover, all MIPPIE data and meta-data can be accessed via a REST web service or downloaded as text files, thus facilitating the integration of mouse PPIs into follow-up bioinformatics pipelines.
      PubDate: Thu, 04 Jun 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa035
      Issue No: Vol. 2020 (2020)
       
  • A strategy for large-scale comparison of evolutionary- and reaction-based
           classifications of enzyme function

    • Authors: Holliday G; Brown S, Mischel D, et al.
      Abstract: Determining the molecular function of enzymes discovered by genome sequencing represents a primary foundation for understanding many aspects of biology. Historically, classification of enzyme reactions has used the enzyme nomenclature system developed to describe the overall reactions performed by biochemically characterized enzymes, irrespective of their associated sequences. In contrast, functional classification and assignment for the millions of protein sequences of unknown function now available is largely done in two computational steps, first by similarity-based assignment of newly obtained sequences to homologous groups, followed by transferring to them the known functions of similar biochemically characterized homologs. Due to the fundamental differences in their etiologies and practice, how these chemistry- and evolution-centric functional classification systems relate to each other has been difficult to explore on a large scale. To investigate this issue in a new way, we integrated two published ontologies that had previously described each of these classification systems independently. The resulting infrastructure was then used to compare the functional assignments obtained from each classification system for the well-studied and functionally diverse enolase superfamily. Mapping these function assignments to protein structure and reaction similarity networks shows a profound and complex disconnect between the homology- and chemistry-based classification systems. This conclusion mirrors previous observations suggesting that except for closely related sequences, facile annotation transfer from small numbers of characterized enzymes to the huge number of uncharacterized homologs to which they are related is problematic. Our extension of these comparisons to large enzyme superfamilies in a computationally intelligent manner provides a foundation for new directions in protein function prediction for the huge proportion of sequences of unknown function represented in major databases. Interactive sequence, reaction, substrate and product similarity networks computed for this work for the enolase and two other superfamilies are freely available for download from the Structure Function Linkage Database Archive (http://sfld.rbvi.ucsf.edu).
      PubDate: Mon, 25 May 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa034
      Issue No: Vol. 2020 (2020)
       
  • Revenant: a database of resurrected proteins

    • Authors: Carletti M; Monzon A, Garcia-Rios E, et al.
      Abstract: Revenant is a database of resurrected proteins coming from extinct organisms. Currently, it contains a manually curated collection of 84 resurrected proteins derived from bibliographic data. Each protein is extensively annotated, including structural, biochemical and biophysical information. Revenant contains a browse capability designed as a timeline from where the different proteins can be accessed. The oldest Revenant entries are between 4200 and 3500 million years ago, while the younger entries are between 8.8 and 6.3 million years ago. These proteins have been resurrected using computational tools called ancestral sequence reconstruction techniques combined with wet-laboratory synthesis and expression. Resurrected proteins are commonly used, with a noticeable increase during the past years, to explore and test different evolutionary hypotheses such as protein stability, to explore the origin of new functions, to get biochemical insights into past metabolisms and to explore specificity and promiscuous behaviour of ancient proteins.
      PubDate: Wed, 13 May 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa031
      Issue No: Vol. 2020 (2020)
       
  • RESTful API for iPTMnet: a resource for protein post-translational
           modification network discovery

    • Authors: Gavali S; Cowart J, Chen C, et al.
      Abstract: iPTMnet is a bioinformatics resource that integrates protein post-translational modification (PTM) data from text mining and curated databases and ontologies to aid in knowledge discovery and scientific study. The current iPTMnet website can be used for querying and browsing rich PTM information but does not support automated iPTMnet data integration with other tools. Hence, we have developed a RESTful API utilizing the latest developments in cloud technologies to facilitate the integration of iPTMnet into existing tools and pipelines. We have packaged iPTMnet API software in Docker containers and published it on DockerHub for easy redistribution. We have also developed Python and R packages that allow users to integrate iPTMnet for scientific discovery, as demonstrated in a use case that connects PTM sites to kinase signaling pathways.
      PubDate: Tue, 12 May 2020 00:00:00 GMT
      DOI: 10.1093/database/baz157
      Issue No: Vol. 2020 (2020)
       
  • The Mnemiopsis Genome Project Portal: integrating new gene expression
           resources and improving data visualization

    • Authors: Moreland R; Nguyen A, Ryan J, et al.
      Abstract: Following the completion of the genome sequencing and gene prediction of Mnemiopsis leidyi, a lobate ctenophore that is native to the coastal waters of the western Atlantic Ocean, we developed and implemented the Mnemiopsis Genome Project Portal (MGP Portal), a comprehensive Web-based data portal for navigating the genome sequence and gene annotations. In the years following the first release of the MGP Portal, it has become evident that the inclusion of data from significant published studies on Mnemiopsis has been critical to its adoption as the centralized resource for this emerging model organism. With this most recent update, the Portal has significantly expanded to include in situ images, temporal developmental expression profiles and single-cell expression data. Recent enhancements also include implementations of an updated BLAST interface, new graphical visualization tools and updates to gene pages that integrate all new data types. Database URL: https://research.nhgri.nih.gov/mnemiopsis/
      PubDate: Sat, 09 May 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa029
      Issue No: Vol. 2020 (2020)
       
  • HotSpotAnnotations—a database for hotspot mutations and annotations
           in cancer

    • Authors: Trevino V.
      Abstract: Hotspots, recurrently mutated DNA positions in cancer, are thought to be oncogenic drivers because recurrence by random chance is unlikely and because there are clear examples of oncogenic hotspots in genes such as BRAF, IDH1, KRAS and NRAS, among many others. Hotspots are attractive because they provide opportunities for biomedical research and novel treatments. Nevertheless, recent evidence, such as DNA hairpins for APOBEC3A, suggests that a considerable fraction of hotspots seem to be passengers rather than drivers. To document hotspots, the database HotSpotsAnnotations is proposed. For this, a statistical model was implemented to detect putative hotspots, which was applied to TCGA cancer datasets covering 33 cancer types, 10 182 patients and 3 175 929 mutations. Then, genes and hotspots were annotated by two published methods (APOBEC3A hairpins and dN/dS ratio) that may inform and warn researchers about possible false functional hotspots. Moreover, manual annotation from users can be added and shared. From the 23 198 detected as possible hotspots, 4435 were selected after false discovery rate correction and minimum mutation count. From these, 305 were annotated as likely for APOBEC3A whereas 442 were annotated as unlikely. To date, this is the first database dedicated to annotating hotspots for possible false functional hotspots.
      PubDate: Fri, 08 May 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa025
      Issue No: Vol. 2020 (2020)
       
  • Why data citation isn't working, and what to do about it

    • Authors: Buneman P; Christie G, Davies J, et al.
      Abstract: We describe a system that automatically generates from a curated database a collection of short conventional publications—citation summaries—that describe the contents of various components of the database. The purpose of these summaries is to ensure that the contributors to the database receive appropriate credit through the currently used measures such as h-indexes. Moreover, these summaries also serve to give credit to publications and people that are cited by the database. In doing this, we need to deal with granularity—how many summaries should be generated to represent effectively the contributions to a database? We also need to deal with evolution—for how long can a given summary serve as an appropriate reference when the database is evolving? We describe a journal specifically tailored to contain these citation summaries. We also briefly discuss the limitations that the current mechanisms for recording citations place on both the process and value of data citation.
      PubDate: Mon, 04 May 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa022
      Issue No: Vol. 2020 (2020)
       
  • MACSNVdb: a high-quality SNV database for interspecies genetic divergence
           investigation among macaques

    • Authors: Du L; Guo T, Liu Q, et al.
      Abstract: Macaques are the most widely used non-human primates in biomedical research. The genetic divergence between these animal models is responsible for their phenotypic differences in response to certain diseases. However, the macaque single nucleotide polymorphism resources mainly focused on rhesus macaque (Macaca mulatta), which hinders the broad research and biomedical application of other macaques. In order to overcome these limitations, we constructed a database named MACSNVdb that focuses on the interspecies genetic diversity among macaque genomes. MACSNVdb is a web-enabled database comprising ~74.51 million high-quality non-redundant single nucleotide variants (SNVs) identified among 20 macaque individuals from six species groups (mulatta, fascicularis, sinica, arctoides, silenus, sylvanus). In addition to individual SNVs, MACSNVdb also allows users to browse and retrieve groups of user-defined SNVs. In particular, users can retrieve non-synonymous SNVs that may have deleterious effects on protein structure or function within macaque orthologs of human disease and drug-target genes. Besides position, alleles and flanking sequences, MACSNVdb integrated additional genomic information including SNV annotations and gene functional annotations. MACSNVdb will help biomedical researchers discover molecular mechanisms of diverse responses to diseases and help primatologists perform population genetic studies. We will continue updating MACSNVdb with newly available sequencing data and annotation to keep the resource up to date. Database URL: http://big.cdu.edu.cn/macsnvdb/
      PubDate: Mon, 04 May 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa027
      Issue No: Vol. 2020 (2020)
       
  • UPCLASS: a deep learning-based classifier for UniProtKB entry publications

    • Authors: Teodoro D; Knafou J, Naderi N, et al.
      Abstract: In the UniProt Knowledgebase (UniProtKB), publications providing evidence for a specific protein annotation entry are organized across different categories, such as function, interaction and expression, based on the type of data they contain. To provide a systematic way of categorizing computationally mapped bibliographies in UniProt, we investigate a convolutional neural network (CNN) model to classify publications with accession annotations according to UniProtKB categories. The main challenge of categorizing publications at the accession annotation level is that the same publication can be annotated with multiple proteins and thus be associated with different category sets according to the evidence provided for the protein. We propose a model that divides the document into parts containing and not containing evidence for the protein annotation. Then, we use these parts to create different feature sets for each accession and feed them to separate layers of the network. The CNN model achieved a micro F1-score of 0.72 and a macro F1-score of 0.62, outperforming baseline models based on logistic regression and support vector machine by up to 22 and 18 percentage points, respectively. We believe that such an approach could be used to systematically categorize the computationally mapped bibliography in UniProtKB, which represents a significant set of the publications, and help curators to decide whether a publication is relevant for further curation for a protein accession. Database URL: https://goldorak.hesge.ch/bioexpclass/upclass/.
      PubDate: Mon, 04 May 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa026
      Issue No: Vol. 2020 (2020)
       
  • Community curation in PomBase: enabling fission yeast experts to provide
           detailed, standardized, sharable annotation from research publications

    • Authors: Lock A; Harris M, Rutherford K, et al.
      Abstract: Maximizing the impact and value of scientific research requires efficient knowledge distribution, which increasingly depends on the integration of standardized published data into online databases. To make data integration more comprehensive and efficient for fission yeast research, PomBase has pioneered a community curation effort that engages publication authors directly in FAIR-sharing of data representing detailed biological knowledge from hypothesis-driven experiments. Canto, an intuitive online curation tool that enables biologists to describe their detailed functional data using shared ontologies, forms the core of PomBase’s system. With 8 years’ experience, and as the author response rate reaches 50%, we review community curation progress and the insights we have gained from the project. We highlight incentives and nudges we deploy to maximize participation, and summarize project outcomes, which include increased knowledge integration and dissemination as well as the unanticipated added value arising from co-curation by publication authors and professional curators.
      PubDate: Thu, 30 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa028
      Issue No: Vol. 2020 (2020)
       
  • KRGDB: the large-scale variant database of 1722 Koreans based on whole
           genome sequencing

    • Authors: Jung K; Hong K, Jo H, et al.
      Abstract: The original version of this article did not include Kyung-Won Hong's affiliation with Theragen Etex Bio Institute or Jongpill Choi's affiliation with Thermo Fisher Scientific Solutions, and incorrectly stated that Hyo-Jeong Ban was affiliated with both Theragen Etex and Thermo Fisher. These errors have now been corrected.
      PubDate: Wed, 29 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa030
      Issue No: Vol. 2020 (2020)
       
  • RA-map: building a state-of-the-art interactive knowledge base for
           rheumatoid arthritis

    • Authors: Singh V; Kalliolias G, Ostaszewski M, et al.
      Abstract: Rheumatoid arthritis (RA) is a progressive, inflammatory autoimmune disease of unknown aetiology. The complex mechanism of aetiopathogenesis, progress and chronicity of the disease involves genetic, epigenetic and environmental factors. To understand the molecular mechanisms underlying disease phenotypes, one has to place implicated factors in their functional context. However, integration and organization of such data in a systematic manner remains a challenging task. Molecular maps are widely used in biology to provide a useful and intuitive way of depicting a variety of biological processes and disease mechanisms. Recent large-scale collaborative efforts such as the Disease Maps Project demonstrate the utility of such maps as versatile tools to organize and formalize disease-specific knowledge in a comprehensive way, both human and machine-readable. We present a systematic effort to construct a fully annotated, expert validated, state-of-the-art knowledge base for RA in the form of a molecular map. The RA map illustrates molecular and signalling pathways implicated in the disease. Signal transduction is depicted from receptors to the nucleus using the Systems Biology Graphical Notation (SBGN) standard representation. High-quality manual curation, use of only human-specific studies and focus on small-scale experiments aim to limit false positives in the map. The state-of-the-art molecular map for RA, using information from 353 peer-reviewed scientific publications, comprises 506 species, 446 reactions and 8 phenotypes. The species in the map are classified into 303 proteins, 61 complexes, 106 genes, 106 RNA entities, 2 ions and 7 simple molecules. The RA map is available online at ramap.elixir-luxembourg.org as an open-access knowledge base allowing for easy navigation and search of molecular pathways implicated in the disease. Furthermore, the RA map can serve as a template for omics data visualization.
      PubDate: Mon, 20 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa017
      Issue No: Vol. 2020 (2020)
       
  • ASFVdb: an integrative resource for genomic and proteomic analyses of
           African swine fever virus

    • Authors: Zhu Z; Meng G.
      Abstract: The recent outbreaks of African swine fever (ASF) in China and Europe have threatened the swine industry globally. To control the transmission of ASF virus (ASFV), we developed the African swine fever virus database (ASFVdb), an online data visualization and analysis platform for comparative genomics and proteomics. On the basis of known ASFV genes, ASFVdb reannotates the genomes of every strain and newly annotates 5352 possible open reading frames (ORFs) of 45 strains. Moreover, ASFVdb performs a thorough analysis of the population genetics of all the published genomes of ASFV strains and performs functional and structural predictions for all genes. Users can obtain not only basic information for each gene but also its distribution in strains and conserved or high mutation regions, possible subcellular location and topology. In the genome browser, ASFVdb provides a sliding window for results of population genetic analysis, which facilitates genetic and evolutionary analyses at the genomic level. The web interface was constructed based on SWAV 1.0. ASFVdb is freely accessible at http://asfvdb.popgenetics.net.
      PubDate: Wed, 15 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa023
      Issue No: Vol. 2020 (2020)
       
  • Integrated querying and version control of context-specific biological
           networks

    • Authors: Cowman T; Coşkun M, Grama A, et al.
      Abstract: Motivation: Biomolecular data stored in public databases is increasingly specialized to organisms, context/pathology and tissue type, potentially resulting in significant overhead for analyses. These networks are often specializations of generic interaction sets, presenting opportunities for reducing storage and computational cost. Therefore, it is desirable to develop effective compression and storage techniques, along with efficient algorithms and a flexible query interface capable of operating on compressed data structures. Current graph databases offer varying levels of support for network integration. However, these solutions do not provide efficient methods for the storage and querying of versioned networks. Results: We present VerTIoN, a framework consisting of novel data structures and associated query mechanisms for integrated querying of versioned context-specific biological networks. As a use case for our framework, we study network proximity queries in which the user can select and compose a combination of tissue-specific and generic networks. Using our compressed version tree data structure, in conjunction with state-of-the-art numerical techniques, we demonstrate real-time querying of large network databases. Conclusion: Our results show that it is possible to support flexible queries defined on heterogeneous networks composed at query time while drastically reducing response time for multiple simultaneous queries. The flexibility offered by VerTIoN in composing integrated network versions opens significant new avenues for the utilization of the ever-increasing volume of context-specific network data in a broad range of biomedical applications. Availability and Implementation: VerTIoN is implemented as a C++ library and is available at http://compbio.case.edu/omics/software/vertion and https://github.com/tjcowman/vertion. Contact: tyler.cowman@case.edu
      PubDate: Wed, 15 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa018
      Issue No: Vol. 2020 (2020)
       
  • ctcRbase: the gene expression database of circulating tumor cells and
           microemboli

    • Authors: Zhao L; Wu X, Li T, et al.
      Abstract: Circulating tumor cells/microemboli (CTCs/CTMs) are malignant cells that depart from cancerous lesions and shed into the bloodstream. Analysis of CTCs can allow the investigation of tumor cell biomarker expression from a non-invasive liquid biopsy. To date, high-throughput technologies have become a powerful tool to provide a genome-wide view of transcriptomic changes associated with CTCs/CTMs. These data provide much information for understanding tumor heterogeneity and the underlying molecular mechanisms of tumor metastasis. Unfortunately, these data have been deposited into various repositories, and a uniform resource for cancer metastasis is still unavailable. To this end, we integrated previously published transcriptome datasets of CTCs/CTMs and constructed a web-accessible database. The first release of ctcRbase contains 526 CTC/CTM samples across seven cancer types. Expression data for 14 631 mRNAs and 3642 long non-coding RNAs in CTCs/CTMs are included. Experimental validations from the published literature are also included. Since CTCs/CTMs are considered to be precursors of metastases, ctcRbase also collects the expression data of primary tumors and metastases, which allows users to discover a unique ‘circulating tumor cell gene signature’ that is distinct from primary tumors and metastases. An easy-to-use interface was constructed to query and browse CTC/CTM genes. ctcRbase is freely accessible at http://www.origin-gene.cn/database/ctcRbase/.
      PubDate: Wed, 15 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa020
      Issue No: Vol. 2020 (2020)
       
  • Integrating image caption information into biomedical document
           classification in support of biocuration

    • Authors: Jiang X; Li P, Kadin J, et al.
      Abstract: Gathering information from the scientific literature is essential for biomedical research, as much knowledge is conveyed through publications. However, the large and rapidly increasing publication rate makes it impractical for researchers to quickly identify all and only those documents related to their interest. As such, automated biomedical document classification attracts much interest. Such classification is critical in the curation of biological databases, because biocurators must scan through a vast number of articles to identify pertinent information within documents most relevant to the database. This is a slow, labor-intensive process that can benefit from effective automation. We present a document classification scheme aiming to identify papers containing information relevant to a specific topic, among a large collection of articles, for supporting the biocuration classification task. Our framework is based on a meta-classification scheme we have introduced before; here we incorporate into it features gathered from figure captions, in addition to those obtained from titles and abstracts. We trained and tested our classifier over a large imbalanced dataset, originally curated by the Gene Expression Database (GXD). GXD collects all the gene expression information in the Mouse Genome Informatics (MGI) resource. As part of the MGI literature classification pipeline, GXD curators identify MGI-selected papers that are relevant for GXD. The dataset consists of ~60 000 documents (5469 labeled as relevant; 52 866 as irrelevant), gathered throughout 2012–2016, in which each document is represented by the text of its title, abstract and figure captions. Our classifier attains precision 0.698, recall 0.784, f-measure 0.738 and Matthews correlation coefficient 0.711, demonstrating that the proposed framework effectively addresses the high imbalance in the GXD classification task.
Moreover, our classifier’s performance is significantly improved by utilizing information from image captions compared to using titles and abstracts alone; this observation clearly demonstrates that image captions provide substantial information for supporting biomedical document classification and curation.
      PubDate: Wed, 15 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa024
      Issue No: Vol. 2020 (2020)
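The metrics quoted in the abstract above can be related to one another with a short calculation. The following is a minimal sketch, not the authors' evaluation code: the function names are hypothetical, and the only inputs are the precision, recall and class counts reported in the abstract. It shows how the f-measure follows from precision and recall, and why plain accuracy would be a poor yardstick on a dataset this imbalanced.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def majority_baseline_accuracy(n_relevant: int, n_irrelevant: int) -> float:
    """Accuracy of a trivial classifier that labels every document with the larger class."""
    return max(n_relevant, n_irrelevant) / (n_relevant + n_irrelevant)


# Values taken from the abstract (already rounded by the authors).
reported_precision = 0.698
reported_recall = 0.784
f1 = f_measure(reported_precision, reported_recall)

# Class counts taken from the abstract.
baseline = majority_baseline_accuracy(5469, 52866)

print(f"f-measure from reported precision/recall: {f1:.4f}")
print(f"accuracy of an 'always irrelevant' baseline: {baseline:.4f}")
```

The recomputed f-measure agrees with the reported 0.738 up to rounding of the inputs. The baseline shows that a classifier that never flags a relevant paper would still be right about nine times in ten, which is why the abstract reports precision, recall and the Matthews correlation coefficient instead of accuracy.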
       
  • Prot2HG: a database of protein domains mapped to the human genome

    • Authors: Stanek D; Bis-Brewer D, Saghira C, et al.
      Abstract: Genetic variation occurring within conserved functional protein domains warrants special attention when examining DNA variation in the context of disease causation. Here we introduce a resource, freely available at www.prot2hg.com, that addresses the question of whether a particular variant falls onto an annotated protein domain and directly translates chromosomal coordinates onto protein residues. The tool can perform a multiple-site query in a simple way, and the whole dataset is available for download as well as incorporated into our own accessible pipeline. To create this resource, National Center for Biotechnology Information protein data were retrieved using the Entrez Programming Utilities. After processing all human protein domains, residue positions were reverse translated and mapped to the reference genome hg19 and stored in a MySQL database. In total, 760 487 protein domains from 42 371 protein models were mapped to hg19 coordinates and made publicly available for search or download (www.prot2hg.com). In addition, this annotation was implemented into the genomics research platform GENESIS in order to query nearly 8000 exomes and genomes of families with rare Mendelian disorders (tgp-foundation.org). When applied to patient genetic data, we found that rare (<1%) variants in the Genome Aggregation Database were significantly more often annotated onto a protein domain than common (>1%) variants. Similarly, variants described as pathogenic or likely pathogenic in ClinVar were more likely to be annotated onto a domain. In addition, we tested a dataset consisting of 60 causal variants in a cohort of patients with epileptic encephalopathy and found that 71% of them (43 variants) were propagated onto protein domains. In summary, we developed a resource that annotates variants in the coding part of the genome onto conserved protein domains in order to increase variant prioritization efficiency. Database URL: www.prot2hg.com
      PubDate: Wed, 15 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baz161
      Issue No: Vol. 2020 (2020)
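The mapping of residue positions to genome coordinates described in the abstract above can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the Prot2HG pipeline: the function names, the exon representation (1-based inclusive coding segments) and the toy coordinates are all hypothetical. It walks the spliced coding sequence to find the three genomic positions of a codon, then asks whether a variant position falls inside a domain given as a residue range.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 1-based inclusive genomic start/end of one coding segment


def residue_to_genomic(residue: int, cds_exons: List[Exon], strand: str = "+") -> List[int]:
    """Return the three genomic positions that encode a 1-based residue index."""
    # Order coding segments in the direction of transcription.
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    # 0-based offsets of the codon within the spliced coding sequence.
    wanted = range(3 * (residue - 1), 3 * residue)
    positions = []
    consumed = 0  # coding bases contributed by earlier segments
    for start, end in ordered:
        length = end - start + 1
        for offset in wanted:
            local = offset - consumed
            if 0 <= local < length:
                positions.append(start + local if strand == "+" else end - local)
        consumed += length
    if len(positions) != 3:
        raise ValueError("residue lies outside the coding sequence")
    return positions


def variant_in_domain(variant_pos: int, domain: Tuple[int, int],
                      cds_exons: List[Exon], strand: str = "+") -> bool:
    """True if a genomic position encodes a residue inside the domain (inclusive residue range)."""
    first, last = domain
    return any(variant_pos in residue_to_genomic(r, cds_exons, strand)
               for r in range(first, last + 1))


# Toy gene with two coding segments; the coordinates are invented for illustration.
toy_exons = [(100, 104), (200, 210)]
print(residue_to_genomic(2, toy_exons))            # codon split across the two segments
print(variant_in_domain(200, (2, 3), toy_exons))   # position 200 encodes residue 2
```

A codon can span an exon boundary, which is why the mapping returns individual positions instead of a single interval. A production resource would precompute these positions once and index them, as the abstract describes with its MySQL store.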
       
  • A negative storage model for precise but compact storage of genetic
           variation data

    • Authors: Gonzalez-Calderon G; Liu R, Carvajal R, et al.
      Abstract: Falling sequencing costs and large initiatives are resulting in increasing amounts of data available for investigator use. However, there are informatics challenges in being able to access genomic data. Performance and storage are well-appreciated issues, but precision is critical for meaningful analysis and interpretation of genomic data. There is an inherent accuracy vs. performance trade-off with existing solutions. The most common approach (Variant-only Storage Model, VOSM) stores only variant data. Systems must therefore assume that everything not variant is reference, sacrificing precision and potentially accuracy. A more complete model (Full Storage Model, FSM) would store the state of every base (variant, reference and missing) in the genome thereby sacrificing performance. A compressed variation of the FSM can store the state of contiguous regions of the genome as blocks (Block Storage Model, BLSM), much like the file-based gVCF model. We propose a novel approach by which this state is encoded such that both performance and accuracy are maintained. The Negative Storage Model (NSM) can store and retrieve precise genomic state from different sequencing sources, including clinical and whole exome sequencing panels. Reduced storage requirements are achieved by storing only the variant and missing states and inferring the reference state. We evaluate the performance characteristics of FSM, BLSM and NSM and demonstrate dramatic improvements in storage and performance using the NSM approach.
      PubDate: Wed, 15 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baz158
      Issue No: Vol. 2020 (2020)
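The core idea in the abstract above, storing only variant and missing states and inferring reference for everything else, can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the class name, the interval representation and the example coordinates are all assumptions made for the sketch.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


class NegativeStore:
    """Toy store for one sample: keeps variants and missing regions, infers reference."""

    def __init__(self, variants: Dict[int, str], missing: List[Tuple[int, int]]):
        # Position -> variant description, e.g. {105: "A>G"}.
        self.variants = dict(variants)
        # Non-overlapping inclusive intervals with no usable coverage.
        self.missing = sorted(missing)
        self._starts = [start for start, _ in self.missing]

    def state(self, pos: int) -> str:
        """Return 'variant:<desc>', 'missing', or the inferred 'reference'."""
        if pos in self.variants:
            return "variant:" + self.variants[pos]
        # Find the last missing interval that starts at or before pos.
        i = bisect_right(self._starts, pos) - 1
        if i >= 0 and self.missing[i][1] >= pos:
            return "missing"
        # Neither variant nor missing: the base was observed and matches the reference.
        return "reference"


# Invented example: one variant and one uncovered region.
store = NegativeStore({105: "A>G"}, [(200, 299)])
for position in (105, 150, 250):
    print(position, store.state(position))
```

A variant-only store would report position 250 as reference, because it has no way to record that the region was never sequenced. Recording the missing intervals removes that ambiguity while still avoiding a per-base record for the (much larger) reference-matching portion of the genome.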
       
  • Strengthening of banana breeding through data digitalization

    • Authors: Vignesh Kumar B; Backiyarani S, Chandrasekar A, et al.
      Abstract: Improvement of edible bananas (a triploid and sterile crop) through conventional breeding is a challenging task owing to their recalcitrance to seed set and prolonged crop duration. In addition, the need for substantial manpower at different stages of progeny development and evaluation often leads to mislabeling, poor data management and loss of vital data. All this can be overcome by the application of advanced information technology, which ensures secure and efficient data management, including storage, retrieval and analysis, and can further assist in tracking the breeding status in real time. Thus, a user-friendly web-based banana breeding tracker (BBT) has been developed using a MySQL database with Hypertext Preprocessor (PHP). The BBT works on all operating systems and gives multiple users access from anywhere at any time. Quick response (QR) code labels can be generated by the tracker and decoded using a QR scanner. In addition, a new QR code can be generated for every update in the breeding stages, which in turn reduces labeling errors. Moreover, the tracker has additional tools to search, sort and filter the data sets for efficient retrieval and analysis. The tracker is being upgraded with phenotypic and genotypic data that will be made available in the public domain to hasten the banana improvement program.
      PubDate: Sat, 11 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baz145
      Issue No: Vol. 2020 (2020)
       
  • A structured model for immune exposures

    • Authors: Vita R; Overton J, Dunn P, et al.
      Abstract: An Immune Exposure is the process by which components of the immune system first encounter a potential trigger. The ability to describe consistently the details of the Immune Exposure process was needed for data resources responsible for housing scientific data related to the immune response. This need was met through the development of a structured model for Immune Exposures. This model was created during curation of the immunology literature, resulting in a robust model capable of meeting the requirements of such data. We present this model with the hope that overlapping projects will adopt and/or contribute to this work.
      PubDate: Sat, 11 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa016
      Issue No: Vol. 2020 (2020)
       
  • An open-source GIS-enabled lookup service for Nagoya Protocol party
           information

    • Authors: Seifert H; Weber M, Glöckner F, et al.
      Abstract: The Nagoya Protocol on Access and Benefit Sharing is a transparent legal framework, which governs the access to genetic resources and the fair and equitable sharing of benefits arising from their utilization. Complying with the Nagoya regulations ensures legal use and re-use of data from genetic resources. Providing detailed provenance information and clear re-usage conditions plays a key role in ensuring the re-usability of research data according to the FAIR (findable, accessible, interoperable and re-usable) Guiding Principles for scientific data management and stewardship. Even with the framework provided by the ABS (access and benefit sharing) Clearing House and the support of the National Focal Points, establishing a direct link between the research data from genetic resources and the relevant Nagoya information remains a challenge. This is particularly true for re-using publicly available data. The Nagoya Lookup Service was developed for stakeholders in biological sciences with the aim of facilitating legal and FAIR data management, specifically for data publication and re-use. The service provides up-to-date information on the Nagoya party status for a geolocation provided by GPS coordinates, directing the user to the relevant local authorities for further information. It integrates open data from the ABS Clearing House, Marine Regions, GeoNames and Wikidata. The service is accessible through a REST API and a user-friendly web form. Stakeholders include data librarians, data brokers, scientists and data archivists who may use this service before, during and after data acquisition or publication to check whether legal documents need to be prepared, considered or verified. The service allows researchers to estimate whether genetic data they plan to produce or re-use might fall under Nagoya regulations or not, within the limits of the technology and without constituting legal advice.
It is implemented using portable Docker containers and can easily be deployed locally or on a cloud infrastructure. The source code for building the service is available under an open-source license on GitHub, with a functional image on Docker Hub and can be used by anyone free of charge.
      PubDate: Sat, 11 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa014
      Issue No: Vol. 2020 (2020)
       
  • Structured reviews for data and knowledge-driven research

    • Authors: Queralt-Rosinach N; Stupp G, Li T, et al.
      Abstract: Hypothesis generation is a critical step in research and a cornerstone in the rare disease field. Research is most efficient when those hypotheses are based on the entirety of knowledge known to date. Systematic review articles are commonly used in biomedicine to summarize existing knowledge and contextualize experimental data. But the information contained within review articles is typically only expressed as free-text, which is difficult to use computationally. Researchers struggle to navigate, collect and remix prior knowledge as it is scattered in several silos without seamless integration and access. This lack of a structured information framework hinders research by both experimental and computational scientists. To better organize knowledge and data, we built a structured review article that is specifically focused on NGLY1 Deficiency, an ultra-rare genetic disease first reported in 2012. We represented this structured review as a knowledge graph and then stored this knowledge graph in a Neo4j database to simplify dissemination, querying and visualization of the network. Relative to free-text, this structured review better promotes the principles of findability, accessibility, interoperability and reusability (FAIR). In collaboration with domain experts in NGLY1 Deficiency, we demonstrate how this resource can improve the efficiency and comprehensiveness of hypothesis generation. We also developed a read–write interface that allows domain experts to contribute FAIR structured knowledge to this community resource. In contrast to traditional free-text review articles, this structured review exists as a living knowledge graph that is curated by humans and accessible to computational analyses. Finally, we have generalized this workflow into modular and repurposable components that can be applied to other domain areas.
This NGLY1 Deficiency-focused network is publicly available at http://ngly1graph.org/. Availability and implementation: Database URL: http://ngly1graph.org/. Network data files are at https://github.com/SuLab/ngly1-graph and source code at https://github.com/SuLab/bioknowledge-reviewer. Contact: asu@scripps.edu
      PubDate: Sat, 11 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa015
      Issue No: Vol. 2020 (2020)
       
  • Geographic assessment of cancer genome profiling studies

    • Authors: Carrio-Cordo P; Acheson E, Huang Q, et al.
      Abstract: Cancers arise from the accumulation of somatic genome mutations, which can be influenced by inherited genomic variants and external factors such as environmental or lifestyle-related exposure. Due to the heterogeneity of cancers, precise information about the genomic composition of germline and malignant tissues has to be correlated with morphological, clinical and extrinsic features to advance medical knowledge and treatment options. With global differences in cancer frequencies and disease types, geographic data are important for understanding the interplay between genetic ancestry and environmental influence in cancer incidence, progression and treatment outcome. In this study, we analyzed the current landscape of oncogenomic screening publications for geographic information content and quality, to address underrepresented study populations and thereby to fill prominent gaps in our understanding of interactions between somatic variations, population genetics and environmental factors in oncogenesis. We conclude that while the use of proxy-derived geographic annotations can be useful for coarse-grained associations, the study of geo-correlated factors in cancer causation and progression will benefit from standardized geographic provenance annotations. Additionally, publication-derived geographic provenance data allowed us to highlight stark inequality in the geographies of cancer genome profiling, with a near lack of sizable studies from Africa and other large regions.
      PubDate: Thu, 02 Apr 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa009
      Issue No: Vol. 2020 (2020)
       
  • FerrDb: a manually curated resource for regulators and markers of
           ferroptosis and ferroptosis-disease associations

    • Authors: Zhou N; Bao J.
      Abstract: Ferroptosis is a mode of regulated cell death that depends on iron. Cells die from the toxic accumulation of lipid reactive oxygen species. Ferroptosis is tightly linked to a variety of human diseases, such as cancers and degenerative diseases. The ferroptotic process is complicated and involves a wide range of metabolites and biomolecules. Although great progress has been achieved, the mechanism of ferroptosis remains enigmatic. We have entered an era of extensive knowledge advancement, and thus, it is important to find ways to organize and utilize data efficiently. We have observed that a high-quality knowledge base of ferroptosis research is lacking. In this study, we downloaded 784 ferroptosis articles from the PubMed database. Ferroptosis regulators and markers and associated diseases were extracted from these articles and annotated. In summary, 253 regulators (including 108 drivers, 69 suppressors, 35 inducers and 41 inhibitors), 111 markers and 95 ferroptosis-disease associations were found. We then developed FerrDb, the first manually curated database for regulators and markers of ferroptosis and ferroptosis-disease associations. The database has a user-friendly interface, and it will be updated every 6 months to offer long-term service. FerrDb is expected to help researchers acquire insights into ferroptosis. Database URL: http://www.zhounan.org/ferrdb
      PubDate: Fri, 27 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa021
      Issue No: Vol. 2020 (2020)
       
  • Circad: a comprehensive manually curated resource of circular RNA
           associated with diseases

    • Authors: Rophina M; Sharma D, Poojary M, et al.
      Abstract: Circular RNAs (circRNAs) are unique transcript isoforms characterized by back splicing of exon ends to form a covalently closed loop or circular conformation. These transcript isoforms are now known to be expressed in a variety of organisms across the kingdoms of life. Recent studies have shown the role of circRNAs in a number of diseases and increasing evidence points to their potential application as biomarkers in these diseases. We have created a comprehensive manually curated database of circular RNAs associated with diseases. This database is available at http://clingen.igib.res.in/circad/. The database lists more than 1300 circRNAs associated with 150 diseases and mapping to 113 International Statistical Classification of Diseases (ICD) codes with evidence of association linked to published literature. The database is unique in many ways. Firstly, it provides ready-to-use primers to work with, in order to use circRNAs as biomarkers or to perform functional studies. It additionally lists the assay and PCR primer details including experimentally validated ones as a ready reference to researchers along with fold change and statistical significance. It also provides standard disease nomenclature as per the ICD codes. To the best of our knowledge, circad is the most comprehensive and up-to-date database of disease-associated circular RNAs. Availability: http://clingen.igib.res.in/circad/
      PubDate: Fri, 27 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa019
      Issue No: Vol. 2020 (2020)
       
  • Text mining meets community curation: a newly designed curation platform
           to improve author experience and participation at WormBase

    • Authors: Arnaboldi V; Raciti D, Van Auken K, et al.
      Abstract: Biological knowledgebases rely on expert biocuration of the research literature to maintain up-to-date collections of data organized in machine-readable form. To enter information into knowledgebases, curators need to follow three steps: (i) identify papers containing relevant data, a process called triaging; (ii) recognize named entities; and (iii) extract and curate data in accordance with the underlying data models. WormBase (WB), the authoritative repository for research data on Caenorhabditis elegans and other nematodes, uses text mining (TM) to semi-automate its curation pipeline. In addition, WB engages its community, via an Author First Pass (AFP) system, to help recognize entities and classify data types in their recently published papers. In this paper, we present a new WB AFP system that combines TM and AFP into a single application to enhance community curation. The system employs string-searching algorithms and statistical methods (e.g. support vector machines (SVMs)) to extract biological entities and classify data types, and it presents the results to authors in a web form where they validate the extracted information, rather than enter it de novo as the previous form required. With this new system, we lessen the burden for authors, while at the same time receiving valuable feedback on the performance of our TM tools. The new user interface also links out to specific structured data submission forms, e.g. for phenotype or expression pattern data, giving the authors the opportunity to contribute a more detailed curation that can be incorporated into WB with minimal curator review. Our approach is generalizable and could be applied to additional knowledgebases that would like to engage their user community in assisting with the curation.
In the five months following the launch of the new system, the response rate has been comparable with that of the previous AFP version, but the quality and quantity of the data received have greatly improved.
      PubDate: Tue, 17 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa006
      Issue No: Vol. 2020 (2020)
       
  • PncStress: a manually curated database of experimentally validated
           stress-responsive non-coding RNAs in plants

    • Authors: Wu W; Wu Y, Hu D, et al.
      Abstract: Non-coding RNAs (ncRNAs) are recognized as key regulatory molecules in many biological processes. Accumulating evidence indicates that ncRNA-related mechanisms play important roles in plant stress responses. Although abundant plant stress-responsive ncRNAs have been identified, these experimentally validated results have not been gathered into a single public domain archive. Therefore, we established PncStress by curating experimentally validated stress-responsive ncRNAs in plants, including microRNAs, long non-coding RNAs and circular RNAs. The current version of PncStress contains 4227 entries from 114 plants covering 48 biotic and 91 abiotic stresses. For each entry, PncStress has biological information and network visualization. Serving as a manually curated database, PncStress will become a valuable resource in support of plant stress response research.
      PubDate: Tue, 17 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa001
      Issue No: Vol. 2020 (2020)
       
  • Artificial intelligence with multi-functional machine learning platform
           development for better healthcare and precision medicine

    • Authors: Ahmed Z; Mohamed K, Zeeshan S, et al.
      Abstract: Precision medicine is one of the recent and powerful developments in medical care, which has the potential to improve the traditional symptom-driven practice of medicine, allowing earlier interventions using advanced diagnostics and tailoring better and economically personalized treatments. Identifying the best pathway to personalized and population medicine involves the ability to analyze comprehensive patient information together with broader aspects to monitor and distinguish between sick and relatively healthy people, which will lead to a better understanding of biological indicators that can signal shifts in health. While the complexities of disease at the individual level have made it difficult to utilize healthcare information in clinical decision-making, some of the existing constraints have been greatly minimized by technological advancements. To implement effective precision medicine with enhanced ability to positively impact patient outcomes and provide real-time decision support, it is important to harness the power of electronic health records by integrating disparate data sources and discovering patient-specific patterns of disease progression. Useful analytic tools, technologies, databases, and approaches are required to augment networking and interoperability of clinical, laboratory and public health systems, as well as to address ethical and social issues related to the privacy and protection of healthcare data with effective balance. Developing multifunctional machine learning platforms for clinical data extraction, aggregation, management and analysis can support clinicians by efficiently stratifying subjects to understand specific scenarios and optimize decision-making. Implementation of artificial intelligence in healthcare is a compelling vision that has the potential to lead to significant improvements in achieving the goals of providing real-time, better personalized and population medicine at lower costs.
In this study, we focused on analyzing and discussing various published artificial intelligence and machine learning solutions, approaches and perspectives, aiming to advance academic solutions in paving the way for a new data-centric era of discovery in healthcare.
      PubDate: Tue, 17 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa010
      Issue No: Vol. 2020 (2020)
       
  • uORFlight: a vehicle toward uORF-mediated translational regulation
           mechanisms in eukaryotes

    • Authors: Niu R; Zhou Y, Zhang Y, et al.
      Abstract: Upstream open reading frames (uORFs) are prevalent in eukaryotic mRNAs. They act as a translational control element for precisely tuning the expression of the downstream major open reading frame (mORF). uORF variation has been clearly associated with several human diseases. In contrast, natural uORF variants in plants have never been identified or linked with any phenotypic changes. The paucity of such evidence encouraged us to generate this database, uORFlight (http://uorflight.whu.edu.cn). It facilitates the exploration of uORF variation among different splicing models of Arabidopsis and rice genes. Most importantly, users can evaluate uORF frequency among different accessions at the population scale and find the causal single nucleotide polymorphism (SNP) or insertion/deletion (INDEL), which can be associated with phenotypic variation through database mining or simple experiments. Such information will help to generate hypotheses about uORF function in plant development or adaptation to changing environments on the basis of the cognate mORF function. This database also curates plant uORF relevant literature into distinct groups. To be broadly interesting, our database expands uORF annotation into more species of fungus (Botrytis cinerea and Saccharomyces cerevisiae), plant (Brassica napus, Glycine max, Gossypium raimondii, Medicago truncatula, Solanum lycopersicum, Solanum tuberosum, Triticum aestivum and Zea mays), metazoan (Caenorhabditis elegans and Drosophila melanogaster) and vertebrate (Homo sapiens, Mus musculus and Danio rerio). Therefore, uORFlight will light up the runway toward how uORF genetic variation determines phenotypic diversity and advance our understanding of translational control mechanisms in eukaryotes.
      PubDate: Fri, 13 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa007
      Issue No: Vol. 2020 (2020)
       
  • MMHub, a database for the mulberry metabolome

    • Authors: Li D; Ma B, Xu X, et al.
      Abstract: Mulberry is an important economic crop plant and traditional medicine. It contains a huge array of bioactive metabolites such as flavonoids, amino acids, alkaloids and vitamins. Consequently, mulberry has received increasing attention in recent years. MMHub (version 1.0) is the first open public repository of mass spectra of small chemical compounds (<1000 Da) in mulberry leaves. The database contains 936 electrospray ionization tandem mass spectrometry (ESI-MS2) data and lists the specific distribution of compounds in 91 mulberry resources with two biological duplicates. ESI-MS2 data were obtained under non-standardized and independent experimental conditions. In total, 124 metabolites were identified or tentatively annotated and details of 90 metabolites with associated chemical structures have been deposited in the database. Supporting information such as PubChem compound information, molecular formula and metabolite classification are also provided in the MS2 spectral tag library. The MMHub provides important and comprehensive metabolome data for scientists working with mulberry. This information will be useful for the screening of quality resources and specific metabolites of mulberry. Database URL: https://biodb.swu.edu.cn/mmdb/
      PubDate: Wed, 11 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa011
      Issue No: Vol. 2020 (2020)
       
  • TeaMiD: a comprehensive database of simple sequence repeat markers of tea

    • Authors: Dubey H; Rawal H, Rohilla M, et al.
      Abstract: Tea is a highly cross-pollinated, woody, perennial tree. High heterozygosity combined with a long gestational period makes conventional breeding a cumbersome process. Therefore, marker-assisted breeding is a better alternative to conventional breeding. Considering the large genome size of tea (~3 Gb), information about simple sequence repeats (SSRs) is scanty. Thus, we have taken advantage of the recently published tea genomes to identify large numbers of SSR markers in tea. Besides the genomic sequences, we identified SSRs from other publicly available sequences such as RNA-seq, GSS, ESTs and organelle genomes (chloroplast and mitochondrial) and also searched the published literature to catalog a validated set of tea SSR markers. The complete exercise yielded a total of 935 547 SSRs. Out of the total, 82 SSRs were selected for validation among a diverse set of tea genotypes. Six primers (each with four to six alleles, an average of five alleles per locus) out of the total 27 polymorphic primers were used for a diversity analysis in 36 tea genotypes with mean polymorphic information content of 0.61–0.76. Finally, using all the information generated in this study, we have developed a user-friendly database (TeaMiD; http://indianteagenome.in:8080/teamid/) that hosts SSRs from all six resources, including three nuclear genomes of tea and transcriptome sequences of 17 Camellia wild species. Database URL: http://indianteagenome.in:8080/teamid/
      PubDate: Wed, 11 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa013
      Issue No: Vol. 2020 (2020)
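
      Note on polymorphic information content (PIC): the TeaMiD abstract reports PIC values but does not say which formula was used. A minimal sketch, assuming the commonly used definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2, where p_i is the frequency of allele i at a locus. The function name and the example frequencies are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def polymorphic_information_content(allele_freqs):
    """Return PIC for one locus, assuming the common definition
    1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity term
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term summed over every unordered pair of distinct alleles
    correction = sum(
        2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2)
    )
    return 1.0 - homozygosity - correction


# Hypothetical locus with five equally frequent alleles (illustrative only)
example_pic = polymorphic_information_content([0.2] * 5)
print(round(example_pic, 3))
```

      Under this definition, more alleles with more even frequencies give a higher PIC, which is why markers with four to six alleles can fall in the range the abstract reports.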
       
  • CancerLivER: a database of liver cancer gene expression resources and
           biomarkers

    • Authors: Kaur H; Bhalla S, Kaur D, et al.
      Abstract: Liver cancer is the fourth major lethal malignancy worldwide. To understand the development and progression of liver cancer, biomedical research has generated a tremendous amount of transcriptomics and disease-specific biomarker data. However, dispersed information poses pragmatic hurdles to delineating the significant markers for the disease. Hence, a dedicated resource for liver cancer is required that integrates scattered, variously formatted datasets and information regarding disease-specific biomarkers. Liver Cancer Expression Resource (CancerLivER) is a database that maintains gene expression datasets of liver cancer along with the putative biomarkers defined for the same in the literature. It manages 115 datasets that include gene-expression profiles of 9611 samples. Each of the incorporated datasets was manually curated to remove any artefact; subsequently, a standard and uniform pipeline according to the specific technique was employed for their processing. Additionally, it contains comprehensive information on 594 liver cancer biomarkers which include mainly 315 gene biomarkers or signatures and 178 protein- and 46 miRNA-based biomarkers. To explore the full potential of data on liver cancer, a web-based interactive platform was developed to perform search, browsing and analyses. Analysis tools were also integrated to explore and visualize the expression patterns of desired genes among different types of samples based on individual gene, GO ontology and pathways. Furthermore, a dataset matrix download facility was provided so that users can run their own extensive analyses to elucidate more robust disease-specific signatures. Overall, CancerLivER is a comprehensive resource which is highly useful for the scientific community working in the field of liver cancer. Availability: CancerLivER can be accessed on the web at https://webs.iiitd.edu.in/raghava/cancerliver.
      PubDate: Sat, 07 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa012
      Issue No: Vol. 2020 (2020)
       
  • KRGDB: the large-scale variant database of 1722 Koreans based on whole
           genome sequencing

    • Authors: Jung K; Hong K, Jo H, et al.
      Abstract: Since 2012, the Center for Genome Science of the Korea National Institute of Health (KNIH) has been sequencing complete genomes of 1722 Korean individuals. As a result, more than 32 million variant sites have been identified, and a large proportion of the variant sites have been detected for the first time. In this article, we describe the Korean Reference Genome Database (KRGDB) and its genome browser. The current version of our database contains both single nucleotide and short insertion/deletion variants. The DNA samples were obtained from four different origins and sequenced at different depths (10× coverage of 63 individuals, 20× coverage of 194 individuals, combined 10× and 20× coverage of 135 individuals, 30× coverage of 230 individuals and 30× coverage of 1100 individuals). The major features of the KRGDB are that it contains information on the Korean genomic variant frequency, the frequency difference between the Korean and other populations and the functional annotation (such as regulatory elements in ENCODE regions and coding variant functions) of the variant sites. Additionally, we performed a genome-wide association study (GWAS) between Korean genome variant sites for the 230 individuals sequenced at 30× coverage and three major common diseases (diabetes, hypertension and metabolic syndrome). The association results are displayed on our browser. The KRGDB uses the MySQL database and the Apache Tomcat web server with Java Server Pages (JSP) and is freely available at http://coda.nih.go.kr/coda/KRGDB/index.jsp. Availability: http://coda.nih.go.kr/coda/KRGDB/index.jsp
      PubDate: Wed, 04 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baz146
      Issue No: Vol. 2020 (2020)
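
      Note on the KRGDB cohort sizes: the five sequencing groups listed in the abstract can be checked against the stated total of 1722 individuals. A minimal sketch; the dictionary keys are labels made up here for readability, while the counts are the ones given in the abstract.

```python
# Counts per sequencing group as stated in the abstract;
# the key names are illustrative labels, not KRGDB identifiers.
cohorts = {
    "10x": 63,
    "20x": 194,
    "10x+20x combined": 135,
    "30x (230-sample group)": 230,
    "30x (1100-sample group)": 1100,
}

total_individuals = sum(cohorts.values())
print(total_individuals)  # 1722, matching the total given in the abstract
```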
       
  • GXD’s RNA-Seq and Microarray Experiment Search: using curated metadata
           to reliably find mouse expression studies of interest

    • Authors: Smith C; Kadin J, Baldarelli R, et al.
      Abstract: The Gene Expression Database (GXD), an extensive community resource of curated expression information for the mouse, has developed an RNA-Seq and Microarray Experiment Search (http://www.informatics.jax.org/gxd/htexp_index). This tool allows users to quickly and reliably find specific experiments in ArrayExpress and the Gene Expression Omnibus (GEO) that study endogenous gene expression in wild-type and mutant mice. Standardized metadata annotations, curated by GXD, allow users to specify the anatomical structure, developmental stage, mutated gene, strain and sex of samples of interest, as well as the study type and key parameters of the experiment. These searches, powered by controlled vocabularies and ontologies, can be combined with free text searching of experiment titles and descriptions. Search result summaries include link-outs to ArrayExpress and GEO, providing easy access to the expression data itself. Links to the PubMed entries for accompanying publications are also included. More information about this tool and GXD can be found at the GXD home page (http://www.informatics.jax.org/expression.shtml). Database URL: http://www.informatics.jax.org/expression.shtml
      PubDate: Wed, 04 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa002
      Issue No: Vol. 2020 (2020)
       
  • LeukmiR: a database for miRNAs and their targets in acute lymphoblastic
           leukemia

    • Authors: Rawoof A; Swaminathan G, Tiwari S, et al.
      Abstract: Acute lymphoblastic leukemia (ALL) is one of the most common hematological malignancies in children. Recent studies suggest the involvement of multiple microRNAs in the tumorigenesis of various leukemias. However, until now, no comprehensive database has existed for miRNAs and their cognate target genes involved specifically in ALL. Therefore, we developed ‘LeukmiR’, a dynamic database comprising in silico predicted microRNAs and experimentally validated miRNAs along with the target genes they regulate in mouse and human. LeukmiR is a user-friendly platform with search strings for ALL-associated microRNAs, their sequences, description of target genes, their location on the chromosomes and the corresponding deregulated signaling pathways. For user queries, different search modules exist: a quick search can be carried out using any fuzzy term, or exact terms can be provided in specific modules. All entries for both human and mouse genomes can be retrieved through multiple options such as miRNA ID, accession number, sequence, target genes, Ensembl-ID or Entrez-ID. Users can also access miRNA:mRNA interaction networks in different signaling pathways and the genomic location of the targeted regions such as 3′UTR, 5′UTR and exons, with their gene ontology and disease ontology information in both human and mouse systems. Herein, we also report 51 novel microRNAs which have not previously been described for ALL. Thus, the LeukmiR database will be a valuable source of information for researchers to understand and investigate miRNAs and their targets with diagnostic and therapeutic potential in ALL. Database URL: http://tdb.ccmb.res.in/LeukmiR/
      PubDate: Wed, 04 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baz151
      Issue No: Vol. 2020 (2020)
       
  • Incorporation of a unified protein abundance dataset into the
           Saccharomyces genome database

    • Authors: Nash R; Weng S, Karra K, et al.
      Abstract: The identification and accurate quantitation of protein abundance has been a major objective of proteomics research. Abundance studies have the potential to provide users with data that can be used to gain a deeper understanding of protein function and regulation and can also help identify cellular pathways and modules that operate under various environmental stress conditions. One of the central missions of the Saccharomyces Genome Database (SGD; https://www.yeastgenome.org) is to work with researchers to identify and incorporate datasets of interest to the wider scientific community, thereby enabling hypothesis-driven research. A large number of studies have detailed efforts to generate proteome-wide abundance data, but deeper analyses of these data have been hampered by the inability to compare results between studies. Recently, a unified protein abundance dataset was generated through the evaluation of more than 20 abundance datasets, which were normalized and converted to common measurement units, in this case molecules per cell. We have incorporated these normalized protein abundance data and associated metadata into the SGD database, as well as the SGD YeastMine data warehouse, resulting in the addition of 56 487 values for untreated cells grown in either rich or defined media and 28 335 values for cells treated with environmental stressors. Abundance data for protein-coding genes are displayed in a sortable, filterable table on Protein pages, available through Locus Summary pages. A median abundance value was incorporated, and a median absolute deviation was calculated for each protein-coding gene and incorporated into SGD. These values are displayed in the Protein section of the Locus Summary page. The inclusion of these data has enhanced the quality and quantity of protein experimental information presented at SGD and provides opportunities for researchers to access and utilize the data to further their research.
      PubDate: Wed, 04 Mar 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa008
      Issue No: Vol. 2020 (2020)
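
      Note on the summary statistics: the SGD abstract mentions a median abundance and a median absolute deviation (MAD) per protein-coding gene. A minimal sketch of how the two could be computed from a list of per-study abundance values in molecules per cell. It assumes the plain, unscaled MAD (the abstract does not say whether a scaling constant is applied), and the abundance values below are invented for illustration, not SGD data.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, unscaled median absolute deviation) of a sequence."""
    values = list(values)
    if not values:
        raise ValueError("need at least one abundance value")
    med = median(values)
    # MAD: median of the absolute deviations from the median
    mad = median(abs(v - med) for v in values)
    return med, mad


# Hypothetical molecules-per-cell values for one protein across five studies
abundances = [1200, 1500, 1350, 9800, 1400]
med, mad = median_and_mad(abundances)
print(med, mad)
```

      Both statistics are robust to outliers: the single very large value in the example barely moves either, which is a plausible reason to prefer them over mean and standard deviation when combining heterogeneous studies.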
       
  • Predicted Drosophila Interactome Resource and web tool for functional
           interpretation of differentially expressed genes

    • Authors: Ding X; Jin J, Tao Y, et al.
      Abstract: Drosophila melanogaster is a well-established model organism that is widely used in genetic studies. This species enjoys the availability of a wide range of research tools, well-annotated reference databases and highly similar gene circuitry to other insects. To facilitate molecular mechanism studies in Drosophila, we present the Predicted Drosophila Interactome Resource (PDIR), a database of high-quality predicted functional gene interactions. These interactions were inferred from evidence in 10 public databases providing information for functional gene interactions from diverse perspectives. The current version of PDIR includes 102 835 putative functional associations with balanced sensitivity and specificity, which are expected to cover 22.56% of all Drosophila protein interactions. This set of functional interactions is a good reference for hypothesis formulation in molecular mechanism studies. At the same time, these interactions also serve as a high-quality reference interactome for gene set linkage analysis (GSLA), which is a web tool for the interpretation of the potential functional impacts of a set of changed genes observed in transcriptomics analyses. In a case study, we show that the PDIR/GSLA system was able to produce a more comprehensive and concise interpretation of the collective functional impact of multiple simultaneously changed genes compared with the widely used gene set annotation tools, including PANTHER and DAVID. PDIR and its associated GSLA service can be accessed at http://drosophila.biomedtzc.cn.
      PubDate: Tue, 25 Feb 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa005
      Issue No: Vol. 2020 (2020)
       
  • NipahVR: a resource of multi-targeted putative therapeutics and epitopes
           for the Nipah virus

    • Authors: Gupta A; Kumar A, Rajput A, et al.
      Abstract: Nipah virus (NiV) is an emerging and priority pathogen from the Paramyxoviridae family with a high fatality rate. It causes various diseases such as respiratory ailments and encephalitis and poses a great threat to humans and livestock. Despite various efforts, there is no approved antiviral treatment available. Therefore, to expedite and assist the research, we have developed an integrative resource, NipahVR (http://bioinfo.imtech.res.in/manojk/nipahvr/), for the multi-targeted putative therapeutics and epitopes for NiV. It is structured into different sections, i.e. genomes, codon usage, phylogenomics, molecular diagnostic primers, therapeutics (siRNAs, sgRNAs, miRNAs) and vaccine epitopes (B-cell, CTL, MHC-I and -II binders). Most decisively, potentially efficient therapeutic regimens targeting different NiV proteins and genes were anticipated and projected. We hope this computational resource will be helpful in developing strategies to combat this deadly pathogen. Database URL: http://bioinfo.imtech.res.in/manojk/nipahvr/
      PubDate: Mon, 24 Feb 2020 00:00:00 GMT
      DOI: 10.1093/database/baz159
      Issue No: Vol. 2020 (2020)
       
  • dbPepNeo: a manually curated database for human tumor neoantigen peptides

    • Authors: Tan X; Li D, Huang P, et al.
      Abstract: Neoantigens can function as actual antigens to facilitate tumor rejection and play a crucial role in cancer immunology and immunotherapy. Emerging evidence has revealed that neoantigens can be used to develop personalized, cancer-specific vaccines. To date, large numbers of immunogenomic peptides have been computationally predicted to be potential neoantigens. However, experimental validation remains the gold standard for potential clinical application. Experimentally validated neoantigens are rare and mostly appear scattered among scientific papers and various databases. Here, we constructed dbPepNeo, a specific database for human leukocyte antigen class I (HLA-I) binding neoantigen peptides based on mass spectrometry (MS) validation or immunoassay in human tumors. According to the verification methods of these neoantigens, the collection of peptides was classified as 295 high confidence, 247 medium confidence and 407 794 low confidence neoantigens, respectively. This can serve as a valuable resource to aid further screening for effective neoantigens, optimize a neoantigen prediction pipeline and study T-cell receptor (TCR) recognition. Three applications of dbPepNeo are shown. In summary, this work resulted in a platform to promote the screening and confirmation of potential neoantigens in cancer immunotherapy. Database URL: www.biostatistics.online/dbPepNeo/.
      PubDate: Sat, 22 Feb 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa004
      Issue No: Vol. 2020 (2020)
       
  • HSPMdb: a computational repository of heat shock protein modulators

    • Authors: Singh P; Unik B, Puri A, et al.
      Abstract: Heat shock proteins (Hsp) are highly conserved across all domains of life. Though originally discovered as a cellular response to stress, these proteins are also involved in a wide range of cellular functions such as protein refolding, protein trafficking and cellular signalling. A large number of potential Hsp modulators are under clinical trials against various human diseases. As the number of modulators targeting Hsps is growing, there is a need to develop a comprehensive knowledge repository of these findings, which are largely scattered. We have thus developed a web-accessible database, HSPMdb, which is a first-of-its-kind manually curated repository of experimentally validated Hsp modulators (activators and inhibitors). The data were collected from 176 research articles, and the current version of HSPMdb holds 10 223 entries of compounds that are known to modulate activities of five major Hsps (Hsp100, Hsp90, Hsp70, Hsp60 and Hsp40) originating from 15 different organisms (including human, yeast, bacteria, virus, mouse, rat, bovine, porcine, canine, chicken, Trypanosoma brucei and Plasmodium falciparum). HSPMdb provides comprehensive information on biological activities as well as the chemical properties of Hsp modulators. The biological activities of modulators are presented as enzymatic activity and cellular activity. Under the enzymatic activity field, parameters such as IC50, EC50, DC50, Ki and KD have been provided. In the cellular activity field, complete information on cellular activities (percentage cell growth inhibition, EC50 and GI50), type of cell viability assays and cell line used has been provided. One of the important features of HSPMdb is that it allows users to screen whether or not their compound of interest has any similarity with the previously known Hsp modulators. We anticipate that HSPMdb will become a valuable resource for the broader scientific community working in the area of chaperone biology and protein misfolding diseases. HSPMdb is freely accessible at http://bioinfo.imtech.res.in/bvs/hspmdb/index.php
      PubDate: Fri, 21 Feb 2020 00:00:00 GMT
      DOI: 10.1093/database/baaa003
      Issue No: Vol. 2020 (2020)
       
  • GREG—studying transcriptional regulation using integrative graph
           databases

    • Authors: Mei S; Huang X, Xie C, et al.
      Abstract: A gene regulatory process is the result of the concerted action of transcription factors, co-factors, regulatory non-coding RNAs (ncRNAs) and chromatin interactions. Therefore, the combination of protein–DNA, protein–protein, ncRNA–DNA, ncRNA–protein and DNA–DNA data in a single graph database offers new possibilities regarding generation of biological hypotheses. GREG (The Gene Regulation Graph Database) is an integrative database and web resource that allows the user to visualize and explore the network of all above-mentioned interactions for a query transcription factor, long non-coding RNA, genomic range or DNA annotation, as well as extracting node and interaction information, identifying connected nodes and performing advanced graphical queries directly on the regulatory network, in a simple and efficient way. In this article, we introduce GREG together with some application examples (including exploratory research of Nanog’s regulatory landscape and the etiology of chronic obstructive pulmonary disease), which we use as a demonstration of the advantages of using graph databases in biomedical research. Database URL: https://mora-lab.github.io/projects/greg.html, www.moralab.science/GREG/
      PubDate: Thu, 13 Feb 2020 00:00:00 GMT
      DOI: 10.1093/database/baz162
      Issue No: Vol. 2020 (2020)
       
  • An update on the Symbiotic Genomes Database (SymGenDB): a collection of
           metadata, genomic, genetic and protein sequences, orthologs and metabolic
           networks of symbiotic organisms

    • Authors: Reyes-Prieto M; Vargas-Chávez C, Llabrés M, et al.
      Abstract: The Symbiotic Genomes Database (SymGenDB; http://symbiogenomesdb.uv.es/) is a public resource of manually curated associations between organisms involved in symbiotic relationships, maintaining a catalog of completely sequenced/finished bacterial genomes exclusively. It originally consisted of three modules where users could search for the bacteria involved in a specific symbiotic relationship, their genomes and their genes (including their orthologs). In this update, we present an additional module that includes a representation of the metabolic network of each organism included in the database, as Directed Acyclic Graphs (MetaDAGs). This module provides unique opportunities to explore the metabolism of each individual organism and/or to evaluate the shared and joint metabolic capabilities of the organisms of the same genera included in our listing, allowing users to construct predictive analyses of metabolic associations and complementation between systems. We also report a ~25% increase in manually curated content in the database, i.e. bacterial genomes and their associations, with a final count of 2328 bacterial genomes associated with 498 hosts. We describe new querying possibilities for all the modules, as well as new display features for the MetaDAGs module, providing a relevant range of content and utility. This update continues to improve SymGenDB and can help elucidate the mechanisms by which organisms depend on each other.
      PubDate: Thu, 13 Feb 2020 00:00:00 GMT
      DOI: 10.1093/database/baz160
      Issue No: Vol. 2020 (2020)
       
  • Ewé: a web-based ethnobotanical database for storing and analysing
           data

    • Authors: do Nascimento Fernandes de Souza E; Hawkins J.
      Abstract: Ethnobotanical databases serve as repositories of traditional knowledge (TK), either at international or local scales. By documenting plant species with traditional use, and most importantly, the applications and modes of use of such species, ethnobotanical databases play a role in the conservation of TK and also provide access to information that could improve hypothesis generation and testing in ethnobotanical studies. Brazil has a rich medicinal flora and a rich cultural landscape. Nevertheless, cultural change and ecological degradation can lead to loss of TK. Here, we present an online database developed with open-source tools with a capacity to include all medicinal flora of Brazil. We present test data for the Leguminosae comprising a total of 2078 records, referred to here as use reports, including data compiled from literature and herbarium sources. Unlike existing databases, Ewé provides tools for the visualization of large datasets, facilitating hypothesis generation and meta-analyses. The Ewé database is currently available at www.ewedb.com.
      PubDate: Wed, 12 Feb 2020 00:00:00 GMT
      DOI: 10.1093/database/baz144
      Issue No: Vol. 2020 (2020)
       
  • RBPTD: a database of cancer-related RNA-binding proteins in humans

    • Authors: Li K; Guo Z, Zhai X, et al.
      Abstract: RNA-binding proteins (RBPs) play important roles in regulating the expression of genes involved in human physiological and pathological processes, especially in cancers. Many RBPs have been found to be dysregulated in cancers; however, there was no tool to incorporate high-throughput data from different dimensions to systematically identify cancer-related RBPs and to explore their causes of abnormality and their potential functions. Therefore, we developed a database named RBPTD to identify cancer-related RBPs in humans and systematically explore their functions and abnormalities by integrating different types of data, including gene expression profiles, prognosis data and DNA copy number variation (CNV), among 28 cancers. We found a total of 454 significantly differentially expressed RBPs, 1970 RBPs with significant prognostic value, and 53 dysregulated RBPs correlated with CNV abnormality. Functions of 26 cancer-related RBPs were explored by analysing high-throughput RNA sequencing data obtained by crosslinking immunoprecipitation, and the remaining RBP functions were predicted by calculating their correlation coefficient with other genes. Finally, we developed the RBPTD for users to explore functions and abnormalities of cancer-related RBPs to improve our understanding of their roles in tumorigenesis. Database URL: http://www.rbptd.com
      PubDate: Tue, 11 Feb 2020 00:00:00 GMT
      DOI: 10.1093/database/baz156
      Issue No: Vol. 2020 (2020)
       
  • PLANiTS: a curated sequence reference dataset for plant ITS DNA
           metabarcoding

    • Authors: Banchi E; Ametrano C, Greco S, et al.
      Abstract: DNA metabarcoding combines DNA barcoding with high-throughput sequencing to identify different taxa within environmental communities. The ITS has already been proposed and widely used as a universal barcode marker for plants, but a comprehensive, updated and accurate reference dataset of plant ITS sequences has not been available so far. Here, we constructed reference datasets of Viridiplantae ITS1, ITS2 and entire ITS sequences including both Chlorophyta and Streptophyta. The sequences were retrieved from NCBI, and the ITS region was extracted. The sequences underwent an identity check to remove misidentified records and were clustered at 99% identity to reduce redundancy and computational effort. For this step, we developed a script called ‘better clustering for QIIME’ (bc4q) to ensure that the representative sequences are chosen according to the composition of the cluster at a different taxonomic level. The three datasets obtained with the bc4q script are PLANiTS1 (100 224 sequences), PLANiTS2 (96 771 sequences) and PLANiTS (97 550 sequences), and all are pre-formatted for QIIME, as this is the most widely used bioinformatic pipeline for metabarcoding analysis. As curated and updated reference databases, PLANiTS1, PLANiTS2 and PLANiTS are proposed as a reliable, pivotal first step for a general standardization of plant DNA metabarcoding studies. The bc4q script is presented as a new tool useful in any research dealing with sequence clustering. Database URL: https://github.com/apallavicini/bc4q; https://github.com/apallavicini/PLANiTS.
      PubDate: Tue, 04 Feb 2020 00:00:00 GMT
      DOI: 10.1093/database/baz155
      Issue No: Vol. 2020 (2020)
       
  • Variation benchmark datasets: update, criteria, quality and applications

    • Authors: Sarkar A; Yang Y, Vihinen M.
      Abstract: Development of new computational methods and testing of their performance have to be carried out using experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcome are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing altogether 329 014 152 variants; however, there is plenty of redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding regions, and structure-mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein property for aggregation, binding free energy, disorder and stability. Then there are several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level including relevant cross references and variant descriptions. The updated VariBench facilitates development and testing of new methods and comparison of obtained performances to previously published methods. We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies and show that such comparisons are feasible and useful; however, there may be limitations due to lack of provided details and shared data. Database URL: http://structure.bmc.lu.se/VariBench
      PubDate: Tue, 04 Feb 2020 00:00:00 GMT
      DOI: 10.1093/database/baz117
      Issue No: Vol. 2020 (2020)
       
  • Human Breathomics Database

    • Authors: Kuo T; Tan C, Wang S, et al.
      Abstract: Breathomics is a special branch of metabolomics that quantifies volatile organic compounds (VOCs) from collected exhaled breath samples. Understanding how breath molecules are related to diseases, mechanisms and pathways identified from experimental analytical measurements is challenging due to the lack of an organized resource describing breath molecules, related references and biomedical information embedded in the literature. To provide breath VOCs, related references and biomedical information, we aim to organize a database composed of manually curated information and automatically extracted biomedical information. First, VOC-related disease information was manually organized from 207 publications linked to 99 VOCs and known Medical Subject Headings (MeSH) terms. Then an automated text mining algorithm was used to extract biomedical information from this literature. In the end, the manually curated information and auto-extracted biomedical information were combined to form a breath molecule database, the Human Breathomics Database (HBDB). We first manually curated and organized disease information including MeSH terms from 207 publications associated with 99 VOCs. Then, an automatic text mining pipeline was used to collect 2766 publications and extract biomedical information from breath research. We combined curated information with automatically extracted biomedical information to assemble a breath molecule database, the HBDB. The HBDB is a database that includes references, VOCs and diseases associated with human breathomics. Most of these VOCs were detected in human breath samples or exhaled breath condensate samples. So far, the database contains a total of 913 VOCs in relation to human exhaled breath research reported in 2766 publications. The HBDB is the most comprehensive database of VOCs in human exhaled breath to date. It is a useful and organized resource for researchers and clinicians to identify and further investigate potential biomarkers from the breath of patients. Database URL: https://hbdb.cmdm.tw
      PubDate: Fri, 24 Jan 2020 00:00:00 GMT
      DOI: 10.1093/database/baz139
      Issue No: Vol. 2020 (2020)
       
  • Tripal EUtils: a Tripal module to increase exchange and reuse of genome
           assembly metadata

    • Authors: Condon B; Almsaeed A, Buehler S, et al.
      Abstract: Data and metadata interoperability between data storage systems is a critical component of the FAIR data principles. Programmatic and consistent means of reconciling metadata models between databases promote data exchange and thus increase access to data for the scientific community. This process requires (i) metadata mapping between the models and (ii) software to perform the mapping. Here, we describe our efforts to map metadata associated with genome assemblies between the National Center for Biotechnology Information (NCBI) data resources and the Chado biological database schema. We present mappings for multiple NCBI data structures and introduce a Tripal software module, Tripal EUtils, to pull metadata from NCBI into a Tripal/Chado database. We discuss potential mapping challenges and solutions and provide suggestions for future development to further increase interoperability between these platforms. Database URL: https://github.com/NAL-i5K/tripal_eutils
      PubDate: Tue, 21 Jan 2020 00:00:00 GMT
      DOI: 10.1093/database/baz143
      Issue No: Vol. 2020 (2020)
       
  • WormQTL2: an interactive platform for systems genetics in Caenorhabditis
           elegans

    • Authors: Snoek B; Sterken M, Hartanto M, et al.
      Abstract: Quantitative genetics provides the tools for linking polymorphic loci to trait variation. Linkage analysis of gene expression is an established and widely applied method, leading to the identification of expression quantitative trait loci (eQTLs). (e)QTL detection facilitates the identification and understanding of the underlying molecular components and pathways, yet (e)QTL data access and mining often is a bottleneck. Here, we present WormQTL2, a database and platform for comparative investigations and meta-analyses of published (e)QTL data sets in the model nematode worm C. elegans. WormQTL2 integrates six eQTL studies spanning 11 conditions as well as over 1000 traits from 32 studies and allows experimental results to be compared, reused and extended upon to guide further experiments and conduct systems-genetic analyses. For example, one can easily screen a locus for specific cis-eQTLs that could be linked to variation in other traits, detect gene-by-environment interactions by comparing eQTLs under different conditions, or find correlations between QTL profiles of classical traits and gene expression. WormQTL2 makes data on natural variation in C. elegans and the identified QTLs interactively accessible, allowing studies beyond the original publications. Database URL: www.bioinformatics.nl/WormQTL2/
      PubDate: Tue, 21 Jan 2020 00:00:00 GMT
      DOI: 10.1093/database/baz149
      Issue No: Vol. 2020 (2020)
       
  • Building a pipeline to solicit expert knowledge from the community to aid
           gene summary curation

    • Authors: Antonazzo G; Urbano J, Marygold S, et al.
      Abstract: Brief summaries describing the function of each gene’s product(s) are of great value to the research community, especially when interpreting genome-wide studies that reveal changes to hundreds of genes. However, manually writing such summaries, even for a single species, is a daunting task; for example, the Drosophila melanogaster genome contains almost 14 000 protein-coding genes. One solution is to use computational methods to generate summaries, but this often fails to capture the key functions or express them eloquently. Here, we describe how we solicited help from the research community to generate manually written summaries of D. melanogaster gene function. Based on the data within the FlyBase database, we developed a computational pipeline to identify researchers who have worked extensively on each gene. We e-mailed these researchers to ask them to draft a brief summary of the main function(s) of the gene’s product, which we edited for consistency to produce a ‘gene snapshot’. This approach yielded 1800 gene snapshot submissions within a 3-month period. We discuss the general utility of this strategy for other databases that capture data from the research literature. Database URL: https://flybase.org/
      PubDate: Tue, 21 Jan 2020 00:00:00 GMT
      DOI: 10.1093/database/baz152
      Issue No: Vol. 2020 (2020)
       
  • PCaLiStDB: a lifestyle database for precision prevention of prostate
           cancer

    • Authors: Chen Y; Liu X, Yu Y, et al.
      Abstract: The interaction between genes, lifestyles and environmental factors makes the genesis and progress of prostate cancer (PCa) very heterogeneous. Positive lifestyle is important to the prevention and controlling of PCa. To investigate the relationship between PCa and lifestyle at systems level, we established a PCa related lifestyle database (PCaLiStDB) and collected the PCa-related lifestyles including foods, nutrients, life habits and social and environmental factors as well as associated genes and physiological and biochemical indexes together with the disease phenotypes and drugs. Data format standardization was implemented for the future Lifestyle-Wide Association Studies of PCa (PCa_LWAS). Currently, 2290 single-factor lifestyles and 856 joint effects of two or more lifestyles were collected. Among these, 394 are protective factors, 556 are risk factors, 45 are no-influencing factors, 52 are factors with contradictory views and 1977 factors are lacking effective literatures support. PCaLiStDB is expected to facilitate the prevention and control of PCa, as well as the promotion of mechanistic study of lifestyles on PCa. Database URL: http://www.sysbio.org.cn/pcalistdb/
      PubDate: Thu, 16 Jan 2020 00:00:00 GMT
      DOI: 10.1093/database/baz154
      Issue No: Vol. 2020 (2020)
       
  • RNA CoSSMos 2.0: an improved searchable database of secondary structure
           motifs in RNA three-dimensional structures

    • Authors: Richardson K; Kirkpatrick C, Znosko B.
      Abstract: The RNA Characterization of Secondary Structure Motifs, RNA CoSSMos, database is a freely accessible online database that allows users to identify secondary structure motifs among RNA 3D structures and explore their structural features. RNA CoSSMos 2.0 now requires two closing base pairs for all RNA loop motifs to create a less redundant database of secondary structures. Furthermore, RNA CoSSMos 2.0 represents an upgraded database with new features that summarize search findings and aid in the search for 3D structural patterns among RNA secondary structure motifs. Previously, users were limited to viewing search results individually, with no built-in tools to compare search results. RNA CoSSMos 2.0 provides two new features, allowing users to summarize, analyze and compare their search result findings. A function has been added to the website that calculates the average and representative structures of the search results. Additionally, users can now view a summary page of their search results that reports percentages of each structural feature found, including sugar pucker, glycosidic linkage, hydrogen bonding patterns and stacking interactions. Other upgrades include a newly embedded NGL structural viewer, the option to download the clipped structure coordinates in *.pdb format and improved NMR structure results. RNA CoSSMos 2.0 is no longer simply a search engine for a structure database; it now has the capability of analyzing, comparing and summarizing search results. Database URL: http://rnacossmos.com
      PubDate: Thu, 16 Jan 2020 00:00:00 GMT
      DOI: 10.1093/database/baz153
      Issue No: Vol. 2020 (2020)
       
  • miR-TV: an interactive microRNA Target Viewer for microRNA and target gene
           expression interrogation for human cancer studies

    • Authors: Pan C; Lin W.
      Abstract: MicroRNAs (miRNAs) have been identified in many organisms, and they are essential for gene expression regulation in many critical cellular processes. The expression levels of these genes and miRNAs are closely associated with the progression of diseases such as cancers. Furthermore, survival analysis is a significant indicator for evaluating the criticality of these cellular processes in cancer progression. We established a web tool, miRNA Target Viewer (miR-TV), which integrates 5p-arm and 3p-arm miRNA expression profiles, mRNA target gene expression levels in healthy and cancer populations, and clinical data of cancer patients and their survival information. The developed miR-TV obtained miRNA-seq, mRNA-seq and clinical data from the Cancer Genome Atlas and potential miRNA target gene predictions from miRDB, targetScan and miRanda. The data presentation was implemented using the D3 javascript toolkit. The D3 toolkit is frequently used to provide an easy-to-use interactive interface. Our miR-TV provides a user-friendly and interactive interface, which can be beneficial for biomedical researchers to freely interrogate miRNA expression information and their potential target genes. We believe that such a data visualization bioinformatics tool is excellent for obtaining information from massive biological data. Database URL: http://mirtv.ibms.sinica.edu.tw
      PubDate: Thu, 16 Jan 2020 00:00:00 GMT
      DOI: 10.1093/database/baz148
      Issue No: Vol. 2020 (2020)
       
  • Phenotype–genotype network construction and characterization: a case
           study of cardiovascular diseases and associated non-coding RNAs

    • Authors: Wu R; Lin Y, Liu X, et al.
      Abstract: The phenotype–genotype relationship is a key for personalized and precision medicine for complex diseases. To unravel the complexity of the clinical phenotype–genotype network, we used cardiovascular diseases (CVDs) and associated non-coding RNAs (ncRNAs) (i.e. miRNAs, long ncRNAs, etc.) as the case for the study of CVDs at a systems or network level. We first integrated a database of CVDs and ncRNAs (CVDncR, http://sysbio.org.cn/cvdncr/) to construct CVD–ncRNA networks and annotate their clinical associations. To characterize the networks, we then separated the miRNAs into two groups, i.e. universal miRNAs associated with at least two types of CVDs and specific miRNAs related only to one type of CVD. Our analyses indicated two interesting patterns in these CVD–ncRNA networks. First, scale-free features were present within both CVD–miRNA and CVD–lncRNA networks; second, universal miRNAs were more likely to be CVDs biomarkers. These results were confirmed by computational functional analyses. The findings offer theoretical guidance for decoding CVD–ncRNA associations and will facilitate the screening of CVD ncRNA biomarkers. Database URL: http://sysbio.org.cn/cvdncr/
      PubDate: Wed, 15 Jan 2020 00:00:00 GMT
      DOI: 10.1093/database/baz147
      Issue No: Vol. 2020 (2020)
       
  • Annotation and curation of the causality information in LncRNADisease

    • Authors: Jia K; Gao Y, Shi J, et al.
      Abstract: Disease causative non-coding RNAs (ncRNAs) are of great importance in understanding a disease, for they directly contribute to the development or progress of a disease. Identifying the causative ncRNAs can provide vital implications for biomedical researches. In this work, we updated the long non-coding RNA disease database (LncRNADisease) with long non-coding RNA (lncRNA) causality information with manual annotations of the causal associations between lncRNAs/circular RNAs (circRNAs) and diseases by reviewing related publications. Of the total 11 568 experimental associations, 2297 out of 10 564 lncRNA-disease associations and 198 out of 1004 circRNA-disease associations were identified to be causal, whereas 635 lncRNAs and 126 circRNAs were identified to be causative for the development or progress of at least one disease. The updated information and functions of the database can offer great help to future researches involving lncRNA/circRNA-disease relationship. The latest LncRNADisease database is available at http://www.rnanut.net/lncrnadisease.
      PubDate: Wed, 15 Jan 2020 00:00:00 GMT
      DOI: 10.1093/database/baz150
      Issue No: Vol. 2020 (2020)
       
  • ChIPSummitDB: a ChIP-seq-based database of human transcription factor
           binding sites and the topological arrangements of the proteins bound to
           them

    • Authors: Czipa E; Schiller M, Nagy T, et al.
      Abstract: ChIP-seq reveals genomic regions where proteins, e.g. transcription factors (TFs) interact with DNA. A substantial fraction of these regions, however, do not contain the cognate binding site for the TF of interest. This phenomenon might be explained by protein–protein interactions and co-precipitation of interacting gene regulatory elements. We uniformly processed 3727 human ChIP-seq data sets and determined the cistrome of 292 TFs, as well as the distances between the TF binding motif centers and the ChIP-seq peak summits. ChIPSummitDB enables the analysis of ChIP-seq data using multiple approaches. The 292 cistromes and corresponding ChIP-seq peak sets can be browsed in GenomeView. Overlapping SNPs can be inspected in dbSNPView. Most importantly, the MotifView and PairShiftView pages show the average distance between motif centers and overlapping ChIP-seq peak summits and distance distributions thereof, respectively. In addition to providing a comprehensive human TF binding site collection, the ChIPSummitDB database and web interface allows for the examination of the topological arrangement of TF complexes genome-wide. ChIPSummitDB is freely accessible at http://summit.med.unideb.hu/summitdb/. The database will be regularly updated and extended with the newly available human and mouse ChIP-seq data sets.
      PubDate: Tue, 14 Jan 2020 00:00:00 GMT
      DOI: 10.1093/database/baz141
      Issue No: Vol. 2020 (2020)
       
  • Bio-AnswerFinder: a system to find answers to questions from biomedical
           texts

    • Authors: Ozyurt I; Bandrowski A, Grethe J.
      Abstract: The ever accelerating pace of biomedical research results in corresponding acceleration in the volume of biomedical literature created. Since new research builds upon existing knowledge, the rate of increase in the available knowledge encoded in biomedical literature makes the easy access to that implicit knowledge more vital over time. Toward the goal of making implicit knowledge in the biomedical literature easily accessible to biomedical researchers, we introduce a question answering system called Bio-AnswerFinder. Bio-AnswerFinder uses a weighted-relaxed word mover's distance based similarity on word/phrase embeddings learned from PubMed abstracts to rank answers after question focus entity type filtering. Our approach retrieves relevant documents iteratively via enhanced keyword queries from a traditional search engine. To improve document retrieval performance, we introduced a supervised long short term memory neural network to select keywords from the question to facilitate iterative keyword search. Our unsupervised baseline system achieves a mean reciprocal rank score of 0.46 and Precision@1 of 0.32 on 936 questions from BioASQ. The answer sentences are further ranked by a fine-tuned bidirectional encoder representation from transformers (BERT) classifier trained using 100 answer candidate sentences per question for 492 BioASQ questions. To test ranking performance, we report a blind test on 100 questions that three independent annotators scored. These experts preferred BERT based reranking with 7% improvement on MRR and 13% improvement on Precision@1 scores on average.
      PubDate: Fri, 10 Jan 2020 00:00:00 GMT
      DOI: 10.1093/database/baz137
      Issue No: Vol. 2020 (2020)
       