
Publisher: Oxford University Press   (Total: 396 journals)



Showing 1 - 200 of 396 Journals sorted alphabetically
ACS Symposium Series     Full-text available via subscription   (SJR: 0.189, CiteScore: 0)
Acta Biochimica et Biophysica Sinica     Hybrid Journal   (Followers: 5, SJR: 0.79, CiteScore: 2)
Adaptation     Hybrid Journal   (Followers: 8, SJR: 0.143, CiteScore: 0)
Advances in Nutrition     Hybrid Journal   (Followers: 44, SJR: 2.196, CiteScore: 5)
Aesthetic Surgery J.     Hybrid Journal   (Followers: 6, SJR: 1.434, CiteScore: 1)
African Affairs     Hybrid Journal   (Followers: 63, SJR: 1.869, CiteScore: 2)
Age and Ageing     Hybrid Journal   (Followers: 91, SJR: 1.989, CiteScore: 4)
Alcohol and Alcoholism     Hybrid Journal   (Followers: 18, SJR: 1.376, CiteScore: 3)
American Entomologist     Full-text available via subscription   (Followers: 7)
American Historical Review     Hybrid Journal   (Followers: 148, SJR: 0.467, CiteScore: 1)
American J. of Agricultural Economics     Hybrid Journal   (Followers: 40, SJR: 2.113, CiteScore: 3)
American J. of Clinical Nutrition     Hybrid Journal   (Followers: 145, SJR: 3.438, CiteScore: 6)
American J. of Epidemiology     Hybrid Journal   (Followers: 171, SJR: 2.713, CiteScore: 3)
American J. of Hypertension     Hybrid Journal   (Followers: 25, SJR: 1.322, CiteScore: 3)
American J. of Jurisprudence     Hybrid Journal   (Followers: 18, SJR: 0.281, CiteScore: 1)
American J. of Legal History     Full-text available via subscription   (Followers: 8, SJR: 0.116, CiteScore: 0)
American Law and Economics Review     Hybrid Journal   (Followers: 27, SJR: 1.053, CiteScore: 1)
American Literary History     Hybrid Journal   (Followers: 15, SJR: 0.391, CiteScore: 0)
Analysis     Hybrid Journal   (Followers: 21, SJR: 1.038, CiteScore: 1)
Animal Frontiers     Hybrid Journal  
Annals of Behavioral Medicine     Hybrid Journal   (Followers: 14, SJR: 1.423, CiteScore: 3)
Annals of Botany     Hybrid Journal   (Followers: 35, SJR: 1.721, CiteScore: 4)
Annals of Oncology     Hybrid Journal   (Followers: 42, SJR: 5.599, CiteScore: 9)
Annals of the Entomological Society of America     Full-text available via subscription   (Followers: 10, SJR: 0.722, CiteScore: 1)
Annals of Work Exposures and Health     Hybrid Journal   (Followers: 33, SJR: 0.728, CiteScore: 2)
AoB Plants     Open Access   (Followers: 4, SJR: 1.28, CiteScore: 3)
Applied Economic Perspectives and Policy     Hybrid Journal   (Followers: 17, SJR: 0.858, CiteScore: 2)
Applied Linguistics     Hybrid Journal   (Followers: 56, SJR: 2.987, CiteScore: 3)
Applied Mathematics Research eXpress     Hybrid Journal   (Followers: 1, SJR: 1.241, CiteScore: 1)
Arbitration Intl.     Full-text available via subscription   (Followers: 20)
Arbitration Law Reports and Review     Hybrid Journal   (Followers: 14)
Archives of Clinical Neuropsychology     Hybrid Journal   (Followers: 30, SJR: 0.731, CiteScore: 2)
Aristotelian Society Supplementary Volume     Hybrid Journal   (Followers: 3)
Arthropod Management Tests     Hybrid Journal   (Followers: 2)
Astronomy & Geophysics     Hybrid Journal   (Followers: 42, SJR: 0.146, CiteScore: 0)
Behavioral Ecology     Hybrid Journal   (Followers: 52, SJR: 1.871, CiteScore: 3)
Bioinformatics     Hybrid Journal   (Followers: 299, SJR: 6.14, CiteScore: 8)
Biology Methods and Protocols     Hybrid Journal  
Biology of Reproduction     Full-text available via subscription   (Followers: 10, SJR: 1.446, CiteScore: 3)
Biometrika     Hybrid Journal   (Followers: 20, SJR: 3.485, CiteScore: 2)
BioScience     Hybrid Journal   (Followers: 29, SJR: 2.754, CiteScore: 4)
Bioscience Horizons : The National Undergraduate Research J.     Open Access   (Followers: 1, SJR: 0.146, CiteScore: 0)
Biostatistics     Hybrid Journal   (Followers: 17, SJR: 1.553, CiteScore: 2)
BJA : British J. of Anaesthesia     Hybrid Journal   (Followers: 163, SJR: 2.115, CiteScore: 3)
BJA Education     Hybrid Journal   (Followers: 64)
Brain     Hybrid Journal   (Followers: 68, SJR: 5.858, CiteScore: 7)
Briefings in Bioinformatics     Hybrid Journal   (Followers: 47, SJR: 2.505, CiteScore: 5)
Briefings in Functional Genomics     Hybrid Journal   (Followers: 3, SJR: 2.15, CiteScore: 3)
British J. for the Philosophy of Science     Hybrid Journal   (Followers: 35, SJR: 2.161, CiteScore: 2)
British J. of Aesthetics     Hybrid Journal   (Followers: 26, SJR: 0.508, CiteScore: 1)
British J. of Criminology     Hybrid Journal   (Followers: 575, SJR: 1.828, CiteScore: 3)
British J. of Social Work     Hybrid Journal   (Followers: 87, SJR: 1.019, CiteScore: 2)
British Medical Bulletin     Hybrid Journal   (Followers: 7, SJR: 1.355, CiteScore: 3)
British Yearbook of Intl. Law     Hybrid Journal   (Followers: 32)
Bulletin of the London Mathematical Society     Hybrid Journal   (Followers: 4, SJR: 1.376, CiteScore: 1)
Cambridge J. of Economics     Hybrid Journal   (Followers: 61, SJR: 0.764, CiteScore: 2)
Cambridge J. of Regions, Economy and Society     Hybrid Journal   (Followers: 11, SJR: 2.438, CiteScore: 4)
Cambridge Quarterly     Hybrid Journal   (Followers: 9, SJR: 0.104, CiteScore: 0)
Capital Markets Law J.     Hybrid Journal   (Followers: 2, SJR: 0.222, CiteScore: 0)
Carcinogenesis     Hybrid Journal   (Followers: 2, SJR: 2.135, CiteScore: 5)
Cardiovascular Research     Hybrid Journal   (Followers: 14, SJR: 3.002, CiteScore: 5)
Cerebral Cortex     Hybrid Journal   (Followers: 45, SJR: 3.892, CiteScore: 6)
CESifo Economic Studies     Hybrid Journal   (Followers: 17, SJR: 0.483, CiteScore: 1)
Chemical Senses     Hybrid Journal   (Followers: 1, SJR: 1.42, CiteScore: 3)
Children and Schools     Hybrid Journal   (Followers: 5, SJR: 0.246, CiteScore: 0)
Chinese J. of Comparative Law     Hybrid Journal   (Followers: 4, SJR: 0.412, CiteScore: 0)
Chinese J. of Intl. Law     Hybrid Journal   (Followers: 23, SJR: 0.329, CiteScore: 0)
Chinese J. of Intl. Politics     Hybrid Journal   (Followers: 9, SJR: 1.392, CiteScore: 2)
Christian Bioethics: Non-Ecumenical Studies in Medical Morality     Hybrid Journal   (Followers: 10, SJR: 0.183, CiteScore: 0)
Classical Receptions J.     Hybrid Journal   (Followers: 25, SJR: 0.123, CiteScore: 0)
Clean Energy     Open Access   (Followers: 1)
Clinical Infectious Diseases     Hybrid Journal   (Followers: 65, SJR: 5.051, CiteScore: 5)
Clinical Kidney J.     Open Access   (Followers: 3, SJR: 1.163, CiteScore: 2)
Communication Theory     Hybrid Journal   (Followers: 21, SJR: 2.424, CiteScore: 3)
Communication, Culture & Critique     Hybrid Journal   (Followers: 26, SJR: 0.222, CiteScore: 1)
Community Development J.     Hybrid Journal   (Followers: 27, SJR: 0.268, CiteScore: 1)
Computer J.     Hybrid Journal   (Followers: 9, SJR: 0.319, CiteScore: 1)
Conservation Physiology     Open Access   (Followers: 2, SJR: 1.818, CiteScore: 3)
Contemporary Women's Writing     Hybrid Journal   (Followers: 9, SJR: 0.121, CiteScore: 0)
Contributions to Political Economy     Hybrid Journal   (Followers: 5, SJR: 0.906, CiteScore: 1)
Critical Values     Full-text available via subscription  
Current Developments in Nutrition     Open Access  
Current Legal Problems     Hybrid Journal   (Followers: 27)
Current Zoology     Full-text available via subscription   (Followers: 2, SJR: 1.164, CiteScore: 2)
Database : The J. of Biological Databases and Curation     Open Access   (Followers: 8, SJR: 1.791, CiteScore: 3)
Digital Scholarship in the Humanities     Hybrid Journal   (Followers: 13, SJR: 0.259, CiteScore: 1)
Diplomatic History     Hybrid Journal   (Followers: 21, SJR: 0.45, CiteScore: 1)
DNA Research     Open Access   (Followers: 5, SJR: 2.866, CiteScore: 6)
Dynamics and Statistics of the Climate System     Open Access   (Followers: 3)
Early Music     Hybrid Journal   (Followers: 15, SJR: 0.139, CiteScore: 0)
Economic Policy     Hybrid Journal   (Followers: 39, SJR: 3.584, CiteScore: 3)
ELT J.     Hybrid Journal   (Followers: 24, SJR: 0.942, CiteScore: 1)
English Historical Review     Hybrid Journal   (Followers: 51, SJR: 0.612, CiteScore: 1)
English: J. of the English Association     Hybrid Journal   (Followers: 14, SJR: 0.1, CiteScore: 0)
Environmental Entomology     Full-text available via subscription   (Followers: 11, SJR: 0.818, CiteScore: 2)
Environmental Epigenetics     Open Access   (Followers: 3)
Environmental History     Hybrid Journal   (Followers: 27, SJR: 0.408, CiteScore: 1)
EP-Europace     Hybrid Journal   (Followers: 2, SJR: 2.748, CiteScore: 4)
Epidemiologic Reviews     Hybrid Journal   (Followers: 9, SJR: 4.505, CiteScore: 8)
ESHRE Monographs     Hybrid Journal  
Essays in Criticism     Hybrid Journal   (Followers: 16, SJR: 0.113, CiteScore: 0)
European Heart J.     Hybrid Journal   (Followers: 57, SJR: 9.315, CiteScore: 9)
European Heart J. - Cardiovascular Imaging     Hybrid Journal   (Followers: 9, SJR: 3.625, CiteScore: 3)
European Heart J. - Cardiovascular Pharmacotherapy     Full-text available via subscription   (Followers: 1)
European Heart J. - Quality of Care and Clinical Outcomes     Hybrid Journal  
European Heart J. : Case Reports     Open Access  
European Heart J. Supplements     Hybrid Journal   (Followers: 8, SJR: 0.223, CiteScore: 0)
European J. of Cardio-Thoracic Surgery     Hybrid Journal   (Followers: 9, SJR: 1.681, CiteScore: 2)
European J. of Intl. Law     Hybrid Journal   (Followers: 179, SJR: 0.694, CiteScore: 1)
European J. of Orthodontics     Hybrid Journal   (Followers: 4, SJR: 1.279, CiteScore: 2)
European J. of Public Health     Hybrid Journal   (Followers: 20, SJR: 1.36, CiteScore: 2)
European Review of Agricultural Economics     Hybrid Journal   (Followers: 10, SJR: 1.172, CiteScore: 2)
European Review of Economic History     Hybrid Journal   (Followers: 29, SJR: 0.702, CiteScore: 1)
European Sociological Review     Hybrid Journal   (Followers: 40, SJR: 2.728, CiteScore: 3)
Evolution, Medicine, and Public Health     Open Access   (Followers: 11)
Family Practice     Hybrid Journal   (Followers: 15, SJR: 1.018, CiteScore: 2)
FEMS Microbiology Ecology     Hybrid Journal   (Followers: 12, SJR: 1.492, CiteScore: 4)
FEMS Microbiology Letters     Hybrid Journal   (Followers: 24, SJR: 0.79, CiteScore: 2)
FEMS Microbiology Reviews     Hybrid Journal   (Followers: 30, SJR: 7.063, CiteScore: 13)
FEMS Yeast Research     Hybrid Journal   (Followers: 14, SJR: 1.308, CiteScore: 3)
Food Quality and Safety     Open Access   (Followers: 1)
Foreign Policy Analysis     Hybrid Journal   (Followers: 23, SJR: 1.425, CiteScore: 1)
Forest Science     Hybrid Journal   (Followers: 7, SJR: 0.89, CiteScore: 2)
Forestry: An Intl. J. of Forest Research     Hybrid Journal   (Followers: 16, SJR: 1.133, CiteScore: 3)
Forum for Modern Language Studies     Hybrid Journal   (Followers: 6, SJR: 0.104, CiteScore: 0)
French History     Hybrid Journal   (Followers: 32, SJR: 0.118, CiteScore: 0)
French Studies     Hybrid Journal   (Followers: 20, SJR: 0.148, CiteScore: 0)
French Studies Bulletin     Hybrid Journal   (Followers: 10, SJR: 0.152, CiteScore: 0)
Gastroenterology Report     Open Access   (Followers: 2)
Genome Biology and Evolution     Open Access   (Followers: 12, SJR: 2.578, CiteScore: 4)
Geophysical J. Intl.     Hybrid Journal   (Followers: 35, SJR: 1.506, CiteScore: 3)
German History     Hybrid Journal   (Followers: 22, SJR: 0.161, CiteScore: 0)
GigaScience     Open Access   (Followers: 3, SJR: 5.022, CiteScore: 7)
Global Summitry     Hybrid Journal   (Followers: 1)
Glycobiology     Hybrid Journal   (Followers: 14, SJR: 1.493, CiteScore: 3)
Health and Social Work     Hybrid Journal   (Followers: 56, SJR: 0.388, CiteScore: 1)
Health Education Research     Hybrid Journal   (Followers: 15, SJR: 0.854, CiteScore: 2)
Health Policy and Planning     Hybrid Journal   (Followers: 23, SJR: 1.512, CiteScore: 2)
Health Promotion Intl.     Hybrid Journal   (Followers: 22, SJR: 0.812, CiteScore: 2)
History Workshop J.     Hybrid Journal   (Followers: 29, SJR: 1.278, CiteScore: 1)
Holocaust and Genocide Studies     Hybrid Journal   (Followers: 28, SJR: 0.105, CiteScore: 0)
Human Communication Research     Hybrid Journal   (Followers: 13, SJR: 2.146, CiteScore: 3)
Human Molecular Genetics     Hybrid Journal   (Followers: 8, SJR: 3.555, CiteScore: 5)
Human Reproduction     Hybrid Journal   (Followers: 71, SJR: 2.643, CiteScore: 5)
Human Reproduction Open     Open Access  
Human Reproduction Update     Hybrid Journal   (Followers: 19, SJR: 5.317, CiteScore: 10)
Human Rights Law Review     Hybrid Journal   (Followers: 56, SJR: 0.756, CiteScore: 1)
ICES J. of Marine Science: J. du Conseil     Hybrid Journal   (Followers: 51, SJR: 1.591, CiteScore: 3)
ICSID Review     Hybrid Journal   (Followers: 10)
ILAR J.     Hybrid Journal   (Followers: 2, SJR: 1.732, CiteScore: 4)
IMA J. of Applied Mathematics     Hybrid Journal   (SJR: 0.679, CiteScore: 1)
IMA J. of Management Mathematics     Hybrid Journal   (SJR: 0.538, CiteScore: 1)
IMA J. of Mathematical Control and Information     Hybrid Journal   (Followers: 2, SJR: 0.496, CiteScore: 1)
IMA J. of Numerical Analysis - advance access     Hybrid Journal   (SJR: 1.987, CiteScore: 2)
Industrial and Corporate Change     Hybrid Journal   (Followers: 10, SJR: 1.792, CiteScore: 2)
Industrial Law J.     Hybrid Journal   (Followers: 35, SJR: 0.249, CiteScore: 1)
Inflammatory Bowel Diseases     Hybrid Journal   (Followers: 43, SJR: 2.511, CiteScore: 4)
Information and Inference     Free  
Integrative and Comparative Biology     Hybrid Journal   (Followers: 8, SJR: 1.319, CiteScore: 2)
Interacting with Computers     Hybrid Journal   (Followers: 11, SJR: 0.292, CiteScore: 1)
Interactive CardioVascular and Thoracic Surgery     Hybrid Journal   (Followers: 7, SJR: 0.762, CiteScore: 1)
Intl. Affairs     Hybrid Journal   (Followers: 58, SJR: 1.505, CiteScore: 3)
Intl. Data Privacy Law     Hybrid Journal   (Followers: 24)
Intl. Health     Hybrid Journal   (Followers: 5, SJR: 0.851, CiteScore: 2)
Intl. Immunology     Hybrid Journal   (Followers: 3, SJR: 2.167, CiteScore: 4)
Intl. J. for Quality in Health Care     Hybrid Journal   (Followers: 36, SJR: 1.348, CiteScore: 2)
Intl. J. of Constitutional Law     Hybrid Journal   (Followers: 62, SJR: 0.601, CiteScore: 1)
Intl. J. of Epidemiology     Hybrid Journal   (Followers: 218, SJR: 3.969, CiteScore: 5)
Intl. J. of Law and Information Technology     Hybrid Journal   (Followers: 5, SJR: 0.202, CiteScore: 1)
Intl. J. of Law, Policy and the Family     Hybrid Journal   (Followers: 25, SJR: 0.223, CiteScore: 1)
Intl. J. of Lexicography     Hybrid Journal   (Followers: 9, SJR: 0.285, CiteScore: 1)
Intl. J. of Low-Carbon Technologies     Open Access   (Followers: 1, SJR: 0.403, CiteScore: 1)
Intl. J. of Neuropsychopharmacology     Open Access   (Followers: 3, SJR: 1.808, CiteScore: 4)
Intl. J. of Public Opinion Research     Hybrid Journal   (Followers: 9, SJR: 1.545, CiteScore: 1)
Intl. J. of Refugee Law     Hybrid Journal   (Followers: 35, SJR: 0.389, CiteScore: 1)
Intl. J. of Transitional Justice     Hybrid Journal   (Followers: 11, SJR: 0.724, CiteScore: 2)
Intl. Mathematics Research Notices     Hybrid Journal   (Followers: 1, SJR: 2.168, CiteScore: 1)
Intl. Political Sociology     Hybrid Journal   (Followers: 35, SJR: 1.465, CiteScore: 3)
Intl. Relations of the Asia-Pacific     Hybrid Journal   (Followers: 23, SJR: 0.401, CiteScore: 1)
Intl. Studies Perspectives     Hybrid Journal   (Followers: 9, SJR: 0.983, CiteScore: 1)
Intl. Studies Quarterly     Hybrid Journal   (Followers: 44, SJR: 2.581, CiteScore: 2)
Intl. Studies Review     Hybrid Journal   (Followers: 22, SJR: 1.201, CiteScore: 1)
ISLE: Interdisciplinary Studies in Literature and Environment     Hybrid Journal   (Followers: 1, SJR: 0.15, CiteScore: 0)
ITNOW     Hybrid Journal   (Followers: 1, SJR: 0.103, CiteScore: 0)
J. of African Economies     Hybrid Journal   (Followers: 15, SJR: 0.533, CiteScore: 1)
J. of American History     Hybrid Journal   (Followers: 46, SJR: 0.297, CiteScore: 1)
J. of Analytical Toxicology     Hybrid Journal   (Followers: 14, SJR: 1.065, CiteScore: 2)
J. of Antimicrobial Chemotherapy     Hybrid Journal   (Followers: 15, SJR: 2.419, CiteScore: 4)
J. of Antitrust Enforcement     Hybrid Journal   (Followers: 1)
J. of Applied Poultry Research     Hybrid Journal   (Followers: 4, SJR: 0.585, CiteScore: 1)
J. of Biochemistry     Hybrid Journal   (Followers: 42, SJR: 1.226, CiteScore: 2)
J. of Burn Care & Research     Hybrid Journal   (Followers: 9, SJR: 0.768, CiteScore: 2)
J. of Chromatographic Science     Hybrid Journal   (Followers: 18, SJR: 0.36, CiteScore: 1)
J. of Church and State     Hybrid Journal   (Followers: 11, SJR: 0.139, CiteScore: 0)
J. of Communication     Hybrid Journal   (Followers: 51, SJR: 4.411, CiteScore: 5)
J. of Competition Law and Economics     Hybrid Journal   (Followers: 35, SJR: 0.33, CiteScore: 0)
J. of Complex Networks     Hybrid Journal   (Followers: 2, SJR: 1.05, CiteScore: 4)
J. of Computer-Mediated Communication     Open Access   (Followers: 26, SJR: 2.961, CiteScore: 6)
J. of Conflict and Security Law     Hybrid Journal   (Followers: 12, SJR: 0.402, CiteScore: 0)
J. of Consumer Research     Full-text available via subscription   (Followers: 42, SJR: 5.856, CiteScore: 5)


Database : The Journal of Biological Databases and Curation
Journal Prestige (SJR): 1.791
Citation Impact (citeScore): 3
Number of Followers: 8  

  This is an Open Access journal
ISSN (Online) 1758-0463
Published by Oxford University Press  [396 journals]
  • IsopiRBank: a research resource for tracking piRNA isoforms

    • Authors: Zhang H; Ali A, Gao J, et al.
      Abstract: PIWI-interacting RNAs (piRNAs) are essential for transcriptional and post-transcriptional regulation of transposons and coding genes in the germline. With the development of sequencing technologies, length variations of piRNAs have been identified in several species. However, the extent to which piRNA isoforms exist, and whether these isoforms are functionally distinct from canonical piRNAs, remain uncharacterized. Through data mining of 2154 small RNA sequencing datasets from four species (Homo sapiens, Mus musculus, Danio rerio and Drosophila melanogaster), we have identified 8 749 139 piRNA isoforms from 175 454 canonical piRNAs, and classified them on the basis of variations at the 5′ or 3′ end by aligning the isoforms with the canonical sequence. We thus established a database named IsopiRBank. Each isoform has detailed annotation as follows: normalized expression data, classification, spatiotemporal expression data and genome origin. Users can also select isoforms of interest for further analysis, including target prediction and enrichment analysis. Taken together, IsopiRBank is an interactive database that aims to present the first integrated resource of piRNA isoforms and to broaden research on piRNA biology. IsopiRBank can be accessed without any registration or login requirement. Database URL:
      PubDate: Tue, 28 Aug 2018 00:00:00 GMT
      DOI: 10.1093/database/bay059
      Issue No: Vol. 2018 (2018)
  • SDADB: a functional annotation database of protein structural domains

    • Authors: Zeng C; Zhan W, Deng L.
      Abstract: Annotating functional terms with individual domains is essential for understanding the functions of full-length proteins. We describe SDADB, a functional annotation database for structural domains. SDADB provides associations between gene ontology (GO) terms and SCOP domains calculated with an integrated framework. GO annotations are assigned probabilities of being correct, which are estimated with a Bayesian network by taking advantage of structural neighborhood mappings, SCOP-InterPro domain mapping information, position-specific scoring matrices (PSSMs) and sequence homolog features, with the most substantial contribution coming from high-coverage structure-based domain-protein mappings. The domain-protein mappings are computed using large-scale structure alignment. SDADB contains ontological terms with probabilistic scores for more than 214 000 distinct SCOP domains. It also provides additional features, including 3D structure alignment visualization, a GO hierarchical tree view, and search, browse and download options. Database URL:
      PubDate: Tue, 28 Aug 2018 00:00:00 GMT
      DOI: 10.1093/database/bay064
      Issue No: Vol. 2018 (2018)
  • PtRFdb: a database for plant transfer RNA-derived fragments

    • Authors: Gupta N; Singh A, Zahra S, et al.
      Abstract: Transfer RNA-derived fragments (tRFs) represent a novel class of small RNAs (sRNAs) generated through endonucleolytic cleavage of both mature and precursor transfer RNAs (tRNAs). These 14–28 nt tRFs have been extensively studied in the animal kingdom but remain to be explored in plants. In this study, we introduce a database of plant tRFs named PtRFdb for the scientific community. We analyzed a total of 1344 sRNA sequencing datasets of 10 different plant species and identified a total of 5607 unique tRFs (758 tRF-1, 2269 tRF-3 and 2580 tRF-5), represented by 487 765 entries. In PtRFdb, detailed and comprehensive information is available for each tRF entry. Apart from the core information consisting of the tRF type, anticodon, source organism, tissue, sequence and genomic location, additional information such as the PubMed identifier (PMID), sample accession number (GSM), sequence length and frequency relevant to the tRFs may be of high utility to the user. Two different types of search modules (Basic Search and Advanced Search), a sequence similarity search (by BLAST) and a Browse option with a data download facility for each search are provided in this database. We believe that PtRFdb is a unique database of its kind and that it will be beneficial in the validation and further characterization of plant tRFs. Database URL:
      PubDate: Fri, 22 Jun 2018 00:00:00 GMT
      DOI: 10.1093/database/bay063
      Issue No: Vol. 2018 (2018)
  • Chemical–gene relation extraction using recursive neural network

    • Authors: Lim S; Kang J.
      Abstract: In this article, we describe our system for the CHEMPROT task of the BioCreative VI challenge. Although considerable research on the named entity recognition of genes and drugs has been conducted, there is limited research on extracting relationships between them. Extracting relations between chemical compounds and genes from the literature is an important element in pharmacological and clinical research. The CHEMPROT task of BioCreative VI aims to promote the development of text mining systems that can be used to automatically extract relationships between chemical compounds and genes. We tested three recursive neural network approaches to improve the performance of relation extraction. In the BioCreative VI challenge, we developed a tree-Long Short-Term Memory network (tree-LSTM) model with several additional features, including a position feature and a subtree containment feature, and we also applied an ensemble method. After the challenge, we applied additional pre-processing steps to the tree-LSTM model, and we tested the performance of another recursive neural network model called Stack-augmented Parser Interpreter Neural Network (SPINN). Our tree-LSTM model achieved an F-score of 58.53% in the BioCreative VI challenge. Our tree-LSTM model with additional pre-processing and the SPINN model obtained F-scores of 63.7 and 64.1%, respectively. Database URL:
      PubDate: Thu, 21 Jun 2018 00:00:00 GMT
      DOI: 10.1093/database/bay060
      Issue No: Vol. 2018 (2018)
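
The F-scores quoted in the abstract above can be illustrated with a minimal sketch, assuming the metric is the usual F1 (the harmonic mean of precision and recall). The function name `f_score` and the counts used below are hypothetical and are not taken from the paper or from the CHEMPROT evaluation script.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp, fp and fn are counts of true positive, false positive and
    false negative relation predictions (illustrative inputs only).
    """
    if tp == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to show the arithmetic.
print(round(f_score(tp=80, fp=20, fn=20), 3))
```

With equal numbers of false positives and false negatives, precision and recall coincide, so the F-score equals both.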
  • dbLGL: an online leukemia gene and literature database for the
           retrospective comparison of adult and childhood leukemia genetics with
           literature evidence

    • Authors: Liu Y; Luo M, Jin Z, et al.
      Abstract: Leukemia is a group of cancers with increased numbers of immature or abnormal leucocytes that originate in the bone marrow and other blood-forming organs. The development of differentially diagnostic biomarkers for different subtypes largely depends on understanding the biological pathways and regulatory mechanisms associated with leukemia-implicated genes. Unfortunately, the leukemia-implicated genes that have been identified thus far are scattered among thousands of published studies, and no systematic summary of the differences between adult and childhood leukemia exists with regard to the causative genetic mutations and genetic mechanisms of the various subtypes. In this study, we performed a systematic literature review of the susceptibility genes reported in small-scale experiments and built an online gene database containing a total of 1805 leukemia-associated genes. Our comparison of genes from the four primary subtypes and between adult and childhood cases identified a number of potential genes related to patient survival. These curated genes can satisfy a growing demand for further integrating genomics screening for leukemia-associated low-frequency mutated genes. Database URL:
      PubDate: Thu, 21 Jun 2018 00:00:00 GMT
      DOI: 10.1093/database/bay062
      Issue No: Vol. 2018 (2018)
  • LncCeRBase: a database of experimentally validated human competing
           endogenous long non-coding RNAs

    • Authors: Pian C; Zhang G, Tu T, et al.
      Abstract: Long non-coding RNAs (lncRNAs) are endogenous molecules longer than 200 nucleotides that lack coding potential. LncRNAs that interact with microRNAs (miRNAs) are known as competing endogenous RNAs (ceRNAs) and have the ability to regulate the expression of target genes. The ceRNAs play an important role in the initiation and progression of various cancers. However, until now, there has been no database that collects experimentally verified human ceRNAs. We developed the LncCeRBase database, which encompasses 432 lncRNA–miRNA–mRNA interactions, including 130 lncRNAs, 214 miRNAs and 245 genes from 300 publications. In addition, we compiled the signaling pathways associated with the included lncRNA–miRNA–mRNA interactions as a tool to explore their functions. LncCeRBase is useful for understanding the regulatory mechanisms of lncRNA. Database URL:
      PubDate: Thu, 21 Jun 2018 00:00:00 GMT
      DOI: 10.1093/database/bay061
      Issue No: Vol. 2018 (2018)
  • RBPMetaDB: a comprehensive annotation of mouse RNA-Seq datasets with
           perturbations of RNA-binding proteins

    • Authors: Li J; Deng S, Vieira J, et al.
      Abstract: RNA-binding proteins (RBPs) may play a critical role in gene regulation in various diseases or biological processes by controlling post-transcriptional events such as polyadenylation, splicing and mRNA stabilization via binding activities to RNA molecules. Owing to the importance of RBPs in gene regulation, a great number of studies have been conducted, resulting in a large number of RNA-Seq datasets. However, these datasets usually do not have structured organization of metadata, which limits their potentially wide use. To bridge this gap, the metadata of a comprehensive set of publicly available mouse RNA-Seq datasets with perturbed RBPs were collected and integrated into a database called RBPMetaDB. This database contains 292 mouse RNA-Seq datasets for a comprehensive list of 187 RBPs. These RBPs account for only ∼10% of all known RBPs annotated in Gene Ontology, indicating that most are still unexplored using high-throughput sequencing. This negative information provides a great pool of candidate RBPs for biologists to conduct future experimental studies. In addition, we found that DNA-binding activities are significantly enriched among RBPs in RBPMetaDB, suggesting that prior studies of these DNA- and RNA-binding factors focused more on DNA-binding activities than on RNA-binding activities. This result reveals the opportunity to efficiently reuse these data for investigation of the roles of their RNA-binding activities. A web application has also been implemented to enable easy access and wide use of RBPMetaDB. It is expected that RBPMetaDB will be a great resource for improving understanding of the biological roles of RBPs. Database URL:
      PubDate: Tue, 19 Jun 2018 00:00:00 GMT
      DOI: 10.1093/database/bay054
      Issue No: Vol. 2018 (2018)
  • MPD: a pathogen genome and metagenome database

    • Authors: Zhang T; Miao J, Han N, et al.
      Abstract: Advances in high-throughput sequencing have led to unprecedented growth in the amount of available genome sequencing data, especially for bacterial genomes, which has been accompanied by a challenge for the storage and management of such huge datasets. To facilitate bacterial research and related studies, we have developed the Mypathogen database (MPD), which lets users search, download, store and share bacterial genomics data. The MPD represents the first pathogenic database for microbial genomes and metagenomes, and currently covers pathogenic microbial genomes (6604 genera, 11 071 species, 41 906 strains) and metagenomic data from host, air, water and other sources (28 816 samples). The MPD also functions as a management system for statistical and storage data that can be used by different organizations, thereby facilitating data sharing among different organizations and research groups. A user-friendly local client tool is provided to maintain the steady transmission of big sequencing data. The MPD is a useful tool for analysis and management in genomic research, especially for clinical Centers for Disease Control and epidemiological studies, and is expected to contribute to advancing knowledge on pathogenic bacteria genomes and metagenomes. Database URL:
      PubDate: Thu, 14 Jun 2018 00:00:00 GMT
      DOI: 10.1093/database/bay055
      Issue No: Vol. 2018 (2018)
  • LeptoDB: an integrated database of genomics and proteomics resource of

    • Authors: Beriwal S; Padhiyar N, Bhatt D, et al.
      Abstract: Leptospirosis is a potentially fatal zoo-anthroponosis caused by pathogenic species of Leptospira belonging to the family of Leptospiraceae, with a worldwide distribution and effect, in terms of its burden and risk to human health. The ‘LeptoDB’ is a single window dedicated architecture (5 948 311 entries), modeled using heterogeneous data as a core resource for global Leptospira species. LeptoDB facilitates well-structured knowledge of genomics, proteomics and therapeutic aspects with more than 500 assemblies including 17 complete and 496 draft genomes encoding 1.7 million proteins for 23 Leptospira species with more than 250 serovars comprising pathogenic, intermediate and saprophytic strains. Also, it seeks to be a dynamic compendium for therapeutically essential components such as epitopes, primers, CRISPR/Cas9 and putative drug targets. Integration of JBrowse provides an elaborate locus-centric description of each sequence or contig, Jmol supports structural visualization of proteins, and MUSCLE supports interactive multiple sequence alignment, annotation and analysis. The data on genomic islands will provide an understanding of virulence and pathogenicity. The integrated phylogenetic analysis suggests the evolutionary division of strains. Easily accessible on a public web server, we anticipate wide use of this metadata on Leptospira for the development of potential therapeutics. Database URL:
      PubDate: Tue, 12 Jun 2018 00:00:00 GMT
      DOI: 10.1093/database/bay057
      Issue No: Vol. 2018 (2018)
  • ILDgenDB: integrated genetic knowledge resource for interstitial lung
           diseases (ILDs)

    • Authors: Mishra S; Shah M, Sarkar M, et al.
      Abstract: Interstitial lung diseases (ILDs) are a diverse group of ∼200 acute and chronic pulmonary disorders that are characterized by variable amounts of inflammation, fibrosis and architectural distortion with substantial morbidity and mortality. Inaccurate and delayed diagnoses increase the risk, especially in developing countries. Studies have indicated the significant roles of genetic elements in ILD pathogenesis. Therefore, the first genetic knowledge resource, ILDgenDB, has been developed with the objective of providing ILD genetic data and their integrated analyses for a better understanding of disease pathogenesis and identification of diagnostics-based biomarkers. This resource contains literature-curated disease candidate genes (DCGs) enriched with various regulatory elements that have been generated using an integrated bioinformatics workflow of database searches, literature mining and DCGs–microRNA (miRNAs)–single nucleotide polymorphisms (SNPs) association analyses. To provide statistical significance to disease-gene association, ILD-specificity index and hypergeometric test scores were also incorporated. Association analyses of miRNAs, SNPs and pathways responsible for the pathogenesis of different sub-classes of ILDs were also incorporated. Manually verified 299 DCGs and their significant associations with 1932 SNPs, 2966 miRNAs and 9170 miR-polymorphisms were also provided. Furthermore, 216 literature-mined and proposed biomarkers were identified. The ILDgenDB resource provides user-friendly browsing and extensive query-based information retrieval systems. Additionally, this resource provides a graphical view of predicted DCGs–SNPs/miRNAs and literature-associated DCGs–ILDs interactions for each ILD to facilitate efficient data interpretation. Outcomes of analyses suggested the significant involvement of immune system and defense mechanisms in ILD pathogenesis. This resource may potentially facilitate genetic-based disease monitoring and diagnosis. Database URL:
      PubDate: Sat, 09 Jun 2018 00:00:00 GMT
      DOI: 10.1093/database/bay053
      Issue No: Vol. 2018 (2018)
  • A comparative synteny analysis tool for target-gene SNP marker discovery:
           connecting genomics data to breeding in Solanaceae

    • Authors: Choe J; Kim J, Lee B, et al.
      Abstract: It is necessary for molecular breeders to overcome the difficulties in applying abundant genomic information to crop breeding. Candidate orthologs would be discovered more efficiently in less-studied crops if the information gained from studies of related crops were used. We developed a comparative analysis tool and web-based genome viewer to identify orthologous genes based on synteny as well as sequence similarity between tomato, pepper and potato. The tool has a step-by-step interface with multiple viewing levels to support the easy and accurate exploration of functional orthologs. Furthermore, it provides access to single-nucleotide polymorphism markers from the massive genetic resource pool in order to accelerate the development of molecular markers for candidate orthologs in the Solanaceae. This tool provides a bridge between genome data and breeding by supporting effective marker development, data utilization and communication. Database URL:
      PubDate: Sun, 03 Jun 2018 00:00:00 GMT
      DOI: 10.1093/database/bay047
      Issue No: Vol. 2018 (2018)
  • A systematic approach for identifying shared mechanisms in epilepsy and
           its comorbidities

    • Authors: Hoyt C; Domingo-Fernández D, Balzer N, et al.
      Abstract: Cross-sectional epidemiological studies have shown that the incidence of several nervous system diseases is more frequent in epilepsy patients than in the general population. Some comorbidities [e.g. Alzheimer’s disease (AD) and Parkinson’s disease] are also risk factors for the development of seizures, suggesting they may share pathophysiological mechanisms with epilepsy. A literature-based approach was used to identify gene overlap between epilepsy and its comorbidities as a proxy for a shared genetic basis for disease, or genetic pleiotropy, as a first effort to identify shared mechanisms. While the results identified neurological disorders as the group of diseases with the highest gene overlap, this analysis was insufficient for identifying putative common mechanisms shared across epilepsy and its comorbidities. This motivated the use of a dedicated literature mining and knowledge assembly approach in which a cause-and-effect model of epilepsy was captured with Biological Expression Language. After enriching the knowledge assembly with information surrounding epilepsy, its risk factors, its comorbidities, and anti-epileptic drugs, a novel comparative mechanism enrichment approach was used to propose several downstream effectors (including the GABA receptor, GABAergic pathways, etc.) that could explain the therapeutic effects of carbamazepine in the contexts of both epilepsy and AD. We have made the Epilepsy Knowledge Assembly available at and queryable through NeuroMMSig at The source code used for analysis and tutorials for reproduction are available on GitHub at
      PubDate: Sun, 03 Jun 2018 00:00:00 GMT
      DOI: 10.1093/database/bay050
      Issue No: Vol. 2018 (2018)
  • SPRENO: a BioC module for identifying organism terms in figure captions

    • Authors: Dai H; Singh O.
      Abstract: Recent advances in biological research reveal that the majority of experiments strive for comprehensive exploration of the biological system rather than targeting specific biological entities. The qualitative and quantitative findings of the investigations are often exclusively available in the form of figures in published papers. There is no denying that such findings have been instrumental in the in-depth understanding of biological processes and pathways. However, such data cannot be readily interpreted by machines because the descriptions in figure captions convey rich information in an ambiguous manner. The abbreviated term ‘SIN’ exemplifies this issue, as it may stand for Sindbis virus or the sex-lethal interactor gene (Drosophila melanogaster). To overcome this ambiguity, entities should be identified by linking them to the respective entries in notable biological databases. Among all entity types, the task of identifying species plays a pivotal role in disambiguating related entities in the text. In this study, we present our species identification tool SPRENO (Species Recognition and Normalization), which is established for recognizing organism terms mentioned in figure captions and linking them to the NCBI taxonomy database by exploiting the contextual information from both the figure caption and the corresponding full text. To determine the ID of ambiguous organism mentions, two disambiguation methods have been developed. One is based on the majority rule to select the ID that has been successfully linked to previously mentioned organism terms. The other is a convolutional neural network (CNN) model trained by learning both the context and the distance information of the target organism mention. As a system based on the majority rule, SPRENO was one of the top-ranked systems in the BioCreative VI BioID track and achieved micro F-scores of 0.776 (entity recognition) and 0.755 (entity normalization) on the official test set, respectively. Additionally, the SPRENO-CNN exhibited better precision with lower recall and F-scores (0.720/0.711 for entity recognition/normalization). SPRENO is freely available at URL:
      PubDate: Sun, 03 Jun 2018 00:00:00 GMT
      DOI: 10.1093/database/bay048
      Issue No: Vol. 2018 (2018)
  • dbCRSR: a manually curated database for regulation of cancer

    • Authors: Wen P; Xia J, Cao X, et al.
      Abstract: Radiotherapy is used to treat approximately 50% of all cancer patients, with varying prognoses. Intrinsic radiosensitivity is an important factor underlying the radiotherapeutic efficacy of this precise treatment. During the past decades, great efforts have been made to improve radiotherapy treatment through multiple strategies. However, invaluable data remains buried in the extensive radiotherapy literature, making it difficult to obtain an overall view of the detailed mechanisms leading to radiosensitivity, thus limiting advances in radiotherapy. To address this issue, we collected data from the relevant literature contained in the PubMed database and developed a literature-based database that we term the cancer radiosensitivity regulation factors database (dbCRSR). dbCRSR is a manually curated catalogue of radiosensitivity, containing multiple radiosensitivity regulation factors (395 coding genes, 119 non-coding RNAs and 306 chemical compounds) with appropriate annotation. To illustrate the value of the data we collected, data mining was performed including functional annotation and network analysis. In summary, dbCRSR is the first literature-based database to focus on radiosensitivity and provides a resource to better understand the detailed mechanisms of radiosensitivity. We anticipate dbCRSR will be a useful resource to enrich our knowledge and to promote further study of radiosensitivity. Database URL: 8080/dbCRSR/
      PubDate: Wed, 30 May 2018 00:00:00 GMT
      DOI: 10.1093/database/bay049
      Issue No: Vol. 2018 (2018)
  • DEXTER: Disease-Expression Relation Extraction from Text

    • Authors: Gupta S; Dingerdissen H, Ross K, et al.
      Abstract: Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research, and diagnostics and prognostics of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies, such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression–disease relationships that not only have been captured from large-scale studies but have also been observed in thousands of small-scale studies. Expression information obtained from literature through manual curation can extend expression databases. While many of the existing databases include information from literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER), to extract information from literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags behind significantly compared to expression information obtained from large-scale studies and can benefit from our text-mined results. We have conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51 and 81.81% for the two evaluations, respectively. Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNA in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress. Database URL:
      PubDate: Wed, 30 May 2018 00:00:00 GMT
      DOI: 10.1093/database/bay045
      Issue No: Vol. 2018 (2018)
  • CBD: a biomarker database for colorectal cancer

    • Authors: Zhang X; Sun X, Cao Y, et al.
      Abstract: The colorectal cancer (CRC) biomarker database (CBD) was established based on 870 identified CRC biomarkers and their relevant information from 1115 original articles in PubMed published from 1986 to 2017. In this version of the CBD, CRC biomarker data were collected, sorted, displayed and analysed. With its credible contents, the CBD is a powerful and time-saving tool that provides more comprehensive and accurate information for further CRC biomarker research. The CBD was constructed on a MySQL server. HTML, PHP and JavaScript were used to implement the web interface. Apache was selected as the HTTP server. All of these web operations were implemented under the Windows system. The CBD provides users with information on individual biomarkers, categorized into the biological category, source and application of biomarkers; the experiment methods, results, authors and publication resources; the research region, the average age of cohort, gender, race, the number of tumours, tumour location and stage. We only collect data from articles with clear and credible results showing that the biomarkers are useful in the diagnosis, treatment or prognosis of CRC. The CBD can also provide a professional platform for researchers who are interested in CRC research to communicate, exchange their research ideas and further design high-quality research in CRC. They can submit their new findings to our database via the submission page and communicate with us in the CBD. Database URL:
      PubDate: Sat, 26 May 2018 00:00:00 GMT
      DOI: 10.1093/database/bay046
      Issue No: Vol. 2018 (2018)
  • LnChrom: a resource of experimentally validated lncRNA–chromatin
           interactions in human and mouse

    • Authors: Yu F; Zhang G, Shi A, et al.
      Abstract: Long non-coding RNAs (lncRNAs) constitute an important layer of chromatin regulation that contributes to various biological processes and diseases. By interacting with chromatin, many lncRNAs can regulate the state of chromatin by recruiting chromatin-modifying complexes and thus control large-scale gene expression programs. However, the available information on interactions between lncRNAs and chromatin is hidden in a large amount of dispersed literature and has not been extensively collected. We established the LnChrom database, a manually curated resource of experimentally validated lncRNA–chromatin interactions. The current release of LnChrom includes 382 743 interactions in human and mouse. We also manually collected detailed metadata for each interaction pair, including those of chromatin-modifying factors, epigenetic marks and disease associations. LnChrom provides a user-friendly interface to facilitate browsing, searching and retrieving of lncRNA–chromatin interaction data. Additionally, a large amount of multi-omics data was integrated into LnChrom to aid in characterizing the effects of lncRNA–chromatin interactions on epigenetic modifications and transcriptional expression. We believe that LnChrom is a timely and valuable resource that can greatly motivate mechanistic research into lncRNAs. Database URL:
      PubDate: Fri, 18 May 2018 00:00:00 GMT
      DOI: 10.1093/database/bay039
      Issue No: Vol. 2018 (2018)
  • SolCyc: a database hub at the Sol Genomics Network (SGN) for the manual
           curation of metabolic networks in Solanum and Nicotiana specific databases

    • Authors: Foerster H; Bombarely A, Battey J, et al.
      Abstract: SolCyc is the entry portal to pathway/genome databases (PGDBs) for major species of the Solanaceae family hosted at the Sol Genomics Network. Currently, SolCyc comprises six organism-specific PGDBs for tomato, potato, pepper, petunia, tobacco and one Rubiaceae, coffee. The metabolic networks of those PGDBs have been computationally predicted by the PathoLogic component of the Pathway Tools software using the manually curated multi-domain database MetaCyc as reference. SolCyc has been recently extended by taxon-specific databases, i.e. the family-specific SolanaCyc database, containing only curated data pertinent to species of the nightshade family, and NicotianaCyc, a genus-specific database that stores all relevant metabolic data of the Nicotiana genus. Through manual curation of the published literature, new metabolic pathways have been created in those databases, which are complemented by the continuously updated, relevant species-specific pathways from MetaCyc. At present, SolanaCyc comprises 199 pathways and 29 superpathways and NicotianaCyc accounts for 72 pathways and 13 superpathways. Curator-maintained, taxon-specific databases such as SolanaCyc and NicotianaCyc are characterized by an enrichment of data specific to these taxa and are free of falsely predicted pathways. Both databases have been used to update recently created Nicotiana-specific databases for Nicotiana tabacum, Nicotiana benthamiana, Nicotiana sylvestris and Nicotiana tomentosiformis by propagating verifiable data into those PGDBs. In addition, in-depth curation of the pathways in N. tabacum has been carried out, which resulted in the elimination of 156 pathways from the 569 pathways predicted by Pathway Tools. Together, in-depth curation of the predicted pathway network and the supplementation with curated data from taxon-specific databases have substantially improved the curation status of the species-specific N. tabacum PGDB. The implementation of this strategy will significantly advance the curation status of all organism-specific databases in SolCyc, resulting in improved database accuracy, data analysis and visualization of biochemical networks in those species. Database URL
      PubDate: Thu, 10 May 2018 00:00:00 GMT
      DOI: 10.1093/database/bay035
      Issue No: Vol. 2018 (2018)
  • CircR2Disease: a manually curated database for experimentally supported
           circular RNAs associated with various diseases

    • Authors: Fan C; Lei X, Fang Z, et al.
      Abstract: CircR2Disease is a manually curated database, which provides a comprehensive resource for circRNA deregulation in various diseases. Increasing evidence has shown that circRNAs play critical roles in transcriptional, post-transcriptional and translational regulation. Therefore, the aberrant expression of circRNAs has been associated with a group of diseases. It is important to develop a high-quality database to deposit the deregulated circRNAs in diseases. The current version of CircR2Disease contains 725 associations between 661 circRNAs and 100 diseases, collected by reviewing the existing literature. Each entry in CircR2Disease contains detailed information for the circRNA–disease relationship, including circRNA name, coordinates and gene symbol, disease name, expression patterns of circRNA, experimental techniques, a brief description of the circRNA–disease relationship, year of publication and the PubMed ID. CircR2Disease provides a user-friendly interface to browse, search and download as well as to submit novel disease-related circRNAs. CircR2Disease could be very beneficial for researchers investigating the mechanism of disease-related circRNAs and exploring appropriate algorithms for predicting novel associations. Database URL:
      PubDate: Fri, 04 May 2018 00:00:00 GMT
      DOI: 10.1093/database/bay044
      Issue No: Vol. 2018 (2018)
  • GEMiCCL: mining genotype and expression data of cancer cell lines with
           elaborate visualization

    • Authors: Jeong I; Yu N, Jang I, et al.
      Abstract: Cancer cell lines are essential components for biomedical research. However, proper choice of cell lines for experimental purposes is often difficult because genotype and/or expression data are missing or scattered in diverse resources. Here, we report Gene Expression and Mutations in Cancer Cell Lines (GEMiCCL), an online database of human cancer cell lines that provides genotype and expression information. We have collected mutation, gene expression and copy number variation (CNV) data from three representative databases on cell lines—Cancer Cell Line Encyclopedia, Catalogue of Somatic Mutations in Cancer and NCI60. In total, GEMiCCL includes 1406 cell lines from 185 cancer types and 29 tissues. Gene expression, mutation and CNV information are available for 1304, 1334 and 1365 cell lines, respectively. We removed batch effects due to different microarray platforms using the ComBat software and re-processed the entire gene expression and SNP chip data. Cell line names and clinical information were standardized using Cellosaurus from ExPASy. Our user interface supports cell line search, gene search, browsing for specific molecular characteristics and complex queries based on Boolean logic rules. We also implemented many interactive features and user-friendly visualizations. Because it provides molecular characteristics and clinical information, we believe that GEMiCCL will be a valuable resource for functional or screening studies in biomedical research. Database URL: GEMiCCL is available at
      PubDate: Wed, 02 May 2018 00:00:00 GMT
      DOI: 10.1093/database/bay041
      Issue No: Vol. 2018 (2018)
  • AbDb: antibody structure database—a database of PDB-derived antibody

    • Authors: Ferdous S; Martin A.
      Abstract: In order to analyse structures of proteins of a particular class, these need to be extracted from Protein Data Bank (PDB) files. In the case of antibodies, there are a number of special considerations: (i) identifying antibodies in the PDB is not trivial, (ii) they may be crystallized with or without antigen, (iii) for analysis purposes, one is normally only interested in the Fv region of the antibody, (iv) structural analysis of epitopes, in particular, requires individual antibody–antigen complexes from a PDB file which may contain multiple copies of the same, or different, antibodies and (v) standard numbering schemes should be applied. Consequently, there is a need for a specialist resource containing pre-numbered non-redundant antibody Fv structures with their cognate antigens. We have created an automatically updated resource, AbDb, which collects the Fv regions from antibody structures using information from our SACS database which summarizes antibody structures from the PDB. PDB files containing multiple structures are split and numbered and each antibody structure is associated with its antigen where available. Antibody structures with only light or heavy chains have also been processed and sequences of antibodies are compared to identify multiple structures of the same antibody. The data may be queried on the basis of PDB code, or the name or species of the antibody or antigen, and the complete datasets may be downloaded. Database URL:
      PubDate: Fri, 27 Apr 2018 00:00:00 GMT
      DOI: 10.1093/database/bay040
      Issue No: Vol. 2018 (2018)
  • PVCbase: an integrated web resource for the PVC bacterial proteomes

    • Authors: Bordin N; González-Sánchez J, Devos D.
      Abstract: Interest in the Planctomycetes-Verrucomicrobia-Chlamydiae (PVC) bacterial superphylum is growing within the microbiology community. These organisms do not have a specialized web resource that gathers in silico predictions in an integrated fashion. Hence, we are providing the PVC community with PVCbase, a specialized web resource that gathers in silico predictions in an integrated fashion. PVCbase integrates protein function annotations obtained through sequence analysis and tertiary structure prediction for 39 representative PVC proteomes (PVCdb), a protein feature visualizer (Foundation) and a custom BLAST webserver (PVCBlast) that allows users to retrieve the annotation of a hit directly from the DataTables. We display results from various predictors, encompassing most functional aspects, allowing users to have a more comprehensive overview of protein identities. Additionally, we illustrate how PVCdb can be used to address biological questions from raw data. Database URL: PVCbase is freely accessible at
      PubDate: Tue, 24 Apr 2018 00:00:00 GMT
      DOI: 10.1093/database/bay042
      Issue No: Vol. 2018 (2018)
  • SNPversity: a web-based tool for visualizing diversity

    • Authors: Schott D; Vinnakota A, Portwood J, II, et al.
      Abstract: Many stand-alone desktop software suites exist to visualize single nucleotide polymorphism (SNP) diversity, but web-based software that can be easily implemented and used for biological databases is absent. SNPversity was created to answer this need by building an open-source visualization tool that can be implemented on a Unix-like machine and served through a web browser that can be accessible worldwide. SNPversity consists of an HDF5 database back-end for SNPs, a data exchange layer powered by TASSEL libraries that represent data in JSON format, and an interface layer using PHP to visualize SNP information. SNPversity displays data in real-time through a web browser in grids that are color-coded according to a given SNP’s allelic status and mutational state. SNPversity is currently available at MaizeGDB, the maize community’s database, and will soon be available at GrainGenes, the clade-oriented database for Triticeae and Avena species, including wheat, barley, rye, and oat. The code and documentation are uploaded onto GitHub, and they are freely available to the public. We expect that the tool will be highly useful for other biological databases with a similar need to display SNP diversity through their web interfaces. Database URL:
      PubDate: Fri, 20 Apr 2018 00:00:00 GMT
      DOI: 10.1093/database/bay037
      Issue No: Vol. 2018 (2018)
  • StraPep: a structure database of bioactive peptides

    • Authors: Wang J; Yin T, Xiao X, et al.
      Abstract: Bioactive peptides, with a variety of biological activities and wide distribution in nature, have attracted great research interest in biological and medical fields, especially in the pharmaceutical industry. The structural information of bioactive peptides is important for the development of peptide-based drugs. Many databases have been developed cataloguing bioactive peptides. However, to our knowledge, a database dedicated to collecting all the bioactive peptides with known structure is not available yet. Thus, we developed StraPep, a structure database of bioactive peptides. StraPep holds 3791 bioactive peptide structures, which belong to 1312 unique bioactive peptide sequences. About 905 out of 1312 (68%) bioactive peptides in StraPep contain disulfide bonds, which is significantly higher than that (21%) of PDB. Interestingly, 150 out of 616 (24%) bioactive peptides with three or more disulfide bonds form a structural motif known as cystine knot, which confers considerable structural stability on proteins and is an attractive scaffold for drug design. Detailed information on each peptide, including the experimental structure, the location of disulfide bonds, secondary structure, classification, post-translational modification and so on, has been provided. A wide range of user-friendly tools, such as browsing, sequence and structure-based searching and so on, has been incorporated into StraPep. We hope that this database will be helpful for the research community. Database URL:
      PubDate: Mon, 16 Apr 2018 00:00:00 GMT
      DOI: 10.1093/database/bay038
      Issue No: Vol. 2018 (2018)
  • Maser: one-stop platform for NGS big data from analysis to visualization

    • Authors: Kinjo S; Monma N, Misu S, et al.
      Abstract: A major challenge in analyzing the data from high-throughput next-generation sequencing (NGS) is how to handle the huge amounts of data and variety of NGS tools and visualize the resultant outputs. To address these issues, we developed a cloud-based data analysis platform, Maser (Management and Analysis System for Enormous Reads), and an original genome browser, Genome Explorer (GE). Maser enables users to manage up to 2 terabytes of data to conduct analyses with easy graphical user interface operations and offers analysis pipelines in which several individual tools are combined as a single pipeline for very common and standard analyses. GE automatically visualizes genome assembly and mapping results output from Maser pipelines, without requiring additional data upload. With this function, the Maser pipelines can graphically display the results output from all the embedded tools and mapping results in a web browser. Maser therefore provides a more user-friendly analysis platform, especially for beginners, by improving the graphical display and providing selected standard pipelines that work with the built-in genome browser. In addition, all the analyses executed on Maser are recorded in the analysis history, helping users to trace and repeat the analyses. The entire process of analysis and its histories can be shared with collaborators or opened to the public. In conclusion, our system is useful for managing, analyzing, and visualizing NGS data and achieves traceability, reproducibility, and transparency of NGS analysis. Database URL:
      PubDate: Fri, 13 Apr 2018 00:00:00 GMT
      DOI: 10.1093/database/bay027
      Issue No: Vol. 2018 (2018)
  • The tragedy of the biodiversity data commons: a data impediment creeping

    • Authors: Escribano N; Galicia D, Ariño A.
      Abstract: Researchers are embracing the open access movement to facilitate unrestricted availability of scientific results. One sign of this willingness is the steady increase in data freely shared online, which has prompted a corresponding increase in the number of papers using such data. Publishing datasets is a time-consuming process that is often seen as a courtesy, rather than a necessary step in the research process. Making data accessible allows further research, provides basic information for decision-making and contributes to transparency in science. Nevertheless, the ease of access to heaps of data carries a perception of ‘free lunch for all’, and the work of data publishers is largely going unnoticed. Acknowledging such a significant effort involving the creation, management and publication of a dataset remains a flimsy, not well established practice in the scientific community. In a meta-analysis of published literature, we have observed various dataset citation practices, but mostly (92%) consisting of merely citing the data repository rather than the data publisher. Failing to recognize the work of data publishers might lead to a decrease in the number of quality datasets shared online, compromising potential research that is dependent on the availability of such data. We make an urgent appeal to raise awareness about this issue.
      PubDate: Mon, 09 Apr 2018 00:00:00 GMT
      DOI: 10.1093/database/bay033
      Issue No: Vol. 2018 (2018)
  • Signalling maps in cancer research: construction and data analysis

    • Authors: Kondratova M; Sompairac N, Barillot E, et al.
      Abstract: Generation and usage of high-quality molecular signalling network maps can be augmented by standardizing notations, establishing curation workflows and application of computational biology methods to exploit the knowledge contained in the maps. In this manuscript, we summarize the major aims and challenges of assembling information in the form of comprehensive maps of molecular interactions. Mainly, we share our experience gained while creating the Atlas of Cancer Signalling Network. In the step-by-step procedure, we describe the map construction process and suggest solutions for map complexity management by introducing a hierarchical modular map structure. In addition, we describe the NaviCell platform, a computational technology using Google Maps API to explore comprehensive molecular maps similar to geographical maps and explain the advantages of semantic zooming principles for map navigation. We also provide the outline to prepare signalling network maps for navigation using the NaviCell platform. Finally, several examples of cancer high-throughput data analysis and visualization in the context of comprehensive signalling maps are presented.
      PubDate: Mon, 09 Apr 2018 00:00:00 GMT
      DOI: 10.1093/database/bay036
  • An entropy-reducing data representation approach for bioinformatic data

    • Authors: McCulloch A; Jauregui R, Maclean P, et al.
      Abstract: Non-semantic approaches to bioinformatic data analysis have potential relevance where semantic resources such as annotated finished reference genomes are lacking, such as in the analysis and utilisation of growing amounts of sequence data from non-model organisms, often associated with sequence-based agricultural, aqua-cultural and environmental sampling studies and commercial services. Even where rich semantic resources are available, semantic approaches to problems such as contrasting and comparing reference assemblies, and utilising multiple references in parallel to avoid reference bias, are costly and difficult to fully automate. We introduce and discuss a non-semantic data representation approach intended mainly for bioinformatic data called non-semantic labelling. Non-semantic labelling involves tensorially combining multiple kinds of model-based entropy-reducing data representation, with multiple representation models, so as to map both data and models into dual metric representation spaces, with goals of both reducing the statistical complexity of the data, and highlighting latent structure via machine learning and statistical analyses conducted within the dual representation spaces. As part of the framework, we introduce a novel algebraic abstraction of data representation mappings, and present four proof-of-concept examples of its application, to problems such as comparing and contrasting sequence assemblies, utilisation of multiple references for annotation and development of quality control diagnostics in a variety of high-throughput sequencing contexts. Database URL:
      PubDate: Thu, 05 Apr 2018 00:00:00 GMT
      DOI: 10.1093/database/bay029
  • Expert curation for building network-based dynamical models: a case study
           on atherosclerotic plaque formation

    • Authors: Bekkar A; Estreicher A, Niknejad A, et al.
      Abstract: Knowledgebases play an increasingly important role in scientific research, where the expert curation of biological knowledge in forms that are amenable to computational analysis (using ontologies, for example) provides significant added value and enables new types of computational analyses for high-throughput datasets. In this work, we demonstrate how expert curation can also play a more direct role in research, by supporting the use of network-based dynamical models to study a specific biological process. This curation effort is focused on the regulatory interactions between biological entities, such as genes or proteins and compounds, which may interact with each other in a complex manner, including regulatory complexes and conditional dependencies between co-regulators. This critical information has to be captured and encoded in a computable manner, which is far beyond the current capabilities of automatically constructed networks. As a case study, we report here the prior knowledge network constructed by the sysVASC consortium to model the biological events leading to the formation of atherosclerotic plaques during the onset of cardiovascular disease, and discuss some specific examples to illustrate the main pitfalls of, and the value added by, expert curation during this endeavor. Database URL:
      PubDate: Wed, 04 Apr 2018 00:00:00 GMT
      DOI: 10.1093/database/bay031
  • A tutorial of diverse genome analysis tools found in the CoGe web-platform
           using Plasmodium spp. as a model

    • Authors: Castillo A; Nelson A, Haug-Baltzell A, et al.
      Abstract: Integrated platforms for storage, management, analysis and sharing of large quantities of omics data have become fundamental to comparative genomics. CoGe is an online platform designed to manage and study genomic data, enabling both data- and hypothesis-driven comparative genomics. CoGe’s tools and resources can be used to organize and analyse both publicly available and private genomic data from any species. Here, we demonstrate the capabilities of CoGe through three example workflows using 17 Plasmodium genomes as a model. Plasmodium genomes present unique challenges for comparative genomics due to their rapidly evolving and highly variable genomic AT/GC content. These example workflows are intended to serve as templates to help guide researchers who would like to use CoGe to examine diverse aspects of genome evolution. In the first workflow, trends in genome composition and amino acid usage are explored. In the second, changes in genome structure and the distribution of synonymous (Ks) and non-synonymous (Kn) substitution values are evaluated across species with different levels of evolutionary relatedness. In the third workflow, microsyntenic analyses of multigene families’ genomic organization are conducted using two Plasmodium-specific gene families (serine repeat antigen and cytoadherence-linked asexual gene) as models. In general, these example workflows show how to achieve quick, reproducible and shareable results using the CoGe platform. We were able to replicate previously published results, as well as leverage CoGe’s tools and resources to gain additional insight into various aspects of Plasmodium genome evolution. Our results highlight the usefulness of the CoGe platform, particularly in understanding complex features of genome evolution. Database URL:
      PubDate: Tue, 03 Apr 2018 00:00:00 GMT
      DOI: 10.1093/database/bay030
  • dbAMEPNI: a database of alanine mutagenic effects for
           protein–nucleic acid interactions

    • Authors: Liu L; Xiong Y, Gao H, et al.
      Abstract: Protein–nucleic acid interactions play essential roles in various biological activities such as gene regulation, transcription, DNA repair and DNA packaging. Understanding the effects of amino acid substitutions on protein–nucleic acid binding affinities can help elucidate the molecular mechanism of protein–nucleic acid recognition. Until now, no comprehensive and updated database of quantitative binding data on alanine mutagenic effects for protein–nucleic acid interactions is publicly accessible. Thus, we developed a new database of Alanine Mutagenic Effects for Protein-Nucleic Acid Interactions (dbAMEPNI). dbAMEPNI is a manually curated, literature-derived database, comprising over 577 alanine mutagenic data with experimentally determined binding affinities for protein–nucleic acid complexes. It contains several important parameters, such as dissociation constant (Kd), Gibbs free energy change (ΔΔG), experimental conditions and structural parameters of mutant residues. In addition, the database provides an extended dataset of 282 single alanine mutations with only qualitative data (or descriptive effects) of thermodynamic information.Database URL:
      PubDate: Mon, 02 Apr 2018 00:00:00 GMT
      DOI: 10.1093/database/bay034
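
The two quantities named in the abstract above, the dissociation constant (Kd) and the binding free energy change (ΔΔG), are linked by the standard thermodynamic relation ΔΔG = RT ln(Kd_mut / Kd_wt). The sketch below illustrates that relation only; it is not code from dbAMEPNI. The function name `ddg_from_kd`, the default temperature of 298.15 K and the choice of kcal/mol are assumptions made for illustration.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL = 1.987204e-3


def ddg_from_kd(kd_wt, kd_mut, temperature=298.15):
    """Return the binding free energy change in kcal/mol.

    Uses ddG = R * T * ln(Kd_mut / Kd_wt). A positive value means the
    alanine mutant binds more weakly than the wild type.
    Both Kd values must be in the same concentration unit.
    """
    if kd_wt <= 0 or kd_mut <= 0:
        raise ValueError("dissociation constants must be positive")
    return R_KCAL * temperature * math.log(kd_mut / kd_wt)
```

Under these assumptions, a tenfold increase in Kd on mutation corresponds to a ΔΔG of roughly 1.36 kcal/mol at 298.15 K.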
  • Probabilistic and machine learning-based retrieval approaches for
           biomedical dataset retrieval

    • Authors: Karisani P; Qin Z, Agichtein E.
      Abstract: The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system.Database URL:
      PubDate: Wed, 28 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bax104
  • Biopanning data bank 2018: hugging next generation phage display

    • Authors: He B; Jiang L, Duan Y, et al.
      Abstract: The 2018 update of the biopanning data bank (BDB) stores phage display data sequenced by Sanger sequencing and next generation sequencing technologies. In this work, we upgraded the database with more biopanning data sets and several new features, including (i) incorporation of next generation biopanning data and the unselected population where the target is not determined and the round of screening is zero; (ii) addition of sequencing information; (iii) improvement of the browsing and searching systems and the 3D chemical structure viewer; and (iv) integration of standalone tools for target-unrelated peptide analysis within conventional phage display and next generation phage display (NGPD) data. In the current version of BDB (released on 19 January 2018), the database houses 3291 sets of biopanning data collected from 1540 published articles, including 95 NGPD data sets and 3196 traditional biopanning data sets. The BDB database serves as an important and comprehensive resource for developing peptide ligands. Database URL: The BDB database is available at
      PubDate: Tue, 27 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay032
  • CITGeneDB: a comprehensive database of human and mouse genes enhancing or
           suppressing cold-induced thermogenesis validated by perturbation
           experiments in mice

    • Authors: Li J; Deng S, Wei G, et al.
      Abstract: Cold-induced thermogenesis increases energy expenditure and can reduce body weight in mammals, so the genes involved in it are thought to be potential therapeutic targets for treating obesity and diabetes. In the quest for more effective therapies, a great deal of research has been conducted to elucidate the regulatory mechanism of cold-induced thermogenesis. Over the last decade, a large number of genes that can enhance or suppress cold-induced thermogenesis have been discovered, but a comprehensive list of these genes is lacking. To fill this gap, we examined all of the annotated human and mouse genes and curated those demonstrated to enhance or suppress cold-induced thermogenesis by in vivo or ex vivo experiments in mice. The results of this highly accurate and comprehensive annotation are hosted on a database called CITGeneDB, which includes a searchable web interface to facilitate broad public use. The database will be updated as new genes are found to enhance or suppress cold-induced thermogenesis. It is expected that CITGeneDB will be a valuable resource in future explorations of the molecular mechanism of cold-induced thermogenesis, helping pave the way for new obesity and diabetes treatments.Database URL:
      PubDate: Fri, 23 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay012
  • GEOMetaCuration: a web-based application for accurate manual curation of
           Gene Expression Omnibus metadata

    • Authors: Li Z; Li J, Yu P.
      Abstract: Metadata curation has become increasingly important for biological discovery and biomedical research because a large amount of heterogeneous biological data is currently freely available. To facilitate efficient metadata curation, we developed an easy-to-use web-based curation application, GEOMetaCuration, for curating the metadata of Gene Expression Omnibus datasets. It can eliminate mechanical operations that consume precious curation time and can help coordinate curation efforts among multiple curators. It improves the curation process by introducing various features that are critical to metadata curation, such as a back-end curation management system and a curator-friendly front-end. The application is based on a commonly used web development framework of Python/Django and is open-sourced under the GNU General Public License V3. GEOMetaCuration is expected to benefit the biocuration community and to contribute to computational generation of biological insights using large-scale biological data. An example use case can be found at the demo website: URL:
      PubDate: Fri, 23 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay019
  • Improved ontology-based similarity calculations using a study-wise
           annotation model

    • Authors: Köhler S.
      Abstract: A typical use case of ontologies is the calculation of similarity scores between items that are annotated with classes of the ontology. For example, in differential diagnostics and disease gene prioritization, the human phenotype ontology (HPO) is often used to compare a query phenotype profile against gold-standard phenotype profiles of diseases or genes. The latter have long been constructed as flat lists of ontology classes, which, as we show in this work, can be improved by exploiting existing structure and information in annotation datasets or full text disease descriptions. We derive a study-wise annotation model of diseases and genes and show that this can improve the performance of semantic similarity measures. Inferred weights of individual annotations are one reason for this improvement, but more importantly using the study-wise structure further boosts the results of the algorithms according to precision-recall analyses. We test the study-wise annotation model for diseases annotated with classes from the HPO and for genes annotated with gene ontology (GO) classes. We incorporate this annotation model into similarity algorithms and show how this leads to improved performance. This work adds weight to the need for enhancing simple list-based representations of disease or gene annotations. We show how study-wise annotations can be automatically derived from full text summaries of disease descriptions and from the annotation data provided by the GO Consortium and how semantic similarity measure can utilize this extended annotation model.Database URL:
      PubDate: Fri, 23 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay026
  • TISSUES 2.0: an integrative web resource on mammalian tissue expression

    • Authors: Palasca O; Santos A, Stolte C, et al.
      Abstract: Database (2018), doi: 10.1093/database/bay003
      PubDate: Fri, 16 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay028
  • Finding relevant biomedical datasets: the UC San Diego solution for the
           bioCADDIE Retrieval Challenge

    • Authors: Wei W; Ji Z, He Y, et al.
      Abstract: The number and diversity of biomedical datasets grew rapidly in the last decade. A large number of datasets are stored in various repositories, with different formats. Existing dataset retrieval systems lack the capability of cross-repository search. As a result, users spend time searching datasets in known repositories, and they typically do not find new repositories. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users’ free-text questions, produced high-quality retrieval results and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. The results showed that it is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval.Database URL:
      PubDate: Fri, 16 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay017
  • PvaxDB: a comprehensive structural repository of Plasmodium vivax proteome

    • Authors: Singh A; Kaushik R, Kuntal H, et al.
      Abstract: The severity of malaria caused by Plasmodium vivax worldwide and its resistance to the available general antimalarial drugs have created an urgent need for comprehensive insight into its biology and biochemistry in order to develop novel vaccines and therapeutics. The P. vivax proteome comprises 5392 proteins, mostly predicted, of which 4211 are soluble proteins, and 2205 of these belong to the blood and liver stages of the malarial cycle. Presently available public resources report functional annotation (gene ontology) for only 28% (627 proteins) of the enzymatic soluble proteins, and experimental structures have been determined for only 42 proteins of the P. vivax proteome. In this milieu of severe paucity of structural and functional data, we have generated structures of 2205 soluble proteins, validated them thoroughly, identified their binding pockets (including active sites) and annotated their function, increasing the coverage from the existing 28% to 100%. We have pooled all this information and created a database named PvaxDB, which furnishes extensive sequence, structure, ligand binding site and functional information. We believe PvaxDB could be helpful in identifying novel protein drug targets, expediting the development of new drugs to combat malaria. This is also the first attempt to create a reliable, comprehensive computational structural repository of all the soluble proteins of P. vivax. Database URL:
      PubDate: Wed, 14 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay021
  • Baseline and extensions approach to information retrieval of complex
           medical data: Poznan's approach to the bioCADDIE 2016

    • Authors: Cieslewicz A; Dutkiewicz J, Jedrzejek C.
      Abstract: Information retrieval from biomedical repositories has become a challenging task because of their increasing size and complexity. To facilitate research aimed at improving the search for relevant documents, various information retrieval challenges have been launched. In this article, we present the improved medical information retrieval systems designed by Poznan University of Technology and Poznan University of Medical Sciences as a contribution to the bioCADDIE 2016 challenge, a task focusing on information retrieval from a collection of 794 992 datasets generated from 20 biomedical repositories. The system developed by our team utilizes the Terrier 4.2 search platform enhanced by a query expansion method using word embeddings. This approach, after post-challenge modifications and improvements (with particular regard to assigning proper weights for original and expanded terms), allowed us to achieve the second-best infNDCG (0.4539) relative to the challenge results, and an infAP of 0.3978. This demonstrates that proper utilization of word embeddings can be a valuable addition to the information retrieval process. Some analysis is provided on related work involving other bioCADDIE contributions. We discuss the possibility of improving our results by using better word embedding schemes to find candidates for query expansion. Database URL:
      PubDate: Mon, 12 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bax103
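
The infNDCG figure quoted above is an inferred variant of Normalized Discounted Cumulative Gain, estimated from incomplete relevance judgments. The sketch below shows only plain NDCG, as a way to read such scores; it does not implement the inferred estimator and is not the challenge's evaluation code. The function names `dcg` and `ndcg` are illustrative, and the ideal ranking is built from the gains in the supplied list only, which is a simplifying assumption.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank i (1-based) is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(ranked_gains, k=None):
    """Plain NDCG at cutoff k for a list of graded relevance values in ranked order.

    Assumption of this sketch: the ideal ordering is the same list sorted by
    decreasing gain, so unretrieved relevant items are not accounted for.
    """
    k = len(ranked_gains) if k is None else k
    ideal_dcg = dcg(sorted(ranked_gains, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked_gains[:k]) / ideal_dcg
```

A ranking that already lists results in decreasing order of relevance scores 1.0; placing the only relevant item second out of two gives about 0.63.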
  • SPTEdb: a database for transposable elements in salicaceous plants

    • Authors: Yi F; Jia Z, Xiao Y, et al.
      Abstract: Although transposable elements (TEs) play significant roles in the structural, functional and evolutionary dynamics of salicaceous plant genomes, the accurate identification, definition and classification of TEs are still inadequate. In this study, we identified 18 393 TEs from Populus trichocarpa, Populus euphratica and Salix suchowensis using a combination of signature-based, similarity-based and de novo methods, and annotated them into 1621 families. A comprehensive and user-friendly web-based database, SPTEdb, was constructed for researchers. SPTEdb enables users to browse, retrieve and download TE sequences from the database. Several analysis tools, including BLAST, HMMER, GetORF and Cut sequence, were also integrated into SPTEdb to help users mine the TE data easily and effectively. In summary, SPTEdb will facilitate the study of TE biology and functional genomics in salicaceous plants. Database URL:
      PubDate: Fri, 09 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay024
  • YummyData: providing high-quality open life science data

    • Authors: Yamamoto Y; Yamaguchi A, Splendiani A.
      Abstract: Many life science datasets are now available via Linked Data technologies, meaning that they are represented in a common format (the Resource Description Framework), and are accessible via standard APIs (SPARQL endpoints). While this is an important step toward developing an interoperable bioinformatics data landscape, it also creates a new set of obstacles, as it is often difficult for researchers to find the datasets they need. Different providers frequently offer the same datasets, with different levels of support: as well as having more or less up-to-date data, some providers add metadata to describe the content, structures, and ontologies of the stored datasets while others do not. We currently lack a place where researchers can go to easily assess datasets from different providers in terms of metrics such as service stability or metadata richness. We also lack a space for collecting feedback and improving data providers’ awareness of user needs. To address this issue, we have developed YummyData, which consists of two components. One periodically polls a curated list of SPARQL endpoints, monitoring the states of their Linked Data implementations and content. The other presents the information measured for the endpoints and provides a forum for discussion and feedback. YummyData is designed to improve the findability and reusability of life science datasets provided as Linked Data and to foster its adoption. It is freely accessible at URL:
      PubDate: Fri, 09 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay022
  • The SNPcurator: literature mining of enriched SNP-disease associations

    • Authors: Tawfik N; Spruit M.
      Abstract: The uniqueness of each human genetic structure motivated the shift from the current practice of medicine to a more tailored one. This personalized medicine revolution would not be possible today without the genetics data collected from genome-wide association studies (GWASs) that investigate the relation between different phenotypic traits and single-nucleotide polymorphisms (SNPs). The huge increase in the volume of published literature poses a challenge to the conventional manual curation process, which is becoming more and more expensive. This research aims at automatically extracting the SNP associations of any given disease together with their reported statistical significance (P-value) and odds ratio, as well as cohort information such as size and ethnicity. Our evaluation illustrates that SNPcurator was able to replicate a large number of SNP-disease associations that were also reported in the NHGRI-EBI Catalog of published GWASs. SNPcurator was also tested by eight external genetics experts, who queried the system to examine diseases of their choice, and was found to be efficient and satisfactory. We conclude that the text-mining-based system has great potential for helping researchers and scientists, especially in their preliminary genetics research. SNPcurator is publicly available at URL:
      PubDate: Thu, 08 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay020
  • NDDVD: an integrated and manually curated Neurodegenerative Diseases
           Variation Database

    • Authors: Yang Y; Xu C, Liu X, et al.
      Abstract: Neurodegenerative diseases (NDDs) are associated with genetic variations including point substitutions, copy number alterations, insertions and deletions. At present, a few genetic variation repositories have been created for some individual NDDs; however, these databases need to be integrated and expanded to cover all NDDs for systems biological investigation. Here we build a relational database, termed NDDVD, to integrate all the variations of NDDs using the Leiden Open Variation Database (LOVD) platform. The items in the NDDVD are collected manually from PubMed or extracted from existing variation databases. The cross-disease database includes over 6374 genetic variations of 289 genes associated with 37 different NDDs. The patterns, conservation and biological functions of variations in different NDDs are statistically compared, and a user-friendly interface is provided for NDDVD at:
      PubDate: Mon, 05 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay018
  • Micropublication: incentivizing community curation and placing unpublished
           data into the public domain

    • Authors: Raciti D; Yook K, Harris T, et al.
      Abstract: Large volumes of data generated by research laboratories coupled with the required effort and cost of curation present a significant barrier to inclusion of these data in authoritative community databases. Further, many publicly funded experimental observations remain invisible to curation simply because they are never published: results often do not fit within the scope of a standard publication; trainee-generated data are forgotten when the experimenter (e.g. student, post-doc) leaves the lab; results are omitted from science narratives due to publication bias where certain results are considered irrelevant for the publication. While authors are in the best position to curate their own data, they face a steep learning curve to ensure that appropriate referential tags, metadata, and ontologies are applied correctly to their observations, a task sometimes considered beyond the scope of their research and other numerous responsibilities. Getting researchers to adopt a new system of data reporting and curation requires a fundamental change in behavior among all members of the research community. To solve these challenges, we have created a novel scholarly communication platform that captures data from researchers and directly delivers them to information resources via Micropublication. This platform incentivizes authors to publish their unpublished observations along with associated metadata by providing a deliberately fast and lightweight but still peer-reviewed process that results in a citable publication. Our long-term goal is to develop a data ecosystem that improves reproducibility and accountability of publicly funded research and in turn accelerates both basic and translational discovery.Database URL:
      PubDate: Fri, 02 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay013
  • BioDataome: a collection of uniformly preprocessed and automatically
           annotated datasets for data-driven biology

    • Authors: Lakiotaki K; Vorniotakis N, Tsagris M, et al.
      Abstract: The biotechnology revolution generates a plethora of omics data at an exponentially growing pace. Therefore, biological data mining demands automatic, ‘high quality’ curation efforts to organize biomedical knowledge into online databases. BioDataome is a database of uniformly preprocessed and disease-annotated omics data that aims to promote and accelerate the reuse of public data. We followed the same preprocessing pipeline for each biological mart (microarray gene expression, RNA-Seq gene expression and DNA methylation) to produce datasets ready for downstream analysis and automatically annotated them with disease-ontology terms. We also designate datasets that share common samples and automatically discover control samples in case-control studies. Currently, BioDataome includes ∼5600 datasets and ∼260 000 samples spanning ∼500 diseases, and can be easily used in large-scale experiments and meta-analyses. All datasets are publicly available for querying and downloading via the BioDataome web application. We demonstrate BioDataome’s utility by presenting exploratory data analysis examples. We have also developed the BioDataome R package, found in: URL:
      PubDate: Fri, 02 Mar 2018 00:00:00 GMT
      DOI: 10.1093/database/bay011
  • AntiTbPdb: a knowledgebase of anti-tubercular peptides

    • Authors: Usmani S; Kumar R, Kumar V, et al.
      Abstract: Tuberculosis is a global menace, caused by Mycobacterium tuberculosis, responsible for millions of premature deaths every year. In the era of drug-resistant tuberculosis, peptide-based therapeutics may provide an alternative to small-molecule-based drugs. In order to create the knowledgebase AntiTbPdb, experimentally validated anti-tubercular and anti-mycobacterial peptides were compiled from the literature. We curated 10 652 research articles and 35 patents to extract anti-tubercular peptides and annotated these peptides manually. This knowledgebase has 1010 entries; each entry provides extensive information about an anti-tubercular peptide, such as sequence, chemical modification, chirality, nature and source of origin. The tertiary structure of these anti-tubercular peptides, containing natural as well as chemically modified residues, was predicted using PEPstrMOD and I-TASSER. In addition to structural information, the database maintains other properties of the peptides, such as physicochemical properties. Numerous web-based tools have been integrated for data retrieval, browsing, sequence similarity search and peptide mapping. In order to assist a wide range of users, we developed a responsive website suitable for smartphone, tablet and desktop. Database URL:
      PubDate: Wed, 28 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay025
  • miRwayDB: a database for experimentally validated microRNA-pathway
           associations in pathophysiological conditions

    • Authors: Das S; Saha P, Chakravorty N.
      Abstract: MicroRNAs (miRNAs) are well known as key regulators of diverse biological pathways. A body of experimental evidence has shown that abnormal miRNA expression profiles are responsible for various pathophysiological conditions by modulating genes in disease-associated pathways. In spite of the rapid increase in research data confirming such associations, scientists still do not have access to a consolidated database offering these miRNA-pathway association details for critical diseases. We have developed miRwayDB, a database providing comprehensive information on experimentally validated miRNA-pathway associations in various pathophysiological conditions, utilizing data collected from published literature. To the best of our knowledge, it is the first database that provides information about experimentally validated miRNA-mediated pathway dysregulation as seen specifically in critical human diseases and hence indicative of a cause-and-effect relationship in most cases. The current version of miRwayDB collects an exhaustive list of miRNA-pathway association entries for 76 critical disease conditions by reviewing 663 published articles. Each database entry contains complete information on the name of the pathophysiological condition, associated miRNA(s), experimental sample type(s), regulation pattern (up/down) of the miRNA, pathway association(s), targeted member of the dysregulated pathway(s) and a brief description. In addition, miRwayDB provides miRNA, gene and pathway scores to evaluate the role of miRNA-regulated pathways in various pathophysiological conditions. The database can also be used for other biomedical approaches such as validation of computational analysis, integrated analysis and prediction of computational models. It also offers a submission page to submit novel data from recently published studies. We believe that miRwayDB will be a useful tool for the miRNA research community. Database URL:
      PubDate: Wed, 28 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay023
  • Prevention of data duplication for high throughput sequencing repositories

    • Authors: Gabdank I; Chan E, Davidson J, et al.
      Abstract: Prevention of unintended duplication is one of the ongoing challenges many databases have to address. With high-throughput sequencing data, the complexity of that challenge increases with the complexity of the definition of a duplicate. In a computational data model, a data object represents a real entity like a reagent or a biosample. This representation is similar to how a card represents a book in a paper library catalog. Duplicated data objects not only waste storage, they can mislead users into assuming the model represents more than the single entity. Even if it is clear that two objects represent a single entity, data duplication opens the door to potential inconsistencies between the objects, since the content of the duplicated objects can be updated independently, allowing divergence of the metadata associated with the objects. This is analogous to a situation in which a paper library catalog mistakenly contains two cards for a single copy of a book. If these cards simultaneously list two different individuals as the current borrower, it would be difficult to determine which of the two actually has the book. Unfortunately, in a large database with multiple submitters, unintended duplication is to be expected. In this article, we present three principal guidelines the Encyclopedia of DNA Elements (ENCODE) Portal follows in order to prevent unintended duplication of both actual files and data objects: definition of identifiable data objects (I), object uniqueness validation (II) and a de-duplication mechanism (III). In addition to explaining our modus operandi, we elaborate on the methods used for identification of sequencing data files. A comparison of the approach taken by the ENCODE Portal with other widely used biological data repositories is provided. Database URL:
      PubDate: Tue, 27 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay008
  • Updated regulation curation model at the Saccharomyces Genome Database

    • Authors: Engel S; Skrzypek M, Hellerstedt S, et al.
      Abstract: The Saccharomyces Genome Database (SGD) provides comprehensive, integrated biological information for the budding yeast Saccharomyces cerevisiae, along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms. We have recently expanded our data model for regulation curation to address regulation at the protein level in addition to transcription, and are presenting the expanded data on the ‘Regulation’ pages at SGD. These pages include a summary describing the context under which the regulator acts, manually curated and high-throughput annotations showing the regulatory relationships for that gene and a graphical visualization of its regulatory network and connected networks. For genes whose products regulate other genes or proteins, the Regulation page includes Gene Ontology enrichment analysis of the biological processes in which those targets participate. For DNA-binding transcription factors, we also provide other information relevant to their regulatory function, such as DNA binding site motifs and protein domains. As with other data types at SGD, all regulatory relationships and accompanying data are available through YeastMine, SGD’s data warehouse based on InterMine.
      PubDate: Tue, 27 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay007
  • The NCBI BioCollections Database

    • Authors: Sharma S; Ciufo S, Starchenko E, et al.
      Abstract: The rapidly growing set of GenBank submissions includes sequences that are derived from vouchered specimens. These are associated with culture collections, museums, herbaria and other natural history collections, both living and preserved. Correct identification of the specimens studied, along with a method to associate the sample with its institution, is critical to the outcome of related studies and analyses. The National Center for Biotechnology Information BioCollections Database was established to allow the association of specimen vouchers and related sequence records to their home institutions. This process also allows cross-linking from the home institution for quick identification of all records originating from each collection.
      PubDate: Fri, 23 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay006
  • TransAtlasDB: an integrated database connecting expression data, metadata
           and variants

    • Authors: Adetunji M; Lamont S, Schmidt C.
      Abstract: High-throughput transcriptome sequencing (RNAseq) is the universally applied method for target-free transcript identification and gene expression quantification, generating huge amounts of data. The constraint of accessing such data and interpreting results can be a major impediment in postulating suitable hypotheses; thus, an innovative storage solution that addresses limitations such as hard disk storage requirements, efficiency and reproducibility is paramount. By offering a uniform data storage and retrieval mechanism, various data can be compared and easily investigated. We present a sophisticated system, TransAtlasDB, which incorporates a hybrid architecture of both relational and NoSQL databases for fast and efficient data storage, processing and querying of large datasets from transcript expression analysis with corresponding metadata, as well as gene-associated variants (such as SNPs) and their predicted gene effects. TransAtlasDB provides a data model for accurate storage of the large amount of data derived from RNAseq analysis, as well as methods of interacting with the database, either via command-line data management workflows written in Perl, whose functionalities simplify the storage and manipulation of the massive amounts of data generated from RNAseq analysis, or through the web interface. The database application is currently modeled to handle analysis data from agricultural species, and will be expanded to include more species groups. Overall, TransAtlasDB aims to serve as an accessible repository for the large, complex result data files derived from RNAseq gene expression profiling and variant analysis.
      PubDate: Fri, 23 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay014
  • AllerGAtlas 1.0: a human allergy-related genes database

    • Authors: Liu J; Liu Y, Wang D, et al.
      Abstract: Allergy is a detrimental hypersensitive response to innocuous environmental antigens, caused by the interaction between environmental factors and multiple genetic predispositions. In the past decades, hundreds of allergy-related genes have been identified to illustrate the epidemiology and pathogenesis of allergic diseases, which are associated with better endophenotype, novel biomarkers, early-life risk factors and individual differences in treatment responses. However, information on all these allergy-related genes is dispersed across thousands of publications. Here, we present AllerGAtlas, a manually curated database that contains 1195 well-annotated human allergy-related genes identified by text mining and manual curation. AllerGAtlas will be a valuable bioinformatics resource for searching human allergy-related genes and exploring their functions in allergy for experimental research.
      PubDate: Thu, 22 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay010
  • dbDEPC 3.0: the database of differentially expressed proteins in human
           cancer with multi-level annotation and drug indication

    • Authors: Yang Q; Zhang Y, Cui H, et al.
      Abstract: Proteins are major effectors of biological functions, and differentially expressed proteins (DEPs) are widely reported as biomarkers for pathological mechanisms, prognosis prediction and treatment targeting in cancer research. High-throughput mass spectrometry (MS) technology has identified large numbers of DEPs in human cancers. By mining published research with detailed experimental information, dbDEPC was the first database aiming to provide a systematic resource for the storage and query of the DEPs generated by MS in cancer research. It was updated to dbDEPC 2.0 in 2012. Here, we provide another updated version of dbDEPC, with improved database contents and an enhanced web interface. The current version, dbDEPC 3.0, contains 11 669 unique DEPs in 26 different cancer types. Multi-level annotations of DEPs are introduced for the first time in this version, including cancer-related peptide amino acid variations, post-translational modifications and drug information. Moreover, these multi-level annotations can be displayed in biological networks, which can benefit integrative analysis. Finally, an online enrichment analysis tool has been developed to support KEGG enrichment analysis and to browse the relationships between a protein list of interest and known DEPs in KEGG pathways. In summary, dbDEPC 3.0 provides a comprehensive resource for accessing integrated and highly annotated DEPs in human cancer.
      PubDate: Thu, 22 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay015
  • Identification of errors in the IEDB using ontologies

    • Authors: Vita R; Overton J, Peters B.
      Abstract: The Immune Epitope Database (IEDB) is a free online resource that has manually curated over 18 500 references from the scientific literature. Our database presents experimental data relating to the recognition of immune epitopes by the adaptive immune system in a structured, searchable manner. In order to be consistent and accurate in our data representation across many different journals, authors and curators, we have implemented several quality control measures, such as curation rules, controlled vocabularies and links to external ontologies and other resources. Ontologies and other resources have greatly benefited the IEDB through improved search interfaces, easier curation practices, interoperability between the IEDB and other databases and the identification of errors within our dataset. Here, we will elaborate on how ontology mapping and usage can be used to find and correct errors in a manually curated database.
      PubDate: Thu, 22 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay005
  • GAN: a platform of genomics and genetics analysis and application in
           Nicotiana

    • Authors: Yang S; Zhang X, Li H, et al.
      Abstract: Nicotiana is an important Solanaceae genus, and plays a significant role in modern biological research. Massive Nicotiana biological data have emerged from in-depth genomics and genetics studies. From big data to big discovery, large-scale analysis and application with new platforms is critical. Based on data accumulation, a comprehensive platform of Genomics and Genetics Analysis and Application in Nicotiana (GAN) has been developed and is publicly available. GAN consists of four main sections: (i) Sources: a total of 5267 germplasm lines, along with detailed descriptions of associated characteristics, are available on the Germplasm page, which can be queried using eight different inquiry modes. Seven fully sequenced species with accompanying sequences and detailed genomic annotation are available on the Genomics page. (ii) Genetics: detailed descriptions of 10 genetic linkage maps constructed from different parents, 2239 KEGG metabolic pathway maps and 209 945 gene families across all catalogued genes, along with two co-linearity maps combining N. tabacum with available tomato and potato linkage maps, are available here. Furthermore, 3 963 119 genome-SSRs, 10 621 016 SNPs, 12 388 PIPs and 102 895 reverse transcription-polymerase chain reaction primers are available to be used and searched on the Markers page. (iii) Tools: the genome browser JBrowse and five useful online bioinformatics tools, Blast, Primer3, SSR-detect, Nucl-Protein and E-PCR, are provided on the JBrowse and Tools pages. (iv) Auxiliary: all the datasets are shown on a Statistics page, and are available for download on a Download page. In addition, the user’s manual is provided on a Manual page in English and Chinese. GAN provides a user-friendly Web interface for searching, browsing and downloading the genomics and genetics datasets in Nicotiana. As far as we can ascertain, GAN is the most comprehensive source of bio-data available, and the most applicable resource for breeding, gene mapping, gene cloning, the study of the origin and evolution of polyploidy, and related studies in Nicotiana.
      PubDate: Wed, 21 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay001
  • FAIR principles and the IEDB: short-term improvements and a long-term
           vision of OBO-foundry mediated machine-actionable interoperability

    • Authors: Vita R; Overton J, Mungall C, et al.
      Abstract: The Immune Epitope Database (IEDB) has the mission to make published experimental data relating to the recognition of immune epitopes easily available to the scientific public. By presenting curated data in a searchable database, we have liberated it from the tables and figures of journal articles, making it more accessible and usable by immunologists. Recently, the principles of Findability, Accessibility, Interoperability and Reusability have been formulated as goals that data repositories should meet to enhance the usefulness of their data holdings. We here examine how the IEDB complies with these principles and identify broad areas of success, but also areas for improvement. We describe short-term improvements to the IEDB that are being implemented now, as well as a long-term vision of true ‘machine-actionable interoperability’, which we believe will require community agreement on standardization of knowledge representation that can be built on top of the shared use of ontologies.
      PubDate: Mon, 19 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bax105
  • miRToolsGallery: a tag-based and rankable microRNA bioinformatics
           resources database portal

    • Authors: Chen L; Heikkinen L, Wang C, et al.
      Abstract: Hundreds of bioinformatics tools have been developed for MicroRNA (miRNA) investigations including those used for identification, target prediction, structure and expression profile analysis. However, finding the correct tool for a specific application requires the tedious and laborious process of locating, downloading, testing and validating the appropriate tool from a group of nearly a thousand. In order to facilitate this process, we developed a novel database portal named miRToolsGallery. We constructed the portal by manually curating > 950 miRNA analysis tools and resources. In the portal, a query to locate the appropriate tool is expedited by being searchable, filterable and rankable. The ranking feature is vital to quickly identify and prioritize the more useful from the obscure tools. Tools are ranked via different criteria including the PageRank algorithm, date of publication, number of citations, average of votes and number of publications. miRToolsGallery provides links and data for the comprehensive collection of currently available miRNA tools with a ranking function which can be adjusted using different criteria according to specific requirements.
      PubDate: Mon, 19 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay004
  • Fungal Stress Database (FSD)––a repository of fungal stress
           physiological data

    • Authors: Orosz E; van de Wiele N, Emri T, et al.
      Abstract: The construction of the Fungal Stress Database (FSD) was initiated and fueled by two major goals. First, some outstandingly important groups of filamentous fungi, including the aspergilli, possess remarkable capabilities to adapt to a wide spectrum of environmental stress conditions, but the underlying mechanisms of this stress tolerance remain to be elucidated. Furthermore, the lack of any satisfactory interlaboratory standardization of stress assays, e.g. the widely used stress agar plate experiments, often hinders the direct comparison and discussion of stress physiological data gained for various fungal species by different research groups. In order to overcome these difficulties and to promote multilevel, e.g. combined comparative physiology-based and comparative genomics-based, stress research in filamentous fungi, we constructed FSD, which currently stores 1412 photos of Aspergillus colonies grown under precisely defined stress conditions. In total, this study involved 18 Aspergillus strains representing 17 species, with two different strains for Aspergillus niger, and covered six different stress conditions. Stress treatments were selected considering the frequency of various stress tolerance studies published in the last decade in the aspergilli and included oxidative (H2O2, menadione sodium bisulphite), high-osmolarity (NaCl, sorbitol), cell wall integrity (Congo Red) and heavy metal (CdCl2) stress exposures. In the future, we would like to expand this database to accommodate further fungal species and stress treatments.
      PubDate: Mon, 12 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay009
  • OliveNet™: a comprehensive library of compounds from Olea europaea

    • Authors: Bonvino N; Liang J, McCord E, et al.
      Abstract: Accumulated epidemiological, clinical and experimental evidence has indicated the beneficial health effects of the Mediterranean diet, which is typified by the consumption of virgin olive oil (VOO) as a main source of dietary fat. At the cellular level, compounds derived from various olive (Olea europaea) matrices have demonstrated potent antioxidant and anti-inflammatory effects, which are thought to account, at least in part, for their biological effects. Research efforts are expanding into the characterization of compounds derived from Olea europaea; however, the considerable diversity and complexity of the vast array of chemical compounds have made their precise identification and quantification challenging. As such, only a relatively small subset of olive-derived compounds has been explored for their biological activity and potential health effects to date. Although there is adequate information describing the identification or isolation of olive-derived compounds, this information is not easily searchable, especially when attempting to acquire chemical or biological properties. Therefore, we have created the OliveNet™ database containing a comprehensive catalogue of compounds identified from matrices of the olive, including the fruit, leaf and VOO, as well as in the wastewater and pomace accrued during oil production. From a total of 752 compounds, chemical analysis was sufficient for 676 individual compounds, which have been included in the database. The database is curated and comprehensively referenced, containing information for the 676 compounds, which are divided into 13 main classes and 47 subclasses. Importantly, with respect to current research trends, the database includes 222 olive phenolics, which are divided into 13 subclasses. To our knowledge, OliveNet™ is currently the only curated open access database with a comprehensive collection of compounds associated with Olea europaea.
      PubDate: Mon, 12 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay016
  • TISSUES 2.0: an integrative web resource on mammalian tissue expression

    • Authors: Palasca O; Santos A, Stolte C, et al.
      Abstract: Physiological and molecular similarities between organisms make it possible to translate findings from simpler experimental systems (model organisms) into more complex ones, such as human. This translation facilitates the understanding of biological processes under normal or disease conditions. Researchers aiming to identify the similarities and differences between organisms at the molecular level need resources collecting multi-organism tissue expression data. We have developed a database of gene–tissue associations in human, mouse, rat and pig by integrating multiple sources of evidence: transcriptomics covering all four species and proteomics (human only), manually curated and mined from the scientific literature. Through a scoring scheme, these associations are made comparable across all sources of evidence and across organisms. Furthermore, the scoring produces a confidence score assigned to each of the associations. The TISSUES database (version 2.0) is publicly accessible through a user-friendly web interface and as part of the STRING app for Cytoscape. In addition, we analyzed the agreement between datasets, across and within organisms, and identified that the agreement is mainly affected by the quality of the datasets rather than by the technologies used or organisms compared.
      PubDate: Mon, 12 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay003
  • Worldwide Protein Data Bank biocuration supporting open access to
           high-quality 3D structural biology data

    • Authors: Young J; Westbrook J, Feng Z, et al.
      Abstract: The Protein Data Bank (PDB) is the single global repository for experimentally determined 3D structures of biological macromolecules and their complexes with ligands. The worldwide PDB (wwPDB) is the international collaboration that manages the PDB archive according to the FAIR principles: Findability, Accessibility, Interoperability and Reusability. The wwPDB recently developed OneDep, a unified tool for deposition, validation and biocuration of structures of biological macromolecules. All data deposited to the PDB undergo critical review by wwPDB Biocurators. This article outlines the importance of biocuration for structural biology data deposited to the PDB and describes wwPDB biocuration processes and the role of expert Biocurators in sustaining a high-quality archive. Structural data submitted to the PDB are examined for self-consistency, standardized using controlled vocabularies, cross-referenced with other biological data resources and validated for scientific/technical accuracy. We illustrate how biocuration is integral to PDB data archiving, as it facilitates accurate, consistent and comprehensive representation of biological structure data, allowing efficient and effective usage by research scientists, educators, students and the curious public worldwide.
      PubDate: Wed, 07 Feb 2018 00:00:00 GMT
      DOI: 10.1093/database/bay002
  • FishTEDB: a collective database of transposable elements identified in the
           complete genomes of fish

    • Authors: Shao F; Wang J, Xu H, et al.
      Abstract: Transposable elements (TEs) are important for host gene regulation and genome evolution. Consensus sequences of TEs can assist investigators in accelerating studies on TE origins, amplification, functions and evolution, as well as comparative analyses and prediction of TEs in different species. In evolution, physiology, ecology and heredity research, fish are important models. However, to date, no comprehensive resource for TE consensus sequences exists for fish. Here, we collected genome-wide data and developed a novel database, FishTEDB, including 27 bony fishes, 1 cartilaginous fish, 1 lamprey and 1 lancelet. De novo, structure-based and homology-based approaches were combined to detect TEs. The database is open-source and user-friendly, and users can browse, search and download all data. FishTEDB also provides GetORF, BLAST and HMMER tools to analyze sequences.
      PubDate: Tue, 16 Jan 2018 00:00:00 GMT
      DOI: 10.1093/database/bax106
  • CellExpress: a comprehensive microarray-based cancer cell line and
           clinical sample gene expression analysis online system

    • Authors: Lee Y; Lee C, Lai L, et al.
      Abstract: With the advancement of high-throughput technologies, gene expression profiles in cell lines and clinical samples are widely available in the public domain for research. However, a challenge arises when trying to perform a systematic and comprehensive analysis across independent datasets. To address this issue, we developed a web-based system, CellExpress, for analyzing the gene expression levels in more than 4000 cancer cell lines and clinical samples obtained from public datasets and user-submitted data. First, a normalization algorithm can be utilized to reduce the systematic biases across independent datasets. Next, a similarity assessment of gene expression profiles can be achieved through a dynamic dot plot, along with a distance matrix obtained from principal component analysis. Subsequently, differentially expressed genes can be visualized using hierarchical clustering. Several statistical tests and analytical algorithms are implemented in the system for dissecting gene expression changes based on the groupings defined by users. Lastly, users are able to upload their own microarray and/or next-generation sequencing data to perform a comparison of their gene expression patterns, which can help classify user data, such as stem cells, into different tissue types. In conclusion, CellExpress is a user-friendly tool that provides a comprehensive analysis of gene expression levels in both cell lines and clinical samples. The website is freely available, and the source code is available under the MIT License.
      PubDate: Fri, 12 Jan 2018 00:00:00 GMT
      DOI: 10.1093/database/bax101
  • A generic workflow for effective sampling of environmental vouchers with
           UUID assignment and image processing

    • Authors: Triebel D; Reichert W, Bosert S, et al.
      Abstract: Sampling of biological and environmental vouchers in the field is rather challenging, particularly under adverse habitat conditions and when various activities need to be handled simultaneously. The workflow described here includes five procedural steps, which result in professional sampling and the generation of universally identifiable data. In preparation for the field campaign, sample containers need to be labelled with universally unique identifier (UUID) QR codes. At the collection site, labelled containers, sampled material and attached supplementary information are imaged using a GNSS- or GPS-enabled smartphone or camera. Image processing, tagging and data storage as a CSV text file are subsequently carried out in a field station or laboratory. For this purpose, the newly implemented tool DiversityImageInspector is used. It addresses combined image and data processing in such a context, including the extraction of the QR-coded UUID from the image content and the extraction of geodata and time information from the Exif image header. The import of the resulting data files into a relational database or another kind of data management system is optional but recommended. If applied, the import might be guided by a data transformation tool with a compliant schema as described here. The new approach is also discussed with regard to implications for virtual research environments and data publication networks.
      PubDate: Tue, 09 Jan 2018 00:00:00 GMT
      DOI: 10.1093/database/bax096
  • YAAM: Yeast Amino Acid Modifications Database

    • Authors: Ledesma L; Sandoval E, Cruz-Martínez U, et al.
      Abstract: Proteins are dynamic molecules that regulate a myriad of cellular functions; these functions may be regulated by protein post-translational modifications (PTMs) that mediate the activity, localization and interaction partners of proteins. Thus, understanding the meaning of a single PTM or the combination of several of them is essential to unravel the mechanisms of protein regulation. Yeast Amino Acid Modification (YAAM) is a comprehensive database that contains information on 121 921 protein residues that are post-translationally modified in the yeast model Saccharomyces cerevisiae. All the PTMs contained in YAAM have been confirmed experimentally. The YAAM database maps PTM residues in a 3D canvas for 680 proteins with a known 3D structure. The structure can be visualized and manipulated using the most common web browsers without the need for any additional plugin. The aim of our database is to retrieve and organize data about the location of modified amino acids, providing information in a concise but comprehensive and user-friendly way and enabling users to find relevant information on PTMs. Given that PTMs influence almost all aspects of the biology of both healthy and diseased cells, identifying and understanding PTMs is critical in the study of molecular and cell biology. YAAM allows users to perform multiple searches, with up to three modifications at the same residue, making it possible to explore possible regulatory mechanisms for some proteins. Using the YAAM search engine, we found three different PTMs of lysine residues involved in protein translation. This suggests an important regulatory mechanism for protein translation that needs to be further studied.
      PubDate: Tue, 09 Jan 2018 00:00:00 GMT
      DOI: 10.1093/database/bax099
  • To increase trust, change the social design behind aggregated biodiversity
           data

    • Authors: Franz N; Sterner B.
      Abstract: Growing concerns about the quality of aggregated biodiversity data are lowering trust in large-scale data networks. Aggregators frequently respond to quality concerns by recommending that biologists work with original data providers to correct errors ‘at the source.’ We show that this strategy falls systematically short of a full diagnosis of the underlying causes of distrust. In particular, trust in an aggregator is not just a feature of the data signal quality provided by the sources to the aggregator, but also a consequence of the social design of the aggregation process and the resulting power balance between individual data contributors and aggregators. The latter have created an accountability gap by downplaying the authorship and significance of the taxonomic hierarchies—frequently called ‘backbones’—they generate, and which are in effect novel classification theories that operate at the core of data-structuring process. The Darwin Core standard for sharing occurrence records plays an under-appreciated role in maintaining the accountability gap, because this standard lacks the syntactic structure needed to preserve the taxonomic coherence of data packages submitted for aggregation, potentially leading to inferences that no individual source would support. Since high-quality data packages can mirror competing and conflicting classifications, i.e. unsettled systematic research, this plurality must be accommodated in the design of biodiversity data integration. Looking forward, a key directive is to develop new technical pathways and social incentives for experts to contribute directly to the validation of taxonomically coherent data packages as part of a greater, trustworthy aggregation process.
      PubDate: Thu, 04 Jan 2018 00:00:00 GMT
      DOI: 10.1093/database/bax100
  • HTT-DB: new features and updates

    • Authors: Dotto B; Carvalho E, da Silva A, et al.
      Abstract: Horizontal Transfer (HT) of genetic material between species is a common phenomenon among Bacteria and Archaea, and several databases are available for information retrieval and data mining. However, little attention has been given to this phenomenon among eukaryotic species, mainly due to the lower proportion of these events. In recent years, a large number of new HT events involving eukaryotic species have been reported in the literature, highlighting the need for a common repository to keep the scientific community up to date and describe overall trends. Recently, we published the first HT database focused on HT of transposable elements among eukaryotes: the Horizontal Transposon Transfer DataBase: Database URL: ( 8080/httdatabase/). Here, we present new features and updates of this unique database: (i) its expansion to include virus-host exchange of genetic material, which we called Horizontal Virus Transfer (HVT) and (ii) the availability of a web server for HT detection, where we implemented the online version of vertical and horizontal inheritance consistence analysis (VHICA), an R package developed for HT detection. These improvements will help researchers to navigate through known HVT cases, make data-informed decisions and export figures based on keyword searches. Moreover, the availability of VHICA as an online tool will make this software easily reachable even for researchers with little or no computational knowledge, and will foster our capability to detect new HT events in a wide variety of taxa.
      PubDate: Thu, 04 Jan 2018 00:00:00 GMT
      DOI: 10.1093/database/bax102
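
The "object uniqueness validation" guideline named in the ENCODE entry above (Gabdank et al.) can be illustrated with a minimal sketch. It assumes that a file is identified by a checksum of its content; the abstract does not say which identifier the ENCODE Portal actually uses, and the names `FileRegistry`, `DuplicateFileError` and `content_checksum` are hypothetical, chosen only for this illustration.

```python
import hashlib


class DuplicateFileError(ValueError):
    """Raised when identical file content is already registered."""


def content_checksum(data: bytes) -> str:
    # Assumption: SHA-256 of the raw bytes serves as the file's identity.
    return hashlib.sha256(data).hexdigest()


class FileRegistry:
    """Hypothetical in-memory registry that refuses duplicate content."""

    def __init__(self) -> None:
        # Maps content checksum -> accession of the first submission.
        self._by_checksum: dict[str, str] = {}

    def register(self, accession: str, data: bytes) -> str:
        checksum = content_checksum(data)
        existing = self._by_checksum.get(checksum)
        if existing is not None:
            # Same bytes under a new accession: reject instead of storing twice.
            raise DuplicateFileError(
                f"{accession} has the same content as {existing}"
            )
        self._by_checksum[checksum] = accession
        return checksum
```

Rejecting the second submission at registration time, rather than cleaning up afterwards, keeps two records for the same content from diverging; this mirrors the two-catalog-cards problem the abstract describes.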