Publisher: Oxford University Press   (Total: 370 journals)

Showing 1 - 200 of 370 Journals sorted alphabetically
Acta Biochimica et Biophysica Sinica     Hybrid Journal   (Followers: 6, SJR: 0.881, h-index: 38)
Adaptation     Hybrid Journal   (Followers: 8, SJR: 0.111, h-index: 4)
Aesthetic Surgery J.     Hybrid Journal   (Followers: 6, SJR: 1.538, h-index: 35)
African Affairs     Hybrid Journal   (Followers: 57, SJR: 1.512, h-index: 46)
Age and Ageing     Hybrid Journal   (Followers: 81, SJR: 1.611, h-index: 107)
Alcohol and Alcoholism     Hybrid Journal   (Followers: 14, SJR: 0.935, h-index: 80)
American Entomologist     Full-text available via subscription   (Followers: 6)
American Historical Review     Hybrid Journal   (Followers: 125, SJR: 0.652, h-index: 43)
American J. of Agricultural Economics     Hybrid Journal   (Followers: 41, SJR: 1.441, h-index: 77)
American J. of Epidemiology     Hybrid Journal   (Followers: 151, SJR: 3.047, h-index: 201)
American J. of Hypertension     Hybrid Journal   (Followers: 18, SJR: 1.397, h-index: 111)
American J. of Jurisprudence     Hybrid Journal   (Followers: 15)
American J. of Legal History     Full-text available via subscription   (Followers: 4, SJR: 0.151, h-index: 7)
American Law and Economics Review     Hybrid Journal   (Followers: 26, SJR: 0.824, h-index: 23)
American Literary History     Hybrid Journal   (Followers: 12, SJR: 0.185, h-index: 22)
Analysis     Hybrid Journal   (Followers: 24)
Annals of Botany     Hybrid Journal   (Followers: 35, SJR: 1.912, h-index: 124)
Annals of Occupational Hygiene     Hybrid Journal   (Followers: 26, SJR: 0.837, h-index: 57)
Annals of Oncology     Hybrid Journal   (Followers: 48, SJR: 4.362, h-index: 173)
Annals of the Entomological Society of America     Full-text available via subscription   (Followers: 8, SJR: 0.642, h-index: 53)
Annals of Work Exposures and Health     Hybrid Journal  
AoB Plants     Open Access   (Followers: 4, SJR: 0.78, h-index: 10)
Applied Economic Perspectives and Policy     Hybrid Journal   (Followers: 19, SJR: 0.884, h-index: 31)
Applied Linguistics     Hybrid Journal   (Followers: 51, SJR: 1.749, h-index: 63)
Applied Mathematics Research eXpress     Hybrid Journal   (Followers: 1, SJR: 0.779, h-index: 11)
Arbitration Intl.     Full-text available via subscription   (Followers: 19)
Arbitration Law Reports and Review     Hybrid Journal   (Followers: 12)
Archives of Clinical Neuropsychology     Hybrid Journal   (Followers: 26, SJR: 0.96, h-index: 71)
Aristotelian Society Supplementary Volume     Hybrid Journal   (Followers: 2, SJR: 0.102, h-index: 20)
Arthropod Management Tests     Hybrid Journal   (Followers: 2)
Astronomy & Geophysics     Hybrid Journal   (Followers: 47, SJR: 0.144, h-index: 15)
Behavioral Ecology     Hybrid Journal   (Followers: 46, SJR: 1.698, h-index: 92)
Bioinformatics     Hybrid Journal   (Followers: 231, SJR: 4.643, h-index: 271)
Biology Methods and Protocols     Hybrid Journal  
Biology of Reproduction     Full-text available via subscription   (Followers: 9, SJR: 1.646, h-index: 149)
Biometrika     Hybrid Journal   (Followers: 19, SJR: 2.801, h-index: 90)
BioScience     Hybrid Journal   (Followers: 28, SJR: 2.374, h-index: 154)
Bioscience Horizons : The National Undergraduate Research J.     Open Access   (Followers: 1, SJR: 0.213, h-index: 9)
Biostatistics     Hybrid Journal   (Followers: 16, SJR: 1.955, h-index: 55)
BJA : British J. of Anaesthesia     Hybrid Journal   (Followers: 134, SJR: 2.314, h-index: 133)
BJA Education     Hybrid Journal   (Followers: 65, SJR: 0.272, h-index: 20)
Brain     Hybrid Journal   (Followers: 61, SJR: 6.097, h-index: 264)
Briefings in Bioinformatics     Hybrid Journal   (Followers: 45, SJR: 4.086, h-index: 73)
Briefings in Functional Genomics     Hybrid Journal   (Followers: 4, SJR: 1.771, h-index: 50)
British J. for the Philosophy of Science     Hybrid Journal   (Followers: 33, SJR: 1.267, h-index: 38)
British J. of Aesthetics     Hybrid Journal   (Followers: 25, SJR: 0.217, h-index: 18)
British J. of Criminology     Hybrid Journal   (Followers: 502, SJR: 1.373, h-index: 62)
British J. of Social Work     Hybrid Journal   (Followers: 80, SJR: 0.771, h-index: 53)
British Medical Bulletin     Hybrid Journal   (Followers: 7, SJR: 1.391, h-index: 84)
British Yearbook of Intl. Law     Hybrid Journal   (Followers: 26)
Bulletin of the London Mathematical Society     Hybrid Journal   (Followers: 3, SJR: 1.474, h-index: 31)
Cambridge J. of Economics     Hybrid Journal   (Followers: 54, SJR: 0.957, h-index: 59)
Cambridge J. of Regions, Economy and Society     Hybrid Journal   (Followers: 9, SJR: 1.067, h-index: 22)
Cambridge Quarterly     Hybrid Journal   (Followers: 10, SJR: 0.1, h-index: 7)
Capital Markets Law J.     Hybrid Journal  
Carcinogenesis     Hybrid Journal   (Followers: 2, SJR: 2.439, h-index: 167)
Cardiovascular Research     Hybrid Journal   (Followers: 11, SJR: 2.897, h-index: 175)
Cerebral Cortex     Hybrid Journal   (Followers: 39, SJR: 4.827, h-index: 192)
CESifo Economic Studies     Hybrid Journal   (Followers: 16, SJR: 0.501, h-index: 19)
Chemical Senses     Hybrid Journal   (Followers: 1, SJR: 1.436, h-index: 76)
Children and Schools     Hybrid Journal   (Followers: 6, SJR: 0.211, h-index: 18)
Chinese J. of Comparative Law     Hybrid Journal   (Followers: 3)
Chinese J. of Intl. Law     Hybrid Journal   (Followers: 19, SJR: 0.737, h-index: 11)
Chinese J. of Intl. Politics     Hybrid Journal   (Followers: 8, SJR: 1.238, h-index: 15)
Christian Bioethics: Non-Ecumenical Studies in Medical Morality     Hybrid Journal   (Followers: 11, SJR: 0.191, h-index: 8)
Classical Receptions J.     Hybrid Journal   (Followers: 17, SJR: 0.1, h-index: 3)
Clinical Infectious Diseases     Hybrid Journal   (Followers: 59, SJR: 4.742, h-index: 261)
Clinical Kidney J.     Open Access   (Followers: 4, SJR: 0.338, h-index: 19)
Community Development J.     Hybrid Journal   (Followers: 24, SJR: 0.47, h-index: 28)
Computer J.     Hybrid Journal   (Followers: 8, SJR: 0.371, h-index: 47)
Conservation Physiology     Open Access   (Followers: 2)
Contemporary Women's Writing     Hybrid Journal   (Followers: 11, SJR: 0.111, h-index: 3)
Contributions to Political Economy     Hybrid Journal   (Followers: 6, SJR: 0.313, h-index: 10)
Critical Values     Full-text available via subscription  
Current Legal Problems     Hybrid Journal   (Followers: 25)
Current Zoology     Full-text available via subscription   (SJR: 0.999, h-index: 20)
Database : The J. of Biological Databases and Curation     Open Access   (Followers: 11, SJR: 1.068, h-index: 24)
Digital Scholarship in the Humanities     Hybrid Journal   (Followers: 12)
Diplomatic History     Hybrid Journal   (Followers: 18, SJR: 0.296, h-index: 22)
DNA Research     Open Access   (Followers: 4, SJR: 2.42, h-index: 77)
Dynamics and Statistics of the Climate System     Open Access   (Followers: 3)
Early Music     Hybrid Journal   (Followers: 13, SJR: 0.124, h-index: 11)
Economic Policy     Hybrid Journal   (Followers: 52, SJR: 2.052, h-index: 52)
ELT J.     Hybrid Journal   (Followers: 25, SJR: 1.26, h-index: 23)
English Historical Review     Hybrid Journal   (Followers: 46, SJR: 0.311, h-index: 10)
English: J. of the English Association     Hybrid Journal   (Followers: 12, SJR: 0.144, h-index: 3)
Environmental Entomology     Full-text available via subscription   (Followers: 11, SJR: 0.791, h-index: 66)
Environmental Epigenetics     Open Access   (Followers: 1)
Environmental History     Hybrid Journal   (Followers: 25, SJR: 0.197, h-index: 25)
EP-Europace     Hybrid Journal   (Followers: 2, SJR: 2.201, h-index: 71)
Epidemiologic Reviews     Hybrid Journal   (Followers: 9, SJR: 3.917, h-index: 81)
ESHRE Monographs     Hybrid Journal  
Essays in Criticism     Hybrid Journal   (Followers: 15, SJR: 0.1, h-index: 6)
European Heart J.     Hybrid Journal   (Followers: 47, SJR: 6.997, h-index: 227)
European Heart J. - Cardiovascular Imaging     Hybrid Journal   (Followers: 9, SJR: 2.044, h-index: 58)
European Heart J. - Cardiovascular Pharmacotherapy     Full-text available via subscription   (Followers: 1)
European Heart J. - Quality of Care and Clinical Outcomes     Hybrid Journal  
European Heart J. Supplements     Hybrid Journal   (Followers: 8, SJR: 0.152, h-index: 31)
European J. of Cardio-Thoracic Surgery     Hybrid Journal   (Followers: 7, SJR: 1.568, h-index: 104)
European J. of Intl. Law     Hybrid Journal   (Followers: 147, SJR: 0.722, h-index: 38)
European J. of Orthodontics     Hybrid Journal   (Followers: 4, SJR: 1.09, h-index: 60)
European J. of Public Health     Hybrid Journal   (Followers: 22, SJR: 1.284, h-index: 64)
European Review of Agricultural Economics     Hybrid Journal   (Followers: 12, SJR: 1.549, h-index: 42)
European Review of Economic History     Hybrid Journal   (Followers: 26, SJR: 0.628, h-index: 24)
European Sociological Review     Hybrid Journal   (Followers: 37, SJR: 2.061, h-index: 53)
Evolution, Medicine, and Public Health     Open Access   (Followers: 11)
Family Practice     Hybrid Journal   (Followers: 13, SJR: 1.048, h-index: 77)
FEMS Microbiology Ecology     Hybrid Journal   (Followers: 8, SJR: 1.687, h-index: 115)
FEMS Microbiology Letters     Hybrid Journal   (Followers: 20, SJR: 1.126, h-index: 118)
FEMS Microbiology Reviews     Hybrid Journal   (Followers: 25, SJR: 7.587, h-index: 150)
FEMS Yeast Research     Hybrid Journal   (Followers: 13, SJR: 1.213, h-index: 66)
Foreign Policy Analysis     Hybrid Journal   (Followers: 21, SJR: 0.859, h-index: 10)
Forestry: An Intl. J. of Forest Research     Hybrid Journal   (Followers: 16, SJR: 0.903, h-index: 44)
Forum for Modern Language Studies     Hybrid Journal   (Followers: 6, SJR: 0.108, h-index: 6)
French History     Hybrid Journal   (Followers: 30, SJR: 0.123, h-index: 10)
French Studies     Hybrid Journal   (Followers: 19, SJR: 0.119, h-index: 7)
French Studies Bulletin     Hybrid Journal   (Followers: 10, SJR: 0.102, h-index: 3)
Gastroenterology Report     Open Access   (Followers: 2)
Genome Biology and Evolution     Open Access   (Followers: 10, SJR: 3.22, h-index: 39)
Geophysical J. Intl.     Hybrid Journal   (Followers: 31, SJR: 1.839, h-index: 119)
German History     Hybrid Journal   (Followers: 25, SJR: 0.437, h-index: 13)
GigaScience     Open Access   (Followers: 3)
Global Summitry     Hybrid Journal  
Glycobiology     Hybrid Journal   (Followers: 14, SJR: 1.692, h-index: 101)
Health and Social Work     Hybrid Journal   (Followers: 45, SJR: 0.505, h-index: 40)
Health Education Research     Hybrid Journal   (Followers: 12, SJR: 0.814, h-index: 80)
Health Policy and Planning     Hybrid Journal   (Followers: 21, SJR: 1.628, h-index: 66)
Health Promotion Intl.     Hybrid Journal   (Followers: 20, SJR: 0.664, h-index: 60)
History Workshop J.     Hybrid Journal   (Followers: 27, SJR: 0.313, h-index: 20)
Holocaust and Genocide Studies     Hybrid Journal   (Followers: 23, SJR: 0.115, h-index: 13)
Human Molecular Genetics     Hybrid Journal   (Followers: 9, SJR: 4.288, h-index: 233)
Human Reproduction     Hybrid Journal   (Followers: 76, SJR: 2.271, h-index: 179)
Human Reproduction Update     Hybrid Journal   (Followers: 16, SJR: 4.678, h-index: 128)
Human Rights Law Review     Hybrid Journal   (Followers: 58, SJR: 0.7, h-index: 21)
ICES J. of Marine Science: J. du Conseil     Hybrid Journal   (Followers: 54, SJR: 1.233, h-index: 88)
ICSID Review     Hybrid Journal   (Followers: 9)
ILAR J.     Hybrid Journal   (Followers: 1, SJR: 1.099, h-index: 51)
IMA J. of Applied Mathematics     Hybrid Journal   (SJR: 0.329, h-index: 26)
IMA J. of Management Mathematics     Hybrid Journal   (Followers: 2, SJR: 0.351, h-index: 20)
IMA J. of Mathematical Control and Information     Hybrid Journal   (Followers: 2, SJR: 0.661, h-index: 28)
IMA J. of Numerical Analysis - advance access     Hybrid Journal   (SJR: 2.032, h-index: 44)
Industrial and Corporate Change     Hybrid Journal   (Followers: 8, SJR: 1.37, h-index: 81)
Industrial Law J.     Hybrid Journal   (Followers: 29, SJR: 0.184, h-index: 15)
Information and Inference     Free  
Integrative and Comparative Biology     Hybrid Journal   (Followers: 8, SJR: 1.911, h-index: 90)
Interacting with Computers     Hybrid Journal   (Followers: 10, SJR: 0.529, h-index: 59)
Interactive CardioVascular and Thoracic Surgery     Hybrid Journal   (Followers: 4, SJR: 0.743, h-index: 35)
Intl. Affairs     Hybrid Journal   (Followers: 51, SJR: 1.264, h-index: 53)
Intl. Data Privacy Law     Hybrid Journal   (Followers: 26)
Intl. Health     Hybrid Journal   (Followers: 4, SJR: 0.835, h-index: 15)
Intl. Immunology     Hybrid Journal   (Followers: 3, SJR: 1.613, h-index: 111)
Intl. J. for Quality in Health Care     Hybrid Journal   (Followers: 32, SJR: 1.593, h-index: 69)
Intl. J. of Constitutional Law     Hybrid Journal   (Followers: 59, SJR: 0.613, h-index: 19)
Intl. J. of Epidemiology     Hybrid Journal   (Followers: 122, SJR: 4.381, h-index: 145)
Intl. J. of Law and Information Technology     Hybrid Journal   (Followers: 3, SJR: 0.247, h-index: 8)
Intl. J. of Law, Policy and the Family     Hybrid Journal   (Followers: 28, SJR: 0.307, h-index: 15)
Intl. J. of Lexicography     Hybrid Journal   (Followers: 8, SJR: 0.404, h-index: 18)
Intl. J. of Low-Carbon Technologies     Open Access   (Followers: 1, SJR: 0.457, h-index: 12)
Intl. J. of Neuropsychopharmacology     Open Access   (Followers: 3, SJR: 1.69, h-index: 79)
Intl. J. of Public Opinion Research     Hybrid Journal   (Followers: 8, SJR: 0.906, h-index: 33)
Intl. J. of Refugee Law     Hybrid Journal   (Followers: 32, SJR: 0.231, h-index: 21)
Intl. J. of Transitional Justice     Hybrid Journal   (Followers: 13, SJR: 0.833, h-index: 12)
Intl. Mathematics Research Notices     Hybrid Journal   (Followers: 1, SJR: 2.052, h-index: 42)
Intl. Political Sociology     Hybrid Journal   (Followers: 26, SJR: 1.339, h-index: 19)
Intl. Relations of the Asia-Pacific     Hybrid Journal   (Followers: 17, SJR: 0.539, h-index: 17)
Intl. Studies Perspectives     Hybrid Journal   (Followers: 7, SJR: 0.998, h-index: 28)
Intl. Studies Quarterly     Hybrid Journal   (Followers: 36, SJR: 2.184, h-index: 68)
Intl. Studies Review     Hybrid Journal   (Followers: 17, SJR: 0.783, h-index: 38)
ISLE: Interdisciplinary Studies in Literature and Environment     Hybrid Journal   (Followers: 1, SJR: 0.155, h-index: 4)
ITNOW     Hybrid Journal   (Followers: 2, SJR: 0.102, h-index: 4)
J. of African Economies     Hybrid Journal   (Followers: 15, SJR: 0.647, h-index: 30)
J. of American History     Hybrid Journal   (Followers: 39, SJR: 0.286, h-index: 34)
J. of Analytical Toxicology     Hybrid Journal   (Followers: 13, SJR: 1.038, h-index: 60)
J. of Antimicrobial Chemotherapy     Hybrid Journal   (Followers: 19, SJR: 2.157, h-index: 149)
J. of Antitrust Enforcement     Hybrid Journal   (Followers: 1)
J. of Applied Poultry Research     Hybrid Journal   (Followers: 3, SJR: 0.563, h-index: 43)
J. of Biochemistry     Hybrid Journal   (Followers: 44, SJR: 1.341, h-index: 96)
J. of Chromatographic Science     Hybrid Journal   (Followers: 16, SJR: 0.448, h-index: 42)
J. of Church and State     Hybrid Journal   (Followers: 11, SJR: 0.167, h-index: 11)
J. of Competition Law and Economics     Hybrid Journal   (Followers: 34, SJR: 0.442, h-index: 16)
J. of Complex Networks     Hybrid Journal   (Followers: 1, SJR: 1.165, h-index: 5)
J. of Conflict and Security Law     Hybrid Journal   (Followers: 11, SJR: 0.196, h-index: 15)
J. of Consumer Research     Full-text available via subscription   (Followers: 40, SJR: 4.896, h-index: 121)
J. of Crohn's and Colitis     Hybrid Journal   (Followers: 9, SJR: 1.543, h-index: 37)
J. of Cybersecurity     Hybrid Journal   (Followers: 3)
J. of Deaf Studies and Deaf Education     Hybrid Journal   (Followers: 8, SJR: 0.69, h-index: 36)
J. of Design History     Hybrid Journal   (Followers: 15, SJR: 0.166, h-index: 14)
J. of Economic Entomology     Full-text available via subscription   (Followers: 6, SJR: 0.894, h-index: 76)
J. of Economic Geography     Hybrid Journal   (Followers: 34, SJR: 2.909, h-index: 69)
J. of Environmental Law     Hybrid Journal   (Followers: 23, SJR: 0.457, h-index: 20)
J. of European Competition Law & Practice     Hybrid Journal   (Followers: 19)
J. of Experimental Botany     Hybrid Journal   (Followers: 14, SJR: 2.798, h-index: 163)
J. of Financial Econometrics     Hybrid Journal   (Followers: 21, SJR: 1.314, h-index: 27)
J. of Global Security Studies     Hybrid Journal   (Followers: 2)
J. of Heredity     Hybrid Journal   (Followers: 3, SJR: 1.024, h-index: 76)
J. of Hindu Studies     Hybrid Journal   (Followers: 7, SJR: 0.186, h-index: 3)
J. of Hip Preservation Surgery     Open Access  
J. of Human Rights Practice     Hybrid Journal   (Followers: 20, SJR: 0.399, h-index: 10)
J. of Infectious Diseases     Hybrid Journal   (Followers: 40, SJR: 4, h-index: 209)
J. of Insect Science     Open Access   (Followers: 9, SJR: 0.388, h-index: 31)

Database : The Journal of Biological Databases and Curation
  [SJR: 1.068]   [H-I: 24]   [11 followers]
  This is an Open Access journal
   ISSN (Online) 1758-0463
   Published by Oxford University Press  [370 journals]
  • GenomeHubs: simple containerized setup of a custom Ensembl database and
           web server for any species

    • Authors: Challis RJ; Kumar S, Stevens L, et al.
      Abstract: As the generation and use of genomic datasets is becoming increasingly common in all areas of biology, the need for resources to collate, analyse and present data from one or more genome projects is becoming more pressing. The Ensembl platform is a powerful tool to make genome data and cross-species analyses easily accessible through a web interface and a comprehensive application programming interface. Here we introduce GenomeHubs, which provide a containerized environment to facilitate the setup and hosting of custom Ensembl genome browsers. This simplifies mirroring of existing content and import of new genomic data into the Ensembl database schema. GenomeHubs also provide a set of analysis containers to decorate imported genomes with results of standard analyses and functional annotations and support export to flat files, including EMBL format for submission of assemblies and annotations to the International Nucleotide Sequence Database Collaboration. Database URL:
      PubDate: 2017-05-15
  • BioM2MetDisease: a manually curated database for associations between
           microRNAs, metabolites, small molecules and metabolic diseases

    • Authors: Xu Y; Yang H, Wu T, et al.
      Abstract: BioM2MetDisease is a manually curated database that aims to provide a comprehensive and experimentally supported resource of associations between metabolic diseases and various biomolecules. Recently, metabolic diseases such as diabetes have become one of the leading threats to people’s health. Metabolic diseases are associated with alterations of multiple types of biomolecules such as miRNAs and metabolites. An integrated, high-quality data source that collects metabolic disease-associated biomolecules is essential for exploring the underlying molecular mechanisms and discovering novel therapeutics. Here, we developed the BioM2MetDisease database, which currently documents 2681 entries of relationships between 1147 biomolecules (miRNAs, metabolites and small molecules/drugs) and 78 metabolic diseases across 14 species. Each entry includes biomolecule category, species, biomolecule name, disease name, dysregulation pattern, experimental technique, a brief description of metabolic disease-biomolecule relationships, the reference, additional annotation information etc. BioM2MetDisease provides a user-friendly interface to explore and retrieve all data conveniently. A submission page is also offered for researchers to submit new associations between biomolecules and metabolic diseases. BioM2MetDisease provides a comprehensive resource for studying how biomolecules act in metabolic diseases, and it is helpful for understanding the molecular mechanisms and developing novel therapeutics for metabolic diseases. Database URL:
      PubDate: 2017-05-12
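The abstract lists the fields stored in each BioM2MetDisease entry. A minimal sketch of such a record as a Python dataclass, assuming hypothetical field names chosen here for illustration (not the database's actual schema), with a helper that groups one disease's biomolecules by category; the records are placeholders, not real data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AssociationEntry:
    """One curated biomolecule-disease association (hypothetical field names)."""
    category: str       # e.g. "miRNA", "metabolite", "small molecule/drug"
    species: str
    biomolecule: str
    disease: str
    dysregulation: str  # e.g. "up" or "down"
    technique: str
    description: str
    reference: str

def group_by_category(entries, disease):
    """Collect biomolecule names for one disease, keyed by biomolecule category."""
    grouped = {}
    for entry in entries:
        if entry.disease == disease:
            grouped.setdefault(entry.category, []).append(entry.biomolecule)
    return grouped

# Placeholder records, not real BioM2MetDisease data.
entries = [
    AssociationEntry("miRNA", "Homo sapiens", "miRNA_X", "Disease_1",
                     "up", "qRT-PCR", "placeholder", "ref_1"),
    AssociationEntry("metabolite", "Homo sapiens", "metabolite_Y", "Disease_1",
                     "down", "mass spectrometry", "placeholder", "ref_2"),
    AssociationEntry("miRNA", "Mus musculus", "miRNA_Z", "Disease_2",
                     "down", "microarray", "placeholder", "ref_3"),
]
print(group_by_category(entries, "Disease_1"))
```

Freezing the dataclass keeps curated records immutable once loaded, which suits a read-mostly reference resource.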
  • AnnoSys—implementation of a generic annotation system for schema-based
           data using the example of biodiversity collection data

    • Authors: Suhrbier LL; Kusber WH, Tschöpe OO, et al.
      Abstract: Several errors were noted in the above paper after publication and have now been corrected.
      PubDate: 2017-05-06
  • CHOmine: an integrated data warehouse for CHO systems biology and modeling

    • Authors: Gerstl MP; Hanscho M, Ruckerbauer DE, et al.
      Abstract: The last decade has seen a surge in published genome-scale information for Chinese hamster ovary (CHO) cells, which are the main production vehicles for therapeutic proteins. While a single access point is available at , the primary data is distributed over several databases at different institutions. Currently, research is frequently hampered by a plethora of gene names and IDs that vary between published draft genomes and databases, making systems biology analyses cumbersome and elaborate. Here we present CHOmine, an integrative data warehouse connecting data from various databases and links to other ones. Furthermore, we introduce CHOmodel, a web-based resource that provides access to recently published CHO cell line-specific metabolic reconstructions. Both resources allow users to query CHO-relevant data and find interconnections between different types of data, and thus provide a simple, standardized entry point to the world of CHO systems biology. Database URL:
      PubDate: 2017-04-22
  • Improving biocuration of microRNAs in diseases: a case study in idiopathic
           pulmonary fibrosis

    • Authors: Balderas-Martínez Y; Rinaldi F, Contreras G, et al.
      Abstract: MicroRNAs (miRNAs) are small, non-coding RNA molecules that inhibit gene expression post-transcriptionally. They play important roles in several biological processes, and in recent years there has been interest in studying how they relate to the pathogenesis of diseases. Although some databases already contain information on miRNAs and their relation to illnesses, their curation represents a significant challenge due to the amount of information generated every day. In particular, respiratory diseases are poorly documented in databases, despite being of increasing concern regarding morbidity, mortality and economic impact. In this work, we present the results that we obtained in the BioCreative Interactive Track (IAT), using a semiautomatic approach for improving biocuration of miRNAs related to diseases. Our procedures will be useful to complement databases that contain this type of information. We adapted the OntoGene text mining pipeline and the ODIN curation system to a full-text corpus of scientific publications concerning one specific respiratory disease: idiopathic pulmonary fibrosis, the most common and aggressive of the idiopathic interstitial pneumonias. We curated 823 miRNA text snippets and found a total of 246 miRNAs related to this disease based on our semiautomatic approach with the system OntoGene/ODIN. The biocuration throughput improved by a factor of 12 compared with traditional manual biocuration. A significant advantage of our semiautomatic pipeline is that it can be applied to obtain the miRNAs of all the respiratory diseases and could also be used for other illnesses. Database URL:
      PubDate: 2017-04-22
  • Strategies towards digital and semi-automated curation in RegulonDB

    • Authors: Rinaldi F; Lithgow O, Gama-Castro S, et al.
      Abstract: Several errors in the authors’ details have now been corrected in the above paper.
      PubDate: 2017-04-17
  • GeneHancer: genome-wide integration of enhancers and target genes in

    • Authors: Fishilevich S; Nudel R, Rappaport N, et al.
      Abstract: A major challenge in understanding gene regulation is the unequivocal identification of enhancer elements and uncovering their connections to genes. We present GeneHancer, a novel database of human enhancers and their inferred target genes, in the framework of GeneCards. First, we integrated a total of 434 000 reported enhancers from four different genome-wide databases: the Encyclopedia of DNA Elements (ENCODE), the Ensembl regulatory build, the functional annotation of the mammalian genome (FANTOM) project and the VISTA Enhancer Browser. Employing an integration algorithm that aims to remove redundancy, GeneHancer portrays 285 000 integrated candidate enhancers (covering 12.4% of the genome), 94 000 of which are derived from more than one source, and each assigned an annotation-derived confidence score. GeneHancer subsequently links enhancers to genes, using: tissue co-expression correlation between genes and enhancer RNAs, as well as enhancer-targeted transcription factor genes; expression quantitative trait loci for variants within enhancers; and capture Hi-C, a promoter-specific genome conformation assay. The individual scores based on each of these four methods, along with gene–enhancer genomic distances, form the basis for GeneHancer’s combinatorial likelihood-based scores for enhancer–gene pairing. Finally, we define ‘elite’ enhancer–gene relations reflecting both a high-likelihood enhancer definition and a strong enhancer–gene association. GeneHancer predictions are fully integrated in the widely used GeneCards Suite, whereby candidate enhancers and their annotations are displayed on every relevant GeneCard. This assists in the mapping of non-coding variants to enhancers, and via the linked genes, forms a basis for variant–phenotype interpretation of whole-genome sequences in health and disease. Database URL:
      PubDate: 2017-04-17
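The GeneHancer abstract describes integrating enhancers from several sources while removing redundancy, and reports how many integrated enhancers come from more than one source. The abstract does not specify the algorithm; as an assumption, the sketch below uses a plain interval union (sort, then merge overlapping half-open intervals on the same chromosome) and records which sources support each merged element. The coordinates are invented placeholders and `merge_enhancers` is a hypothetical helper, not GeneHancer code.

```python
from dataclasses import dataclass, field

@dataclass
class MergedEnhancer:
    chrom: str
    start: int
    end: int
    sources: set = field(default_factory=set)

def merge_enhancers(records):
    """Union overlapping (chrom, start, end, source) intervals.

    Intervals are half-open [start, end); two records are merged when they
    share a chromosome and overlap by at least one base.
    """
    merged = []
    # Sorting by (chrom, start, ...) makes overlapping intervals adjacent.
    for chrom, start, end, source in sorted(records):
        last = merged[-1] if merged else None
        if last is not None and last.chrom == chrom and start < last.end:
            # Overlap: extend the current element and record the extra source.
            last.end = max(last.end, end)
            last.sources.add(source)
        else:
            merged.append(MergedEnhancer(chrom, start, end, {source}))
    return merged

# Placeholder coordinates, not real enhancer records.
records = [
    ("chr1", 100, 200, "ENCODE"),
    ("chr1", 150, 260, "FANTOM"),
    ("chr1", 500, 600, "VISTA"),
    ("chr2", 100, 180, "Ensembl"),
]
merged = merge_enhancers(records)
multi_source = [m for m in merged if len(m.sources) > 1]
print(len(merged), len(multi_source))  # 3 1
```

Counting `len(m.sources) > 1` mirrors the "derived from more than one source" statistic in the abstract; a real pipeline would also have to decide how to score confidence per element.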
  • Surveying the Maize community for their diversity and pedigree
           visualization needs to prioritize tool development and curation

    • Authors: Sen TZ; Braun BL, Schott DA, et al.
      Abstract: The Maize Genetics and Genomics Database (MaizeGDB) team prepared a survey to identify breeders’ needs for visualizing pedigrees, diversity data and haplotypes in order to prioritize tool development and curation efforts at MaizeGDB. The survey was distributed to the maize research community on behalf of the Maize Genetics Executive Committee in summer 2015. The survey garnered 48 responses from maize researchers, of which more than half were self-identified as breeders. The survey showed that the maize researchers considered their top priorities for visualization to be: (i) displaying single nucleotide polymorphisms in a given region for a given list of lines, (ii) showing haplotypes for a given list of lines and (iii) presenting pedigree relationships visually. The survey also asked which populations would be most useful to display. The following two populations were on top of the list: (i) 3000 publicly available maize inbred lines used in Romay et al. (Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol, 2013;14:R55) and (ii) maize lines with expired Plant Variety Protection Act (ex-PVP) certificates. Driven by this strong stakeholder input, MaizeGDB staff are currently working in four areas to improve the database’s interface and web-based tools: (i) presenting immediate progenies of currently available stocks at the MaizeGDB Stock pages, (ii) displaying the most recent ex-PVP lines described in the Germplasm Resources Information Network (GRIN) on the MaizeGDB Stock pages, (iii) developing network views of pedigree relationships and (iv) visualizing genotypes from SNP-based diversity datasets. These survey results can help other biological databases to direct their efforts according to user preferences as they serve similar types of data sets for their communities. Database URL:
      PubDate: 2017-04-17
  • TriatoKey: a web and mobile tool for biodiversity identification of
           Brazilian triatomine species

    • Authors: Márcia de Oliveira L; Nogueira de Brito R, Anderson Souza Guimarães P, et al.
      Abstract: Triatomines are blood-sucking insects that transmit the causative agent of Chagas disease, Trypanosoma cruzi. Although it is recognized as a difficult task, correct taxonomic identification of triatomine species is crucial for vector control in Latin America, where the disease is endemic. In this context, we have developed a web and mobile tool based on a PostgreSQL database to help healthcare technicians overcome the difficulty of identifying triatomine vectors when technical expertise is missing. The web and mobile versions use real pictures of triatomine species and a dichotomous key method to support the identification of potential vectors that occur in Brazil. It provides an example-driven user interface with simple language. TriatoKey can also be useful for educational purposes. Database URL:
      PubDate: 2017-04-17
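A dichotomous key, as mentioned in the TriatoKey abstract, is a binary decision tree: each couplet asks one yes/no question about the specimen and leads either to another couplet or to a taxon name. A minimal sketch, assuming placeholder questions and taxa (not TriatoKey's actual key or diagnostic characters) and a hypothetical `identify` function that stands in for the user answering each question:

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Couplet:
    """One step of a dichotomous key: a yes/no question and two branches."""
    question: str
    yes: Union["Couplet", str]  # next couplet, or a taxon name at a leaf
    no: Union["Couplet", str]

def identify(node, answer: Callable[[str], bool]) -> str:
    """Walk the key from the root, asking one question per couplet."""
    while isinstance(node, Couplet):
        node = node.yes if answer(node.question) else node.no
    return node

# Hypothetical two-couplet key; questions and taxa are placeholders.
key = Couplet(
    question="Character 1 present?",
    yes=Couplet("Character 2 present?", yes="Taxon A", no="Taxon B"),
    no="Taxon C",
)

# Simulate a user's answers for one specimen.
observed = {"Character 1 present?": True, "Character 2 present?": False}
print(identify(key, observed.__getitem__))  # Taxon B
```

Passing the answers as a callable keeps the traversal independent of the interface, so the same key could be driven by a web form, a mobile screen or a test fixture.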
  • Workflow and web application for annotating NCBI BioProject transcriptome

    • Authors: Vera Alvarez R; Medeiros Vidal N, Garzón-Martínez GA, et al.
      Abstract: The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL:
      PubDate: 2017-04-17
  • HopBase: a unified resource for Humulus genomics

    • Authors: Hill ST; Sudarsanam R, Henning J, et al.
      Abstract: Hop (Humulus lupulus L. var lupulus) is a dioecious plant of worldwide significance, used primarily for bittering and flavoring in brewing beer. Studies on the medicinal properties of several unique compounds produced by hop have led to additional interest from pharmacy and healthcare industries as well as livestock production as a natural antibiotic. Genomic research in hop has resulted in a published draft genome and transcriptome assemblies. As research into the genomics of hop has gained interest, there is a critical need for centralized online genomic resources. To support the growing research community, we report the development of an online resource "". In addition to providing a gene annotation for the existing Shinsuwase draft genome, HopBase makes available genome assemblies and annotations for both the cultivar “Teamaker” and male hop accession number USDA 21422M. These genome assemblies, gene annotations, along with other common data, coupled with a genome browser and BLAST database enable the hop community to enter the genomic age. The HopBase genomic resource is accessible at and
      PubDate: 2017-04-06
  • Chemical-induced disease relation extraction via convolutional neural network

    • Authors: Gu J; Sun F, Qian L, et al.
      Abstract: This article describes our work on the BioCreative-V chemical–disease relation (CDR) extraction task, which employed a maximum entropy (ME) model and a convolutional neural network model for relation extraction at the inter- and intra-sentence levels, respectively. In our work, relation extraction between entity concepts in documents was simplified to relation extraction between entity mentions. We first constructed pairs of chemical and disease mentions as relation instances for the training and testing stages, and then trained and applied the ME model and the convolutional neural network model at the inter- and intra-sentence levels, respectively. Finally, we merged the classification results from the mention level to the document level to acquire the final relations between chemical and disease concepts. The evaluation on the BioCreative-V CDR corpus shows the effectiveness of our proposed approach.Database URL:
      PubDate: 2017-04-02
  • NaviCom: a web application to create interactive molecular network
           portraits using multi-level omics data

    • Authors: Dorel M; Viara E, Barillot E, et al.
      Abstract: Human diseases such as cancer are routinely characterized by high-throughput molecular technologies, and multi-level omics data are accumulated in public databases at an increasing rate. Retrieval and visualization of these data in the context of molecular network maps can provide insights into the pattern of regulation of molecular functions reflected by an omics profile. In order to make this task easy, we developed NaviCom, a Python package and web platform for visualization of multi-level omics data on top of biological network maps. NaviCom bridges the gap between cBioPortal, the most widely used resource of large-scale cancer omics data, and NaviCell, a data visualization web service that contains several molecular network map collections. NaviCom proposes several standardized modes of data display on top of molecular network maps, allowing users to address specific biological questions. We illustrate how users can easily create interactive network-based cancer molecular portraits via the NaviCom web interface using the maps of the Atlas of Cancer Signalling Network (ACSN) and other maps. Analysis of these molecular portraits can help in formulating a scientific hypothesis on the molecular mechanisms deregulated in the studied disease.Database URL: NaviCom is available at
      PubDate: 2017-04-02
  • Automated PDF highlighting to support faster curation of literature for
           Parkinson’s and Alzheimer’s disease

    • Authors: Wu H; Oellrich A, Girges C, et al.
      Abstract: Neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications, which revealed that in 85% of cases all highlighted sentences are relevant to the curation task and in about 65% of the cases, the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process.Database URL:
      PubDate: 2017-03-27
  • Effective biomedical document classification for identifying publications
           relevant to the mouse Gene Expression Database (GXD)

    • Authors: Jiang X; Ringwald M, Blake J, et al.
      Abstract: The Gene Expression Database (GXD) is a comprehensive online database within the Mouse Genome Informatics resource, aiming to provide available information about endogenous gene expression during mouse development. The information stems primarily from many thousands of biomedical publications that database curators must go through and read. Given the very large number of biomedical papers published each year, automatic document classification plays an important role in biomedical research. Specifically, an effective and efficient document classifier is needed for supporting the GXD annotation workflow. We present here an effective yet relatively simple classification scheme, which uses readily available tools while employing feature selection, aiming to assist curators in identifying publications relevant to GXD. We examine the performance of our method over a large manually curated dataset, consisting of more than 25 000 PubMed abstracts, of which about half are curated as relevant to GXD and the other half as irrelevant to GXD. In addition to text from the title and abstract, we also consider image captions, an important information source that we integrate into our method. We apply a captions-based classifier to a subset of about 3300 documents, for which the full text of the curated articles is available. The results demonstrate that our proposed approach is robust and effectively addresses the GXD document classification task. Moreover, using information obtained from image captions clearly improves performance, compared to title and abstract alone, affirming the utility of image captions as a substantial evidence source for automatically determining the relevance of biomedical publications to a specific subject area.Database
      PubDate: 2017-03-24
  • GrTEdb: the first web-based database of transposable elements in cotton (
           Gossypium raimondii)

    • Authors: Xu Z; Liu J, Ni W, et al.
      Abstract: Although several diploid and tetraploid Gossypium species genomes have been sequenced, a well-annotated web-based database of transposable elements (TEs) is lacking. To better understand the roles of TEs in the structural, functional and evolutionary dynamics of the cotton genome, a comprehensive, specific and user-friendly web-based database, the Gossypium raimondii transposable elements database (GrTEdb), was constructed. A total of 14 332 TEs were structurally annotated and clearly categorized in the G. raimondii genome, and these elements have been classified into seven distinct superfamilies based on the order of protein-coding domains, structures and/or sequence similarity, including 2929 Copia-like elements, 10 368 Gypsy-like elements, 299 L1, 12 Mutators, 435 PIF-Harbingers, 275 CACTAs and 14 Helitrons. Meanwhile, web-based sequence browsing, searching, downloading and BLAST tools were implemented to help users easily and effectively annotate the TEs or TE fragments in genomic sequences from G. raimondii and other closely related Gossypium species. GrTEdb provides resources and information related to TEs in G. raimondii, and will facilitate gene and genome analyses within or across Gossypium species, evaluating the impact of TEs on their host genomes, and investigating the potential interaction between TEs and protein-coding genes in Gossypium species.Database URL:
      PubDate: 2017-03-24
  • TMPL: a database of experimental and theoretical transmembrane protein
           models positioned in the lipid bilayer

    • Authors: Postic G; Ghouzam Y, Etchebest C, et al.
      Abstract: Knowing the position of protein structures within the membrane is crucial for fundamental and applied research in the field of molecular biology. Only a few web resources provide coordinate files of oriented transmembrane proteins, and these exclude predicted structures, although they represent the largest part of the available models. In this article, we present TMPL, a database of transmembrane protein structures (α-helical and β-sheet) positioned in the lipid bilayer. It is the first database to include theoretical models of transmembrane protein structures, making it a large repository with more than 11 000 entries. The TMPL database also contains experimentally solved protein structures, which are available as either atomistic or coarse-grained models. A unique feature of TMPL is the possibility for users to update the database by uploading, through an intuitive web interface, the membrane assignments they can obtain with our recent OREMPRO web server.
      PubDate: 2017-03-24
  • WikiGenomes: an open web application for community consumption and
           curation of gene annotation data in Wikidata

    • Authors: Putman TE; Lelong S, Burgstaller-Muehlbacher S, et al.
      Abstract: With the advancement of genome-sequencing technologies, new genomes are being sequenced daily. Although these sequences are deposited in publicly available data warehouses, their functional and genomic annotations (beyond genes which are predicted automatically) mostly reside in the text of primary publications. Professional curators are hard at work extracting those annotations from the literature for the most studied organisms and depositing them in structured databases. However, the resources don’t exist to fund the comprehensive curation of the thousands of newly sequenced organisms in this manner. Here, we describe WikiGenomes, a web application that facilitates the consumption and curation of genomic data by the entire scientific community. WikiGenomes is based on Wikidata, an openly editable knowledge graph with the goal of aggregating published knowledge into a free and open database. WikiGenomes empowers the individual genomic researcher to contribute their expertise to the curation effort and integrates the knowledge into Wikidata, enabling it to be accessed by anyone without restriction.Database URL:
      PubDate: 2017-03-24
  • ABCMdb reloaded: updates on mutations in ATP binding cassette proteins

    • Authors: Tordai H; Jakab K, Gyimesi G, et al.
      Abstract: ABC (ATP-Binding Cassette) proteins with altered function are responsible for numerous human diseases. To aid the selection of positions and amino acids for ABC structure/function studies, we have generated a database, ABCMdb (Gyimesi et al., ABCMdb: a database for the comparative analysis of protein mutations in ABC transporters, and a potential framework for a general application. Hum Mutat 2012; 33:1547–1556.), with interactive tools. The database has been populated with mentions of mutations extracted from full-text papers, alignments and structural models. In the new version of the database we aimed to collect the effects of mutations from databases including ClinVar. Because of the small amount of available data, even in the case of the widely studied disease-causing ABC proteins, we also included the possible effects of mutations based on SNAP2 and PROVEAN predictions. To aid the interpretation of variations in non-coding regions, the database was supplemented with related DNA-level information. Our results emphasize the importance of in silico predictions because of the sparse information available on variants, and suggest that mutations at analogous positions in homologous ABC proteins have a strong predictive power for the effects of mutations. Our improved ABCMdb advances the design of both experimental studies and meta-analyses in order to understand drug interactions of ABC proteins and the effects of mutations on functional expression.Database URL:
      PubDate: 2017-03-18
  • AnnoSys—implementation of a generic annotation system for schema-based
           data using the example of biodiversity collection data

    • Authors: Suhrbier LL; Kusber WH, Tschöpe OO, et al.
      Abstract: Biological research collections holding billions of specimens worldwide provide the most important baseline information for systematic biodiversity research. Increasingly, specimen data records are becoming available in virtual herbaria and data portals. The traditional (physical) annotation procedure fails here, so that an important pathway of research documentation and data quality control is broken. In order to create an online annotation system, we analysed, modeled and adapted traditional specimen annotation workflows. The AnnoSys system accesses collection data from either conventional web resources or the Biological Collection Access Service (BioCASe) and accepts XML-based data standards like ABCD or DarwinCore. It comprises a searchable annotation data repository, a user interface, and a subscription-based messaging system. We describe the main components of AnnoSys and its current and planned interoperability with biodiversity data portals and networks. Details are given on the underlying architectural model, which implements the W3C OpenAnnotation model and allows the adaptation of AnnoSys to different problem domains. Advantages and disadvantages of different digital annotation and feedback approaches are discussed. For the biodiversity domain, AnnoSys proposes best practice procedures for digital annotations of complex records.Database URL:
      PubDate: 2017-03-18
  • Better living through ontologies at the Immune Epitope Database

    • Authors: Vita R; Overton JA, Sette A, et al.
      Abstract: The Immune Epitope Database (IEDB) project incorporates independently developed ontologies and controlled vocabularies into its curation and search interface. This simplifies curation practices, improves the user query experience and facilitates interoperability between the IEDB and other resources. While the use of independently developed ontologies has long been recommended as a best practice, there continues to be a significant number of projects that develop their own vocabularies instead, or that do not fully utilize the power of ontologies that they are using. We describe how we use ontologies in the IEDB, providing a concrete example of the benefits of ontologies in practice.Database
      PubDate: 2017-03-18
  • Biocuration in the structure–function linkage database: the anatomy
           of a superfamily

    • Authors: Holliday GL; Brown SD, Akiva E, et al.
      Abstract: With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, there is a bottleneck in annotating molecular function to a sequence. This article describes the biocuration process and methods used in the structure-function linkage database (SFLD) to help address some of the challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies in which all members are homologous, conserve an aspect of their chemical function and have associated conserved structural features that enable the chemistry. Also presented is the Enzyme Structure-Function Ontology (ESFO), which has been designed to capture the relationships between enzyme sequence, structure and function that underlie the SFLD and is used to guide the biocuration processes within the SFLD.Database URL:
      PubDate: 2017-03-18
  • Ensembl core software resources: storage and programmatic access for DNA
           sequence and genome annotation

    • Authors: Ruffier M; Kähäri A, Komorowska M, et al.
      Abstract: The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl ‘Core’ database and Application Programming Interface (API) was our first major piece of software infrastructure and remains at the centre of all of our genome resources. Since its initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology. Although the framework was initially intended to provide annotation for the reference human genome, we have extended it to support the genomes of all species as well as richer assembly models. Cross-referenced links to other informatics resources facilitate searching our database with a variety of popular identifiers such as UniProt and RefSeq. Our comprehensive and robust framework, which stores a large diversity of genome annotations in one location, serves as a platform for other groups to generate and maintain their own tailored annotation. We welcome reuse and contributions: our databases and APIs are publicly available, all of our source code is released with a permissive Apache v2.0 licence at and we have an active developer mailing list ( URL:
      PubDate: 2017-03-18
  • Literature consistency of bioinformatics sequence databases is effective
           for assessing record quality

    • Authors: Bouadjenek M; Verspoor K, Zobel J.
      Abstract: Bioinformatics sequence databases such as GenBank or UniProt contain hundreds of millions of records of genomic data. These records are derived from direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centres; their diversity and scale mean that they suffer from a range of data quality issues including errors, discrepancies, redundancies, ambiguities, incompleteness and inconsistencies with the published literature. In this work, we seek to investigate and analyze the data quality of sequence databases from the perspective of a curator, who must detect anomalous and suspicious records. Specifically, we emphasize the detection of inconsistent records with respect to the literature. Focusing on GenBank, we propose a set of 24 quality indicators, which are based on treating a record as a query into the published literature and then applying query quality predictors. We then carry out an analysis that shows that the proposed quality indicators and the quality of the records have a mutual relationship, in which one depends on the other. We propose to represent record-literature consistency as a vector of these quality indicators. By reducing the dimensionality of this representation for visualization purposes using principal component analysis, we show that records which have been reported as inconsistent with the literature fall roughly in the same area, and therefore share similar characteristics. By manually analyzing records not previously known to be erroneous that fall in the same area as records known to be inconsistent, we show that one record out of four is inconsistent with respect to the literature. This high density of inconsistent records opens the way towards the development of automatic methods for the detection of faulty records. We conclude that literature inconsistency is a meaningful strategy for identifying suspicious records.Database URL:
      PubDate: 2017-03-18
  • MiDAS 2.0: an ecosystem-specific taxonomy and online database for the
           organisms of wastewater treatment systems expanded for anaerobic digester

    • Authors: McIlroy S; Kirkegaard R, McIlroy B, et al.
      Abstract: Wastewater is increasingly viewed as a resource, with anaerobic digester technology being routinely implemented for biogas production. Characterising the microbial communities involved in wastewater treatment facilities and their anaerobic digesters is considered key to their optimal design and operation. Amplicon sequencing of the 16S rRNA gene allows high-throughput monitoring of these systems. The MiDAS field guide is a public resource providing amplicon sequencing protocols and an ecosystem-specific taxonomic database optimized for use with wastewater treatment facility samples. The curated taxonomy endeavours to provide a genus-level classification for abundant phylotypes, and the online field guide links this identity to published information regarding their ecology, function and distribution. This article describes the expansion of the database resources to cover the organisms of anaerobic digester systems fed with primary sludge and surplus activated sludge. The updated database includes descriptions of the abundant genus-level taxa in influent wastewater, activated sludge and anaerobic digesters. Abundance information is also included to allow assessment of the role of emigration in the ecology of each phylotype. MiDAS is intended as a collaborative resource for the progression of research into the ecology of wastewater treatment, by providing a public repository for knowledge that is accessible to all interested in these biotechnologically important systems.Database URL:
      PubDate: 2017-03-18
  • miRnalyze: an interactive database linking tool to unlock intuitive
           microRNA regulation of cell signaling pathways

    • Authors: Subhra Das S; James M, Paul S, et al.
      Abstract: The various pathophysiological processes occurring in living systems are known to be orchestrated by delicate interplays and cross-talks between different genes and their regulators. Among the various regulators of genes, there is a class of small non-coding RNA molecules known as microRNAs. Although the relative simplicity of miRNAs and their ability to modulate cellular processes make them attractive therapeutic candidates, their presence in large numbers makes it challenging for experimental researchers to interpret the intricacies of the molecular processes they regulate. Most of the existing bioinformatic tools fail to address these challenges. Here, we present a new web resource, ‘miRnalyze’, that has been specifically designed to directly identify the putative regulation of cell signaling pathways by miRNAs. The tool integrates miRNA-target predictions with signaling cascade members by utilizing the TargetScanHuman 7.1 miRNA-target prediction tool and the KEGG pathway database, and thus provides researchers with in-depth insights into the modulation of signal transduction pathways by miRNAs. miRnalyze is capable of identifying common miRNAs targeting more than one gene in the same signaling pathway, a feature that further increases the probability of modulating the pathway and downstream reactions when using miRNA modulators. Additionally, miRnalyze can sort miRNAs according to the seed-match types and the TargetScan context++ score, thus providing a hierarchical list of the most valuable miRNAs. Furthermore, in order to provide users with comprehensive information regarding miRNAs, genes and pathways, miRnalyze also links to expression data of miRNAs (miRmine) and genes (TiGER) and to proteome abundance (PaxDb) data. To validate the capability of the tool, we have documented the correlation of miRnalyze’s predictions with experimental confirmation studies.Database URL:
      PubDate: 2017-03-18
  • Strategies towards digital and semi-automated curation in RegulonDB

    • Authors: Rinaldi F; Lithgow O, Gama-Castro S, et al.
      Abstract: Experimentally generated biological information needs to be organized and structured in order to become meaningful knowledge. However, the rate at which new information is being published makes manual curation increasingly unable to cope. Devising new curation strategies that leverage data mining and text analysis is therefore a promising avenue to help life science databases cope with the deluge of novel information. In this article, we describe the integration of text mining technologies in the curation pipeline of the RegulonDB database, and discuss how the process can enhance the productivity of the curators. Specifically, a named entity recognition approach is used to pre-annotate terms referring to a set of domain entities which are potentially relevant for the curation process. The annotated documents are presented to the curator, who, thanks to a custom-designed interface, can select sentences containing specific types of entities, thus restricting the amount of text that needs to be inspected. Additionally, a module capable of computing semantic similarity between sentences across the entire collection of articles to be curated is being integrated into the system. We tested the module using three sets of scientific articles and six domain experts. All these improvements are gradually enabling us to obtain a high-throughput curation process with the same quality as manual curation.
      PubDate: 2017-03-18
  • The HIV oligonucleotide database (HIVoligoDB)

    • Authors: Carneiro J; Resende A, Pereira F.
      Abstract: The human immunodeficiency virus (HIV) is associated with one of the most widespread infectious diseases, the acquired immunodeficiency syndrome (AIDS). The development of antiretroviral drugs and methods for virus detection requires a comprehensive analysis of HIV genomic diversity, particularly in the binding sites of oligonucleotides. Here, we describe a versatile online database (HIVoligoDB) with oligonucleotides selected for the diagnosis of HIV and treatment of AIDS. Currently, the database provides an interface for visualization, analysis and download of 380 HIV-1 and 65 HIV-2 oligonucleotides annotated according to curated reference genomes. The database also allows the selection of the most conserved HIV genomic regions for the development of molecular diagnostic assays and sequence-based candidate therapeutics.Database URL:
      PubDate: 2017-03-18
  • The ‘straight mouse’: defining anatomical axes in 3D embryo

    • Authors: Armit C; Hill B, Venkataraman SS, et al.
      Abstract: A primary objective of the eMouseAtlas Project is to enable 3D spatial mapping of whole-embryo gene expression data to capture complex 3D patterns for indexing, visualization, cross-comparison and analysis. For this we have developed a spatio-temporal framework based on 3D models of embryos at different stages of development coupled with an anatomical ontology. Here we introduce a method of defining coordinate axes that correspond to the anatomical or biologically relevant anterior–posterior (A–P), dorsal–ventral (D–V) and left–right (L–R) directions. These enable more sophisticated queries and analyses of the data with biologically relevant associations, and provide novel data visualizations that can reveal patterns that are otherwise difficult to detect in the standard 3D coordinate space. These anatomical coordinates are defined using the concept of a ‘straight mouse-embryo’ within which the anatomical coordinates are Cartesian. The straight embryo model has been mapped via a complex non-linear transform onto the standard embryo model. We explore the utility of this anatomical coordinate system in elucidating the spatial relationship of spatially mapped embryonic ‘Fibroblast growth factor’ gene expression patterns, and we discuss the importance of this technology in summarizing complex multimodal mouse embryo image data from gene expression and anatomy studies.Database
      PubDate: 2017-03-11
  • Curated protein information in the Saccharomyces genome database

    • Authors: Hellerstedt ST; Nash RS, Weng S, et al.
      Abstract: Due to recent advancements in the production of experimental proteomic data, the Saccharomyces genome database (SGD) has been expanding our protein curation activities to make new data types available to our users. Because of broad interest in post-translational modifications (PTM) and their importance to protein function and regulation, we have recently started incorporating expertly curated PTM information on individual protein pages. Here we also present the inclusion of new abundance and protein half-life data obtained from high-throughput proteome studies. These new data types have been included with the aim to facilitate cellular biology research.Database URL:
      PubDate: 2017-03-11
  • Boechera microsatellite website: an online portal for species
           identification and determination of hybrid parentage

    • Authors: Li F; Rushworth CA, Beck JB, et al.
      Abstract: Boechera (Brassicaceae) has many features to recommend it as a model genus for ecological and evolutionary research, including species richness, ecological diversity, experimental tractability and close phylogenetic proximity to Arabidopsis. However, efforts to realize the full potential of this model system have been thwarted by the frequent inability of researchers to identify their samples and place them in a broader evolutionary context. Here we present the Boechera Microsatellite Website (BMW), a portal that archives over 55 000 microsatellite allele calls from 4471 specimens (including 133 nomenclatural types). The portal includes analytical tools that utilize data from 15 microsatellite loci as a highly effective DNA barcoding system. The BMW facilitates the accurate identification of Boechera samples and the investigation of reticulate evolution among the ±83 sexual diploid taxa in the genus, thereby greatly enhancing Boechera’s potential as a model system.Database URL:
      PubDate: 2017-02-27
  • Actionable, long-term stable and semantic web compatible identifiers for
           access to biological collection objects

    • Authors: Güntsch A; Hyam R, Hagedorn G, et al.
      Abstract: With biodiversity research activities increasingly shifting to the web, the need for a system of persistent and stable identifiers for physical collection objects becomes ever more pressing. The Consortium of European Taxonomic Facilities agreed on a common system of HTTP-URI-based stable identifiers, which is now being rolled out to its member organizations. The system follows Linked Open Data principles and implements redirection mechanisms to human-readable and machine-readable representations of specimens, facilitating seamless integration into the growing semantic web. The implementation of stable identifiers across collection organizations is supported with open source provider software scripts, best-practice documentation and recommendations for RDF metadata elements, facilitating harmonized access to collection information in web portals.Database URL:
      PubDate: 2017-02-26
  • BELMiner: adapting a rule-based relation extraction system to extract
           biological expression language statements from bio-medical literature
           evidence sentences

    • Authors: Ravikumar KE; Rastegar-Mojarad M, Liu H.
      Abstract: Extracting meaningful relationships with semantic significance from biomedical literature is often a challenging task. The BioCreative V Track 4 challenge organized, for the first time, a comprehensive shared task to test the robustness of text-mining algorithms in extracting semantically meaningful assertions from evidence statements in biomedical text. In this work, we tested the ability of a rule-based semantic parser to extract Biological Expression Language (BEL) statements from evidence sentences culled from biomedical literature as part of the BioCreative V Track 4 challenge. The system achieved an overall best F-measure of 21.29% in extracting the complete BEL statement. For relation extraction, the system achieved an F-measure of 65.13% on the test data set. Our system achieved the best performance in five of the six criteria that were adopted for evaluation by the task organizers. The inability to derive semantic inferences and limitations in the rule sets that map textual extractions to BEL functions were among the reasons for the low performance in extracting the complete BEL statement. After the shared task, we also evaluated the impact of different NER components on the ability to extract BEL statements on the test data sets, besides making a single change in the rule sets that translate relation extractions into a BEL statement. This yielded a marked improvement of over 20% in the overall performance of BELMiner’s capability to extract BEL statements on the test set. The system is available as a REST-API at URL:
      PubDate: 2017-02-26
  • Carotenoids Database: structures, chemical fingerprints and distribution
           among organisms

    • Authors: Yabuzaki J.
      Abstract: To promote understanding of how organisms are related via carotenoids, either evolutionarily or symbiotically, or in food chains through natural histories, we built the Carotenoids Database. It provides chemical information on 1117 natural carotenoids with 683 source organisms. For extracting organisms closely related through the biosynthesis of carotenoids, we offer a new similarity search system, ‘Search similar carotenoids’, using our original chemical fingerprint, ‘Carotenoid DB Chemical Fingerprints’. These fingerprints describe the chemical substructure and the modification details based on the International Union of Pure and Applied Chemistry (IUPAC) semi-systematic names of the carotenoids. The fingerprints also allow (i) easier prediction of six biological functions of carotenoids: provitamin A, membrane stabilizers, odorous substances, allelochemicals, antiproliferative activity and reverse MDR activity against cancer cells, (ii) easier classification of carotenoid structures, (iii) partial and exact structure searching and (iv) easier extraction of structural isomers and stereoisomers. We believe this to be the first attempt to establish fingerprints using IUPAC semi-systematic names. For extracting organisms with similar carotenoid profiles, we provide a new tool, ‘Search similar profiled organisms’. Our current statistics offer some insights into natural history: carotenoids seem to have been spread largely by bacteria, which produce C30, C40, C45 and C50 carotenoids with the widest range of end groups and share a small portion of C40 carotenoids with eukaryotes. Archaea share an even smaller portion with eukaryotes. Eukaryotes then evolved a considerable variety of C40 carotenoids. In terms of carotenoids, eukaryotes seem more closely related to bacteria than to archaea, independently of 16S rRNA lineage analysis. Database URL:
      PubDate: 2017-02-26
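The abstract does not say how ‘Search similar carotenoids’ scores similarity between fingerprints. A common choice for binary chemical fingerprints is the Tanimoto (Jaccard) coefficient, so the sketch below assumes it; the feature labels (carbon skeleton, end groups, modifications) are hypothetical stand-ins, not the actual Carotenoid DB Chemical Fingerprint bits.

```python
# Tanimoto (Jaccard) similarity over set-based fingerprints. The scoring used
# by 'Search similar carotenoids' is not stated; this measure is an assumption.


def tanimoto(a: frozenset, b: frozenset) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both fingerprints are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query: frozenset, library: dict, top: int = 3) -> list:
    """Rank library entries (name -> fingerprint) by similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


# Hypothetical features loosely derived from IUPAC semi-systematic names;
# these are NOT the real database fingerprints.
library = {
    "beta,beta-carotene": frozenset({"C40", "end:beta"}),
    "beta,epsilon-carotene": frozenset({"C40", "end:beta", "end:epsilon"}),
    "beta,beta-carotene-3,3'-diol": frozenset({"C40", "end:beta", "3-hydroxy"}),
    "psi,psi-carotene": frozenset({"C40", "end:psi"}),
}
query = frozenset({"C40", "end:beta", "3-hydroxy"})
ranking = most_similar(query, library, top=2)
```

Representing fingerprints as sets of named features keeps the example readable; a production system would more likely use fixed-length bit vectors, for which the same formula applies to bit counts.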
  • OCaPPI-Db: an oligonucleotide probe database for pathogen identification
           through hybridization capture

    • Authors: Gasc C; Constantin A, Jaziri F, et al.
      Abstract: The detection and identification of bacterial pathogens involved in acts of bio- and agroterrorism are essential to avoid pathogen dispersal in the environment and propagation within the population. Conventional molecular methods, such as PCR amplification, DNA microarrays or shotgun sequencing, are subject to various limitations when assessing environmental samples, which can lead to inaccurate findings. We developed a hybridization capture strategy that uses a set of oligonucleotide probes to target and enrich biomarkers of interest in environmental samples. Here, we present the Oligonucleotide Capture Probes for Pathogen Identification Database (OCaPPI-Db), an online capture probe database containing a set of 1,685 oligonucleotide probes that allow the detection and identification of 30 biothreat agents up to the species level. This probe set can be used in its entirety as a comprehensive diagnostic tool or can be restricted to a set of probes targeting a specific pathogen or virulence factor according to the user’s needs. Database URL:
      PubDate: 2017-02-26
  • Outreach and online training services at the Saccharomyces Genome Database

    • Authors: MacPherson KA; Starr B, Wong ED, et al.
      Abstract: The Saccharomyces Genome Database (SGD), the primary genetics and genomics resource for the budding yeast S. cerevisiae, provides free public access to expertly curated information about the yeast genome and its gene products. As the central hub for the yeast research community, SGD engages in a variety of social outreach efforts to inform our users about new developments, promote collaboration, increase public awareness of the importance of yeast to biomedical research and facilitate scientific discovery. Here we describe these various outreach methods, from networking at scientific conferences to the use of online media such as blog posts and webinars, and include our perspectives on the benefits that outreach activities provide for model organism databases. Database URL:
      PubDate: 2017-02-26
  • PCPPI: a comprehensive database for the prediction of Penicillium–crop
           protein–protein interactions

    • Authors: Yue J; Zhang D, Ban R, et al.
      Abstract: Penicillium expansum, the causal agent of blue mold, is one of the most prevalent post-harvest pathogens, infecting a wide range of crops after harvest. In response, crops have evolved various defense systems to protect themselves against this and other pathogens. Penicillium–crop interaction is a multifaceted process mediated by pathogen- and host-derived proteins. Identification and characterization of the inter-species protein–protein interactions (PPIs) are fundamental to elucidating the molecular mechanisms underlying infection processes between P. expansum and plant crops. Here, we have developed PCPPI, the Penicillium-Crop Protein–Protein Interactions database, which is constructed from experimentally determined orthologous interactions in pathogen–plant systems and the available domain–domain interactions (DDIs) in each PPI. Thus far, it stores information on 9911 proteins, 439 904 interactions and seven host species: apple, kiwifruit, maize, pear, rice, strawberry and tomato. Further analysis through gene ontology (GO) annotation indicated that proteins with more interacting partners tend to perform essential functions. Significantly, semantic statistics of the GO terms also provided strong support for the accuracy of the predicted interactions in PCPPI. We believe that the PCPPI datasets, which are freely available to the research community, will facilitate the study of pathogen–crop interactions. Database URL:
      PubDate: 2017-02-26
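The PCPPI abstract describes predicting interactions from orthologous interactions in other pathogen–plant systems. The core idea of such "interolog" transfer can be sketched as follows: a target pathogen protein and a target host protein are predicted to interact if some ortholog of each forms a known interacting pair. This is a minimal sketch under that assumption; PCPPI's actual pipeline, including its use of domain–domain interactions, is not reproduced, and all identifiers are hypothetical.

```python
# Interolog-style transfer of known pathogen-host interactions to a new
# pathogen-host pair via orthology. PCPPI's actual pipeline (including its
# domain-domain interaction evidence) is not reproduced here.


def predict_interologs(known_pairs, pathogen_orthologs, host_orthologs):
    """Predict (pathogen protein, host protein) pairs from template interactions.

    known_pairs: set of (template pathogen protein, template host protein)
    pathogen_orthologs: target pathogen protein -> set of its template orthologs
    host_orthologs: target host protein -> set of its template orthologs
    """
    predicted = set()
    for p, p_orths in pathogen_orthologs.items():
        for h, h_orths in host_orthologs.items():
            # Predict p-h if any ortholog pair is a known interaction
            if any((po, ho) in known_pairs for po in p_orths for ho in h_orths):
                predicted.add((p, h))
    return predicted


# All identifiers below are hypothetical placeholders.
known = {("Tpl_path_A", "Tpl_host_X")}
pathogen = {"Pexp_1": {"Tpl_path_A"}, "Pexp_2": {"Tpl_path_B"}}
host = {"Apple_1": {"Tpl_host_X"}, "Apple_2": {"Tpl_host_Y"}}
predictions = predict_interologs(known, pathogen, host)
```

The nested loops are quadratic in the number of target proteins; for proteome-scale data one would instead index known pairs by template protein, but the simple form makes the transfer rule explicit.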
  • SilkPathDB: a comprehensive resource for the study of silkworm pathogens

    • Authors: Li T; Pan G, Vossbrinck CR, et al.
      Abstract: Silkworm pathogens have heavily impeded the development of the sericultural industry and play important roles in lepidopteran ecology, and some of them are used as biological insecticides. Rapid advances in studies on the omics of silkworm pathogens have produced a large amount of data, which need to be brought together centrally in a coherent and systematic manner to facilitate their reuse in further analyses. We have collected genomic data for 86 silkworm pathogens from 4 taxa (fungi, microsporidia, bacteria and viruses) and from 4 lepidopteran hosts, and developed the open-access Silkworm Pathogen Database (SilkPathDB) to make this information readily available. SilkPathDB integrates Drupal and GBrowse as a graphical interface to a Chado relational database that houses all of the datasets involved. The genomes have been assembled and annotated for comparative purposes and allow the search and analysis of homologous sequences, transposable elements, protein subcellular locations (including secreted proteins) and gene ontology. We believe that SilkPathDB will aid researchers in identifying silkworm parasites, understanding the mechanisms of silkworm infections, and studying the developmental ecology of silkworm parasites (gene expression) and their hosts. Database URL:
      PubDate: 2017-02-26
  • VerSeDa: vertebrate secretome database

    • Authors: Cortazar AR; Oguiza JA, Aransay AM, et al.
      Abstract: With current tools, de novo secretome (the full set of proteins secreted by an organism) prediction is a time-consuming bioinformatic task that requires a multifactorial analysis to obtain reliable in silico predictions. Hence, to accelerate this process and offer researchers a reliable repository where secretome information can be obtained for vertebrates and model organisms, we have developed VerSeDa (Vertebrate Secretome Database). This freely available database stores information about proteins that are predicted to be secreted through the classical and non-classical mechanisms, for the wide range of vertebrate species deposited at the NCBI, UCSC and ENSEMBL sites. To our knowledge, VerSeDa is the only state-of-the-art database designed to store secretome data from multiple vertebrate genomes, thus saving a considerable amount of time that would otherwise be spent predicting protein features that can be retrieved directly from this repository. Database URL: VerSeDa is freely available at
      PubDate: 2017-02-24
  • Correction notice for booking in

    • Abstract: doi: 10.1093/database/baw167
      PubDate: 2017-02-10
  • Automatic query generation using word embeddings for retrieving passages
           describing experimental methods

    • Authors: Aydın F; Hüsünbeyi Z, Özgür A.
      Abstract: Information regarding the physical interactions among proteins is crucial, since protein–protein interactions (PPIs) are central to many biological processes. The experimental techniques used to verify PPIs are vital for characterizing and assessing the reliability of the identified PPIs. Much of the information about PPIs and the experimental methods used to detect them is available only in the text of the scientific publications that report them. In this study, we approach the problem of identifying passages describing experimental methods for physical interactions between proteins as an information retrieval task. The baseline system is based on query matching, where the queries are generated from the names (including synonyms) of the experimental methods in the Proteomics Standards Initiative–Molecular Interactions (PSI-MI) ontology. We propose two methods in which the baseline queries are expanded with additional relevant terms. The first is a supervised approach, in which the most salient terms for each experimental method are obtained using the term frequency–relevance frequency (tf.rf) metric over 13 articles from our manually annotated data set of 30 full-text articles, which is made publicly available. The second is an unsupervised approach, in which the queries for each experimental method are expanded using the word embeddings of the method names in the PSI-MI ontology. The word embeddings are obtained from a large unlabeled full-text corpus. The proposed methods are evaluated on a test set of 17 articles. Both methods obtain higher recall than the baseline, with a loss in precision. Besides its higher recall, the word-embedding-based approach achieves a higher F-measure than the baseline and the tf.rf-based methods.
We also show that incorporating gene name and interaction keyword identification leads to improved precision and F-measure scores for all three evaluated methods. The tf.rf-based approach was developed as part of our participation in the Collaborative Biocurator Assistant Task of the BioCreative V challenge, whereas the word-embedding-based approach is a novel contribution of this article. Database URL:
      PubDate: 2017-01-10
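The supervised query expansion in this abstract relies on the tf.rf metric. Under its commonly cited definition, a term's weight is its term frequency multiplied by a relevance frequency rf = log2(2 + a / max(1, c)), where a and c count the positive and negative training passages containing the term. The sketch below assumes that definition; the tokenisation, the way tf is aggregated and the toy passages are illustrative assumptions, not the authors' data or code.

```python
import math
from collections import Counter

# Supervised term weighting with tf.rf, using the commonly cited definition
# rf = log2(2 + a / max(1, c)); tokenisation and toy passages are assumptions.


def relevance_frequency(term, positive_docs, negative_docs):
    """a = # positive docs containing term, c = # negative docs containing term."""
    a = sum(1 for doc in positive_docs if term in doc)
    c = sum(1 for doc in negative_docs if term in doc)
    return math.log2(2 + a / max(1, c))


def salient_terms(positive_docs, negative_docs, top=3):
    """Rank terms by (frequency in positive passages) * rf, ties broken alphabetically."""
    tf = Counter(tok for doc in positive_docs for tok in doc)
    pos_sets = [set(doc) for doc in positive_docs]
    neg_sets = [set(doc) for doc in negative_docs]
    scores = {t: n * relevance_frequency(t, pos_sets, neg_sets) for t, n in tf.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top]


# Hypothetical tokenised passages: positives describe a two-hybrid assay.
positive = [["yeast", "two", "hybrid", "bait", "prey"],
            ["bait", "prey", "interaction", "yeast"]]
negative = [["coimmunoprecipitation", "antibody", "interaction", "yeast"],
            ["pull", "down", "gst", "interaction", "yeast"]]
expansion = salient_terms(positive, negative, top=2)
```

Here "bait" and "prey" outrank "yeast" even though all three occur twice in the positive passages, because "yeast" also appears in the negative passages and so receives a lower rf. The top-ranked terms would then be appended to the baseline query for the method.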
  • blend4php: a PHP API for galaxy

    • Authors: Wytko C; Soto B, Ficklin SP.
      Abstract: Galaxy is a popular framework for executing complex analytical pipelines, typically on large data sets, and is commonly used for (but not limited to) genomic, genetic and related biological analyses. It provides a web front-end and integrates with high-performance computing resources. Here we report the development of the blend4php library, which wraps Galaxy’s RESTful API into a PHP-based library. PHP-based web applications can use blend4php to automate execution, monitoring and management of a remote Galaxy server, including its users, workflows, jobs and more. The blend4php library was developed specifically for integrating Galaxy with Tripal, the open-source toolkit for creating online genomic and genetic web sites. However, it was designed as an independent library for use by any application and is freely available under version 3 of the GNU Lesser General Public License (LGPL v3.0) at URL:
      PubDate: 2017-01-10
  • Duplicates, redundancies and inconsistencies in the primary nucleotide
           databases: a descriptive study

    • Authors: Chen Q; Zobel J, Verspoor K.
      Abstract: GenBank, the EMBL European Nucleotide Archive and the DNA DataBank of Japan, known collectively as the International Nucleotide Sequence Database Collaboration or INSDC, are the three most significant nucleotide sequence databases. Their records are derived from laboratory work undertaken by different individuals and teams, with a range of technologies and assumptions, over a period of decades. As a consequence, they contain a great many duplicates, redundancies and inconsistencies, but neither the prevalence nor the characteristics of the various types of duplicates have been rigorously assessed. Existing duplicate detection methods in bioinformatics address only specific duplicate types, with inconsistent assumptions, and the impact of duplicates in bioinformatics databases has not been carefully assessed, making it difficult to judge the value of such methods. Our goal is to assess the scale, kinds and impact of duplicates in bioinformatics databases through a retrospective analysis of merged groups in INSDC databases. Our outcomes are threefold: (1) We analyse a benchmark dataset of duplicates manually identified in INSDC (67 888 merged groups with 111 823 duplicate pairs across 21 organisms) in terms of the prevalence, types and impacts of duplicates. (2) We categorize duplicates at both the sequence and the annotation level, with supporting quantitative statistics, showing that different organisms have different prevalences of distinct kinds of duplicate. (3) We show via a simple case study that the presence of duplicates has a practical impact, in terms of GC content and melting temperature. We demonstrate that duplicates not only introduce redundancy but can also lead to inconsistent results for certain tasks. Our findings lead to a better understanding of the problem of duplication in biological databases. Database URL: the merged records are available at
      PubDate: 2017-01-10
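The case study in this abstract measures the impact of duplicates via GC content and melting temperature. The abstract does not state which melting-temperature formula was used, so the sketch below assumes the simple Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is intended only for short oligonucleotides. The two sequences are hypothetical records of which one is a truncated copy of the other; they illustrate how records merged as duplicates can still yield different values.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases (N etc. ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def wallace_tm(seq: str) -> int:
    """Wallace rule Tm = 2*(A+T) + 4*(G+C) in degrees C; valid only for short oligos."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


# Two hypothetical records merged as duplicates: record_b is a truncated copy.
record_a = "ATGCGCGTATTAGC"
record_b = "ATGCGCGTATTA"
for name, seq in (("record_a", record_a), ("record_b", record_b)):
    print(name, round(gc_content(seq), 3), wallace_tm(seq))
# prints:
# record_a 0.5 42
# record_b 0.417 34
```

A downstream analysis that picked one record arbitrarily from such a group would report a GC content of either 0.5 or about 0.417, which is the kind of inconsistency the authors describe.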
  • FARME DB: a functional antibiotic resistance element database

    • Authors: Wallace JC; Port JA, Smith MN, et al.
      Abstract: Antibiotic resistance (AR) is a major global public health threat, but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences, derived from environmental samples, that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates under standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publicly available DNA sequences and predicted protein sequences conferring AR, as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistant genes. FARME is the first database to focus on functional metagenomic AR gene elements and provides a resource to better understand AR in the 99% of bacteria that cannot be cultured and the relationship between environmental AR sequences and antibiotic resistant genes derived from cultured isolates. Database URL:
      PubDate: 2017-01-10
  • KTCNlncDB—a first platform to investigate lncRNAs expressed in human
           keratoconus and non-keratoconus corneas

    • Authors: Szcześniak MW; Kabza M, Karolak JA, et al.
      Abstract: Keratoconus (KTCN, OMIM 148300) is a degenerative eye disorder characterized by progressive stromal thinning that leads to a conical shape of the cornea, resulting in optical aberrations and even loss of visual function. The biochemical background of the disease is poorly understood, which motivated us to perform an RNA-Seq experiment aimed at better characterizing the KTCN transcriptome and identifying long non-coding RNAs (lncRNAs) that might be involved in KTCN etiology. In silico functional studies based on predicted lncRNA:RNA base-pairings led us to identify a number of lncRNAs possibly regulating genes with known or plausible links to KTCN. The lncRNA sequences and data regarding their predicted functions in controlling RNA processing and stability are available for browsing, searching and downloading in KTCNlncDB, the first online platform devoted to the KTCN transcriptome. Database URL:
      PubDate: 2017-01-10
  • MAHMI database: a comprehensive MetaHit-based resource for the study of
           the mechanism of action of the human microbiota

    • Authors: Blanco-Míguez A; Gutiérrez-Jácome A, Fdez-Riverola F, et al.
      Abstract: The Mechanism of Action of the Human Microbiome (MAHMI) database is a unique resource that provides comprehensive information about the sequences of potential immunomodulatory and antiproliferative peptides encrypted in the proteins produced by the human gut microbiota. Currently, the MAHMI database contains over 300 million peptide entries, with detailed information about peptide sequence, sources and potential bioactivity. The reference peptide data section is curated manually by domain experts. The in silico peptide data section is populated automatically through the systematic processing of publicly available exoproteomes of the human microbiome. Bioactivity prediction is based on the global alignment of the automatically processed peptides with the experimentally validated immunomodulatory and antiproliferative peptides in the reference section. MAHMI provides researchers with a comparative tool for inspecting the potential immunomodulatory or antiproliferative bioactivity of new amino acid sequences and identifying promising peptides for further investigation. Moreover, researchers are welcome to submit new experimental evidence on peptide bioactivity, namely empirical and structural data, as a proactive, expert means to keep the database updated and improve the implemented bioactivity prediction method. Bioactive peptides identified by MAHMI have huge biotechnological potential, including the manipulation of aberrant immune responses and the design of new functional ingredients/foods based on the genetic sequences of the human microbiome. We hope that the resources provided by MAHMI will be useful to those researching gastrointestinal disorders of an autoimmune and inflammatory nature, such as inflammatory bowel diseases. The MAHMI database is routinely updated and is available free of charge. Database URL:
      PubDate: 2017-01-10
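MAHMI's bioactivity prediction is described as a global alignment of each processed peptide against the experimentally validated reference peptides. The abstract does not give the scoring scheme, so the sketch below assumes plain Needleman-Wunsch with a simple match/mismatch/gap scheme (+1/−1/−1) instead of a substitution matrix such as BLOSUM62. The reference names and sequences are hypothetical.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Keeps only two rows of the dynamic-programming matrix (O(len(b)) memory).
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned to gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)          # column 0: a aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def best_reference(query: str, references: dict) -> tuple:
    """Return (name, score) of the reference peptide with the highest global score."""
    return max(((name, nw_score(query, seq)) for name, seq in references.items()),
               key=lambda pair: pair[1])


# Hypothetical reference peptides; not actual MAHMI entries.
references = {"ref_a": "KLLKLLK", "ref_b": "GWWRRW"}
hit = best_reference("KLLKLK", references)
```

A real predictor would also need a score threshold (or a normalised identity) below which no bioactivity is assigned; the abstract does not specify one, so none is assumed here.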
  • RAIN: RNA–protein Association and Interaction Networks

    • Authors: Junge A; Refsgaard JC, Garde C, et al.
      Abstract: Protein association networks can be inferred from a range of resources including experimental data, literature mining and computational predictions. These types of evidence are emerging for non-coding RNAs (ncRNAs) as well. However, integration of ncRNAs into protein association networks is challenging due to data heterogeneity. Here, we present a database of ncRNA–RNA and ncRNA–protein interactions and its integration with the STRING database of protein–protein interactions. These ncRNA associations cover four organisms and have been established from curated examples, experimental data, interaction predictions and automatic literature mining. RAIN uses an integrative scoring scheme to assign a confidence score to each interaction. We demonstrate that RAIN outperforms the underlying microRNA-target predictions in inferring ncRNA interactions. RAIN can be operated through an easily accessible web interface and all interaction data can be downloaded. Database URL:
      PubDate: 2017-01-10
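The abstract mentions an integrative scoring scheme without giving its formula. Because RAIN is integrated with STRING, one plausible reading is a STRING-style combination that treats evidence channels as independent: each channel score is corrected for a prior probability of random association, the corrected scores are combined as 1 − ∏(1 − sᵢ), and the prior is added back. The sketch below assumes exactly that; the prior value and the channel scores are hypothetical.

```python
from math import prod


def combine_scores(channel_scores, prior=0.0):
    """Combine per-channel confidence scores in [0, 1], assuming independent channels.

    Each score is corrected for the prior probability of a random association,
    the corrected scores are combined as 1 - prod(1 - s), and the prior is added back.
    """
    if not 0.0 <= prior < 1.0:
        raise ValueError("prior must be in [0, 1)")
    corrected = [max(0.0, (s - prior) / (1.0 - prior)) for s in channel_scores]
    combined = 1.0 - prod(1.0 - s for s in corrected)
    return combined + prior * (1.0 - combined)


# Hypothetical channel scores for one ncRNA-protein pair (e.g. curated, experiments)
print(combine_scores([0.5, 0.5]))  # 0.75
```

Under this scheme two moderately confident, independent channels reinforce each other, and a single channel passes through unchanged; it would overstate confidence if the channels were in fact correlated (for example, text mining that re-reports the same curated experiments).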
  • The BioC-BioGRID corpus: full text articles annotated for curation of
           protein–protein and genetic interactions

    • Authors: Islamaj Doğan R; Kim S, Chatr-aryamontri A, et al.
      Abstract: A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free-text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized, computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the Collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full-text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein–protein interactions (PPIs) and genetic interactions (GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full-text articles representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, and 856 mentions of GIs and 399 annotations of GI evidence statements.
The purpose, characteristics and possible future uses of the BioC-BioGRID corpus are detailed in this report. Database URL:
      PubDate: 2017-01-10
  • FirebrowseR: an R client to the Broad Institute’s Firehose Pipeline

    • Authors: Deng M; Brägelmann J, Kryukov I, et al.
      Abstract: With its Firebrowse service, the Broad Institute is making large-scale multi-platform omics data analysis results publicly available through a Representational State Transfer (REST) Application Programming Interface (API). Querying this database through an API client from an arbitrary programming environment is an essential task, allowing other developers and researchers to focus on their analysis and avoid data wrangling. Hence, as a first result, we developed a workflow to automatically generate, test and deploy such clients for rapid response to API changes. Its underlying infrastructure, a combination of free and publicly available web services, facilitates the development of API clients. It decouples changes in server software from the client software by reacting to changes in the RESTful service and removing direct dependencies on a specific implementation of an API. As a second result, FirebrowseR, an R client to the Broad Institute’s RESTful Firehose Pipeline, is provided as a working example built by means of the presented workflow. The package’s features are demonstrated by an example analysis of cancer gene expression data. Database URL:
      PubDate: 2017-01-06
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327

JournalTOCs © 2009-2016