JournalTOCs Blog

News and Opinions about current awareness on new research

Archive for the ‘JOURNALTOCSAPI’ tag

Tackling the “Who are you?” question

In December 2009 it was announced that an impressive number of important organisations had founded the ORCID Initiative (Open Researcher Contributor Identification Initiative). If this initiative achieves its goal, ambiguous referencing of authors will become a problem of the past.

No wonder the initiative has raised high expectations. Who doesn’t want to avoid confusion when identifying authors? Basically, ORCID proposes to create a central Researcher Registry System in which each individual has a unique identifier linked to that individual’s research output. That is of interest to journalTOCs, where the emphasis is on linking articles with their legitimate authors.

Thomson Reuters and Nature Publishing are the main forces behind ORCID, and they are bringing their ResearcherID index and Nature Network linking services into the ORCID infrastructure. However, the participation of CrossRef means taking on board the concepts of Contributor ID, and we would not be surprised to see CrossRef running the ORCID service in the end. Will that be “à la DOI”, or “à la CrossCheck” (where iThenticate maintains the database and provides the software tools), with Thomson Reuters this time?

How much of Contributor ID will be in ORCID? What about the nice things (for the publishers) that Contributor ID promised, such as helping with the manuscript submission process? (Although the burden was put on the authors’ shoulders: no ID, no publish.)

Free services such as the journalTOCs API would certainly benefit from the establishment of unique identifiers for authors. But that is only one side of the coin. If the publishers do not include the author IDs in their RSS feeds, we will still be unable to do some interesting things, for example identifying new papers for Institutional Repositories. ORCID will be based on ResearcherID software (a proprietary product), but we expect that it will provide appropriate APIs to allow any external web application to query the ORCID database. Otherwise, ORCID will not be as good as it seems to be.

ORCID is mainly backed by the publishing industry (CrossRef is owned by this industry). Perhaps that is better. At least ORCID seems able to bring together all the main commercial initiatives so far, such as Elsevier (Scopus Author Identifier) and ProQuest (Author Resolver). Anyway, what non-commercial alternatives do we have? Many public efforts got stuck at the starting point, so it’s good that the commercial world is trying to come up with a workable solution, isn’t it?

OpenID is not relevant in this context because OpenID is about authentication. It doesn’t solve the “attribution of work” issue, nor is it free from the curse of link rot.

Is the JISC-funded Names Project going to be an alternative to ORCID? No. Names is a national-level JISC-funded project with MIMAS and the British Library as project partners. The British Library is a member of ORCID, so we expect that Names will collaborate with ORCID by providing expertise on the requirements, data formats and processes involved in identifying author names. However, Names could become a national service, like FRIDA (Norwegian National Research Database), DissOnline (Germany’s national dissertations database) or People Australia. The Names Project is also creating a database of authors in order to test the service prototype that the project is developing. Currently the Names API is operational, but search queries always return zero hits, so it seems that the database is practically empty. In addition, there is no information on the Names project web site about APIs that would allow external data sources to upload data to the Names database.

And how does the “Web of Data” paradigm fit into these efforts to enable machines to unambiguously identify individuals on the web? The subject is beyond journalTOCs’ remit and is being investigated, at the institutional level, by the new JISC-funded WattNames project.

In summary, there is plenty of agreement that this sort of author ID system is overdue. This is not an easy problem to solve, and even if ORCID has its issues and shortcomings, an imperfect solution seems far better than the current situation.

Written by Santiago Chumbe

February 5th, 2010 at 6:51 pm

JournalTOCs breaks the 13,000 journals barrier

The JournalTOCs Directory underwent a full upgrade and maintenance in the past week with the aim of restoring unresolvable TOC RSS feeds, such as outdated or broken RSS links, or URLs that changed when journals were transferred between publishers. Additionally, new journals that were recently approved by the JournalTOCs team or were found in the OPML files of registered publishers have been added to the directory.

This successful expansion has brought the number of journals with TOC RSS feeds in the journalTOCs Directory to a total of 13,340 journals(*). This is the first time that we can offer the latest research published in more than 13,000 scholarly journals. The journal titles and the articles of these TOCs are searchable from the web (https://www.journaltocs.ac.uk) or via the journalTOCs API (https://www.journaltocs.ac.uk/api), as shown in the following examples (the search results are returned in enriched RSS 1.0 format):

To retrieve the details of a journal with e-ISSN 1741-9212 use:
https://www.journaltocs.ac.uk/api/journals/1741-9212

If you want to retrieve the articles published in the latest issue of this journal, just add ?output=articles to the end of the previous URL:
https://www.journaltocs.ac.uk/api/journals/1741-9212?output=articles

More examples and further information on the API can be found at https://www.journaltocs.ac.uk/docs/index.php

(*) 13,340 is the maximum number of journals with valid TOC RSS feeds currently stored in our directory. However, the exact number of journal TOC RSS feeds available at JournalTOCs at any time is variable, as it depends on the number of RSS feeds that were successfully syndicated from the publishers’ sites by our harvester, which runs at scheduled times every day.

Written by Santiago Chumbe

January 25th, 2010 at 12:41 pm

Final Progress Post

Title of Primary Project Output: JournalToCsAPI: An API to search current issues of journals for up-to-date content.

Screenshots or diagram of prototype:

Diagram of prototype

Use Cases:

Project Use cases

Description of Prototype:
JournalTOCs is a prototype of a web API(*) that uses the REST software architecture style to search a directory of journals and articles obtained directly from the publishers’ websites by aggregating their TOC RSS feeds (TOC: Table of Contents). The API produces search results in RSS 1.0 web feed format.

The API has four “calls”: journals, articles, user and institution. A “call” is a URL consisting of a base URL, the name of the call and the search query.

The JournalTOCs API base URL is “https://www.journaltocs.ac.uk/api”. To use the API you combine the base URL with the name of one call. Most calls require a search query and will not work without one. Guidelines for using each of the calls are given in the technical documentation (https://www.journaltocs.ac.uk/docs/).
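As a rough illustration of how a call is assembled, here is a minimal Python sketch (the API itself is written in PHP; Python is used here only for illustration). The helper name `build_call_url` and the decision to percent-encode the query are assumptions made for this sketch, not part of the API. The URL shape follows the journals examples published on this blog, and other calls may expect their queries in a different form, so check the technical documentation before relying on it.

```python
from urllib.parse import quote

BASE_URL = "https://www.journaltocs.ac.uk/api"

# The four calls named in this post.
CALLS = ("journals", "articles", "user", "institution")


def build_call_url(call, query, output=None):
    """Combine the base URL, a call name and a search query (hypothetical helper)."""
    if call not in CALLS:
        raise ValueError(f"unknown call: {call!r}")
    if not query:
        raise ValueError("most calls need a search query")
    # Assumption: the query is a single, percent-encoded path segment.
    url = f"{BASE_URL}/{call}/{quote(query, safe='')}"
    if output:
        url += f"?output={quote(output, safe='')}"
    return url


print(build_call_url("journals", "1741-9212"))
print(build_call_url("journals", "1741-9212", output="articles"))
```

The two printed URLs correspond to a journal look-up by e-ISSN and to the same look-up with `?output=articles` appended.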

(*) API is an abbreviation of Application Programming Interface. An API is a software interface that enables interaction between two software applications.

End User of Prototype:
The end-user for this API is a developer wanting to combine journal TOC RSS feeds with multiple services into new applications known as mashups. The API is written in PHP and uses MySQL as its back-end database system. You do not need an account to use the API, which is free for anyone to use.

When your application points to the API base URL https://www.journaltocs.ac.uk/api, the API returns a brief description of how to use the API. The description is found in the <description> element of the single item encoded in the RSS response. The following screen shows how the response is presented in a browser.

[Screenshot: API base URL]

Below is an example showing how to use the API from a PHP script, to give you an idea of how to use the API calls.

[Image: PHP example]

If you print the content of $xmlRSS1, you will see that it is an XML file where each article of the search results is included in an <item> element as shown in the following screenshot.

[Screenshot: API output showing the item content]
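For readers who prefer text to screenshots, the following Python sketch shows one way to walk through the <item> elements of an RSS 1.0 response. The sample XML is made up for illustration and is not real API output. The function name `parse_items` is hypothetical, and the assumption that authors arrive as Dublin Core `dc:creator` elements may not hold for every feed.

```python
import xml.etree.ElementTree as ET

# Namespaces used by RSS 1.0 (RDF-based) feeds and Dublin Core.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response, for illustration only.
SAMPLE_RSS = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="https://example.org/search">
    <title>Example search results</title>
    <link>https://example.org/search</link>
    <description>Two made-up articles</description>
  </channel>
  <item rdf:about="https://example.org/articles/1">
    <title>First made-up article</title>
    <link>https://example.org/articles/1</link>
    <description>Abstract of the first article.</description>
    <dc:creator>A. Author</dc:creator>
    <dc:creator>B. Writer</dc:creator>
  </item>
  <item rdf:about="https://example.org/articles/2">
    <title>Second made-up article</title>
    <link>https://example.org/articles/2</link>
  </item>
</rdf:RDF>"""


def parse_items(xml_text):
    """Return one dict per <item> element of an RSS 1.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("rss:item", NS):
        items.append({
            "title": item.findtext("rss:title", default="", namespaces=NS),
            "link": item.findtext("rss:link", default="", namespaces=NS),
            "description": item.findtext("rss:description", default="", namespaces=NS),
            # Assumption: authors are given as repeated dc:creator elements.
            "creators": [c.text for c in item.findall("dc:creator", NS)],
        })
    return items


for article in parse_items(SAMPLE_RSS):
    print(article["title"], "|", ", ".join(article["creators"]))
```

Missing elements come back as empty strings or empty lists, which keeps the calling code simple when a publisher's feed is incomplete.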

Link to working prototype:
https://www.journaltocs.ac.uk/api

Link to end user documentation:
https://www.journaltocs.ac.uk/API

Link to code repository:
https://journaltocsapi.sourceforge.net

Link to technical documentation:
https://www.journaltocs.ac.uk/docs/

Date prototype was launched:
– Beta Version Released 28th November 2009
– Alpha Version Released 23rd September 2009

Project Team Names, Emails and Organisations:
Roger Rist – Project Director r.j.rist@hw.ac.uk
Santy Chumbe – Project Manager s.chumbe@hw.ac.uk
Lisa Rogers – Project Officer l.j.rogers@hw.ac.uk
ICBL, Heriot-Watt University

Project Blog:
https://www.journaltocs.ac.uk/API/blog/

PIMS entry:
https://pims.jisc.ac.uk/projects/view/1390

Table of Content for Project Posts:

  1. Welcome to the journalTOCsAPI Project blog
  2. JournalTOCsAPI Project
  3. OAI-PMH instead of RSS feeds for Use Case 2?
  4. Community engagement: A special invitation
  5. Do we need a “best practice” for generating RSS’s URLs for IR search results?
  6. Community Engagement: Response to Invitation
  7. Methods of Engaging with JournalTOCsAPI Project
  8. Preparing the framework for our RESTful API
  9. Strengths, Weaknesses, Opportunities and Threats analysis
  10. Clarification of Use Cases
  11. Use Cases and Prerequisite Data
  12. Alpha Release of JournalTOCs API
  13. How do you want to be alerted?
  14. User Feedback (1-2 Development Cycle) – I
  15. User Feedback (1-2 Development Cycle) – II
  16. Author Affiliation
  17. The ticTOCs Best Practice Recommendation has been released
  18. Presentation at EUROCRIS
  19. journalTOCs API Project Workshop
  20. JournalTOCs Workshop: Presentation 1 – Introduction and Feedback
  21. JournalTOCs Workshop: Presentation 2 – Repositories and Alert Services
  22. JournalTOCs Workshop: Presentation 3 – Testing the First Use Case
  23. JournalTOCs Workshop: Presentation 4 – Bibliosight Project
  24. JournalTOCs Workshop: Presentation 5 – The Other Side of The Interface
  25. JOURNAL TOCS API Beta 1 Released
  26. JournalsTOCS API Technical Documentation
  27. JournalTOCs Workshop: Presentation 6 – TechXtra and TechJournalContents
  28. JournalTOCs Workshop: Presentation 7 – JournalTOCs in a CRIS
  29. Measuring the usefulness and effectiveness of the API: A retrospective view of prototyping the use cases
  30. Demonstrations of Using the JournalTOCs API

Written by lisa

December 11th, 2009 at 6:58 pm

Demonstrations of Using the JournalTOCs API

I have created screen casts for each of the 4 JournalTOCs API calls. For more information about each call, please see the Technical Documentation.

Journals
The following screen cast demonstrates 7 examples of using the Journals API call.


Technical Documentation for Journals call

Articles
The following screen cast demonstrates 4 examples of using the Articles API call.


Technical Documentation for Articles call

User
The following screen cast demonstrates two examples of using the User API call.


Technical Documentation for User call

Institution
The following screen cast shows two examples of using the Institution API call and compares the results from the Articles API call.


Technical Documentation for Institution call

Written by lisa

December 11th, 2009 at 6:50 pm

Measuring the usefulness and effectiveness of the API: A retrospective view of prototyping the use cases

The project identified two use cases in the context of helping Institutional Repository (IR) managers to ensure that their content is complete and up-to-date. The first Use Case addressed the need for IR managers to gather articles for the IR as they are published. The second Use Case looked into the need for IR managers to be alerted when deposited “submitted” articles have been published in scholarly journals. The project developed and prototyped a lightweight RESTful API to solve or alleviate both cases by making use of content that is already freely available, namely journal TOC RSS feeds.

The first Use Case was tested using information provided by the British Geological Survey repository NORA (NERC Open Research Archive) and by the University of Warwick repository WRAP (Warwick Research Archives Project). In the case of the WRAP repository only data from the Department of History was used. The methodology used for testing this use case was presented in the project workshop and made available in the JournalTOCs Workshop: Presentation 3 – Testing the First Use Case blog post.

Basically the methodology involves two kinds of searches: one “batch” search and one set of “search by keywords” queries (the keywords are terms extracted from the institution name). The batch process, which combines searches by author, institution and subject, needs to be configured in advance and run offline. The search by keywords is done online and doesn’t require any previous configuration. The analysis of the results shows that only 28% of the articles returned by the batch search were positive results (articles that were really authored by researchers from the institution). On the other hand, 52% of the results produced by the best combinations of terms in the search by keywords approach were positive results. (Interestingly, for the NORA case, the extra effort of running a batch process identified only two more authors than the quick search by keywords.)

From the results obtained for the first Use Case, we consider searching by keywords to be the most suitable option, despite it producing only 50% positive results on average. The “batch” search does not justify the effort it demands from the IR manager and the API developer. It requires a separate setup for each repository. This setup is time-consuming for the IR manager, who needs to identify the authors and the subjects that are relevant to the IR. Some IR managers have told us that they may not even be able to get a list of authors for their own institutions.

However, the main reason why the “batch” approach, and in general any search by author, fails is that the API is unable to unambiguously identify authors and their affiliations from the TOC RSS feeds. This is a problem beyond JournalTOCs’ capabilities. Our project has only confirmed the emerging need for a means of uniquely and reliably identifying authors. We believe that the correct identification of authors will enhance the effectiveness of our API and, in general, enable proper discovery and reuse of research output. It is encouraging to know that the extremely difficult task of correctly associating research output with its legitimate authors is being carried out by the Names Project at the national level. Based on this evidence, it is not worth running a “batch” search based on authors’ names. (The problem could also be alleviated if publishers implemented the ticTOCs recommendations and included authors’ affiliations in their journal TOC RSS feeds.)

The outputs obtained from this Use Case suggest that integrating the API results directly into the repository workflow will not be possible until authors can be unambiguously identified. What the IR manager can do is use the API to set up an RSS feed tailored to the institution, based on searching by keywords taken from the institution’s name. In this way the API would alert the IR manager when new articles that include the institution’s name, or similar names, are published online.
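The keyword approach can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the stop-word list, the helper names, and the idea that the articles call accepts space-separated keywords as a single percent-encoded path segment. The real query syntax is described in the technical documentation.

```python
from urllib.parse import quote

BASE_URL = "https://www.journaltocs.ac.uk/api"

# Hypothetical stop words: generic terms that say little about one institution.
STOP_WORDS = {"the", "of", "and", "for", "university", "institute", "college"}


def institution_keywords(name):
    """Extract candidate search keywords from an institution name."""
    words = [w.strip(",.()").lower() for w in name.split()]
    return [w for w in words if w and w not in STOP_WORDS]


def keyword_feed_url(name):
    """Build an articles-call URL that could be subscribed to as an RSS feed."""
    query = " ".join(institution_keywords(name))
    # Assumption: keywords go into the URL path after the call name.
    return f"{BASE_URL}/articles/{quote(query, safe='')}"


print(institution_keywords("University of Warwick"))
print(keyword_feed_url("British Geological Survey"))
```

In practice the IR manager would try several combinations of terms and keep the one that gives the best share of positive results.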

In the second Use Case we aimed to alert IR managers when submitted articles had been published. (In this context a “submitted” article is an article that has been submitted to a scholarly journal, and in some cases accepted by the peer-review process, but not yet published.) Using sources from Sherpa/RoMEO we created a local directory of 108 repositories, most of them from the UK, including details of their OAI servers and RSS feeds. Our first approach was to set up a process to periodically collect and analyse the RSS feeds produced by the repositories. It quickly became evident to us that those RSS feeds were not suitable sources for our work. The problems found in these RSS feeds are discussed in detail in the ‘Do we need a “best practice” for generating RSS’s URLs for IR search results?’ blog post.

Our second approach to the second Use Case was to use OAI-PMH to harvest the IR OAI servers and thus identify recently deposited articles from the repositories. The first harvesting uncovered interesting findings. First of all, the OAI repositories were not using a standard way to identify or categorise “submitted” articles, even among repositories using the same software platform. Therefore, there was no way to tell for sure whether an article was in fact a “submitted” one. Secondly, we ran a quick survey among 20 IR managers from a sample of harvested IRs. None of them let authors deposit submitted articles directly in their repositories. Most of these managers were only taking published articles, which made the distinction between submitted and published articles almost meaningless.

Having not succeeded in identifying “submitted” articles, we decided to apply the look-up tool to each article found in the repository. (This approach was only tested with two repositories and there is no evidence to suggest that it is a scalable solution, even though, at the present time, repositories have only a few thousand records.) Two new obstacles were identified when matching against the complete content of the repositories that we harvested using OAI-PMH. The first was the low number of positive results obtained by this method, and the second was the inability to reliably identify new records from the OAI servers. The two IR managers informed us that using only the title of the article to match harvested articles with the metadata collected from the RSS feeds was not giving enough positive results. Adding the keywords, the abstract and the authors (if available) to the search query only increased the number of false positives. In addition, automatically identifying new records in an OAI repository was a challenging task due to inconsistencies in how the repositories catalogued the fields that were supposed to identify new records and the dates when updates were made.

In conclusion, the second Use Case produced relevant results only when the IR manager manually sent search queries to the API, these queries included specific keywords taken from the title of the article, and the results were filtered by the journal title. In these cases there is a high chance of obtaining either positive results or null results (the number of negative results is always much smaller than the number of positive results). However, the second Use Case has again highlighted the need for access to rich metadata to uniquely and unambiguously identify authors.
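The combination that worked (specific title keywords plus a journal-title filter) can be illustrated with a small Python sketch. This is not the project's look-up tool. The record layout, the function names and the rule of keeping words of four or more letters as "specific" keywords are all assumptions made for this example, and the data is made up.

```python
import re


def normalise(text):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def title_keywords(title, min_length=4):
    """Keep the longer words of a title as its 'specific' keywords (assumption)."""
    return {w for w in normalise(title).split() if len(w) >= min_length}


def match_published(record, feed_items):
    """Return feed items from the same journal whose title has all keywords."""
    keywords = title_keywords(record["title"])
    journal = normalise(record["journal"])
    hits = []
    for item in feed_items:
        if normalise(item["journal"]) != journal:
            continue  # filter by journal title first
        if keywords and keywords <= set(normalise(item["title"]).split()):
            hits.append(item)
    return hits


# Made-up data for illustration only.
record = {"title": "Glacial Sediments in Northern Scotland: a Review",
          "journal": "Journal of Made-Up Geology"}
feed_items = [
    {"title": "Glacial sediments in northern Scotland - a review",
     "journal": "Journal of Made-up Geology"},
    {"title": "Glacial sediments in northern Scotland: a review",
     "journal": "Another Journal"},
    {"title": "Sediments of southern Scotland",
     "journal": "Journal of Made-Up Geology"},
]

for hit in match_published(record, feed_items):
    print(hit["title"])
```

Filtering by journal first removes look-alike titles from other journals, which is consistent with the observation that such queries tend to return either positive results or nothing at all.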

In general the most pressing concerns of repository managers were to get content for their repositories in the first place and then to have high quality metadata. Even with the limitations mentioned in the previous paragraphs, the API has shown that it can still assist in both those aims, as expressed in the feedback sent to the project by the majority of IR managers who have tested the prototype. The users have also appreciated the ability of the API to process heterogeneous and incomplete metadata to produce reusable, consistent and “clean” metadata on current publications.

Interestingly, new use cases for the API were identified by the users themselves. In the following paragraphs, we briefly mention some of these use cases or potential spin-offs.

1. Providing relevant metadata to Research Information (RI) systems. Representatives from ATIRA, a Danish software company that markets the PURE RI system, approached the project to ask us to adjust some of the API’s calls to support two functionalities of PURE: (1) to automatically complete a journal’s metadata when the user is cataloguing a new article with PURE and (2) to provide cataloguers with an additional or alternative source of bibliographic references, alongside other data sources such as Web of Science, Scopus and BioMed Central.

2. Sherpa/RoMEO is interested in using the API to link journal titles and ISSNs to their publishers. Peter Millington, the SHERPA Technical Development Officer, found that the data returned by the API was very useful and easy to use. However, he identified the following functionality issues: (1) The API doesn’t support all the types of journal title query that RoMEO offers and needs (e.g. “contains”, “starts”, “exact phrase” queries). (2) Some keywords that the API ignores in order to support queries made by IR managers are needed for RoMEO queries. The exclusion of stop words such as “journal” is particularly unhelpful in this respect. (3) RoMEO has also asked us to implement a new call to support queries on publisher names and get back a list of their journals.

3. Expanding the “user” call to get back a list of articles per user. The API is able to search by the email address of a registered user and to return the list of journals that the user has added to their MyTOCs folder. The call is being used by a large number of different types of users (e.g. librarians, students, researchers, etc.). Some of these users have asked us to expand the functionality of this call so that users can request a list of articles in addition to the default option of returning a list of journals.

4. Using the API to provide library users with the capability of searching for the latest articles published in most of the journals for which the University has current subscriptions. That means that the user will always be able to access the full text of the articles returned in the search results. This application was requested by the institution leading the project, Heriot-Watt University. The API should be able to inter-operate with A-Z journal lists, link resolvers and off-campus access control mechanisms such as EZproxy. In addition, users will be given the option to obtain their search results in RSS format. The library is keen to use the free service offered by the API because the library will not need to transfer its holdings to any database external to the library or to modify its current database systems in order to use the API. Any UK University would benefit from the development of this API application. The only requirement is that the API is given restricted but sufficient HTTP access to the library database holding its current journal subscriptions.

5. Embedding search results in subject-based Current Awareness services. The “institution” call has also highlighted a new use case or area of application for the API. This application has already attracted a lot of attention from the community of students and academics in Engineering, Computing and Mathematics since TechXtra launched its new service TechJournalContents, which is fully based on the API. TechXtra is a free service providing access to research, learning and teaching resources in engineering, mathematics and computing. The brand new service TechJournalContents was well received by TechXtra users and has already been mentioned in more than 50 relevant blogs. We would like to enhance the API subject classification database to support other subject-based services.

A final thought from the project is that each of the above use cases, and in general any service based on reusing journal TOC RSS feeds, will benefit greatly from any effort that publishers make to implement the ticTOCs Metadata Recommendations and the project recommendation outlined in the Author Affiliation blog post. Publishers need to realise that the effort required is very small compared with the benefits brought by reusable TOC RSS feeds, in particular for their own business and for the research community in general. The question of how to “convince” publishers to produce valid, consistent and rich journal TOC RSS feeds remains unsolved.

Written by Santiago Chumbe

December 11th, 2009 at 5:25 pm