Archive for the ‘Community engagement’ tag
At the moment we are dealing with a high volume of daily email alerts caused by the increasing number of free accounts, some of which follow hundreds of journals. To protect the normal service of JournalTOCs, we are moving free accounts to a separate server and, from next week, a free account will be able to follow a maximum of 25 journals. Users registered with the free service of JournalTOCs are advised to follow no more than 25 journals and to remove any extra journals from their accounts. The new limit of 25 journals per account doesn’t apply to Premium users.
One year ago (21st March 2014 to be exact) we contacted Helen Duce, the Head of E-Publishing at Maney Publishing, because after Maney migrated to its new e-publishing platform, Atypon’s Literatum, JournalTOCs was unable to crawl the TOC RSS feeds of Maney’s journals.
JournalTOCs uses simple and effective RSS feeds to get the latest articles from over 25,000 journals, and it fetches them with a very basic form of the equally simple, but still effective, wget Unix command:
wget -O newtocs.tmp "journal-RSS-feed-URL" 2>&1
That is it: a wget that has nothing to hide and makes no attempt to use its rich options to force crawling.
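For readers who prefer a script to the command line, the same plain fetch can be sketched in Python. This is only an illustration of the idea, not the code JournalTOCs runs: the function name fetch_feed and its parameters are hypothetical, and the output file name simply mirrors the wget example.

```python
import os
import shutil
import urllib.request


def fetch_feed(feed_url: str, out_path: str = "newtocs.tmp", timeout: float = 30.0) -> int:
    """Download feed_url to out_path with one ordinary request; return bytes saved.

    Hypothetical helper: no custom headers, cookies or retry tricks are used,
    which is the point of the plain wget call it imitates.
    """
    with urllib.request.urlopen(feed_url, timeout=timeout) as response:
        with open(out_path, "wb") as out_file:
            # Stream the response body straight into the output file.
            shutil.copyfileobj(response, out_file)
    # Return the size so a caller can spot an empty download.
    return os.path.getsize(out_path)
```

If the server refuses the request, urlopen raises an exception (for example urllib.error.HTTPError), which is a clear signal that the feed is being blocked rather than being empty.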
As we can only communicate with the publishers, we couldn’t discuss the problem directly with Atypon. So, we contacted Maney many times. While Helen was very helpful, Atypon was telling Maney that everything was OK at their end, but we knew that we were being refused access to the RSS feeds.
Today, Helen gave us the good news that Maney have finally heard back from Atypon on this issue. It turns out that our IP range was blocked by Maney Online (Atypon) because of “abuse monitoring”: JournalTOCs was crawling content (RSS feeds), which Atypon flagged as abuse.
Fortunately the misunderstanding has been resolved. Atypon has recognised that crawling RSS feeds is not abuse. The very reason for having RSS feeds is to enable other services to crawl and reuse your feeds to facilitate the widest dissemination of your content, which at the end of the day benefits your business because it increases the number of visitors to your site.
We are glad to be able to access the RSS feeds of Maney again. We will restore the Maney journals that were selected by the JournalTOCs Index and start to update their TOCs. In the last year, usage (number of followers) of Maney’s journals has decreased at JournalTOCs, but we hope that once users see that Maney’s journals are being updated, they will start to follow Maney journals again.
Publishers that are changing platforms should check that their RSS feeds remain accessible to aggregators and discovery services. By working together, publishers, discovery services, aggregators and e-publishing platforms can have a positive impact on the dissemination of research.
“the success of these systems [link resolvers and knowledgebases] and services is ultimately dependent upon the cooperation of the various players across the supply chain of electronic resource metadata”
(van Ballegooie, Marlene (2015) Knowledgebases: The Cornerstone of E-Resource Management and Access. Serials Review 40(4) pp. 259-266. DOI: 10.1080/00987913.2014.977127)
(Update: Three months after this blog post was published, OA Publishing London removed the NOINDEX meta-tag from their RSS feeds. Now, all the journals currently being published online by OA Publishing London have been restored in JournalTOCs.)
Last week, JournalTOCs stopped indexing all of the 40 journals published by OA Publishing London because this publisher took the unusual and illogical measure of requesting aggregators not to index (aggregate) the RSS feeds for the current issues of its journals. Tables of Contents from the OA Publishing London journals will no longer be updated at JournalTOCs. Those who have been following any of the 40 journals will not be able to keep up with new issues.
Why would OA Publishing London want to stop aggregators and search engines from crawling and collecting its RSS feeds? Years ago, it might have made some sense to use the noindex meta-tag for RSS feeds, but nowadays there is no need to noindex such feeds. Google and other modern search engines can easily identify RSS feeds, and they act on that by not including RSS feeds in web search results.
Publishers should, in reality, very much want their RSS feeds to be indexed, because it helps aggregators and search engines to direct users to where the newest content is. Search engines are smart enough to understand the difference between a feed and a webpage, and they use the feed as a pointer to the webpage where the real source of the content resides. Allowing search engines to index RSS feeds is therefore an important way to drive traffic to the webpages of the actual content.
There is no scenario in which a publisher is not interested in having their latest content indexed. Older feed generators, such as the deprecated Feedburner, still provide users with the outdated option to noindex feeds to prevent them from being penalized by search engines. Publishers need to be reassured that this is no longer an issue, and that indexed feeds do not create penalty situations. Google itself will normally not show RSS feeds in search results.
The noindex meta-tag is not good for publishers. Any publisher who wants to enable RSS readers, aggregators and APIs to reuse details of their content should make sure to remove the noindex meta-tag from their RSS pages and from their software that generates RSS feeds.
The noindex meta-tag to be removed looks like this:
<meta name="robots" content="noindex">
This code tells search engines and aggregators that they should not index or crawl the content of the RSS feeds.
So, if you want the abstracts of your latest publications to be indexed by JournalTOCs, a search engine, an aggregator or any other web service, and thus ensure that hundreds of thousands of potential readers can discover your content, you should make sure you ARE NOT using the noindex meta-tag.
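A publisher who wants to check a feed or page for this tag could use something like the following Python sketch. It is a minimal illustration using only the standard library; the class and function names are hypothetical, and accepting a namespace-prefixed tag such as xhtml:meta is an assumption about how some feed generators embed the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Accept plain <meta> and prefixed forms such as <xhtml:meta> (assumption).
        if tag.split(":")[-1] != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        for directive in attr_map.get("content", "").split(","):
            self.directives.add(directive.strip().lower())


def has_noindex(markup: str) -> bool:
    """Return True if the markup carries a robots meta-tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(markup)
    parser.close()
    return "noindex" in parser.directives
```

For example, has_noindex('<meta name="robots" content="noindex">') returns True, while a page whose robots tag says "index, follow" returns False.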
The noindex meta-tag can help in search engine optimization (SEO) but it should be used wisely, rather than simply assuming that it’s always a good idea to use it. noindex should only be used for web pages you don’t want showing up in search results or want to hide from the external world, for example a test page, an archive page, or something similar that is not relevant for the publisher’s business. Such pages should have the noindex tag so that they don’t end up taking the place of the really important pages in search results: Google’s algorithm tends to avoid placing multiple links from the same domain on the front page, unless the website has a good ranking.
For optimal crawling, Google also recommends using RSS/Atom feeds.
RSS pages (feeds) are not only relevant pages; they are used by the search engines and aggregators to redirect users to your relevant webpages! They help to market your real content. They are good for everyone: readers, authors, end users and your business.
The University Library of Regensburg and JournalTOCs have completed the implementation of a collaboration agreement to include journal information from JournalTOCs in the Elektronische Zeitschriftenbibliothek (Electronic Journals Library, EZB), so that EZB users can access new journal TOCs from the EZB web pages. The new EZB service, including the links from JournalTOCs, was launched on 5th December 2013. The project benefits both parties: in exchange for free access to the JournalTOCs API, EZB provides feedback and tests new features developed for the API.
Every year, many journal titles are transferred between publishers, cease publication or have their URLs changed, and new titles are launched. JISC Collections estimated that over 3400 journal titles were transferred between publishers in the 2009-2011 period alone. JournalTOCs is able to keep track of those changes in a systematic or automated way. In particular, JournalTOCs can identify when the URL of a journal’s TOC RSS feed has been changed or removed, or when new TOC RSS feeds are made available. Thus, through its customised APIs, JournalTOCs constantly provides up-to-date journal metadata to research libraries and service partners such as EZB.
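One simple way to picture this kind of tracking is to compare two snapshots of feed URLs taken at different times. The Python sketch below is only an illustration under stated assumptions, not a description of how JournalTOCs does it: the function name, the use of a journal identifier (such as an ISSN) as the key, and the identifiers and URLs in the usage example are all hypothetical.

```python
from typing import Dict, List


def diff_feed_urls(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {journal_id: feed_url} snapshots (hypothetical structure).

    Returns the journal ids whose feed URL changed, disappeared, or is new.
    """
    shared = previous.keys() & current.keys()
    return {
        # Same journal in both snapshots, but the feed now lives elsewhere.
        "changed": sorted(j for j in shared if previous[j] != current[j]),
        # Journal had a feed before and has none now.
        "removed": sorted(previous.keys() - current.keys()),
        # Journal has a feed now that was not known before.
        "added": sorted(current.keys() - previous.keys()),
    }


if __name__ == "__main__":
    # Placeholder identifiers and URLs, for illustration only.
    last_month = {"0000-0001": "http://example.org/a/rss", "0000-0002": "http://example.org/b/rss"}
    this_month = {"0000-0001": "http://example.org/a/feed", "0000-0003": "http://example.org/c/rss"}
    print(diff_feed_urls(last_month, this_month))
```

Keying the snapshots on a stable identifier rather than on the journal title is a design choice made here so that a renamed journal is not reported as one removal plus one addition.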
EZB was founded in 1997 by the University Library of Regensburg, in Regensburg, Germany, with the aim of presenting e-journal content to library users in a clearly arranged one-stop user interface and of creating an efficient administration tool for e-journal licences for the EZB member libraries. Over 600 institutions from Germany are part of EZB, which is also used by subject libraries and information services. The EZB was a project sponsored by the German Federal Ministry of Education and Research, the Bavarian State and the German Research Foundation (DFG). Since 2010 all participating libraries have paid a small fee to keep the service going.
Prof. Rafael Ball, Director of the University Library Regensburg, said “We want to give our users more helpful data, so we would like to include the information of JournalTOCs. It would be possible e.g. to integrate the information of JournalTOCs with a symbol and a hint like ‘recent articles’ on the detail site of a journal in EZB. So our users would get the possibility to set a dynamic bookmark, if they want to; we hope to give them a new better benefit with this feature.”
JournalTOCs carries out systematic research into new types of integration of journal metadata, and develops new web services that enable institutions to benefit from the metadata collected by JournalTOCs. The core aim of this research is to ensure that other services can provide their end-users with tailored access to the latest literature published in scholarly journals. JournalTOCs is currently involved in research projects and collaborations. It highly values working with members of the research community and welcomes future opportunities for collaboration, particularly in the fields of:
- Metadata standards for systematic discovery of new research
- Integration of TOCs metadata within library services
- Identification and clustering of Open Access articles
You can get in touch with JournalTOCs at: firstname.lastname@example.org
OpenRefine (ex-Google Refine) is a powerful tool for working with big data, cleaning it, transforming it from one format into another, extending it with web services, and exploring large data sets with ease.
The JournalTOCs API is a RESTful web service that can provide access to the full dataset collected by the JournalTOCs Project since 2009. This dataset contains the metadata for over 22,000 journals and for more than two million articles published during that span of time.
Ted has used the RDF Refine extension for OpenRefine to link local data stored in VIVO as RDF with other sources on the Web. OpenRefine allowed him to query a reconciliation service to match local strings to entities from another source and the RDF Extension enabled him to export those entities as RDF.
Basically, Ted wanted to interlink the metadata describing the work of university researchers with the venues in which their research is published. Because JournalTOCs is a good source of metadata about academic journals and articles, he used a demo reconciliation service developed by Michael Stephens as a model, and put together a basic reconciliation service for the JournalTOCs data that queries the JournalTOCs API and translates the response into the format that OpenRefine expects. This service can be run locally and OpenRefine will query it just fine. Ted has open-sourced his code, which is available on GitHub, and it looks like a good option for librarians and researchers working with similar data sets.
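To give an idea of what such a translation step involves, here is a minimal Python sketch. It is not Ted’s code: the function name, the "issn" and "title" fields of the candidate records, the "/journal" type and the scoring by string similarity are all assumptions made for illustration. The shape of the output (a "result" list whose entries have id, name, type, score and match) follows the format that OpenRefine reconciliation services return.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical type descriptor attached to every candidate.
JOURNAL_TYPE = {"id": "/journal", "name": "Journal"}


def to_reconciliation_result(query: str, candidates: List[Dict[str, str]], limit: int = 5) -> Dict[str, list]:
    """Turn journal records into an OpenRefine-style reconciliation result.

    candidates: records assumed to carry 'issn' and 'title' keys, e.g. as
    extracted from an API response by the caller.
    """
    wanted = query.strip().lower()
    results = []
    for record in candidates:
        title = record["title"].strip()
        # Score 0-100 from plain string similarity (an illustrative choice).
        score = SequenceMatcher(None, wanted, title.lower()).ratio() * 100
        results.append({
            "id": record["issn"],
            "name": title,
            "type": [JOURNAL_TYPE],
            "score": round(score, 1),
            # Only an exact, case-insensitive title match is marked as certain.
            "match": wanted == title.lower(),
        })
    results.sort(key=lambda entry: entry["score"], reverse=True)
    return {"result": results[:limit]}
```

A service built around this function would wrap the returned dictionary under the key of each incoming query before sending it back as JSON.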
Developers can use the JournalTOCs API to embed JournalTOCs’ metadata and search functionality within their own web services. Anyone with access to an RSS reader can also benefit from the JournalTOCs API. Most JournalTOCs API calls are free and only require a simple registration process. The API responses are returned in RSS 1.0 format, which you can then parse and use in your own web application, RSS reader or institutional web page. Further information on the JournalTOCs API can be found here.
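As a rough illustration of the parsing step, the Python sketch below reads an RSS 1.0 document with the standard library. The sample feed is invented for the example and is not a real API response; the function name is hypothetical, and the use of the Dublin Core creator element is an assumption about what a feed may contain.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespaces used by RSS 1.0 (RDF-based) feeds and Dublin Core.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample, only to make the sketch self-contained.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.org/feed">
    <title>Example Journal</title>
    <link>http://example.org/</link>
    <description>Latest articles</description>
  </channel>
  <item rdf:about="http://example.org/a1">
    <title>First article</title>
    <link>http://example.org/a1</link>
    <dc:creator>A. Author</dc:creator>
  </item>
</rdf:RDF>"""


def parse_rss10_items(xml_text: str) -> List[Dict[str, str]]:
    """Return title, link and creator for each item of an RSS 1.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    # In RSS 1.0 the <item> elements are direct children of <rdf:RDF>.
    for item in root.findall("rss:item", NS):
        items.append({
            "title": item.findtext("rss:title", default="", namespaces=NS).strip(),
            "link": item.findtext("rss:link", default="", namespaces=NS).strip(),
            "creator": item.findtext("dc:creator", default="", namespaces=NS).strip(),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss10_items(SAMPLE):
        print(entry["title"], entry["link"])
```

Missing elements come back as empty strings rather than raising an error, which keeps the loop simple when feeds differ in the fields they provide.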
More Information on OpenRefine and JournalTOCs: