JournalTOCs Blog

News and Opinions about current awareness on new research

Archive for the ‘RSS feeds’ tag

The easy way: Dealing with large-scale move of RSS feeds from HTTP to HTTPS

A month ago, the world suffered a global cyberattack that the international press called the ‘biggest ransomware’ offensive in history. Although the attack was reported to rely on phishing (hackers spread ransomware called WannaCry by tricking email users into opening attachments that released malware onto their systems), companies and organisations responded by implementing every security measure available to them. One of the most common measures adopted by journal publishers was to switch every webpage from HTTP to HTTPS (the secure protocol) so that their content is encrypted while it travels over the net.

While using HTTPS for every webpage, including pages that do not contain sensitive information, may seem an exaggerated and disputable measure, it is one of the quickest and most efficient ways to protect a website. However, it has had an unintended effect on the RSS feeds that journals use to announce their new content. Because all these URLs have changed, people who manually added the previous URLs to feed readers are finding that those feeds are out of date and no longer provide the latest Tables of Contents. Even in popular RSS reader services such as Netvibes, the previous feed URLs are not working.

It is up to individuals whether they load RSS feeds into their own readers, but if a feed URL changes they will need to update that feed manually. The benefit of using an aggregation service such as JournalTOCs is that we constantly maintain our database of feeds to ensure that we link only to the latest ones and that the content displayed in JournalTOCs is up to date. In the past couple of weeks we have updated thousands of feeds, using manual and automated methods, and this work continues. In essence, JournalTOCs does the work so that you don’t have to.
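For readers who maintain their own feed lists, the kind of automated check mentioned above might look like the following. This is only a minimal sketch and not the method JournalTOCs actually uses: the function names to_https and upgrade_feed and the file feeds.txt are hypothetical, and it assumes that a publisher's HTTPS feed lives at the same host and path as the old HTTP one, which will not always be true.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an http:// feed URL to https://.
# URLs that already use https (or any other scheme) are printed unchanged.
to_https() {
    case "$1" in
        http://*) printf 'https://%s\n' "${1#http://}" ;;
        *)        printf '%s\n' "$1" ;;
    esac
}

# Hypothetical checker: print the HTTPS variant if the server answers it,
# otherwise print the original URL so it can be reviewed by hand.
# wget's --spider mode requests the URL without saving the response body.
upgrade_feed() {
    candidate=$(to_https "$1")
    if wget -q --spider "$candidate"; then
        printf '%s\n' "$candidate"
    else
        printf '%s\n' "$1"
    fi
}

# Example (needs network access; feeds.txt is an assumed file, one URL per line):
#   while IFS= read -r url; do upgrade_feed "$url"; done < feeds.txt
```

Feeds for which the HTTPS guess fails are left untouched, so they still need the manual attention described above.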

Written by Santiago Chumbe

June 23rd, 2017 at 2:26 pm

At last we got “200 OK” from Atypon for Maney

wget for Maney journal RSS feeds

One year ago (21st March 2014 to be exact) we contacted Helen Duce, the Head of E-Publishing at Maney Publishing, because after Maney migrated to its new e-publishing platform, Atypon’s Literatum, JournalTOCs was unable to crawl the TOC RSS feeds of Maney’s journals.

JournalTOCs relies on simple, effective RSS feeds to get the latest articles from over 25,000 journals. To fetch them, it uses a very basic form of the equally simple, but still effective, wget Unix command:

wget -O newtocs.tmp "journal-RSS-feed-URL" 2>&1

That is it: a plain wget that has nothing to hide and does not use any of its rich options to force crawling.
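Because the command redirects stderr with 2>&1, wget's log, including the HTTP status line, can be captured and inspected. The sketch below shows one way to tell a “200 OK” from a refusal. It is illustrative only: the function names http_status and fetch_feed are hypothetical, and it assumes wget's English log format ("HTTP request sent, awaiting response... 200 OK"), which is why the wrapper sets LC_ALL=C.

```shell
#!/bin/sh
# Hypothetical helper: read a wget log on stdin and print the last
# three-digit HTTP status code it reports (the last one matters when
# the server redirects before answering).
http_status() {
    sed -n 's/.*awaiting response\.\.\. \([0-9][0-9][0-9]\).*/\1/p' | tail -n 1
}

# Hypothetical wrapper around the command shown above: fetch one feed
# into newtocs.tmp and print the status code, or "unknown" if wget
# never received an HTTP response (for example, on a DNS failure).
fetch_feed() {
    log=$(LC_ALL=C wget -O newtocs.tmp "$1" 2>&1)
    status=$(printf '%s\n' "$log" | http_status)
    printf '%s\n' "${status:-unknown}"
}

# Example (needs network access):
#   fetch_feed "journal-RSS-feed-URL"
```

A status other than 200 for a feed that used to work is a signal to contact the publisher rather than to add options that force the crawl.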

As we can only communicate with the publishers, we couldn’t discuss the problem directly with Atypon. So, we contacted Maney many times. While Helen was very helpful, Atypon was telling Maney that everything was OK at their end, but we knew that we were being refused access to the RSS feeds.

Today, Helen gave us the good news that Maney has finally heard back from Atypon on this issue. It turns out that our IP range was blocked by Maney Online (Atypon) because of “abuse monitoring”: JournalTOCs was crawling content (RSS feeds), and Atypon flagged that crawling as abuse.

Fortunately, the misunderstanding has been resolved. Atypon has recognised that crawling RSS feeds is not abuse. The very reason for having RSS feeds is to enable other services to crawl and reuse them, which gives your content the widest dissemination and, at the end of the day, benefits your business by increasing the number of visitors to your site.

We are glad to be able to access the RSS feeds of Maney again. We will restore the Maney journals that were selected by the JournalTOCs Index and start to update their TOCs. In the last year, usage (number of followers) of Maney’s journals has decreased at JournalTOCs, but we hope that once users see that Maney’s journals are being updated, they will start to follow them again.

Publishers that are changing platforms should check that their RSS feeds remain accessible to aggregators and discovery services. By working together, publishers, discovery services, aggregators and e-publishing platforms can have a positive impact on the dissemination of research.

“the success of these systems [link resolvers and knowledgebases] and services is ultimately dependent upon the cooperation of the various players across the supply chain of electronic resource metadata”
(van Ballegooie, Marlene (2015) Knowledgebases: The Cornerstone of E-Resource Management and Access. Serials Review 40(4) pp. 259-266. DOI: 10.1080/00987913.2014.977127)

Written by Santiago Chumbe

March 13th, 2015 at 1:02 pm