Journal of Information Science
Journal Prestige (SJR): 0.674
Citation Impact (CiteScore): 2
 
  Hybrid journal (it can contain Open Access articles)
ISSN (Print) 0165-5515 - ISSN (Online) 1741-6485
Published by Sage Publications
  • Do online reviews have different effects on consumers’ sampling behaviour across product types? Evidence from the software industry
    • Authors: Shengli Li, Fan Li, Shiyu Xie
      Abstract: Journal of Information Science, Ahead of Print.
      Previous research shows that online reviews may have different effects for search goods and experience goods. However, although software is a typical experience good, it can be further divided into different categories based on product characteristics. Little research has examined how the effects of online reviews differ across types of software. Furthermore, offering free samples is another common practice that software firms use to alleviate consumer uncertainty prior to purchase. To fill this gap, this research focuses on the interaction effects between online reviews and free samples for different types of software. Through our empirical analysis, we find that user ratings significantly increase consumers’ sample downloads. Furthermore, consumers download more samples for some categories than for others. Finally, user and editor ratings might have differential effects for different types of software.
      Citation: Journal of Information Science
      PubDate: 2021-03-08T08:46:37Z
      DOI: 10.1177/0165551520965399
       
  • Research on differential and interactive impact of China-led and US-led open-access articles
    • Authors: Wei Mingkun, Quan Wei, Sadhana Misra, Russell Savage
      Abstract: Journal of Information Science, Ahead of Print.
      With the development of Web 2.0, social media dialogue has become increasingly important within the world of open access (OA), striving for more user-generated content and ease of use. In this article, we analysed the impact of OA articles published in PLOS ONE by Chinese and American researchers. Papers published in the same year were used to analyse the correlation between the level of social media metrics and citations. Overall, the impact of OA articles published in the United States is higher than that of OA articles published in China. The results showed that citations and the number of Mendeley readers have a significant correlation, which reflects a similar impact in the evaluation of OA articles. However, most social media metrics did not have an obvious correlation with impact evaluation, which indicates that social media metrics are useful when paired with citations but cannot replace them. Social media metrics appear to be a useful alternative metric for accurately reflecting the impact of OA articles within the scientific community.
      Citation: Journal of Information Science
      PubDate: 2021-03-04T06:11:01Z
      DOI: 10.1177/0165551521998637
       
  • Improvements for research data repositories: The case of text spam
    • Authors: Ismael Vázquez, María Novo-Lourés, Reyes Pavón, Rosalía Laza, José Ramón Méndez, David Ruano-Ordás
      Abstract: Journal of Information Science, Ahead of Print.
      Current research has evolved in such a way that scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure that those results can be reproduced and compared with those obtained through other approaches. In this context, public data sets (sometimes shared through repositories) are one of the most important elements for the development of experimental protocols and test benches. This study has analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. In particular, we identify and discuss the following much-needed functionalities for repositories: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques using dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository functionalities and (4) providing protection mechanisms for licencing issues and user rights. To demonstrate this functionality, we created the STRep (Spam Text Repository) web application, which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep at https://rdata.4spam.group to help readers understand this study.
      Citation: Journal of Information Science
      PubDate: 2021-03-03T07:52:41Z
      DOI: 10.1177/0165551521998636
       
  • Multi-thread hierarchical deep model for context-aware sentiment analysis
    • Authors: Abdalsamad Keramatfar, Hossein Amirkhani, Amir Jalali Bidgoly
      Abstract: Journal of Information Science, Ahead of Print.
      Real-time messaging and opinion sharing on social media websites have made them valuable sources of different kinds of information, which makes many kinds of analysis possible. Sentiment analysis, one of the most important of these, is attracting increasing interest. However, research in this field still faces challenges. Mainstream sentiment analysis research on social media websites and microblogs exploits only the textual content of the posts. This makes the analysis hard because microblog posts are short and noisy. However, such posts come with rich context that can be exploited for sentiment analysis. To use the context as an auxiliary source, some recent papers use replies/retweets to model the context of the target post. We claim that multiple sequential contexts can be used jointly in a unified model. In this article, we propose a context-aware multi-thread hierarchical long short-term memory (MHLSTM) model that jointly models different kinds of context, such as tweep, hashtag and reply, in addition to the content of the target post. Experimental evaluations on a real-world Twitter data set demonstrate that our proposed model can outperform some strong baseline models by 28.39% in terms of relative error reduction.
      Citation: Journal of Information Science
      PubDate: 2021-02-16T06:45:12Z
      DOI: 10.1177/0165551521990617
       
  • Delphi study of risk to individuals who disclose personal information online
    • Authors: David Haynes, Lyn Robinson
      Abstract: Journal of Information Science, Ahead of Print.
      A two-round Delphi study was conducted to explore priorities for addressing online risk to individuals. A corpus of literature was created based on 69 peer-reviewed articles about privacy risk and the privacy calculus published between 2014 and 2019. A cluster analysis of the resulting text-base using Pearson’s correlation coefficient resulted in seven broad topics. After two rounds of the Delphi survey with experts in information security and information literacy, the following topics were identified as priorities for further investigation: personalisation versus privacy, responsibility for privacy on social networks, measuring privacy risk, and perceptions of powerlessness and the resulting apathy. The Delphi approach provided clear conclusions about research topics and has potential as a tool for prioritising future research areas.
      Citation: Journal of Information Science
      PubDate: 2021-02-16T04:46:58Z
      DOI: 10.1177/0165551521992756
       
  • Unsupervised extractive multi-document summarization method based on transfer learning from BERT multi-task fine-tuning
    • Authors: Salima Lamsiyah, Abdelkader El Mahdaouy, Saïd El Alaoui Ouatik, Bernard Espinasse
      Abstract: Journal of Information Science, Ahead of Print.
      Text representation is a cornerstone that affects the effectiveness of several text summarization methods. Transfer learning using pre-trained word embedding models has shown promising results. However, most of these representations do not consider word order or the semantic relationships between words in a sentence, and thus they do not carry the meaning of a full sentence. To overcome this issue, the current study proposes an unsupervised method for extractive multi-document summarization based on transfer learning from a BERT sentence embedding model. Moreover, to improve sentence representation learning, we fine-tune the BERT model on supervised intermediate tasks from the GLUE benchmark datasets using single-task and multi-task fine-tuning methods. Experiments are performed on the standard DUC’2002–2004 datasets. The results show that our method significantly outperforms several baseline methods and achieves comparable, and sometimes better, performance than recent state-of-the-art deep learning-based methods. Furthermore, the results show that fine-tuning BERT using multi-task learning considerably improves performance.
      Citation: Journal of Information Science
      PubDate: 2021-02-16T04:45:59Z
      DOI: 10.1177/0165551521990616
       
  • Not just for the money? An examination of the motives behind physicians’ sharing of paid health information
    • Authors: Yulin Yang, Xuekun Zhu, Ruidi Song, Xiaofei Zhang, Feng Guo
      Abstract: Journal of Information Science, Ahead of Print.
      Online platforms make it possible for physicians to share information with the public; however, few studies have explored the underlying mechanism of physicians’ sharing of paid health information. Drawing on motivation theory, this study developed a theoretical framework to explore the effects of extrinsic motivation, enjoyment, and professional motivation on the sharing of paid information, as well as the contingent role of income ratio (online to offline) and online reputation. The model was tested with both objective and subjective data, including responses from 298 physicians. The results show that extrinsic motivation, enjoyment, and professional motivation play significant roles in inducing physicians to share paid information. Furthermore, income ratio can moderate the effects of motives on paid information sharing. In addition, professional motivation can be more effective in certain situations (a low income ratio or a high online reputation). This study contributes to the literature on knowledge sharing, online health behaviour, and motivation theory, and provides implications for practitioners.
      Citation: Journal of Information Science
      PubDate: 2021-02-15T06:30:02Z
      DOI: 10.1177/0165551521991029
       
  • Important citations identification by exploiting generative model into discriminative model
    • Authors: Xin An, Xin Sun, Shuo Xu, Liyuan Hao, Jinghong Li
      Abstract: Journal of Information Science, Ahead of Print.
      Although citations between scientific documents are deemed a vehicle for the dissemination, inheritance and development of scientific knowledge, not all citations are equal. A plethora of taxonomies and machine-learning models have been implemented to tackle the task of classifying citation function and importance from a qualitative perspective. Inspired by the success of kernel functions derived from generative models in improving the performance of the support vector machine (SVM) model, this work explores the potential of combining generative and discriminative models for the task of citation importance classification. In more detail, generative features are derived from a topic model, the citation influence model (CIM), and then fed, together with 13 other traditional features, to two traditional discriminative machine-learning models, SVM and random forest (RF), and to a deep learning model, a convolutional neural network (CNN), to identify important citations. Extensive experiments are performed on two data sets with different characteristics. All three models perform better on the data set from a single discipline. It is quite possible that the patterns of important citations vary by field, which prevents machine-learning models from effectively learning discriminative patterns from publications spanning multiple domains. The RF classifier outperforms the SVM classifier, which accords with many prior studies. However, the CNN model does not achieve the desired performance because of the small size of the data set. Finally, our CIM-based features further improve performance in identifying important citations.
      Citation: Journal of Information Science
      PubDate: 2021-02-08T06:28:00Z
      DOI: 10.1177/0165551521991034
       
  • Informational features of WhatsApp in everyday life in Madrid: An exploratory study
    • Authors: Juan-Antonio Martínez-Comeche, Ian Ruthven
      Abstract: Journal of Information Science, Ahead of Print.
      WhatsApp is one of the most widely used social media tools, but little is known about its use for everyday purposes. In this study, the informational features of WhatsApp in everyday life in Madrid are analysed through 30 semi-structured interviews, resulting in an informational typology of the messages, a description of the informational purposes of WhatsApp use and descriptions of the social use of WhatsApp. We conclude that WhatsApp allows us to deepen our understanding of people’s informational habits in everyday life.
      Citation: Journal of Information Science
      PubDate: 2021-02-08T06:22:40Z
      DOI: 10.1177/0165551521990612
       
  • A generic metamodel for data extraction and generic ontology population
    • Authors: Yohann Chasseray, Anne-Marie Barthe-Delanoë, Stéphane Négny, Jean-Marc Le Lann
      Abstract: Journal of Information Science, Ahead of Print.
      Because the next step in the development of intelligent computing systems is the addition of human expertise and knowledge, building strong, computable and well-documented knowledge bases is a priority. Ontologies partially respond to this challenge by providing formalisms for knowledge representation. However, one major remaining task is populating these ontologies for concrete applications. Based on Model-Driven Engineering principles, this article presents a generic metamodel for the extraction of heterogeneous data. The metamodel has been designed with two objectives, namely (1) genericity regarding the source of the collected pieces of knowledge and (2) a structure that stays close to an ontological structure. In addition, the article gives an example instantiation of the metamodel for textual data in the chemistry domain and an insight into how this metamodel could be integrated into a larger, automated, domain-independent ontology population framework.
      Citation: Journal of Information Science
      PubDate: 2021-02-04T05:14:11Z
      DOI: 10.1177/0165551521989641
       
  • Detection of conspiracy propagators using psycho-linguistic characteristics
    • Authors: Anastasia Giachanou, Bilal Ghanem, Paolo Rosso
      Abstract: Journal of Information Science, Ahead of Print.
      The rise of social media has offered a fast and easy way to propagate conspiracy theories and other types of disinformation. Despite the research attention it has received, fake news detection remains an open problem, and users keep sharing articles that contain false statements but which they consider real. In this article, we focus on the role of users in the propagation of conspiracy theories, a specific type of disinformation. First, we compare the profile and psycho-linguistic patterns of online users who tend to propagate posts that support conspiracy theories with those of users who propagate posts that refute them. To this end, we perform a comparative analysis of various profile, psychological and linguistic characteristics using social media texts of users who share posts about conspiracy theories. Then, we compare the effectiveness of those characteristics for predicting whether a user is a conspiracy propagator or not. In addition, we propose ConspiDetector, a model based on a convolutional neural network (CNN) that combines word embeddings with psycho-linguistic characteristics extracted from users’ tweets to detect conspiracy propagators. The results show that ConspiDetector can improve the performance in detecting conspiracy propagators by 8.82% compared with the CNN baseline in terms of F1 score.
      Citation: Journal of Information Science
      PubDate: 2021-01-28T06:31:58Z
      DOI: 10.1177/0165551520985486
       
  • Impact of COVID-19 on search in an organisation
    • Authors: Paul H Cleverley, Fionnuala Cousins, Simon Burnett
      Abstract: Journal of Information Science, Ahead of Print.
      COVID-19 has created unprecedented organisational challenges, yet no study has examined its impact on information search. A case study of 2.5 million search queries submitted during the pandemic was undertaken in a knowledge-intensive organisation. A surge of unique users and COVID-19 search queries in March 2020 may equate to ‘peak uncertainty and activity’, demonstrating the importance of corporate search engines in times of crisis. Search volumes dropped 24% after lockdowns; an ‘L-shaped’ recovery may be a surrogate for business activity. COVID-19 search queries transitioned from awareness to impact, strategy, response and ways of working, a progression that may influence future search design. Low click-through rates imply that some information needs were not met, and searches on mental health increased. In extreme situations (e.g. a pandemic), companies may need to move faster, monitoring and exploiting their enterprise search logs in real time, as these reflect uncertainty and anxiety that may exist in the enterprise.
      Citation: Journal of Information Science
      PubDate: 2021-01-22T07:14:34Z
      DOI: 10.1177/0165551521989531
       
  • A study of Turkish emotion classification with pretrained language models
    • Authors: Alaettin Uçan, Murat Dörterler, Ebru Akçapınar Sezer
      Abstract: Journal of Information Science, Ahead of Print.
      Emotion classification is a research field that aims to detect the emotions in a text using machine learning methods. In traditional machine learning (TML) methods, feature engineering processes cause the loss of some meaningful information, and classification performance is negatively affected. In addition, the success of modelling using deep learning (DL) approaches depends on the sample size. More samples are needed for Turkish due to the unique characteristics of the language. However, emotion classification data sets in Turkish are quite limited. In this study, the pretrained language model approach was used to create a stronger emotion classification model for Turkish. Well-known pretrained language models were fine-tuned for this purpose. The performances of these fine-tuned models for Turkish emotion classification were comprehensively compared with the performances of TML and DL methods in experimental studies. The proposed approach provides state-of-the-art performance for Turkish emotion classification.
      Citation: Journal of Information Science
      PubDate: 2021-01-13T05:12:31Z
      DOI: 10.1177/0165551520985507
       
  • Embodying algorithms, enactive artificial intelligence and the extended cognition: You can see as much as you know about algorithm
    • Authors: Donghee Shin
      Abstract: Journal of Information Science, Ahead of Print.
      The recent proliferation of artificial intelligence (AI) raises questions about how users interact with AI services and how algorithms embody the values of users. Despite the surging popularity of AI, how users evaluate algorithms, how people perceive algorithmic decisions and how they relate to algorithmic functions remain largely unexplored. Invoking the idea of embodied cognition, we characterize the core constructs of algorithms that drive the value of embodiment and conceptualize these factors in relation to trust by examining how they influence the user experience of personalized recommendation algorithms. The findings elucidate the embodied cognitive processes involved in reasoning about algorithmic characteristics (fairness, accountability, transparency and explainability) with regard to their fundamental linkages with trust and ensuing behaviors. Users follow a dual-process model, whereby a sense of trust is built on a combination of normative values and performance-related qualities of algorithms. Embodied algorithmic characteristics are significantly linked to trust and performance expectancy. Heuristic and systematic processes through embodied cognition provide a concise guide to the conceptualization of AI experiences and interaction. The identified user cognitive processes provide information on users’ cognitive functioning and patterns of behavior as well as a basis for subsequent metacognitive processes.
      Citation: Journal of Information Science
      PubDate: 2021-01-13T05:08:51Z
      DOI: 10.1177/0165551520985495
       
  • Proposing an information value chain to improve information services to disabled library patrons using assistive technologies
    • Authors: Devendra Potnis, Kevin Mallary
      Abstract: Journal of Information Science, Ahead of Print.
      Information services offered by academic libraries increasingly rely on assistive technologies (AT) to facilitate disabled patrons’ retrieval and use of information for learning and teaching. However, disabled patrons’ access to AT might not always lead to their use, resulting in the underutilization of information services offered by academic libraries. We adopt an inward-looking, service innovation perspective to improve information services for disabled patrons using AT. Open coding of qualitative responses collected from administrators and librarians in 186 academic libraries at public universities in the United States reveals 10 mechanisms (i.e. modified work practices), which involve searching, compiling, mixing, framing, sharing, or reusing information, and learning from it. Based on this information-centric reorganisation of work practices, we propose an ‘information value chain’, similar to Porter’s value chain, for improving information services to disabled patrons using AT in academic libraries; this is the major theoretical contribution of our study.
      Citation: Journal of Information Science
      PubDate: 2021-01-13T05:06:12Z
      DOI: 10.1177/0165551520984719
       
  • An exploratory study of the all-author bibliographic coupling analysis: Taking scientometrics for example
    • Authors: Song Yanhui, Wu Lijuan, Chen Shiji
      Abstract: Journal of Information Science, Ahead of Print.
      All-author bibliographic coupling analysis (AABCA) takes all authors of an article into account when constructing author coupling relationships. Taking scientometrics as an example, this article uses papers published from 2010 to 2019 as the data sample and divides them into two five-year periods to discuss the performance of AABCA in discovering potential academic communities and the intellectual structure of this discipline. It is found that when all authors of a paper are considered, the relationships between bibliographically coupled authors show a certain regularity, and bibliographic coupling is likely to be passed between different pairs of authors. Owing to the transitivity of the coupling relationship, AABCA can effectively identify potential academic groups in this discipline and more fully reflect the degree of cooperation among authors. AABCA is an effective method for revealing the intellectual structure of the field of scientometrics, and it makes it easier to find small research topics with weak correlations. In addition, AABCA is an ideal way to explore authors’ research interests over time.
      Citation: Journal of Information Science
      PubDate: 2021-01-04T04:29:08Z
      DOI: 10.1177/0165551520981293
       
  • Parallel sentence extraction to improve cross-language information retrieval from Wikipedia
    • Authors: Juryong Cheon, Youngjoong Ko
      First page: 281
      Abstract: Journal of Information Science, Ahead of Print.
      Translation language resources, such as bilingual word lists and parallel corpora, are important factors affecting the effectiveness of cross-language information retrieval (CLIR) systems. When large, domain-appropriate parallel corpora are not available, developing an effective CLIR system is particularly difficult. Furthermore, creating a large parallel corpus is costly and requires considerable effort. Therefore, we demonstrate the construction of parallel corpora from Wikipedia as well as improved query translation, wherein the queries are used for a CLIR system. To do so, we first constructed a bilingual dictionary, termed WikiDic. Then, we evaluated individual language resources and combinations of them in terms of their ability to extract parallel sentences; the combination of our proposed WikiDic with the translation probabilities from the Web’s bilingual example sentence pairs was found to be best suited to parallel sentence extraction. Finally, to evaluate the parallel corpus generated from this best combination of language resources, we compared its performance in query translation for CLIR with that of a manually created English–Korean parallel corpus. The corpus generated by our proposed method performed better than the manually created corpus, demonstrating the effectiveness of the proposed method for automatic parallel corpus extraction. Not only can the method demonstrated herein inform the construction of other parallel corpora from readily available language resources, but the parallel sentence extraction method will also naturally improve as Wikipedia continues to be used and its content develops.
      Citation: Journal of Information Science
      PubDate: 2021-02-11T03:03:02Z
      DOI: 10.1177/0165551521992754
       
  • Information security: Legal regulations in Azerbaijan and abroad
    • Authors: Amir I Aliyev, Aytakin N Ibrahimova, Gulnaz A Rzayeva
      Abstract: Journal of Information Science, Ahead of Print.
      The article is devoted to information security issues worldwide and in Azerbaijan in particular. It compares the cybersecurity laws and regulations of Azerbaijan with those of other countries. The article reveals the features of the organisational and legal regulation of the information security system as an integral part of state security. A number of aspects of ensuring information security through legal and technological means, as well as a number of features of ensuring the security of certain categories of information, are highlighted. Recommendations and conclusions drawn from the policies of both jurisdictions are presented.
      Citation: Journal of Information Science
      PubDate: 2020-12-29T07:21:39Z
      DOI: 10.1177/0165551520981813
       
  • Using community information for natural disaster alerts
    • Authors: Chun Chieh Chen, Hei-Chia Wang
      Abstract: Journal of Information Science, Ahead of Print.
      Recently, the ceaseless rise in the global average temperature has led to extreme climates in which natural disasters, such as droughts, hurricanes, earthquakes and floods, are becoming increasingly serious. Recent research has found that social media typically reflects disasters earlier than official communication channels. This study proposes collecting information on flood disasters in a city during typhoons and heavy rains from plain-text social media messages by means of a term frequency (TF) and sliding window approach. The dataset analysed here contains a total of 292 articles and 12,484 tweets. This research determines how to establish a warning mechanism with an added notification time for flooding disasters, and it shows how to provide relevant disaster relief personnel with reference information. This article contributes by combining social media data with emergency management information cloud (EMIC) data, especially in the context of a warning mechanism for flooding disasters. According to the experimental results, a sliding window of 90 min and a sliding gap of 10 min obtained the best F-measure value (F = 0.315). The event studied was Typhoon Megi (September 2016), which caused major flooding in Tainan. For the Typhoon Megi event, the flood disaster location database had 161 streets available for matching. Based on the experimental results, it is possible to obtain high precision (90% or higher) from real-time tweet data by exploiting a social media dataset.
      Citation: Journal of Information Science
      PubDate: 2020-12-22T08:45:57Z
      DOI: 10.1177/0165551520979870
       
  • Investigating Reddit to detect subreddit and author stereotypes and to evaluate author assortativity
    • Authors: Francesco Cauteruccio, Enrico Corradini, Giorgio Terracina, Domenico Ursino, Luca Virgili
      Abstract: Journal of Information Science, Ahead of Print.
      In recent years, Reddit has attracted the interest of many researchers due to its popularity all over the world. In this article, we aim to contribute to the knowledge of this social network by investigating three of its aspects, which are interesting from a scientific viewpoint, and, at the same time, by analysing a large number of applications. In particular, we first propose a definition and an analysis of several stereotypes of both subreddits and authors. This analysis is coupled with the definition of three possible orthogonal taxonomies that help us classify stereotypes appropriately. Then, we investigate the possible existence of author assortativity in this social medium; specifically, we focus on co-posters, that is, authors who submitted posts on the same subreddit.
      Citation: Journal of Information Science
      PubDate: 2020-12-21T05:09:38Z
      DOI: 10.1177/0165551520979869
       
  • Testing the validity of Wikipedia categories for subject matter labelling of open-domain corpus data
    • Authors: Ahmad Aghaebrahimian, Andy Stauder, Michael Ustaszewski
      Abstract: Journal of Information Science, Ahead of Print.
      The Wikipedia category system was designed to enable browsing and navigation of Wikipedia. It is also a useful resource for knowledge organisation and document indexing, especially using automatic approaches. However, it has received little attention as a resource for manual indexing. In this article, a hierarchical taxonomy of three-level depth is extracted from the Wikipedia category system. The resulting taxonomy is explored as a lightweight alternative to expert-created knowledge organisation systems (e.g. library classification systems) for the manual labelling of open-domain text corpora. Combining quantitative and qualitative data from a crowd-based text labelling study, the validity of the taxonomy is tested and the results quantified in terms of interrater agreement. While the usefulness of the Wikipedia category system for automatic document indexing is documented in the pertinent literature, our results suggest that at least the taxonomy we derived from it is not a valid instrument for manual subject matter labelling of open-domain text corpora.
      Citation: Journal of Information Science
      PubDate: 2020-12-04T06:23:16Z
      DOI: 10.1177/0165551520977438
       
  • Influence and performance of user similarity metrics in followee
           prediction
    • Authors: Antonela Tommasel, Daniela Godoy
      Abstract: Journal of Information Science, Ahead of Print.
      Followee recommendation is a problem of rapidly growing importance on Twitter and other micro-blogging communities. Understanding how users select whom to follow is therefore crucial for designing accurate and personalised recommendation strategies. This work aims to shed light on how homophily drives the formation of user relationships by studying the influence of diverse recommendation factors on tie formation. The selected recommendation factors were studied using multiple alternatives for assessing them in terms of user similarity. In the context of a followee recommendation task, we analysed the similarity between Twitter users and their followees with respect to two commonly used followee recommendation factors: topology and content. This study is among the first to analyse both the effect of different criteria for followee recommendation in micro-blogging communities and the importance of thoroughly analysing the different aspects of user relationships when defining user similarity. The study showed how the choice of factors and assessment alternatives affects followee recommendation. It also verified the existence of certain patterns in the similarities of friends and of random users, which can condition the adequacy of the available similarity metrics.
      Citation: Journal of Information Science
      PubDate: 2020-12-04T06:21:36Z
      DOI: 10.1177/0165551520975359
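The abstract assesses user similarity along two factors, topology and content, without naming the exact metrics used. As a rough illustration only, the sketch below assumes two common choices: Jaccard overlap of followee sets for topology and cosine similarity of bag-of-words counts for content. The function names and the whitespace tokenisation are hypothetical, not the authors' configuration.

```python
import math
from collections import Counter


def jaccard(followees_a, followees_b):
    """Topology-based similarity: Jaccard overlap of two users' followee sets."""
    union = followees_a | followees_b
    if not union:
        return 0.0
    return len(followees_a & followees_b) / len(union)


def content_cosine(text_a, text_b):
    """Content-based similarity: cosine of bag-of-words term counts."""
    ta = Counter(text_a.lower().split())
    tb = Counter(text_b.lower().split())
    # Counter returns 0 for missing words, so only shared words add to the dot product.
    dot = sum(count * tb[word] for word, count in ta.items())
    norm_a = math.sqrt(sum(c * c for c in ta.values()))
    norm_b = math.sqrt(sum(c * c for c in tb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, two users who follow {nasa, bbc, who} and {nasa, who, un} share two of four distinct followees, giving a Jaccard similarity of 0.5.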
       
  • Exploiting user network topology and comment semantic for accurate rumour
           stance recognition on social media
    • Authors: Yongcong Luo, Jing Ma, Chai Kiat Yeo
      Abstract: Journal of Information Science, Ahead of Print.
      Online social media (OSM) has become a hotbed for the rapid dissemination of disinformation and fake news. To recognise fake news and guide OSM users, we focus on recognising the stance of comments posted on OSM by users related to fake news. In this article, we propose a framework for rumour stance recognition, with four stance categories (‘agree’, ‘disagree’, ‘neutral’ and ‘query’), that combines network topology and comment semantic enhancement (CSE). We first construct a vector matrix of comments via a novel optimised term frequency–inverse document frequency (OTI). To better recognise stances, we employ a second vector matrix, with novel or special attributes, that encodes the network topology of OSM users derived with the random walk with restart (RWR) method. In addition, we assign a weight parameter to each word in the comments to enhance the comment semantic representation; these parameters are tuned based on sentiment score, topology features and question-format words. The vector matrices are optimised and combined into an integrated matrix whose transpose is fed into a neural network (NN) for final rumour stance recognition. Experimental evaluations show that our approach achieves a precision of 93.96% and an F1-score of 92.02%, which are superior to baselines and other existing methods.
      Citation: Journal of Information Science
      PubDate: 2020-12-04T06:18:40Z
      DOI: 10.1177/0165551520977443
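The abstract derives user-topology features with random walk with restart (RWR). A minimal sketch of the generic RWR iteration, r ← (1 − c)·W·r + c·e, is shown below on a small undirected toy graph; the graph, the restart probability c = 0.15, the adjacency-list representation and the function name are illustrative assumptions, not the authors' settings.

```python
def random_walk_with_restart(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Return RWR relevance scores of every node with respect to `seed`.

    adj: dict mapping each node to a list of its neighbours.
    restart: probability c of jumping back to the seed node at each step.
    Nodes without neighbours would leak probability mass in this simple version.
    """
    nodes = sorted(adj)
    # Restart vector e: all probability mass on the seed node.
    e = {n: 1.0 if n == seed else 0.0 for n in nodes}
    r = dict(e)
    for _ in range(max_iter):
        walked = {n: 0.0 for n in nodes}
        # One step of the walk: spread each node's score evenly over its
        # neighbours (a column-normalised transition matrix W).
        for n in nodes:
            if adj[n]:
                share = r[n] / len(adj[n])
                for m in adj[n]:
                    walked[m] += share
        # Mix the walk with the restart: r = (1 - c) * W r + c * e
        nxt = {n: (1 - restart) * walked[n] + restart * e[n] for n in nodes}
        if sum(abs(nxt[n] - r[n]) for n in nodes) < tol:
            return nxt
        r = nxt
    return r
```

Usage: with `g = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}`, calling `random_walk_with_restart(g, "a")` ranks the seed "a" highest and the peripheral node "d" lowest; such per-user score vectors are one plausible way to turn topology into feature rows.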
       
  • Deep Persian sentiment analysis: Cross-lingual training for low-resource
           languages
    • Authors: Rouzbeh Ghasemi, Seyed Arad Ashrafi Asli, Saeedeh Momtazi
      Abstract: Journal of Information Science, Ahead of Print.
      With the advent of deep neural models in natural language processing tasks, a large amount of training data plays an essential role in achieving accurate models. Creating valid training data, however, is challenging in many low-resource languages. As a result, available natural language processing tools are significantly less accurate for low-resource languages than for resource-rich languages. To address this problem for sentiment analysis in Persian, we propose a cross-lingual deep learning framework that benefits from the training data available in English. We deploy cross-lingual embeddings to model sentiment analysis as transfer learning, transferring a model from a resource-rich language to low-resource ones. Our model can use any cross-lingual word embedding model and any deep architecture for text classification. Our experiments on the English Amazon dataset and the Persian Digikala dataset, using two different embedding models and four different classification networks, show the superiority of the proposed model over state-of-the-art monolingual techniques. In our experiments, the performance of Persian sentiment analysis improves by 22% with static embeddings and by 9% with dynamic embeddings. Our proposed model is general and language-independent; that is, it can be used for any low-resource language once a cross-lingual embedding is available for the source–target language pair. Moreover, with word-aligned cross-lingual embeddings, the only data required for a reliable cross-lingual embedding is a bilingual dictionary, which is available between English, as a potential source language, and almost all other languages.
      Citation: Journal of Information Science
      PubDate: 2020-12-02T10:11:38Z
      DOI: 10.1177/0165551520962781
       
  • A spike in the scientific output on social sciences in Vietnam for recent
           three years: Evidence from bibliometric analysis in Scopus database
           (2000–2019)
    • Authors: Binh Pham-Duc, Trung Tran, Thao-Phuong-Thi Trinh, Tien-Trung Nguyen, Ngoc-Trang Nguyen, Hien-Thu-Thi Le
      Abstract: Journal of Information Science, Ahead of Print.
      A bibliometric analysis of 3105 publications retrieved from the Scopus database was conducted to evaluate the bibliographic content of scientific output on social sciences in Vietnam for the 2000–2019 period. Our main findings show that the number of publications on social sciences from Vietnam increased significantly over the last two decades, with a spike in scientific output in the three most recent years, whose publications accounted for 53.76% of the collection. The most productive authors came from a few public research institutes with strong resources, and the top 10 institutions participated in 44.22% of the collection. Vietnamese scholars tend not to submit their work to high-ranking journals: the five Q1 journals among the top 10 publishing journals published only 6.17% of the collection. For international collaboration, Australia and the United States ranked first and second by number of publications and citations. The other countries in the top 10 were mostly located in Europe and Asia. Research topics were diverse, focusing on gender, poverty, HIV, higher education and sustainable development. We suggest that supporting policies and funding be provided to help Vietnamese scholars improve their work and to boost their scientific production in the future.
      Citation: Journal of Information Science
      PubDate: 2020-12-01T04:34:15Z
      DOI: 10.1177/0165551520977447
       
  • Topic attention encoder: A self-supervised approach for short text
           clustering
    • Authors: Jian Jin, Haiyuan Zhao, Ping Ji
      Abstract: Journal of Information Science, Ahead of Print.
      Short text clustering is a challenging and important task in many practical applications. However, many Bag-of-Words-based methods for short text clustering are limited by the sparsity of text representation, while many sentence embedding-based methods fail to capture the document structure dependencies within a text corpus. In consideration of these shortcomings, a topic attention encoder (TAE) is proposed in this study. Topics derived from the corpus by topic modelling introduce cross-document information. The encoder takes the document-topic vector as its learning target and, as input, the concatenation of each word embedding with the corresponding topic-word vector. A self-attention mechanism is also employed in the encoder to weight hidden states adaptively and encode the semantics of each short text document. By capturing global dependencies and local semantics, TAE combines the strengths of Bag-of-Words methods and sentence embedding methods. Finally, benchmarking experiments were conducted on three public data sets. They demonstrate that the proposed TAE outperforms many document representation benchmark methods for short text clustering.
      Citation: Journal of Information Science
      PubDate: 2020-12-01T04:31:36Z
      DOI: 10.1177/0165551520977453
       
  • Investigating information seeking in physical and online environments with
           escape room and web search
    • Authors: Dongho Choi, Chirag Shah, Vivek Singh
      Abstract: Journal of Information Science, Ahead of Print.
      Searching and interacting with information is one of the most fundamental human behaviours, taking place in both online and physical environments. Yet most studies of information interaction have focused on only one of these sides. This work aims to connect them by investigating people’s information interaction behaviours in different physical and online contexts, as well as across different types of tasks. Behavioural data from 31 participants were collected during web search (online searching) and an escape room (physical searching) using an eye-tracker, web browser logs and a wearable video recorder. Analysis of the behavioural data suggests that individuals have a preferred search strategy that they adopt across different tasks and environments. The behavioural pattern, however, was found to be affected by the task type (e.g. problem searching vs exploratory search) and by the way information is structured within the environments.
      Citation: Journal of Information Science
      PubDate: 2020-11-26T12:25:36Z
      DOI: 10.1177/0165551520972285
       
  • Dynamical entropic analysis of scientific concepts
    • Authors: Artem Chumachenko, Boris Kreminskyi, Iurii Mosenkis, Alexander Yakimenko
      Abstract: Journal of Information Science, Ahead of Print.
      In the present information era, effective knowledge retrieval from collections of scientific documents is especially important for continued scientific progress. The information available in scientific publications traditionally consists of bibliometric metadata and a semantic component, such as the title, abstract and text. The former is machine-readable and usually used for knowledge mapping and pattern recognition, whereas the latter is designed for human interpretation and analysis. Only a few studies use full-text analysis, based on a carefully selected scientific ontology, to map the actual structure of scientific knowledge or uncover similarities between documents. Unfortunately, the presence of common (basic) concepts across semantically unrelated documents creates spurious connections between different topics. We revise a known method based on an entropic information-theoretic measure for selecting basic concepts, and we propose analysing the dynamics of Shannon entropy to sort concepts more rigorously by their generality.
      Citation: Journal of Information Science
      PubDate: 2020-11-23T06:48:56Z
      DOI: 10.1177/0165551520972034
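The abstract ranks concepts by generality using Shannon entropy, H = −Σ p_i log2 p_i. A minimal sketch, assuming that p_i is the share of a concept's occurrences falling in document i and that "dynamics" means recomputing H per publication year: a concept spread evenly over many documents scores high (a candidate basic concept), while one concentrated in a single document scores 0. The data layout and function names are hypothetical, not the authors' exact measure.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a frequency distribution given as counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def entropy_over_time(occurrences):
    """occurrences: list of (year, document_id) pairs where one concept appears.

    Returns {year: entropy of the concept's distribution over documents that year},
    so the trajectory of a concept's generality can be followed over time.
    """
    by_year = {}
    for year, doc in occurrences:
        by_year.setdefault(year, Counter())[doc] += 1
    return {year: shannon_entropy(list(c.values())) for year, c in sorted(by_year.items())}
```

For example, a concept appearing once in each of four documents has an entropy of 2 bits, while a concept appearing five times in a single document has an entropy of 0.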
       
  • Research practices of LIS professionals in Pakistan: A study of attitudes,
           involvement and competencies
    • Authors: Arslan Sheikh, Amara Malik, Khalid Mahmood
      Abstract: Journal of Information Science, Ahead of Print.
      This study analyses the attitudes, involvement and competencies of Pakistani Library and Information Science (LIS) professionals towards research. An online survey questionnaire was used to collect data from LIS professionals working in various types of libraries in Pakistan. The findings reveal that the overall attitude of Pakistani LIS professionals towards research is positive. A vast majority of them read research literature, albeit occasionally, while a small majority read full-text articles. Two local journals, Pakistan Library & Information Science Journal (PLISJ) and Pakistan Journal of Information Management & Libraries (PJIML), are the most-read titles. Although research contributions count towards the promotion of LIS professionals, a very small majority of them were currently engaged in research projects. They do not feel very confident about their research expertise, yet aspire to increase their knowledge of research. Factors that deter LIS professionals from engaging in research include a lack of time, little support from their organisation, a lack of research ideas and a lack of research skills. Institutional support in terms of time, money and training would enhance opportunities for LIS professionals to produce more research and publications.
      Citation: Journal of Information Science
      PubDate: 2020-11-18T06:35:31Z
      DOI: 10.1177/0165551520972033
       
  • TIPS: Time-aware Personalised Semantic-based query auto-completion
    • Authors: Saedeh Tahery, Saeed Farzi
      Abstract: Journal of Information Science, Ahead of Print.
      With the rapid growth of the Internet, search engines play a vital role in meeting users’ information needs. However, formulating information needs as simple queries remains a problem for ordinary users. Query auto-completion, one of the most important features of search engines, is therefore leveraged to provide a ranked list of queries matching the prefix the user has entered. Although query auto-completion uses valuable information from search engine logs, time-, semantic- and context-aware features remain important sources of additional knowledge. In this study, a hybrid query auto-completion system called TIPS (Time-aware Personalised Semantic-based query auto-completion) is introduced that combines well-known popularity-based and neural language model-based systems. The system is further supplemented by time-aware features that blend context and semantic information in a collaborative manner. Experiments on the standard AOL dataset compare the proposed system with state-of-the-art methods, namely FactorCell, ConcatCell and Unadapted. The results show that TIPS is significantly superior in terms of mean reciprocal rank (MRR), especially for short prefixes.
      Citation: Journal of Information Science
      PubDate: 2020-11-09T05:33:04Z
      DOI: 10.1177/0165551520968690
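TIPS is evaluated with mean reciprocal rank (MRR): for each prefix, take 1/rank of the query the user actually submitted in the suggested list (0 if it is absent), then average over prefixes. The sketch below implements that standard definition; the toy suggestion lists are invented for illustration and are not AOL data.

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """MRR: mean over queries of 1/rank of the target in its ranked list (0 if absent)."""
    if len(ranked_lists) != len(targets):
        raise ValueError("need exactly one target per ranked list")
    if not ranked_lists:
        return 0.0
    total = 0.0
    for suggestions, target in zip(ranked_lists, targets):
        if target in suggestions:
            # Ranks are 1-based, so the first suggestion contributes 1/1.
            total += 1.0 / (suggestions.index(target) + 1)
    return total / len(ranked_lists)
```

For example, if the submitted query is third for one prefix, first for another and missing for a third, MRR = (1/3 + 1 + 0) / 3 ≈ 0.444.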
       
  • On the relationship between supervisor–supervisee gender difference and
           scientific impact of doctoral dissertations: Evidence from Humanities and
           Social Sciences in China
    • Authors: Yi Bu, Hanlin Li, Chunli Wei, Meijun Liu, Jiang Li
      Abstract: Journal of Information Science, Ahead of Print.
      This article explores the relationships between supervisor–supervisee gender difference and the scientific impact of doctoral dissertations. We use the China Doctoral Dissertations Full-text Database and pay special attention to the fields of Humanities and Social Sciences in China in our empirical study. By establishing regression models, we find that the ranks of the scientific impact regarding doctoral dissertations are female–female (first), female–male (second), male–male (third) and male–female (fourth) pairs (sequence: student gender and then supervisor gender). The finding has many interesting implications for science policy and gender inequality.
      Citation: Journal of Information Science
      PubDate: 2020-11-04T05:01:57Z
      DOI: 10.1177/0165551520969935
       
  • Measuring visibility of disciplines on Chinese academic web
    • Authors: Bo Yang, Ying Sun, Shan Huang
      Abstract: Journal of Information Science, Ahead of Print.
      This study proposes a hierarchy affiliation model (department–school–university) to build a network between web entities, jointly taking into account domain names, the topological structure of the academic network and the disciplinary characteristics of schools and universities. Applying the model to the Chinese academic web shows that, at the school level, 68 of 95 disciplines (71.6%) are identified from the directed school network and 71 from the undirected school network; at the university level, four of seven broad disciplines are found. Furthermore, based on a comparison of three types of relations among universities (hyperlinks, citations and collaborations), we cautiously argue that the structure of the academic web may be more suitable for tracing shared interests between institutions.
      Citation: Journal of Information Science
      PubDate: 2020-11-04T04:59:57Z
      DOI: 10.1177/0165551520968059
       
  • Understanding social media discontinuance from social cognitive
           perspective: Evidence from Facebook users
    • Authors: Shaoxiong Fu, Hongxiu Li
      Abstract: Journal of Information Science, Ahead of Print.
      Based on social cognitive theory, this study proposes a research framework to investigate two different social media discontinuance behaviours: reduced usage and abandoned usage. Specifically, perceived technology overload, information overload and social overload are the environmental factors that induce negative personal states, including dissatisfaction and social media fatigue, which lead to negative behavioural changes such as reduced usage and abandoned usage of social media. The proposed research model was tested empirically with data collected from Facebook users. The results indicate that perceived technology overload, information overload and social overload differ in their impacts on social media fatigue and dissatisfaction. Dissatisfaction has a greater impact than social media fatigue on abandoned usage, but a similar impact on reduced usage. In addition, reduced usage was found to lead to abandoned usage. Finally, we discuss the theoretical and practical contributions of the proposed research model.
      Citation: Journal of Information Science
      PubDate: 2020-11-02T04:53:10Z
      DOI: 10.1177/0165551520968688
       
  • Link prediction in supernetwork: Risk perception of emergencies
    • Authors: Ning Ma, Yijun Liu, Liangliang Li
      Abstract: Journal of Information Science, Ahead of Print.
      The basic starting point of this study is how to identify risks, predict trends and respond scientifically after an emergency incident occurs but before a crisis erupts. A supernetwork model of risk perception in emergencies is constructed from the perspective of risk governance. This supernetwork model includes three subnetworks: a similarity-relationship subnetwork composed of newly occurring emergencies, a chain-relationship subnetwork composed of historical emergencies and a co-occurrence-relationship subnetwork composed of the risk elements of emergencies. A feature similarity algorithm is then applied to quantify the relations between newly occurring and historical emergencies, and a link prediction algorithm is applied to predict the risk elements that may arise from the newly occurring emergencies. This can help managers make more scientifically accurate decisions when coping with emergency risks.
      Citation: Journal of Information Science
      PubDate: 2020-11-02T04:39:50Z
      DOI: 10.1177/0165551520967303
       
  • A new similarity measure for vector space models in text classification
           and information retrieval
    • Authors: Mete Eminagaoglu
      Abstract: Journal of Information Science, Ahead of Print.
      Various models, methodologies and algorithms are available today for document classification, information retrieval and other text mining applications and systems. One of these is the vector space model, in which distance metrics or similarity measures lie at the core. The vector space model is a fast and simple option for processing textual data; however, its accuracy, precision and reliability still need significant improvement. In this study, a new similarity measure is proposed that can be used effectively in vector space models and related algorithms, such as k-nearest neighbours (k-NN) and Rocchio, as well as in clustering algorithms such as K-means. The proposed measure is tested on universal benchmark data sets in Turkish and English, and the results are compared with standard metrics such as Euclidean distance, Manhattan distance, Chebyshev distance, Canberra distance, Bray–Curtis dissimilarity, Pearson correlation coefficient and cosine similarity. The results are promising and suggest that the proposed similarity measure could be used as an alternative in all suitable algorithms and models for information retrieval, document clustering and text classification.
      Citation: Journal of Information Science
      PubDate: 2020-10-28T04:27:50Z
      DOI: 10.1177/0165551520968055
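The abstract compares its new measure against a list of standard baselines. The new measure itself is not defined here, so the sketch below implements only those standard baselines on plain term-frequency vectors, using their textbook definitions; the vectors in the example are invented.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    # Coordinates where both values are zero are skipped (0/0 treated as 0).
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(x, y) if a or b)


def bray_curtis(x, y):
    # Dissimilarity in [0, 1] for non-negative vectors such as term counts.
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(abs(a + b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def pearson(x, y):
    # Undefined (division by zero) if either vector is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For the term-frequency vectors d1 = [1, 0, 2, 3] and d2 = [0, 1, 2, 1], the Manhattan distance is 4, the Chebyshev distance is 2, the Canberra distance is 2.5 and the Bray–Curtis dissimilarity is 0.4. Distances (lower is closer) and similarities (higher is closer) must be oriented consistently before they are plugged into k-NN or Rocchio.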
       
  • Text and metadata extraction from scanned Arabic documents using support
           vector machines
    • Authors: Wenda Qin, Randa Elanwar, Margrit Betke
      Abstract: Journal of Information Science, Ahead of Print.
      Text information in scanned documents becomes accessible only when it is extracted and interpreted by a text recognizer. For a recognizer to work successfully, it must have detailed information about the location of the regions of the document images it is asked to analyse. It needs to focus on page regions containing text and skip non-text regions such as illustrations or photographs. However, text recognizers do not work as logical analyzers. Logical layout analysis automatically determines the function of a document text region, that is, it labels each region as a title, paragraph, caption and so on, and is thus an essential part of a document understanding system. In the past, rule-based algorithms have been used to conduct logical layout analysis on data sets of limited size. Here, we instead focus on supervised learning methods for logical layout analysis. We describe LABA, a system based on multiple support vector machines that performs logical Layout Analysis of scanned Books pages in Arabic. The system detects the function of a text region based on the analysis of various image features and a voting mechanism. For a baseline comparison, we implemented an older but state-of-the-art neural network method. We evaluated LABA on a data set of scanned pages from illustrated Arabic books and obtained high recall and precision values. We also found that the F-measure of LABA is higher than that of the state-of-the-art method for five of the six tested classes.
      Citation: Journal of Information Science
      PubDate: 2020-10-16T05:14:09Z
      DOI: 10.1177/0165551520961256
       
  • Analysis of direct citation, co-citation and bibliographic coupling in
           scientific topic identification
    • Authors: Rajmund Kleminski, Przemysław Kazienko, Tomasz Kajdanowicz
      Abstract: Journal of Information Science, Ahead of Print.
      In our study, we examine the impact of citation network structures on the ability to discern valuable research topics in Computer Science literature. We use the bibliographic information available in the DBLP database to extract candidate phrases from scientific paper abstracts. We then construct citation networks based on direct citation, co-citation and bibliographic coupling relationships between the papers. The candidate research topics, in the form of keyphrases and n-grams, are subsequently ranked and filtered by a graph-text ranking algorithm. The highest ranked potential topics are further evaluated by domain experts and against the Wikipedia knowledge base. The results obtained from these citation networks are complementary, with some pairs of networks returning valid but non-overlapping output phrases. In particular, bibliographic coupling appears to capture more unique information than either direct citation or co-citation. These findings point towards possible added value in combining bibliographic coupling analysis with other structures. At the same time, the value of combining direct citation and co-citation is called into question. We expect our findings to inform the design of methods for research topic identification.
      Citation: Journal of Information Science
      PubDate: 2020-10-07T09:30:24Z
      DOI: 10.1177/0165551520962775
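The three citation structures compared in the abstract have standard definitions: a direct citation links a citing paper to a cited one; two papers are co-cited when a third paper cites both; and two papers are bibliographically coupled when they share references. A minimal sketch building edge weights from a toy reference map follows; the data layout and function names are hypothetical, and the paper's own edge weighting and graph-text ranking step are not shown.

```python
from collections import Counter
from itertools import combinations


def direct_citation_edges(refs):
    """refs: {paper: set of papers it cites}. Returns undirected citing-cited pairs."""
    return {frozenset((p, q)) for p, cited in refs.items() for q in cited if p != q}


def co_citation(refs):
    """Co-citation strength of (a, b): number of papers that cite both a and b."""
    counts = Counter()
    for cited in refs.values():
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return counts


def bibliographic_coupling(refs):
    """Coupling strength of (a, b): number of references that a and b share."""
    counts = Counter()
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            counts[(a, b)] = shared
    return counts
```

With `refs = {"P1": {"A", "B"}, "P2": {"A", "B", "C"}, "P3": {"C"}}`, A and B are co-cited twice, while P1 and P2 share two references and P2 and P3 share one. The overlap between these edge sets is what determines whether two structures add complementary information.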
       
  • Do researchers use open research data? Exploring the relationships
           between usage trends and metadata quality across scientific disciplines
           from the Figshare case
    • Authors: Alfonso Quarati, Juliana E Raffaghelli
      Abstract: Journal of Information Science, Ahead of Print.
      Open research data (ORD) have been considered a driver of scientific transparency. However, data friction, the phenomenon of data being underutilised for various reasons, has also been pointed out. A factor often called into question for the low usage of ORD is the quality of the ORD and their associated metadata. This work aims to illustrate how ORD published in the Figshare scientific repository are used, by scientific discipline and data type, and how usage compares with the quality of their metadata. Considering all Figshare resources and carrying out a programmatic quality assessment of their metadata, our analysis highlighted two aspects. First, irrespective of the scientific domain considered, most ORD are under-used, with exceptional cases that concentrate most of researchers’ attention. Second, there was no evidence that the use of ORD is associated with good metadata publishing practices. These two findings prompt reflection on the potential causes of such data friction.
      Citation: Journal of Information Science
      PubDate: 2020-10-05T06:03:35Z
      DOI: 10.1177/0165551520961048
       
  • Mining information from sentences through Semantic Web data and
           Information Extraction tasks
    • Authors: Jose L. Martinez-Rodriguez, Ivan Lopez-Arevalo, Ana B. Rios-Alvarado
      Abstract: Journal of Information Science, Ahead of Print.
      The Semantic Web provides guidelines for representing information about real-world objects (entities) and their relations (properties). This is helpful for the dissemination and consumption of information by people and applications. However, information is mainly contained in natural language sentences, which lack a structure or linguistic descriptions ready to be processed directly by computers. The challenge is thus to identify and extract the elements of information that can be represented. This article therefore presents a strategy to extract information from sentences and represent it with Semantic Web standards. Our strategy involves Information Extraction tasks and a hybrid semantic similarity measure to obtain entities and relations that are later associated with individuals and properties from a Knowledge Base to create RDF triples (Subject–Predicate–Object structures). The experiments demonstrate the feasibility of our method and show that it is more accurate than a pattern-based method from the literature.
      Citation: Journal of Information Science
      PubDate: 2020-10-05T04:24:19Z
      DOI: 10.1177/0165551520934387
       
  • A semi-hierarchical clustering method for constructing knowledge trees
           from stackoverflow
    • Authors: Chun-Hsiung Tseng, Jia-Rou Lin
      Abstract: Journal of Information Science, Ahead of Print.
      To help students learn to program, we have to give them a clear knowledge map and sufficient materials. Question-based websites, such as Stack Overflow, are excellent information sources for this goal. However, the process can be tricky for beginners, since they may not know how to ask the right questions without sufficient background knowledge, and a knowledge tree is usually considered more helpful in such a scenario. In this research, we propose a method to infer a knowledge tree automatically from this type of website and to group documents based on the resulting tree. The proposed method mainly addresses two issues: first, the quality of tags cannot be guaranteed, and second, clustering-based methods usually generate a flat schema. The occurrence count and the co-occurrence ratio were used together to identify important tags. Then, an algorithm was developed to infer the hierarchical relationship between tags. Using these tags as centres yields better clustering performance than applying k-means alone.
      Citation: Journal of Information Science
      PubDate: 2020-09-21T07:20:07Z
      DOI: 10.1177/0165551520961035
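The abstract combines tag occurrence counts with a co-occurrence ratio to pick important tags and infer parent-child links, but does not spell out the rule. As a stand-in, the sketch below uses a simple subsumption heuristic: tag p is taken as the parent of tag c when most posts tagged c also carry p, but not vice versa. This heuristic, the thresholds and the function name are assumptions for illustration, not the authors' algorithm.

```python
from collections import Counter
from itertools import combinations


def tag_hierarchy(posts, min_count=2, threshold=0.8):
    """Infer (parent, child) tag links with a co-occurrence subsumption rule.

    posts: list of tag sets, one per question.
    A tag is 'important' if it occurs in at least `min_count` posts.
    Tag p is a parent of c when P(p | c) >= threshold and P(c | p) < threshold.
    """
    occ = Counter(t for tags in posts for t in tags)
    co = Counter()
    for tags in posts:
        for a, b in combinations(sorted(tags), 2):
            co[(a, b)] += 1
    important = {t for t, n in occ.items() if n >= min_count}
    edges = set()
    for (a, b), n in co.items():
        if a not in important or b not in important:
            continue
        # Co-occurrence ratios in both directions.
        p_a_given_b, p_b_given_a = n / occ[b], n / occ[a]
        if p_a_given_b >= threshold and p_b_given_a < threshold:
            edges.add((a, b))  # a subsumes b
        elif p_b_given_a >= threshold and p_a_given_b < threshold:
            edges.add((b, a))  # b subsumes a
    return edges
```

On a toy set where every "pandas" or "django" question is also tagged "python" but not the reverse, the rule yields python → pandas and python → django. Each parent tag could then serve as a cluster centre in the way the abstract describes.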
       
  • Prediction of online topics’ popularity patterns
    • Authors: Hengmin Zhu, Yanshuang Mei, Jing Wei, Chao Shen
      Abstract: Journal of Information Science, Ahead of Print.
      Popularity prediction of online content is an important tool for emergency management, business decision-making and public opinion monitoring. Most previous work has tried to predict the volume or level of popularity, but patterns of popularity evolution remain largely unexplored. Topic popularity patterns can offer more detailed information for event detection and early warning. In this article, we propose an effective method that combines clustering and classification models to discover and predict the popularity patterns of topics on the Internet. This method does not rely on early-stage data on topic propagation, so it can predict the future popularity pattern as soon as a topic is released. First, we used the time-series clustering algorithm K-SC to obtain basic types of topic popularity patterns. Then, by acquiring and evaluating multiple features related to the topics, including publisher features, outward characteristics of the content and textual features, we built a prediction model of topic popularity patterns based on machine learning methods. The experimental results show that four basic patterns of topic popularity are suitable for clustering the experimental data. Moreover, using certain initial characteristics, a decision tree model can effectively predict the popularity pattern of a newly released topic, with an accuracy of 89.4%.
      Citation: Journal of Information Science
      PubDate: 2020-09-21T07:09:26Z
      DOI: 10.1177/0165551520961026
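K-SC clusters popularity curves by shape rather than by volume. Its distance, as usually stated for K-SC, is d(x, y) = min over scale a and shift q of ||x − a·y_q|| / ||x||, where for a fixed shift the best scale is a = (x · y_q) / ||y_q||². The sketch below computes only that distance, not the full centroid-update loop; zero-padded shifts, the small shift window and the function names are assumptions. Note that the distance is not symmetric, because it is normalised by ||x||.

```python
import math


def shift(y, q):
    """Shift sequence y by q steps (positive = later), padding with zeros."""
    n = len(y)
    q = max(-n, min(n, q))  # clamp so the result always has length n
    if q >= 0:
        return [0.0] * q + list(y[: n - q])
    return list(y[-q:]) + [0.0] * (-q)


def ksc_distance(x, y, max_shift=2):
    """Scale- and shift-invariant distance between two equal-length time series."""
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be all zeros")
    best = float("inf")
    for q in range(-max_shift, max_shift + 1):
        yq = shift(y, q)
        denom = sum(v * v for v in yq)
        # Optimal scale for this shift (least-squares fit of a * yq to x).
        a = sum(u * v for u, v in zip(x, yq)) / denom if denom else 0.0
        resid = math.sqrt(sum((u - a * v) ** 2 for u, v in zip(x, yq)))
        best = min(best, resid / norm_x)
    return best
```

For example, x = [0, 1, 3, 1, 0] and y = [2, 6, 2, 0, 0] describe the same burst, scaled by 2 and peaking one step earlier, so their distance is 0.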
       
  • Factors influencing researchers’ journal selection decisions
    • Authors: Jennifer Rowley, Laura Sbaffi, Martin Sugden, Anna Gilbert
      Abstract: Journal of Information Science, Ahead of Print.
      The scholarly publication landscape continues to grow in complexity, presenting researchers with ever-increasing dilemmas regarding journal choice. However, research into the decision-making processes associated with journal choice is limited. This article contributes by reporting on an international survey of researchers in various disciplines and with varying levels of experience. The study examines the extent to which various journal characteristics affect journal selection, perceptions of the extent to which university and national research policies impact on their journal choice, and the influence of academics’ familiarity, confidence and objectives on journal choice. The most important factors influencing journal choice were as follows: reliability of reviewing, usefulness of reviewers’ feedback, the reputation of the journal and confidence that their article is in scope for the journal. Publishing productivity, publishing experience, researcher role and discipline had little impact on the ranking of journal choice factors, suggesting that the research community is homogeneous in this respect.
      Citation: Journal of Information Science
      PubDate: 2020-09-17T06:18:18Z
      DOI: 10.1177/0165551520958591
       
  • Optimal policy learning for COVID-19 prevention using reinforcement
           learning
    • Authors: M Irfan Uddin, Syed Atif Ali Shah, Mahmoud Ahmad Al-Khasawneh, Ala Abdulsalam Alarood, Eesa Alsolami
      Abstract: Journal of Information Science, Ahead of Print.
      COVID-19 has changed the lifestyle of many people because of its rapid human-to-human transmission. The spread started at the end of January 2020, and different countries used different approaches to testing, sanitization, lockdown and quarantine centres to control the spread of the virus. People are returning to work and routine activities under new standards of testing, sanitization, social distancing and lockdown. People are regularly tested to identify those infected with COVID-19 and isolate them from the general public. However, testing everyone indiscriminately is expensive in terms of resource usage, so an optimal policy is needed that tests only those with a higher chance of being COVID-19 positive. Similarly, sanitization is used to disinfect individuals and streets, but it is also expensive, and it is not possible to disinfect every individual and street. Social distancing, lockdown and quarantine centres are other measures used to control human-to-human transmission of the infection and to isolate those infected with COVID-19. However, lockdown and quarantine centres are also expensive, as they disrupt the affairs of state and economic growth, and they negatively affect a society's quality of life. Nor is it possible to provide resources to all citizens while confining them to their homes or quarantine centres indefinitely. All of these measures are costly in terms of resources and affect the spread of the virus, quality of life, resource use and the economy.
      In this article, we build a novel intelligent method based on reinforcement learning (RL). The method quantifies different levels of testing, disinfection and lockdown together with their impact on the spread of the infection, quality of life, resource use and the economy. Several RL algorithms are implemented, and agents are trained with these algorithms to interact with the environment and learn the best policy. The experiments show that deep learning–based algorithms, such as DQN and DDPG, perform better than traditional RL algorithms, such as Q-Learning and SARSA.
      Citation: Journal of Information Science
      PubDate: 2020-09-17T06:12:19Z
      DOI: 10.1177/0165551520959798
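The abstract compares deep RL agents (DQN, DDPG) with the tabular baselines Q-Learning and SARSA. The two baselines differ mainly in their update target. Q-Learning bootstraps from the best next action, which makes it off-policy. SARSA bootstraps from the action the agent actually takes next, which makes it on-policy. The following is a minimal sketch of both updates. The state and action names (`low_risk`, `high_risk`, `test`, `skip`) are hypothetical placeholders and do not describe the paper's environment.

```python
from typing import Dict

QTable = Dict[str, Dict[str, float]]


def q_learning_update(q: QTable, s: str, a: str, r: float, s_next: str,
                      alpha: float, gamma: float) -> float:
    """Off-policy TD update: bootstrap from the greedy (max) value of s_next."""
    target = r + gamma * max(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


def sarsa_update(q: QTable, s: str, a: str, r: float, s_next: str, a_next: str,
                 alpha: float, gamma: float) -> float:
    """On-policy TD update: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


def fresh_table() -> QTable:
    # Hypothetical two-state, two-action table, for illustration only.
    return {"low_risk": {"test": 0.0, "skip": 0.0},
            "high_risk": {"test": 1.0, "skip": 0.5}}


if __name__ == "__main__":
    alpha, gamma, reward = 0.5, 0.9, 1.0
    # Same transition for both rules; the agent's next action happens to be "skip".
    ql = q_learning_update(fresh_table(), "low_risk", "test", reward, "high_risk", alpha, gamma)
    sa = sarsa_update(fresh_table(), "low_risk", "test", reward, "high_risk", "skip", alpha, gamma)
    print(f"Q-Learning: {ql:.3f}, SARSA: {sa:.3f}")
```

With these hypothetical numbers, the two rules diverge only because SARSA uses the lower-valued action it actually takes next. DQN and DDPG replace the table with neural-network function approximators.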
       
  • Discovering informative features in large-scale landmark image collection
    • Authors: Ala’a Alzou’bi, Keng Hoon Gan
      Abstract: Journal of Information Science, Ahead of Print.
      One of the key problems in image retrieval systems is the presence of irrelevant and noisy image content, which can significantly confuse the system. Therefore, images need to be represented by their informative features alone to improve the retrieval performance of the system and of any subsequent process. In this article, we propose a method to identify the informative features in a large-scale image collection. We apply frequent itemset mining (FIM) to extract visual feature patterns from a list of images of the same object. Then, we generate feature pairs to measure the significance of each feature according to its co-occurrence with neighbouring features. In addition, we apply this feature selection technique to localise the landmark in the image. The proposed method is evaluated in terms of average precision (AP) on two benchmark data sets. It achieves retrieval performance comparable to the bag-of-visual-words baseline system and previous methods.
      Citation: Journal of Information Science
      PubDate: 2020-09-02T05:23:57Z
      DOI: 10.1177/0165551520950653
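Frequent itemset mining treats each image as a "transaction" whose items are its visual features (for example, visual-word IDs). It keeps the feature sets that co-occur in at least a minimum fraction of images. The sketch below illustrates only this core support computation: it counts 2-itemsets (feature pairs) by brute force. It is not the authors' pipeline, which also scores each feature by its co-occurrence with neighbouring features. The feature IDs in the usage line are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def frequent_pairs(images: Iterable[Set[int]],
                   min_support: float) -> Dict[Tuple[int, int], float]:
    """Return feature pairs whose support (fraction of images containing both) >= min_support."""
    images = list(images)
    counts: Counter = Counter()
    for features in images:
        # Each image is one transaction; count each unordered pair at most once per image.
        counts.update(combinations(sorted(set(features)), 2))
    n = len(images)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    # Hypothetical visual-word IDs detected in four photos of the same landmark.
    photos = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 4}]
    print(frequent_pairs(photos, min_support=0.5))
```

Pairs that recur across many photos of the same object are candidates for informative features. Features that rarely co-occur with others are more likely to be noise.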
       
  • An overview of literature on COVID-19, MERS and SARS: Using text mining
           and latent Dirichlet allocation
    • Authors: Xian Cheng, Qiang Cao, Stephen Shaoyi Liao
      Abstract: Journal of Information Science, Ahead of Print.
      The unprecedented outbreak of COVID-19 is one of the most serious global threats to public health in this century. During this crisis, specialists in information science could play key roles in supporting the efforts of scientists in the health and medical community to combat COVID-19. In this article, we demonstrate that information specialists can support the health and medical community by applying text mining with latent Dirichlet allocation (LDA) to produce an overview of a large body of coronavirus literature. This overview presents the generic research themes of the coronavirus diseases COVID-19, MERS and SARS. It identifies representative literature for each main research theme and presents a network visualisation to explore the overlaps, similarities and differences among these themes. The overview can help the health and medical communities extract useful information and interrelationships from coronavirus-related studies.
      Citation: Journal of Information Science
      PubDate: 2020-09-01T06:39:23Z
      DOI: 10.1177/0165551520954674
       
  • Quantifying and analysing the stages of online information dissemination
           in different enterprise emergencies: The idea of system cybernetics
    • Authors: Yongtian Yu, Guang Yu, Xiangbin Yan, Xiao Yu
      Abstract: Journal of Information Science, Ahead of Print.
      Previous research on information dissemination in emergencies has focused on predicting the volume of information using a wide range of models. However, most of these models do not distinguish the different stages of emergencies. This makes it difficult for public relations (PR) practitioners to make decisions based on the needs of each stage in today's rapidly changing media environment. In this study, we introduce the idea of system cybernetics and the method of system identification into the study of information dissemination. Based on the proposed information accumulation probability distribution continuity (IAPDC) model, we provide a quantitative division of the information accumulation process. The duration of each stage and the time point at which each stage begins are defined with a quantitative calculation method. Using empirical data from 83 emergencies in 2016 and 2017 covering Weibo, WeChat Platforms and over 20,000 web media outlets, we verify the effectiveness of this method. Next, we use simulation analysis to demonstrate what effects the parameters have on the dissemination process and how changes in different stages affect the process. We also demonstrate the effects of the emergencies' attributes on the information dissemination process and on each stage. Our study helps fill gaps in the existing communication discipline and provides insight for PR practitioners dealing with enterprise emergencies.
      Citation: Journal of Information Science
      PubDate: 2020-08-31T06:23:11Z
      DOI: 10.1177/0165551520948443
       
  • A multi-strategy approach for the merging of multiple taxonomies
    • Authors: Mao Chen, Chao Wu, Zongkai Yang, Sanya Liu, Zengzhao Chen, Xiuling He
      Abstract: Journal of Information Science, Ahead of Print.
      Taxonomy merging is an important task that provides a uniform schema for several heterogeneous taxonomies. Previous studies primarily focus on merging two taxonomies in a specific domain, while the merging of multiple taxonomies has been neglected. This article proposes an approach that automatically merges multiple source taxonomies into a target taxonomy in an asymmetric manner. The approach breaks the whole up into parts to reduce the complexity of merging multiple taxonomies. It also employs a block-based method to reduce the scale of measuring semantic relations between concept pairs. In addition, a topical coverage method is proposed to address the problem of multiple inheritance. Experiments conducted on synthetic and real-world scenarios indicate that the proposed approach is feasible and effective for merging multiple taxonomies. In particular, it works well at limiting semantic redundancy and establishing high-quality hierarchical relations between concepts.
      Citation: Journal of Information Science
      PubDate: 2020-08-31T06:15:11Z
      DOI: 10.1177/0165551520952340
       
  • Reusing digital collections from GLAM institutions
    • Authors: Gustavo Candela, María Dolores Sáez, MPilar Escobar Esteban, Manuel Marco-Such
      Abstract: Journal of Information Science, Ahead of Print.
      For some decades now, Galleries, Libraries, Archives and Museums (GLAM) institutions have published and provided access to information resources in digital format. Recently, innovative approaches have appeared, such as the concept of Labs within GLAM institutions, which facilitates the adoption of innovative and creative tools for content delivery and user engagement. In addition, new methods have been proposed for publishing digital collections as data sets amenable to computational use. In this article, we propose a step-by-step methodology for creating machine-actionable collections. This methodology is then applied to several use cases based on data sets published by relevant GLAM institutions. It aims to encourage institutions to adopt, as a core activity, the publication of data sets that support computationally driven research.
      Citation: Journal of Information Science
      PubDate: 2020-08-25T05:05:09Z
      DOI: 10.1177/0165551520950246
       
  • Exploring research trends in big data across disciplines: A text mining
           analysis
    • Authors: Ehsan Mohammadi, Amir Karami
      Abstract: Journal of Information Science, Ahead of Print.
      Using big data has been a prevailing research trend in various academic fields. However, no studies have explored the scope and structure of big data research across disciplines. In this article, we applied topic modeling and word co-occurrence analysis to identify key topics in more than 36,000 big data publications across all academic disciplines published between 2012 and 2017. The results revealed several topics associated with the storage, collection and analysis of large datasets, which were predominantly published in computational fields. Other identified research topics show the influence of big data methods and techniques in areas beyond computer science, such as education, urban informatics, business, health and medical sciences, and the prevalence of these topics has increased over time. In contrast, some themes, such as parallel computing, network modeling and big data analytic techniques, have lost popularity in recent years. These results probably reflect the maturity of core big data topics. They also highlight flourishing new research trends pertinent to big data in new domains, especially in the social sciences, health and medicine. The findings of this article can help researchers and science policymakers understand the scope and structure of big data in different academic disciplines.
      Citation: Journal of Information Science
      PubDate: 2020-08-25T04:56:41Z
      DOI: 10.1177/0165551520932855
       
  • A method of semi-automated ontology population from multiple
           semi-structured data sources
    • Authors: Irina Leshcheva, Alena Begler
      Abstract: Journal of Information Science, Ahead of Print.
      Organisations use data in different formats: Word documents, Excel spreadsheets, databases, HTML pages and so on. It is not easy to make decisions with such data because the different sources are not integrated and lack built-in decision-making rules. Decisions can be reached with knowledge bases, which, unlike databases, can store not only objects, facts and attributes but also more sophisticated patterns such as rules and axioms. The article proposes an ontology-based method for knowledge base creation that allows for the simultaneous integration of semi-structured data sources and extendibility while remaining context independent. In the initial steps of the method, data specification should be performed with the Data Sources Ontology developed by the authors. This ontology provides a description of the data structure that forms a supporting knowledge graph. The graph's schema should then be mapped to the domain ontology to be populated. Finally, the data are inserted into the domain ontology according to the mapping rules. Manual input is needed during data specification and data-to-ontology schema mapping.
      Citation: Journal of Information Science
      PubDate: 2020-08-21T07:25:33Z
      DOI: 10.1177/0165551520950243
       
  • On the measurement of scientific leadership
    • Authors: Nadia Simoes, Nuno Crespo
      Abstract: Journal of Information Science, Ahead of Print.
      The [math] index was recently proposed to measure the degree of scientific leadership. While the concept is useful and interesting, particularly as a complement to traditional performance analysis, the metric suffers from important shortcomings. We argue that scientific leadership should be evaluated (1) taking into account information from the moment the paper is produced/published and (2) in the specific context of the paper, meaning that only previous work relevant to the paper should be taken into account. Based on these two principles, we introduce an alternative approach that uses self-citations as a source of information and eliminates the shortcomings inherent in the [math] index. The new measures proposed in this study can be used to complement traditional performance assessment, for example, through the application of the h-index.
      Citation: Journal of Information Science
      PubDate: 2020-08-21T07:25:22Z
      DOI: 10.1177/0165551520950240
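The abstract suggests applying the new leadership measures together with the h-index. The standard h-index is the largest h such that h of an author's papers have at least h citations each. The following is a minimal sketch of that standard definition only. The authors' self-citation-based leadership measures are not reproduced, and the citation counts in the usage line are hypothetical.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    # Rank papers from most to least cited; h is the last rank whose count still reaches it.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))  # hypothetical citation counts
```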
       
  • Exploring direct citations between citing publications
    • Authors: Yong Huang, Yi Bu, Ying Ding, Wei Lu
      Abstract: Journal of Information Science, Ahead of Print.
      This article defines and explores the direct citations between citing publications (DCCPs) of a publication. We construct an ego-centred citation network for each paper that contains the paper itself, all of its citing papers and the citation relationships among them. Using a large-scale scholarly dataset from the computer science field in the Microsoft Academic Graph (MAG-CS), we find that DCCPs exist universally in medium- and highly cited papers. For papers that have DCCPs, DCCPs occur frequently, and highly cited papers tend to contain more DCCPs than others. Meanwhile, the number of DCCPs does not vary dramatically across papers published in different years. This paper also discusses the relationship between DCCPs and some indirect citation relationships (e.g. co-citation and bibliographic coupling).
      Citation: Journal of Information Science
      PubDate: 2020-08-21T07:25:12Z
      DOI: 10.1177/0165551520917654
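As the abstract defines them, the DCCPs of a focal paper are the citation links among that paper's citing papers inside its ego-centred citation network. The following is a minimal sketch, assuming the citation data is available as (citing, cited) pairs. The paper IDs and the function name are hypothetical. MAG-CS data would need to be loaded into this shape first.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (citing paper, cited paper)


def direct_citations_between_citing(edges: Iterable[Edge], focal: str) -> List[Edge]:
    """Return citation edges whose source and target both cite the focal paper."""
    edge_set: Set[Edge] = set(edges)
    # Citing papers of the focal paper form the ego network's neighbourhood.
    citing = {src for src, dst in edge_set if dst == focal}
    return sorted((src, dst) for src, dst in edge_set if src in citing and dst in citing)


if __name__ == "__main__":
    # Hypothetical citation edges: A, B and C cite P; D cites B but not P.
    edges = [("A", "P"), ("B", "P"), ("C", "P"),
             ("B", "A"), ("C", "A"), ("C", "B"), ("D", "B")]
    print(direct_citations_between_citing(edges, "P"))
```

Here D's citation of B is excluded because D does not cite the focal paper P.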
       
  • Negotiating change: Transition as a central concept for information
           literacy
    • Authors: Alison Hicks
      Abstract: Journal of Information Science, Ahead of Print.
      Transition forms a dynamic concept that has been underexplored within information literacy research and practice. This article uses the grounded theory of mitigating risk, which was produced through doctoral research into the information literacy practices of language-learners, as a lens for a more detailed examination of transition and its role within information literacy. This framing demonstrates that information literacy mediates transition through supporting preparation, connection, situatedness and confidence within a new setting and facilitating a shift in identity. This article concludes by discussing the important role that time and temporality, resistance and reflexivity play within transition as well as outlining implications for information literacy instruction and future research into time, affect and materiality.
      Citation: Journal of Information Science
      PubDate: 2020-08-19T04:39:26Z
      DOI: 10.1177/0165551520949159
       
  • Does the mobility of scientists disrupt their collaboration stability?
    • Authors: Zhenyue Zhao, Yi Bu, Jiang Li
      Abstract: Journal of Information Science, Ahead of Print.
      To explore the extent to which the mobility of scientists disrupts the stability of their research collaboration, we designed a measure for scientists, Collaboration Stability After Moving (CSAM). We retrieved 4343 US-related scientists’ curricula vitae (CVs) from the Open Researcher and Contributor ID (ORCID) website, together with their publication records in the Web of Science database, and applied a linear regression model to the dataset. Our findings are as follows: (1) the more times a scientist moved, the more inclined she or he was to co-author with previous collaborators; (2) cross-country mobility disrupts the stability of research collaboration more than domestic mobility; and (3) the stability of research collaboration correlates with scientists’ cultural background, cross-country work experience and research areas.
      Citation: Journal of Information Science
      PubDate: 2020-08-10T05:28:34Z
      DOI: 10.1177/0165551520948744
       
  • A content-based technique for linking dual language news articles in an
           archive
    • Authors: Muzammil Khan, Arif Ur Rahman, Arshad Ahmad, Sarwar Shah Khan
      Abstract: Journal of Information Science, Ahead of Print.
      Retrieving a specific news article from a vast archive of multilingual news articles is a challenging task, whether the retrieval responds to a user query or relies on similarity among articles. The task becomes even more complicated when the archive contains articles in a low-resourced and morphologically complex language such as Urdu alongside English news articles. The article proposes a content-based (lexical) similarity measure, the Common Ratio Measure for Dual Language (CRMDL), for linking digital news articles published in various online news sources. The measure links Urdu news articles to English ones during the preservation process using an Urdu-to-English lexicon. A literature review showed that no Urdu-to-English lexicon existed, so the first task was to build one from multiple sources. CRMDL is evaluated rigorously on different data sets of varying sizes to assess its effectiveness. The experimental results show that the proposed measure is feasible and effective for computing similarity between Urdu and English news articles, achieving on average 50% precision and 67% recall. The performance could be improved further by addressing the limitations summarised in the study.
      Citation: Journal of Information Science
      PubDate: 2020-08-04T02:29:31Z
      DOI: 10.1177/0165551520937614
       
  • REDI: Towards knowledge graph-powered scholarly information management and
           research networking
    • Authors: José Ortiz Vivar, José Segarra, Boris Villazón-Terrazas, Víctor Saquicela
      Abstract: Journal of Information Science, Ahead of Print.
      Academic data management has become an increasingly challenging task as research evolves over time. Essential tasks such as information retrieval and research networking have become extremely difficult due to the ever-growing number of researchers and scientific articles. Numerous initiatives, especially ones focused on web technologies, have emerged in IT environments to address this issue. Although these approaches have individually provided solutions for diverse problems, they still cannot offer integrated knowledge bases or the flexibility to exploit this information adequately. In this article, we present REDI, a Linked Data-powered framework for academic knowledge management and research networking, which introduces a new perspective on integration. REDI combines information from multiple sources into a consolidated knowledge base through state-of-the-art procedures and leverages semantic web standards to represent the information. Moreover, REDI takes advantage of this knowledge for data visualisation and analysis, which ultimately improves and simplifies many activities, including research networking.
      Citation: Journal of Information Science
      PubDate: 2020-08-04T02:28:16Z
      DOI: 10.1177/0165551520944351
       
  • Evaluating the quality of linked open data in digital libraries
    • Authors: Gustavo Candela, Pilar Escobar, Rafael C Carrasco, Manuel Marco-Such
      Abstract: Journal of Information Science, Ahead of Print.
      Cultural heritage institutions have recently started to share their metadata as Linked Open Data (LOD) in order to disseminate and enrich them. The publication of large bibliographic data sets as LOD is a challenge that requires the design and implementation of custom methods for the transformation, management, querying and enrichment of the data. In this report, the methodology defined by previous research for evaluating the quality of LOD is analysed and adapted to the specific case of Resource Description Framework (RDF) triples containing standard bibliographic information. The resulting quality measures are reported for four highly relevant libraries.
      Citation: Journal of Information Science
      PubDate: 2020-08-04T02:23:50Z
      DOI: 10.1177/0165551520930951
       
  • The distinctiveness of author interdisciplinarity: A long-neglected issue
           in research on interdisciplinarity
    • Authors: Wenyu Zhang, Shunshun Shi, Xiaoling Huang, Shuai Zhang, Peijia Yao, Yilei Qiu
      Abstract: Journal of Information Science, Ahead of Print.
      In research on interdisciplinarity (RID), measures for evaluating the interdisciplinarity of scientific entities (e.g., papers, authors, journals or research areas) have long been proposed. Author interdisciplinarity differs substantially from the other types of interdisciplinarity because of the complex interpersonal relationships between connected authors. However, previous work has failed to recognise the distinctiveness of author interdisciplinarity and has treated it as equivalent to other types of interdisciplinarity. In this work, an extended Rao–Stirling diversity measure is proposed that incorporates the co-author network and a network similarity measure to evaluate author interdisciplinarity specifically. Moreover, betweenness centrality is used to improve the network similarity measure because it is intrinsically suited to expressing how an entity loads on different factors in a network, which closely matches the characteristics of interdisciplinarity. An experiment is conducted on papers about Public Administration in the Web of Science, and based on the final results, typical authors are investigated in more depth. The work proposes a novel idea for measuring author interdisciplinarity, which can promote the study of interdisciplinarity measurement in RID.
      Citation: Journal of Information Science
      PubDate: 2020-08-04T02:23:01Z
      DOI: 10.1177/0165551520939499
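The measure proposed here extends the standard Rao–Stirling diversity, commonly written Delta = sum over i != j of d_ij * p_i * p_j. Here p_i is the share of an entity's output in category i, and d_ij is the distance (often 1 minus similarity) between categories i and j. The sketch computes only this standard form, summed over ordered pairs. Some authors sum over unordered pairs instead, which halves the value. It does not include the authors' co-author network or betweenness-based similarity. The category names and distances in the usage lines are hypothetical.

```python
from itertools import permutations
from typing import Dict, FrozenSet


def rao_stirling(p: Dict[str, float], d: Dict[FrozenSet[str], float]) -> float:
    """Sum of p_i * p_j * d_ij over ordered pairs (i, j) of distinct categories.

    p: proportion of output per category (should sum to 1).
    d: symmetric distance, keyed by the unordered pair frozenset({i, j}).
    """
    return sum(p[i] * p[j] * d[frozenset((i, j))] for i, j in permutations(p, 2))


if __name__ == "__main__":
    # Hypothetical author profile across three subject categories.
    shares = {"PublicAdmin": 0.5, "Economics": 0.3, "ComputerSci": 0.2}
    dist = {
        frozenset({"PublicAdmin", "Economics"}): 0.4,
        frozenset({"PublicAdmin", "ComputerSci"}): 0.9,
        frozenset({"Economics", "ComputerSci"}): 0.8,
    }
    print(round(rao_stirling(shares, dist), 4))
```

The measure grows when output is spread evenly (variety and balance) and when the categories involved are far apart (disparity).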
       
  • Identification of rumour stances by considering network topology and
           social media comments
    • Authors: Yongcong Luo, Jing Ma, Chai Kiat Yeo
      Abstract: Journal of Information Science, Ahead of Print.
      Online social media (OSM) has become a hotbed for the rapid dissemination of disinformation and fake news. To track and limit the spread of fake news, we study stance identification of comments posted on OSM, where the stance can denote the comment’s semantics. In this article, we propose a framework for identifying rumour stances that combines network topology and OSM comments. We construct a vector matrix of comments and words via OTI (optimisation term frequency–inverse document frequency). To better identify the stances, we introduce another vector matrix with a novel attribute, namely the network topology among the users. A variational autoencoder (VAE) is then applied for dimensionality reduction and optimisation of these vector matrices. The matrices are then combined into an integrated matrix [math], tempered by two parameters [math] and [math]. Finally, the matrix is fed into a neural network for rumour stance identification. Experimental evaluations show that our proposed approach outperforms some state-of-the-art methods, achieving a high precision of 90.26% and an F1-score of 88.58%.
      Citation: Journal of Information Science
      PubDate: 2020-07-30T05:31:41Z
      DOI: 10.1177/0165551520944352
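The abstract describes OTI as an optimisation of term frequency–inverse document frequency. It does not say how OTI modifies the standard weighting, so the sketch below shows only plain TF-IDF. It uses the common variant tf = count / document length and idf = ln(N / df) to build a comment-term matrix. The tokenised comments in the usage lines are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def tf_idf(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Build a document-term matrix with tf = count/length and idf = ln(N/df)."""
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    vocab = sorted(df)
    matrix = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc)
        matrix.append([(tf[t] / length) * math.log(n_docs / df[t]) if length else 0.0
                       for t in vocab])
    return vocab, matrix


if __name__ == "__main__":
    # Hypothetical tokenised comments replying to a rumour post.
    comments = [["rumour", "false", "fake"], ["rumour", "true"], ["source", "please"]]
    vocab, matrix = tf_idf(comments)
    print(vocab)
    for row in matrix:
        print([round(v, 3) for v in row])
```

A term that appears in every comment, such as "rumour" would if every comment contained it, receives zero weight. Terms that occur in only a few comments are weighted up.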
       
  • How do academia and society react to erroneous or deceitful claims?
           The case of retracted articles’ recognition
    • Authors: Hajar Sotudeh, Nilofar Barahmand, Zahra Yousefi, Maryam Yaghtin
      Abstract: Journal of Information Science, Ahead of Print.
      Researchers give credit to peer-reviewed, and thus credible, publications through citations. Despite a rigorous reviewing process, certain articles are retracted when their ethical or scientific deficiencies are disclosed. It is, therefore, important to understand how society and academia react to erroneous or deceitful claims and purge science of their unreliable results. Applying a matched-pairs research design, this study examined a sample of medicine-related retracted and non-retracted articles matched by content similarity. The regression analysis revealed similar obsolescence trends in the retracted and non-retracted groups. Generalized Estimating Equations showed that citations are affected by retraction status, life after retraction, life cycle and the journals’ previous reputation. Of these, the first two were the strongest positive predictors of citations. The retracted papers obtain fewer citations both before and after retraction, implying that academia reacts watchfully to low-quality papers even before their fallibility is officially announced. The retracted papers show an equal or higher level of social recognition in terms of tweets and blog mentions, but a lower level in terms of Mendeley readership. This could signify social users’ sensitivity to scientific quality. They probably publicise the retraction and warn against the retracted items in their tweets or blogs, while avoiding recording them in their Mendeley profiles. Further scrutiny is required to gain insight into this sensitivity, if any, to scientific quality. The study’s originality lies in matching the retracted and non-retracted papers by topic and neutralising variations in their citation potential. It is also the first study comparing the social impact of the two groups.
      Citation: Journal of Information Science
      PubDate: 2020-07-30T05:20:15Z
      DOI: 10.1177/0165551520945853
       
  • Understanding the evolution of a scientific field by clustering and
           visualizing knowledge graphs
    • Authors: Mauro Dalle Lucca Tosi, Julio Cesar dos Reis
      Abstract: Journal of Information Science, Ahead of Print.
      Tracking the evolution of a scientific field is an arduous process, but it allows researchers to understand trends in areas of science and predict how they may evolve. Nowadays, most automated mechanisms developed to assist researchers in this process rely only on article metadata and do not consider article content when identifying changes in a field's structure. These methods are not suited to helping researchers study the concepts that compose an area and its evolution. In this article, we propose a method to track the evolution of a scientific field at the concept level. Our method structures a scientific field using two knowledge graphs representing distinct periods of the studied field. It then clusters them and identifies corresponding clusters between the knowledge graphs, which represent the same subareas in distinct time periods. Our solution enables the corresponding clusters to be compared, tracking their evolution. We apply and evaluate our method in two case studies concerning the artificial intelligence (AI) and biotechnology (BIO) fields. The findings indicate that our implemented software tool provides a suitable way to assess the evolution of these fields. In our analyses, we observed evolution both in broader subareas of a scientific field and in specific ones. An example of the first is the growth of the ‘Convolutional Neural Network’ area from 2006. An example of the second is the decrease of research works using mice to study BRAF-mutation lung cancer from 2018. This work contributes a web application with interactive user interfaces to assist researchers in representing, analysing and tracking the evolution of scientific fields at the concept level.
      Citation: Journal of Information Science
      PubDate: 2020-07-10T06:10:31Z
      DOI: 10.1177/0165551520937915
       
  • Does the use of open, non-anonymous peer review in scholarly publishing
           introduce bias? Evidence from the F1000Research post-publication open
           peer review publishing model
    • Authors: Mike Thelwall, Liz Allen, Eleanor-Rose Papas, Zena Nyakoojo, Verena Weigert
      Abstract: Journal of Information Science, Ahead of Print.
      As part of moves towards open knowledge practices, making peer review open is cited as a way to enable fuller scrutiny and transparency of assessments around research. There are now many flavours of open peer review in use across scholarly publishing, including those in which reviews are fully attributable and the reviewer is named. This study examines whether there is evidence of bias in two areas commonly criticised in open, non-anonymous (named) peer review. The system examined is the post-publication peer review system operated by the open-access scholarly publishing platform F1000Research. First, is there evidence of potential bias when a reviewer based in a specific country assesses the work of an author based in the same country? Second, are reviewers influenced by being able to see the comments and know the origins of a previous reviewer? Based on over 4 years of open peer review data, we found some weak evidence that being based in the same country as an author may influence a reviewer’s decision. However, there was insufficient evidence to conclude that being able to read an existing published review before submitting a review encourages conformity. Thus, while immediate publication of peer review reports appears to be unproblematic, caution may be needed when selecting same-country reviewers in open systems if other studies confirm these results.
      Citation: Journal of Information Science
      PubDate: 2020-07-06T04:17:13Z
      DOI: 10.1177/0165551520938678
       
  • Twenty-six years of LIS research focus and hot spots, 1990–2016: A
           co-word analysis
    • Authors: Reza Mokhtarpour, Ali Akbar Khasseh
      Abstract: Journal of Information Science, Ahead of Print.
      The purpose of this research is to map and analyse the conceptual and thematic structure of library and information science (LIS) research using co-word analysis. The bibliographic records consist of all research papers published in the core LIS journals between 1990 and 2016 and indexed in Web of Science. ‘CiteSpace’ was used to visualise the co-word network of LIS studies. The co-occurrence frequencies and centrality scores in the overall structure of the field showed that the word ‘Science’ is the most significant and pivotal keyword among the nodes in the co-word network of LIS literature, followed by the word ‘Library’. However, the social network analysis revealed that despite the high frequency of the word ‘library’, its pivotal role has declined over time. The analysis of co-word clusters showed that ‘information seeking and retrieval’ is the most important research focus in the intellectual structure of LIS literature during 1990–2016. Also, an analysis of the hot spots of LIS research based on the Kleinberg algorithm indicated that the words ‘Internet’ and ‘World Wide Web’ attracted the most attention from LIS scholars during the years under study.
      Citation: Journal of Information Science
      PubDate: 2020-07-02T08:47:14Z
      DOI: 10.1177/0165551520932119
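The co-word analysis in the Mokhtarpour and Khasseh abstract rests on counting how often keywords appear together in the same paper. Below is a minimal Python sketch that assumes each paper is given as a list of author keywords (a hypothetical input format). It builds the co-occurrence counts and a normalised degree centrality. Degree centrality is a simpler stand-in chosen here for illustration. CiteSpace itself reports betweenness centrality, and this sketch does not reproduce the authors' pipeline.

```python
from collections import Counter
from itertools import combinations


def co_word_network(records):
    """Count keyword frequencies and pairwise co-occurrences.

    records: one keyword list per paper (hypothetical input format).
    Pair keys are alphabetically sorted tuples, so ('a', 'b') and
    ('b', 'a') are counted as the same co-word link.
    """
    term_freq = Counter()
    pair_freq = Counter()
    for keywords in records:
        # Lower-case and de-duplicate so a keyword counts once per paper.
        unique = sorted({k.lower() for k in keywords})
        term_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return term_freq, pair_freq


def degree_centrality(pair_freq):
    """Share of the other keywords that each keyword co-occurs with at least once."""
    neighbours = {}
    for a, b in pair_freq:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {term: len(nbs) / (n - 1) for term, nbs in neighbours.items()}


records = [
    ["Science", "Library", "Internet"],
    ["Science", "Information retrieval"],
    ["Library", "Science"],
]
term_freq, pair_freq = co_word_network(records)
centrality = degree_centrality(pair_freq)
print(term_freq.most_common(2))           # [('science', 3), ('library', 2)]
print(pair_freq[("library", "science")])  # 2
print(centrality["science"])              # 1.0
```

In these toy records, 'science' co-occurs with every other keyword, so it gets the maximum centrality of 1.0.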
       
  • A domain knowledge graph construction method based on Wikipedia
    • Authors: Haoze Yu, Haisheng Li, Dianhui Mao, Qiang Cai
      Abstract: Journal of Information Science, Ahead of Print.
      To achieve real-time updating of a domain knowledge graph and improve relationship extraction during its construction, a domain knowledge graph construction method is proposed. Based on the structured knowledge in Wikipedia’s classification system, we acquire the concepts and instances contained in subject areas. A relationship extraction algorithm based on co-word analysis is used to extract the classification relationships in semi-structured open labels. A Bi-GRU distantly supervised relationship extraction model based on a multiple-scale attention mechanism and an improved cross-entropy loss function is proposed to obtain the non-classification relationships of concepts in unstructured texts. Experiments show that the proposed model performs better than existing methods. Based on the obtained concepts, instances and relationships, a domain knowledge graph is constructed, and the domain-independent nodes and relationships it contains are removed through a vector variance algorithm. The effectiveness of the proposed method is verified by constructing a food domain knowledge graph based on Wikipedia.
      Citation: Journal of Information Science
      PubDate: 2020-06-30T06:16:35Z
      DOI: 10.1177/0165551520932510
       
  • Modelling users’ perceptions of video information seeking, learning
           through added value and use of curated digital collections
    • Authors: Dan Albertson, Melissa P Johnston
      Abstract: Journal of Information Science, Ahead of Print.
      Information seeking research has provided models of users in the search for information across many different contexts and situations. Digital content curation has emerged as a means for managing information and facilitating user learning by adding ‘value’ to digital content in different ways, enhancing the user experience. Using digital video and K–12 education as the context, this study examined factors representing video information seeking, user learning and use of curated video collections, both individually and together, as user-centred constructs. Two hundred and fifty-two K–12 teachers reported their perceptions of their own information seeking processes and of different qualities of curated content and collections within the context of searching digital video for applied purposes. Results extracted underlying factors of these concepts and demonstrated significant relationships between them. Findings enabled the expansion of a model to incorporate both users’ perceptions of information seeking and user-centred constructs of learning through added value content and use of curated digital collections. Practical implications of the study help establish baselines for future studies on formulating, incorporating and emphasising added value and video curation qualities based on users’ information seeking within the process.
      Citation: Journal of Information Science
      PubDate: 2020-06-25T04:47:46Z
      DOI: 10.1177/0165551520920807
       
  • NaLa-Search: A multimodal, interaction-based architecture for faceted
           search on linked open data
    • Authors: José Luis Sánchez-Cervantes, Giner Alor-Hernández, Mario Andrés Paredes-Valverde, Lisbeth Rodríguez-Mazahua, Rafael Valencia-García
      Abstract: Journal of Information Science, Ahead of Print.
      Mobile devices are the technological basis of computational intelligent systems, yet traditional mobile application interfaces tend to rely only on the touch modality. Such interfaces could improve human–computer interaction by combining diverse interaction modalities, such as visual, auditory and touch. In addition, much information on the Web is published under the Linked Data principles to allow people and computers to share, use and/or reuse high-quality information; however, current tools for searching, browsing and visualising this kind of data are not fully developed. The goal of this research is to propose a novel architecture called NaLa-Search to effectively explore the Linked Open Data cloud. We present a mobile application that combines voice commands and touch for browsing and searching such semantic information through faceted search, a widely used interaction scheme for exploratory search that is faithful to its richness and practical for real-world use. NaLa-Search was evaluated by real users from the clinical pharmacology domain. In this evaluation, the users had to search and navigate the DrugBank dataset through voice commands. The evaluation results show that faceted search combined with multiple interaction modalities (e.g. speech and touch) can enhance users’ interaction with semantic knowledge bases.
      Citation: Journal of Information Science
      PubDate: 2020-06-24T04:34:14Z
      DOI: 10.1177/0165551520930918
       
  • A topic analysis method based on a three-dimensional strategic diagram
    • Authors: Jia Feng, Xiaomin Mu, Wei Wang, Ying Xu
      Abstract: Journal of Information Science, Ahead of Print.
      With the tremendous growth of scientific literature in recent years, methods of detecting and analysing research topics have become increasingly important. This study proposes a topic analysis method combining latent Dirichlet allocation (LDA) and a three-dimensional strategic diagram. The diagram is constructed from three dimensions (centrality, density and novelty), and topics are classified into seven categories according to their strategic positions. Using this method, the paper analyses 62,340 publications in the field of medical informatics published between 1991 and 2018. Results show that the research scope of medical informatics has become increasingly interdisciplinary. Data analytical methods and technologies are sub-domains with persistent popularity. New health technologies, drug safety, algorithm optimisation and standardisation of medical information are emerging research topics. We hope the findings can help researchers identify potential research topics and facilitate in-depth analysis of the current state of various fields.
      Citation: Journal of Information Science
      PubDate: 2020-06-24T04:32:55Z
      DOI: 10.1177/0165551520930907
       
  • The impact of semantic annotation techniques on content-based video
           lecture recommendation
    • Authors: Laura Lima Dias, Eduardo Barrére, Jairo Francisco de Souza
      Abstract: Journal of Information Science, Ahead of Print.
      The increasing number of videos available in educational content repositories makes searching difficult, and recommendation systems have been used to help students and teachers find content of interest. Speech is an important carrier of information in video lectures and is used by content-based video recommendation systems. Although automatic speech recognition (ASR) transcripts have been used in modern video recommendation systems, it is not clear how annotation techniques work with noisy text. This article presents an analysis of a set of semantic annotation techniques applied to text extracted from video lecture speech and of their impact on two tasks: annotation and similarity analysis. Experiments show that topic models achieve good results in this scenario. In addition, a new benchmark for this task has been created, which researchers can use to evaluate new techniques.
      Citation: Journal of Information Science
      PubDate: 2020-06-23T04:31:27Z
      DOI: 10.1177/0165551520931732
       
  • The effects of globalisation techniques on feature selection for text
           classification
    • Authors: Bekir Parlak, Alper Kursat Uysal
      Abstract: Journal of Information Science, Ahead of Print.
      Text classification (TC) is a very important and critical task in the 21st century, as a high volume of electronic data exists on the Internet. In TC, textual data are characterised by a huge number of highly sparse features/terms. A typical TC pipeline consists of many steps, and one of the most important is undoubtedly feature selection (FS). In this study, we have comprehensively investigated the effects of various globalisation techniques on local feature selection (LFS) methods using datasets with different characteristics, namely multi-class unbalanced (MCU), multi-class balanced (MCB), binary-class unbalanced (BCU) and binary-class balanced (BCB). The globalisation techniques used in this study are summation (SUM), weighted-sum (AVG) and maximum (MAX). To investigate the effect of globalisation techniques, we used three LFS methods: Discriminative Feature Selection (DFSS), odds ratio (OR) and chi-square (CHI2). In the experiments, we used four benchmark datasets (Reuters-21578, 20Newsgroup, Enron1 and Polarity) together with Support Vector Machines (SVM) and Decision Tree (DT) classifiers. According to the experimental results, the most successful globalisation technique is AVG when all situations are taken into account. The experimental results indicate that the DFSS method is more successful than the OR and CHI2 methods on datasets with MCU and MCB characteristics. However, the CHI2 method seems more accurate than the OR and DFSS methods on datasets with BCU and BCB characteristics. Also, the SVM classifier performed better than the DT classifier in most cases.
      Citation: Journal of Information Science
      PubDate: 2020-06-18T01:32:28Z
      DOI: 10.1177/0165551520930897
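The Parlak and Uysal abstract turns class-specific (local) feature scores into a single global score per term. The Python sketch below makes three assumptions: chi-square is the local method, documents are sets of terms, and AVG is the class-prior-weighted sum of the local scores, as the abstract's label "weighted-sum" suggests. The function names are hypothetical, and the study's exact formulas may differ.

```python
from collections import Counter


def chi2_per_class(docs, labels, term):
    """Local chi-square score of `term` for each class.

    docs: list of sets of terms; labels: the class of each document.
    """
    n = len(docs)
    scores = {}
    for cls in sorted(set(labels)):
        a = sum(1 for d, y in zip(docs, labels) if y == cls and term in d)      # in class, has term
        b = sum(1 for d, y in zip(docs, labels) if y != cls and term in d)      # other classes, has term
        c = sum(1 for d, y in zip(docs, labels) if y == cls and term not in d)  # in class, lacks term
        d_ = n - a - b - c                                                       # other classes, lacks term
        denom = (a + c) * (b + d_) * (a + b) * (c + d_)
        scores[cls] = n * (a * d_ - c * b) ** 2 / denom if denom else 0.0
    return scores


def globalise(local_scores, labels, method):
    """Combine per-class scores into one global score."""
    if method == "SUM":  # plain sum over classes
        return sum(local_scores.values())
    if method == "AVG":  # sum weighted by class prior P(c)
        priors = Counter(labels)
        return sum(priors[cls] / len(labels) * s for cls, s in local_scores.items())
    if method == "MAX":  # best class-specific score
        return max(local_scores.values())
    raise ValueError(f"unknown globalisation method: {method}")


docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]
local = chi2_per_class(docs, labels, "goal")
for method in ("SUM", "AVG", "MAX"):
    print(method, globalise(local, labels, method))  # SUM 8.0, AVG 4.0, MAX 4.0
```

The three functions diverge mainly on unbalanced data. There, AVG lets frequent classes dominate, while MAX rewards a term that is strongly indicative of any single class.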
       
  • Sentiment analysis of tweets through Altmetrics: A machine learning
           approach
    • Authors: Saeed-Ul Hassan, Aneela Saleem, Saira Hanif Soroya, Iqra Safder, Sehrish Iqbal, Saqib Jamil, Faisal Bukhari, Naif Radi Aljohani, Raheel Nawaz
      Abstract: Journal of Information Science, Ahead of Print.
      The purpose of the study is to (a) contribute to annotating an Altmetrics dataset across five disciplines, (b) undertake sentiment analysis using various machine learning and natural language processing–based algorithms, (c) identify the best-performing model and (d) provide a Python library for sentiment analysis of an Altmetrics dataset. First, the researchers gave a set of guidelines to two human annotators familiar with annotating tweets related to scientific literature. The annotators labelled the sentiments, achieving an inter-annotator agreement (IAA) of 0.80 (Cohen’s Kappa). Then, the same experiments were run on two versions of the dataset: one with tweets in English and the other with tweets in 23 languages, including English. Using 6388 tweets about 300 papers indexed in Web of Science, the effectiveness of the machine learning and natural language processing models was measured against well-known sentiment analysis models, SentiStrength and Sentiment140, as baselines. Support Vector Machine with unigram features outperformed all the other classifiers and baseline methods employed, with an accuracy of over 85%, followed by Logistic Regression at 83% accuracy and Naïve Bayes at 80%. The precision, recall and F1 scores for Support Vector Machine, Logistic Regression and Naïve Bayes were (0.89, 0.86, 0.86), (0.86, 0.83, 0.80) and (0.85, 0.81, 0.76), respectively.
      Citation: Journal of Information Science
      PubDate: 2020-06-16T04:57:16Z
      DOI: 10.1177/0165551520930917
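The 0.80 inter-annotator agreement in the Hassan et al. abstract is Cohen's kappa. Kappa corrects the raw share of matching labels for the agreement expected by chance, given each annotator's label frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal Python sketch of that formula follows. The labels in the example are invented for illustration and are not from the study's dataset.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label.
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each annotator's label proportions, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


a = ["pos", "pos", "neg", "neg"]
b = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(a, b))  # 0.5
```

Here the annotators agree on 3 of 4 items (p_o = 0.75). Their label frequencies alone would produce p_e = 0.5, so kappa is 0.5 rather than 0.75.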
       
  • Using social media during job search: The case of 16–24 year olds in
           Scotland
    • Authors: John A Mowbray, Hazel Hall
      Abstract: Journal of Information Science, Ahead of Print.
      Social media are powerful networking platforms that provide users with significant information opportunities. Despite this, little is known about their impact on job search behaviour. Here, interview (participants = 7), focus group (participants = 6) and survey (n = 558) data supplied by young jobseekers in Scotland were analysed to investigate the role of social media in job search. The findings show that Facebook, Twitter and LinkedIn are the most popular platforms for this purpose, and that the type of job sought influences the direction of user behaviour. Frequent social media use for job search is linked with interview invitations. The study also reveals that although most jobseekers use social media for job search sparingly, they are much more likely to do so if advised by a professional. Combined, the findings represent a crucial base of knowledge which can inform careers policy and be used as a platform for further research.
      Citation: Journal of Information Science
      PubDate: 2020-06-11T08:42:31Z
      DOI: 10.1177/0165551520927657
       
  • From words to connections: Word use similarity as an honest signal
           conducive to employees’ digital communication
    • Authors: Andrea Fronzetti Colladon, Johanne Saint-Charles, Pierre Mongeau
      Abstract: Journal of Information Science, Ahead of Print.
      Bringing together considerations from three research trends (honest signals of collaboration, socio-semantic networks and homophily theory), we hypothesise that word use similarity and having similar social network positions are linked with the level of employees’ digital interaction. To verify our hypothesis, we analyse the communication of close to 1600 employees, interacting on the intranet communication forum of a large company. We study their social dynamics and the ‘honest signals’ that, in past research, proved to be conducive to employees’ engagement and collaboration. We find that word use similarity is the main driver of interaction, much more than other language characteristics or similarity in network position. Our results suggest carefully choosing the language according to the target audience and have practical implications for both company managers and online community administrators. Understanding how to better use language could, for example, support the development of knowledge sharing practices or internal communication campaigns.
      Citation: Journal of Information Science
      PubDate: 2020-06-10T05:11:46Z
      DOI: 10.1177/0165551520929931
       
  • Supporting information use and task accomplishment: What system features
           do users like and expect?
    • Authors: Jingjing Liu, Yuan Li
      Abstract: Journal of Information Science, Ahead of Print.
      Information systems have been improving at helping users find information. However, less attention has been paid to helping searchers use the information they locate. This research attempts to address the issue of information use by investigating which information systems and features searchers consider helpful in using located information to accomplish information tasks. In all, 32 college students were invited to an information interaction lab, where they were first interviewed about a recently completed task and then worked on a task yet to be finished; both were real-life tasks of their own choosing. Through questionnaires, the study identified the most favoured existing and expected features for helping users complete tasks. Users expected convenient citations, note taking in search result pages and being kept on task. The findings have implications for designing search systems that better support task accomplishment, in addition to returning search results.
      Citation: Journal of Information Science
      PubDate: 2020-06-08T08:00:29Z
      DOI: 10.1177/0165551520917100
       
  • A survey on automatically constructed universal knowledge bases
    • Authors: Bayzid Ashik Hossain, Abdus Salam, Rolf Schwitter
      Abstract: Journal of Information Science, Ahead of Print.
      A universal knowledge base can be defined as a domain-independent ontology containing instances. Ontologies define concepts and the relations among them and are used to represent a domain of interest. Universal knowledge bases are the elementary units for automated reasoning on the Semantic Web, an extension of the World Wide Web that enables software agents to share content beyond the limitations of applications and websites. This survey focuses on the most prominent automatically constructed universal knowledge bases, including KnowItAll, DBpedia, YAGO, NELL, Probase, BabelNet and Knowledge Vault. We take a closer look at how these knowledge bases are built, in particular at the information extraction and taxonomy generation processes, and investigate how they are used in practical applications. Because of quality concerns, the most successful and widely employed knowledge bases are constructed manually to maintain high quality, but they suffer from low coverage and high assembly and quality assurance costs. By contrast, automatic approaches to building knowledge bases try to overcome these drawbacks. Although it is difficult to achieve the same level of quality as manual knowledge bases, we found that the surveyed automatically constructed knowledge bases have shown promising results and are useful for many real-world applications.
      Citation: Journal of Information Science
      PubDate: 2020-06-05T06:29:03Z
      DOI: 10.1177/0165551520921342
       
  • Predicting mobile application breakout using sentiment analysis of
           Facebook posts
    • Authors: Moez Ben Hajhmida, Oumayma Oueslati
      Abstract: Journal of Information Science, Ahead of Print.
      Publishing mobile applications on the official stores is becoming big business. Many developers are attracted by the billion-dollar success of breakout applications, and to succeed, mobile applications need to sustain a top ranking. Previous work on predicting the success of mobile applications aimed to extract from app stores the relevant features that influence high ratings. In this article, we propose an automated approach that exploits data available on the Facebook platform to predict mobile application breakout. We collect data from the Facebook Graph API and then determine the sentiment polarity of user comments. We design statistical features to score users’ sentiment for each post. Then, we combine post scores with Facebook statistical measures to form a mobile application breakout dataset. Finally, we use machine learning techniques to build our breakout prediction model. We evaluate our approach on 199 mobile applications and obtain a prediction accuracy of 83.78%. We find that the Likes count on a Facebook page is decisive for climbing the mobile application rankings. However, a high rate of negative opinions lowers an application’s ranking and prevents it from achieving a breakout. Based on these findings, we provide evidence that user interactions on social networks can influence the success of mobile applications.
      Citation: Journal of Information Science
      PubDate: 2020-05-27T06:31:06Z
      DOI: 10.1177/0165551520917099
       
  • SBTM: A joint sentiment and behaviour topic model for online course
           discussion forums
    • Authors: Xian Peng, Qinmei Xu, Wenbin Gan
      Abstract: Journal of Information Science, Ahead of Print.
      Large quantities of textual posts are increasingly generated in course discussion forums, and the accumulation of these data greatly increases the cognitive load on online participants. It is therefore imperative to automatically identify the potential semantic information derived from these textual discourse interactions. Existing topic models can discover latent topics or sentiment polarities in textual data, but they typically ignore the interactive ways in which topics are discussed, making it difficult to construct topics’ semantic space from the perspective of document generation. To solve this issue, we propose a joint sentiment and behaviour topic model called SBTM, an unsupervised approach for the automatic analysis of learners’ discussion posts. The results demonstrate that SBTM is quantitatively effective in both model generalisation and topic exploration, and rich topic content was qualitatively characterised. Furthermore, the model can potentially be employed in practical applications such as information summarisation and behaviour-oriented personalised recommendation.
      Citation: Journal of Information Science
      PubDate: 2020-05-27T06:23:07Z
      DOI: 10.1177/0165551520917120
       
  • Intelligent detection of hate speech in Arabic social network: A machine
           learning approach
    • Authors: Ibrahim Aljarah, Maria Habib, Neveen Hijazi, Hossam Faris, Raneem Qaddoura, Bassam Hammo, Mohammad Abushariah, Mohammad Alfawareh
      Abstract: Journal of Information Science, Ahead of Print.
      Cyber hate speech is growing, posing a serious problem worldwide by threatening the cohesion of civil societies. Hate speech involves using expressions or phrases that are violent, offensive or insulting towards a person or a minority group. In the Arab region in particular, the number of social media users is growing rapidly, accompanied by a rapidly increasing rate of cyber hate speech. This motivated us to work towards healthy online environments that are free of hatred and discrimination. Therefore, this article aims to detect cyber hate speech in Arabic on the Twitter platform by applying Natural Language Processing (NLP) techniques and machine learning methods. The article considers a set of tweets related to racism, journalism, sports orientation, terrorism and Islam. Several types of features and emotions are extracted and arranged into 15 different data combinations. Experiments on the processed dataset use Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT) and Random Forest (RF) classifiers, of which RF with Term Frequency-Inverse Document Frequency (TF-IDF) and profile-related features achieves the best results. Furthermore, a feature importance analysis based on the RF classifier is conducted to quantify the predictive ability of the features with respect to the hate class.
      Citation: Journal of Information Science
      PubDate: 2020-05-18T07:50:11Z
      DOI: 10.1177/0165551520917651
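The best-performing configuration in the Aljarah et al. abstract pairs a Random Forest with TF-IDF features. TF-IDF weights a term by how often it occurs in a tweet, scaled down by how many tweets contain it. Below is a minimal Python sketch of one common variant: length-normalised term frequency times log(N / df). It assumes the tweets are already tokenised. The abstract does not give the study's exact weighting or its preprocessing for Arabic text, and the tokens below are placeholders.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document.

    tf  = count of the term in the document / document length
    idf = log(N / df), where df is the number of documents containing the term
    (Smoothed idf or sublinear tf are common alternatives.)
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return weights


docs = [["tok_a", "tok_b", "tok_a"], ["tok_b", "tok_c"]]
for row in tfidf(docs):
    print(row)
```

A term that appears in every document, such as tok_b here, gets a weight of 0. This is how the IDF factor suppresses uninformative words before the vectors reach a classifier.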
       
  • Museum libraries in Spain: A case study at state level
    • Authors: Silvia Cobo-Serrano, Rosario Arquero-Avilés, Gonzalo Marco-Cuenca
      Abstract: Journal of Information Science, Ahead of Print.
      Special libraries are essential information and documentation centres for university teachers and researchers because of the quality and richness of their collections. In Spain, there are an estimated 2456 special libraries, although many are little known, both generally and among information professionals. These include museum libraries, which are important centres with valuable collections of bibliographic heritage for the Humanities and Social Sciences. The aim of this research is to gain an understanding of the actual state of these information units and to promote the social value of museum libraries in Spain. To do this, a survey was sent to the libraries of state-owned and -managed museums under the General Directorate of Fine Arts and Cultural Property (Ministry of Culture and Sports) of the Government of Spain. This general objective will be accompanied by a review of the scientific literature on various aspects of museum libraries at national and international level. After addressing the research methodology, the results obtained will be discussed, covering the following topics: collection management, library services and staff, economic and technological resources and, finally, library management. The conclusions include recommendations for museum librarians and reveal that institutional cooperation is a strategic issue for improving both the visibility of museum libraries and their social recognition as cultural and research centres.
      Citation: Journal of Information Science
      PubDate: 2020-05-15T07:08:07Z
      DOI: 10.1177/0165551520917652
       
  • Performance-based evaluation of academic libraries in the big data era
    • Authors: A Y M Atiquil Islam, Khurshid Ahmad, Muhammad Rafi, Zheng JianMing
      Abstract: Journal of Information Science, Ahead of Print.
      The concept of big data has been widely regarded as a technological modernisation in organisations and educational institutions. Thus, the purpose of this study is to determine whether the modified technology acceptance model (MTAM) is viable for evaluating the performance of librarians in the use of big data analytics in academic libraries. This study used an empirical research method to collect data from 211 librarians working in Pakistan’s universities. On the basis of the MTAM analysis by structural equation modelling, the performance of the academic libraries was understood in terms of big data processes. The main influential components of the performance analysis in this study were big data analytics capabilities, perceived ease of access and the usefulness of big data practices in academic libraries. Moreover, the utilisation of big data was significantly affected by skills, perceived ease of access and the usefulness of academic libraries. The results also suggested that the various components of the academic libraries lead to effective organisational performance when linked to big data analytics.
      Citation: Journal of Information Science
      PubDate: 2020-05-13T04:34:42Z
      DOI: 10.1177/0165551520918516
       
  • Semantics-preserving optimisation of mapping multi-column key constraints
           for RDB to RDF transformation
    • Authors: Hee-Gook Jun, Dong-Hyuk Im, Hyoung-Joo Kim
      Abstract: Journal of Information Science, Ahead of Print.
      The relational database (RDB) to resource description framework (RDF) transformation is a major semantic information extraction method because most web data are managed by RDBs. Existing automatic RDB-to-RDF transformation methods generate RDF data without losing the semantics of original relational data. However, two major problems have been observed during the mapping of multi-column key constraints: repetitive data generation and semantic information loss. In this article, we propose an improved RDB-to-RDF transformation method that ensures mapping without the aforementioned problems. Optimised rules are defined to generate an accurate semantic data structure for a multi-column key constraint and to reduce repetitive constraint data. Experimental results show that the proposed method achieves better accuracy in transforming multi-column key constraints and generates compact semantic results without repetitive data.
      Citation: Journal of Information Science
      PubDate: 2020-05-12T05:14:54Z
      DOI: 10.1177/0165551520920804
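For context on what a multi-column key mapping has to produce, consider the W3C Direct Mapping of relational data to RDF. It identifies each row by one IRI built from all of its primary-key columns, and it maps each column to a predicate IRI of the form table#column. Below is a minimal Python sketch of that baseline shape; the base IRI, table and row are invented for the example. It is not the optimised rule set proposed by Jun, Im and Kim, and urllib's quote() only approximates the specification's percent-encoding.

```python
from urllib.parse import quote


def enc(text):
    """Percent-encode everything except unreserved characters (approximation of the spec)."""
    return quote(str(text), safe="")


def row_iri(base, table, key_columns, row):
    """One subject IRI per row, built from every column of a (possibly multi-column) key."""
    key = ";".join(f"{enc(col)}={enc(row[col])}" for col in key_columns)
    return f"{base}{enc(table)}/{key}"


def row_triples(base, table, key_columns, row):
    """Literal triples for one row, all attached to the single row IRI."""
    subject = row_iri(base, table, key_columns, row)
    return [(subject, f"{base}{enc(table)}#{enc(col)}", value) for col, value in row.items()]


# Hypothetical table with a composite primary key (student_id, course_id).
row = {"student_id": 7, "course_id": "CS 101", "grade": "A"}
print(row_iri("http://example.com/", "Enrolment", ["student_id", "course_id"], row))
# http://example.com/Enrolment/student_id=7;course_id=CS%20101
for triple in row_triples("http://example.com/", "Enrolment", ["student_id", "course_id"], row):
    print(triple)
```

Both key columns feed a single IRI, so each row gets one subject node rather than one identifier per key column. How the paper's optimised rules describe the key constraint itself and remove repetitive constraint data is not reproduced here.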
       
  • Topic extraction to provide an overview of research activities: The case
           of the high-temperature superconductor and simulation and modelling
    • Authors: Ritsuko Nakajima, Nobuyuki Midorikawa
      Abstract: Journal of Information Science, Ahead of Print.
      For those who are not experts in a particular scientific field, it is difficult to understand scientific research trends. Although studies on the extraction of research trends have been conducted, most focus on extracting global trends from large-scale data, and the methods are often complicated. The purpose of this study is to develop, and then verify, a method that gives non-experts an overview of a scientific field by capturing research trends simply. To extract research topics that express research trends, text analysis was performed on 12 years of abstracts of articles on high-temperature superconductors. We characterised three topics from the frequently occurring word groups that were extracted. We assessed the appropriateness of these topics using a method that has been little used: examining the research articles, review literature and co-citations among the research articles used to extract the words, comparing them with the controlled index terms assigned to the articles and confirming that there were no contradictions. We have also applied the established method to another research field, ‘simulation and modelling’. Although the method used in this article is simple, important topics were extracted and their relations with the original articles are clear, which can lead to further investigation of the extracted topics.
      Citation: Journal of Information Science
      PubDate: 2020-05-06T05:11:55Z
      DOI: 10.1177/0165551520920794
       
  • Partitioning highly, medium and lowly cited publications
    • Authors: Yong Huang, Yi Bu, Ying Ding, Wei Lu
      Abstract: Journal of Information Science, Ahead of Print.
      Dividing papers into several groups based on their numbers of citations is one of the most common research practices in bibliometrics and beyond. However, existing dividing methods are both arbitrary and subject to bias. This article proposes a novel approach to partitioning highly, medium and lowly cited publications based on their citation distribution. We use the whole Web of Science (WoS) dataset to demonstrate how to apply this approach to scholarly datasets and examine the robustness of our algorithm in each of the six disciplines in the WoS dataset. The code underlying the algorithm is available online.
      Citation: Journal of Information Science
      PubDate: 2020-04-27T04:30:29Z
      DOI: 10.1177/0165551520917655
       
  • Which are the influential publications in the Web of Science subject
           categories over a long period of time? CRExplorer software used for
           big-data analyses in bibliometrics
    • Authors: Andreas Thor, Lutz Bornmann, Robin Haunschild, Loet Leydesdorff
      Abstract: Journal of Information Science, Ahead of Print.
      What are the landmark papers in scientific disciplines? Which papers are indispensable for scientific progress? These are typical questions of interest not only to researchers (who frequently know the answers – or think they know them) but also to the interested general public. Citation counts can be used to identify very useful papers since they reflect the wisdom of the crowd – in this case, the scientists using published results for their research. In this study, we used recently developed methods for the program CRExplorer to identify landmark publications in nearly all Web of Science subject categories (WoS-SCs). These are publications that, during the citing years, belong to the top-1‰ in their subject area more frequently than other publications. As examples, we show the results for five subject categories: ‘Information Science & Library Science’, ‘Computer Science, Information Systems’, ‘Computer Science, Software Engineering’, ‘Psychology, Social’ and ‘Chemistry, Physical’. The results for the other WoS-SCs can be found online at http://crexplorer.net. An analyst of the results should keep in mind that the identification of landmark papers depends on the methods and data used. Small differences in methods and/or data may lead to different results.
      Citation: Journal of Information Science
      PubDate: 2020-04-24T05:31:55Z
      DOI: 10.1177/0165551520913817
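The landmark criterion in the Thor et al. abstract can be read as a counting rule. For every citing year, find the most-cited share of publications in a subject category. Then rank publications by how many citing years they spend in that top group. Below is a minimal Python sketch under that reading. The input format, the tie handling (every publication reaching the threshold counts) and the function name are assumptions, and CRExplorer's own implementation may differ. The example uses a 50% share because a top-1‰ cut needs far more publications than a toy example holds.

```python
import math


def years_in_top(citations_by_year, share=0.001):
    """Count citing years in which each publication is among the top `share` most cited.

    citations_by_year: {citing_year: {publication_id: citations received in that year}}
    (hypothetical input format). Ties at the threshold are all counted as top.
    """
    counts = {}
    for cites in citations_by_year.values():
        if not cites:
            continue
        ranked = sorted(cites.values(), reverse=True)
        k = max(1, math.ceil(share * len(ranked)))  # size of the top group, at least one
        threshold = ranked[k - 1]
        for pub, c in cites.items():
            if c > 0 and c >= threshold:
                counts[pub] = counts.get(pub, 0) + 1
    return counts


data = {
    2000: {"A": 10, "B": 3, "C": 1, "D": 0},
    2001: {"A": 8, "B": 9, "C": 2, "D": 1},
    2002: {"A": 5, "B": 1, "C": 6, "D": 0},
}
counts = years_in_top(data, share=0.5)
print(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))  # [('A', 3), ('B', 2), ('C', 1)]
```

Publication A is never the single most-cited item in 2001 or 2002, but it stays in the top group every year. Under this counting rule, that persistence ranks it first.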
       
  • An ensemble clustering approach for topic discovery using implicit text
           segmentation
    • Authors: Muhammad Qasim Memon, Yu Lu, Penghe Chen, Aasma Memon, Muhammad Salman Pathan, Zulfiqar Ali Zardari
      Abstract: Journal of Information Science, Ahead of Print.
      Text segmentation (TS) is the process of dividing multi-topic text collections into cohesive segments using topic boundaries. Text clustering is likewise a major concern for multi-topic text collections, as they are distinguished by a sub-topic structure and their contents are not associated with each other. Existing clustering approaches follow the TS method, which relies on word frequencies and may not be suitable for clustering multi-topic text collections. In this work, we propose an ensemble clustering approach (ECA), a novel topic-modelling-based clustering approach that combines TS and text clustering. We developed LDA-onto (LDA-ontology), a TS-based model that decomposes a document into segments (i.e. sub-documents), each of which is associated with exactly one sub-topic. We deal with the problem of clustering a document that is intrinsically related to various topics and whose topical structure is missing. ECA is tested on well-known datasets to provide a comprehensive presentation and validation of clustering algorithms using LDA-onto. ECA exhibits the semantic relations of keywords in sub-documents, and the resultant clusters belong to the original documents that they contain. Moreover, this research sheds light on clustering performance and indicates that there is no difference in performance (in terms of F-measure) when the number of topics changes. Our findings give above-par results for analysing the problem of text clustering in a broader spectrum without applying dimension reduction techniques to highly sparse data. Specifically, ECA provides a more efficient and effective framework than the traditional and segment-based approaches, and the results achieved are statistically significant, with an average improvement of over 10.2%. For the most part, the proposed framework can be evaluated in applications where meaningful data retrieval is useful, such as document summarization, text retrieval, novelty detection and topic detection.
      Citation: Journal of Information Science
      PubDate: 2020-04-14T08:30:03Z
      DOI: 10.1177/0165551520911590
       
  • Cross-lingual text similarity exploiting neural machine translation models
    • Authors: Kazuhiro Seki
      Abstract: Journal of Information Science, Ahead of Print.
      This article studies cross-lingual text similarity using neural machine translation models. A straightforward approach based on machine translation is to use translated text so as to make the problem monolingual. Another possible approach is to use intermediate states of machine translation models, as proposed in recent related work, which could avoid the propagation of translation errors. We aim at improving both approaches independently and then combine the two types of information, that is, translations and intermediate states, in a learning-to-rank framework to compute cross-lingual text similarity. To evaluate the effectiveness and generalisability of our approach, we conduct empirical experiments on English–Japanese and English–Hindi translation corpora for a cross-lingual sentence retrieval task. The results demonstrate that our approach using translations and intermediate states outperforms other neural network–based approaches and is even comparable with a strong baseline based on a state-of-the-art machine translation system.
      Citation: Journal of Information Science
      PubDate: 2020-03-19T04:39:20Z
      DOI: 10.1177/0165551520912676
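      The idea of combining translation-based and intermediate-state-based similarity can be illustrated with a minimal Python sketch. It is not the author's learning-to-rank model: a fixed weight `alpha` (an assumption) stands in for the learned ranker, word overlap (Jaccard) between the query's machine translation and each candidate stands in for translation-based similarity, and cosine similarity between precomputed vectors stands in for intermediate-state similarity. All function names and inputs are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def jaccard(text_a, text_b):
    """Word-set overlap; a crude stand-in for translation-based similarity."""
    words_a, words_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def rank_candidates(query_state, query_translation, candidates, alpha=0.5):
    """Rank (text, state_vector) candidates by a fixed linear mix of
    state similarity and translation overlap (hypothetical weighting)."""
    scored = [
        (alpha * cosine(query_state, state)
         + (1 - alpha) * jaccard(query_translation, text), text)
        for text, state in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]

# Toy inputs: the query has already been machine-translated into the candidates'
# language, and each sentence has a made-up pooled intermediate-state vector.
candidates = [
    ("a dog ran in the park", [0.0, 1.0]),
    ("the cat sat on the mat", [0.9, 0.1]),
]
ranking = rank_candidates([1.0, 0.0], "the cat sat on the mat", candidates)
print(ranking[0])  # the cat sat on the mat
```

      In the article the combination is learned from data; a fixed linear mix is used here only to keep the sketch short and dependency-free.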
       
  • Semisupervised sentiment analysis method for online text reviews
    • Authors: Gyeong Taek Lee, Chang Ouk Kim, Min Song
      Abstract: Journal of Information Science, Ahead of Print.
      Sentiment analysis plays an important role in understanding individual opinions expressed on websites such as social media and product review sites. Common approaches to sentiment analysis use the sentiments carried by opinion words and are based on either supervised or unsupervised learning techniques. The unsupervised approach builds a word-sentiment dictionary, but building a reliable dictionary is time-consuming and costly. The supervised approach uses machine learning models to learn the sentiment scores of words; however, training a classifier requires large amounts of labelled text data to perform well. In this article, we propose a semisupervised approach that performs well even when only small amounts of labelled data are available for training. The proposed method builds a base sentiment dictionary from a small training dataset using a lasso-based ensemble model with minimal human effort. The scores of words not in the training dataset are estimated using an adaptive instance-based learning model. In a pretrained word2vec model space, the sentiment values of the words in the dictionary are propagated to the words that did not exist in the training dataset. Through two experiments, we demonstrate that the performance of the proposed method is comparable to that of supervised learning models trained on large datasets.
      Citation: Journal of Information Science
      PubDate: 2020-03-02T05:46:53Z
      DOI: 10.1177/0165551520910032
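      The propagation step described in this abstract can be sketched in Python as a generic k-nearest-neighbour estimate. This is an illustration under stated assumptions, not the article's adaptive instance-based learning model: a word outside the dictionary receives the cosine-weighted mean score of its most similar dictionary words, and toy two-dimensional vectors stand in for a pretrained word2vec space. Function and variable names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def propagate_sentiment(seed_scores, embeddings, word, k=3):
    """Estimate a sentiment score for `word` from its k most similar seed words.

    Only neighbours with positive cosine similarity contribute; the result is
    their similarity-weighted mean score (0.0 if no such neighbour exists).
    """
    target = embeddings[word]
    neighbours = sorted(
        ((cosine(target, embeddings[w]), score)
         for w, score in seed_scores.items() if w in embeddings),
        reverse=True,
    )[:k]
    positive = [(sim, score) for sim, score in neighbours if sim > 0]
    total = sum(sim for sim, _ in positive)
    if total == 0:
        return 0.0
    return sum(sim * score for sim, score in positive) / total

# Toy 2-D "embeddings"; a real system would load pretrained word2vec vectors.
embeddings = {
    "good": [1.0, 0.1], "great": [0.9, 0.2],
    "bad": [-1.0, 0.1], "awful": [-0.9, 0.2],
    "superb": [0.95, 0.15], "terrible": [-0.95, 0.15],
}
# Seed dictionary: scores assumed to come from a small labelled training set.
seed_scores = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}

print(propagate_sentiment(seed_scores, embeddings, "superb", k=2))    # 1.0
print(propagate_sentiment(seed_scores, embeddings, "terrible", k=2))  # -1.0
```

      Discarding negatively similar neighbours keeps opposite-polarity words from pulling the estimate toward zero; the article's adaptive model may weigh neighbours differently.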
       
  • A qualitative–quantitative study of science mapping by different
           algorithms: The Polish journals landscape
    • Authors: Veslava Osinska
      Abstract: Journal of Information Science, Ahead of Print.
      By applying different clustering algorithms, the author strove to construct the best visual representation of scientific domains and disciplines in Poland. Journals and their disciplinary categories constituted the data set. A comparative analysis of the maps was based on both qualitative and quantitative approaches. The complex patterns of eight maps were evaluated, taking into account both the local proximity of disciplines and the overall structure of the presented domains. A final clustering quality value was introduced and calculated with reference to the knowledge domains. The author underlines the role of combining quantitative and qualitative methods in map evaluation. The best results were obtained with the t-distributed stochastic neighbour embedding (t-SNE) algorithm. This most recent of the techniques may have the greatest potential for semantic information studies and for broadly understood semantic solutions.
      Citation: Journal of Information Science
      PubDate: 2020-02-03T09:30:59Z
      DOI: 10.1177/0165551520902738
       
  • Online news media website ranking using user-generated content
    • Authors: Samaneh Karimi, Azadeh Shakery, Rakesh Verma
      Abstract: Journal of Information Science, Ahead of Print.
      News media websites are important online resources that have drawn considerable attention from text mining researchers. The main aim of this study is to propose a framework for ranking online news websites from different viewpoints. The ranking of news websites provides useful information that can benefit many news-related tasks, such as news retrieval and news recommendation. In the proposed framework, the ranking of news websites is obtained by calculating three measures, introduced in the article, that are based on user-generated content (UGC). Each proposed measure assesses the performance of news websites from a particular viewpoint: the completeness of news reports, the diversity of events covered by the website, and its speed. The use of UGC in this framework, as partly unbiased, real-time and low-cost web content, distinguishes the proposed news website ranking framework from the literature. The results obtained for three prominent news websites, British Broadcasting Corporation (BBC), Cable News Network (CNN) and New York Times (NYTimes), show that BBC has the best performance in terms of news completeness and speed, and NYTimes has the best diversity in comparison with the other two websites.
      Citation: Journal of Information Science
      PubDate: 2020-02-03T09:08:28Z
      DOI: 10.1177/0165551519894928
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 



JournalTOCs © 2009-