Journal of Information Science
Journal Prestige (SJR): 0.674
Citation Impact (CiteScore): 2
Number of Followers: 1296  
 
  Hybrid journal (may contain Open Access articles)
ISSN (Print) 0165-5515 - ISSN (Online) 1741-6485
Published by Sage Publications
  • Using ISO and Semantic Web standard for building a multilingual
           terminology e-Dictionary: A use case of Chinese ceramic vases


      Authors: Tong Wei, Christophe Roche, Maria Papadopoulou, Yangli Jia
      Abstract: Journal of Information Science, Ahead of Print.
      Cultural heritage is the legacy of physical artefacts and intangible attributes of a group or society that is inherited from past generations. Terminology is a tool for the dissemination and communication of cultural heritage, and the lack of clearly identified terminologies is an obstacle to communication and knowledge sharing. In particular, for experts who speak different languages, it is difficult to grasp what a term refers to from the term alone. Our work aims to address this issue by implementing practices drawn from the Semantic Web and ISO Terminology standards (ISO 704 and ISO 1087-1) and, more particularly, by building an ontology in a W3C format as the knowledge infrastructure for a multilingual terminology e-Dictionary. Chinese ceramic vases of the Ming and Qing dynasties serve as the application case of our work. The ontology is built with the ‘term-and-characteristic guided method’, which follows the ISO principles of Terminology. The main result of this work is an online terminology e-Dictionary, which can help archaeologists communicate and understand the concepts denoted by terms in different languages and provides a new, ontology-based perspective on the digital protection of cultural heritage. The e-Dictionary is published at http://www.dh.ketrc.com/e-dictionary.html.
      Citation: Journal of Information Science
      PubDate: 2021-06-09T06:36:36Z
      DOI: 10.1177/01655515211022185
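A concept-centred entry of the kind the abstract describes can be pictured as one concept linked to terms in several languages. A minimal sketch, with purely illustrative identifiers, definitions and labels (none taken from the published e-Dictionary):

```python
# Minimal concept-centred e-dictionary entry: one concept, terms in many
# languages. All identifiers, labels and definitions are illustrative
# placeholders, not data from the published resource.
e_dictionary = {
    "concept:vase_0001": {
        "definition": "A tall ceramic vessel with a narrow neck.",
        "terms": {"en": "vase", "fr": "vase", "zh": "瓶"},
    }
}

def terms_for(concept_id, dictionary):
    """Return the multilingual terms recorded for one concept."""
    return dictionary[concept_id]["terms"]
```

Keying entries by concept rather than by term is what lets a single entry serve experts working in different languages.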
       
  • Enhancing data quality to mine credible patterns


      Authors: Muhammad Imran, Adnan Ahmad
      Abstract: Journal of Information Science, Ahead of Print.
      The importance of big data is widely accepted in various fields. Organisations spend a lot of money to collect, process and mine data to identify patterns, which facilitate their future decision-making and improve organisational performance and profitability. However, among discovered patterns there are some meaningless and misleading ones, which restrict the effectiveness of the decision-making process. The presence of data discrepancies, noise and outliers also impacts the quality of discovered patterns and leads to missed strategic goals and objectives. Quality inspection of these discovered patterns is vital before utilising them in predictions, decision-making or strategic planning. Mining useful and credible patterns over social media is a challenging task. Often, people spread targeted content for character assassination or defamation of brands. Recently, some studies have evaluated the credibility of information over social media based on users’ surveys, experts’ judgement and manually annotated Twitter tweets. Unfortunately, due to the large volume and exponential growth of data, these survey- and annotation-based credibility techniques are not efficiently applicable. This article presents a data quality and credibility evaluation framework that determines the quality of individual data instances and provides a way to discover useful and credible patterns using credibility indicators. Moreover, a new Twitter bot detection algorithm is proposed to classify tweets generated by Twitter bots and real users. The results of the conducted experiments show that the proposed model improves classification accuracy and the quality of discovered patterns.
      Citation: Journal of Information Science
      PubDate: 2021-06-07T04:36:23Z
      DOI: 10.1177/01655515211013693
       
  • A semantic metric for concepts similarity in knowledge graphs


      Authors: Majed A Alkhamees, Mohammed A Alnuem, Saleh M Al-Saleem, Abdulrakeeb M Al-Ssulami
      Abstract: Journal of Information Science, Ahead of Print.
      Semantic similarity between concepts concerns expressing the degree of similarity in meaning between two concepts in a computational model. This problem has recently attracted considerable attention from researchers in attempting to automate the understanding of word meanings to expedite the classification of users’ opinions and attitudes embedded in text. In this article, a semantic similarity metric is presented. The proposed metric, namely, weighted information-content (wic), exploits the information content of the least common subsumer of two compared concepts and the depth information in knowledge graphs such as DBPedia and YAGO. The two similarity components were combined using calibrated cooperative contributions from both similarity components. A statistical test using the Spearman correlations on well-known human judgement word-similarity data sets showed that the wic metric produced more highly correlated similarities compared with state-of-the-art metrics. In addition, a real-world aspect category classification was evaluated, which exhibited further increased accuracy and recall.
      Citation: Journal of Information Science
      PubDate: 2021-06-04T05:10:05Z
      DOI: 10.1177/01655515211020580
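The abstract describes wic as combining the information content of the least common subsumer (LCS) of the two compared concepts with depth information from the knowledge graph. How the two components are calibrated is not stated, so the sketch below uses a plain convex combination as a hypothetical stand-in:

```python
def wic_similarity(ic_lcs, depth_lcs, max_depth, weight=0.5):
    """Sketch of a weighted information-content similarity: combine the
    (normalised) information content of the two concepts' least common
    subsumer with the LCS's normalised depth in the knowledge graph.
    The convex `weight` is a hypothetical stand-in for the paper's
    calibrated cooperative contributions."""
    depth_component = depth_lcs / max_depth  # deeper LCS = more specific
    return weight * ic_lcs + (1 - weight) * depth_component
```

Both components grow as the shared ancestor of the two concepts becomes more specific, which is the intuition behind LCS-based similarity metrics.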
       
  • Effect of Chinese characters on machine learning for Chinese author name
           disambiguation: A counterfactual evaluation


      Authors: Jinseok Kim, Jenna Kim, Jinmo Kim
      Abstract: Journal of Information Science, Ahead of Print.
      Chinese author names are known to be more difficult to disambiguate than other ethnic names because they tend to share surnames and forenames, thus creating many homonyms. In this study, we demonstrate how using Chinese characters can affect machine learning for author name disambiguation. For analysis, 15K author names recorded in Chinese are transliterated into English and simplified by initialising their forenames to create counterfactual scenarios, reflecting real-world indexing practices in which Chinese characters are usually unavailable. The results show that Chinese author names that are highly ambiguous in English or with initialised forenames tend to become less confusing if their Chinese characters are included in the processing. Our findings indicate that recording Chinese author names in native script can help researchers and digital libraries enhance authority control of Chinese author names that continue to increase in size in bibliographic data.
      Citation: Journal of Information Science
      PubDate: 2021-05-31T07:50:30Z
      DOI: 10.1177/01655515211018171
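The counterfactual setup described above can be reproduced mechanically: starting from an already-romanised name, derive the transliterated full form and the forename-initialised form that indexing practice often produces. A small illustrative sketch (the transliteration step itself is assumed done):

```python
def counterfactual_variants(surname, forename):
    """Build the two counterfactual indexing scenarios the study compares
    against native-script names: the transliterated full name and the
    forename-initialised form. Inputs are assumed already romanised."""
    full = f"{surname}, {forename}"
    initialised = f"{surname}, {forename[0]}." if forename else surname
    return {"full": full, "initialised": initialised}
```

Many distinct Chinese names collapse onto the same initialised string, which is exactly the ambiguity the study measures.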
       
  • A review on h-index and its alternative indices


      Authors: Anand Bihari, Sudhakar Tripathi, Akshay Deepak
      Abstract: Journal of Information Science, Ahead of Print.
      In recent years, several scientometric and bibliometric indicators have been proposed to evaluate the scientific impact of individuals, institutions, colleges, universities and research teams. The h-index was a breakthrough in the research community for assessing the scientific impact of an individual. It received considerable attention due to its simplicity, and several other indicators were proposed to extend the properties of the h-index and to overcome its shortcomings. In this literature review, we discuss the advantages and limitations of almost all scientometric and bibliometric indicators, which we group into seven categories based on their properties: (1) complements of the h-index, (2) based on total number of authors, (3) based on publication age, (4) combinations of two indices, (5) based on excess citation count, (6) based on total publication count and (7) other variants. The primary objective of this article is to survey all indicators proposed to evaluate the scientific impact of an individual researcher or a group of researchers.
      Citation: Journal of Information Science
      PubDate: 2021-05-31T07:49:41Z
      DOI: 10.1177/01655515211014478
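The h-index the review starts from has a compact definition: the largest h such that the author has at least h papers with at least h citations each. A direct implementation:

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations."""
    h = 0
    # Walk papers from most- to least-cited; stop when the citation count
    # drops below the paper's rank.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h
```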
       
  • Effect of data environment and cognitive ability on participants’
           attitude towards data governance


      Authors: Guoyin Jiang, Xingshun Cai, Xiaodong Feng, Wenping Liu
      Abstract: Journal of Information Science, Ahead of Print.
      Data governance has received research attention, but its effect on public attitude has not been sufficiently explored. To analyse attitudes towards public participation in data governance in the context of a tourism platform, we develop an empirical model of the impact of the data governance environment and participants’ cognitive ability on attitude. Taking tourism sharing platforms as an example, we collected 339 questionnaires for data analysis. Results show that data quality and website design have a positive effect on users’ attitude towards data governance through data literacy self-efficacy. Data literacy self-efficacy has a suppression effect between data quality and attitude towards data governance, and the same effect between website design and attitude towards data governance. Data quality and website design also have a positive effect on users’ attitude towards data governance through platform interactivity, which plays a mediating role between data quality and attitude towards data governance, and the same role between website design and attitude towards data governance. Data policy has a positive effect on users’ data literacy self-efficacy but no significant effect on platform interactivity. Moreover, this study provides theoretical and practical implications that can guide the government in policy implementation and platform managers in data governance.
      Citation: Journal of Information Science
      PubDate: 2021-05-31T07:25:07Z
      DOI: 10.1177/01655515211019000
       
  • Efficient indexing and retrieval of patient information from the big data
           using MapReduce framework and optimisation


      Authors: N.R. Gladiss Merlin, Vigilson Prem. M
      Abstract: Journal of Information Science, Ahead of Print.
      Large and complex data have become a valuable resource in biomedical discovery, greatly expanding the scientific resources available for retrieving helpful information. However, indexing and retrieving patient information from disparate sources of big data is challenging in biomedical research. In this research, indexing and retrieval are performed using the proposed Jaya-Sine Cosine Algorithm (Jaya–SCA)-based MapReduce framework. Initially, the input big data are forwarded to the mappers randomly. The average of each mapper’s data is calculated, and these data are forwarded to the reducer, where the representative data are stored. For each user query, the query is matched against the reducer and then against the corresponding mapper to retrieve the best-matching result. Bilevel matching is performed while retrieving the data from the mapper, based on the distance to the query. The similarity measure is computed based on the parametric-enabled similarity measure (PESM), cosine similarity and the proposed Jaya–SCA, which is the integration of the Jaya algorithm and the SCA. Moreover, the proposed Jaya–SCA algorithm attained maximum F-measure, recall and precision values of 0.5323, 0.4400 and 0.6867, respectively, on the StatLog Heart Disease dataset.
      Citation: Journal of Information Science
      PubDate: 2021-05-24T07:53:51Z
      DOI: 10.1177/01655515211013708
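The indexing stage outlined in the abstract (records spread across mappers, one representative average per mapper stored at the reducer) can be sketched in a few lines. This toy version assigns records round-robin rather than randomly, and omits the Jaya–SCA optimisation and bilevel query matching entirely:

```python
from statistics import mean

def map_phase(records, n_mappers):
    """Distribute records across mappers (round-robin here, standing in
    for the random assignment the paper describes)."""
    buckets = [[] for _ in range(n_mappers)]
    for i, record in enumerate(records):
        buckets[i % n_mappers].append(record)
    return buckets

def reduce_phase(buckets):
    """Store one representative value (the mean) per mapper, as the
    abstract outlines."""
    return [mean(bucket) for bucket in buckets if bucket]
```

At query time, a query would first be matched against these stored representatives and then against the winning mapper's records.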
       
  • A domain categorisation of vocabularies based on a deep learning
           classifier


      Authors: Alberto Nogales, Miguel-Angel Sicilia, Álvaro J García-Tejedor
      Abstract: Journal of Information Science, Ahead of Print.
      The publication of large amounts of open data is an increasing trend, a consequence of initiatives like Linked Open Data (LOD) that aim at publishing and linking data sets on the World Wide Web. Linked Data publishers should follow a set of principles, described in a 2011 document, which include reusing vocabularies as a key consideration. The Linked Open Vocabularies (LOV) project attempts to collect the vocabularies and ontologies commonly used in LOD. These ontologies have been classified by domain following the criteria of LOV members, with the disadvantage of introducing personal biases. This article presents an automatic classifier of ontologies based on the main categories appearing in Wikipedia. For that purpose, word-embedding models are used in combination with deep learning techniques. Results show that with a hybrid model of regular Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs), classification could be made with an accuracy of 93.57%. A further evaluation of the domain matchings between LOV and the classifier finds possible matchings in 79.8% of the cases.
      Citation: Journal of Information Science
      PubDate: 2021-05-24T04:33:07Z
      DOI: 10.1177/01655515211018170
       
  • Misplaced trust? The relationship between trust, ability to identify
           commercially influenced results and search engine preference


      Authors: Sebastian Schultheiß, Dirk Lewandowski
      Abstract: Journal of Information Science, Ahead of Print.
      People have a high level of trust in search engines, especially Google, but only limited knowledge of them, as numerous studies have shown. This leads to the question: To what extent is this trust justified considering the lack of familiarity among users with how search engines work and the business models they are founded on? We assume that trust in Google, search engine preferences and knowledge of result types are interrelated. To examine this assumption, we conducted a representative online survey with n = 2012 German Internet users. We show that users with little search engine knowledge are more likely to trust and use Google than users with more knowledge. A contradiction revealed itself: users strongly trust Google, yet they are unable to adequately evaluate search results. For those users, this may be problematic since it can potentially affect knowledge acquisition. Consequently, there is a need to promote user information literacy to create a more solid foundation for user trust in search engines. The impact of our study lies in emphasising the need for appropriate training formats to promote information literacy.
      Citation: Journal of Information Science
      PubDate: 2021-05-14T10:37:31Z
      DOI: 10.1177/01655515211014157
       
  • Structuration analysis of e-government studies: A bibliometric analysis
           based on knowledge maps


      Authors: Huii Jiang, Suli Wang, Jianrong Yao
      Abstract: Journal of Information Science, Ahead of Print.
      Considering the lack of systematic reviews of e-government research, this study covers research categories, spatial structure, research paradigms and noteworthy future topics in the e-government domain. We collect 142 keywords from 2646 papers published in the Web of Science from 2000 to 2019 as the study object. We then identify four research categories: (1) technology and modelling in e-government, (2) drivers of e-government development, (3) public management and (4) governmental management. From the public and government perspectives, we outline the spatial structure, which includes theoretical and practical research, and identify four research paradigms: (1) theoretical modelling and application, (2) e-government development, (3) status of public management and (4) status of governmental management. Finally, we develop a 3D spatial map to analyse noteworthy topics, exploring the well-studied themes of government-to-citizen, government-to-government and government-to-business, and the under-studied themes of government-to-civil society organisations and citizens-to-citizens, which helps scholars study e-government comprehensively.
      Citation: Journal of Information Science
      PubDate: 2021-05-14T08:53:32Z
      DOI: 10.1177/0165551520978346
       
  • A novel scheme of domain transfer in document-level cross-domain sentiment
           classification


      Authors: Yueting Lei, Yanting Li
      Abstract: Journal of Information Science, Ahead of Print.
      Sentiment classification aims to learn sentiment features from an annotated corpus and automatically predict the sentiment polarity of new text. However, people express feelings differently in different domains, so there are important differences in the characteristics of sentiment distribution across domains. At the same time, in certain specific domains, the high cost of corpus collection means no annotated corpus is available for sentiment classification. It is therefore necessary to leverage or reuse existing annotated corpora for training. In this article, we propose a new algorithm for extracting central sentiment sentences in product reviews and improve the pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) to achieve domain transfer for cross-domain sentiment classification. We use various pre-trained language models to prove the effectiveness of the newly proposed joint algorithm for text ranking and emotional-word extraction, and use the Amazon product reviews data set to demonstrate the effectiveness of our proposed domain-transfer framework. The experimental results on 12 different cross-domain pairs show that the new cross-domain classification method is significantly better than several popular cross-domain sentiment classification methods.
      Citation: Journal of Information Science
      PubDate: 2021-05-14T04:51:27Z
      DOI: 10.1177/01655515211012329
       
  • Government regulation of the Internet as instrument of digital
           protectionism in case of developing countries


      Authors: Nikolai Topornin, Darya Pyatkina, Yuri Bokov
      Abstract: Journal of Information Science, Ahead of Print.
      The research is devoted to the study of digital protectionism technologies, in particular, Internet censorship as a non-tariff barrier to digital trade and the determination of the strategic motives of states to use them. The reports ‘Freedom on the Net’ and ‘The network readiness index 2020’ acted as a basic data source for the study of modern instruments of government regulation of interactions in the digital environment. Internet censorship technologies have been considered in six countries with varying levels of Internet freedom: Russia, Belarus, Kazakhstan, Georgia, Armenia and Estonia. The key instruments of digital protectionism as a non-tariff barrier of the digital economy have been identified, such as: localisation requirements; restrictions on cross-border data flow; system of national protection of intellectual property rights; discriminatory, unique standards or burdensome testing; filtering or blocking; restrictions on electronic payment systems or the use of encryption; cybersecurity threats and forced technology transfer. Internet censorship technologies have been demonstrated and their influence on the strategic development of trade relations between economies in cyberspace has been determined. The scientific value of the article lies in substantiating the understanding of Internet censorship as a natural tool for regulating the development of a digital society and international trade relations. Each state at one time goes through a technological stage of development, which leads to the emergence of different levels of digital isolation and integration; and Internet censorship is a natural element in the system of building a national platform economy and consolidating the country’s internal technological and innovative advantages in digital realities.
      Citation: Journal of Information Science
      PubDate: 2021-05-14T04:41:29Z
      DOI: 10.1177/01655515211014142
       
  • Multi-agent-based hybrid peer-to-peer system for distributed information
           retrieval


      Authors: Abdel Naser Pouamoun, İlker Kocabaş
      Abstract: Journal of Information Science, Ahead of Print.
      With the increasingly huge amount of data located in various databases and the need for users to access them, distributed information retrieval (DIR) has been at the core of the preoccupations of a number of researchers. Numerous DIR systems and architectures have been proposed, including the broker-based architecture. Moreover, the need for more flexible and adaptable DIR has led researchers to build DIR systems with software agents. This research therefore proposes the design and implementation of a novel system that combines the broker-based architecture with a peer-to-peer (P2P) network, called a broker-based P2P network. The proposed architecture is implemented as a multi-agent system (MAS) in which the main agent, playing the role of the broker, receives queries from a peer agent and forwards them to other peer agents, each with their own index and resources. Upon completing the retrieval process, each peer agent sends its results directly to the peer agent that initiated the query, without using the broker agent. The Java Agent DEvelopment framework (JADE) is used to implement the agents and, for experiments, TERRIER (TERabyte RetRIEveR) is extended and used as the search engine to retrieve the Text Retrieval Conference (TREC) collections, notably TREC-6. The peer agent that originated the query progressively collects results from the other peer agents, normalises and merges them and then re-ranks them. For normalisation, the unsupervised MinMax and Sum methods are used.
      Citation: Journal of Information Science
      PubDate: 2021-05-12T04:33:19Z
      DOI: 10.1177/01655515211010392
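The result-merging step the abstract names (MinMax normalisation followed by a Sum merge across peers) is a standard unsupervised combination. A sketch with the agent plumbing omitted:

```python
def min_max_normalise(scores):
    """Min-max normalisation of one peer's retrieval scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def merge_results(peer_results):
    """Normalise each peer's scores, sum scores for documents returned by
    several peers (the Sum merge), then re-rank. A sketch of the
    unsupervised merge the abstract mentions."""
    merged = {}
    for results in peer_results:  # results: list of (doc_id, score)
        docs, scores = zip(*results)
        for doc, s in zip(docs, min_max_normalise(list(scores))):
            merged[doc] = merged.get(doc, 0.0) + s
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```

Normalising per peer first matters because each peer's search engine scores are on its own scale and cannot be summed directly.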
       
  • Communicating knowledge-focus through websites of higher education
           institutions


      Authors: Andrej Miklosik, Nina Evans, Ivan Hlavaty
      Abstract: Journal of Information Science, Ahead of Print.
      Although higher education institutions (HEIs) are expected to be the leaders in knowledge generation and dissemination, it is often not clear whether they are knowledge-aware, that is, have a knowledge focus. In this article, the communication with stakeholders through HEI websites is examined to determine to what extent these institutions communicate about their knowledge initiatives and projects. This is done through an investigative study involving all HEIs and their faculties in Slovakia. Using content analysis, the study examines whether the publicly available resources on HEIs’ websites contain knowledge-related keywords, indicating the existence of a knowledge-focus. The results reveal that the websites of some HEIs contain hundreds of these resources, whereas others have none. Statistical evidence confirms that the intensity of communication about knowledge terms increases with the age and size of the HEI and is also dependent on the type of HEI (public, private state, foreign). Other dependencies between the examined factors have also been revealed, for example, HEIs that rank higher in Webometrics indicators are more intensive in their knowledge communications.
      Citation: Journal of Information Science
      PubDate: 2021-05-12T04:01:32Z
      DOI: 10.1177/01655515211014475
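The content analysis described above boils down to counting knowledge-related keywords in publicly available pages. A minimal stand-in (the study's actual keyword list and counting rules are not reproduced):

```python
def knowledge_focus_score(page_text, keywords):
    """Count occurrences of knowledge-related keywords in a page's text:
    a simple stand-in for the study's content analysis."""
    text = page_text.lower()
    return sum(text.count(keyword.lower()) for keyword in keywords)
```

Substring counting is crude (it would also match "knowledgeable"); the study's methodology is richer than this sketch.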
       
  • Binary background model with geometric mean for author-independent
           authorship verification


      Authors: Pelin Canbay, Ebru A Sezer, Hayri Sever
      Abstract: Journal of Information Science, Ahead of Print.
      Authorship verification (AV) is one of the main problems of authorship analysis and digital text forensics. The classical AV problem is to decide whether or not a particular author wrote the document in question. However, if the author’s known document is a single, relatively short document, the verification problem becomes more difficult than classical AV and needs a generalised solution. To decide the AV of two given unlabelled documents (2D-AV), we propose a system that provides an author-independent solution with the help of a Binary Background Model (BBM). The BBM is a supervised model that provides an informative background for distinguishing document pairs written by the same or different authors. To evaluate document pairs in one representation, we also propose a new, simple and efficient document combination method based on the geometric mean of stylometric features. We tested the performance of the proposed system for both author-dependent and author-independent AV cases. In addition, we introduce a new, well-defined, manually labelled Turkish blog corpus to be used in subsequent studies of authorship analysis. Using a publicly available English blog corpus to generate the BBM, the proposed system demonstrated an accuracy of over 90% on test sets from both trained and unseen authors. Furthermore, the proposed combination method and the system using the BBM with the English blog corpus were also evaluated on other genres used in the international PAN AV competitions, achieving promising results.
      Citation: Journal of Information Science
      PubDate: 2021-05-11T07:46:08Z
      DOI: 10.1177/01655515211007710
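The document combination method the abstract proposes, an element-wise geometric mean of two documents' stylometric feature vectors, is straightforward to state in code (feature values are assumed positive):

```python
import math

def combine_documents(features_a, features_b):
    """Element-wise geometric mean of two stylometric feature vectors,
    producing the single pair representation the abstract describes."""
    return [math.sqrt(a * b) for a, b in zip(features_a, features_b)]
```

The combined vector can then be scored against the BBM to decide whether the pair reads as same-author or different-author.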
       
  • An integrative framework of information as both objective and subjective


      Authors: Mohammad Hossein Jarrahi, Yuanye Ma, Cami Goray
      Abstract: Journal of Information Science, Ahead of Print.
      We present a model of information that integrates two competing perspectives of information by emulating the Chinese philosophy of yin-yang. The model embraces the two key dimensions of information that exist harmoniously: information as (1) objective and veridical representations in the world (information as object) and (2) socially constructed interpretations that are a result of contextual influences (information as subject). We argue that these two facets of information cocreate information as a unified system and complement one another through two processes, which we denote as forming and informing. While the information literature has historically treated these objective and subjective identities of information as incompatible, we argue that they are mutually relevant and that our understanding of one actually enhances our understanding of the other.
      Citation: Journal of Information Science
      PubDate: 2021-05-11T04:15:24Z
      DOI: 10.1177/01655515211014149
       
  • A recommendation-based reading list system prototype for learning and
           resource management


      Authors: Gobinda Chowdhury, Kushwanth Koya, Mariam Bugaje
      Abstract: Journal of Information Science, Ahead of Print.
      A reading list is a list of reading items recommended by an academic to assist students’ acquisition of knowledge for a specific subject. The libraries of higher education institutions collect and assemble reading lists for specific courses and offer students a reading list service. However, a reading list is created from localised intelligence, restricted to the academic’s knowledge of their field, semantics, experience and awareness of developments. This investigation presents the views and comments of academics and library staff on an envisaged aggregated reading list service, which aggregates recommended reading items from various higher education institutions. To this end, we built a prototype that aggregates reading lists from different universities and showcased it to 19 academics and library staff at various higher education institutions to capture their views, comments and recommendations. In the process, we also demonstrate the feasibility of collecting and aggregating reading lists, in addition to understanding how reading lists are created at the respective institutions. The prototype successfully creates ranked lists of reading items, authors, topics, modules and courses. Academics and library staff indicated that aggregated lists would collectively benefit the academic community. Consequently, recommendations in the form of process implementations and technological applications are made for successfully implementing the proposed aggregated reading list service. This proof of concept demonstrates potential benefits for the academic community and identifies further challenges to overcome in order to scale it up to the implementation stage.
      Citation: Journal of Information Science
      PubDate: 2021-05-05T10:41:18Z
      DOI: 10.1177/01655515211006587
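The aggregation step behind the prototype can be illustrated simply: merge reading lists from several institutions and rank items by how many lists recommend them. A minimal sketch, not the prototype's actual implementation:

```python
from collections import Counter

def aggregate_reading_lists(lists):
    """Rank reading items by how many institutions' lists recommend them.
    Each inner list is one institution's reading list; duplicates within
    a single list are counted once."""
    counts = Counter(item for reading_list in lists for item in set(reading_list))
    return [item for item, _ in counts.most_common()]
```

The same counting can be applied per author, topic, module or course to produce the other ranked lists the prototype showcases.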
       
  • Obsolescence of the literature: A study of included studies in Cochrane
           reviews


      Authors: Frandsen Tove Faber, Mette Brandt Eriksen, David Mortan Grøne Hammer
      Abstract: Journal of Information Science, Ahead of Print.
      Ageing or obsolescence describes the process of declining use of a particular publication over time and can affect the results of a citation analysis, as the length of the citation window can change rankings. Obsolescence may vary not only across fields but also across subfields or sub-disciplines. The aim of this study is to determine sub-disciplinary differences in obsolescence on a larger scale, allowing for differences over time as well. The study presents the results of an analysis of 82,759 references across 53 healthcare and health policy topics. The references were extracted from systematic reviews published from 2012 to 2016. The analyses of obsolescence use the median citation age and the mean citation age. This study finds that the median citation age and the mean citation age differ considerably across groups; for the latter indicator, an analysis of the confidence intervals confirms these differences. Using the subfield categorisation from Cochrane review groups, we found larger differences across subfields than in the citing half-lives published by Journal Citation Reports. Obsolescence is important to consider when setting the length of citation windows. This study emphasises the vast differences across health sciences subfields. The length of the citation period is thus highly important for the results of a bibliometric evaluation or study covering fields with widely varying obsolescence rates.
      Citation: Journal of Information Science
      PubDate: 2021-04-14T08:54:10Z
      DOI: 10.1177/01655515211006588
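      The two obsolescence indicators used in the study above (median and mean citation age) reduce to simple arithmetic over reference ages; a minimal sketch, with illustrative function and field names that are not taken from the paper:

```python
from statistics import mean, median

def obsolescence_summary(citing_year, cited_years):
    # citation age = citing publication year minus cited reference's year
    ages = [citing_year - y for y in cited_years]
    return {"median_citation_age": median(ages),
            "mean_citation_age": mean(ages)}

# invented example: a 2016 review citing four references
print(obsolescence_summary(2016, [2014, 2010, 2006, 1998]))
```

      In a real analysis these ages would be aggregated per Cochrane review group before comparing subfields.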
       
  • A novel filter feature selection method for text classification: Extensive
           Feature Selector


      Authors: Bekir Parlak, Alper Kursat Uysal
      Abstract: Journal of Information Science, Ahead of Print.
      As the huge dimensionality of textual data restrains classification accuracy, it is essential to apply feature selection (FS) methods as a dimension reduction step in the text classification (TC) domain. Most FS methods for TC combine several probabilities in their calculations. In this study, we propose a new FS method named Extensive Feature Selector (EFS), which benefits from both corpus-based and class-based probabilities in its calculations. The performance of EFS is compared with nine well-known FS methods, namely, Chi-Squared (CHI2), Class Discriminating Measure (CDM), Discriminative Power Measure (DPM), Odds Ratio (OR), Distinguishing Feature Selector (DFS), Comprehensively Measure Feature Selection (CMFS), Discriminative Feature Selection (DFSS), Normalised Difference Measure (NDM) and Max–Min Ratio (MMR), using Multinomial Naive Bayes (MNB), Support-Vector Machine (SVM) and k-Nearest Neighbour (KNN) classifiers on four benchmark data sets: Reuters-21578, 20-Newsgroup, Mini 20-Newsgroup and Polarity. The experiments were carried out for six feature set sizes: 10, 30, 50, 100, 300 and 500. Experimental results show that EFS is more successful than the other nine methods in most cases according to micro-F1 and macro-F1 scores.
      Citation: Journal of Information Science
      PubDate: 2021-04-13T09:18:47Z
      DOI: 10.1177/0165551521991037
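      The abstract above compares probability-based FS scores; the EFS formula itself is not given there, so as an illustration here is one of the named baselines, Odds Ratio (OR), computed from per-class document counts (function names and data are ours, not the paper's):

```python
import math

def odds_ratio(tp, fp, fn, tn, eps=1e-6):
    # tp: target-class docs containing the term, fp: other docs containing it,
    # fn: target-class docs without it, tn: other docs without it
    return math.log(((tp + eps) * (tn + eps)) / ((fp + eps) * (fn + eps)))

def rank_terms(counts):
    # higher score = stronger association with the target class
    return sorted(counts, key=lambda t: odds_ratio(*counts[t]), reverse=True)

# toy counts: "goal" concentrates in the target class, "the" is uniform
counts = {"goal": (80, 5, 20, 95), "the": (60, 60, 40, 40)}
print(rank_terms(counts))
```

      After ranking, the top-k terms (k = 10, 30, 50, … in the study) would be kept as the feature set.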
       
  • ClickbaitTR: Dataset for clickbait detection from Turkish news sites and
           social media with a comparative analysis via machine learning algorithms


      Authors: Şura Genç, Elif Surer
      Abstract: Journal of Information Science, Ahead of Print.
      Clickbait is a strategy that aims to attract people’s attention and direct them to specific content. Clickbait titles, created from information that is not included in the main content or using intriguing expressions with various text-related features, have become very popular, especially on social media. This study expands the Turkish clickbait dataset that we constructed for clickbait detection in our earlier proof-of-concept study (written in Turkish). We reach a sample size of 48,060 by adding 8859 tweets and release a publicly available dataset – ClickbaitTR – with its open-source data analysis library. We apply machine learning algorithms such as Artificial Neural Network (ANN), Logistic Regression, Random Forest, Long Short-Term Memory Network (LSTM), Bidirectional Long Short-Term Memory (BiLSTM) and Ensemble Classifier to 48,060 news headlines extracted from Twitter. The results show that Logistic Regression achieves 85% accuracy, Random Forest 86%, LSTM 93%, ANN 93%, the Ensemble Classifier 93% and, finally, BiLSTM 97%. A thorough discussion is provided of the psychological aspects of the clickbait strategy, focusing on curiosity and interest arousal. In addition to a successful clickbait detection performance and a detailed analysis of clickbait sentences in terms of language and psychology, this study also contributes to clickbait detection research with the largest clickbait dataset in Turkish.
      Citation: Journal of Information Science
      PubDate: 2021-04-13T04:26:41Z
      DOI: 10.1177/01655515211007746
       
  • Mass aesthetic changes in the context of the development of world museums


      Authors: Yu He, Zheng Chen
      Abstract: Journal of Information Science, Ahead of Print.
      This article is timely: as the world community experiences crisis phenomena in public consciousness and in social forms of existence, the transformation of the museum as an accumulator of works of art and as a cultural centre acquires historical significance. The novelty of the study lies in the fact that exhibitions can be held not only online but also during the periods when museums act as cultural centres. The purpose of the study is to examine aesthetic changes in the context of global art communication through exhibition spaces in the world’s museums. The leading research method was comparative analysis, through which mass aesthetic changes amid the changing global socio-economic environment were studied. The basis for the work of UNESCO as a global repository and management centre in the museum community was shown. The authors note that the formation of museum competence, and a change in the aesthetics of mass consciousness on this basis, is possible only through structured coordination of museum art. The authors see the creation of a single world museum centre as the basis for such a change.
      Citation: Journal of Information Science
      PubDate: 2021-04-13T04:26:01Z
      DOI: 10.1177/01655515211007729
       
  • A guided latent Dirichlet allocation approach to investigate real-time
           latent topics of Twitter data during Hurricane Laura


      Authors: Sulong Zhou, Pengyu Kan, Qunying Huang, Janet Silbernagel
      Abstract: Journal of Information Science, Ahead of Print.
      Natural disasters cause significant damage, casualties and economic losses. Twitter has been used to support prompt disaster response and management because people tend to communicate and spread information on public social media platforms during disaster events. To retrieve real-time situational awareness (SA) information from tweets, the most effective way to mine the text is natural language processing (NLP). Among advanced NLP models, supervised approaches can classify tweets into different categories to gain insight and leverage useful SA information from social media data. However, high-performing supervised models require domain knowledge to specify categories and involve costly labelling tasks. This research proposes a guided latent Dirichlet allocation (LDA) workflow to investigate temporal latent topics in tweets during a recent disaster event, the 2020 Hurricane Laura. By integrating prior knowledge, a coherence model, LDA topic visualisation and validation against official reports, our guided approach reveals that most tweets contain several latent topics during the 10-day period of Hurricane Laura. This result indicates that state-of-the-art supervised models have not fully utilised tweet information because they assign each tweet only a single label. In contrast, our model can not only identify emerging topics during different disaster events but also provide multilabel references for the classification schema. In addition, our results can help responders, stakeholders and the general public quickly identify and extract SA information so that they can adopt timely response strategies and allocate resources wisely during hurricane events.
      Citation: Journal of Information Science
      PubDate: 2021-04-13T04:25:01Z
      DOI: 10.1177/01655515211007724
       
  • A semiautomatic annotation approach for sentiment analysis


      Authors: Rahma Alahmary, Hmood Al-Dossari
      Abstract: Journal of Information Science, Ahead of Print.
      Sentiment analysis (SA) aims to extract users’ opinions automatically from their posts and comments. Almost all prior works have used machine learning algorithms. Recently, SA research has shown promising performance using the deep learning approach. However, deep learning is data-hungry and requires large datasets to learn, so data annotation takes more time. In this research, we propose a semiautomatic approach using Naïve Bayes (NB) to annotate a new dataset in order to reduce the human effort and time spent on the annotation process. We created a dataset for training and testing the classifier by collecting Saudi dialect tweets. The dataset produced by the semiautomatic model was then used to train and test deep learning classifiers to perform Saudi dialect SA. The accuracy achieved by the NB classifier was 83%. The trained semiautomatic model was used to annotate the new dataset before it was fed into the deep learning classifiers. The three deep learning classifiers tested in this research were convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM). Support vector machine (SVM) was used as the baseline for comparison. Overall, the performance of the deep learning classifiers exceeded that of SVM. The results showed that CNN reported the highest performance, the performance of Bi-LSTM was higher than that of LSTM and SVM, and the performance of LSTM was higher than that of SVM. The proposed semiautomatic annotation approach is usable and promising, increasing speed and saving time and effort in the annotation process.
      Citation: Journal of Information Science
      PubDate: 2021-04-13T04:23:40Z
      DOI: 10.1177/01655515211006594
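      A minimal sketch of the semiautomatic idea described above: a multinomial Naive Bayes annotator that only auto-labels texts it is confident about and defers the rest to a human. The tokenisation, smoothing and 0.8 confidence threshold are illustrative choices, not the paper's.

```python
import math
from collections import Counter

class TinyNB:
    """Multinomial Naive Bayes with a confidence gate for semiautomatic labelling."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, lab in zip(docs, labels):
            self.word_counts[lab].update(doc.split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, doc, threshold=0.8):
        n_docs, v = sum(self.prior.values()), len(self.vocab)
        log_scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            s = math.log(self.prior[c] / n_docs)
            for w in doc.split():
                s += math.log((self.word_counts[c][w] + self.alpha)
                              / (total + self.alpha * v))
            log_scores[c] = s
        # normalise log scores into posterior probabilities
        m = max(log_scores.values())
        exp = {c: math.exp(s - m) for c, s in log_scores.items()}
        best = max(exp, key=exp.get)
        p = exp[best] / sum(exp.values())
        # only auto-label confident cases; route the rest to a human annotator
        return best if p >= threshold else None

nb = TinyNB().fit(["good great nice", "great good", "bad awful", "awful bad poor"],
                  ["pos", "pos", "neg", "neg"])
print(nb.predict("good nice"), nb.predict("good bad"))
```

      Tweets returned as None would form the manual-annotation queue; the rest are labelled automatically.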
       
  • Influence of personality traits on users’ viewing behaviour


      Authors: Samer Muthana Sarsam, Hosam Al-Samarraie, Ahmed Ibrahim Alzahrani
      Abstract: Journal of Information Science, Ahead of Print.
      Different views exist on the role of personal factors in moderating individual viewing behaviour. This study examined the impact of personality traits on individuals’ viewing behaviour of a facial stimulus. A total of 96 students (46 males and 50 females, aged 23–28 years) participated in this study. The Big-Five personality traits of all participants, together with data on their eye movements, were collected and analysed. The results revealed three groups of users who scored high on the personality traits of neuroticism, agreeableness and conscientiousness. Individuals who scored high in a specific personality trait were more likely to interpret the visual image differently from individuals with other personality traits. To determine the extent to which a specific personality trait is associated with users’ viewing behaviour of a visual stimulus, a predictive model was developed and validated. The prediction results showed that 96.73% of the identified personality traits can potentially be predicted from users’ viewing behaviour. The findings of this study expand the current understanding of human personality and choice behaviour. The study also contributes to the perceptual encoding process of faces and the perceptual mechanism in holistic face processing theory.
      Citation: Journal of Information Science
      PubDate: 2021-04-13T04:22:41Z
      DOI: 10.1177/0165551521998051
       
  • Infectious epidemics and the research output of nations: A data-driven
           analysis


      Authors: Houcemeddine Turki, Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha, Anastassios Pouris
      Abstract: Journal of Information Science, Ahead of Print.
      In recent years, several infectious diseases have caused widespread nationwide epidemics that affected information-seeking behaviours, people’s mobility, economies and research trends. Examples of these epidemics are the 2003 severe acute respiratory syndrome (SARS) epidemic in mainland China and Hong Kong, the 2014–2016 Ebola epidemic in Guinea and Sierra Leone, the 2015–2016 Zika epidemic in Brazil, Colombia and Puerto Rico and the recent COVID-19 epidemic in China and other countries. In this research article, we investigate the effect of large-scale outbreaks of infectious diseases on the research productivity and landscape of nations by analysing the research outputs of the main countries affected by the SARS, Zika and Ebola epidemics, as returned by the Web of Science Core Collection. Despite the mobility restrictions and limited working conditions caused by the epidemics, we surprisingly found that the research characteristics and productivity of countries with excellent or moderate research traditions and communities are not affected by infectious epidemics, owing to their robust long-term research structures and policies. Similarly, large-scale infectious outbreaks can even boost the research productivity of countries with limited research traditions, thanks to international capacity-building collaborations provided by organisations and associations from leading research countries.
      Citation: Journal of Information Science
      PubDate: 2021-04-09T08:34:16Z
      DOI: 10.1177/01655515211006605
       
  • Personalised attraction recommendation for enhancing topic diversity and
           accuracy


      Authors: Yuanyuan Lin, Chao Huang, Wei Yao, Yifei Shao
      Abstract: Journal of Information Science, Ahead of Print.
      Attraction recommendation plays an important role in tourism, such as solving information overload problems and recommending suitable attractions to users. Currently, most recommendation methods are dedicated to improving the accuracy of recommendations. However, methods that focus only on accuracy tend to recommend popular items that are often purchased by users, which results in a lack of diversity and low visibility of non-popular items. Hence, many studies have noted the importance of recommendation diversity and proposed improved methods, but there is room for improvement. First, defining diversity for different items requires consideration of domain characteristics. Second, existing algorithms for improving diversity sacrifice recommendation accuracy. Therefore, this article uses the topic features of attractions to define the calculation of recommendation diversity. We developed a two-stage optimisation model to enhance recommendation diversity while maintaining accuracy. In the first stage, an optimisation model considering topic diversity is proposed to increase recommendation diversity and generate candidate attractions. In the second stage, we propose a misclassification-cost minimisation model to balance recommendation diversity and accuracy. To assess the performance of the proposed method, experiments are conducted with real-world travel data. The results indicate that the proposed two-stage optimisation model can significantly improve both the diversity and the accuracy of recommendations.
      Citation: Journal of Information Science
      PubDate: 2021-04-09T08:33:17Z
      DOI: 10.1177/0165551521999801
       
  • Evaluating and ranking the digital content generation components for
           marketing the libraries and information centres’ goods and services
           using fuzzy TOPSIS technique


      Authors: Zahra Naseri, Abdolreza Noroozi Chakoli, Mila Malekolkalami
      Abstract: Journal of Information Science, Ahead of Print.
      Since content audiences, including libraries and information centres, are increasingly geared towards digital environments and virtual networks, the production and delivery of high-quality digital content are becoming ever more important. Several components have been introduced by researchers for evaluating the quality of digital content generation. However, because the importance and value of each of these components are uncertain, it has not yet been possible to use them effectively to evaluate produced content. This study aimed to rank the components of content generation to allow their accurate evaluation by users as well as by content providers and distributors, including libraries and marketers. The ranked components can motivate digital content producers and distributors to better evaluate the quality of digital content, better attract customers and make more effective decisions about digital content use based on their specific goals. Initially, 42 of the most important components were identified from the literature. Then, based on three rounds of Delphi interviews, the experts’ views on the importance of each component were obtained, analysed and ranked. Since the ranking must weigh a wide range of components against one another, the fuzzy TOPSIS technique was applied to analyse the views of 16 experts in the field of content generation in Iran. The ranking indicated that components such as ‘findable and access’, ‘non-disturbing and helpful’, ‘clear’ and ‘remarkable’ are the main pillars of content generation and are of the utmost importance. The results can be used as an effective tool to improve the quality of content. Moreover, they increase audience engagement in digital environments and social networks, and encourage audiences to make more use of the digital content of libraries.
      Citation: Journal of Information Science
      PubDate: 2021-03-30T04:32:15Z
      DOI: 10.1177/0165551521998045
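      The core of the TOPSIS ranking used above can be sketched as follows. This is the crisp variant (the fuzzy version replaces crisp scores with triangular fuzzy numbers and fuzzy distances); the matrix and weights are invented for illustration.

```python
import math

def topsis(matrix, weights, benefit):
    """Closeness coefficient of each alternative to the ideal solution.
    matrix[i][j]: score of alternative i on criterion j;
    benefit[j]: True if criterion j is better when larger."""
    m, n = len(matrix), len(matrix[0])
    # vector-normalise each column, then apply criterion weights
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(m))) for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    # positive-ideal and negative-ideal solutions per criterion
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(zip(*v))]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(zip(*v))]
    cc = []
    for row in v:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        cc.append(d_neg / (d_pos + d_neg))  # 1.0 = coincides with the ideal
    return cc

# two hypothetical content-generation components scored on two criteria
print(topsis([[9, 9], [1, 1]], [0.5, 0.5], [True, True]))
```

      Components are then ranked by descending closeness coefficient.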
       
  • A discriminative method for global query expansion and term reweighting
           using co-occurrence graphs


      Authors: Billel Aklouche, Ibrahim Bounhas, Yahya Slimani
      Abstract: Journal of Information Science, Ahead of Print.
      This article presents a new query expansion (QE) method aiming to tackle term mismatch in information retrieval (IR). Previous research showed that selecting good expansion terms that do not hurt retrieval effectiveness remains an open and challenging research question. Our method investigates how global statistics of term co-occurrence can be used effectively to enhance expansion term selection and reweighting. Indeed, we build a co-occurrence graph using a context window approach over the entire collection, thus adopting a global QE approach. Then, we employ a semantic similarity measure inspired by the Okapi BM25 model, which makes it possible to evaluate the discriminative power of words and to select relevant expansion terms based on their similarity to the query as a whole. The proposed method includes a reweighting step in which selected terms are assigned weights according to their relevance to the query. Moreover, our method does not require matrix factorisation or complex text mining processes. It only requires simple co-occurrence statistics about terms, which reduces complexity and ensures scalability. Finally, it has two free parameters that may be tuned to adapt the model to the context of a given collection and to control co-occurrence normalisation. Extensive experiments on four standard datasets in English (TREC Robust04 and Washington Post) and French (CLEF2000 and CLEF2003) show that our method improves both retrieval effectiveness and robustness in terms of various evaluation metrics and outperforms competitive state-of-the-art baselines with significantly better results. We also investigate the impact of varying the number of expansion terms on retrieval results.
      Citation: Journal of Information Science
      PubDate: 2021-03-30T04:31:41Z
      DOI: 10.1177/0165551521998047
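      A toy sketch of the global co-occurrence-graph idea described above: counts are collected over sliding context windows across the whole collection, and expansion candidates are scored against the query as a whole. The paper's actual weighting is BM25-inspired; plain counts are used here for brevity, and the corpus is invented.

```python
from collections import defaultdict

def cooccurrence_graph(docs, window=3):
    """Symmetric term co-occurrence counts over sliding context windows,
    collected globally across the entire collection."""
    graph = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        tokens = doc.split()
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + window, len(tokens))):
                a, b = tokens[i], tokens[j]
                if a != b:
                    graph[a][b] += 1
                    graph[b][a] += 1
    return graph

def expand_query(query_terms, graph, k=2):
    # score candidates by total co-occurrence with all query terms
    scores = defaultdict(int)
    for q in query_terms:
        for term, w in graph[q].items():
            if term not in query_terms:
                scores[term] += w
    return sorted(scores, key=scores.get, reverse=True)[:k]

g = cooccurrence_graph(["solar panel energy", "solar energy cost", "panel cost"])
print(expand_query(["solar"], g, k=1))
```

      Selected terms would then be reweighted by their relevance to the query before being appended to it.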
       
  • A novel focal-loss and class-weight-aware convolutional neural network for
           the classification of in-text citations


      Authors: Naif Radi Aljohani, Ayman Fayoumi, Saeed-Ul Hassan
      Abstract: Journal of Information Science, Ahead of Print.
      We argue that citations, as they have different reasons and functions, should not all be treated in the same way. Using a large dataset of about 10K citation contexts annotated by human experts, extracted from the Association for Computational Linguistics repository, we present a deep learning–based citation context classification architecture. Unlike all existing state-of-the-art feature-based citation classification models, our proposed convolutional neural network (CNN) with fastText-based pre-trained embedding vectors uses only the citation context as its input to outperform them in both binary- (important and non-important) and multi-class (Use, Extends, CompareOrContrast, Motivation, Background, Other) citation classification tasks. Furthermore, we propose using focal-loss and class-weight functions in the CNN model to overcome the inherited class imbalance issues in citation classification datasets. We show that using the focal-loss function with the CNN adds a modulating factor of (1 − p_t)^γ to the cross-entropy function. Our model improves on the baseline results by achieving an encouraging 90.6 F1 score with 90.7% accuracy and a 72.3 F1 score with a 72.1% accuracy score, respectively, for the binary- and multi-class citation classification tasks.
      Citation: Journal of Information Science
      PubDate: 2021-03-25T04:39:27Z
      DOI: 10.1177/0165551521991022
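      The focal-loss factor mentioned above is standard: cross-entropy −log(p_t) is scaled by (1 − p_t)^γ, optionally combined with a class weight, so that easy, well-classified examples contribute less to the loss. A minimal per-example sketch (parameter defaults are illustrative):

```python
import math

def focal_loss(p, gamma=2.0, alpha=1.0):
    """Focal loss for the true class with predicted probability p.
    gamma = 0 recovers plain (class-weighted) cross-entropy;
    alpha is an optional class weight for imbalance handling."""
    return -alpha * ((1.0 - p) ** gamma) * math.log(p)

# an easy example (p = 0.9) is down-weighted far more than a hard one (p = 0.6)
print(focal_loss(0.9), focal_loss(0.6))
```

      In training, alpha would typically be set higher for the minority citation classes.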
       
  • Searchable Turkish OCRed historical newspaper collection 1928–1942


      Authors: Houssem Menhour, Hasan Basri Şahin, Ramazan Nejdet Sarıkaya, Medine Aktaş, Rümeysa Sağlam, Ekin Ekinci, Süleyman Eken
      Abstract: Journal of Information Science, Ahead of Print.
      The newspaper emerged as a distinct cultural form in early 17th-century Europe and is bound up with the early modern period of history. Historical newspapers are of the utmost importance to nations and their people, and researchers from different disciplines rely on these papers to improve our understanding of the past. To satisfy this need, the Istanbul University Head Office of Library and Documentation provides access to a large database of scanned historical newspapers. To take this a step further and make the documents more accessible, we run optical character recognition (OCR) and named entity recognition (NER) tasks on the whole database and index the results to allow for a full-text search mechanism. We design and implement a system encompassing the whole pipeline, from scraping the dataset from the original website to providing a graphical user interface for running search queries, and it does so successfully. The proposed system supports searching for people-, culture- and security-related keywords and visualising the results.
      Citation: Journal of Information Science
      PubDate: 2021-03-22T05:12:35Z
      DOI: 10.1177/01655515211000642
       
  • Usability of data-oriented user interfaces for cultural heritage: A
           systematic mapping study


      Authors: Maria de la Paz Diulio, Juan Cruz Gardey, Analía Fernanda Gomez, Alejandra Garrido
      Abstract: Journal of Information Science, Ahead of Print.
      This study surveys the state of the art in usability and user experience strategies applied to applications that deal with large amounts of data in the field of cultural heritage, highlighting the most prominent aspects and underlining the under-explored ones. In these applications, large amounts of data need to be wisely presented to help end users draw conclusions and make decisions. While sophisticated technology may be used to improve the user experience, it should not be applied to the detriment of usability, which is critical for the success of these applications. We performed a systematic mapping study to classify the literature retrieved from the four largest scientific databases by a structured search string. We classify applications according to purpose, intended users and the way they address and evaluate user experience and usability, among other dimensions, and include the analysis of combined results through maps. Findings reveal the contradiction that while most articles are intended for the education and tourism of the general public, only half of the studies evaluate usability. Moreover, there is a significant research gap in user interfaces for systems in the context of preventive conservation, for research, assessment and decision assistance. This is the first systematic mapping study combining usability and cultural heritage, especially for data-oriented applications. It shows that more research is necessary to assist conservators and researchers and to address usability from the early stages of development.
      Citation: Journal of Information Science
      PubDate: 2021-03-19T08:50:13Z
      DOI: 10.1177/01655515211001787
       
  • Identifying effective cognitive biases in information retrieval


      Authors: Gisoo Gomroki, Hassan Behzadi, Rahmatolloah Fattahi, Javad Salehi Fadardi
      Abstract: Journal of Information Science, Ahead of Print.
      The purpose of this study is to identify the types of cognitive biases that arise in the process of information retrieval. This research used a mixed-method approach for data collection. The research population consisted of 25 information retrieval specialists and 30 postgraduate students. We employed three tools for collecting data: a checklist, log files and semi-structured interviews. The findings showed that, from the perspective of information retrieval specialists, cognitive biases such as ‘Familiarity’, ‘Anchoring’, ‘Rush to solve’ and ‘Curse of knowledge’ are of the greatest importance in the field of information retrieval. Also, in terms of users’ searching, the ‘Rush to solve problems’ and ‘Mere exposure effects’ biases have the highest frequency, and the ‘Outcome’ and ‘Curse of knowledge’ biases have the lowest frequency in the process of users’ information retrieval. It can be concluded that, because cognitive biases occur in information retrieval, designers of information retrieval systems and librarians should pay attention to this issue when designing and evaluating information systems.
      Citation: Journal of Information Science
      PubDate: 2021-03-17T08:47:16Z
      DOI: 10.1177/01655515211001777
       
  • Using text mining to glean insights from COVID-19 literature


      Authors: Billie S Anderson
      Abstract: Journal of Information Science, Ahead of Print.
      The purpose of this study is to develop a text clustering–based analysis of COVID-19 research articles. Owing to the proliferation of published COVID-19 research articles, researchers need a method for reducing the number of articles they must search through to find material relevant to their expertise. The study analyses 83,264 abstracts from research articles related to COVID-19. The textual data are analysed using singular value decomposition (SVD) and the expectation–maximisation (EM) algorithm. Results suggest that text clustering can both reveal hidden research themes in the published literature related to COVID-19 and reduce the number of articles that researchers need to search through to find material relevant to their field of interest.
      Citation: Journal of Information Science
      PubDate: 2021-03-16T08:24:04Z
      DOI: 10.1177/01655515211001661
       
  • The hybridised indexing method for research-based information retrieval


      Authors: Kyle Andrew Fitzgerald, Andre Charles de la Harpe, Corrie Susanna Uys
      Abstract: Journal of Information Science, Ahead of Print.
      An information retrieval system (IRS) is used to retrieve documents based on an information need. The IRS makes relevance judgements by attempting to match a query to a document. As IRS capabilities depend on the indexing design, the hybrid indexing method (IRS-H) is introduced. The objectives of this article are to examine IRS-H (an alternative indexing method that performs exact phrase matching) and IRS-I with regard to retrieval usefulness, identification of relevant documents and the quality of rejecting irrelevant documents, by conducting three experiments and analysing the resulting data. In the three experiments, a collection of 100 research documents and 75 queries were presented to: (1) five participants answering a questionnaire, (2) IRS-I to generate data and (3) IRS-H to generate data. The data generated during the experiments were statistically analysed using the performance measures Precision, Recall and Specificity, and one-tailed Student’s t-tests. The results reveal that IRS-H (1) increased the retrieval of relevant documents, (2) reduced incorrect identification of relevant documents and (3) increased the quality of rejecting irrelevant documents. The research found that the hybrid indexing method, using a small closed collection of 100 documents, produced the required outputs and may be used as an alternative IRS indexing method.
      Citation: Journal of Information Science
      PubDate: 2021-03-15T08:05:19Z
      DOI: 10.1177/0165551521999800
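      The three performance measures used above follow the usual set-based definitions; a minimal sketch with hypothetical document ids:

```python
def retrieval_metrics(retrieved, relevant, collection_size):
    """Precision, Recall and Specificity from set-based relevance judgements.
    retrieved/relevant are sets of document ids from a closed collection."""
    tp = len(retrieved & relevant)          # relevant docs retrieved
    fp = len(retrieved - relevant)          # irrelevant docs retrieved
    fn = len(relevant - retrieved)          # relevant docs missed
    tn = collection_size - tp - fp - fn     # irrelevant docs correctly rejected
    return {"precision": tp / (tp + fp),
            "recall": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

# toy run over a 10-document collection
print(retrieval_metrics({1, 2, 3}, {2, 3, 4}, 10))
```

      Specificity, the share of irrelevant documents correctly rejected, is the measure behind objective (3) in the abstract.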
       
  • Automatic construction of academic profile: A case of information science
           domain


      Authors: Qian Geng, Ziang Chuai, Jian Jin
      Abstract: Journal of Information Science, Ahead of Print.
      To provide junior researchers with domain-specific concepts efficiently, an automatic approach to academic profiling is needed. First, to obtain the personal records of a given scholar, typical supervised approaches often utilise structured data, such as the infobox in Wikipedia, as the training dataset, but this may lead to a severe mislabelling problem when such data are used to train a model directly. To address this problem, a new relation embedding method is proposed for fine-grained entity typing, in which the initial vectors of entities and a new penalty scheme are considered, based on the semantic distance of entities and relations. Also, to highlight critical concepts relevant to renowned scholars, scholars’ selective bibliographies, which contain massive numbers of academic terms, are analysed by a newly proposed extraction method based on logistic regression, the AdaBoost algorithm and learning-to-rank techniques. This bridges the gap that conventional supervised methods only return binary classification results and fail to help researchers understand the relative importance of the selected concepts. Several categories of experiments on academic profiling and corresponding benchmark datasets demonstrate that the proposed approaches notably outperform existing methods. The proposed techniques provide an automatic way for junior researchers to obtain organised knowledge in a specific domain, including scholars’ background information and domain-specific concepts.
      Citation: Journal of Information Science
      PubDate: 2021-03-15T08:03:55Z
      DOI: 10.1177/0165551521998048
       
  • Do online reviews have different effects on consumers’ sampling
           behaviour across product types? Evidence from the software industry


      Authors: Shengli Li, Fan Li, Shiyu Xie
      Abstract: Journal of Information Science, Ahead of Print.
      Previous research shows that online reviews may have different effects for search goods and experience goods. However, as a typical type of experience good, software can be further divided into different categories based on product characteristics. Little research has been conducted on the differing effects of online reviews across types of software. Furthermore, offering free samples is another common practice of software firms to alleviate consumer uncertainty prior to purchase. To fill the corresponding research gap, this research focuses on the interaction effects between online reviews and free samples for different types of software. Through our empirical analysis, we find that user ratings significantly increase consumers’ sample downloads. Furthermore, consumers download more samples for some categories than for others. Finally, user and editor ratings might have differential effects for different types of software.
      Citation: Journal of Information Science
      PubDate: 2021-03-08T08:46:37Z
      DOI: 10.1177/0165551520965399
       
  • Research on differential and interactive impact of China-led and US-led
           open-access articles

      Authors: Wei Mingkun, Quan Wei, Sadhana Misra, Russell Savage
      Abstract: Journal of Information Science, Ahead of Print.
      With the development of Web 2.0, social media dialogue has become increasingly important within the world of open access (OA), striving for more user-generated content and ease of use. In this article, we analysed the impact of OA articles published by both Chinese and American researchers in PLOS ONE. Papers published in the same year were analysed, using citation and social media metrics, to examine the correlation between social media metrics and citations. Overall, the impact of OA articles published in the United States is higher than that of OA articles published in China. The results showed that citations and the number of Mendeley readers have a significant correlation, reflecting their similar roles in evaluating the impact of OA articles. However, most social media metrics did not correlate clearly with impact evaluation, which indicates that social media metrics are useful when paired with citations but cannot replace them. Social media metrics appear to be useful alternative metrics for accurately reflecting the impact of OA articles within the scientific community.
      Citation: Journal of Information Science
      PubDate: 2021-03-04T06:11:01Z
      DOI: 10.1177/0165551521998637
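The correlation between citations and Mendeley reader counts reported above is a standard Pearson correlation. A minimal stdlib sketch (the toy counts are hypothetical, not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy counts for five OA articles.
citations = [2, 5, 9, 14, 20]
mendeley_readers = [4, 11, 18, 30, 41]
r = pearson_r(citations, mendeley_readers)   # strong positive correlation
```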
       
  • Improvements for research data repositories: The case of text spam

      Authors: Ismael Vázquez, María Novo-Lourés, Reyes Pavón, Rosalía Laza, José Ramón Méndez, David Ruano-Ordás
      Abstract: Journal of Information Science, Ahead of Print.
      Current research has evolved in such a way that scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure the possibility of reproducing the results and comparing them with those obtained through other approaches. In this context, public data sets (sometimes shared through repositories) are one of the most important elements for the development of experimental protocols and test benches. This study analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. In particular, we identify and discuss the following demanding functionalities for repositories: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques using dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository functionalities and (4) providing protection mechanisms for licencing issues and user rights. To demonstrate the introduced functionality, we created the STRep (Spam Text Repository) web application, which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep at the URL https://rdata.4spam.group to facilitate understanding of this study.
      Citation: Journal of Information Science
      PubDate: 2021-03-03T07:52:41Z
      DOI: 10.1177/0165551521998636
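Functionality (1) above, building customised data sets for specific research tasks, amounts to deriving task-specific subsets from a shared corpus. A toy sketch (the record fields and corpus contents are invented for illustration, not STRep's schema):

```python
# Hypothetical in-memory corpus: each record carries a text, a label and
# source metadata, as a spam-text repository might expose them.
corpus = [
    {"text": "WIN A FREE PRIZE NOW", "label": "spam", "source": "sms"},
    {"text": "Meeting moved to 3pm", "label": "ham", "source": "email"},
    {"text": "Cheap loans, apply today", "label": "spam", "source": "email"},
]

def build_dataset(records, sources=None, labels=None):
    """Return a customised subset restricted to the given sources/labels."""
    return [r for r in records
            if (sources is None or r["source"] in sources)
            and (labels is None or r["label"] in labels)]

email_spam = build_dataset(corpus, sources={"email"}, labels={"spam"})
```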
       
  • Multi-thread hierarchical deep model for context-aware sentiment analysis

      Authors: Abdalsamad Keramatfar, Hossein Amirkhani, Amir Jalali Bidgoly
      Abstract: Journal of Information Science, Ahead of Print.
      Real-time messaging and opinion sharing on social media websites have made them valuable sources of different kinds of information and provide the opportunity for many kinds of analysis. Sentiment analysis, one of the most important of these, has gained increasing interest. However, research in this field still faces challenges. The mainstream of sentiment analysis research on social media websites and microblogs exploits only the textual content of posts. This makes the analysis hard, because microblog posts are short and noisy; however, they come with rich context which can be exploited for sentiment analysis. To use this context as an auxiliary source, some recent papers use reply/retweet to model the context of the target post. We claim that multiple sequential contexts can be used jointly in a unified model. In this article, we propose a context-aware multi-thread hierarchical long short-term memory (MHLSTM) model that jointly models different kinds of contexts, such as tweep, hashtag and reply, besides the content of the target post. Experimental evaluations on a real-world Twitter data set demonstrate that our proposed model can outperform some strong baseline models by 28.39% in terms of relative error reduction.
      Citation: Journal of Information Science
      PubDate: 2021-02-16T06:45:12Z
      DOI: 10.1177/0165551521990617
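The core idea above, encoding a target post jointly with several context threads, can be illustrated in miniature. This sketch replaces the paper's LSTM thread encoders with simple mean pooling (an assumption made purely so the example stays self-contained; the real model is hierarchical and learned):

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors into one (a stand-in here
    for an LSTM thread encoder)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def encode_post(target_vecs, context_threads):
    """Encode a target post with several context threads (e.g. tweep
    history, hashtag stream, reply chain) by pooling each thread and
    concatenating the results -- a toy analogue of multi-thread
    hierarchical combination."""
    rep = mean_pool(target_vecs)
    for thread in context_threads:
        rep += mean_pool(thread)   # extend with this thread's pooled vector
    return rep
```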
       
  • Delphi study of risk to individuals who disclose personal information
           online

      Authors: David Haynes, Lyn Robinson
      Abstract: Journal of Information Science, Ahead of Print.
      A two-round Delphi study was conducted to explore priorities for addressing online risk to individuals. A corpus of literature was created based on 69 peer-reviewed articles about privacy risk and the privacy calculus published between 2014 and 2019. A cluster analysis of the resulting text-base using Pearson’s correlation coefficient resulted in seven broad topics. After two rounds of the Delphi survey with experts in information security and information literacy, the following topics were identified as priorities for further investigation: personalisation versus privacy, responsibility for privacy on social networks, measuring privacy risk, and perceptions of powerlessness and the resulting apathy. The Delphi approach provided clear conclusions about research topics and has potential as a tool for prioritising future research areas.
      Citation: Journal of Information Science
      PubDate: 2021-02-16T04:46:58Z
      DOI: 10.1177/0165551521992756
       
  • Unsupervised extractive multi-document summarization method based on
           transfer learning from BERT multi-task fine-tuning

      Authors: Salima Lamsiyah, Abdelkader El Mahdaouy, Saïd El Alaoui Ouatik, Bernard Espinasse
      Abstract: Journal of Information Science, Ahead of Print.
      Text representation is a fundamental cornerstone that impacts the effectiveness of several text summarization methods. Transfer learning using pre-trained word embedding models has shown promising results. However, most of these representations do not consider the order of, and the semantic relationships between, words in a sentence, and thus they do not carry the meaning of a full sentence. To overcome this issue, the current study proposes an unsupervised method for extractive multi-document summarization based on transfer learning from the BERT sentence embedding model. Moreover, to improve sentence representation learning, we fine-tune the BERT model on supervised intermediate tasks from the GLUE benchmark datasets, using single-task and multi-task fine-tuning methods. Experiments are performed on the standard DUC’2002–2004 datasets. The obtained results show that our method significantly outperforms several baseline methods and achieves comparable, and sometimes better, performance than recent state-of-the-art deep learning–based methods. Furthermore, the results show that fine-tuning BERT using multi-task learning considerably improves performance.
      Citation: Journal of Information Science
      PubDate: 2021-02-16T04:45:59Z
      DOI: 10.1177/0165551521990616
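A common unsupervised extractive step of the kind the abstract describes is to score each sentence by its similarity to the document centroid in embedding space. A stdlib sketch, assuming sentence embeddings are already computed (in the paper they come from a fine-tuned BERT model; the scoring rule here is a generic centroid method, not necessarily the paper's exact one):

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_sentences(sentence_embeddings, k=2):
    """Score each sentence embedding by cosine similarity to the document
    centroid and return the indices of the top-k sentences."""
    n = len(sentence_embeddings)
    centroid = [sum(col) / n for col in zip(*sentence_embeddings)]
    ranked = sorted(range(n),
                    key=lambda i: cosine(sentence_embeddings[i], centroid),
                    reverse=True)
    return ranked[:k]
```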
       
  • Not just for the money? An examination of the motives behind
           physicians’ sharing of paid health information

      Authors: Yulin Yang, Xuekun Zhu, Ruidi Song, Xiaofei Zhang, Feng Guo
      Abstract: Journal of Information Science, Ahead of Print.
      Online platforms make it possible for physicians to share online information with the public; however, few studies have explored the underlying mechanism of physicians’ sharing of paid health information. Drawing on motivation theory, this study developed a theoretical framework to explore the effects of extrinsic motivation, enjoyment and professional motivation on the sharing of paid information, as well as the contingent role of income ratio (online to offline) and online reputation. The model was tested with both objective and subjective data, comprising responses from 298 physicians. The results show that extrinsic motivation, enjoyment and professional motivation play significant roles in inducing physicians to share paid information. Furthermore, income ratio can moderate the effects of these motives on paid information sharing. Moreover, professional motivation is more effective in certain situations (a low income ratio or a high online reputation). This study contributes to the literature on knowledge sharing, online health behaviour and motivation theory, and provides implications for practitioners.
      Citation: Journal of Information Science
      PubDate: 2021-02-15T06:30:02Z
      DOI: 10.1177/0165551521991029
       
  • Important citations identification by exploiting generative model into
           discriminative model

      Authors: Xin An, Xin Sun, Shuo Xu, Liyuan Hao, Jinghong Li
      Abstract: Journal of Information Science, Ahead of Print.
      Although citations between scientific documents are deemed a vehicle for the dissemination, inheritance and development of scientific knowledge, not all citations are equal. A plethora of taxonomies and machine-learning models have been implemented to tackle the task of citation function and importance classification from a qualitative perspective. Inspired by the success of kernel functions derived from generative models in promoting the performance of the support vector machine (SVM) model, this work exploits the potential of combining generative and discriminative models for the task of citation importance classification. In more detail, generative features are generated from a topic model, the citation influence model (CIM), and then fed, together with 13 traditional features, to two discriminative traditional machine-learning models, SVM and random forest (RF), and a deep learning model, the convolutional neural network (CNN), to identify important citations. Extensive experiments are performed on two data sets with different characteristics. The three models perform better on the data set from a single discipline. It is very possible that the patterns of important citations vary across fields, which prevents machine-learning models from effectively learning discriminative patterns from publications spanning multiple domains. The RF classifier outperforms the SVM classifier, which accords with many prior studies. However, the CNN model does not achieve the desired performance, owing to the small-scale data set. Furthermore, our CIM-based features further improve the performance of identifying important citations.
      Citation: Journal of Information Science
      PubDate: 2021-02-08T06:28:00Z
      DOI: 10.1177/0165551521991034
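The pipeline above feeds generative (topic-model) features alongside traditional features into a discriminative classifier. A toy sketch with a plain gradient-descent logistic regression standing in for the paper's SVM/RF/CNN classifiers (the feature vectors and labels below are invented for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Stochastic-gradient logistic regression over feature vectors that
    concatenate generative (CIM topic) features with hand-crafted
    citation features."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Hypothetical vectors: [CIM topic weight, cue-word count] per citation.
X = [[0.9, 3], [0.8, 2], [0.1, 0], [0.2, 1]]
y = [1, 1, 0, 0]   # 1 = important citation
w, b = train_logreg(X, y)

def predict(x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5
```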
       
  • Informational features of WhatsApp in everyday life in Madrid: An
           exploratory study

      Authors: Juan-Antonio Martínez-Comeche, Ian Ruthven
      Abstract: Journal of Information Science, Ahead of Print.
      WhatsApp is one of the most used social media tools, but little is known about its use for everyday purposes. In this study, the informational features of WhatsApp in everyday life in Madrid are analysed through 30 semi-structured interviews, resulting in an informational typology of the messages, a description of the informational purposes of WhatsApp use and descriptions of the social use of WhatsApp. We conclude that WhatsApp allows us to deepen our understanding of the informational habits of people in everyday life.
      Citation: Journal of Information Science
      PubDate: 2021-02-08T06:22:40Z
      DOI: 10.1177/0165551521990612
       
  • A generic metamodel for data extraction and generic ontology population

      Authors: Yohann Chasseray, Anne-Marie Barthe-Delanoë, Stéphane Négny, Jean-Marc Le Lann
      Abstract: Journal of Information Science, Ahead of Print.
      As the next step in the development of intelligent computing systems is the addition of human expertise and knowledge, it is a priority to build strong, computable and well-documented knowledge bases. Ontologies partially respond to this challenge by providing formalisms for knowledge representation. However, one major remaining task is the population of these ontologies in concrete applications. Based on Model-Driven Engineering principles, a generic metamodel for the extraction of heterogeneous data is presented in this article. The metamodel has been designed with two objectives, namely (1) the need for genericity regarding the source of collected pieces of knowledge and (2) the intent to stick to a structure close to an ontological structure. In addition, an example of instantiating the metamodel for textual data in the chemistry domain is given, along with an insight into how this metamodel could be integrated into a larger automated, domain-independent ontology population framework.
      Citation: Journal of Information Science
      PubDate: 2021-02-04T05:14:11Z
      DOI: 10.1177/0165551521989641
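The two design objectives above, source-agnostic extraction and a structure close to an ontology, can be rendered in miniature with dataclasses. All class and field names here are illustrative inventions, not the paper's actual metamodel:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Concept:
    label: str

@dataclass
class Instance:
    """An extracted piece of data attached to an ontological concept,
    with provenance kept so the source stays interchangeable."""
    concept: Concept
    value: str
    provenance: str = "unknown source"

@dataclass
class ExtractionModel:
    """A toy metamodel: concepts plus populated instances, close in
    shape to an ontology so population is a direct mapping."""
    concepts: List[Concept] = field(default_factory=list)
    instances: List[Instance] = field(default_factory=list)

    def populate(self, concept_label: str, value: str, provenance: str):
        concept: Optional[Concept] = next(
            (c for c in self.concepts if c.label == concept_label), None)
        if concept is None:                 # create concepts on demand
            concept = Concept(concept_label)
            self.concepts.append(concept)
        self.instances.append(Instance(concept, value, provenance))
```

Usage: `ExtractionModel().populate("Solvent", "ethanol", "chem-text-01")` would record one instance under a newly created `Solvent` concept, regardless of whether the source was text, a table or a database.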
       
  • Detection of conspiracy propagators using psycho-linguistic
           characteristics

      Authors: Anastasia Giachanou, Bilal Ghanem, Paolo Rosso
      Abstract: Journal of Information Science, Ahead of Print.
      The rise of social media has offered a fast and easy way for the propagation of conspiracy theories and other types of disinformation. Despite the research attention that it has received, fake news detection remains an open problem, and users keep sharing articles that contain false statements but which they consider real. In this article, we focus on the role of users in the propagation of conspiracy theories, a specific type of disinformation. First, we compare the profile and psycho-linguistic patterns of online users who tend to propagate posts that support conspiracy theories with those of users who propagate posts that refute them. To this end, we perform a comparative analysis over various profile, psychological and linguistic characteristics, using the social media texts of users who share posts about conspiracy theories. Then, we compare the effectiveness of those characteristics for predicting whether a user is a conspiracy propagator or not. In addition, we propose ConspiDetector, a model that is based on a convolutional neural network (CNN) and combines word embeddings with psycho-linguistic characteristics extracted from users’ tweets to detect conspiracy propagators. The results show that ConspiDetector can improve the performance in detecting conspiracy propagators by 8.82% compared with the CNN baseline in terms of the F1-metric.
      Citation: Journal of Information Science
      PubDate: 2021-01-28T06:31:58Z
      DOI: 10.1177/0165551520985486
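A toy extractor for psycho-linguistic characteristics of the kind that could feed a model like ConspiDetector alongside word embeddings. The three features below (first-person pronoun rate, question marks, all-caps rate) are illustrative choices; the paper's actual characteristics come from dedicated lexicons:

```python
import re

FIRST_PERSON = {"i", "me", "my", "we", "our", "us"}

def psycho_features(text):
    """Return a small dict of surface psycho-linguistic features."""
    tokens = re.findall(r"[A-Za-z']+", text)
    n = max(len(tokens), 1)   # avoid division by zero on empty text
    return {
        "first_person_rate": sum(t.lower() in FIRST_PERSON for t in tokens) / n,
        "question_marks": text.count("?"),
        "all_caps_rate": sum(t.isupper() and len(t) > 1 for t in tokens) / n,
    }
```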
       
  • Impact of COVID-19 on search in an organisation

      Authors: Paul H Cleverley, Fionnuala Cousins, Simon Burnett
      Abstract: Journal of Information Science, Ahead of Print.
      COVID-19 has created unprecedented organisational challenges, yet no study has examined the impact on information search. A case study in a knowledge-intensive organisation was undertaken on 2.5 million search queries during the pandemic. A surge of unique users and COVID-19 search queries in March 2020 may equate to ‘peak uncertainty and activity’, demonstrating the importance of corporate search engines in times of crisis. Search volumes dropped 24% after lockdowns; an ‘L-shaped’ recovery may be a surrogate for business activity. COVID-19 search queries transitioned from awareness, to impact, strategy, response and ways of working that may influence future search design. Low click through rates imply some information needs were not met and searches on mental health increased. In extreme situations (i.e. a pandemic), companies may need to move faster, monitoring and exploiting their enterprise search logs in real time as these reflect uncertainty and anxiety that may exist in the enterprise.
      Citation: Journal of Information Science
      PubDate: 2021-01-22T07:14:34Z
      DOI: 10.1177/0165551521989531
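The study above rests on aggregating search logs into monthly query volumes and click-through rates (clicks / queries). A minimal sketch over a simplified log (the log fields and entries are hypothetical, not the organisation's data):

```python
from collections import defaultdict

log = [
    {"month": "2020-02", "query": "travel policy", "clicked": True},
    {"month": "2020-03", "query": "covid-19", "clicked": False},
    {"month": "2020-03", "query": "remote working", "clicked": True},
    {"month": "2020-03", "query": "covid-19", "clicked": False},
]

def monthly_stats(entries):
    """Aggregate query volume and click-through rate per month."""
    stats = defaultdict(lambda: {"queries": 0, "clicks": 0})
    for e in entries:
        s = stats[e["month"]]
        s["queries"] += 1
        s["clicks"] += e["clicked"]
    return {m: {**s, "ctr": s["clicks"] / s["queries"]}
            for m, s in stats.items()}
```

A low `ctr` in such an aggregation is exactly the signal the abstract reads as unmet information needs.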
       
  • A study of Turkish emotion classification with pretrained language models

      Authors: Alaettin Uçan, Murat Dörterler, Ebru Akçapınar Sezer
      Abstract: Journal of Information Science, Ahead of Print.
      Emotion classification is a research field that aims to detect the emotions in a text using machine learning methods. In traditional machine learning (TML) methods, feature engineering processes cause the loss of some meaningful information, and classification performance is negatively affected. In addition, the success of modelling using deep learning (DL) approaches depends on the sample size. More samples are needed for Turkish due to the unique characteristics of the language. However, emotion classification data sets in Turkish are quite limited. In this study, the pretrained language model approach was used to create a stronger emotion classification model for Turkish. Well-known pretrained language models were fine-tuned for this purpose. The performances of these fine-tuned models for Turkish emotion classification were comprehensively compared with the performances of TML and DL methods in experimental studies. The proposed approach provides state-of-the-art performance for Turkish emotion classification.
      Citation: Journal of Information Science
      PubDate: 2021-01-13T05:12:31Z
      DOI: 10.1177/0165551520985507
       
  • Embodying algorithms, enactive artificial intelligence and the extended
           cognition: You can see as much as you know about algorithm

      Authors: Donghee Shin
      Abstract: Journal of Information Science, Ahead of Print.
      The recent proliferation of artificial intelligence (AI) gives rise to questions on how users interact with AI services and how algorithms embody the values of users. Despite the surging popularity of AI, how users evaluate algorithms, how people perceive algorithmic decisions and how they relate to algorithmic functions remain largely unexplored. Invoking the idea of embodied cognition, we characterize core constructs of algorithms that drive the value of embodiment and conceptualize these factors in reference to trust by examining how they influence the user experience of personalized recommendation algorithms. The findings elucidate the embodied cognitive processes involved in reasoning about algorithmic characteristics – fairness, accountability, transparency and explainability – with regard to their fundamental linkages with trust and ensuing behaviors. Users follow a dual-process model, whereby a sense of trust is built on a combination of normative values and performance-related qualities of algorithms. Embodied algorithmic characteristics are significantly linked to trust and performance expectancy. Heuristic and systematic processes operating through embodied cognition provide a concise guide to the conceptualization of AI experiences and interaction. The identified user cognitive processes provide information on a user’s cognitive functioning and patterns of behavior, as well as a basis for subsequent metacognitive processes.
      Citation: Journal of Information Science
      PubDate: 2021-01-13T05:08:51Z
      DOI: 10.1177/0165551520985495
       
  • Proposing an information value chain to improve information services to
           disabled library patrons using assistive technologies

      Authors: Devendra Potnis, Kevin Mallary
      Abstract: Journal of Information Science, Ahead of Print.
      Information services offered by academic libraries increasingly rely on assistive technologies (AT) to facilitate disabled patrons’ retrieval and use of information for learning and teaching. However, disabled patrons’ access to AT might not always lead to their use, resulting in the underutilization of information services offered by academic libraries. We adopt an inward-looking, service innovation perspective to improve information services for disabled patrons using AT. The open coding of qualitative responses collected from administrators and librarians in 186 academic libraries at public universities in the United States reveals 10 mechanisms (i.e. modified work practices), which involve searching, compiling, mixing, framing, sharing or reusing information, and learning from it. Based on this information-centric reorganisation of work practices, we propose an ‘information value chain’, analogous to Porter’s value chain, for improving information services to disabled patrons using AT in academic libraries, which is the major theoretical contribution of our study.
      Citation: Journal of Information Science
      PubDate: 2021-01-13T05:06:12Z
      DOI: 10.1177/0165551520984719
       
  • An exploratory study of the all-author bibliographic coupling analysis:
           Taking scientometrics for example

      Authors: Song Yanhui, Wu Lijuan, Chen Shiji
      Abstract: Journal of Information Science, Ahead of Print.
      All-author bibliographic coupling analysis (AABCA) takes all authors of an article into account when constructing author coupling relationships. Taking scientometrics as an example, this article takes papers from 2010 to 2019 as a data sample and divides them into two 5-year periods to discuss the performance of AABCA in discovering potential academic communities and the intellectual structure of the discipline. It is found that when all authors of a paper are considered, the relationships between bibliographically coupled authors present a certain regularity, and bibliographic coupling is likely to be passed between different pairs of authors. With the transitivity of the coupling relationship, AABCA can effectively identify potential academic groups in the discipline and more fully reflect the degree of cooperation among authors. AABCA is an effective method for revealing the intellectual structure of the field of scientometrics, and it makes it easier to find small research topics with weak correlation. In addition, AABCA is also an ideal way to explore authors’ research interests over time.
      Citation: Journal of Information Science
      PubDate: 2021-01-04T04:29:08Z
      DOI: 10.1177/0165551520981293
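The defining move of AABCA described above is that every author of a paper, not just the first, inherits its reference list, and two authors are coupled by the number of references their papers share. A toy sketch (paper records, author names and reference IDs are made up):

```python
from itertools import combinations
from collections import Counter

papers = [
    {"authors": ["A", "B"], "refs": {"r1", "r2", "r3"}},
    {"authors": ["C"], "refs": {"r2", "r3", "r4"}},
    {"authors": ["B", "D"], "refs": {"r5"}},
]

def all_author_coupling(records):
    """Coupling strength between every author pair, counting shared
    references; each paper's references are credited to ALL its authors."""
    refs_by_author = {}
    for p in records:
        for a in p["authors"]:
            refs_by_author.setdefault(a, set()).update(p["refs"])
    strength = Counter()
    for a1, a2 in combinations(sorted(refs_by_author), 2):
        shared = refs_by_author[a1] & refs_by_author[a2]
        if shared:
            strength[(a1, a2)] = len(shared)
    return strength
```

In this example author C is coupled with both A and B even though they never co-authored, which is the kind of latent community link the abstract highlights.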
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 


JournalTOCs © 2009-