Journal of Information Science
  [SJR: 1.008]   [H-I: 40]
   Hybrid journal (it can contain Open Access articles)
   ISSN (Print) 0165-5515 - ISSN (Online) 1741-6485
   Published by Sage Publications  [838 journals]
  • Current state of Linked Data in digital libraries
    • Authors: Hallo, M.; Lujan-Mora, S.; Mate, A.; Trujillo, J.
      Pages: 117 - 127
      Abstract: The Semantic Web encourages institutions, including libraries, to collect, link and share their data across the Web so that machines can process it more easily and return better query results. Linked Data technologies enable us to connect related data on the Web using the principles outlined by Tim Berners-Lee in 2006. Digital libraries have great potential to exchange and disseminate data linked to external resources using Linked Data. This paper presents a study of the current uses of Linked Data in digital libraries, including the most important implementations around the world. The study focuses on selected vocabularies and ontologies, and on the benefits and problems encountered in implementing Linked Data in digital libraries. It also identifies and discusses specific challenges that digital libraries face, and suggests ways in which libraries can contribute to the Semantic Web. The study uses an adapted literature-review methodology to find the data needed to answer its research questions. It is based on the information found on the library websites recommended by the W3C Library Linked Data Incubator Group in 2011, and on scientific publications from Google Scholar, Scopus, ACM and Springer from the last 5 years. The libraries selected for the study are the National Library of France, the Europeana Library, the Library of Congress of the USA, the British Library and the National Library of Spain. In this paper, we outline the best practices found in each case and identify gaps and future trends.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515594729
      Issue No: Vol. 42, No. 2 (2016)
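As an illustration of the Linked Data principles the abstract refers to (HTTP URIs as names, RDF descriptions, links to URIs in other datasets), here is a minimal sketch that serialises a hypothetical library record as N-Triples using only the standard library. The record, person and authority URIs under example.org are placeholders, and the helper names (`uri`, `literal`, `to_ntriples`) are hypothetical; the predicate URIs are the real Dublin Core Terms and OWL `sameAs` terms. This is not taken from any of the libraries surveyed.

```python
"""Sketch: serialise a hypothetical library record as N-Triples.

Record URIs under example.org are placeholders; dcterms:title,
dcterms:creator and owl:sameAs are real vocabulary terms.
"""

DCTERMS = "http://purl.org/dc/terms/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Render (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


BOOK = "http://example.org/library/book/42"         # hypothetical local record
AUTHOR = "http://example.org/library/person/7"      # hypothetical local person
EXTERNAL = "http://example.org/authority/person/7"  # hypothetical external authority

triples = [
    (uri(BOOK), uri(DCTERMS + "title"), literal("A Sample Title")),
    (uri(BOOK), uri(DCTERMS + "creator"), uri(AUTHOR)),
    # The outbound link into another dataset is what makes the record "linked".
    (uri(AUTHOR), uri(OWL_SAME_AS), uri(EXTERNAL)),
]

print(to_ntriples(triples))
```

In practice a library would point the `owl:sameAs` link at a real authority file rather than a placeholder, and would typically use an RDF library instead of hand-built strings.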
  • Common neighbour similarity-based approach to support intimacy measurement in social networks
    • Authors: Seol, K.; Kim, J.-D.; Baik, D.-K.
      Pages: 128 - 137
      Abstract: A large amount of social data is generated every day as the Internet becomes more pervasive and mobile devices more ubiquitous. Accordingly, Internet users often have difficulty finding the content they want, which has made personalized services that provide user-customized content popular. Intimacy between users of social network services can serve as a foundational technology for personalized services. In this paper, an intimacy measurement method for social networking services based on common neighbour similarity is proposed. The proposed method uses the link relationships between users to measure intimacy and can be applied to general users. Further, it simplifies data collection by relying on publicly available data. To evaluate the proposed method experimentally, a large amount of user data was collected from Twitter. In addition, various statistical datasets are presented, and regression analyses were conducted on graphs extracted from the collected user data to interpret the meaning of the intimacy index measured by the proposed method for existing social networking services.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515589230
      Issue No: Vol. 42, No. 2 (2016)
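The abstract does not give the exact intimacy formula, so the following is only a sketch of the underlying idea of common neighbour similarity on an undirected follower graph: the raw count of shared neighbours and its Jaccard-normalised form. The toy edges and the choice of Jaccard normalisation are assumptions for illustration, not the authors' intimacy index.

```python
"""Sketch of common-neighbour similarity on an undirected graph.

The edge list and the Jaccard normalisation are illustrative assumptions;
they are not the paper's intimacy index.
"""
from collections import defaultdict


def build_graph(edges):
    """Return an adjacency map {node: set(neighbours)} for undirected edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbours(adj, u, v):
    """Number of nodes linked to both u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
         ("eve", "bob"), ("eve", "cat"), ("eve", "fay")]
g = build_graph(edges)
print(common_neighbours(g, "ann", "eve"), jaccard(g, "ann", "eve"))
```

Here `ann` and `eve` share `bob` and `cat` out of four distinct neighbours, so the count is 2 and the Jaccard score is 0.5. Normalising matters because users with very many links would otherwise look "intimate" with almost everyone.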
  • An accurate toponym-matching measure based on approximate string matching
    • Pages: 138 - 149
      Abstract: Approximate string matching (ASM) is a challenging problem that aims to match different string expressions representing the same object. In this paper, detailed experimental studies were conducted on toponym matching, a new domain in which ASM can be applied, with the aim of creating a single string-matching measure that can perform toponym matching regardless of language. For this purpose, an ASM measure called DAS, which comprises name similarity, word similarity and sentence similarity phases, was created. In the experiments, the retrieval performance and system accuracy of DAS were much better than those of five other well-known measures compared on toponym test datasets. In addition, DAS achieved the best mean average precision in six languages, and precision/recall graphs confirm this result.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515590097
      Issue No: Vol. 42, No. 2 (2016)
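The abstract does not specify how DAS computes its name-similarity phase, so the sketch below swaps in a standard baseline instead: a length-normalised Levenshtein similarity after case folding, used to pick the closest entry from a small gazetteer. The function names and the sample gazetteer are hypothetical, and this is not the DAS measure itself.

```python
"""Sketch: normalised edit-distance similarity for toponym matching.

A generic ASM baseline, not the DAS measure; the gazetteer is invented.
"""


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a                              # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                 # delete ca
                current[j - 1] + 1,              # insert cb
                previous[j - 1] + (ca != cb),    # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Edit-distance similarity in [0, 1] after Unicode case folding."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(query: str, gazetteer):
    """Gazetteer entry most similar to the query (first one wins on ties)."""
    return max(gazetteer, key=lambda name: name_similarity(query, name))


print(best_match("Munchen", ["Munich", "Münster", "München"]))
```

A single character-level score like this ignores word order and multi-word names, which is presumably why DAS adds word- and sentence-level phases.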
  • Classifier and feature set ensembles for web page classification
    • Authors: Onan, A.
      Pages: 150 - 165
      Abstract: Web page classification is an important research direction in web mining. The abundance of data available on the web makes it essential to develop efficient and robust models for web mining tasks. Web page classification is the process of assigning a web page to a particular predefined category based on labelled data. It supports several other web mining tasks, such as focused web crawling, web link analysis and contextual advertising. Machine learning and data mining methods have been successfully applied to several web mining tasks, including web page classification. Multiple classifier systems are a promising research direction in machine learning; they combine several classifiers, differentiating base classifiers and/or dataset distributions, so that more robust classification models can be built. This paper presents a comparative analysis of four feature selection methods (correlation-, consistency-, information gain- and chi-square-based feature selection) and four ensemble learning methods (Boosting, Bagging, Dagging and Random Subspace) based on four base learners (naive Bayes, the K-nearest neighbour algorithm, the C4.5 algorithm and the FURIA algorithm). The article examines the predictive performance of ensemble methods for web page classification. The experimental results indicate that feature selection and ensemble learning can enhance the predictive performance of classifiers in web page classification. For the DMOZ-50 dataset, the highest average predictive performance (88.1%) is obtained by combining consistency-based feature selection with AdaBoost and naive Bayes, which is a promising result for web page classification. The experimental results also indicate that the Bagging and Random Subspace ensemble methods and the correlation-based and consistency-based feature selection methods obtain better results in terms of accuracy.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515591724
      Issue No: Vol. 42, No. 2 (2016)
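One of the four feature selection methods compared is chi-square-based selection. The sketch below scores each term against one target category using the usual 2x2 contingency table (term present or absent, document in or out of the category) and keeps the top k terms. The toy "web pages" (as token sets) and labels are invented, and this shows only the filtering step, not the ensembles evaluated in the paper.

```python
"""Sketch: chi-square term scoring for one target category.

Toy documents and labels are invented for illustration only.
"""


def chi_square(docs, labels, term, target):
    """Chi-square statistic of the 2x2 table (term present?, label == target?)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_class = term in tokens, label == target
        if has_term and in_class:
            a += 1          # term present, in category
        elif has_term:
            b += 1          # term present, other category
        elif in_class:
            c += 1          # term absent, in category
        else:
            d += 1          # term absent, other category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, target, k):
    """Top-k terms by chi-square score (stable sort keeps alphabetical ties)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: chi_square(docs, labels, t, target), reverse=True)
    return ranked[:k]


docs = [{"goal", "match", "team"}, {"team", "score", "league"},
        {"stock", "market", "price"}, {"market", "bank", "rate"}]
labels = ["sports", "sports", "finance", "finance"]
print(select_features(docs, labels, "sports", 2))
```

Both `team` and `market` score 4.0 for the sports category: chi-square rewards strong negative association as well as positive association, which is usually acceptable for feature selection because either kind of term helps separate the classes.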
  • Semantic community detection using label propagation algorithm
    • Authors: Kianian, S.; Khayyambashi, M. R.; Movahhedinia, N.
      Pages: 166 - 178
      Abstract: Detecting large communities in online social networks is the subject of a wide range of studies that explore network sub-structure. Most existing studies focus on network topology and place no emphasis on active communities in large online social networks and social portals, such as forums, that are not based on network topology. Here, a new semantic community detection approach is proposed that focuses on user attributes instead of network topology. In the proposed approach, a network of user activities is established and weighted using semantic data. Furthermore, a consistent extended label propagation algorithm is presented. In this way, semantic representations of active communities are refined and labelled with user-generated tags available in Web 2.0. The results show that the proposed semantic algorithm significantly improves modularity compared with three previously proposed algorithms.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515592599
      Issue No: Vol. 42, No. 2 (2016)
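For readers unfamiliar with label propagation, here is a minimal sketch of the classic weighted, asynchronous version: every node starts in its own community and repeatedly adopts the label with the largest total edge weight among its neighbours until nothing changes. This is the basic algorithm, not the paper's consistent extended variant; the edge weights are simply given here (in the paper they come from semantic data), and breaking ties by the smallest label is an assumption made for determinism.

```python
"""Sketch of weighted asynchronous label propagation for community detection.

Classic algorithm only, not the paper's extended semantic variant; weights
are given directly and ties go to the smallest label for determinism.
"""
import random
from collections import defaultdict


def label_propagation(weighted_edges, max_iter=100, seed=0):
    neighbours = defaultdict(dict)
    for u, v, w in weighted_edges:
        neighbours[u][v] = w
        neighbours[v][u] = w
    labels = {node: node for node in neighbours}    # each node starts alone
    rng = random.Random(seed)
    nodes = sorted(neighbours)
    for _ in range(max_iter):
        rng.shuffle(nodes)                          # random visiting order each pass
        changed = False
        for node in nodes:
            score = defaultdict(float)
            for nb, w in neighbours[node].items():
                score[labels[nb]] += w              # total weight per neighbour label
            best = max(score.values())
            new_label = min(lab for lab, s in score.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:                             # converged
            break
    return labels


# Two triangles joined by one weak (0.1) edge.
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1), ("c", "d", 0.1)]
print(label_propagation(edges))
```

The weak bridge never outweighs the intra-triangle edges, so the two triangles end up as separate communities; weighting by semantic similarity plays the same role in deciding which ties matter.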
  • DSont: DSpace to ontology transformation
    • Authors: Farid, H.; Khan, S.; Javed, M. Y.
      Pages: 179 - 199
      Abstract: The Semantic Web facilitates the effective sharing and reuse of existing information. Institutional repositories (IRs) are built to organize and manage the intellectual output of an institute. They generally use relational databases to maintain metadata for digital documents. The focus of this research is to share the information in an existing IR with other information systems in order to discover common interests. To process the data in a semantic context, a relational database needs to be transformed into an ontology. Existing relational-to-ontology transformation systems produce odd results when applied to an IR database, because its schema is a meta-schema. The proposed system first creates an intermediate database with a normalized schema for the institute's data model preserved in the IR database, and then transforms it into an ontology. Finally, semantic correspondence is established between entities of the source and target ontologies in order to integrate them. The system has been implemented and evaluated for correct and lossless transformation. The results demonstrate that the transformation is correct and that the information is preserved.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515591406
      Issue No: Vol. 42, No. 2 (2016)
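The key difficulty the abstract describes is that an IR database stores metadata in a meta-schema (one row per item, field and value) rather than one column per attribute. The sketch below illustrates the general two-step idea under heavy assumptions: pivot such rows into per-item records (a stand-in for the intermediate normalised database), emit subject-property-value triples, and invert the mapping to check it is lossless. The row layout, field names and example.org namespace are hypothetical and do not reproduce DSont or the actual DSpace tables.

```python
"""Sketch: pivot meta-schema rows into per-item records, then emit triples.

Row layout, field names and the example.org namespace are hypothetical;
they mimic the shape of an IR metadata table, not DSont itself.
"""
from collections import defaultdict

BASE = "http://example.org/ir#"                              # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# (item_id, field, value): one row per metadata value, as in a meta-schema.
rows = [
    (1, "dc.title", "Linked data in libraries"),
    (1, "dc.contributor.author", "Doe, J."),
    (1, "dc.contributor.author", "Roe, R."),
    (2, "dc.title", "Thread identification"),
]


def pivot(rows):
    """Group values by item and field (a stand-in for the normalised view)."""
    records = defaultdict(lambda: defaultdict(list))
    for item_id, field, value in rows:
        records[item_id][field].append(value)
    return records


def to_triples(records):
    """Each item becomes an individual; each field becomes a property."""
    triples = []
    for item_id in sorted(records):
        subject = f"{BASE}item{item_id}"
        triples.append((subject, RDF_TYPE, BASE + "Document"))
        for field, values in sorted(records[item_id].items()):
            prop = BASE + field.replace(".", "_")   # assumes fields contain no "_"
            triples.extend((subject, prop, value) for value in values)
    return triples


def back_to_rows(triples):
    """Invert to_triples (skipping rdf:type) to check nothing was lost."""
    out = []
    for subject, prop, value in triples:
        if prop == RDF_TYPE:
            continue
        item_id = int(subject[len(BASE + "item"):])
        out.append((item_id, prop[len(BASE):].replace("_", "."), value))
    return out


triples = to_triples(pivot(rows))
print(len(triples), sorted(back_to_rows(triples)) == sorted(rows))
```

Mapping each distinct field value in the meta-schema to its own property is what a naive table-to-class transformation misses: it would instead produce a single generic "metadata value" class.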
  • Email thread identification using latent Dirichlet allocation and non-negative matrix factorization based clustering techniques
    • Authors: Sharaff, A.; Nagwani, N. K.
      Pages: 200 - 212
      Abstract: Email is the most popular and effective way of communicating over the internet. A number of applications are available today for email messaging on computers and mobile devices. Email messaging is constantly growing in popularity and, as a result, the numbers of sent and received emails are also increasing. It is very difficult for a user to remember emails and to relate newly arriving emails to previous communications on similar topics. Email threads provide a mechanism by which a user can obtain the sequence of emails belonging to a particular communication within a time frame, and they offer a number of benefits to users. In this work, two email thread identification algorithms based on a nested textual clustering approach are presented. The work proceeds in two stages. In the first stage, two popular text clustering approaches, latent Dirichlet allocation and non-negative matrix factorization, are applied to the email messages to form email clusters. In the second stage, clustering is performed again over the resulting email clusters to identify the email threads using threading features. Performance parameters such as accuracy, precision, recall and F-measure are evaluated for the presented thread identification algorithms.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515587854
      Issue No: Vol. 42, No. 2 (2016)
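The first-stage clustering can be illustrated with non-negative matrix factorization. The sketch below applies the standard Lee-Seung multiplicative updates (Frobenius loss) in plain Python to a tiny term-by-email count matrix and assigns each email to the factor with the largest weight in its column of H. The matrix, the number of factors, the iteration count and the argmax assignment rule are assumptions for illustration; LDA and the second, threading-feature stage are not shown.

```python
"""Sketch of the first (topic-clustering) stage only, using NMF.

Plain-Python Lee-Seung multiplicative updates; the matrix, k, iteration
count and argmax assignment are illustrative assumptions.
"""
import random


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Approximate non-negative v (m x n) by w (m x k) times h (k x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h


def cluster_emails(v, k):
    """Assign each email (column) to the factor with the largest weight."""
    _, h = nmf(v, k)
    return [max(range(k), key=lambda t: h[t][j]) for j in range(len(v[0]))]


# Rows: terms "meeting", "agenda", "invoice", "payment"; columns: four emails.
V = [[2, 1, 0, 0],
     [2, 1, 0, 0],
     [0, 0, 3, 1],
     [0, 0, 3, 1]]
print(cluster_emails(V, 2))
```

Real email corpora would use a sparse term matrix, TF-IDF weighting and a library implementation; the point here is only that the factor weights in H act as soft cluster memberships.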
  • Combining resources to improve unsupervised sentiment analysis at
    • Authors: Jimenez-Zafra, S. M.; Martin-Valdivia, M. T.; Martinez-Camara, E.; Urena-Lopez, L. A.
      Pages: 213 - 229
      Abstract: Every day, more companies are interested in users’ opinions about their products or services, and more users search for reviews on the web before purchasing a product. These users and companies are not satisfied with knowing the overall sentiment about a product; they want finer-grained knowledge of users’ opinions. For this reason, more and more researchers are working on aspect-level sentiment analysis. This paper describes an unsupervised approach to aspect-based sentiment analysis, which aims to identify the aspects of given target entities and the sentiment expressed towards each aspect. We evaluated several tasks, although the main novelty is probably in the classification of the aspects. We employ a lexicon-based method that combines different linguistic resources, and we conclude that combining several classifiers improves the classification significantly. In addition, a comparison with a supervised system is performed in order to determine the strengths and weaknesses of each.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515593686
      Issue No: Vol. 42, No. 2 (2016)
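A minimal sketch of the general idea of lexicon-based aspect polarity with several combined resources: each lexicon scores the words within a small window around the aspect term, and a majority vote across lexicons gives the final label. The three tiny lexicons, the two-token window and the "neutral unless a strict majority agrees" rule are hypothetical stand-ins, not the linguistic resources or combination scheme used in the paper.

```python
"""Sketch: lexicon-based aspect polarity with a majority vote across lexicons.

The three lexicons, the window size and the tie rule are hypothetical.
"""
from collections import Counter

LEXICONS = {
    "lex_a": {"great": 1, "tasty": 1, "slow": -1, "rude": -1},
    "lex_b": {"great": 1, "friendly": 1, "slow": -1, "cold": -1},
    "lex_c": {"tasty": 1, "friendly": 1, "rude": -1, "cold": -1},
}


def polarity(tokens, aspect, lexicon, window=2):
    """Sum lexicon scores of words within `window` tokens of each aspect mention."""
    score = 0
    for i, tok in enumerate(tokens):
        if tok == aspect:
            lo, hi = max(0, i - window), i + window + 1
            score += sum(lexicon.get(t, 0) for t in tokens[lo:hi])
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def vote(tokens, aspect):
    """Majority label across lexicons; neutral when no strict majority exists."""
    votes = Counter(polarity(tokens, aspect, lex) for lex in LEXICONS.values())
    label, count = votes.most_common(1)[0]
    return label if count > len(LEXICONS) / 2 else "neutral"


tokens = "the food was tasty but the service was slow".split()
print(vote(tokens, "food"), vote(tokens, "service"))
```

No single lexicon here covers every opinion word, which is the intuition behind combining resources: a vote can recover a label that one lexicon alone would call neutral.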
  • Norms of data sharing in biological sciences: The roles of metadata, data repository, and journal and funding requirements
    • Authors: Kim, Y.; Burns, C. S.
      Pages: 230 - 245
      Abstract: Institutional environments, comprising regulative pressures from funding agencies and journal publishers, and institutional resources, such as the availability of data repositories and metadata standards, are important determinants of scientists’ data-sharing norms, attitudes and behaviours. This research investigates how these factors influence biological scientists’ data-sharing norms, and how those norms influence their data-sharing behaviours as mediated by attitudes towards data sharing. The research model was developed by integrating institutional theory and the theory of planned behaviour. The proposed research model was validated with a total of 608 responses from a national survey conducted in the USA. Partial Least Squares (PLS) was employed to analyse the survey data. The results show that institutional pressures from funding agencies and journals, and the availability of data repositories and metadata standards, all have significant influences on data-sharing norms, which in turn significantly influence data-sharing behaviours, as mediated by attitudes towards data sharing.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515592098
      Issue No: Vol. 42, No. 2 (2016)
  • Semantically enhanced pseudo relevance feedback for Arabic information
    • Authors: Atwan, J.; Mohd, M.; Rashaideh, H.; Kanaan, G.
      Pages: 246 - 260
      Abstract: The conventional information retrieval (IR) framework consists of four primary phases: pre-processing, indexing, querying and retrieving results. Some phases of the current Arabic IR (AIR) framework have several drawbacks. This research aims to enhance AIR by improving the processes in a conventional IR framework. We introduce an enhanced stop-word list at the pre-processing level and investigate several Arabic stemmers. In addition, the Arabic WordNet was used at the corpus and query expansion levels. We also adopted semantic information for pseudo relevance feedback. The enhanced AIR framework was built and evaluated using TREC 2001 data. Using the Arabic WordNet to build a semantic relationship between query and corpus at two levels, namely the corpus and query levels, is a new technique. Compared with the baseline framework, the enhanced AIR framework improved mean average precision by 49% and recall by 7.3%.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515594722
      Issue No: Vol. 42, No. 2 (2016)
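Pseudo relevance feedback assumes that the top-ranked documents for an initial query are relevant and expands the query with terms drawn from them. The sketch below shows only this basic loop: a bare term-frequency overlap score, the top k documents treated as relevant, and expansion terms chosen by raw frequency. The scoring function, stop-word list and English toy corpus are assumptions; the paper's semantic (Arabic WordNet) step and its stemmers are not modelled.

```python
"""Sketch of basic pseudo relevance feedback (PRF) query expansion.

Scoring, stop-word list and corpus are illustrative assumptions; the
paper's Arabic WordNet step and stemmers are not modelled.
"""
from collections import Counter

STOP_WORDS = {"the", "of", "and", "in", "a", "for"}   # hypothetical stop-word list


def tokenize(text):
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def score(query_terms, doc_terms):
    """Sum of query-term frequencies in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def retrieve(query_terms, docs):
    """Document indices ranked by score (ties keep corpus order)."""
    return sorted(range(len(docs)), key=lambda i: score(query_terms, docs[i]), reverse=True)


def expand(query, corpus, k=2, n_terms=2):
    """Append the n_terms most frequent non-query terms from the top-k documents."""
    docs = [tokenize(d) for d in corpus]
    q = tokenize(query)
    top = retrieve(q, docs)[:k]                       # assumed relevant
    feedback = Counter(t for i in top for t in docs[i] if t not in q)
    return q + [t for t, _ in feedback.most_common(n_terms)]


corpus = [
    "arabic stemming improves retrieval of arabic text",
    "light stemming for arabic retrieval",
    "football results of the league",
]
print(expand("arabic stemming", corpus, k=2, n_terms=1))
```

The expanded query is then run again against the collection. Raw frequency is the crudest selection rule; weighting schemes such as Rocchio, or semantic filtering like the paper's, aim to avoid adding topic-drifting terms.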
  • WaPUPS: Web access pattern extraction under user-defined pattern scoring
    • Authors: Alkan, O. K.; Karagoz, P.
      Pages: 261 - 273
      Abstract: Extracting patterns from web usage data facilitates better web personalization and web structure readjustment. Classical frequency-based sequence mining techniques consider only the binary occurrence of web pages in sessions, which results in the extraction of many patterns that are not informative for users. To handle this problem, utility-based mining techniques have emerged, which assign non-binary values, called utilities, to web pages and calculate pattern utilities accordingly. However, the utility of a pattern cannot always be determined from the utilities of its individual web pages. For instance, the number of distinct users who traverse an extracted pattern, or demographic data about those users, may affect the value of the pattern, yet such information cannot be calculated directly from web page utilities. In this paper, we present a new approach based on a user-defined scoring mechanism to extract patterns from web log data. The proposed approach can limit the size of the search space; therefore, it can extract patterns even from large and sparse datasets. The framework is hybrid in the sense that it combines clustering with a heuristic-based pattern extraction algorithm. Extensive experiments on real datasets show that the proposed solution effectively discovers patterns under user-defined evaluation.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515593495
      Issue No: Vol. 42, No. 2 (2016)
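To make the contrast between frequency-based and user-defined scoring concrete, the sketch below ranks candidate access patterns with a pluggable scoring function. One scorer counts sessions (the classical frequency view) and the other counts distinct users who traverse the pattern, one of the examples the abstract gives. Candidate generation is brute force over ordered page pairs, which stands in for the paper's clustering plus heuristic search; the session log and function names are hypothetical.

```python
"""Sketch: rank candidate web access patterns with a pluggable, user-defined score.

Brute-force candidate enumeration stands in for the paper's clustering plus
heuristic search; the sessions and both scorers are illustrative.
"""
from itertools import permutations

# (user_id, ordered pages visited in one session); hypothetical log data
sessions = [
    ("u1", ["home", "products", "cart"]),
    ("u1", ["home", "products"]),
    ("u2", ["home", "products", "cart"]),
    ("u3", ["home", "about"]),
]


def contains(pages, pattern):
    """True if the pattern's pages occur in this order in the session (gaps allowed)."""
    it = iter(pages)
    return all(page in it for page in pattern)   # `in` advances the iterator


def session_count(pattern, sessions):
    """Classic frequency score: number of sessions containing the pattern."""
    return sum(contains(pages, pattern) for _, pages in sessions)


def distinct_users(pattern, sessions):
    """User-defined score: number of different users who traversed the pattern."""
    return len({user for user, pages in sessions if contains(pages, pattern)})


def top_patterns(sessions, length, scorer, k):
    """Score every ordered tuple of distinct pages and keep the best k."""
    pages = sorted({p for _, s in sessions for p in s})
    ranked = sorted(permutations(pages, length),
                    key=lambda pat: scorer(pat, sessions), reverse=True)
    return ranked[:k]


print(top_patterns(sessions, 2, distinct_users, 3))
```

Under `session_count`, ("home", "products") scores 3 because user u1 visited it twice; under `distinct_users` it scores 2 and ties with ("home", "cart") and ("products", "cart"). Exhaustive enumeration grows combinatorially with pattern length, which is why the paper limits the search space.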
  • A novel algorithm for scalable k-nearest neighbour graph construction
    • Authors: Park, Y.; Hwang, H.; Lee, S.-g.
      Pages: 274 - 288
      Abstract: Finding the k-nearest neighbours of every node in a dataset is one of the most important data operations, with wide application in areas such as recommendation and information retrieval. However, a major challenge is that the execution time of existing approaches grows rapidly as the number of nodes or dimensions increases. In this paper, we present greedy filtering, an efficient and scalable algorithm for finding an approximate k-nearest neighbour graph. It selects a fixed number of nodes as candidates for every node by filtering out node pairs that do not have any matching dimensions with large values. Greedy filtering achieves consistent approximation accuracy across nodes in linear execution time. We also present a faster version of greedy filtering that uses inverted indices on the node prefixes. Through theoretical analysis, we show that greedy filtering is effective for datasets whose features have a Zipfian distribution, a characteristic observed in the majority of large datasets. We also conduct extensive comparative experiments against (a) three state-of-the-art algorithms and (b) three algorithms from related research domains. Our experimental results show that greedy filtering consistently outperforms the other algorithms on various types of high-dimensional datasets.
      PubDate: 2016-02-25T04:13:08-08:00
      DOI: 10.1177/0165551515594728
      Issue No: Vol. 42, No. 2 (2016)
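The filtering idea in the abstract (only compare node pairs that share a dimension with a large value, found through inverted indices on node prefixes) can be sketched as follows. As a simplifying assumption, every node uses a fixed-length prefix of its largest dimensions, whereas the paper chooses prefixes so that each node gets a fixed number of candidates; similarity is cosine over sparse dictionaries, and the toy vectors are invented.

```python
"""Simplified sketch of prefix-based candidate filtering for an approximate
k-nearest-neighbour graph, in the spirit of greedy filtering.

Assumption: a fixed prefix of each node's `prefix_len` largest dimensions,
not the paper's candidate-count-driven prefix selection.
"""
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {dimension: value}."""
    dot = sum(w * v.get(d, 0.0) for d, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def prefix(vec, prefix_len):
    """The node's largest-valued dimensions (ties broken by dimension name)."""
    return sorted(vec, key=lambda d: (-vec[d], d))[:prefix_len]


def knn_graph(vectors, k, prefix_len=2):
    index = defaultdict(set)            # dimension -> nodes having it in their prefix
    for node, vec in vectors.items():
        for d in prefix(vec, prefix_len):
            index[d].add(node)
    graph = {}
    for node, vec in vectors.items():
        candidates = set()
        for d in prefix(vec, prefix_len):
            candidates |= index[d]      # only pairs sharing a large dimension
        candidates.discard(node)
        ranked = sorted(candidates, key=lambda c: (-cosine(vec, vectors[c]), c))
        graph[node] = ranked[:k]        # exact similarity only among candidates
    return graph


vectors = {
    "a": {"x": 5, "y": 3, "z": 1},
    "b": {"x": 4, "y": 4},
    "c": {"z": 5, "w": 2},
    "d": {"z": 4, "w": 3, "x": 1},
}
print(knn_graph(vectors, 1))
```

Pairs such as `a` and `d`, which overlap only on small values, are never compared. This is where the savings come from, and also why the result is approximate: a true neighbour that shares no prefix dimension would be missed.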
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327

JournalTOCs © 2009-2015