Journal of Information Science
     Hybrid journal (it can contain Open Access articles)
     ISSN (Print) 0165-5515 - ISSN (Online) 1741-6485
     Published by Sage Publications  [738 journals]   [SJR: 1.199]   [H-I: 35]
  • Extracting term units and fact units from existing databases using the Knowledge Discovery Metamodel
    • Authors: Normantas, K.; Vasilecas, O.
      Pages: 413 - 425
      Abstract: The extraction of business vocabulary is one of the main tasks in discovering business knowledge implemented in a software system. In this paper we present a model-driven approach to the extraction of business vocabularies from databases of existing software systems. We describe a transformation framework for obtaining the Knowledge Discovery Metamodel based representation of data structure and define an algorithm for the extraction of candidates for business vocabulary entries (i.e. Term and Fact Units) from the representation. The extracted candidates may be further refined by business analysts and used for the identification of business scenarios and rules in software systems.
      PubDate: 2014-07-09T04:21:45-07:00
      DOI: 10.1177/0165551514526336
      Issue No: Vol. 40, No. 4 (2014)
  • Automatic image annotation using affective vocabularies: Attribute-based learning approach
    • Authors: Jeong, J.-W.; Lee, D.-H.
      Pages: 426 - 445
      Abstract: To improve image search results, understanding and exploiting the subjective aspects of an image is critical. However, how to effectively extract these subjective aspects (e.g. feeling, emotion, and so on) from an image is a challenging problem. In this paper, we propose a novel approach for predicting affective aspects, one of the most interesting subjective aspects, of concepts in images by learning the semantic attributes of the concept and mining the association between the attributes and affective aspects. The main idea of the proposed approach comes from the assumption that semantic attributes of a concept will influence the user’s affect towards the concept (e.g. an animal with the semantic attributes ‘small’, ‘furry’, ‘white’ can be associated with the affective term ‘cute’). Based on this assumption, we build a multi-layer affect learning framework that consists of (1) an attribute learning layer that predicts semantic attributes of a concept and (2) an affect learning layer that exploits the outputs from the attribute learning layer for predicting the affective aspects of the concept. Through the experimental results on the Animals with Attributes dataset, we show that the proposed approach outperforms traditional approaches by up to 25% in terms of precision and successfully predicts the affect of concepts in images according to different user preferences.
      PubDate: 2014-07-09T04:21:45-07:00
      DOI: 10.1177/0165551513501267
      Issue No: Vol. 40, No. 4 (2014)
  • Refining Kea++ automatic keyphrase assignment
    • Authors: Irfan, R.; Khan, S.; Qamar, A. M.; Bloodsworth, P. C.
      Pages: 446 - 459
      Abstract: Keyphrases facilitate finding the right information in digital sources. Keyphrase assignment is the alignment of documents or text with keyphrases of any standard taxonomy/classification system. Kea++ is an automatic keyphrase assignment tool using a machine learning-based technique. However, it does not effectively exploit the hierarchical relations that exist in its input taxonomy and returns noise in its results. The refinement methodology was designed as a top layer of Kea++ in order to fine tune its results. It was an initial step and focused on a single Computing domain. It was neither validated on multiple domains nor evaluated to determine whether the improvement in the results is significant or not. The aim of this task was to solidify the refinement methodology. The main contributions of this work are (a) to extend the methodology for multiple domains and (b) to statistically verify that the improvement in the Kea++ results is significant.
      PubDate: 2014-07-09T04:21:45-07:00
      DOI: 10.1177/0165551514529054
      Issue No: Vol. 40, No. 4 (2014)
  • Evaluating collaborative information seeking - synthesis, suggestions, and structure
    • Authors: Shah, C.
      Pages: 460 - 475
      Abstract: Evaluating the performance of collaborative information seeking (CIS) systems and users can be challenging, often more so than individual information-seeking environments. This can be attributed to the complex and dynamic interactions that take place among various users and systems processes in a CIS environment. While some of the aspects of a CIS system or user could be measured by typical assessment techniques from single-user information retrieval/seeking (IR/IS), one often needs to go beyond them to provide a meaningful evaluation, helping to provide not only a sense of performance, but also insights into design decisions (regarding systems) and behavioural trends (regarding users). This article first provides an overview of existing methods and techniques for evaluating CIS (synthesis). It then extracts valuable directives and advice from the literature that inform evaluation choices (suggestions). Finally, the article presents a framework for CIS evaluation with two major parts: system-based and user-based (structure). The proposed framework incorporates various instruments taken from computer and social sciences literature as applicable to CIS evaluations. The lessons from the literature and the framework could serve as important starting points for designing experiments and systems, as well as evaluating system and user performances in CIS and related research areas.
      PubDate: 2014-07-09T04:21:45-07:00
      DOI: 10.1177/0165551514530651
      Issue No: Vol. 40, No. 4 (2014)
  • Automatic identification of light stop words for Persian information retrieval systems
    • Authors: Sadeghi, M.; Vegas, J.
      Pages: 476 - 487
      Abstract: Stop word identification is one of the most important tasks for many text processing applications such as information retrieval. Stop words occur too frequently in documents in a collection and do not contribute significantly to determining the context or information about the documents. These words are worthless as index terms and should be removed during indexing as well as before querying by an information retrieval system. In this paper, we propose an automatic aggregated methodology based on term frequency, normalized inverse document frequency and information model to extract the light stop words from Persian text. We define a ‘light stop word’ as a stop word that has few letters and is not a compound word. In the Persian language, a complete stop word list can be derived by combining the light stop words. The evaluation results, using a standard corpus, show a good percentage of coincidence between the Persian and English stop words and a significant improvement in the number of index terms. Specifically, the first 32 Persian light stop words have a great impact on the index size reduction and the set of stop words can reduce the number of index terms by about 27%.
      PubDate: 2014-07-09T04:21:45-07:00
      DOI: 10.1177/0165551514530655
      Issue No: Vol. 40, No. 4 (2014)
  • Accurate keyphrase extraction by discriminating overlapping phrases
    • Authors: Haddoud, M.; Abdeddaim, S.
      Pages: 488 - 500
      Abstract: In this paper we define the document phrase maximality index (DPM-index), a new measure to discriminate overlapping keyphrase candidates in a text document. As an application we developed a supervised learning system that uses 18 statistical features, among them the DPM-index and five other new features. We experimentally compared our results with those of 21 keyphrase extraction methods on SemEval-2010/Task-5 scientific articles corpus. When all the systems extract 10 keyphrases per document, our method enhances by 13% the F-score of the best system. In particular, the DPM-index feature increases the F-score of our keyphrase extraction system by a rate of 9%. This makes the DPM-index contribution comparable to that of the well-known TFIDF measure on such a system.
      PubDate: 2014-07-09T04:21:45-07:00
      DOI: 10.1177/0165551514530210
      Issue No: Vol. 40, No. 4 (2014)
  • A study of the effects of preprocessing strategies on sentiment analysis for Arabic text
    • Authors: Duwairi, R.; El-Orfali, M.
      Pages: 501 - 513
      Abstract: Sentiment analysis has drawn considerable interest among researchers owing to the realization of its fascinating commercial and business benefits. This paper deals with sentiment analysis in Arabic text from three perspectives. First, several alternatives of text representation were investigated. In particular, the effects of stemming, feature correlation and n-gram models for Arabic text on sentiment analysis were investigated. Second, the behaviour of three classifiers, namely, SVM, Naive Bayes, and K-nearest neighbour classifiers, with sentiment analysis was investigated. Third, the effects of the characteristics of the dataset on sentiment analysis were analysed. To this end, we applied the techniques proposed in this paper to two datasets; one was prepared in-house by the authors and the second one is freely available online. All the experimentation was done using Rapidminer. The results show that our selection of preprocessing strategies on the reviews increases the performance of the classifiers.
      PubDate: 2014-07-09T04:21:45-07:00
      DOI: 10.1177/0165551514534143
      Issue No: Vol. 40, No. 4 (2014)
  • Hyperlinks as inter-university collaboration indicators
    • Authors: Kenekayoro, P.; Buckley, K.; Thelwall, M.
      Pages: 514 - 522
      Abstract: Collaboration is essential for some types of research, and some agencies include collaboration among the requirements for funding research projects. This makes it important to analyse collaborative research ties. Traditional methods to indicate the extent of collaboration between organizations use co-authorship data in citation databases. Publication data from these databases are not publicly available and can be expensive to access and so hyperlink data has been proposed as an alternative. This paper investigates whether using machine learning methods to filter page types can improve the extent to which hyperlink data can be used to indicate the extent of collaboration between universities. Structured information about research projects extracted from UK and EU funding agency websites, co-authored publications and academic links between universities were analysed to identify if there is any association between the number of hyperlinks connecting two universities, with and without machine learning filtering, and the number of publications they co-authored. An increased correlation was found between the number of inlinks to a university’s website and the extent to which it collaborates with other universities when machine learning techniques were used to filter out apparently irrelevant inlinks.
      PubDate: 2014-07-09T04:21:45-07:00
      DOI: 10.1177/0165551514534141
      Issue No: Vol. 40, No. 4 (2014)
  • Improving pseudo relevance feedback based query expansion using genetic fuzzy approach and semantic similarity notion
    • Authors: Bhatnagar, P.; Pareek, N.
      Pages: 523 - 537
      Abstract: Pseudo relevance feedback-based query expansion is a popular automatic query expansion technique. However, a survey of work done in the area shows that it has a mixed chance of success. This paper captures the limitations of pseudo relevance feedback (PRF)-based query expansion and proposes a method of enhancing its performance by hybridizing corpus-based information, with a genetic fuzzy approach and semantic similarity notion. First the paper suggests use of a genetic fuzzy approach to select an optimal combination of query terms from a pool of terms obtained using PRF-based query expansion. The query terms obtained are further ranked on the basis of semantic similarity with original query terms. The experiments were performed on CISI collection, a benchmark dataset for information retrieval. It was found that the results were better in both terms of recall and precision. The main observation is that the hybridization of various techniques of query expansion in an intelligent way allows us to incorporate the good features of all of them. As this is a preliminary attempt in this direction, there is a large scope for enhancing these techniques.
      PubDate: 2014-07-09T04:21:45-07:00
      DOI: 10.1177/0165551514533771
      Issue No: Vol. 40, No. 4 (2014)
  • Integrating Spanish lexical resources by meta-classifiers for polarity classification
    • Authors: Martinez-Camara, E.; Martin-Valdivia, M. T.; Molina-Gonzalez, M. D.; Perea-Ortega, J. M.
      Pages: 538 - 554
      Abstract: In this paper we focus on unsupervised sentiment analysis in Spanish. The lack of resources for languages other than English, as for example Spanish, adds more complexity to the task. However, we take advantage of some good already existing lexical resources. We have carried out several experiments using different unsupervised approaches in order to compare the different methodologies for solving the problem of the Spanish polarity classification in a corpus of movie reviews. Among all these approaches, perhaps the newest one integrates SentiWordNet with the Multilingual Central Repository to tackle polarity detection directly over the Spanish corpus. However, the results obtained were not as promising as we expected, and so we carried out another group of experiments combining all the methods using meta-classifiers. The results obtained with stacking outperformed the individual experiments and encourage us to continue in this way.
      PubDate: 2014-07-09T04:21:45-07:00
      DOI: 10.1177/0165551514535710
      Issue No: Vol. 40, No. 4 (2014)

JournalTOCs © 2009-2014