Journal of Information Science
   Hybrid journal (it can contain Open Access articles)
     ISSN (Print) 0165-5515 - ISSN (Online) 1741-6485
     Published by Sage Publications   [737 journals]   [SJR: 1.199]   [H-I: 35]
  • Multilingual query expansion in the SveMed+ bibliographic database: A case
    • Authors: Gavel, Y; Andersson, P.-O.
      Pages: 269 - 280
      Abstract: SveMed+ is a bibliographic database covering Scandinavian medical journals. It is produced by the University Library of Karolinska Institutet in Sweden. The bibliographic references are indexed with terms from the Medical Subject Headings (MeSH) thesaurus. The MeSH has been translated into several languages, including Swedish, making it suitable as the basis for multilingual tools in the medical field. The data structure of SveMed+ closely mimics that of PubMed/MEDLINE. Users of PubMed/MEDLINE and similar databases typically expect retrieval features that are not readily available off-the-shelf. The SveMed+ interface is based on a free text search engine (Solr) and a relational database management system (Microsoft SQL Server) containing the bibliographic database and a multilingual thesaurus database. The thesaurus database contains medical terms in three different languages and information about relationships between the terms. A combined approach involving the Solr free text index, the bibliographic database and the thesaurus database allowed the implementation of functionality such as automatic multilingual query expansion, faceting and hierarchical explode searches. The present paper describes how this was done in practice.
      PubDate: 2014-05-27T03:26:06-07:00
      DOI: 10.1177/0165551514524685
      Issue No: Vol. 40, No. 3 (2014)
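The hierarchical explode search and multilingual query expansion mentioned in the abstract above can be illustrated with a minimal sketch. It is not the SveMed+ implementation, which the abstract says combines Solr, SQL Server and a thesaurus database. The hierarchy below is a toy fragment, the translation entry is illustrative only, and the names `NARROWER`, `TRANSLATIONS`, `explode` and `expand_query` are all hypothetical.

```python
from collections import deque

# Toy thesaurus fragment (assumption): each term maps to its narrower terms.
NARROWER = {
    "Cardiovascular Diseases": ["Heart Diseases", "Vascular Diseases"],
    "Heart Diseases": ["Arrhythmias, Cardiac", "Heart Failure"],
    "Vascular Diseases": ["Hypertension"],
}

# Illustrative translation table (assumption), not taken from the real thesaurus.
TRANSLATIONS = {
    "Hypertension": ["Hypertoni"],
}


def explode(term, narrower=NARROWER):
    """Return the term followed by all of its descendants, breadth first."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def expand_query(term, translations=TRANSLATIONS):
    """Explode a term, then append any known translations of each result."""
    expanded = []
    for t in explode(term):
        expanded.append(t)
        expanded.extend(translations.get(t, []))
    return expanded


print(expand_query("Vascular Diseases"))
# ['Vascular Diseases', 'Hypertension', 'Hypertoni']
```

In a real system the expanded term list would typically be turned into an OR query against the free text index; that step is omitted here.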
  • Performance of LDA and DCT models
    • Authors: Rathore, A. S; Roy, D.
      Pages: 281 - 292
      Abstract: The Doubly Correlated Topic Model is a generative probabilistic topic model for automatically identifying topics from a corpus of text documents. It is a mixed membership model, based on the fact that a document exhibits a number of topics. We used word co-occurrence statistical information for identifying an initial set of topics as posterior information for the model. Exact posterior inference in the existing models is intractable, so the inference methods they use provide only an approximate solution. Consideration of co-occurred words as initial topics provides a tighter bound on the topic coherence. The proposed model is motivated by the Latent Dirichlet Allocation Model. The Doubly Correlated Topic Model differs from the Latent Dirichlet Allocation Model in its posterior inference; it uses the highest ranked co-occurred words as initial topics rather than obtaining them from Dirichlet priors. The results of the proposed model suggest some improved performance on entropy and topical coherence over different datasets.
      PubDate: 2014-05-27T03:26:06-07:00
      DOI: 10.1177/0165551514524678
      Issue No: Vol. 40, No. 3 (2014)
  • SQL-based semantics for path expressions over hierarchical data in
           relational databases
    • Authors: Vainio, J; Junkkari, M.
      Pages: 293 - 312
      Abstract: Hierarchical part-of relationships/aggregation structures and related queries are essential parts of information systems. However, relational database query languages do not explicitly support hierarchical relationships and queries. A hierarchical query may require a great number of join operations, which increases the effort in query formulation. Therefore, we propose path expressions for formulating hierarchical views over relational data because path expressions are a conventional and compact way to represent hierarchical relationships. We embed path expressions within SQL queries and compile them to standard SQL. This ensures that the path expressions can be implemented straightforwardly on top of standard relational database systems. The compilation of a path expression is given by an attribute grammar, a conventional formalism to define the semantics of a language.
      PubDate: 2014-05-27T03:26:06-07:00
      DOI: 10.1177/0165551514520943
      Issue No: Vol. 40, No. 3 (2014)
  • Exploiting reviewers' comment histories for sentiment analysis
    • Authors: Basiri, M. E; Ghasem-Aghaee, N; Naghsh-Nilchi, A. R.
      Pages: 313 - 328
      Abstract: Sentiment analysis is used to extract people’s opinions from their online comments in order to help automated systems provide more precise recommendations. Existing sentiment analysis methods often assume that the comments of any single reviewer are independent of each other, and so they do not take advantage of significant information that may be extracted from reviewers’ comment histories. Using psychological findings and the theory of negativity bias, we propose a method for exploiting reviewers’ comment histories to improve sentiment analysis. Furthermore, to use more fine-grained information about the content of a review, our method predicts the overall ratings by aggregating sentence-level scores. In the proposed system, the Dempster–Shafer theory of evidence is utilized for score aggregation. The results from four large and diverse social Web datasets establish the superiority of our approach in comparison with the state-of-the-art machine learning techniques. In addition, the results show that the suggested method is robust to the size of the training dataset.
      PubDate: 2014-05-27T03:26:06-07:00
      DOI: 10.1177/0165551514522734
      Issue No: Vol. 40, No. 3 (2014)
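The abstract above names the Dempster–Shafer theory of evidence as the means of aggregating sentence-level scores. As a rough illustration only, the sketch below applies Dempster's rule of combination to two sentence-level mass functions over the hypotheses "pos" and "neg". The mass values and the function name `combine` are assumptions made for this example; how the paper derives masses from sentences is not stated in the abstract.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1].
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence accumulates on the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal elements contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    # Normalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


POS, NEG = frozenset({"pos"}), frozenset({"neg"})
EITHER = POS | NEG  # mass assigned to "don't know"

# Hypothetical sentence-level evidence (assumed values).
sentence_1 = {POS: 0.6, EITHER: 0.4}
sentence_2 = {POS: 0.5, NEG: 0.3, EITHER: 0.2}

review = combine(sentence_1, sentence_2)
for hypothesis, mass in review.items():
    print(sorted(hypothesis), round(mass, 4))
```

With these inputs the conflict mass is 0.18, so the combined mass on "pos" is 0.62 / 0.82. More sentences can be folded in by calling `combine` repeatedly, since the rule is associative.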
  • Learning time-sensitive domain ontology from scientific papers with a
           hybrid learning method
    • Authors: Ren, F.
      Pages: 329 - 345
      Abstract: The large number of available scientific papers makes ontology construction research an attractive application area. However, most current ontology construction approaches have two shortcomings. First, implicit time properties of domain concepts are rarely taken into account. Second, current automatic concept relation extraction methods mainly rely on the local context information that surrounds the concepts under consideration. These two problems prevent most current ontology construction methods from being employed to their full potential. To tackle these problems, we propose a hybrid learning method that integrates concepts’ global information and human experts’ knowledge into ontology construction, and that takes concepts’ temporal attributes into account. Our method first divides each concept into four time periods according to its attribution distribution on a time axis. Then global time-related attributions are collected for each concept. Finally, concept relations are extracted with a hybrid learning method. We evaluated our method by testing it on Chinese academic papers. It outperformed a baseline system based on only hierarchical concept relations, showing the effectiveness of our approach.
      PubDate: 2014-05-27T03:26:06-07:00
      DOI: 10.1177/0165551514521927
      Issue No: Vol. 40, No. 3 (2014)
  • Systematically retrieving research in the digital age: Case study on the
           topic of social networking sites and young people's mental health
    • Authors: Best, P; Taylor, B; Manktelow, R; McQuilkin, J.
      Pages: 346 - 356
      Abstract: Online information seeking has become normative practice among both academics and the general population. This study appraised the performance of eight databases to retrieve research pertaining to the influence of social networking sites on the mental health of young people. A total of 43 empirical studies on young people’s use of social networking sites and the mental health implications were retrieved. Scopus and SSCI had the highest sensitivity with PsycINFO having the highest precision. Effective searching requires large generic databases, supplemented by subject-specific catalogues. The methodology developed here may provide inexperienced searchers, such as undergraduate students, with a framework to define a realistic scale of searching to undertake for a particular literature review or similar project.
      PubDate: 2014-05-27T03:26:06-07:00
      DOI: 10.1177/0165551514521936
      Issue No: Vol. 40, No. 3 (2014)
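The abstract above compares databases by sensitivity and precision. A small sketch of how these two measures are commonly computed for a single database follows. The total of 43 relevant studies comes from the abstract; the per-database counts are invented purely for illustration, and the assumption that sensitivity is measured against the pooled set of relevant studies is ours.

```python
def sensitivity(relevant_retrieved, total_relevant):
    """Share of all known relevant studies that the database returned."""
    return relevant_retrieved / total_relevant


def precision(relevant_retrieved, total_retrieved):
    """Share of the records returned by the database that were relevant."""
    return relevant_retrieved / total_retrieved


TOTAL_RELEVANT = 43  # reported in the abstract

# Hypothetical counts for one database (not taken from the study).
hits = 200           # records returned by the search
relevant_hits = 30   # of those, records judged relevant

print(round(sensitivity(relevant_hits, TOTAL_RELEVANT), 3))  # 0.698
print(round(precision(relevant_hits, hits), 3))              # 0.15
```

A large general database tends to score well on the first measure and a subject-specific one on the second, which is consistent with the abstract's advice to combine both kinds.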
  • An algorithm to improve the performance of string matching
    • Authors: Hlayel, A. A; Hnaif, A.
      Pages: 357 - 362
      Abstract: Approximate string matching algorithms are techniques used to find a pattern ‘P’ in a text ‘T’ partially or exactly. These techniques are very important in terms of performance and the accuracy of search results. In this paper, we propose a general approach algorithm, called the Direct Matching Algorithm (DMA). The function of this algorithm is to perform direct access matching for the exact pattern or its similarities within a text, depending on the location of a character in alphabetical order. We simulated the DMA in order to show its competence. The simulation results showed significant improvement in both exact string matching and similarity matching, suggesting that the algorithm is well suited to real applications.
      PubDate: 2014-05-27T03:26:06-07:00
      DOI: 10.1177/0165551513519039
      Issue No: Vol. 40, No. 3 (2014)
  • Performance evaluation of parallel multithreaded A* heuristic search
    • Authors: Mahafzah, B. A.
      Pages: 363 - 375
      Abstract: Heuristic search is used in many problems and applications, such as the 15 puzzle problem, the travelling salesman problem and web search engines. In this paper, the A* heuristic search algorithm is reconsidered by proposing a parallel generic approach based on multithreading for solving the 15 puzzle problem. Using multithreading, sequential computers are provided with virtual parallelization, yielding faster execution and easy communication. These advantageous features are provided by creating a dynamic number of concurrent threads at run time. The proposed approach is evaluated analytically and experimentally and compared with its sequential counterpart in terms of various performance metrics. The experimental results show that multithreading is a viable approach for parallel A* heuristic search. In particular, the parallel multithreaded A* heuristic search algorithm outperforms the sequential approach in terms of time complexity and speedup.
      PubDate: 2014-05-27T03:26:06-07:00
      DOI: 10.1177/0165551513519212
      Issue No: Vol. 40, No. 3 (2014)
  • Extracting the roots of Arabic words without removing affixes
    • Authors: Yaseen, Q; Hmeidi, I.
      Pages: 376 - 385
      Abstract: Most research in Arabic root extraction focuses on removing affixes from Arabic words. This process adds processing overhead and may remove non-affix letters, which leads to the extraction of incorrect roots. This paper proposes a new approach to this issue by introducing a new algorithm for extracting the roots of Arabic words. The proposed algorithm, which is called the Word Substring Stemming (WSS) Algorithm, does not remove affixes during the extraction process. Rather, it is based on producing the set of all substrings of an Arabic word, and uses the Arabic roots file, the Arabic patterns file and a concrete set of rules to extract correct roots from substrings. The experiments have shown that the proposed approach is competitive and that its accuracy is 83.9%. Furthermore, its accuracy can be improved, in the sense that, for about 9.9% of the tested words, the WSS algorithm retrieves two candidates (in most cases) for the correct root.
      PubDate: 2014-05-27T03:26:06-07:00
      DOI: 10.1177/0165551514526348
      Issue No: Vol. 40, No. 3 (2014)
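The substring step described in the abstract above can be sketched as follows. This covers only the idea of enumerating substrings and looking them up in a roots list; it is not the authors' algorithm, which also uses a patterns file and a set of rules that the abstract does not detail. The roots set, the length limits of three and four letters, and the names `substrings` and `candidate_roots` are assumptions for this example.

```python
def substrings(word, min_len=3, max_len=4):
    """Yield every contiguous substring whose length is in [min_len, max_len]."""
    for length in range(min_len, max_len + 1):
        for start in range(len(word) - length + 1):
            yield word[start:start + length]


def candidate_roots(word, roots):
    """Return substrings of the word that appear in the roots set, in order found."""
    found = []
    for piece in substrings(word):
        if piece in roots and piece not in found:
            found.append(piece)
    return found


# Hypothetical stand-in for a roots file (two common triliteral roots).
ROOTS = {"كتب", "درس"}

# "المكتبة" (the library) contains the root "كتب" without any affix stripping.
print(candidate_roots("المكتبة", ROOTS))
```

Contiguous substrings alone cannot recover a root whose letters are separated by other letters in the word, which is presumably where the patterns file and rules come in.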
  • Intellectual structure of the institutional repository field: A co-word
    • Authors: Cho, J.
      Pages: 386 - 397
      Abstract: The institutional repository is a major means of providing open access to academic output and is changing academic communications. As use of the institutional repository is spreading, research advancing its management policy and technology has been conducted in the library and academic communities. This study has undertaken a co-word analysis of author keywords in articles from the SCOPUS database from 1997 to 2012 and found 8 clusters that represent the intellectual structure of institutional repository research, including ‘Metadata’, ‘Open Access’, ‘Institutional Repository’, ‘Digital Library’, ‘DSpace’, ‘Copyright’, ‘Preservation’ and ‘Semantic Web’. To understand these intellectual structures, this study used a co-occurrence matrix based on Pearson’s correlation coefficient to create a clustering of the words using the hierarchical clustering technique. To visualize these intellectual structures, this study carried out a multidimensional scaling analysis, to which the PROXSCAL algorithm was applied.
      PubDate: 2014-05-27T03:26:06-07:00
      DOI: 10.1177/0165551514524686
      Issue No: Vol. 40, No. 3 (2014)
  • Aara'- a system for mining the polarity of Saudi public opinion through
           e-newspaper comments
    • Authors: Azmi, A. M; Alzanin, S. M.
      Pages: 398 - 410
      Abstract: Aara’ is a system for mining opinion polarity from the pool of comments that readers write anonymously on the online editions of Saudi newspapers. We use a naive Bayes classifier with a revised n-gram approach to extract the public opinion polarity, which is expressed in Arabic, classifying it into four categories. For training, we manually marked the comments as belonging to one of the categories. All the words in the documents of the training set were removed except those with explicit connotations. After the training, the words designated as vocabulary were classified into one of the categories. Our system carries out polarity classification over informal colloquial Arabic that is unstructured and contains a reasonable proportion of spelling errors. Testing our system showed a macro-averaged precision of 86.5%, while the macro-averaged F-score was 84.5%. The accuracy of the system is 82%.
      PubDate: 2014-05-27T03:26:06-07:00
      DOI: 10.1177/0165551514524675
      Issue No: Vol. 40, No. 3 (2014)
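The abstract above reports macro-averaged precision and F-score. The sketch below shows one common way to compute them: score each category separately, then take the unweighted mean. The labels and predictions are made up, the function name `macro_scores` is hypothetical, and the macro F-score here is the mean of per-category F1 values, which is only one of the conventions in use; the abstract does not say which one the authors applied.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro F1) as unweighted means over categories."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, f1s = [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1s.append(f1)
    return sum(precisions) / len(labels), sum(f1s) / len(labels)


# Hypothetical gold labels and classifier output for four comments.
gold = ["pos", "pos", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg"]

macro_p, macro_f = macro_scores(gold, predicted)
print(round(macro_p, 4), round(macro_f, 4))  # 0.8333 0.7333
```

Because every category counts equally, a rare category that is classified poorly pulls the macro average down more than it would affect plain accuracy.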