Journal of Information Science
  [SJR: 1.008]   [H-I: 40]
   Hybrid journal (it can contain Open Access articles)
   ISSN (Print) 0165-5515 - ISSN (Online) 1741-6485
   Published by Sage Publications
  • Detecting near-duplicate text documents with a hybrid approach
    • Authors: Varol, C; Hari, S.
      Pages: 405 - 414
      Abstract: Near-duplicate data not only increase the cost of information processing in big data but also lengthen decision time. Detecting and eliminating nearly identical information is therefore vital to improving business decisions. The shingling algorithm has been widely used to identify near-duplicates in large-scale text data. It is based on occurrences of contiguous subsequences of tokens (shingles) shared by two or more sets of information, such as documents. Because it relies on exact matches of these subsequences, even slight variations among documents degrade its performance. To improve the efficiency and accuracy of the shingling algorithm, we propose a hybrid approach that combines the Jaro distance with word-usage frequency statistics to correct ill-defined data. On a real text dataset, the proposed hybrid approach improved the shingling algorithm’s accuracy by 27% on average and achieved above 90% common shingles.
      PubDate: 2015-07-09T09:02:42-07:00
      DOI: 10.1177/0165551515577912
      Issue No: Vol. 41, No. 4 (2015)
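The abstract does not spell out how the Jaro distance and word frequencies are combined. The sketch below is a minimal illustration under one assumption: before shingling, each out-of-vocabulary token is replaced by the most Jaro-similar vocabulary word (ties broken by frequency) if the similarity reaches a threshold. The helper names (`correct_tokens`, `resemblance`) and the 0.85 threshold are hypothetical, not the authors' code.

```python
from collections import Counter


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(0, max(len1, len2) // 2 - 1)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unmatched equal character within the match window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    mismatched, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                mismatched += 1
            k += 1
    t = mismatched / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def correct_tokens(tokens, vocab_freq, threshold=0.85):
    """Hypothetical correction step: swap out-of-vocabulary tokens for the most
    Jaro-similar vocabulary word (ties broken by frequency) above a threshold."""
    fixed = []
    for tok in tokens:
        if tok in vocab_freq or not vocab_freq:
            fixed.append(tok)
            continue
        best = max(vocab_freq, key=lambda w: (jaro(tok, w), vocab_freq[w]))
        fixed.append(best if jaro(tok, best) >= threshold else tok)
    return fixed


def shingles(tokens, w=3):
    """Set of contiguous w-token subsequences."""
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b):
    """Jaccard resemblance of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


doc1 = "the quick brown fox jumps over the lazy dog".split()
doc2 = "the quikc brown fox jumps over the lazy dog".split()
vocab = Counter(doc1)

raw = resemblance(shingles(doc1), shingles(doc2))
fixed = resemblance(shingles(doc1), shingles(correct_tokens(doc2, vocab)))
print(raw, fixed)
```

A single transposed letter ("quikc") breaks two of the seven 3-token shingles, so the raw resemblance is 5/9; after the correction step maps "quikc" back to "quick" (Jaro similarity about 0.93), the two documents share all shingles and the resemblance is 1.0.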
       
  • Modelling trust formation in health information contexts
    • Authors: Johnson, F; Rowley, J, Sbaffi, L.
      Pages: 415 - 429
      Abstract: This study explores trust formation in the context of health information. Trust, as an interpersonal notion formed in a state of vulnerability, is a response or belief about how the trusted party will behave towards the trustor. This study focuses on how the trustworthiness of information is assessed by someone in a state of dependency created by an information need, and identifies the many factors that influence this assessment. A set of propositions is developed to suggest the criteria by which trustworthiness is assessed, as well as the factors that influence these judgements. The proposed model is tested in a large-scale survey using a trust inventory, with factor analysis used to explore the constructs of trust formation. Structural equation modelling is used to explore the relationships among the identified criteria and their influencing factors. The resulting framework, built on the criteria of usefulness and credibility, contributes to the understanding of trust formation in digital information contexts; further research into the influencing factors is recommended.
      PubDate: 2015-07-09T09:02:42-07:00
      DOI: 10.1177/0165551515577914
      Issue No: Vol. 41, No. 4 (2015)
       
  • Language clustering and knowledge sharing in multilingual organizations: A social perspective on language
    • Authors: Ahmad, F; Widen, G.
      Pages: 430 - 443
      Abstract: Knowledge sharing is a product of a collaborative and supportive environment shaped by socialization and informal communication between employees. Under the pressure of globalization and business internationalization, workforces have become increasingly diverse, particularly in terms of language, and this has implications for knowledge sharing. Employees have been observed to gravitate towards their own language communities, leading to language clustering (language-based grouping), which negatively affects informal communication and knowledge mobility in organizations. Although such clusters have been reported in many previous studies, we do not clearly understand how and why language brings them into being, or what implications this has for knowledge sharing. This paper draws on the theory of the semiotic processes of linguistic differentiation from linguistic anthropology to provide a theoretical framework capable of explaining how language creates language clusters. Unlike previous knowledge management studies, which largely focus on the instrumental aspect of language, this paper adopts a social perspective on language. It argues that, to deal with language clustering, we have to explore in detail the dynamics operating behind it. This will not only help us understand its implications for knowledge sharing but also help in devising effective knowledge management initiatives for multilingual workplaces.
      PubDate: 2015-07-09T09:02:42-07:00
      DOI: 10.1177/0165551515581280
      Issue No: Vol. 41, No. 4 (2015)
       
  • Model mapping approaches for XML documents: A review
    • Authors: Qtaish, A; Ahmad, K.
      Pages: 444 - 466
      Abstract: XML has become the dominant standard for data exchange and representation on the Web, while the relational database (RDB) is widely used as a storage and retrieval medium in business. With the expanding use of XML data on the Web, the volume of such data has increased rapidly, and users issue increasingly complex queries over it. This has prompted numerous researchers to propose approaches for managing XML data through an RDB. In this study, the most cited and the most recent model-mapping approaches are reviewed in terms of their description, the technique used and the RDB schema each approach produces. Their limitations are discussed in terms of storage space and query response time, and a solution to these limitations is proposed at the end of the study. It is hoped that this paper will give some insight into storing XML documents in RDB schemas and contribute to the XML community.
      PubDate: 2015-07-09T09:02:42-07:00
      DOI: 10.1177/0165551515579995
      Issue No: Vol. 41, No. 4 (2015)
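To make "model mapping" concrete: such approaches store any XML document in a fixed, generic relational schema rather than deriving a schema from a DTD. The sketch below is a simplified illustration of that idea, not a reproduction of any approach reviewed in the paper. The `node` table layout (id, parent, sibling ordinal, tag, root-to-node path, text value) and the helper names `shred` and `load` are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Flatten an XML document into (id, parent_id, ordinal, tag, path, value) rows."""
    rows = []

    def walk(elem, parent_id, parent_path, ordinal):
        node_id = len(rows) + 1  # ids assigned in document (pre-)order
        path = f"{parent_path}/{elem.tag}"
        value = (elem.text or "").strip() or None
        rows.append((node_id, parent_id, ordinal, elem.tag, path, value))
        for i, child in enumerate(elem):
            walk(child, node_id, path, i)

    walk(ET.fromstring(xml_text), None, "", 0)
    return rows


def load(rows):
    """Store the rows in one generic table, independent of the XML structure."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "ordinal INTEGER, tag TEXT, path TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


doc = ("<journal><article><title>A</title></article>"
       "<article><title>B</title></article></journal>")
rows = shred(doc)
conn = load(rows)

# A simple path expression becomes an equality lookup on the stored path.
titles = [v for (v,) in conn.execute(
    "SELECT value FROM node WHERE path = ? ORDER BY id", ("/journal/article/title",))]

# Structural queries need self-joins on parent_id, one source of query cost.
second = conn.execute(
    "SELECT t.value FROM node a JOIN node t ON t.parent_id = a.id "
    "WHERE a.path = '/journal/article' AND a.ordinal = 1 AND t.tag = 'title'"
).fetchone()[0]
print(titles, second)
```

Storing the full path on every row trades storage space for faster path lookups, while parent-child navigation still requires joins; this is the kind of storage-space versus query-time tension the review discusses.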
       
  • TAG term weight-based N gram Thesaurus generation for query expansion in information retrieval application
    • Authors: Shaila, S. G; Vadivel, A.
      Pages: 467 - 485
      Abstract: Query expansion is an important task in information retrieval applications that improves the user query and helps in retrieving relevant documents. In this paper, an N-gram thesaurus is constructed from the documents for query expansion. The HTML TAGs in web documents are analysed for their syntactic context and, based on their nature, properties and significance, are assigned suitable weights. The weight of each term is then calculated from the corresponding TAG weight and term frequency and stored in the inverted index. All single terms in the inverted index are added to the thesaurus as unigrams. Bigrams are then constructed from unigrams; likewise, each (N + 1)-gram is generated from N-grams and their weights and added to the thesaurus. During a query session, the user's query terms are expanded using the N-grams predicted by the thesaurus, which are offered to the user as suggestions. The performance of the proposed approach is evaluated on the ClueWeb09B, WT10g and GOV2 benchmark datasets. Using the improvement over the baseline as the evaluation parameter, the proposed approach achieved gains in Mean Average Precision (MAP) of 7.9% on ClueWeb09B, 18.3% on WT10g and 29.4% on GOV2. We also compared the proposed approach with two other query expansion approaches, KLDCo and BoCo. The approach achieved 0.574 (+0.236), 0.519 (+0.209), 0.422 (+0.185) and 0.654 (+0.243) in terms of P@5, P@10, MAP and MRR, respectively, against the baselines.
      PubDate: 2015-07-09T09:02:42-07:00
      DOI: 10.1177/0165551515581567
      Issue No: Vol. 41, No. 4 (2015)
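The thesaurus-building steps in the abstract (TAG-weighted unigrams, then (N + 1)-grams grown from surviving N-grams) can be sketched as follows. Several details are assumptions, not the paper's formulas: the TAG weights in `TAG_WEIGHTS`, scoring an N-gram occurrence by the mean weight of its tokens, and pruning longer N-grams below a hypothetical `min_weight`. The `suggest` helper is likewise illustrative.

```python
from collections import defaultdict

# Hypothetical TAG weights; the paper assigns its own weight to each HTML TAG.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "p": 1.0}


def build_thesaurus(segments, max_n=3, min_weight=2.0):
    """segments: list of (html_tag, text). Returns {ngram_tuple: weight}."""
    # Attach the weight of the enclosing TAG to every token occurrence.
    weighted = [
        [(tok, TAG_WEIGHTS.get(tag, 1.0)) for tok in text.lower().split()]
        for tag, text in segments
    ]
    thesaurus = {}
    for n in range(1, max_n + 1):
        level = defaultdict(float)
        for seq in weighted:
            for i in range(len(seq) - n + 1):
                window = seq[i:i + n]
                gram = tuple(tok for tok, _ in window)
                # An (N+1)-gram is only generated from a surviving N-gram prefix.
                if n > 1 and gram[:-1] not in thesaurus:
                    continue
                # Occurrence weight: mean TAG weight of its tokens (assumption).
                level[gram] += sum(w for _, w in window) / n
        for gram, weight in level.items():
            # Every unigram is kept; longer N-grams are pruned below min_weight.
            if n == 1 or weight >= min_weight:
                thesaurus[gram] = weight
    return thesaurus


def suggest(thesaurus, query, k=3):
    """Return up to k longer N-grams that extend the query, heaviest first."""
    q = tuple(query.lower().split())
    candidates = [
        (gram, w) for gram, w in thesaurus.items()
        if len(gram) > len(q) and gram[:len(q)] == q
    ]
    candidates.sort(key=lambda gw: (-gw[1], gw[0]))
    return [" ".join(gram) for gram, _ in candidates[:k]]


segments = [
    ("title", "information retrieval"),
    ("h1", "query expansion"),
    ("p", "query expansion improves information retrieval"),
    ("p", "query logs"),
]
thesaurus = build_thesaurus(segments)
print(thesaurus[("query",)])
print(suggest(thesaurus, "query"))
```

For unigrams, the mean rule reduces to summing TAG weights over occurrences, so "query" scores 2.0 (from `h1`) plus 1.0 twice (from `p`), giving 4.0. The bigram "query logs" (weight 1.0) is pruned, so the only suggestion for "query" is "query expansion".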
       
  • Topical tags vs non-topical tags: Towards a bipartite classification?
    • Authors: Basile, V; Peroni, S, Tamburini, F, Vitali, F.
      Pages: 486 - 505
      Abstract: In this paper we investigate whether it is possible to create a computational approach that distinguishes topical tags (i.e. tags about the topic of a resource) from non-topical tags (i.e. tags describing aspects of a resource unrelated to its topic) in folksonomies, in a way that correlates with human judgement. Towards this goal, we collected 21 million tags (1.2 million unique terms) from Delicious and developed an unsupervised statistical algorithm that classifies such tags by applying a word space model adapted to the folksonomy space. Our algorithm analyses the network of tags that co-occur with a target tag and exploits graph-based metrics to classify it. We validated its outcomes against a reference classification made by humans on a limited number of terms in three separate tests. The analysis of the outcomes shows, in some cases, consistent disagreement among humans, and between humans and our algorithm, about what constitutes a topical tag, and suggests the emergence of a new category of overly generic tags (i.e. umbrella tags).
      PubDate: 2015-07-09T09:02:42-07:00
      DOI: 10.1177/0165551515585283
      Issue No: Vol. 41, No. 4 (2015)
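The abstract names the inputs (the co-occurrence network around a target tag) but not the specific graph metrics. The sketch below uses one well-known metric, the local clustering coefficient, as a hypothetical stand-in. The intuition is that a topical tag's neighbours tend to co-occur with each other, while a generic tag such as "toread" links otherwise unrelated tags. The 0.5 threshold and the toy posts are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(posts):
    """Undirected graph linking tags that were applied to the same resource."""
    graph = defaultdict(set)
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph, tag):
    """Fraction of neighbour pairs of `tag` that are themselves linked."""
    neighbours = graph.get(tag, set())
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def classify(graph, tag, threshold=0.5):
    # Heuristic (assumption): tags whose neighbours form a tight cluster
    # are treated as topical; tags bridging unrelated tags are not.
    if clustering_coefficient(graph, tag) >= threshold:
        return "topical"
    return "non-topical"


posts = [
    {"python", "programming", "toread"},
    {"python", "programming", "tutorial"},
    {"programming", "tutorial"},
    {"cooking", "recipes", "toread"},
    {"travel", "toread"},
]
graph = cooccurrence_graph(posts)
print(classify(graph, "python"))   # topical (coefficient 2/3)
print(classify(graph, "toread"))   # non-topical (coefficient 0.2)
```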
       
  • Modern information retrieval in Arabic - catering to standard and colloquial Arabic users
    • Authors: Azmi, A. M; Aljafari, E. A.
      Pages: 506 - 517
      Abstract: The widespread use of colloquial dialects among the younger generation of Arabs is depriving many of them of the fruits of information freedom. Although most Arabs have no problem reading text in formal Arabic, widely known as Modern Standard Arabic (MSA), the younger generation is more adept at colloquial Arabic, mainly owing to the widespread use of social media. Current search engines cater mostly to MSA. This means that material written in colloquial Arabic is off-limits to those who use MSA, and MSA content is similarly off-limits to those who communicate only in colloquial Arabic. To achieve the full potential of an information-retrieval system, we need a scheme that interprets queries whether they are in MSA, colloquial Arabic or a combination of both. In this paper we design an information-retrieval system that addresses this concern, using one of the local dialects of Saudi Arabia as the setting. Our system is based on a modified MSA stemming technique and a set of lexicon-based colloquial-to-MSA conversion rules. We tested the system using 44 queries on a corpus of over 1400 documents (MSA, colloquial and mixed). The average precision was 84.3%, while the average recall was 96.5%. In a second test, we compared the precision of the documents retrieved by our system with that of the Google and Yahoo! search engines; the respective average precisions were 78.2%, 51.9% and 56.2%.
      PubDate: 2015-07-09T09:02:42-07:00
      DOI: 10.1177/0165551515585720
      Issue No: Vol. 41, No. 4 (2015)
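The abstract reports average precision and recall over 44 queries without stating how the averages were taken. The sketch below assumes set-based precision and recall computed per query and then averaged across queries (macro-averaging); the document identifiers and judgements are hypothetical toy data, not the paper's.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def macro_average(runs):
    """Average per-query precision and recall over all queries."""
    scores = [precision_recall(retrieved, relevant) for retrieved, relevant in runs]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


# Hypothetical judgements for two queries: (retrieved documents, relevant documents).
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d2", "d3"}),  # precision 0.75, recall 1.0
    (["d5", "d6"], {"d5", "d7"}),                    # precision 0.5, recall 0.5
]
print(macro_average(runs))  # (0.625, 0.75)
```

High recall with somewhat lower precision, as in the reported 96.5% versus 84.3%, corresponds to a system that returns nearly all relevant documents along with some non-relevant ones.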
       
  • Quality prediction of multilingual news clustering: An experimental study
    • Authors: Montalvo, S; Martinez, R, Fresno, V.
      Pages: 518 - 530
      Abstract: The evaluation of clustering results is one of the most important issues in cluster analysis, a core task for effective information access. There are two types of measures for evaluating the quality of clustering results: internal and external. External validity measures evaluate how well the clustering results match prior knowledge about the data, whereas internal measures need no external information and use only information within the data. The main drawback of external measures is that they are not applicable in real-world situations, where such prior knowledge is usually unavailable. In this paper we present an experimental study to determine whether the quality of multilingual news clustering results can be predicted by means of an internal evaluation measure. We study whether the internal measure Expected Density correlates with the external F-measure, the most common way of evaluating clustering results. In the experiments, we use different data collections, clustering algorithms and similarity measures to determine their influence on the correlation between the two measures. Since the similarity measure is another important issue in clustering, we also propose a new measure of how similar two news documents are, based on the Named Entities they share. The results show that the correlation depends on several factors, such as the type of collection, the granularity of the clusters, the type of algorithm and the similarity measure.
      PubDate: 2015-07-09T09:02:42-07:00
      DOI: 10.1177/0165551515586671
      Issue No: Vol. 41, No. 4 (2015)
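Two of the quantities named in the abstract can be sketched concretely. The first is the external F-measure in its common class-weighted form: for each reference class, take the best F(i, j) over all clusters, then weight it by the class size. The paper may use a different variant. The second is an entity-overlap similarity, shown here as a Dice coefficient over named-entity sets. This is a hypothetical stand-in for the authors' measure, and it assumes the entities have already been extracted. Expected Density is not shown.

```python
from collections import Counter


def clustering_f_measure(classes, clusters):
    """Class-weighted clustering F-measure: for each reference class, take the
    best F(i, j) over clusters, then weight it by the class's share of items."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(classes, clusters))
    total = 0.0
    for c, n_i in class_sizes.items():
        best = 0.0
        for k, n_j in cluster_sizes.items():
            n_ij = joint[(c, k)]
            if n_ij == 0:
                continue
            precision, recall = n_ij / n_j, n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total


def entity_similarity(entities_a, entities_b):
    """Dice coefficient over the sets of named entities in two documents."""
    a, b = set(entities_a), set(entities_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


classes = ["A", "A", "A", "B", "B"]  # reference labels
clusters = [1, 1, 2, 2, 2]           # labels assigned by a clustering algorithm
print(round(clustering_f_measure(classes, clusters), 3))  # 0.8
print(round(entity_similarity({"Merkel", "Berlin", "Obama"},
                              {"Merkel", "Berlin", "Paris"}), 3))  # 0.667
```

Computing the F-measure requires the reference labels in `classes`. That dependence is exactly why external measures cannot be used on unlabelled real-world collections, and why the study asks whether an internal measure can predict them.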
       
  • Using data-driven feature enrichment of text representation and ensemble technique for sentence-level polarity classification
    • Authors: Zhang, P; He, Z.
      Pages: 531 - 549
      Abstract: Sentence-level polarity classification, an important issue in sentiment analysis, plays a critical role in many opinion-mining applications, such as opinion question answering, opinion retrieval and opinion summarization. Training a classifier on sentences under a supervised learning paradigm often faces a data sparseness problem because sentences are short. To address this problem, we exploit two feature sets learned from external datasets as additional features to enrich the data representation: a latent topic feature set obtained with a topic model, and a related-word feature set derived from word embeddings. Furthermore, we propose an ensemble approach that uses these additional features to guide the design of the different ensemble members. Experimental results on the public movie review dataset demonstrate that the enriched representations improve polarity classification performance and that the proposed ensemble approach further improves overall performance.
      PubDate: 2015-07-09T09:02:42-07:00
      DOI: 10.1177/0165551515585264
      Issue No: Vol. 41, No. 4 (2015)
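The two ideas in the abstract can be sketched together: enriching a sparse bag of words with related-word and topic features, and building ensemble members that each see a different view of that representation. This is a toy illustration, not the authors' system. `RELATED`, `TOPIC` and `WEIGHTS` are hypothetical stand-ins for embedding neighbours, topic-model output and trained classifier weights, and majority voting is an assumed combination rule.

```python
from collections import Counter

# Hypothetical external resources: nearest neighbours from word embeddings
# and a word-to-topic assignment from a topic model.
RELATED = {"superb": ["excellent", "great"], "dull": ["boring"]}
TOPIC = {"plot": "story", "acting": "cast", "cast": "cast"}

# Hypothetical learned weights shared by all members.
WEIGHTS = {"w:superb": 1.0, "r:excellent": 1.0, "r:great": 0.5,
           "w:dull": -1.0, "r:boring": -1.0}


def enrich(tokens, use_related=True, use_topic=True):
    """Bag-of-words features optionally enriched with related-word and topic features."""
    feats = {f"w:{t}" for t in tokens}
    if use_related:
        feats |= {f"r:{r}" for t in tokens for r in RELATED.get(t, [])}
    if use_topic:
        feats |= {f"t:{TOPIC[t]}" for t in tokens if t in TOPIC}
    return feats


def make_member(use_related, use_topic):
    """A linear scorer that sees one view of the enriched representation."""
    def predict(tokens):
        score = sum(WEIGHTS.get(f, 0.0) for f in enrich(tokens, use_related, use_topic))
        return "pos" if score > 0 else "neg"
    return predict


def majority_vote(labels):
    """Most frequent label; ties go to the label that appears first."""
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)


members = [make_member(False, False), make_member(True, False), make_member(True, True)]
sentence = "a superb but dull plot".split()
votes = [m(sentence) for m in members]
print(votes, majority_vote(votes))  # ['neg', 'pos', 'pos'] pos
```

On the plain bag of words, "superb" and "dull" cancel out and the first member says "neg". The related-word features tip the other two members to "pos". Disagreement between members built on different feature views is what gives an ensemble something to combine.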
       
 
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327
 

JournalTOCs © 2009-2015