Journal of Information Science
Subscription journal
ISSN (Print) 0165-5515 - ISSN (Online) 1741-6485
Published by Sage Publications
-
Duplicate bibliographic record detection with an OCR-converted source of information
- Authors:
Taniguchi, S.
Pages: 153 - 168
Abstract: Duplicate record detection has been an important issue in the fields of data and records management, and various detection methods have been proposed. A new method, which uses an optical character recognition (OCR)-converted source of information for record matching to detect duplicates, is proposed and examined in this paper. First, the design of an experiment for examining the performance of such a duplicate detection method is discussed. The base record set, with an OCR-converted title page and its verso, was prepared along with two test record sets from different union catalogues, and duplicate records between the base record set and the test sets were manually identified. A duplicate detection system was developed to execute matching (1) between records, (2) between a record and an OCR-converted source of information and (3) using a combination of these. Second, matching performance at the individual data element level is examined. Third, the performance of duplicate record detection based on matching at the element level is examined through rule-based detection and machine learning-based detection. The results of the experiment show that incorporating the source of information into duplicate detection is useful to a certain extent.
PubDate: 2013-03-25T07:27:50-07:00
DOI: 10.1177/0165551512459923
Issue No: Vol. 39, No. 2 (2013)
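Matching mode (2) in the abstract above, between a record and an OCR-converted source of information, can be pictured with a small sketch. This is not the paper's system: the function names, the normalization, the sliding-window comparison and the 0.8 threshold are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so OCR noise matters less."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def element_score(element, ocr_text):
    """Best similarity between a record element and any word window of the
    OCR text that has the same number of words as the element."""
    target = normalize(element)
    words = normalize(ocr_text).split()
    n = len(target.split())
    if n == 0 or len(words) < n:
        return 0.0
    windows = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return max(SequenceMatcher(None, target, w).ratio() for w in windows)


def is_duplicate_candidate(record, ocr_text, threshold=0.8):
    """Hypothetical rule: every element of the record must be found in the
    OCR text with a similarity at or above the (assumed) threshold."""
    return all(element_score(v, ocr_text) >= threshold for v in record.values())


# OCR output with a typical misread character ("l" for "I").
ocr_page = "lnformation Retrieval\nPublished by Sage 2013"
record = {"title": "Information Retrieval", "publisher": "Sage"}
print(is_duplicate_candidate(record, ocr_page))
```

A rule of this kind tolerates single-character OCR errors because the comparison is approximate; a machine learning-based detector would instead take the per-element scores as features.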
-
Comparing social tags with subject headings on annotating books: A study comparing the information science domain in English and Chinese
- Authors:
Wu, D; He, D; Qiu, J; Lin, R; Liu, Y.
Pages: 169 - 187
Abstract: The literature often views the emergence of social tagging as a potential alternative method to controlled vocabulary for organizing and indexing large-scale information resources. In this paper, we present an in-depth examination of the relationship between social tagging and controlled vocabulary-based indexing and organization in two contexts: the information science domain and a comparison of data gathered from English and Chinese sources. Our results show that the information science domain has more overlap between social tags and controlled vocabulary-based subject terms. This is reflected in the higher percentage of overlapping terms between tags and subject terms, as well as in the strong similarity (measured by Jaccard’s coefficient) in frequently used keywords among tags and subject terms. However, social tags in the information science domain are still limited by uncontrolled terms, where inconsistencies and noisy usages exist. Our results also show that language difference does have an impact on social tagging. The numbers of Chinese tags overall and per book are lower than those of English tags. The most frequently used English tags are single-word terms, which differ from multi-word controlled vocabulary terms. In comparison, the most frequently used Chinese tags and Chinese subject terms differ in length by just one character (3 vs 4). However, English and Chinese users do share many similar behaviours when they tag books in the information science domain. Many of the most frequently used tags are shared between the two languages and the patterns of overlap between topical tags and subject terms are also similar between the two languages. Overall, despite the application limitations for social tagging in cataloguing and indexing, we believe that tagging has the potential to become a complementary resource for expanding and enriching controlled vocabulary systems. With the help of future technology to regulate and promote features related to controlled vocabulary in social tags, a hybrid cataloguing and indexing system that integrates social tags with controlled vocabulary would greatly improve people’s organizational and access capabilities within information resources.
PubDate: 2013-03-25T07:27:50-07:00
DOI: 10.1177/0165551512451808
Issue No: Vol. 39, No. 2 (2013)
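The abstract above measures similarity between tags and subject terms with Jaccard's coefficient, which is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows; the sample tags and subject terms are invented for illustration and are not data from the study.

```python
def jaccard(a, b):
    """Jaccard coefficient of two collections: |A intersect B| / |A union B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)


# Hypothetical example values.
tags = ["information retrieval", "search", "library", "web"]
subject_terms = ["information retrieval", "library", "cataloguing"]
print(jaccard(tags, subject_terms))  # 2 shared terms / 5 distinct terms = 0.4
```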
-
Website content persistence and change: Longitudinal analysis of pro-white group identity
- Authors:
McCluskey, M.
Pages: 188 - 197
Abstract: Despite the ability of websites to quickly evolve, little attention has been paid to persistence and change in site content. Longitudinal examination of 163 pro-white advocacy group websites, in which establishing a core group identity is a critical strategic goal, showed a half-life of 2.40 years and 34% remained active after five years. Analysis of text content from 28 sites collected annually from 2007 to 2012 (n = 1947) showed that persistence was more likely for advocacy group identity, while examples of group goals were transient. Content persistence trends reflect broader phenomena of ideologically oriented website persuasive material.
PubDate: 2013-03-25T07:27:50-07:00
DOI: 10.1177/0165551512464148
Issue No: Vol. 39, No. 2 (2013)
-
SRank: Shortest paths as distance between nodes of a graph with application to RDF clustering
- Authors:
Khosravi-Farsani, H; Nematbakhsh, M; Lausen, G.
Pages: 198 - 210
Abstract: Similarity estimation between interconnected objects appears in many real-world applications and many domain-related measures have been proposed. This work proposes a new perspective on specifying the similarity between resources in linked data, and in general for vertices of a directed graph. More specifically, we compute a measure that says ‘two objects are similar if they are connected by multiple small-length shortest paths’. This general similarity measure, called SRank, is based on simple and intuitive shortest paths. For a given domain, SRank can be combined with other domain-specific similarity measures. The suggested model is evaluated in a clustering procedure on sample data from the DBpedia knowledge base, where the class label of each resource is estimated and compared with the ground-truth class label. Experimental results show that SRank outperforms other similarity measures in terms of precision and recall rate.
PubDate: 2013-03-25T07:27:50-07:00
DOI: 10.1177/0165551512463994
Issue No: Vol. 39, No. 2 (2013)
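The intuition quoted in the abstract above, that two objects are similar when they are connected by multiple short shortest paths, can be illustrated with a breadth-first search that returns both the length and the number of shortest paths between two vertices. The abstract does not give the SRank formula, so the scoring function below (`toy_similarity`) is a hypothetical stand-in, not SRank itself; the graph is also invented.

```python
from collections import deque


def shortest_path_stats(graph, source, target):
    """Return (length, count) of the shortest directed paths from source to
    target in an adjacency-list graph, or (None, 0) if target is unreachable."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                # First visit: this is the shortest distance to nxt.
                dist[nxt] = dist[node] + 1
                count[nxt] = count[node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another shortest path reaches nxt through node.
                count[nxt] += count[node]
    if target not in dist:
        return None, 0
    return dist[target], count[target]


def toy_similarity(graph, a, b):
    """Assumed score in [0, 1]: grows with the number of shortest paths and
    shrinks with their length. Not the published SRank definition."""
    if a == b:
        return 1.0
    length, count = shortest_path_stats(graph, a, b)
    if count == 0:
        return 0.0
    return count / (count + length)


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(shortest_path_stats(graph, "a", "d"))  # (2, 2): a-b-d and a-c-d
print(toy_similarity(graph, "a", "d"))       # 2 / (2 + 2) = 0.5
```

Because the graph is directed, this toy score is not symmetric: `toy_similarity(graph, "e", "a")` is 0.0 here even though a reaches e.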
-
General learning approach for event extraction: Case of management change event
- Authors:
Elloumi, S; Jaoua, A; Ferjani, F; Semmar, N; Besancon, R; Al-Jaam, J; Hammami, H.
Pages: 211 - 224
Abstract: Starting from an ontology of a targeted financial domain corresponding to transaction, performance and management change news, relevant segments of text containing at least a domain keyword are extracted. The linguistic pattern of each segment is automatically generated to serve initially as a learning model. Each pattern is composed of named entities, keywords and articulation words. Some generic named entities like organizations, persons, locations, dates and grammatical annotations are generated by an automatic tool. During the learning step, each relevant segment is manually annotated with respect to the targeted entities (roles) structuring an event of the ontology. Information extraction is processed by associating a role with a specific entity. By alignment of generic entities to specific entities, some strings of a text are automatically annotated. An original learning approach is presented. Experiments with the management change event showed how recognition rates are improved by using different generalization tools.
PubDate: 2013-03-25T07:27:50-07:00
DOI: 10.1177/0165551512464140
Issue No: Vol. 39, No. 2 (2013)
-
Word sense disambiguation based on positional weighted context
- Authors:
Huang, S; Zheng, X; Kang, H; Chen, D.
Pages: 225 - 237
Abstract: Word sense disambiguation (WSD) is a key factor in solving natural language processing problems. The purpose of WSD is to make computers automatically determine the specific meaning of a word in a specific context. In this regard, state-of-the-art studies have focussed on the co-occurrences of words to measure context similarity. However, a problem with these approaches is that they consider all the words within a certain range to have equal influence on the ambiguous word. In this paper, we propose a position-based algorithm for measuring context similarity. By assigning positional weights to context words, we compare the context similarity between a new instance and pre-labelled instances to determine the appropriate sense of the ambiguous word. Experiments on the Senseval-2 English lexical sample task showed that our algorithm can achieve good precision and recall. Even in a minimally supervised state, it performs well with few training instances.
PubDate: 2013-03-25T07:27:51-07:00
DOI: 10.1177/0165551512459919
Issue No: Vol. 39, No. 2 (2013)
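The idea in the abstract above, that context words closer to the ambiguous word should count for more, can be sketched as follows. The abstract does not state the weighting scheme, so the 1/distance decay, the window size of 3 and the weighted-overlap similarity are assumptions, and the labelled sentences are invented examples rather than Senseval-2 data.

```python
def weighted_context(tokens, target_index, window=3):
    """Map each context word to a weight that decays with its distance from
    the target word (assumed decay: 1 / distance)."""
    weights = {}
    start = max(0, target_index - window)
    stop = min(len(tokens), target_index + window + 1)
    for i in range(start, stop):
        if i == target_index:
            continue
        weight = 1.0 / abs(i - target_index)
        word = tokens[i].lower()
        # Keep the strongest weight if a word occurs more than once.
        weights[word] = max(weights.get(word, 0.0), weight)
    return weights


def context_similarity(ctx_a, ctx_b):
    """Weighted overlap of two contexts, normalised to the range [0, 1]."""
    shared = sum(min(ctx_a[w], ctx_b[w]) for w in ctx_a.keys() & ctx_b.keys())
    total = sum(max(ctx_a.get(w, 0.0), ctx_b.get(w, 0.0))
                for w in ctx_a.keys() | ctx_b.keys())
    return shared / total if total else 0.0


def disambiguate(tokens, target_index, labelled, window=3):
    """Return the sense of the most similar pre-labelled instance.
    `labelled` is a list of (tokens, target_index, sense) tuples."""
    ctx = weighted_context(tokens, target_index, window)
    best = max(
        labelled,
        key=lambda item: context_similarity(
            ctx, weighted_context(item[0], item[1], window)),
    )
    return best[2]


labelled = [
    ("he sat on the river bank fishing".split(), 5, "shore"),
    ("she deposited money at the bank today".split(), 5, "finance"),
]
print(disambiguate("we walked along the river bank".split(), 5, labelled))  # shore
```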
-
Backward inference and pruning for RDF change detection using RDBMS
- Authors:
Im, D.-H; Lee, S.-W; Kim, H.-J.
Pages: 238 - 255
Abstract: Recent studies on change detection for RDF data have focused on minimizing the delta size, and forward-chaining inference has been widely employed as a way to exploit the semantics of RDF models in reducing it. However, since forward-chaining inference must pre-compute the entire closure of the RDF model, the existing approaches do not scale to large RDF data sets. In this paper, we propose a scalable change detection scheme for RDF data, which is based on backward-chaining inference and pruning. Our scheme, instead of pre-computing the full closure, computes only the necessary closure on the fly, thus achieving fast and scalable change detection. In addition, for any two RDF input files to be compared, the delta obtained from our scheme is always equivalent to the one obtained from existing forward-chaining inference. To handle RDF data sets too large to fit in the available RAM, we also present an SQL-based implementation of our scheme. Our experimental results show that our scheme, in comparison to the existing schemes, can reduce the number of inference triples for RDF change detection by 10–60%.
PubDate: 2013-03-25T07:27:51-07:00
DOI: 10.1177/0165551512463650
Issue No: Vol. 39, No. 2 (2013)
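The contrast drawn in the abstract above, between pre-computing a full closure and inferring only what is needed on the fly, can be shown on a toy scale. The sketch below is an assumption-laden illustration, not the paper's scheme: it supports only one inference rule (transitivity of rdfs:subClassOf), represents triples as plain Python tuples, and uses in-memory sets rather than an RDBMS.

```python
SUBCLASS = "rdfs:subClassOf"


def entails(triples, triple):
    """Backward-chaining check: is `triple` explicit in `triples`, or derivable
    from them by subClassOf transitivity? Only the part of the class hierarchy
    reachable from the triple's subject is explored."""
    if triple in triples:
        return True
    s, p, o = triple
    if p != SUBCLASS:
        return False
    seen, stack = {s}, [s]
    while stack:
        node = stack.pop()
        for s2, p2, o2 in triples:
            if s2 == node and p2 == SUBCLASS and o2 not in seen:
                if o2 == o:
                    return True
                seen.add(o2)
                stack.append(o2)
    return False


def delta(old, new):
    """Return (added, deleted) triple sets, pruning any changed triple that
    the other version already entails."""
    added = {t for t in new - old if not entails(old, t)}
    deleted = {t for t in old - new if not entails(new, t)}
    return added, deleted


old = {("A", SUBCLASS, "B"), ("B", SUBCLASS, "C")}
new = old | {("A", SUBCLASS, "C"), ("D", SUBCLASS, "A")}
print(delta(old, new))  # ({('D', 'rdfs:subClassOf', 'A')}, set())
```

Here ("A", rdfs:subClassOf, "C") is pruned from the delta because the old version already implies it, and that is established without materializing the closure of either version.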
-
Online copyright enforcement by Internet Service Providers
- Authors:
Muir, A.
Pages: 256 - 269
Abstract: The culture of online sharing of information on the Internet extends to unauthorized sharing of copyright content, and is perceived as a major threat to copyright owners and content industries. Enforcement of existing copyright laws is difficult owing to the widespread nature of unauthorized sharing. Rights holders have pursued individuals and organizations involved through existing legal channels, with limited success. They have also engaged in voluntary arrangements with Internet Service Providers to educate and, potentially, punish infringers. Governments have more recently become involved in developing new legislation with similar aims. The approaches to addressing the issue have been controversial, mainly because of lack of transparency in their development and concerns about their potential impact on the rights of individuals. The approaches to addressing online copyright infringement are described. The nature of the policy-making process and its impact on how legal measures are perceived are analysed. The potential impact of measures on the rights of subscribers is discussed. A key conclusion is that new measures to combat unauthorized file sharing need not, in principle, adversely affect the balance between rights, but the design and implementation of legal measures do raise concerns in terms of necessity and proportionality.
PubDate: 2013-03-25T07:27:51-07:00
DOI: 10.1177/0165551512463992
Issue No: Vol. 39, No. 2 (2013)
-
Rethinking the liaisons between Intellectual Capital Management and Knowledge Management
- Authors:
Hendriks, P. H. J; Sousa, C. A. A.
Pages: 270 - 285
Abstract: Intellectual Capital Management (ICM) and Knowledge Management (KM), two highly popular topics in current management discussions, are often bracketed together. The common understanding of ICM is that concepts of measurement, reporting and valuation most distinctively define this perspective, whereas KM connects debates about organizational knowledge with possibilities and limitations of management. That raises the question of how the management focus on knowledge in KM discussions is connected to the valuation and measurement approaches of ICM. An extensive review of the literature shows that knowledge plays a background role in Intellectual Capital (IC) measurement discussions. Reference to knowledge as an intangible asset appears more rhetorical than based on an in-depth understanding of what knowledge as an organizational resource or capability is or is not. More particularly, the predominant view of knowledge in IC measurement discussions is a neo-functionalist, possession approach, even if flow elements of knowledge are used to supplement stock elements. Critical understandings of knowledge, for instance as practice-based dispute, are virtually absent from the ICM discussions. What the blind spots identified in the review highlight is that ICM and KM discussions, which are presently mostly developed in isolation, should set up more meaningful and elaborated liaisons than are currently established. Two important areas for building such liaisons include (1) the perusal of the contextual, possibly disputed and power-related nature of knowledge in relation to measurement and (2) developing a systematic approach to understanding what measuring or not measuring does to organizational knowledge.
PubDate: 2013-03-25T07:27:51-07:00
DOI: 10.1177/0165551512463995
Issue No: Vol. 39, No. 2 (2013)



