e-ISSN : 0975-3397
Print ISSN : 2229-5631

ABSTRACT

Title : Query Based Duplicate Data Detection on WWW
Authors : Ranjna Gupta, Neelam Duhan, A.K. Sharma, Neha Aggarwal
Keywords : WWW; Query log; Cluster; Search Engine; Ranking Algorithm
Issue Date : July 2010
Abstract :
The problem of finding relevant documents has become more prominent because of duplicate data on the WWW. This redundancy increases the time users spend seeking the desired information within search results, when in general most users just want to scan through tens of result pages to find new or different results. Identifying similar or near-duplicate pairs in a large collection is a significant problem with widespread applications. A contemporary instance of this problem is the efficient identification of near-duplicate Web pages, which is especially challenging at web scale because of the volume of data. A mechanism is therefore needed to detect duplicate data so that relevant search results can be provided to the user. This paper proposes an architecture whose methods run both online and offline, on the basis of favored and disfavored user queries, to detect duplicates and near-duplicates.
Page(s) : 1395-1400
ISSN : 0975-3397
Source : Vol. 2, Issue 4

All Rights Reserved © 2009-2024 Engg Journals Publications