e-ISSN : 0975-3397
Print ISSN : 2229-5631

ABSTRACT

Title : Record Matching : Improving Performance in Classification
Authors : Cyju Elizabeth Varghese, G. Naveen Sundar
Keywords : Record Matching; Duplication Detection; SVM; Unsupervised
Issue Date : March 2011.
Abstract :
Duplication detection identifies records that represent the same real-world entity, and it is a vital process in data integration. Record matching refers to the task of finding entries that refer to the same entity in two or more files. Because record matching solves the duplication detection problem, a suitable record matching technique must be identified. Current duplication detection techniques are supervised and require the user to provide training data. These methods are not applicable to the Web database scenario, where the records to match are query results generated dynamically, on the fly. To address the problem of record matching in the Web database scenario, we present Fast Duplication Detection (FDD), which, for a given query, can effectively identify duplicates among the query result records of multiple Web databases. Starting from the non-duplicate set, we use two classifiers, a dynamic classification classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases. Clustering the vectors before passing them to the classifiers should produce better results. Moreover, a nonlinear SVM produces better results on noisy documents, which improves the overall performance of the system. Experimental results show that FDD performs better in the Web database scenario.
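The record matching task described in the abstract can be illustrated with a minimal sketch. This is not the FDD method: a simple mean-similarity threshold stands in for the two classifiers, and the record fields, the sample data, the function names `similarity_vector` and `match_records`, and the threshold of 0.85 are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity_vector(a, b, fields):
    """Per-field string similarity in [0, 1] between two records."""
    return [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in fields
    ]


def match_records(source_a, source_b, fields, threshold=0.85):
    """Return index pairs whose mean field similarity reaches the threshold.

    The threshold is a stand-in for a trained classifier (assumption).
    """
    matches = []
    for (i, a), (j, b) in product(enumerate(source_a), enumerate(source_b)):
        vec = similarity_vector(a, b, fields)
        if sum(vec) / len(vec) >= threshold:
            matches.append((i, j))
    return matches


# Hypothetical query results from two Web databases.
db1 = [
    {"title": "Data Mining Concepts", "author": "J. Han"},
    {"title": "Pattern Recognition", "author": "C. Bishop"},
]
db2 = [
    {"title": "data mining concepts", "author": "J Han"},
    {"title": "Operating Systems", "author": "A. Tanenbaum"},
]

pairs = match_records(db1, db2, ["title", "author"])
print(pairs)
```

Each pair of records is reduced to a vector of field similarities, which is the kind of input a classifier such as an SVM would receive in place of the fixed threshold used here.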
Page(s) : 1207-1212
ISSN : 0975–3397
Source : Vol. 3, Issue.03

All Rights Reserved © 2009-2024 Engg Journals Publications