e-ISSN : 0975-4024 p-ISSN : 2319-8613   
CODEN : IJETIY    

International Journal of Engineering and Technology

ABSTRACT

Title : DISTRIBUTED APPROACH TO WEB PAGE CATEGORIZATION USING MAP-REDUCE PROGRAMMING MODEL
Authors : P.Malarvizhi, Ramachandra V.Pujeri
Keywords : P.Malarvizhi, Ramachandra V.Pujeri
Issue Date : Dec 2011-Jan 2012
Abstract :
The web is a large repository of information, and categorizing web documents is essential to facilitate the search and retrieval of pages from it. Automatic classification of web pages is an effective means of handling the complexity of information retrieval from the internet. Although many automatic classification algorithms and systems have been presented, most existing approaches are computationally demanding. To overcome this challenge, we propose a parallel algorithm based on the MapReduce programming model to categorize web pages automatically. The approach combines three components: a web crawler, the MapReduce programming model, and the proposed web page categorization approach. First, a web crawler mines the World Wide Web, and the crawled web pages are given directly as input to the MapReduce programming model. The MapReduce programming model, adapted to our proposed web page categorization approach, then finds the appropriate category of each web page according to its content. The experimental results show that our proposed parallel web page categorization approach achieves satisfactory results in finding the right category for a given web page.
Page(s) : 373-386
ISSN : 0975-4024
Source : Vol. 3, No.6
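
The pipeline outlined in the abstract (crawled pages fed to a map step, then grouped and reduced by category) can be illustrated with a minimal, single-process Python sketch. This is not the authors' implementation: the abstract does not describe the categories, the features, or the scoring rule, so the keyword lists, the "unknown" fallback, and the function names `map_page`, `shuffle`, `reduce_category`, and `categorize` below are all hypothetical stand-ins chosen only to show the map, shuffle, and reduce stages.

```python
from collections import defaultdict

# Hypothetical keyword lists (assumption): the abstract does not state
# which categories or features the proposed approach actually uses.
CATEGORY_KEYWORDS = {
    "sports": {"match", "team", "score"},
    "technology": {"software", "computer", "network"},
}


def map_page(url, text):
    """Map step: emit one (category, url) pair for a crawled page."""
    words = text.lower().split()
    # Count how many words of the page fall in each category's keyword set.
    scores = {
        category: sum(word in keywords for word in words)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Pages that match no keyword at all go to a fallback category.
    yield (best if scores[best] > 0 else "unknown"), url


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_category(category, urls):
    """Reduce step: collect the URLs assigned to one category."""
    return category, sorted(urls)


def categorize(pages):
    """Run map, shuffle, and reduce over a dict of {url: page text}."""
    pairs = [pair for url, text in pages.items() for pair in map_page(url, text)]
    return dict(reduce_category(c, urls) for c, urls in shuffle(pairs).items())
```

In a real MapReduce deployment the map calls would run in parallel across the crawled pages and the framework would perform the shuffle; here both are simulated in one process to keep the sketch self-contained.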