Abstract:
Most search engines today compete to index as much of the Surface Web as possible while neglecting OAI content (PDF documents), which holds far more information than the Surface Web. This paper proposes a novel framework for an OAI-PMH based crawler that uses agents to extract metadata about OAI resources and store it in a repository. The repository is later queried through the OAI-PMH layer to generate XML pages containing the metadata. These pages are then added to the search engine's repository for indexing, which in turn increases the relevancy of the search engine's results. Agents are used to parallelize the whole process so that metadata can be extracted from multiple resources simultaneously.
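The pipeline described above (parallel agents, a metadata repository, and an OAI-PMH layer that emits XML) can be illustrated with a minimal sketch. It is not the paper's implementation. The names `RESOURCES`, `extract_metadata`, `harvest`, and `list_records_xml` are hypothetical, the agents are modeled as threads in a pool, the in-memory dictionary stands in for real PDF parsing, and the XML is a simplified ListRecords-style page rather than a complete OAI-PMH response.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical stand-in for OAI resources; a real agent would fetch
# and parse the PDF documents instead of reading this dictionary.
RESOURCES = {
    "oai:example.org:1": {"title": "Paper A", "creator": "Author A"},
    "oai:example.org:2": {"title": "Paper B", "creator": "Author B"},
}


def extract_metadata(identifier):
    """One 'agent': return the metadata record for a single resource."""
    fields = RESOURCES[identifier]
    return {"identifier": identifier, **fields}


def harvest(identifiers, workers=4):
    """Run the agents in parallel and store the records in a repository
    (here just a dict keyed by identifier)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        records = list(pool.map(extract_metadata, identifiers))
    return {record["identifier"]: record for record in records}


def list_records_xml(repository):
    """Serialize the repository as a simplified ListRecords-style page."""
    root = ET.Element(f"{{{OAI_NS}}}OAI-PMH")
    list_records = ET.SubElement(root, f"{{{OAI_NS}}}ListRecords")
    for identifier in sorted(repository):
        rec = repository[identifier]
        record = ET.SubElement(list_records, f"{{{OAI_NS}}}record")
        header = ET.SubElement(record, f"{{{OAI_NS}}}header")
        ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = identifier
        metadata = ET.SubElement(record, f"{{{OAI_NS}}}metadata")
        ET.SubElement(metadata, f"{{{DC_NS}}}title").text = rec["title"]
        ET.SubElement(metadata, f"{{{DC_NS}}}creator").text = rec["creator"]
    return ET.tostring(root, encoding="unicode")


repository = harvest(list(RESOURCES))
xml_page = list_records_xml(repository)
```

The resulting `xml_page` string is the kind of page a search engine could then index. A thread pool is only one way to model the agents; a multi-process or distributed agent platform would follow the same extract, store, and serialize steps.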