Abstract |
: |
In the current flood of digital information on the web, users depend heavily on the WWW for information retrieval, which makes extracting relevant data a challenging problem. Traditional web crawlers cover only the surface web, while the deep web keeps expanding behind the scenes, its databases hidden behind query interfaces. In this paper, we propose a Hidden Web Extractor (HWE) that can automatically discover and download data from Hidden Web databases. Since the only “entry point” to a Hidden Web site is a query interface, the main challenge a Hidden Web Extractor faces is how to automatically generate meaningful queries that reach the practically unlimited number of pages behind that interface. |