DataWake Prefetch searches the internet for user-provided keywords, scrapes the resulting pages, and gives the user a ranked list of the websites that contain those keywords.
How it works
DataWake Prefetch consists of a Firefox Add-on, a web server, and a distributed crawler. Searches are based on user-provided keywords. The pages returned by each search are then scraped and ranked against those keywords.
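As a rough illustration of the ranking step, the sketch below scores already-scraped pages by how often the user's keywords appear in them. It is not DataWake Prefetch's actual ranking code: the function names `score_page` and `rank_pages`, the example URLs, and the plain keyword-count scoring are all assumptions made for this example, and only single-word keywords are handled.

```python
import re
from collections import Counter


def score_page(text, keywords):
    """Count case-insensitive whole-word occurrences of the keywords in text.

    Hypothetical scoring rule for illustration only.
    """
    tokens = Counter(re.findall(r"\w+", text.lower()))
    return sum(tokens[k.lower()] for k in keywords)


def rank_pages(pages, keywords):
    """Return (url, score) pairs, highest score first, omitting pages with no hits."""
    scored = [(url, score_page(text, keywords)) for url, text in pages.items()]
    hits = [pair for pair in scored if pair[1] > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up scraped pages: URL -> extracted text.
pages = {
    "http://example.com/a": "Memex crawler indexes the deep web. The crawler runs daily.",
    "http://example.com/b": "A recipe for bread.",
    "http://example.com/c": "DARPA Memex program overview.",
}

ranked = rank_pages(pages, ["memex", "crawler"])
for url, score in ranked:
    print(score, url)
```

A production ranker would likely weight matches by position, normalise for page length, or use extracted entities rather than raw counts; the sketch only shows the overall shape of "score each scraped page, then sort".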
Prefetch Firefox plugin showing user-defined entities.
Prefetch Firefox plugin showing suggested websites.
Applied Technology
This work was funded by DARPA’s Memex program and leverages several technologies from DARPA’s Open Catalog. DataWake Prefetch is available on the Memex Open Catalog.
DataWake Prefetch utilizes the following DARPA technologies:
Scrapy Cluster - Distributed scraper
MITIE: MIT Information Extraction - Entity extractor
Tangelo - Python web framework