DataWake Prefetch searches the internet for user-provided keywords, scrapes the matching pages, and gives the user a ranked list of the websites that contain those keywords.

How it works

DataWake Prefetch consists of a Firefox add-on, a web server, and a distributed crawler. Searches are based on user-provided keywords. The pages returned by each search are then scraped and ranked against those keywords.
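The ranking step can be illustrated with a minimal Python sketch. This is not DataWake Prefetch's actual scoring code, which the description above does not specify. It assumes the simplest possible scheme: each page is scored by how many times the single-word keywords occur in its scraped text. The function names (keyword_score, rank_pages) and the example URLs are hypothetical.

```python
import re
from collections import Counter


def keyword_score(text, keywords):
    """Sum the case-insensitive occurrences of each single-word keyword in text."""
    tokens = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(tokens[k.lower()] for k in keywords)


def rank_pages(pages, keywords):
    """Return (url, score) pairs, highest score first; pages with no hits are dropped."""
    scored = [(url, keyword_score(text, keywords)) for url, text in pages.items()]
    hits = [(url, score) for url, score in scored if score > 0]
    # Sort by descending score, then by URL so ties are ordered deterministically.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical scraped pages: URL -> extracted text (assumed input shape).
pages = {
    "http://example.com/a": "Memex crawler indexes the deep web. The crawler is distributed.",
    "http://example.com/b": "A recipe for soup.",
    "http://example.com/c": "Memex program overview.",
}

ranking = rank_pages(pages, ["memex", "crawler"])
for url, score in ranking:
    print(score, url)
```

A production system would likely weight keywords, handle multi-word phrases, and use extracted entities rather than raw token counts. The sketch only shows the scrape-then-rank flow.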

Figure: Prefetch Firefox plugin showing the user-defined entities list.

Figure: Prefetch Firefox plugin showing suggested websites.

Applied Technology

This work was funded by DARPA’s Memex program and leverages several technologies from DARPA’s Open Catalog. DataWake Prefetch is available on the Memex Open Catalog.

DataWake Prefetch uses the following DARPA technologies:

Scrapy Cluster - Distributed scraper
MITIE: MIT Information Extraction - Entity extractor
Tangelo - Python web framework