Email Harvester
Also known as: Email scraper, Address harvester
A bot or script that crawls the web to scrape email addresses from public pages, directories, and leaked databases, building target lists for spam and phishing.
What is an email harvester?
An email harvester is an automated program that crawls the public internet for anything that looks like an email address, typically by pattern-matching strings that contain an @ followed by a plausible domain, and compiles the matches into target lists. The lists are then sold, leaked, or used directly to send spam and phishing. Harvesters are one of the oldest forms of abusive web crawling and a major reason many professional websites no longer put raw mailto: addresses on contact pages.
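To make the pattern-matching idea concrete from the defender's side, here is a minimal sketch that audits one of your own pages for strings a harvester would likely pick up. The regular expression, the function name find_exposed_addresses, and the sample HTML are illustrative assumptions; real harvesters use their own patterns, and this one is far looser than the full email address grammar.

```python
import re

# Assumed, deliberately simple pattern: local part, "@", one or more
# dot-separated domain labels, then a TLD of at least two letters.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def find_exposed_addresses(page_text):
    """Return the unique email-like strings in page_text, in first-seen order."""
    seen = {}  # dict keeps insertion order, so it doubles as an ordered set
    for match in EMAIL_PATTERN.finditer(page_text):
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)


# Hypothetical page content used only for this example.
sample_html = """
<p>Contact <a href="mailto:jane.doe@example.edu">jane.doe@example.edu</a></p>
<p>Obfuscated: press [at] example [dot] com</p>
"""

if __name__ == "__main__":
    for address in find_exposed_addresses(sample_html):
        print("exposed:", address)
```

In this sample the raw mailto: address is found while the [at]/[dot] form is not, which is the gap that simple obfuscation relies on.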
Where harvesters pull addresses from
- Published contact pages and staff directories — especially university, government, and corporate sites
- Mailing list archives and forum post footers
- WHOIS records for domain registrants (one reason many registrars now enable privacy protection by default)
- Leaked breach databases redistributed on underground forums
- Social media profiles that expose contact info
- GitHub commit metadata — every commit records the author's email
How it looks in server logs
Harvester traffic is a recognizable form of web crawler abuse. Typical signs are a single IP or a small set of IPs pulling thousands of pages per minute, ignoring robots.txt, and concentrating on pages likely to contain contact info. Some harvesters also send a forged User-Agent that mimics a legitimate search engine. Heavy harvester activity can overwhelm a small site through bandwidth consumption alone, before any of the collected addresses are ever abused.
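The request-rate signal described above can be checked with a short log scan. The sketch below assumes access logs in Common or Combined Log Format and counts requests per IP per calendar minute. The function name flag_bursty_ips and the default threshold of 300 requests per minute are assumptions for illustration, not recommended values; tune the threshold against your own normal traffic.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes Common/Combined Log Format lines such as:
# 203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /staff HTTP/1.1" 200 512
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)')


def flag_bursty_ips(lines, max_per_minute=300):
    """Return {ip: peak requests in one calendar minute} for IPs above the threshold."""
    per_minute = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        per_minute[(m.group("ip"), ts.replace(second=0))] += 1

    peaks = {}
    for (ip, _minute), count in per_minute.items():
        if count > max_per_minute:
            peaks[ip] = max(peaks.get(ip, 0), count)
    return peaks


# Synthetic log lines (documentation-range IPs) used only for this example.
sample_log = [
    f'203.0.113.7 - - [10/Oct/2024:13:55:{s:02d} +0000] "GET /staff/{s} HTTP/1.1" 200 512'
    for s in range(60)
]
sample_log.append('198.51.100.4 - - [10/Oct/2024:13:55:01 +0000] "GET / HTTP/1.1" 200 1024')

if __name__ == "__main__":
    print(flag_bursty_ips(sample_log, max_per_minute=50))
```

A per-minute count is only one signal. Combining it with the requested paths (staff directories, contact pages) and robots.txt violations would likely reduce false positives from legitimate crawlers.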
Defense
Common countermeasures: obfuscate email addresses on public pages (render them with JavaScript, or display them as name [at] example [dot] com), offer a contact form instead of a published address, enforce rate limits per IP, and block the IPs of known harvesters at the CDN or WAF layer. Running unfamiliar crawler IPs through an IP abuse report checker can flag known harvester infrastructure.
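Two of these countermeasures are small enough to sketch directly: the [at]/[dot] display form and a per-IP rate limit. Everything named here (obfuscate_email, SlidingWindowLimiter, the limit of 3 requests per 60 seconds) is a hypothetical illustration rather than any particular product's API, and in production the rate limit would more likely live in the CDN, WAF, or web server than in application code.

```python
import time
from collections import defaultdict, deque
from typing import Optional


def obfuscate_email(address: str) -> str:
    """Render 'name@example.com' as 'name [at] example [dot] com'."""
    local, _, domain = address.partition("@")
    return f"{local} [at] {domain.replace('.', ' [dot] ')}"


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        # 'now' is injectable so the behavior can be checked deterministically.
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    print(obfuscate_email("name@example.com"))
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
    # Four requests from one IP within a few seconds: the fourth is refused.
    print([limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)])
```

Text obfuscation of this kind only defeats simple pattern matchers, so it works best alongside the rate limiting and blocking measures rather than as a replacement for them.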