====== What is Methabot? ======

**Methabot** is an open-source web crawler and command-line tool **optimized for speed**. It supports scripted filetype parsing and a wide variety of customization options, and it is easily configured to fit anyone's particular needs.

**WEBSITE MOVED**: This project has moved to a new website: http://metha-sys.org/

===== Latest News =====

{{rss>http://sourceforge.net/export/rss2_projnews.php?group_id=193450 4 date}}

===== Features =====

Methabot has a rich feature set. Some of its features, though not all, are listed below.

  * Fast: designed from the ground up with speed optimization in mind.
  * Scriptable through JavaScript with E4X.
  * User-defined filetype filtering (by MIME type, file extension or UMEX expression).
  * Multi-threaded.
  * Highly configurable from the command line.
  * Extensible module system, supporting custom data parsers, filters and protocol handlers.
  * MySQL support through the JavaScript-MySQL binding (lmm_mysql).
  * Simple yet powerful filtering of URLs through UMEX.
  * Automated downloading.
  * Automatic cookie handling when running over HTTP.
  * Support for the Robots Exclusion Standard.
  * Reliable, fault-tolerant networking, with redirect-loop detection and some spider-trap detection.
  * Parser chaining: share data easily between C and JavaScript parsers.
  * Unix-friendly interface: data can be piped in and out for parsing and crawling.
  * HTML to XML/XHTML conversion.
  * Portable: tested successfully on 32-bit/64-bit Linux 2.6, 32-bit/64-bit FreeBSD 6.x/7.0 and Mac OS X. It should work on almost any Unix-like OS. The current version has partial support for Windows; old versions of Methabot have full support for Windows.

===== Further Project Information =====

http://metha-sys.org/