The Methabot Project

A fast, scriptable web crawler system

Differences

This shows you the differences between the selected revision and the current version of the page.

Old revision: docs:parser_chaining 2009/02/24 00:14
New revision: docs:parser_chaining 2009/02/24 00:16 (current)
Line 23:
 }
 </code>
 +
 +Furthermore, if your parser only extracts meta-information about the page and does not extract URLs, you can chain the default HTML parser after it, and the HTML parser will extract all URLs for you:
 +<code java>
 +filetype["your_filetype"]
 +{
 +    parser = "xmlconv, yourfile.js/yourparser, html";
 +}
 +</code>
 +
 
 
docs/parser_chaining.txt · Last modified: 2009/02/24 00:16 by sdac