The Methabot Project

A fast, scriptable web crawler system

Differences

This shows you the differences between the selected revision and the current version of the page.

option_reference 2009/01/16 09:50 | option_reference 2009/02/16 13:23 (current)
Line 6: Line 6:
| %%-D%% | %%--depth-limit%%    | (int)  | Decides how deep Methabot will crawl | | %%-D%% | %%--depth-limit%%    | (int)  | Decides how deep Methabot will crawl |
| %%-e%% | %%--external%%      |        | If set, external URLs will not be discarded, temporarily disabled | | %%-e%% | %%--external%%      |        | If set, external URLs will not be discarded, temporarily disabled |
 +| %%-j%% | %%--jail%%          |        | Restrict the crawling to only subfolders |
 +|        | %%--spread%%        |        | Spread workers on multiple hosts |
| %%-p%% | %%--external-peek%%  | (int)  | Peek at external URLs for specified depth | | %%-p%% | %%--external-peek%%  | (int)  | Peek at external URLs for specified depth |
 +| %%-r%% | %%--robotstxt%%      |        | Enable robots.txt fetching and parsing |
|        | %%--dynamic-url%%    | (str)  | How to handle dynamic URLs (containing a '?') | |        | %%--dynamic-url%%    | (str)  | How to handle dynamic URLs (containing a '?') |
|        | %%--extless-url%%    | (str)  | How to handle extensionless file URLs | |        | %%--extless-url%%    | (str)  | How to handle extensionless file URLs |
-|        | %%--dir-url%%        | (str)  | How to handle directory URLs (ending with '/') |
+|        | %%--dir-url%%        | (str)  | How to handle directory URLs (ending with '/')
 +|        | %%--unknown-url%%    | (str)  | How to handle completely unknown URLs | 
 +|        | %%--crawler%%        | (str)  | Set to the name of the crawler to modify |
| %%-t%% | %%--extensions%%    | (list)  | A comma-separated list of file extensions | | %%-t%% | %%--extensions%%    | (list)  | A comma-separated list of file extensions |
| %%-x%% | %%--expr%%          | (expr)  | [[UMEX]] expression | | %%-x%% | %%--expr%%          | (expr)  | [[UMEX]] expression |
| %%-m%% | %%--mimetypes%%      | (list)  | A comma-separated list of MIME types | | %%-m%% | %%--mimetypes%%      | (list)  | A comma-separated list of MIME types |
|        | %%--parser%%        | (name)  | Set the parser | |        | %%--parser%%        | (name)  | Set the parser |
-|        | %%--modify%%         | (name)  | Modify an already defined filetype with name
+|        | %%--filetype%%       | (name)  | Set to the name of the filetype to modify
-| %%-E%% | %%--global-expr%%   | (expr)  | Global URL expression |
+| %%-s%% | %%--silent%%         |         | Don't display as much output |
| %%-N%% | %%--num-pipelines%%  | (int)  | Max concurrent HEAD requests. Default is 8 | | %%-N%% | %%--num-pipelines%%  | (int)  | Max concurrent HEAD requests. Default is 8 |
| %%-n%% | %%--num-workers%%    | (int)  | Max concurrent worker threads. Default is 1 | | %%-n%% | %%--num-workers%%    | (int)  | Max concurrent worker threads. Default is 1 |
Line 22: Line 27:
| %%-d%% | %%--download%%      |        | Download matched URLs without parsers | | %%-d%% | %%--download%%      |        | Download matched URLs without parsers |
| %%-c%% | %%--enable-cookies%% |        | Enable automatic cookie handling | | %%-c%% | %%--enable-cookies%% |        | Enable automatic cookie handling |
-|       | %%--spread%%         |         | Spread workers on multiple hosts |
+| %%-T%% | %%--type%%           | (string)| Filetype of first URL(s)/stdin |
|        | %%--config%%        | (files) | Extra configuration files (relative or absolute path) | |        | %%--config%%        | (files) | Extra configuration files (relative or absolute path) |
|        | %%--examples%%      |        | Example usage | |        | %%--examples%%      |        | Example usage |
Line 28: Line 33:
|        | %%--proxy%%          | user:pwd@host | Set proxy server accordingly | |        | %%--proxy%%          | user:pwd@host | Set proxy server accordingly |
|        | %%--license%%        |        | Show the Methabot license | |        | %%--license%%        |        | Show the Methabot license |
-|        | %%--type%%          | (str)  | Filetype of first URL(s)/stdin | 
-| %%-v%% | %%--verbose%%        |        | Display status information | 
|        | %%--io-verbose%%    |        | Display network IO information | |        | %%--io-verbose%%    |        | Display network IO information |
-|       | %%--version%%        |        | Output version information |
+| %%-v%% | %%--version%%        |        | Output version information |
| %%-C%% | %%--working-dir%%    |        | Change the working directory | | %%-C%% | %%--working-dir%%    |        | Change the working directory |
-| %%-%%% |                      |        | Print debug information | 
| %%-%%  |                      |        | Read from stdin | | %%-%%  |                      |        | Read from stdin |
 
 
option_reference.txt · Last modified: 2009/02/16 13:23 by pajlada