Command Line Syntax for the W3C Mini Robot

W3C Robot MANUAL

Command Line Syntax

The generic syntax is:

	webbot [ options ]  [ docaddress [ keywords ]]

Options

The order of the options is not important, and options can be specified on either side of the docaddress. The currently available options are:

Getting Help

-v [ a | b | c | g | p | s | t | u ]
Verbose mode: gives a running commentary on the program's attempts to read data in various ways. Because the amount of verbose output is substantial, the -v option can be followed by zero, one, or more of the following flags (without a space) to select which trace messages are generated:
  • a: Anchor relevant information
  • b: Bindings to local file system
  • c: Cache trace
  • g: SGML trace
  • p: Protocol module information
  • s: SGML/HTML relevant information
  • t: Thread trace
  • u: URI relevant information

The -v option without any appended flags shows all trace messages. An example is

	-vpt

which shows thread and protocol trace messages.

-version
Prints out the version number of the software, and the version number of the WWW library, and exits.
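
The way the -v flag letters combine can be sketched in a few lines of Python. This is only an illustration of the syntax described above, not part of the robot itself; the helper name verbose_option and the TRACE_FLAGS table are assumptions made for the example.

```python
# Trace flag letters as listed for the -v option.
TRACE_FLAGS = {
    "a": "anchor",
    "b": "bindings to local file system",
    "c": "cache",
    "g": "SGML",
    "p": "protocol module",
    "s": "SGML/HTML",
    "t": "thread",
    "u": "URI",
}


def verbose_option(flags=""):
    """Return the -v argument for the given flag letters (hypothetical helper)."""
    unknown = sorted(set(flags) - set(TRACE_FLAGS))
    if unknown:
        raise ValueError("unknown trace flag(s): " + "".join(unknown))
    # Flags are appended directly to -v, without a space.
    # Duplicates are dropped while the given order is kept.
    seen = []
    for flag in flags:
        if flag not in seen:
            seen.append(flag)
    return "-v" + "".join(seen)


print(verbose_option("pt"))  # -vpt
print(verbose_option())      # -v (all trace messages)
```

Rejecting unknown letters up front is a choice made for this sketch; how the robot itself treats an unrecognised flag letter is not stated here.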

Configuration Options

-img
Also test inlined images, using a HEAD request
-saveimg
Save the inlined images on local disk or pump them to a black hole. This is primarily to test true client behavior in the robot
-cache
Enable the libwww persistent cache
-cacheroot [ dir ]
Where should the cache be located? The default is /tmp/w3c-cache
-validate
Force validation using either the etag or the last-modified date provided by the server
-endvalidate
Force end-to-end validation by adding a max-age=0 cache control directive
-l [ file ]
Specifies a log file with a list of visited documents. The default value is "www-log"
-link [ n ]
Fetch all links from this document. An integer "n" given as the parameter specifies the depth to which the search should go. The default value is 0, which means that only the start page is searched. Level 1 indicates that the start page and all pages directly linked from the start page are searched.
-n
Non-interactive mode. Outputs the formatted document to the standard output, then exits. Pages are delimited with form feed (FF) characters.
-o [ file ]
Redirects output to the specified file. The default value is "www-out". This option forces non-interactive mode
-q
Quiet mode. Don't say anything at all
-nopipe
Do not use HTTP/1.1 pipelining. The default for this option can be set using the configure script during installation.
-delay [ n ]
Specify the write delay in ms, that is, how long we can wait before we flush the output buffer when using pipelining. The default value is 50 ms. The longer the delay, the bigger the TCP packets but also the longer the response time.
-r <file>
Rule file, a.k.a. configuration file. If this is specified, a rule file may be used to map URLs, and to set up other aspects of the behavior of the browser. Many rule files may be given with successive -r options, and a default rule file name may be given using the WWW_CONFIG environment variable.
-ss
Print out date and time for start and stop for the job.
-single
Single threaded mode. If this flag is set then the browser uses blocking, non-interruptible I/O in interactive mode. Non-interactive mode always uses blocking I/O.
-timeout <n>
Timeout in seconds on sockets
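
The depth semantics of the -link option can be illustrated with a small breadth-first walk over an in-memory link table. This is a sketch of the behavior as described above (0 means only the start page, 1 adds the pages linked directly from it), not the robot's actual traversal code; the function name pages_searched and the sample site are made up for the example.

```python
from collections import deque


def pages_searched(links, start, depth=0):
    """Return the pages reached from start within `depth` link hops, in visit order."""
    seen = [start]
    queue = deque([(start, 0)])
    while queue:
        page, level = queue.popleft()
        if level == depth:
            continue  # do not follow links beyond the requested depth
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append((target, level + 1))
    return seen


# Hypothetical site: the start page links to a and b, and a links to c.
site = {"start": ["a", "b"], "a": ["c"]}
print(pages_searched(site, "start", 0))  # ['start']
print(pages_searched(site, "start", 1))  # ['start', 'a', 'b']
print(pages_searched(site, "start", 2))  # ['start', 'a', 'b', 'c']
```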

docaddress

If present, the next argument (docaddress) is the hypertext address of the document at which you want to start browsing. You may want to define an alias for www followed by the name of your favorite index.

keywords

Any further command line arguments are taken as keywords. In this case the first argument (docaddress) must refer to an index. The index is searched for entries matching the keywords, and a list of matching entries is displayed.
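
To tie the pieces of the generic syntax together, here is a minimal Python sketch that assembles an argument list in the order webbot [ options ] [ docaddress [ keywords ]]. The helper name webbot_command, the output file name and the address are assumptions made for the example; only options documented above are used, and the command is printed rather than run.

```python
def webbot_command(options=(), docaddress=None, keywords=()):
    """Assemble an argument list: webbot [ options ] [ docaddress [ keywords ]]."""
    if keywords and docaddress is None:
        # Keywords are only meaningful after a docaddress that names an index.
        raise ValueError("keywords require a docaddress naming an index")
    args = ["webbot", *options]
    if docaddress is not None:
        args.append(docaddress)
        args.extend(keywords)
    return args


# Example: non-interactive run, follow links one level deep, write to out.txt.
cmd = webbot_command(["-n", "-link", "1", "-o", "out.txt"], "http://www.w3.org/")
print(" ".join(cmd))  # webbot -n -link 1 -o out.txt http://www.w3.org/
```

The sketch places all options before the docaddress for readability; as noted under Options, they may equally well follow it.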


Henrik Frystyk, libwww@w3.org,
@(#) $Id: CommandLine.html,v 1.11 1997/02/06 16:33:53 frystyk Exp $