Webmirror 2.0, auth

auth and noauth define the username and password used to retrieve pages. You can define different usernames and passwords for different parts of the retrieval domain.
auth http://www.mydomain.com/* REALM* username:pass word
noauth http://www.mydomain.com/*
The command auth says that the username:pass word authentication string should be used whenever a page URL fits the domain pattern http://www.mydomain.com/* and the authentication REALM fits the pattern REALM*.

The program ignores the realm parameter on the first attempt to retrieve a page, because no realm is known at that time.

If the retrieval fails with the error "401 Not Authenticated", the program checks whether there are more auth commands that fit the page URL and the realm. If any are left, it tries to download the page again using the next authentication string.
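The selection of matching auth commands described above can be sketched as follows. This is only an illustration, not Webmirror's own code: the function names parse_auth_line and candidates are hypothetical, a realm pattern is assumed to contain no spaces, and shell-style matching via Python's fnmatch is an assumption about how the * wildcard behaves.

```python
from fnmatch import fnmatchcase


def parse_auth_line(line):
    """Split an 'auth' line into (url_pattern, realm_pattern, auth_string)."""
    # maxsplit=3 keeps any spaces inside the authentication string intact.
    keyword, url_pattern, realm_pattern, auth_string = line.split(None, 3)
    if keyword != "auth":
        raise ValueError("not an auth command: " + line)
    return url_pattern, realm_pattern, auth_string


def candidates(auth_lines, url, realm=None):
    """Return the authentication strings that fit the URL, in file order.

    realm=None models the first attempt, when no realm is known yet,
    so the realm pattern is not checked at all.
    """
    result = []
    for line in auth_lines:
        url_pattern, realm_pattern, auth_string = parse_auth_line(line)
        if not fnmatchcase(url, url_pattern):
            continue
        if realm is not None and not fnmatchcase(realm, realm_pattern):
            continue
        result.append(auth_string)
    return result


lines = ["auth http://www.mydomain.com/* REALM* username:pass word"]
print(candidates(lines, "http://www.mydomain.com/index.html", realm="REALM1"))
```

After a 401 response the server's realm is known, so calling candidates again with that realm narrows the list to the commands whose realm pattern also fits.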

The command noauth says that the program should try to retrieve pages that fit the domain pattern http://www.mydomain.com/* without any authentication. However, this command does not say that the program should never try to retrieve them with basic authentication. It only says that the first attempt should be made without authentication; if auth commands specify authentication strings for the page, those are tried afterwards.

The authentication string usually has the form username:password. This string is base64-encoded and sent to the server in the Authorization header field. The string can contain spaces.
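As an illustration of the encoding step, the following sketch builds the header value that HTTP basic authentication uses. The function name basic_auth_header is hypothetical, and UTF-8 is assumed for turning the string into bytes; the manual does not say which character encoding Webmirror applies.

```python
import base64


def basic_auth_header(auth_string):
    """Build the value of an HTTP Basic 'Authorization' header."""
    # Assumption: the authentication string is encoded as UTF-8 first.
    token = base64.b64encode(auth_string.encode("utf-8")).decode("ascii")
    return "Basic " + token


# The space inside the password is encoded like any other character.
print("Authorization: " + basic_auth_header("username:pass word"))
```

Because the whole string is encoded as one unit, a space in the password needs no quoting or escaping in the RDF file.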

Note that when multiple auth commands fit a single page, they are taken into account in the order in which they appear in the RDF file. The program does not learn. Suppose you have the lines

auth http://www.mydom.com/* * peter:tarkabarka
auth http://www.mydom.com/* * sigmund:freud
noauth http://www.mydom.com/*
in the RDF file and the program retrieves the page http://www.mydom.com/index.html.

The program first tries to retrieve the page without authentication. If that fails, it tries the authentication string peter:tarkabarka. If this string does not succeed either, it finally tries the string sigmund:freud.

When downloading http://www.mydom.com/mydomintro.html, the program again first tries to retrieve the page without authentication. If that fails, it tries the authentication string peter:tarkabarka, even though this string already failed for http://www.mydom.com/index.html. If this string does not succeed, it finally tries the string sigmund:freud.

In other words, the program does not learn from the first retrieval, where both the unauthenticated attempt and the authentication string peter:tarkabarka failed. For every page it repeats all attempts, without and with authentication, in the order they were specified. The only exceptions are that the attempt without authentication always comes first if noauth is specified for the page, and that at most one attempt without authentication is made per page, no matter how many noauth command patterns fit it.
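The attempt order for the example above can be modelled with a short sketch. It is an illustration under stated assumptions rather than the program's actual logic: trial_order is a hypothetical name, fnmatch is assumed to match the * wildcard, and the realm pattern is not checked (both example patterns are * anyway).

```python
from fnmatch import fnmatchcase


def trial_order(rdf_lines, url):
    """List the attempts for one URL; None stands for 'no authentication'."""
    unauthenticated = False
    auth_strings = []
    for line in rdf_lines:
        parts = line.split(None, 3)
        if len(parts) < 2 or not fnmatchcase(url, parts[1]):
            continue
        if parts[0] == "noauth":
            # Several matching noauth lines still mean a single attempt.
            unauthenticated = True
        elif parts[0] == "auth" and len(parts) == 4:
            # auth strings keep the order in which they appear in the file.
            auth_strings.append(parts[3])
    return ([None] if unauthenticated else []) + auth_strings


rdf = [
    "auth http://www.mydom.com/* * peter:tarkabarka",
    "auth http://www.mydom.com/* * sigmund:freud",
    "noauth http://www.mydom.com/*",
]
# Nothing is remembered between pages, so both pages get the same list.
for page in ("http://www.mydom.com/index.html",
             "http://www.mydom.com/mydomintro.html"):
    print(page, trial_order(rdf, page))
```

The function keeps no state between calls, which mirrors the fact that a failure on one page does not change what is tried on the next.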
