http NSE Library (Nmap Scripting Engine documentation)
Implements the HTTP client protocol in a standard form that Nmap scripts can take advantage of.
Because HTTP has so many uses, there are a number of interfaces to this library.
The most obvious and common ones are simply get, post, and head; or, if more control is required, generic_request can be used. These functions take host and port as their main parameters and they do what one would expect. The get_url helper function can be used to parse and retrieve a full URL.
HTTPS support is transparent. The library uses comm.tryssl to determine whether SSL is required for a request.
These functions return a table of values, including:
- status-line: A string representing the status, such as "HTTP/1.1 200 OK", followed by a newline. In case of an error, a description will be provided in this line.
- status: The HTTP status value; for example, 200. If an error occurs during a request, this value is nil.
- version: The HTTP protocol version string, as stated in the status line. Example: "1.1"
- header: An associative array representing the header. Keys are all lowercase, and standard headers, such as 'date' and 'content-length', will typically be present.
- rawheader: A numbered array of the headers, exactly as the server sent them. While header['content-type'] might be 'text/html', rawheader[3] might be 'Content-type: text/html'.
- cookies: A numbered array of the cookies the server sent. Each cookie is a table with the expected keys, such as name, value, path, domain, and expires. This table can be sent to the server in subsequent requests in the options table to any function (see below).
- rawbody: The full body, as returned by the server. Chunked transfer encoding is handled transparently.
- body: The full body, after processing the Content-Encoding header, if any. The Content-Encoding and Content-Length headers are adjusted to stay consistent with the processed body.
- incomplete: The partially received response object, in case of an error.
- truncated: A flag indicating that the body has been truncated.
- decoded: A list of processed named content encodings (like "identity" or "gzip").
- undecoded: A list of named content encodings that could not be processed (due to lack of support or the body being corrupted for a given encoding). A body has been successfully decoded if this list is empty (or nil, if no encodings were used in the first place).
- location: A numbered array of the locations of redirects that were followed.
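To make the decoded/undecoded rule concrete, here is a minimal sketch in plain Lua that checks a response table before trusting its body. The helper name body_fully_decoded and the mock tables are hypothetical and not part of the library; the only keys it relies on (status, status-line, undecoded) are the ones listed above.

```lua
-- Hypothetical helper (not part of the http library): decide whether a
-- response table, shaped as described above, has a fully decoded body.
local function body_fully_decoded(response)
  if response.status == nil then
    -- On error, status is nil and status-line carries a description
    return false, response["status-line"]
  end
  local undecoded = response.undecoded
  -- nil: no encodings were used; empty list: every encoding was processed
  if undecoded == nil or #undecoded == 0 then
    return true
  end
  return false, "undecoded: " .. table.concat(undecoded, ", ")
end

-- Mock response tables (illustrative values, not real server output)
local ok_resp  = { status = 200, decoded = { "gzip" }, undecoded = {} }
local bad_resp = { status = 200, decoded = {}, undecoded = { "br" } }
local err_resp = { ["status-line"] = "connection refused\n" }

print(body_fully_decoded(ok_resp))     --> true
print((body_fully_decoded(bad_resp)))  --> false
```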
Many of the functions optionally accept an "options" input table, which can modify the HTTP request or its processing in many ways, like adding headers or setting the timeout. The following are valid keys in "options" (note: not all options will necessarily affect every function):
- timeout: A timeout used for socket operations.
- header: A table containing additional headers to be used for the request. For example, options['header']['Content-Type'] = 'text/xml'
- content: The content of the message. This can be either a string, which will be directly added as the body of the message, or a table, which will have each key=value pair added (like a normal POST request). (A corresponding Content-Length header will be added automatically. Set header['Content-Length'] to override it.)
- cookies: A list of cookies as either a string, which will be directly sent, or a table. If it's a table, the following fields are recognized: name, value and path. Only the name and value fields are required.
- auth: A table containing the keys username and password, which will be used for HTTP Basic authentication. If a server requires HTTP Digest authentication, then there must also be a key digest, with value true. If a server requires NTLM authentication, then there must also be a key ntlm, with value true.
- bypass_cache: Do not perform a lookup in the local HTTP cache.
- no_cache: Do not save the result of this request to the local HTTP cache.
- no_cache_body: Do not save the body of the response to the local HTTP cache.
- max_body_size: Limit the received body to a specific number of bytes. Overrides script argument http.max-body-size. See the script argument for details.
- truncated_ok: Do not treat an oversized body as an error. Overrides script argument http.truncated-ok.
- any_af: Allow connecting to any address family, inet or inet6. By default, these functions will only use the same AF as nmap.address_family to resolve names. (This option is a straight pass-through to comm.lua functions.)
- redirect_ok: A closure that overrides the default redirect_ok used to validate whether or not to follow HTTP redirects. Set it to false if no HTTP redirects should be followed. Alternatively, a number may be passed to change the number of redirects to follow. The following example shows how to write a custom closure that follows 5 consecutive redirects, without the safety checks in the default redirect_ok:
```lua
redirect_ok = function(host, port)
  local c = 5
  return function(url)
    if ( c == 0 ) then return false end
    c = c - 1
    return true
  end
end
```
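Putting several of these keys together, an options table is an ordinary Lua table. The sketch below only builds one. The values are illustrative, and the millisecond unit for timeout is an assumption based on NSE socket timeouts being expressed in milliseconds. Note that options['header'] must already be a table before individual headers are assigned into it.

```lua
-- Illustrative options table using only the keys documented above.
local options = {
  timeout = 10000,   -- assumed to be milliseconds, like NSE socket timeouts
  header = {},       -- must exist before headers are assigned into it
  content = "<ping/>",
  cookies = { { name = "session", value = "abc123", path = "/" } },
  auth = { username = "admin", password = "secret", digest = true },
  bypass_cache = true,
  max_body_size = 65536,
  truncated_ok = true,
  redirect_ok = 3,   -- a number changes how many redirects are followed
}
options['header']['Content-Type'] = 'text/xml'

-- Inside an NSE script the table would be passed as the last argument, e.g.:
-- local response = http.generic_request(host, port, "POST", "/api", options)
```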
If a script is planning on making a lot of requests, the pipelining functions can be helpful. pipeline_add queues requests in a table, and pipeline_go performs the requests, returning the results as an array, with the responses in the same order as the requests were added. As a simple example:
```lua
local http = require "http"

-- Start by defining the 'all' variable as nil
local all = nil

-- Add two GET requests and one HEAD to the queue but these requests are
-- not performed yet. The second parameter represents the "options" table
-- (which we don't need in this example).
all = http.pipeline_add('/book', nil, all)
all = http.pipeline_add('/test', nil, all)
all = http.pipeline_add('/monkeys', nil, all, 'HEAD')

-- Perform all three requests as parallel as Nmap is able to
local results = http.pipeline_go('nmap.org', 80, all)
```
At this point, results should be an array with three elements (it can be shorter if communication errors occur; see pipeline_go). Each element is a table containing the HTTP result, as discussed above.
One more interface provided by the HTTP library helps scripts determine whether or not a page exists. The identify_404 function will try several URLs on the server to determine what the server's 404 pages look like. It will attempt to identify customized 404 pages that may not return the actual status code 404. If successful, the function page_exists can then be used to determine whether or not a page exists.
Some other miscellaneous functions that can come in handy are response_contains, can_use_head, and save_path. See the appropriate documentation for details.
Source: https://svn.nmap.org/nmap/nselib/http.lua
Script Arguments
http.useragent
The value of the User-Agent header field sent with requests. By default it is "Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)". A value of the empty string disables sending the User-Agent header field.
http.host
The value to use in the Host header of all requests unless otherwise set. By default, the Host header uses the output of stdnse.get_hostname().
http.max-body-size
Limit the received body to a specific number of bytes. An oversized body results in an error unless script argument http.truncated-ok or request option truncated_ok is set to true. The default is 2097152 (2 MiB). Use value -1 to disable the limit altogether. This argument can be overridden case-by-case with request option max_body_size.
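The interaction between http.max-body-size and truncated_ok fits in a few lines. This is a hypothetical sketch of the documented behavior, not the library's code; the name apply_body_limit and its return convention (body and a truncated flag, or nil and an error message) are assumptions.

```lua
-- Hypothetical sketch of the documented http.max-body-size / truncated_ok
-- behavior; not the library's code.
local DEFAULT_MAX_BODY = 2097152  -- documented default: 2 MiB (2 * 1024 * 1024)

-- Returns body, truncated_flag on success, or nil, error_message.
local function apply_body_limit(body, max_body_size, truncated_ok)
  max_body_size = max_body_size or DEFAULT_MAX_BODY
  -- A limit of -1 (any negative value here) disables the check altogether
  if max_body_size < 0 or #body <= max_body_size then
    return body, false
  end
  if not truncated_ok then
    return nil, "body exceeds " .. max_body_size .. " bytes"
  end
  -- Keep the first max_body_size bytes and flag the truncation
  return body:sub(1, max_body_size), true
end

-- body is "xxxx" and truncated is true
local body, truncated = apply_body_limit(("x"):rep(10), 4, true)
```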
http.pipeline
If set, it represents the number of HTTP requests that'll be sent on one connection. This can be set low to make debugging easier, or it can be set high to test how a server reacts (its chosen max is ignored).
http.max-cache-size
The maximum memory size (in bytes) of the cache.
http.max-pipeline
If set, it represents the number of outstanding HTTP requests that should be sent together in a single burst. Defaults to http.pipeline (if set), or to what function get_pipeline_limit returns.
http.truncated-ok
Do not treat an oversized body as an error. (Use the response object flag truncated to check if the returned body has been truncated.) This argument can be overridden case-by-case with request option truncated_ok.
Functions
can_use_head (host, port, result_404, path)
Determine whether or not the server supports HEAD.
clean_404 (body)
Try to remove anything that might change within a 404.
generic_request (host, port, method, path, options)
Do a single request with a given method. The response is returned as the standard response table (see the module documentation).
get (host, port, path, options)
Fetches a resource with a GET request and returns the result as a table.
get_status_string (data)
Take the data returned from an HTTP request and return the status string. Useful for stdnse.debug messages and even advanced output.
get_url (u, options)
Parses a URL and calls http.get with the result. The URL can contain all the standard fields, protocol://host:port/path
grab_forms (body)
Finds forms in HTML code.
head (host, port, path, options)
Fetches a resource with a HEAD request.
identify_404 (host, port)
Try requesting a non-existent file to determine how the server responds to unknown pages ("404 pages")
page_exists (data, result_404, known_404, page, displayall)
Determine whether or not the page that was returned is a 404 page.
parse_date (s)
Parses an HTTP date string
parse_form (form)
Parses a form, that is, finds its action and fields.
parse_redirect (host, port, path, response)
Handles an HTTP redirect.
parse_www_authenticate (s)
Parses the WWW-Authenticate header as described in RFC 2616, section 14.47 and RFC 2617, section 1.2.
pipeline_add (path, options, all_requests, method)
Adds a pending request to the HTTP pipeline.
pipeline_go (host, port, all_requests)
Performs all queued requests in the all_requests variable (created by the pipeline_add function).
post (host, port, path, options, ignored, postdata)
Fetches a resource with a POST request.
put (host, port, path, options, putdata)
Uploads a file using the PUT method and returns a result table. This is a simple wrapper around generic_request.
redirect_ok (host, port, counter)
Provides the default behavior for HTTP redirects.
response_contains (response, pattern, case_sensitive)
Check if the response variable contains the given text.
save_path (host, port, path, status, links_to, linked_from, contenttype)
This function should be called whenever a valid path (a path that doesn't contain a known 404 page) is discovered.
tag_pattern (tag, endtag)
Create a pattern to find a tag
Functions
can_use_head (host, port, result_404, path)
Determine whether or not the server supports HEAD.
Tests by requesting / and verifying that it returns 200 and doesn't return data. We implement the check like this because we can't always rely on OPTIONS to tell the truth.
Note: If identify_404 returns a 200 status, HEAD requests should be disabled. Sometimes, servers use a 200 status code with a message explaining that the page wasn't found. In this case, to actually identify a 404 page, we need the full body that a HEAD request doesn't supply. This is determined automatically if the result_404 field is set.
Parameters
host
The host object.
port
The port to use.
result_404
[optional] The result when an unknown page is requested. This is returned by identify_404. If the 404 page returns a 200 code, then we disable HEAD requests.
path
The path to request; by default, / is used.
Return values:
- A boolean value: true if HEAD is usable, false otherwise.
- If HEAD is usable, the result of the HEAD request is returned (so potentially, a script can avoid an extra call to HEAD)
clean_404 (body)
Try to remove anything that might change within a 404.
For example:
- A file path (includes URI)
- A time
- A date
- An execution time (numbers in general, really)
The intention is that two 404 pages from different URIs and taken hours apart should, whenever possible, look the same.
During this function, we're likely going to over-trim things. This is fine -- we want enough to match on that it'll a) be unique, and b) have the best chance of not changing. Even if we remove bits and pieces from the file, as long as it isn't a significant amount, it'll remain unique.
One case this doesn't cover is if the server generates a random haiku for the user.
Parameters
body
The body of the page.
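The normalization idea can be sketched in plain Lua. This hypothetical normalize_404 is not the library's clean_404. It removes only the requested path, all digit runs, and extra whitespace. That is enough to make two 404 pages for different URIs, fetched at different times, collapse to the same string. It also over-trims (the 1 in <h1> disappears), which, as noted above, is acceptable.

```lua
-- Hypothetical sketch of the clean_404 idea; not the library's implementation.
local function normalize_404(body, requested_path)
  -- Escape punctuation so gsub matches the path literally
  local escaped = requested_path:gsub("%p", "%%%0")
  local out = body:gsub(escaped, "")
  out = out:gsub("%d+", "")   -- times, dates, execution times, sizes
  out = out:gsub("%s+", " ")  -- collapse whitespace left behind
  return out
end

local a = normalize_404(
  "<h1>Not Found</h1><p>/abc.html was not found at 10:42:07 on 2024-01-05</p>",
  "/abc.html")
local b = normalize_404(
  "<h1>Not Found</h1><p>/xyz/q.php was not found at 23:01:55 on 2024-03-19</p>",
  "/xyz/q.php")
print(a == b)  --> true
```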
generic_request (host, port, method, path, options)
Do a single request with a given method. The response is returned as the standard response table (see the module documentation).
The get, head, and post functions are simple wrappers around generic_request.
Any 1XX (informational) responses are discarded.
Parameters
host
The host to connect to.
port
The port to connect to.
method
The method to use; for example, 'GET', 'HEAD', etc.
path
The path to retrieve.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
Return value:
A response table, see module documentation for description.
See also:
- request
get (host, port, path, options)
Fetches a resource with a GET request and returns the result as a table.
This is a simple wrapper around generic_request, with the added benefit of having local caching and support for HTTP redirects. Redirects are followed only if they pass all the validation rules of the redirect_ok function. This function may be overridden by supplying a custom function in the redirect_ok field of the options table. The default function redirects the request if the destination:
- is within the same host or domain
- has the same port number
- stays within the current scheme
- does not exceed MAX_REDIRECT_COUNT redirects
Caching and redirects can be controlled in the options table; see the module documentation for more information.
Parameters
host
The host to connect to.
port
The port to connect to.
path
The path to retrieve.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
Return value:
A response table, see module documentation for description.
get_status_string (data)
Take the data returned from an HTTP request and return the status string. Useful for stdnse.debug messages and even advanced output.
Parameters
data
The response table from any HTTP request
Return value:
The best status string we could find: either the actual status string, the status code, or "<unknown status>".
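The fallback order described above can be sketched as follows. The helper status_string is hypothetical and only mirrors the documented behavior. The library's own parsing of the status line may differ in detail.

```lua
-- Hypothetical sketch of the documented fallback order; not the library's code.
local function status_string(data)
  if data == nil then return "<unknown status>" end
  local line = data["status-line"]
  if line ~= nil then
    -- "HTTP/1.1 200 OK\r\n" -> "200 OK": drop the version and trailing newline
    local rest = line:match("^%S+%s+(.-)%s*$")
    if rest and rest ~= "" then return rest end
    return (line:gsub("%s+$", ""))
  end
  if data.status ~= nil then return tostring(data.status) end
  return "<unknown status>"
end

print(status_string({ ["status-line"] = "HTTP/1.1 200 OK\r\n" }))  --> 200 OK
print(status_string({ status = 404 }))                             --> 404
print(status_string({}))                                           --> <unknown status>
```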
get_url (u, options)
Parses a URL and calls http.get with the result. The URL can contain all the standard fields, protocol://host:port/path
Parameters
u
The URL of the host.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
Return value:
A response table, see module documentation for description.
grab_forms (body)
Finds forms in HTML code.
Returns a table of found forms, in plaintext.
Parameters
body
A response.body in which to search for forms
Return value:
A list of forms.
head (host, port, path, options)
Fetches a resource with a HEAD request.
Like get, this is a simple wrapper around generic_request with response caching. This function also has support for HTTP redirects. Redirects are followed only if they pass all the validation rules of the redirect_ok function. This function may be overridden by supplying a custom function in the redirect_ok field of the options table. The default function redirects the request if the destination:
- is within the same host or domain
- has the same port number
- stays within the current scheme
- does not exceed MAX_REDIRECT_COUNT redirects
Caching and redirects can be controlled in the options table; see the module documentation for more information.
Parameters
host
The host to connect to.
port
The port to connect to.
path
The path to retrieve.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
Return value:
A response table, see module documentation for description.
identify_404 (host, port)
Try requesting a non-existent file to determine how the server responds to unknown pages ("404 pages")
This tells us
- what to expect when a non-existent page is requested, and
- if the server will be impossible to scan.
If the server responds with a 404 status code, as it is supposed to, then this function simply returns 404. If it returns one of a series of common status codes, including unauthorized, moved, and others, that code is returned and treated like a 404.
I (Ron Bowes) have observed one host that responds differently for three scenarios:
- A non-existent page, all lowercase (a login page)
- A non-existent page, with uppercase (a weird error page that says, "Filesystem is corrupt.")
- A page in a non-existent directory (a login page with different font colours)
As a result, I've devised three different 404 tests, one to check each of these conditions. If they all match, the tests can proceed; if any of them differ, we can't check 404s properly.
Parameters
host
The host object.
port
The port to which we are establishing the connection.
Return values:
- status Did we succeed?
- result If status is false, result is an error message. Otherwise, it's the code to expect (typically, but not necessarily, '404').
- body Body is a hash of the cleaned-up body that can be used when detecting a 404 page that doesn't return a 404 error code.
page_exists (data, result_404, known_404, page, displayall)
Determine whether or not the page that was returned is a 404 page.
This is actually a pretty simple function, but it's best to keep this logic close to identify_404, since they will generally be used together.
Parameters
data
The data returned by the HTTP request
result_404
The status code to expect for non-existent pages. This is returned by identify_404.
known_404
The 404 page itself, if result_404 is 200. If result_404 is something else, this parameter is ignored and can be set to nil. This is returned by identify_404.
page
The page being requested (used in error messages).
displayall
[optional] If set to true, don't exclude non-404 errors (such as 500).
Return value:
A boolean value: true if the page appears to exist, and false if it does not.
parse_date (s)
Parses an HTTP date string
Supports any of the following formats from section 3.3.1 of RFC 2616:
- Sun, 06 Nov 1994 08:49:37 GMT (RFC 822, updated by RFC 1123)
- Sunday, 06-Nov-94 08:49:37 GMT (RFC 850, obsoleted by RFC 1036)
- Sun Nov  6 08:49:37 1994 (ANSI C's asctime() format)
Parameters
s
the date string.
Return value:
A table with keys year, month, day, hour, min, sec, and isdst, relative to GMT, suitable for input to os.time.
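To illustrate the shape of the returned table, here is a minimal sketch that handles only the first (RFC 1123) format. parse_rfc1123 is a hypothetical helper, not the library's parse_date, which also accepts the RFC 850 and asctime() forms.

```lua
-- Hypothetical helper; handles only the RFC 1123 form.
local MONTHS = { Jan = 1, Feb = 2, Mar = 3, Apr = 4, May = 5, Jun = 6,
                 Jul = 7, Aug = 8, Sep = 9, Oct = 10, Nov = 11, Dec = 12 }

local function parse_rfc1123(s)
  local day, mon, year, hour, min, sec =
    s:match("^%a%a%a, (%d%d) (%a%a%a) (%d%d%d%d) (%d%d):(%d%d):(%d%d) GMT$")
  if day == nil or MONTHS[mon] == nil then return nil end
  return { year = tonumber(year), month = MONTHS[mon], day = tonumber(day),
           hour = tonumber(hour), min = tonumber(min), sec = tonumber(sec),
           isdst = false }
end

local t1 = parse_rfc1123("Sun, 06 Nov 1994 08:49:37 GMT")
local t2 = parse_rfc1123("Mon, 07 Nov 1994 08:49:37 GMT")
-- os.time reads each table as local time; the difference is one day
local seconds_apart = os.time(t2) - os.time(t1)
```

Because os.time interprets its table argument in the local time zone, its result is not directly a GMT epoch. Differences between two such timestamps are generally unaffected by the zone offset.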
parse_form (form)
Parses a form, that is, finds its action and fields.
Parameters
form
A plaintext representation of a form
Return value:
A dictionary with the keys action; method, if one is specified; and fields, a list of the fields found in the form, each of which has a name attribute and, if specified, a type.
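The following hypothetical sketch shows the kind of work grab_forms and parse_form perform, using only Lua patterns. It collects <form>...</form> blocks case-insensitively and reads a quoted attribute from each opening tag. The helper names are assumptions, and Lua patterns cannot cope with every HTML quirk (comments, unquoted attributes, and so on).

```lua
-- Hypothetical helpers; not the library's grab_forms/parse_form.
local function ci(word)
  -- "form" -> "[fF][oO][rR][mM]" for case-insensitive matching
  return (word:gsub("%a", function(c) return "[" .. c:lower() .. c:upper() .. "]" end))
end

-- Collect every <form ...>...</form> block (%f requires Lua 5.2+)
local function find_forms(body)
  local forms = {}
  local pat = "<%s*" .. ci("form") .. "%f[%s>].-</%s*" .. ci("form") .. "%s*>"
  for form in body:gmatch(pat) do
    forms[#forms + 1] = form
  end
  return forms
end

-- Read a quoted attribute (e.g. action, method) from the form's opening tag
local function form_attr(form, name)
  local tag = form:match("^<[^>]*>")
  if tag == nil then return nil end
  return tag:match(ci(name) .. "%s*=%s*[\"']([^\"']*)[\"']")
end

local html = [[<FORM Action="/login" method='post'><input name="user"></FORM>
<form action="/search"><input name="q"></form>]]
local forms = find_forms(html)
print(#forms)                          --> 2
print(form_attr(forms[1], "action"))   --> /login
print(form_attr(forms[1], "method"))   --> post
```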
parse_redirect (host, port, path, response)
Handles an HTTP redirect
Parameters
host
table as received by the script action function
port
table as received by the script action function
path
string
response
table as returned by http.get or http.head
Return value:
A URL table as returned by url.parse, or nil if no redirect is taking place.
parse_www_authenticate (s)
Parses the WWW-Authenticate header as described in RFC 2616, section 14.47 and RFC 2617, section 1.2.
The return value is an array of challenges. Each challenge is a table with the keys scheme and params.
Parameters
s
The header value text.
Return value:
An array of challenges, or nil on error.
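Once the header has been parsed, a script typically looks for a particular scheme. The sketch below walks a challenge array of the documented shape. The helper find_challenge is hypothetical, the mock data is illustrative, and it assumes params maps parameter names to their values. Schemes are compared case-insensitively because RFC 2617 defines the authentication scheme token as case-insensitive.

```lua
-- Hypothetical helper; assumes each challenge is { scheme = ..., params = ... }
-- with params mapping parameter names to values.
local function find_challenge(challenges, scheme)
  if challenges == nil then return nil end  -- parsing failed
  for _, ch in ipairs(challenges) do
    if ch.scheme and ch.scheme:lower() == scheme:lower() then
      return ch
    end
  end
  return nil
end

-- Mock challenges, as might be parsed from a server offering Basic and Digest
local challenges = {
  { scheme = "Basic",  params = { realm = "admin" } },
  { scheme = "Digest", params = { realm = "admin", nonce = "abc" } },
}
local digest = find_challenge(challenges, "digest")
print(digest.params.nonce)  --> abc
```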
pipeline_add (path, options, all_requests, method)
Adds a pending request to the HTTP pipeline.
The HTTP pipeline is a set of requests that will all be sent at the same time, or as close as the server allows. This allows more efficient code, since requests are automatically buffered and sent simultaneously.
The all_requests argument contains the current list of queued requests (if this is the first time calling pipeline_add, it should be nil). After the request is added to the end of the queue, the queue is returned and can be passed to the next pipeline_add call.
When all requests have been queued, call pipeline_go with the all_requests table that has been built.
Parameters
path
The path to retrieve.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
all_requests
[optional] The current pipeline queue (returned from a previous pipeline_add call), or nil if it's the first call.
method
[optional] The HTTP method ('GET', 'HEAD', 'POST', etc). Default: 'GET'.
Return value:
Table with the pipeline requests (plus this new one)
pipeline_go (host, port, all_requests)
Performs all queued requests in the all_requests variable (created by the pipeline_add function).
Returns an array of responses, each of which is a table as defined in the module documentation above.
Parameters
host
The host to connect to.
port
The port to connect to.
all_requests
A table with all the previously built pipeline requests
Return value:
A list of responses, in the same order as the requests were queued. Each response is a table as described in the module documentation. The response list may be either nil or shorter than expected (up to and including being completely empty) due to communication issues or other errors.
post (host, port, path, options, ignored, postdata)
Fetches a resource with a POST request.
Like get, this is a simple wrapper around generic_request, except that postdata is handled properly.
Parameters
host
The host to connect to.
port
The port to connect to.
path
The path to retrieve.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
ignored
Ignored for backwards compatibility.
postdata
A string or a table of data to be posted. If a table, the keys and values must be strings, and they will be encoded into an application/x-www-form-urlencoded form submission.
Return value:
A response table, see module documentation for description.
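For a table of postdata, the encoding amounts to percent-encoding each key and value and joining the pairs with &. This is a minimal sketch of application/x-www-form-urlencoded encoding in plain Lua, not the library's own encoder. The helper names are hypothetical, and keys are sorted only to make the output deterministic.

```lua
-- Hypothetical helpers; not the library's own encoder.
local function urlencode(s)
  -- Keep unreserved characters, turn spaces into '+', percent-encode the rest
  local encoded = s:gsub("[^%w%-%._~ ]", function(c)
    return string.format("%%%02X", c:byte())
  end)
  return (encoded:gsub(" ", "+"))
end

local function form_encode(tbl)
  local keys = {}
  for k in pairs(tbl) do keys[#keys + 1] = k end
  table.sort(keys)  -- deterministic order for this example only
  local parts = {}
  for _, k in ipairs(keys) do
    parts[#parts + 1] = urlencode(k) .. "=" .. urlencode(tbl[k])
  end
  return table.concat(parts, "&")
end

print(form_encode({ user = "ad min", pass = "a&b=c" }))  --> pass=a%26b%3Dc&user=ad+min
```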
put (host, port, path, options, putdata)
Uploads a file using the PUT method and returns a result table. This is a simple wrapper around generic_request.
Parameters
host
The host to connect to.
port
The port to connect to.
path
The path to upload the file to.
options
[optional] A table that lets the caller control socket timeouts, HTTP headers, and other parameters. For full documentation, see the module documentation (above).
putdata
The contents of the file to upload
Return value:
A response table, see module documentation for description.
redirect_ok (host, port, counter)
Provides the default behavior for HTTP redirects.
Redirects will be followed unless they:
- contain credentials
- are on a different domain or host
- have a different port number or URI scheme
- redirect to the same URI
- exceed the maximum number of redirects specified
Parameters
host
table as received by the action function
port
table as received by the action function
counter
number of redirects to follow.
Return value:
a default closure suitable for option "redirect_ok"
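The rules above can be expressed as a function that returns a per-request closure, similar in spirit to redirect_ok(host, port, counter). This is a hypothetical, simplified sketch, not the library's implementation. It assumes targets arrive as pre-parsed tables with scheme, host, port, path, and an optional userinfo field. It compares hosts exactly rather than by domain, and it treats any previously seen URI (including the starting one) as a loop.

```lua
-- Hypothetical, simplified sketch of the default redirect rules; not the
-- library's redirect_ok. Targets are assumed to be pre-parsed URL tables.
local function make_redirect_ok(origin, counter)
  local remaining = counter or 5        -- illustrative default only
  local function key(u)
    return u.scheme .. "://" .. u.host:lower() .. ":" .. u.port .. u.path
  end
  local seen = { [key(origin)] = true } -- the starting URI counts as seen
  return function(url)
    if remaining <= 0 then return false end               -- limit reached
    if url.userinfo ~= nil then return false end          -- has credentials
    if url.host:lower() ~= origin.host:lower() then return false end
    if url.port ~= origin.port then return false end      -- other port
    if url.scheme ~= origin.scheme then return false end  -- other scheme
    if seen[key(url)] then return false end               -- redirect loop
    seen[key(url)] = true
    remaining = remaining - 1
    return true
  end
end

local origin = { scheme = "http", host = "example.com", port = 80, path = "/" }
local ok = make_redirect_ok(origin, 2)
print(ok({ scheme = "http", host = "EXAMPLE.com", port = 80, path = "/a" }))    --> true
print(ok({ scheme = "https", host = "example.com", port = 443, path = "/a" }))  --> false
```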
response_contains (response, pattern, case_sensitive)
Check if the response variable contains the given text.
The response variable could be the return value of http.get, http.post, http.pipeline_go, etc. The text can be:
- Part of a header ('content-type', 'text/html', '200 OK', etc)
- An entire header ('Content-type: text/html', 'Content-length: 123', etc)
- Part of the body
The search text is treated as a Lua pattern.
Parameters
response
The full response table from a HTTP request.
pattern
The pattern we're searching for. Don't forget to escape '-', for example, 'Content%-type'. The pattern can also contain captures, like 'abc(.*)def', which will be returned if successful.
case_sensitive
[optional] Set to true for case-sensitive searches. Default: not case sensitive.
Return values:
- result True if the string matched, false otherwise
- matches An array of captures from the match, if any
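The search can be sketched over a mock response. This hypothetical contains helper is not the library's response_contains. It matches the pattern against each raw header line and then the body, and it returns the captures (or the whole match when the pattern has none). It also shows why '-' must be escaped: unescaped, the pattern Content-Type does not even match the text Content-Type.

```lua
-- Hypothetical sketch; not the library's response_contains.
local function contains(response, pattern, case_sensitive)
  -- Caution: lowercasing the pattern also turns %S, %A, %D into %s, %a, %d
  local pat = case_sensitive and pattern or pattern:lower()
  local sources = {}
  for _, line in ipairs(response.rawheader or {}) do
    sources[#sources + 1] = line
  end
  sources[#sources + 1] = response.body or ""
  for _, text in ipairs(sources) do
    if not case_sensitive then text = text:lower() end
    local m = { text:match(pat) }   -- captures, or the whole match if none
    if #m > 0 then return true, m end
  end
  return false, {}
end

local resp = {
  rawheader = { "Content-Type: text/html", "Server: Apache" },
  body = "<title>Hello</title>",
}
local found, caps = contains(resp, "content%-type: (.*)")
print(caps[1])                                       --> text/html
print((contains(resp, "Content-Type", true)))        --> false
print((contains(resp, "Content%-Type", true)))       --> true
```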
save_path (host, port, path, status, links_to, linked_from, contenttype)
This function should be called whenever a valid path (a path that doesn't contain a known 404 page) is discovered.
It will add the path to the registry in several ways, allowing other scripts to take advantage of it in interesting ways.
Parameters
host
The host the path was discovered on (not necessarily the host being scanned).
port
The port the path was discovered on (not necessarily the port being scanned).
path
The path discovered. Calling this more than once with the same path is okay; it'll update the data as much as possible instead of adding a duplicate entry.
status
[optional] The status code (200, 404, 500, etc). This can be left off if it isn't known.
links_to
[optional] A table of paths that this page links to.
linked_from
[optional] A table of paths that link to this page.
contenttype
[optional] The content-type value for the path, if it's known.
tag_pattern (tag, endtag)
Create a pattern to find a tag
Performs a case-insensitive search for tags.
Parameters
tag
The name of the tag to find
endtag
Boolean true if you are looking for an end tag, otherwise it will look for a start tag
Return value:
A pattern to find the tag
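A case-insensitive tag pattern can be built by replacing each letter with a two-letter character class. The sketch below is hypothetical and may differ from the pattern tag_pattern actually returns. It uses the %f frontier pattern (Lua 5.2 and later) so that form does not match inside <formula>.

```lua
-- Hypothetical sketch of a case-insensitive tag pattern builder.
local function ci(word)
  -- "form" -> "[fF][oO][rR][mM]"
  return (word:gsub("%a", function(c) return "[" .. c:lower() .. c:upper() .. "]" end))
end

local function make_tag_pattern(tag, endtag)
  if endtag then
    return "</%s*" .. ci(tag) .. "%s*>"
  end
  -- The frontier requires whitespace, '/' or '>' right after the tag name
  return "<%s*" .. ci(tag) .. "%f[%s/>]"
end

print(("<FORM action='/x'>"):find(make_tag_pattern("form")) ~= nil)  --> true
print(("<formula>"):find(make_tag_pattern("form")) ~= nil)           --> false
print(("</Form >"):find(make_tag_pattern("form", true)) ~= nil)      --> true
```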