Amazon Kinesis Data Analytics SQL Reference (original) (raw)

VARIABLE_COLUMN_LOG_PARSE

 VARIABLE_COLUMN_LOG_PARSE(
  <character-expression>, <columns>, <delimiter-string>
  [ , <escape-string>, <quote-string> ] )
  <columns> := <number of columns> | <list of columns>
  <number of columns> := <numeric value expression>
  <list of columns> := '<column description>[, ...]'
  <column description> := <identifier> TYPE <data type> [ NOT NULL ]
  <delimiter string> := <character-expression>
  <escape-string> := <character-expression>
  <quote-string> := '<begin quote character> [ <end quote character> ]'

VARIABLE_COLUMN_LOG_PARSE splits an input string (its first argument, ) into fields separated by a delimiter character or delimiter string. Thus it handles comma-separated values or tab-separated values. It can be combined withFIXED_COLUMN_LOG_PARSE to handle something like maillog, where some fields are fixed-length and others are variable-length.

Note

Parsing of binary files is not supported.

The arguments and are optional. Specifying an allows the value of a field to contain an embedded delimiter. As a simple example, if the specified a comma, and the specified a backslash, then an input of "a,b' would be split into two fields "a" and "b", but an input of "a\,b" would result in a single field "a,b".

Since Amazon Kinesis Data Analytics supports Expressions and Literals, a tab can also be a delimiter, specified using a unicode escape, e.g., u&'\0009', which is a string consisting only of a tab character.

Specifying a is another way to hide an embedded delimiter. The should be a one or two character expression: the first is used as the character; the second, if present, is used as the character. If only one character is supplied, it is used as both to begin and to end quoted strings. When the input includes a quoted string, that is, a string enclosed in the characters specified as , then that string appears in one field, even if it contains a delimiter.

Note that the and are single characters and can be different. The can be used to start and end the quoted string, or the can start the quoted string and the used to end that quoted string.

When a list of columns is supplied as the second parameter , the column specifications () for types DATE, TIME, and TIMESTAMP support a format parameter allowing the user to specify exact time component layout. The parser uses the Java class java.lang.SimpleDateFormat to parse the strings for those types. Date and Time Patterns gives a full description of timestamp format strings, with examples. The following is an example of a column definition with a format string:

    "name" TYPE TIMESTAMP 'dd/MMM/yyyy:HH:mm:ss'

By default, the output columns are named COLUMN1, COLUMN2, COLUMN3, etc., each of SQL data type VARCHAR(1024).