Amazon Kinesis Data Analytics SQL Reference (original) (raw)

VAR_POP

Returns the population variance of a non-null set of numbers (nulls being ignored)

VAR_POP uses the following calculation:

In other words, for a given set of non-null values, using S1 as the sum of the values and S2 as the sum of the squares of the values, VAR_POP returns the result (S2-S1*S1/N)/N.

When you use VAR_POP, be aware of the following:

Syntax

 VAR_POP ( [DISTINCT | ALL] number-expression )

Parameters

ALL

Includes duplicate values in the input set. ALL is the default.

DISTINCT

Excludes duplicate values in the input set.

Examples

Example Dataset

The examples following are based on the sample stock dataset that is part of the Getting Started Exercise in the_Amazon Kinesis Analytics Developer Guide_. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see Getting Started in the Amazon Kinesis Analytics Developer Guide.

The sample stock dataset has the schema following.


(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
       

Example 1: Determine the population variance in a column in a tumbling window query

The following example demonstrates how to use the VARPOP function to determine the population variance of the values in a tumbling window of the PRICE column of the example dataset. DISTINCT is not specified, so duplicate values are included in the calculation.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), var_pop_price REAL);

CREATE OR REPLACE  PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, VAR_POP(price) AS var_pop_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, STEP(("SOURCE_SQL_STREAM_001".ROWTIME) BY INTERVAL '60' SECOND); 
      

Using FLOOR

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), var_pop_price REAL);

CREATE OR REPLACE  PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, VAR_POP(price) AS var_pop_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') SECOND / 10 TO SECOND);  
      

Results

The preceding examples output a stream similar to the following:

Table showing ROWTIME, TICKER_SYMBOL, and VAR_POP_PRICE columns with sample data entries.

Example 2: Determine the population variance of the values in a columm in a sliding window query

The following example demonstrates how to use the VARPOP function to determine the population variance of the values in a sliding window of the PRICE column of the example dataset. DISTINCT is not specified, so duplicate values are included in the calculation.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), var_pop_price REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, VAR_POP(price) OVER TEN_SECOND_SLIDING_WINDOW AS var_pop_price
FROM "SOURCE_SQL_STREAM_001"
 
WINDOW TEN_SECOND_SLIDING_WINDOW AS (
  PARTITION BY ticker_symbol
  RANGE INTERVAL '10' SECOND PRECEDING);
      

The preceding example outputs a stream similar to the following:

Table showing ROWTIME, TICKER_SYMBOL, and VAR_POP_PRICE columns with sample data entries.

See Also