turicreate.Sketch — Turi Create API 6.4.1 documentation

The Sketch object contains a sketch of a single SArray (a column of an SFrame). Using a sketch representation of an SArray, many approximate and exact statistics can be computed very quickly.

To construct a Sketch object, the following methods are equivalent:

import turicreate
my_sarray = turicreate.SArray([1,2,3,4,5])
sketch = turicreate.Sketch(my_sarray)
sketch = my_sarray.summary()

Typically, the SArray is a column of an SFrame:

my_sframe = turicreate.SFrame({'column1': [1,2,3]})
sketch = turicreate.Sketch(my_sframe['column1'])
sketch = my_sframe['column1'].summary()

The sketch computation is fast, with complexity approximately linear in the length of the SArray. After the Sketch is computed, all queryable functions are performed nearly instantly.
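The build-once, query-instantly pattern described above can be illustrated in plain Python. The class below is a hypothetical stand-in, not Turi Create's implementation: `ExactSummary` and its field names are assumptions made for this sketch. It makes a single pass over the values to accumulate exact statistics, after which every query is a constant-time lookup.

```python
import math


class ExactSummary:
    """Hypothetical stand-in for the exact part of a sketch (not Turi Create code)."""

    def __init__(self, values):
        self.size = 0          # total number of elements, including missing ones
        self.num_missing = 0   # elements equal to None
        self.min = None
        self.max = None
        self._n = 0            # number of non-missing elements
        self._mean = 0.0
        self._m2 = 0.0         # running sum of squared deviations (Welford's method)
        for v in values:       # single pass: cost is linear in the input length
            self.size += 1
            if v is None:
                self.num_missing += 1
                continue
            self._n += 1
            self.min = v if self.min is None else min(self.min, v)
            self.max = v if self.max is None else max(self.max, v)
            delta = v - self._mean
            self._mean += delta / self._n
            self._m2 += delta * (v - self._mean)

    # Queries only read stored fields, so each one costs O(1).
    def mean(self):
        return self._mean if self._n else None

    def var(self):
        # Population variance; this choice is an assumption of the sketch.
        return self._m2 / self._n if self._n else None

    def std(self):
        v = self.var()
        return math.sqrt(v) if v is not None else None


summary = ExactSummary([1, 2, 3, 4, 5, None])
print(summary.size, summary.num_missing, summary.min, summary.max)
print(summary.mean(), summary.var(), summary.std())
```

Welford's update is used here so that mean and variance come out of the same pass without storing the values.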

A sketch can compute the following information depending on the dtype of the SArray:

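For numeric columns, the following information is provided exactly:

- length (size)
- number of missing values (num_undefined)
- minimum value (min)
- maximum value (max)
- mean (mean)
- variance (var)
- standard deviation (std)

And the following information is provided approximately:

- number of unique values (num_unique)
- quantiles (quantile)
- frequent items (frequent_items)
- frequency count for any value (frequency_count)

For non-numeric columns (str), the following information is provided exactly:

- length (size)
- number of missing values (num_undefined)

And the following information is provided approximately:

- number of unique values (num_unique)
- frequent items (frequent_items)
- frequency count for any value (frequency_count)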
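For intuition about what an approximate statistic looks like, the sketch below implements the Misra-Gries algorithm, a classic streaming method for finding frequent items in bounded memory. This page does not say which algorithms Turi Create uses internally, so treat this purely as an illustration; the function name `misra_gries` and the parameter `k` are assumptions of this example.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Each reported count underestimates the true count by at most
    len(stream) / k, and every item occurring more than len(stream) / k
    times is guaranteed to be among the candidates.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
approx = misra_gries(stream, k=3)
print(approx)
```

The trade-off is the usual one for sketches: memory stays fixed at `k - 1` counters regardless of stream length, at the price of counts that are bounded estimates rather than exact values.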

For an SArray of type list or array, there is a sub sketch for all sub elements. The sub sketch flattens all list/array values and then computes a sketch summary over the flattened values. The element sub sketch may be retrieved through element_summary().

For an SArray of type dict, there are sub sketches for both the dict keys and the dict values. These sub sketches may be retrieved through dict_key_summary() and dict_value_summary().
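To make the flattening concrete, here is a minimal plain-Python sketch of which values would feed each sub sketch. The helper names `flatten_elements` and `dict_keys_and_values` are hypothetical, and skipping missing (None) rows is an assumption of this example rather than documented behavior.

```python
from itertools import chain


def flatten_elements(rows):
    """Flatten list/array rows into one sequence, skipping missing rows."""
    return list(chain.from_iterable(r for r in rows if r is not None))


def dict_keys_and_values(rows):
    """Collect all keys and all values across dict rows, skipping missing rows."""
    keys, values = [], []
    for r in rows:
        if r is None:
            continue
        keys.extend(r.keys())
        values.extend(r.values())
    return keys, values


# Values that an element sub sketch would summarize.
flat = flatten_elements([[100, 200, 300], [400, 500], None])

# Values that the key and value sub sketches would summarize.
keys, values = dict_keys_and_values([{"a": 1, "b": 2}, {"a": 3}])
print(flat, keys, values)
```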

For an SArray of type dict, the user can also pass a list of dictionary keys to the summary function; this generates one sub sketch for each key. For example:

sa = turicreate.SArray([{'a':1, 'b':2}, {'a':3}])
sketch = sa.summary(sub_sketch_keys=["a", "b"])

Then the sub summary may be retrieved by:

sketch.element_sub_sketch()

or, to get the sub sketches for a subset of keys:

sketch.element_sub_sketch(["a"])

Similarly, for an SArray of type vector (array), the user can also pass a list of integers, which are indices into the vector, to get one sub sketch per index. For example:

sa = turicreate.SArray([[100,200,300,400,500], [100,200,300], [400,500]])
sketch = sa.summary(sub_sketch_keys=[1,3,5])

Then the sub summary may be retrieved by:

sketch.element_sub_sketch()

or, for a subset of keys:

sketch.element_sub_sketch([1,3])
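The grouping implied by sub_sketch_keys can be sketched in plain Python for both the dict and the vector case. The helper `values_by_sub_key` is hypothetical; this example also assumes zero-based vector indices and assumes that rows lacking a key or index simply contribute nothing, neither of which is stated on this page.

```python
def values_by_sub_key(rows, sub_keys):
    """Group element values by dict key or vector index (hypothetical helper)."""
    grouped = {k: [] for k in sub_keys}
    for row in rows:
        if row is None:
            continue
        for k in sub_keys:
            if isinstance(row, dict):
                if k in row:
                    grouped[k].append(row[k])
            elif 0 <= k < len(row):   # assumed zero-based indexing
                grouped[k].append(row[k])
    return grouped


# One group of values per requested dict key.
by_key = values_by_sub_key([{"a": 1, "b": 2}, {"a": 3}], ["a", "b"])

# One group of values per requested vector index.
vectors = [[100, 200, 300, 400, 500], [100, 200, 300], [400, 500]]
by_index = values_by_sub_key(vectors, [1, 3, 5])
print(by_key, by_index)
```

Each group is the input that one per-key sub sketch would summarize; a key or index that no row contains yields an empty group.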

Please see the individual function documentation for details about each of these statistics.

Parameters:

array : SArray
    Array to generate sketch summary.

background : boolean
    If True, the sketch construction will return immediately and the sketch will be constructed in the background. While this is going on, the sketch can be queried incrementally, but at a performance penalty. Defaults to False.
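The effect of a background flag can be illustrated with a small threading sketch in plain Python. This is not how Turi Create implements it: the class `BackgroundCount` and its methods `ready`, `snapshot`, and `wait` are hypothetical names chosen for this example. It shows a constructor that either blocks until the summary is complete or returns immediately while a worker thread keeps updating state that can be read at any time.

```python
import threading


class BackgroundCount:
    """Hypothetical running count and sum, optionally built in the background."""

    def __init__(self, values, background=False):
        self._values = list(values)
        self._lock = threading.Lock()
        self._processed = 0
        self._total = 0
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._build)
        self._thread.start()
        if not background:
            self._thread.join()   # blocking construction, the default

    def _build(self):
        for v in self._values:
            with self._lock:      # readers and the builder share this lock
                self._processed += 1
                self._total += v
        self._done.set()

    def ready(self):
        """True once every element has been processed."""
        return self._done.is_set()

    def snapshot(self):
        """Return (elements processed so far, running sum)."""
        with self._lock:
            return self._processed, self._total

    def wait(self):
        self._thread.join()


bg = BackgroundCount(range(1000), background=True)
partial = bg.snapshot()           # may reflect only part of the input
bg.wait()
final = bg.snapshot()
print(partial, final, bg.ready())
```

In this sketch, queries made during construction contend with the builder for the lock, which is one plausible source of the kind of performance penalty mentioned above.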
