pyarrow.dataset.Expression — Apache Arrow v20.0.0
class pyarrow.dataset.Expression
Bases: _Weakrefable
A logical expression to be evaluated against some input.
To create an expression:
- Use the factory function pyarrow.compute.scalar() to create a scalar (not necessary when combined, see example below).
- Use the factory function pyarrow.compute.field() to reference a field (column in table).
- Compare fields and scalars with <, <=, ==, >=, >.
- Combine expressions using Python operators & (logical and), | (logical or) and ~ (logical not). Note: the Python keywords and, or and not cannot be used to combine expressions.
- Create expression predicates using Expression methods such as pyarrow.compute.Expression.isin().
Examples
>>> import pyarrow.compute as pc
>>> (pc.field("a") < pc.scalar(3)) | (pc.field("b") > 7)
<pyarrow.compute.Expression ((a < 3) or (b > 7))>
>>> pc.field('a') != 3
<pyarrow.compute.Expression (a != 3)>
>>> pc.field('a').isin([1, 2, 3])
<pyarrow.compute.Expression is_in(a, {value_set=int64:[ 1, 2, 3 ], null_matching_behavior=MATCH})>
__init__(*args, **kwargs)
Methods
cast(self, type=None, safe=None, options=None)
Explicitly set or change the expression’s data type.
This creates a new expression equivalent to calling the cast compute function on this expression.
Parameters:
type : DataType, default None
    Type to cast array to.
safe : bool
    Whether to check for conversion errors such as overflow.
options : CastOptions, default None
    Additional checks passed via CastOptions.
Returns:
cast : Expression
equals(self, Expression other)
Parameters:
other : pyarrow.dataset.Expression
Returns:
bool
static from_substrait(message)
Deserialize an expression from Substrait.
The serialized message must be an ExtendedExpression message that has only a single expression. The name of the expression and the schema the expression was bound to will be ignored. Use pyarrow.substrait.deserialize_expressions if this information is needed or if the message might contain multiple expressions.
Parameters:
message : bytes or Buffer or a protobuf Message
    The Substrait message to deserialize
Returns:
Expression
    The deserialized expression
is_nan(self)
Check whether the expression is NaN.
This creates a new expression equivalent to calling the is_nan compute function on this expression.
Returns:
is_nan : Expression
is_null(self, bool nan_is_null=False)
Check whether the expression is null.
This creates a new expression equivalent to calling the is_null compute function on this expression.
Parameters:
nan_is_null : bool, default False
    Whether floating-point NaNs are considered null.
Returns:
is_null : Expression
is_valid(self)
Check whether the expression is not-null (valid).
This creates a new expression equivalent to calling the is_valid compute function on this expression.
Returns:
is_valid : Expression
isin(self, values)
Check whether the expression is contained in values.
This creates a new expression equivalent to calling the is_in compute function on this expression.
Parameters:
values : Array or iterable
    The values to check for.
Returns:
isin : Expression
    A new expression that, when evaluated, checks whether this expression’s value is contained in values.
to_substrait(self, Schema schema, bool allow_arrow_extensions=False)
Serialize the expression using Substrait.
The expression will be serialized as an ExtendedExpression message that has a single expression named “expression”.
Parameters:
schema : Schema
    The input schema the expression will be bound to
allow_arrow_extensions : bool, default False
    If False then only functions that are part of the core Substrait function definitions will be allowed. Set this to True to allow pyarrow-specific functions, but the result may not be accepted by other compute libraries.
Returns:
Buffer
A buffer containing the serialized Protobuf plan.