Aggregate Functions

This section provides reference information for the aggregate functions in PlaidCloud Lakehouse.


ANY_VALUE

Learn how to use the ANY_VALUE aggregate function in PlaidCloud Lakehouse. Returns any arbitrary value from a group of rows - with syntax and examples.

APPROX_COUNT_DISTINCT

Learn how to use the APPROX_COUNT_DISTINCT aggregate function in PlaidCloud Lakehouse. Returns an approximate count of distinct values using HyperLogLog.

APPROX_TOP_K

Learn how to use the APPROX_TOP_K aggregate function in PlaidCloud Lakehouse. Returns the top-k most frequent values and their approximate counts.

AVG

Learn how to use the AVG aggregate function in PlaidCloud Lakehouse. Returns the average value of a numeric column - see syntax, examples, and output.

BITMAP

Learn how to use the BITMAP aggregate function in PlaidCloud Lakehouse. Returns a bitmap union of a set of values. Typically used with BITMAP_AGG.

BOOL_OR

Learn how to use the BOOL_OR aggregate function in PlaidCloud Lakehouse. Returns TRUE if any value in the group is TRUE - see syntax, examples, and output.

CORR

Learn how to use the CORR aggregate function in PlaidCloud Lakehouse. Returns the Pearson correlation coefficient between two expressions.

COUNT

Learn how to use the COUNT aggregate function in PlaidCloud Lakehouse. Returns the number of rows or non-NULL values - see syntax, examples, and output.

COUNT_IF

Learn how to use the COUNT_IF aggregate function in PlaidCloud Lakehouse. Returns the number of rows for which the expression is TRUE.

COVAR_POP

Learn how to use the COVAR_POP aggregate function in PlaidCloud Lakehouse. Returns the population covariance of two expressions - with syntax and examples.

COVAR_SAMP

Learn how to use the COVAR_SAMP aggregate function in PlaidCloud Lakehouse. Returns the sample covariance of two expressions - with syntax and examples.

DS_HLL_ACCUMULATE

Use the DS_HLL_ACCUMULATE aggregate function in PlaidCloud Lakehouse. Accumulates values into a DataSketches HLL sketch for approximate distinct counting.

DS_HLL_COMBINE

Learn how to use the DS_HLL_COMBINE aggregate function in PlaidCloud Lakehouse. Combines multiple DataSketches HLL sketches into a single sketch.

DS_HLL_COUNT_DISTINCT

Use the DS_HLL_COUNT_DISTINCT aggregate function in PlaidCloud Lakehouse. Returns an approximate distinct count using DataSketches HLL algorithm. More.

DS_HLL_ESTIMATE

Learn how to use the DS_HLL_ESTIMATE aggregate function in PlaidCloud Lakehouse. Estimates the cardinality from a DataSketches HLL sketch.

DS_THETA_COUNT_DISTINCT

Use the DS_THETA_COUNT_DISTINCT aggregate function in PlaidCloud Lakehouse. Returns an approximate distinct count using DataSketches Theta algorithm. Supports.

GROUP_CONCAT

Learn how to use the GROUP_CONCAT aggregate function in PlaidCloud Lakehouse. Concatenates values from a group into a single string with a separator.

GROUPING

Use the GROUPING aggregate function in PlaidCloud Lakehouse. Indicates whether a specified column in a GROUP BY clause is aggregated. Returns 1 if aggregated,.

GROUPING_ID

Learn how to use the GROUPING_ID aggregate function in PlaidCloud Lakehouse. Returns a bitmask corresponding to the grouping of columns.

HLL_RAW_AGG

Learn how to use the HLL_RAW_AGG aggregate function in PlaidCloud Lakehouse. Aggregates HLL values into a single HLL value - with syntax and examples.

HLL_UNION

Learn how to use the HLL_UNION aggregate function in PlaidCloud Lakehouse. Returns the union of multiple HLL values - see syntax, examples, and output.

HLL_UNION_AGG

Learn how to use the HLL_UNION_AGG aggregate function in PlaidCloud Lakehouse. Aggregates HLL values by computing the union - with syntax and examples.

MANN_WHITNEY_U_TEST

Learn how to use the MANN_WHITNEY_U_TEST aggregate function in PlaidCloud Lakehouse. Performs a Mann-Whitney U test on two independent samples.

MAX

Learn how to use the MAX aggregate function in PlaidCloud Lakehouse. Returns the maximum value in a set of values - see syntax, examples, and output.

MAX_BY

Learn how to use the MAX_BY aggregate function in PlaidCloud Lakehouse. Returns the value of one column associated with the maximum value of another column.

MIN

Learn how to use the MIN aggregate function in PlaidCloud Lakehouse. Returns the minimum value in a set of values - see syntax, examples, and output.

MIN_BY

Learn how to use the MIN_BY aggregate function in PlaidCloud Lakehouse. Returns the value of one column associated with the minimum value of another column.

MULTI_DISTINCT_COUNT

Learn how to use the MULTI_DISTINCT_COUNT aggregate function in PlaidCloud Lakehouse. Returns the count of distinct values. Equivalent to COUNT(DISTINCT).

MULTI_DISTINCT_SUM

Learn how to use the MULTI_DISTINCT_SUM aggregate function in PlaidCloud Lakehouse. Returns the sum of distinct values - see syntax, examples, and output.

PERCENTILE_APPROX

Learn how to use the PERCENTILE_APPROX aggregate function in PlaidCloud Lakehouse. Returns an approximate percentile value using the t-digest algorithm.

PERCENTILE_APPROX_WEIGHT

Learn how to use the PERCENTILE_APPROX_WEIGHT aggregate function in PlaidCloud Lakehouse. Returns a weighted approximate percentile value.

PERCENTILE_CONT

Learn how to use the PERCENTILE_CONT aggregate function in PlaidCloud Lakehouse. Returns an interpolated percentile value based on a continuous distribution.

PERCENTILE_DISC

Use the PERCENTILE_DISC aggregate function in PlaidCloud Lakehouse. Returns the smallest value whose cumulative distribution is >= the specified percentile.

PERCENTILE_DISC_LC

Learn how to use the PERCENTILE_DISC_LC aggregate function in PlaidCloud Lakehouse. Returns the percentile value using a low-cardinality optimized algorithm.

RETENTION

Learn how to use the RETENTION aggregate function in PlaidCloud Lakehouse. Calculates retention for a set of events. Returns an array of 0s and 1s.

STD

Learn how to use the STD aggregate function in PlaidCloud Lakehouse. Returns the population standard deviation. Alias for STDDEV_POP.

STDDEV

Learn how to use the STDDEV aggregate function in PlaidCloud Lakehouse. Returns the population standard deviation. Alias for STDDEV_POP.

STDDEV_POP

Learn how to use the STDDEV_POP aggregate function in PlaidCloud Lakehouse. Returns the population standard deviation - see syntax, examples, and output.

STDDEV_SAMP

Learn how to use the STDDEV_SAMP aggregate function in PlaidCloud Lakehouse. Returns the sample standard deviation - see syntax, examples, and output.

SUM

Learn how to use the SUM aggregate function in PlaidCloud Lakehouse. Returns the sum of all values in a group - see syntax, examples, and output.

SUM_MAP

Learn how to use the SUM_MAP aggregate function in PlaidCloud Lakehouse. Sums values grouped by keys in map columns - see syntax, examples, and output.

VAR_POP

Learn how to use the VAR_POP aggregate function in PlaidCloud Lakehouse. Returns the population variance. Alias for VARIANCE_POP - with syntax and examples.

VAR_SAMP

Learn how to use the VAR_SAMP aggregate function in PlaidCloud Lakehouse. Returns the sample variance. Alias for VARIANCE_SAMP - with syntax and examples.

VARIANCE

Learn how to use the VARIANCE aggregate function in PlaidCloud Lakehouse. Returns the population variance. Alias for VARIANCE_POP - with syntax and examples.

VARIANCE_POP

Learn how to use the VARIANCE_POP aggregate function in PlaidCloud Lakehouse. Returns the population variance - see syntax, examples, and output.

VARIANCE_SAMP

Learn how to use the VARIANCE_SAMP aggregate function in PlaidCloud Lakehouse. Returns the sample variance - see syntax, examples, and output.

WINDOW_FUNNEL

Use the WINDOW_FUNNEL aggregate function in PlaidCloud Lakehouse. Searches for event chains in a time-ordered sequence and returns the maximum chain length.