Window Functions

Overview

A window function operates on a group ("window") of related rows.

For each input row, a window function returns one output row that depends on the specific row passed to the function and the values of the other rows in the window.

There are two main types of order-sensitive window functions:

  • Rank-related functions: Rank-related functions list information based on the "rank" of a row. For example, ranking stores in descending order by profit per year, the store with the most profit will be ranked 1, and the second-most profitable store will be ranked 2, and so on.

  • Window frame functions: Window frame functions enable you to perform rolling operations, such as calculating a running total or a moving average, on a subset of the rows in the window.

List of Functions that Support Windows

The list below shows all the window functions.

Function NameCategoryWindowWindow FrameNotes
ARRAY_AGGGeneral
AVGGeneral
AVG_IFGeneral
COUNTGeneral
COUNT_IFGeneral
COVAR_POPGeneral
COVAR_SAMPGeneral
MAXGeneral
MAX_IFGeneral
MINGeneral
MIN_IFGeneral
STDDEV_POPGeneral
STDDEV_SAMPGeneral
MEDIANGeneral
QUANTILE_CONTGeneral
QUANTILE_DISCGeneral
KURTOSISGeneral
SKEWNESSGeneral
SUMGeneral
SUM_IFGeneral
CUME_DISTRank-related
PERCENT_RANKRank-related
DENSE_RANKRank-related
RANKRank-related
ROW_NUMBERRank-related
NTILERank-related
FIRST_VALUERank-related
FIRSTRank-related
LAST_VALUERank-related
LASTRank-related
NTH_VALUERank-related
LEADRank-related
LAGRank-related

Window Syntax

<function> ( [ <arguments> ] ) OVER ( { named window | inline window } )

named window ::=
    { window_name | ( window_name ) }

inline window ::=
    [ PARTITION BY <expression_list> ]
    [ ORDER BY <expression_list> ]
    [ window frame ]

The named window is a window that is defined in the WINDOW clause of the SELECT statement, eg: SELECT a, SUM(a) OVER w FROM t WINDOW w AS ( inline window ).

The <function> is one of (aggregate function, rank function, value function).

The OVER clause specifies that the function is being used as a window function.

The PARTITION BY sub-clause allows rows to be grouped into sub-groups, for example by city, by year, etc. The PARTITION BY clause is optional. You can analyze an entire group of rows without breaking it into sub-groups.

The ORDER BY clause orders rows within the window.

The window frame clause specifies the window frame type and the window frame extent. The window frame clause is optional. If you omit the window frame clause, the default window frame type is RANGE and the default window frame extent is UNBOUNDED PRECEDING AND CURRENT ROW.

Window Frame Syntax

window frame can be one of the following types:

cumulativeFrame ::=
    {
       { ROWS | RANGE } BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
     | { ROWS | RANGE } BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    }
slidingFrame ::=
    {
       ROWS BETWEEN <N> { PRECEDING | FOLLOWING } AND <N> { PRECEDING | FOLLOWING }
     | ROWS BETWEEN UNBOUNDED PRECEDING AND <N> { PRECEDING | FOLLOWING }
     | ROWS BETWEEN <N> { PRECEDING | FOLLOWING } AND UNBOUNDED FOLLOWING
    }

SQL Examples

Create the table

CREATE TABLE employees (
  employee_id INT,
  first_name VARCHAR,
  last_name VARCHAR,
  department VARCHAR,
  salary INT
);

Insert data

INSERT INTO employees (employee_id, first_name, last_name, department, salary) VALUES
  (1, 'John', 'Doe', 'IT', 75000),
  (2, 'Jane', 'Smith', 'HR', 85000),
  (3, 'Mike', 'Johnson', 'IT', 90000),
  (4, 'Sara', 'Williams', 'Sales', 60000),
  (5, 'Tom', 'Brown', 'HR', 82000),
  (6, 'Ava', 'Davis', 'Sales', 62000),
  (7, 'Olivia', 'Taylor', 'IT', 72000),
  (8, 'Emily', 'Anderson', 'HR', 77000),
  (9, 'Sophia', 'Lee', 'Sales', 58000),
  (10, 'Ella', 'Thomas', 'IT', 67000);

Example 1: Ranking employees by salary

In this example, we use the RANK() function to rank employees based on their salaries in descending order. The highest salary will get a rank of 1, and the lowest salary will get the highest rank number.

SELECT employee_id, first_name, last_name, department, salary, RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;

Result:

employee_idfirst_namelast_namedepartmentsalaryrank
3MikeJohnsonIT900001
2JaneSmithHR850002
5TomBrownHR820003
8EmilyAndersonHR770004
1JohnDoeIT750005
7OliviaTaylorIT720006
10EllaThomasIT670007
6AvaDavisSales620008
4SaraWilliamsSales600009
9SophiaLeeSales5800010

Example 2: Calculating the total salary per department

In this example, we use the SUM() function with PARTITION BY to calculate the total salary paid per department. Each row will show the department and the total salary for that department.

SELECT department, SUM(salary) OVER (PARTITION BY department) AS total_salary
FROM employees;

Result:

departmenttotal_salary
HR244000
HR244000
HR244000
IT304000
IT304000
IT304000
IT304000
Sales180000
Sales180000
Sales180000

Example 3: Calculating a running total of salaries per department

In this example, we use the SUM() function with a cumulative window frame to calculate a running total of salaries within each department. The running total is calculated based on the employee's salary ordered by their employee_id.

SELECT employee_id, first_name, last_name, department, salary, 
       SUM(salary) OVER (PARTITION BY department ORDER BY employee_id
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM employees;

Result:

employee_idfirst_namelast_namedepartmentsalaryrunning_total
2JaneSmithHR8500085000
5TomBrownHR82000167000
8EmilyAndersonHR77000244000
1JohnDoeIT7500075000
3MikeJohnsonIT90000165000
7OliviaTaylorIT72000237000
10EllaThomasIT67000304000
4SaraWilliamsSales6000060000
6AvaDavisSales62000122000
9SophiaLeeSales58000180000

CUME_DIST

Learn how to use the CUME_DIST window function in PlaidCloud Lakehouse. Returns the cumulative distribution of a given value in a set of values.

DENSE_RANK

Learn how to use the DENSE_RANK window function in PlaidCloud Lakehouse. Returns the rank of a value within a group of values, without gaps in the ranks.

FIRST

Learn how to use FIRST in PlaidCloud Lakehouse. Alias for the FIRST_VALUE window function. Includes detailed syntax, examples, and usage reference.

FIRST_VALUE

Learn how to use the FIRST_VALUE window function in PlaidCloud Lakehouse. Returns the first value from an ordered group of values. Includes syntax and examples.

LAG

Learn how to use the LAG window function in PlaidCloud Lakehouse. LAG allows you to access the value of a column from a preceding row within the same result...

LAST

Learn how to use LAST in PlaidCloud Lakehouse. Alias for the LAST_VALUE window function. Includes detailed syntax, examples, and usage reference.

LAST_VALUE

Learn how to use the LAST_VALUE window function in PlaidCloud Lakehouse. Returns the last value from an ordered group of values. Includes syntax and examples.

LEAD

Learn how to use the LEAD window function in PlaidCloud Lakehouse. LEAD allows you to access the value of a column from a subsequent row within the same...

NTH_VALUE

Learn how to use the NTH_VALUE window function in PlaidCloud Lakehouse. Returns the Nth value from an ordered group of values. Includes syntax and examples.

NTILE

Learn how to use the NTILE window function in PlaidCloud Lakehouse. Divides the sorted result set into a specified number of buckets or groups.

PERCENT_RANK

Learn how to use the PERCENT_RANK window function in PlaidCloud Lakehouse. Returns the relative rank of a given value within a set of values.

RANK

Learn how to use the RANK window function in PlaidCloud Lakehouse. Assigns a unique rank to each value within an ordered group of values. Includes syntax...

ROW_NUMBER

Learn how to use the ROW_NUMBER window function in PlaidCloud Lakehouse. Assigns a temporary sequential number to each row within a partition of a result...