LEVEL 04

Window Functions

The skill that separates intermediate from senior SQL. Aggregates without collapsing rows — used daily in every DE role.

Key difference: GROUP BY collapses rows into one per group. Window functions compute across rows while keeping all original rows intact.

Syntax anatomy

function_name() OVER (
    PARTITION BY col   -- split into groups (optional)
    ORDER BY     col   -- order within each group
    ROWS BETWEEN ...   -- define the frame (optional)
)

-- Example: branch total alongside every row
SELECT txn_id, branch, amount,
       SUM(amount) OVER (PARTITION BY branch) AS branch_total
FROM   transactions;

ROW_NUMBER, RANK, DENSE_RANK

FunctionTies behaviourExample output for ties
ROW_NUMBER()Always unique, ties broken arbitrarily1, 2, 3, 4
RANK()Ties share rank, gaps after1, 2, 2, 4
DENSE_RANK()Ties share rank, no gaps1, 2, 2, 3
-- Top 3 transactions per branch (most common real-world use)
WITH ranked AS (
    SELECT txn_id, branch, amount,
           ROW_NUMBER() OVER (PARTITION BY branch
                                ORDER BY amount DESC) AS rn
    FROM   transactions
    WHERE  status = 'SUCCESS'
)
SELECT txn_id, branch, amount
FROM   ranked
WHERE  rn <= 3;
-- You MUST use a CTE/subquery — can't filter on window fn in WHERE directly

LAG & LEAD — look backward / forward

-- Day-over-day stock price change
SELECT symbol, trade_date, close,
       LAG(close, 1) OVER (PARTITION BY symbol
                            ORDER BY trade_date)       AS prev_close,
       close - LAG(close) OVER (PARTITION BY symbol
                                   ORDER BY trade_date)  AS daily_change,
       ROUND(
           (close - LAG(close) OVER (PARTITION BY symbol ORDER BY trade_date))
           * 100.0 / LAG(close) OVER (PARTITION BY symbol ORDER BY trade_date),
           2
       ) AS pct_change
FROM   daily_stock
ORDER BY symbol, trade_date;

-- Fraud signal: two consecutive FAILED transactions
WITH seq AS (
    SELECT account_id, txn_id, txn_date, status,
           LAG(status) OVER (PARTITION BY account_id
                               ORDER BY txn_date) AS prev_status
    FROM   transactions
)
SELECT account_id, txn_id, txn_date
FROM   seq
WHERE  status = 'FAILED' AND prev_status = 'FAILED';

Running totals & Moving averages

-- Frame clause controls which rows are in the window
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- running total
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW          -- 5-day window
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- entire partition

-- 5-day and 20-day moving average (stock technical analysis)
SELECT symbol, trade_date, close,
       ROUND(AVG(close) OVER (
           PARTITION BY symbol ORDER BY trade_date
           ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
       ), 2) AS ma_5,
       ROUND(AVG(close) OVER (
           PARTITION BY symbol ORDER BY trade_date
           ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
       ), 2) AS ma_20
FROM   daily_stock
ORDER BY symbol, trade_date;
LAST_VALUE needs ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Without this frame, it just returns the current row's value.

NTILE & PERCENT_RANK — segmentation

-- Classify accounts into quartiles by transaction volume
WITH vol AS (
    SELECT account_id, SUM(amount) AS total_volume
    FROM   transactions WHERE status='SUCCESS'
    GROUP BY account_id
)
SELECT account_id, total_volume,
       NTILE(4)  OVER (ORDER BY total_volume DESC) AS quartile,
       NTILE(10) OVER (ORDER BY total_volume DESC) AS decile,
       ROUND(PERCENT_RANK() OVER (ORDER BY total_volume) * 100, 2) AS percentile
FROM   vol;

Quick reference

FunctionWhat it doesORDER BY needed?
ROW_NUMBER()Unique sequential numberYes
RANK()Ties share rank, gaps after tiesYes
DENSE_RANK()Ties share rank, no gapsYes
LAG(col, n)Value n rows before currentYes
LEAD(col, n)Value n rows after currentYes
FIRST_VALUE(col)First value in frameYes
LAST_VALUE(col)Last value in frame (needs full frame!)Yes
SUM/AVG/COUNT OVER()Running or partitioned aggregateOptional
NTILE(n)Divide into n bucketsYes
PERCENT_RANK()Percentile 0.0 to 1.0Yes

Practice Problems

Q1 · Easy
From daily_stock, return only the most recent trading day per symbol using ROW_NUMBER.
Q2 · Medium
Rank all accounts by total 2024 transaction amount. Show rank, dense_rank, account_id, total. Observe what changes when two accounts tie.
Q3 · Medium
For each stock: previous day's close, daily change in ₹, daily % change. Flag rows where the stock dropped more than 2% in a day.
Q4 · Hard
For HDFC stock in 2024: every trading day, cumulative volume YTD, 5-day moving average close and volume, and BULLISH/BEARISH flag (close above/below MA5).
Q5 · Hard
Every SUCCESS transaction showing: rank within branch by amount, running total within branch by date, branch avg amount, how much this txn is above/below branch avg, and overall percentile across all branches. Use 4+ different window functions.