LEVEL 05

Advanced Joins & Set Operations

Beyond LEFT/RIGHT — the joins that handle real-world messy, multi-source pipeline data.

FULL OUTER JOIN

Returns every row from both tables. NULLs fill the missing side.

-- Reconcile two daily snapshots — find new, closed, and changed accounts
WITH comparison AS (
    SELECT
        COALESCE(t.account_id, y.account_id) AS account_id,
        y.balance  AS yesterday_balance,
        t.balance  AS today_balance,
        CASE
            WHEN y.account_id IS NULL      THEN 'NEW'
            WHEN t.account_id IS NULL      THEN 'CLOSED'
            WHEN t.balance <> y.balance     THEN 'CHANGED'
            ELSE                                 'UNCHANGED'
        END AS status
    FROM   accounts_today     t
    FULL OUTER JOIN accounts_yesterday y ON t.account_id = y.account_id
)
SELECT status, COUNT(*) AS count
FROM   comparison GROUP BY status;
This reconciliation pattern is used in every data pipeline that does daily delta loads. Memorise it.

CROSS JOIN

Every row from the left × every row from the right. No join condition. If left has 5 rows and right has 4 — result is 20 rows.

-- Generate a date × stock grid to find missing trading days
WITH all_combos AS (
    SELECT d.trade_date, s.symbol
    FROM   (SELECT DISTINCT trade_date FROM daily_stock) d
    CROSS JOIN (SELECT DISTINCT symbol FROM daily_stock) s
)
SELECT ac.symbol, ac.trade_date AS missing_date
FROM   all_combos ac
LEFT JOIN daily_stock ds
    ON ac.symbol = ds.symbol AND ac.trade_date = ds.trade_date
WHERE  ds.symbol IS NULL
ORDER BY ac.symbol, ac.trade_date;

SELF JOIN

A table joined to itself using two different aliases. Used for hierarchies and same-table comparisons.

-- Employee with their manager's name and salary gap
SELECT e.name                    AS employee,
       e.salary,
       m.name                    AS manager,
       e.salary - m.salary       AS salary_diff
FROM   employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;

-- Stocks that closed within 1% of each other on different days
SELECT s1.trade_date, s2.trade_date AS date2, s1.close, s2.close
FROM   daily_stock s1
JOIN   daily_stock s2
    ON  s1.symbol = s2.symbol
    AND s1.trade_date < s2.trade_date       -- avoid duplicates
    AND ABS(s1.close - s2.close) / s1.close < 0.01
WHERE  s1.symbol = 'HDFC';

Non-equi JOIN

Join on a range or condition instead of equality. Used for tax slabs, risk bands, and date ranges in BFSI.

-- Assign each transaction to a risk band
-- risk_bands: band_name | min_amount | max_amount
SELECT t.txn_id, t.amount, r.band_name
FROM   transactions t
JOIN   risk_bands   r
    ON t.amount BETWEEN r.min_amount AND r.max_amount;

-- Assign transactions to financial quarters
SELECT t.txn_id, t.txn_date, q.q_name
FROM   transactions t
JOIN   quarters     q
    ON t.txn_date BETWEEN q.q_start AND q.q_end
WHERE  t.status = 'SUCCESS';

LATERAL join (PostgreSQL) / CROSS APPLY (Oracle)

-- PostgreSQL: last 3 transactions per account
SELECT a.account_id, a.customer, t.txn_id, t.amount
FROM   accounts a
CROSS JOIN LATERAL (
    SELECT txn_id, amount
    FROM   transactions
    WHERE  account_id = a.account_id  -- references outer row
    ORDER BY txn_date DESC
    LIMIT 3
) t;

-- Oracle equivalent
CROSS APPLY (
    SELECT txn_id, amount FROM transactions
    WHERE  account_id = a.account_id
    ORDER BY txn_date DESC
    FETCH FIRST 3 ROWS ONLY
) t

Set operations

Both queries must return the same number of columns with compatible types. Column names come from the first query.
OperatorReturnsDuplicatesOraclePG/MySQL
UNION ALLAll rows from bothKeeps
UNIONAll rows, dedupedRemoves
INTERSECTRows in both setsRemoves
MINUSIn left, not in rightRemovesMINUSEXCEPT
-- Accounts active only in 2023 (not 2024)
SELECT account_id FROM transactions WHERE EXTRACT(YEAR FROM txn_date)=2023
MINUS                          -- Oracle | use EXCEPT in PostgreSQL
SELECT account_id FROM transactions WHERE EXTRACT(YEAR FROM txn_date)=2024;

-- Delta detection: new records in staging not in production
SELECT txn_id, account_id, amount FROM transactions_staging
EXCEPT
SELECT txn_id, account_id, amount FROM transactions_production;

Practice Problems

Q1 · Easy
You have nse_stocks_jan and nse_stocks_feb with (symbol, avg_close, total_volume). FULL OUTER JOIN them. Show all symbols from both months, NULL where missing, and volume change (feb - jan).
Q2 · Medium
Generate a report template: every branch × every txn_type combination with count and total amount. Branches/types with zero transactions must still appear showing 0. Use CROSS JOIN then LEFT JOIN.
Q3 · Medium
Find pairs of employees in the same department where one earns more than 20% above the other. Show both names, dept, salaries, pct_difference. Avoid showing the same pair twice.
Q4 · Medium
Assign each SUCCESS transaction to a band: MICRO <1000, SMALL 1K-10K, MEDIUM 10K-100K, LARGE 100K-1M, CRITICAL 1M+. Show band, count, total amount, % of total volume.
Q5 · Hard
Three queries combined with UNION ALL: accounts only in 2023, accounts in both years, accounts only in 2024. Add a 'cohort' label. Final output: cohort, account count, total volume.