LEVEL 03

Subqueries & CTEs

Queries inside queries. CTEs make complex logic readable and reusable — the foundation of professional SQL.

Scalar subquery
Returns one value. Used in SELECT or WHERE.
Inline view
Returns a table. Used in FROM clause.
Correlated subquery
References outer query. Runs once per row.
CTE (WITH clause)
Named subquery. Reusable. Modern way to write all of the above.

Scalar subquery

-- Transactions above the overall average
SELECT txn_id, amount
FROM   transactions
WHERE  amount > (SELECT AVG(amount) FROM transactions)
AND    status = 'SUCCESS';

-- Show each row vs the average
SELECT txn_id, amount,
       (SELECT ROUND(AVG(amount),2) FROM transactions) AS overall_avg,
       amount - (SELECT AVG(amount) FROM transactions)  AS diff
FROM   transactions
WHERE  status = 'SUCCESS';

Inline view (subquery in FROM)

-- Top 3 accounts by total volume
SELECT account_id, total_volume
FROM   (
    SELECT account_id,
           SUM(amount) AS total_volume
    FROM   transactions
    WHERE  status = 'SUCCESS'
    GROUP BY account_id
) account_summary
ORDER BY total_volume DESC
FETCH FIRST 3 ROWS ONLY;    -- Oracle | use LIMIT 3 in PG/MySQL

Correlated subquery

Runs once per outer row — references the outer query's column. Powerful but slow on large tables.

-- Employees earning above their own department average
SELECT e.name, e.dept, e.salary
FROM   employees e
WHERE  e.salary > (
    SELECT AVG(e2.salary)
    FROM   employees e2
    WHERE  e2.dept = e.dept    -- ← references outer row
);
-- For each employee row, inner query runs with THAT employee's dept
Correlated subqueries run N times (once per outer row). Replace with window functions or CTEs for large tables.

WITH clause — CTEs

Give your subquery a name. Reference it like a table. Chain multiple CTEs together.

-- Single CTE
WITH branch_stats AS (
    SELECT branch,
           COUNT(*) AS total_txns,
           COUNT(CASE WHEN status='SUCCESS' THEN 1 END) AS success_txns,
           SUM(amount) AS total_volume
    FROM   transactions
    GROUP BY branch
)
SELECT branch,
       ROUND(success_txns * 100.0 / total_txns, 2) AS success_rate
FROM   branch_stats
WHERE  total_txns >= 100
ORDER BY success_rate DESC;
-- Chained CTEs — each can reference the previous
WITH
account_totals AS (
    SELECT account_id, SUM(amount) AS total_spent
    FROM   transactions WHERE status='SUCCESS'
    GROUP BY account_id
),
segment_summary AS (
    SELECT a.segment,
           COUNT(DISTINCT a.account_id) AS accounts,
           SUM(t.total_spent)           AS total_volume
    FROM   accounts a
    JOIN   account_totals t ON a.account_id = t.account_id
    GROUP BY a.segment
)
SELECT * FROM segment_summary ORDER BY total_volume DESC;

Recursive CTE — org hierarchy

-- PostgreSQL / SQL Server
WITH RECURSIVE org AS (
    -- Anchor: top-level managers
    SELECT emp_id, name, manager_id, 1 AS lvl
    FROM   employees WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: find each person's direct reports
    SELECT e.emp_id, e.name, e.manager_id, o.lvl + 1
    FROM   employees e
    JOIN   org o ON e.manager_id = o.emp_id
)
SELECT lvl, name FROM org ORDER BY lvl, name;

-- Oracle uses CONNECT BY instead
SELECT LEVEL, name
FROM   employees
START WITH  manager_id IS NULL
CONNECT BY  PRIOR emp_id = manager_id;

EXISTS vs IN vs JOIN

MethodPerformanceNULL safetyBest for
INOK for small listsDangerous with NULLsFixed small lists
EXISTSFast — short circuitsNULL safeLarge tables, existence check
JOINFastest with indexesPredictableWhen you need columns from both
NOT INOKBroken if NULL presentAvoid — use NOT EXISTS
NOT EXISTSFastNULL safeAnti-join pattern
-- Accounts active in 2024 — three ways
WHERE account_id IN (SELECT account_id FROM transactions WHERE txn_date >= '2024-01-01')

WHERE EXISTS (SELECT 1 FROM transactions t
              WHERE t.account_id = a.account_id
              AND   t.txn_date >= '2024-01-01')  -- preferred for large tables

-- Accounts with NO transactions (use NOT EXISTS, not NOT IN)
WHERE NOT EXISTS (SELECT 1 FROM transactions t
                  WHERE t.account_id = a.account_id)  -- NULL safe ✓
NOT IN breaks silently if the subquery returns any NULL. All rows are excluded. Always use NOT EXISTS instead.

Practice Problems

Q1 · Easy
Find all stock trading days where HDFC's close was above its all-time average. Show trade_date, close, and the average.
Q2 · Medium
Top 5 accounts by transaction count in 2024. Show account_id, customer name, segment, txn_count. Use an inline view joined to accounts.
Q3 · Medium
For each branch, find the single largest transaction. Try with correlated subquery first, then rewrite using a CTE.
Q4 · Hard
Using 3 chained CTEs: (1) total SUCCESS volume per account, (2) join to accounts for segment, (3) summarise by segment with total accounts, total volume, avg volume, and rank by volume.
Q5 · Hard
Find accounts with at least one CREDIT > ₹50,000 AND that have NEVER had a FAILED transaction. Write twice — once with IN/NOT IN, once with EXISTS/NOT EXISTS. Which is safer?