LEVEL 08

Pro & Interview Mastery

No new syntax. Everything uses Levels 1–7. The skill being tested is how you think and decompose hard problems.

Interviewer tip: always decompose → CTE per step → handle edge cases → mention indexes. They score your reasoning as much as correct SQL.

Pattern 1 — Gaps & Islands

Find consecutive sequences (islands) and breaks in them (gaps). The trick: subtract ROW_NUMBER from the value — consecutive values produce the same difference.

-- Visualise the trick:
-- date        rn   date - rn      ← same value = same island
-- 2024-01-01   1   2023-12-31  ┐
-- 2024-01-02   2   2023-12-31  ├── Island 1
-- 2024-01-03   3   2023-12-31  ┘
-- 2024-01-05   4   2024-01-01  ┐  gap here (Jan 4 missing)
-- 2024-01-06   5   2024-01-01  ┘── Island 2

-- Find accounts with 5+ consecutive transaction days
WITH daily AS (
    SELECT DISTINCT account_id, TRUNC(txn_date) AS txn_day
    FROM   transactions WHERE status='SUCCESS'
),
grouped AS (
    SELECT account_id, txn_day,
           txn_day - ROW_NUMBER() OVER (
               PARTITION BY account_id ORDER BY txn_day
           ) AS grp    -- Oracle: date arithmetic in days
    FROM   daily
)
SELECT account_id,
       MIN(txn_day)                    AS streak_start,
       MAX(txn_day)                    AS streak_end,
       MAX(txn_day) - MIN(txn_day) + 1 AS streak_days
FROM   grouped
GROUP BY account_id, grp
HAVING MAX(txn_day) - MIN(txn_day) + 1 >= 5
ORDER BY streak_days DESC;

Pattern 2 — Session Analysis

Group events into sessions separated by a time gap. A new session starts when the gap exceeds a threshold.

WITH gaps AS (
    SELECT account_id, txn_id, txn_date, amount,
           (txn_date - LAG(txn_date) OVER (
               PARTITION BY account_id ORDER BY txn_date
           )) * 24 * 60  AS mins_since_prev   -- Oracle: days × 1440
    FROM   transactions WHERE status='SUCCESS'
),
session_flags AS (
    SELECT *,
           CASE WHEN mins_since_prev > 30 OR mins_since_prev IS NULL
                THEN 1 ELSE 0 END AS new_session
    FROM   gaps
),
sessions AS (
    SELECT *,
           SUM(new_session) OVER (
               PARTITION BY account_id ORDER BY txn_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS session_num
    FROM   session_flags
)
SELECT account_id, session_num,
       MIN(txn_date) AS session_start,
       MAX(txn_date) AS session_end,
       COUNT(*)      AS txn_count,
       SUM(amount)   AS session_total
FROM   sessions
GROUP BY account_id, session_num
ORDER BY account_id, session_num;

Pattern 3 — ROLLUP & GROUPING SETS

Subtotals and grand totals in a single query — no UNION ALL needed.

-- ROLLUP: detail rows + branch subtotals + grand total
SELECT
    CASE GROUPING(branch)   WHEN 1 THEN 'ALL BRANCHES' ELSE branch   END AS branch,
    CASE GROUPING(txn_type) WHEN 1 THEN 'ALL TYPES'    ELSE txn_type END AS txn_type,
    COUNT(*)    AS txn_count,
    SUM(amount) AS total
FROM   transactions WHERE status='SUCCESS'
GROUP BY ROLLUP(branch, txn_type);

-- GROUPING SETS: pick exactly which subtotals you want
GROUP BY GROUPING SETS (
    (branch, txn_type),   -- detail
    (branch),             -- branch subtotals
    ()                    -- grand total only
);

Pattern 4 — MoM, YoY, Cohort Retention

-- Month-over-month growth using LAG
WITH monthly AS (
    SELECT TO_CHAR(txn_date,'YYYY-MM') AS month,
           SUM(amount) AS volume
    FROM   transactions WHERE status='SUCCESS'
    GROUP BY TO_CHAR(txn_date,'YYYY-MM')
)
SELECT month, volume,
       LAG(volume) OVER (ORDER BY month)     AS prev_month,
       ROUND((volume - LAG(volume) OVER (ORDER BY month)) * 100.0
             / NULLIF(LAG(volume) OVER (ORDER BY month), 0), 2) AS mom_pct
FROM   monthly ORDER BY month;
-- Cohort retention — % still active in month X+N
WITH cohorts AS (
    SELECT account_id,
           MIN(TRUNC(txn_date,'MM')) AS cohort_month
    FROM   transactions WHERE status='SUCCESS'
    GROUP BY account_id
),
activity AS (
    SELECT DISTINCT account_id, TRUNC(txn_date,'MM') AS active_month
    FROM   transactions WHERE status='SUCCESS'
)
SELECT
    c.cohort_month,
    MONTHS_BETWEEN(a.active_month, c.cohort_month) AS month_num,
    COUNT(DISTINCT a.account_id)                    AS active,
    (SELECT COUNT(*) FROM cohorts WHERE cohort_month=c.cohort_month) AS cohort_size
FROM   cohorts  c
JOIN   activity a ON c.account_id = a.account_id
GROUP BY c.cohort_month, MONTHS_BETWEEN(a.active_month, c.cohort_month)
ORDER BY c.cohort_month, month_num;

Mock interview — 5 real questions

From Flipkart, Paytm, HDFC Tech, Oracle India DE rounds. No hints — think through each one before writing SQL.

Final 5 questions

Q1 · Flipkart-style — Gaps & Islands
A stock is in a "bull run" if its closing price increases for 3 or more consecutive trading days. Find all bull runs for every stock in 2024.
Output: symbol, run_start_date, run_end_date, days. Order by days descending.
Q2 · Paytm-style — Session Fraud Detection
Flag accounts where: 3+ transactions within 5 minutes, total amount > ₹1,00,000, at least 2 different cities in the same session. These are suspected account takeover attempts.
Output: account_id, session_start, txn_count, total_amount, city_count.
Q3 · HDFC Tech-style — Cohort Retention
Find the top 3 monthly cohorts (Jan–Dec 2024) with the highest 3-month retention. For each, show the full month-by-month retention curve from month 0 to month 6.
Q4 · Oracle India-style — Multi-level aggregation
Single query (no UNION) producing: detail rows (segment + city + month + volume), subtotals (segment + city), subtotals (segment), grand total. Add a row_type column: DETAIL / CITY_SUB / SEGMENT_SUB / GRAND_TOTAL.
Q5 · DE System Design
Daily pipeline: 10M new transactions/day, target has 10B rows over 3 years.
Requirements: (a) incremental load only, (b) handle late arrivals up to 3 days, (c) dedup before loading, (d) SCD2 on accounts dimension, (e) idempotent pipeline.
Write the SQL for a–d. For e: explain what idempotency means here and how your SQL achieves it.

Completion checklist

L1 — Foundations
SELECT, WHERE, JOIN, ORDER BY
L2 — Aggregations
GROUP BY, HAVING, CASE WHEN, dates
L3 — Subqueries & CTEs
WITH, recursive, EXISTS vs IN
L4 — Window Functions
RANK, LAG/LEAD, frames, running totals
L5 — Advanced Joins
FULL OUTER, CROSS, non-equi, UNION
L6 — Performance
EXPLAIN, indexes, partitions, MVs
L7 — DE Patterns
SCD2, MERGE, watermark, dedup, JSON
L8 — Pro Mastery
Gaps/islands, sessions, cohort, ROLLUP
All 8 levels complete. You can now write SQL that handles any data engineering problem — from basic queries to production warehouse pipelines.