Best on Desktop — To run SQL queries and use the full playground, open this on a laptop or PC.
query.sql
SELECT name, salary
FROM   employees
WHERE  dept = 'Engineering'
  AND  salary > 80000
ORDER BY salary DESC
LIMIT  5;
SQL Learning Lab — Interactive

Learn SQL by Playing

Master SQL with an interactive cheat sheet and two skill-building games — designed to make database concepts stick.

0
SQL Queries
0
Interview Questions
0
Cheat Sheet Topics
SQL Mastery Course  ·  June 2026 Batch  ·  Live Online
Enjoying the playground? Get structured, job-ready SQL training.
Live weekly sessions Interview question walkthroughs Completion certificate ₹14,000 one-time
View SQL Course
01
Cheat Sheet
8 topics with live reference tables
Reference
02
Guess the Output
Read a query, pick the correct result
Game 1
03
Speed Round
60 seconds, streak multipliers
Game 2
04
Go Premium
Question booklet, cheat sheet & more — ₹999
Unlock
WHY SQL?

The language behind
every data-driven decision.

Swiggy tracks your order. Flipkart predicts your next buy. Google ranks every search result.
All of it — runs on SQL.

Find top revenue cities in 1 query
Detect fraud patterns in millions of rows
Calculate month-over-month growth
Rank products by customer rating
Find users who churned this quarter
Build a real-time sales dashboard
Compare delivery times across cities
Identify your highest-value customers
Find top revenue cities in 1 query
Detect fraud patterns in millions of rows
Calculate month-over-month growth
Rank products by customer rating
Find users who churned this quarter
Build a real-time sales dashboard
Compare delivery times across cities
Identify your highest-value customers
3M+
SQL jobs globally
₹12L
avg salary in India
10x
faster than Excel
#1
most-asked data skill
How SQL works in the real world
Raw Data
millions of rows
You Write SQL
5 lines of code
DB Answers
in milliseconds
Clear Insight
decision made
Your Job
₹15L–₹40L
Without SQL
  • Download CSV, open Excel, wait
  • 3 hours of pivot tables
  • "I think the data shows..."
  • Dependent on the analyst team
  • Entry-level, easily replaced
VS
With SQL
  • Write 5 lines, hit Run
  • Answer in under 1 second
  • "The data clearly shows..."
  • Self-serve — no waiting
  • ₹15L–₹40L, irreplaceable
Who needs SQL?
Data Analyst
Finds trends, builds dashboards, answers "why did sales drop last month?"
Data Engineer
Builds pipelines, transforms raw data into clean, queryable tables
Backend Developer
Powers apps, manages databases, writes queries that serve millions of users
Product Manager
Self-serves data insights — no more waiting 3 days for the analyst team
You write this...
SELECT city,
  COUNT(*) AS total_orders,
  ROUND(AVG(delivery_mins), 1) AS avg_delivery
FROM deliveries
WHERE status = 'delivered'
GROUP BY city
ORDER BY total_orders DESC
LIMIT 5;
...and instantly discover
Mumbai leads with 48,200 orders.
Chennai delivers fastest — 22 mins avg.
Jaipur needs more delivery partners.
What took 3 hours in Excel now takes 0.3 seconds. This is the power of SQL — and why every data job demands it.
Start Practising Free
📋 145 Questions Live Playground Free — No Signup
04
SQL Unlocked

SQL Unlocked

Your structured path from zero to confident SQL — topic by topic, story by story.

What's Inside

6 topics.
150+ questions.
Zero confusion.

SELECT → WHERE → ORDER BY → Aggregates → GROUP BY → NULLs. Each topic has theory, diagrams, MCQs, Fix-It drills, and write-from-scratch challenges. One structured staircase — all the way up.

6Topics
150+Questions
35+Per Topic
Start Learning Free →
SELECT
35 questions
WHERE
29+ questions
ORDER BY
29+ questions
Aggregates
29+ questions
GROUP BY
29+ questions
NULLs
29+ questions

Real SQL Interview Questions

Curated from Amazon, Google, Microsoft, Meta, Flipkart, Swiggy and LeetCode — organized by company, level, and topic.

Source
Collections
Level
Topic
Progress
Legend
Newbie — Never written SQL? Start here. Zero experience needed.
·
Interview Boost — High-frequency questions from real SQL rounds. Practise these first.
65 shown
 Newbie
 Beginner
 Intermediate
 Expert
0 / – completed
Newbie0/–
Beginner0/–
Intermediate0/–
Expert0/–
Run a query successfully in the playground to auto-mark it done — progress saves automatically in this browser
Lv 1
Absolute Beginner
0 XP earned
74
🌍 NewbieWHERE
Filter Rows with WHERE
WHERE narrows your result to only rows that match a condition — the most-used clause in SQL

Show all students who scored grade 'A'. Use the WHERE clause to filter rows — only rows where grade = 'A' should appear in the result.

Schema
students
idnamegradecity
1ArjunADelhi
2PriyaBMumbai
3RajAPune
4SimranCDelhi
5KaranAChennai
Expected Output
idnamegradecity
1ArjunADelhi
3RajAPune
5KaranAChennai
Solution
SELECT *
FROM students
WHERE grade = 'A';
75
🌍 NewbieORDER BY
Sort Results with ORDER BY
ASC = low to high (default), DESC = high to low — ORDER BY always runs last

Show all products sorted by price from lowest to highest. Then try changing it to highest-to-lowest by adding DESC. ORDER BY always comes at the end of a query.

Schema
products
idnamecategoryprice
1PenStationery10
2NotebookStationery45
3BagAccessories350
4EraserStationery5
5Water BottleAccessories120
Expected Output (ASC)
idnameprice
4Eraser5
1Pen10
2Notebook45
5Water Bottle120
3Bag350
Solution
SELECT id, name, price
FROM products
ORDER BY price ASC;  -- ASC is default; swap DESC for highest first
76
🌍 NewbieLIKE
Search Patterns with LIKE
% matches any characters; _ matches exactly one — combine with LIKE to search by pattern

Find all customers whose name starts with the letter 'A'. Use LIKE 'A%' — the % wildcard matches any characters after 'A'. Try LIKE '%a' in the playground to find names ending with 'a'.

Schema
customers
idnamecity
1AnkitDelhi
2PriyaMumbai
3AishaHyderabad
4RahulPune
5AmitaBengaluru
Expected Output
idnamecity
1AnkitDelhi
3AishaHyderabad
5AmitaBengaluru
Solution
SELECT *
FROM customers
WHERE name LIKE 'A%';

-- Try also:
-- WHERE name LIKE '%a'   -- ends with 'a'
-- WHERE name LIKE '_i%'  -- second letter is 'i'
77
🌍 NewbieGROUP BY
Count Records per Group
GROUP BY collapses rows by a column; COUNT(*) tells you how many rows are in each group

Count how many students are in each city. Use GROUP BY city to group rows by city, then COUNT(*) to count how many students fall in each group.

Schema
students
idnamegradecity
1ArjunADelhi
2PriyaBMumbai
3RajADelhi
4SimranCMumbai
5KaranAChennai
Expected Output
citystudent_count
Chennai1
Delhi2
Mumbai2
Solution
SELECT city, COUNT(*) AS student_count
FROM students
GROUP BY city
ORDER BY city;
78
🌍 NewbieJOIN
Your First JOIN — Combine Two Tables
JOIN links two tables on a shared column so you can see data from both in one result row

Show each order with the customer's name and amount. The customers table has names; orders has amounts. Use INNER JOIN to match them on customer_id.

Schema
customers & orders
customers.idcustomers.nameorders.order_idorders.customer_idorders.amount
1Arjun1011500
2Priya1022300
1Arjun1031750
3Karan1043200
Expected Output
order_idnameamount
101Arjun500
102Priya300
103Arjun750
104Karan200
Solution
SELECT o.order_id, c.name, o.amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
ORDER BY o.order_id;
79
🌍 NewbieSELECT
Pick Specific Columns — Don't Always Use *
SELECT * grabs everything; name only the columns you actually need

The HR team needs a report with only employee name and department — they don't want salary or phone visible. Select just those two columns from the employees table.

Schema
employees
idnamedepartmentsalaryphone
1ArjunEngineering750009812345678
2PriyaMarketing620009987654321
3RajEngineering810009776543210
Expected Output
namedepartment
ArjunEngineering
PriyaMarketing
RajEngineering
Solution
SELECT name, department
FROM employees;
80
🌍 NewbieWHERE > < !=
WHERE with Comparison Operators
= equals, != not equal, > greater, < less — combine them to filter any range

Show all products where price is greater than ₹100. Then try modifying the query to show products priced less than ₹50 and products where category is NOT 'Food'.

Schema
products
idnamecategoryprice
1RiceFood45
2HeadphonesElectronics799
3NotebookStationery60
4PenStationery10
5MixerAppliances1200
Expected Output (price > 100)
idnameprice
2Headphones799
5Mixer1200
Solution
SELECT *
FROM products
WHERE price > 100;

-- Also try:
-- WHERE price < 50          -- cheaper than ₹50
-- WHERE category != 'Food'  -- everything except Food
81
🌍 NewbieAND / OR
Combine Conditions with AND and OR
AND requires both conditions true; OR requires at least one true

Part A — Show employees in the 'Sales' department AND with salary above ₹40,000.
Part B — Show customers from 'Delhi' OR 'Pune'.

Schema
employees
idnamedepartmentsalary
1ArjunSales38000
2PriyaSales52000
3RajEngineering70000
4SimranSales47000
Expected Output (AND)
namedepartmentsalary
PriyaSales52000
SimranSales47000
Solution
-- Part A: AND — both conditions must be true
SELECT *
FROM employees
WHERE department = 'Sales' AND salary > 40000;

-- Part B: OR — either condition is enough
-- SELECT * FROM customers WHERE city = 'Delhi' OR city = 'Pune';
82
🌍 NewbieDISTINCT
Remove Duplicates with SELECT DISTINCT
DISTINCT keeps only one row per unique value — like a de-duplication filter

The orders table has hundreds of rows but you only need to know which unique cities orders have come from. Use SELECT DISTINCT to get each city name exactly once.

Schema
orders
idcustomercityamount
1ArjunDelhi500
2PriyaMumbai300
3RajDelhi750
4SimranPune200
5KaranMumbai900
Expected Output
city
Delhi
Mumbai
Pune
Solution
SELECT DISTINCT city
FROM orders
ORDER BY city;
83
🌍 NewbieIS NULL
Find Missing Values with IS NULL
NULL means no value stored — you can't use = NULL, only IS NULL

Find all employees who have not provided a phone number (phone is NULL). Then find those who have provided one (IS NOT NULL).

Schema
employees
idnamephoneemail
1Arjun9812345678a@co.in
2PriyaNULLp@co.in
3RajNULLr@co.in
4Simran9776543210s@co.in
Expected Output (IS NULL)
idnamephone
2PriyaNULL
3RajNULL
Solution
SELECT *
FROM employees
WHERE phone IS NULL;

-- Find employees who DO have a phone:
-- WHERE phone IS NOT NULL
84
🌍 NewbieORDER BY
Sort by Multiple Columns
List columns left to right — SQL sorts by the first, then breaks ties using the next

Show all students sorted by score descending (highest first). When two students have the same score, sort their names alphabetically (A–Z).

Schema
students
idnamegradescore
1Raj1088
2Aisha1095
3Priya1088
4Karan1072
Expected Output
namescore
Aisha95
Priya88
Raj88
Karan72
Solution
SELECT name, score
FROM students
ORDER BY score DESC, name ASC;
85
🌍 NewbieIN
Match a List of Values with IN
IN is a cleaner replacement for writing multiple OR conditions

Show all products that belong to the categories 'Electronics', 'Books', or 'Clothing'. Use IN instead of three separate OR conditions.

Schema
products
idnamecategoryprice
1HeadphonesElectronics799
2RiceFood45
3NovelBooks299
4T-ShirtClothing399
5MixerAppliances1200
Expected Output
namecategoryprice
HeadphonesElectronics799
NovelBooks299
T-ShirtClothing399
Solution
SELECT name, category, price
FROM products
WHERE category IN ('Electronics', 'Books', 'Clothing')
ORDER BY category;
86
🌍 NewbieBETWEEN
Filter a Range with BETWEEN
BETWEEN is inclusive on both ends — equivalent to >= low AND <= high

Show all orders where the amount is between ₹500 and ₹1000 (both ends included). Works for numbers, dates, and text ranges.

Schema
orders
idcustomeramountorder_date
1Arjun4502024-01-05
2Priya7502024-01-08
3Raj12002024-01-12
4Simran5002024-01-15
5Karan9992024-01-20
Expected Output
customeramount
Priya750
Simran500
Karan999
Solution
SELECT customer, amount
FROM orders
WHERE amount BETWEEN 500 AND 1000
ORDER BY amount;
87
🌍 NewbieMIN / MAX
Find the Highest and Lowest Value
MIN and MAX work on numbers, dates, and text — they return a single value

Find the highest and lowest salary in the company with a single query. Also find the most recent joining date using MAX on a date column.

Schema
employees
idnamesalaryjoin_date
1Arjun750002021-03-15
2Priya620002022-07-01
3Raj900002020-11-20
4Simran480002023-01-10
Expected Output
highest_salarylowest_salarylatest_join
90000480002023-01-10
Solution
SELECT
  MAX(salary)    AS highest_salary,
  MIN(salary)    AS lowest_salary,
  MAX(join_date) AS latest_join
FROM employees;
88
🌍 NewbieSUM / AVG
Total and Average with SUM and AVG
SUM adds all values; AVG divides sum by count — both skip NULLs automatically

Calculate the total revenue and average order value from the sales table in a single query. Round the average to 2 decimal places.

Schema
sales
idproductamountregion
1Laptop45000North
2Phone18000South
3Tablet22000North
4Earbuds3500West
Expected Output
total_revenueavg_order
8850022125.00
Solution
SELECT
  SUM(amount)            AS total_revenue,
  ROUND(AVG(amount), 2) AS avg_order
FROM sales;
89
🌍 NewbieHAVING
Filter Groups with HAVING
HAVING is WHERE for groups — it runs after GROUP BY so it can see aggregate results

Show only customers whose total order value exceeds ₹1,000. You need GROUP BY to group per customer, then HAVING to filter on the total.

Schema
orders
idcustomeramount
1Arjun400
2Priya750
3Arjun800
4Karan200
5Priya500
Expected Output
customertotal
Arjun1200
Priya1250
Solution
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
HAVING SUM(amount) > 1000
ORDER BY total DESC;
90
🌍 NewbieAliases
Rename Columns and Tables with AS
AS gives a column or table a temporary name — it only exists in that query's output

Display each employee's full name (first + last concatenated) as full_name, and rename salary to monthly_pay. Use AS to create column aliases.

Schema
employees
idfirst_namelast_namesalary
1ArjunSharma75000
2PriyaMehta62000
3RajPatel90000
Expected Output
full_namemonthly_pay
Arjun Sharma75000
Priya Mehta62000
Raj Patel90000
Solution
SELECT
  first_name || ' ' || last_name AS full_name,
  salary AS monthly_pay
FROM employees;
91
🌍 NewbieLEFT JOIN
LEFT JOIN — Find Customers with No Orders
LEFT JOIN + WHERE right.id IS NULL = the classic "find unmatched rows" pattern

Find all customers who have never placed an order. INNER JOIN would drop them — LEFT JOIN keeps them with NULLs, then filter for those NULLs.

Schema
customers
idname
1Arjun
2Priya
3Raj
4Simran
orders
order_idcustomer_id
1011
1023
Expected Output
name
Priya
Simran
Solution
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.order_id IS NULL
ORDER BY c.name;
92
🌍 NewbieUNION
Stack Results from Two Queries with UNION
UNION removes duplicates; UNION ALL keeps every row — both need matching column count and types

Build a combined list of cities from both the customers and suppliers tables — no duplicates. Then try UNION ALL to see duplicates included.

Schema
customers
idnamecity
1ArjunDelhi
2PriyaMumbai
suppliers
idnamecity
1Tata Co.Mumbai
2ReliancePune
Expected Output (UNION)
city
Delhi
Mumbai
Pune
Solution
SELECT city FROM customers
UNION
SELECT city FROM suppliers
ORDER BY city;

-- Try UNION ALL to include the duplicate 'Mumbai':
-- SELECT city FROM customers UNION ALL SELECT city FROM suppliers
93
🌍 Newbie
Filter a City Table — Multiple WHERE Conditions
Filter rows using multiple AND conditions across different columns

Query all columns for every city in Maharashtra (state_code = 'MH') with a population greater than 500,000.

Schema
city
idnamestate_codedistrictpopulation
1MumbaiMHMumbai City12478447
2PuneMHPune3124458
3JaipurRJJaipur3046163
4NashikMHNashik1486053
5AurangabadMHAurangabad373311
Expected Output
idnamestate_codedistrictpopulation
1MumbaiMHMumbai City12478447
2PuneMHPune3124458
4NashikMHNashik1486053
Solution
SELECT *
FROM city
WHERE state_code = 'MH'
  AND population > 500000;
94
🌍 Newbie
Names Only — SELECT One Column + ORDER BY
Select one column and sort alphabetically with ORDER BY

List the names of all cities in Rajasthan (state_code = 'RJ'), sorted alphabetically.

Schema
city (same as Q93)
idnamestate_codepopulation
3JaipurRJ3046163
6UdaipurRJ451100
7AjmerRJ542321
Expected Output
name
Ajmer
Jaipur
Udaipur
Solution
SELECT name
FROM city
WHERE state_code = 'RJ'
ORDER BY name ASC;
95
🌍 Newbie
Even ID Cities — DISTINCT + Modulo Filter
Use modulo to filter by even/odd IDs and DISTINCT to remove duplicates

List distinct city names where the city's ID is an even number. Use the modulo operator % (or MOD(id, 2) = 0) to check for even numbers.

Schema
city
idnamestate_code
1MumbaiMH
2PuneMH
2PuneMH
4NashikMH
5JaipurRJ
Expected Output
name
Nashik
Pune
Solution
SELECT DISTINCT name
FROM city
WHERE id % 2 = 0
ORDER BY name;
96
🌍 Newbie
Count Duplicates — COUNT vs COUNT DISTINCT
Detect duplicates by comparing COUNT(*) vs COUNT(DISTINCT col)

Find how many duplicate city names exist in the table. Subtract the number of distinct names from the total count — the difference equals the duplicate count.

Schema
city
idname
1Delhi
2Mumbai
3Delhi
4Pune
5Mumbai
Expected Output
duplicate_count
2
Solution
SELECT
  COUNT(name) - COUNT(DISTINCT name) AS duplicate_count
FROM city;
97
🌍 Newbie
Big Countries — OR in WHERE
A country is "big" if its area is huge OR its population is huge — either condition qualifies

A country is classified as "big" if it has an area of at least 3,000,000 km² OR a population of at least 25,000,000. Report the name, population, and area for all big countries.

Schema
world
namecontinentareapopulation
IndiaAsia32872631380000000
MaldivesAsia298540000
RussiaEurope17098242144000000
NepalAsia14718129136808
Expected Output
namepopulationarea
India13800000003287263
Russia14400000017098242
Nepal29136808147181
Solution
SELECT name, population, area
FROM world
WHERE area >= 3000000
   OR population >= 25000000;
98
🌍 Newbie
Find Customers Not Referred by a Specific Person
NULL != 2 is not the same as IS NULL — you need both conditions with OR

Find the names of customers who were not referred by customer with id = 2. Include customers whose referee_id is NULL (they were referred by no one).

Schema
customer
idnamereferee_id
1ArjunNULL
2PriyaNULL
3Raj1
4Simran2
5Karan2
6Neha3
Expected Output
name
Arjun
Priya
Raj
Neha
Solution
SELECT name
FROM customer
WHERE referee_id != 2
   OR referee_id IS NULL
ORDER BY name;
99
🌍 Newbie
Customers Who Never Ordered
LEFT JOIN keeps all customers; filter WHERE order IS NULL to find those with no orders

Find all customers who have never placed an order. Use LEFT JOIN to keep all customers, then filter for rows where the order side is NULL.

Schema
Customers
idname
1Arjun
2Priya
3Raj
4Simran
Orders
idcustomerId
11
23
Expected Output
Customers
Priya
Simran
Solution
SELECT c.name AS Customers
FROM Customers c
LEFT JOIN Orders o ON c.id = o.customerId
WHERE o.id IS NULL
ORDER BY c.name;
100
🌍 Newbie
Find Duplicate Emails
GROUP BY email, then HAVING COUNT > 1 keeps only the duplicated ones

Find all email addresses that appear more than once in the Person table. Use GROUP BY to group by email, then HAVING to filter groups with more than 1 row.

Schema
Person
idemail
1a@example.com
2b@example.com
3a@example.com
Expected Output
Email
a@example.com
Solution
SELECT email AS Email
FROM Person
GROUP BY email
HAVING COUNT(*) > 1;
101
🌍 Newbie
Classes with 5 or More Students
GROUP BY class, then HAVING COUNT(student_id) >= 5

Find all classes that have at least 5 students enrolled. Each row in the table represents one student enrolled in one class.

Schema
Courses
studentclass
ArjunMaths
PriyaMaths
RajBiology
SimranMaths
KaranMaths
NehaBiology
AishaMaths
Expected Output
class
Maths
Solution
SELECT class
FROM Courses
GROUP BY class
HAVING COUNT(student) >= 5;
102
🌍 Newbie
Not Boring Movies — Odd ID + WHERE NOT LIKE
Filter for odd IDs using id % 2 = 1, then exclude boring descriptions

Show all movies with an odd ID and a description that is not 'boring', ordered by rating descending.

Schema
Cinema
idmoviedescriptionrating
1Wargreat 3D8.9
2Sciencefiction8.5
3irishboring6.2
4Ice songFantacy8.6
5House cardInteresting9.1
Expected Output
idmoviedescriptionrating
5House cardInteresting9.1
1Wargreat 3D8.9
Solution
SELECT *
FROM Cinema
WHERE id % 2 = 1
  AND description != 'boring'
ORDER BY rating DESC;
103
🌍 Newbie
Authors Who Viewed Their Own Articles
Compare two columns in the same row — no JOIN needed when both columns are in the same table

Find all authors who viewed at least one of their own articles (i.e. author_id = viewer_id). Return distinct author IDs sorted ascending.

Schema
Views
article_idauthor_idviewer_idview_date
1352019-08-01
1362019-08-02
2772019-08-01
2762019-08-02
4712019-07-22
3442019-07-21
Expected Output
id
4
7
Solution
SELECT DISTINCT author_id AS id
FROM Views
WHERE author_id = viewer_id
ORDER BY id;
104
🌍 Newbie▰ MCQHAVING
WHERE vs HAVING — which one filters after grouping?
One clause runs before GROUP BY, the other after — pick the right one

You want to show only departments where the total salary is above ₹50,000. Your query already has GROUP BY department. Which clause do you add?

105
🌍 Newbie▰ MCQNULL
How do you correctly check for NULL values?
NULL is the absence of a value — regular comparison operators don't work on it

A phone column has some rows with no value stored. Which query correctly finds those rows?

106
🌍 Newbie▰ MCQDISTINCT
What does SELECT DISTINCT return?
DISTINCT removes something from the result — what exactly?

The orders table has 1,000 rows but only 40 unique cities. What does SELECT DISTINCT city FROM orders return?

107
🌍 Newbie▰ MCQClause Order
In what order does SQL actually execute clauses?
The order you write SQL and the order SQL runs it are different

You write a query with SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY. In which order does SQL actually execute these clauses?
Hint: this is why you can't use a SELECT alias inside WHERE.

108
🌍 Newbie▰ MCQLEFT JOIN
What does a LEFT JOIN return when there's no match?
LEFT JOIN always keeps all rows from the left table — but what about the right side?

You LEFT JOIN customers (5 rows) with orders (3 rows). Two customers have no orders at all. How many rows appear in the result?

109
🌍 Newbie▰ MCQCOUNT
COUNT(*) vs COUNT(column) — do they always give the same result?
One of them ignores NULLs; the other doesn't

A status column has 10 rows, but 3 of them are NULL. What does COUNT(status) return?

110
🌍 Newbie▰ MCQLIKE
LIKE wildcards — % vs _ — which matches more?
One is greedy (matches any length), one is strict (matches exactly one)

Which pattern matches any name that starts with 'A' and ends with 'n', with any number of characters in between (like 'Arjun', 'Adrian', or even just 'An')?

111
🌍 Newbie▰ MCQORDER BY
ORDER BY without ASC or DESC — what's the default?
SQL has a default sort direction when you don't specify

You write SELECT name FROM employees ORDER BY salary without specifying ASC or DESC. In what order are results returned?

112
🌍 Newbie▰ MCQUNION
UNION vs UNION ALL — which removes duplicates?
One stacks rows as-is; the other runs a de-duplication step

Table A has 3 rows, Table B has 3 rows. One row exists in both tables. UNION gives ___ rows; UNION ALL gives ___ rows.

113
🌍 Newbie▰ MCQPRIMARY KEY
What rules does a PRIMARY KEY enforce?
PRIMARY KEY is actually a combination of two other constraints

Table students has id as PRIMARY KEY. Row with id=1 already exists. You try:
INSERT INTO students VALUES(1, 'Riya') — what happens? And which general rule does that reveal about PRIMARY KEY?

114
🌍 Newbie▰ MCQINNER JOIN
INNER JOIN vs LEFT JOIN — which drops unmatched rows?
One join type is strict (both must match); the other is lenient (keep left regardless)

You have 5 customers and 3 orders. Two customers have never ordered. You INNER JOIN customers to orders. How many rows appear in the result?

115
🌍 Newbie▰ MCQBETWEEN
Is BETWEEN inclusive or exclusive at the boundaries?
Does WHERE age BETWEEN 18 AND 25 include 18 and 25 themselves?

A student table has ages 17, 18, 20, 25, 26. Query: WHERE age BETWEEN 18 AND 25. Which ages appear in the result?

116
🌍 Newbie▰ MCQGROUP BY
GROUP BY rule — which SELECT causes an error?
Every column in SELECT must either be in GROUP BY or inside an aggregate function

Table employees before grouping:

idnamedepartmentsalary
1ArjunHR40000
2PriyaHR55000
3RaviIT70000

After GROUP BY department, HR collapses to 1 row. SQL can't decide which name (Arjun or Priya?) to show. Which SELECT line causes an error?

117
🌍 Newbie▰ MCQDELETE vs DROP
DELETE FROM vs DROP TABLE — what's the difference?
One removes data; the other removes the entire object including its structure

After running DELETE FROM employees, what is left in the database?

118
🌍 Newbie▰ MCQCOALESCE
What does COALESCE return?
COALESCE scans its arguments left to right and returns the first one that isn't NULL

A row has phone = NULL and email = 'p@co.in'. What does COALESCE(phone, email, 'No contact') return?

119
🌍 Newbie▰ MCQCASE WHEN
CASE WHEN — which branch runs first?
CASE evaluates conditions top to bottom and stops at the first match

A student scores 85. What does this return?
CASE WHEN score >= 90 THEN 'A' WHEN score >= 80 THEN 'B' ELSE 'C' END

120
🌍 Newbie▰ MCQFOREIGN KEY
What does a FOREIGN KEY constraint enforce?
FOREIGN KEY links two tables — it prevents orphan rows

orders.customer_id has a FOREIGN KEY referencing customers.id. What does this prevent?

121
🌍 Newbie▰ MCQINDEX
What does adding an INDEX to a column primarily improve?
Think of an index as a book's index — it helps you find things faster, but takes space

You add CREATE INDEX idx_email ON users(email). What is the primary benefit?

122
🌍 Newbie▰ MCQSELF JOIN
When do you need a SELF JOIN?
A self join is when a table joins to itself — used for hierarchies and same-table comparisons

Table employees — notice manager_id points to another row in the same table:

idnamesalarymanager_id
1Sunita90000NULL
2Arjun600001
3Priya950001

Priya (₹95k) earns more than her manager Sunita (₹90k). To find such employees, you use:

123
🌍 Newbie▰ MCQLIMIT
How do you get only the top 5 results in SQLite?
LIMIT restricts how many rows come back — it works together with ORDER BY

You want the 3 most expensive products from the products table. Which query is correct in SQLite?

124
🌍 Newbie▰ MCQSubquery
What does a scalar subquery return?
A subquery in WHERE produces a value — what type of value?

Query: SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees). What does the inner SELECT AVG(salary) return?

125
🌍 Newbie▰ MCQEXISTS
What does WHERE EXISTS check?
EXISTS doesn't care about values — it only asks one question

WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.id) — what does this check for each customer row?

01
BeginnerSELECT
Find Duplicate Emails
Find all emails that appear more than once in the Person table

Write a SQL query to find all duplicate email addresses in the Person table. An email is a duplicate if it appears more than once.

Schema
Person
idemail
1a@b.com
2c@d.com
3a@b.com
Solution
SELECT email
FROM Person
GROUP BY email
HAVING COUNT(email) > 1;
02
BeginnerJOINs
Customers Who Never Ordered
Find customers with no matching order record

Find all customers who have never placed an order. A classic LEFT JOIN + NULL check pattern asked in almost every SQL interview.

Schema
Customers
idname
1Alice
2Bob
3Carol
Orders
idcustomerId
11
21
Solution
SELECT c.name AS Customers
FROM Customers c
LEFT JOIN Orders o ON c.id = o.customerId
WHERE o.id IS NULL;
03
MicrosoftBeginner
Employees Earning More Than Their Manager
Self-join to compare employee salary with their manager's

Write a query to find employees whose salary is higher than their direct manager's salary. Uses a self-join on the Employee table using managerId.

Schema
Employee
idnamesalarymanagerId
1Joe700003
2Henry800004
3Sam60000NULL
4Max90000NULL
Solution
SELECT e.name AS Employee
FROM Employee e
JOIN Employee m ON e.managerId = m.id
WHERE e.salary > m.salary;
04
AmazonBeginnerAggregates
Average Product Rating Per Category
GROUP BY + AVG with ROUND — DataLemur classic

Calculate the average star rating per product category, rounded to 2 decimal places, ordered by category name. A common warm-up question in Amazon data analyst rounds.

Schema
reviews
review_idproduct_idstars
11015
21023
31014
products
product_idcategory
101Electronics
102Books
Solution
SELECT p.category,
  ROUND(AVG(r.stars), 2) AS avg_rating
FROM reviews r
JOIN products p ON r.product_id = p.product_id
GROUP BY p.category
ORDER BY p.category;
05
GoogleBeginner
Rising Temperature
Self-join Weather to itself — compare each day's temp to the day before using julianday()

Find all dates where the temperature was higher than the previous day. Use a self-join on the Weather table and julianday() arithmetic to link each row to the row exactly 1 day earlier.

Schema
Weather
idrecordDatetemperature
12023-01-0110
22023-01-0225
32023-01-0320
42023-01-0430
Solution
SELECT w1.id
FROM Weather w1
JOIN Weather w2
  ON julianday(w1.recordDate) - julianday(w2.recordDate) = 1
WHERE w1.temperature > w2.temperature;
06
ClassicBeginnerSELECT
Second Highest Salary
Asked in almost every SQL interview — multiple approaches

Write a query to find the second highest distinct salary. If there is no second highest salary, return NULL. Most asked SQL question globally — know 3 approaches.

Schema
Employee
idsalary
1100
2200
3300
4300
Solution (3 approaches)
-- Approach 1: Subquery (most readable)
SELECT MAX(salary) AS SecondHighestSalary
FROM Employee
WHERE salary < (SELECT MAX(salary) FROM Employee);

-- Approach 2: LIMIT + OFFSET
SELECT DISTINCT salary
FROM Employee
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

-- Approach 3: Window function (preferred in interviews)
SELECT salary AS SecondHighestSalary
FROM (
  SELECT salary,
         DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM Employee
) t
WHERE rnk = 2;
07
FlipkartBeginnerAggregates
Top 5 Customers by Order Count
COUNT + GROUP BY + ORDER BY + LIMIT

Find the top 5 customers by total number of orders placed. Show their name and order count, highest first. Frequently asked in Flipkart/e-commerce data rounds.

Schema
customers
idname
1Alice
2Bob
3Carol
orders
idcustomer_idamount
11500
21300
32700
Solution
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY order_count DESC
LIMIT 5;
08
AmazonIntermediate
Department Top 3 Salaries
DENSE_RANK() OVER (PARTITION BY department)

Find the top 3 unique salaries in each department. This is the most commonly asked window function question. Uses DENSE_RANK so ties are handled correctly.

Schema
Employee
idnamesalarydeptId
1Alice900001
2Bob800001
3Carol700002
Department
idname
1Engineering
2Marketing
Solution
WITH ranked AS (
  SELECT d.name AS Department,
         e.name AS Employee,
         e.salary,
         DENSE_RANK() OVER (
           PARTITION BY e.deptId
           ORDER BY e.salary DESC
         ) AS rnk
  FROM Employee e
  JOIN Department d ON e.deptId = d.id
)
SELECT Department, Employee, salary
FROM ranked
WHERE rnk <= 3;
09
MicrosoftIntermediateSubqueries
Employees with Salary Above Department Average
Correlated subquery or CTE approach

Find employees whose salary is above the average salary of their own department. Classic correlated subquery or CTE — commonly asked at Microsoft, LinkedIn, and Oracle.

Schema
employees
idnamedeptsalary
1AliceEng90000
2BobEng70000
3CarolHR60000
4DaveHR55000
Solution
WITH dept_avg AS (
  SELECT dept, AVG(salary) AS avg_sal
  FROM employees
  GROUP BY dept
)
SELECT e.name, e.dept, e.salary
FROM employees e
JOIN dept_avg d ON e.dept = d.dept
WHERE e.salary > d.avg_sal
ORDER BY e.dept, e.salary DESC;
10
MetaIntermediateDate
Month-over-Month Revenue Growth
LAG() window function for period comparison

Calculate the month-over-month revenue growth percentage. Use LAG() to access the previous month's revenue and compute the percentage change.

Schema
monthly_orders
order_idamountorder_date
110002024-01-15
28002024-01-28
315002024-02-10
49002024-02-22
512002024-03-05
611002024-03-18
Solution
WITH monthly AS (
  SELECT
    strftime('%Y-%m', order_date) AS month,
    SUM(amount) AS revenue
  FROM monthly_orders
  GROUP BY 1
)
SELECT
  month,
  revenue,
  LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
  ROUND(
    (revenue - LAG(revenue) OVER (ORDER BY month))
    * 100.0
    / LAG(revenue) OVER (ORDER BY month), 2
  ) AS growth_pct
FROM monthly;
11
GoogleIntermediateWindow
7-Day Rolling Average of Daily Active Users
AVG() OVER (ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

Calculate a 7-day rolling average of daily active users. Asked extensively at Google, Twitter, and analytics-heavy companies. Tests window frame knowledge.

Schema
user_activity
activity_dateactive_users
2024-01-011200
2024-01-021350
2024-01-03980
Solution
SELECT
  activity_date,
  active_users,
  ROUND(
    AVG(active_users) OVER (
      ORDER BY activity_date
      ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ), 2
  ) AS rolling_7d_avg
FROM user_activity
ORDER BY activity_date;
12
SwiggyIntermediateCTEs
Restaurants with Declining Orders Month-over-Month
LAG + filter to find consecutive decline — Swiggy SQL round

Find all restaurants where the order count this month is lower than the previous month. Directly from Swiggy data analyst interview rounds.

Schema
restaurant_orders
restaurant_idmonthorder_count
R12024-01500
R12024-02420
R22024-01300
R22024-02350
Solution
WITH monthly AS (
  SELECT
    restaurant_id, month, order_count,
    LAG(order_count) OVER (
      PARTITION BY restaurant_id
      ORDER BY month
    ) AS prev_count
  FROM restaurant_orders
)
SELECT restaurant_id, month, order_count, prev_count
FROM monthly
WHERE order_count < prev_count;
13
FlipkartIntermediateSubqueries
Customers Who Ordered in Both 2023 and 2024
INTERSECT or double EXISTS/IN — Flipkart retention analysis

Find customers who placed at least one order in both 2023 and 2024 — a classic cohort retention question asked in Flipkart and e-commerce data interviews.

Schema
orders
order_idcustomer_idorder_date
1C12023-06-10
2C12024-02-15
3C22024-03-01
Solution
SELECT customer_id
FROM orders
WHERE YEAR(order_date) = 2023

INTERSECT

SELECT customer_id
FROM orders
WHERE YEAR(order_date) = 2024;
14
ClassicIntermediate
Rank Scores Without Gaps (DENSE_RANK)
RANK vs DENSE_RANK — a must-know difference

Rank all scores from highest to lowest. If two scores are tied, they should have the same rank. The next rank after a tie should NOT skip numbers — this is what DENSE_RANK does.

Schema
Scores
idscore
13.50
23.65
34.00
43.65
Solution
SELECT
  score,
  DENSE_RANK() OVER (ORDER BY score DESC) AS rank
FROM Scores
ORDER BY score DESC;
15
MicrosoftIntermediateWindow
Running Total of Sales by Date
SUM() OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)

Calculate a cumulative/running total of daily sales ordered by date. The most commonly tested window frame concept at Microsoft and Oracle interviews.

Schema
sales
sale_dateamount
2024-01-011000
2024-01-021500
2024-01-03800
Solution
SELECT
  sale_date,
  amount,
  SUM(amount) OVER (
    ORDER BY sale_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM sales;
16
AmazonExpert
Median Employee Salary per Department
ROW_NUMBER + COUNT — no PERCENTILE_CONT in MySQL

Find the median salary of employees in each department. There's no built-in MEDIAN in MySQL — requires row numbering + even/odd count logic. Frequently asked at Amazon and top analytics roles.

Schema
Employee
idcompanysalary
1A2341
2A341
3A15000
4B15000
Solution
WITH ranked AS (
  SELECT id, company, salary,
    ROW_NUMBER() OVER (PARTITION BY company ORDER BY salary) AS rn,
    COUNT(*) OVER (PARTITION BY company) AS cnt
  FROM Employee
)
SELECT id, company, salary
FROM ranked
WHERE rn IN (
  FLOOR((cnt + 1) / 2),
  CEIL((cnt + 1) / 2)
);
17
GoogleExpert
Human Traffic of Stadium — 3+ Consecutive Days
Row numbering gap trick to find consecutive sequences

Find all rows where 3 or more consecutive days had stadium traffic ≥ 100 people. The key insight: id - ROW_NUMBER() is constant for consecutive rows — the famous "gap-and-island" trick.

Schema
Stadium
idvisit_datepeople
12024-01-0110
22024-01-02109
32024-01-03150
42024-01-0499
52024-01-05145
62024-01-06200
72024-01-07120
Solution (Gap-and-Island)
WITH high_traffic AS (
  SELECT id, visit_date, people,
    id - ROW_NUMBER() OVER (ORDER BY id) AS grp
  FROM Stadium
  WHERE people >= 100
),
groups AS (
  SELECT grp
  FROM high_traffic
  GROUP BY grp
  HAVING COUNT(*) >= 3
)
SELECT h.id, h.visit_date, h.people
FROM high_traffic h
JOIN groups g ON h.grp = g.grp
ORDER BY h.visit_date;
18
ClassicExpert
Trips and Users — Cancellation Rate
Filter unbanned users + conditional aggregation

Calculate the cancellation rate of requests made by unbanned users between two dates. This multi-join + conditional aggregation problem appears in Uber, Lyft, and analytics interviews.

Schema
Trips
idclient_idstatusrequest_at
11completed2024-10-01
22cancelled_by_driver2024-10-01
Users
users_idbannedrole
1Noclient
2Yesclient
Solution
SELECT
  t.request_at AS Day,
  ROUND(
    SUM(CASE WHEN t.status != 'completed' THEN 1 ELSE 0 END)
    * 1.0 / COUNT(*), 2
  ) AS 'Cancellation Rate'
FROM Trips t
JOIN Users u ON t.client_id = u.users_id
  AND u.banned = 'No'
WHERE t.request_at BETWEEN '2024-10-01' AND '2024-10-03'
GROUP BY t.request_at;
19
AmazonExpertWindow
Managers with 5+ Direct Reports
Self-join + HAVING COUNT

Find the names of managers who have at least 5 direct reports. A self-join aggregation problem asked heavily at Amazon, Google, and LinkedIn.

Schema
Employee
idnamemanagerId
101AliceNULL
102Bob101
103Carol101
104Dave101
105Eve101
106Frank101
Solution
SELECT m.name
FROM Employee e
JOIN Employee m ON e.managerId = m.id
GROUP BY m.id, m.name
HAVING COUNT(e.id) >= 5;
20
MetaExpertCTEs
Consecutive Login Streak per User
DATE - ROW_NUMBER() gap trick for consecutive dates

Find each user's longest consecutive daily login streak. This uses the classic gap-and-island technique: subtracting ROW_NUMBER from the date to group consecutive days. Asked at Meta and Google.

Schema
logins
user_idlogin_date
12024-01-01
12024-01-02
12024-01-03
12024-01-05
22024-01-01
22024-01-02
Solution
WITH gaps AS (
  SELECT user_id, login_date,
    date(login_date, '-' || ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY login_date
    ) || ' days') AS grp
  FROM logins
),
streaks AS (
  SELECT user_id, grp, COUNT(*) AS streak_len
  FROM gaps
  GROUP BY user_id, grp
)
SELECT user_id, MAX(streak_len) AS longest_streak
FROM streaks
GROUP BY user_id;
41
⚡ Interview BoostIntermediateAggregates
Zomato Sales Rep Contest — Build the Leaderboard
Each deal row records two sales reps who competed and who closed it — derive the full standings table

Zomato's sales team runs a head-to-head deal contest each quarter. Two reps compete per restaurant deal, and only one wins it. Given the raw contest results, build the full leaderboard showing each rep's deals played, won, lost, and points (2 per win). The catch: a rep can appear as either rep_1 or rep_2 — both columns must be counted.

Schema
deal_contest
rep_1rep_2winner
RahulPriyaRahul
PriyaAnkitAnkit
SnehaVikramSneha
VikramRahulRahul
AnkitSnehaAnkit
Expected Output
repplayedwonlostpoints
Rahul2204
Ankit2204
Sneha2112
Priya2020
Vikram2020
Solution
-- UNION ALL gives each rep both sides of every deal
WITH all_deals AS (
  SELECT rep_1 AS rep,
         CASE WHEN winner = rep_1 THEN 1 ELSE 0 END AS win
  FROM deal_contest
  UNION ALL
  SELECT rep_2 AS rep,
         CASE WHEN winner = rep_2 THEN 1 ELSE 0 END AS win
  FROM deal_contest
)
SELECT
  rep,
  COUNT(*)              AS played,
  SUM(win)              AS won,
  COUNT(*) - SUM(win)  AS lost,
  SUM(win) * 2          AS points
FROM all_deals
GROUP BY rep
ORDER BY points DESC, won DESC;
21
AmazonBeginnerAggregates
Count Orders by Status
Return the number of orders for each status, sorted from most to least frequent

Group the orders table by status and count how many orders fall into each group. Sort the results so the most common status appears first.

Schema
orders
order_idcustomer_idstatusorder_date
1101delivered2026-01-05
2102shipped2026-01-06
3103cancelled2026-01-07
4104delivered2026-01-08
Solution
SELECT status, COUNT(*) AS order_count
FROM orders
GROUP BY status
ORDER BY order_count DESC;
22
MicrosoftBeginnerSELECT
Top 5 Highest-Paid Employees
Retrieve the top 5 employees by salary; use name alphabetically as a tiebreaker

Return the 5 employees with the highest salaries. If two employees share the same salary, order them alphabetically by name.

Schema
employees
emp_idnamedepartmentsalary
1AliceEngineering120000
2BobMarketing85000
3CarolEngineering150000
4EveEngineering150000
Solution
SELECT emp_id, name, department, salary
FROM employees
ORDER BY salary DESC, name ASC
LIMIT 5;
23
ClassicBeginnerAggregates
Departments With Average Salary Above 90K
Find departments where average salary exceeds $90,000 — a classic HAVING trap

Find all departments where the average salary exceeds $90,000. Return department name and average salary rounded to 2 decimal places. Many candidates fail this by using WHERE instead of HAVING.

Schema
employees
emp_idnamedepartmentsalary
1AliceEngineering120000
2BobMarketing85000
3CarolEngineering150000
Solution
SELECT department,
       ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 90000
ORDER BY avg_salary DESC;
24
MetaBeginnerJOINs
Products With No Sales in 2025
Identify products that had zero sales — classic LEFT JOIN anti-join pattern

Find products that had zero sales transactions in 2025. The trick: put the year filter in the ON clause, not WHERE — otherwise you accidentally convert the LEFT JOIN to an INNER JOIN.

Schema
products
product_idproduct_namecategory
1LaptopElectronics
2HeadphonesElectronics
3Desk ChairFurniture
sales
sale_idproduct_idsale_daterevenue
112025-03-102400
222025-06-15750
312024-11-201200
Solution
SELECT p.product_id, p.product_name
FROM products p
LEFT JOIN sales s
       ON p.product_id = s.product_id
      AND strftime('%Y', s.sale_date) = '2025'
WHERE s.sale_id IS NULL;
25
ClassicBeginnerAggregates
Monthly Revenue Summary
Calculate total revenue per month for 2025 using date functions

Calculate the total revenue per month for 2025. Use strftime to extract the month number and sort results chronologically.

Schema
sales
sale_idproduct_idsale_daterevenue
112025-01-05900
222025-01-181200
332025-02-20600
442025-04-1048
Solution
SELECT strftime('%m', sale_date) AS month_num,
       ROUND(SUM(revenue), 2)     AS total_revenue
FROM sales
WHERE strftime('%Y', sale_date) = '2025'
GROUP BY month_num
ORDER BY month_num;
26
ClassicIntermediate
Employees Earning More Than Their Manager
Self-join on the same table to compare employee salary vs manager salary

Find all employees who earn a higher salary than their direct manager. The key insight: join the employees table to itself using the manager_id foreign key.

Schema
employees
emp_idnamemanager_idsalary
1CEONULL200000
2Alice1120000
3Carol2130000
4Dave280000
Solution
SELECT e.name AS employee_name,
       e.salary AS employee_salary,
       m.name   AS manager_name,
       m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.emp_id
WHERE e.salary > m.salary;
27
AmazonIntermediateSubqueries
Second Highest Salary Per Department
Find the second-highest salary in each department using a self-join (no window functions)

Find the second-highest salary within each department without using window functions. Use a self-join: pair each employee with every higher-paid colleague in the same dept, then HAVING COUNT = 1 means exactly one salary is higher.

Schema
employees
emp_idnamedepartmentsalary
1AliceEngineering120000
2CarolEngineering150000
3GraceMarketing95000
4BobMarketing85000
Solution
SELECT e1.department, e1.name, e1.salary AS second_highest
FROM employees e1
JOIN employees e2
     ON  e1.department = e2.department
    AND e1.salary < e2.salary
GROUP BY e1.department, e1.name, e1.salary
HAVING COUNT(DISTINCT e2.salary) = 1;
28
GoogleIntermediateAggregates
Pivot Monthly Sales by Category
Rotate rows into columns using CASE WHEN inside SUM — no PIVOT keyword needed

Pivot monthly revenue so each row is a month and columns show revenue for Electronics, Furniture, and Stationery. Use SUM(CASE WHEN) — the portable approach that works in all SQL dialects.

Schema
products
product_idproduct_namecategory
1LaptopElectronics
3ChairFurniture
sales
sale_idproduct_idsale_daterevenue
112025-01-052400
232025-01-18900
Solution
SELECT strftime('%Y-%m', s.sale_date) AS month,
       SUM(CASE WHEN p.category = 'Electronics' THEN s.revenue ELSE 0 END) AS electronics,
       SUM(CASE WHEN p.category = 'Furniture'   THEN s.revenue ELSE 0 END) AS furniture,
       SUM(CASE WHEN p.category = 'Stationery'  THEN s.revenue ELSE 0 END) AS stationery
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY month
ORDER BY month;
29
MetaIntermediateAggregates
Sending vs Opening Snaps Ratio
Calculate send-to-open time ratio per age group using CASE WHEN inside SUM

For each age group, calculate the ratio of time spent sending snaps vs. opening snaps. Use NULLIF in the denominator to prevent division-by-zero errors. Real Snapchat/Meta question from DataLemur.

Schema
activities
user_idactivity_typetime_spentage_bucket
1send3.521-25
1open1.521-25
2send2.026-30
2open5.026-30
Solution
SELECT age_bucket,
       ROUND(
         SUM(CASE WHEN activity_type = 'send' THEN time_spent ELSE 0 END) /
         NULLIF(SUM(CASE WHEN activity_type = 'open' THEN time_spent ELSE 0 END), 0),
       2) AS send_to_open_ratio
FROM activities
GROUP BY age_bucket
ORDER BY age_bucket;
30
MicrosoftIntermediateWindow
3-Day Rolling Average Revenue
Use ROWS BETWEEN 2 PRECEDING AND CURRENT ROW to build a 3-day rolling window

Calculate a 3-day rolling average of daily revenue. The frame clause ROWS BETWEEN 2 PRECEDING AND CURRENT ROW restricts the window to exactly 3 rows. Using RANGE instead of ROWS would include tied dates — a subtle but important difference.

Schema
daily_revenue
rev_daterevenue
2025-01-011000
2025-01-021200
2025-01-03900
2025-01-041500
2025-01-051100
Solution
SELECT rev_date, revenue,
       ROUND(AVG(revenue) OVER (
         ORDER BY rev_date
         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ), 2) AS rolling_3day_avg
FROM daily_revenue
ORDER BY rev_date;
31
MetaIntermediateAggregates
Active User Retention Month-Over-Month
Self-join with a month offset to count users active in both current and previous month

Find how many users active in a given month were also active the month before. A self-join links each month to the previous one via a date offset. Real Facebook/Meta DataLemur question.

Schema
user_activity
user_idactivity_month
12025-01-01
12025-02-01
22025-01-01
22025-02-01
32025-02-01
Solution
SELECT curr.activity_month,
       COUNT(DISTINCT curr.user_id) AS retained_users
FROM user_activity curr
JOIN user_activity prev
     ON  curr.user_id = prev.user_id
    AND  curr.activity_month = date(prev.activity_month, '+1 month')
GROUP BY curr.activity_month
ORDER BY curr.activity_month;
32
AmazonIntermediateWindow
Users' Third Transaction
ROW_NUMBER OVER PARTITION BY user_id — filter where rn = 3 for the third event

For each user, retrieve their third transaction chronologically. Use ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY txn_date) then filter where rn = 3. Classic DataLemur medium question.

Schema
transactions
txn_iduser_idamounttxn_date
110150.002025-01-01
210175.002025-01-15
3101120.002025-02-01
410290.002025-03-01
Solution
WITH ranked AS (
  SELECT *, ROW_NUMBER() OVER (
    PARTITION BY user_id ORDER BY txn_date
  ) AS rn
  FROM transactions
)
SELECT txn_id, user_id, amount, txn_date
FROM ranked
WHERE rn = 3;
33
AmazonExpertWindow
Nth Highest Salary — Generalized
DENSE_RANK handles ties correctly; LIMIT/OFFSET silently skips tied values

Find the Nth highest distinct salary. DENSE_RANK() is the correct approach — if two people share the top salary, the next distinct value ranks 2nd. Using LIMIT/OFFSET would silently skip ties and return a wrong answer.

Schema
employees
emp_idnamesalary
1Alice150000
2Bob150000
3Carol120000
4Dave95000
Solution
WITH ranked_salaries AS (
  SELECT DISTINCT salary,
         DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employees
)
SELECT salary AS nth_highest_salary
FROM ranked_salaries
WHERE rnk = 2;  -- change 2 to any N
34
AmazonExpertWindow
Year-Over-Year Revenue Growth Rate
LAG() shifts prior year's value into the current row — divide for growth %

Calculate year-over-year revenue growth per category. LAG(revenue) OVER (PARTITION BY category ORDER BY year) fetches last year's value. Use NULLIF to guard against division by zero.

Schema
yearly_sales
sale_yearcategoryrevenue
2023Electronics500000
2024Electronics650000
2025Electronics780000
Solution
WITH yoy AS (
  SELECT sale_year, category, revenue,
         LAG(revenue) OVER (
           PARTITION BY category ORDER BY sale_year
         ) AS prev_revenue
  FROM yearly_sales
)
SELECT sale_year, category, revenue,
       ROUND(
         100.0 * (revenue - prev_revenue)
               / NULLIF(prev_revenue, 0), 1
       ) AS yoy_growth_pct
FROM yoy
WHERE prev_revenue IS NOT NULL
ORDER BY category, sale_year;
35
ClassicExpertWindow
Gaps and Islands: Active Subscription Periods
Subtract row_number as a day interval — consecutive dates produce the same group key

Find each user's continuous active subscription periods (start date, end date, duration). The gaps-and-islands trick: subtracting the row number (as days) from each consecutive active date produces the same constant — revealing each contiguous island.

Schema
subscription_status
user_idstatus_dateis_active
12025-01-011
12025-01-021
12025-01-031
12025-01-040
12025-01-051
Solution
WITH active_days AS (
  SELECT user_id, status_date,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY status_date) AS rn
  FROM subscription_status
  WHERE is_active = 1
),
islands AS (
  SELECT user_id, status_date,
         date(status_date, '-' || rn || ' days') AS grp
  FROM active_days
)
SELECT user_id,
       MIN(status_date) AS period_start,
       MAX(status_date) AS period_end,
       COUNT(*) AS duration_days
FROM islands
GROUP BY user_id, grp
ORDER BY user_id, period_start;
36
MicrosoftExpertSubqueries
Recursive CTE: Full Management Hierarchy
Anchor on root nodes (no manager), recursively join children, build path string

Traverse the full org chart from CEO to every employee using a recursive CTE. The anchor selects root nodes (manager_id IS NULL). The recursive part joins each employee to their parent's CTE row, building the hierarchy level and path.

Schema
employees
emp_idnamemanager_id
1CEONULL
2Alice1
3Bob1
4Carol2
Solution
WITH RECURSIVE org AS (
  SELECT emp_id, name, manager_id,
         1 AS lvl, name AS path
  FROM employees
  WHERE manager_id IS NULL
  UNION ALL
  SELECT e.emp_id, e.name, e.manager_id,
         o.lvl + 1,
         o.path || ' > ' || e.name
  FROM employees e
  JOIN org o ON e.manager_id = o.emp_id
)
SELECT emp_id, name, lvl, path
FROM org
ORDER BY lvl, name;
37
GoogleExpertWindow
Sessionization: Group Events by 30-Min Inactivity
LAG detects gaps > 30 min; cumulative SUM of flags assigns incrementing session IDs

Group user clickstream events into sessions where any gap exceeding 30 minutes starts a new session. Two-CTE pattern: LAG() detects gaps, then SUM(new_session_flag) as a running count assigns the session ID. Real Google/Meta data engineering interview question.

Schema
clickstream
event_iduser_idevent_time
112025-01-01 09:00
212025-01-01 09:15
312025-01-01 10:25
412025-01-01 11:10
Solution
WITH gaps AS (
  SELECT event_id, user_id, event_time,
    CASE
      WHEN (julianday(event_time) -
            julianday(LAG(event_time) OVER (
              PARTITION BY user_id ORDER BY event_time
            ))) * 1440 > 30
      OR  LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
      THEN 1 ELSE 0
    END AS new_session
  FROM clickstream
),
sessions AS (
  SELECT *,
    SUM(new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
  FROM gaps
)
SELECT user_id, session_id,
       MIN(event_time) AS session_start,
       MAX(event_time) AS session_end
FROM sessions
GROUP BY user_id, session_id
ORDER BY user_id, session_id;
38
ClassicExpertWindow
Deduplication: Keep Latest Record Per Customer
ROW_NUMBER ORDER BY updated_at DESC — rn=1 is always the freshest row

A customer table has duplicates from system retries. Keep only the most recent record per customer. ROW_NUMBER() PARTITION BY customer_id ORDER BY updated_at DESC assigns rank 1 to the freshest row. Note: DENSE_RANK would fail here — tied timestamps would both get rank 1.

Schema
customer_records
record_idcustomer_idnameupdated_at
11001Alice2025-01-01 10:00
21001Alice2025-03-15 14:30
31002Bob2025-02-01 09:00
41002Bob2025-02-01 09:00
Solution
WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY updated_at DESC
    ) AS rn
  FROM customer_records
)
SELECT record_id, customer_id, name, updated_at
FROM ranked
WHERE rn = 1;
39
AmazonExpertWindow
Running Balance: Before and After Each Transaction
SUM UNBOUNDED PRECEDING for balance after; LAG of that for balance before

Show each transaction with the account balance both before and after it. Three-step pattern: (1) convert debits to negative amounts; (2) running SUM ... ROWS UNBOUNDED PRECEDING for balance-after; (3) LAG() of that value for balance-before.

Schema
account_txns
txn_idaccount_idtxn_dateamounttxn_type
110012025-01-05500credit
210012025-01-12200debit
310012025-01-201000credit
Solution
WITH signed AS (
  SELECT *,
    CASE WHEN txn_type = 'credit' THEN  amount
         WHEN txn_type = 'debit'  THEN -amount
    END AS signed_amt
  FROM account_txns
),
running AS (
  SELECT *,
    SUM(signed_amt) OVER (
      PARTITION BY account_id
      ORDER BY txn_date, txn_id
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS balance_after
  FROM signed
)
SELECT txn_id, txn_date, amount, txn_type,
  COALESCE(LAG(balance_after) OVER (
    PARTITION BY account_id ORDER BY txn_date, txn_id
  ), 0) AS balance_before,
  balance_after
FROM running
ORDER BY account_id, txn_date, txn_id;
40
ClassicExpertWindow
Follow-Up Purchase Detection With LEAD
LEAD() looks at the NEXT row per user — check if the next product is the target item

Find what % of users who bought Product A immediately bought Product B next. LEAD(product) retrieves the very next purchase per user. Filter where current = A and check next = B. Real Apple/DataLemur hard question pattern.

Schema
purchase_history
txn_idcustomer_idproductpurchased_at
1101iPhone2025-01-05
2101AirPods2025-01-10
3102iPhone2025-02-01
4102Charger2025-02-05
Solution
WITH next_product AS (
  SELECT customer_id, product AS current_product,
    LEAD(product) OVER (
      PARTITION BY customer_id ORDER BY purchased_at
    ) AS next_product
  FROM purchase_history
)
SELECT ROUND(
  100.0 * COUNT(CASE WHEN next_product = 'AirPods' THEN 1 END) /
  NULLIF(COUNT(*), 0), 1
) AS airpods_followup_pct
FROM next_product
WHERE current_product = 'iPhone';
42
🌍 NewbieSELECT
Your First SQL Query
SELECT * fetches every row and every column — the starting point of all SQL

Write a query to show all records from the students table. The asterisk * is a wildcard meaning "all columns". This is the very first query every SQL learner writes.

Schema
students
idnamegradecity
1ArjunADelhi
2PriyaBMumbai
3RajAPune
Solution
SELECT *
FROM students;
43
🌍 NewbieSELECT
Choose Specific Columns
Only fetch the columns you actually need — avoid pulling unnecessary data

Retrieve only the name and price columns from the products table. In real systems, selecting specific columns is faster than SELECT * because less data travels over the network.

Schema
products
idnamecategorypricestock
1MouseElectronics799150
2MonitorElectronics499925
3NotebookStationery299500
Solution
SELECT name, price
FROM products;
44
🌍 NewbieSELECT
Filter Rows with WHERE
WHERE lets you pick only the rows that match a condition

Show only employees who work in the Engineering department. The WHERE clause filters rows before they are returned — think of it as the "row filter" in SQL.

Schema
employees
idnamedepartmentsalary
1AliceEngineering95000
2BobSales65000
3CarolEngineering110000
4DaveHR70000
Solution
SELECT name, salary
FROM employees
WHERE department = 'Engineering';
45
🌍 NewbieSELECT
Sort Results with ORDER BY
ASC = lowest to highest (default), DESC = highest to lowest

Retrieve all products and sort them by price from cheapest to most expensive. ORDER BY col ASC is ascending (smallest first); DESC flips it. Without ORDER BY, SQL gives back rows in no guaranteed order.

Schema
products
idnameprice
1Mouse799
2Monitor4999
3Notebook299
Solution
SELECT name, price
FROM products
ORDER BY price ASC;
46
🌍 NewbieSELECT
Get the Top 3 Students by Score
LIMIT caps the number of rows returned — combine with ORDER BY to get a true "top N"

Return the 3 students with the highest exam scores. Always pair LIMIT with ORDER BY — otherwise you get a random 3 rows, not the top 3.

Schema
students
idnamescore
1Arjun88
2Priya95
3Raj76
4Sneha91
5Vikram83
Solution
SELECT name, score
FROM students
ORDER BY score DESC
LIMIT 3;
47
🌍 NewbieAggregates
Count Total Rows with COUNT
COUNT(*) counts every row including NULLs; COUNT(col) skips NULL values in that column

Find the total number of orders in the orders table. COUNT(*) is the most common aggregate function — it counts every row regardless of NULLs. Adding AS total_orders gives the column a readable name.

Schema
orders
idcustomeramountstatus
1Alice799delivered
2Bob1299shipped
3Carol4999delivered
Solution
SELECT COUNT(*) AS total_orders
FROM orders;
48
🌍 NewbieAggregates
Calculate Average Salary with AVG
AVG sums all values and divides by count — it automatically ignores NULL values

Find the average salary of all employees. AVG(salary) adds up every salary and divides by the number of rows. Use ROUND(AVG(salary), 0) to remove decimal places when you want a clean number.

Schema
employees
idnamesalary
1Alice95000
2Bob72000
3Carol110000
4Dave83000
Solution
SELECT ROUND(AVG(salary), 0) AS avg_salary
FROM employees;
49
🌍 NewbieAggregates
Count Employees per Department
GROUP BY collapses rows with the same value into one group for aggregation

Show how many employees are in each department. GROUP BY department creates one row per unique department value, then COUNT(*) counts the employees inside each group. This is the most asked beginner aggregate question.

Schema
employees
idnamedepartment
1AliceEngineering
2BobSales
3CarolEngineering
4DaveHR
5EveSales
Solution
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
ORDER BY headcount DESC;
50
⚡ Interview BoostIntermediateWindow Functions
Pareto Principle — Identify the 20% of Products Driving 80% of Revenue
Use a running cumulative SUM window to rank products and find which ones together cross the 80% revenue threshold

Your analytics team at Flipkart notices that a small group of products drives the majority of revenue. Following the Pareto (80/20) rule, write a SQL query to find all products whose combined revenue — when ranked from highest to lowest — accounts for the first 80% of total revenue. Include the cumulative revenue percentage in the result. This pattern is one of the most frequently asked business analytics questions in SQL interviews.

Schema
products
product_idproduct_namerevenue
1Premium Laptop50000
24K Monitor30000
3Keyboard8000
4USB-C Hub6000
5Mouse Pad3000
6Webcam2000
7Desk Organiser800
8Cable Clips200
Solution 1 — Running Sum (Best for Interviews)
-- Running cumulative SUM to pinpoint the 80% revenue threshold
WITH ranked AS (
  SELECT
    product_id, product_name, revenue,
    SUM(revenue) OVER (
      ORDER BY revenue DESC
      ROWS UNBOUNDED PRECEDING
    ) AS running_total,
    SUM(revenue) OVER () AS total_revenue
  FROM products
)
SELECT
  product_id, product_name, revenue,
  ROUND(running_total * 100.0 / total_revenue, 1) AS cumulative_pct
FROM ranked
WHERE running_total - revenue < total_revenue * 0.8;
Solution 2 — NTILE (Top 20% by Row Count)
-- NTILE(5) splits into 5 equal buckets; bucket 1 = top 20% of rows
SELECT product_id, product_name, revenue
FROM (
  SELECT product_id, product_name, revenue,
    NTILE(5) OVER (ORDER BY revenue DESC) AS quintile
  FROM products
) t
WHERE quintile = 1;
Solution 3 — PERCENT_RANK (Top 20% by Relative Rank)
-- PERCENT_RANK = 0 for the top row; ≤ 0.2 means top 20% of products
SELECT product_id, product_name, revenue
FROM (
  SELECT product_id, product_name, revenue,
    PERCENT_RANK() OVER (ORDER BY revenue DESC) AS pct_rank
  FROM products
) t
WHERE pct_rank <= 0.2;
51
⚡ Interview BoostIntermediateCTEs
New vs Repeat Customer Report — Daily Acquisition & Retention Breakdown
Classify each order as new or repeat by comparing its date to the customer's first-ever order date, then count both types per day

Swiggy's growth team needs a daily customer classification report. For each order date, determine whether each order belongs to a new customer (placing their very first order on that day) or a repeat customer (who ordered at least once before). Output each date with the count of new and repeat customers side by side. This is one of the most common business analytics questions in data analyst interviews — it tests your ability to break a problem into steps using CTEs and conditional aggregation.

Schema
orders
customer_idorder_dateamount
C12024-01-10250
C22024-01-10180
C12024-01-11320
C32024-01-11150
C22024-01-12200
C42024-01-1290
C12024-01-12410
C32024-01-13270
C52024-01-13130
Solution 1 — CTE + JOIN + CASE WHEN (Best for Interviews)
-- Step 1: find each customer's first ever order date
WITH first_order AS (
  SELECT customer_id, MIN(order_date) AS first_date
  FROM orders
  GROUP BY customer_id
)
-- Step 2: join back, classify, then aggregate per day
SELECT
  o.order_date,
  SUM(CASE WHEN o.order_date = f.first_date THEN 1 ELSE 0 END) AS new_customers,
  SUM(CASE WHEN o.order_date > f.first_date THEN 1 ELSE 0 END) AS repeat_customers
FROM orders o
JOIN first_order f ON o.customer_id = f.customer_id
GROUP BY o.order_date
ORDER BY o.order_date;
Solution 2 — Window MIN() in CTE (No JOIN needed)
-- MIN() OVER (PARTITION BY customer_id) gives first date without a separate GROUP BY
WITH classified AS (
  SELECT
    customer_id, order_date,
    MIN(order_date) OVER (PARTITION BY customer_id) AS first_date
  FROM orders
)
SELECT
  order_date,
  SUM(CASE WHEN order_date = first_date THEN 1 ELSE 0 END) AS new_customers,
  SUM(CASE WHEN order_date > first_date THEN 1 ELSE 0 END) AS repeat_customers
FROM classified
GROUP BY order_date
ORDER BY order_date;
Solution 3 — Two-CTE Step-by-Step (Most Readable)
-- CTE 1: tag every order as new or repeat
WITH first_order AS (
  SELECT customer_id, MIN(order_date) AS first_date
  FROM orders GROUP BY customer_id
),
tagged AS (
  SELECT o.order_date,
    CASE WHEN o.order_date = f.first_date THEN 'new' ELSE 'repeat' END AS cust_type
  FROM orders o
  JOIN first_order f ON o.customer_id = f.customer_id
)
-- CTE 2: count each type per day
SELECT order_date,
  SUM(CASE WHEN cust_type = 'new'    THEN 1 ELSE 0 END) AS new_customers,
  SUM(CASE WHEN cust_type = 'repeat' THEN 1 ELSE 0 END) AS repeat_customers
FROM tagged
GROUP BY order_date ORDER BY order_date;
52
⚡ Interview BoostExpertCTEsWindow
Co-Working Space Activity Report — Total Visits, Top Floor & Resources Per Member
Combine COUNT, ROW_NUMBER mode-finding, and STRING_AGG into one member-level summary — the classic three-technique product interview question

Smartworks, a co-working space provider, logs every member check-in: which floor they visit and which resource (laptop station, desktop, monitor, etc.) they use. The analytics team wants a member-level summary report — total check-ins, the floor each member visits most often (the statistical mode, not the maximum), and every distinct resource they have ever used, as a comma-separated list. This question is a staple in product-company interviews because it requires three distinct SQL techniques working together: plain aggregation, window-based mode calculation (SQL has no MODE() function), and string aggregation with deduplication.

Schema
cowork_visits
memberfloorresource
Priya1CPU
Priya1LAPTOP
Priya2MONITOR
Rahul2DESKTOP
Rahul2DESKTOP
Rahul1PRINTER
Solution 1 — Three-CTE Pattern: visit_count + floor_mode + resource_list (SQLite / MySQL)
WITH visit_count AS (
  SELECT member, COUNT(*) AS total_visits
  FROM cowork_visits
  GROUP BY member
),
floor_mode AS (
  -- rank floors by visit frequency; rn=1 is the most visited
  SELECT member, floor,
    ROW_NUMBER() OVER (
      PARTITION BY member
      ORDER BY COUNT(*) DESC
    ) AS rn
  FROM cowork_visits
  GROUP BY member, floor
),
resource_list AS (
  SELECT member,
    GROUP_CONCAT(DISTINCT resource) AS resources
  FROM cowork_visits
  GROUP BY member
)
SELECT
  v.member, v.total_visits,
  f.floor   AS top_floor,
  r.resources
FROM visit_count v
JOIN floor_mode f
  ON v.member = f.member AND f.rn = 1
JOIN resource_list r
  ON v.member = r.member
ORDER BY v.member;
Solution 2 — Two CTEs: floor_mode + summary (PostgreSQL / SQL Server)
WITH floor_mode AS (
  SELECT member, floor,
    ROW_NUMBER() OVER (
      PARTITION BY member
      ORDER BY COUNT(*) DESC
    ) AS rn
  FROM cowork_visits
  GROUP BY member, floor
),
summary AS (
  SELECT member,
    COUNT(*) AS total_visits,
    STRING_AGG(DISTINCT resource, ','
      ORDER BY resource) AS resources
  FROM cowork_visits
  GROUP BY member
)
SELECT
  s.member, s.total_visits,
  f.floor AS top_floor,
  s.resources
FROM summary s
JOIN floor_mode f
  ON s.member = f.member AND f.rn = 1
ORDER BY s.member;
Solution 3 — MySQL / SQLite: GROUP_CONCAT instead of STRING_AGG
-- GROUP_CONCAT for MySQL and SQLite (no STRING_AGG support)
WITH floor_mode AS (
  SELECT member, floor,
    ROW_NUMBER() OVER (
      PARTITION BY member
      ORDER BY COUNT(*) DESC
    ) AS rn
  FROM cowork_visits
  GROUP BY member, floor
),
summary AS (
  SELECT member,
    COUNT(*) AS total_visits,
    GROUP_CONCAT(DISTINCT resource) AS resources
  FROM cowork_visits
  GROUP BY member
)
SELECT
  s.member, s.total_visits,
  f.floor AS top_floor,
  s.resources
FROM summary s
JOIN floor_mode f
  ON s.member = f.member AND f.rn = 1
ORDER BY s.member;
53
⚡ Interview BoostExpertGaps & IslandsWindow
Server Health Window Report — Group Consecutive Uptime & Downtime Periods
Find contiguous same-status date ranges using the ROW_NUMBER subtraction island trick — SQL's canonical answer to "find consecutive groups"

Razorpay's SRE team runs a daily health check on their payment gateway. Each day is logged as up (all systems operational) or down (incident detected). For SLA reporting and incident post-mortems, the team needs a condensed view: each contiguous window of the same status collapsed into a single row with its start date, end date, and status. This is the classic Gaps and Islands problem — SQL has no built-in way to detect "consecutive same-value runs", so you need a window-function trick to assign each island a unique group key.

Schema
gateway_log
check_datestatus
2024-01-01up
2024-01-02up
2024-01-03up
2024-01-04down
2024-01-05down
2024-01-06up
Solution 1 — ROW_NUMBER Subtraction (Most Portable — No Date Math)
WITH grp AS (
  SELECT check_date, status,
    ROW_NUMBER() OVER (ORDER BY check_date) -
    ROW_NUMBER() OVER (
      PARTITION BY status
      ORDER BY check_date
    ) AS grp_id
  FROM gateway_log
)
SELECT
  MIN(check_date) AS start_date,
  MAX(check_date) AS end_date,
  status
FROM grp
GROUP BY grp_id, status
ORDER BY start_date;
Solution 2 — LAG Change Detection + Cumulative SUM
WITH changes AS (
  SELECT check_date, status,
    CASE WHEN LAG(status)
      OVER (ORDER BY check_date) = status
      THEN 0 ELSE 1 END AS is_start
  FROM gateway_log
),
grps AS (
  SELECT check_date, status,
    SUM(is_start) OVER
      (ORDER BY check_date) AS grp_id
  FROM changes
)
SELECT
  MIN(check_date) AS start_date,
  MAX(check_date) AS end_date,
  status
FROM grps
GROUP BY grp_id, status
ORDER BY start_date;
Solution 3 — Date − ROW_NUMBER (SQLite-adapted version of the classic SQL Server approach)
-- SQL Server uses DATEADD(day, -rn, date); SQLite uses DATE(date, '-N days')
WITH grp AS (
  SELECT check_date, status,
    DATE(check_date,
      '-' || ROW_NUMBER() OVER (
        PARTITION BY status
        ORDER BY check_date
      ) || ' days') AS grp_key
  FROM gateway_log
)
SELECT
  MIN(check_date) AS start_date,
  MAX(check_date) AS end_date,
  status
FROM grp
GROUP BY grp_key, status
ORDER BY start_date;
54
⚡ Interview BoostIntermediateJOIN Row Count
INNER JOIN Row Count — Predict Output Rows with a One-sided Duplicate Key
Formula: for each matching key K, output rows = count(K in t1) × count(K in t2). Key=1 appears twice in t1 and once in t2 — that's 2×1=2 rows, not 1.

A Flipkart data team is auditing a JOIN before writing a pipeline. Table t1 has 3 rows and t2 has 3 rows — yet id=1 is duplicated in t1. A junior engineer assumes INNER JOIN returns 3 rows since both tables have 3 rows. How many rows does t1 INNER JOIN t2 ON t1.id = t2.id actually return — and why does the duplicate key matter?

Schema
t1
id
1
1
2
t2
id
1
2
3
Solution — INNER JOIN: 3 rows (key-by-key multiplication)
-- Key 1: 2 rows in t1 × 1 row in t2 = 2 rows
-- Key 2: 1 row in t1 × 1 row in t2 = 1 row
-- Key 3: no match in t1 → 0 rows
-- Total: 3 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id = t2.id;
55
⚡ Interview BoostIntermediateJOIN Explosion
INNER JOIN Explosion — Predict Row Count When Both Tables Have Duplicate Keys
When both sides have duplicates for key=1 (t1 has 2, t2 has 3), INNER JOIN creates a 2×3=6 Cartesian product for that key group — larger than both input tables combined.

Amazon's data team joins two tables before a nightly aggregation. Both tables have duplicate entries for the same key value. A classic interview trap: before running the query, predict the exact row count. Most candidates guess 3 or 4 (min or max of the input table sizes) — both are wrong. The actual result shocks them.

Schema
t1
id
1
1
2
t2
id
1
1
1
3
Solution — INNER JOIN Explosion: 6 rows (Cartesian product per key group)
-- Key 1: 2 rows in t1 × 3 rows in t2 = 6 rows  ← Cartesian explosion!
-- Key 2: no match in t2 → 0 rows
-- Key 3: no match in t1 → 0 rows
-- Total: 6 rows (larger than both input tables!)
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id = t2.id;
56
⚡ Interview BoostIntermediateLEFT JOINNULL Handling
LEFT JOIN Row Count — Unmatched Left Rows Survive as NULL
LEFT JOIN = INNER JOIN rows + one NULL-padded row per unmatched t1 key. Key=1 explodes to 6 rows; key=2 (only in t1) adds 1 NULL row. Total: 7.

Same tables as Q55, but switching to LEFT JOIN. LEFT JOIN guarantees every row in t1 appears in the output — matched or not. t1's key=2 has no match in t2. How many rows does the LEFT JOIN return? Compare to Q55's INNER JOIN answer of 6.

Schema
t1
id
1
1
2
t2
id
1
1
1
3
Solution — LEFT JOIN: 7 rows (6 matched + 1 unmatched from t1)
-- INNER part — key=1: 2 × 3 = 6 rows
-- LEFT-only — key=2 in t1, no match in t2: 1 row (t2_id = NULL)
-- key=3 is only in t2 → LEFT JOIN does not preserve it
-- Total: 6 + 1 = 7 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
LEFT JOIN t2 ON t1.id = t2.id;
57
⚡ Interview BoostIntermediateRIGHT JOINNULL Handling
RIGHT JOIN Row Count — Unmatched Right Rows Survive as NULL
RIGHT JOIN = INNER JOIN rows + one NULL-padded row per unmatched t2 key. Key=3 (only in t2) adds 1 NULL row — same count as LEFT JOIN, but a different unmatched key.

Same tables, now RIGHT JOIN. Every t2 row must appear — matched or not. t2's key=3 has no match in t1 and gets NULL for t1.id. But t1's key=2 (unmatched) is now dropped. Notice: LEFT JOIN and RIGHT JOIN both return 7 rows here, but they preserve different unmatched keys.

Schema
t1
id
1
1
2
t2
id
1
1
1
3
Solution — RIGHT JOIN: 7 rows (6 matched + 1 unmatched from t2)
-- INNER part — key=1: 2 × 3 = 6 rows
-- RIGHT-only — key=3 in t2, no match in t1: 1 row (t1_id = NULL)
-- key=2 is only in t1 → RIGHT JOIN drops it (LEFT JOIN would keep it)
-- Total: 6 + 1 = 7 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
RIGHT JOIN t2 ON t1.id = t2.id;
-- SQLite 3.39+ supports RIGHT JOIN natively
-- Equivalent: FROM t2 LEFT JOIN t1 ON t2.id = t1.id
58
⚡ Interview BoostExpertAll 4 JOINsFULL OUTER
All 4 JOINs Compared — Predict INNER / LEFT / RIGHT / FULL OUTER Row Counts
INNER first: (2×2)+(1×1)=5. LEFT adds t1-only key=3 → 6. RIGHT adds t2-only key=4 → 6. FULL adds both: 5+1+1=7. Rule: INNER ≤ LEFT, INNER ≤ RIGHT, max(LEFT,RIGHT) ≤ FULL.

The definitive JOIN interview question. Given t1 = [1,1,2,3] and t2 = [1,1,2,4], predict the row count for all four JOIN types. Keys 1 and 2 are shared; key 3 is exclusive to t1; key 4 is exclusive to t2. Master the formula INNER ≤ LEFT, INNER ≤ RIGHT, max(LEFT, RIGHT) ≤ FULL — this appears in senior data engineering interviews at every major tech company.

Schema
t1
id
1
1
2
3
t2
id
1
1
2
4
Solution 1 — INNER JOIN: 5 rows
-- Key 1: 2 (t1) × 2 (t2) = 4 rows
-- Key 2: 1 (t1) × 1 (t2) = 1 row
-- Key 3: only in t1 → 0 rows  |  Key 4: only in t2 → 0 rows
-- INNER JOIN total: 5 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id = t2.id;
Solution 2 — All 4 JOINs compared (SQLite-compatible FULL OUTER emulation)
-- Results: INNER=5, LEFT=6, RIGHT=6, FULL OUTER=7
SELECT 'INNER JOIN' AS join_type,
       COUNT(*) AS row_count
FROM t1 INNER JOIN t2 ON t1.id = t2.id
UNION ALL
SELECT 'LEFT JOIN', COUNT(*)
FROM t1 LEFT JOIN t2 ON t1.id = t2.id
UNION ALL
SELECT 'RIGHT JOIN', COUNT(*)
FROM t2 LEFT JOIN t1 ON t2.id = t1.id
UNION ALL
SELECT 'FULL OUTER', COUNT(*) FROM (
  SELECT t1.id, t2.id
  FROM t1 LEFT JOIN t2 ON t1.id = t2.id
  UNION ALL
  SELECT t1.id, t2.id
  FROM t2 LEFT JOIN t1 ON t2.id = t1.id
  WHERE t1.id IS NULL
);
59
⚡ Interview BoostIntermediateNULL TrapJOIN
NULL in JOIN Key — NULLs Never Match in Any JOIN Type
NULL = NULL evaluates to UNKNOWN, not TRUE. Both the t1 NULL and the t2 NULL appear as unmatched rows in LEFT/RIGHT JOIN — they never pair with each other or with anything else.

Zomato's data team joins an orders table (t1) with a customers table (t2). Both tables have a NULL key row — unassigned records. A junior engineer expects the two NULLs to pair up in the INNER JOIN. They won't. SQL uses three-valued logic: the ON clause keeps only rows where the condition is TRUE. NULL = NULL is UNKNOWN — not TRUE — so NULL keys are invisible to INNER JOIN and appear only as unmatched rows in LEFT/RIGHT JOIN.

Schema
t1
id
1
2
NULL
t2
id
1
NULL
3
Solution 1 — INNER JOIN: 1 row (NULLs never match)
-- NULL = NULL → UNKNOWN → treated as no match
-- Key 1: 1 × 1 = 1 row
-- id=2 (t1): no t2 match → 0 rows
-- id=NULL (t1): NULL = NULL is UNKNOWN → 0 rows
-- id=NULL (t2): same — never matches any t1 row
-- Total INNER: 1 row
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id = t2.id;
Solution 2 — LEFT JOIN: 3 rows (NULL-keyed t1 row survives as unmatched)
-- LEFT JOIN output:
-- (1, 1)     ← matched
-- (2, NULL)  ← t1.id=2 unmatched, t2 side is NULL
-- (NULL, NULL) ← t1.id=NULL unmatched (NULL key never matches)
-- Total LEFT: 3 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
LEFT JOIN t2 ON t1.id = t2.id;
-- To force NULL = NULL to match, use IS (SQLite NULL-safe equals):
-- INNER JOIN t2 ON t1.id IS t2.id  → 2 rows: (1,1) and (NULL,NULL)
60
⚡ Interview BoostExpertNULL vs EmptyJOIN
Empty String vs NULL in JOIN — '' Matches, NULL Never Does
'' (empty string) is a value of length zero — it equals another '' and DOES match in JOIN. NULL is the absence of any value — NULL = '' is UNKNOWN and never matches anything.

Swiggy's data team finds product IDs contain both NULL entries (missing data) and empty string entries (data entry errors). t1 has key '1', key '', and key NULL. t2 has key '1', key '', and key '2'. In most UIs both NULL and '' display as blank — but SQL treats them completely differently. Predict the INNER JOIN row count and which rows appear.

Schema
t1
id (TEXT)
1
'' (empty)
NULL
t2
id (TEXT)
1
'' (empty)
2
Solution 1 — INNER JOIN: 2 rows ('' matches, NULL does not)
-- '' = '' is TRUE  → empty strings match each other!
-- NULL = '' is UNKNOWN → NULL key never matches
-- Key '1': 1 × 1 = 1 row
-- Key '' : 1 × 1 = 1 row  ← empty string IS a matchable value
-- Key NULL (t1): NULL = anything → UNKNOWN → 0 rows
-- Total INNER: 2 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id = t2.id;
Solution 2 — Force NULL to match using IS (SQLite NULL-safe equals)
-- Standard ON t1.id = t2.id → 2 rows (NULL never matches)
-- NULL-safe ON t1.id IS t2.id → 2 rows (no NULL in t2 to pair with)
-- To see NULL-safe behavior, add NULL to t2 and compare:
-- INSERT INTO t2 VALUES(NULL);
-- ON t1.id = t2.id   → still 2 rows (NULL = NULL = UNKNOWN)
-- ON t1.id IS t2.id  → 3 rows (NULL IS NULL = TRUE in SQLite)
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id IS t2.id;
-- PostgreSQL equivalent: ON t1.id IS NOT DISTINCT FROM t2.id
61
⚡ Interview BoostIntermediateSCD Type 1Data Warehouse
SCD Type 1 — Overwrite: Apply a Price Change with No History Kept
SCD Type 1 simply overwrites the existing row. The old value is gone forever. Use UPDATE, not INSERT — INSERT creates a duplicate row for the same entity.

Flipkart's data warehouse team manages a product dimension table. iPhone 11 was listed at ₹10,000 on 2022-01-01. On 2022-03-01 the price drops to ₹8,000. Apply Slowly Changing Dimension Type 1: overwrite the record. Historical price is not needed — only the current state matters. Which SQL operation correctly implements SCD Type 1?

Schema (current state — before price change)
product
product_idproduct_nameprice
1iphone1110000
2iphone1215000
Solution — SCD Type 1: UPDATE the existing row (history discarded)
-- SCD Type 1: overwrite — old price 10000 is lost forever
UPDATE product
SET   price       = 8000,
      update_date = '2022-03-01'
WHERE product_id = 1;

-- After update: only the new price remains
SELECT * FROM product WHERE product_id = 1;
62
⚡ Interview BoostExpertSCD Type 2Data Warehouse
SCD Type 2 — Full History: Expire the Old Row, Insert the New Row
SCD Type 2 keeps every version. Two-step: (1) set end_date + is_current=0 on the old row, (2) INSERT a new row with start_date + is_current=1. Never UPDATE the price column.

The finance team needs full price history for audits — every price change must be preserved. Apply SCD Type 2: keep the old record with an expiry date and a is_current=0 flag, then insert a new current record. This table can be queried for both current state and any historical point in time.

Schema (before price change — only current row exists)
product_history
product_idpriceis_current
1100001
Solution — SCD Type 2: Expire old row → Insert new row
-- Step 1: expire the current row (close its time window)
UPDATE product_history
SET   end_date   = '2022-02-28',
      is_current = 0
WHERE product_id = 1
  AND is_current = 1;

-- Step 2: insert the new current row (open new time window)
INSERT INTO product_history
  (product_id, product_name, price, start_date, end_date, is_current)
VALUES
  (1, 'iphone11', 8000, '2022-03-01', NULL, 1);

-- Verify: two rows now exist for product_id=1
SELECT product_id, price, start_date, end_date, is_current
FROM product_history
WHERE product_id = 1;
63
⚡ Interview BoostIntermediateSCD Type 3Data Warehouse
SCD Type 3 — Previous Value Column: Keep Exactly One Version Back
SCD Type 3 adds a prev_price column to the row. On change: shift current_price → prev_price, set new current_price. Only the immediately previous value is retained — older history is gone.

Marketing needs to show "was ₹X, now ₹Y" on the product page — only the previous price matters, not the full history. Apply SCD Type 3: add a prev_price column to the product table. On each price change, shift the current value to prev_price and write the new price to current_price. Simple, compact — but limited to one version back.

Schema (before price change — prev_price is NULL)
product
product_idcurrent_priceprev_price
110000NULL
Solution — SCD Type 3: Shift current → prev, write new current
-- SCD Type 3: shift column, no new row created
UPDATE product
SET   prev_price    = current_price,   -- 10000 moves to prev
      current_price = 8000,             -- new price
      update_date   = '2022-03-01'
WHERE product_id = 1;

-- After: 1 row, prev_price=10000, current_price=8000
SELECT product_id, current_price, prev_price
FROM product;

-- ⚠ Limitation: if price changes again (8000→7000),
-- prev_price becomes 8000 and 10000 is lost forever.
64
⚡ Interview BoostIntermediateSCD Type 2Query Pattern
SCD Type 2 — Query Current State: Get Only the Active Row per Product
Filter WHERE is_current = 1 (or equivalently WHERE end_date IS NULL) to get the single active row per entity. Without this filter, all historical rows are returned, mixing stale prices into reports.

After applying SCD Type 2, the product_history table contains both historical and current rows for iPhone 11. An analyst needs only the current (active) price for each product. Without a filter, the query returns every version — mixing old and new prices into the same result set. Which filter isolates only the live record?

Schema (SCD Type 2 table with history)
product_history
product_idpriceis_current
1100000
180001
2150001
Solution — Filter by is_current = 1 (or end_date IS NULL)
-- Get only the active (current) row per product
SELECT product_id, product_name, price
FROM product_history
WHERE is_current = 1;

-- Equivalent alternative: open-ended rows have no end_date
-- WHERE end_date IS NULL

-- Returns: product 1 → 8000, product 2 → 15000
-- Historical rows (is_current=0) are stored but hidden from day-to-day queries
65
⚡ Interview BoostExpertSCD Type 2Point-in-Time
SCD Type 2 — Point-in-Time Query: What Was the Price on a Given Date?
Use BETWEEN start_date AND COALESCE(end_date, '9999-12-31') — the COALESCE converts open-ended (NULL) rows into a far-future date so active rows are included in range checks.

SCD Type 2's killer feature: time travel queries. The finance team needs to know what iPhone 11's price was on 2022-02-15 — before the March price drop. The old row (price=10000, start=2022-01-01, end=2022-02-28) covers Feb 15. The new row (price=8000, start=2022-03-01, end=NULL) does not. Querying by exact start_date returns nothing — you need a range check.

Schema (SCD Type 2 table — 2 versions for product 1)
product_history
product_idpricestart_date
1100002022-01-01
180002022-03-01
Solution — Point-in-time: BETWEEN start_date AND COALESCE(end_date, '9999-12-31')
-- What was the price on 2022-02-15?
SELECT product_id, product_name, price, start_date, end_date
FROM product_history
WHERE product_id = 1
  AND '2022-02-15' BETWEEN start_date
      AND COALESCE(end_date, '9999-12-31');
-- Returns: price=10000 (the Jan 1 – Feb 28 version)
-- COALESCE handles active rows: end_date=NULL → '9999-12-31'
-- so the open-ended row always includes future dates
Solution 2 — Compare any two dates side-by-side
-- Which version was active on Feb 15 vs Apr 01?
SELECT
  product_id,
  price,
  start_date,
  COALESCE(end_date, 'active') AS end_date
FROM product_history
WHERE product_id = 1
  AND (
    '2022-02-15' BETWEEN start_date AND COALESCE(end_date, '9999-12-31')
    OR
    '2022-04-01' BETWEEN start_date AND COALESCE(end_date, '9999-12-31')
  );
66
⚡ Interview BoostIntermediateAggregates
COUNT(*) vs COUNT(0) vs COUNT(-1) vs COUNT(col) vs COUNT(DISTINCT col)
COUNT(*), COUNT(1), COUNT(0), COUNT(-1) all return the same value — only COUNT(col) and COUNT(DISTINCT col) behave differently

This is one of the most-asked SQL interview tricks. The emp table has 12 rows — the last row is all NULLs. What does each variant of COUNT return, and why? Understanding this proves you know how the SQL engine evaluates aggregate functions under the hood.

Schema
emp
emp_idemp_namesalarymanager_iddep_id
1Ankit143004100
2Mohit156005200
3Vikas121004100
4Rohit72602100
5Mudit156006200
6Agam156002200
7Sanjay120002200
8Ashish72002200
9Mukesh70006300
10Rakesh80006300
11Akhil40001500
NULLNULLNULLNULLNULL
Expected Output
count_starcount_1count_0count_managerdistinct_deps
121212114
Key Rule COUNT(anything non-NULL) = same as COUNT(*). The expression COUNT(0) evaluates 0 for every row — 0 is not NULL — so it counts all 12 rows. COUNT(col) evaluates the column value per row and skips NULLs. COUNT(DISTINCT col) additionally de-duplicates.
Solution
SELECT
  COUNT(*)               AS count_star,    -- 12: every row, NULLs included
  COUNT(1)               AS count_1,       -- 12: literal 1 is never NULL
  COUNT(0)               AS count_0,       -- 12: literal 0 is never NULL
  COUNT(manager_id)      AS count_manager, -- 11: skips the NULL row
  COUNT(DISTINCT dep_id) AS distinct_deps  -- 4:  unique non-NULL dep_ids (100,200,300,500)
FROM emp;

-- Mental model:
-- COUNT(*) / COUNT(literal)  → count rows
-- COUNT(col)                 → count non-NULL values in that column
-- COUNT(DISTINCT col)        → count unique non-NULL values
Why do COUNT(0) and COUNT(-1) equal COUNT(*)?
SQL evaluates the expression inside COUNT for every row. COUNT(0) resolves to the integer 0 for row 1, 0 for row 2 … 0 for row 12. None of those are NULL, so all 12 are counted. The actual value (0, 1, 42, 'hello') doesn't matter — what matters is whether the expression is NULL or not.
67
⚡ Interview BoostIntermediateAggregatesLEFT JOIN
COUNT(*) vs COUNT(col) in a LEFT JOIN — The Classic NULL Trap
LEFT JOIN produces NULL columns for unmatched rows — COUNT(*) counts them but COUNT(col) doesn't

A very common interview follow-up after COUNT(*) vs COUNT(col): what happens inside a LEFT JOIN? The query below finds all customers and how many orders each has placed. Bob has never ordered — his order_id will be NULL after the LEFT JOIN. Which COUNT catches that?

Schema
customers
idname
1Alice
2Bob
3Carol
orders
order_idcustomer_idamount
1011500
1021300
1033750
Expected Output
nametotal_rowsorder_count
Alice22
Bob10
Carol11
Trap Bob's LEFT JOIN row has order_id = NULL. COUNT(*) sees a row and counts it (→ 1). COUNT(o.order_id) sees NULL and skips it (→ 0). In most reporting queries you want COUNT(o.order_id) because 0 orders is more useful than 1.
Solution
SELECT
  c.name,
  COUNT(*)          AS total_rows,   -- counts the NULL row for Bob → 1
  COUNT(o.order_id) AS order_count   -- skips NULL → 0 for Bob ✓
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.name
ORDER BY c.name;
68
⚡ Interview BoostIntermediateAggregatesConditional Count
Conditional COUNT with CASE — Pivot Row Counts in One Query
COUNT(CASE WHEN ... THEN 1 END) counts only matching rows because unmatched branches return NULL, which COUNT ignores

A natural extension of COUNT(col): combine COUNT with CASE to pivot status counts into columns without multiple subqueries. This pattern appears in nearly every Flipkart, Swiggy, and Zomato interview when asked to "show delivered, shipped, and cancelled orders in one row." The trick works because CASE WHEN false THEN 1 END returns NULL — and COUNT silently skips NULLs.

Schema
orders
order_idstatus
1delivered
2shipped
3cancelled
4delivered
5delivered
6shipped
Expected Output
totaldeliveredshippedcancelled
6321
Why it works CASE WHEN status='delivered' THEN 1 END returns 1 for delivered rows and NULL for everything else. COUNT ignores NULLs, so only the matching rows are counted. This is a single table scan — far faster than three separate subqueries.
Solution
SELECT
  COUNT(*)                                           AS total,
  COUNT(CASE WHEN status = 'delivered' THEN 1 END) AS delivered,
  COUNT(CASE WHEN status = 'shipped'   THEN 1 END) AS shipped,
  COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled
FROM orders;

-- Equivalent using SUM (also common in interviews):
-- SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END)
-- Both give the same result — COUNT skips NULLs, SUM adds zeros
COUNT(CASE...) vs SUM(CASE...)
Both work but differ in style. COUNT(CASE WHEN x THEN 1 END) leans on NULL-skipping behaviour. SUM(CASE WHEN x THEN 1 ELSE 0 END) is explicit — never returns NULL even on an empty set (returns 0 vs NULL). Use SUM when a guaranteed 0 matters; use COUNT when brevity matters.
70
⚡ Interview Boost Expert Subqueries / CTEs Recursive CTE
Employee Org Chart — Find All Subordinates at Any Depth Using Recursive CTE
Recursive CTEs split into two parts: an anchor query (runs once) and a recursive member (runs until it returns 0 rows), joined by UNION ALL

The employees table stores a self-referencing hierarchy via manager_id. Write a query to find all direct and indirect reports under Priya (emp_id = 2, VP Engineering) along with their reporting depth. A simple JOIN only finds direct reports — you need a Recursive CTE to walk the tree to any depth.

Schema
employees
emp_idemp_namemanager_id
1Rahul (CEO)NULL
2Priya (VP Eng)1
3Amit (VP Sales)1
4Neha (BE Lead)2
5Karan (FE Lead)2
6Ravi (Sales)3
7Sana (Engineer)4
8Vijay (Engineer)5
Expected Output (under Priya)
emp_idemp_namereporting_level
4Neha (BE Lead)1
5Karan (FE Lead)1
7Sana (Engineer)2
8Vijay (Engineer)2
Recursive Pattern Anchor: seed with direct reports of emp_id=2 (depth=1). Recursive member: join employees back to the CTE on e.manager_id = cte.emp_id, incrementing depth each time. Stops automatically when no new employees match. Add WHERE depth < 10 as a safety guard against circular data.
Solution
WITH RECURSIVE org_tree AS (

  -- Anchor: seed with direct reports of Priya (emp_id = 2)
  SELECT emp_id, emp_name, manager_id, 1 AS depth
  FROM employees
  WHERE manager_id = 2

  UNION ALL

  -- Recursive member: walk one level deeper each iteration
  SELECT e.emp_id, e.emp_name, e.manager_id, ot.depth + 1
  FROM employees e
  INNER JOIN org_tree ot ON e.manager_id = ot.emp_id
  WHERE ot.depth < 10     -- safety guard: prevents infinite loop on bad data

)
SELECT
  emp_id,
  emp_name,
  depth AS reporting_level
FROM org_tree
ORDER BY depth, emp_name;

-- Iteration 1 (anchor):  Neha(4,depth=1), Karan(5,depth=1)
-- Iteration 2 (recursive): Sana(7,depth=2), Vijay(8,depth=2)
-- Iteration 3: no employees report to Sana or Vijay → stops
How the SQL engine executes a Recursive CTE
The engine maintains a working table and a result table. Step 1: run the anchor, put rows into both tables. Step 2: run the recursive member using the working table as input, producing new rows. Step 3: move new rows to the result table, replace the working table. Repeat until the recursive member returns 0 new rows. This is why WHERE depth < 10 is important — circular data (A → B → A) would produce an infinite loop without it.

Syntax note: SQLite and PostgreSQL require WITH RECURSIVE. SQL Server uses just WITH — the optimizer detects the recursion automatically.
69
⚡ Interview Boost Expert Subqueries / CTEs LEFT JOIN
Daily Channel Breakdown — Classify Users as App-Only, Web-Only, or Both
Two CTEs: one to label each user-date pair, one to generate all date × channel skeletons — then LEFT JOIN so zero rows always appear

An e-commerce platform lets users order via app or web. The purchases table records one row per user per channel per day. Write a query that returns, for each date, the total revenue and distinct user count split into three groups: app-only, web-only, and both. Dates where no one used both channels must still appear with 0 / 0 — they must not be silently dropped.

Schema
purchases
user_idorder_datechannelamount
12024-01-01app250
12024-01-01web180
22024-01-01app320
32024-01-01web150
22024-01-02app400
42024-01-02web200
Expected Output
order_datechanneltotal_revenuetotal_users
2024-01-01app3201
2024-01-01web1501
2024-01-01both4301
2024-01-02app4001
2024-01-02web2001
2024-01-02both00
Two-Step Pattern CTE 1 classifies each (user, date) as 'app', 'web', or 'both' using COUNT(DISTINCT channel) = 2. CTE 2 uses UNION ALL to generate every date × channel skeleton — this is what guarantees both = 0, 0 on 2024-01-02 instead of a missing row.
Solution
WITH user_channel AS (
  SELECT
    user_id,
    order_date,
    SUM(amount)                           AS total_revenue,
    CASE
      WHEN COUNT(DISTINCT channel) = 2 THEN 'both'
      ELSE MAX(channel)                  -- 'app' or 'web'
    END                                   AS channel
  FROM purchases
  GROUP BY user_id, order_date
),
all_combos AS (
  -- generate skeleton: every date paired with all 3 channel labels
  SELECT DISTINCT order_date, 'app'  AS channel FROM purchases
  UNION ALL
  SELECT DISTINCT order_date, 'web'  AS channel FROM purchases
  UNION ALL
  SELECT DISTINCT order_date, 'both' AS channel FROM purchases
)
SELECT
  ac.order_date,
  ac.channel,
  COALESCE(SUM(uc.total_revenue), 0) AS total_revenue,
  COUNT(uc.user_id)                   AS total_users
FROM all_combos ac
LEFT JOIN user_channel uc
  ON  ac.order_date = uc.order_date
  AND ac.channel    = uc.channel
GROUP BY ac.order_date, ac.channel
ORDER BY ac.order_date,
         CASE ac.channel
           WHEN 'app'  THEN 1
           WHEN 'web'  THEN 2
           ELSE 3
         END;

-- user_channel: labels each (user, date) — 'app', 'web', or 'both'
-- all_combos:   skeleton rows for every date × channel (guarantees zeros)
-- LEFT JOIN:    fills actual revenue/users; unmatched → NULL → 0
Why UNION ALL and not just GROUP BY?
After the user_channel CTE, only dates that actually had 'both' users produce a 'both' row. On 2024-01-02, nobody used both channels — so a plain GROUP BY returns only 'app' and 'web' rows. The all_combos UNION ALL pre-generates the 'both' skeleton for every date regardless. The LEFT JOIN then matches real data where it exists and leaves NULLs elsewhere. COALESCE(..., 0) and COUNT(uc.user_id) convert those NULLs to 0.
73
⚡ Interview Boost Expert Recursive CTE String Functions
Most In-Demand Job Skills — Split Comma-Separated Column into Rows
When a column stores multiple values as a comma-separated string, use a Recursive CTE with INSTR + SUBSTR to peel one value per iteration until the string is empty

The job_listings table stores required skills as a single comma-separated string per row (e.g., 'Python,SQL,Excel'). Write a query to find the most in-demand skills across all listings — each skill in a multi-skill listing must be counted as a separate row. There is no STRING_SPLIT() in SQLite; solve it with a Recursive CTE.

Schema
job_listings
job_idcompanyrequired_skills
1FlipkartPython,SQL,Excel
2SwiggySQL,Tableau
3ZomatoPython,Power BI,SQL
4AmazonExcel,SQL,Python
5MyntraTableau,Power BI
Expected Output
skillcnt
SQL4
Python3
Excel2
Power BI2
Tableau2
Split Logic INSTR(s, ',') returns position of the first comma (0 if none).
Anchor: extract everything before the first comma as skill; everything after as remaining. If no comma, skill = whole string, remaining = ''.
Recursive: repeat on remaining until remaining = ''.
Solution
WITH RECURSIVE split_skills AS (

  -- Anchor: extract the first skill from each job listing
  SELECT
    job_id,
    TRIM(SUBSTR(required_skills, 1,
      CASE WHEN INSTR(required_skills, ',') > 0
           THEN INSTR(required_skills, ',') - 1   -- up to first comma
           ELSE LENGTH(required_skills) END))   -- or whole string if no comma
      AS skill,
    CASE WHEN INSTR(required_skills, ',') > 0
         THEN SUBSTR(required_skills, INSTR(required_skills, ',') + 1)
         ELSE '' END                        -- everything after the comma
      AS remaining
  FROM job_listings

  UNION ALL

  -- Recursive: peel the next skill from remaining
  SELECT
    job_id,
    TRIM(SUBSTR(remaining, 1,
      CASE WHEN INSTR(remaining, ',') > 0
           THEN INSTR(remaining, ',') - 1
           ELSE LENGTH(remaining) END))
      AS skill,
    CASE WHEN INSTR(remaining, ',') > 0
         THEN SUBSTR(remaining, INSTR(remaining, ',') + 1)
         ELSE '' END
      AS remaining
  FROM split_skills
  WHERE remaining != ''        -- stop when nothing is left

)
SELECT skill, COUNT(*) AS cnt
FROM split_skills
GROUP BY skill
ORDER BY cnt DESC, skill ASC;

-- SQL Server equivalent (much simpler):
-- SELECT value AS skill, COUNT(*) AS cnt
-- FROM job_listings
-- CROSS APPLY STRING_SPLIT(required_skills, ',')
-- GROUP BY value ORDER BY cnt DESC;
How the recursion peels 'Python,SQL,Excel':
Iteration 1 (anchor): INSTR('Python,SQL,Excel', ',') = 7 → skill = 'Python', remaining = 'SQL,Excel'
Iteration 2: INSTR('SQL,Excel', ',') = 4 → skill = 'SQL', remaining = 'Excel'
Iteration 3: INSTR('Excel', ',') = 0 → skill = 'Excel', remaining = ''
Iteration 4: WHERE remaining != '' → stops. Three rows produced for job_id=1.

SQL Server shortcut: STRING_SPLIT(required_skills, ',') with CROSS APPLY does all of the above in one line. The recursive CTE is the portable cross-database equivalent.
72
⚡ Interview Boost Expert Aggregates Window Functions
Aggregation Mastery — 7 Techniques Every SQL Developer Must Know
Simple aggregate → GROUP BY → HAVING → ROLLUP → GROUPING SETS → Window OVER() → CTE multi-step: each solves a different aggregation shape

The sales table records regional sales across two years. The same dataset is used to demonstrate all 7 aggregation techniques — from a single aggregate function on the whole table all the way to window functions and multi-step CTEs. Knowing which technique to reach for in an interview is what separates intermediate from expert SQL.

Schema
sales
sale_idregionyearamount
1North20231000
2South20231500
3North2023800
4South20242000
5North20241200
6South2024900
Method 6 Output — Window Aggregate (active in playground)
sale_idregionamountregion_totalgrand_total
1North100030007400
3North80030007400
5North120030007400
2South150044007400
4South200044007400
6South90044007400
When to use which 1. No grouping needed? Simple aggregate function.   2. Group-level totals? GROUP BY.   3. Filter after grouping? + HAVING.   4. Subtotals + grand total? ROLLUP.   5. All grouping combos? CUBE / GROUPING SETS.   6. Keep every row + add totals? Window OVER().   7. Multi-step or % share? CTE.
Solution — All 7 Methods
-- ── METHOD 1: Simple Aggregate — whole-table summary ────────────────
-- SELECT COUNT(*) AS rows, SUM(amount) AS total,
--        AVG(amount) AS avg_amt, MIN(amount) AS min_amt, MAX(amount) AS max_amt
-- FROM sales;
-- ↳ 1 row result: rows=6, total=7400, avg=1233, min=800, max=2000

-- ── METHOD 2: GROUP BY — collapse rows into groups ───────────────────
-- SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
-- ↳ North=3000, South=4400

-- ── METHOD 3: GROUP BY + HAVING — filter groups after aggregating ────
-- SELECT region, SUM(amount) AS total
-- FROM sales GROUP BY region HAVING SUM(amount) > 3000;
-- ↳ Only South (4400) survives the HAVING filter

-- ── METHOD 4: GROUP BY multiple columns ──────────────────────────────
-- SELECT region, year, SUM(amount) AS total
-- FROM sales GROUP BY region, year ORDER BY region, year;
-- ↳ 4 rows: North/2023=1800, North/2024=1200, South/2023=1500, South/2024=2900

-- ── METHOD 5: ROLLUP — subtotals + grand total ───────────────────────
-- Supported: SQL Server / PostgreSQL / MySQL 8+  (NOT SQLite)
-- SELECT region, year, SUM(amount) FROM sales GROUP BY ROLLUP(region, year);
-- ↳ Adds subtotal rows per region + a grand total NULL/NULL row
-- CUBE gives all dimension combinations; GROUPING SETS lets you pick custom ones

-- ── METHOD 6: Window Aggregate — per-row totals, no row collapse ─────
SELECT
  sale_id, region, amount,
  SUM(amount) OVER(PARTITION BY region) AS region_total,
  SUM(amount) OVER()                    AS grand_total
FROM sales
ORDER BY region, sale_id;
-- ↳ Every row kept, region_total repeated per partition, grand_total=7400 on all rows

-- ── METHOD 7: CTE — multi-step aggregation with % share ──────────────
-- WITH rt AS (SELECT region, SUM(amount) AS total FROM sales GROUP BY region)
-- SELECT region, total,
--   ROUND(total * 100.0 / SUM(total) OVER(), 1) AS pct_of_total
-- FROM rt;
-- ↳ North=3000 (40.5%), South=4400 (59.5%)
The key distinction: GROUP BY vs Window OVER()
GROUP BY collapses rows — 6 rows become 2 (one per region). You lose the individual row data.
Window OVER() keeps all rows and adds aggregate columns alongside them — 6 rows stay 6 rows, each gaining region_total and grand_total.

ROLLUP / CUBE / GROUPING SETS (Method 5) are SQL Server / PostgreSQL / MySQL 8+ only — not available in SQLite. SQLite workaround: UNION ALL the GROUP BY queries.

Rule of thumb: if you need a row-level percentage (amount / region_total), you need a Window function — GROUP BY destroys the row before you can divide it.
71
⚡ Interview Boost Intermediate String Functions
Count Occurrences of a Character or Word in a String — LENGTH minus REPLACE
SQL has no built-in count-occurrences function — subtract the string length after REPLACE from the original length; each removed character reduces length by exactly 1

The strings table holds employee full names. Write a query that returns, for each name: (1) how many times the letter 'k' appears (case-insensitive), and (2) how many words the name contains. Neither COUNT nor any native function solves this directly — the trick uses LENGTH and REPLACE together.

Schema
strings
idfull_name
1Priya Sharma
2Ram Kumar Verma
3Akshay Kumar Ak k
4Rahul
Expected Output
full_namek_countword_count
Akshay Kumar Ak k44
Ram Kumar Verma13
Priya Sharma02
Rahul01
Core Formula Single char: LENGTH(s) - LENGTH(REPLACE(LOWER(s), 'k', '')) — removing N occurrences of 'k' shrinks the string by exactly N.
Multi-char word: (LENGTH(s) - LENGTH(REPLACE(LOWER(s), 'kumar', ''))) / LENGTH('kumar') — divide by word length since each removal shrinks by 5 chars.
Word count: LENGTH(TRIM(s)) - LENGTH(REPLACE(TRIM(s), ' ', '')) + 1 — spaces = words - 1.
Solution
SELECT
  full_name,

  -- Count occurrences of 'k' (case-insensitive)
  LENGTH(full_name) - LENGTH(REPLACE(LOWER(full_name), 'k', ''))
    AS k_count,

  -- Count words: spaces between words = words - 1
  LENGTH(TRIM(full_name))
    - LENGTH(REPLACE(TRIM(full_name), ' ', '')) + 1
    AS word_count

FROM strings
ORDER BY k_count DESC, full_name;

-- Extension: count a multi-char word (divide by pattern length)
-- (LENGTH(full_name) - LENGTH(REPLACE(LOWER(full_name),'kumar','')))
--   / LENGTH('kumar')  AS kumar_count
Why divide by LENGTH(pattern) for multi-char words?
Removing one 'k' (length 1) shrinks the string by 1 → difference = count × 1 → no division needed.
Removing one 'kumar' (length 5) shrinks the string by 5 → difference = count × 5 → divide by 5 to get count.
General rule: occurrences = (LENGTH(s) - LENGTH(REPLACE(LOWER(s), pattern, ''))) / LENGTH(pattern).

Word count edge cases: TRIM strips leading/trailing spaces before counting so " Rahul " still returns 1, not 3.
126
🌍 NewbieCOALESCE
Replace Missing Contact Info
COALESCE returns the first non-NULL value in its argument list — perfect for supplying a default when data is absent

A customer support team has a contacts table. Some customers haven't provided a phone number (stored as NULL). Write a query that shows every customer's name and their phone number — but displays 'N/A' for anyone with a missing number.

Schema
contacts
idnamephone
1Riya9876543210
2RajNULL
3Priya9123456789
4AnkitNULL
Expected Output
namephone
Riya9876543210
RajN/A
Priya9123456789
AnkitN/A
Solution
SELECT name, COALESCE(phone, 'N/A') AS phone
FROM contacts
ORDER BY id;
127
🌍 NewbieCASE WHEN
Classify Products into Price Tiers
CASE WHEN is SQL's if-else — evaluate conditions top to bottom and return the first match

The marketing team wants to label every product with a pricing tier: 'Budget' (price < 500), 'Mid-Range' (500–2000), or 'Premium' (above 2000). Write a query that returns each product's name, price, and its computed tier, ordered by price.

Schema
products
idnameprice
1Pen15
2Bag850
3Laptop55000
4Mouse450
5Keyboard1800
Expected Output
namepricetier
Pen15Budget
Mouse450Budget
Bag850Mid-Range
Keyboard1800Mid-Range
Laptop55000Premium
Solution
SELECT name, price,
  CASE
    WHEN price < 500   THEN 'Budget'
    WHEN price <= 2000 THEN 'Mid-Range'
    ELSE 'Premium'
  END AS tier
FROM products
ORDER BY price;
128
🌍 NewbieCOUNT DISTINCT
Count Unique Delivery Cities
COUNT(DISTINCT col) counts only unique values — duplicates are collapsed before counting

The logistics team wants to know how many unique cities the company currently ships to. Multiple orders to the same city should count as one. Write a query that returns a single number.

Schema
orders
idcustomercityamount
1RahulDelhi500
2PriyaMumbai300
3AmitDelhi700
4SnehaPune200
5RohitMumbai400
6KavyaHyderabad600
Expected Output
unique_cities
4
Solution
SELECT COUNT(DISTINCT city) AS unique_cities
FROM orders;
129
🌍 NewbieIN
Filter Sales by Selected Regions
IN is shorthand for multiple OR conditions — cleaner and easier to extend

The regional sales manager oversees only three territories: North, South, and West. Write a query to show all sales records from those regions only, sorted by region and then by amount (highest first).

Schema
sales
idregionamount
1North5000
2East3000
3South7000
4West4500
5North2000
6East1500
Expected Output
idregionamount
1North5000
5North2000
3South7000
4West4500
Solution
SELECT id, region, amount
FROM sales
WHERE region IN ('North', 'South', 'West')
ORDER BY region, amount DESC;
130
BeginnerMIN + GROUP BY
First Login Date per Player
MIN() on a date column gives the earliest — pair with GROUP BY to get one earliest date per group

A gaming company tracks every session in a game_activity table. The growth team needs each player's first login date to measure early engagement and D1 retention. Return one row per player.

Schema
game_activity
player_idlogin_datescore
12024-01-15120
12024-02-0385
22024-01-1050
22024-03-15180
32024-02-2895
Expected Output
player_idfirst_login
12024-01-15
22024-01-10
32024-02-28
Solution
SELECT player_id, MIN(login_date) AS first_login
FROM game_activity
GROUP BY player_id
ORDER BY player_id;
131
BeginnerLEFT JOIN
Employees Who Did Not Receive a Bonus
LEFT JOIN keeps all rows from the left table — pair with WHERE right side IS NULL to find non-matches

HR wants to identify employees who did not receive a performance bonus this year so they can follow up. Return the names of all employees absent from the bonus table, sorted alphabetically.

Schema
employees
idname
1Alice
2Bob
3Carol
4Dave
bonus
employee_idamount
110000
315000
Expected Output
name
Bob
Dave
Solution
SELECT e.name
FROM employees e
LEFT JOIN bonus b ON e.id = b.employee_id
WHERE b.employee_id IS NULL
ORDER BY e.name;
132
BeginnerHAVING
Classes with Minimum Enrolment
HAVING filters groups after GROUP BY — use it when your condition involves an aggregate like COUNT

A school only assigns a dedicated teacher to classes that have at least 5 enrolled students. Write a query to return the names of those classes, sorted alphabetically.

Schema
enrollments
student_idclass
1Math
2Math
3Math
4Math
5Math
6Math
7Physics
8Physics
9Physics
10Physics
11Chemistry
12Chemistry
13Chemistry
14Chemistry
15Chemistry
Expected Output
class
Chemistry
Math
Solution
SELECT class
FROM enrollments
GROUP BY class
HAVING COUNT(student_id) >= 5
ORDER BY class;
133
BeginnerWHERE + AND
Active Products with High Ratings
Combine multiple WHERE conditions with AND — all conditions must be true for the row to pass

The product catalogue team wants to feature items that are not discontinued AND have a customer rating above 3.5. Return each qualifying product's id, name, and rating — highest rated first.

Schema
products
idnamestatusrating
1Wireless Earbudsactive4.2
2Old Routerdiscontinued4.5
3Smart Watchactive4.8
4USB Hubactive3.2
5Webcamactive3.9
Expected Output
idnamerating
3Smart Watch4.8
1Wireless Earbuds4.2
5Webcam3.9
Solution
SELECT id, name, rating
FROM products
WHERE status != 'discontinued'
  AND rating > 3.5
ORDER BY rating DESC;
134
IntermediateSelf JOIN
Adjacent Available Seats for Cinema Booking
Self-join on ABS(a.seat_id - b.seat_id) = 1 finds physically adjacent rows — filter both sides for availability

A cinema booking system marks each seat as free (1) or taken (0). Couples need two consecutive free seats. Write a query that returns all seat IDs where the seat and its immediate neighbour are both free, ordered by seat ID.

Schema
cinema
seat_idfree
11
20
31
41
51
60
71
81
Expected Output
seat_id
3
4
5
7
8
Solution
SELECT DISTINCT a.seat_id
FROM cinema a
JOIN cinema b
  ON ABS(a.seat_id - b.seat_id) = 1
WHERE a.free = 1 AND b.free = 1
ORDER BY a.seat_id;
135
IntermediateCASE + AVG
Same-Day Delivery Rate
Convert boolean conditions to 1/0 with CASE WHEN, then AVG or SUM to calculate percentages

A food delivery platform records the order_date and the customer's preferred_date. The ops team needs to know what percentage of orders were delivered on the same day they were placed. Return one number rounded to 2 decimal places.

Schema
deliveries
idorder_datepreferred_date
12024-01-152024-01-15
22024-01-162024-01-18
32024-01-172024-01-17
42024-01-182024-01-20
52024-01-192024-01-19
Expected Output
immediate_pct
60.00
Solution
SELECT ROUND(
  SUM(CASE WHEN order_date = preferred_date THEN 1 ELSE 0 END)
    * 100.0 / COUNT(*),
  2) AS immediate_pct
FROM deliveries;
136
IntermediateGROUP BY + CASE
Monthly Transaction Approval Summary
Pivot row-level status into columns using CASE WHEN inside SUM — one pass through the table, multiple aggregates

The finance team needs a monthly report showing total transaction count, how many were approved, and the total approved amount — all in a single row per month. Transactions have a state column: either 'approved' or 'declined'.

Schema
transactions
idtrans_datestateamount
12024-01-10approved1000
22024-01-15declined500
32024-01-20approved800
42024-02-05approved1200
52024-02-10declined600
Expected Output
monthtrans_countapproved_countapproved_total
2024-01321800
2024-02211200
Solution
SELECT
  strftime('%Y-%m', trans_date) AS month,
  COUNT(*)                    AS trans_count,
  SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
  SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total
FROM transactions
GROUP BY month
ORDER BY month;
137
IntermediateSubquery
Customers Who Bought Every Product
Relational division: group purchases by customer, count distinct products, compare to total product count

Identify power buyers — customers who have purchased every single product in the catalogue. Use HAVING to compare each customer's distinct product count against the total number of products.

Schema
products
idname
1Widget
2Gadget
3Gizmo
purchases
customer_idproduct_id
1011
1012
1013
1021
1022
1031
1032
1033
Expected Output
customer_id
101
103
Solution
SELECT customer_id
FROM purchases
GROUP BY customer_id
HAVING COUNT(DISTINCT product_id) = (
  SELECT COUNT(*) FROM products
)
ORDER BY customer_id;
138
ExpertLAG + CTE
Find Values That Appear 3+ Times Consecutively
Use LAG() twice to look back 2 rows — if all three match you have a run of 3

A factory logs machine output readings sequentially by ID. Quality control wants to flag any reading value that appears in at least 3 consecutive log entries — this signals a sustained operational state that may need investigation. Return each such value once.

Schema
logs
idnum
11
21
31
42
51
62
72
82
Expected Output
ConsecutiveNums
1
2
Solution
WITH cte AS (
  SELECT num,
    LAG(num, 1) OVER (ORDER BY id) AS prev1,
    LAG(num, 2) OVER (ORDER BY id) AS prev2
  FROM logs
)
SELECT DISTINCT num AS ConsecutiveNums
FROM cte
WHERE num = prev1 AND num = prev2
ORDER BY num;
139
ExpertCASE WHEN
Classify Org Chart Nodes: Root, Leaf, Inner
Root has no manager (NULL parent), Inner appears as someone else's manager, Leaf has a manager but no direct reports

An organisational chart is stored as a self-referencing table. Classify each person: Root (no manager — the CEO), Inner (has a manager AND manages others), or Leaf (has a manager but no direct reports). Order by id.

Schema
org
idnamemanager_id
1CEONULL
2Alice1
3Bob1
4Carol2
5Dave3
6Eve3
Expected Output
idnamenode_type
1CEORoot
2AliceInner
3BobInner
4CarolLeaf
5DaveLeaf
6EveLeaf
Solution
SELECT id, name,
  CASE
    WHEN manager_id IS NULL THEN 'Root'
    WHEN id IN (
      SELECT DISTINCT manager_id FROM org
      WHERE manager_id IS NOT NULL
    ) THEN 'Inner'
    ELSE 'Leaf'
  END AS node_type
FROM org
ORDER BY id;
140
⚡ Interview Boost Intermediate CASE + GROUP BY
Customer Activity Classification
Compare each customer's most recent order date to a reference date — CASE WHEN on a date difference classifies status

The retention team classifies customers as Active (ordered within the last 90 days from 2024-12-31) or Churned (no recent order). Write a query showing each customer's last order date and their status, newest first.

Schema
orders
idcustomer_idorder_date
11012024-12-20
21012024-03-10
31022024-05-10
41032024-11-30
51042024-06-15
Expected Output
customer_idlast_orderstatus
1012024-12-20Active
1032024-11-30Active
1042024-06-15Churned
1022024-05-10Churned
Solution
WITH last_order AS (
  SELECT customer_id, MAX(order_date) AS last_order
  FROM orders
  GROUP BY customer_id
)
SELECT customer_id, last_order,
  CASE
    WHEN julianday('2024-12-31') - julianday(last_order) <= 90
      THEN 'Active'
    ELSE 'Churned'
  END AS status
FROM last_order
ORDER BY last_order DESC;
141
⚡ Interview Boost Intermediate Window Functions
Product Revenue Rank with Running Total
Combine RANK() and SUM() OVER() in one query — window functions compose naturally in SELECT

The analytics team wants to understand revenue concentration. For each product, show its revenue, rank among all products, cumulative revenue up to that rank, and what percentage of total revenue that cumulative figure represents. Sort highest revenue first.

Schema
product_sales
product_namerevenue
Analytics Pro120000
Data Studio85000
Query Builder95000
Report Gen60000
Dashboard78000
Expected Output
product_namerevenuerev_rankcumulative_pct
Analytics Pro120000127.4
Query Builder95000249.1
Data Studio85000368.5
Dashboard78000486.3
Report Gen600005100.0
Solution
SELECT
  product_name,
  revenue,
  RANK() OVER (ORDER BY revenue DESC) AS rev_rank,
  ROUND(
    SUM(revenue) OVER (
      ORDER BY revenue DESC
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) * 100.0 / SUM(revenue) OVER (),
  1) AS cumulative_pct
FROM product_sales
ORDER BY revenue DESC;
142
⚡ Interview Boost Expert Pivot / CASE
Quarterly Revenue Pivot per Product
Pivot rows to columns using SUM(CASE WHEN quarter = 'Q1' THEN amount END) — one column per bucket

The CFO wants a single-row-per-product view showing Q1, Q2, Q3, Q4 revenue side by side — a classic pivot. Transform the long-format sales table into wide format using conditional aggregation.

Schema
sales
product_idsale_daterevenue
12024-02-153000
12024-05-204500
12024-08-102000
12024-11-255500
22024-01-301500
22024-07-152800
22024-10-053200
Expected Output
product_idQ1Q2Q3Q4
13000450020005500
21500028003200
Solution
SELECT
  product_id,
  SUM(CASE WHEN strftime('%m',sale_date) IN ('01','02','03') THEN revenue ELSE 0 END) AS Q1,
  SUM(CASE WHEN strftime('%m',sale_date) IN ('04','05','06') THEN revenue ELSE 0 END) AS Q2,
  SUM(CASE WHEN strftime('%m',sale_date) IN ('07','08','09') THEN revenue ELSE 0 END) AS Q3,
  SUM(CASE WHEN strftime('%m',sale_date) IN ('10','11','12') THEN revenue ELSE 0 END) AS Q4
FROM sales
GROUP BY product_id
ORDER BY product_id;
143
⚡ Interview Boost Expert Window + CTE
Median Salary Without MEDIAN()
Most databases lack MEDIAN() — use ROW_NUMBER to find the middle row(s) then AVG them

SQL has no built-in MEDIAN() function. Calculate the median salary across all employees using ROW_NUMBER — pick the middle row for odd counts, average the two middle rows for even counts.

Schema
employees
idnamesalary
1A45000
2B60000
3C75000
4D85000
5E95000
6F110000
Expected Output
median_salary
80000.0
Solution
WITH ranked AS (
  SELECT salary,
    ROW_NUMBER() OVER (ORDER BY salary) AS rn,
    COUNT(*) OVER () AS total
  FROM employees
)
SELECT AVG(CAST(salary AS REAL)) AS median_salary
FROM ranked
WHERE rn IN ((total + 1) / 2, (total + 2) / 2);
144
⚡ Interview Boost Expert Funnel Analysis
Purchase Funnel Drop-off Analysis
Count each stage with CASE WHEN inside SUM — compute conversion rates between stages using NULLIF to avoid divide-by-zero

The product team tracks user events through a purchase funnel: view → cart → purchase. Write a single query that returns how many users reached each stage, and the conversion rate between consecutive stages. Rates rounded to 1 decimal.

Schema
funnel_events
user_idstage
1view
2view
3view
4view
5view
6view
7view
8view
1cart
2cart
3cart
4cart
5cart
1purchase
2purchase
3purchase
Expected Output
viewedadded_to_cartpurchasedview_to_cart_pctcart_to_purchase_pct
85362.560.0
Solution
SELECT
  SUM(CASE WHEN stage='view'     THEN 1 ELSE 0 END) AS viewed,
  SUM(CASE WHEN stage='cart'     THEN 1 ELSE 0 END) AS added_to_cart,
  SUM(CASE WHEN stage='purchase' THEN 1 ELSE 0 END) AS purchased,
  ROUND(
    SUM(CASE WHEN stage='cart'     THEN 1.0 ELSE 0 END) * 100
    / NULLIF(SUM(CASE WHEN stage='view' THEN 1 ELSE 0 END), 0),
  1) AS view_to_cart_pct,
  ROUND(
    SUM(CASE WHEN stage='purchase' THEN 1.0 ELSE 0 END) * 100
    / NULLIF(SUM(CASE WHEN stage='cart' THEN 1 ELSE 0 END), 0),
  1) AS cart_to_purchase_pct
FROM funnel_events;
145
⚡ Interview Boost Expert YoY / Pivot
Year-over-Year Revenue Growth by Category
Self-pivot 2023 and 2024 revenue using CASE WHEN inside SUM, then compute growth % using NULLIF to guard against zero denominators

The business review team needs a year-over-year comparison of revenue by product category. Return each category's 2023 revenue, 2024 revenue, and percentage growth — sorted by growth rate descending so the fastest-growing categories appear first.

Schema
category_sales
categorysale_yearrevenue
Electronics2023450000
Electronics2024520000
Clothing2023280000
Clothing2024310000
Food2023190000
Food2024175000
Expected Output
categoryrev_2023rev_2024yoy_growth_pct
Electronics45000052000015.56
Clothing28000031000010.71
Food190000175000-7.89
Solution
SELECT
  category,
  SUM(CASE WHEN sale_year=2023 THEN revenue ELSE 0 END) AS rev_2023,
  SUM(CASE WHEN sale_year=2024 THEN revenue ELSE 0 END) AS rev_2024,
  ROUND(
    (SUM(CASE WHEN sale_year=2024 THEN revenue ELSE 0 END)
     - SUM(CASE WHEN sale_year=2023 THEN revenue ELSE 0 END))
    * 100.0
    / NULLIF(SUM(CASE WHEN sale_year=2023 THEN revenue ELSE 0 END), 0),
  2) AS yoy_growth_pct
FROM category_sales
GROUP BY category
ORDER BY yoy_growth_pct DESC;
/ Loading…
0/done
Use Desktop to SolveTo write and run SQL queries for this question, open MarutiNovaTech on a laptop or PC.
SQL
Output Ready
Run your query to see results
Test Cases
No questions match your filters Try removing a filter or selecting "All"

01
Reference

SQL Cheat Sheet

Every essential clause, function, and pattern — click any card to copy the snippet.

0
Categories
0+
Code Patterns
0
CTE Deep Dives
0+
Copyable Snippets
Basics & JOINs
SELECT WHERE GROUP BY JOIN types
Window Functions
ROW_NUMBER LAG / LEAD PARTITION BY Running Total
CTEs — 20 Patterns
WITH RECURSIVE Gaps & Islands Basket Analysis
Interview Prep
WHERE vs HAVING NULL traps Indexes Normalization
Topics covered
SELECT FROM WHERE GROUP BY HAVING ORDER BY LIMIT CASE WHEN INNER JOIN LEFT JOIN FULL OUTER JOIN UNION ALL EXISTS IN / NOT IN ROW_NUMBER() RANK() DENSE_RANK() LAG() LEAD() PARTITION BY SUM() OVER AVG() OVER WITH RECURSIVE NULLIF() COALESCE() CAST() DISTINCT CROSS JOIN ROWS BETWEEN
#1 Interview Topic
SQL doesn't run top-to-bottom
You write clauses in one order. The engine processes them in a completely different order — and knowing this explains every confusing SQL error.
You write it as…
1 SELECT name, COUNT(*) runs 6th
2 FROM employees runs 1st
3 JOIN departments ON d.id=e.dept_id runs 2nd
4 WHERE e.active = 1 runs 3rd
5 GROUP BY dept_id runs 4th
6 HAVING COUNT(*) > 1 runs 5th
7 ORDER BY name runs 8th
8 LIMIT 10 runs 9th
Engine runs it as…
1FROM — load table(s) into memory
2JOIN — combine rows across tables
3WHERE — filter row by row
4GROUP BY — collapse into groups
5HAVING — filter groups
6SELECT ← you wrote this line 1
7DISTINCT — deduplicate output
8ORDER BY — sort final rows
9LIMIT — cut to N rows
Stage: Data Source Row Filter Grouping Output Sorting
1
FROM
Load the table. This is your raw dataset — every row, every column.
No table = no data. Everything starts here.
2
JOIN
Combine rows from other tables. Runs before any filtering.
Joins expand the dataset — WHERE shrinks it next.
3
WHERE
Filter rows one at a time — before groups or aggregates exist.
Can't use SELECT aliases or COUNT() here — not ready yet.
4
GROUP BY
Collapse rows with matching values into groups. Aggregates now work.
COUNT(), SUM(), AVG() only exist after this runs.
5
HAVING
Filter groups — like WHERE, but runs after aggregation.
HAVING is the only place to filter on COUNT(), AVG(), etc.
6
SELECT
Finally — pick which columns to return. Aliases are born here.
You write this first, but it runs sixth. That's the key insight.
7
DISTINCT
Remove duplicate rows from the output. Compares final column values.
Runs after SELECT — it deduplicates what you chose to show.
8
ORDER BY
Sort the output. Can use SELECT aliases — SELECT already ran.
One of the only clauses that can reference SELECT aliases.
9
LIMIT / TOP
Cut result to N rows. Always the absolute last thing that runs.
Cuts after sorting — so you always get the right top-N rows.
Why this order trips people up — 3 real errors explained
Why can't I use a SELECT alias inside WHERE?
WHERE is step 3. SELECT is step 6. The alias doesn't exist yet when WHERE runs. Fix: repeat the full expression in WHERE, or wrap in a subquery / CTE.
Why can't I write WHERE COUNT(*) > 1?
WHERE runs at step 3 — before GROUP BY at step 4. No groups exist yet, so there's no COUNT to filter on. Use HAVING COUNT(*) > 1 instead.
Why does ORDER BY inside a subquery get ignored?
ORDER BY is step 8 — the last operation. A subquery produces a temporary set with no guaranteed order. The outer query re-processes it; most engines ignore or error on inner ORDER BY.

CREATE TABLE

CREATE TABLE employees (
  id        INT PRIMARY KEY,
  name      VARCHAR(100) NOT NULL,
  dept      VARCHAR(50),
  salary    DECIMAL(10,2),
  hire_date DATE
);

INSERT / UPDATE / DELETE

INSERT INTO users (name, email)
VALUES ('Alice', 'a@x.com');

UPDATE users SET status = 'active'
WHERE id = 5;

DELETE FROM logs
WHERE created_at < '2023-01-01';

ORDER BY & LIMIT

SELECT * FROM products
ORDER BY price DESC
LIMIT 10;

-- Skip first 20 rows (pagination)
ORDER BY created_at DESC
LIMIT 10 OFFSET 20;

DISTINCT & NULL handling

SELECT DISTINCT department
FROM employees;

-- Replace NULL with default
SELECT COALESCE(phone, 'N/A')
FROM contacts;

-- NULL check (never use = NULL)
WHERE manager_id IS NULL;

SELECT & WHERE — filtering rows

SELECT name, salary
FROM employees
WHERE dept = 'Engineering'
  AND salary > 80000
ORDER BY salary DESC;
Table: employees
INPUT — all rows
namedeptsalary
AliceEngineering90,000
BobMarketing60,000
CarolEngineering85,000
DaveHR55,000
↓ after WHERE + ORDER BY
OUTPUT — 2 rows returned
namesalary
Alice90,000
Carol85,000
TIPWHERE runs before SELECT. It filters rows first, then SELECT picks which columns to show. You can't use a column alias from SELECT inside WHERE.

GROUP BY & HAVING — aggregating groups

SELECT dept,
       COUNT(*) AS headcount,
       AVG(salary) AS avg_sal
FROM employees
GROUP BY dept
HAVING COUNT(*) > 1
ORDER BY avg_sal DESC;
Table: employees
INPUT — 5 rows, 3 departments
namedeptsalary
AliceEng90,000
BobEng80,000
CarolHR55,000
DaveMktg70,000
EveHR60,000
↓ GROUP BY dept + HAVING count > 1
OUTPUT — Mktg removed (only 1 row)
deptheadcountavg_sal
Eng285,000
HR257,500
NOTEHAVING ≠ WHERE. Use WHERE to filter individual rows before grouping. Use HAVING to filter the grouped results. HAVING COUNT(*) > 1 is valid; WHERE COUNT(*) > 1 is an error.
INNER JOIN
Matching rows only
LEFT JOIN
All left + matching right
RIGHT JOIN
All right + matching left
FULL OUTER
All rows from both tables

INNER JOIN — rows matched in both tables

SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d
  ON e.dept_id = d.id;

-- Only rows where dept_id exists
-- in BOTH tables are returned.
-- Unmatched rows are dropped.
Two tables being joined
employees
namedept_id
Alice1
Bob2
Carol99
departments
iddept_name
1Engineering
2Marketing
↓ INNER JOIN on dept_id = id
OUTPUT — Carol excluded (dept_id 99 has no match)
namedept_name
AliceEngineering
BobMarketing
TIPCarol is silently dropped. INNER JOIN only returns rows where the join condition matches in both tables. If an employee has a dept_id that doesn't exist in departments, that employee disappears from results.

LEFT JOIN — keep all left rows, NULL where no match

SELECT e.name, o.order_id
FROM employees e
LEFT JOIN orders o
  ON e.id = o.emp_id;

-- Tip: find employees with NO orders:
SELECT e.name
FROM employees e
LEFT JOIN orders o
  ON e.id = o.emp_id
WHERE o.order_id IS NULL;
employees LEFT JOIN orders
employees
idname
1Alice
2Bob
3Carol
orders
order_idemp_id
1011
1021
↓ LEFT JOIN (all employees kept)
OUTPUT — Bob & Carol kept with NULL
nameorder_id
Alice101
Alice102
BobNULL
CarolNULL
NOTEClassic interview pattern: Find records with no match by doing a LEFT JOIN then WHERE right_table.id IS NULL. This is faster than a NOT IN subquery on large tables.

Multi-table JOIN

SELECT e.name, d.name, l.city
FROM employees e
JOIN departments d ON e.dept_id = d.id
JOIN locations l  ON d.loc_id  = l.id
WHERE l.city = 'Mumbai';

SELF JOIN

-- Find each employee & their manager
SELECT e.name  AS employee,
       m.name  AS manager
FROM employees e
LEFT JOIN employees m
  ON e.manager_id = m.id;

Core Aggregate Functions — what they return

SELECT
  COUNT(*),        -- every row
  COUNT(salary),  -- non-NULL only
  SUM(salary),
  AVG(salary),
  MIN(salary),
  MAX(salary)
FROM employees;
Table: employees (4 rows)
INPUT
namesalary
Alice90,000
Bob70,000
Carol80,000
DaveNULL
↓ aggregate results
OUTPUT — single row of totals
functionresult
COUNT(*)4
COUNT(salary)3 (NULL skipped)
SUM(salary)240,000
AVG(salary)80,000
MIN / MAX70,000 / 90,000
NOTECOUNT(*) vs COUNT(col). COUNT(*) counts all rows including NULLs. COUNT(salary) skips rows where salary is NULL. A very common interview trick question.

GROUP BY

SELECT department,
       COUNT(*) AS headcount,
       AVG(salary) AS avg_sal
FROM employees
GROUP BY department
ORDER BY avg_sal DESC;

HAVING (filter groups)

SELECT dept, COUNT(*) AS cnt
FROM employees
GROUP BY dept
HAVING COUNT(*) > 5;

-- HAVING filters after grouping
-- WHERE filters before grouping

ROLLUP & CUBE

-- Subtotals + grand total
SELECT dept, job, SUM(salary)
FROM employees
GROUP BY ROLLUP(dept, job);

-- All combinations
GROUP BY CUBE(dept, job);

String Functions

UPPER(name)           -- ALICE
LOWER(name)           -- alice
LENGTH(name)          -- 5
TRIM(name)            -- strip spaces
SUBSTRING(name,1,3)  -- Ali
CONCAT(first,' ',last)
REPLACE(str,'a','@')

Date Functions

NOW()               -- current datetime
CURDATE()           -- current date
YEAR(date_col)
MONTH(date_col)
DAY(date_col)
DATEDIFF(d1, d2)    -- days between
DATE_ADD(d, INTERVAL 7 DAY)

CASE WHEN

SELECT name,
  CASE
    WHEN salary >= 100000 THEN 'Senior'
    WHEN salary >= 60000  THEN 'Mid'
    ELSE 'Junior'
  END AS level
FROM employees;

CAST & CONVERT

SELECT CAST('2024-01-15' AS DATE);
SELECT CAST(42.7 AS INT);       -- 42
SELECT CAST(price AS CHAR);

-- MySQL: CONVERT(val, type)
SELECT CONVERT(salary, CHAR);

Subquery in WHERE

-- Employees earning above avg
SELECT name, salary
FROM employees
WHERE salary > (
  SELECT AVG(salary)
  FROM employees
);

IN / EXISTS

SELECT * FROM customers
WHERE id IN (
  SELECT customer_id FROM orders
);

WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.cust_id = c.id
);

CTE (WITH clause)

WITH dept_avg AS (
  SELECT dept,
         AVG(salary) AS avg_sal
  FROM employees
  GROUP BY dept
)
SELECT e.name, d.avg_sal
FROM employees e
JOIN dept_avg d ON e.dept = d.dept;

Correlated Subquery

-- Top earner per dept
SELECT name, dept, salary
FROM employees e1
WHERE salary = (
  SELECT MAX(salary)
  FROM employees e2
  WHERE e2.dept = e1.dept
);

Window Functions — add a column, keep all rows

SELECT name, dept, salary,
  ROW_NUMBER() OVER (
    PARTITION BY dept
    ORDER BY salary DESC
  ) AS rn,
  RANK() OVER (
    ORDER BY salary DESC
  ) AS rnk
FROM employees;

-- OVER() = window definition
-- PARTITION BY = reset per group
-- ORDER BY = ranking direction
Window functions don't collapse rows
INPUT — 4 rows, 2 departments
namedeptsalary
AliceEng90k
BobEng80k
CarolHR70k
DaveHR70k
↓ window adds rn + rnk columns
OUTPUT — same 4 rows + 2 new columns
namedeptrnrnk
AliceEng11
BobEng22
CarolHR13
DaveHR23
TIPrn resets to 1 per department (PARTITION BY). Carol and Dave both get RANK 3 because their salaries are tied — that's RANK vs DENSE_RANK in action. Use this pattern inside a CTE + WHERE rn = 1 to get the top earner per department.

LAG & LEAD

-- Compare row with prev/next
SELECT month, revenue,
  LAG(revenue) OVER (
    ORDER BY month
  ) AS prev_month,
  revenue - LAG(revenue) OVER (
    ORDER BY month
  ) AS growth
FROM sales;

Running Total

SELECT order_date, amount,
  SUM(amount) OVER (
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING
      AND CURRENT ROW
  ) AS running_total
FROM orders;

DENSE_RANK Top-N

-- Top 3 per department
WITH ranked AS (
  SELECT *, DENSE_RANK() OVER (
    PARTITION BY dept
    ORDER BY salary DESC
  ) AS dr
  FROM employees
)
SELECT * FROM ranked
WHERE dr <= 3;

Top questions asked in Data Analyst, Data Engineer, and BI Developer interviews. Click any question to reveal the answer.

Very HotWhat is the difference between WHERE and HAVING?
WHERE filters individual rows before grouping (runs at step ③). HAVING filters groups after GROUP BY (runs at step ⑤).

Rule: If you need COUNT(), SUM(), or any aggregate in your filter — use HAVING. Otherwise use WHERE.

Example: WHERE salary > 50000 ✅  |  HAVING COUNT(*) > 3 ✅  |  WHERE COUNT(*) > 3 ❌ (error)
Very HotDELETE vs TRUNCATE vs DROP — what's the difference?
DELETE — removes specific rows (with WHERE), logs each deletion, can be rolled back. Triggers fire.
TRUNCATE — removes ALL rows instantly, minimal logging, cannot use WHERE, resets auto-increment. Much faster.
DROP — removes the entire table structure + data. Permanent. Can't rollback in most databases.

Memory trick: DELETE a person, TRUNCATE a notebook, DROP a bomb.
Very HotWhat is a NULL? Is NULL = NULL true?
NULL means "unknown" or "missing" — not zero, not empty string. It's the absence of a value.

NULL = NULL → returns NULL (not TRUE). This is a very common interview trap.
To check for NULL you must use: IS NULL or IS NOT NULL.

Also: NULL + 5 = NULL. Any arithmetic with NULL gives NULL. Use COALESCE(col, 0) to substitute a default.
Very HotROW_NUMBER vs RANK vs DENSE_RANK?
All three number rows, but differ when there are ties:

Imagine scores: 90, 90, 85, 80:
ROW_NUMBER: 1, 2, 3, 4 (always unique, no gaps)
RANK: 1, 1, 3, 4 (ties get same rank, but skips next number)
DENSE_RANK: 1, 1, 2, 3 (ties get same rank, no gaps)

Interview tip: Use DENSE_RANK for "top N" problems to avoid missing ranks.
CommonWhat is a Primary Key vs Foreign Key?
Primary Key (PK) — uniquely identifies each row in a table. Must be NOT NULL + UNIQUE. One per table.

Foreign Key (FK) — a column that references the Primary Key of another table. Enforces referential integrity (you can't add an order for a customer that doesn't exist).

Example: orders.customer_id is a Foreign Key referencing customers.id (Primary Key).
CommonWhat is a subquery vs a JOIN? When to use which?
JOIN combines tables horizontally (adds columns). Best when you need columns from multiple tables.
Subquery is a query inside a query. Best for single-value lookups or filtering based on aggregated data.

Use JOIN when: you want multiple columns from related tables.
Use Subquery when: you need "employees earning above the company average" (can't JOIN on a calculated value easily).

CTEs (WITH clause) are often cleaner than nested subqueries for readability.
CommonUNION vs UNION ALL?
UNION — combines results of two queries and removes duplicate rows. Slower (needs to sort/compare).
UNION ALL — combines results and keeps ALL rows including duplicates. Faster.

Rule: Use UNION ALL by default (faster). Use UNION only when you specifically need to remove duplicates.

Both require: same number of columns and compatible data types in the same order.
CommonWhat is an Index? When should you create one?
An index is a data structure that speeds up SELECT queries — like the index at the back of a textbook.

Create an index when: the column appears frequently in WHERE, JOIN ON, or ORDER BY.
Avoid over-indexing: every index slows down INSERT/UPDATE/DELETE because the index also needs updating.

Clustered index — determines physical order of rows on disk. One per table (usually PK).
Non-clustered index — separate lookup structure. Can have many per table.
Good to KnowWhat is Normalization? Explain 1NF, 2NF, 3NF.
Normalization organizes a database to reduce redundancy and improve data integrity.

1NF (First Normal Form) — No repeating groups; every cell holds one value; each row is unique.
2NF (Second Normal Form) — 1NF + every non-key column fully depends on the entire primary key (no partial dependency).
3NF (Third Normal Form) — 2NF + no transitive dependencies (non-key column depends only on the PK, not on another non-key column).

Tip: "3NF = no column should depend on a non-key column."
Good to KnowWhat are Window Functions and when would you use them?
Window functions perform calculations across a set of related rows without collapsing them into one row (unlike GROUP BY).

Syntax: function() OVER (PARTITION BY col ORDER BY col)

Use cases:
• Ranking employees by salary within each department (RANK())
• Running totals, moving averages (SUM() OVER)
• Comparing each row to the previous row (LAG())
• Finding top-N per group (DENSE_RANK + WHERE dr <= N)
WHERE vs HAVING — most confused pair in SQL
FeatureWHEREHAVING
FiltersIndividual rowsGroups (after GROUP BY)
When it runsBefore GROUP BY (step ③)After GROUP BY (step ⑤)
Can use aggregate functions?❌ No✅ Yes
ExampleWHERE salary > 50000HAVING COUNT(*) > 5
Works without GROUP BY?✅ YesRarely useful without it
DELETE vs TRUNCATE vs DROP
FeatureDELETETRUNCATEDROP
RemovesSpecific rowsAll rowsEntire table
WHERE clause✅ Yes❌ No❌ No
Can rollback?✅ YesDepends on DB❌ No
Resets auto-increment?❌ No✅ YesTable is gone
SpeedSlowerFastFast
Triggers fire?✅ Yes❌ No❌ No
ROW_NUMBER vs RANK vs DENSE_RANK — with example (scores: 90, 90, 85)
ScoreROW_NUMBER()RANK()DENSE_RANK()
90111
90211
8533 (gap!)2 (no gap)
Ties get same rank?❌ Always unique✅ Yes✅ Yes
Skips numbers after tie?✅ Yes❌ Never
INNER JOIN vs LEFT JOIN vs FULL OUTER JOIN
TypeReturnsNULLs in result?Use when...
INNER JOINMatching rows only❌ NoYou only want records that exist in both tables
LEFT JOINAll left + matched rightRight side may be NULLYou want all customers even if they have no orders
RIGHT JOINMatched left + all rightLeft side may be NULLRarely used (just swap tables and use LEFT JOIN)
FULL OUTERAll rows from bothBoth sides may be NULLFind mismatches — rows missing in either table
UNION vs UNION ALL
FeatureUNIONUNION ALL
Removes duplicates?✅ Yes❌ No, keeps all rows
PerformanceSlower (sorts to find dupes)Faster
Use whenYou need unique combined resultsYou know there are no dupes OR you want all rows
DefaultThis is what "UNION" means aloneMust specify ALL explicitly
SQL Interview Prep

Top 20 CTE Patterns

Common Table Expressions · Modern SQL · From Basics to Advanced

01 Basic Readable CTE
11 YoY Growth (Lag)
02 Multiple CTEs
12 Monthly Active Users
03 Recursion (1 to 10)
13 Find Missing IDs
04 Manager Hierarchy
14 String Splitting
05 Nth Highest Salary
15 Basket Analysis
06 Delete Duplicates
16 Pivot Prep
07 Running Total
17 Consecutive Wins
08 Moving Average
18 Update via CTE
09 Gaps & Islands
19 Tree Path Generation
10 Date Generation
20 Insert from CTE

01 Basic Readable CTE Readability

Don't nest. Define first — Replacing Nested Subqueries
-- Don't nest. Define first.
WITH HighSales AS (
  SELECT * FROM orders
  WHERE amount > 1000
)
SELECT * FROM HighSales;
TIPName your logic before you use it. A CTE makes complex queries read top-to-bottom — like prose, not inside-out.

02 Multiple CTEs Readability

Chain named subqueries with commas
WITH dept_avg AS (
  SELECT dept, AVG(salary) AS avg_sal
  FROM employees GROUP BY dept
),
above_avg AS (
  SELECT e.name, e.dept, e.salary
  FROM employees e
  JOIN dept_avg d ON e.dept = d.dept
  WHERE e.salary > d.avg_sal
)
SELECT * FROM above_avg;
TIPSeparate CTEs with a comma, not WITH. Each CTE can reference all CTEs defined before it.

03 Recursion (1 to 10) Recursion

Generate a number series — Anchor + Recursive step
WITH RECURSIVE nums AS (
  SELECT 1 AS n          -- anchor: start
  UNION ALL
  SELECT n + 1           -- recursive: add 1
  FROM nums WHERE n < 10 -- stop condition
)
SELECT n FROM nums;
TIPAnchor + UNION ALL + recursive member. The WHERE clause is your stop condition — always include one or you get an infinite loop.

04 Manager Hierarchy Recursion

Full org chart traversal from CEO to every employee
WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS lvl
  FROM employees
  WHERE manager_id IS NULL   -- anchor: CEO
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.lvl + 1
  FROM employees e
  JOIN org o ON e.manager_id = o.id
)
SELECT lvl, name
FROM org ORDER BY lvl, name;
TIPEach pass fetches direct children of the previous output. Stops when no new child rows are found. Add a path column to build breadcrumb strings.

05 Nth Highest Salary Analytics

DENSE_RANK inside a CTE — clean and reusable
WITH ranked AS (
  SELECT salary,
    DENSE_RANK() OVER (
      ORDER BY salary DESC
    ) AS rnk
  FROM employees
)
SELECT salary
FROM ranked
WHERE rnk = 2; -- change N here
TIPUse DENSE_RANK, not RANK — RANK skips numbers after ties so "2nd highest" could disappear. Change rnk = N for any Nth value.

06 Delete Duplicates DML

Keep only the latest row per group — ROW_NUMBER in DML
WITH dupes AS (
  SELECT id,
    ROW_NUMBER() OVER (
      PARTITION BY email
      ORDER BY created_at DESC
    ) AS rn
  FROM users
)
DELETE FROM users
WHERE id IN (
  SELECT id FROM dupes WHERE rn > 1
);
TIPrn = 1 is the newest per email. Rows with rn > 1 are older duplicates — safe to delete. Swap ORDER BY to ASC to keep the oldest instead.

07 Running Total Analytics

Cumulative SUM over an ordered date sequence
WITH daily AS (
  SELECT order_date, SUM(amount) AS rev
  FROM orders GROUP BY order_date
)
SELECT order_date, rev,
  SUM(rev) OVER (
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING
      AND CURRENT ROW
  ) AS running_total
FROM daily;
TIPROWS BETWEEN is explicit and predictable. Without it, RANGE is the default — which can behave unexpectedly on tied dates.

08 Moving Average Analytics

7-day rolling window — smooth out daily noise
WITH daily AS (
  SELECT sale_date, SUM(revenue) AS rev
  FROM sales GROUP BY sale_date
)
SELECT sale_date, rev,
  ROUND(AVG(rev) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 6 PRECEDING
      AND CURRENT ROW
  ), 2) AS avg_7d
FROM daily;
TIP6 PRECEDING + CURRENT ROW = 7-day window. Adjust the number to change window size. The first 6 rows will average fewer days — that's expected.

09 Gaps & Islands Advanced

Group consecutive dates into contiguous periods
WITH numbered AS (
  SELECT user_id, login_date,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY login_date
    ) AS rn
  FROM logins
),
islands AS (
  SELECT user_id,
    DATE_SUB(login_date, INTERVAL rn DAY) AS grp
  FROM numbered
)
SELECT user_id,
  MIN(login_date) AS start,
  MAX(login_date) AS end,
  COUNT(*) AS days
FROM islands GROUP BY user_id, grp;
TIPConsecutive dates − their row number = the same constant. That constant is the island key — rows sharing it belong to the same streak.

10 Date Generation Recursion

Generate a full date series to fill data gaps
WITH RECURSIVE dates AS (
  SELECT '2024-01-01' AS d
  UNION ALL
  SELECT DATE_ADD(d, INTERVAL 1 DAY)
  FROM dates
  WHERE d < '2024-01-31'
)
SELECT d FROM dates;
TIPLEFT JOIN this series to your data to surface days with zero sales — they won't appear otherwise, which breaks dashboards and moving averages.

11 YoY Growth (Lag) Analytics

Year-over-Year percentage growth via LAG()
WITH yearly AS (
  SELECT YEAR(sale_date) AS yr,
    SUM(revenue) AS rev
  FROM sales GROUP BY yr
)
SELECT yr, rev,
  LAG(rev) OVER (ORDER BY yr) AS prev_rev,
  ROUND(
    100.0 * (rev - LAG(rev) OVER (ORDER BY yr))
    / NULLIF(LAG(rev) OVER (ORDER BY yr), 0),
  2) AS yoy_pct
FROM yearly;
TIPNULLIF(prev_rev, 0) prevents division-by-zero. The first year always returns NULL for yoy_pct — correct, there's no prior year to compare.

12 Monthly Active Users Analytics

COUNT DISTINCT active users per calendar month
WITH monthly AS (
  SELECT
    DATE_FORMAT(event_date, '%Y-%m') AS month,
    user_id
  FROM events
)
SELECT month,
  COUNT(DISTINCT user_id) AS mau
FROM monthly
GROUP BY month
ORDER BY month;
TIPDATE_FORMAT normalizes all dates to YYYY-MM. COUNT(DISTINCT user_id) counts each user only once per month regardless of how many events they fired.

13 Find Missing IDs Advanced

Detect gaps in a numeric sequence
WITH RECURSIVE seq AS (
  SELECT MIN(id) AS n, MAX(id) AS mx
  FROM orders
  UNION ALL
  SELECT n + 1, mx
  FROM seq WHERE n < mx
)
SELECT n AS missing_id
FROM seq
WHERE n NOT IN (
  SELECT id FROM orders
);
TIPGenerate every ID from MIN to MAX, then subtract what exists. What remains = the gaps. On large tables, prefer a LEFT JOIN approach for better performance.

14 String Splitting Advanced

Flatten comma-separated tag strings into rows
WITH RECURSIVE split AS (
  SELECT id,
    TRIM(SUBSTRING_INDEX(tags, ',', 1)) AS tag,
    IF(LOCATE(',', tags) > 0,
      SUBSTRING(tags, LOCATE(',', tags) + 1),
      NULL) AS rest
  FROM products
  UNION ALL
  SELECT id,
    TRIM(SUBSTRING_INDEX(rest, ',', 1)),
    IF(LOCATE(',', rest) > 0,
      SUBSTRING(rest, LOCATE(',', rest) + 1), NULL)
  FROM split WHERE rest IS NOT NULL
)
SELECT id, tag FROM split ORDER BY id;
TIPEach recursive step peels off the leftmost item. Stops when rest IS NULL (no more commas). PostgreSQL users: use STRING_TO_TABLE() instead.

15 Basket Analysis Advanced

Find products frequently bought together
WITH baskets AS (
  SELECT
    a.product_id AS p1,
    b.product_id AS p2,
    COUNT(*) AS freq
  FROM order_items a
  JOIN order_items b
    ON  a.order_id = b.order_id
    AND a.product_id < b.product_id
  GROUP BY a.product_id, b.product_id
)
SELECT p1, p2, freq
FROM baskets
ORDER BY freq DESC LIMIT 10;
TIPSelf-join on the same order pairs every product with every other. p1 < p2 prevents counting (A,B) and (B,A) separately.

16 Pivot Prep Analytics

Conditional aggregation — SQL's manual pivot
WITH raw AS (
  SELECT product, region, revenue FROM sales
)
SELECT product,
  SUM(CASE WHEN region='North' THEN revenue END) AS north,
  SUM(CASE WHEN region='South' THEN revenue END) AS south,
  SUM(CASE WHEN region='East'  THEN revenue END) AS east,
  SUM(CASE WHEN region='West'  THEN revenue END) AS west
FROM raw GROUP BY product;
TIPWrap each SUM in COALESCE(…, 0) to show zeros instead of NULL for missing combinations. PostgreSQL also has a native CROSSTAB() function.

17 Consecutive Wins Advanced

Detect streaks using the double ROW_NUMBER trick
WITH base AS (
  SELECT match_date, result,
    ROW_NUMBER() OVER (ORDER BY match_date) AS rn
  FROM matches
),
streaks AS (
  SELECT match_date, result,
    rn - ROW_NUMBER() OVER (
      PARTITION BY result ORDER BY match_date
    ) AS grp
  FROM base
)
SELECT MIN(match_date) AS start,
  MAX(match_date) AS end,
  COUNT(*) AS streak_len
FROM streaks WHERE result='Win'
GROUP BY grp HAVING COUNT(*) >= 3;
TIPGlobal rn minus per-result rn is constant within a streak. Rows sharing that constant belong to the same winning run. Change HAVING for minimum streak length.

18 Update via CTE DML

CTEs work inside UPDATE, DELETE & INSERT
WITH top_performers AS (
  SELECT id FROM employees
  WHERE dept = 'Engineering'
    AND performance >= 90
)
UPDATE employees
SET salary = salary * 1.10
WHERE id IN (
  SELECT id FROM top_performers
);
TIPCTEs work inside UPDATE, DELETE, and INSERT…SELECT. Use them to keep complex filter logic readable rather than burying it in a subquery inside WHERE.

19 Tree Path Generation Recursion

Build full breadcrumb path from root to each node
WITH RECURSIVE paths AS (
  SELECT id, name,
    CAST(name AS CHAR(1000)) AS path
  FROM categories
  WHERE parent_id IS NULL
  UNION ALL
  SELECT c.id, c.name,
    CONCAT(p.path, ' > ', c.name)
  FROM categories c
  JOIN paths p ON c.parent_id = p.id
)
SELECT id, name, path
FROM paths ORDER BY path;
TIPCAST to CHAR(1000) gives enough room for deep paths. CONCAT builds Electronics > Phones > Smartphones automatically as the tree grows.

20 Insert from CTE DML

Populate a table from a computed CTE result set
WITH vip_candidates AS (
  SELECT customer_id,
    SUM(amount) AS total_spent
  FROM orders
  WHERE order_date >= '2024-01-01'
  GROUP BY customer_id
  HAVING total_spent > 5000
)
INSERT INTO vip_customers
  (customer_id, total_spent, tagged_at)
SELECT customer_id, total_spent, NOW()
FROM vip_candidates;
TIPINSERT … SELECT is the most common CTE + DML pattern. The CTE computes which rows to insert; INSERT writes them. Clean, auditable, and readable.
In Depth

SQL Deep Dives

Visual comparisons · Annotated patterns · Production-ready snippets

Comparison
SUM()vsSUM OVER()
Aggregate collapses rows — window keeps them all. This is the most important window function concept in analytics SQL, and the most commonly misunderstood.
SUM()
SELECT job, SUM(sales)
FROM sales
GROUP BY job;
  • Calculates total for each group
  • Requires GROUP BY
  • Collapses rows — one row per group
VS.
SUM OVER()
SELECT job, sales,
  SUM(sales) OVER (
    PARTITION BY job
  ) AS total_per_job
FROM sales;
  • Calculates total within each partition
  • No GROUP BY needed
  • Keeps ALL rows — repeats total per row
Grouped SUMs — 3 rows returned
jobSUM(sales)
Engineer50
Sales70
Manager90
Window SUMs — all 5 rows kept
jobsalestotal_per_job
Engineer2050
Engineer3050
Sales3070
Sales4070
Manager9090
TIP Rule: Need individual rows AND a group total in the same result? Use SUM() OVER(). Only need one row per group? Use SUM() + GROUP BY. You cannot mix window functions with GROUP BY in the same SELECT level.
Ranking
ROW_NUMBER()vsRANK()vsDENSE_RANK()
All three rank rows — but they differ dramatically when ties exist. Getting this wrong produces silent, hard-to-debug query bugs in production.
ROW_NUMBER() Always Unique
Always assigns a unique sequential number, even to identical values. Never repeats, never skips.
No ties — always 1, 2, 3, 4…
ROW_NUMBER() OVER (
  ORDER BY score DESC
)
✅ Use for: deduplication, pagination, picking exactly one row per partition
RANK() Skips on Ties
Tied rows share the same rank, but the next rank jumps forward by the number of tied rows.
Ties share rank, next rank jumps
RANK() OVER (
  ORDER BY score DESC
)
⚠️ Avoid for "top N" — rank 2 can disappear if two rows tie at rank 1
DENSE_RANK() No Gaps
Tied rows share the same rank and the next rank is always the very next integer — no gaps ever.
Ties share rank, no skipping
DENSE_RANK() OVER (
  ORDER BY score DESC
)
✅ Use for: "Nth highest value" — guaranteed to return a result even with ties
Applied to scores: 90, 90, 85, 80
score ROW_NUMBER() RANK() DENSE_RANK()
90111
90211
8533 ← gap!2
80443
TIP Interview answer: For "find the Nth highest salary" always use DENSE_RANK. With RANK, if two people tie at rank 1, there is no rank 2 — your WHERE rnk = 2 returns zero rows and you get a silent wrong answer.

Snippet Library

Production-ready SQL patterns — copy, adapt, and ship

01 Top N per Group Analytics
WITH ranked AS (
  SELECT *,
    DENSE_RANK() OVER (
      PARTITION BY dept
      ORDER BY salary DESC
    ) AS rnk
  FROM employees
)
SELECT * FROM ranked
WHERE rnk <= 3; -- top 3 per dept
02 Month-over-Month Growth Analytics
WITH monthly AS (
  SELECT
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    SUM(revenue) AS rev
  FROM orders GROUP BY month
)
SELECT month, rev,
  LAG(rev) OVER (ORDER BY month) AS prev_rev,
  ROUND(100.0 *
    (rev - LAG(rev) OVER (ORDER BY month))
    / NULLIF(LAG(rev) OVER (ORDER BY month),0),
  1) AS mom_pct
FROM monthly;
03 Latest Record per Group Practical
-- Most recent order per customer
WITH latest AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY order_date DESC
    ) AS rn
  FROM orders
)
SELECT * FROM latest WHERE rn = 1;
04 Running Balance Analytics
SELECT txn_date, description, amount,
  SUM(
    CASE WHEN type = 'credit' THEN  amount
         WHEN type = 'debit'  THEN -amount
    END
  ) OVER (
    ORDER BY txn_date
    ROWS BETWEEN UNBOUNDED PRECEDING
      AND CURRENT ROW
  ) AS balance
FROM transactions;
05 Percentage of Total Analytics
SELECT dept, name, salary,
  ROUND(
    100.0 * salary /
    SUM(salary) OVER (PARTITION BY dept),
  1) AS pct_of_dept,
  ROUND(
    100.0 * salary /
    SUM(salary) OVER (),
  1) AS pct_of_total
FROM employees;
06 Session Detection Advanced
-- Gap > 30 min = new session
WITH gaps AS (
  SELECT user_id, event_time,
    CASE WHEN TIMESTAMPDIFF(MINUTE,
      LAG(event_time) OVER (
        PARTITION BY user_id
        ORDER BY event_time),
      event_time) > 30
    THEN 1 ELSE 0 END AS new_sess
  FROM events
)
SELECT user_id, event_time,
  SUM(new_sess) OVER (
    PARTITION BY user_id
    ORDER BY event_time
  ) + 1 AS session_id
FROM gaps;

02
Game 1

Guess the Output

Read the table + SQL query, then pick the correct result. Score points and earn XP for each correct answer.

Guess the Output
Score: 0
XP: 0
0x Combo!
30

Choose a difficulty, then pick the correct SQL output before the timer runs out. Score points and earn XP for each correct answer.

Q 1 / 5
Beginner

        
A
Excellent
0
Final Score
0
Correct
0
Wrong
0
XP Earned
0
Best Combo
Answer Review — See what you got right & wrong

03
Game 2

SQL Speed Round

60 seconds. As many correct answers as possible. Build streaks for bonus points.

SQL Speed Round
60
Score: 0

Answer as many SQL questions as you can in 60 seconds. Build streaks for combo multipliers.

60sTimer counts down
×Streak multiplier
+Fast answers = more questions
0No penalty for wrong answers
Question 1
0
Speed Round Score
0
Correct
0
Attempted
0
Best Streak
0%
Accuracy
SQL & Database Engineering

Query to Mastery

From SQL basics to advanced analytics — live online sessions every Sat & Sun, real-world capstone project, and interview preparation built in.

10
Modules
8
Weeks
Sat+Sun
Live Online
June
Next Batch
Foundational — 01 to 05
MODULE 01
Introduction to Databases & SQL
01
MODULE 02
Filtering & Conditions
02
MODULE 03
Sorting, Aggregation & Grouping
03
MODULE 04
JOINs — Connecting Tables
04
MODULE 05
Subqueries & CTEs
05
Advanced — 06 to 10
MODULE 06
Window Functions
06
MODULE 07
Database Design & Normalization
07
MODULE 08
Views, Stored Procedures & Triggers
08
MODULE 09
Performance & Query Optimisation
09
MODULE 10
Real-World Analytics Capstone
10
Course Fee
₹14,000 / one-time
10 live online modules — every Sat & Sun
Real-world capstone project
Interview preparation built in
Completion certificate included
Reserve via WhatsApp
Limited seats· June 2026· Online only

SQL Interview Pack

Crack SQL Interviews.
Land the Job.

Four resources, one payment — delivered on WhatsApp in minutes. No app, no login, no subscription ever.

01
Question Booklet
Monthly Updates
Amazon Google Microsoft Meta
02
SQL Cheat Sheet
JOINs Window Functions CTEs 60+ Queries
03
Deep Dive
20 CTE Patterns RANK Variants SUM vs SUM OVER
Mock Interview
Bonus
30 Min Session Live Feedback Real SQL Problems
₹999 ₹4,999 80% off

One-time payment  ·  No subscription  ·  Delivered on WhatsApp in minutes

Get the SQL Pack on WhatsApp
✓  Copied to clipboard
✓ Link copied!
Question completed!