SQL Playground — Learn SQL by Playing

Question Bank

Real SQL Interview Questions

Curated from Amazon, Google, Microsoft, Meta, Flipkart, Swiggy and LeetCode — organized by company, level, and topic.

Legend

Newbie — Never written SQL? Start here. Zero experience needed.

Interview Boost — High-frequency questions from real SQL rounds. Practise these first.

🌍 NewbieWHERE

Filter Rows with WHERE

WHERE narrows your result to only rows that match a condition — the most-used clause in SQL

Show all students who scored grade 'A'. Use the WHERE clause to filter rows — only rows where grade = 'A' should appear in the result.

Schema

students

id	name	grade	city
1	Arjun	A	Delhi
2	Priya	B	Mumbai
3	Raj	A	Pune
4	Simran	C	Delhi
5	Karan	A	Chennai

Expected Output

id	name	grade	city
1	Arjun	A	Delhi
3	Raj	A	Pune
5	Karan	A	Chennai

Solution

SELECT *
FROM students
WHERE grade = 'A';

🌍 NewbieORDER BY

Sort Results with ORDER BY

ASC = low to high (default), DESC = high to low. ORDER BY is written after WHERE, GROUP BY and HAVING; only LIMIT can follow it

Show all products sorted by price from lowest to highest. Then try changing it to highest-to-lowest by adding DESC. ORDER BY goes at the end of the query, after every clause except LIMIT.

Schema

products

id	name	category	price
1	Pen	Stationery	10
2	Notebook	Stationery	45
3	Bag	Accessories	350
4	Eraser	Stationery	5
5	Water Bottle	Accessories	120

Expected Output (ASC)

id	name	price
4	Eraser	5
1	Pen	10
2	Notebook	45
5	Water Bottle	120
3	Bag	350

Solution

SELECT id, name, price
FROM products
ORDER BY price ASC;  -- ASC is default; swap DESC for highest first

🌍 NewbieLIKE

Search Patterns with LIKE

% matches any characters; _ matches exactly one — combine with LIKE to search by pattern

Find all customers whose name starts with the letter 'A'. Use LIKE 'A%' — the % wildcard matches any characters after 'A'. Try LIKE '%a' in the playground to find names ending with 'a'.

Schema

customers

id	name	city
1	Ankit	Delhi
2	Priya	Mumbai
3	Aisha	Hyderabad
4	Rahul	Pune
5	Amita	Bengaluru

Expected Output

id	name	city
1	Ankit	Delhi
3	Aisha	Hyderabad
5	Amita	Bengaluru

Solution

SELECT *
FROM customers
WHERE name LIKE 'A%';

-- Try also:
-- WHERE name LIKE '%a'   -- ends with 'a'
-- WHERE name LIKE '_i%'  -- second letter is 'i'

🌍 NewbieGROUP BY

Count Records per Group

GROUP BY collapses rows by a column; COUNT(*) tells you how many rows are in each group

Count how many students are in each city. Use GROUP BY city to group rows by city, then COUNT(*) to count how many students fall in each group.

Schema

students

id	name	grade	city
1	Arjun	A	Delhi
2	Priya	B	Mumbai
3	Raj	A	Delhi
4	Simran	C	Mumbai
5	Karan	A	Chennai

Expected Output

city	student_count
Chennai	1
Delhi	2
Mumbai	2

Solution

SELECT city, COUNT(*) AS student_count
FROM students
GROUP BY city
ORDER BY city;

🌍 NewbieJOIN

Your First JOIN — Combine Two Tables

JOIN links two tables on a shared column so you can see data from both in one result row

Show each order with the customer's name and amount. The customers table has names; orders has amounts. Use INNER JOIN to match them on customer_id.

Schema

customers

id	name
1	Arjun
2	Priya
3	Karan

orders

order_id	customer_id	amount
101	1	500
102	2	300
103	1	750
104	3	200

Expected Output

order_id	name	amount
101	Arjun	500
102	Priya	300
103	Arjun	750
104	Karan	200

Solution

SELECT o.order_id, c.name, o.amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
ORDER BY o.order_id;

🌍 NewbieSELECT

Pick Specific Columns — Don't Always Use *

SELECT * grabs everything; name only the columns you actually need

The HR team needs a report with only employee name and department — they don't want salary or phone visible. Select just those two columns from the employees table.

Schema

employees

id	name	department	salary	phone
1	Arjun	Engineering	75000	9812345678
2	Priya	Marketing	62000	9987654321
3	Raj	Engineering	81000	9776543210

Expected Output

name	department
Arjun	Engineering
Priya	Marketing
Raj	Engineering

Solution

SELECT name, department
FROM employees;

🌍 NewbieWHERE > < !=

WHERE with Comparison Operators

= equals, != not equal, > greater, < less — combine them to filter any range

Show all products where price is greater than ₹100. Then try modifying the query to show products priced less than ₹50 and products where category is NOT 'Food'.

Schema

products

id	name	category	price
1	Rice	Food	45
2	Headphones	Electronics	799
3	Notebook	Stationery	60
4	Pen	Stationery	10
5	Mixer	Appliances	1200

Expected Output (price > 100)

id	name	price
2	Headphones	799
5	Mixer	1200

Solution

SELECT id, name, price
FROM products
WHERE price > 100;

-- Also try:
-- WHERE price < 50          -- cheaper than ₹50
-- WHERE category != 'Food'  -- everything except Food

🌍 NewbieAND / OR

Combine Conditions with AND and OR

AND requires both conditions true; OR requires at least one true

Part A — Show employees in the 'Sales' department AND with salary above ₹40,000.
Part B — Show customers from 'Delhi' OR 'Pune'.

Schema

employees

id	name	department	salary
1	Arjun	Sales	38000
2	Priya	Sales	52000
3	Raj	Engineering	70000
4	Simran	Sales	47000

Expected Output (AND)

name	department	salary
Priya	Sales	52000
Simran	Sales	47000

Solution

-- Part A: AND — both conditions must be true
SELECT name, department, salary
FROM employees
WHERE department = 'Sales' AND salary > 40000;

-- Part B: OR — either condition is enough
-- SELECT * FROM customers WHERE city = 'Delhi' OR city = 'Pune';

🌍 NewbieDISTINCT

Remove Duplicates with SELECT DISTINCT

DISTINCT keeps only one row per unique value — like a de-duplication filter

The orders table has hundreds of rows but you only need to know which unique cities orders have come from. Use SELECT DISTINCT to get each city name exactly once.

Schema

orders

id	customer	city	amount
1	Arjun	Delhi	500
2	Priya	Mumbai	300
3	Raj	Delhi	750
4	Simran	Pune	200
5	Karan	Mumbai	900

Expected Output

city
Delhi
Mumbai
Pune

Solution

SELECT DISTINCT city
FROM orders
ORDER BY city;

🌍 NewbieIS NULL

Find Missing Values with IS NULL

NULL means no value stored — you can't use = NULL, only IS NULL

Find all employees who have not provided a phone number (phone is NULL). Then find those who have provided one (IS NOT NULL).

Schema

employees

id	name	phone	email
1	Arjun	9812345678	a@co.in
2	Priya	NULL	p@co.in
3	Raj	NULL	r@co.in
4	Simran	9776543210	s@co.in

Expected Output (IS NULL)

id	name	phone
2	Priya	NULL
3	Raj	NULL

Solution

SELECT id, name, phone
FROM employees
WHERE phone IS NULL;

-- Find employees who DO have a phone:
-- WHERE phone IS NOT NULL

🌍 NewbieORDER BY

Sort by Multiple Columns

List columns left to right — SQL sorts by the first, then breaks ties using the next

Show all students sorted by score descending (highest first). When two students have the same score, sort their names alphabetically (A–Z).

Schema

students

id	name	grade	score
1	Raj	10	88
2	Aisha	10	95
3	Priya	10	88
4	Karan	10	72

Expected Output

name	score
Aisha	95
Priya	88
Raj	88
Karan	72

Solution

SELECT name, score
FROM students
ORDER BY score DESC, name ASC;

🌍 NewbieIN

Match a List of Values with IN

IN is a cleaner replacement for writing multiple OR conditions

Show all products that belong to the categories 'Electronics', 'Books', or 'Clothing'. Use IN instead of three separate OR conditions.

Schema

products

id	name	category	price
1	Headphones	Electronics	799
2	Rice	Food	45
3	Novel	Books	299
4	T-Shirt	Clothing	399
5	Mixer	Appliances	1200

Expected Output

name	category	price
Novel	Books	299
T-Shirt	Clothing	399
Headphones	Electronics	799

Solution

SELECT name, category, price
FROM products
WHERE category IN ('Electronics', 'Books', 'Clothing')
ORDER BY category;

🌍 NewbieBETWEEN

Filter a Range with BETWEEN

BETWEEN is inclusive on both ends — equivalent to >= low AND <= high

Show all orders where the amount is between ₹500 and ₹1000 (both ends included). Works for numbers, dates, and text ranges.

Schema

orders

id	customer	amount	order_date
1	Arjun	450	2024-01-05
2	Priya	750	2024-01-08
3	Raj	1200	2024-01-12
4	Simran	500	2024-01-15
5	Karan	999	2024-01-20

Expected Output

customer	amount
Simran	500
Priya	750
Karan	999

Solution

SELECT customer, amount
FROM orders
WHERE amount BETWEEN 500 AND 1000
ORDER BY amount;

🌍 NewbieMIN / MAX

Find the Highest and Lowest Value

MIN and MAX work on numbers, dates, and text — they return a single value

Find the highest and lowest salary in the company with a single query. Also find the most recent joining date using MAX on a date column.

Schema

employees

id	name	salary	join_date
1	Arjun	75000	2021-03-15
2	Priya	62000	2022-07-01
3	Raj	90000	2020-11-20
4	Simran	48000	2023-01-10

Expected Output

highest_salary	lowest_salary	latest_join
90000	48000	2023-01-10

Solution

SELECT
  MAX(salary)    AS highest_salary,
  MIN(salary)    AS lowest_salary,
  MAX(join_date) AS latest_join
FROM employees;

🌍 NewbieSUM / AVG

Total and Average with SUM and AVG

SUM adds all values; AVG divides sum by count — both skip NULLs automatically

Calculate the total revenue and average order value from the sales table in a single query. Round the average to 2 decimal places.

Schema

sales

id	product	amount	region
1	Laptop	45000	North
2	Phone	18000	South
3	Tablet	22000	North
4	Earbuds	3500	West

Expected Output

total_revenue	avg_order
88500	22125.00

Solution

SELECT
  SUM(amount)            AS total_revenue,
  ROUND(AVG(amount), 2) AS avg_order
FROM sales;

🌍 NewbieHAVING

Filter Groups with HAVING

HAVING is WHERE for groups — it runs after GROUP BY so it can see aggregate results

Show only customers whose total order value exceeds ₹1,000. You need GROUP BY to group per customer, then HAVING to filter on the total.

Schema

orders

id	customer	amount
1	Arjun	400
2	Priya	750
3	Arjun	800
4	Karan	200
5	Priya	500

Expected Output

customer	total
Priya	1250
Arjun	1200

Solution

SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
HAVING SUM(amount) > 1000
ORDER BY total DESC;

🌍 NewbieAliases

Rename Columns and Tables with AS

AS gives a column or table a temporary name — it only exists in that query's output

Display each employee's full name (first + last concatenated) as full_name, and rename salary to monthly_pay. Use AS to create column aliases.

Schema

employees

id	first_name	last_name	salary
1	Arjun	Sharma	75000
2	Priya	Mehta	62000
3	Raj	Patel	90000

Expected Output

full_name	monthly_pay
Arjun Sharma	75000
Priya Mehta	62000
Raj Patel	90000

Solution

SELECT
  first_name || ' ' || last_name AS full_name,
  salary AS monthly_pay
FROM employees;

🌍 NewbieLEFT JOIN

LEFT JOIN — Find Customers with No Orders

LEFT JOIN + WHERE right.id IS NULL = the classic "find unmatched rows" pattern

Find all customers who have never placed an order. INNER JOIN would drop them — LEFT JOIN keeps them with NULLs, then filter for those NULLs.

Schema

customers

id	name
1	Arjun
2	Priya
3	Raj
4	Simran

orders

order_id	customer_id
101	1
102	3

Expected Output

name
Priya
Simran

Solution

SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.order_id IS NULL
ORDER BY c.name;

🌍 NewbieUNION

Stack Results from Two Queries with UNION

UNION removes duplicates; UNION ALL keeps every row — both need matching column count and types

Build a combined list of cities from both the customers and suppliers tables — no duplicates. Then try UNION ALL to see duplicates included.

Schema

customers

id	name	city
1	Arjun	Delhi
2	Priya	Mumbai

suppliers

id	name	city
1	Tata Co.	Mumbai
2	Reliance	Pune

Expected Output (UNION)

city
Delhi
Mumbai
Pune

Solution

SELECT city FROM customers
UNION
SELECT city FROM suppliers
ORDER BY city;

-- Try UNION ALL to include the duplicate 'Mumbai':
-- SELECT city FROM customers UNION ALL SELECT city FROM suppliers

🌍 Newbie

Filter a City Table — Multiple WHERE Conditions

Filter rows using multiple AND conditions across different columns

Query all columns for every city in Maharashtra (state_code = 'MH') with a population greater than 500,000.

Schema

city

id	name	state_code	district	population
1	Mumbai	MH	Mumbai City	12478447
2	Pune	MH	Pune	3124458
3	Jaipur	RJ	Jaipur	3046163
4	Nashik	MH	Nashik	1486053
5	Aurangabad	MH	Aurangabad	373311

Expected Output

id	name	state_code	district	population
1	Mumbai	MH	Mumbai City	12478447
2	Pune	MH	Pune	3124458
4	Nashik	MH	Nashik	1486053

Solution

SELECT *
FROM city
WHERE state_code = 'MH'
  AND population > 500000;

🌍 Newbie

Names Only — SELECT One Column + ORDER BY

Select one column and sort alphabetically with ORDER BY

List the names of all cities in Rajasthan (state_code = 'RJ'), sorted alphabetically.

Schema

city (same as Q93)

id	name	state_code	population
3	Jaipur	RJ	3046163
6	Udaipur	RJ	451100
7	Ajmer	RJ	542321

Expected Output

name
Ajmer
Jaipur
Udaipur

Solution

SELECT name
FROM city
WHERE state_code = 'RJ'
ORDER BY name ASC;

🌍 Newbie

Even ID Cities — DISTINCT + Modulo Filter

Use modulo to filter by even/odd IDs and DISTINCT to remove duplicates

List distinct city names where the city's ID is an even number. Use the modulo operator % (or MOD(id, 2) = 0) to check for even numbers.

Schema

city

id	name	state_code
1	Mumbai	MH
2	Pune	MH
2	Pune	MH
4	Nashik	MH
5	Jaipur	RJ

Expected Output

name
Nashik
Pune

Solution

SELECT DISTINCT name
FROM city
WHERE id % 2 = 0
ORDER BY name;

🌍 Newbie

Count Duplicates — COUNT vs COUNT DISTINCT

Detect duplicates by comparing COUNT(*) vs COUNT(DISTINCT col)

Find how many duplicate city names exist in the table. Subtract the number of distinct names from the total count — the difference equals the duplicate count.

Schema

city

id	name
1	Delhi
2	Mumbai
3	Delhi
4	Pune
5	Mumbai

Expected Output

duplicate_count
2

Solution

SELECT
  COUNT(name) - COUNT(DISTINCT name) AS duplicate_count
FROM city;

🌍 Newbie

Big Countries — OR in WHERE

A country is "big" if its area is huge OR its population is huge — either condition qualifies

A country is classified as "big" if it has an area of at least 3,000,000 km² OR a population of at least 25,000,000. Report the name, population, and area for all big countries.

Schema

world

name	continent	area	population
India	Asia	3287263	1380000000
Maldives	Asia	298	540000
Russia	Europe	17098242	144000000
Nepal	Asia	147181	29136808

Expected Output

name	population	area
India	1380000000	3287263
Russia	144000000	17098242
Nepal	29136808	147181

Solution

SELECT name, population, area
FROM world
WHERE area >= 3000000
   OR population >= 25000000;

🌍 Newbie

Find Customers Not Referred by a Specific Person

referee_id != 2 is never true for a NULL referee_id (the comparison yields NULL), so add OR referee_id IS NULL to keep those rows

Find the names of customers who were not referred by customer with id = 2. Include customers whose referee_id is NULL (they were referred by no one).

Schema

customer

id	name	referee_id
1	Arjun	NULL
2	Priya	NULL
3	Raj	1
4	Simran	2
5	Karan	2
6	Neha	3

Expected Output

name
Arjun
Neha
Priya
Raj

Solution

SELECT name
FROM customer
WHERE referee_id != 2
   OR referee_id IS NULL
ORDER BY name;

🌍 Newbie

Customers Who Never Ordered

LEFT JOIN keeps all customers; filter on WHERE o.id IS NULL to find those with no orders

Find all customers who have never placed an order. Use LEFT JOIN to keep all customers, then filter for rows where the order side is NULL.

Schema

Customers

id	name
1	Arjun
2	Priya
3	Raj
4	Simran

Orders

id	customerId
1	1
2	3

Expected Output

Customers
Priya
Simran

Solution

SELECT c.name AS Customers
FROM Customers c
LEFT JOIN Orders o ON c.id = o.customerId
WHERE o.id IS NULL
ORDER BY c.name;

100

🌍 Newbie

Find Duplicate Emails

GROUP BY email, then HAVING COUNT > 1 keeps only the duplicated ones

Find all email addresses that appear more than once in the Person table. Use GROUP BY to group by email, then HAVING to filter groups with more than 1 row.

Schema

Person

id	email
1	a@example.com
2	b@example.com
3	a@example.com

Expected Output

Email
a@example.com

Solution

SELECT email AS Email
FROM Person
GROUP BY email
HAVING COUNT(*) > 1;

101

🌍 Newbie

Classes with 5 or More Students

GROUP BY class, then HAVING COUNT(student) >= 5

Find all classes that have at least 5 students enrolled. Each row in the table represents one student enrolled in one class.

Schema

Courses

student	class
Arjun	Maths
Priya	Maths
Raj	Biology
Simran	Maths
Karan	Maths
Neha	Biology
Aisha	Maths

Expected Output

class
Maths

Solution

SELECT class
FROM Courses
GROUP BY class
HAVING COUNT(student) >= 5;

102

🌍 Newbie

Not Boring Movies: Odd ID + != Filter

Filter for odd IDs using id % 2 = 1, then exclude boring descriptions

Show all movies with an odd ID and a description that is not 'boring', ordered by rating descending.

Schema

Cinema

id	movie	description	rating
1	War	great 3D	8.9
2	Science	fiction	8.5
3	irish	boring	6.2
4	Ice song	Fantacy	8.6
5	House card	Interesting	9.1

Expected Output

id	movie	description	rating
5	House card	Interesting	9.1
1	War	great 3D	8.9

Solution

SELECT *
FROM Cinema
WHERE id % 2 = 1
  AND description != 'boring'
ORDER BY rating DESC;

103

🌍 Newbie

Authors Who Viewed Their Own Articles

Compare two columns in the same row — no JOIN needed when both columns are in the same table

Find all authors who viewed at least one of their own articles (i.e. author_id = viewer_id). Return distinct author IDs sorted ascending.

Schema

Views

article_id	author_id	viewer_id	view_date
1	3	5	2019-08-01
1	3	6	2019-08-02
2	7	7	2019-08-01
2	7	6	2019-08-02
4	7	1	2019-07-22
3	4	4	2019-07-21

Expected Output

id
4
7

Solution

SELECT DISTINCT author_id AS id
FROM Views
WHERE author_id = viewer_id
ORDER BY id;

104

🌍 Newbie▰ MCQHAVING

WHERE vs HAVING — which one filters after grouping?

One clause runs before GROUP BY, the other after — pick the right one

You want to show only departments where the total salary is above ₹50,000. Your query already has GROUP BY department. Which clause do you add?

HAVING filters groups after GROUP BY runs, so it can see aggregate values like SUM(salary). WHERE filters individual rows before grouping, when the aggregate hasn't been computed yet, so WHERE SUM(...) is rejected with an error. FILTER is not a clause that filters groups; in standard SQL it appears only as FILTER (WHERE ...) attached to an aggregate function. ORDER BY sorts but never filters.
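
The difference can be checked with a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The employee rows here are made up for illustration; only the department and salary columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Arjun", "HR", 20000), ("Priya", "HR", 25000),
     ("Ravi", "IT", 70000), ("Neha", "Sales", 30000)],  # hypothetical rows
)

# HAVING is evaluated after GROUP BY, so it can see SUM(salary) per department.
having_rows = conn.execute(
    "SELECT department, SUM(salary) AS total FROM employees "
    "GROUP BY department HAVING SUM(salary) > 50000 ORDER BY department"
).fetchall()
print(having_rows)

# The same aggregate inside WHERE is rejected when the statement is prepared.
try:
    conn.execute(
        "SELECT department FROM employees "
        "WHERE SUM(salary) > 50000 GROUP BY department"
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)
print(where_error)
```

Only IT (70000) passes the HAVING filter; HR totals 45000 and Sales 30000.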

105

🌍 Newbie▰ MCQNULL

How do you correctly check for NULL values?

NULL is the absence of a value — regular comparison operators don't work on it

A phone column has some rows with no value stored. Which query correctly finds those rows?

NULL is not a value — it means "unknown" or "missing". You cannot compare NULL with = or == because any comparison with NULL returns NULL (neither true nor false). The only correct syntax is IS NULL (or IS NOT NULL). Note: '' (empty string) is a real value, not NULL — they are different.
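
A small sketch of the three cases, assuming Python's sqlite3 module and an in-memory table. The rows are illustrative; Simran's empty-string phone is added here to show the difference from NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Arjun", "9812345678"), (2, "Priya", None),
     (3, "Raj", None), (4, "Simran", "")],  # None is stored as NULL
)

# "= NULL" evaluates to NULL for every row, so nothing is returned.
equals_null = conn.execute(
    "SELECT name FROM employees WHERE phone = NULL").fetchall()

# IS NULL is the test that actually finds the missing values.
is_null = conn.execute(
    "SELECT name FROM employees WHERE phone IS NULL ORDER BY id").fetchall()

# An empty string is a real value and is matched by an ordinary comparison.
empty_string = conn.execute(
    "SELECT name FROM employees WHERE phone = ''").fetchall()

print(equals_null, is_null, empty_string)
```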

106

🌍 Newbie▰ MCQDISTINCT

What does SELECT DISTINCT return?

DISTINCT removes something from the result — what exactly?

The orders table has 1,000 rows but only 40 unique cities. What does SELECT DISTINCT city FROM orders return?

SELECT DISTINCT removes duplicate values from the result, keeping only one row per unique value. With 40 unique cities across 1,000 rows, DISTINCT returns exactly 40 rows. It works like GROUP BY city but without any aggregation. No extra clauses are required — DISTINCT is applied directly in SELECT.

107

🌍 Newbie▰ MCQClause Order

In what order does SQL actually execute clauses?

The order you write SQL and the order SQL runs it are different

You write a query with SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY. In which order does SQL actually execute these clauses?
Hint: this is why you can't use a SELECT alias inside WHERE.

SQL execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY.

Think of it as a pipeline: ① pick the table (FROM), ② filter rows (WHERE), ③ collapse into groups (GROUP BY), ④ filter groups (HAVING), ⑤ choose what to show (SELECT), ⑥ sort the output (ORDER BY).

SELECT runs near the end, which is why standard SQL does not let you use a SELECT alias inside WHERE: the alias hasn't been created yet when WHERE runs. HAVING comes after GROUP BY so it can reference aggregate results like SUM(). ORDER BY is the last of these six steps; if the query has a LIMIT, it is applied after the sort.

108

🌍 Newbie▰ MCQLEFT JOIN

What does a LEFT JOIN return when there's no match?

LEFT JOIN always keeps all rows from the left table — but what about the right side?

You LEFT JOIN customers (5 rows) with orders (3 rows). Two customers have no orders at all. How many rows appear in the result?

LEFT JOIN keeps every row from the LEFT table (customers), regardless of whether a match exists in the right table (orders). For the 2 customers with no orders, the order columns appear as NULL in the result. This is the classic way to find "customers with no orders": WHERE orders.id IS NULL after a LEFT JOIN.
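
The row counts can be reproduced with a short sketch, assuming Python's sqlite3 module. The customer names and order ids are hypothetical; the shape (5 customers, 3 orders, 2 customers without orders) follows the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Arjun"), (2, "Priya"), (3, "Raj"), (4, "Simran"), (5, "Karan")],
)
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(101, 1), (102, 2), (103, 3)])

# LEFT JOIN: every customer survives; missing orders show up as NULL (None).
left_rows = conn.execute(
    "SELECT c.name, o.id FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# The rows with no match are exactly those where the order side is NULL.
unmatched = [name for name, order_id in left_rows if order_id is None]

# INNER JOIN on the same data keeps only the matched pairs.
inner_count = conn.execute(
    "SELECT COUNT(*) FROM customers c JOIN orders o ON o.customer_id = c.id"
).fetchone()[0]

print(len(left_rows), unmatched, inner_count)
```

With this data the LEFT JOIN yields 5 rows (3 matched, 2 padded with NULL), while the INNER JOIN yields 3.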

109

🌍 Newbie▰ MCQCOUNT

COUNT(*) vs COUNT(column) — do they always give the same result?

One of them ignores NULLs; the other doesn't

A status column has 10 rows, but 3 of them are NULL. What does COUNT(status) return?

COUNT(column_name) skips NULL values — only non-NULL rows are counted. With 3 NULLs out of 10 rows, it returns 7. COUNT(*) counts every row regardless of NULLs — it returns 10. This is a common interview trick: use COUNT(*) for total rows, COUNT(col) for non-null values in a specific column.
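
A quick check of both counts, sketched with Python's sqlite3 module. The table name tickets and the status values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT)")

# 7 rows with a status and 3 rows where status is NULL (None in Python).
statuses = ["open"] * 4 + ["closed"] * 3 + [None] * 3
conn.executemany("INSERT INTO tickets (status) VALUES (?)", [(s,) for s in statuses])

total_rows, non_null_status = conn.execute(
    "SELECT COUNT(*), COUNT(status) FROM tickets"
).fetchone()
print(total_rows, non_null_status)
```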

110

🌍 Newbie▰ MCQLIKE

LIKE wildcards — % vs _ — which matches more?

One is greedy (matches any length), one is strict (matches exactly one)

Which pattern matches any name that starts with 'A' and ends with 'n', with any number of characters in between (like 'Arjun', 'Adrian', or even just 'An')?

% matches any sequence of characters (including zero). So A%n matches 'An', 'Arjun', 'Adrian' — anything starting with A and ending with n. _ matches exactly one character — A_n only matches 3-character names like 'Abn'. * is not a SQL wildcard (it's a glob pattern from shell/other tools). %A%n% would match any string that contains 'A' somewhere and 'n' somewhere after it.
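
The two wildcards can be compared side by side in a minimal sketch, assuming Python's sqlite3 module and a hypothetical people table. Note that SQLite's LIKE ignores case for ASCII letters by default, which does not affect these two patterns on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("An",), ("Arjun",), ("Adrian",), ("Abn",), ("Anita",), ("Rohan",)],
)

def names_like(pattern):
    """Return the names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name LIKE ? ORDER BY name", (pattern,)
    )
    return [row[0] for row in rows]

percent_matches = names_like("A%n")     # any length between 'A' and 'n'
underscore_matches = names_like("A_n")  # exactly one character in between
print(percent_matches, underscore_matches)
```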

111

🌍 Newbie▰ MCQORDER BY

ORDER BY without ASC or DESC — what's the default?

SQL has a default sort direction when you don't specify

You write SELECT name FROM employees ORDER BY salary without specifying ASC or DESC. In what order are results returned?

ORDER BY defaults to ASC (ascending) — smallest to largest for numbers, A–Z for text. You must explicitly write DESC if you want descending order. Without any ORDER BY, SQL makes no guarantee about row order — it depends on the query plan and can change between runs, so never assume insertion order.

112

🌍 Newbie▰ MCQUNION

UNION vs UNION ALL — which removes duplicates?

One stacks rows as-is; the other runs a de-duplication step

Table A has 3 rows, Table B has 3 rows. One row exists in both tables. UNION gives ___ rows; UNION ALL gives ___ rows.
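
The blanks can be worked out by running both operators on a tiny example. This sketch assumes Python's sqlite3 module; the table names a and b and their values are placeholders, with the value 3 present in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (val INTEGER)")
conn.execute("CREATE TABLE b (val INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?)", [(3,), (4,), (5,)])

# UNION runs a de-duplication step, so the shared row appears once.
union_rows = conn.execute(
    "SELECT val FROM a UNION SELECT val FROM b ORDER BY val").fetchall()

# UNION ALL simply stacks the two results, duplicates included.
union_all_rows = conn.execute(
    "SELECT val FROM a UNION ALL SELECT val FROM b ORDER BY val").fetchall()

print(len(union_rows), len(union_all_rows))
```

Here UNION returns 5 rows and UNION ALL returns 6.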

113

🌍 Newbie▰ MCQPRIMARY KEY

What rules does a PRIMARY KEY enforce?

PRIMARY KEY is actually a combination of two other constraints

Table students has id as PRIMARY KEY. Row with id=1 already exists. You try:
INSERT INTO students VALUES(1, 'Riya') — what happens? And which general rule does that reveal about PRIMARY KEY?

The INSERT fails with a UNIQUE constraint error: id=1 already exists, and PRIMARY KEY enforces that every value must be unique AND non-null. It doesn't matter what the name column says; the entire row is rejected because of the duplicate key. PRIMARY KEY works on any data type (INT, TEXT, etc.). A table can have only one primary key, though it can span multiple columns as a composite key.
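
The failing INSERT can be reproduced in a few lines, sketched here with Python's sqlite3 module. The first row's name ('Arjun') is an assumption; the question only says that id=1 already exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students VALUES (1, 'Arjun')")  # the existing row

try:
    conn.execute("INSERT INTO students VALUES (1, 'Riya')")  # duplicate key
    pk_error = None
except sqlite3.IntegrityError as exc:
    pk_error = str(exc)

# The rejected row was never stored; the original row is untouched.
remaining = conn.execute("SELECT id, name FROM students").fetchall()
print(pk_error, remaining)
```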

114

🌍 Newbie▰ MCQINNER JOIN

INNER JOIN vs LEFT JOIN — which drops unmatched rows?

One join type is strict (both must match); the other is lenient (keep left regardless)

You have 5 customers and 3 orders. Two customers have never ordered. You INNER JOIN customers to orders. How many rows appear in the result?

115

🌍 Newbie▰ MCQBETWEEN

Is BETWEEN inclusive or exclusive at the boundaries?

Does WHERE age BETWEEN 18 AND 25 include 18 and 25 themselves?

A student table has ages 17, 18, 20, 25, 26. Query: WHERE age BETWEEN 18 AND 25. Which ages appear in the result?

BETWEEN is inclusive on BOTH ends. BETWEEN 18 AND 25 is exactly equivalent to >= 18 AND <= 25. So 18 and 25 are both included. This surprises many beginners who assume exclusive boundaries (like Python's range()). If you need exclusive bounds, use > 18 AND < 25 explicitly.

116

🌍 Newbie▰ MCQGROUP BY

GROUP BY rule — which SELECT causes an error?

Every column in SELECT must either be in GROUP BY or inside an aggregate function

Table employees before grouping:

id	name	department	salary
1	Arjun	HR	40000
2	Priya	HR	55000
3	Ravi	IT	70000

After GROUP BY department, HR collapses to 1 row. SQL can't decide which name (Arjun or Priya?) to show. Which SELECT line causes an error?

117

🌍 Newbie▰ MCQDELETE vs DROP

DELETE FROM vs DROP TABLE — what's the difference?

One removes data; the other removes the entire object including its structure

After running DELETE FROM employees, what is left in the database?

DELETE FROM removes rows but keeps the table shell. After DELETE FROM employees, you can still do INSERT INTO employees — the columns, data types, and indexes still exist. DROP TABLE employees removes everything: structure + data, irreversibly. TRUNCATE TABLE employees (not in SQLite) is a faster version of DELETE that empties the table but keeps the structure.
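
A minimal sketch of the difference, assuming Python's sqlite3 module. The rows are illustrative, and sqlite_master is SQLite's own catalog table, queried here to see whether the table definition still exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Arjun"), (2, "Priya")])

def table_exists(name):
    """Check SQLite's catalog for a table with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None

# DELETE removes the rows but leaves the table definition in place.
conn.execute("DELETE FROM employees")
rows_after_delete = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
exists_after_delete = table_exists("employees")
conn.execute("INSERT INTO employees VALUES (3, 'Raj')")  # still works

# DROP removes the definition too, so the table is gone from the catalog.
conn.execute("DROP TABLE employees")
exists_after_drop = table_exists("employees")
print(rows_after_delete, exists_after_delete, exists_after_drop)
```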

118

🌍 Newbie▰ MCQCOALESCE

What does COALESCE return?

COALESCE scans its arguments left to right and returns the first one that isn't NULL

A row has phone = NULL and email = 'p@co.in'. What does COALESCE(phone, email, 'No contact') return?

COALESCE returns the first non-NULL argument. It scans left to right: phone is NULL → skip. email is 'p@co.in' → return it. 'No contact' is never reached. If both phone and email were NULL, it would return 'No contact'. COALESCE is the standard SQL replacement for "if null, use this fallback" logic. Use it to replace NULLs in reports: COALESCE(discount, 0).

119

🌍 Newbie▰ MCQCASE WHEN

CASE WHEN — which branch runs first?

CASE evaluates conditions top to bottom and stops at the first match

A student scores 85. What does this return?
CASE WHEN score >= 90 THEN 'A' WHEN score >= 80 THEN 'B' ELSE 'C' END
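
The expression can be evaluated directly, without a table, in this sketch using Python's sqlite3 module. The helper name grade is hypothetical; the score is bound twice because the expression mentions it twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

GRADE_SQL = (
    "SELECT CASE WHEN ? >= 90 THEN 'A' "
    "WHEN ? >= 80 THEN 'B' ELSE 'C' END"
)

def grade(score):
    """Evaluate the CASE expression for one score."""
    return conn.execute(GRADE_SQL, (score, score)).fetchone()[0]

# 85 fails the first test (>= 90) and stops at the second (>= 80).
# 95 also satisfies >= 80, but CASE never gets that far.
results = {score: grade(score) for score in (95, 85, 70)}
print(results)
```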

120

🌍 Newbie▰ MCQFOREIGN KEY

What does a FOREIGN KEY constraint enforce?

FOREIGN KEY links two tables — it prevents orphan rows

orders.customer_id has a FOREIGN KEY referencing customers.id. What does this prevent?
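
What the constraint rejects can be seen in a short sketch, assuming Python's sqlite3 module. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection; the column list and the sample ids (including the non-existent customer 99) are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(id),"
    " amount INTEGER)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Arjun')")
conn.execute("INSERT INTO orders VALUES (101, 1, 500)")  # customer 1 exists

try:
    # Customer 99 does not exist, so this would be an orphan row.
    conn.execute("INSERT INTO orders VALUES (102, 99, 300)")
    fk_error = None
except sqlite3.IntegrityError as exc:
    fk_error = str(exc)

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(fk_error, order_count)
```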

121

🌍 Newbie▰ MCQINDEX

What does adding an INDEX to a column primarily improve?

Think of an index as a book's index — it helps you find things faster, but takes space

You add CREATE INDEX idx_email ON users(email). What is the primary benefit?
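
One way to see the effect is to ask SQLite for its query plan before and after creating the index. This is a sketch using Python's sqlite3 module; the users columns other than email are assumptions, and the exact plan wording differs between SQLite versions, so the code only checks whether the index name appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

lookup = "SELECT name FROM users WHERE email = 'a@example.com'"

plan_before = plan(lookup)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_email ON users(email)")
plan_after = plan(lookup)   # the planner can now search via idx_email

print(plan_before)
print(plan_after)
```

The lookup on email becomes an index search instead of a full scan; the cost is extra storage and slightly slower writes, since the index must be maintained.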

122

🌍 Newbie▰ MCQSELF JOIN

When do you need a SELF JOIN?

A self join is when a table joins to itself — used for hierarchies and same-table comparisons

Table employees — notice manager_id points to another row in the same table:

id	name	salary	manager_id
1	Sunita	90000	NULL
2	Arjun	60000	1
3	Priya	95000	1

Priya (₹95k) earns more than her manager Sunita (₹90k). To find such employees, you use:

A SELF JOIN joins a table to itself using two different aliases: FROM employees e JOIN employees m ON e.manager_id = m.id WHERE e.salary > m.salary. Alias e = the employee row; alias m = the manager row. This puts both in the same query row so you can compare their salaries. Self joins are used for hierarchies (org charts), finding pairs (matching rows), and comparing a row against related rows in the same table.
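
The query from the explanation can be run against the three rows shown in the question. This is a sketch using Python's sqlite3 module with an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT,"
    " salary INTEGER, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Sunita", 90000, None), (2, "Arjun", 60000, 1), (3, "Priya", 95000, 1)],
)

# e is the employee row, m is the same table read again as the manager row.
earns_more = conn.execute(
    "SELECT e.name, e.salary, m.name, m.salary "
    "FROM employees e JOIN employees m ON e.manager_id = m.id "
    "WHERE e.salary > m.salary"
).fetchall()
print(earns_more)
```

Sunita has no manager (manager_id is NULL), so she finds no partner row in the join and drops out automatically.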

123

🌍 Newbie▰ MCQLIMIT

How do you get only the top N results in SQLite?

LIMIT restricts how many rows come back — it works together with ORDER BY

You want the 3 most expensive products from the products table. Which query is correct in SQLite?

SQLite (and MySQL/PostgreSQL) uses LIMIT n at the end of the query. ORDER BY must come BEFORE LIMIT — you sort first, then take the top rows. Option C has them reversed (syntax error in most databases). Option A uses TOP n which is SQL Server / MS Access syntax. Option D uses FIRST which is not standard SQL. LIMIT also accepts an OFFSET: LIMIT 3 OFFSET 6 skips the first 6 rows and takes the next 3 — useful for pagination.
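
Both forms can be tried in a minimal sketch, assuming Python's sqlite3 module. The product rows are borrowed from the ORDER BY exercise earlier on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "Pen", 10), (2, "Notebook", 45), (3, "Bag", 350),
     (4, "Eraser", 5), (5, "Water Bottle", 120)],
)

# Sort first, then keep the first three rows of the sorted result.
top_three = conn.execute(
    "SELECT name, price FROM products ORDER BY price DESC LIMIT 3"
).fetchall()

# OFFSET skips rows of the sorted result before LIMIT starts counting.
next_page = conn.execute(
    "SELECT name, price FROM products ORDER BY price DESC LIMIT 2 OFFSET 3"
).fetchall()

print(top_three, next_page)
```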

124

🌍 Newbie▰ MCQSubquery

What does a scalar subquery return?

A subquery in WHERE produces a value — what type of value?

Query: SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees). What does the inner SELECT AVG(salary) return?

A scalar subquery returns exactly one value (one row, one column). AVG(salary) on the whole table produces a single number — say ₹62,500. The outer query then compares each row's salary against that number. You can absolutely use SELECT inside WHERE — it's called a subquery or nested query and is fundamental SQL. The inner query runs once, the outer query uses its result to filter rows.

125

🌍 Newbie▰ MCQEXISTS

What does WHERE EXISTS check?

EXISTS doesn't care about values — it only asks one question

WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.id) — what does this check for each customer row?

EXISTS returns TRUE if the subquery produces at least one row — it doesn't care about the values. That's why SELECT 1 is written instead of selecting a real column — EXISTS only checks "did any row come back?" For each customer, SQL runs the inner query. If any order matches → customer passes the EXISTS check. NOT EXISTS is the opposite — keep customers where NO matching order exists (equivalent to LEFT JOIN + IS NULL).
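
EXISTS and NOT EXISTS can be compared on a small data set. This sketch assumes Python's sqlite3 module; the rows mirror the LEFT JOIN exercise earlier on this page (orders for customers 1 and 3 only).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Arjun"), (2, "Priya"), (3, "Raj"), (4, "Simran")],
)
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(101, 1), (102, 3)])

# The correlated subquery is checked per customer; only "any row?" matters.
CORRELATED = "(SELECT 1 FROM orders WHERE orders.customer_id = customers.id)"

with_orders = [r[0] for r in conn.execute(
    f"SELECT name FROM customers WHERE EXISTS {CORRELATED} ORDER BY name")]
without_orders = [r[0] for r in conn.execute(
    f"SELECT name FROM customers WHERE NOT EXISTS {CORRELATED} ORDER BY name")]

print(with_orders, without_orders)
```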

BeginnerSELECT

Find Duplicate Emails

Find all emails that appear more than once in the Person table

Write a SQL query to find all duplicate email addresses in the Person table. An email is a duplicate if it appears more than once.

Schema

Person

id	email
1	a@b.com
2	c@d.com
3	a@b.com

Solution

SELECT email
FROM Person
GROUP BY email
HAVING COUNT(email) > 1;

BeginnerJOINs

Customers Who Never Ordered

Find customers with no matching order record

Find all customers who have never placed an order. A classic LEFT JOIN + NULL check pattern asked in almost every SQL interview.

Schema

Customers

id	name
1	Alice
2	Bob
3	Carol

Orders

id	customerId
1	1
2	1

Solution

SELECT c.name AS Customers
FROM Customers c
LEFT JOIN Orders o ON c.id = o.customerId
WHERE o.id IS NULL;

MicrosoftBeginner

Employees Earning More Than Their Manager

Self-join to compare employee salary with their manager's

Write a query to find employees whose salary is higher than their direct manager's salary. Uses a self-join on the Employee table using managerId.

Schema

Employee

id	name	salary	managerId
1	Joe	70000	3
2	Henry	80000	4
3	Sam	60000	NULL
4	Max	90000	NULL

Solution

SELECT e.name AS Employee
FROM Employee e
JOIN Employee m ON e.managerId = m.id
WHERE e.salary > m.salary;

AmazonBeginnerAggregates

Average Product Rating Per Category

GROUP BY + AVG with ROUND — DataLemur classic

Calculate the average star rating per product category, rounded to 2 decimal places, ordered by category name. A common warm-up question in Amazon data analyst rounds.

Schema

reviews

review_id	product_id	stars
1	101	5
2	102	3
3	101	4

products

product_id	category
101	Electronics
102	Books

Solution

SELECT p.category,
  ROUND(AVG(r.stars), 2) AS avg_rating
FROM reviews r
JOIN products p ON r.product_id = p.product_id
GROUP BY p.category
ORDER BY p.category;

GoogleBeginner

Rising Temperature

Self-join Weather to itself — compare each day's temp to the day before using julianday()

Find all dates where the temperature was higher than the previous day. Use a self-join on the Weather table and julianday() arithmetic to link each row to the row exactly 1 day earlier.

Schema

Weather

id	recordDate	temperature
1	2023-01-01	10
2	2023-01-02	25
3	2023-01-03	20
4	2023-01-04	30

Solution

SELECT w1.id
FROM Weather w1
JOIN Weather w2
  ON julianday(w1.recordDate) - julianday(w2.recordDate) = 1
WHERE w1.temperature > w2.temperature;

ClassicBeginnerSELECT

Second Highest Salary

Asked in almost every SQL interview — multiple approaches

Write a query to find the second highest distinct salary. If there is no second highest salary, return NULL. This is one of the most frequently asked SQL questions, so it is worth knowing all three approaches.

Schema

Employee

id	salary
1	100
2	200
3	300
4	300

Solution (3 approaches)

-- Approach 1: Subquery (most readable)
SELECT MAX(salary) AS SecondHighestSalary
FROM Employee
WHERE salary < (SELECT MAX(salary) FROM Employee);

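-- Approach 2: LIMIT + OFFSET
-- Wrapped in a scalar subquery so an empty result becomes NULL
SELECT (
  SELECT DISTINCT salary
  FROM Employee
  ORDER BY salary DESC
  LIMIT 1 OFFSET 1
) AS SecondHighestSalary;

-- Approach 3: Window function (preferred in interviews)
-- MAX() collapses ties to one row and yields NULL when no row has rnk = 2
SELECT MAX(salary) AS SecondHighestSalary
FROM (
  SELECT salary,
         DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM Employee
) t
WHERE rnk = 2;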
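
The "return NULL" requirement is easy to miss, so here is a small sketch of how the query shapes behave when no second salary exists. It assumes SQLite via Python's sqlite3 module; the helper name second_highest is hypothetical.

```python
import sqlite3

def second_highest(salaries):
    """Return (aggregate, bare LIMIT/OFFSET, scalar-subquery) results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, salary INTEGER)")
    conn.executemany("INSERT INTO Employee (salary) VALUES (?)",
                     [(s,) for s in salaries])
    # An aggregate over an empty set still yields one row (NULL).
    agg = conn.execute(
        "SELECT MAX(salary) FROM Employee "
        "WHERE salary < (SELECT MAX(salary) FROM Employee)"
    ).fetchall()
    # A bare LIMIT/OFFSET yields zero rows when nothing is left to return.
    bare = conn.execute(
        "SELECT DISTINCT salary FROM Employee "
        "ORDER BY salary DESC LIMIT 1 OFFSET 1"
    ).fetchall()
    # Wrapping it as a scalar subquery turns "no row" into NULL.
    wrapped = conn.execute(
        "SELECT (SELECT DISTINCT salary FROM Employee "
        "ORDER BY salary DESC LIMIT 1 OFFSET 1)"
    ).fetchall()
    conn.close()
    return agg, bare, wrapped

sample = second_highest([100, 200, 300, 300])   # sample table above
single = second_highest([100])                  # hypothetical one-row table
print(sample)
print(single)
```

On the one-row table the bare LIMIT/OFFSET form returns an empty result rather than a NULL row, which is the difference interviewers usually probe.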

FlipkartBeginnerAggregates

Top 5 Customers by Order Count

COUNT + GROUP BY + ORDER BY + LIMIT

Find the top 5 customers by total number of orders placed. Show their name and order count, highest first. Frequently asked in Flipkart/e-commerce data rounds.

Schema

customers

id	name
1	Alice
2	Bob
3	Carol
…	…

orders

id	customer_id	amount
1	1	500
2	1	300
3	2	700
…	…	…

Solution

SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY order_count DESC
LIMIT 5;

AmazonIntermediate

Department Top 3 Salaries

DENSE_RANK() OVER (PARTITION BY department)

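Find the top 3 unique salaries in each department. This is one of the most commonly asked window function questions. DENSE_RANK gives tied salaries the same rank without skipping the next rank, so every employee earning one of the top 3 distinct salaries is returned.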

Schema

Employee

id	name	salary	deptId
1	Alice	90000	1
2	Bob	80000	1
3	Carol	70000	2

Department

id	name
1	Engineering
2	Marketing

Solution

WITH ranked AS (
  SELECT d.name AS Department,
         e.name AS Employee,
         e.salary,
         DENSE_RANK() OVER (
           PARTITION BY e.deptId
           ORDER BY e.salary DESC
         ) AS rnk
  FROM Employee e
  JOIN Department d ON e.deptId = d.id
)
SELECT Department, Employee, salary
FROM ranked
WHERE rnk <= 3;

MicrosoftIntermediateSubqueries

Employees with Salary Above Department Average

Correlated subquery or CTE approach

Find employees whose salary is above the average salary of their own department. Classic correlated subquery or CTE — commonly asked at Microsoft, LinkedIn, and Oracle.

Schema

employees

id	name	dept	salary
1	Alice	Eng	90000
2	Bob	Eng	70000
3	Carol	HR	60000
4	Dave	HR	55000

Solution

WITH dept_avg AS (
  SELECT dept, AVG(salary) AS avg_sal
  FROM employees
  GROUP BY dept
)
SELECT e.name, e.dept, e.salary
FROM employees e
JOIN dept_avg d ON e.dept = d.dept
WHERE e.salary > d.avg_sal
ORDER BY e.dept, e.salary DESC;
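
The description mentions a correlated subquery as the alternative to the CTE. A minimal sketch of that variant follows, assuming SQLite through Python's sqlite3 module and the sample rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Alice', 'Eng', 90000),
        (2, 'Bob',   'Eng', 70000),
        (3, 'Carol', 'HR',  60000),
        (4, 'Dave',  'HR',  55000);
""")

# The inner query is re-evaluated for each outer row's department.
correlated_sql = """
    SELECT e.name, e.dept, e.salary
    FROM employees e
    WHERE e.salary > (
        SELECT AVG(e2.salary)
        FROM employees e2
        WHERE e2.dept = e.dept
    )
    ORDER BY e.dept, e.salary DESC
"""
rows = conn.execute(correlated_sql).fetchall()
print(rows)
```

The CTE computes each department average once, while the correlated form states the comparison more directly; which one runs faster depends on the engine's optimizer.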

MetaIntermediateDate

Month-over-Month Revenue Growth

LAG() window function for period comparison

Calculate the month-over-month revenue growth percentage. Use LAG() to access the previous month's revenue and compute the percentage change.

Schema

monthly_orders

order_id	amount	order_date
1	1000	2024-01-15
2	800	2024-01-28
3	1500	2024-02-10
4	900	2024-02-22
5	1200	2024-03-05
6	1100	2024-03-18

Solution

WITH monthly AS (
  SELECT
    strftime('%Y-%m', order_date) AS month,
    SUM(amount) AS revenue
  FROM monthly_orders
  GROUP BY 1
)
SELECT
  month,
  revenue,
  LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
  ROUND(
    (revenue - LAG(revenue) OVER (ORDER BY month))
    * 100.0
    / LAG(revenue) OVER (ORDER BY month), 2
  ) AS growth_pct
FROM monthly;

GoogleIntermediateWindow

7-Day Rolling Average of Daily Active Users

AVG() OVER (ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

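Calculate a 7-day rolling average of daily active users. Asked extensively at Google, Twitter, and analytics-heavy companies. Tests window frame knowledge: ROWS BETWEEN 6 PRECEDING counts rows, not calendar days, so the solution assumes exactly one row per date with no missing days.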

Schema

user_activity

activity_date	active_users
2024-01-01	1200
2024-01-02	1350
2024-01-03	980

Solution

SELECT
  activity_date,
  active_users,
  ROUND(
    AVG(active_users) OVER (
      ORDER BY activity_date
      ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ), 2
  ) AS rolling_7d_avg
FROM user_activity
ORDER BY activity_date;

SwiggyIntermediateCTEs

Restaurants with Declining Orders Month-over-Month

LAG + filter to find consecutive decline — Swiggy SQL round

Find all restaurants where the order count this month is lower than the previous month. Directly from Swiggy data analyst interview rounds.

Schema

restaurant_orders

restaurant_id	month	order_count
R1	2024-01	500
R1	2024-02	420
R2	2024-01	300
R2	2024-02	350

Solution

WITH monthly AS (
  SELECT
    restaurant_id, month, order_count,
    LAG(order_count) OVER (
      PARTITION BY restaurant_id
      ORDER BY month
    ) AS prev_count
  FROM restaurant_orders
)
SELECT restaurant_id, month, order_count, prev_count
FROM monthly
WHERE order_count < prev_count;

FlipkartIntermediateSubqueries

Customers Who Ordered in Both 2023 and 2024

INTERSECT or double EXISTS/IN — Flipkart retention analysis

Find customers who placed at least one order in both 2023 and 2024 — a classic cohort retention question asked in Flipkart and e-commerce data interviews.

Schema

orders

order_id	customer_id	order_date
1	C1	2023-06-10
2	C1	2024-02-15
3	C2	2024-03-01

Solution

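-- strftime('%Y', ...) is used instead of MySQL's YEAR() to match the
-- SQLite date functions used in the other solutions
SELECT customer_id
FROM orders
WHERE strftime('%Y', order_date) = '2023'

INTERSECT

SELECT customer_id
FROM orders
WHERE strftime('%Y', order_date) = '2024';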
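
The subtitle also names an EXISTS-based alternative to INTERSECT. The sketch below shows one way it could look, assuming a SQLite engine (hence strftime for the year) driven from Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 'C1', '2023-06-10'),
        (2, 'C1', '2024-02-15'),
        (3, 'C2', '2024-03-01');
""")

# Keep customers with a 2023 order for whom a 2024 order also exists.
exists_sql = """
    SELECT DISTINCT o.customer_id
    FROM orders o
    WHERE strftime('%Y', o.order_date) = '2023'
      AND EXISTS (
          SELECT 1
          FROM orders o2
          WHERE o2.customer_id = o.customer_id
            AND strftime('%Y', o2.order_date) = '2024'
      )
    ORDER BY o.customer_id
"""
retained = [r[0] for r in conn.execute(exists_sql)]
print(retained)
```

DISTINCT is needed here because a customer with several 2023 orders would otherwise appear once per order; INTERSECT removes duplicates on its own.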

ClassicIntermediate

Rank Scores Without Gaps (DENSE_RANK)

RANK vs DENSE_RANK — a must-know difference

Rank all scores from highest to lowest. If two scores are tied, they should have the same rank. The next rank after a tie should NOT skip numbers — this is what DENSE_RANK does.

Schema

Scores

id	score
1	3.50
2	3.65
3	4.00
4	3.65

Solution

SELECT
  score,
  DENSE_RANK() OVER (ORDER BY score DESC) AS rank
FROM Scores
ORDER BY score DESC;

MicrosoftIntermediateWindow

Running Total of Sales by Date

SUM() OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)

Calculate a cumulative/running total of daily sales ordered by date. The most commonly tested window frame concept at Microsoft and Oracle interviews.

Schema

sales

sale_date	amount
2024-01-01	1000
2024-01-02	1500
2024-01-03	800

Solution

SELECT
  sale_date,
  amount,
  SUM(amount) OVER (
    ORDER BY sale_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM sales;
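
Why the frame says ROWS rather than RANGE only shows up when dates repeat. The sketch below assumes SQLite via Python's sqlite3 module and adds one hypothetical extra sale on 2024-01-02 to the sample data to create a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('2024-01-01', 1000),
        ('2024-01-02', 1500),
        ('2024-01-02', 500),   -- hypothetical second sale on the same day
        ('2024-01-03', 800);
""")

sql = """
    SELECT sale_date, amount,
           SUM(amount) OVER (
               ORDER BY sale_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS rows_total,
           SUM(amount) OVER (
               ORDER BY sale_date
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS range_total
    FROM sales
    ORDER BY sale_date
"""
results = conn.execute(sql).fetchall()
rows_totals = [r[2] for r in results]    # one step per physical row
range_totals = [r[3] for r in results]   # tied dates share one total
print(rows_totals, range_totals)
```

With RANGE, both rows dated 2024-01-02 report the total for the whole day, while ROWS advances one row at a time. The order of tied rows under ROWS is not guaranteed unless the ORDER BY includes a tiebreaker.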

AmazonExpert

Median Employee Salary per Department

ROW_NUMBER + COUNT — no PERCENTILE_CONT in MySQL

Find the median salary of employees in each department (the sample schema stores the grouping column as company). MySQL has no built-in MEDIAN, so the solution combines row numbering with even/odd count logic. Frequently asked at Amazon and for top analytics roles.

Schema

Employee

id	company	salary
1	A	2341
2	A	341
3	A	15000
4	B	15000

Solution

WITH ranked AS (
  SELECT id, company, salary,
    ROW_NUMBER() OVER (PARTITION BY company ORDER BY salary) AS rn,
    COUNT(*) OVER (PARTITION BY company) AS cnt
  FROM Employee
)
SELECT id, company, salary
FROM ranked
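-- Divide by 2.0: with integer division (SQLite, PostgreSQL) FLOOR and CEIL
-- would be equal and the upper middle row would be lost when cnt is even
WHERE rn IN (
  FLOOR((cnt + 1) / 2.0),
  CEIL((cnt + 1) / 2.0)
);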
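
The even/odd logic can also be written with integer arithmetic only. The sketch below assumes SQLite (where / on two integers truncates) run through Python's sqlite3 module; the row with id 5 is a hypothetical addition so that company B has an even count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (id INTEGER PRIMARY KEY, company TEXT, salary INTEGER);
    INSERT INTO Employee VALUES
        (1, 'A', 2341),
        (2, 'A', 341),
        (3, 'A', 15000),
        (4, 'B', 15000),
        (5, 'B', 10000);  -- hypothetical row: gives B an even row count
""")

# Middle positions: odd cnt -> one row, even cnt -> the two central rows.
# (cnt + 1) / 2 and (cnt + 2) / 2 rely on truncating integer division.
median_sql = """
    WITH ranked AS (
        SELECT company, salary,
               ROW_NUMBER() OVER (PARTITION BY company ORDER BY salary) AS rn,
               COUNT(*) OVER (PARTITION BY company) AS cnt
        FROM Employee
    )
    SELECT company, AVG(salary) AS median_salary
    FROM ranked
    WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
    GROUP BY company
    ORDER BY company
"""
medians = conn.execute(median_sql).fetchall()
print(medians)
```

In MySQL the / operator returns a decimal, so the same positions would need DIV (integer division) instead.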

GoogleExpert

Human Traffic of Stadium — 3+ Consecutive Days

Row numbering gap trick to find consecutive sequences

Find all rows where 3 or more consecutive days had stadium traffic ≥ 100 people. The key insight: id - ROW_NUMBER() is constant for consecutive rows — the famous "gap-and-island" trick.

Schema

Stadium

id	visit_date	people
1	2024-01-01	10
2	2024-01-02	109
3	2024-01-03	150
4	2024-01-04	99
5	2024-01-05	145
6	2024-01-06	200
7	2024-01-07	120

Solution (Gap-and-Island)

WITH high_traffic AS (
  SELECT id, visit_date, people,
    id - ROW_NUMBER() OVER (ORDER BY id) AS grp
  FROM Stadium
  WHERE people >= 100
),
-- named long_runs because GROUPS is a reserved word in MySQL 8
long_runs AS (
  SELECT grp
  FROM high_traffic
  GROUP BY grp
  HAVING COUNT(*) >= 3
)
SELECT h.id, h.visit_date, h.people
FROM high_traffic h
JOIN long_runs g ON h.grp = g.grp
ORDER BY h.visit_date;
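
To see why id - ROW_NUMBER() stays constant within a run, it helps to print the key itself. This sketch assumes SQLite via Python's sqlite3 module and uses the sample Stadium rows; the grouping step is done in Python purely for illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Stadium (id INTEGER PRIMARY KEY, visit_date TEXT, people INTEGER);
    INSERT INTO Stadium VALUES
        (1, '2024-01-01', 10),  (2, '2024-01-02', 109),
        (3, '2024-01-03', 150), (4, '2024-01-04', 99),
        (5, '2024-01-05', 145), (6, '2024-01-06', 200),
        (7, '2024-01-07', 120);
""")

# Each filtered row gets its island key: id minus its position among kept rows.
keyed = conn.execute("""
    SELECT id, people, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM Stadium
    WHERE people >= 100
    ORDER BY id
""").fetchall()
print(keyed)

# Collect ids per key and keep islands with at least 3 rows.
islands = defaultdict(list)
for row_id, _people, grp in keyed:
    islands[grp].append(row_id)
long_run_ids = sorted(i for ids in islands.values() if len(ids) >= 3 for i in ids)
print(long_run_ids)
```

Skipping id 4 (99 people) makes the key jump from 1 to 2, which is what separates the two islands.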

ClassicExpert

Trips and Users — Cancellation Rate

Filter unbanned users + conditional aggregation

Calculate the cancellation rate of requests made by unbanned users between two dates. This multi-join + conditional aggregation problem appears in Uber, Lyft, and analytics interviews.

Schema

Trips

id	client_id	status	request_at
1	1	completed	2024-10-01
2	2	cancelled_by_driver	2024-10-01

Users

users_id	banned	role
1	No	client
2	Yes	client

Solution

SELECT
  t.request_at AS Day,
  ROUND(
    SUM(CASE WHEN t.status != 'completed' THEN 1 ELSE 0 END)
    * 1.0 / COUNT(*), 2
  ) AS 'Cancellation Rate'
FROM Trips t
JOIN Users u ON t.client_id = u.users_id
  AND u.banned = 'No'
WHERE t.request_at BETWEEN '2024-10-01' AND '2024-10-03'
GROUP BY t.request_at;

AmazonExpertWindow

Managers with 5+ Direct Reports

Self-join + HAVING COUNT

Find the names of managers who have at least 5 direct reports. A self-join aggregation problem asked heavily at Amazon, Google, and LinkedIn.

Schema

Employee

id	name	managerId
101	Alice	NULL
102	Bob	101
103	Carol	101
104	Dave	101
105	Eve	101
106	Frank	101

Solution

SELECT m.name
FROM Employee e
JOIN Employee m ON e.managerId = m.id
GROUP BY m.id, m.name
HAVING COUNT(e.id) >= 5;

MetaExpertCTEs

Consecutive Login Streak per User

DATE - ROW_NUMBER() gap trick for consecutive dates

Find each user's longest consecutive daily login streak. This uses the classic gap-and-island technique: subtracting ROW_NUMBER from the date to group consecutive days. Asked at Meta and Google.

Schema

logins

user_id	login_date
1	2024-01-01
1	2024-01-02
1	2024-01-03
1	2024-01-05
2	2024-01-01
2	2024-01-02

Solution

WITH gaps AS (
  SELECT user_id, login_date,
    date(login_date, '-' || ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY login_date
    ) || ' days') AS grp
  FROM logins
),
streaks AS (
  SELECT user_id, grp, COUNT(*) AS streak_len
  FROM gaps
  GROUP BY user_id, grp
)
SELECT user_id, MAX(streak_len) AS longest_streak
FROM streaks
GROUP BY user_id;

⚡ Interview BoostIntermediateAggregates

Zomato Sales Rep Contest — Build the Leaderboard

Each deal row records two sales reps who competed and who closed it — derive the full standings table

Zomato's sales team runs a head-to-head deal contest each quarter. Two reps compete per restaurant deal, and only one wins it. Given the raw contest results, build the full leaderboard showing each rep's deals played, won, lost, and points (2 per win). The catch: a rep can appear as either rep_1 or rep_2 — both columns must be counted.

Schema

deal_contest

rep_1	rep_2	winner
Rahul	Priya	Rahul
Priya	Ankit	Ankit
Sneha	Vikram	Sneha
Vikram	Rahul	Rahul
Ankit	Sneha	Ankit

Expected Output

rep	played	won	lost	points
Rahul	2	2	0	4
Ankit	2	2	0	4
Sneha	2	1	1	2
Priya	2	0	2	0
Vikram	2	0	2	0

Solution

-- UNION ALL gives each rep both sides of every deal
WITH all_deals AS (
  SELECT rep_1 AS rep,
         CASE WHEN winner = rep_1 THEN 1 ELSE 0 END AS win
  FROM deal_contest
  UNION ALL
  SELECT rep_2 AS rep,
         CASE WHEN winner = rep_2 THEN 1 ELSE 0 END AS win
  FROM deal_contest
)
SELECT
  rep,
  COUNT(*)              AS played,
  SUM(win)              AS won,
  COUNT(*) - SUM(win)  AS lost,
  SUM(win) * 2          AS points
FROM all_deals
GROUP BY rep
ORDER BY points DESC, won DESC;

AmazonBeginnerAggregates

Count Orders by Status

Return the number of orders for each status, sorted from most to least frequent

Group the orders table by status and count how many orders fall into each group. Sort the results so the most common status appears first.

Schema

orders

order_id	customer_id	status	order_date
1	101	delivered	2026-01-05
2	102	shipped	2026-01-06
3	103	cancelled	2026-01-07
4	104	delivered	2026-01-08

Solution

SELECT status, COUNT(*) AS order_count
FROM orders
GROUP BY status
ORDER BY order_count DESC;

MicrosoftBeginnerSELECT

Top 5 Highest-Paid Employees

Retrieve the top 5 employees by salary; use name alphabetically as a tiebreaker

Return the 5 employees with the highest salaries. If two employees share the same salary, order them alphabetically by name.

Schema

employees

emp_id	name	department	salary
1	Alice	Engineering	120000
2	Bob	Marketing	85000
3	Carol	Engineering	150000
4	Eve	Engineering	150000

Solution

SELECT emp_id, name, department, salary
FROM employees
ORDER BY salary DESC, name ASC
LIMIT 5;

ClassicBeginnerAggregates

Departments With Average Salary Above 90K

Find departments where average salary exceeds $90,000 — a classic HAVING trap

Find all departments where the average salary exceeds $90,000. Return department name and average salary rounded to 2 decimal places. Many candidates fail this by using WHERE instead of HAVING.

Schema

employees

emp_id	name	department	salary
1	Alice	Engineering	120000
2	Bob	Marketing	85000
3	Carol	Engineering	150000

Solution

SELECT department,
       ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 90000
ORDER BY avg_salary DESC;

MetaBeginnerJOINs

Products With No Sales in 2025

Identify products that had zero sales — classic LEFT JOIN anti-join pattern

Find products that had zero sales transactions in 2025. The trick: put the year filter in the ON clause, not WHERE — otherwise you accidentally convert the LEFT JOIN to an INNER JOIN.

Schema

products

product_id	product_name	category
1	Laptop	Electronics
2	Headphones	Electronics
3	Desk Chair	Furniture

sales

sale_id	product_id	sale_date	revenue
1	1	2025-03-10	2400
2	2	2025-06-15	750
3	1	2024-11-20	1200

Solution

SELECT p.product_id, p.product_name
FROM products p
LEFT JOIN sales s
       ON p.product_id = s.product_id
      AND strftime('%Y', s.sale_date) = '2025'
WHERE s.sale_id IS NULL;
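
The ON versus WHERE distinction can be checked directly. The sketch below assumes SQLite via Python's sqlite3 module and the sample rows above; it runs the filter in both positions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, sale_date TEXT, revenue INTEGER);
    INSERT INTO products VALUES
        (1, 'Laptop', 'Electronics'),
        (2, 'Headphones', 'Electronics'),
        (3, 'Desk Chair', 'Furniture');
    INSERT INTO sales VALUES
        (1, 1, '2025-03-10', 2400),
        (2, 2, '2025-06-15', 750),
        (3, 1, '2024-11-20', 1200);
""")

# Year filter in ON: products without a 2025 sale keep a NULL-extended row.
filter_in_on = conn.execute("""
    SELECT p.product_id, p.product_name
    FROM products p
    LEFT JOIN sales s
           ON p.product_id = s.product_id
          AND strftime('%Y', s.sale_date) = '2025'
    WHERE s.sale_id IS NULL
""").fetchall()

# Year filter in WHERE: NULL-extended rows fail the year test and vanish.
filter_in_where = conn.execute("""
    SELECT p.product_id, p.product_name
    FROM products p
    LEFT JOIN sales s ON p.product_id = s.product_id
    WHERE strftime('%Y', s.sale_date) = '2025'
      AND s.sale_id IS NULL
""").fetchall()

print(filter_in_on, filter_in_where)
```

With the filter in WHERE, no row can satisfy both conditions at once, so the query silently returns nothing.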

ClassicBeginnerAggregates

Monthly Revenue Summary

Calculate total revenue per month for 2025 using date functions

Calculate the total revenue per month for 2025. Use strftime to extract the month number and sort results chronologically.

Schema

sales

sale_id	product_id	sale_date	revenue
1	1	2025-01-05	900
2	2	2025-01-18	1200
3	3	2025-02-20	600
4	4	2025-04-10	48

Solution

SELECT strftime('%m', sale_date) AS month_num,
       ROUND(SUM(revenue), 2)     AS total_revenue
FROM sales
WHERE strftime('%Y', sale_date) = '2025'
GROUP BY month_num
ORDER BY month_num;

ClassicIntermediate

Employees Earning More Than Their Manager

Self-join on the same table to compare employee salary vs manager salary

Find all employees who earn a higher salary than their direct manager. The key insight: join the employees table to itself using the manager_id foreign key.

Schema

employees

emp_id	name	manager_id	salary
1	CEO	NULL	200000
2	Alice	1	120000
3	Carol	2	130000
4	Dave	2	80000

Solution

SELECT e.name AS employee_name,
       e.salary AS employee_salary,
       m.name   AS manager_name,
       m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.emp_id
WHERE e.salary > m.salary;

AmazonIntermediateSubqueries

Second Highest Salary Per Department

Find the second-highest salary in each department using a self-join (no window functions)

Find the second-highest salary within each department without using window functions. Use a self-join: pair each employee with every higher-paid colleague in the same dept, then HAVING COUNT(DISTINCT e2.salary) = 1 keeps the employees who have exactly one distinct salary above their own.

Schema

employees

emp_id	name	department	salary
1	Alice	Engineering	120000
2	Carol	Engineering	150000
3	Grace	Marketing	95000
4	Bob	Marketing	85000

Solution

SELECT e1.department, e1.name, e1.salary AS second_highest
FROM employees e1
JOIN employees e2
     ON  e1.department = e2.department
    AND e1.salary < e2.salary
GROUP BY e1.department, e1.name, e1.salary
HAVING COUNT(DISTINCT e2.salary) = 1;

GoogleIntermediateAggregates

Pivot Monthly Sales by Category

Rotate rows into columns using CASE WHEN inside SUM — no PIVOT keyword needed

Pivot monthly revenue so each row is a month and columns show revenue for Electronics, Furniture, and Stationery. Use SUM(CASE WHEN), a pattern that is portable across SQL dialects; only the month extraction (strftime here) is dialect-specific.

Schema

products

product_id	product_name	category
1	Laptop	Electronics
3	Chair	Furniture

sales

sale_id	product_id	sale_date	revenue
1	1	2025-01-05	2400
2	3	2025-01-18	900

Solution

SELECT strftime('%Y-%m', s.sale_date) AS month,
       SUM(CASE WHEN p.category = 'Electronics' THEN s.revenue ELSE 0 END) AS electronics,
       SUM(CASE WHEN p.category = 'Furniture'   THEN s.revenue ELSE 0 END) AS furniture,
       SUM(CASE WHEN p.category = 'Stationery'  THEN s.revenue ELSE 0 END) AS stationery
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY month
ORDER BY month;

MetaIntermediateAggregates

Sending vs Opening Snaps Ratio

Calculate send-to-open time ratio per age group using CASE WHEN inside SUM

For each age group, calculate the ratio of time spent sending snaps vs. opening snaps. Use NULLIF in the denominator to prevent division-by-zero errors. Real Snapchat/Meta question from DataLemur.

Schema

activities

user_id	activity_type	time_spent	age_bucket
1	send	3.5	21-25
1	open	1.5	21-25
2	send	2.0	26-30
2	open	5.0	26-30

Solution

SELECT age_bucket,
       ROUND(
         SUM(CASE WHEN activity_type = 'send' THEN time_spent ELSE 0 END) /
         NULLIF(SUM(CASE WHEN activity_type = 'open' THEN time_spent ELSE 0 END), 0),
       2) AS send_to_open_ratio
FROM activities
GROUP BY age_bucket
ORDER BY age_bucket;

MicrosoftIntermediateWindow

3-Day Rolling Average Revenue

Use ROWS BETWEEN 2 PRECEDING AND CURRENT ROW to build a 3-day rolling window

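Calculate a 3-day rolling average of daily revenue. The frame clause ROWS BETWEEN 2 PRECEDING AND CURRENT ROW restricts the window to at most 3 rows: the current row and the two before it (the first two rows of the result have fewer). RANGE works on ORDER BY values instead of row positions, so rows sharing the same date would all fall into each other's frame. This is a subtle but important difference.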

Schema

daily_revenue

rev_date	revenue
2025-01-01	1000
2025-01-02	1200
2025-01-03	900
2025-01-04	1500
2025-01-05	1100

Solution

SELECT rev_date, revenue,
       ROUND(AVG(revenue) OVER (
         ORDER BY rev_date
         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ), 2) AS rolling_3day_avg
FROM daily_revenue
ORDER BY rev_date;
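
To make the frame size visible, the sketch below adds a COUNT(*) over the same frame next to the rolling average. It assumes SQLite through Python's sqlite3 module and the sample rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_revenue (rev_date TEXT, revenue INTEGER);
    INSERT INTO daily_revenue VALUES
        ('2025-01-01', 1000), ('2025-01-02', 1200), ('2025-01-03', 900),
        ('2025-01-04', 1500), ('2025-01-05', 1100);
""")

rolling = conn.execute("""
    SELECT rev_date,
           ROUND(AVG(revenue) OVER (
               ORDER BY rev_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ), 2) AS rolling_3day_avg,
           COUNT(*) OVER (
               ORDER BY rev_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rows_in_frame
    FROM daily_revenue
    ORDER BY rev_date
""").fetchall()

for row in rolling:
    print(row)
frame_sizes = [r[2] for r in rolling]
averages = [r[1] for r in rolling]
```

The first two dates average over 1 and 2 rows respectively. If partial windows should be hidden, one option is to return NULL whenever rows_in_frame is below 3.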

MetaIntermediateAggregates

Active User Retention Month-Over-Month

Self-join with a month offset to count users active in both current and previous month

Find how many users active in a given month were also active the month before. A self-join links each month to the previous one via a date offset. Real Facebook/Meta DataLemur question.

Schema

user_activity

user_id	activity_month
1	2025-01-01
1	2025-02-01
2	2025-01-01
2	2025-02-01
3	2025-02-01

Solution

SELECT curr.activity_month,
       COUNT(DISTINCT curr.user_id) AS retained_users
FROM user_activity curr
JOIN user_activity prev
     ON  curr.user_id = prev.user_id
    AND  curr.activity_month = date(prev.activity_month, '+1 month')
GROUP BY curr.activity_month
ORDER BY curr.activity_month;

AmazonIntermediateWindow

Users' Third Transaction

ROW_NUMBER OVER PARTITION BY user_id — filter where rn = 3 for the third event

For each user, retrieve their third transaction chronologically. Use ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY txn_date) then filter where rn = 3. Classic DataLemur medium question.

Schema

transactions

txn_id	user_id	amount	txn_date
1	101	50.00	2025-01-01
2	101	75.00	2025-01-15
3	101	120.00	2025-02-01
4	102	90.00	2025-03-01

Solution

WITH ranked AS (
  SELECT *, ROW_NUMBER() OVER (
    PARTITION BY user_id ORDER BY txn_date
  ) AS rn
  FROM transactions
)
SELECT txn_id, user_id, amount, txn_date
FROM ranked
WHERE rn = 3;

AmazonExpertWindow

Nth Highest Salary — Generalized

DENSE_RANK handles ties correctly; LIMIT/OFFSET without DISTINCT counts each tied row separately

Find the Nth highest distinct salary. DENSE_RANK() is the correct approach: if two people share the top salary, the next distinct value ranks 2nd. LIMIT/OFFSET without DISTINCT treats every tied row as its own position, so it can return the tied top salary again instead of the next distinct value.

Schema

employees

emp_id	name	salary
1	Alice	150000
2	Bob	150000
3	Carol	120000
4	Dave	95000

Solution

WITH ranked_salaries AS (
  SELECT DISTINCT salary,
         DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
  FROM employees
)
SELECT salary AS nth_highest_salary
FROM ranked_salaries
WHERE rnk = 2;  -- change 2 to any N
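
The tie behaviour is easy to confirm on the sample rows. The sketch below assumes SQLite via Python's sqlite3 module and compares three ways of asking for the 2nd highest salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Alice', 150000), (2, 'Bob', 150000),
        (3, 'Carol', 120000), (4, 'Dave', 95000);
""")

# Plain LIMIT/OFFSET: the second ROW is Bob's tied 150000.
plain_offset = conn.execute(
    "SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

# DISTINCT first, then LIMIT/OFFSET: positions now refer to distinct values.
distinct_offset = conn.execute(
    "SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

# DENSE_RANK: tied rows share rank 1, the next distinct value gets rank 2.
dense_rank = conn.execute("""
    SELECT DISTINCT salary
    FROM (
        SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
        FROM employees
    ) t
    WHERE rnk = 2
""").fetchone()[0]

print(plain_offset, distinct_offset, dense_rank)
```

Only the plain LIMIT/OFFSET form returns the tied top salary; adding DISTINCT or ranking with DENSE_RANK yields the next distinct value.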

AmazonExpertWindow

Year-Over-Year Revenue Growth Rate

LAG() shifts prior year's value into the current row — divide for growth %

Calculate year-over-year revenue growth per category. LAG(revenue) OVER (PARTITION BY category ORDER BY sale_year) fetches last year's value. Use NULLIF to guard against division by zero.

Schema

yearly_sales

sale_year	category	revenue
2023	Electronics	500000
2024	Electronics	650000
2025	Electronics	780000

Solution

WITH yoy AS (
  SELECT sale_year, category, revenue,
         LAG(revenue) OVER (
           PARTITION BY category ORDER BY sale_year
         ) AS prev_revenue
  FROM yearly_sales
)
SELECT sale_year, category, revenue,
       ROUND(
         100.0 * (revenue - prev_revenue)
               / NULLIF(prev_revenue, 0), 1
       ) AS yoy_growth_pct
FROM yoy
WHERE prev_revenue IS NOT NULL
ORDER BY category, sale_year;

ClassicExpertWindow

Gaps and Islands: Active Subscription Periods

Subtract row_number as a day interval — consecutive dates produce the same group key

Find each user's continuous active subscription periods (start date, end date, duration). The gaps-and-islands trick: subtracting the row number (as days) from each consecutive active date produces the same constant — revealing each contiguous island.

Schema

subscription_status

user_id	status_date	is_active
1	2025-01-01	1
1	2025-01-02	1
1	2025-01-03	1
1	2025-01-04	0
1	2025-01-05	1

Solution

WITH active_days AS (
  SELECT user_id, status_date,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY status_date) AS rn
  FROM subscription_status
  WHERE is_active = 1
),
islands AS (
  SELECT user_id, status_date,
         date(status_date, '-' || rn || ' days') AS grp
  FROM active_days
)
SELECT user_id,
       MIN(status_date) AS period_start,
       MAX(status_date) AS period_end,
       COUNT(*) AS duration_days
FROM islands
GROUP BY user_id, grp
ORDER BY user_id, period_start;

MicrosoftExpertSubqueries

Recursive CTE: Full Management Hierarchy

Anchor on root nodes (no manager), recursively join children, build path string

Traverse the full org chart from CEO to every employee using a recursive CTE. The anchor selects root nodes (manager_id IS NULL). The recursive part joins each employee to their parent's CTE row, building the hierarchy level and path.

Schema

employees

emp_id	name	manager_id
1	CEO	NULL
2	Alice	1
3	Bob	1
4	Carol	2

Solution

WITH RECURSIVE org AS (
  SELECT emp_id, name, manager_id,
         1 AS lvl, name AS path
  FROM employees
  WHERE manager_id IS NULL
  UNION ALL
  SELECT e.emp_id, e.name, e.manager_id,
         o.lvl + 1,
         o.path || ' > ' || e.name
  FROM employees e
  JOIN org o ON e.manager_id = o.emp_id
)
SELECT emp_id, name, lvl, path
FROM org
ORDER BY lvl, name;
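
Running the recursion on the four sample rows shows how lvl and path grow one level per iteration. The sketch assumes SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'CEO', NULL), (2, 'Alice', 1), (3, 'Bob', 1), (4, 'Carol', 2);
""")

hierarchy = conn.execute("""
    WITH RECURSIVE org AS (
        -- anchor: employees with no manager
        SELECT emp_id, name, manager_id, 1 AS lvl, name AS path
        FROM employees
        WHERE manager_id IS NULL
        UNION ALL
        -- recursive step: attach each employee to an already-found manager
        SELECT e.emp_id, e.name, e.manager_id,
               o.lvl + 1,
               o.path || ' > ' || e.name
        FROM employees e
        JOIN org o ON e.manager_id = o.emp_id
    )
    SELECT emp_id, name, lvl, path
    FROM org
    ORDER BY lvl, name
""").fetchall()

for row in hierarchy:
    print(row)
```

The recursion stops on its own once an iteration finds no new children. A cycle in manager_id would prevent that, so real data may need a depth limit on lvl.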

GoogleExpertWindow

Sessionization: Group Events by 30-Min Inactivity

LAG detects gaps > 30 min; cumulative SUM of flags assigns incrementing session IDs

Group user clickstream events into sessions where any gap exceeding 30 minutes starts a new session. Two-CTE pattern: LAG() detects gaps, then SUM(new_session_flag) as a running count assigns the session ID. Real Google/Meta data engineering interview question.

Schema

clickstream

event_id	user_id	event_time
1	1	2025-01-01 09:00
2	1	2025-01-01 09:15
3	1	2025-01-01 10:25
4	1	2025-01-01 11:10

Solution

WITH gaps AS (
  SELECT event_id, user_id, event_time,
    CASE
      WHEN (julianday(event_time) -
            julianday(LAG(event_time) OVER (
              PARTITION BY user_id ORDER BY event_time
            ))) * 1440 > 30
      OR  LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
      THEN 1 ELSE 0
    END AS new_session
  FROM clickstream
),
sessions AS (
  SELECT *,
    SUM(new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
  FROM gaps
)
SELECT user_id, session_id,
       MIN(event_time) AS session_start,
       MAX(event_time) AS session_end
FROM sessions
GROUP BY user_id, session_id
ORDER BY user_id, session_id;
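
The flag-then-running-sum idea is independent of SQL. As a rough illustration, the plain Python sketch below mirrors the two steps on the sample events; the function name sessionize and the GAP constant are hypothetical.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=30)  # inactivity threshold from the question

events = [
    (1, 1, "2025-01-01 09:00"),
    (2, 1, "2025-01-01 09:15"),
    (3, 1, "2025-01-01 10:25"),
    (4, 1, "2025-01-01 11:10"),
]

def sessionize(rows):
    """Return (event_id, user_id, session_id) per event.

    Step 1 (the LAG part): compare each event with the user's previous one.
    Step 2 (the running SUM part): add the 0/1 flag to a per-user counter.
    """
    out = []
    prev_time = {}    # user_id -> timestamp of the previous event
    session_no = {}   # user_id -> running count of session starts
    for event_id, user_id, ts in sorted(rows, key=lambda r: (r[1], r[2])):
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        last = prev_time.get(user_id)
        new_session = 1 if last is None or t - last > GAP else 0
        session_no[user_id] = session_no.get(user_id, 0) + new_session
        prev_time[user_id] = t
        out.append((event_id, user_id, session_no[user_id]))
    return out

assigned = sessionize(events)
print(assigned)
```

The 09:15 event stays in session 1 (15-minute gap), while the 70-minute and 45-minute gaps each start a new session.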

ClassicExpertWindow

Deduplication: Keep Latest Record Per Customer

ROW_NUMBER ORDER BY updated_at DESC — rn=1 is always the freshest row

A customer table has duplicates from system retries. Keep only the most recent record per customer. ROW_NUMBER() PARTITION BY customer_id ORDER BY updated_at DESC assigns rank 1 to the freshest row. Note: DENSE_RANK would fail here — tied timestamps would both get rank 1.

Schema

customer_records

record_id	customer_id	name	updated_at
1	1001	Alice	2025-01-01 10:00
2	1001	Alice	2025-03-15 14:30
3	1002	Bob	2025-02-01 09:00
4	1002	Bob	2025-02-01 09:00

Solution

WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY updated_at DESC, record_id DESC  -- record_id breaks timestamp ties deterministically
    ) AS rn
  FROM customer_records
)
SELECT record_id, customer_id, name, updated_at
FROM ranked
WHERE rn = 1;
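
The note about DENSE_RANK can be verified on the sample rows, where customer 1002 has two records with the same timestamp. The sketch assumes SQLite via Python's sqlite3 module; the record_id tiebreaker in the ROW_NUMBER variant is an assumption made here to get a repeatable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_records (
        record_id INTEGER PRIMARY KEY, customer_id INTEGER,
        name TEXT, updated_at TEXT);
    INSERT INTO customer_records VALUES
        (1, 1001, 'Alice', '2025-01-01 10:00'),
        (2, 1001, 'Alice', '2025-03-15 14:30'),
        (3, 1002, 'Bob',   '2025-02-01 09:00'),
        (4, 1002, 'Bob',   '2025-02-01 09:00');
""")

template = """
    SELECT record_id
    FROM (
        SELECT record_id,
               {rank_expr} OVER (
                   PARTITION BY customer_id
                   ORDER BY {order_by}
               ) AS rk
        FROM customer_records
    ) t
    WHERE rk = 1
    ORDER BY record_id
"""

# DENSE_RANK: both tied rows for customer 1002 receive rank 1.
dense_ids = [r[0] for r in conn.execute(
    template.format(rank_expr="DENSE_RANK()", order_by="updated_at DESC"))]

# ROW_NUMBER with a tiebreaker: exactly one row per customer.
rownum_ids = [r[0] for r in conn.execute(
    template.format(rank_expr="ROW_NUMBER()",
                    order_by="updated_at DESC, record_id DESC"))]

print(dense_ids, rownum_ids)
```

DENSE_RANK leaves the duplicate in place, whereas ROW_NUMBER keeps a single record per customer.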

AmazonExpertWindow

Running Balance: Before and After Each Transaction

SUM UNBOUNDED PRECEDING for balance after; LAG of that for balance before

Show each transaction with the account balance both before and after it. Three-step pattern: (1) convert debits to negative amounts; (2) running SUM ... ROWS UNBOUNDED PRECEDING for balance-after; (3) LAG() of that value for balance-before.

Schema

account_txns

txn_id	account_id	txn_date	amount	txn_type
1	1001	2025-01-05	500	credit
2	1001	2025-01-12	200	debit
3	1001	2025-01-20	1000	credit

Solution

WITH signed AS (
  SELECT *,
    CASE WHEN txn_type = 'credit' THEN  amount
         WHEN txn_type = 'debit'  THEN -amount
    END AS signed_amt
  FROM account_txns
),
running AS (
  SELECT *,
    SUM(signed_amt) OVER (
      PARTITION BY account_id
      ORDER BY txn_date, txn_id
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS balance_after
  FROM signed
)
SELECT txn_id, txn_date, amount, txn_type,
  COALESCE(LAG(balance_after) OVER (
    PARTITION BY account_id ORDER BY txn_date, txn_id
  ), 0) AS balance_before,
  balance_after
FROM running
ORDER BY account_id, txn_date, txn_id;

ClassicExpertWindow

Follow-Up Purchase Detection With LEAD

LEAD() looks at the NEXT row per user — check if the next product is the target item

Find what % of Product A purchases were immediately followed by a Product B purchase from the same customer (iPhone and AirPods in the sample). LEAD(product) retrieves the very next purchase per customer. Filter where current = A and check next = B. Real Apple/DataLemur hard question pattern.

Schema

purchase_history

txn_id	customer_id	product	purchased_at
1	101	iPhone	2025-01-05
2	101	AirPods	2025-01-10
3	102	iPhone	2025-02-01
4	102	Charger	2025-02-05

Solution

WITH next_product AS (
  SELECT customer_id, product AS current_product,
    LEAD(product) OVER (
      PARTITION BY customer_id ORDER BY purchased_at
    ) AS next_product
  FROM purchase_history
)
SELECT ROUND(
  100.0 * COUNT(CASE WHEN next_product = 'AirPods' THEN 1 END) /
  NULLIF(COUNT(*), 0), 1
) AS airpods_followup_pct
FROM next_product
WHERE current_product = 'iPhone';

🌍 NewbieSELECT

Your First SQL Query

SELECT * fetches every row and every column — the starting point of all SQL

Write a query to show all records from the students table. The asterisk * is a wildcard meaning "all columns". This is the very first query every SQL learner writes.

Schema

students

id	name	grade	city
1	Arjun	A	Delhi
2	Priya	B	Mumbai
3	Raj	A	Pune

Solution

SELECT *
FROM students;

🌍 NewbieSELECT

Choose Specific Columns

Only fetch the columns you actually need — avoid pulling unnecessary data

Retrieve only the name and price columns from the products table. In real systems, selecting specific columns is usually faster than SELECT * because less data is read and sent over the network.

Schema

products

id	name	category	price	stock
1	Mouse	Electronics	799	150
2	Monitor	Electronics	4999	25
3	Notebook	Stationery	299	500

Solution

SELECT name, price
FROM products;

🌍 NewbieSELECT

Filter Rows with WHERE

WHERE lets you pick only the rows that match a condition

Show only employees who work in the Engineering department. The WHERE clause filters rows before they are returned — think of it as the "row filter" in SQL.

Schema

employees

id	name	department	salary
1	Alice	Engineering	95000
2	Bob	Sales	65000
3	Carol	Engineering	110000
4	Dave	HR	70000

Solution

SELECT name, salary
FROM employees
WHERE department = 'Engineering';

🌍 NewbieSELECT

Sort Results with ORDER BY

ASC = lowest to highest (default), DESC = highest to lowest

Retrieve all products and sort them by price from cheapest to most expensive. ORDER BY col ASC is ascending (smallest first); DESC flips it. Without ORDER BY, SQL gives back rows in no guaranteed order.

Schema

products

id	name	price
1	Mouse	799
2	Monitor	4999
3	Notebook	299

Solution

SELECT name, price
FROM products
ORDER BY price ASC;

🌍 NewbieSELECT

Get the Top 3 Students by Score

LIMIT caps the number of rows returned — combine with ORDER BY to get a true "top N"

Return the 3 students with the highest exam scores. Always pair LIMIT with ORDER BY; otherwise you get an arbitrary 3 rows, not the top 3.

Schema

students

id	name	score
1	Arjun	88
2	Priya	95
3	Raj	76
4	Sneha	91
5	Vikram	83

Solution

SELECT name, score
FROM students
ORDER BY score DESC
LIMIT 3;

🌍 NewbieAggregates

Count Total Rows with COUNT

COUNT(*) counts every row including NULLs; COUNT(col) skips NULL values in that column

Find the total number of orders in the orders table. COUNT(*) is the most common aggregate function — it counts every row regardless of NULLs. Adding AS total_orders gives the column a readable name.

Schema

orders

id	customer	amount	status
1	Alice	799	delivered
2	Bob	1299	shipped
3	Carol	4999	delivered

Solution

SELECT COUNT(*) AS total_orders
FROM orders;

🌍 NewbieAggregates

Calculate Average Salary with AVG

AVG sums all values and divides by count — it automatically ignores NULL values

Find the average salary of all employees. AVG(salary) adds up every non-NULL salary and divides by the number of non-NULL values, so rows with a NULL salary do not affect the result. Use ROUND(AVG(salary), 0) to remove decimal places when you want a clean number.

Schema

employees

id	name	salary
1	Alice	95000
2	Bob	72000
3	Carol	110000
4	Dave	83000

Solution

SELECT ROUND(AVG(salary), 0) AS avg_salary
FROM employees;

🌍 NewbieAggregates

Count Employees per Department

GROUP BY collapses rows with the same value into one group for aggregation

Show how many employees are in each department. GROUP BY department creates one row per unique department value, then COUNT(*) counts the employees inside each group. This is the most asked beginner aggregate question.

Schema

employees

id	name	department
1	Alice	Engineering
2	Bob	Sales
3	Carol	Engineering
4	Dave	HR
5	Eve	Sales

Solution

SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
ORDER BY headcount DESC;

⚡ Interview BoostIntermediateWindow Functions

Pareto Principle — Identify the 20% of Products Driving 80% of Revenue

Use a running cumulative SUM window to rank products and find which ones together cross the 80% revenue threshold

Your analytics team at Flipkart notices that a small group of products drives the majority of revenue. Following the Pareto (80/20) rule, write a SQL query to find all products whose combined revenue — when ranked from highest to lowest — accounts for the first 80% of total revenue. Include the cumulative revenue percentage in the result. This pattern is one of the most frequently asked business analytics questions in SQL interviews.

Schema

products

product_id	product_name	revenue
1	Premium Laptop	50000
2	4K Monitor	30000
3	Keyboard	8000
4	USB-C Hub	6000
5	Mouse Pad	3000
6	Webcam	2000
7	Desk Organiser	800
8	Cable Clips	200

Solution 1 — Running Sum (Best for Interviews)

-- Running cumulative SUM to pinpoint the 80% revenue threshold
WITH ranked AS (
  SELECT
    product_id, product_name, revenue,
    SUM(revenue) OVER (
      ORDER BY revenue DESC
      ROWS UNBOUNDED PRECEDING
    ) AS running_total,
    SUM(revenue) OVER () AS total_revenue
  FROM products
)
SELECT
  product_id, product_name, revenue,
  ROUND(running_total * 100.0 / total_revenue, 1) AS cumulative_pct
FROM ranked
WHERE running_total - revenue < total_revenue * 0.8;
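
To see why the filter compares the running total before the current row, here is a minimal sketch that runs the same window on the sample data through Python's sqlite3 module (assuming a SQLite build with window functions, 3.25 or later). The extra `kept` column is an illustrative addition that exposes the WHERE condition per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER, product_name TEXT, revenue INTEGER)"
)
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    (1, "Premium Laptop", 50000), (2, "4K Monitor", 30000), (3, "Keyboard", 8000),
    (4, "USB-C Hub", 6000), (5, "Mouse Pad", 3000), (6, "Webcam", 2000),
    (7, "Desk Organiser", 800), (8, "Cable Clips", 200),
])

rows = conn.execute("""
WITH ranked AS (
  SELECT product_name, revenue,
    SUM(revenue) OVER (ORDER BY revenue DESC ROWS UNBOUNDED PRECEDING) AS running_total,
    SUM(revenue) OVER () AS total_revenue
  FROM products
)
SELECT product_name, revenue, running_total,
  ROUND(running_total * 100.0 / total_revenue, 1) AS cumulative_pct,
  running_total - revenue < total_revenue * 0.8 AS kept  -- 1 = passes the filter
FROM ranked
ORDER BY revenue DESC
""").fetchall()

for row in rows:
    print(row)

kept_products = [name for name, _, _, _, kept in rows if kept]
```

A product is kept when the total accumulated before it is still under 80%, so the product that reaches or crosses the threshold is included rather than cut off.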

Solution 2 — NTILE (Top 20% by Row Count)

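-- NTILE(5) splits rows into 5 near-equal buckets; bucket 1 = top 20% of rows
-- Note: this selects by row count, not by cumulative revenue, so it can differ from Solution 1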
SELECT product_id, product_name, revenue
FROM (
  SELECT product_id, product_name, revenue,
    NTILE(5) OVER (ORDER BY revenue DESC) AS quintile
  FROM products
) t
WHERE quintile = 1;

Solution 3 — PERCENT_RANK (Top 20% by Relative Rank)

-- PERCENT_RANK = 0 for the top row; ≤ 0.2 means top 20% of products
SELECT product_id, product_name, revenue
FROM (
  SELECT product_id, product_name, revenue,
    PERCENT_RANK() OVER (ORDER BY revenue DESC) AS pct_rank
  FROM products
) t
WHERE pct_rank <= 0.2;

⚡ Interview BoostIntermediateCTEs

New vs Repeat Customer Report — Daily Acquisition & Retention Breakdown

Classify each order as new or repeat by comparing its date to the customer's first-ever order date, then count both types per day

Swiggy's growth team needs a daily customer classification report. For each order date, determine whether each order belongs to a new customer (placing their very first order on that day) or a repeat customer (who ordered at least once before). Output each date with the count of new and repeat customers side by side. This is one of the most common business analytics questions in data analyst interviews — it tests your ability to break a problem into steps using CTEs and conditional aggregation.

Schema

orders

customer_id	order_date	amount
C1	2024-01-10	250
C2	2024-01-10	180
C1	2024-01-11	320
C3	2024-01-11	150
C2	2024-01-12	200
C4	2024-01-12	90
C1	2024-01-12	410
C3	2024-01-13	270
C5	2024-01-13	130

Solution 1 — CTE + JOIN + CASE WHEN (Best for Interviews)

-- Step 1: find each customer's first ever order date
WITH first_order AS (
  SELECT customer_id, MIN(order_date) AS first_date
  FROM orders
  GROUP BY customer_id
)
-- Step 2: join back, classify, then aggregate per day
SELECT
  o.order_date,
  SUM(CASE WHEN o.order_date = f.first_date THEN 1 ELSE 0 END) AS new_customers,
  SUM(CASE WHEN o.order_date > f.first_date THEN 1 ELSE 0 END) AS repeat_customers
FROM orders o
JOIN first_order f ON o.customer_id = f.customer_id
GROUP BY o.order_date
ORDER BY o.order_date;
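
As a sanity check, the query can be run against the sample rows. This is a minimal sketch using Python's sqlite3 module; the table definition (column types in particular) is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, order_date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("C1", "2024-01-10", 250), ("C2", "2024-01-10", 180), ("C1", "2024-01-11", 320),
    ("C3", "2024-01-11", 150), ("C2", "2024-01-12", 200), ("C4", "2024-01-12", 90),
    ("C1", "2024-01-12", 410), ("C3", "2024-01-13", 270), ("C5", "2024-01-13", 130),
])

report = conn.execute("""
WITH first_order AS (
  SELECT customer_id, MIN(order_date) AS first_date
  FROM orders
  GROUP BY customer_id
)
SELECT
  o.order_date,
  SUM(CASE WHEN o.order_date = f.first_date THEN 1 ELSE 0 END) AS new_customers,
  SUM(CASE WHEN o.order_date > f.first_date THEN 1 ELSE 0 END) AS repeat_customers
FROM orders o
JOIN first_order f ON o.customer_id = f.customer_id
GROUP BY o.order_date
ORDER BY o.order_date
""").fetchall()

for row in report:
    print(row)  # (order_date, new_customers, repeat_customers)
```

Note that the query counts orders, so a customer with two orders on their first day would be counted twice as new; the sample data has no such case.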

Solution 2 — Window MIN() in CTE (No JOIN needed)

-- MIN() OVER (PARTITION BY customer_id) gives first date without a separate GROUP BY
WITH classified AS (
  SELECT
    customer_id, order_date,
    MIN(order_date) OVER (PARTITION BY customer_id) AS first_date
  FROM orders
)
SELECT
  order_date,
  SUM(CASE WHEN order_date = first_date THEN 1 ELSE 0 END) AS new_customers,
  SUM(CASE WHEN order_date > first_date THEN 1 ELSE 0 END) AS repeat_customers
FROM classified
GROUP BY order_date
ORDER BY order_date;

Solution 3 — Two-CTE Step-by-Step (Most Readable)

-- CTE 1: find each customer's first-ever order date
WITH first_order AS (
  SELECT customer_id, MIN(order_date) AS first_date
  FROM orders GROUP BY customer_id
),
-- CTE 2: tag every order as new or repeat
tagged AS (
  SELECT o.order_date,
    CASE WHEN o.order_date = f.first_date THEN 'new' ELSE 'repeat' END AS cust_type
  FROM orders o
  JOIN first_order f ON o.customer_id = f.customer_id
)
-- Final SELECT: count each type per day
SELECT order_date,
  SUM(CASE WHEN cust_type = 'new'    THEN 1 ELSE 0 END) AS new_customers,
  SUM(CASE WHEN cust_type = 'repeat' THEN 1 ELSE 0 END) AS repeat_customers
FROM tagged
GROUP BY order_date ORDER BY order_date;

⚡ Interview BoostExpertCTEsWindow

Co-Working Space Activity Report — Total Visits, Top Floor & Resources Per Member

Combine COUNT, ROW_NUMBER mode-finding, and STRING_AGG into one member-level summary — the classic three-technique product interview question

Smartworks, a co-working space provider, logs every member check-in: which floor they visit and which resource (laptop station, desktop, monitor, etc.) they use. The analytics team wants a member-level summary report — total check-ins, the floor each member visits most often (the statistical mode, not the maximum), and every distinct resource they have ever used, as a comma-separated list. This question is a staple in product-company interviews because it requires three distinct SQL techniques working together: plain aggregation, window-based mode calculation (SQL has no MODE() function), and string aggregation with deduplication.

Schema

cowork_visits

member	floor	resource
Priya	1	CPU
Priya	1	LAPTOP
Priya	2	MONITOR
Rahul	2	DESKTOP
Rahul	2	DESKTOP
Rahul	1	PRINTER

Solution 1 — Three-CTE Pattern: visit_count + floor_mode + resource_list (SQLite / MySQL)

WITH visit_count AS (
  SELECT member, COUNT(*) AS total_visits
  FROM cowork_visits
  GROUP BY member
),
floor_mode AS (
  -- rank floors by visit frequency; rn=1 is the most visited
  SELECT member, floor,
    ROW_NUMBER() OVER (
      PARTITION BY member
      ORDER BY COUNT(*) DESC
    ) AS rn
  FROM cowork_visits
  GROUP BY member, floor
),
resource_list AS (
  SELECT member,
    GROUP_CONCAT(DISTINCT resource) AS resources
  FROM cowork_visits
  GROUP BY member
)
SELECT
  v.member, v.total_visits,
  f.floor   AS top_floor,
  r.resources
FROM visit_count v
JOIN floor_mode f
  ON v.member = f.member AND f.rn = 1
JOIN resource_list r
  ON v.member = r.member
ORDER BY v.member;

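Solution 2 — Two CTEs: floor_mode + summary (PostgreSQL)

-- PostgreSQL syntax. SQL Server's STRING_AGG does not accept DISTINCT and orders
-- with WITHIN GROUP (ORDER BY ...), so deduplicate in a subquery first there.
WITH floor_mode AS (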
  SELECT member, floor,
    ROW_NUMBER() OVER (
      PARTITION BY member
      ORDER BY COUNT(*) DESC
    ) AS rn
  FROM cowork_visits
  GROUP BY member, floor
),
summary AS (
  SELECT member,
    COUNT(*) AS total_visits,
    STRING_AGG(DISTINCT resource, ','
      ORDER BY resource) AS resources
  FROM cowork_visits
  GROUP BY member
)
SELECT
  s.member, s.total_visits,
  f.floor AS top_floor,
  s.resources
FROM summary s
JOIN floor_mode f
  ON s.member = f.member AND f.rn = 1
ORDER BY s.member;

Solution 3 — MySQL / SQLite: GROUP_CONCAT instead of STRING_AGG

-- GROUP_CONCAT for MySQL and SQLite (MySQL has no STRING_AGG; SQLite only added it in 3.44)
WITH floor_mode AS (
  SELECT member, floor,
    ROW_NUMBER() OVER (
      PARTITION BY member
      ORDER BY COUNT(*) DESC
    ) AS rn
  FROM cowork_visits
  GROUP BY member, floor
),
summary AS (
  SELECT member,
    COUNT(*) AS total_visits,
    GROUP_CONCAT(DISTINCT resource) AS resources
  FROM cowork_visits
  GROUP BY member
)
SELECT
  s.member, s.total_visits,
  f.floor AS top_floor,
  s.resources
FROM summary s
JOIN floor_mode f
  ON s.member = f.member AND f.rn = 1
ORDER BY s.member;
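
The mode-finding step is easy to get subtly wrong when two floors tie, because ROW_NUMBER then picks one arbitrarily. The sketch below runs the GROUP_CONCAT variant on the sample data through Python's sqlite3 module and adds `floor` as a tie-breaker; that tie-breaker (lowest floor wins) is an assumption, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cowork_visits (member TEXT, floor INTEGER, resource TEXT)")
conn.executemany("INSERT INTO cowork_visits VALUES (?, ?, ?)", [
    ("Priya", 1, "CPU"), ("Priya", 1, "LAPTOP"), ("Priya", 2, "MONITOR"),
    ("Rahul", 2, "DESKTOP"), ("Rahul", 2, "DESKTOP"), ("Rahul", 1, "PRINTER"),
])

rows = conn.execute("""
WITH floor_mode AS (
  SELECT member, floor,
    ROW_NUMBER() OVER (
      PARTITION BY member
      ORDER BY COUNT(*) DESC, floor   -- assumed tie-breaker: lowest floor wins
    ) AS rn
  FROM cowork_visits
  GROUP BY member, floor
),
summary AS (
  SELECT member,
    COUNT(*) AS total_visits,
    GROUP_CONCAT(DISTINCT resource) AS resources
  FROM cowork_visits
  GROUP BY member
)
SELECT s.member, s.total_visits, f.floor AS top_floor, s.resources
FROM summary s
JOIN floor_mode f ON s.member = f.member AND f.rn = 1
ORDER BY s.member
""").fetchall()

# GROUP_CONCAT does not guarantee an order, so compare resources as sets.
report = {
    member: (visits, top_floor, set(resources.split(",")))
    for member, visits, top_floor, resources in rows
}
print(report)
```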

⚡ Interview BoostExpertGaps & IslandsWindow

Server Health Window Report — Group Consecutive Uptime & Downtime Periods

Find contiguous same-status date ranges using the ROW_NUMBER subtraction island trick — SQL's canonical answer to "find consecutive groups"

Razorpay's SRE team runs a daily health check on their payment gateway. Each day is logged as up (all systems operational) or down (incident detected). For SLA reporting and incident post-mortems, the team needs a condensed view: each contiguous window of the same status collapsed into a single row with its start date, end date, and status. This is the classic Gaps and Islands problem — SQL has no built-in way to detect "consecutive same-value runs", so you need a window-function trick to assign each island a unique group key.

Schema

gateway_log

check_date	status
2024-01-01	up
2024-01-02	up
2024-01-03	up
2024-01-04	down
2024-01-05	down
2024-01-06	up

Solution 1 — ROW_NUMBER Subtraction (Most Portable — No Date Math)

WITH grp AS (
  SELECT check_date, status,
    ROW_NUMBER() OVER (ORDER BY check_date) -
    ROW_NUMBER() OVER (
      PARTITION BY status
      ORDER BY check_date
    ) AS grp_id
  FROM gateway_log
)
SELECT
  MIN(check_date) AS start_date,
  MAX(check_date) AS end_date,
  status
FROM grp
GROUP BY grp_id, status
ORDER BY start_date;
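
The subtraction trick is easier to trust once the two row numbers are laid out side by side. The sketch below (Python's sqlite3 module, table definition assumed) prints both counters for each sample row and then runs the grouping query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gateway_log (check_date TEXT, status TEXT)")
conn.executemany("INSERT INTO gateway_log VALUES (?, ?)", [
    ("2024-01-01", "up"), ("2024-01-02", "up"), ("2024-01-03", "up"),
    ("2024-01-04", "down"), ("2024-01-05", "down"), ("2024-01-06", "up"),
])

# Step 1: expose the two counters whose difference forms the island key.
trace = conn.execute("""
SELECT check_date, status,
  ROW_NUMBER() OVER (ORDER BY check_date) AS rn_all,
  ROW_NUMBER() OVER (PARTITION BY status ORDER BY check_date) AS rn_status
FROM gateway_log
ORDER BY check_date
""").fetchall()

grp_ids = [rn_all - rn_status for _, _, rn_all, rn_status in trace]
for row, grp_id in zip(trace, grp_ids):
    print(row, "grp_id =", grp_id)

# Step 2: collapse each (grp_id, status) pair into one date range.
islands = conn.execute("""
WITH grp AS (
  SELECT check_date, status,
    ROW_NUMBER() OVER (ORDER BY check_date) -
    ROW_NUMBER() OVER (PARTITION BY status ORDER BY check_date) AS grp_id
  FROM gateway_log
)
SELECT MIN(check_date), MAX(check_date), status
FROM grp
GROUP BY grp_id, status
ORDER BY MIN(check_date)
""").fetchall()
print(islands)
```

Within a run of the same status both counters advance together, so their difference stays constant; when the status flips, only the overall counter keeps moving and the difference changes. Grouping by status as well as grp_id matters because two different statuses can share the same difference.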

Solution 2 — LAG Change Detection + Cumulative SUM

WITH changes AS (
  SELECT check_date, status,
    CASE WHEN LAG(status)
      OVER (ORDER BY check_date) = status
      THEN 0 ELSE 1 END AS is_start
  FROM gateway_log
),
grps AS (
  SELECT check_date, status,
    SUM(is_start) OVER
      (ORDER BY check_date) AS grp_id
  FROM changes
)
SELECT
  MIN(check_date) AS start_date,
  MAX(check_date) AS end_date,
  status
FROM grps
GROUP BY grp_id, status
ORDER BY start_date;

Solution 3 — Date − ROW_NUMBER (SQLite-adapted version of the classic SQL Server approach)

-- SQL Server uses DATEADD(day, -rn, date); SQLite uses DATE(date, '-N days')
WITH grp AS (
  SELECT check_date, status,
    DATE(check_date,
      '-' || ROW_NUMBER() OVER (
        PARTITION BY status
        ORDER BY check_date
      ) || ' days') AS grp_key
  FROM gateway_log
)
SELECT
  MIN(check_date) AS start_date,
  MAX(check_date) AS end_date,
  status
FROM grp
GROUP BY grp_key, status
ORDER BY start_date;

⚡ Interview BoostIntermediateJOIN Row Count

INNER JOIN Row Count — Predict Output Rows with a One-sided Duplicate Key

Formula: for each matching key K, output rows = count(K in t1) × count(K in t2). Key=1 appears twice in t1 and once in t2 — that's 2×1=2 rows, not 1.

A Flipkart data team is auditing a JOIN before writing a pipeline. Table t1 has 3 rows and t2 has 3 rows — yet id=1 is duplicated in t1. A junior engineer assumes INNER JOIN returns 3 rows since both tables have 3 rows. How many rows does t1 INNER JOIN t2 ON t1.id = t2.id actually return — and why does the duplicate key matter?

Schema

t1

id
1
1
2

t2

id
1
2
3

Solution — INNER JOIN: 3 rows (key-by-key multiplication)

-- Key 1: 2 rows in t1 × 1 row in t2 = 2 rows
-- Key 2: 1 row in t1 × 1 row in t2 = 1 row
-- Key 3: no match in t1 → 0 rows
-- Total: 3 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id = t2.id;
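
The per-key multiplication rule can be checked mechanically. This is a minimal sketch: `predicted_inner_rows` and `actual_inner_rows` are hypothetical helper names, the keys are assumed to be non-NULL integers, and SQLite (via Python's sqlite3 module) stands in for the real engine.

```python
import sqlite3
from collections import Counter


def predicted_inner_rows(t1_keys, t2_keys):
    """Sum count(K in t1) * count(K in t2) over every key K (non-NULL keys only)."""
    c1, c2 = Counter(t1_keys), Counter(t2_keys)
    return sum(n * c2[key] for key, n in c1.items())  # c2[key] is 0 when absent


def actual_inner_rows(t1_keys, t2_keys):
    """Load both key lists into SQLite and count the INNER JOIN output rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER)")
    conn.execute("CREATE TABLE t2 (id INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?)", [(k,) for k in t1_keys])
    conn.executemany("INSERT INTO t2 VALUES (?)", [(k,) for k in t2_keys])
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM t1 INNER JOIN t2 ON t1.id = t2.id"
    ).fetchone()
    conn.close()
    return n


print(predicted_inner_rows([1, 1, 2], [1, 2, 3]), actual_inner_rows([1, 1, 2], [1, 2, 3]))
```

Running the prediction before a pipeline join is a cheap way to spot an unexpected duplicate key.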

⚡ Interview BoostIntermediateJOIN Explosion

INNER JOIN Explosion — Predict Row Count When Both Tables Have Duplicate Keys

When both sides have duplicates for key=1 (t1 has 2, t2 has 3), INNER JOIN creates a 2×3=6 Cartesian product for that key group, which is more rows than either input table holds (3 and 4).

Amazon's data team joins two tables before a nightly aggregation. Both tables have duplicate entries for the same key value. A classic interview trap: before running the query, predict the exact row count. Most candidates guess 3 or 4 (min or max of the input table sizes) — both are wrong. The actual result shocks them.

Schema

t1

id
1
1
2

t2

id
1
1
1
3

Solution — INNER JOIN Explosion: 6 rows (Cartesian product per key group)

-- Key 1: 2 rows in t1 × 3 rows in t2 = 6 rows  ← Cartesian explosion!
-- Key 2: no match in t2 → 0 rows
-- Key 3: no match in t1 → 0 rows
-- Total: 6 rows (larger than either input table!)
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id = t2.id;

⚡ Interview BoostIntermediateLEFT JOINNULL Handling

LEFT JOIN Row Count — Unmatched Left Rows Survive as NULL

LEFT JOIN = INNER JOIN rows + one NULL-padded row per unmatched t1 row. Key=1 explodes to 6 rows; key=2 (only in t1) adds 1 NULL row. Total: 7.

Same tables as the INNER JOIN Explosion question, but switching to LEFT JOIN. LEFT JOIN guarantees every row in t1 appears in the output, matched or not. t1's key=2 has no match in t2. How many rows does the LEFT JOIN return? Compare to the INNER JOIN answer of 6.

Schema

t1

id
1
1
2

t2

id
1
1
1
3

Solution — LEFT JOIN: 7 rows (6 matched + 1 unmatched from t1)

-- INNER part — key=1: 2 × 3 = 6 rows
-- LEFT-only — key=2 in t1, no match in t2: 1 row (t2_id = NULL)
-- key=3 is only in t2 → LEFT JOIN does not preserve it
-- Total: 6 + 1 = 7 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
LEFT JOIN t2 ON t1.id = t2.id;

⚡ Interview BoostIntermediateRIGHT JOINNULL Handling

RIGHT JOIN Row Count — Unmatched Right Rows Survive as NULL

RIGHT JOIN = INNER JOIN rows + one NULL-padded row per unmatched t2 row. Key=3 (only in t2) adds 1 NULL row: the same count as LEFT JOIN, but a different unmatched key.

Same tables, now RIGHT JOIN. Every t2 row must appear — matched or not. t2's key=3 has no match in t1 and gets NULL for t1.id. But t1's key=2 (unmatched) is now dropped. Notice: LEFT JOIN and RIGHT JOIN both return 7 rows here, but they preserve different unmatched keys.

Schema

t1

id
1
1
2

t2

id
1
1
1
3

Solution — RIGHT JOIN: 7 rows (6 matched + 1 unmatched from t2)

-- INNER part — key=1: 2 × 3 = 6 rows
-- RIGHT-only — key=3 in t2, no match in t1: 1 row (t1_id = NULL)
-- key=2 is only in t1 → RIGHT JOIN drops it (LEFT JOIN would keep it)
-- Total: 6 + 1 = 7 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
RIGHT JOIN t2 ON t1.id = t2.id;
-- SQLite 3.39+ supports RIGHT JOIN natively
-- Equivalent: FROM t2 LEFT JOIN t1 ON t2.id = t1.id

⚡ Interview BoostExpertAll 4 JOINsFULL OUTER

All 4 JOINs Compared — Predict INNER / LEFT / RIGHT / FULL OUTER Row Counts

INNER first: (2×2)+(1×1)=5. LEFT adds t1-only key=3 → 6. RIGHT adds t2-only key=4 → 6. FULL adds both: 5+1+1=7. Rule: INNER ≤ LEFT, INNER ≤ RIGHT, max(LEFT,RIGHT) ≤ FULL.

The definitive JOIN interview question. Given t1 = [1,1,2,3] and t2 = [1,1,2,4], predict the row count for all four JOIN types. Keys 1 and 2 are shared; key 3 is exclusive to t1; key 4 is exclusive to t2. Learn the rule INNER ≤ LEFT, INNER ≤ RIGHT, max(LEFT, RIGHT) ≤ FULL; it is a common check in senior data engineering interviews.

Schema

t1

id
1
1
2
3

t2

id
1
1
2
4

Solution 1 — INNER JOIN: 5 rows

-- Key 1: 2 (t1) × 2 (t2) = 4 rows
-- Key 2: 1 (t1) × 1 (t2) = 1 row
-- Key 3: only in t1 → 0 rows  |  Key 4: only in t2 → 0 rows
-- INNER JOIN total: 5 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id = t2.id;

Solution 2 — All 4 JOINs compared (SQLite-compatible FULL OUTER emulation)

-- Results: INNER=5, LEFT=6, RIGHT=6, FULL OUTER=7
SELECT 'INNER JOIN' AS join_type,
       COUNT(*) AS row_count
FROM t1 INNER JOIN t2 ON t1.id = t2.id
UNION ALL
SELECT 'LEFT JOIN', COUNT(*)
FROM t1 LEFT JOIN t2 ON t1.id = t2.id
UNION ALL
SELECT 'RIGHT JOIN', COUNT(*)
FROM t2 LEFT JOIN t1 ON t2.id = t1.id
UNION ALL
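SELECT 'FULL OUTER', COUNT(*) FROM (
  SELECT t1.id AS t1_id, t2.id AS t2_id
  FROM t1 LEFT JOIN t2 ON t1.id = t2.id
  UNION ALL
  SELECT t1.id, t2.id
  FROM t2 LEFT JOIN t1 ON t2.id = t1.id
  WHERE t1.id IS NULL
) AS full_outer;  -- distinct column aliases and a subquery alias, for engines that require them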
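
All four counts follow from the same per-key tallies, which the sketch below computes in plain Python. `join_row_counts` is a hypothetical helper; NULL keys (None) are treated as never matching, in line with standard `=` join semantics.

```python
from collections import Counter


def join_row_counts(t1_keys, t2_keys):
    """Predict INNER / LEFT / RIGHT / FULL row counts for ON t1.id = t2.id."""
    c1 = Counter(k for k in t1_keys if k is not None)
    c2 = Counter(k for k in t2_keys if k is not None)

    # Matched part: multiply the duplicate counts key by key.
    inner = sum(n * c2[key] for key, n in c1.items())

    # Unmatched rows survive once each, padded with NULLs on the other side.
    t1_only = sum(1 for k in t1_keys if k is None or c2[k] == 0)
    t2_only = sum(1 for k in t2_keys if k is None or c1[k] == 0)

    return {
        "INNER": inner,
        "LEFT": inner + t1_only,
        "RIGHT": inner + t2_only,
        "FULL": inner + t1_only + t2_only,
    }


counts = join_row_counts([1, 1, 2, 3], [1, 1, 2, 4])
print(counts)
```

One consequence worth remembering: FULL always equals LEFT + RIGHT - INNER, because the matched rows would otherwise be counted twice.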

⚡ Interview BoostIntermediateNULL TrapJOIN

NULL in JOIN Key — NULLs Never Match in Any JOIN Type

NULL = NULL evaluates to UNKNOWN, not TRUE. Both the t1 NULL and the t2 NULL appear as unmatched rows in LEFT/RIGHT JOIN — they never pair with each other or with anything else.

Zomato's data team joins an orders table (t1) with a customers table (t2). Both tables have a NULL key row — unassigned records. A junior engineer expects the two NULLs to pair up in the INNER JOIN. They won't. SQL uses three-valued logic: the ON clause keeps only rows where the condition is TRUE. NULL = NULL is UNKNOWN — not TRUE — so NULL keys are invisible to INNER JOIN and appear only as unmatched rows in LEFT/RIGHT JOIN.

Schema

t1

id
1
2
NULL

t2

id
1
NULL
3

Solution 1 — INNER JOIN: 1 row (NULLs never match)

-- NULL = NULL → UNKNOWN → treated as no match
-- Key 1: 1 × 1 = 1 row
-- id=2 (t1): no t2 match → 0 rows
-- id=NULL (t1): NULL = NULL is UNKNOWN → 0 rows
-- id=NULL (t2): same — never matches any t1 row
-- Total INNER: 1 row
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id = t2.id;

Solution 2 — LEFT JOIN: 3 rows (NULL-keyed t1 row survives as unmatched)

-- LEFT JOIN output:
-- (1, 1)     ← matched
-- (2, NULL)  ← t1.id=2 unmatched, t2 side is NULL
-- (NULL, NULL) ← t1.id=NULL unmatched (NULL key never matches)
-- Total LEFT: 3 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
LEFT JOIN t2 ON t1.id = t2.id;
-- To force NULL = NULL to match, use IS (SQLite NULL-safe equals):
-- INNER JOIN t2 ON t1.id IS t2.id  → 2 rows: (1,1) and (NULL,NULL)
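
The three-valued logic can be observed directly. This minimal sketch uses Python's sqlite3 module, where SQL NULL comes back as None; the `count` helper is a hypothetical convenience function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER)")
conn.execute("CREATE TABLE t2 (id INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (None,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(1,), (None,), (3,)])

# NULL = NULL is UNKNOWN (returned as None); NULL IS NULL is TRUE (1).
null_eq, null_is = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()


def count(sql):
    """Return the single COUNT(*) value produced by sql."""
    return conn.execute(sql).fetchone()[0]


inner_eq = count("SELECT COUNT(*) FROM t1 JOIN t2 ON t1.id = t2.id")
inner_is = count("SELECT COUNT(*) FROM t1 JOIN t2 ON t1.id IS t2.id")  # SQLite NULL-safe
left_eq = count("SELECT COUNT(*) FROM t1 LEFT JOIN t2 ON t1.id = t2.id")

print(null_eq, null_is, inner_eq, inner_is, left_eq)
```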

⚡ Interview BoostExpertNULL vs EmptyJOIN

Empty String vs NULL in JOIN — '' Matches, NULL Never Does

'' (empty string) is a value of length zero: it equals another '' and DOES match in JOIN. NULL is the absence of any value: NULL = '' is UNKNOWN and never matches anything. (Oracle is the notable exception: it stores '' as NULL.)

Swiggy's data team finds product IDs contain both NULL entries (missing data) and empty string entries (data entry errors). t1 has key '1', key '', and key NULL. t2 has key '1', key '', and key '2'. In most UIs both NULL and '' display as blank — but SQL treats them completely differently. Predict the INNER JOIN row count and which rows appear.

Schema

t1

id (TEXT)
1
'' (empty)
NULL

t2

id (TEXT)
1
'' (empty)
2

Solution 1 — INNER JOIN: 2 rows ('' matches, NULL does not)

-- '' = '' is TRUE  → empty strings match each other!
-- NULL = '' is UNKNOWN → NULL key never matches
-- Key '1': 1 × 1 = 1 row
-- Key '' : 1 × 1 = 1 row  ← empty string IS a matchable value
-- Key NULL (t1): NULL = anything → UNKNOWN → 0 rows
-- Total INNER: 2 rows
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id = t2.id;

Solution 2 — Force NULL to match using IS (SQLite NULL-safe equals)

-- Standard ON t1.id = t2.id → 2 rows (NULL never matches)
-- NULL-safe ON t1.id IS t2.id → 2 rows (no NULL in t2 to pair with)
-- To see NULL-safe behavior, add NULL to t2 and compare:
-- INSERT INTO t2 VALUES(NULL);
-- ON t1.id = t2.id   → still 2 rows (NULL = NULL = UNKNOWN)
-- ON t1.id IS t2.id  → 3 rows (NULL IS NULL = TRUE in SQLite)
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id IS t2.id;
-- PostgreSQL equivalent: ON t1.id IS NOT DISTINCT FROM t2.id
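
A short check of the empty-string case, again as a sketch with Python's sqlite3 module (SQLite semantics; the table definitions are assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id TEXT)")
conn.execute("CREATE TABLE t2 (id TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("1",), ("",), (None,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("1",), ("",), ("2",)])

# '' is a real value: it equals itself, is not NULL, and has length 0.
checks = conn.execute("SELECT '' = '', '' IS NULL, NULL = '', LENGTH('')").fetchone()

matched = conn.execute("""
SELECT t1.id AS t1_id, t2.id AS t2_id
FROM t1
INNER JOIN t2 ON t1.id = t2.id
ORDER BY t1.id
""").fetchall()

# Add a NULL key to t2, then compare '=' with the NULL-safe IS operator.
conn.execute("INSERT INTO t2 VALUES (NULL)")
eq_rows = conn.execute("SELECT COUNT(*) FROM t1 JOIN t2 ON t1.id = t2.id").fetchone()[0]
is_rows = conn.execute("SELECT COUNT(*) FROM t1 JOIN t2 ON t1.id IS t2.id").fetchone()[0]

print(checks, matched, eq_rows, is_rows)
```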

⚡ Interview BoostIntermediateSCD Type 1Data Warehouse

SCD Type 1 — Overwrite: Apply a Price Change with No History Kept

SCD Type 1 simply overwrites the existing row. The old value is gone forever. Use UPDATE, not INSERT — INSERT creates a duplicate row for the same entity.

Flipkart's data warehouse team manages a product dimension table. iPhone 11 was listed at ₹10,000 on 2022-01-01. On 2022-03-01 the price drops to ₹8,000. Apply Slowly Changing Dimension Type 1: overwrite the record. Historical price is not needed — only the current state matters. Which SQL operation correctly implements SCD Type 1?

Schema (current state — before price change)

product

product_id	product_name	price
1	iphone11	10000
2	iphone12	15000

Solution — SCD Type 1: UPDATE the existing row (history discarded)

-- SCD Type 1: overwrite; the old price 10000 is lost forever
-- (assumes the table also has an update_date column, not shown in the schema above)
UPDATE product
SET   price       = 8000,
      update_date = '2022-03-01'
WHERE product_id = 1;

-- After update: only the new price remains
SELECT * FROM product WHERE product_id = 1;

⚡ Interview BoostExpertSCD Type 2Data Warehouse

SCD Type 2 — Full History: Expire the Old Row, Insert the New Row

SCD Type 2 keeps every version. Two-step: (1) set end_date + is_current=0 on the old row, (2) INSERT a new row with start_date + is_current=1. Never UPDATE the price column.

The finance team needs full price history for audits: every price change must be preserved. Apply SCD Type 2: keep the old record with an expiry date and an is_current=0 flag, then insert a new current record. This table can be queried for both current state and any historical point in time.

Schema (before price change — only current row exists)

product_history

product_id	price	is_current
1	10000	1

Solution — SCD Type 2: Expire old row → Insert new row

-- Step 1: expire the current row (close its time window)
UPDATE product_history
SET   end_date   = '2022-02-28',
      is_current = 0
WHERE product_id = 1
  AND is_current = 1;

-- Step 2: insert the new current row (open new time window)
INSERT INTO product_history
  (product_id, product_name, price, start_date, end_date, is_current)
VALUES
  (1, 'iphone11', 8000, '2022-03-01', NULL, 1);

-- Verify: two rows now exist for product_id=1
SELECT product_id, price, start_date, end_date, is_current
FROM product_history
WHERE product_id = 1;
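
Because the expire step and the insert step must succeed or fail together, it is common to run them in one transaction. The sketch below is one way to do that with Python's sqlite3 module. `apply_price_change` is a hypothetical helper, the full column list of product_history is an assumption (the schema above shows only three columns), and deriving end_date as the day before the change is a design choice mirroring the 2022-02-28 value used in the solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_history (
        product_id INTEGER, product_name TEXT, price INTEGER,
        start_date TEXT, end_date TEXT, is_current INTEGER
    )
""")
conn.execute(
    "INSERT INTO product_history VALUES (1, 'iphone11', 10000, '2022-01-01', NULL, 1)"
)
conn.commit()


def apply_price_change(conn, product_id, new_price, change_date):
    """Expire the current row and insert its successor in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        (name,) = conn.execute(
            "SELECT product_name FROM product_history "
            "WHERE product_id = ? AND is_current = 1",
            (product_id,),
        ).fetchone()
        conn.execute(
            "UPDATE product_history "
            "SET end_date = DATE(?, '-1 day'), is_current = 0 "
            "WHERE product_id = ? AND is_current = 1",
            (change_date, product_id),
        )
        conn.execute(
            "INSERT INTO product_history VALUES (?, ?, ?, ?, NULL, 1)",
            (product_id, name, new_price, change_date),
        )


apply_price_change(conn, 1, 8000, "2022-03-01")
history = conn.execute(
    "SELECT product_id, price, start_date, end_date, is_current "
    "FROM product_history WHERE product_id = 1 ORDER BY start_date"
).fetchall()
print(history)
```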

⚡ Interview BoostIntermediateSCD Type 3Data Warehouse

SCD Type 3 — Previous Value Column: Keep Exactly One Version Back

SCD Type 3 adds a prev_price column to the row. On change: shift current_price → prev_price, set new current_price. Only the immediately previous value is retained — older history is gone.

Marketing needs to show "was ₹X, now ₹Y" on the product page — only the previous price matters, not the full history. Apply SCD Type 3: add a prev_price column to the product table. On each price change, shift the current value to prev_price and write the new price to current_price. Simple, compact — but limited to one version back.

Schema (before price change — prev_price is NULL)

product

product_id	current_price	prev_price
1	10000	NULL

Solution — SCD Type 3: Shift current → prev, write new current

-- SCD Type 3: shift column, no new row created
-- (assumes the table also has an update_date column, not shown in the schema above)
UPDATE product
SET   prev_price    = current_price,   -- 10000 moves to prev
      current_price = 8000,             -- new price
      update_date   = '2022-03-01'
WHERE product_id = 1;

-- After: 1 row, prev_price=10000, current_price=8000
SELECT product_id, current_price, prev_price
FROM product;

-- ⚠ Limitation: if price changes again (8000→7000),
-- prev_price becomes 8000 and 10000 is lost forever.

⚡ Interview BoostIntermediateSCD Type 2Query Pattern

SCD Type 2 — Query Current State: Get Only the Active Row per Product

Filter WHERE is_current = 1 (or equivalently WHERE end_date IS NULL) to get the single active row per entity. Without this filter, all historical rows are returned, mixing stale prices into reports.

After applying SCD Type 2, the product_history table contains both historical and current rows for iPhone 11. An analyst needs only the current (active) price for each product. Without a filter, the query returns every version — mixing old and new prices into the same result set. Which filter isolates only the live record?

Schema (SCD Type 2 table with history)

product_history

product_id	price	is_current
1	10000	0
1	8000	1
2	15000	1

Solution — Filter by is_current = 1 (or end_date IS NULL)

-- Get only the active (current) row per product
SELECT product_id, product_name, price
FROM product_history
WHERE is_current = 1;

-- Equivalent alternative: open-ended rows have no end_date
-- WHERE end_date IS NULL

-- Returns: product 1 → 8000, product 2 → 15000
-- Historical rows (is_current=0) are stored but hidden from day-to-day queries

⚡ Interview BoostExpertSCD Type 2Point-in-Time

SCD Type 2 — Point-in-Time Query: What Was the Price on a Given Date?

Use BETWEEN start_date AND COALESCE(end_date, '9999-12-31') — the COALESCE converts open-ended (NULL) rows into a far-future date so active rows are included in range checks.

SCD Type 2's killer feature: time travel queries. The finance team needs to know what iPhone 11's price was on 2022-02-15 — before the March price drop. The old row (price=10000, start=2022-01-01, end=2022-02-28) covers Feb 15. The new row (price=8000, start=2022-03-01, end=NULL) does not. Querying by exact start_date returns nothing — you need a range check.

Schema (SCD Type 2 table — 2 versions for product 1)

product_history

product_id	price	start_date
1	10000	2022-01-01
1	8000	2022-03-01

Solution — Point-in-time: BETWEEN start_date AND COALESCE(end_date, '9999-12-31')

-- What was the price on 2022-02-15?
SELECT product_id, product_name, price, start_date, end_date
FROM product_history
WHERE product_id = 1
  AND '2022-02-15' BETWEEN start_date
      AND COALESCE(end_date, '9999-12-31');
-- Returns: price=10000 (the Jan 1 – Feb 28 version)
-- COALESCE handles active rows: end_date=NULL → '9999-12-31'
-- so the open-ended row always includes future dates
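
The range check can be wrapped in a small lookup function. This is a sketch with Python's sqlite3 module: `price_on` is a hypothetical helper, and dates are stored as ISO-8601 text (an assumption), which is what makes plain string comparison inside BETWEEN give the right answer in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_history (
        product_id INTEGER, product_name TEXT, price INTEGER,
        start_date TEXT, end_date TEXT, is_current INTEGER
    )
""")
conn.executemany("INSERT INTO product_history VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "iphone11", 10000, "2022-01-01", "2022-02-28", 0),
    (1, "iphone11", 8000, "2022-03-01", None, 1),
])


def price_on(conn, product_id, as_of):
    """Return the price in effect on as_of ('YYYY-MM-DD'), or None if no version covers it."""
    row = conn.execute(
        """
        SELECT price
        FROM product_history
        WHERE product_id = ?
          AND ? BETWEEN start_date AND COALESCE(end_date, '9999-12-31')
        """,
        (product_id, as_of),
    ).fetchone()
    return row[0] if row else None


print(price_on(conn, 1, "2022-02-15"), price_on(conn, 1, "2022-04-01"))
```

A date before the first start_date matches no version, so the helper returns None rather than guessing.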

Solution 2 — Compare any two dates side-by-side

-- Which version was active on Feb 15 vs Apr 01?
SELECT
  product_id,
  price,
  start_date,
  COALESCE(end_date, 'active') AS end_date  -- fine in SQLite; strictly typed engines such as PostgreSQL need end_date cast to text first
FROM product_history
WHERE product_id = 1
  AND (
    '2022-02-15' BETWEEN start_date AND COALESCE(end_date, '9999-12-31')
    OR
    '2022-04-01' BETWEEN start_date AND COALESCE(end_date, '9999-12-31')
  );

⚡ Interview BoostIntermediateAggregates

COUNT(*) vs COUNT(0) vs COUNT(-1) vs COUNT(col) vs COUNT(DISTINCT col)

COUNT(*), COUNT(1), COUNT(0), COUNT(-1) all return the same value — only COUNT(col) and COUNT(DISTINCT col) behave differently

This is one of the most-asked SQL interview tricks. The emp table has 12 rows — the last row is all NULLs. What does each variant of COUNT return, and why? Understanding this proves you know how the SQL engine evaluates aggregate functions under the hood.

Schema

emp

emp_id	emp_name	salary	manager_id	dep_id
1	Ankit	14300	4	100
2	Mohit	15600	5	200
3	Vikas	12100	4	100
4	Rohit	7260	2	100
5	Mudit	15600	6	200
6	Agam	15600	2	200
7	Sanjay	12000	2	200
8	Ashish	7200	2	200
9	Mukesh	7000	6	300
10	Rakesh	8000	6	300
11	Akhil	4000	1	500
NULL	NULL	NULL	NULL	NULL

Expected Output

count_star	count_1	count_0	count_manager	distinct_deps
12	12	12	11	4

Key Rule: COUNT(any non-NULL expression) returns the same value as COUNT(*). The expression COUNT(0) evaluates 0 for every row, and 0 is not NULL, so it counts all 12 rows. COUNT(col) evaluates the column value per row and skips NULLs. COUNT(DISTINCT col) additionally de-duplicates.

Solution

SELECT
  COUNT(*)               AS count_star,    -- 12: every row, NULLs included
  COUNT(1)               AS count_1,       -- 12: literal 1 is never NULL
  COUNT(0)               AS count_0,       -- 12: literal 0 is never NULL
  COUNT(manager_id)      AS count_manager, -- 11: skips the NULL row
  COUNT(DISTINCT dep_id) AS distinct_deps  -- 4:  unique non-NULL dep_ids (100,200,300,500)
FROM emp;

-- Mental model:
-- COUNT(*) / COUNT(literal)  → count rows
-- COUNT(col)                 → count non-NULL values in that column
-- COUNT(DISTINCT col)        → count unique non-NULL values

Why do COUNT(0) and COUNT(-1) equal COUNT(*)?
SQL evaluates the expression inside COUNT for every row. COUNT(0) resolves to the integer 0 for row 1, 0 for row 2 … 0 for row 12. None of those are NULL, so all 12 are counted. The actual value (0, 1, 42, 'hello') doesn't matter — what matters is whether the expression is NULL or not.
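
Each variant, including COUNT(-1), can be run against the sample table. This is a minimal sketch using Python's sqlite3 module; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emp (
        emp_id INTEGER, emp_name TEXT, salary INTEGER,
        manager_id INTEGER, dep_id INTEGER
    )
""")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", [
    (1, "Ankit", 14300, 4, 100), (2, "Mohit", 15600, 5, 200),
    (3, "Vikas", 12100, 4, 100), (4, "Rohit", 7260, 2, 100),
    (5, "Mudit", 15600, 6, 200), (6, "Agam", 15600, 2, 200),
    (7, "Sanjay", 12000, 2, 200), (8, "Ashish", 7200, 2, 200),
    (9, "Mukesh", 7000, 6, 300), (10, "Rakesh", 8000, 6, 300),
    (11, "Akhil", 4000, 1, 500),
    (None, None, None, None, None),  # the all-NULL twelfth row
])

counts = conn.execute("""
SELECT
  COUNT(*),                -- every row
  COUNT(1),                -- literal, never NULL
  COUNT(0),                -- literal, never NULL
  COUNT(-1),               -- literal, never NULL
  COUNT(manager_id),       -- skips the NULL row
  COUNT(DISTINCT dep_id)   -- unique non-NULL values
FROM emp
""").fetchone()
print(counts)
```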

⚡ Interview BoostIntermediateAggregatesLEFT JOIN

COUNT(*) vs COUNT(col) in a LEFT JOIN — The Classic NULL Trap

LEFT JOIN produces NULL columns for unmatched rows — COUNT(*) counts them but COUNT(col) doesn't

A very common interview follow-up after COUNT(*) vs COUNT(col): what happens inside a LEFT JOIN? The query below finds all customers and how many orders each has placed. Bob has never ordered — his order_id will be NULL after the LEFT JOIN. Which COUNT catches that?

Schema

customers

id	name
1	Alice
2	Bob
3	Carol

orders

order_id	customer_id	amount
101	1	500
102	1	300
103	3	750

Expected Output

name	total_rows	order_count
Alice	2	2
Bob	1	0
Carol	1	1

Trap: Bob's LEFT JOIN row has order_id = NULL. COUNT(*) sees a row and counts it (→ 1). COUNT(o.order_id) sees NULL and skips it (→ 0). In most reporting queries you want COUNT(o.order_id) because 0 orders is more useful than 1.

Solution

SELECT
  c.name,
  COUNT(*)          AS total_rows,   -- counts the NULL row for Bob → 1
  COUNT(o.order_id) AS order_count   -- skips NULL → 0 for Bob ✓
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.name
ORDER BY c.name;

⚡ Interview BoostIntermediateAggregatesConditional Count

Conditional COUNT with CASE — Pivot Row Counts in One Query

COUNT(CASE WHEN ... THEN 1 END) counts only matching rows because unmatched branches return NULL, which COUNT ignores

A natural extension of COUNT(col): combine COUNT with CASE to pivot status counts into columns without multiple subqueries. This pattern appears in nearly every Flipkart, Swiggy, and Zomato interview when asked to "show delivered, shipped, and cancelled orders in one row." The trick works because CASE WHEN false THEN 1 END returns NULL — and COUNT silently skips NULLs.

Schema

orders

order_id	status
1	delivered
2	shipped
3	cancelled
4	delivered
5	delivered
6	shipped

Expected Output

total	delivered	shipped	cancelled
6	3	2	1

Why it works: CASE WHEN status='delivered' THEN 1 END returns 1 for delivered rows and NULL for everything else. COUNT ignores NULLs, so only the matching rows are counted. All three counts come from a single table scan, which is usually cheaper than three separate subqueries that each read the table again.

Solution

SELECT
  COUNT(*)                                           AS total,
  COUNT(CASE WHEN status = 'delivered' THEN 1 END) AS delivered,
  COUNT(CASE WHEN status = 'shipped'   THEN 1 END) AS shipped,
  COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled
FROM orders;

-- Equivalent using SUM (also common in interviews):
-- SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END)
-- Both give the same result whenever at least one row is scanned: COUNT skips NULLs, SUM adds zeros

COUNT(CASE...) vs SUM(CASE...)
Both work and agree whenever the query scans at least one row. COUNT(CASE WHEN x THEN 1 END) leans on NULL-skipping behaviour and returns 0 even on an empty set. SUM(CASE WHEN x THEN 1 ELSE 0 END) spells out the zero branch, but SUM over an empty set returns NULL, not 0. Use COUNT, or wrap the SUM in COALESCE(..., 0), when a guaranteed 0 matters; use SUM when you prefer the explicit ELSE 0 style.
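
The empty-set behaviour of the two forms can be checked directly. In this sketch (Python's sqlite3 module), the filter `order_id > 100` is a hypothetical condition chosen only because it matches no rows in the sample table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "delivered"), (2, "shipped"), (3, "cancelled"),
    (4, "delivered"), (5, "delivered"), (6, "shipped"),
])

QUERY = """
SELECT
  COUNT(CASE WHEN status = 'delivered' THEN 1 END),
  SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END),
  COALESCE(SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END), 0)
FROM orders
{where}
"""

non_empty = conn.execute(QUERY.format(where="")).fetchone()
empty = conn.execute(QUERY.format(where="WHERE order_id > 100")).fetchone()

print(non_empty)  # all three forms agree when rows are scanned
print(empty)      # with no rows, only the bare SUM yields NULL (None)
```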

⚡ Interview Boost Expert Subqueries / CTEs Recursive CTE

Employee Org Chart — Find All Subordinates at Any Depth Using Recursive CTE

Recursive CTEs split into two parts: an anchor query (runs once) and a recursive member (runs until it returns 0 rows), joined by UNION ALL

The employees table stores a self-referencing hierarchy via manager_id. Write a query to find all direct and indirect reports under Priya (emp_id = 2, VP Engineering) along with their reporting depth. A simple JOIN only finds direct reports — you need a Recursive CTE to walk the tree to any depth.

Schema

employees

emp_id	emp_name	manager_id
1	Rahul (CEO)	NULL
2	Priya (VP Eng)	1
3	Amit (VP Sales)	1
4	Neha (BE Lead)	2
5	Karan (FE Lead)	2
6	Ravi (Sales)	3
7	Sana (Engineer)	4
8	Vijay (Engineer)	5

Expected Output (under Priya)

emp_id	emp_name	reporting_level
4	Neha (BE Lead)	1
5	Karan (FE Lead)	1
7	Sana (Engineer)	2
8	Vijay (Engineer)	2

Recursive Pattern: the anchor seeds the CTE with direct reports of emp_id=2 (depth=1). The recursive member joins employees back to the CTE on e.manager_id = cte.emp_id, incrementing depth each time. It stops automatically when no new employees match. Add WHERE depth < 10 as a safety guard against circular data.

Solution

WITH RECURSIVE org_tree AS (

  -- Anchor: seed with direct reports of Priya (emp_id = 2)
  SELECT emp_id, emp_name, manager_id, 1 AS depth
  FROM employees
  WHERE manager_id = 2

  UNION ALL

  -- Recursive member: walk one level deeper each iteration
  SELECT e.emp_id, e.emp_name, e.manager_id, ot.depth + 1
  FROM employees e
  INNER JOIN org_tree ot ON e.manager_id = ot.emp_id
  WHERE ot.depth < 10     -- safety guard: prevents infinite loop on bad data

)
SELECT
  emp_id,
  emp_name,
  depth AS reporting_level
FROM org_tree
ORDER BY depth, emp_id;   -- emp_id order matches the Expected Output (Neha before Karan)

-- Iteration 1 (anchor):  Neha(4,depth=1), Karan(5,depth=1)
-- Iteration 2 (recursive): Sana(7,depth=2), Vijay(8,depth=2)
-- Iteration 3: no employees report to Sana or Vijay → stops

How the SQL engine executes a Recursive CTE
The engine maintains a working table and a result table. Step 1: run the anchor, put rows into both tables. Step 2: run the recursive member using the working table as input, producing new rows. Step 3: move new rows to the result table, replace the working table. Repeat until the recursive member returns 0 new rows. This is why WHERE depth < 10 is important — circular data (A → B → A) would produce an infinite loop without it.

Syntax note: PostgreSQL and MySQL 8+ require WITH RECURSIVE. SQLite accepts the keyword but does not enforce it. SQL Server uses just WITH and detects the recursion automatically.
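
To watch the depth guard do its job, the sketch below runs the recursive query twice through Python's sqlite3 module: once on the sample org chart and once on a deliberately circular two-row table. `subordinates` is a hypothetical helper, and the circular rows (A reports to B, B reports to A) are made-up test data.

```python
import sqlite3

SUBORDINATES_SQL = """
WITH RECURSIVE org_tree AS (
  SELECT emp_id, emp_name, 1 AS depth
  FROM employees
  WHERE manager_id = :root
  UNION ALL
  SELECT e.emp_id, e.emp_name, ot.depth + 1
  FROM employees e
  JOIN org_tree ot ON e.manager_id = ot.emp_id
  WHERE ot.depth < :max_depth      -- safety guard against circular data
)
SELECT emp_id, emp_name, depth
FROM org_tree
ORDER BY depth, emp_id
"""


def subordinates(rows, root, max_depth=10):
    """Load (emp_id, emp_name, manager_id) rows and list everyone under root."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (emp_id INTEGER, emp_name TEXT, manager_id INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    result = conn.execute(SUBORDINATES_SQL, {"root": root, "max_depth": max_depth}).fetchall()
    conn.close()
    return result


org = [
    (1, "Rahul (CEO)", None), (2, "Priya (VP Eng)", 1), (3, "Amit (VP Sales)", 1),
    (4, "Neha (BE Lead)", 2), (5, "Karan (FE Lead)", 2), (6, "Ravi (Sales)", 3),
    (7, "Sana (Engineer)", 4), (8, "Vijay (Engineer)", 5),
]
under_priya = subordinates(org, root=2)

circular = [(1, "A", 2), (2, "B", 1)]  # hypothetical bad data: A -> B -> A
looped = subordinates(circular, root=2)

print(under_priya)
print(len(looped), max(depth for _, _, depth in looped))
```

With the guard in place the circular case stops once depth reaches the limit instead of running forever; the repeated rows in the output are the visible symptom that the data contains a cycle.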

⚡ Interview Boost Expert Subqueries / CTEs LEFT JOIN

Daily Channel Breakdown — Classify Users as App-Only, Web-Only, or Both

Two CTEs: one to label each user-date pair, one to generate all date × channel skeletons — then LEFT JOIN so zero rows always appear

An e-commerce platform lets users order via app or web. The purchases table records one row per user per channel per day. Write a query that returns, for each date, the total revenue and distinct user count split into three groups: app-only, web-only, and both. Dates where no one used both channels must still appear with 0 / 0 — they must not be silently dropped.

Schema

purchases

user_id	order_date	channel	amount
1	2024-01-01	app	250
1	2024-01-01	web	180
2	2024-01-01	app	320
3	2024-01-01	web	150
2	2024-01-02	app	400
4	2024-01-02	web	200

Expected Output

order_date	channel	total_revenue	total_users
2024-01-01	app	320	1
2024-01-01	web	150	1
2024-01-01	both	430	1
2024-01-02	app	400	1
2024-01-02	web	200	1
2024-01-02	both	0	0

Two-Step Pattern
CTE 1 classifies each (user, date) as 'app', 'web', or 'both' using COUNT(DISTINCT channel) = 2. CTE 2 uses UNION ALL to generate every date × channel skeleton, which is what guarantees a 'both' row with 0 / 0 on 2024-01-02 instead of a missing row.

Solution

WITH user_channel AS (
  SELECT
    user_id,
    order_date,
    SUM(amount)                           AS total_revenue,
    CASE
      WHEN COUNT(DISTINCT channel) = 2 THEN 'both'
      ELSE MAX(channel)                  -- 'app' or 'web'
    END                                   AS channel
  FROM purchases
  GROUP BY user_id, order_date
),
all_combos AS (
  -- generate skeleton: every date paired with all 3 channel labels
  SELECT DISTINCT order_date, 'app'  AS channel FROM purchases
  UNION ALL
  SELECT DISTINCT order_date, 'web'  AS channel FROM purchases
  UNION ALL
  SELECT DISTINCT order_date, 'both' AS channel FROM purchases
)
SELECT
  ac.order_date,
  ac.channel,
  COALESCE(SUM(uc.total_revenue), 0) AS total_revenue,
  COUNT(uc.user_id)                   AS total_users
FROM all_combos ac
LEFT JOIN user_channel uc
  ON  ac.order_date = uc.order_date
  AND ac.channel    = uc.channel
GROUP BY ac.order_date, ac.channel
ORDER BY ac.order_date,
         CASE ac.channel
           WHEN 'app'  THEN 1
           WHEN 'web'  THEN 2
           ELSE 3
         END;

-- user_channel: labels each (user, date) — 'app', 'web', or 'both'
-- all_combos:   skeleton rows for every date × channel (guarantees zeros)
-- LEFT JOIN:    fills actual revenue/users; unmatched → NULL → 0

Why UNION ALL and not just GROUP BY?
After the user_channel CTE, only dates that actually had 'both' users produce a 'both' row. On 2024-01-02, nobody used both channels — so a plain GROUP BY returns only 'app' and 'web' rows. The all_combos UNION ALL pre-generates the 'both' skeleton for every date regardless. The LEFT JOIN then matches real data where it exists and leaves NULLs elsewhere. COALESCE(..., 0) and COUNT(uc.user_id) convert those NULLs to 0.
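The difference is easy to check outside the playground. The sketch below assumes Python's built-in sqlite3 module and the sample rows from the schema above. It runs the plain GROUP BY and the skeleton version side by side, sorting channels alphabetically for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (user_id INT, order_date TEXT, channel TEXT, amount INT);
INSERT INTO purchases VALUES
  (1,'2024-01-01','app',250), (1,'2024-01-01','web',180),
  (2,'2024-01-01','app',320), (3,'2024-01-01','web',150),
  (2,'2024-01-02','app',400), (4,'2024-01-02','web',200);
""")

# Same user_channel CTE as the solution, reused by both queries below.
USER_CHANNEL = """
WITH user_channel AS (
  SELECT user_id, order_date, SUM(amount) AS total_revenue,
         CASE WHEN COUNT(DISTINCT channel) = 2 THEN 'both'
              ELSE MAX(channel) END AS channel
  FROM purchases
  GROUP BY user_id, order_date
)"""

# Plain GROUP BY: only labels that actually occur on a date come back.
plain = conn.execute(USER_CHANNEL + """
SELECT order_date, channel, SUM(total_revenue), COUNT(user_id)
FROM user_channel
GROUP BY order_date, channel
ORDER BY order_date, channel
""").fetchall()

# Skeleton + LEFT JOIN: every date gets all three labels.
skeleton = conn.execute(USER_CHANNEL + """,
all_combos AS (
  SELECT DISTINCT order_date, 'app' AS channel FROM purchases
  UNION ALL
  SELECT DISTINCT order_date, 'web' FROM purchases
  UNION ALL
  SELECT DISTINCT order_date, 'both' FROM purchases
)
SELECT ac.order_date, ac.channel,
       COALESCE(SUM(uc.total_revenue), 0), COUNT(uc.user_id)
FROM all_combos ac
LEFT JOIN user_channel uc
  ON ac.order_date = uc.order_date AND ac.channel = uc.channel
GROUP BY ac.order_date, ac.channel
ORDER BY ac.order_date, ac.channel
""").fetchall()

print(len(plain), len(skeleton))  # 5 6
```

The sixth row in the skeleton result is ('2024-01-02', 'both', 0, 0), which the plain GROUP BY never produces.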

⚡ Interview Boost Expert Recursive CTE String Functions

Most In-Demand Job Skills — Split Comma-Separated Column into Rows

When a column stores multiple values as a comma-separated string, use a Recursive CTE with INSTR + SUBSTR to peel one value per iteration until the string is empty

The job_listings table stores required skills as a single comma-separated string per row (e.g., 'Python,SQL,Excel'). Write a query to find the most in-demand skills across all listings — each skill in a multi-skill listing must be counted as a separate row. There is no STRING_SPLIT() in SQLite; solve it with a Recursive CTE.

Schema

job_listings

job_id	company	required_skills
1	Flipkart	Python,SQL,Excel
2	Swiggy	SQL,Tableau
3	Zomato	Python,Power BI,SQL
4	Amazon	Excel,SQL,Python
5	Myntra	Tableau,Power BI

Expected Output

skill	cnt
SQL	4
Python	3
Excel	2
Power BI	2
Tableau	2

Split Logic
INSTR(s, ',') returns the position of the first comma (0 if none).
Anchor: extract everything before the first comma as skill; everything after as remaining. If no comma, skill = whole string, remaining = ''.
Recursive: repeat on remaining until remaining = ''.

Solution

WITH RECURSIVE split_skills AS (

  -- Anchor: extract the first skill from each job listing
  SELECT
    job_id,
    TRIM(SUBSTR(required_skills, 1,
      CASE WHEN INSTR(required_skills, ',') > 0
           THEN INSTR(required_skills, ',') - 1   -- up to first comma
           ELSE LENGTH(required_skills) END))   -- or whole string if no comma
      AS skill,
    CASE WHEN INSTR(required_skills, ',') > 0
         THEN SUBSTR(required_skills, INSTR(required_skills, ',') + 1)
         ELSE '' END                        -- everything after the comma
      AS remaining
  FROM job_listings

  UNION ALL

  -- Recursive: peel the next skill from remaining
  SELECT
    job_id,
    TRIM(SUBSTR(remaining, 1,
      CASE WHEN INSTR(remaining, ',') > 0
           THEN INSTR(remaining, ',') - 1
           ELSE LENGTH(remaining) END))
      AS skill,
    CASE WHEN INSTR(remaining, ',') > 0
         THEN SUBSTR(remaining, INSTR(remaining, ',') + 1)
         ELSE '' END
      AS remaining
  FROM split_skills
  WHERE remaining != ''        -- stop when nothing is left

)
SELECT skill, COUNT(*) AS cnt
FROM split_skills
GROUP BY skill
ORDER BY cnt DESC, skill ASC;

-- SQL Server equivalent (much simpler):
-- SELECT value AS skill, COUNT(*) AS cnt
-- FROM job_listings
-- CROSS APPLY STRING_SPLIT(required_skills, ',')
-- GROUP BY value ORDER BY cnt DESC;

How the recursion peels 'Python,SQL,Excel':
Iteration 1 (anchor): INSTR('Python,SQL,Excel', ',') = 7 → skill = 'Python', remaining = 'SQL,Excel'
Iteration 2: INSTR('SQL,Excel', ',') = 4 → skill = 'SQL', remaining = 'Excel'
Iteration 3: INSTR('Excel', ',') = 0 → skill = 'Excel', remaining = ''
Iteration 4: WHERE remaining != '' → stops. Three rows produced for job_id=1.

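SQL Server shortcut: STRING_SPLIT(required_skills, ',') with CROSS APPLY does all of the above in one line (SQL Server 2016+). The recursive CTE is the fallback for engines without a split function. The helper names vary by engine: INSTR / SUBSTR here, CHARINDEX / SUBSTRING in SQL Server.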
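The iteration trace above can be reproduced by running the same anchor and recursive expressions on a single string. This is a sketch assuming Python's built-in sqlite3 module. The CTE name peel and the step counter column are additions for the demo and are not part of the solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
trace = conn.execute("""
WITH RECURSIVE peel(step, skill, remaining) AS (
  -- Anchor: first skill and the rest of the string
  SELECT 1,
         TRIM(SUBSTR(s, 1, CASE WHEN INSTR(s, ',') > 0
                                THEN INSTR(s, ',') - 1
                                ELSE LENGTH(s) END)),
         CASE WHEN INSTR(s, ',') > 0
              THEN SUBSTR(s, INSTR(s, ',') + 1)
              ELSE '' END
  FROM (SELECT 'Python,SQL,Excel' AS s)

  UNION ALL

  -- Recursive member: the same expressions applied to remaining
  SELECT step + 1,
         TRIM(SUBSTR(remaining, 1, CASE WHEN INSTR(remaining, ',') > 0
                                        THEN INSTR(remaining, ',') - 1
                                        ELSE LENGTH(remaining) END)),
         CASE WHEN INSTR(remaining, ',') > 0
              THEN SUBSTR(remaining, INSTR(remaining, ',') + 1)
              ELSE '' END
  FROM peel
  WHERE remaining != ''
)
SELECT step, skill, remaining FROM peel ORDER BY step
""").fetchall()

for row in trace:
    print(row)
# (1, 'Python', 'SQL,Excel')
# (2, 'SQL', 'Excel')
# (3, 'Excel', '')
```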

⚡ Interview Boost Expert Aggregates Window Functions

Aggregation Mastery — 7 Techniques Every SQL Developer Must Know

Simple aggregate → GROUP BY → HAVING → multi-column GROUP BY → ROLLUP → Window OVER() → CTE multi-step: each solves a different aggregation shape

The sales table records regional sales across two years. The same dataset is used to demonstrate all 7 aggregation techniques — from a single aggregate function on the whole table all the way to window functions and multi-step CTEs. Knowing which technique to reach for in an interview is what separates intermediate from expert SQL.

Schema

sales

sale_id	region	year	amount
1	North	2023	1000
2	South	2023	1500
3	North	2023	800
4	South	2024	2000
5	North	2024	1200
6	South	2024	900

Method 6 Output — Window Aggregate (active in playground)

sale_id	region	amount	region_total	grand_total
1	North	1000	3000	7400
3	North	800	3000	7400
5	North	1200	3000	7400
2	South	1500	4400	7400
4	South	2000	4400	7400
6	South	900	4400	7400

When to use which
1. No grouping needed? Simple aggregate function.
2. Group-level totals? GROUP BY.
3. Filter after grouping? Add HAVING.
4. Totals per combination of columns? GROUP BY multiple columns.
5. Subtotals + grand total? ROLLUP (CUBE / GROUPING SETS for other grouping combos).
6. Keep every row + add totals? Window OVER().
7. Multi-step or % share? CTE.

Solution — All 7 Methods

-- ── METHOD 1: Simple Aggregate — whole-table summary ────────────────
-- SELECT COUNT(*) AS row_count, SUM(amount) AS total,
--        AVG(amount) AS avg_amt, MIN(amount) AS min_amt, MAX(amount) AS max_amt
-- FROM sales;
-- ↳ 1 row result: row_count=6, total=7400, avg_amt≈1233.33, min_amt=800, max_amt=2000

-- ── METHOD 2: GROUP BY — collapse rows into groups ───────────────────
-- SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
-- ↳ North=3000, South=4400

-- ── METHOD 3: GROUP BY + HAVING — filter groups after aggregating ────
-- SELECT region, SUM(amount) AS total
-- FROM sales GROUP BY region HAVING SUM(amount) > 3000;
-- ↳ Only South (4400) survives the HAVING filter

-- ── METHOD 4: GROUP BY multiple columns ──────────────────────────────
-- SELECT region, year, SUM(amount) AS total
-- FROM sales GROUP BY region, year ORDER BY region, year;
-- ↳ 4 rows: North/2023=1800, North/2024=1200, South/2023=1500, South/2024=2900

-- ── METHOD 5: ROLLUP — subtotals + grand total ───────────────────────
-- Supported: SQL Server / PostgreSQL (MySQL writes GROUP BY region, year WITH ROLLUP; NOT SQLite)
-- SELECT region, year, SUM(amount) FROM sales GROUP BY ROLLUP(region, year);
-- ↳ Adds subtotal rows per region + a grand total NULL/NULL row
-- CUBE gives all dimension combinations; GROUPING SETS lets you pick custom ones

-- ── METHOD 6: Window Aggregate — per-row totals, no row collapse ─────
SELECT
  sale_id, region, amount,
  SUM(amount) OVER(PARTITION BY region) AS region_total,
  SUM(amount) OVER()                    AS grand_total
FROM sales
ORDER BY region, sale_id;
-- ↳ Every row kept, region_total repeated per partition, grand_total=7400 on all rows

-- ── METHOD 7: CTE — multi-step aggregation with % share ──────────────
-- WITH rt AS (SELECT region, SUM(amount) AS total FROM sales GROUP BY region)
-- SELECT region, total,
--   ROUND(total * 100.0 / SUM(total) OVER(), 1) AS pct_of_total
-- FROM rt;
-- ↳ North=3000 (40.5%), South=4400 (59.5%)

The key distinction: GROUP BY vs Window OVER()
GROUP BY collapses rows — 6 rows become 2 (one per region). You lose the individual row data.
Window OVER() keeps all rows and adds aggregate columns alongside them — 6 rows stay 6 rows, each gaining region_total and grand_total.

ROLLUP / CUBE / GROUPING SETS (Method 5) are available in SQL Server and PostgreSQL. MySQL supports only ROLLUP, written as GROUP BY ... WITH ROLLUP. None of them are available in SQLite. SQLite workaround: UNION ALL the GROUP BY queries.
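The UNION ALL workaround can be sketched as follows, assuming Python's built-in sqlite3 module and the sales rows from the schema above. Each branch is one grouping level, with NULL standing in for the rolled-up columns just as ROLLUP would output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INT, region TEXT, year INT, amount INT);
INSERT INTO sales VALUES
  (1,'North',2023,1000), (2,'South',2023,1500), (3,'North',2023,800),
  (4,'South',2024,2000), (5,'North',2024,1200), (6,'South',2024,900);
""")

rollup_rows = conn.execute("""
SELECT region, year, SUM(amount) AS total FROM sales GROUP BY region, year
UNION ALL
SELECT region, NULL, SUM(amount) FROM sales GROUP BY region   -- region subtotals
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM sales                     -- grand total
""").fetchall()

print(len(rollup_rows))  # 7 = 4 detail rows + 2 subtotals + 1 grand total
```

The trade-off is that the table is scanned once per branch, which is fine for small data but is the reason engines with native ROLLUP are preferred for large tables.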

Rule of thumb: if you need a row-level percentage (amount / region_total), you need a Window function — GROUP BY destroys the row before you can divide it.
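A row-level percentage of the region total might look like the sketch below. It assumes Python's built-in sqlite3 module linked against SQLite 3.25 or newer (needed for window functions). The column name pct_of_region is an assumption for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INT, region TEXT, year INT, amount INT);
INSERT INTO sales VALUES
  (1,'North',2023,1000), (2,'South',2023,1500), (3,'North',2023,800),
  (4,'South',2024,2000), (5,'North',2024,1200), (6,'South',2024,900);
""")

# GROUP BY: one row per region, individual sales are gone.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Window: every sale kept, each divided by its own region's total.
pct = conn.execute("""
SELECT sale_id, region, amount,
       ROUND(amount * 100.0 / SUM(amount) OVER (PARTITION BY region), 1)
         AS pct_of_region
FROM sales
ORDER BY sale_id
""").fetchall()

print(len(grouped), len(pct))  # 2 6
```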

⚡ Interview Boost Intermediate String Functions

Count Occurrences of a Character or Word in a String — LENGTH minus REPLACE

SQL has no built-in count-occurrences function — subtract the string length after REPLACE from the original length; each removed character reduces length by exactly 1

The strings table holds employee full names. Write a query that returns, for each name: (1) how many times the letter 'k' appears (case-insensitive), and (2) how many words the name contains. Neither COUNT nor any native function solves this directly — the trick uses LENGTH and REPLACE together.

Schema

strings

id	full_name
1	Priya Sharma
2	Ram Kumar Verma
3	Akshay Kumar Ak k
4	Rahul

Expected Output

full_name	k_count	word_count
Akshay Kumar Ak k	4	4
Ram Kumar Verma	1	3
Priya Sharma	0	2
Rahul	0	1

Core Formula
Single char: LENGTH(s) - LENGTH(REPLACE(LOWER(s), 'k', '')). Removing N occurrences of 'k' shrinks the string by exactly N.
Multi-char word: (LENGTH(s) - LENGTH(REPLACE(LOWER(s), 'kumar', ''))) / LENGTH('kumar'). Divide by the word length, since each removal shrinks the string by 5 chars.
Word count: LENGTH(TRIM(s)) - LENGTH(REPLACE(TRIM(s), ' ', '')) + 1, because spaces = words - 1.

Solution

SELECT
  full_name,

  -- Count occurrences of 'k' (case-insensitive)
  LENGTH(full_name) - LENGTH(REPLACE(LOWER(full_name), 'k', ''))
    AS k_count,

  -- Count words: spaces between words = words - 1
  LENGTH(TRIM(full_name))
    - LENGTH(REPLACE(TRIM(full_name), ' ', '')) + 1
    AS word_count

FROM strings
ORDER BY k_count DESC, full_name;

-- Extension: count a multi-char word (divide by pattern length)
-- (LENGTH(full_name) - LENGTH(REPLACE(LOWER(full_name),'kumar','')))
--   / LENGTH('kumar')  AS kumar_count

Why divide by LENGTH(pattern) for multi-char words?
Removing one 'k' (length 1) shrinks the string by 1 → difference = count × 1 → no division needed.
Removing one 'kumar' (length 5) shrinks the string by 5 → difference = count × 5 → divide by 5 to get count.
General rule: occurrences = (LENGTH(s) - LENGTH(REPLACE(LOWER(s), pattern, ''))) / LENGTH(pattern).

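Word count edge cases: TRIM strips leading/trailing spaces before counting so " Rahul " still returns 1, not 3. Repeated spaces inside the name are not collapsed, so 'Ram  Kumar' (two spaces) would count as 3 words.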
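Both the multi-character rule and the TRIM edge case can be checked with a short script. This is a sketch assuming Python's built-in sqlite3 module and the sample names from the schema above. The alias kumar_count comes from the extension comment in the solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE strings (id INT, full_name TEXT);
INSERT INTO strings VALUES
  (1,'Priya Sharma'), (2,'Ram Kumar Verma'),
  (3,'Akshay Kumar Ak k'), (4,'Rahul');
""")

# Multi-char pattern: length difference divided by the pattern length.
kumar = conn.execute("""
SELECT full_name,
       (LENGTH(full_name) - LENGTH(REPLACE(LOWER(full_name), 'kumar', '')))
         / LENGTH('kumar') AS kumar_count
FROM strings
ORDER BY id
""").fetchall()

# Word count on a padded string, without and with TRIM.
padded = conn.execute("""
SELECT LENGTH(s) - LENGTH(REPLACE(s, ' ', '')) + 1,
       LENGTH(TRIM(s)) - LENGTH(REPLACE(TRIM(s), ' ', '')) + 1
FROM (SELECT ' Rahul ' AS s)
""").fetchone()

print(kumar)   # [('Priya Sharma', 0), ('Ram Kumar Verma', 1), ('Akshay Kumar Ak k', 1), ('Rahul', 0)]
print(padded)  # (3, 1)
```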

126

🌍 Newbie COALESCE

Replace Missing Contact Info

COALESCE returns the first non-NULL value in its argument list — perfect for supplying a default when data is absent

A customer support team has a contacts table. Some customers haven't provided a phone number (stored as NULL). Write a query that shows every customer's name and their phone number — but displays 'N/A' for anyone with a missing number.

Schema

contacts

id	name	phone
1	Riya	9876543210
2	Raj	NULL
3	Priya	9123456789
4	Ankit	NULL

Expected Output

name	phone
Riya	9876543210
Raj	N/A
Priya	9123456789
Ankit	N/A

Solution

SELECT name, COALESCE(phone, 'N/A') AS phone
FROM contacts
ORDER BY id;

127

🌍 Newbie CASE WHEN

Classify Products into Price Tiers

CASE WHEN is SQL's if-else — evaluate conditions top to bottom and return the first match

The marketing team wants to label every product with a pricing tier: 'Budget' (price < 500), 'Mid-Range' (500–2000), or 'Premium' (above 2000). Write a query that returns each product's name, price, and its computed tier, ordered by price.

Schema

products

id	name	price
1	Pen	15
2	Bag	850
3	Laptop	55000
4	Mouse	450
5	Keyboard	1800

Expected Output

name	price	tier
Pen	15	Budget
Mouse	450	Budget
Bag	850	Mid-Range
Keyboard	1800	Mid-Range
Laptop	55000	Premium

Solution

SELECT name, price,
  CASE
    WHEN price < 500   THEN 'Budget'
    WHEN price <= 2000 THEN 'Mid-Range'
    ELSE 'Premium'
  END AS tier
FROM products
ORDER BY price;

128

🌍 Newbie COUNT DISTINCT

Count Unique Delivery Cities

COUNT(DISTINCT col) counts only unique values — duplicates are collapsed before counting

The logistics team wants to know how many unique cities the company currently ships to. Multiple orders to the same city should count as one. Write a query that returns a single number.

Schema

orders

id	customer	city	amount
1	Rahul	Delhi	500
2	Priya	Mumbai	300
3	Amit	Delhi	700
4	Sneha	Pune	200
5	Rohit	Mumbai	400
6	Kavya	Hyderabad	600

Expected Output

unique_cities
4

Solution

SELECT COUNT(DISTINCT city) AS unique_cities
FROM orders;

129

🌍 Newbie IN

Filter Sales by Selected Regions

IN is shorthand for multiple OR conditions — cleaner and easier to extend

The regional sales manager oversees only three territories: North, South, and West. Write a query to show all sales records from those regions only, sorted by region and then by amount (highest first).

Schema

sales

id	region	amount
1	North	5000
2	East	3000
3	South	7000
4	West	4500
5	North	2000
6	East	1500

Expected Output

id	region	amount
1	North	5000
5	North	2000
3	South	7000
4	West	4500

Solution

SELECT id, region, amount
FROM sales
WHERE region IN ('North', 'South', 'West')
ORDER BY region, amount DESC;

130

Beginner MIN + GROUP BY

First Login Date per Player

MIN() on a date column gives the earliest — pair with GROUP BY to get one earliest date per group

A gaming company tracks every session in a game_activity table. The growth team needs each player's first login date to measure early engagement and D1 retention. Return one row per player.

Schema

game_activity

player_id	login_date	score
1	2024-01-15	120
1	2024-02-03	85
2	2024-01-10	50
2	2024-03-15	180
3	2024-02-28	95

Expected Output

player_id	first_login
1	2024-01-15
2	2024-01-10
3	2024-02-28

Solution

SELECT player_id, MIN(login_date) AS first_login
FROM game_activity
GROUP BY player_id
ORDER BY player_id;

131

Beginner LEFT JOIN

Employees Who Did Not Receive a Bonus

LEFT JOIN keeps all rows from the left table — pair with WHERE right side IS NULL to find non-matches

HR wants to identify employees who did not receive a performance bonus this year so they can follow up. Return the names of all employees absent from the bonus table, sorted alphabetically.

Schema

employees

id	name
1	Alice
2	Bob
3	Carol
4	Dave

bonus

employee_id	amount
1	10000
3	15000

Expected Output

name
Bob
Dave

Solution

SELECT e.name
FROM employees e
LEFT JOIN bonus b ON e.id = b.employee_id
WHERE b.employee_id IS NULL
ORDER BY e.name;

132

Beginner HAVING

Classes with Minimum Enrolment

HAVING filters groups after GROUP BY — use it when your condition involves an aggregate like COUNT

A school only assigns a dedicated teacher to classes that have at least 5 enrolled students. Write a query to return the names of those classes, sorted alphabetically.

Schema

enrollments

student_id	class
1	Math
2	Math
3	Math
4	Math
5	Math
6	Math
7	Physics
8	Physics
9	Physics
10	Physics
11	Chemistry
12	Chemistry
13	Chemistry
14	Chemistry
15	Chemistry

Expected Output

class
Chemistry
Math

Solution

SELECT class
FROM enrollments
GROUP BY class
HAVING COUNT(student_id) >= 5
ORDER BY class;

133

Beginner WHERE + AND

Active Products with High Ratings

Combine multiple WHERE conditions with AND — all conditions must be true for the row to pass

The product catalogue team wants to feature items that are not discontinued AND have a customer rating above 3.5. Return each qualifying product's id, name, and rating — highest rated first.

Schema

products

id	name	status	rating
1	Wireless Earbuds	active	4.2
2	Old Router	discontinued	4.5
3	Smart Watch	active	4.8
4	USB Hub	active	3.2
5	Webcam	active	3.9

Expected Output

id	name	rating
3	Smart Watch	4.8
1	Wireless Earbuds	4.2
5	Webcam	3.9

Solution

SELECT id, name, rating
FROM products
WHERE status != 'discontinued'
  AND rating > 3.5
ORDER BY rating DESC;

134

Intermediate Self JOIN

Adjacent Available Seats for Cinema Booking

Self-join on ABS(a.seat_id - b.seat_id) = 1 finds physically adjacent rows — filter both sides for availability

A cinema booking system marks each seat as free (1) or taken (0). Couples need two consecutive free seats. Write a query that returns all seat IDs where the seat and its immediate neighbour are both free, ordered by seat ID.

Schema

cinema

seat_id	free
1	1
2	0
3	1
4	1
5	1
6	0
7	1
8	1

Expected Output

seat_id
3
4
5
7
8

Solution

SELECT DISTINCT a.seat_id
FROM cinema a
JOIN cinema b
  ON ABS(a.seat_id - b.seat_id) = 1
WHERE a.free = 1 AND b.free = 1
ORDER BY a.seat_id;

135

Intermediate CASE + AVG

Same-Day Delivery Rate

Convert boolean conditions to 1/0 with CASE WHEN, then AVG or SUM to calculate percentages

A food delivery platform records each order's order_date and the customer's preferred_date for delivery. An order counts as same-day when the two dates match. The ops team needs to know what percentage of orders are same-day. Return one number rounded to 2 decimal places.

Schema

deliveries

id	order_date	preferred_date
1	2024-01-15	2024-01-15
2	2024-01-16	2024-01-18
3	2024-01-17	2024-01-17
4	2024-01-18	2024-01-20
5	2024-01-19	2024-01-19

Expected Output

immediate_pct
60.00

Solution

SELECT ROUND(
  SUM(CASE WHEN order_date = preferred_date THEN 1 ELSE 0 END)
    * 100.0 / COUNT(*),
  2) AS immediate_pct
FROM deliveries;
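The AVG form mentioned in the hint gives the same number, because the average of a 1/0 flag is the fraction of rows where the flag is 1. A sketch assuming Python's built-in sqlite3 module and the sample rows above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deliveries (id INT, order_date TEXT, preferred_date TEXT);
INSERT INTO deliveries VALUES
  (1,'2024-01-15','2024-01-15'), (2,'2024-01-16','2024-01-18'),
  (3,'2024-01-17','2024-01-17'), (4,'2024-01-18','2024-01-20'),
  (5,'2024-01-19','2024-01-19');
""")

# SUM / COUNT form, as in the solution.
sum_pct = conn.execute("""
SELECT ROUND(SUM(CASE WHEN order_date = preferred_date THEN 1 ELSE 0 END)
             * 100.0 / COUNT(*), 2)
FROM deliveries
""").fetchone()[0]

# AVG form: the mean of the 1/0 flag, scaled to a percentage.
avg_pct = conn.execute("""
SELECT ROUND(AVG(CASE WHEN order_date = preferred_date THEN 1.0 ELSE 0 END)
             * 100, 2)
FROM deliveries
""").fetchone()[0]

print(sum_pct, avg_pct)  # 60.0 60.0
```

Writing 1.0 rather than 1 inside AVG is a habit worth keeping: some engines average integers with integer arithmetic, which would turn the result into 0.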

136

Intermediate GROUP BY + CASE

Monthly Transaction Approval Summary

Pivot row-level status into columns using CASE WHEN inside SUM — one pass through the table, multiple aggregates

The finance team needs a monthly report showing total transaction count, how many were approved, and the total approved amount — all in a single row per month. Transactions have a state column: either 'approved' or 'declined'.

Schema

transactions

id	trans_date	state	amount
1	2024-01-10	approved	1000
2	2024-01-15	declined	500
3	2024-01-20	approved	800
4	2024-02-05	approved	1200
5	2024-02-10	declined	600

Expected Output

month	trans_count	approved_count	approved_total
2024-01	3	2	1800
2024-02	2	1	1200

Solution

SELECT
  strftime('%Y-%m', trans_date) AS month,
  COUNT(*)                    AS trans_count,
  SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
  SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total
FROM transactions
GROUP BY month
ORDER BY month;

137

Intermediate Subquery

Customers Who Bought Every Product

Relational division: group purchases by customer, count distinct products, compare to total product count

Identify power buyers — customers who have purchased every single product in the catalogue. Use HAVING to compare each customer's distinct product count against the total number of products.

Schema

products

id	name
1	Widget
2	Gadget
3	Gizmo

purchases

customer_id	product_id
101	1
101	2
101	3
102	1
102	2
103	1
103	2
103	3

Expected Output

customer_id
101
103

Solution

SELECT customer_id
FROM purchases
GROUP BY customer_id
HAVING COUNT(DISTINCT product_id) = (
  SELECT COUNT(*) FROM products
)
ORDER BY customer_id;

138

Expert LAG + CTE

Find Values That Appear 3+ Times Consecutively

Use LAG() twice to look back 2 rows — if all three match you have a run of 3

A factory logs machine output readings sequentially by ID. Quality control wants to flag any reading value that appears in at least 3 consecutive log entries — this signals a sustained operational state that may need investigation. Return each such value once.

Schema

logs

id	num
1	1
2	1
3	1
4	2
5	1
6	2
7	2
8	2

Expected Output

ConsecutiveNums
1
2

Solution

WITH cte AS (
  SELECT num,
    LAG(num, 1) OVER (ORDER BY id) AS prev1,
    LAG(num, 2) OVER (ORDER BY id) AS prev2
  FROM logs
)
SELECT DISTINCT num AS ConsecutiveNums
FROM cte
WHERE num = prev1 AND num = prev2
ORDER BY num;

139

Expert CASE WHEN

Classify Org Chart Nodes: Root, Leaf, Inner

Root has no manager (NULL parent), Inner appears as someone else's manager, Leaf has a manager but no direct reports

An organisational chart is stored as a self-referencing table. Classify each person: Root (no manager — the CEO), Inner (has a manager AND manages others), or Leaf (has a manager but no direct reports). Order by id.

Schema

org

id	name	manager_id
1	CEO	NULL
2	Alice	1
3	Bob	1
4	Carol	2
5	Dave	3
6	Eve	3

Expected Output

id	name	node_type
1	CEO	Root
2	Alice	Inner
3	Bob	Inner
4	Carol	Leaf
5	Dave	Leaf
6	Eve	Leaf

Solution

SELECT id, name,
  CASE
    WHEN manager_id IS NULL THEN 'Root'
    WHEN id IN (
      SELECT DISTINCT manager_id FROM org
      WHERE manager_id IS NOT NULL
    ) THEN 'Inner'
    ELSE 'Leaf'
  END AS node_type
FROM org
ORDER BY id;

140

⚡ Interview Boost Intermediate CASE + GROUP BY

Customer Activity Classification

Compare each customer's most recent order date to a reference date — CASE WHEN on a date difference classifies status

The retention team classifies customers as Active (ordered within the last 90 days from 2024-12-31) or Churned (no recent order). Write a query showing each customer's last order date and their status, newest first.

Schema

orders

id	customer_id	order_date
1	101	2024-12-20
2	101	2024-03-10
3	102	2024-05-10
4	103	2024-11-30
5	104	2024-06-15

Expected Output

customer_id	last_order	status
101	2024-12-20	Active
103	2024-11-30	Active
104	2024-06-15	Churned
102	2024-05-10	Churned

Solution

WITH last_order AS (
  SELECT customer_id, MAX(order_date) AS last_order
  FROM orders
  GROUP BY customer_id
)
SELECT customer_id, last_order,
  CASE
    WHEN julianday('2024-12-31') - julianday(last_order) <= 90
      THEN 'Active'
    ELSE 'Churned'
  END AS status
FROM last_order
ORDER BY last_order DESC;

141

⚡ Interview Boost Intermediate Window Functions

Product Revenue Rank with Running Total

Combine RANK() and SUM() OVER() in one query — window functions compose naturally in SELECT

The analytics team wants to understand revenue concentration. For each product, show its revenue, its rank among all products, and the cumulative share of total revenue up to that rank, as a percentage. Sort highest revenue first.

Schema

product_sales

product_name	revenue
Analytics Pro	120000
Data Studio	85000
Query Builder	95000
Report Gen	60000
Dashboard	78000

Expected Output

product_name	revenue	rev_rank	cumulative_pct
Analytics Pro	120000	1	27.4
Query Builder	95000	2	49.1
Data Studio	85000	3	68.5
Dashboard	78000	4	86.3
Report Gen	60000	5	100.0

Solution

SELECT
  product_name,
  revenue,
  RANK() OVER (ORDER BY revenue DESC) AS rev_rank,
  ROUND(
    SUM(revenue) OVER (
      ORDER BY revenue DESC
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) * 100.0 / SUM(revenue) OVER (),
  1) AS cumulative_pct
FROM product_sales
ORDER BY revenue DESC;
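If the running total itself is wanted as a column, the same window frame produces it. A sketch assuming Python's built-in sqlite3 module with SQLite 3.25 or newer. The column name cumulative_revenue is an assumption and is not part of the expected output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_sales (product_name TEXT, revenue INT);
INSERT INTO product_sales VALUES
  ('Analytics Pro',120000), ('Data Studio',85000), ('Query Builder',95000),
  ('Report Gen',60000), ('Dashboard',78000);
""")

running = conn.execute("""
SELECT product_name,
       revenue,
       RANK() OVER (ORDER BY revenue DESC) AS rev_rank,
       SUM(revenue) OVER (
         ORDER BY revenue DESC
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_revenue
FROM product_sales
ORDER BY revenue DESC
""").fetchall()

for row in running:
    print(row)
# ('Analytics Pro', 120000, 1, 120000)
# ('Query Builder', 95000, 2, 215000)
# ('Data Studio', 85000, 3, 300000)
# ('Dashboard', 78000, 4, 378000)
# ('Report Gen', 60000, 5, 438000)
```

Dividing cumulative_revenue by the grand total 438000 gives the cumulative_pct values in the expected output, for example 215000 / 438000 ≈ 49.1%.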

142

⚡ Interview Boost Expert Pivot / CASE

Quarterly Revenue Pivot per Product

Pivot rows to columns using SUM(CASE WHEN <sale month is in Q1> THEN revenue ELSE 0 END): one column per bucket, with ELSE 0 so empty quarters show 0 rather than NULL

The CFO wants a single-row-per-product view showing Q1, Q2, Q3, Q4 revenue side by side — a classic pivot. Transform the long-format sales table into wide format using conditional aggregation.

Schema

sales

product_id	sale_date	revenue
1	2024-02-15	3000
1	2024-05-20	4500
1	2024-08-10	2000
1	2024-11-25	5500
2	2024-01-30	1500
2	2024-07-15	2800
2	2024-10-05	3200

Expected Output

product_id	Q1	Q2	Q3	Q4
1	3000	4500	2000	5500
2	1500	0	2800	3200

Solution

SELECT
  product_id,
  SUM(CASE WHEN strftime('%m',sale_date) IN ('01','02','03') THEN revenue ELSE 0 END) AS Q1,
  SUM(CASE WHEN strftime('%m',sale_date) IN ('04','05','06') THEN revenue ELSE 0 END) AS Q2,
  SUM(CASE WHEN strftime('%m',sale_date) IN ('07','08','09') THEN revenue ELSE 0 END) AS Q3,
  SUM(CASE WHEN strftime('%m',sale_date) IN ('10','11','12') THEN revenue ELSE 0 END) AS Q4
FROM sales
GROUP BY product_id
ORDER BY product_id;
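The ELSE 0 in each CASE is what produces the 0 for product 2 in Q2. The sketch below, assuming Python's built-in sqlite3 module and the sample rows above, compares the Q2 column with and without it. The two column aliases are made up for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product_id INT, sale_date TEXT, revenue INT);
INSERT INTO sales VALUES
  (1,'2024-02-15',3000), (1,'2024-05-20',4500), (1,'2024-08-10',2000),
  (1,'2024-11-25',5500), (2,'2024-01-30',1500), (2,'2024-07-15',2800),
  (2,'2024-10-05',3200);
""")

q2 = conn.execute("""
SELECT product_id,
       SUM(CASE WHEN strftime('%m', sale_date) IN ('04','05','06')
                THEN revenue ELSE 0 END) AS q2_with_else,
       SUM(CASE WHEN strftime('%m', sale_date) IN ('04','05','06')
                THEN revenue END)        AS q2_without_else
FROM sales
GROUP BY product_id
ORDER BY product_id
""").fetchall()

print(q2)  # [(1, 4500, 4500), (2, 0, None)]
```

Without ELSE, every CASE for product 2 yields NULL in Q2, and SUM over only NULLs returns NULL (None in Python).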

143

⚡ Interview Boost Expert Window + CTE

Median Salary Without MEDIAN()

Most databases lack MEDIAN() — use ROW_NUMBER to find the middle row(s) then AVG them

Many databases have no built-in MEDIAN() function. Calculate the median salary across all employees using ROW_NUMBER: pick the middle row for odd counts, average the two middle rows for even counts.

Schema

employees

id	name	salary
1	A	45000
2	B	60000
3	C	75000
4	D	85000
5	E	95000
6	F	110000

Expected Output

median_salary
80000.0

Solution

WITH ranked AS (
  SELECT salary,
    ROW_NUMBER() OVER (ORDER BY salary) AS rn,
    COUNT(*) OVER () AS total
  FROM employees
)
SELECT AVG(CAST(salary AS REAL)) AS median_salary
FROM ranked
WHERE rn IN ((total + 1) / 2, (total + 2) / 2);
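The two expressions in the IN list rely on integer division: for an odd total both evaluate to the same middle row, for an even total they pick the two middle rows. A sketch assuming Python's built-in sqlite3 module with SQLite 3.25 or newer. The helper median_salary is hypothetical and exists only to run the query on different inputs.

```python
import sqlite3

MEDIAN_SQL = """
WITH ranked AS (
  SELECT salary,
         ROW_NUMBER() OVER (ORDER BY salary) AS rn,
         COUNT(*) OVER () AS total
  FROM employees
)
SELECT AVG(CAST(salary AS REAL))
FROM ranked
WHERE rn IN ((total + 1) / 2, (total + 2) / 2)
"""


def median_salary(salaries):
    # Fresh in-memory table per call so each input is measured on its own.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (salary INT)")
    conn.executemany("INSERT INTO employees VALUES (?)", [(s,) for s in salaries])
    return conn.execute(MEDIAN_SQL).fetchone()[0]


even = median_salary([45000, 60000, 75000, 85000, 95000, 110000])
odd = median_salary([45000, 60000, 75000, 85000, 95000])
print(even, odd)  # 80000.0 75000.0
```

In an engine where / on integers returns a decimal (MySQL, for example), the IN list would need an integer-division operator such as DIV instead.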

144

⚡ Interview Boost Expert Funnel Analysis

Purchase Funnel Drop-off Analysis

Count each stage with CASE WHEN inside SUM — compute conversion rates between stages using NULLIF to avoid divide-by-zero

The product team tracks user events through a purchase funnel: view → cart → purchase. Write a single query that returns how many users reached each stage, and the conversion rate between consecutive stages. Rates rounded to 1 decimal.

Schema

funnel_events

user_id	stage
1	view
2	view
3	view
4	view
5	view
6	view
7	view
8	view
1	cart
2	cart
3	cart
4	cart
5	cart
1	purchase
2	purchase
3	purchase

Expected Output

viewed	added_to_cart	purchased	view_to_cart_pct	cart_to_purchase_pct
8	5	3	62.5	60.0

Solution

SELECT
  SUM(CASE WHEN stage='view'     THEN 1 ELSE 0 END) AS viewed,
  SUM(CASE WHEN stage='cart'     THEN 1 ELSE 0 END) AS added_to_cart,
  SUM(CASE WHEN stage='purchase' THEN 1 ELSE 0 END) AS purchased,
  ROUND(
    SUM(CASE WHEN stage='cart'     THEN 1.0 ELSE 0 END) * 100
    / NULLIF(SUM(CASE WHEN stage='view' THEN 1 ELSE 0 END), 0),
  1) AS view_to_cart_pct,
  ROUND(
    SUM(CASE WHEN stage='purchase' THEN 1.0 ELSE 0 END) * 100
    / NULLIF(SUM(CASE WHEN stage='cart' THEN 1 ELSE 0 END), 0),
  1) AS cart_to_purchase_pct
FROM funnel_events;

145

⚡ Interview Boost Expert YoY / Pivot

Year-over-Year Revenue Growth by Category

Self-pivot 2023 and 2024 revenue using CASE WHEN inside SUM, then compute growth % using NULLIF to guard against zero denominators

The business review team needs a year-over-year comparison of revenue by product category. Return each category's 2023 revenue, 2024 revenue, and percentage growth — sorted by growth rate descending so the fastest-growing categories appear first.

Schema

category_sales

category	sale_year	revenue
Electronics	2023	450000
Electronics	2024	520000
Clothing	2023	280000
Clothing	2024	310000
Food	2023	190000
Food	2024	175000

Expected Output

category	rev_2023	rev_2024	yoy_growth_pct
Electronics	450000	520000	15.56
Clothing	280000	310000	10.71
Food	190000	175000	-7.89

Solution

SELECT
  category,
  SUM(CASE WHEN sale_year=2023 THEN revenue ELSE 0 END) AS rev_2023,
  SUM(CASE WHEN sale_year=2024 THEN revenue ELSE 0 END) AS rev_2024,
  ROUND(
    (SUM(CASE WHEN sale_year=2024 THEN revenue ELSE 0 END)
     - SUM(CASE WHEN sale_year=2023 THEN revenue ELSE 0 END))
    * 100.0
    / NULLIF(SUM(CASE WHEN sale_year=2023 THEN revenue ELSE 0 END), 0),
  2) AS yoy_growth_pct
FROM category_sales
GROUP BY category
ORDER BY yoy_growth_pct DESC;

Reference

SQL Cheat Sheet

Every essential clause, function, and pattern.

CREATE TABLE

CREATE TABLE employees (
  id        INT PRIMARY KEY,
  name      VARCHAR(100) NOT NULL,
  dept      VARCHAR(50),
  salary    DECIMAL(10,2),
  hire_date DATE
);

INSERT / UPDATE / DELETE

INSERT INTO users (name, email)
VALUES ('Alice', 'a@x.com');

UPDATE users SET status = 'active'
WHERE id = 5;

DELETE FROM logs
WHERE created_at < '2023-01-01';

ORDER BY & LIMIT

SELECT * FROM products
ORDER BY price DESC
LIMIT 10;

-- Skip first 20 rows (pagination)
ORDER BY created_at DESC
LIMIT 10 OFFSET 20;

DISTINCT & NULL handling

SELECT DISTINCT department
FROM employees;

-- Replace NULL with default
SELECT COALESCE(phone, 'N/A')
FROM contacts;

-- NULL check (never use = NULL)
WHERE manager_id IS NULL;

SELECT & WHERE — filtering rows

SELECT name, salary
FROM employees
WHERE dept = 'Engineering'
  AND salary > 80000
ORDER BY salary DESC;

Table: employees

INPUT — all rows

name	dept	salary
Alice	Engineering	90,000
Bob	Marketing	60,000
Carol	Engineering	85,000
Dave	HR	55,000

↓ after WHERE + ORDER BY

OUTPUT — 2 rows returned

name	salary
Alice	90,000
Carol	85,000

TIP: WHERE runs before SELECT. It filters rows first, then SELECT picks which columns to show. In standard SQL you can't use a column alias from SELECT inside WHERE.

GROUP BY & HAVING — aggregating groups

SELECT dept,
       COUNT(*) AS headcount,
       AVG(salary) AS avg_sal
FROM employees
GROUP BY dept
HAVING COUNT(*) > 1
ORDER BY avg_sal DESC;

Table: employees

INPUT — 5 rows, 3 departments

name	dept	salary
Alice	Eng	90,000
Bob	Eng	80,000
Carol	HR	55,000
Dave	Mktg	70,000
Eve	HR	60,000

↓ GROUP BY dept + HAVING count > 1

OUTPUT — Mktg removed (only 1 row)

dept	headcount	avg_sal
Eng	2	85,000
HR	2	57,500

NOTE: HAVING ≠ WHERE. Use WHERE to filter individual rows before grouping. Use HAVING to filter the grouped results. HAVING COUNT(*) > 1 is valid; WHERE COUNT(*) > 1 is an error.
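Both halves of that note can be seen in a short script. This is a sketch assuming Python's built-in sqlite3 module and the five sample rows above. The exact error text varies by engine, so it is only captured here, not compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, dept TEXT, salary INT);
INSERT INTO employees VALUES
  ('Alice','Eng',90000), ('Bob','Eng',80000), ('Carol','HR',55000),
  ('Dave','Mktg',70000), ('Eve','HR',60000);
""")

# HAVING: the aggregate is evaluated per group, after grouping.
having_rows = conn.execute("""
SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_sal
FROM employees
GROUP BY dept
HAVING COUNT(*) > 1
ORDER BY avg_sal DESC
""").fetchall()

# WHERE: aggregates are not allowed because rows are not grouped yet.
try:
    conn.execute("SELECT dept FROM employees WHERE COUNT(*) > 1 GROUP BY dept")
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

print(having_rows)  # [('Eng', 2, 85000.0), ('HR', 2, 57500.0)]
print(where_error is not None)  # True
```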

INNER JOIN

Matching rows only

LEFT JOIN

All left + matching right

RIGHT JOIN

All right + matching left

FULL OUTER

All rows from both tables

INNER JOIN — rows matched in both tables

SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d
  ON e.dept_id = d.id;

-- Only rows where dept_id exists
-- in BOTH tables are returned.
-- Unmatched rows are dropped.

Two tables being joined

employees

name	dept_id
Alice	1
Bob	2
Carol	99

departments

id	dept_name
1	Engineering
2	Marketing

↓ INNER JOIN on dept_id = id

OUTPUT — Carol excluded (dept_id 99 has no match)

name	dept_name
Alice	Engineering
Bob	Marketing

TIP: Carol is silently dropped. INNER JOIN only returns rows where the join condition matches in both tables. If an employee has a dept_id that doesn't exist in departments, that employee disappears from results.

LEFT JOIN — keep all left rows, NULL where no match

SELECT e.name, o.order_id
FROM employees e
LEFT JOIN orders o
  ON e.id = o.emp_id;

-- Tip: find employees with NO orders:
SELECT e.name
FROM employees e
LEFT JOIN orders o
  ON e.id = o.emp_id
WHERE o.order_id IS NULL;

employees LEFT JOIN orders

employees

id	name
1	Alice
2	Bob
3	Carol

orders

order_id	emp_id
101	1
102	1

↓ LEFT JOIN (all employees kept)

OUTPUT — Bob & Carol kept with NULL

name	order_id
Alice	101
Alice	102
Bob	NULL
Carol	NULL

NOTE: Classic interview pattern: find records with no match by doing a LEFT JOIN, then WHERE right_table.id IS NULL. On large tables this is often faster than a NOT IN subquery, and unlike NOT IN it still works when the subquery column contains NULLs.
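One practical reason to prefer the anti-join is how NOT IN treats NULL. The sketch below assumes Python's built-in sqlite3 module and adds one hypothetical order (103) with a NULL emp_id to the sample data. That row is not in the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INT, name TEXT);
CREATE TABLE orders (order_id INT, emp_id INT);
INSERT INTO employees VALUES (1,'Alice'), (2,'Bob'), (3,'Carol');
-- Order 103 is a hypothetical row with no employee recorded.
INSERT INTO orders VALUES (101,1), (102,1), (103,NULL);
""")

# Anti-join: LEFT JOIN, then keep the rows that found no match.
anti_join = conn.execute("""
SELECT e.name
FROM employees e
LEFT JOIN orders o ON e.id = o.emp_id
WHERE o.order_id IS NULL
ORDER BY e.name
""").fetchall()

# NOT IN: a NULL in the list makes every comparison unknown.
not_in = conn.execute("""
SELECT name
FROM employees
WHERE id NOT IN (SELECT emp_id FROM orders)
ORDER BY name
""").fetchall()

print(anti_join)  # [('Bob',), ('Carol',)]
print(not_in)     # []
```

Adding WHERE emp_id IS NOT NULL inside the subquery makes NOT IN return Bob and Carol as well.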

Multi-table JOIN

SELECT e.name, d.name, l.city
FROM employees e
JOIN departments d ON e.dept_id = d.id
JOIN locations l  ON d.loc_id  = l.id
WHERE l.city = 'Mumbai';

SELF JOIN

-- Find each employee & their manager
SELECT e.name  AS employee,
       m.name  AS manager
FROM employees e
LEFT JOIN employees m
  ON e.manager_id = m.id;

Core Aggregate Functions — what they return

SELECT
  COUNT(*),        -- every row
  COUNT(salary),  -- non-NULL only
  SUM(salary),
  AVG(salary),
  MIN(salary),
  MAX(salary)
FROM employees;

Table: employees (4 rows)

INPUT

name	salary
Alice	90,000
Bob	70,000
Carol	80,000
Dave	NULL

↓ aggregate results

OUTPUT — single row of totals

function	result
COUNT(*)	4
COUNT(salary)	3 (NULL skipped)
SUM(salary)	240,000
AVG(salary)	80,000
MIN / MAX	70,000 / 90,000

NOTE: COUNT(*) vs COUNT(col). COUNT(*) counts all rows including NULLs. COUNT(salary) skips rows where salary is NULL. A very common interview trick question.
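A quick way to check these numbers is to run the query against the same four rows. This sketch assumes Python's built-in sqlite3 module; salaries are stored as plain integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
      ('Alice', 90000), ('Bob', 70000), ('Carol', 80000), ('Dave', NULL);
""")

# One row of totals; every aggregate except COUNT(*) ignores Dave's NULL.
row = conn.execute("""
    SELECT COUNT(*), COUNT(salary), SUM(salary),
           AVG(salary), MIN(salary), MAX(salary)
    FROM employees
""").fetchone()

count_all, count_salary, total, average, lowest, highest = row
print(row)
```

Note that AVG divides by 3, not 4: the NULL salary is left out of both the sum and the count.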

GROUP BY

SELECT department,
       COUNT(*) AS headcount,
       AVG(salary) AS avg_sal
FROM employees
GROUP BY department
ORDER BY avg_sal DESC;

HAVING (filter groups)

SELECT dept, COUNT(*) AS cnt
FROM employees
GROUP BY dept
HAVING COUNT(*) > 5;

-- HAVING filters after grouping
-- WHERE filters before grouping

ROLLUP & CUBE

-- Subtotals + grand total
SELECT dept, job, SUM(salary)
FROM employees
GROUP BY ROLLUP(dept, job);

-- All combinations
SELECT dept, job, SUM(salary)
FROM employees
GROUP BY CUBE(dept, job);

String Functions

UPPER(name)           -- ALICE
LOWER(name)           -- alice
LENGTH(name)          -- 5
TRIM(name)            -- strip spaces
SUBSTRING(name,1,3)  -- Ali
CONCAT(first,' ',last)
REPLACE(str,'a','@')

Date Functions

NOW()               -- current datetime
CURDATE()           -- current date
YEAR(date_col)
MONTH(date_col)
DAY(date_col)
DATEDIFF(d1, d2)    -- days between
DATE_ADD(d, INTERVAL 7 DAY)

CASE WHEN

SELECT name,
  CASE
    WHEN salary >= 100000 THEN 'Senior'
    WHEN salary >= 60000  THEN 'Mid'
    ELSE 'Junior'
  END AS level
FROM employees;

CAST & CONVERT

SELECT CAST('2024-01-15' AS DATE);
SELECT CAST(42.7 AS INT);       -- 42 if the engine truncates (SQL Server), 43 if it rounds (PostgreSQL)
SELECT CAST(price AS CHAR);

-- MySQL: CONVERT(val, type)
SELECT CONVERT(salary, CHAR);

Subquery in WHERE

-- Employees earning above avg
SELECT name, salary
FROM employees
WHERE salary > (
  SELECT AVG(salary)
  FROM employees
);

IN / EXISTS

SELECT * FROM customers
WHERE id IN (
  SELECT customer_id FROM orders
);

-- Same result with EXISTS
SELECT * FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.id
);

CTE (WITH clause)

WITH dept_avg AS (
  SELECT dept,
         AVG(salary) AS avg_sal
  FROM employees
  GROUP BY dept
)
SELECT e.name, d.avg_sal
FROM employees e
JOIN dept_avg d ON e.dept = d.dept;

Correlated Subquery

-- Top earner per dept
SELECT name, dept, salary
FROM employees e1
WHERE salary = (
  SELECT MAX(salary)
  FROM employees e2
  WHERE e2.dept = e1.dept
);

Window Functions — add a column, keep all rows

SELECT name, dept, salary,
  ROW_NUMBER() OVER (
    PARTITION BY dept
    ORDER BY salary DESC
  ) AS rn,
  RANK() OVER (
    ORDER BY salary DESC
  ) AS rnk
FROM employees;

-- OVER() = window definition
-- PARTITION BY = reset per group
-- ORDER BY = ranking direction

Window functions don't collapse rows

INPUT — 4 rows, 2 departments

name	dept	salary
Alice	Eng	90k
Bob	Eng	80k
Carol	HR	70k
Dave	HR	70k

↓ window adds rn + rnk columns

OUTPUT — same 4 rows + 2 new columns

name	dept	rn	rnk
Alice	Eng	1	1
Bob	Eng	2	2
Carol	HR	1	3
Dave	HR	2	3

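TIP: rn resets to 1 per department (PARTITION BY). Carol and Dave both get RANK 3 because their salaries are tied; if a fifth, lower-paid row followed them, RANK would number it 5 while DENSE_RANK would number it 4. Because of the tie, which of the two gets rn = 1 is arbitrary unless you add a tiebreaker column to ORDER BY. Use this pattern inside a CTE + WHERE rn = 1 to get the top earner per department.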
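Here is a runnable sketch of the same query on the four sample rows, followed by the CTE + WHERE rn = 1 pattern. It assumes Python's built-in sqlite3 module with SQLite 3.25 or newer (needed for window functions). The extra `name` tiebreaker in ORDER BY is an addition to make rn deterministic for the tied HR salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
      ('Alice', 'Eng', 90000), ('Bob', 'Eng', 80000),
      ('Carol', 'HR', 70000), ('Dave', 'HR', 70000);
""")

# rn restarts per department; rnk is computed across the whole table.
rows = conn.execute("""
    SELECT name, dept,
      ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, name) AS rn,
      RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
    ORDER BY dept, rn
""").fetchall()

# Top earner per department: rank inside a CTE, then filter on rn = 1.
top_per_dept = conn.execute("""
    WITH ranked AS (
      SELECT name, dept,
        ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, name) AS rn
      FROM employees
    )
    SELECT dept, name FROM ranked WHERE rn = 1 ORDER BY dept
""").fetchall()

print(rows)
print(top_per_dept)
```

The filter has to live outside the CTE because window functions are not allowed in WHERE.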

LAG & LEAD

-- Compare row with prev/next
SELECT month, revenue,
  LAG(revenue) OVER (
    ORDER BY month
  ) AS prev_month,
  revenue - LAG(revenue) OVER (
    ORDER BY month
  ) AS growth
FROM sales;

Running Total

SELECT order_date, amount,
  SUM(amount) OVER (
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING
      AND CURRENT ROW
  ) AS running_total
FROM orders;

DENSE_RANK Top-N

-- Top 3 per department
WITH ranked AS (
  SELECT *, DENSE_RANK() OVER (
    PARTITION BY dept
    ORDER BY salary DESC
  ) AS dr
  FROM employees
)
SELECT * FROM ranked
WHERE dr <= 3;

Top questions asked in Data Analyst, Data Engineer, and BI Developer interviews.

Very Hot · What is the difference between WHERE and HAVING?

WHERE filters individual rows before grouping (runs at step ③). HAVING filters groups after GROUP BY (runs at step ⑤).

Rule: If you need COUNT(), SUM(), or any aggregate in your filter — use HAVING. Otherwise use WHERE.

Example: WHERE salary > 50000 ✅ | HAVING COUNT(*) > 3 ✅ | WHERE COUNT(*) > 3 ❌ (error)
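The rule can be checked in a few lines. This sketch assumes Python's built-in sqlite3 module and a small made-up employees table; other engines report the invalid WHERE with a different error message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
      ('Alice', 'Eng', 90000), ('Bob', 'Eng', 80000),
      ('Carol', 'HR', 70000), ('Dave', 'HR', 45000),
      ('Eve', 'Ops', 60000);
""")

# WHERE removes low salaries first; HAVING then keeps groups with 2+ rows.
having_rows = conn.execute("""
    SELECT dept, COUNT(*) AS cnt
    FROM employees
    WHERE salary > 50000
    GROUP BY dept
    HAVING COUNT(*) > 1
    ORDER BY dept
""").fetchall()

# An aggregate inside WHERE is rejected before the query even runs.
try:
    conn.execute("SELECT dept FROM employees WHERE COUNT(*) > 1 GROUP BY dept")
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

print(having_rows)
print(where_error)
```

HR drops out because Dave is filtered by WHERE before the groups are counted.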

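Very Hot · DELETE vs TRUNCATE vs DROP: what's the difference?

DELETE: removes specific rows (with WHERE), logs each deletion, can be rolled back. Triggers fire.
TRUNCATE: removes ALL rows at once with minimal logging; cannot use WHERE. Much faster on large tables. Whether it resets auto-increment counters and whether it can be rolled back depends on the database.
DROP: removes the entire table, structure and data. Some databases (for example MySQL and Oracle) commit it implicitly so it cannot be rolled back; others (for example PostgreSQL and SQL Server) allow DROP TABLE inside a transaction.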

Memory trick: DELETE a person, TRUNCATE a notebook, DROP a bomb.

Very Hot · What is a NULL? Is NULL = NULL true?

NULL means "unknown" or "missing" — not zero, not empty string. It's the absence of a value.

NULL = NULL → returns NULL (not TRUE). This is a very common interview trap.
To check for NULL you must use: IS NULL or IS NOT NULL.

Also: NULL + 5 = NULL. Any arithmetic with NULL gives NULL. Use COALESCE(col, 0) to substitute a default.
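These four behaviours fit in a single SELECT. The sketch assumes Python's built-in sqlite3 module, where SQL NULL comes back as Python's None and TRUE as the integer 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each column demonstrates one rule from the answer above.
eq_result, is_null_result, arithmetic, coalesced = conn.execute("""
    SELECT NULL = NULL,        -- unknown, not TRUE
           NULL IS NULL,       -- the correct NULL test
           NULL + 5,           -- arithmetic with NULL stays NULL
           COALESCE(NULL, 0)   -- substitute a default
""").fetchone()

print(eq_result, is_null_result, arithmetic, coalesced)
```

Because NULL = NULL is not TRUE, a filter such as WHERE col = NULL never matches any row.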

Very Hot · ROW_NUMBER vs RANK vs DENSE_RANK?

All three number rows, but differ when there are ties:

Imagine scores: 90, 90, 85, 80:
ROW_NUMBER: 1, 2, 3, 4 (always unique, no gaps)
RANK: 1, 1, 3, 4 (ties get same rank, but skips next number)
DENSE_RANK: 1, 1, 2, 3 (ties get same rank, no gaps)

Interview tip: Use DENSE_RANK for "top N" problems to avoid missing ranks.

Common · What is a Primary Key vs Foreign Key?

Primary Key (PK) — uniquely identifies each row in a table. Must be NOT NULL + UNIQUE. One per table.

Foreign Key (FK) — a column that references the Primary Key of another table. Enforces referential integrity (you can't add an order for a customer that doesn't exist).

Example: orders.customer_id is a Foreign Key referencing customers.id (Primary Key).
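The sketch below shows both constraints rejecting bad rows. It assumes Python's built-in sqlite3 module; SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and the customer name and ids are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FK checks are off by default
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
      order_id INTEGER PRIMARY KEY,
      customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # valid: customer 1 exists

# Primary key: a second customer with id 1 is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Duplicate')")
    pk_error = None
except sqlite3.IntegrityError as exc:
    pk_error = str(exc)

# Foreign key: an order for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 42)")
    fk_error = None
except sqlite3.IntegrityError as exc:
    fk_error = str(exc)

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(pk_error)
print(fk_error)
print(order_count)
```

Only the valid order is stored; both bad inserts fail before any data changes.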

Common · What is a subquery vs a JOIN? When to use which?

JOIN combines tables horizontally (adds columns). Best when you need columns from multiple tables.
Subquery is a query inside a query. Best for single-value lookups or filtering based on aggregated data.

Use JOIN when: you want multiple columns from related tables.
Use Subquery when: you need "employees earning above the company average" (can't JOIN on a calculated value easily).

CTEs (WITH clause) are often cleaner than nested subqueries for readability.

Common · UNION vs UNION ALL?

UNION — combines results of two queries and removes duplicate rows. Slower (needs to sort/compare).
UNION ALL — combines results and keeps ALL rows including duplicates. Faster.

Rule: Use UNION ALL by default (faster). Use UNION only when you specifically need to remove duplicates.

Both require: same number of columns and compatible data types in the same order.
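A two-table sketch makes the difference visible. It assumes Python's built-in sqlite3 module; the table names and city values are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE online_customers (city TEXT);
    CREATE TABLE store_customers (city TEXT);
    INSERT INTO online_customers VALUES ('Delhi'), ('Mumbai');
    INSERT INTO store_customers VALUES ('Mumbai'), ('Pune');
""")

# UNION removes the duplicate 'Mumbai'.
union_rows = conn.execute("""
    SELECT city FROM online_customers
    UNION
    SELECT city FROM store_customers
    ORDER BY city
""").fetchall()

# UNION ALL keeps every row from both queries.
union_all_rows = conn.execute("""
    SELECT city FROM online_customers
    UNION ALL
    SELECT city FROM store_customers
    ORDER BY city
""").fetchall()

print(union_rows)
print(union_all_rows)
```

If you are counting or summing the combined rows, UNION would silently drop data, so UNION ALL is usually the one you want there.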

Common · What is an Index? When should you create one?

An index is a data structure that speeds up SELECT queries — like the index at the back of a textbook.

Create an index when: the column appears frequently in WHERE, JOIN ON, or ORDER BY.
Avoid over-indexing: every index slows down INSERT/UPDATE/DELETE because the index also needs updating.

Clustered index — determines physical order of rows on disk. One per table (usually PK).
Non-clustered index — separate lookup structure. Can have many per table.

Good to Know · What is Normalization? Explain 1NF, 2NF, 3NF.

Normalization organizes a database to reduce redundancy and improve data integrity.

1NF (First Normal Form) — No repeating groups; every cell holds one value; each row is unique.
2NF (Second Normal Form) — 1NF + every non-key column fully depends on the entire primary key (no partial dependency).
3NF (Third Normal Form) — 2NF + no transitive dependencies (non-key column depends only on the PK, not on another non-key column).

Tip: "3NF = no column should depend on a non-key column."

Good to Know · What are Window Functions and when would you use them?

Window functions perform calculations across a set of related rows without collapsing them into one row (unlike GROUP BY).

Syntax: function() OVER (PARTITION BY col ORDER BY col)

Use cases:
• Ranking employees by salary within each department (RANK())
• Running totals, moving averages (SUM() OVER)
• Comparing each row to the previous row (LAG())
• Finding top-N per group (DENSE_RANK + WHERE dr <= N)

WHERE vs HAVING — most confused pair in SQL

Feature	WHERE	HAVING
Filters	Individual rows	Groups (after GROUP BY)
When it runs	Before GROUP BY (step ③)	After GROUP BY (step ⑤)
Can use aggregate functions?	❌ No	✅ Yes
Example	`WHERE salary > 50000`	`HAVING COUNT(*) > 5`
Works without GROUP BY?	✅ Yes	Rarely useful without it

DELETE vs TRUNCATE vs DROP

Feature	DELETE	TRUNCATE	DROP
Removes	Specific rows	All rows	Entire table
WHERE clause	✅ Yes	❌ No	❌ No
Can rollback?	✅ Yes	Depends on DB	Depends on DB
Resets auto-increment?	❌ No	Usually (depends on DB)	Table is gone
Speed	Slower	Fast	Fast
Triggers fire?	✅ Yes	❌ No	❌ No

ROW_NUMBER vs RANK vs DENSE_RANK — with example (scores: 90, 90, 85)

Score	ROW_NUMBER()	RANK()	DENSE_RANK()
90	1	1	1
90	2	1	1
85	3	3 (gap!)	2 (no gap)
Ties get same rank?	❌ Always unique	✅ Yes	✅ Yes
Skips numbers after tie?	—	✅ Yes	❌ Never

INNER JOIN vs LEFT JOIN vs FULL OUTER JOIN

Type	Returns	NULLs in result?	Use when...
INNER JOIN	Matching rows only	❌ No	You only want records that exist in both tables
LEFT JOIN	All left + matched right	Right side may be NULL	You want all customers even if they have no orders
RIGHT JOIN	Matched left + all right	Left side may be NULL	Rarely used (just swap tables and use LEFT JOIN)
FULL OUTER	All rows from both	Both sides may be NULL	Find mismatches — rows missing in either table

UNION vs UNION ALL

Feature	UNION	UNION ALL
Removes duplicates?	✅ Yes	❌ No, keeps all rows
Performance	Slower (sorts to find dupes)	Faster
Use when	You need unique combined results	You know there are no dupes OR you want all rows
Default	This is what "UNION" means alone	Must specify ALL explicitly

SQL Interview Prep

Top 20 CTE Patterns

Common Table Expressions · Modern SQL · From Basics to Advanced

01 Basic Readable CTE
02 Multiple CTEs
03 Recursion (1 to 10)
04 Manager Hierarchy
05 Nth Highest Salary
06 Delete Duplicates
07 Running Total
08 Moving Average
09 Gaps & Islands
10 Date Generation
11 YoY Growth (Lag)
12 Monthly Active Users
13 Find Missing IDs
14 String Splitting
15 Basket Analysis
16 Pivot Prep
17 Consecutive Wins
18 Update via CTE
19 Tree Path Generation
20 Insert from CTE

01 Basic Readable CTE Readability

Don't nest. Define first — Replacing Nested Subqueries

-- Don't nest. Define first.
WITH HighSales AS (
  SELECT * FROM orders
  WHERE amount > 1000
)
SELECT * FROM HighSales;

TIP: Name your logic before you use it. A CTE makes complex queries read top to bottom, like prose, instead of inside-out.

02 Multiple CTEs Readability

Chain named subqueries with commas

WITH dept_avg AS (
  SELECT dept, AVG(salary) AS avg_sal
  FROM employees GROUP BY dept
),
above_avg AS (
  SELECT e.name, e.dept, e.salary
  FROM employees e
  JOIN dept_avg d ON e.dept = d.dept
  WHERE e.salary > d.avg_sal
)
SELECT * FROM above_avg;

TIP: Separate CTEs with a comma, not WITH. Each CTE can reference all CTEs defined before it.

03 Recursion (1 to 10) Recursion

Generate a number series — Anchor + Recursive step

WITH RECURSIVE nums AS (
  SELECT 1 AS n          -- anchor: start
  UNION ALL
  SELECT n + 1           -- recursive: add 1
  FROM nums WHERE n < 10 -- stop condition
)
SELECT n FROM nums;

TIP: Anchor + UNION ALL + recursive member. The WHERE clause is your stop condition; always include one, or the recursion never terminates on its own.

04 Manager Hierarchy Recursion

Full org chart traversal from CEO to every employee

WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS lvl
  FROM employees
  WHERE manager_id IS NULL   -- anchor: CEO
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.lvl + 1
  FROM employees e
  JOIN org o ON e.manager_id = o.id
)
SELECT lvl, name
FROM org ORDER BY lvl, name;

TIP: Each pass fetches direct children of the previous output. Stops when no new child rows are found. Add a path column to build breadcrumb strings.
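The sketch below runs the hierarchy query on a four-person org chart and adds the path column mentioned in the tip. It assumes Python's built-in sqlite3 module, so string concatenation uses `||` rather than CONCAT; the employee rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
      (1, 'Alice', NULL),  -- CEO: no manager
      (2, 'Bob',   1),
      (3, 'Carol', 1),
      (4, 'Dave',  2);
""")

org_rows = conn.execute("""
    WITH RECURSIVE org AS (
      -- anchor: start at the row with no manager
      SELECT id, name, manager_id, 1 AS lvl, name AS path
      FROM employees
      WHERE manager_id IS NULL
      UNION ALL
      -- recursive step: attach direct reports of the previous level
      SELECT e.id, e.name, e.manager_id, o.lvl + 1,
             o.path || ' > ' || e.name
      FROM employees e
      JOIN org o ON e.manager_id = o.id
    )
    SELECT lvl, name, path
    FROM org
    ORDER BY lvl, name
""").fetchall()

for row in org_rows:
    print(row)
```

The recursion ends naturally after Dave because nobody reports to him, so the join produces no new rows.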

05 Nth Highest Salary Analytics

DENSE_RANK inside a CTE — clean and reusable

WITH ranked AS (
  SELECT salary,
    DENSE_RANK() OVER (
      ORDER BY salary DESC
    ) AS rnk
  FROM employees
)
SELECT salary
FROM ranked
WHERE rnk = 2; -- change N here

TIP: Use DENSE_RANK, not RANK. RANK skips numbers after ties, so the "2nd highest" could disappear. Change rnk = N for any Nth value.

06 Delete Duplicates DML

Keep only the latest row per group — ROW_NUMBER in DML

WITH dupes AS (
  SELECT id,
    ROW_NUMBER() OVER (
      PARTITION BY email
      ORDER BY created_at DESC
    ) AS rn
  FROM users
)
DELETE FROM users
WHERE id IN (
  SELECT id FROM dupes WHERE rn > 1
);

TIP: rn = 1 is the newest per email. Rows with rn > 1 are older duplicates and safe to delete. Swap ORDER BY to ASC to keep the oldest instead.

07 Running Total Analytics

Cumulative SUM over an ordered date sequence

WITH daily AS (
  SELECT order_date, SUM(amount) AS rev
  FROM orders GROUP BY order_date
)
SELECT order_date, rev,
  SUM(rev) OVER (
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING
      AND CURRENT ROW
  ) AS running_total
FROM daily;

TIP: ROWS BETWEEN is explicit and predictable. Without it, RANGE is the default, which gives every row that ties on order_date the same running total.

08 Moving Average Analytics

7-day rolling window — smooth out daily noise

WITH daily AS (
  SELECT sale_date, SUM(revenue) AS rev
  FROM sales GROUP BY sale_date
)
SELECT sale_date, rev,
  ROUND(AVG(rev) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 6 PRECEDING
      AND CURRENT ROW
  ), 2) AS avg_7d
FROM daily;

TIP: 6 PRECEDING + CURRENT ROW = a 7-row window, which is a 7-day window only when every date has a row in daily. Adjust the number to change window size. The first 6 rows will average fewer days, which is expected.

09 Gaps & Islands Advanced

Group consecutive dates into contiguous periods

WITH numbered AS (
  SELECT user_id, login_date,
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY login_date
    ) AS rn
  FROM logins
),
islands AS (
  SELECT user_id, login_date,  -- keep login_date for MIN/MAX below
    DATE_SUB(login_date, INTERVAL rn DAY) AS grp
  FROM numbered
)
SELECT user_id,
  MIN(login_date) AS start_date,
  MAX(login_date) AS end_date,
  COUNT(*) AS days
FROM islands GROUP BY user_id, grp;

TIP: Consecutive dates minus their row number give the same constant. That constant is the island key: rows sharing it belong to the same streak.
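To watch the island key form, here is a sketch on six made-up login rows. It assumes Python's built-in sqlite3 module (SQLite 3.25+), so the date subtraction uses SQLite's date() modifier syntax instead of DATE_SUB. It also carries login_date through the islands step so the final SELECT can aggregate it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (user_id INTEGER, login_date TEXT);
    INSERT INTO logins VALUES
      (1, '2024-01-01'), (1, '2024-01-02'), (1, '2024-01-03'),
      (1, '2024-01-07'), (1, '2024-01-08'),
      (2, '2024-01-05');
""")

islands = conn.execute("""
    WITH numbered AS (
      SELECT user_id, login_date,
        ROW_NUMBER() OVER (
          PARTITION BY user_id ORDER BY login_date
        ) AS rn
      FROM logins
    ),
    islands AS (
      -- date minus row number: constant inside a run of consecutive days
      SELECT user_id, login_date,
        date(login_date, '-' || rn || ' days') AS grp
      FROM numbered
    )
    SELECT user_id,
      MIN(login_date) AS start_date,
      MAX(login_date) AS end_date,
      COUNT(*) AS days
    FROM islands
    GROUP BY user_id, grp
    ORDER BY user_id, start_date
""").fetchall()

for row in islands:
    print(row)
```

User 1's five logins split into a 3-day streak and a 2-day streak because the gap between 3 and 7 January changes the computed key.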

10 Date Generation Recursion

Generate a full date series to fill data gaps

WITH RECURSIVE dates AS (
  SELECT CAST('2024-01-01' AS DATE) AS d  -- start as a DATE, not a string
  UNION ALL
  SELECT DATE_ADD(d, INTERVAL 1 DAY)
  FROM dates
  WHERE d < '2024-01-31'
)
SELECT d FROM dates;

TIP: LEFT JOIN this series to your data to surface days with zero sales. They won't appear otherwise, which breaks dashboards and moving averages.

11 YoY Growth (Lag) Analytics

Year-over-Year percentage growth via LAG()

WITH yearly AS (
  SELECT YEAR(sale_date) AS yr,
    SUM(revenue) AS rev
  FROM sales GROUP BY yr
)
SELECT yr, rev,
  LAG(rev) OVER (ORDER BY yr) AS prev_rev,
  ROUND(
    100.0 * (rev - LAG(rev) OVER (ORDER BY yr))
    / NULLIF(LAG(rev) OVER (ORDER BY yr), 0),
  2) AS yoy_pct
FROM yearly;

TIP: NULLIF(prev_rev, 0) prevents division by zero. The first year always returns NULL for yoy_pct, which is correct: there's no prior year to compare.

12 Monthly Active Users Analytics

COUNT DISTINCT active users per calendar month

WITH monthly AS (
  SELECT
    DATE_FORMAT(event_date, '%Y-%m') AS month,
    user_id
  FROM events
)
SELECT month,
  COUNT(DISTINCT user_id) AS mau
FROM monthly
GROUP BY month
ORDER BY month;

TIP: DATE_FORMAT normalizes all dates to YYYY-MM. COUNT(DISTINCT user_id) counts each user only once per month regardless of how many events they fired.

13 Find Missing IDs Advanced

Detect gaps in a numeric sequence

WITH RECURSIVE seq AS (
  SELECT MIN(id) AS n, MAX(id) AS mx
  FROM orders
  UNION ALL
  SELECT n + 1, mx
  FROM seq WHERE n < mx
)
SELECT n AS missing_id
FROM seq
WHERE n NOT IN (
  SELECT id FROM orders
);

TIP: Generate every ID from MIN to MAX, then subtract what exists. What remains = the gaps. On large tables, prefer a LEFT JOIN approach for better performance.

14 String Splitting Advanced

Flatten comma-separated tag strings into rows

WITH RECURSIVE split AS (
  SELECT id,
    TRIM(SUBSTRING_INDEX(tags, ',', 1)) AS tag,
    IF(LOCATE(',', tags) > 0,
      SUBSTRING(tags, LOCATE(',', tags) + 1),
      NULL) AS rest
  FROM products
  UNION ALL
  SELECT id,
    TRIM(SUBSTRING_INDEX(rest, ',', 1)),
    IF(LOCATE(',', rest) > 0,
      SUBSTRING(rest, LOCATE(',', rest) + 1), NULL)
  FROM split WHERE rest IS NOT NULL
)
SELECT id, tag FROM split ORDER BY id;

TIP: Each recursive step peels off the leftmost item. Stops when rest IS NULL (no more commas). PostgreSQL users: use string_to_table() (PostgreSQL 14+) instead.

15 Basket Analysis Advanced

Find products frequently bought together

WITH baskets AS (
  SELECT
    a.product_id AS p1,
    b.product_id AS p2,
    COUNT(*) AS freq
  FROM order_items a
  JOIN order_items b
    ON  a.order_id = b.order_id
    AND a.product_id < b.product_id
  GROUP BY a.product_id, b.product_id
)
SELECT p1, p2, freq
FROM baskets
ORDER BY freq DESC LIMIT 10;

TIP: Self-join on the same order pairs every product with every other. p1 < p2 prevents counting (A,B) and (B,A) separately.

16 Pivot Prep Analytics

Conditional aggregation — SQL's manual pivot

WITH raw AS (
  SELECT product, region, revenue FROM sales
)
SELECT product,
  SUM(CASE WHEN region='North' THEN revenue END) AS north,
  SUM(CASE WHEN region='South' THEN revenue END) AS south,
  SUM(CASE WHEN region='East'  THEN revenue END) AS east,
  SUM(CASE WHEN region='West'  THEN revenue END) AS west
FROM raw GROUP BY product;

TIP: Wrap each SUM in COALESCE(…, 0) to show zeros instead of NULL for missing combinations. PostgreSQL also offers a crosstab() function through its tablefunc extension.
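Here is the pivot with the COALESCE wrapper applied, run on four made-up sales rows. The sketch assumes Python's built-in sqlite3 module and drops the pass-through `raw` CTE for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, region TEXT, revenue INTEGER);
    INSERT INTO sales VALUES
      ('Pen', 'North', 100), ('Pen', 'South', 50),
      ('Pen', 'North', 25),  ('Bag', 'East', 300);
""")

# One SUM(CASE ...) per output column; COALESCE turns empty cells into 0.
pivot = conn.execute("""
    SELECT product,
      COALESCE(SUM(CASE WHEN region = 'North' THEN revenue END), 0) AS north,
      COALESCE(SUM(CASE WHEN region = 'South' THEN revenue END), 0) AS south,
      COALESCE(SUM(CASE WHEN region = 'East'  THEN revenue END), 0) AS east,
      COALESCE(SUM(CASE WHEN region = 'West'  THEN revenue END), 0) AS west
    FROM sales
    GROUP BY product
    ORDER BY product
""").fetchall()

for row in pivot:
    print(row)
```

The column list is fixed in the query text, so a new region means adding another SUM(CASE ...) line.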

17 Consecutive Wins Advanced

Detect streaks using the double ROW_NUMBER trick

WITH base AS (
  SELECT match_date, result,
    ROW_NUMBER() OVER (ORDER BY match_date) AS rn
  FROM matches
),
streaks AS (
  SELECT match_date, result,
    rn - ROW_NUMBER() OVER (
      PARTITION BY result ORDER BY match_date
    ) AS grp
  FROM base
)
SELECT MIN(match_date) AS start,
  MAX(match_date) AS end,
  COUNT(*) AS streak_len
FROM streaks WHERE result='Win'
GROUP BY grp HAVING COUNT(*) >= 3;

TIP: Global rn minus per-result rn is constant within a streak. Rows sharing that constant belong to the same winning run. Change HAVING for minimum streak length.
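The double ROW_NUMBER trick is easier to trust after seeing it on real rows. This sketch assumes Python's built-in sqlite3 module (SQLite 3.25+) and eight made-up matches; the output columns are named streak_start and streak_end here purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (match_date TEXT, result TEXT);
    INSERT INTO matches VALUES
      ('2024-01-01', 'Win'),  ('2024-01-02', 'Win'), ('2024-01-03', 'Win'),
      ('2024-01-04', 'Loss'), ('2024-01-05', 'Win'), ('2024-01-06', 'Win'),
      ('2024-01-07', 'Loss'), ('2024-01-08', 'Win');
""")

streaks = conn.execute("""
    WITH base AS (
      SELECT match_date, result,
        ROW_NUMBER() OVER (ORDER BY match_date) AS rn
      FROM matches
    ),
    streaks AS (
      -- global position minus position among same-result rows
      SELECT match_date, result,
        rn - ROW_NUMBER() OVER (
          PARTITION BY result ORDER BY match_date
        ) AS grp
      FROM base
    )
    SELECT MIN(match_date) AS streak_start,
      MAX(match_date) AS streak_end,
      COUNT(*) AS streak_len
    FROM streaks
    WHERE result = 'Win'
    GROUP BY grp
    HAVING COUNT(*) >= 3
""").fetchall()

print(streaks)
```

The two-win run on 5 and 6 January and the single win on 8 January get their own grp values but are removed by the HAVING threshold.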

18 Update via CTE DML

CTEs work inside UPDATE, DELETE & INSERT

WITH top_performers AS (
  SELECT id FROM employees
  WHERE dept = 'Engineering'
    AND performance >= 90
)
UPDATE employees
SET salary = salary * 1.10
WHERE id IN (
  SELECT id FROM top_performers
);

TIP: CTEs work inside UPDATE, DELETE, and INSERT…SELECT. Use them to keep complex filter logic readable rather than burying it in a subquery inside WHERE.

19 Tree Path Generation Recursion

Build full breadcrumb path from root to each node

WITH RECURSIVE paths AS (
  SELECT id, name,
    CAST(name AS CHAR(1000)) AS path
  FROM categories
  WHERE parent_id IS NULL
  UNION ALL
  SELECT c.id, c.name,
    CONCAT(p.path, ' > ', c.name)
  FROM categories c
  JOIN paths p ON c.parent_id = p.id
)
SELECT id, name, path
FROM paths ORDER BY path;

TIP: CAST to CHAR(1000) gives enough room for deep paths. CONCAT builds Electronics > Phones > Smartphones automatically as the tree grows.

20 Insert from CTE DML

Populate a table from a computed CTE result set

-- MySQL syntax: the CTE goes between INSERT INTO and SELECT.
-- PostgreSQL and SQL Server put the WITH clause before INSERT instead.
INSERT INTO vip_customers
  (customer_id, total_spent, tagged_at)
WITH vip_candidates AS (
  SELECT customer_id,
    SUM(amount) AS total_spent
  FROM orders
  WHERE order_date >= '2024-01-01'
  GROUP BY customer_id
  HAVING SUM(amount) > 5000
)
SELECT customer_id, total_spent, NOW()
FROM vip_candidates;

TIP: INSERT … SELECT is the most common CTE + DML pattern. The CTE computes which rows to insert; INSERT writes them. Clean, auditable, and readable.

In Depth

SQL Deep Dives

Visual comparisons · Annotated patterns · Production-ready snippets

Comparison

SUM() vs SUM() OVER()

Aggregate collapses rows — window keeps them all. This is the most important window function concept in analytics SQL, and the most commonly misunderstood.

SUM()

SELECT job, SUM(sales)
FROM sales
GROUP BY job;

✓ Calculates total for each group
✓ Requires GROUP BY
✓ Collapses rows: one row per group

VS.

SUM OVER()

SELECT job, sales,
  SUM(sales) OVER (
    PARTITION BY job
  ) AS total_per_job
FROM sales;

✓ Calculates total within each partition
✓ No GROUP BY needed
✓ Keeps ALL rows and repeats the total on each row

Grouped SUMs — 3 rows returned

job	SUM(sales)
Engineer	50
Sales	70
Manager	90

Window SUMs — all 5 rows kept

job	sales	total_per_job
Engineer	20	50
Engineer	30	50
Sales	30	70
Sales	40	70
Manager	90	90

TIP: Need individual rows AND a group total in the same result? Use SUM() OVER(). Only need one row per group? Use SUM() + GROUP BY. If you combine both in one SELECT, the window function runs after GROUP BY and sees the grouped rows, not the original ones.
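Both result tables above can be reproduced from the same five rows. The sketch assumes Python's built-in sqlite3 module (SQLite 3.25+); the ORDER BY clauses are added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (job TEXT, sales INTEGER);
    INSERT INTO sales VALUES
      ('Engineer', 20), ('Engineer', 30),
      ('Sales', 30), ('Sales', 40),
      ('Manager', 90);
""")

# Aggregate: one row per job.
grouped = conn.execute("""
    SELECT job, SUM(sales)
    FROM sales
    GROUP BY job
    ORDER BY job
""").fetchall()

# Window: every input row survives, with its job total alongside.
windowed = conn.execute("""
    SELECT job, sales,
      SUM(sales) OVER (PARTITION BY job) AS total_per_job
    FROM sales
    ORDER BY job, sales
""").fetchall()

print(len(grouped), grouped)
print(len(windowed), windowed)
```

Three rows versus five is the whole difference: the totals are identical, only the row count changes.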

Ranking

ROW_NUMBER() vs RANK() vs DENSE_RANK()

All three rank rows — but they differ dramatically when ties exist. Getting this wrong produces silent, hard-to-debug query bugs in production.

ROW_NUMBER() Always Unique

Always assigns a unique sequential number, even to identical values. Never repeats, never skips.

No ties — always 1, 2, 3, 4…

ROW_NUMBER() OVER (
  ORDER BY score DESC
)

✅ Use for: deduplication, pagination, picking exactly one row per partition

RANK() Skips on Ties

Tied rows share the same rank, but the next rank jumps forward by the number of tied rows.

Ties share rank, next rank jumps

RANK() OVER (
  ORDER BY score DESC
)

⚠️ Avoid for "top N" — rank 2 can disappear if two rows tie at rank 1

DENSE_RANK() No Gaps

Tied rows share the same rank and the next rank is always the very next integer — no gaps ever.

Ties share rank, no skipping

DENSE_RANK() OVER (
  ORDER BY score DESC
)

✅ Use for: "Nth highest value" — guaranteed to return a result even with ties

Applied to scores: 90, 90, 85, 80

score	ROW_NUMBER()	RANK()	DENSE_RANK()
90	1	1	1
90	2	1	1
85	3	3 ← gap!	2
80	4	4	3

TIP Interview answer: For "find the Nth highest salary" always use DENSE_RANK. With RANK, if two people tie at rank 1, there is no rank 2 — your WHERE rnk = 2 returns zero rows and you get a silent wrong answer.
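The disappearing rank 2 is easy to reproduce with the scores 90, 90, 85, 80. The sketch assumes Python's built-in sqlite3 module (SQLite 3.25+) and a one-column scores table invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (score INTEGER);
    INSERT INTO scores VALUES (90), (90), (85), (80);
""")

# RANK: the two 90s take rank 1, the next rank handed out is 3.
with_rank = conn.execute("""
    WITH ranked AS (
      SELECT score, RANK() OVER (ORDER BY score DESC) AS rnk
      FROM scores
    )
    SELECT score FROM ranked WHERE rnk = 2
""").fetchall()

# DENSE_RANK: the next rank after the tie is 2, so 85 is found.
with_dense_rank = conn.execute("""
    WITH ranked AS (
      SELECT score, DENSE_RANK() OVER (ORDER BY score DESC) AS rnk
      FROM scores
    )
    SELECT score FROM ranked WHERE rnk = 2
""").fetchall()

print(with_rank)        # no rows, and no error either
print(with_dense_rank)  # the second-highest distinct score
```

Neither query raises an error, which is why the RANK version is so easy to miss in review.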

Snippet Library

Production-ready SQL patterns — copy, adapt, and ship

01 Top N per Group Analytics

WITH ranked AS (
  SELECT *,
    DENSE_RANK() OVER (
      PARTITION BY dept
      ORDER BY salary DESC
    ) AS rnk
  FROM employees
)
SELECT * FROM ranked
WHERE rnk <= 3; -- top 3 per dept

02 Month-over-Month Growth Analytics

WITH monthly AS (
  SELECT
    DATE_FORMAT(order_date, '%Y-%m') AS month,
    SUM(revenue) AS rev
  FROM orders GROUP BY month
)
SELECT month, rev,
  LAG(rev) OVER (ORDER BY month) AS prev_rev,
  ROUND(100.0 *
    (rev - LAG(rev) OVER (ORDER BY month))
    / NULLIF(LAG(rev) OVER (ORDER BY month),0),
  1) AS mom_pct
FROM monthly;

03 Latest Record per Group Practical

-- Most recent order per customer
WITH latest AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY order_date DESC
    ) AS rn
  FROM orders
)
SELECT * FROM latest WHERE rn = 1;

04 Running Balance Analytics

SELECT txn_date, description, amount,
  SUM(
    CASE WHEN type = 'credit' THEN  amount
         WHEN type = 'debit'  THEN -amount
    END
  ) OVER (
    ORDER BY txn_date
    ROWS BETWEEN UNBOUNDED PRECEDING
      AND CURRENT ROW
  ) AS balance
FROM transactions;

05 Percentage of Total Analytics

SELECT dept, name, salary,
  ROUND(
    100.0 * salary /
    SUM(salary) OVER (PARTITION BY dept),
  1) AS pct_of_dept,
  ROUND(
    100.0 * salary /
    SUM(salary) OVER (),
  1) AS pct_of_total
FROM employees;

06 Session Detection Advanced

-- Gap > 30 min = new session
WITH gaps AS (
  SELECT user_id, event_time,
    CASE WHEN TIMESTAMPDIFF(MINUTE,
      LAG(event_time) OVER (
        PARTITION BY user_id
        ORDER BY event_time),
      event_time) > 30
    THEN 1 ELSE 0 END AS new_sess
  FROM events
)
SELECT user_id, event_time,
  SUM(new_sess) OVER (
    PARTITION BY user_id
    ORDER BY event_time
  ) + 1 AS session_id
FROM gaps;
