SELECT name, salary FROM employees WHERE dept = 'Engineering' AND salary > 80000 ORDER BY salary DESC LIMIT 5;
Master SQL with an interactive cheat sheet and two skill-building games — designed to make database concepts stick.
Swiggy tracks your order. Flipkart predicts your next buy. Google ranks every search result.
All of it — runs on SQL.
SELECT city, COUNT(*) AS total_orders, ROUND(AVG(delivery_mins), 1) AS avg_delivery FROM deliveries WHERE status = 'delivered' GROUP BY city ORDER BY total_orders DESC LIMIT 5;
Your structured path from zero to confident SQL — topic by topic, story by story.
SELECT → WHERE → ORDER BY → Aggregates → GROUP BY → NULLs. Each topic has theory, diagrams, MCQs, Fix-It drills, and write-from-scratch challenges. One structured staircase — all the way up.
Curated from Amazon, Google, Microsoft, Meta, Flipkart, Swiggy and LeetCode — organized by company, level, and topic.
Show all students who scored grade 'A'. Use the WHERE clause to filter rows — only rows where grade = 'A' should appear in the result.
| id | name | grade | city |
|---|---|---|---|
| 1 | Arjun | A | Delhi |
| 2 | Priya | B | Mumbai |
| 3 | Raj | A | Pune |
| 4 | Simran | C | Delhi |
| 5 | Karan | A | Chennai |
| id | name | grade | city |
|---|---|---|---|
| 1 | Arjun | A | Delhi |
| 3 | Raj | A | Pune |
| 5 | Karan | A | Chennai |
SELECT * FROM students WHERE grade = 'A';
Show all products sorted by price from lowest to highest. Then try changing it to highest-to-lowest by adding DESC. ORDER BY always comes at the end of a query.
| id | name | category | price |
|---|---|---|---|
| 1 | Pen | Stationery | 10 |
| 2 | Notebook | Stationery | 45 |
| 3 | Bag | Accessories | 350 |
| 4 | Eraser | Stationery | 5 |
| 5 | Water Bottle | Accessories | 120 |
| id | name | price |
|---|---|---|
| 4 | Eraser | 5 |
| 1 | Pen | 10 |
| 2 | Notebook | 45 |
| 5 | Water Bottle | 120 |
| 3 | Bag | 350 |
SELECT id, name, price FROM products ORDER BY price ASC; -- ASC is default; swap DESC for highest first
Find all customers whose name starts with the letter 'A'. Use LIKE 'A%' — the % wildcard matches any characters after 'A'. Try LIKE '%a' in the playground to find names ending with 'a'.
| id | name | city |
|---|---|---|
| 1 | Ankit | Delhi |
| 2 | Priya | Mumbai |
| 3 | Aisha | Hyderabad |
| 4 | Rahul | Pune |
| 5 | Amita | Bengaluru |
| id | name | city |
|---|---|---|
| 1 | Ankit | Delhi |
| 3 | Aisha | Hyderabad |
| 5 | Amita | Bengaluru |
SELECT * FROM customers WHERE name LIKE 'A%'; -- Try also: -- WHERE name LIKE '%a' -- ends with 'a' -- WHERE name LIKE '_i%' -- second letter is 'i'
Count how many students are in each city. Use GROUP BY city to group rows by city, then COUNT(*) to count how many students fall in each group.
| id | name | grade | city |
|---|---|---|---|
| 1 | Arjun | A | Delhi |
| 2 | Priya | B | Mumbai |
| 3 | Raj | A | Delhi |
| 4 | Simran | C | Mumbai |
| 5 | Karan | A | Chennai |
| city | student_count |
|---|---|
| Chennai | 1 |
| Delhi | 2 |
| Mumbai | 2 |
SELECT city, COUNT(*) AS student_count FROM students GROUP BY city ORDER BY city;
Show each order with the customer's name and amount. The customers table has names; orders has amounts. Use INNER JOIN to match them on customer_id.
| customers.id | customers.name | orders.order_id | orders.customer_id | orders.amount |
|---|---|---|---|---|
| 1 | Arjun | 101 | 1 | 500 |
| 2 | Priya | 102 | 2 | 300 |
| 1 | Arjun | 103 | 1 | 750 |
| 3 | Karan | 104 | 3 | 200 |
| order_id | name | amount |
|---|---|---|
| 101 | Arjun | 500 |
| 102 | Priya | 300 |
| 103 | Arjun | 750 |
| 104 | Karan | 200 |
SELECT o.order_id, c.name, o.amount FROM orders o INNER JOIN customers c ON o.customer_id = c.id ORDER BY o.order_id;
The HR team needs a report with only employee name and department — they don't want salary or phone visible. Select just those two columns from the employees table.
| id | name | department | salary | phone |
|---|---|---|---|---|
| 1 | Arjun | Engineering | 75000 | 9812345678 |
| 2 | Priya | Marketing | 62000 | 9987654321 |
| 3 | Raj | Engineering | 81000 | 9776543210 |
| name | department |
|---|---|
| Arjun | Engineering |
| Priya | Marketing |
| Raj | Engineering |
SELECT name, department FROM employees;
Show all products where price is greater than ₹100. Then try modifying the query to show products priced less than ₹50 and products where category is NOT 'Food'.
| id | name | category | price |
|---|---|---|---|
| 1 | Rice | Food | 45 |
| 2 | Headphones | Electronics | 799 |
| 3 | Notebook | Stationery | 60 |
| 4 | Pen | Stationery | 10 |
| 5 | Mixer | Appliances | 1200 |
| id | name | price |
|---|---|---|
| 2 | Headphones | 799 |
| 5 | Mixer | 1200 |
SELECT * FROM products WHERE price > 100; -- Also try: -- WHERE price < 50 -- cheaper than ₹50 -- WHERE category != 'Food' -- everything except Food
Part A — Show employees in the 'Sales' department AND with salary above ₹40,000.
Part B — Show customers from 'Delhi' OR 'Pune'.
| id | name | department | salary |
|---|---|---|---|
| 1 | Arjun | Sales | 38000 |
| 2 | Priya | Sales | 52000 |
| 3 | Raj | Engineering | 70000 |
| 4 | Simran | Sales | 47000 |
| name | department | salary |
|---|---|---|
| Priya | Sales | 52000 |
| Simran | Sales | 47000 |
-- Part A: AND — both conditions must be true SELECT * FROM employees WHERE department = 'Sales' AND salary > 40000; -- Part B: OR — either condition is enough -- SELECT * FROM customers WHERE city = 'Delhi' OR city = 'Pune';
The orders table has hundreds of rows but you only need to know which unique cities orders have come from. Use SELECT DISTINCT to get each city name exactly once.
| id | customer | city | amount |
|---|---|---|---|
| 1 | Arjun | Delhi | 500 |
| 2 | Priya | Mumbai | 300 |
| 3 | Raj | Delhi | 750 |
| 4 | Simran | Pune | 200 |
| 5 | Karan | Mumbai | 900 |
| city |
|---|
| Delhi |
| Mumbai |
| Pune |
SELECT DISTINCT city FROM orders ORDER BY city;
Find all employees who have not provided a phone number (phone is NULL). Then find those who have provided one (IS NOT NULL).
| id | name | phone | |
|---|---|---|---|
| 1 | Arjun | 9812345678 | a@co.in |
| 2 | Priya | NULL | p@co.in |
| 3 | Raj | NULL | r@co.in |
| 4 | Simran | 9776543210 | s@co.in |
| id | name | phone |
|---|---|---|
| 2 | Priya | NULL |
| 3 | Raj | NULL |
SELECT * FROM employees WHERE phone IS NULL; -- Find employees who DO have a phone: -- WHERE phone IS NOT NULL
Show all students sorted by score descending (highest first). When two students have the same score, sort their names alphabetically (A–Z).
| id | name | grade | score |
|---|---|---|---|
| 1 | Raj | 10 | 88 |
| 2 | Aisha | 10 | 95 |
| 3 | Priya | 10 | 88 |
| 4 | Karan | 10 | 72 |
| name | score |
|---|---|
| Aisha | 95 |
| Priya | 88 |
| Raj | 88 |
| Karan | 72 |
SELECT name, score FROM students ORDER BY score DESC, name ASC;
Show all products that belong to the categories 'Electronics', 'Books', or 'Clothing'. Use IN instead of three separate OR conditions.
| id | name | category | price |
|---|---|---|---|
| 1 | Headphones | Electronics | 799 |
| 2 | Rice | Food | 45 |
| 3 | Novel | Books | 299 |
| 4 | T-Shirt | Clothing | 399 |
| 5 | Mixer | Appliances | 1200 |
| name | category | price |
|---|---|---|
| Headphones | Electronics | 799 |
| Novel | Books | 299 |
| T-Shirt | Clothing | 399 |
SELECT name, category, price FROM products WHERE category IN ('Electronics', 'Books', 'Clothing') ORDER BY category;
Show all orders where the amount is between ₹500 and ₹1000 (both ends included). Works for numbers, dates, and text ranges.
| id | customer | amount | order_date |
|---|---|---|---|
| 1 | Arjun | 450 | 2024-01-05 |
| 2 | Priya | 750 | 2024-01-08 |
| 3 | Raj | 1200 | 2024-01-12 |
| 4 | Simran | 500 | 2024-01-15 |
| 5 | Karan | 999 | 2024-01-20 |
| customer | amount |
|---|---|
| Priya | 750 |
| Simran | 500 |
| Karan | 999 |
SELECT customer, amount FROM orders WHERE amount BETWEEN 500 AND 1000 ORDER BY amount;
Find the highest and lowest salary in the company with a single query. Also find the most recent joining date using MAX on a date column.
| id | name | salary | join_date |
|---|---|---|---|
| 1 | Arjun | 75000 | 2021-03-15 |
| 2 | Priya | 62000 | 2022-07-01 |
| 3 | Raj | 90000 | 2020-11-20 |
| 4 | Simran | 48000 | 2023-01-10 |
| highest_salary | lowest_salary | latest_join |
|---|---|---|
| 90000 | 48000 | 2023-01-10 |
SELECT MAX(salary) AS highest_salary, MIN(salary) AS lowest_salary, MAX(join_date) AS latest_join FROM employees;
Calculate the total revenue and average order value from the sales table in a single query. Round the average to 2 decimal places.
| id | product | amount | region |
|---|---|---|---|
| 1 | Laptop | 45000 | North |
| 2 | Phone | 18000 | South |
| 3 | Tablet | 22000 | North |
| 4 | Earbuds | 3500 | West |
| total_revenue | avg_order |
|---|---|
| 88500 | 22125.00 |
SELECT SUM(amount) AS total_revenue, ROUND(AVG(amount), 2) AS avg_order FROM sales;
Show only customers whose total order value exceeds ₹1,000. You need GROUP BY to group per customer, then HAVING to filter on the total.
| id | customer | amount |
|---|---|---|
| 1 | Arjun | 400 |
| 2 | Priya | 750 |
| 3 | Arjun | 800 |
| 4 | Karan | 200 |
| 5 | Priya | 500 |
| customer | total |
|---|---|
| Arjun | 1200 |
| Priya | 1250 |
SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer HAVING SUM(amount) > 1000 ORDER BY total DESC;
Display each employee's full name (first + last concatenated) as full_name, and rename salary to monthly_pay. Use AS to create column aliases.
| id | first_name | last_name | salary |
|---|---|---|---|
| 1 | Arjun | Sharma | 75000 |
| 2 | Priya | Mehta | 62000 |
| 3 | Raj | Patel | 90000 |
| full_name | monthly_pay |
|---|---|
| Arjun Sharma | 75000 |
| Priya Mehta | 62000 |
| Raj Patel | 90000 |
SELECT first_name || ' ' || last_name AS full_name, salary AS monthly_pay FROM employees;
Find all customers who have never placed an order. INNER JOIN would drop them — LEFT JOIN keeps them with NULLs, then filter for those NULLs.
| id | name |
|---|---|
| 1 | Arjun |
| 2 | Priya |
| 3 | Raj |
| 4 | Simran |
| order_id | customer_id |
|---|---|
| 101 | 1 |
| 102 | 3 |
| name |
|---|
| Priya |
| Simran |
SELECT c.name FROM customers c LEFT JOIN orders o ON c.id = o.customer_id WHERE o.order_id IS NULL ORDER BY c.name;
Build a combined list of cities from both the customers and suppliers tables — no duplicates. Then try UNION ALL to see duplicates included.
| id | name | city |
|---|---|---|
| 1 | Arjun | Delhi |
| 2 | Priya | Mumbai |
| id | name | city |
|---|---|---|
| 1 | Tata Co. | Mumbai |
| 2 | Reliance | Pune |
| city |
|---|
| Delhi |
| Mumbai |
| Pune |
SELECT city FROM customers UNION SELECT city FROM suppliers ORDER BY city; -- Try UNION ALL to include the duplicate 'Mumbai': -- SELECT city FROM customers UNION ALL SELECT city FROM suppliers
Query all columns for every city in Maharashtra (state_code = 'MH') with a population greater than 500,000.
| id | name | state_code | district | population |
|---|---|---|---|---|
| 1 | Mumbai | MH | Mumbai City | 12478447 |
| 2 | Pune | MH | Pune | 3124458 |
| 3 | Jaipur | RJ | Jaipur | 3046163 |
| 4 | Nashik | MH | Nashik | 1486053 |
| 5 | Aurangabad | MH | Aurangabad | 373311 |
| id | name | state_code | district | population |
|---|---|---|---|---|
| 1 | Mumbai | MH | Mumbai City | 12478447 |
| 2 | Pune | MH | Pune | 3124458 |
| 4 | Nashik | MH | Nashik | 1486053 |
SELECT * FROM city WHERE state_code = 'MH' AND population > 500000;
List the names of all cities in Rajasthan (state_code = 'RJ'), sorted alphabetically.
| id | name | state_code | population |
|---|---|---|---|
| 3 | Jaipur | RJ | 3046163 |
| 6 | Udaipur | RJ | 451100 |
| 7 | Ajmer | RJ | 542321 |
| name |
|---|
| Ajmer |
| Jaipur |
| Udaipur |
SELECT name FROM city WHERE state_code = 'RJ' ORDER BY name ASC;
List distinct city names where the city's ID is an even number. Use the modulo operator % (or MOD(id, 2) = 0) to check for even numbers.
| id | name | state_code |
|---|---|---|
| 1 | Mumbai | MH |
| 2 | Pune | MH |
| 2 | Pune | MH |
| 4 | Nashik | MH |
| 5 | Jaipur | RJ |
| name |
|---|
| Nashik |
| Pune |
SELECT DISTINCT name FROM city WHERE id % 2 = 0 ORDER BY name;
Find how many duplicate city names exist in the table. Subtract the number of distinct names from the total count — the difference equals the duplicate count.
| id | name |
|---|---|
| 1 | Delhi |
| 2 | Mumbai |
| 3 | Delhi |
| 4 | Pune |
| 5 | Mumbai |
| duplicate_count |
|---|
| 2 |
SELECT COUNT(name) - COUNT(DISTINCT name) AS duplicate_count FROM city;
A country is classified as "big" if it has an area of at least 3,000,000 km² OR a population of at least 25,000,000. Report the name, population, and area for all big countries.
| name | continent | area | population |
|---|---|---|---|
| India | Asia | 3287263 | 1380000000 |
| Maldives | Asia | 298 | 540000 |
| Russia | Europe | 17098242 | 144000000 |
| Nepal | Asia | 147181 | 29136808 |
| name | population | area |
|---|---|---|
| India | 1380000000 | 3287263 |
| Russia | 144000000 | 17098242 |
| Nepal | 29136808 | 147181 |
SELECT name, population, area FROM world WHERE area >= 3000000 OR population >= 25000000;
Find the names of customers who were not referred by customer with id = 2. Include customers whose referee_id is NULL (they were referred by no one).
| id | name | referee_id |
|---|---|---|
| 1 | Arjun | NULL |
| 2 | Priya | NULL |
| 3 | Raj | 1 |
| 4 | Simran | 2 |
| 5 | Karan | 2 |
| 6 | Neha | 3 |
| name |
|---|
| Arjun |
| Priya |
| Raj |
| Neha |
SELECT name FROM customer WHERE referee_id != 2 OR referee_id IS NULL ORDER BY name;
Find all customers who have never placed an order. Use LEFT JOIN to keep all customers, then filter for rows where the order side is NULL.
| id | name |
|---|---|
| 1 | Arjun |
| 2 | Priya |
| 3 | Raj |
| 4 | Simran |
| id | customerId |
|---|---|
| 1 | 1 |
| 2 | 3 |
| Customers |
|---|
| Priya |
| Simran |
SELECT c.name AS Customers FROM Customers c LEFT JOIN Orders o ON c.id = o.customerId WHERE o.id IS NULL ORDER BY c.name;
Find all email addresses that appear more than once in the Person table. Use GROUP BY to group by email, then HAVING to filter groups with more than 1 row.
| id | |
|---|---|
| 1 | a@example.com |
| 2 | b@example.com |
| 3 | a@example.com |
| a@example.com |
SELECT email AS Email FROM Person GROUP BY email HAVING COUNT(*) > 1;
Find all classes that have at least 5 students enrolled. Each row in the table represents one student enrolled in one class.
| student | class |
|---|---|
| Arjun | Maths |
| Priya | Maths |
| Raj | Biology |
| Simran | Maths |
| Karan | Maths |
| Neha | Biology |
| Aisha | Maths |
| class |
|---|
| Maths |
SELECT class FROM Courses GROUP BY class HAVING COUNT(student) >= 5;
Show all movies with an odd ID and a description that is not 'boring', ordered by rating descending.
| id | movie | description | rating |
|---|---|---|---|
| 1 | War | great 3D | 8.9 |
| 2 | Science | fiction | 8.5 |
| 3 | irish | boring | 6.2 |
| 4 | Ice song | Fantacy | 8.6 |
| 5 | House card | Interesting | 9.1 |
| id | movie | description | rating |
|---|---|---|---|
| 5 | House card | Interesting | 9.1 |
| 1 | War | great 3D | 8.9 |
SELECT * FROM Cinema WHERE id % 2 = 1 AND description != 'boring' ORDER BY rating DESC;
Find all authors who viewed at least one of their own articles (i.e. author_id = viewer_id). Return distinct author IDs sorted ascending.
| article_id | author_id | viewer_id | view_date |
|---|---|---|---|
| 1 | 3 | 5 | 2019-08-01 |
| 1 | 3 | 6 | 2019-08-02 |
| 2 | 7 | 7 | 2019-08-01 |
| 2 | 7 | 6 | 2019-08-02 |
| 4 | 7 | 1 | 2019-07-22 |
| 3 | 4 | 4 | 2019-07-21 |
| id |
|---|
| 4 |
| 7 |
SELECT DISTINCT author_id AS id FROM Views WHERE author_id = viewer_id ORDER BY id;
You want to show only departments where the total salary is above ₹50,000. Your query already has GROUP BY department. Which clause do you add?
A phone column has some rows with no value stored. Which query correctly finds those rows?
The orders table has 1,000 rows but only 40 unique cities. What does SELECT DISTINCT city FROM orders return?
You write a query with SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY. In which order does SQL actually execute these clauses?
Hint: this is why you can't use a SELECT alias inside WHERE.
You LEFT JOIN customers (5 rows) with orders (3 rows). Two customers have no orders at all. How many rows appear in the result?
A status column has 10 rows, but 3 of them are NULL. What does COUNT(status) return?
Which pattern matches any name that starts with 'A' and ends with 'n', with any number of characters in between (like 'Arjun', 'Adrian', or even just 'An')?
You write SELECT name FROM employees ORDER BY salary without specifying ASC or DESC. In what order are results returned?
Table A has 3 rows, Table B has 3 rows. One row exists in both tables. UNION gives ___ rows; UNION ALL gives ___ rows.
Table students has id as PRIMARY KEY. Row with id=1 already exists. You try:INSERT INTO students VALUES(1, 'Riya') — what happens? And which general rule does that reveal about PRIMARY KEY?
You have 5 customers and 3 orders. Two customers have never ordered. You INNER JOIN customers to orders. How many rows appear in the result?
A student table has ages 17, 18, 20, 25, 26. Query: WHERE age BETWEEN 18 AND 25. Which ages appear in the result?
Table employees before grouping:
| id | name | department | salary |
|---|---|---|---|
| 1 | Arjun | HR | 40000 |
| 2 | Priya | HR | 55000 |
| 3 | Ravi | IT | 70000 |
After GROUP BY department, HR collapses to 1 row. SQL can't decide which name (Arjun or Priya?) to show. Which SELECT line causes an error?
After running DELETE FROM employees, what is left in the database?
A row has phone = NULL and email = 'p@co.in'. What does COALESCE(phone, email, 'No contact') return?
A student scores 85. What does this return?CASE WHEN score >= 90 THEN 'A' WHEN score >= 80 THEN 'B' ELSE 'C' END
orders.customer_id has a FOREIGN KEY referencing customers.id. What does this prevent?
You add CREATE INDEX idx_email ON users(email). What is the primary benefit?
Table employees — notice manager_id points to another row in the same table:
| id | name | salary | manager_id |
|---|---|---|---|
| 1 | Sunita | 90000 | NULL |
| 2 | Arjun | 60000 | 1 |
| 3 | Priya | 95000 | 1 |
Priya (₹95k) earns more than her manager Sunita (₹90k). To find such employees, you use:
You want the 3 most expensive products from the products table. Which query is correct in SQLite?
Query: SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees). What does the inner SELECT AVG(salary) return?
WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_id = customers.id) — what does this check for each customer row?
Write a SQL query to find all duplicate email addresses in the Person table. An email is a duplicate if it appears more than once.
| id | |
|---|---|
| 1 | a@b.com |
| 2 | c@d.com |
| 3 | a@b.com |
SELECT email FROM Person GROUP BY email HAVING COUNT(email) > 1;
Find all customers who have never placed an order. A classic LEFT JOIN + NULL check pattern asked in almost every SQL interview.
| id | name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Carol |
| id | customerId |
|---|---|
| 1 | 1 |
| 2 | 1 |
SELECT c.name AS Customers FROM Customers c LEFT JOIN Orders o ON c.id = o.customerId WHERE o.id IS NULL;
Write a query to find employees whose salary is higher than their direct manager's salary. Uses a self-join on the Employee table using managerId.
| id | name | salary | managerId |
|---|---|---|---|
| 1 | Joe | 70000 | 3 |
| 2 | Henry | 80000 | 4 |
| 3 | Sam | 60000 | NULL |
| 4 | Max | 90000 | NULL |
SELECT e.name AS Employee FROM Employee e JOIN Employee m ON e.managerId = m.id WHERE e.salary > m.salary;
Calculate the average star rating per product category, rounded to 2 decimal places, ordered by category name. A common warm-up question in Amazon data analyst rounds.
| review_id | product_id | stars |
|---|---|---|
| 1 | 101 | 5 |
| 2 | 102 | 3 |
| 3 | 101 | 4 |
| product_id | category |
|---|---|
| 101 | Electronics |
| 102 | Books |
SELECT p.category, ROUND(AVG(r.stars), 2) AS avg_rating FROM reviews r JOIN products p ON r.product_id = p.product_id GROUP BY p.category ORDER BY p.category;
Find all dates where the temperature was higher than the previous day. Use a self-join on the Weather table and julianday() arithmetic to link each row to the row exactly 1 day earlier.
| id | recordDate | temperature |
|---|---|---|
| 1 | 2023-01-01 | 10 |
| 2 | 2023-01-02 | 25 |
| 3 | 2023-01-03 | 20 |
| 4 | 2023-01-04 | 30 |
SELECT w1.id FROM Weather w1 JOIN Weather w2 ON julianday(w1.recordDate) - julianday(w2.recordDate) = 1 WHERE w1.temperature > w2.temperature;
Write a query to find the second highest distinct salary. If there is no second highest salary, return NULL. Most asked SQL question globally — know 3 approaches.
| id | salary |
|---|---|
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
| 4 | 300 |
-- Approach 1: Subquery (most readable) SELECT MAX(salary) AS SecondHighestSalary FROM Employee WHERE salary < (SELECT MAX(salary) FROM Employee); -- Approach 2: LIMIT + OFFSET SELECT DISTINCT salary FROM Employee ORDER BY salary DESC LIMIT 1 OFFSET 1; -- Approach 3: Window function (preferred in interviews) SELECT salary AS SecondHighestSalary FROM ( SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM Employee ) t WHERE rnk = 2;
Find the top 5 customers by total number of orders placed. Show their name and order count, highest first. Frequently asked in Flipkart/e-commerce data rounds.
| id | name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Carol |
| … | … |
| id | customer_id | amount |
|---|---|---|
| 1 | 1 | 500 |
| 2 | 1 | 300 |
| 3 | 2 | 700 |
| … | … | … |
SELECT c.name, COUNT(o.id) AS order_count FROM customers c JOIN orders o ON c.id = o.customer_id GROUP BY c.id, c.name ORDER BY order_count DESC LIMIT 5;
Find the top 3 unique salaries in each department. This is the most commonly asked window function question. Uses DENSE_RANK so ties are handled correctly.
| id | name | salary | deptId |
|---|---|---|---|
| 1 | Alice | 90000 | 1 |
| 2 | Bob | 80000 | 1 |
| 3 | Carol | 70000 | 2 |
| id | name |
|---|---|
| 1 | Engineering |
| 2 | Marketing |
WITH ranked AS ( SELECT d.name AS Department, e.name AS Employee, e.salary, DENSE_RANK() OVER ( PARTITION BY e.deptId ORDER BY e.salary DESC ) AS rnk FROM Employee e JOIN Department d ON e.deptId = d.id ) SELECT Department, Employee, salary FROM ranked WHERE rnk <= 3;
Find employees whose salary is above the average salary of their own department. Classic correlated subquery or CTE — commonly asked at Microsoft, LinkedIn, and Oracle.
| id | name | dept | salary |
|---|---|---|---|
| 1 | Alice | Eng | 90000 |
| 2 | Bob | Eng | 70000 |
| 3 | Carol | HR | 60000 |
| 4 | Dave | HR | 55000 |
WITH dept_avg AS ( SELECT dept, AVG(salary) AS avg_sal FROM employees GROUP BY dept ) SELECT e.name, e.dept, e.salary FROM employees e JOIN dept_avg d ON e.dept = d.dept WHERE e.salary > d.avg_sal ORDER BY e.dept, e.salary DESC;
Calculate the month-over-month revenue growth percentage. Use LAG() to access the previous month's revenue and compute the percentage change.
| order_id | amount | order_date |
|---|---|---|
| 1 | 1000 | 2024-01-15 |
| 2 | 800 | 2024-01-28 |
| 3 | 1500 | 2024-02-10 |
| 4 | 900 | 2024-02-22 |
| 5 | 1200 | 2024-03-05 |
| 6 | 1100 | 2024-03-18 |
WITH monthly AS ( SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) AS revenue FROM monthly_orders GROUP BY 1 ) SELECT month, revenue, LAG(revenue) OVER (ORDER BY month) AS prev_revenue, ROUND( (revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0 / LAG(revenue) OVER (ORDER BY month), 2 ) AS growth_pct FROM monthly;
Calculate a 7-day rolling average of daily active users. Asked extensively at Google, Twitter, and analytics-heavy companies. Tests window frame knowledge.
| activity_date | active_users |
|---|---|
| 2024-01-01 | 1200 |
| 2024-01-02 | 1350 |
| 2024-01-03 | 980 |
SELECT activity_date, active_users, ROUND( AVG(active_users) OVER ( ORDER BY activity_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ), 2 ) AS rolling_7d_avg FROM user_activity ORDER BY activity_date;
Find all restaurants where the order count this month is lower than the previous month. Directly from Swiggy data analyst interview rounds.
| restaurant_id | month | order_count |
|---|---|---|
| R1 | 2024-01 | 500 |
| R1 | 2024-02 | 420 |
| R2 | 2024-01 | 300 |
| R2 | 2024-02 | 350 |
WITH monthly AS ( SELECT restaurant_id, month, order_count, LAG(order_count) OVER ( PARTITION BY restaurant_id ORDER BY month ) AS prev_count FROM restaurant_orders ) SELECT restaurant_id, month, order_count, prev_count FROM monthly WHERE order_count < prev_count;
Find customers who placed at least one order in both 2023 and 2024 — a classic cohort retention question asked in Flipkart and e-commerce data interviews.
| order_id | customer_id | order_date |
|---|---|---|
| 1 | C1 | 2023-06-10 |
| 2 | C1 | 2024-02-15 |
| 3 | C2 | 2024-03-01 |
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2023 INTERSECT SELECT customer_id FROM orders WHERE YEAR(order_date) = 2024;
Rank all scores from highest to lowest. If two scores are tied, they should have the same rank. The next rank after a tie should NOT skip numbers — this is what DENSE_RANK does.
| id | score |
|---|---|
| 1 | 3.50 |
| 2 | 3.65 |
| 3 | 4.00 |
| 4 | 3.65 |
SELECT score, DENSE_RANK() OVER (ORDER BY score DESC) AS rank FROM Scores ORDER BY score DESC;
Calculate a cumulative/running total of daily sales ordered by date. The most commonly tested window frame concept at Microsoft and Oracle interviews.
| sale_date | amount |
|---|---|
| 2024-01-01 | 1000 |
| 2024-01-02 | 1500 |
| 2024-01-03 | 800 |
SELECT sale_date, amount, SUM(amount) OVER ( ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total FROM sales;
Find the median salary of employees in each department. There's no built-in MEDIAN in MySQL — requires row numbering + even/odd count logic. Frequently asked at Amazon and top analytics roles.
| id | company | salary |
|---|---|---|
| 1 | A | 2341 |
| 2 | A | 341 |
| 3 | A | 15000 |
| 4 | B | 15000 |
WITH ranked AS ( SELECT id, company, salary, ROW_NUMBER() OVER (PARTITION BY company ORDER BY salary) AS rn, COUNT(*) OVER (PARTITION BY company) AS cnt FROM Employee ) SELECT id, company, salary FROM ranked WHERE rn IN ( FLOOR((cnt + 1) / 2), CEIL((cnt + 1) / 2) );
Find all rows where 3 or more consecutive days had stadium traffic ≥ 100 people. The key insight: id - ROW_NUMBER() is constant for consecutive rows — the famous "gap-and-island" trick.
| id | visit_date | people |
|---|---|---|
| 1 | 2024-01-01 | 10 |
| 2 | 2024-01-02 | 109 |
| 3 | 2024-01-03 | 150 |
| 4 | 2024-01-04 | 99 |
| 5 | 2024-01-05 | 145 |
| 6 | 2024-01-06 | 200 |
| 7 | 2024-01-07 | 120 |
WITH high_traffic AS ( SELECT id, visit_date, people, id - ROW_NUMBER() OVER (ORDER BY id) AS grp FROM Stadium WHERE people >= 100 ), groups AS ( SELECT grp FROM high_traffic GROUP BY grp HAVING COUNT(*) >= 3 ) SELECT h.id, h.visit_date, h.people FROM high_traffic h JOIN groups g ON h.grp = g.grp ORDER BY h.visit_date;
Calculate the cancellation rate of requests made by unbanned users between two dates. This multi-join + conditional aggregation problem appears in Uber, Lyft, and analytics interviews.
| id | client_id | status | request_at |
|---|---|---|---|
| 1 | 1 | completed | 2024-10-01 |
| 2 | 2 | cancelled_by_driver | 2024-10-01 |
| users_id | banned | role |
|---|---|---|
| 1 | No | client |
| 2 | Yes | client |
SELECT t.request_at AS Day, ROUND( SUM(CASE WHEN t.status != 'completed' THEN 1 ELSE 0 END) * 1.0 / COUNT(*), 2 ) AS 'Cancellation Rate' FROM Trips t JOIN Users u ON t.client_id = u.users_id AND u.banned = 'No' WHERE t.request_at BETWEEN '2024-10-01' AND '2024-10-03' GROUP BY t.request_at;
Find the names of managers who have at least 5 direct reports. A self-join aggregation problem asked heavily at Amazon, Google, and LinkedIn.
| id | name | managerId |
|---|---|---|
| 101 | Alice | NULL |
| 102 | Bob | 101 |
| 103 | Carol | 101 |
| 104 | Dave | 101 |
| 105 | Eve | 101 |
| 106 | Frank | 101 |
SELECT m.name FROM Employee e JOIN Employee m ON e.managerId = m.id GROUP BY m.id, m.name HAVING COUNT(e.id) >= 5;
Find each user's longest consecutive daily login streak. This uses the classic gap-and-island technique: subtracting ROW_NUMBER from the date to group consecutive days. Asked at Meta and Google.
| user_id | login_date |
|---|---|
| 1 | 2024-01-01 |
| 1 | 2024-01-02 |
| 1 | 2024-01-03 |
| 1 | 2024-01-05 |
| 2 | 2024-01-01 |
| 2 | 2024-01-02 |
WITH gaps AS ( SELECT user_id, login_date, date(login_date, '-' || ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY login_date ) || ' days') AS grp FROM logins ), streaks AS ( SELECT user_id, grp, COUNT(*) AS streak_len FROM gaps GROUP BY user_id, grp ) SELECT user_id, MAX(streak_len) AS longest_streak FROM streaks GROUP BY user_id;
Zomato's sales team runs a head-to-head deal contest each quarter. Two reps compete per restaurant deal, and only one wins it. Given the raw contest results, build the full leaderboard showing each rep's deals played, won, lost, and points (2 per win). The catch: a rep can appear as either rep_1 or rep_2 — both columns must be counted.
| rep_1 | rep_2 | winner |
|---|---|---|
| Rahul | Priya | Rahul |
| Priya | Ankit | Ankit |
| Sneha | Vikram | Sneha |
| Vikram | Rahul | Rahul |
| Ankit | Sneha | Ankit |
| rep | played | won | lost | points |
|---|---|---|---|---|
| Rahul | 2 | 2 | 0 | 4 |
| Ankit | 2 | 2 | 0 | 4 |
| Sneha | 2 | 1 | 1 | 2 |
| Priya | 2 | 0 | 2 | 0 |
| Vikram | 2 | 0 | 2 | 0 |
-- UNION ALL gives each rep both sides of every deal WITH all_deals AS ( SELECT rep_1 AS rep, CASE WHEN winner = rep_1 THEN 1 ELSE 0 END AS win FROM deal_contest UNION ALL SELECT rep_2 AS rep, CASE WHEN winner = rep_2 THEN 1 ELSE 0 END AS win FROM deal_contest ) SELECT rep, COUNT(*) AS played, SUM(win) AS won, COUNT(*) - SUM(win) AS lost, SUM(win) * 2 AS points FROM all_deals GROUP BY rep ORDER BY points DESC, won DESC;
Group the orders table by status and count how many orders fall into each group. Sort the results so the most common status appears first.
| order_id | customer_id | status | order_date |
|---|---|---|---|
| 1 | 101 | delivered | 2026-01-05 |
| 2 | 102 | shipped | 2026-01-06 |
| 3 | 103 | cancelled | 2026-01-07 |
| 4 | 104 | delivered | 2026-01-08 |
SELECT status, COUNT(*) AS order_count FROM orders GROUP BY status ORDER BY order_count DESC;
Return the 5 employees with the highest salaries. If two employees share the same salary, order them alphabetically by name.
| emp_id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Engineering | 120000 |
| 2 | Bob | Marketing | 85000 |
| 3 | Carol | Engineering | 150000 |
| 4 | Eve | Engineering | 150000 |
SELECT emp_id, name, department, salary FROM employees ORDER BY salary DESC, name ASC LIMIT 5;
Find all departments where the average salary exceeds $90,000. Return department name and average salary rounded to 2 decimal places. Many candidates fail this by using WHERE instead of HAVING.
| emp_id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Engineering | 120000 |
| 2 | Bob | Marketing | 85000 |
| 3 | Carol | Engineering | 150000 |
SELECT department, ROUND(AVG(salary), 2) AS avg_salary FROM employees GROUP BY department HAVING AVG(salary) > 90000 ORDER BY avg_salary DESC;
Find products that had zero sales transactions in 2025. The trick: put the year filter in the ON clause, not WHERE — otherwise you accidentally convert the LEFT JOIN to an INNER JOIN.
| product_id | product_name | category |
|---|---|---|
| 1 | Laptop | Electronics |
| 2 | Headphones | Electronics |
| 3 | Desk Chair | Furniture |
| sale_id | product_id | sale_date | revenue |
|---|---|---|---|
| 1 | 1 | 2025-03-10 | 2400 |
| 2 | 2 | 2025-06-15 | 750 |
| 3 | 1 | 2024-11-20 | 1200 |
SELECT p.product_id, p.product_name FROM products p LEFT JOIN sales s ON p.product_id = s.product_id AND strftime('%Y', s.sale_date) = '2025' WHERE s.sale_id IS NULL;
Calculate the total revenue per month for 2025. Use strftime to extract the month number and sort results chronologically.
| sale_id | product_id | sale_date | revenue |
|---|---|---|---|
| 1 | 1 | 2025-01-05 | 900 |
| 2 | 2 | 2025-01-18 | 1200 |
| 3 | 3 | 2025-02-20 | 600 |
| 4 | 4 | 2025-04-10 | 48 |
SELECT strftime('%m', sale_date) AS month_num, ROUND(SUM(revenue), 2) AS total_revenue FROM sales WHERE strftime('%Y', sale_date) = '2025' GROUP BY month_num ORDER BY month_num;
Find all employees who earn a higher salary than their direct manager. The key insight: join the employees table to itself using the manager_id foreign key.
| emp_id | name | manager_id | salary |
|---|---|---|---|
| 1 | CEO | NULL | 200000 |
| 2 | Alice | 1 | 120000 |
| 3 | Carol | 2 | 130000 |
| 4 | Dave | 2 | 80000 |
SELECT e.name AS employee_name, e.salary AS employee_salary, m.name AS manager_name, m.salary AS manager_salary FROM employees e JOIN employees m ON e.manager_id = m.emp_id WHERE e.salary > m.salary;
Find the second-highest salary within each department without using window functions. Use a self-join: pair each employee with every higher-paid colleague in the same dept, then HAVING COUNT = 1 means exactly one salary is higher.
| emp_id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Engineering | 120000 |
| 2 | Carol | Engineering | 150000 |
| 3 | Grace | Marketing | 95000 |
| 4 | Bob | Marketing | 85000 |
SELECT e1.department, e1.name, e1.salary AS second_highest FROM employees e1 JOIN employees e2 ON e1.department = e2.department AND e1.salary < e2.salary GROUP BY e1.department, e1.name, e1.salary HAVING COUNT(DISTINCT e2.salary) = 1;
Pivot monthly revenue so each row is a month and columns show revenue for Electronics, Furniture, and Stationery. Use SUM(CASE WHEN) — the portable approach that works in all SQL dialects.
| product_id | product_name | category |
|---|---|---|
| 1 | Laptop | Electronics |
| 3 | Chair | Furniture |
| sale_id | product_id | sale_date | revenue |
|---|---|---|---|
| 1 | 1 | 2025-01-05 | 2400 |
| 2 | 3 | 2025-01-18 | 900 |
SELECT strftime('%Y-%m', s.sale_date) AS month, SUM(CASE WHEN p.category = 'Electronics' THEN s.revenue ELSE 0 END) AS electronics, SUM(CASE WHEN p.category = 'Furniture' THEN s.revenue ELSE 0 END) AS furniture, SUM(CASE WHEN p.category = 'Stationery' THEN s.revenue ELSE 0 END) AS stationery FROM sales s JOIN products p ON s.product_id = p.product_id GROUP BY month ORDER BY month;
For each age group, calculate the ratio of time spent sending snaps vs. opening snaps. Use NULLIF in the denominator to prevent division-by-zero errors. Real Snapchat/Meta question from DataLemur.
| user_id | activity_type | time_spent | age_bucket |
|---|---|---|---|
| 1 | send | 3.5 | 21-25 |
| 1 | open | 1.5 | 21-25 |
| 2 | send | 2.0 | 26-30 |
| 2 | open | 5.0 | 26-30 |
SELECT age_bucket, ROUND( SUM(CASE WHEN activity_type = 'send' THEN time_spent ELSE 0 END) / NULLIF(SUM(CASE WHEN activity_type = 'open' THEN time_spent ELSE 0 END), 0), 2) AS send_to_open_ratio FROM activities GROUP BY age_bucket ORDER BY age_bucket;
Calculate a 3-day rolling average of daily revenue. The frame clause ROWS BETWEEN 2 PRECEDING AND CURRENT ROW restricts the window to exactly 3 rows. Using RANGE instead of ROWS would include tied dates — a subtle but important difference.
| rev_date | revenue |
|---|---|
| 2025-01-01 | 1000 |
| 2025-01-02 | 1200 |
| 2025-01-03 | 900 |
| 2025-01-04 | 1500 |
| 2025-01-05 | 1100 |
SELECT rev_date, revenue, ROUND(AVG(revenue) OVER ( ORDER BY rev_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ), 2) AS rolling_3day_avg FROM daily_revenue ORDER BY rev_date;
Find how many users active in a given month were also active the month before. A self-join links each month to the previous one via a date offset. Real Facebook/Meta DataLemur question.
| user_id | activity_month |
|---|---|
| 1 | 2025-01-01 |
| 1 | 2025-02-01 |
| 2 | 2025-01-01 |
| 2 | 2025-02-01 |
| 3 | 2025-02-01 |
SELECT curr.activity_month, COUNT(DISTINCT curr.user_id) AS retained_users FROM user_activity curr JOIN user_activity prev ON curr.user_id = prev.user_id AND curr.activity_month = date(prev.activity_month, '+1 month') GROUP BY curr.activity_month ORDER BY curr.activity_month;
For each user, retrieve their third transaction chronologically. Use ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY txn_date) then filter where rn = 3. Classic DataLemur medium question.
| txn_id | user_id | amount | txn_date |
|---|---|---|---|
| 1 | 101 | 50.00 | 2025-01-01 |
| 2 | 101 | 75.00 | 2025-01-15 |
| 3 | 101 | 120.00 | 2025-02-01 |
| 4 | 102 | 90.00 | 2025-03-01 |
WITH ranked AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY txn_date ) AS rn FROM transactions ) SELECT txn_id, user_id, amount, txn_date FROM ranked WHERE rn = 3;
Find the Nth highest distinct salary. DENSE_RANK() is the correct approach — if two people share the top salary, the next distinct value ranks 2nd. Using LIMIT/OFFSET would silently skip ties and return a wrong answer.
| emp_id | name | salary |
|---|---|---|
| 1 | Alice | 150000 |
| 2 | Bob | 150000 |
| 3 | Carol | 120000 |
| 4 | Dave | 95000 |
WITH ranked_salaries AS ( SELECT DISTINCT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees ) SELECT salary AS nth_highest_salary FROM ranked_salaries WHERE rnk = 2; -- change 2 to any N
Calculate year-over-year revenue growth per category. LAG(revenue) OVER (PARTITION BY category ORDER BY year) fetches last year's value. Use NULLIF to guard against division by zero.
| sale_year | category | revenue |
|---|---|---|
| 2023 | Electronics | 500000 |
| 2024 | Electronics | 650000 |
| 2025 | Electronics | 780000 |
WITH yoy AS ( SELECT sale_year, category, revenue, LAG(revenue) OVER ( PARTITION BY category ORDER BY sale_year ) AS prev_revenue FROM yearly_sales ) SELECT sale_year, category, revenue, ROUND( 100.0 * (revenue - prev_revenue) / NULLIF(prev_revenue, 0), 1 ) AS yoy_growth_pct FROM yoy WHERE prev_revenue IS NOT NULL ORDER BY category, sale_year;
Find each user's continuous active subscription periods (start date, end date, duration). The gaps-and-islands trick: subtracting the row number (as days) from each consecutive active date produces the same constant — revealing each contiguous island.
| user_id | status_date | is_active |
|---|---|---|
| 1 | 2025-01-01 | 1 |
| 1 | 2025-01-02 | 1 |
| 1 | 2025-01-03 | 1 |
| 1 | 2025-01-04 | 0 |
| 1 | 2025-01-05 | 1 |
WITH active_days AS ( SELECT user_id, status_date, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY status_date) AS rn FROM subscription_status WHERE is_active = 1 ), islands AS ( SELECT user_id, status_date, date(status_date, '-' || rn || ' days') AS grp FROM active_days ) SELECT user_id, MIN(status_date) AS period_start, MAX(status_date) AS period_end, COUNT(*) AS duration_days FROM islands GROUP BY user_id, grp ORDER BY user_id, period_start;
Traverse the full org chart from CEO to every employee using a recursive CTE. The anchor selects root nodes (manager_id IS NULL). The recursive part joins each employee to their parent's CTE row, building the hierarchy level and path.
| emp_id | name | manager_id |
|---|---|---|
| 1 | CEO | NULL |
| 2 | Alice | 1 |
| 3 | Bob | 1 |
| 4 | Carol | 2 |
WITH RECURSIVE org AS ( SELECT emp_id, name, manager_id, 1 AS lvl, name AS path FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.emp_id, e.name, e.manager_id, o.lvl + 1, o.path || ' > ' || e.name FROM employees e JOIN org o ON e.manager_id = o.emp_id ) SELECT emp_id, name, lvl, path FROM org ORDER BY lvl, name;
Group user clickstream events into sessions where any gap exceeding 30 minutes starts a new session. Two-CTE pattern: LAG() detects gaps, then SUM(new_session_flag) as a running count assigns the session ID. Real Google/Meta data engineering interview question.
| event_id | user_id | event_time |
|---|---|---|
| 1 | 1 | 2025-01-01 09:00 |
| 2 | 1 | 2025-01-01 09:15 |
| 3 | 1 | 2025-01-01 10:25 |
| 4 | 1 | 2025-01-01 11:10 |
WITH gaps AS ( SELECT event_id, user_id, event_time, CASE WHEN (julianday(event_time) - julianday(LAG(event_time) OVER ( PARTITION BY user_id ORDER BY event_time ))) * 1440 > 30 OR LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL THEN 1 ELSE 0 END AS new_session FROM clickstream ), sessions AS ( SELECT *, SUM(new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id FROM gaps ) SELECT user_id, session_id, MIN(event_time) AS session_start, MAX(event_time) AS session_end FROM sessions GROUP BY user_id, session_id ORDER BY user_id, session_id;
A customer table has duplicates from system retries. Keep only the most recent record per customer. ROW_NUMBER() PARTITION BY customer_id ORDER BY updated_at DESC assigns rank 1 to the freshest row. Note: DENSE_RANK would fail here — tied timestamps would both get rank 1.
| record_id | customer_id | name | updated_at |
|---|---|---|---|
| 1 | 1001 | Alice | 2025-01-01 10:00 |
| 2 | 1001 | Alice | 2025-03-15 14:30 |
| 3 | 1002 | Bob | 2025-02-01 09:00 |
| 4 | 1002 | Bob | 2025-02-01 09:00 |
WITH ranked AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY updated_at DESC ) AS rn FROM customer_records ) SELECT record_id, customer_id, name, updated_at FROM ranked WHERE rn = 1;
Show each transaction with the account balance both before and after it. Three-step pattern: (1) convert debits to negative amounts; (2) running SUM ... ROWS UNBOUNDED PRECEDING for balance-after; (3) LAG() of that value for balance-before.
| txn_id | account_id | txn_date | amount | txn_type |
|---|---|---|---|---|
| 1 | 1001 | 2025-01-05 | 500 | credit |
| 2 | 1001 | 2025-01-12 | 200 | debit |
| 3 | 1001 | 2025-01-20 | 1000 | credit |
WITH signed AS ( SELECT *, CASE WHEN txn_type = 'credit' THEN amount WHEN txn_type = 'debit' THEN -amount END AS signed_amt FROM account_txns ), running AS ( SELECT *, SUM(signed_amt) OVER ( PARTITION BY account_id ORDER BY txn_date, txn_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS balance_after FROM signed ) SELECT txn_id, txn_date, amount, txn_type, COALESCE(LAG(balance_after) OVER ( PARTITION BY account_id ORDER BY txn_date, txn_id ), 0) AS balance_before, balance_after FROM running ORDER BY account_id, txn_date, txn_id;
Find what % of users who bought Product A immediately bought Product B next. LEAD(product) retrieves the very next purchase per user. Filter where current = A and check next = B. Real Apple/DataLemur hard question pattern.
| txn_id | customer_id | product | purchased_at |
|---|---|---|---|
| 1 | 101 | iPhone | 2025-01-05 |
| 2 | 101 | AirPods | 2025-01-10 |
| 3 | 102 | iPhone | 2025-02-01 |
| 4 | 102 | Charger | 2025-02-05 |
WITH next_product AS ( SELECT customer_id, product AS current_product, LEAD(product) OVER ( PARTITION BY customer_id ORDER BY purchased_at ) AS next_product FROM purchase_history ) SELECT ROUND( 100.0 * COUNT(CASE WHEN next_product = 'AirPods' THEN 1 END) / NULLIF(COUNT(*), 0), 1 ) AS airpods_followup_pct FROM next_product WHERE current_product = 'iPhone';
Write a query to show all records from the students table. The asterisk * is a wildcard meaning "all columns". This is the very first query every SQL learner writes.
| id | name | grade | city |
|---|---|---|---|
| 1 | Arjun | A | Delhi |
| 2 | Priya | B | Mumbai |
| 3 | Raj | A | Pune |
SELECT * FROM students;
Retrieve only the name and price columns from the products table. In real systems, selecting specific columns is faster than SELECT * because less data travels over the network.
| id | name | category | price | stock |
|---|---|---|---|---|
| 1 | Mouse | Electronics | 799 | 150 |
| 2 | Monitor | Electronics | 4999 | 25 |
| 3 | Notebook | Stationery | 299 | 500 |
SELECT name, price FROM products;
Show only employees who work in the Engineering department. The WHERE clause filters rows before they are returned — think of it as the "row filter" in SQL.
| id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Engineering | 95000 |
| 2 | Bob | Sales | 65000 |
| 3 | Carol | Engineering | 110000 |
| 4 | Dave | HR | 70000 |
SELECT name, salary FROM employees WHERE department = 'Engineering';
Retrieve all products and sort them by price from cheapest to most expensive. ORDER BY col ASC is ascending (smallest first); DESC flips it. Without ORDER BY, SQL gives back rows in no guaranteed order.
| id | name | price |
|---|---|---|
| 1 | Mouse | 799 |
| 2 | Monitor | 4999 |
| 3 | Notebook | 299 |
SELECT name, price FROM products ORDER BY price ASC;
Return the 3 students with the highest exam scores. Always pair LIMIT with ORDER BY — otherwise you get a random 3 rows, not the top 3.
| id | name | score |
|---|---|---|
| 1 | Arjun | 88 |
| 2 | Priya | 95 |
| 3 | Raj | 76 |
| 4 | Sneha | 91 |
| 5 | Vikram | 83 |
SELECT name, score FROM students ORDER BY score DESC LIMIT 3;
Find the total number of orders in the orders table. COUNT(*) is the most common aggregate function — it counts every row regardless of NULLs. Adding AS total_orders gives the column a readable name.
| id | customer | amount | status |
|---|---|---|---|
| 1 | Alice | 799 | delivered |
| 2 | Bob | 1299 | shipped |
| 3 | Carol | 4999 | delivered |
SELECT COUNT(*) AS total_orders FROM orders;
Find the average salary of all employees. AVG(salary) adds up every salary and divides by the number of rows. Use ROUND(AVG(salary), 0) to remove decimal places when you want a clean number.
| id | name | salary |
|---|---|---|
| 1 | Alice | 95000 |
| 2 | Bob | 72000 |
| 3 | Carol | 110000 |
| 4 | Dave | 83000 |
SELECT ROUND(AVG(salary), 0) AS avg_salary FROM employees;
Show how many employees are in each department. GROUP BY department creates one row per unique department value, then COUNT(*) counts the employees inside each group. This is the most asked beginner aggregate question.
| id | name | department |
|---|---|---|
| 1 | Alice | Engineering |
| 2 | Bob | Sales |
| 3 | Carol | Engineering |
| 4 | Dave | HR |
| 5 | Eve | Sales |
SELECT department, COUNT(*) AS headcount FROM employees GROUP BY department ORDER BY headcount DESC;
Your analytics team at Flipkart notices that a small group of products drives the majority of revenue. Following the Pareto (80/20) rule, write a SQL query to find all products whose combined revenue — when ranked from highest to lowest — accounts for the first 80% of total revenue. Include the cumulative revenue percentage in the result. This pattern is one of the most frequently asked business analytics questions in SQL interviews.
| product_id | product_name | revenue |
|---|---|---|
| 1 | Premium Laptop | 50000 |
| 2 | 4K Monitor | 30000 |
| 3 | Keyboard | 8000 |
| 4 | USB-C Hub | 6000 |
| 5 | Mouse Pad | 3000 |
| 6 | Webcam | 2000 |
| 7 | Desk Organiser | 800 |
| 8 | Cable Clips | 200 |
-- Running cumulative SUM to pinpoint the 80% revenue threshold WITH ranked AS ( SELECT product_id, product_name, revenue, SUM(revenue) OVER ( ORDER BY revenue DESC ROWS UNBOUNDED PRECEDING ) AS running_total, SUM(revenue) OVER () AS total_revenue FROM products ) SELECT product_id, product_name, revenue, ROUND(running_total * 100.0 / total_revenue, 1) AS cumulative_pct FROM ranked WHERE running_total - revenue < total_revenue * 0.8;
-- NTILE(5) splits into 5 equal buckets; bucket 1 = top 20% of rows SELECT product_id, product_name, revenue FROM ( SELECT product_id, product_name, revenue, NTILE(5) OVER (ORDER BY revenue DESC) AS quintile FROM products ) t WHERE quintile = 1;
-- PERCENT_RANK = 0 for the top row; ≤ 0.2 means top 20% of products SELECT product_id, product_name, revenue FROM ( SELECT product_id, product_name, revenue, PERCENT_RANK() OVER (ORDER BY revenue DESC) AS pct_rank FROM products ) t WHERE pct_rank <= 0.2;
Swiggy's growth team needs a daily customer classification report. For each order date, determine whether each order belongs to a new customer (placing their very first order on that day) or a repeat customer (who ordered at least once before). Output each date with the count of new and repeat customers side by side. This is one of the most common business analytics questions in data analyst interviews — it tests your ability to break a problem into steps using CTEs and conditional aggregation.
| customer_id | order_date | amount |
|---|---|---|
| C1 | 2024-01-10 | 250 |
| C2 | 2024-01-10 | 180 |
| C1 | 2024-01-11 | 320 |
| C3 | 2024-01-11 | 150 |
| C2 | 2024-01-12 | 200 |
| C4 | 2024-01-12 | 90 |
| C1 | 2024-01-12 | 410 |
| C3 | 2024-01-13 | 270 |
| C5 | 2024-01-13 | 130 |
-- Step 1: find each customer's first ever order date WITH first_order AS ( SELECT customer_id, MIN(order_date) AS first_date FROM orders GROUP BY customer_id ) -- Step 2: join back, classify, then aggregate per day SELECT o.order_date, SUM(CASE WHEN o.order_date = f.first_date THEN 1 ELSE 0 END) AS new_customers, SUM(CASE WHEN o.order_date > f.first_date THEN 1 ELSE 0 END) AS repeat_customers FROM orders o JOIN first_order f ON o.customer_id = f.customer_id GROUP BY o.order_date ORDER BY o.order_date;
-- MIN() OVER (PARTITION BY customer_id) gives first date without a separate GROUP BY WITH classified AS ( SELECT customer_id, order_date, MIN(order_date) OVER (PARTITION BY customer_id) AS first_date FROM orders ) SELECT order_date, SUM(CASE WHEN order_date = first_date THEN 1 ELSE 0 END) AS new_customers, SUM(CASE WHEN order_date > first_date THEN 1 ELSE 0 END) AS repeat_customers FROM classified GROUP BY order_date ORDER BY order_date;
-- CTE 1: tag every order as new or repeat WITH first_order AS ( SELECT customer_id, MIN(order_date) AS first_date FROM orders GROUP BY customer_id ), tagged AS ( SELECT o.order_date, CASE WHEN o.order_date = f.first_date THEN 'new' ELSE 'repeat' END AS cust_type FROM orders o JOIN first_order f ON o.customer_id = f.customer_id ) -- CTE 2: count each type per day SELECT order_date, SUM(CASE WHEN cust_type = 'new' THEN 1 ELSE 0 END) AS new_customers, SUM(CASE WHEN cust_type = 'repeat' THEN 1 ELSE 0 END) AS repeat_customers FROM tagged GROUP BY order_date ORDER BY order_date;
Smartworks, a co-working space provider, logs every member check-in: which floor they visit and which resource (laptop station, desktop, monitor, etc.) they use. The analytics team wants a member-level summary report — total check-ins, the floor each member visits most often (the statistical mode, not the maximum), and every distinct resource they have ever used, as a comma-separated list. This question is a staple in product-company interviews because it requires three distinct SQL techniques working together: plain aggregation, window-based mode calculation (SQL has no MODE() function), and string aggregation with deduplication.
| member | floor | resource |
|---|---|---|
| Priya | 1 | CPU |
| Priya | 1 | LAPTOP |
| Priya | 2 | MONITOR |
| Rahul | 2 | DESKTOP |
| Rahul | 2 | DESKTOP |
| Rahul | 1 | PRINTER |
WITH visit_count AS ( SELECT member, COUNT(*) AS total_visits FROM cowork_visits GROUP BY member ), floor_mode AS ( -- rank floors by visit frequency; rn=1 is the most visited SELECT member, floor, ROW_NUMBER() OVER ( PARTITION BY member ORDER BY COUNT(*) DESC ) AS rn FROM cowork_visits GROUP BY member, floor ), resource_list AS ( SELECT member, GROUP_CONCAT(DISTINCT resource) AS resources FROM cowork_visits GROUP BY member ) SELECT v.member, v.total_visits, f.floor AS top_floor, r.resources FROM visit_count v JOIN floor_mode f ON v.member = f.member AND f.rn = 1 JOIN resource_list r ON v.member = r.member ORDER BY v.member;
WITH floor_mode AS ( SELECT member, floor, ROW_NUMBER() OVER ( PARTITION BY member ORDER BY COUNT(*) DESC ) AS rn FROM cowork_visits GROUP BY member, floor ), summary AS ( SELECT member, COUNT(*) AS total_visits, STRING_AGG(DISTINCT resource, ',' ORDER BY resource) AS resources FROM cowork_visits GROUP BY member ) SELECT s.member, s.total_visits, f.floor AS top_floor, s.resources FROM summary s JOIN floor_mode f ON s.member = f.member AND f.rn = 1 ORDER BY s.member;
-- GROUP_CONCAT for MySQL and SQLite (no STRING_AGG support) WITH floor_mode AS ( SELECT member, floor, ROW_NUMBER() OVER ( PARTITION BY member ORDER BY COUNT(*) DESC ) AS rn FROM cowork_visits GROUP BY member, floor ), summary AS ( SELECT member, COUNT(*) AS total_visits, GROUP_CONCAT(DISTINCT resource) AS resources FROM cowork_visits GROUP BY member ) SELECT s.member, s.total_visits, f.floor AS top_floor, s.resources FROM summary s JOIN floor_mode f ON s.member = f.member AND f.rn = 1 ORDER BY s.member;
Razorpay's SRE team runs a daily health check on their payment gateway. Each day is logged as up (all systems operational) or down (incident detected). For SLA reporting and incident post-mortems, the team needs a condensed view: each contiguous window of the same status collapsed into a single row with its start date, end date, and status. This is the classic Gaps and Islands problem — SQL has no built-in way to detect "consecutive same-value runs", so you need a window-function trick to assign each island a unique group key.
| check_date | status |
|---|---|
| 2024-01-01 | up |
| 2024-01-02 | up |
| 2024-01-03 | up |
| 2024-01-04 | down |
| 2024-01-05 | down |
| 2024-01-06 | up |
WITH grp AS ( SELECT check_date, status, ROW_NUMBER() OVER (ORDER BY check_date) - ROW_NUMBER() OVER ( PARTITION BY status ORDER BY check_date ) AS grp_id FROM gateway_log ) SELECT MIN(check_date) AS start_date, MAX(check_date) AS end_date, status FROM grp GROUP BY grp_id, status ORDER BY start_date;
WITH changes AS ( SELECT check_date, status, CASE WHEN LAG(status) OVER (ORDER BY check_date) = status THEN 0 ELSE 1 END AS is_start FROM gateway_log ), grps AS ( SELECT check_date, status, SUM(is_start) OVER (ORDER BY check_date) AS grp_id FROM changes ) SELECT MIN(check_date) AS start_date, MAX(check_date) AS end_date, status FROM grps GROUP BY grp_id, status ORDER BY start_date;
-- SQL Server uses DATEADD(day, -rn, date); SQLite uses DATE(date, '-N days') WITH grp AS ( SELECT check_date, status, DATE(check_date, '-' || ROW_NUMBER() OVER ( PARTITION BY status ORDER BY check_date ) || ' days') AS grp_key FROM gateway_log ) SELECT MIN(check_date) AS start_date, MAX(check_date) AS end_date, status FROM grp GROUP BY grp_key, status ORDER BY start_date;
A Flipkart data team is auditing a JOIN before writing a pipeline. Table t1 has 3 rows and t2 has 3 rows — yet id=1 is duplicated in t1. A junior engineer assumes INNER JOIN returns 3 rows since both tables have 3 rows. How many rows does t1 INNER JOIN t2 ON t1.id = t2.id actually return — and why does the duplicate key matter?
| id |
|---|
| 1 |
| 1 |
| 2 |
| id |
|---|
| 1 |
| 2 |
| 3 |
-- Key 1: 2 rows in t1 × 1 row in t2 = 2 rows -- Key 2: 1 row in t1 × 1 row in t2 = 1 row -- Key 3: no match in t1 → 0 rows -- Total: 3 rows SELECT t1.id AS t1_id, t2.id AS t2_id FROM t1 INNER JOIN t2 ON t1.id = t2.id;
Amazon's data team joins two tables before a nightly aggregation. Both tables have duplicate entries for the same key value. A classic interview trap: before running the query, predict the exact row count. Most candidates guess 3 or 4 (min or max of the input table sizes) — both are wrong. The actual result shocks them.
| id |
|---|
| 1 |
| 1 |
| 2 |
| id |
|---|
| 1 |
| 1 |
| 1 |
| 3 |
-- Key 1: 2 rows in t1 × 3 rows in t2 = 6 rows ← Cartesian explosion! -- Key 2: no match in t2 → 0 rows -- Key 3: no match in t1 → 0 rows -- Total: 6 rows (larger than both input tables!) SELECT t1.id AS t1_id, t2.id AS t2_id FROM t1 INNER JOIN t2 ON t1.id = t2.id;
Same tables as Q55, but switching to LEFT JOIN. LEFT JOIN guarantees every row in t1 appears in the output — matched or not. t1's key=2 has no match in t2. How many rows does the LEFT JOIN return? Compare to Q55's INNER JOIN answer of 6.
| id |
|---|
| 1 |
| 1 |
| 2 |
| id |
|---|
| 1 |
| 1 |
| 1 |
| 3 |
-- INNER part — key=1: 2 × 3 = 6 rows -- LEFT-only — key=2 in t1, no match in t2: 1 row (t2_id = NULL) -- key=3 is only in t2 → LEFT JOIN does not preserve it -- Total: 6 + 1 = 7 rows SELECT t1.id AS t1_id, t2.id AS t2_id FROM t1 LEFT JOIN t2 ON t1.id = t2.id;
Same tables, now RIGHT JOIN. Every t2 row must appear — matched or not. t2's key=3 has no match in t1 and gets NULL for t1.id. But t1's key=2 (unmatched) is now dropped. Notice: LEFT JOIN and RIGHT JOIN both return 7 rows here, but they preserve different unmatched keys.
| id |
|---|
| 1 |
| 1 |
| 2 |
| id |
|---|
| 1 |
| 1 |
| 1 |
| 3 |
-- INNER part — key=1: 2 × 3 = 6 rows -- RIGHT-only — key=3 in t2, no match in t1: 1 row (t1_id = NULL) -- key=2 is only in t1 → RIGHT JOIN drops it (LEFT JOIN would keep it) -- Total: 6 + 1 = 7 rows SELECT t1.id AS t1_id, t2.id AS t2_id FROM t1 RIGHT JOIN t2 ON t1.id = t2.id; -- SQLite 3.39+ supports RIGHT JOIN natively -- Equivalent: FROM t2 LEFT JOIN t1 ON t2.id = t1.id
The definitive JOIN interview question. Given t1 = [1,1,2,3] and t2 = [1,1,2,4], predict the row count for all four JOIN types. Keys 1 and 2 are shared; key 3 is exclusive to t1; key 4 is exclusive to t2. Master the formula INNER ≤ LEFT, INNER ≤ RIGHT, max(LEFT, RIGHT) ≤ FULL — this appears in senior data engineering interviews at every major tech company.
| id |
|---|
| 1 |
| 1 |
| 2 |
| 3 |
| id |
|---|
| 1 |
| 1 |
| 2 |
| 4 |
-- Key 1: 2 (t1) × 2 (t2) = 4 rows -- Key 2: 1 (t1) × 1 (t2) = 1 row -- Key 3: only in t1 → 0 rows | Key 4: only in t2 → 0 rows -- INNER JOIN total: 5 rows SELECT t1.id AS t1_id, t2.id AS t2_id FROM t1 INNER JOIN t2 ON t1.id = t2.id;
-- Results: INNER=5, LEFT=6, RIGHT=6, FULL OUTER=7 SELECT 'INNER JOIN' AS join_type, COUNT(*) AS row_count FROM t1 INNER JOIN t2 ON t1.id = t2.id UNION ALL SELECT 'LEFT JOIN', COUNT(*) FROM t1 LEFT JOIN t2 ON t1.id = t2.id UNION ALL SELECT 'RIGHT JOIN', COUNT(*) FROM t2 LEFT JOIN t1 ON t2.id = t1.id UNION ALL SELECT 'FULL OUTER', COUNT(*) FROM ( SELECT t1.id, t2.id FROM t1 LEFT JOIN t2 ON t1.id = t2.id UNION ALL SELECT t1.id, t2.id FROM t2 LEFT JOIN t1 ON t2.id = t1.id WHERE t1.id IS NULL );
Zomato's data team joins an orders table (t1) with a customers table (t2). Both tables have a NULL key row — unassigned records. A junior engineer expects the two NULLs to pair up in the INNER JOIN. They won't. SQL uses three-valued logic: the ON clause keeps only rows where the condition is TRUE. NULL = NULL is UNKNOWN — not TRUE — so NULL keys are invisible to INNER JOIN and appear only as unmatched rows in LEFT/RIGHT JOIN.
| id |
|---|
| 1 |
| 2 |
| NULL |
| id |
|---|
| 1 |
| NULL |
| 3 |
-- NULL = NULL → UNKNOWN → treated as no match -- Key 1: 1 × 1 = 1 row -- id=2 (t1): no t2 match → 0 rows -- id=NULL (t1): NULL = NULL is UNKNOWN → 0 rows -- id=NULL (t2): same — never matches any t1 row -- Total INNER: 1 row SELECT t1.id AS t1_id, t2.id AS t2_id FROM t1 INNER JOIN t2 ON t1.id = t2.id;
-- LEFT JOIN output: -- (1, 1) ← matched -- (2, NULL) ← t1.id=2 unmatched, t2 side is NULL -- (NULL, NULL) ← t1.id=NULL unmatched (NULL key never matches) -- Total LEFT: 3 rows SELECT t1.id AS t1_id, t2.id AS t2_id FROM t1 LEFT JOIN t2 ON t1.id = t2.id; -- To force NULL = NULL to match, use IS (SQLite NULL-safe equals): -- INNER JOIN t2 ON t1.id IS t2.id → 2 rows: (1,1) and (NULL,NULL)
Swiggy's data team finds product IDs contain both NULL entries (missing data) and empty string entries (data entry errors). t1 has key '1', key '', and key NULL. t2 has key '1', key '', and key '2'. In most UIs both NULL and '' display as blank — but SQL treats them completely differently. Predict the INNER JOIN row count and which rows appear.
| id (TEXT) |
|---|
| 1 |
| '' (empty) |
| NULL |
| id (TEXT) |
|---|
| 1 |
| '' (empty) |
| 2 |
-- '' = '' is TRUE → empty strings match each other! -- NULL = '' is UNKNOWN → NULL key never matches -- Key '1': 1 × 1 = 1 row -- Key '' : 1 × 1 = 1 row ← empty string IS a matchable value -- Key NULL (t1): NULL = anything → UNKNOWN → 0 rows -- Total INNER: 2 rows SELECT t1.id AS t1_id, t2.id AS t2_id FROM t1 INNER JOIN t2 ON t1.id = t2.id;
-- Standard ON t1.id = t2.id → 2 rows (NULL never matches) -- NULL-safe ON t1.id IS t2.id → 2 rows (no NULL in t2 to pair with) -- To see NULL-safe behavior, add NULL to t2 and compare: -- INSERT INTO t2 VALUES(NULL); -- ON t1.id = t2.id → still 2 rows (NULL = NULL = UNKNOWN) -- ON t1.id IS t2.id → 3 rows (NULL IS NULL = TRUE in SQLite) SELECT t1.id AS t1_id, t2.id AS t2_id FROM t1 INNER JOIN t2 ON t1.id IS t2.id; -- PostgreSQL equivalent: ON t1.id IS NOT DISTINCT FROM t2.id
Flipkart's data warehouse team manages a product dimension table. iPhone 11 was listed at ₹10,000 on 2022-01-01. On 2022-03-01 the price drops to ₹8,000. Apply Slowly Changing Dimension Type 1: overwrite the record. Historical price is not needed — only the current state matters. Which SQL operation correctly implements SCD Type 1?
| product_id | product_name | price |
|---|---|---|
| 1 | iphone11 | 10000 |
| 2 | iphone12 | 15000 |
-- SCD Type 1: overwrite — old price 10000 is lost forever UPDATE product SET price = 8000, update_date = '2022-03-01' WHERE product_id = 1; -- After update: only the new price remains SELECT * FROM product WHERE product_id = 1;
The finance team needs full price history for audits — every price change must be preserved. Apply SCD Type 2: keep the old record with an expiry date and a is_current=0 flag, then insert a new current record. This table can be queried for both current state and any historical point in time.
| product_id | price | is_current |
|---|---|---|
| 1 | 10000 | 1 |
-- Step 1: expire the current row (close its time window) UPDATE product_history SET end_date = '2022-02-28', is_current = 0 WHERE product_id = 1 AND is_current = 1; -- Step 2: insert the new current row (open new time window) INSERT INTO product_history (product_id, product_name, price, start_date, end_date, is_current) VALUES (1, 'iphone11', 8000, '2022-03-01', NULL, 1); -- Verify: two rows now exist for product_id=1 SELECT product_id, price, start_date, end_date, is_current FROM product_history WHERE product_id = 1;
Marketing needs to show "was ₹X, now ₹Y" on the product page — only the previous price matters, not the full history. Apply SCD Type 3: add a prev_price column to the product table. On each price change, shift the current value to prev_price and write the new price to current_price. Simple, compact — but limited to one version back.
| product_id | current_price | prev_price |
|---|---|---|
| 1 | 10000 | NULL |
-- SCD Type 3: shift column, no new row created UPDATE product SET prev_price = current_price, -- 10000 moves to prev current_price = 8000, -- new price update_date = '2022-03-01' WHERE product_id = 1; -- After: 1 row, prev_price=10000, current_price=8000 SELECT product_id, current_price, prev_price FROM product; -- ⚠ Limitation: if price changes again (8000→7000), -- prev_price becomes 8000 and 10000 is lost forever.
After applying SCD Type 2, the product_history table contains both historical and current rows for iPhone 11. An analyst needs only the current (active) price for each product. Without a filter, the query returns every version — mixing old and new prices into the same result set. Which filter isolates only the live record?
| product_id | price | is_current |
|---|---|---|
| 1 | 10000 | 0 |
| 1 | 8000 | 1 |
| 2 | 15000 | 1 |
-- Get only the active (current) row per product SELECT product_id, product_name, price FROM product_history WHERE is_current = 1; -- Equivalent alternative: open-ended rows have no end_date -- WHERE end_date IS NULL -- Returns: product 1 → 8000, product 2 → 15000 -- Historical rows (is_current=0) are stored but hidden from day-to-day queries
SCD Type 2's killer feature: time travel queries. The finance team needs to know what iPhone 11's price was on 2022-02-15 — before the March price drop. The old row (price=10000, start=2022-01-01, end=2022-02-28) covers Feb 15. The new row (price=8000, start=2022-03-01, end=NULL) does not. Querying by exact start_date returns nothing — you need a range check.
| product_id | price | start_date |
|---|---|---|
| 1 | 10000 | 2022-01-01 |
| 1 | 8000 | 2022-03-01 |
-- What was the price on 2022-02-15? SELECT product_id, product_name, price, start_date, end_date FROM product_history WHERE product_id = 1 AND '2022-02-15' BETWEEN start_date AND COALESCE(end_date, '9999-12-31'); -- Returns: price=10000 (the Jan 1 – Feb 28 version) -- COALESCE handles active rows: end_date=NULL → '9999-12-31' -- so the open-ended row always includes future dates
-- Which version was active on Feb 15 vs Apr 01? SELECT product_id, price, start_date, COALESCE(end_date, 'active') AS end_date FROM product_history WHERE product_id = 1 AND ( '2022-02-15' BETWEEN start_date AND COALESCE(end_date, '9999-12-31') OR '2022-04-01' BETWEEN start_date AND COALESCE(end_date, '9999-12-31') );
This is one of the most-asked SQL interview tricks. The emp table has 12 rows — the last row is all NULLs. What does each variant of COUNT return, and why? Understanding this proves you know how the SQL engine evaluates aggregate functions under the hood.
| emp_id | emp_name | salary | manager_id | dep_id |
|---|---|---|---|---|
| 1 | Ankit | 14300 | 4 | 100 |
| 2 | Mohit | 15600 | 5 | 200 |
| 3 | Vikas | 12100 | 4 | 100 |
| 4 | Rohit | 7260 | 2 | 100 |
| 5 | Mudit | 15600 | 6 | 200 |
| 6 | Agam | 15600 | 2 | 200 |
| 7 | Sanjay | 12000 | 2 | 200 |
| 8 | Ashish | 7200 | 2 | 200 |
| 9 | Mukesh | 7000 | 6 | 300 |
| 10 | Rakesh | 8000 | 6 | 300 |
| 11 | Akhil | 4000 | 1 | 500 |
| NULL | NULL | NULL | NULL | NULL |
| count_star | count_1 | count_0 | count_manager | distinct_deps |
|---|---|---|---|---|
| 12 | 12 | 12 | 11 | 4 |
COUNT(0) evaluates 0 for every row — 0 is not NULL — so it counts all 12 rows. COUNT(col) evaluates the column value per row and skips NULLs. COUNT(DISTINCT col) additionally de-duplicates.
SELECT COUNT(*) AS count_star, -- 12: every row, NULLs included COUNT(1) AS count_1, -- 12: literal 1 is never NULL COUNT(0) AS count_0, -- 12: literal 0 is never NULL COUNT(manager_id) AS count_manager, -- 11: skips the NULL row COUNT(DISTINCT dep_id) AS distinct_deps -- 4: unique non-NULL dep_ids (100,200,300,500) FROM emp; -- Mental model: -- COUNT(*) / COUNT(literal) → count rows -- COUNT(col) → count non-NULL values in that column -- COUNT(DISTINCT col) → count unique non-NULL values
COUNT(0) resolves to the integer 0 for row 1, 0 for row 2 … 0 for row 12. None of those are NULL, so all 12 are counted. The actual value (0, 1, 42, 'hello') doesn't matter — what matters is whether the expression is NULL or not.
A very common interview follow-up after COUNT(*) vs COUNT(col): what happens inside a LEFT JOIN? The query below finds all customers and how many orders each has placed. Bob has never ordered — his order_id will be NULL after the LEFT JOIN. Which COUNT catches that?
| id | name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Carol |
| order_id | customer_id | amount |
|---|---|---|
| 101 | 1 | 500 |
| 102 | 1 | 300 |
| 103 | 3 | 750 |
| name | total_rows | order_count |
|---|---|---|
| Alice | 2 | 2 |
| Bob | 1 | 0 |
| Carol | 1 | 1 |
order_id = NULL. COUNT(*) sees a row and counts it (→ 1). COUNT(o.order_id) sees NULL and skips it (→ 0). In most reporting queries you want COUNT(o.order_id) because 0 orders is more useful than 1.
SELECT c.name, COUNT(*) AS total_rows, -- counts the NULL row for Bob → 1 COUNT(o.order_id) AS order_count -- skips NULL → 0 for Bob ✓ FROM customers c LEFT JOIN orders o ON c.id = o.customer_id GROUP BY c.name ORDER BY c.name;
A natural extension of COUNT(col): combine COUNT with CASE to pivot status counts into columns without multiple subqueries. This pattern appears in nearly every Flipkart, Swiggy, and Zomato interview when asked to "show delivered, shipped, and cancelled orders in one row." The trick works because CASE WHEN false THEN 1 END returns NULL — and COUNT silently skips NULLs.
| order_id | status |
|---|---|
| 1 | delivered |
| 2 | shipped |
| 3 | cancelled |
| 4 | delivered |
| 5 | delivered |
| 6 | shipped |
| total | delivered | shipped | cancelled |
|---|---|---|---|
| 6 | 3 | 2 | 1 |
CASE WHEN status='delivered' THEN 1 END returns 1 for delivered rows and NULL for everything else. COUNT ignores NULLs, so only the matching rows are counted. This is a single table scan — far faster than three separate subqueries.
SELECT COUNT(*) AS total, COUNT(CASE WHEN status = 'delivered' THEN 1 END) AS delivered, COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS shipped, COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancelled FROM orders; -- Equivalent using SUM (also common in interviews): -- SUM(CASE WHEN status = 'delivered' THEN 1 ELSE 0 END) -- Both give the same result — COUNT skips NULLs, SUM adds zeros
COUNT(CASE WHEN x THEN 1 END) leans on NULL-skipping behaviour. SUM(CASE WHEN x THEN 1 ELSE 0 END) is explicit — never returns NULL even on an empty set (returns 0 vs NULL). Use SUM when a guaranteed 0 matters; use COUNT when brevity matters.
The employees table stores a self-referencing hierarchy via manager_id. Write a query to find all direct and indirect reports under Priya (emp_id = 2, VP Engineering) along with their reporting depth. A simple JOIN only finds direct reports — you need a Recursive CTE to walk the tree to any depth.
| emp_id | emp_name | manager_id |
|---|---|---|
| 1 | Rahul (CEO) | NULL |
| 2 | Priya (VP Eng) | 1 |
| 3 | Amit (VP Sales) | 1 |
| 4 | Neha (BE Lead) | 2 |
| 5 | Karan (FE Lead) | 2 |
| 6 | Ravi (Sales) | 3 |
| 7 | Sana (Engineer) | 4 |
| 8 | Vijay (Engineer) | 5 |
| emp_id | emp_name | reporting_level |
|---|---|---|
| 4 | Neha (BE Lead) | 1 |
| 5 | Karan (FE Lead) | 1 |
| 7 | Sana (Engineer) | 2 |
| 8 | Vijay (Engineer) | 2 |
employees back to the CTE on e.manager_id = cte.emp_id, incrementing depth each time. Stops automatically when no new employees match. Add WHERE depth < 10 as a safety guard against circular data.
WITH RECURSIVE org_tree AS ( -- Anchor: seed with direct reports of Priya (emp_id = 2) SELECT emp_id, emp_name, manager_id, 1 AS depth FROM employees WHERE manager_id = 2 UNION ALL -- Recursive member: walk one level deeper each iteration SELECT e.emp_id, e.emp_name, e.manager_id, ot.depth + 1 FROM employees e INNER JOIN org_tree ot ON e.manager_id = ot.emp_id WHERE ot.depth < 10 -- safety guard: prevents infinite loop on bad data ) SELECT emp_id, emp_name, depth AS reporting_level FROM org_tree ORDER BY depth, emp_name; -- Iteration 1 (anchor): Neha(4,depth=1), Karan(5,depth=1) -- Iteration 2 (recursive): Sana(7,depth=2), Vijay(8,depth=2) -- Iteration 3: no employees report to Sana or Vijay → stops
WHERE depth < 10 is important — circular data (A → B → A) would produce an infinite loop without it.WITH RECURSIVE. SQL Server uses just WITH — the optimizer detects the recursion automatically.
An e-commerce platform lets users order via app or web. The purchases table records one row per user per channel per day. Write a query that returns, for each date, the total revenue and distinct user count split into three groups: app-only, web-only, and both. Dates where no one used both channels must still appear with 0 / 0 — they must not be silently dropped.
| user_id | order_date | channel | amount |
|---|---|---|---|
| 1 | 2024-01-01 | app | 250 |
| 1 | 2024-01-01 | web | 180 |
| 2 | 2024-01-01 | app | 320 |
| 3 | 2024-01-01 | web | 150 |
| 2 | 2024-01-02 | app | 400 |
| 4 | 2024-01-02 | web | 200 |
| order_date | channel | total_revenue | total_users |
|---|---|---|---|
| 2024-01-01 | app | 320 | 1 |
| 2024-01-01 | web | 150 | 1 |
| 2024-01-01 | both | 430 | 1 |
| 2024-01-02 | app | 400 | 1 |
| 2024-01-02 | web | 200 | 1 |
| 2024-01-02 | both | 0 | 0 |
COUNT(DISTINCT channel) = 2. CTE 2 uses UNION ALL to generate every date × channel skeleton — this is what guarantees both = 0, 0 on 2024-01-02 instead of a missing row.
WITH user_channel AS ( SELECT user_id, order_date, SUM(amount) AS total_revenue, CASE WHEN COUNT(DISTINCT channel) = 2 THEN 'both' ELSE MAX(channel) -- 'app' or 'web' END AS channel FROM purchases GROUP BY user_id, order_date ), all_combos AS ( -- generate skeleton: every date paired with all 3 channel labels SELECT DISTINCT order_date, 'app' AS channel FROM purchases UNION ALL SELECT DISTINCT order_date, 'web' AS channel FROM purchases UNION ALL SELECT DISTINCT order_date, 'both' AS channel FROM purchases ) SELECT ac.order_date, ac.channel, COALESCE(SUM(uc.total_revenue), 0) AS total_revenue, COUNT(uc.user_id) AS total_users FROM all_combos ac LEFT JOIN user_channel uc ON ac.order_date = uc.order_date AND ac.channel = uc.channel GROUP BY ac.order_date, ac.channel ORDER BY ac.order_date, CASE ac.channel WHEN 'app' THEN 1 WHEN 'web' THEN 2 ELSE 3 END; -- user_channel: labels each (user, date) — 'app', 'web', or 'both' -- all_combos: skeleton rows for every date × channel (guarantees zeros) -- LEFT JOIN: fills actual revenue/users; unmatched → NULL → 0
user_channel CTE, only dates that actually had 'both' users produce a 'both' row. On 2024-01-02, nobody used both channels — so a plain GROUP BY returns only 'app' and 'web' rows. The all_combos UNION ALL pre-generates the 'both' skeleton for every date regardless. The LEFT JOIN then matches real data where it exists and leaves NULLs elsewhere. COALESCE(..., 0) and COUNT(uc.user_id) convert those NULLs to 0.
The job_listings table stores required skills as a single comma-separated string per row (e.g., 'Python,SQL,Excel'). Write a query to find the most in-demand skills across all listings — each skill in a multi-skill listing must be counted as a separate row. There is no STRING_SPLIT() in SQLite; solve it with a Recursive CTE.
| job_id | company | required_skills |
|---|---|---|
| 1 | Flipkart | Python,SQL,Excel |
| 2 | Swiggy | SQL,Tableau |
| 3 | Zomato | Python,Power BI,SQL |
| 4 | Amazon | Excel,SQL,Python |
| 5 | Myntra | Tableau,Power BI |
| skill | cnt |
|---|---|
| SQL | 4 |
| Python | 3 |
| Excel | 2 |
| Power BI | 2 |
| Tableau | 2 |
skill; everything after as remaining. If no comma, skill = whole string, remaining = ''.remaining until remaining = ''.
WITH RECURSIVE split_skills AS ( -- Anchor: extract the first skill from each job listing SELECT job_id, TRIM(SUBSTR(required_skills, 1, CASE WHEN INSTR(required_skills, ',') > 0 THEN INSTR(required_skills, ',') - 1 -- up to first comma ELSE LENGTH(required_skills) END)) -- or whole string if no comma AS skill, CASE WHEN INSTR(required_skills, ',') > 0 THEN SUBSTR(required_skills, INSTR(required_skills, ',') + 1) ELSE '' END -- everything after the comma AS remaining FROM job_listings UNION ALL -- Recursive: peel the next skill from remaining SELECT job_id, TRIM(SUBSTR(remaining, 1, CASE WHEN INSTR(remaining, ',') > 0 THEN INSTR(remaining, ',') - 1 ELSE LENGTH(remaining) END)) AS skill, CASE WHEN INSTR(remaining, ',') > 0 THEN SUBSTR(remaining, INSTR(remaining, ',') + 1) ELSE '' END AS remaining FROM split_skills WHERE remaining != '' -- stop when nothing is left ) SELECT skill, COUNT(*) AS cnt FROM split_skills GROUP BY skill ORDER BY cnt DESC, skill ASC; -- SQL Server equivalent (much simpler): -- SELECT value AS skill, COUNT(*) AS cnt -- FROM job_listings -- CROSS APPLY STRING_SPLIT(required_skills, ',') -- GROUP BY value ORDER BY cnt DESC;
INSTR('Python,SQL,Excel', ',') = 7 → skill = 'Python', remaining = 'SQL,Excel'INSTR('SQL,Excel', ',') = 4 → skill = 'SQL', remaining = 'Excel'INSTR('Excel', ',') = 0 → skill = 'Excel', remaining = ''WHERE remaining != '' → stops. Three rows produced for job_id=1.STRING_SPLIT(required_skills, ',') with CROSS APPLY does all of the above in one line. The recursive CTE is the portable cross-database equivalent.
The sales table records regional sales across two years. The same dataset is used to demonstrate all 7 aggregation techniques — from a single aggregate function on the whole table all the way to window functions and multi-step CTEs. Knowing which technique to reach for in an interview is what separates intermediate from expert SQL.
| sale_id | region | year | amount |
|---|---|---|---|
| 1 | North | 2023 | 1000 |
| 2 | South | 2023 | 1500 |
| 3 | North | 2023 | 800 |
| 4 | South | 2024 | 2000 |
| 5 | North | 2024 | 1200 |
| 6 | South | 2024 | 900 |
| sale_id | region | amount | region_total | grand_total |
|---|---|---|---|---|
| 1 | North | 1000 | 3000 | 7400 |
| 3 | North | 800 | 3000 | 7400 |
| 5 | North | 1200 | 3000 | 7400 |
| 2 | South | 1500 | 4400 | 7400 |
| 4 | South | 2000 | 4400 | 7400 |
| 6 | South | 900 | 4400 | 7400 |
-- ── METHOD 1: Simple Aggregate — whole-table summary ──────────────── -- SELECT COUNT(*) AS rows, SUM(amount) AS total, -- AVG(amount) AS avg_amt, MIN(amount) AS min_amt, MAX(amount) AS max_amt -- FROM sales; -- ↳ 1 row result: rows=6, total=7400, avg=1233, min=800, max=2000 -- ── METHOD 2: GROUP BY — collapse rows into groups ─────────────────── -- SELECT region, SUM(amount) AS total FROM sales GROUP BY region; -- ↳ North=3000, South=4400 -- ── METHOD 3: GROUP BY + HAVING — filter groups after aggregating ──── -- SELECT region, SUM(amount) AS total -- FROM sales GROUP BY region HAVING SUM(amount) > 3000; -- ↳ Only South (4400) survives the HAVING filter -- ── METHOD 4: GROUP BY multiple columns ────────────────────────────── -- SELECT region, year, SUM(amount) AS total -- FROM sales GROUP BY region, year ORDER BY region, year; -- ↳ 4 rows: North/2023=1800, North/2024=1200, South/2023=1500, South/2024=2900 -- ── METHOD 5: ROLLUP — subtotals + grand total ─────────────────────── -- Supported: SQL Server / PostgreSQL / MySQL 8+ (NOT SQLite) -- SELECT region, year, SUM(amount) FROM sales GROUP BY ROLLUP(region, year); -- ↳ Adds subtotal rows per region + a grand total NULL/NULL row -- CUBE gives all dimension combinations; GROUPING SETS lets you pick custom ones -- ── METHOD 6: Window Aggregate — per-row totals, no row collapse ───── SELECT sale_id, region, amount, SUM(amount) OVER(PARTITION BY region) AS region_total, SUM(amount) OVER() AS grand_total FROM sales ORDER BY region, sale_id; -- ↳ Every row kept, region_total repeated per partition, grand_total=7400 on all rows -- ── METHOD 7: CTE — multi-step aggregation with % share ────────────── -- WITH rt AS (SELECT region, SUM(amount) AS total FROM sales GROUP BY region) -- SELECT region, total, -- ROUND(total * 100.0 / SUM(total) OVER(), 1) AS pct_of_total -- FROM rt; -- ↳ North=3000 (40.5%), South=4400 (59.5%)
amount / region_total), you need a Window function — GROUP BY destroys the row before you can divide it.
The strings table holds employee full names. Write a query that returns, for each name: (1) how many times the letter 'k' appears (case-insensitive), and (2) how many words the name contains. Neither COUNT nor any native function solves this directly — the trick uses LENGTH and REPLACE together.
| id | full_name |
|---|---|
| 1 | Priya Sharma |
| 2 | Ram Kumar Verma |
| 3 | Akshay Kumar Ak k |
| 4 | Rahul |
| full_name | k_count | word_count |
|---|---|---|
| Akshay Kumar Ak k | 4 | 4 |
| Ram Kumar Verma | 1 | 3 |
| Priya Sharma | 0 | 2 |
| Rahul | 0 | 1 |
LENGTH(s) - LENGTH(REPLACE(LOWER(s), 'k', '')) — removing N occurrences of 'k' shrinks the string by exactly N.(LENGTH(s) - LENGTH(REPLACE(LOWER(s), 'kumar', ''))) / LENGTH('kumar') — divide by word length since each removal shrinks by 5 chars.LENGTH(TRIM(s)) - LENGTH(REPLACE(TRIM(s), ' ', '')) + 1 — spaces = words - 1.
SELECT full_name, -- Count occurrences of 'k' (case-insensitive) LENGTH(full_name) - LENGTH(REPLACE(LOWER(full_name), 'k', '')) AS k_count, -- Count words: spaces between words = words - 1 LENGTH(TRIM(full_name)) - LENGTH(REPLACE(TRIM(full_name), ' ', '')) + 1 AS word_count FROM strings ORDER BY k_count DESC, full_name; -- Extension: count a multi-char word (divide by pattern length) -- (LENGTH(full_name) - LENGTH(REPLACE(LOWER(full_name),'kumar',''))) -- / LENGTH('kumar') AS kumar_count
occurrences = (LENGTH(s) - LENGTH(REPLACE(LOWER(s), pattern, ''))) / LENGTH(pattern).TRIM strips leading/trailing spaces before counting so " Rahul " still returns 1, not 3.
A customer support team has a contacts table. Some customers haven't provided a phone number (stored as NULL). Write a query that shows every customer's name and their phone number — but displays 'N/A' for anyone with a missing number.
| id | name | phone |
|---|---|---|
| 1 | Riya | 9876543210 |
| 2 | Raj | NULL |
| 3 | Priya | 9123456789 |
| 4 | Ankit | NULL |
| name | phone |
|---|---|
| Riya | 9876543210 |
| Raj | N/A |
| Priya | 9123456789 |
| Ankit | N/A |
SELECT name, COALESCE(phone, 'N/A') AS phone FROM contacts ORDER BY id;
The marketing team wants to label every product with a pricing tier: 'Budget' (price < 500), 'Mid-Range' (500–2000), or 'Premium' (above 2000). Write a query that returns each product's name, price, and its computed tier, ordered by price.
| id | name | price |
|---|---|---|
| 1 | Pen | 15 |
| 2 | Bag | 850 |
| 3 | Laptop | 55000 |
| 4 | Mouse | 450 |
| 5 | Keyboard | 1800 |
| name | price | tier |
|---|---|---|
| Pen | 15 | Budget |
| Mouse | 450 | Budget |
| Bag | 850 | Mid-Range |
| Keyboard | 1800 | Mid-Range |
| Laptop | 55000 | Premium |
SELECT name, price, CASE WHEN price < 500 THEN 'Budget' WHEN price <= 2000 THEN 'Mid-Range' ELSE 'Premium' END AS tier FROM products ORDER BY price;
The logistics team wants to know how many unique cities the company currently ships to. Multiple orders to the same city should count as one. Write a query that returns a single number.
| id | customer | city | amount |
|---|---|---|---|
| 1 | Rahul | Delhi | 500 |
| 2 | Priya | Mumbai | 300 |
| 3 | Amit | Delhi | 700 |
| 4 | Sneha | Pune | 200 |
| 5 | Rohit | Mumbai | 400 |
| 6 | Kavya | Hyderabad | 600 |
| unique_cities |
|---|
| 4 |
SELECT COUNT(DISTINCT city) AS unique_cities FROM orders;
The regional sales manager oversees only three territories: North, South, and West. Write a query to show all sales records from those regions only, sorted by region and then by amount (highest first).
| id | region | amount |
|---|---|---|
| 1 | North | 5000 |
| 2 | East | 3000 |
| 3 | South | 7000 |
| 4 | West | 4500 |
| 5 | North | 2000 |
| 6 | East | 1500 |
| id | region | amount |
|---|---|---|
| 1 | North | 5000 |
| 5 | North | 2000 |
| 3 | South | 7000 |
| 4 | West | 4500 |
SELECT id, region, amount FROM sales WHERE region IN ('North', 'South', 'West') ORDER BY region, amount DESC;
A gaming company tracks every session in a game_activity table. The growth team needs each player's first login date to measure early engagement and D1 retention. Return one row per player.
| player_id | login_date | score |
|---|---|---|
| 1 | 2024-01-15 | 120 |
| 1 | 2024-02-03 | 85 |
| 2 | 2024-01-10 | 50 |
| 2 | 2024-03-15 | 180 |
| 3 | 2024-02-28 | 95 |
| player_id | first_login |
|---|---|
| 1 | 2024-01-15 |
| 2 | 2024-01-10 |
| 3 | 2024-02-28 |
SELECT player_id, MIN(login_date) AS first_login FROM game_activity GROUP BY player_id ORDER BY player_id;
HR wants to identify employees who did not receive a performance bonus this year so they can follow up. Return the names of all employees absent from the bonus table, sorted alphabetically.
| id | name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Carol |
| 4 | Dave |
| employee_id | amount |
|---|---|
| 1 | 10000 |
| 3 | 15000 |
| name |
|---|
| Bob |
| Dave |
SELECT e.name FROM employees e LEFT JOIN bonus b ON e.id = b.employee_id WHERE b.employee_id IS NULL ORDER BY e.name;
A school only assigns a dedicated teacher to classes that have at least 5 enrolled students. Write a query to return the names of those classes, sorted alphabetically.
| student_id | class |
|---|---|
| 1 | Math |
| 2 | Math |
| 3 | Math |
| 4 | Math |
| 5 | Math |
| 6 | Math |
| 7 | Physics |
| 8 | Physics |
| 9 | Physics |
| 10 | Physics |
| 11 | Chemistry |
| 12 | Chemistry |
| 13 | Chemistry |
| 14 | Chemistry |
| 15 | Chemistry |
| class |
|---|
| Chemistry |
| Math |
SELECT class FROM enrollments GROUP BY class HAVING COUNT(student_id) >= 5 ORDER BY class;
The product catalogue team wants to feature items that are not discontinued AND have a customer rating above 3.5. Return each qualifying product's id, name, and rating — highest rated first.
| id | name | status | rating |
|---|---|---|---|
| 1 | Wireless Earbuds | active | 4.2 |
| 2 | Old Router | discontinued | 4.5 |
| 3 | Smart Watch | active | 4.8 |
| 4 | USB Hub | active | 3.2 |
| 5 | Webcam | active | 3.9 |
| id | name | rating |
|---|---|---|
| 3 | Smart Watch | 4.8 |
| 1 | Wireless Earbuds | 4.2 |
| 5 | Webcam | 3.9 |
SELECT id, name, rating FROM products WHERE status != 'discontinued' AND rating > 3.5 ORDER BY rating DESC;
A cinema booking system marks each seat as free (1) or taken (0). Couples need two consecutive free seats. Write a query that returns all seat IDs where the seat and its immediate neighbour are both free, ordered by seat ID.
| seat_id | free |
|---|---|
| 1 | 1 |
| 2 | 0 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 6 | 0 |
| 7 | 1 |
| 8 | 1 |
| seat_id |
|---|
| 3 |
| 4 |
| 5 |
| 7 |
| 8 |
SELECT DISTINCT a.seat_id FROM cinema a JOIN cinema b ON ABS(a.seat_id - b.seat_id) = 1 WHERE a.free = 1 AND b.free = 1 ORDER BY a.seat_id;
A food delivery platform records the order_date and the customer's preferred_date. The ops team needs to know what percentage of orders were delivered on the same day they were placed. Return one number rounded to 2 decimal places.
| id | order_date | preferred_date |
|---|---|---|
| 1 | 2024-01-15 | 2024-01-15 |
| 2 | 2024-01-16 | 2024-01-18 |
| 3 | 2024-01-17 | 2024-01-17 |
| 4 | 2024-01-18 | 2024-01-20 |
| 5 | 2024-01-19 | 2024-01-19 |
| immediate_pct |
|---|
| 60.00 |
SELECT ROUND( SUM(CASE WHEN order_date = preferred_date THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) AS immediate_pct FROM deliveries;
The finance team needs a monthly report showing total transaction count, how many were approved, and the total approved amount — all in a single row per month. Transactions have a state column: either 'approved' or 'declined'.
| id | trans_date | state | amount |
|---|---|---|---|
| 1 | 2024-01-10 | approved | 1000 |
| 2 | 2024-01-15 | declined | 500 |
| 3 | 2024-01-20 | approved | 800 |
| 4 | 2024-02-05 | approved | 1200 |
| 5 | 2024-02-10 | declined | 600 |
| month | trans_count | approved_count | approved_total |
|---|---|---|---|
| 2024-01 | 3 | 2 | 1800 |
| 2024-02 | 2 | 1 | 1200 |
SELECT strftime('%Y-%m', trans_date) AS month, COUNT(*) AS trans_count, SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count, SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total FROM transactions GROUP BY month ORDER BY month;
Identify power buyers — customers who have purchased every single product in the catalogue. Use HAVING to compare each customer's distinct product count against the total number of products.
| id | name |
|---|---|
| 1 | Widget |
| 2 | Gadget |
| 3 | Gizmo |
| customer_id | product_id |
|---|---|
| 101 | 1 |
| 101 | 2 |
| 101 | 3 |
| 102 | 1 |
| 102 | 2 |
| 103 | 1 |
| 103 | 2 |
| 103 | 3 |
| customer_id |
|---|
| 101 |
| 103 |
SELECT customer_id FROM purchases GROUP BY customer_id HAVING COUNT(DISTINCT product_id) = ( SELECT COUNT(*) FROM products ) ORDER BY customer_id;
A factory logs machine output readings sequentially by ID. Quality control wants to flag any reading value that appears in at least 3 consecutive log entries — this signals a sustained operational state that may need investigation. Return each such value once.
| id | num |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 1 |
| 6 | 2 |
| 7 | 2 |
| 8 | 2 |
| ConsecutiveNums |
|---|
| 1 |
| 2 |
WITH cte AS ( SELECT num, LAG(num, 1) OVER (ORDER BY id) AS prev1, LAG(num, 2) OVER (ORDER BY id) AS prev2 FROM logs ) SELECT DISTINCT num AS ConsecutiveNums FROM cte WHERE num = prev1 AND num = prev2 ORDER BY num;
An organisational chart is stored as a self-referencing table. Classify each person: Root (no manager — the CEO), Inner (has a manager AND manages others), or Leaf (has a manager but no direct reports). Order by id.
| id | name | manager_id |
|---|---|---|
| 1 | CEO | NULL |
| 2 | Alice | 1 |
| 3 | Bob | 1 |
| 4 | Carol | 2 |
| 5 | Dave | 3 |
| 6 | Eve | 3 |
| id | name | node_type |
|---|---|---|
| 1 | CEO | Root |
| 2 | Alice | Inner |
| 3 | Bob | Inner |
| 4 | Carol | Leaf |
| 5 | Dave | Leaf |
| 6 | Eve | Leaf |
SELECT id, name, CASE WHEN manager_id IS NULL THEN 'Root' WHEN id IN ( SELECT DISTINCT manager_id FROM org WHERE manager_id IS NOT NULL ) THEN 'Inner' ELSE 'Leaf' END AS node_type FROM org ORDER BY id;
The retention team classifies customers as Active (ordered within the last 90 days from 2024-12-31) or Churned (no recent order). Write a query showing each customer's last order date and their status, newest first.
| id | customer_id | order_date |
|---|---|---|
| 1 | 101 | 2024-12-20 |
| 2 | 101 | 2024-03-10 |
| 3 | 102 | 2024-05-10 |
| 4 | 103 | 2024-11-30 |
| 5 | 104 | 2024-06-15 |
| customer_id | last_order | status |
|---|---|---|
| 101 | 2024-12-20 | Active |
| 103 | 2024-11-30 | Active |
| 104 | 2024-06-15 | Churned |
| 102 | 2024-05-10 | Churned |
WITH last_order AS ( SELECT customer_id, MAX(order_date) AS last_order FROM orders GROUP BY customer_id ) SELECT customer_id, last_order, CASE WHEN julianday('2024-12-31') - julianday(last_order) <= 90 THEN 'Active' ELSE 'Churned' END AS status FROM last_order ORDER BY last_order DESC;
The analytics team wants to understand revenue concentration. For each product, show its revenue, rank among all products, cumulative revenue up to that rank, and what percentage of total revenue that cumulative figure represents. Sort highest revenue first.
| product_name | revenue |
|---|---|
| Analytics Pro | 120000 |
| Data Studio | 85000 |
| Query Builder | 95000 |
| Report Gen | 60000 |
| Dashboard | 78000 |
| product_name | revenue | rev_rank | cumulative_pct |
|---|---|---|---|
| Analytics Pro | 120000 | 1 | 27.4 |
| Query Builder | 95000 | 2 | 49.1 |
| Data Studio | 85000 | 3 | 68.5 |
| Dashboard | 78000 | 4 | 86.3 |
| Report Gen | 60000 | 5 | 100.0 |
SELECT product_name, revenue, RANK() OVER (ORDER BY revenue DESC) AS rev_rank, ROUND( SUM(revenue) OVER ( ORDER BY revenue DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) * 100.0 / SUM(revenue) OVER (), 1) AS cumulative_pct FROM product_sales ORDER BY revenue DESC;
The CFO wants a single-row-per-product view showing Q1, Q2, Q3, Q4 revenue side by side — a classic pivot. Transform the long-format sales table into wide format using conditional aggregation.
| product_id | sale_date | revenue |
|---|---|---|
| 1 | 2024-02-15 | 3000 |
| 1 | 2024-05-20 | 4500 |
| 1 | 2024-08-10 | 2000 |
| 1 | 2024-11-25 | 5500 |
| 2 | 2024-01-30 | 1500 |
| 2 | 2024-07-15 | 2800 |
| 2 | 2024-10-05 | 3200 |
| product_id | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| 1 | 3000 | 4500 | 2000 | 5500 |
| 2 | 1500 | 0 | 2800 | 3200 |
SELECT product_id, SUM(CASE WHEN strftime('%m',sale_date) IN ('01','02','03') THEN revenue ELSE 0 END) AS Q1, SUM(CASE WHEN strftime('%m',sale_date) IN ('04','05','06') THEN revenue ELSE 0 END) AS Q2, SUM(CASE WHEN strftime('%m',sale_date) IN ('07','08','09') THEN revenue ELSE 0 END) AS Q3, SUM(CASE WHEN strftime('%m',sale_date) IN ('10','11','12') THEN revenue ELSE 0 END) AS Q4 FROM sales GROUP BY product_id ORDER BY product_id;
SQL has no built-in MEDIAN() function. Calculate the median salary across all employees using ROW_NUMBER — pick the middle row for odd counts, average the two middle rows for even counts.
| id | name | salary |
|---|---|---|
| 1 | A | 45000 |
| 2 | B | 60000 |
| 3 | C | 75000 |
| 4 | D | 85000 |
| 5 | E | 95000 |
| 6 | F | 110000 |
| median_salary |
|---|
| 80000.0 |
WITH ranked AS ( SELECT salary, ROW_NUMBER() OVER (ORDER BY salary) AS rn, COUNT(*) OVER () AS total FROM employees ) SELECT AVG(CAST(salary AS REAL)) AS median_salary FROM ranked WHERE rn IN ((total + 1) / 2, (total + 2) / 2);
The product team tracks user events through a purchase funnel: view → cart → purchase. Write a single query that returns how many users reached each stage, and the conversion rate between consecutive stages. Rates rounded to 1 decimal.
| user_id | stage |
|---|---|
| 1 | view |
| 2 | view |
| 3 | view |
| 4 | view |
| 5 | view |
| 6 | view |
| 7 | view |
| 8 | view |
| 1 | cart |
| 2 | cart |
| 3 | cart |
| 4 | cart |
| 5 | cart |
| 1 | purchase |
| 2 | purchase |
| 3 | purchase |
| viewed | added_to_cart | purchased | view_to_cart_pct | cart_to_purchase_pct |
|---|---|---|---|---|
| 8 | 5 | 3 | 62.5 | 60.0 |
SELECT SUM(CASE WHEN stage='view' THEN 1 ELSE 0 END) AS viewed, SUM(CASE WHEN stage='cart' THEN 1 ELSE 0 END) AS added_to_cart, SUM(CASE WHEN stage='purchase' THEN 1 ELSE 0 END) AS purchased, ROUND( SUM(CASE WHEN stage='cart' THEN 1.0 ELSE 0 END) * 100 / NULLIF(SUM(CASE WHEN stage='view' THEN 1 ELSE 0 END), 0), 1) AS view_to_cart_pct, ROUND( SUM(CASE WHEN stage='purchase' THEN 1.0 ELSE 0 END) * 100 / NULLIF(SUM(CASE WHEN stage='cart' THEN 1 ELSE 0 END), 0), 1) AS cart_to_purchase_pct FROM funnel_events;
The business review team needs a year-over-year comparison of revenue by product category. Return each category's 2023 revenue, 2024 revenue, and percentage growth — sorted by growth rate descending so the fastest-growing categories appear first.
| category | sale_year | revenue |
|---|---|---|
| Electronics | 2023 | 450000 |
| Electronics | 2024 | 520000 |
| Clothing | 2023 | 280000 |
| Clothing | 2024 | 310000 |
| Food | 2023 | 190000 |
| Food | 2024 | 175000 |
| category | rev_2023 | rev_2024 | yoy_growth_pct |
|---|---|---|---|
| Electronics | 450000 | 520000 | 15.56 |
| Clothing | 280000 | 310000 | 10.71 |
| Food | 190000 | 175000 | -7.89 |
SELECT category, SUM(CASE WHEN sale_year=2023 THEN revenue ELSE 0 END) AS rev_2023, SUM(CASE WHEN sale_year=2024 THEN revenue ELSE 0 END) AS rev_2024, ROUND( (SUM(CASE WHEN sale_year=2024 THEN revenue ELSE 0 END) - SUM(CASE WHEN sale_year=2023 THEN revenue ELSE 0 END)) * 100.0 / NULLIF(SUM(CASE WHEN sale_year=2023 THEN revenue ELSE 0 END), 0), 2) AS yoy_growth_pct FROM category_sales GROUP BY category ORDER BY yoy_growth_pct DESC;
Every essential clause, function, and pattern — click any card to copy the snippet.
HAVING COUNT(*) > 1 instead.CREATE TABLE employees ( id INT PRIMARY KEY, name VARCHAR(100) NOT NULL, dept VARCHAR(50), salary DECIMAL(10,2), hire_date DATE );
INSERT INTO users (name, email) VALUES ('Alice', 'a@x.com'); UPDATE users SET status = 'active' WHERE id = 5; DELETE FROM logs WHERE created_at < '2023-01-01';
SELECT * FROM products ORDER BY price DESC LIMIT 10; -- Skip first 20 rows (pagination) ORDER BY created_at DESC LIMIT 10 OFFSET 20;
SELECT DISTINCT department FROM employees; -- Replace NULL with default SELECT COALESCE(phone, 'N/A') FROM contacts; -- NULL check (never use = NULL) WHERE manager_id IS NULL;
SELECT name, salary FROM employees WHERE dept = 'Engineering' AND salary > 80000 ORDER BY salary DESC;
| name | dept | salary |
|---|---|---|
| Alice | Engineering | 90,000 |
| Bob | Marketing | 60,000 |
| Carol | Engineering | 85,000 |
| Dave | HR | 55,000 |
| name | salary |
|---|---|
| Alice | 90,000 |
| Carol | 85,000 |
SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_sal FROM employees GROUP BY dept HAVING COUNT(*) > 1 ORDER BY avg_sal DESC;
| name | dept | salary |
|---|---|---|
| Alice | Eng | 90,000 |
| Bob | Eng | 80,000 |
| Carol | HR | 55,000 |
| Dave | Mktg | 70,000 |
| Eve | HR | 60,000 |
| dept | headcount | avg_sal |
|---|---|---|
| Eng | 2 | 85,000 |
| HR | 2 | 57,500 |
HAVING COUNT(*) > 1 is valid; WHERE COUNT(*) > 1 is an error.SELECT e.name, d.dept_name FROM employees e INNER JOIN departments d ON e.dept_id = d.id; -- Only rows where dept_id exists -- in BOTH tables are returned. -- Unmatched rows are dropped.
| name | dept_id |
|---|---|
| Alice | 1 |
| Bob | 2 |
| Carol | 99 |
| id | dept_name |
|---|---|
| 1 | Engineering |
| 2 | Marketing |
| name | dept_name |
|---|---|
| Alice | Engineering |
| Bob | Marketing |
SELECT e.name, o.order_id FROM employees e LEFT JOIN orders o ON e.id = o.emp_id; -- Tip: find employees with NO orders: SELECT e.name FROM employees e LEFT JOIN orders o ON e.id = o.emp_id WHERE o.order_id IS NULL;
| id | name |
|---|---|
| 1 | Alice |
| 2 | Bob |
| 3 | Carol |
| order_id | emp_id |
|---|---|
| 101 | 1 |
| 102 | 1 |
| name | order_id |
|---|---|
| Alice | 101 |
| Alice | 102 |
| Bob | NULL |
| Carol | NULL |
WHERE right_table.id IS NULL. This is faster than a NOT IN subquery on large tables.SELECT e.name, d.name, l.city FROM employees e JOIN departments d ON e.dept_id = d.id JOIN locations l ON d.loc_id = l.id WHERE l.city = 'Mumbai';
-- Find each employee & their manager SELECT e.name AS employee, m.name AS manager FROM employees e LEFT JOIN employees m ON e.manager_id = m.id;
SELECT COUNT(*), -- every row COUNT(salary), -- non-NULL only SUM(salary), AVG(salary), MIN(salary), MAX(salary) FROM employees;
| name | salary |
|---|---|
| Alice | 90,000 |
| Bob | 70,000 |
| Carol | 80,000 |
| Dave | NULL |
| function | result |
|---|---|
| COUNT(*) | 4 |
| COUNT(salary) | 3 (NULL skipped) |
| SUM(salary) | 240,000 |
| AVG(salary) | 80,000 |
| MIN / MAX | 70,000 / 90,000 |
COUNT(*) counts all rows including NULLs. COUNT(salary) skips rows where salary is NULL. A very common interview trick question.SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_sal FROM employees GROUP BY department ORDER BY avg_sal DESC;
SELECT dept, COUNT(*) AS cnt FROM employees GROUP BY dept HAVING COUNT(*) > 5; -- HAVING filters after grouping -- WHERE filters before grouping
-- Subtotals + grand total SELECT dept, job, SUM(salary) FROM employees GROUP BY ROLLUP(dept, job); -- All combinations GROUP BY CUBE(dept, job);
UPPER(name) -- ALICE LOWER(name) -- alice LENGTH(name) -- 5 TRIM(name) -- strip spaces SUBSTRING(name,1,3) -- Ali CONCAT(first,' ',last) REPLACE(str,'a','@')
NOW() -- current datetime CURDATE() -- current date YEAR(date_col) MONTH(date_col) DAY(date_col) DATEDIFF(d1, d2) -- days between DATE_ADD(d, INTERVAL 7 DAY)
SELECT name, CASE WHEN salary >= 100000 THEN 'Senior' WHEN salary >= 60000 THEN 'Mid' ELSE 'Junior' END AS level FROM employees;
SELECT CAST('2024-01-15' AS DATE); SELECT CAST(42.7 AS INT); -- 42 SELECT CAST(price AS CHAR); -- MySQL: CONVERT(val, type) SELECT CONVERT(salary, CHAR);
-- Employees earning above avg SELECT name, salary FROM employees WHERE salary > ( SELECT AVG(salary) FROM employees );
SELECT * FROM customers WHERE id IN ( SELECT customer_id FROM orders ); WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.cust_id = c.id );
WITH dept_avg AS ( SELECT dept, AVG(salary) AS avg_sal FROM employees GROUP BY dept ) SELECT e.name, d.avg_sal FROM employees e JOIN dept_avg d ON e.dept = d.dept;
-- Top earner per dept SELECT name, dept, salary FROM employees e1 WHERE salary = ( SELECT MAX(salary) FROM employees e2 WHERE e2.dept = e1.dept );
SELECT name, dept, salary, ROW_NUMBER() OVER ( PARTITION BY dept ORDER BY salary DESC ) AS rn, RANK() OVER ( ORDER BY salary DESC ) AS rnk FROM employees; -- OVER() = window definition -- PARTITION BY = reset per group -- ORDER BY = ranking direction
| name | dept | salary |
|---|---|---|
| Alice | Eng | 90k |
| Bob | Eng | 80k |
| Carol | HR | 70k |
| Dave | HR | 70k |
| name | dept | rn | rnk |
|---|---|---|---|
| Alice | Eng | 1 | 1 |
| Bob | Eng | 2 | 2 |
| Carol | HR | 1 | 3 |
| Dave | HR | 2 | 3 |
WHERE rn = 1 to get the top earner per department.-- Compare row with prev/next SELECT month, revenue, LAG(revenue) OVER ( ORDER BY month ) AS prev_month, revenue - LAG(revenue) OVER ( ORDER BY month ) AS growth FROM sales;
SELECT order_date, amount, SUM(amount) OVER ( ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total FROM orders;
-- Top 3 per department WITH ranked AS ( SELECT *, DENSE_RANK() OVER ( PARTITION BY dept ORDER BY salary DESC ) AS dr FROM employees ) SELECT * FROM ranked WHERE dr <= 3;
Top questions asked in Data Analyst, Data Engineer, and BI Developer interviews. Click any question to reveal the answer.
COUNT(), SUM(), or any aggregate in your filter — use HAVING. Otherwise use WHERE.WHERE salary > 50000 ✅ | HAVING COUNT(*) > 3 ✅ | WHERE COUNT(*) > 3 ❌ (error)NULL = NULL → returns NULL (not TRUE). This is a very common interview trap.IS NULL or IS NOT NULL.NULL + 5 = NULL. Any arithmetic with NULL gives NULL. Use COALESCE(col, 0) to substitute a default.
ROW_NUMBER: 1, 2, 3, 4 (always unique, no gaps)RANK: 1, 1, 3, 4 (ties get same rank, but skips next number)DENSE_RANK: 1, 1, 2, 3 (ties get same rank, no gaps)DENSE_RANK for "top N" problems to avoid missing ranks.
orders.customer_id is a Foreign Key referencing customers.id (Primary Key).
WITH clause) are often cleaner than nested subqueries for readability.
UNION ALL by default (faster). Use UNION only when you specifically need to remove duplicates.function() OVER (PARTITION BY col ORDER BY col)RANK())SUM() OVER)LAG())DENSE_RANK + WHERE dr <= N)
| Feature | WHERE | HAVING |
|---|---|---|
| Filters | Individual rows | Groups (after GROUP BY) |
| When it runs | Before GROUP BY (step ③) | After GROUP BY (step ⑤) |
| Can use aggregate functions? | ❌ No | ✅ Yes |
| Example | WHERE salary > 50000 | HAVING COUNT(*) > 5 |
| Works without GROUP BY? | ✅ Yes | Rarely useful without it |
| Feature | DELETE | TRUNCATE | DROP |
|---|---|---|---|
| Removes | Specific rows | All rows | Entire table |
| WHERE clause | ✅ Yes | ❌ No | ❌ No |
| Can rollback? | ✅ Yes | Depends on DB | ❌ No |
| Resets auto-increment? | ❌ No | ✅ Yes | Table is gone |
| Speed | Slower | Fast | Fast |
| Triggers fire? | ✅ Yes | ❌ No | ❌ No |
| Score | ROW_NUMBER() | RANK() | DENSE_RANK() |
|---|---|---|---|
| 90 | 1 | 1 | 1 |
| 90 | 2 | 1 | 1 |
| 85 | 3 | 3 (gap!) | 2 (no gap) |
| Ties get same rank? | ❌ Always unique | ✅ Yes | ✅ Yes |
| Skips numbers after tie? | — | ✅ Yes | ❌ Never |
| Type | Returns | NULLs in result? | Use when... |
|---|---|---|---|
| INNER JOIN | Matching rows only | ❌ No | You only want records that exist in both tables |
| LEFT JOIN | All left + matched right | Right side may be NULL | You want all customers even if they have no orders |
| RIGHT JOIN | Matched left + all right | Left side may be NULL | Rarely used (just swap tables and use LEFT JOIN) |
| FULL OUTER | All rows from both | Both sides may be NULL | Find mismatches — rows missing in either table |
| Feature | UNION | UNION ALL |
|---|---|---|
| Removes duplicates? | ✅ Yes | ❌ No, keeps all rows |
| Performance | Slower (sorts to find dupes) | Faster |
| Use when | You need unique combined results | You know there are no dupes OR you want all rows |
| Default | This is what "UNION" means alone | Must specify ALL explicitly |
Common Table Expressions · Modern SQL · From Basics to Advanced
-- Don't nest. Define first. WITH HighSales AS ( SELECT * FROM orders WHERE amount > 1000 ) SELECT * FROM HighSales;
WITH dept_avg AS ( SELECT dept, AVG(salary) AS avg_sal FROM employees GROUP BY dept ), above_avg AS ( SELECT e.name, e.dept, e.salary FROM employees e JOIN dept_avg d ON e.dept = d.dept WHERE e.salary > d.avg_sal ) SELECT * FROM above_avg;
WITH RECURSIVE nums AS ( SELECT 1 AS n -- anchor: start UNION ALL SELECT n + 1 -- recursive: add 1 FROM nums WHERE n < 10 -- stop condition ) SELECT n FROM nums;
WITH RECURSIVE org AS ( SELECT id, name, manager_id, 1 AS lvl FROM employees WHERE manager_id IS NULL -- anchor: CEO UNION ALL SELECT e.id, e.name, e.manager_id, o.lvl + 1 FROM employees e JOIN org o ON e.manager_id = o.id ) SELECT lvl, name FROM org ORDER BY lvl, name;
path column to build breadcrumb strings.WITH ranked AS ( SELECT salary, DENSE_RANK() OVER ( ORDER BY salary DESC ) AS rnk FROM employees ) SELECT salary FROM ranked WHERE rnk = 2; -- change N here
rnk = N for any Nth value.WITH dupes AS ( SELECT id, ROW_NUMBER() OVER ( PARTITION BY email ORDER BY created_at DESC ) AS rn FROM users ) DELETE FROM users WHERE id IN ( SELECT id FROM dupes WHERE rn > 1 );
WITH daily AS ( SELECT order_date, SUM(amount) AS rev FROM orders GROUP BY order_date ) SELECT order_date, rev, SUM(rev) OVER ( ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total FROM daily;
WITH daily AS ( SELECT sale_date, SUM(revenue) AS rev FROM sales GROUP BY sale_date ) SELECT sale_date, rev, ROUND(AVG(rev) OVER ( ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ), 2) AS avg_7d FROM daily;
WITH numbered AS ( SELECT user_id, login_date, ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY login_date ) AS rn FROM logins ), islands AS ( SELECT user_id, DATE_SUB(login_date, INTERVAL rn DAY) AS grp FROM numbered ) SELECT user_id, MIN(login_date) AS start, MAX(login_date) AS end, COUNT(*) AS days FROM islands GROUP BY user_id, grp;
WITH RECURSIVE dates AS ( SELECT '2024-01-01' AS d UNION ALL SELECT DATE_ADD(d, INTERVAL 1 DAY) FROM dates WHERE d < '2024-01-31' ) SELECT d FROM dates;
WITH yearly AS ( SELECT YEAR(sale_date) AS yr, SUM(revenue) AS rev FROM sales GROUP BY yr ) SELECT yr, rev, LAG(rev) OVER (ORDER BY yr) AS prev_rev, ROUND( 100.0 * (rev - LAG(rev) OVER (ORDER BY yr)) / NULLIF(LAG(rev) OVER (ORDER BY yr), 0), 2) AS yoy_pct FROM yearly;
WITH monthly AS ( SELECT DATE_FORMAT(event_date, '%Y-%m') AS month, user_id FROM events ) SELECT month, COUNT(DISTINCT user_id) AS mau FROM monthly GROUP BY month ORDER BY month;
WITH RECURSIVE seq AS ( SELECT MIN(id) AS n, MAX(id) AS mx FROM orders UNION ALL SELECT n + 1, mx FROM seq WHERE n < mx ) SELECT n AS missing_id FROM seq WHERE n NOT IN ( SELECT id FROM orders );
WITH RECURSIVE split AS ( SELECT id, TRIM(SUBSTRING_INDEX(tags, ',', 1)) AS tag, IF(LOCATE(',', tags) > 0, SUBSTRING(tags, LOCATE(',', tags) + 1), NULL) AS rest FROM products UNION ALL SELECT id, TRIM(SUBSTRING_INDEX(rest, ',', 1)), IF(LOCATE(',', rest) > 0, SUBSTRING(rest, LOCATE(',', rest) + 1), NULL) FROM split WHERE rest IS NOT NULL ) SELECT id, tag FROM split ORDER BY id;
STRING_TO_TABLE() instead.WITH baskets AS ( SELECT a.product_id AS p1, b.product_id AS p2, COUNT(*) AS freq FROM order_items a JOIN order_items b ON a.order_id = b.order_id AND a.product_id < b.product_id GROUP BY a.product_id, b.product_id ) SELECT p1, p2, freq FROM baskets ORDER BY freq DESC LIMIT 10;
p1 < p2 prevents counting (A,B) and (B,A) separately.WITH raw AS ( SELECT product, region, revenue FROM sales ) SELECT product, SUM(CASE WHEN region='North' THEN revenue END) AS north, SUM(CASE WHEN region='South' THEN revenue END) AS south, SUM(CASE WHEN region='East' THEN revenue END) AS east, SUM(CASE WHEN region='West' THEN revenue END) AS west FROM raw GROUP BY product;
CROSSTAB() function.WITH base AS ( SELECT match_date, result, ROW_NUMBER() OVER (ORDER BY match_date) AS rn FROM matches ), streaks AS ( SELECT match_date, result, rn - ROW_NUMBER() OVER ( PARTITION BY result ORDER BY match_date ) AS grp FROM base ) SELECT MIN(match_date) AS start, MAX(match_date) AS end, COUNT(*) AS streak_len FROM streaks WHERE result='Win' GROUP BY grp HAVING COUNT(*) >= 3;
WITH top_performers AS ( SELECT id FROM employees WHERE dept = 'Engineering' AND performance >= 90 ) UPDATE employees SET salary = salary * 1.10 WHERE id IN ( SELECT id FROM top_performers );
WITH RECURSIVE paths AS ( SELECT id, name, CAST(name AS CHAR(1000)) AS path FROM categories WHERE parent_id IS NULL UNION ALL SELECT c.id, c.name, CONCAT(p.path, ' > ', c.name) FROM categories c JOIN paths p ON c.parent_id = p.id ) SELECT id, name, path FROM paths ORDER BY path;
WITH vip_candidates AS ( SELECT customer_id, SUM(amount) AS total_spent FROM orders WHERE order_date >= '2024-01-01' GROUP BY customer_id HAVING total_spent > 5000 ) INSERT INTO vip_customers (customer_id, total_spent, tagged_at) SELECT customer_id, total_spent, NOW() FROM vip_candidates;
Visual comparisons · Annotated patterns · Production-ready snippets
SUM()vsSUM OVER()
SELECT job, SUM(sales) FROM sales GROUP BY job;
SELECT job, sales, SUM(sales) OVER ( PARTITION BY job ) AS total_per_job FROM sales;
| job | SUM(sales) |
|---|---|
| Engineer | 50 |
| Sales | 70 |
| Manager | 90 |
| job | sales | total_per_job |
|---|---|---|
| Engineer | 20 | 50 |
| Engineer | 30 | 50 |
| Sales | 30 | 70 |
| Sales | 40 | 70 |
| Manager | 90 | 90 |
ROW_NUMBER()vsRANK()vsDENSE_RANK()
ROW_NUMBER() OVER ( ORDER BY score DESC )
RANK() OVER ( ORDER BY score DESC )
DENSE_RANK() OVER ( ORDER BY score DESC )
| score | ROW_NUMBER() | RANK() | DENSE_RANK() |
|---|---|---|---|
| 90 | 1 | 1 | 1 |
| 90 | 2 | 1 | 1 |
| 85 | 3 | 3 ← gap! | 2 |
| 80 | 4 | 4 | 3 |
Production-ready SQL patterns — copy, adapt, and ship
WITH ranked AS ( SELECT *, DENSE_RANK() OVER ( PARTITION BY dept ORDER BY salary DESC ) AS rnk FROM employees ) SELECT * FROM ranked WHERE rnk <= 3; -- top 3 per dept
WITH monthly AS ( SELECT DATE_FORMAT(order_date, '%Y-%m') AS month, SUM(revenue) AS rev FROM orders GROUP BY month ) SELECT month, rev, LAG(rev) OVER (ORDER BY month) AS prev_rev, ROUND(100.0 * (rev - LAG(rev) OVER (ORDER BY month)) / NULLIF(LAG(rev) OVER (ORDER BY month),0), 1) AS mom_pct FROM monthly;
-- Most recent order per customer WITH latest AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date DESC ) AS rn FROM orders ) SELECT * FROM latest WHERE rn = 1;
SELECT txn_date, description, amount, SUM( CASE WHEN type = 'credit' THEN amount WHEN type = 'debit' THEN -amount END ) OVER ( ORDER BY txn_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS balance FROM transactions;
SELECT dept, name, salary, ROUND( 100.0 * salary / SUM(salary) OVER (PARTITION BY dept), 1) AS pct_of_dept, ROUND( 100.0 * salary / SUM(salary) OVER (), 1) AS pct_of_total FROM employees;
-- Gap > 30 min = new session WITH gaps AS ( SELECT user_id, event_time, CASE WHEN TIMESTAMPDIFF(MINUTE, LAG(event_time) OVER ( PARTITION BY user_id ORDER BY event_time), event_time) > 30 THEN 1 ELSE 0 END AS new_sess FROM events ) SELECT user_id, event_time, SUM(new_sess) OVER ( PARTITION BY user_id ORDER BY event_time ) + 1 AS session_id FROM gaps;
Read the table + SQL query, then pick the correct result. Score points and earn XP for each correct answer.
Choose a difficulty, then pick the correct SQL output before the timer runs out. Score points and earn XP for each correct answer.
60 seconds. As many correct answers as possible. Build streaks for bonus points.
Answer as many SQL questions as you can in 60 seconds. Build streaks for combo multipliers.
From SQL basics to advanced analytics — live online sessions every Sat & Sun, real-world capstone project, and interview preparation built in.
Four resources, one payment — delivered on WhatsApp in minutes. No app, no login, no subscription ever.
One-time payment · No subscription · Delivered on WhatsApp in minutes
Get the SQL Pack on WhatsApp