SQL for Data Analysis
📌 Introduction
SQL (Structured Query Language) is one of the most powerful tools for Data Analysis. It enables Business Analysts (BA) to query large datasets directly from relational databases to uncover insights, trends, and support decision-making.
🚀 Key SQL Concepts for Data Analysis
SQL Concept
Purpose in Data Analysis
SELECT & Filtering (WHERE)
Fetch specific data and apply filters.
Aggregate Functions (SUM, AVG, COUNT)
Summarize data (totals, averages, counts).
GROUP BY & HAVING
Group and filter data at an aggregate level.
JOINs (INNER, LEFT, RIGHT, FULL)
Combine data from multiple tables.
Subqueries & CTEs
Break down complex queries for better understanding.
Window Functions
Perform row-level analytics like ranking, running totals.
CASE Statements
Apply conditional logic to data.
ORDER BY & LIMIT
Sort and limit data for better visualization.
🔑 1. Selecting and Filtering Data
Basic Query:
sqlCopyEditSELECT name, salary, department_id
FROM employees
WHERE salary > 50000;
Applying Multiple Conditions:
sqlCopyEditSELECT name, salary
FROM employees
WHERE salary > 50000 AND department_id = 2;
🔑 2. Aggregating Data (SUM, AVG, COUNT, MIN, MAX)
sqlCopyEditSELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;
Total salary per department:
sqlCopyEditSELECT department_id, SUM(salary) AS total_salary
FROM employees
GROUP BY department_id;
🔑 3. Filtering Aggregated Data (HAVING)
sqlCopyEditSELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 5;
🔑 4. Joining Tables
sqlCopyEditSELECT e.name, e.salary, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.id;
LEFT JOIN Example:
sqlCopyEditSELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;
🔑 5. Subqueries for Dynamic Filtering
sqlCopyEditSELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
🔑 6. Common Table Expressions (CTEs) for Better Query Structure
sqlCopyEditWITH HighEarners AS (
SELECT name, salary FROM employees WHERE salary > 60000
)
SELECT * FROM HighEarners;
🔑 7. Window Functions for Advanced Analysis
Ranking Employees within Departments:
sqlCopyEditSELECT name, department_id,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS SalaryRank
FROM employees;
Running Total of Salaries:
sqlCopyEditSELECT name, salary,
SUM(salary) OVER (ORDER BY salary) AS RunningTotal
FROM employees;
🔑 8. Conditional Logic (CASE Statements)
sqlCopyEditSELECT name, salary,
CASE
WHEN salary > 70000 THEN 'High'
WHEN salary BETWEEN 50000 AND 70000 THEN 'Medium'
ELSE 'Low'
END AS SalaryCategory
FROM employees;
🔑 9. Sorting and Limiting Results
Sorting:
sqlCopyEditSELECT name, salary
FROM employees
ORDER BY salary DESC;
Limiting Results (Top 5 Earners):
sqlCopyEditSELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;
✅ Sample Data Analysis Scenarios using SQL
Business Question
SQL Feature Used
What is the average salary in each department?
GROUP BY
+ AVG()
Who are the top 5 highest-paid employees?
ORDER BY DESC
+ LIMIT
How many employees are in each department?
GROUP BY
+ COUNT()
What departments do not have any employees?
LEFT JOIN
+ NULL
check
Which employees earn above the company average salary?
Subquery
with AVG()
Rank employees by salary within each department
RANK()
with OVER(PARTITION BY)
Find total salary cost and breakdown by departments
SUM()
+ GROUP BY
Segment employees into salary brackets
CASE
statements
🎯 Why SQL for Data Analysis is Crucial for Business Analysts
Benefit
Explanation
Direct Access to Data
Quickly pull raw data without relying on developers.
Data-Driven Decision Making
Support business decisions with accurate and timely data.
Identify Trends and Patterns
Analyze historical and current data to find business insights.
Build Reports and Dashboards
Prepare datasets for tools like Power BI, Tableau, Excel.
Validate Requirements and Results
Cross-check data during system analysis and UAT testing.
📊 Example: Analyzing Employee Dataset
sqlCopyEdit-- Find top 3 departments by total salary cost
SELECT d.department_name, SUM(e.salary) AS total_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.department_name
ORDER BY total_salary DESC
LIMIT 3;
💡 Pro Tips for Business Analysts
Start with Clear Questions: Know what you need to analyze before writing SQL.
Break Down Complex Queries: Use CTEs or subqueries for readability.
Document Queries: Comment and explain your queries for future reference.
Optimize for Performance: Avoid unnecessary columns, use indexes if needed.
Collaborate with DBAs and Developers: For complex joins or optimizations.
✅ Conclusion
SQL for Data Analysis is a must-have skill for any Business Analyst working with data-centric environments. By leveraging advanced SQL techniques, you can transform raw data into actionable insights, directly supporting business goals.
Last updated