Query Any CSV File with SQL in Your Browser
Turn spreadsheets into queryable databases instantly with browser-based SQLite, covering joins, aggregations, window functions, and real-world data analysis workflows
There is a version of data analysis that most people know: open a spreadsheet, use VLOOKUP, build a pivot table, write increasingly complex nested IF formulas, scroll horizontally through columns that should probably be rows, and eventually produce something that answers the question. It works. It also scales poorly, breaks when the data changes shape, and requires understanding a different set of syntax rules for every operation.
There is a better version. SQL is the language that underlies every relational database in the world. It is how analysts at every scale, from a solo freelancer to an enterprise data team, write queries that filter, join, aggregate, rank, and transform data. The syntax is declarative: you describe what you want, and the database figures out how to produce it. One language handles filtering, grouping, joining, sorting, ranking, and computing running totals. The same query that worked on a hundred rows works on a hundred million.
The barrier to SQL has traditionally been: you need a database. Setting up a PostgreSQL or MySQL database requires administration, server configuration, and data loading steps that are real overhead for anyone just trying to answer a question about a CSV file.
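SQLite is the exception: it is an embedded engine with no server to configure. As a rough illustration of how little setup that means (shown with Python's built-in sqlite3 module purely for demonstration; the browser tool itself requires nothing), a working database is a single call:

```python
import sqlite3

# No server process, no configuration: an in-memory SQLite database in one call.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 1200.0), ("South", 800.0), ("North", 450.0)],
)
total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'North'"
).fetchone()[0]
print(total)  # 1650.0
```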
ReportMedic’s Query CSV with SQL tool removes that barrier entirely. Load a CSV file. Write SQL. See results. The tool runs SQLite in the browser, treating your CSV as a database table. Your data never leaves your device, no server processes your queries, and the full power of SQL, including joins, window functions, common table expressions, and aggregations, is available immediately.
This guide covers everything: the SQL fundamentals you need to query CSV files effectively, the specific syntax and features of SQLite, detailed walkthroughs of the ReportMedic tool, and real-world query recipes across sales, HR, marketing, finance, e-commerce, and education data. Whether you are completely new to SQL or experienced with it and looking for a fast browser-based environment, this guide provides the material to use SQL confidently on any CSV file.
Why SQL Beats Spreadsheet Formulas for Data Analysis
Before diving into SQL syntax, understanding the specific problems it solves helps calibrate when to reach for SQL versus a spreadsheet.
The VLOOKUP Problem
VLOOKUP and its successors (INDEX/MATCH, XLOOKUP) are lookup functions: given a value, find the corresponding row in another table and return a value from a specified column. They work for simple single-key, first-match lookups and fail or become extremely complex for:
Multiple lookup conditions (match on two or more columns simultaneously)
Many-to-many relationships (every row in table A joined to every matching row in table B)
Aggregations after joining (sum of all matching values, count of matches)
Chains of lookups (join A to B, then join that result to C)
SQL handles all of these with the JOIN syntax and WHERE conditions, expressed clearly and concisely.
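As a sketch of the first two failure cases, here is a join on two key columns at once with an aggregation over the matches, run against tiny invented tables through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders  (region TEXT, product TEXT, amount REAL);
CREATE TABLE targets (region TEXT, product TEXT, target REAL);
INSERT INTO orders  VALUES ('North','Widget',100),('North','Widget',250),('South','Widget',80);
INSERT INTO targets VALUES ('North','Widget',300),('South','Widget',100);
""")

# Match on TWO key columns simultaneously, then aggregate the matching rows --
# one clause in SQL, a helper-column exercise with VLOOKUP.
rows = conn.execute("""
    SELECT t.region, t.product, t.target, SUM(o.amount) AS actual
    FROM targets t
    JOIN orders o ON o.region = t.region AND o.product = t.product
    GROUP BY t.region, t.product
    ORDER BY t.region
""").fetchall()
print(rows)  # [('North', 'Widget', 300.0, 350.0), ('South', 'Widget', 100.0, 80.0)]
```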
The Pivot Table Scaling Problem
Pivot tables are powerful interactive tools for exploring aggregated data. They also have limitations:
They are bound to the spreadsheet file, not portable as reusable logic
Complex pivot configurations become confusing to maintain
Adding a new dimension requires rebuilding the pivot setup
A SQL GROUP BY query that produces the same aggregation is a text string that can be saved, shared, versioned, and reused. It runs against any dataset with matching columns.
The Formula Complexity Ceiling
As analytical questions become more complex, spreadsheet formulas become harder to write, read, debug, and maintain. A moderately complex SQL query is often more readable than the equivalent spreadsheet formula because SQL is designed to express data transformations declaratively.
Consider this question: “For each sales region, what is the month-over-month percentage change in revenue, and which months had a higher revenue than the region’s six-month average?”
In SQL, this is straightforward with a window function and a subquery. In a spreadsheet, this requires careful column management, OFFSET formulas, and several auxiliary columns.
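A sketch of that query, with invented table and column names and a three-month sample for brevity (AVG over the region's partition stands in for the six-month average when the table holds six months per region):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regional_revenue (region TEXT, month TEXT, revenue REAL);
INSERT INTO regional_revenue VALUES
  ('North','2024-01',100),('North','2024-02',120),('North','2024-03',90);
""")

# LAG fetches the prior month's revenue; AVG OVER the partition gives the
# region's average across the period covered by the table.
rows = conn.execute("""
WITH with_prev AS (
  SELECT region, month, revenue,
         LAG(revenue) OVER (PARTITION BY region ORDER BY month) AS prev_rev,
         AVG(revenue) OVER (PARTITION BY region) AS period_avg
  FROM regional_revenue
)
SELECT month,
       ROUND((revenue - prev_rev) * 100.0 / prev_rev, 1) AS mom_pct,
       revenue > period_avg AS above_avg
FROM with_prev
ORDER BY month
""").fetchall()
print(rows)  # [('2024-01', None, 0), ('2024-02', 20.0, 1), ('2024-03', -25.0, 0)]
```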
What SQL Enables That Spreadsheets Cannot
Multi-table joins: SQL joins multiple tables together based on matching key columns. Spreadsheets require intermediate steps with VLOOKUP chains to approximate this.
Window functions: Window functions compute values across a set of related rows (running totals, rankings, lag/lead comparisons) without collapsing the rows the way GROUP BY does. There is no spreadsheet equivalent that is as clean.
Common Table Expressions (CTEs): CTEs let you name intermediate query results and use them in subsequent steps, making complex multi-step analyses readable and maintainable.
Set operations: UNION, INTERSECT, and EXCEPT combine query results in ways that spreadsheet formulas cannot express directly.
Self-joins: Joining a table to itself to find related rows (employees and their managers in the same table, transactions within a time window of each other) is clean in SQL and painful in spreadsheets.
SQL Fundamentals for CSV Querying
The following covers the SQL constructs you need for effective CSV data analysis. Examples are written for SQLite syntax, which is what the browser-based tool uses.
SELECT and FROM: The Foundation
Every SQL query starts with SELECT and FROM:
SELECT column1, column2, column3
FROM table_name;
To select all columns:
SELECT *
FROM sales_data;
When you load a CSV file in the ReportMedic tool, the filename (without extension) becomes the table name. A file named sales_data.csv becomes the sales_data table. Column names come from the CSV header row.
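That loading step can be approximated outside the browser in a few lines; this is an illustrative sketch using Python's csv and sqlite3 modules, not the tool's actual implementation:

```python
import csv
import io
import sqlite3

# A stand-in for an uploaded sales_data.csv; the header row supplies column names.
csv_text = "order_id,region,amount\n1,North,120\n2,South,75\n"

conn = sqlite3.connect(":memory:")
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)

# Table named after the file (sales_data.csv -> sales_data), columns from the header.
conn.execute(f"CREATE TABLE sales_data ({', '.join(header)})")
conn.executemany(
    f"INSERT INTO sales_data VALUES ({', '.join('?' * len(header))})", reader
)

row_count = conn.execute("SELECT COUNT(*) FROM sales_data").fetchone()[0]
first_amount = conn.execute("SELECT amount FROM sales_data").fetchone()[0]
print(row_count)           # 2
print(type(first_amount))  # <class 'str'> -- loaded CSV values start out as text
```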
WHERE: Filtering Rows
WHERE filters rows based on conditions:
SELECT customer_name, amount, region
FROM sales
WHERE region = 'North'
AND amount > 1000;
Comparison operators: =, <> or != (not equal), <, >, <=, >=
Pattern matching with LIKE:
SELECT * FROM customers WHERE email LIKE '%@gmail.com';
SELECT * FROM products WHERE name LIKE 'Widget%';
Membership with IN:
SELECT * FROM orders WHERE status IN ('pending', 'processing');
SELECT * FROM employees WHERE department IN ('Engineering', 'Product', 'Design');
Range with BETWEEN:
SELECT * FROM transactions WHERE amount BETWEEN 100 AND 500;
SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';
Null handling:
SELECT * FROM customers WHERE phone IS NULL;
SELECT * FROM records WHERE notes IS NOT NULL;
ORDER BY: Sorting Results
SELECT name, salary
FROM employees
ORDER BY salary DESC; -- highest first
SELECT last_name, first_name, hire_date
FROM employees
ORDER BY last_name ASC, first_name ASC; -- alphabetical by name
-- Order by column position (2 = second column)
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department
ORDER BY 2 DESC;
LIMIT and OFFSET: Pagination
-- First 10 rows
SELECT * FROM large_table LIMIT 10;
-- Rows 11-20 (page 2)
SELECT * FROM large_table LIMIT 10 OFFSET 10;
-- Top 5 highest-value customers
SELECT customer_name, SUM(amount) as total
FROM orders
GROUP BY customer_name
ORDER BY total DESC
LIMIT 5;
GROUP BY and Aggregate Functions
GROUP BY collapses rows into groups and applies aggregate functions to each group:
SELECT
department,
COUNT(*) as headcount,
AVG(salary) as avg_salary,
MIN(salary) as min_salary,
MAX(salary) as max_salary,
SUM(salary) as total_payroll
FROM employees
GROUP BY department;
Common aggregate functions:
COUNT(*) - count all rows in the group
COUNT(column) - count non-null values in column
COUNT(DISTINCT column) - count unique non-null values
SUM(column) - sum of a numeric column
AVG(column) - average of a numeric column
MIN(column) / MAX(column) - minimum / maximum value
GROUP_CONCAT(column) - concatenate values into a string (SQLite-specific)
HAVING: Filtering Groups
HAVING filters groups after aggregation (WHERE filters rows before aggregation):
-- Departments with more than 5 employees
SELECT department, COUNT(*) as headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
-- Customers who have placed more than 3 orders and spent over $500 total
SELECT customer_id, COUNT(*) as order_count, SUM(amount) as total_spent
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 3 AND SUM(amount) > 500;
JOIN: Combining Multiple Tables
When you load two CSV files into the SQL tool, you can join them together using shared key columns.
INNER JOIN - returns rows that match in both tables:
SELECT
orders.order_id,
customers.customer_name,
customers.email,
orders.amount,
orders.status
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id;
LEFT JOIN - returns all rows from the left table, with matched rows from the right (NULL where no match):
-- All customers, even those with no orders
SELECT
customers.customer_name,
COUNT(orders.order_id) as order_count,
COALESCE(SUM(orders.amount), 0) as total_spent
FROM customers
LEFT JOIN orders ON customers.id = orders.customer_id
GROUP BY customers.id, customers.customer_name;
Self-join - joining a table to itself:
-- Find employees and their managers (both in the same employees table)
SELECT
e.name as employee_name,
m.name as manager_name
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id
ORDER BY m.name, e.name;
Subqueries
Subqueries nest one query inside another:
-- Customers who have placed above-average orders
SELECT customer_name, amount
FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders);
-- Products that have never been ordered
SELECT product_name
FROM products
WHERE id NOT IN (SELECT DISTINCT product_id FROM order_items);
-- Department with the highest average salary
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC
LIMIT 1;
CASE WHEN: Conditional Logic
CASE WHEN applies conditional logic within a query:
SELECT
employee_name,
salary,
CASE
WHEN salary >= 100000 THEN 'Senior'
WHEN salary >= 70000 THEN 'Mid-level'
WHEN salary >= 50000 THEN 'Junior'
ELSE 'Entry level'
END as salary_band
FROM employees;
-- Categorize orders by size
SELECT
order_id,
amount,
CASE
WHEN amount >= 1000 THEN 'Large'
WHEN amount >= 500 THEN 'Medium'
WHEN amount >= 100 THEN 'Small'
ELSE 'Micro'
END as order_size
FROM orders;
Common Table Expressions (CTEs)
CTEs give names to intermediate query results, making complex queries readable:
-- Without CTE (harder to read)
SELECT department, avg_salary
FROM (
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
) dept_averages
WHERE avg_salary > 80000;
-- With CTE (much cleaner)
WITH dept_averages AS (
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
)
SELECT department, avg_salary
FROM dept_averages
WHERE avg_salary > 80000;
Multiple CTEs can chain together:
WITH
monthly_revenue AS (
SELECT
strftime('%Y-%m', order_date) as month,
SUM(amount) as revenue
FROM orders
GROUP BY strftime('%Y-%m', order_date)
),
ranked_months AS (
SELECT
month,
revenue,
RANK() OVER (ORDER BY revenue DESC) as revenue_rank
FROM monthly_revenue
)
SELECT month, revenue, revenue_rank
FROM ranked_months
WHERE revenue_rank <= 3;
Window Functions: SQL’s Most Powerful Feature
Window functions compute values across a set of rows related to the current row, without collapsing those rows. They are the most analytically powerful feature in SQL and have no clean spreadsheet equivalent.
The Syntax Structure
function_name(column) OVER (
[PARTITION BY partition_columns]
[ORDER BY sort_columns]
[ROWS/RANGE frame_specification]
)
ROW_NUMBER, RANK, and DENSE_RANK
-- Rank employees by salary within each department
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rank
FROM employees;
The difference between RANK and DENSE_RANK: if two rows tie for rank 2, RANK gives the next row rank 4, while DENSE_RANK gives it rank 3.
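The tie behavior is easy to verify on a four-row table (illustrative data, run through Python's sqlite3 module):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (name TEXT, score INTEGER);
INSERT INTO scores VALUES ('a',90),('b',80),('c',80),('d',70);
""")

# b and c tie at 80: RANK skips to 4 for the next row, DENSE_RANK continues at 3.
rows = conn.execute("""
  SELECT name,
         RANK()       OVER (ORDER BY score DESC) AS rnk,
         DENSE_RANK() OVER (ORDER BY score DESC) AS drnk
  FROM scores
  ORDER BY score DESC, name
""").fetchall()
print(rows)  # [('a', 1, 1), ('b', 2, 2), ('c', 2, 2), ('d', 4, 3)]
```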
Practical application: Finding the top N in each category:
-- Top 3 products by revenue in each category
WITH ranked_products AS (
SELECT
category,
product_name,
SUM(revenue) as total_revenue,
RANK() OVER (PARTITION BY category ORDER BY SUM(revenue) DESC) as revenue_rank
FROM product_sales
GROUP BY category, product_name
)
SELECT category, product_name, total_revenue
FROM ranked_products
WHERE revenue_rank <= 3;
LAG and LEAD: Comparing Adjacent Rows
-- Month-over-month revenue change
WITH monthly AS (
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) as prev_month_revenue
FROM monthly_revenue
)
SELECT
month,
revenue,
prev_month_revenue,
revenue - prev_month_revenue as change,
ROUND((revenue - prev_month_revenue) * 100.0 / prev_month_revenue, 1) as pct_change
FROM monthly
WHERE prev_month_revenue IS NOT NULL;
Running Totals and Moving Averages
-- Running total of revenue over time
SELECT
date,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY date) as running_total,
AVG(daily_revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as seven_day_avg
FROM daily_sales;
NTILE: Percentile Groupings
-- Divide customers into quartiles by total spending
SELECT
customer_name,
total_spent,
NTILE(4) OVER (ORDER BY total_spent) as spending_quartile
FROM customer_totals;
SQLite-Specific Features
SQLite is the database engine running in the browser. Understanding SQLite-specific syntax helps you write effective queries.
String Functions
-- Case conversion
SELECT UPPER(name), LOWER(email) FROM contacts;
-- Trimming whitespace
SELECT TRIM(name), LTRIM(text), RTRIM(text) FROM data;
-- Substring
SELECT SUBSTR(product_code, 1, 3) as category_code FROM products;
-- Length
SELECT name, LENGTH(name) as name_length FROM products;
-- Replacing text
SELECT REPLACE(phone, '-', '') as cleaned_phone FROM contacts;
-- Finding position
SELECT INSTR(email, '@') as at_position FROM contacts;
-- Concatenation
SELECT first_name || ' ' || last_name as full_name FROM employees;
Date and Time Functions
SQLite stores dates as text (ISO format: ‘YYYY-MM-DD’) or as Unix timestamps. Date functions work on ISO format strings:
-- Current date
SELECT date('now');
-- Extract year, month, day
SELECT
strftime('%Y', order_date) as year,
strftime('%m', order_date) as month,
strftime('%d', order_date) as day
FROM orders;
-- Date arithmetic: orders in the last 30 days
SELECT * FROM orders
WHERE order_date >= date('now', '-30 days');
-- Days between two dates
SELECT
order_id,
order_date,
ship_date,
julianday(ship_date) - julianday(order_date) as days_to_ship
FROM orders
WHERE ship_date IS NOT NULL;
-- Group by month
SELECT
strftime('%Y-%m', order_date) as month,
COUNT(*) as order_count,
SUM(amount) as monthly_revenue
FROM orders
GROUP BY strftime('%Y-%m', order_date)
ORDER BY month;
Type Conversion
SQLite is dynamically typed. When CSV data is loaded, all values start as text. Explicit conversion is sometimes needed:
-- Cast text to numeric for arithmetic
SELECT
product_name,
CAST(price as REAL) * CAST(quantity as INTEGER) as line_total
FROM order_items;
-- ROUND for decimal precision
SELECT ROUND(AVG(CAST(salary as REAL)), 2) as avg_salary FROM employees;
-- Numeric comparison requires CAST for text-stored numbers
SELECT * FROM products
WHERE CAST(price as REAL) > 50.00;
NULL Handling
-- COALESCE: return first non-null value
SELECT COALESCE(phone, mobile, 'No contact') as contact FROM customers;
-- IFNULL: return second value if first is null
SELECT IFNULL(middle_name, '') as middle_name FROM employees;
-- NULLIF: return null if two values are equal
SELECT NULLIF(status, 'unknown') as clean_status FROM orders;
-- Returns NULL for 'unknown', the original value otherwise
-- Count non-null values
SELECT
COUNT(*) as total_rows,
COUNT(phone) as rows_with_phone,
COUNT(*) - COUNT(phone) as rows_without_phone
FROM customers;
Using the ReportMedic SQL Query Tool
Navigate to reportmedic.org/tools/query-csv-with-sql-online.html. The tool loads a SQLite database environment in the browser.
Loading CSV Files
Click to upload or drag and drop a CSV file. The tool reads the file, creates a SQLite table from the data, and makes it available for querying. The table name is derived from the CSV filename.
After loading, the tool shows the table’s columns and their detected data types. CSV data is text-based, so all columns initially have text affinity in SQLite. The tool attempts to detect numeric columns and displays suggested column types.
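The practical consequence of text affinity is worth seeing once: text-stored numbers compare character by character until you CAST them. A small demonstration using Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p (price TEXT)")  # CSV values arrive as text
conn.executemany("INSERT INTO p VALUES (?)", [("9",), ("10",)])

# As text, '9' sorts AFTER '10' ('1' < '9'); CAST restores numeric ordering.
as_text = [r[0] for r in conn.execute(
    "SELECT price FROM p ORDER BY price").fetchall()]
as_numbers = [r[0] for r in conn.execute(
    "SELECT price FROM p ORDER BY CAST(price AS REAL)").fetchall()]
print(as_text)     # ['10', '9']
print(as_numbers)  # ['9', '10']
```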
You can load multiple CSV files simultaneously. Each file becomes a separate table, enabling JOIN queries across multiple CSV sources.
The Query Editor
The query editor provides a text area for writing SQL. Features typically include:
SQL keyword suggestions
Previous query history
Error highlighting when syntax problems are detected
Write your query, click run, and results appear in the results pane below.
Reading the Results
Query results display as a table with sortable columns, row counts, and pagination for large result sets. The column headers match the output column names from your SELECT clause (or AS aliases where specified).
For queries that return many rows, scroll the results table or add a LIMIT clause to your query to view a manageable subset.
Exporting Results
Query results can be exported as CSV for use in spreadsheet applications, further analysis, or loading into other tools. The exported CSV contains exactly the rows and columns in your query result.
Real-World Query Recipes
Sales Data Analysis
-- Sales CSV with columns: order_id, customer_id, customer_name, product,
-- amount, region, order_date, status
-- 1. Revenue by region
SELECT
region,
COUNT(*) as order_count,
SUM(CAST(amount as REAL)) as total_revenue,
ROUND(AVG(CAST(amount as REAL)), 2) as avg_order_value
FROM sales
GROUP BY region
ORDER BY total_revenue DESC;
-- 2. Top 10 customers by revenue
SELECT
customer_name,
COUNT(*) as orders,
SUM(CAST(amount as REAL)) as total_revenue,
MAX(CAST(amount as REAL)) as largest_order
FROM sales
WHERE status = 'completed'
GROUP BY customer_id, customer_name
ORDER BY total_revenue DESC
LIMIT 10;
-- 3. Monthly revenue trend
SELECT
strftime('%Y-%m', order_date) as month,
COUNT(*) as orders,
SUM(CAST(amount as REAL)) as revenue
FROM sales
GROUP BY strftime('%Y-%m', order_date)
ORDER BY month;
-- 4. Revenue by product with running total
WITH product_revenue AS (
SELECT
product,
SUM(CAST(amount as REAL)) as revenue
FROM sales
WHERE status = 'completed'
GROUP BY product
)
SELECT
product,
revenue,
ROUND(revenue * 100.0 / SUM(revenue) OVER(), 1) as pct_of_total,
SUM(revenue) OVER (ORDER BY revenue DESC) as running_total
FROM product_revenue
ORDER BY revenue DESC;
-- 5. Month-over-month growth
WITH monthly AS (
SELECT
strftime('%Y-%m', order_date) as month,
SUM(CAST(amount as REAL)) as revenue
FROM sales
GROUP BY strftime('%Y-%m', order_date)
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) as prev_month,
ROUND((revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0 /
LAG(revenue) OVER (ORDER BY month), 1) as growth_pct
FROM monthly
ORDER BY month;
HR Data Analysis
-- HR CSV with columns: employee_id, name, department, title,
-- salary, hire_date, manager_id, status
-- 1. Headcount and payroll by department
SELECT
department,
COUNT(*) as headcount,
SUM(CAST(salary as REAL)) as total_payroll,
ROUND(AVG(CAST(salary as REAL)), 0) as avg_salary,
MIN(CAST(salary as REAL)) as min_salary,
MAX(CAST(salary as REAL)) as max_salary
FROM employees
WHERE status = 'active'
GROUP BY department
ORDER BY headcount DESC;
-- 2. Salary distribution by band
SELECT
CASE
WHEN CAST(salary as REAL) >= 120000 THEN 'Band 5: 120k+'
WHEN CAST(salary as REAL) >= 90000 THEN 'Band 4: 90k-120k'
WHEN CAST(salary as REAL) >= 70000 THEN 'Band 3: 70k-90k'
WHEN CAST(salary as REAL) >= 50000 THEN 'Band 2: 50k-70k'
ELSE 'Band 1: Under 50k'
END as salary_band,
COUNT(*) as count,
ROUND(AVG(CAST(salary as REAL)), 0) as band_average
FROM employees
WHERE status = 'active'
GROUP BY salary_band
ORDER BY salary_band;
-- 3. Tenure analysis
SELECT
name,
department,
hire_date,
ROUND(julianday('now') - julianday(hire_date)) as days_employed,
ROUND((julianday('now') - julianday(hire_date)) / 365.25, 1) as years_employed
FROM employees
WHERE status = 'active'
ORDER BY days_employed DESC;
-- 4. Salary percentile for each employee within their department
SELECT
name,
department,
CAST(salary as REAL) as salary,
ROUND(CAST(salary as REAL) * 100.0 /
SUM(CAST(salary as REAL)) OVER (PARTITION BY department), 1) as dept_salary_pct,
RANK() OVER (PARTITION BY department ORDER BY CAST(salary as REAL) DESC) as dept_salary_rank,
COUNT(*) OVER (PARTITION BY department) as dept_size
FROM employees
WHERE status = 'active'
ORDER BY department, dept_salary_rank;
-- 5. Attrition by department (requires status column with 'active'/'inactive')
SELECT
department,
COUNT(CASE WHEN status = 'active' THEN 1 END) as active,
COUNT(CASE WHEN status = 'inactive' THEN 1 END) as inactive,
COUNT(*) as total,
ROUND(COUNT(CASE WHEN status = 'inactive' THEN 1 END) * 100.0 / COUNT(*), 1) as attrition_rate
FROM employees
GROUP BY department
ORDER BY attrition_rate DESC;
Marketing Campaign Analysis
-- Marketing CSV with columns: campaign_id, campaign_name, channel,
-- impressions, clicks, conversions, cost, revenue
-- 1. Campaign performance overview
SELECT
campaign_name,
channel,
CAST(impressions as INTEGER) as impressions,
CAST(clicks as INTEGER) as clicks,
ROUND(CAST(clicks as REAL) * 100 / CAST(impressions as REAL), 2) as ctr,
CAST(conversions as INTEGER) as conversions,
ROUND(CAST(conversions as REAL) * 100 / CAST(clicks as REAL), 2) as cvr,
CAST(cost as REAL) as cost,
ROUND(CAST(cost as REAL) / CAST(conversions as REAL), 2) as cost_per_conversion
FROM campaigns
ORDER BY cost_per_conversion;
-- 2. ROAS (Return on Ad Spend) by channel
SELECT
channel,
SUM(CAST(cost as REAL)) as total_spend,
SUM(CAST(revenue as REAL)) as total_revenue,
ROUND(SUM(CAST(revenue as REAL)) / SUM(CAST(cost as REAL)), 2) as roas
FROM campaigns
GROUP BY channel
ORDER BY roas DESC;
-- 3. Identify campaigns below target ROAS
SELECT
campaign_name,
channel,
ROUND(CAST(revenue as REAL) / CAST(cost as REAL), 2) as roas,
CAST(cost as REAL) as spend,
CAST(conversions as INTEGER) as conversions
FROM campaigns
WHERE CAST(revenue as REAL) / CAST(cost as REAL) < 2.0 -- Below 2x ROAS target
ORDER BY roas;
-- 4. Channel efficiency comparison
SELECT
channel,
COUNT(*) as campaign_count,
SUM(CAST(impressions as INTEGER)) as total_impressions,
ROUND(SUM(CAST(clicks as REAL)) * 100 / SUM(CAST(impressions as REAL)), 2) as avg_ctr,
ROUND(SUM(CAST(conversions as REAL)) * 100 / SUM(CAST(clicks as REAL)), 2) as avg_cvr,
ROUND(SUM(CAST(cost as REAL)) / SUM(CAST(conversions as REAL)), 2) as avg_cpa
FROM campaigns
GROUP BY channel;
Finance and Expense Analysis
-- Finance CSV with columns: date, account, category, amount,
-- description, department, status
-- 1. Expense by category
SELECT
category,
COUNT(*) as transaction_count,
SUM(CAST(amount as REAL)) as total_amount,
ROUND(AVG(CAST(amount as REAL)), 2) as avg_transaction,
MAX(CAST(amount as REAL)) as largest_transaction
FROM expenses
WHERE status = 'approved'
GROUP BY category
ORDER BY total_amount DESC;
-- 2. Monthly spending trend by department
SELECT
strftime('%Y-%m', date) as month,
department,
SUM(CAST(amount as REAL)) as monthly_spend
FROM expenses
WHERE status = 'approved'
GROUP BY strftime('%Y-%m', date), department
ORDER BY month, department;
-- 3. Budget vs actual (requires budget table)
-- (If you load both expenses.csv and budget.csv)
SELECT
b.department,
b.category,
CAST(b.budget_amount as REAL) as budget,
COALESCE(SUM(CAST(e.amount as REAL)), 0) as actual_spend,
CAST(b.budget_amount as REAL) - COALESCE(SUM(CAST(e.amount as REAL)), 0) as remaining,
ROUND(COALESCE(SUM(CAST(e.amount as REAL)), 0) * 100.0 /
CAST(b.budget_amount as REAL), 1) as pct_used
FROM budget b
LEFT JOIN expenses e ON b.department = e.department
AND b.category = e.category
AND e.status = 'approved'
GROUP BY b.department, b.category
ORDER BY pct_used DESC;
-- 4. Variance analysis: categories spending above average
WITH category_stats AS (
SELECT
category,
SUM(CAST(amount as REAL)) as total,
AVG(SUM(CAST(amount as REAL))) OVER () as overall_avg
FROM expenses
WHERE status = 'approved'
GROUP BY category
)
SELECT
category,
total,
ROUND(overall_avg, 2) as overall_avg,
ROUND(total - overall_avg, 2) as variance,
CASE WHEN total > overall_avg THEN 'Above average' ELSE 'Below average' END as status
FROM category_stats
ORDER BY total DESC;
E-commerce Customer Analysis
-- Orders CSV with columns: order_id, customer_id, customer_email,
-- product_id, product_name, category, quantity, unit_price, order_date
-- 1. Customer lifetime value (LTV)
SELECT
customer_id,
customer_email,
COUNT(DISTINCT order_id) as order_count,
SUM(CAST(quantity as INTEGER) * CAST(unit_price as REAL)) as ltv,
MIN(order_date) as first_order,
MAX(order_date) as last_order,
ROUND((julianday('now') - julianday(MIN(order_date))) / 365.25, 1) as customer_age_years
FROM orders
GROUP BY customer_id, customer_email
ORDER BY ltv DESC;
-- 2. Product performance
SELECT
product_name,
category,
COUNT(DISTINCT order_id) as orders_containing_product,
SUM(CAST(quantity as INTEGER)) as units_sold,
SUM(CAST(quantity as INTEGER) * CAST(unit_price as REAL)) as total_revenue,
ROUND(AVG(CAST(unit_price as REAL)), 2) as avg_selling_price
FROM orders
GROUP BY product_id, product_name, category
ORDER BY total_revenue DESC;
-- 3. Customer cohort analysis by first order month
WITH customer_cohorts AS (
SELECT
customer_id,
strftime('%Y-%m', MIN(order_date)) as cohort_month
FROM orders
GROUP BY customer_id
),
order_months AS (
SELECT
o.customer_id,
c.cohort_month,
strftime('%Y-%m', o.order_date) as order_month
FROM orders o
JOIN customer_cohorts c ON o.customer_id = c.customer_id
)
SELECT
cohort_month,
order_month,
COUNT(DISTINCT customer_id) as customers,
COUNT(DISTINCT customer_id) * 100.0 /
FIRST_VALUE(COUNT(DISTINCT customer_id)) OVER (
PARTITION BY cohort_month ORDER BY order_month
) as retention_rate
FROM order_months
GROUP BY cohort_month, order_month
ORDER BY cohort_month, order_month;
-- 4. Cross-sell analysis: which products are purchased together?
SELECT
a.product_name as product_1,
b.product_name as product_2,
COUNT(*) as times_purchased_together
FROM orders a
JOIN orders b ON a.order_id = b.order_id
AND a.product_id < b.product_id -- Avoid duplicates
GROUP BY a.product_name, b.product_name
HAVING COUNT(*) >= 3
ORDER BY times_purchased_together DESC;
-- 5. RFM segmentation (Recency, Frequency, Monetary)
WITH rfm_base AS (
SELECT
customer_id,
customer_email,
ROUND(julianday('now') - julianday(MAX(order_date))) as recency_days,
COUNT(DISTINCT order_id) as frequency,
SUM(CAST(quantity as INTEGER) * CAST(unit_price as REAL)) as monetary
FROM orders
GROUP BY customer_id, customer_email
),
rfm_scored AS (
SELECT
customer_id,
customer_email,
recency_days,
frequency,
monetary,
-- Score 5 = best: most recent, most frequent, highest spend
NTILE(5) OVER (ORDER BY recency_days DESC) as r_score,
NTILE(5) OVER (ORDER BY frequency ASC) as f_score,
NTILE(5) OVER (ORDER BY monetary ASC) as m_score
FROM rfm_base
)
SELECT
customer_email,
recency_days,
frequency,
ROUND(monetary, 2) as monetary,
r_score,
f_score,
m_score,
r_score + f_score + m_score as rfm_total,
CASE
WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
WHEN r_score >= 3 AND f_score >= 3 THEN 'Loyal Customers'
WHEN r_score >= 4 THEN 'Recent Customers'
WHEN f_score >= 4 THEN 'Frequent Buyers'
WHEN r_score <= 2 AND f_score >= 3 THEN 'At Risk'
WHEN r_score <= 2 AND f_score <= 2 THEN 'Churned'
ELSE 'Others'
END as segment
FROM rfm_scored
ORDER BY rfm_total DESC;
Education and Grades Analysis
-- Grades CSV with columns: student_id, student_name, course_id,
-- course_name, subject, score, max_score, semester
-- 1. Student performance overview
SELECT
student_name,
COUNT(*) as courses_taken,
ROUND(AVG(CAST(score as REAL) * 100 / CAST(max_score as REAL)), 1) as avg_pct,
MIN(CAST(score as REAL) * 100 / CAST(max_score as REAL)) as lowest_pct,
MAX(CAST(score as REAL) * 100 / CAST(max_score as REAL)) as highest_pct
FROM grades
GROUP BY student_id, student_name
ORDER BY avg_pct DESC;
-- 2. Grade distribution per course
SELECT
course_name,
COUNT(*) as students,
COUNT(CASE WHEN CAST(score as REAL) / CAST(max_score as REAL) >= 0.9 THEN 1 END) as A_grades,
COUNT(CASE WHEN CAST(score as REAL) / CAST(max_score as REAL) >= 0.8
AND CAST(score as REAL) / CAST(max_score as REAL) < 0.9 THEN 1 END) as B_grades,
COUNT(CASE WHEN CAST(score as REAL) / CAST(max_score as REAL) >= 0.7
AND CAST(score as REAL) / CAST(max_score as REAL) < 0.8 THEN 1 END) as C_grades,
COUNT(CASE WHEN CAST(score as REAL) / CAST(max_score as REAL) < 0.7 THEN 1 END) as below_C
FROM grades
GROUP BY course_id, course_name;
-- 3. Identify students who improved significantly across semesters
WITH semester_avgs AS (
SELECT
student_id,
student_name,
semester,
ROUND(AVG(CAST(score as REAL) * 100 / CAST(max_score as REAL)), 1) as avg_score
FROM grades
GROUP BY student_id, student_name, semester
)
SELECT
student_name,
semester,
avg_score,
LAG(avg_score) OVER (PARTITION BY student_id ORDER BY semester) as prev_semester,
avg_score - LAG(avg_score) OVER (PARTITION BY student_id ORDER BY semester) as improvement
FROM semester_avgs
ORDER BY student_name, semester;
SQL vs Spreadsheet Formulas: When Each Is Faster
The decision between SQL and spreadsheet formulas is not always clear-cut. Understanding the specific decision factors helps you choose efficiently.
When SQL Is Clearly Faster
Multiple conditions: A SQL WHERE clause with multiple conditions is simpler and clearer than a nested IF or a complex FILTER formula in a spreadsheet.
Aggregations by multiple dimensions: GROUP BY with multiple columns is one query. In a spreadsheet, this requires a pivot table or SUMIFS with multiple criteria.
Joining two tables: A SQL JOIN is one clause. In a spreadsheet, this requires VLOOKUP or INDEX/MATCH applied to each column needed from the second table.
Ranking within groups: RANK() OVER (PARTITION BY...) is clean SQL. In a spreadsheet, this requires a COUNTIFS-based formula or multiple helper columns.
Running totals: SUM() OVER (ORDER BY...) is a single function call. In a spreadsheet, this requires a formula that locks the start of the range while moving the end, which is error-prone to set up correctly.
Top N within groups: With a CTE and RANK(), this is clean SQL. In a spreadsheet, this requires a complex formula or a manual filter-and-copy workflow.
When Spreadsheet Formulas Are Faster
One-off lookups in a file you already have open: If the data is already in Excel and you need one VLOOKUP result, writing the formula is faster than opening the SQL tool and loading the file.
Simple arithmetic on a few values: Quick calculations that do not require filtering, aggregating, or joining are often faster to do directly in a spreadsheet.
Visual layout matters: Pivot tables with drag-and-drop interactivity are faster to explore than rewriting SQL queries for each new slice.
The data recipient will use a spreadsheet: If the final output will be reviewed in Excel, building the analysis in Excel avoids the export-import step.
The Decision Framework
A practical rule of thumb: if the analysis involves any of the following, use SQL:
Joining two or more data sources
Aggregating with more than one grouping dimension
Any window function calculation (running totals, rankings, lag/lead comparisons)
Finding top N within groups
Multiple filter conditions combined with AND/OR logic
Working with data that has more than a few thousand rows where spreadsheet performance degrades
For everything simpler, the spreadsheet tool you already have open is fine.
Advanced SQL Techniques in Browser SQLite
Recursive CTEs
SQLite supports recursive CTEs, which enable traversing hierarchical data:
-- Traverse an org chart hierarchy
WITH RECURSIVE org_tree AS (
-- Base case: top-level employees (no manager)
SELECT id, name, manager_id, 0 as level, name as path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees who report to someone in the tree
SELECT e.id, e.name, e.manager_id, ot.level + 1, ot.path || ' > ' || e.name
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT
SUBSTR('                ', 1, level * 2) || name as hierarchy, -- indent two spaces per level
level,
path
FROM org_tree
ORDER BY path;
UNION and Set Operations
-- Combine results from multiple queries
SELECT 'Q1' as quarter, SUM(amount) as revenue FROM orders WHERE strftime('%m', order_date) IN ('01', '02', '03')
UNION ALL
SELECT 'Q2', SUM(amount) FROM orders WHERE strftime('%m', order_date) IN ('04', '05', '06')
UNION ALL
SELECT 'Q3', SUM(amount) FROM orders WHERE strftime('%m', order_date) IN ('07', '08', '09')
UNION ALL
SELECT 'Q4', SUM(amount) FROM orders WHERE strftime('%m', order_date) IN ('10', '11', '12');
-- INTERSECT: customers who appear in both tables
SELECT customer_email FROM active_customers
INTERSECT
SELECT customer_email FROM newsletter_subscribers;
-- EXCEPT: customers in list A but not list B
SELECT customer_email FROM all_customers
EXCEPT
SELECT customer_email FROM opted_out_customers;
String Aggregation for Multi-Value Summaries
-- List all products purchased by each customer as a comma-separated string
SELECT
customer_email,
GROUP_CONCAT(DISTINCT product_name) as products_purchased
FROM orders
GROUP BY customer_id, customer_email;
-- With ordering and separator
SELECT
department,
GROUP_CONCAT(name, ' | ') as team_members
FROM employees
WHERE status = 'active'
GROUP BY department
ORDER BY department;
Advanced Aggregation Patterns
Beyond basic GROUP BY, SQL offers several powerful aggregation patterns that handle complex analytical scenarios.
Conditional Aggregation with CASE WHEN Inside Aggregates
Conditional aggregation computes different aggregates for different subsets in a single query pass, producing pivot-style results:
-- Orders by status (pivot style)
SELECT
strftime('%Y-%m', order_date) as month,
COUNT(*) as total_orders,
COUNT(CASE WHEN status = 'completed' THEN 1 END) as completed,
COUNT(CASE WHEN status = 'pending' THEN 1 END) as pending,
COUNT(CASE WHEN status = 'cancelled' THEN 1 END) as cancelled,
SUM(CASE WHEN status = 'completed' THEN CAST(amount as REAL) ELSE 0 END) as completed_revenue,
ROUND(
COUNT(CASE WHEN status = 'completed' THEN 1 END) * 100.0 / COUNT(*),
1
) as completion_rate_pct
FROM orders
GROUP BY strftime('%Y-%m', order_date)
ORDER BY month;
This approach produces a single row per month with all statuses broken out as columns, without requiring a JOIN or multiple queries.
Percentage of Total
A common pattern is calculating each group’s percentage of the grand total:
-- Each product's share of total revenue
WITH product_rev AS (
SELECT
product_name,
SUM(CAST(amount as REAL)) as revenue
FROM sales
GROUP BY product_name
)
SELECT
product_name,
revenue,
ROUND(revenue * 100.0 / SUM(revenue) OVER (), 2) as pct_of_total,
ROUND(SUM(revenue) OVER (ORDER BY revenue DESC) * 100.0 /
SUM(revenue) OVER (), 2) as cumulative_pct
FROM product_rev
ORDER BY revenue DESC;
The window function SUM(revenue) OVER () with an empty OVER clause (no PARTITION BY) computes the grand total across all rows, supplying the denominator for both the share and cumulative percentage calculations.
Finding Gaps in Sequential Data
Identifying missing values in a sequence (missing order IDs, gaps in a date series):
-- Find gaps in order ID sequence
WITH consecutive AS (
SELECT
order_id,
LEAD(order_id) OVER (ORDER BY order_id) as next_id
FROM orders
)
SELECT
order_id as gap_starts_after,
next_id as gap_ends_before,
next_id - order_id - 1 as missing_count
FROM consecutive
WHERE next_id - order_id > 1
ORDER BY order_id;
Aggregating at Multiple Levels
SQL can produce aggregates at multiple grain levels simultaneously:
-- Revenue at product, category, and total levels in one query
WITH base AS (
SELECT product_name, category, CAST(amount as REAL) as amount
FROM sales
)
SELECT
product_name,
category,
SUM(amount) as product_revenue,
SUM(SUM(amount)) OVER (PARTITION BY category) as category_total,
SUM(SUM(amount)) OVER () as grand_total,
ROUND(SUM(amount) * 100.0 / SUM(SUM(amount)) OVER (PARTITION BY category), 1) as pct_of_category,
ROUND(SUM(amount) * 100.0 / SUM(SUM(amount)) OVER (), 1) as pct_of_total
FROM base
GROUP BY product_name, category
ORDER BY category, product_revenue DESC;
SQL for Specific Business Questions
These sections address common specific business questions with complete query solutions.
Customer Segmentation and Churn Analysis
-- Identify customers who have not ordered recently (potential churn)
SELECT
customer_id,
customer_email,
MAX(order_date) as last_order_date,
COUNT(DISTINCT order_id) as total_orders,
SUM(CAST(amount as REAL)) as lifetime_value,
ROUND(julianday('now') - julianday(MAX(order_date))) as days_since_last_order,
CASE
WHEN julianday('now') - julianday(MAX(order_date)) > 365 THEN 'Churned (1+ year)'
WHEN julianday('now') - julianday(MAX(order_date)) > 180 THEN 'At Risk (6-12 months)'
WHEN julianday('now') - julianday(MAX(order_date)) > 90 THEN 'Cooling (3-6 months)'
ELSE 'Active'
END as churn_status
FROM orders
GROUP BY customer_id, customer_email
ORDER BY days_since_last_order DESC;
Inventory and Stock Analysis
-- Find products where current stock is below reorder threshold
-- (requires inventory.csv with columns: product_id, product_name,
-- current_stock, reorder_point, unit_cost)
SELECT
product_name,
current_stock,
reorder_point,
CAST(reorder_point as INTEGER) - CAST(current_stock as INTEGER) as units_short,
CAST(unit_cost as REAL) * (CAST(reorder_point as INTEGER) - CAST(current_stock as INTEGER)) as reorder_cost
FROM inventory
WHERE CAST(current_stock as INTEGER) < CAST(reorder_point as INTEGER)
ORDER BY units_short DESC;
-- Products with no movement in a given period
-- (requires inventory.csv and order_items.csv)
SELECT
i.product_name,
CAST(i.current_stock as INTEGER) as on_hand,
CAST(i.unit_cost as REAL) * CAST(i.current_stock as INTEGER) as inventory_value,
COALESCE(o.last_ordered, 'Never') as last_ordered
FROM inventory i
LEFT JOIN (
SELECT product_id, MAX(order_date) as last_ordered
FROM order_items
GROUP BY product_id
) o ON i.product_id = o.product_id
WHERE o.last_ordered IS NULL
OR julianday('now') - julianday(o.last_ordered) > 90
ORDER BY inventory_value DESC;
Performance Benchmarking
-- Compare salesperson performance against team average
SELECT
s.salesperson_name,
s.region,
COUNT(o.order_id) as orders_closed,
SUM(CAST(o.amount as REAL)) as total_revenue,
ROUND(AVG(CAST(o.amount as REAL)), 2) as avg_deal_size,
ROUND(SUM(SUM(CAST(o.amount as REAL))) OVER (PARTITION BY s.region) /
SUM(COUNT(o.order_id)) OVER (PARTITION BY s.region), 2) as region_avg_deal,
ROUND(SUM(CAST(o.amount as REAL)) * 100.0 /
SUM(SUM(CAST(o.amount as REAL))) OVER (PARTITION BY s.region), 1) as region_revenue_share
FROM salespeople s
JOIN orders o ON s.id = o.salesperson_id
WHERE o.status = 'completed'
GROUP BY s.id, s.salesperson_name, s.region
ORDER BY s.region, total_revenue DESC;
Funnel Analysis
-- Marketing funnel conversion rates
-- (requires funnel_events.csv with columns: session_id, stage, stage_date)
WITH funnel_counts AS (
SELECT
stage,
COUNT(DISTINCT session_id) as sessions
FROM funnel_events
GROUP BY stage
),
stages_ordered AS (
SELECT stage, sessions,
CASE stage
WHEN 'visit' THEN 1
WHEN 'product_view' THEN 2
WHEN 'add_to_cart' THEN 3
WHEN 'checkout' THEN 4
WHEN 'purchase' THEN 5
ELSE 6
END as stage_order
FROM funnel_counts
)
SELECT
stage,
sessions,
ROUND(sessions * 100.0 /
FIRST_VALUE(sessions) OVER (ORDER BY stage_order), 1) as pct_of_top,
ROUND(sessions * 100.0 /
LAG(sessions) OVER (ORDER BY stage_order), 1) as step_conversion_rate
FROM stages_ordered
ORDER BY stage_order;
Understanding Query Execution and Performance
Understanding how SQL queries execute helps you write more efficient queries and debug unexpected results.
The Order of SQL Clause Execution
SQL clauses execute in a specific logical order that differs from the order they are written:
FROM and JOIN - determine which tables and rows are involved
WHERE - filter individual rows
GROUP BY - group filtered rows
HAVING - filter groups
SELECT - compute output expressions and apply aliases
DISTINCT - remove duplicate rows
ORDER BY - sort results
LIMIT/OFFSET - restrict result count
This execution order explains some common gotchas:
Column aliases defined in SELECT cannot be used in WHERE (WHERE executes before SELECT)
Column aliases can be used in ORDER BY (ORDER BY executes after SELECT)
Aggregate functions cannot be in WHERE but can be in HAVING
The same aggregate can appear in both HAVING and SELECT
-- This works:
SELECT department, COUNT(*) as emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 3 -- HAVING uses the aggregate, not the alias
ORDER BY emp_count DESC; -- ORDER BY can use the alias
-- This does NOT work (cannot use alias in HAVING):
SELECT department, COUNT(*) as emp_count
FROM employees
GROUP BY department
HAVING emp_count > 3; -- This fails in most SQL, though SQLite may allow it
Query Optimization Basics
For the CSV volumes typical in browser-based analysis, performance is rarely a concern. But understanding optimization helps when working with larger files.
Filter early: Apply the most restrictive WHERE conditions to reduce the number of rows processed by subsequent operations.
Avoid SELECT * in production queries: Selecting only the columns you need reduces data transfer and is clearer about the query’s intent.
Use LIMIT when exploring: When exploring data interactively, adding LIMIT 100 to queries prevents waiting for large result sets while you are still understanding the data.
Subquery vs CTE: In SQLite, a CTE and the equivalent subquery generally perform the same; the planner can inline either. CTEs are preferred for readability.
SQL Patterns for Data Quality Checks
SQL is highly effective for data quality validation. These patterns identify common data quality issues.
Completeness Checks
-- Identify columns with high null rates
SELECT
'customer_id' as column_name,
COUNT(*) as total,
COUNT(customer_id) as non_null,
COUNT(*) - COUNT(customer_id) as null_count,
ROUND((COUNT(*) - COUNT(customer_id)) * 100.0 / COUNT(*), 1) as null_pct
FROM orders
UNION ALL
SELECT
'amount',
COUNT(*),
COUNT(amount),
COUNT(*) - COUNT(amount),
ROUND((COUNT(*) - COUNT(amount)) * 100.0 / COUNT(*), 1)
FROM orders
UNION ALL
SELECT
'order_date',
COUNT(*),
COUNT(order_date),
COUNT(*) - COUNT(order_date),
ROUND((COUNT(*) - COUNT(order_date)) * 100.0 / COUNT(*), 1)
FROM orders;
Uniqueness Checks
-- Find duplicate records across key columns
SELECT
customer_email,
order_date,
amount,
COUNT(*) as occurrences
FROM orders
GROUP BY customer_email, order_date, amount
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;
Referential Integrity Checks
-- Orphaned records: orders without matching customers
-- (when orders.csv and customers.csv are both loaded)
SELECT o.order_id, o.customer_id, o.amount
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
WHERE c.id IS NULL;
-- Orders with invalid product IDs
SELECT o.order_id, o.product_id
FROM order_items o
LEFT JOIN products p ON o.product_id = p.id
WHERE p.id IS NULL;
Range and Format Checks
-- Values outside expected ranges
SELECT 'Negative amounts' as issue, COUNT(*) as count
FROM orders WHERE CAST(amount as REAL) < 0
UNION ALL
SELECT 'Zero amounts', COUNT(*)
FROM orders WHERE CAST(amount as REAL) = 0
UNION ALL
SELECT 'Invalid email format', COUNT(*)
FROM customers WHERE email NOT LIKE '%@%.%'
UNION ALL
SELECT 'Future dates', COUNT(*)
FROM orders WHERE order_date > date('now');
SQL for Financial Reporting
Financial reporting in SQL requires specific patterns for period-over-period analysis, budget comparisons, and regulatory calculations.
Period-over-Period Comparisons
-- Year-over-year revenue comparison
WITH yearly AS (
SELECT
strftime('%Y', order_date) as year,
SUM(CAST(amount as REAL)) as revenue
FROM orders
WHERE status = 'completed'
GROUP BY strftime('%Y', order_date)
)
SELECT
year,
revenue,
LAG(revenue) OVER (ORDER BY year) as prior_year,
revenue - LAG(revenue) OVER (ORDER BY year) as yoy_change,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY year)) * 100.0 /
LAG(revenue) OVER (ORDER BY year),
1
) as yoy_growth_pct
FROM yearly
ORDER BY year;
Rolling Calculations
-- 3-month rolling average revenue
WITH monthly AS (
SELECT
strftime('%Y-%m', order_date) as month,
SUM(CAST(amount as REAL)) as revenue
FROM orders
GROUP BY strftime('%Y-%m', order_date)
)
SELECT
month,
revenue,
ROUND(AVG(revenue) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2) as rolling_3mo_avg
FROM monthly
ORDER BY month;
Waterfall Chart Data
-- Revenue waterfall: starting balance, additions, subtractions, ending balance
SELECT
'Opening Balance' as category,
1 as sort_order,
CAST(opening_balance as REAL) as amount
FROM financial_summary
UNION ALL
SELECT
category,
2 as sort_order,
SUM(CAST(amount as REAL)) as amount
FROM transactions
WHERE type = 'revenue'
GROUP BY category
UNION ALL
SELECT
category,
3 as sort_order,
-SUM(CAST(amount as REAL)) as amount
FROM transactions
WHERE type = 'expense'
GROUP BY category
ORDER BY sort_order, category;
Persona-Specific SQL Workflows
Business Analysts Replacing Excel Pivot Tables
A common transition for business analysts moving from Excel to SQL is replacing pivot tables. This pattern maps directly.
Excel pivot table: rows = Region, columns = Quarter, values = Sum of Revenue
SQL equivalent:
SELECT
region,
SUM(CASE WHEN strftime('%m', order_date) IN ('01','02','03')
THEN CAST(amount as REAL) ELSE 0 END) as Q1,
SUM(CASE WHEN strftime('%m', order_date) IN ('04','05','06')
THEN CAST(amount as REAL) ELSE 0 END) as Q2,
SUM(CASE WHEN strftime('%m', order_date) IN ('07','08','09')
THEN CAST(amount as REAL) ELSE 0 END) as Q3,
SUM(CASE WHEN strftime('%m', order_date) IN ('10','11','12')
THEN CAST(amount as REAL) ELSE 0 END) as Q4,
SUM(CAST(amount as REAL)) as annual_total
FROM orders
WHERE status = 'completed'
GROUP BY region
ORDER BY annual_total DESC;
The SQL version is more explicit about what each column represents and easier to modify (adding a new quarter or changing the time period is a one-line edit).
Data Engineers Validating ETL Outputs
When data engineers run ETL (Extract, Transform, Load) processes, SQL validation queries on the output CSV confirm the pipeline worked correctly:
-- Validate row counts match expected
SELECT COUNT(*) as row_count FROM pipeline_output;
-- Validate no nulls in required columns
SELECT
COUNT(*) as total,
COUNT(CASE WHEN primary_key IS NULL THEN 1 END) as null_keys,
COUNT(CASE WHEN required_field IS NULL THEN 1 END) as null_required
FROM pipeline_output;
-- Validate aggregates match source
SELECT
SUM(CAST(amount as REAL)) as total_amount,
COUNT(DISTINCT customer_id) as unique_customers,
MIN(event_date) as earliest_date,
MAX(event_date) as latest_date
FROM pipeline_output;
This validation pattern connects naturally to ReportMedic’s Data Profiler for automated completeness and distribution checks, and to ReportMedic’s Compare Two Spreadsheets tool for comparing expected vs actual values at the row level.
Students Learning SQL with Real Data
For SQL students who learn better with their own data rather than textbook examples, loading a real CSV they have (perhaps a spending tracker, a reading list, a class grade export) and writing queries against it accelerates learning significantly.
Starting questions a student can answer with basic SQL:
-- What is the average in each course?
SELECT course, ROUND(AVG(CAST(score as REAL)), 1) as avg_score
FROM grades GROUP BY course;
-- Which categories did I spend most on?
SELECT category, SUM(CAST(amount as REAL)) as total
FROM spending GROUP BY category ORDER BY total DESC;
-- How many books did I read each month?
SELECT strftime('%Y-%m', finish_date) as month, COUNT(*) as books_read
FROM reading_log GROUP BY month ORDER BY month;
Each of these queries introduces a core concept (GROUP BY, ORDER BY, date functions) in the context of data the student personally cares about.
Comparison with Alternatives
DBeaver and DataGrip: Professional Database Clients
DBeaver is a free, open-source database management tool. DataGrip is JetBrains’ commercial database IDE. Both provide rich SQL editing, connection to many database types, schema browsing, and powerful query tools.
These are the right tools when: you have a real database server to connect to, you need persistent connections, you work with complex multi-schema databases, or you need professional-grade database management features.
For ad-hoc CSV analysis, these tools require: loading the CSV into a database (an extra step), maintaining a database connection, and the overhead of a desktop application.
Mode Analytics and Redash: Business Intelligence Platforms
Mode and Redash are business intelligence platforms that allow teams to share SQL queries and visualizations. They connect to shared databases and support collaborative analysis.
These are the right tools when: you have a shared data warehouse, you need to share analysis with a team, or you build dashboards that update automatically.
For individual CSV analysis without a shared database, they require data loading infrastructure.
Google BigQuery Sandbox
Google provides a free sandbox tier for BigQuery that allows loading files and running SQL. This is a cloud SQL environment with BigQuery’s capabilities including excellent performance on large datasets and geospatial SQL.
BigQuery is the right tool when: datasets are very large (millions to billions of rows), you need BigQuery-specific features (geospatial, ML functions), or the data is already in Google Cloud.
For small to medium CSV files where data should stay on the device, browser-based SQLite is simpler and more private.
DB Fiddle and SQL Fiddle
DB Fiddle and SQL Fiddle are online tools for sharing SQL examples, testing queries, and learning SQL. They run SQL on a server and return results.
These tools send data to remote servers, which matters for sensitive data. They are excellent for learning SQL syntax and sharing examples, but not for analyzing confidential CSV files.
The Browser SQLite Advantage
ReportMedic’s SQL Query tool is the right choice when: CSV files contain data that should not leave the device, you want SQL without any setup or account, and you need standard SQL capabilities on files you already have locally.
Integrating SQL with Other ReportMedic Tools
SQL After Python Preprocessing
Use the Python Code Runner to transform and clean a CSV, then load the cleaned CSV into the SQL tool for analysis. Python handles complex text processing, regex-based cleaning, and programmatic transformations that SQL cannot express cleanly. SQL handles the analytical queries on the clean data.
Example workflow: a sales export with inconsistent date formats and mixed currency symbols. Python cleans the data (normalizes dates to ISO format, strips currency symbols from numeric fields). The cleaned CSV loads into the SQL tool for revenue analysis, growth calculations, and customer segmentation.
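A sketch of what that Python cleaning step might look like. The column names and formats are hypothetical; a real export would dictate the date format list and the symbols to strip:

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical raw export with mixed date formats and currency symbols.
raw = """date,amount
01/15/2024,"$1,200.50"
2024-02-03,€950
"""

def clean_row(row):
    # Normalize dates to ISO format; extend the format list as needed.
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            row["date"] = datetime.strptime(row["date"], fmt).strftime("%Y-%m-%d")
            break
        except ValueError:
            continue
    # Strip currency symbols and thousands separators; keep digits and the decimal point.
    row["amount"] = re.sub(r"[^\d.]", "", row["amount"])
    return row

cleaned = [clean_row(r) for r in csv.DictReader(io.StringIO(raw))]
print(cleaned)
# [{'date': '2024-01-15', 'amount': '1200.50'}, {'date': '2024-02-03', 'amount': '950'}]
```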
SQL After Data Cleaning
Use ReportMedic’s Clean Data tool to fix common data quality issues (trailing spaces, inconsistent casing, duplicate rows), then load the cleaned file into the SQL tool for analysis. The data quality step ensures that GROUP BY and JOIN operations work correctly without false duplicates or mismatched strings due to format inconsistencies.
SQL Before Data Profiling
After running SQL queries to understand the structure and content of a CSV, use ReportMedic’s Data Profiler to get statistical distribution information about individual columns. The profiler provides a quick overview of data quality, null rates, and value distributions that complement SQL analysis.
Building a Data Analysis Workflow with SQL
For analysts who regularly work with CSV data from various sources, a repeatable SQL-based workflow improves efficiency and consistency.
Step 1: Load and Explore
When you receive a new CSV file, start with exploratory queries:
-- How many rows?
SELECT COUNT(*) FROM your_table;
-- What does the data look like?
SELECT * FROM your_table LIMIT 10;
-- What are the unique values in key columns?
SELECT DISTINCT status FROM your_table ORDER BY status;
SELECT DISTINCT department FROM your_table ORDER BY department;
-- Are there nulls in important columns?
SELECT
COUNT(*) as total,
COUNT(customer_id) as has_customer_id,
COUNT(amount) as has_amount,
COUNT(order_date) as has_date
FROM your_table;
-- What's the date range?
SELECT MIN(order_date), MAX(order_date) FROM your_table;
Step 2: Validate
Before relying on analysis, validate data quality:
-- Check for duplicate primary keys
SELECT order_id, COUNT(*) as count
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;
-- Check for negative amounts (if that should not occur)
SELECT * FROM orders WHERE CAST(amount as REAL) < 0;
-- Check for future dates (if that should not occur)
SELECT * FROM orders WHERE order_date > date('now');
-- Check for outliers in numeric columns
SELECT
AVG(CAST(amount as REAL)) as mean,
MIN(CAST(amount as REAL)) as min_val,
MAX(CAST(amount as REAL)) as max_val
FROM orders;
Step 3: Answer the Core Questions
With clean, validated data, write the queries that answer your specific analytical questions.
Step 4: Export and Share
Export query results as CSV for sharing with stakeholders, loading into visualization tools, or inclusion in reports.
Data Preparation Tips
Handling Headers
The SQL tool uses the first row of the CSV as column names. If a CSV does not have a header row, add one before loading. Column names with spaces or special characters may need to be quoted in SQL:
SELECT "order id", "customer name" FROM orders;
-- Or use square brackets, which SQLite also accepts
SELECT [order id], [customer name] FROM orders;
Encoding Issues
CSVs with special characters (accented letters, non-ASCII text) may have encoding issues that affect query results. Files from Windows applications often use Windows-1252 encoding rather than UTF-8. If you see garbled characters in query results, converting the CSV to UTF-8 encoding before loading (using a text editor that supports encoding conversion) resolves the issue.
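If you prefer a scriptable fix over a text editor, a small Python sketch of the re-encoding step (file names and contents here are illustrative):

```python
import os
import tempfile

def convert_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    # Read in the source encoding, rewrite as UTF-8.
    with open(src_path, encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)

# Demo: a Windows-1252 file containing an accented character.
with tempfile.TemporaryDirectory() as d:
    src, dst = os.path.join(d, "export.csv"), os.path.join(d, "export_utf8.csv")
    with open(src, "w", encoding="cp1252") as f:
        f.write("name\ncafé\n")
    convert_to_utf8(src, dst)
    with open(dst, encoding="utf-8") as f:
        converted = f.read()
print(converted)
# name
# café
```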
Delimiter Problems
Standard CSVs use comma delimiters. Some systems export semicolon-delimited or tab-delimited files. The SQL tool expects standard comma-delimited CSV. For semicolon or tab delimiters, use ReportMedic’s Clean Data tool or a text editor to convert the delimiter before loading.
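For a scriptable conversion, Python's csv module handles the delimiter swap while respecting quoted fields (sample data is illustrative):

```python
import csv
import io

# Semicolon-delimited export, as produced by some European Excel locales.
raw = "id;name;amount\n1;Alice;100\n2;Bob;250\n"

# csv.reader/writer respect quoting, unlike a blind replace(';', ',').
reader = csv.reader(io.StringIO(raw), delimiter=";")
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(reader)
print(out.getvalue())
# id,name,amount
# 1,Alice,100
# 2,Bob,250
```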
Null Value Representation
Different systems represent null/missing values differently: empty string, “NULL”, “N/A”, “None”, “#N/A”. In SQL, these load as text values rather than SQL NULL. To treat them as NULL in queries:
SELECT
NULLIF(notes, '') as notes,
NULLIF(phone, 'N/A') as phone
FROM customers;
Frequently Asked Questions
Does SQL in the browser work the same as SQL in a real database?
Browser-based SQL uses SQLite, which implements standard SQL with some SQLite-specific extensions and a few differences from other database systems. The core SQL covered in this guide (SELECT, WHERE, GROUP BY, HAVING, JOIN, ORDER BY, subqueries, CTEs, window functions) is standard and works the same across SQLite, PostgreSQL, MySQL, and other databases with minor syntax variations. SQLite-specific functions like strftime(), SUBSTR(), INSTR(), and GROUP_CONCAT() exist in slightly different forms in other databases. Code written for the browser SQLite environment is directly transferable to most production database environments with minor adjustments for system-specific functions.
How many CSV files can I load and join at once?
You can load multiple CSV files simultaneously in the SQL tool. Each file becomes a separate table, and you can join any number of tables together in a single query. Practical limits depend on the total data volume and your device’s available memory. For typical business CSV files (thousands to tens of thousands of rows), loading three to five files simultaneously is comfortable on any modern device. For very large files or many simultaneous files, performance may slow as the data volume approaches available browser memory.
What is the difference between WHERE and HAVING?
WHERE filters individual rows before any grouping or aggregation happens. HAVING filters groups after aggregation. A rule of thumb: if the filter condition uses an aggregate function (COUNT, SUM, AVG), it must be in HAVING. If it filters on raw column values, it can be in WHERE. It is also possible to use WHERE and HAVING in the same query: WHERE narrows the rows that enter the grouping operation, and HAVING further filters the groups that appear in the result.
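A minimal sqlite3 sketch (illustrative data) showing both filters working together in one query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, status TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("Ada", "Eng", "active"), ("Ben", "Eng", "active"),
    ("Cy", "Eng", "inactive"), ("Dee", "Sales", "active"),
])

rows = conn.execute("""
    SELECT department, COUNT(*) AS active_count
    FROM employees
    WHERE status = 'active'   -- row-level filter: runs before grouping
    GROUP BY department
    HAVING COUNT(*) >= 2      -- group-level filter: runs after aggregation
""").fetchall()
print(rows)
# [('Eng', 2)]
```

Sales has one active employee, so WHERE lets its row through but HAVING drops the group.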
Why do I get unexpected results when filtering on numeric columns in CSV data?
CSV data is text. When a number is stored as text in a CSV, comparisons like WHERE salary > 50000 compare strings lexicographically rather than numerically. String comparison produces incorrect results for numeric comparisons: ‘9000’ > ‘50000’ is true in string comparison because ‘9’ > ‘5’. The solution is to cast the column to a numeric type: WHERE CAST(salary as REAL) > 50000. The SQL tool attempts to detect numeric columns automatically, but explicit CAST is more reliable for computed columns and complex conditions.
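The lexicographic behavior is easy to reproduce with Python's sqlite3 module (illustrative data; a TEXT column mimics CSV-loaded values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A TEXT column mimics CSV-loaded data, where every value starts as text.
conn.execute("CREATE TABLE staff (name TEXT, salary TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?)",
                 [("A", "9000"), ("B", "50000"), ("C", "120000")])

# Lexicographic: '9000' > '50000' because '9' > '5'.
wrong = conn.execute("SELECT name FROM staff WHERE salary > '50000'").fetchall()
# Numeric after CAST: only the genuinely larger salary matches.
right = conn.execute(
    "SELECT name FROM staff WHERE CAST(salary AS REAL) > 50000").fetchall()
print(wrong)  # [('A',)]
print(right)  # [('C',)]
```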
Can I create views or temporary tables in browser SQLite?
Yes, SQLite supports CREATE VIEW for defining named queries and CREATE TEMP TABLE or CREATE TEMPORARY TABLE for temporary tables that exist only for the session. Views are useful for defining complex subqueries once and reusing them across multiple queries without repeating the definition. Temporary tables are useful for staging intermediate results in complex multi-step analyses. Both are session-scoped: they exist only while the browser tab is open and reset when the page is reloaded.
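A short sqlite3 sketch of the view workflow (table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "completed", 100), (2, "pending", 40), (3, "completed", 60)])

# A view names a query so later queries can treat it like a table.
conn.execute("""
    CREATE VIEW completed_orders AS
    SELECT id, amount FROM orders WHERE status = 'completed'
""")
total = conn.execute("SELECT SUM(amount) FROM completed_orders").fetchone()[0]
print(total)  # 160.0
```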
How do I handle dates that are stored in different formats across my CSV files?
SQLite’s date functions work on ISO format dates (YYYY-MM-DD). For dates stored in other formats (MM/DD/YYYY, DD-MM-YYYY, etc.), you need to either convert them before loading (using ReportMedic’s Clean Data tool or the Python Code Runner) or convert them in SQL using SUBSTR and concatenation. For example, a date stored as ‘MM/DD/YYYY’:
-- Convert MM/DD/YYYY to YYYY-MM-DD in SQL
SELECT
SUBSTR(date_col, 7, 4) || '-' ||
SUBSTR(date_col, 1, 2) || '-' ||
SUBSTR(date_col, 4, 2) as iso_date
FROM your_table;
Pre-converting dates before loading is generally more reliable than in-query conversion.
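For the pre-conversion route, Python's strptime both converts and validates, which blind SUBSTR slicing cannot do. A minimal sketch, assuming MM/DD/YYYY input:

```python
from datetime import datetime

# strptime both converts and validates; a malformed date raises ValueError
# instead of silently producing a wrong slice.
def to_iso(us_date):
    return datetime.strptime(us_date, "%m/%d/%Y").strftime("%Y-%m-%d")

print(to_iso("03/07/2024"))  # 2024-03-07
```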
What window functions are supported in SQLite?
SQLite supports a comprehensive set of window functions: ROW_NUMBER(), RANK(), DENSE_RANK(), PERCENT_RANK(), CUME_DIST(), NTILE(), LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), NTH_VALUE(), and all aggregate functions (SUM, AVG, COUNT, MIN, MAX) as window functions. Frame specifications (ROWS BETWEEN, RANGE BETWEEN) are supported. SQLite’s window function support is generally equivalent to the SQL standard for the functions listed.
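A quick sqlite3 check of how the ranking functions treat tied values (illustrative data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("A", 90), ("B", 75), ("C", 90), ("D", 60)])

# RANK leaves a gap after ties; DENSE_RANK does not.
rows = conn.execute("""
    SELECT student, score,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY score DESC, student
""").fetchall()
print(rows)
# [('A', 90, 1, 1), ('C', 90, 1, 1), ('B', 75, 3, 2), ('D', 60, 4, 3)]
```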
How do I export query results for use in other tools?
The SQL Query tool provides a button to export the current query results as a CSV file. This exported CSV contains exactly the rows and columns from your query result, which can be opened in Excel, loaded into another tool, or used as input to another ReportMedic tool for further processing.
Can I use SQL to join a CSV file with data I paste directly?
SQLite supports VALUES clauses that create inline table data. You can create a temporary reference table by using VALUES in a CTE:
WITH region_map(region_code, region_name) AS (
VALUES
('N', 'North'),
('S', 'South'),
('E', 'East'),
('W', 'West')
)
SELECT
o.order_id,
rm.region_name,
o.amount
FROM orders o
JOIN region_map rm ON o.region_code = rm.region_code;
This is useful when you have a small reference table that does not warrant a separate CSV file.
Is SQL difficult to learn for someone who currently uses spreadsheets?
SQL is accessible for spreadsheet users because the conceptual operations are familiar: SELECT is like choosing which columns to display, WHERE is like using AutoFilter, GROUP BY with aggregate functions is like creating a pivot table, ORDER BY is like sorting, and JOIN is like VLOOKUP. The syntax is different from spreadsheet formulas, but the analytical concepts translate directly. Most spreadsheet users who work with data regularly can write useful SQL queries after a few hours of practice with the basic constructs covered in this guide. The SQL Query tool provides immediate feedback on every query, making it an excellent self-teaching environment.
Key Takeaways
SQL is the standard language for data analysis across every scale of operation, and browser-based SQLite makes it immediately accessible for CSV analysis without any database setup.
ReportMedic’s Query CSV with SQL tool loads your CSV files as SQLite tables and runs SQL queries locally in the browser. Your data never leaves your device, which matters for the financial reports, customer data, HR records, and business metrics that typically live in CSV files.
The full SQL feature set is available: filtering, aggregation, joins, subqueries, CTEs, window functions (ROW_NUMBER, RANK, LAG, LEAD, SUM OVER, AVG OVER), string functions, date arithmetic, and set operations.
The practical advantage of SQL over spreadsheet formulas grows with data complexity: multi-table analysis, ranking within groups, running totals, cohort analysis, and customer segmentation queries are all straightforward in SQL and painful in spreadsheets.
The tool integrates naturally with ReportMedic’s Data Profiler for understanding data before querying, ReportMedic’s Clean Data tool for preparing files before loading, and the Python Code Runner for programmatic transformations that SQL cannot express cleanly.
Write better queries. Get better answers. Keep your data on your device.
Explore all of ReportMedic’s browser-based tools at reportmedic.org.
Building a Personal SQL Query Library
As you write SQL queries for recurring analysis tasks, building a personal library of saved queries saves time and improves consistency.
Organizing Your Query Library
A practical approach to organizing saved SQL queries:
/sql-queries/
/sales/
monthly_revenue.sql
top_customers.sql
regional_breakdown.sql
mom_growth.sql
/hr/
headcount_by_dept.sql
salary_analysis.sql
tenure_report.sql
/finance/
expense_by_category.sql
period_comparison.sql
budget_vs_actual.sql
/templates/
data_quality_check.sql
column_profiling.sql
top_n_by_group.sql
Each query file is a plain text file containing the SQL with a brief comment at the top describing what it does and what columns the input CSV needs.
Template Queries Worth Saving
Top N per group (works for any table and dimension):
-- Top [N] [items] by [metric] within each [dimension]
-- Input columns: dimension_column, item_column, metric_column
WITH ranked AS (
SELECT
[dimension_column],
[item_column],
SUM(CAST([metric_column] as REAL)) as total_metric,
RANK() OVER (PARTITION BY [dimension_column]
ORDER BY SUM(CAST([metric_column] as REAL)) DESC) as rnk
FROM [your_table]
GROUP BY [dimension_column], [item_column]
)
SELECT [dimension_column], [item_column], total_metric
FROM ranked
WHERE rnk <= 5 -- Change N here
ORDER BY [dimension_column], rnk;
Period-over-period comparison (works for any time period):
-- Period-over-period [metric] change
-- Input columns: date_column, metric_column
WITH periods AS (
SELECT
strftime('[period_format]', [date_column]) as period,
SUM(CAST([metric_column] as REAL)) as metric_value
FROM [your_table]
GROUP BY period
)
SELECT
period,
metric_value,
LAG(metric_value) OVER (ORDER BY period) as prior_period,
ROUND(
(metric_value - LAG(metric_value) OVER (ORDER BY period)) * 100.0 /
LAG(metric_value) OVER (ORDER BY period), 1
) as change_pct
FROM periods
ORDER BY period;
Data quality check template:
-- Data quality summary for [table_name]
SELECT
  '[column_name]' as column_name,
  COUNT(*) as total_rows,
  COUNT([column_name]) as non_null,
  COUNT(*) - COUNT([column_name]) as null_count,
  ROUND((COUNT(*) - COUNT([column_name])) * 100.0 / COUNT(*), 1) as null_pct,
  COUNT(DISTINCT [column_name]) as distinct_values
FROM [table_name];
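The template leans on the difference between COUNT(*) (all rows) and COUNT(column) (non-NULL rows only). A sketch with Python's sqlite3 and a hypothetical contacts table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (email TEXT);
INSERT INTO contacts VALUES ('a@x.com'), ('b@x.com'), (NULL), ('a@x.com');
""")

# COUNT(*) counts every row; COUNT(email) skips NULLs;
# COUNT(DISTINCT email) also skips NULLs and collapses duplicates
row = conn.execute("""
SELECT
  'email' as column_name,
  COUNT(*) as total_rows,
  COUNT(email) as non_null,
  COUNT(*) - COUNT(email) as null_count,
  ROUND((COUNT(*) - COUNT(email)) * 100.0 / COUNT(*), 1) as null_pct,
  COUNT(DISTINCT email) as distinct_values
FROM contacts
""").fetchone()
```

Four rows with one NULL and one duplicate yield 4 total, 3 non-null, 1 null (25.0%), and 2 distinct values.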
Common SQL Mistakes and How to Avoid Them
Forgetting CAST for Numeric Operations
The most common error when querying CSVs is letting SQLite compare numeric-looking columns as text. CSV data is always loaded as text initially, so comparisons and arithmetic need an explicit conversion.
The problem:
-- This produces string comparison, not numeric comparison
SELECT * FROM products WHERE price > 100;
-- '9.99' > '100' is TRUE in string comparison (9 > 1)
The fix:
SELECT * FROM products WHERE CAST(price as REAL) > 100;
For columns you know are always numeric, you can also use arithmetic to force numeric comparison:
SELECT * FROM products WHERE price + 0 > 100; -- Forces numeric context
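The pitfall is easy to reproduce. A minimal sketch using Python's bundled sqlite3, with the price column declared TEXT just as CSV-loaded columns are (table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("widget", "9.99"), ("gadget", "150.00"), ("trinket", "85.50")])

# Text comparison: because price has TEXT affinity, 100 is compared as '100',
# and every one of these price strings sorts after '100' character by character
wrong = conn.execute(
    "SELECT name FROM products WHERE price > 100").fetchall()

# CAST forces a true numeric comparison; only the 150.00 row qualifies
right = conn.execute(
    "SELECT name FROM products WHERE CAST(price as REAL) > 100").fetchall()
```

The uncast query returns all three rows, including the 9.99 product; the CAST version returns only the gadget at 150.00.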
GROUP BY Missing Columns
Every column in SELECT that is not inside an aggregate function must appear in GROUP BY:
-- Incorrect: SQLite accepts this, but 'name' is taken from an arbitrary row in each group
SELECT department, name, AVG(salary)
FROM employees
GROUP BY department; -- 'name' is missing from GROUP BY
-- Correct: either add name to GROUP BY
SELECT department, name, AVG(salary)
FROM employees
GROUP BY department, name;
-- Or aggregate name if you want one row per department
SELECT department, GROUP_CONCAT(name) as employees, AVG(salary)
FROM employees
GROUP BY department;
Confusing WHERE and HAVING
-- WRONG: Using WHERE with an aggregate
SELECT department, COUNT(*) as count
FROM employees
WHERE COUNT(*) > 5 -- Error: cannot use aggregate in WHERE
GROUP BY department;
-- CORRECT: Use HAVING for aggregate conditions
SELECT department, COUNT(*) as count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
NULL Comparison Errors
In SQL, NULL is not equal to anything, including itself:
-- This finds no rows, even if the status column has NULLs
SELECT * FROM orders WHERE status = NULL; -- Wrong: '= NULL' evaluates to NULL, never TRUE
-- Correct null comparison
SELECT * FROM orders WHERE status IS NULL;
SELECT * FROM orders WHERE status IS NOT NULL;
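A quick sketch with Python's sqlite3 confirms the behavior (table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "shipped"), (2, None), (3, None)])

# '= NULL' evaluates to NULL for every row, so nothing matches
eq_null = conn.execute(
    "SELECT id FROM orders WHERE status = NULL").fetchall()

# 'IS NULL' is the correct test and finds both NULL rows
is_null = conn.execute(
    "SELECT id FROM orders WHERE status IS NULL").fetchall()
```

The equality version returns an empty result even though two rows have NULL status; IS NULL returns both.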
JOIN Direction Errors
LEFT JOIN returns all rows from the left table plus matched rows from the right. Swapping the table order changes which table “keeps all rows”:
-- All customers, even those with no orders
SELECT c.name, o.amount
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;
-- This is different: all orders, even those with no matching customer
SELECT c.name, o.amount
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id;
When you want to preserve all rows from a specific table and join optional data from another, put the “all rows” table on the left side of a LEFT JOIN.
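The asymmetry is easy to demonstrate. A sketch with Python's sqlite3, using hypothetical customers and orders tables where one customer has no orders and one order has no matching customer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben');
INSERT INTO orders VALUES (1, 50.0), (1, 25.0), (99, 10.0);
""")

# Customers on the left: Ben survives with a NULL amount;
# the orphan order for customer 99 is dropped
keep_customers = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
""").fetchall()

# Orders on the left: the orphan order survives with a NULL name;
# Ben (who has no orders) is dropped
keep_orders = conn.execute("""
    SELECT c.name, o.amount
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
""").fetchall()
```

Same two tables, same join condition, different surviving rows; only the choice of left table changed.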
SQL in the Context of the Full Data Stack
Understanding where SQL fits within a complete data analysis workflow helps you use each tool appropriately.
The Data Analysis Stack
A complete data analysis workflow typically moves through these stages:
Data collection: Data arrives in CSV from a system export, an API pull, a form submission, a spreadsheet export, or any other source.
Data inspection: A quick look at the data structure, identifying columns, checking for obvious issues, understanding scale. ReportMedic’s Data Profiler handles this efficiently.
Data cleaning: Fixing quality issues: trimming whitespace, normalizing formats, removing duplicates, filling gaps. ReportMedic’s Clean Data tool handles standard cleaning operations.
Data transformation: Complex programmatic transformations that require looping logic, regex processing, or joining logic beyond SQL’s convenience. ReportMedic’s Python Code Runner handles this.
Data analysis: Filtering, aggregation, ranking, joining, and computing derived metrics. The SQL Query tool is purpose-built for this stage.
Result sharing: Exporting query results as CSV for charts, reports, or distribution. The SQL tool’s CSV export feeds into this.
When to Use SQL vs. Python for Analysis
SQL and Python overlap significantly for data analysis tasks. Both can filter, aggregate, join, and transform data. Use these guidelines to choose between them for a given task:
SQL is the natural choice when:
The data is already in tabular CSV form and needs filtering, grouping, and joining
The analysis question is expressible in set-based declarative terms
The result is a summary table or report
Multiple people need to read and understand the analysis logic
Python is the natural choice when:
The transformation requires iteration logic (loops over rows with conditional state)
The analysis uses mathematical operations from NumPy or SciPy
The data requires text processing with regex across many records
The workflow involves API calls or web scraping alongside data manipulation
The output is a visualization, a model, or a complex object rather than a table
In practice, many workflows use both: Python for preprocessing and transformation, SQL for analysis and aggregation.
Exporting for Visualization
SQL query results exported as CSV can feed directly into visualization tools. The workflow:
Write SQL to produce a clean, aggregated summary (monthly revenue by region, top 10 products by category, etc.)
Export the result as CSV
Load the CSV into a visualization tool (Excel, Google Sheets, Tableau, Flourish, Datawrapper)
The SQL step ensures the visualization tool receives clean, correctly aggregated data rather than raw individual records. This makes the visualization simpler to build and easier to maintain: when the underlying data changes, re-run the SQL and re-load the exported CSV.
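The same aggregate-then-export step can be scripted outside the browser too. A sketch with Python's sqlite3 and csv modules (the table, columns, and data are hypothetical):

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount TEXT);
INSERT INTO sales VALUES ('East', '100'), ('East', '50'), ('West', '75');
""")

# The SQL does the aggregation; the exported CSV carries only the clean summary
cur = conn.execute("""
    SELECT region, SUM(CAST(amount as REAL)) as revenue
    FROM sales
    GROUP BY region
    ORDER BY region
""")

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([col[0] for col in cur.description])  # header from cursor metadata
writer.writerows(cur.fetchall())
csv_text = buf.getvalue()
```

The resulting CSV has one row per region with the summed revenue, ready to drop into any charting tool.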
Frequently Asked Questions (Continued)
Can I use SQL to create new tables from query results?
Yes. SQLite supports CREATE TABLE AS SELECT, which creates a new table populated with the result of a query:
CREATE TABLE high_value_customers AS
SELECT customer_id, customer_email, SUM(CAST(amount as REAL)) as ltv
FROM orders
WHERE status = 'completed'
GROUP BY customer_id, customer_email
HAVING SUM(CAST(amount as REAL)) > 1000;
The new table lives in the session's in-memory database: it can be queried in subsequent SQL and disappears when the page is reloaded.
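A sketch of the pattern in Python's sqlite3, with a hypothetical orders table, showing the new table queried afterward like any other:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, customer_email TEXT,
                     amount TEXT, status TEXT);
INSERT INTO orders VALUES
  (1, 'a@x.com', '600',  'completed'),
  (1, 'a@x.com', '700',  'completed'),
  (2, 'b@x.com', '300',  'completed'),
  (3, 'c@x.com', '5000', 'refunded');

-- The query result becomes a real table in the session database
CREATE TABLE high_value_customers AS
SELECT customer_id, customer_email, SUM(CAST(amount as REAL)) as ltv
FROM orders
WHERE status = 'completed'
GROUP BY customer_id, customer_email
HAVING SUM(CAST(amount as REAL)) > 1000;
""")

# The new table can be queried by subsequent SQL
rows = conn.execute(
    "SELECT customer_email, ltv FROM high_value_customers").fetchall()
```

Only customer 1 clears the 1000 threshold on completed orders (600 + 700 = 1300); the refunded 5000 order never enters the sum because WHERE filters before grouping.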
How do I handle CSVs with inconsistent column names across files?
When joining CSVs from different sources, column names may not match exactly. Two approaches: use aliases in the query to normalize names, or use a CASE expression to map variant names to a standard. For systematic column name normalization across many files, ReportMedic’s Auto-Map Columns tool handles bulk renaming before loading into the SQL tool.
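A sketch of the alias approach in Python's sqlite3, assuming two loaded tables whose email columns are named differently (all names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crm (cust_email TEXT, name TEXT);
CREATE TABLE billing (email_address TEXT, balance REAL);
INSERT INTO crm VALUES ('a@x.com', 'Ada');
INSERT INTO billing VALUES ('a@x.com', 42.0);
""")

# The join condition bridges the mismatched names directly;
# the alias normalizes the column name in the output
rows = conn.execute("""
    SELECT crm.cust_email AS email, crm.name, billing.balance
    FROM crm
    JOIN billing ON crm.cust_email = billing.email_address
""").fetchall()
```

The result carries a single normalized `email` column even though neither source table used that name.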
What is the maximum CSV size the SQL tool can handle?
The practical limit depends on your device’s available memory. The SQL tool loads the entire CSV into memory as a SQLite database. Modern devices with 8GB+ RAM can comfortably handle CSV files of several hundred megabytes (millions of rows for typical column counts). For very large files (gigabytes), desktop SQLite tools or cloud-based SQL environments handle the volume better. For most business CSV analysis (sales exports, HR data, marketing reports), the browser tool handles the file sizes comfortably.
How do I query dates stored as Unix timestamps?
Unix timestamps (integer seconds since January 1, 1970) are common in data exports from systems that use epoch time internally. SQLite handles Unix timestamps with datetime() and strftime():
-- Convert Unix timestamp to readable date
SELECT
order_id,
datetime(created_at, 'unixepoch') as created_date,
strftime('%Y-%m', datetime(created_at, 'unixepoch')) as year_month
FROM orders;
-- Filter by date range with Unix timestamps
SELECT * FROM events
WHERE created_at >= strftime('%s', '2024-01-01')
AND created_at < strftime('%s', '2024-04-01');
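These conversions can be checked in any SQLite environment. A sketch in Python's sqlite3, using 1704067200, the epoch value for midnight UTC on 2024-01-01:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# datetime(..., 'unixepoch') renders epoch seconds as UTC date-time text;
# strftime('%s', ...) goes the other way, producing epoch seconds as text
row = conn.execute("""
    SELECT datetime(1704067200, 'unixepoch'),
           strftime('%Y-%m', datetime(1704067200, 'unixepoch')),
           strftime('%s', '2024-01-01')
""").fetchone()
```

Note that strftime('%s', ...) returns text, so when filtering a text-typed CSV timestamp column, wrapping both sides in CAST keeps the comparison numeric.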
A Closing Note on SQL as a Career Skill
SQL is consistently listed among the most valuable technical skills in data-related job postings. Data analyst, business analyst, product analyst, data scientist, data engineer, and marketing analyst roles all list SQL as either required or preferred. This is not a trend but a reflection of the reality that SQL is how data professionals interrogate databases at every organization that has data (which is every organization).
Learning SQL with browser-based tools removes the infrastructure barrier that previously required either organizational access to a database or the overhead of setting up a local database server. The ReportMedic SQL Query tool makes SQL immediately accessible: bring any CSV, write any query, see results in seconds.
The SQL learned in this browser-based environment transfers directly to production databases. The queries that work against a CSV in SQLite are the same queries that work against PostgreSQL, MySQL, BigQuery, and Snowflake tables, with minor dialect variations for database-specific functions. Browser-based practice builds the skills that production environments use.
Start with the queries in this guide. Modify them for your own data. Build your personal library. The syntax that feels unfamiliar today becomes second nature after a few dozen queries against data you actually care about.
Quick SQL Reference: The Most Used Constructs
For quick reference while writing queries, here are the constructs used most frequently:
-- Basic SELECT with filter and sort
SELECT col1, col2, col3
FROM table_name
WHERE condition
ORDER BY col1 DESC
LIMIT 10;
-- Aggregation
SELECT group_col, COUNT(*), SUM(metric), AVG(metric)
FROM table_name
GROUP BY group_col
HAVING COUNT(*) > 1;
-- INNER JOIN
SELECT a.col, b.col
FROM table_a a
JOIN table_b b ON a.id = b.foreign_id;
-- LEFT JOIN (all from left, matched from right)
SELECT a.col, COALESCE(b.col, 'default') as b_col
FROM table_a a
LEFT JOIN table_b b ON a.id = b.foreign_id;
-- CTE
WITH cte_name AS (
SELECT ...
)
SELECT * FROM cte_name WHERE ...;
-- Window function
SELECT col,
ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY sort_col) as rn,
SUM(metric) OVER (PARTITION BY group_col) as group_total,
LAG(metric) OVER (ORDER BY date_col) as prev_value
FROM table_name;
-- CASE WHEN
SELECT
CASE WHEN value > 100 THEN 'High'
WHEN value > 50 THEN 'Mid'
ELSE 'Low'
END as category
FROM table_name;
-- Conditional aggregation
SELECT
SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) as active_count,
SUM(CASE WHEN status = 'inactive' THEN 1 ELSE 0 END) as inactive_count
FROM table_name;
-- Date operations (SQLite)
SELECT
strftime('%Y-%m', date_col) as year_month,
date('now', '-30 days') as thirty_days_ago,
julianday(end_date) - julianday(start_date) as days_elapsed
FROM table_name;
These constructs handle the vast majority of real-world CSV analysis tasks. Master them in the browser tool and apply them anywhere SQL runs.
What Comes Next After Browser SQL
Once you are comfortable with SQL on CSV files in the browser, the path to more powerful SQL environments is straightforward.
Local SQLite: Download the SQLite command-line shell (free, tiny, no installation steps beyond downloading a single executable). Connect to any .db file, run queries, import CSVs. All the SQL you learned in the browser works identically.
DB Browser for SQLite: A free, open-source graphical application for SQLite. Provides a full GUI for creating databases, importing CSVs, running queries, and visualizing schema. The same SQL syntax, with a more complete application interface.
PostgreSQL: The most capable open-source relational database. Available as a local installation or as a managed cloud service (Supabase, Neon, Render, AWS RDS). Adds features beyond SQLite: full-text search, JSON operators, more statistical functions, and enterprise-scale performance. The SQL from this guide transfers almost directly; a few SQLite-specific functions have PostgreSQL equivalents.
BigQuery, Snowflake, Redshift: Cloud data warehouse platforms. SQL dialects are mostly standard with platform-specific extensions. Data volumes scale to billions of rows. Used at organizations with significant data infrastructure.
The browser-based SQL tool is the entry point. The SQL itself is the skill. Both travel forward.
