Prepare for data analyst interviews with SQL, dashboards, metrics, Excel, analytics case studies, experimentation, data quality, and stakeholder communication questions.
Data analyst interviews test whether you can turn messy business questions into trustworthy analysis. The strongest candidates combine SQL fluency, metric judgment, data quality discipline, clear communication, and practical business reasoning.
3-5
Typical interview rounds
45-60 min
SQL / case round length
5+
Core analyst skill areas
3-6 wks
Recommended prep window
What data analyst interviewers are evaluating
—
SQL fluency: can you join, filter, aggregate, window, and debug queries accurately?
—
Metric judgment: can you define meaningful metrics instead of blindly reporting numbers?
—
Business reasoning: can you connect data findings to customer behavior, revenue, risk, or operations?
—
Data quality discipline: can you identify missing data, duplicates, outliers, tracking gaps, and biased samples?
—
Dashboard judgment: can you design reporting that helps teams make decisions rather than admire charts?
—
Communication: can you explain assumptions, confidence, limitations, and recommendations clearly to non-technical stakeholders?
Great analysts do not just answer the query
A strong data analyst answer explains what question is being asked, whether the data can actually answer it, what assumptions are required, and what decision the analysis should support. Accuracy matters, but judgment matters just as much.
Data Analyst Interview Process
Most data analyst loops include a recruiter screen, SQL assessment, analytics case study, dashboard or visualization discussion, stakeholder communication round, and behavioral interview.
Typical data analyst interview stages
1
Recruiter screen: confirms role fit, tools, domain experience, compensation range, and motivation.
2
Hiring manager screen: covers previous analysis work, stakeholder experience, business impact, and communication style.
3
SQL round: tests joins, aggregations, filtering, window functions, date logic, null handling, and query debugging.
4
Analytics case round: asks you to investigate a metric movement, evaluate a product change, or recommend a business action.
5
Dashboard or visualization round: tests whether you can select the right metrics, chart types, filters, and reporting cadence.
6
Behavioral round: evaluates ownership, ambiguity, stakeholder management, prioritization, and how you handle flawed data.
SQL round vs. analytics case round
Main question
SQL round: can you retrieve and transform the right data accurately?
Analytics case round: can you reason from data to a decision?
Strong signal
SQL round: correct joins, clean filters, the right aggregation level, and window functions when needed.
Analytics case round: a clear hypothesis tree, segmentation, metric definitions, limitations, and a recommendation.
Common mistake
SQL round: joining at the wrong grain and accidentally duplicating rows.
Analytics case round: jumping to a conclusion before validating data quality and segmenting the movement.
Best preparation
SQL round: practice realistic schemas, dates, cohorts, retention, funnel queries, and debugging.
Analytics case round: practice metric diagnosis, experiment readouts, dashboard critique, and stakeholder summaries.
The grain of the data is usually the trap
Many analyst interview mistakes come from joining tables at incompatible levels: user-level, order-level, item-level, session-level, or event-level. Before writing SQL, identify the unit of analysis and what each row represents.
SQL Interview Foundation
SQL interviews for data analysts are less about obscure syntax and more about correct thinking: grain, joins, filters, aggregation level, dates, nulls, and validating whether the result makes sense.
A reliable SQL interview flow
1
Clarify the business question and final output columns before writing the query.
2
Identify the grain: one row per user, order, item, session, event, account, or day.
3
Inspect which tables are needed and how they join. State whether joins are one-to-one, one-to-many, or many-to-many.
4
Choose filters carefully, especially date filters, status filters, test users, canceled orders, refunds, and internal accounts.
5
Aggregate only after the grain is correct. If needed, use CTEs to pre-aggregate before joining.
6
Use window functions for ranking, rolling metrics, deduplication, and previous/next event logic.
7
Sanity-check the result: row count, distinct counts, totals, nulls, duplicates, and whether the metric direction makes business sense.
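As a minimal sketch of steps 5 and 7, assuming a toy users/orders schema (all table and column names here are hypothetical), pre-aggregating to the user grain in a CTE before joining prevents row duplication, and the final checks confirm the grain survived:

```python
import sqlite3

# Hypothetical toy schema: orders is order-grain, users is user-grain.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users  (user_id INTEGER, signup_date TEXT);
CREATE TABLE orders (order_id INTEGER, user_id INTEGER, status TEXT, revenue REAL);
INSERT INTO users  VALUES (1,'2024-01-03'), (2,'2024-01-10');
INSERT INTO orders VALUES (10,1,'completed',50.0), (11,1,'canceled',20.0),
                          (12,2,'completed',30.0);
""")

# Step 5: pre-aggregate orders to the user grain in a CTE before joining,
# so the one-to-many join cannot duplicate user rows.
rows = con.execute("""
WITH user_rev AS (
    SELECT user_id, SUM(revenue) AS revenue     -- aggregate at the user grain
    FROM orders
    WHERE status = 'completed'                  -- filter before aggregating
    GROUP BY user_id
)
SELECT u.user_id, COALESCE(r.revenue, 0) AS revenue
FROM users u
LEFT JOIN user_rev r ON r.user_id = u.user_id
ORDER BY u.user_id
""").fetchall()

# Step 7: sanity checks -- row count matches the user table, totals make sense.
assert len(rows) == 2                       # one row per user (grain preserved)
assert sum(r[1] for r in rows) == 80.0      # canceled order excluded
print(rows)                                 # [(1, 50.0), (2, 30.0)]
```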
SQL concepts analysts must know
Grain
The level represented by each row in a table or query result. Getting the grain wrong can duplicate revenue, users, orders, or events.
Left join
Keeps all records from the left table even when the right table has no match. Useful when you need to preserve users, orders, or dates with missing related activity.
Window function
Calculates a value across related rows without collapsing them. Common for ranking, running totals, lag/lead, deduplication, and cohort analysis.
Cohort
A group of users or entities sharing a start event or attribute, such as signup month, first purchase date, acquisition channel, or plan type.
✓ Do
—
Define the row grain before joining tables
—
Use CTEs to make multi-step logic readable
—
Check whether filters belong in WHERE or JOIN conditions
—
Use DISTINCT carefully and explain why it is needed
—
Validate output with row counts and sample records
✗ Don't
—
Use SELECT DISTINCT to hide duplicate bugs
—
Join item-level revenue directly to user-level data without pre-aggregation
—
Accidentally turn a left join into an inner join by filtering on right-table columns in WHERE
—
Ignore timezone and date boundary assumptions
—
Return a metric without explaining what decision it supports
SQL Interview Questions
The best way to prepare for analyst SQL is to practice realistic business questions. Focus on users, orders, events, subscriptions, retention, revenue, and funnel movement.
Framework — Filter -> join -> group -> order -> limit
Assume we have customers(customer_id, name) and orders(order_id, customer_id, order_date, status, revenue). I would first clarify whether revenue should include canceled or refunded orders. Usually, we should include completed orders only.
The query filters orders to the last 30 days, filters to completed status, groups by customer, sums revenue, orders descending, and limits to 5. A clean version would use a CTE for eligible orders if the logic becomes more complex.
Important details: use the correct date column, avoid joining to item-level tables unless pre-aggregated, and decide how to handle ties. If the database supports it, use a ranking window function when ties should be included rather than a simple LIMIT 5.
A strong answer also validates the result by checking total eligible revenue, row count, and whether any top customer has suspiciously duplicated order rows.
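A sketch of that query, assuming the schema described above and a fixed reference date so the 30-day window is deterministic (run here via SQLite, whose date syntax differs slightly from other engines):

```python
import sqlite3

# Hypothetical schema from the answer: customers(customer_id, name),
# orders(order_id, customer_id, order_date, status, revenue).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                     order_date TEXT, status TEXT, revenue REAL);
INSERT INTO customers VALUES (1,'Ada'), (2,'Ben'), (3,'Cy');
INSERT INTO orders VALUES
  (10,1,'2024-03-20','completed',120.0),
  (11,1,'2024-03-25','refunded', 999.0),   -- refunded: must not count
  (12,2,'2024-03-22','completed', 80.0),
  (13,3,'2024-01-01','completed',500.0);   -- outside the 30-day window
""")

# Filter -> join -> group -> order -> limit, anchored to a fixed "today".
top = con.execute("""
SELECT c.customer_id, c.name, SUM(o.revenue) AS revenue
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
  AND o.order_date >= date('2024-04-01', '-30 days')
GROUP BY c.customer_id, c.name
ORDER BY revenue DESC
LIMIT 5
""").fetchall()

print(top)   # [(1, 'Ada', 120.0), (2, 'Ben', 80.0)]
```

Swapping LIMIT 5 for RANK() in a CTE would keep ties, as the answer notes.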
First define retention precisely. I would define the January cohort as users whose signup date is on or after January 1 and before February 1. A user is 7-day retained if they have at least one qualifying activity event on the 7th day after signup, or within days 1-7 if the company defines retention as returning within 7 days. Clarify this before querying.
For exact day-7 retention, create a cohort CTE with user_id and signup_date. Then left join events on user_id, with the date condition (event_date = signup_date + 7 days) and the meaningful-event filter placed in the join condition rather than in WHERE, so users with no matching activity are kept. Count distinct cohort users as the denominator and distinct users with a matching event as the numerator.
The key is to preserve users with no activity by using a left join. If you use an inner join, you remove non-retained users and inflate retention. Also avoid multiple events duplicating retained users by counting distinct user_id.
Final metric: retained_users / cohort_users. Segment by acquisition channel, platform, geography, or signup week if the interviewer asks for diagnosis.
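A minimal runnable sketch of exact day-7 retention, with hypothetical users/events tables; note that the activity conditions sit in the LEFT JOIN, not in WHERE:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users  (user_id INTEGER, signup_date TEXT);
CREATE TABLE events (user_id INTEGER, event_date TEXT, event_name TEXT);
INSERT INTO users VALUES (1,'2024-01-05'), (2,'2024-01-10'), (3,'2024-01-20');
INSERT INTO events VALUES
  (1,'2024-01-12','session_start'),   -- exactly day 7 after signup: retained
  (1,'2024-01-12','session_start'),   -- duplicate event: must not double count
  (2,'2024-01-13','session_start');   -- day 3, not day 7: not retained
""")

# Exact day-7 retention for the January cohort. Putting the date and event
# conditions in the LEFT JOIN (not WHERE) preserves non-retained users.
num, den = con.execute("""
WITH cohort AS (
    SELECT user_id, signup_date
    FROM users
    WHERE signup_date >= '2024-01-01' AND signup_date < '2024-02-01'
)
SELECT COUNT(DISTINCT e.user_id) AS retained,
       COUNT(DISTINCT c.user_id) AS cohort_size
FROM cohort c
LEFT JOIN events e
  ON e.user_id = c.user_id
 AND e.event_date = date(c.signup_date, '+7 days')
 AND e.event_name = 'session_start'
""").fetchone()

print(num, den)   # 1 retained out of a cohort of 3
```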
Likely follow-ups
What is the difference between day-7 retention and rolling 7-day retention?
How would you handle users who signed up less than 7 days ago?
What events should count as active?
Framework — Rank purchases per user
Use a window function to rank completed purchases for each user by purchase timestamp. Filter to completed purchases first, then apply ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY purchase_at). The second purchase is the row where row_number = 2.
The filtering order matters. If canceled or refunded purchases should not count, remove them before ranking. If multiple purchases have the same timestamp, add a deterministic tie-breaker such as order_id.
The output can include user_id and second_purchase_at. If the interviewer asks for users who made at least two purchases, return only rows with rank 2. If they ask for all users and null for users without a second purchase, left join the result back to the users table.
Query execution is the database optimizer's job, but in practical terms this pattern benefits from indexes on user_id and the purchase timestamp for large tables.
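A sketch under those assumptions, with a hypothetical purchases table (window functions require SQLite 3.25+, which ships with recent Python versions):

```python
import sqlite3   # bundled SQLite 3.25+ supports window functions

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE purchases (order_id INTEGER, user_id INTEGER,
                        purchase_at TEXT, status TEXT);
INSERT INTO purchases VALUES
  (1,1,'2024-01-01 10:00','completed'),
  (2,1,'2024-01-05 09:00','canceled'),    -- excluded before ranking
  (3,1,'2024-01-09 12:00','completed'),   -- the real second purchase
  (4,2,'2024-02-01 08:00','completed');   -- only one purchase: no row
""")

second = con.execute("""
WITH ranked AS (
    SELECT user_id, purchase_at,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY purchase_at, order_id   -- order_id breaks timestamp ties
           ) AS rn
    FROM purchases
    WHERE status = 'completed'                  -- filter BEFORE ranking
)
SELECT user_id, purchase_at AS second_purchase_at
FROM ranked
WHERE rn = 2
""").fetchall()

print(second)   # [(1, '2024-01-09 12:00')]
```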
Likely follow-ups
How would you include users with no second purchase?
What if two purchases happened at the same timestamp?
How would you calculate days between first and second purchase?
Framework — One row per user with step timestamps
I would create one row per user with the earliest timestamp for each funnel step. This avoids double-counting users who trigger the same event multiple times. Use conditional aggregation over the events table: min(case when event_name = 'signup' then event_at end), min(case when event_name = 'email_verified' then event_at end), and so on.
Then enforce sequence if required. For example, email verification should happen after signup, onboarding after verification, and first purchase after onboarding. If the business wants loose conversion regardless of order, state that assumption explicitly.
Conversion metrics: signup to verification, verification to onboarding, onboarding to purchase, and overall signup to purchase. Use distinct users at each step as numerator and the previous step as denominator. Segment by platform, acquisition channel, device, geography, and cohort date to identify where drop-off is concentrated.
Important edge cases: duplicated events, users skipping steps, timezone boundaries, bot/test accounts, and late-arriving event data.
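The conditional-aggregation step can be sketched like this, with a hypothetical events table and loose (order-agnostic) conversion, as the answer above allows when stated explicitly:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (user_id INTEGER, event_name TEXT, event_at TEXT);
INSERT INTO events VALUES
  (1,'signup','2024-01-01'), (1,'email_verified','2024-01-02'),
  (1,'email_verified','2024-01-03'),        -- duplicate: MIN keeps the first
  (1,'first_purchase','2024-01-05'),
  (2,'signup','2024-01-01'),                -- drops off after signup
  (3,'signup','2024-01-02'), (3,'email_verified','2024-01-04');
""")

# One row per user with the earliest timestamp for each funnel step,
# then count users per step (COUNT ignores NULLs).
funnel = con.execute("""
WITH per_user AS (
    SELECT user_id,
           MIN(CASE WHEN event_name = 'signup'         THEN event_at END) AS signed_up,
           MIN(CASE WHEN event_name = 'email_verified' THEN event_at END) AS verified,
           MIN(CASE WHEN event_name = 'first_purchase' THEN event_at END) AS purchased
    FROM events
    GROUP BY user_id
)
SELECT COUNT(signed_up) AS signups,
       COUNT(verified)  AS verifications,
       COUNT(purchased) AS purchases
FROM per_user
""").fetchone()

print(funnel)   # (3, 2, 1): 3 signups -> 2 verified -> 1 purchased
```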
Likely follow-ups
How would you enforce event order?
How would you diagnose a sudden drop in onboarding completion?
How would you visualize this funnel?
Metrics and Business Case Questions
Analytics case interviews test whether you can define the right metric, diagnose movement, segment intelligently, and recommend an action with appropriate confidence.
First validate whether the drop is real. Check instrumentation changes, data pipeline delays, timezone issues, bot filtering, app release changes, and whether the drop appears across multiple dashboards or raw tables.
Then segment the drop: platform, app version, geography, acquisition channel, user tenure, paid versus free users, device type, and traffic source. A global drop suggests a tracking, infrastructure, or broad product issue. A narrow drop points to a platform, release, market, or channel.
Next inspect the user journey. Did app opens drop, login success drop, homepage loads fail, notifications decline, or core actions decline after users arrived? If app opens are stable but core activity is down, the problem is inside the product. If app opens are down, look at notifications, acquisition, seasonality, outages, or external events.
Finally, recommend action based on root cause. If iOS DAU dropped after a release and crash rate increased, rollback or hotfix. If only paid acquisition users dropped, check campaign spend and attribution. If data is delayed, communicate the limitation before creating false urgency.
Likely follow-ups
What chart would you look at first?
How would you distinguish seasonality from a product issue?
Start with the goal. A recommendation feature might aim to increase discovery, engagement, conversion, retention, or order value. The primary metric should reflect the intended user value, not just clicks. For example, for ecommerce recommendations, a strong metric could be purchases or add-to-cart actions from recommended items per active user.
Input metrics include recommendation impressions, click-through rate, add-to-cart rate, conversion rate, revenue per session, coverage, and diversity. Guardrails include returns, refunds, low-quality clicks, page latency, user complaints, and cannibalization of organic discovery.
Segment by new versus returning users, category, device, traffic source, and recommendation surface. A model may improve average performance while hurting new users or long-tail product discovery.
I would evaluate the feature with an A/B test if possible, then monitor longer-term retention and repeat purchase behavior. A short-term click lift is not enough if users are being nudged toward irrelevant or low-satisfaction items.
Revenue is usually traffic times conversion rate times average order value, with additional effects from product mix, pricing, repeat purchase, and refunds. If revenue increased while conversion decreased, several explanations are possible.
Traffic volume may have increased enough to offset lower conversion. Average order value may have risen due to pricing changes, bundles, enterprise customers, larger carts, or product mix shifting toward expensive items. The company may have reduced low-intent traffic, causing fewer conversions by rate but more valuable purchases. There could also be tracking changes affecting either revenue or conversion.
I would segment by channel, product, geography, new versus returning users, customer tier, and device. Then decompose revenue into sessions, conversion, AOV, refund rate, and repeat purchases. I would also check whether conversion is measured at session, user, or visitor level because denominator changes can create misleading trends.
The recommendation depends on quality. If revenue growth comes from healthier high-value customers, lower conversion may be acceptable. If it comes from a one-time price increase while new customer conversion is weakening, that may be a future growth risk.
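A toy decomposition with made-up numbers shows how the arithmetic can reconcile rising revenue with falling conversion:

```python
# Hypothetical numbers: revenue = sessions * conversion_rate * average_order_value.
before = {"sessions": 100_000, "conversion": 0.040, "aov": 50.0}
after  = {"sessions": 100_000, "conversion": 0.034, "aov": 65.0}

def revenue(p):
    return p["sessions"] * p["conversion"] * p["aov"]

rev_before, rev_after = revenue(before), revenue(after)
print(round(rev_before), round(rev_after))   # 200000 221000

# Conversion fell 15% but AOV rose 30%, so revenue still grew about 10%.
assert after["conversion"] < before["conversion"]
assert rev_after > rev_before
```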
Likely follow-ups
How would you build a revenue decomposition dashboard?
What if paid traffic changed significantly?
When is lower conversion acceptable?
Dashboards and Data Visualization
Dashboard questions test whether you know how to communicate data. A good dashboard is not a collection of charts; it is a decision tool for a specific audience.
First define the audience and decisions. Executives need a high-level view of growth, retention, monetization, and risk. They do not need every operational detail on the first screen.
Top-level metrics: monthly recurring revenue, net revenue retention, gross revenue retention, new MRR, expansion MRR, contraction MRR, churned MRR, active customers, trial-to-paid conversion, ARPU, CAC payback if available, and forecast versus target.
Useful cuts: customer segment, acquisition channel, plan, geography, company size, cohort month, and sales-assisted versus self-serve. Visuals should include trend lines, cohort retention, MRR bridge, churn reasons, and target variance.
Design principles: show definitions, freshness timestamp, filters, owner, and alert thresholds. Avoid vanity metrics and avoid mixing user counts with revenue metrics without clear labels. The dashboard should answer: are we growing, why, where is risk, and what should leadership investigate next?
I would first clarify what decision they are trying to make. Sometimes a stakeholder asks for a specific chart because they already have a narrative in mind, but the real need is a business answer.
Then I would explain the risk clearly and non-defensively. For example, a cumulative revenue chart may always go up and hide a recent slowdown. A pie chart with too many categories may obscure differences. A chart without confidence intervals may overstate precision. A conversion rate without traffic mix may mislead.
I would propose an alternative that answers the same question more accurately: trend line, cohort chart, funnel, distribution, segmented bar chart, or metric decomposition. If they still need the original chart, I might include it with caveats, but I would not present misleading analysis as my recommendation.
The goal is to preserve trust. Analysts should be helpful, but they also own analytical integrity.
Likely follow-ups
How would you handle pressure from a senior stakeholder?
What chart types are commonly misused?
How do you communicate uncertainty visually?
Dashboard design principles
—
Start with the decision and audience, not the chart type.
—
Define every metric directly in or near the dashboard.
—
Show trend, target, and segment whenever possible.
—
Include freshness, data source, owner, and known caveats.
—
Avoid overloading one dashboard with executive, product, finance, and operations needs at once.
—
Use alerts only for metrics that require action, not every metric that moves.
Excel and Spreadsheet Questions
Many data analyst roles still rely heavily on spreadsheets. Interviewers may test formulas, pivots, cleanup, reconciliation, and whether you can build a model that others can audit.
First profile the data: row count, column names, missing values, duplicates, data types, impossible values, date formats, and outliers. I would preserve a raw copy before making transformations.
Then standardize fields: trim whitespace, normalize casing, parse dates, convert currencies or units, split combined fields if needed, and map inconsistent categories to a controlled list. For duplicates, define the business key before removing anything.
Next validate totals against a trusted source. For example, total revenue should reconcile to finance exports, order counts should match source systems, and date ranges should be complete. I would create checks for null rates, distinct counts, and category values.
Finally, document transformations. A spreadsheet used for decision-making should make assumptions visible, separate raw data from cleaned data, and avoid hidden manual edits that cannot be audited.
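The same cleanup steps translate directly outside the spreadsheet; a minimal sketch with hypothetical rows, showing trimming, casing normalization, dedup on a declared business key, and a reconciliation check:

```python
# Hypothetical raw export: messy whitespace, inconsistent casing, a duplicate key.
raw = [
    {"order_id": "A-1", "country": " us ", "revenue": "100.00"},
    {"order_id": "A-1", "country": "US",   "revenue": "100.00"},  # duplicate key
    {"order_id": "A-2", "country": "De",   "revenue": "40.50"},
]

cleaned, seen = [], set()
for row in raw:
    key = row["order_id"].strip()      # the business key defines "duplicate"
    if key in seen:
        continue                       # keep first occurrence, drop repeats
    seen.add(key)
    cleaned.append({
        "order_id": key,
        "country": row["country"].strip().upper(),   # standardize text fields
        "revenue": float(row["revenue"]),            # parse numbers from text
    })

total = sum(r["revenue"] for r in cleaned)
assert total == 140.50                 # reconcile against the trusted source
print(cleaned)
```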
Likely follow-ups
How do you handle duplicate rows?
What spreadsheet formulas do you use most often?
How would you make the workbook auditable?
Framework — Exploration versus controlled calculation
Pivot tables are excellent for quick exploration, grouping, slicing, and summarizing data across dimensions. They are useful when the question is about totals, counts, averages, or trends by category and when stakeholders want interactive filtering.
Formula-based analysis is better when logic is custom, multi-step, auditable, or needs precise control. Examples include cohort calculations, waterfall models, weighted scoring, exception flags, reconciliation checks, or inputs feeding a forecast.
In practice, I often use both: pivot tables to explore patterns quickly, then formulas or a cleaner model to produce the final answer. For recurring reporting, I prefer a reproducible query or BI pipeline over a fragile manual spreadsheet.
Likely follow-ups
What are the risks of pivot tables?
How would you explain VLOOKUP versus INDEX/MATCH or XLOOKUP?
When should analysis move out of Excel into SQL or BI?
Spreadsheet skills to be ready for
XLOOKUP / INDEX MATCH
Used to join values from one table into another. Analysts should understand lookup keys, missing matches, duplicate keys, and exact versus approximate matching.
Pivot table
A fast way to aggregate and slice data by dimensions. Strong analysts know how to validate the source range, aggregation type, and filters.
Reconciliation
The process of confirming that analysis totals match a trusted source, such as finance, product analytics, CRM, or billing systems.
Experimentation and A/B Testing
Data analysts are often asked to design, read, or critique experiments. The important skill is knowing what conclusion the data supports and what it does not support.
Hypothesis: the redesigned checkout page reduces friction and increases completed purchases without creating negative downstream effects.
First confirm experiment setup: randomization unit, sample size, duration, exposure logging, eligibility, and whether users stay in the same variant across sessions. Randomizing by session instead of user could contaminate results if users return.
Primary metric: checkout conversion rate from checkout start to completed purchase. Secondary metrics: payment failure rate, time to checkout, average order value, add-on attachment, and return visits. Guardrails: refunds, chargebacks, support tickets, latency, error rate, and customer complaints.
Analysis should include statistical significance and practical significance. A tiny lift may not justify engineering complexity. Segment analysis can reveal whether mobile improved while desktop declined, but avoid overreacting to noisy subgroups.
Recommendation: ship if the primary metric improves meaningfully, guardrails are healthy, instrumentation is trustworthy, and the effect persists across the full test period.
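For the statistical-significance piece, a two-proportion z-test is one common approach (the method and all counts here are assumptions for illustration, not prescribed by the question):

```python
import math

# Hypothetical checkout test counts: control vs redesigned variant.
n_c, x_c = 20_000, 1_200     # control: 6.0% checkout conversion
n_t, x_t = 20_000, 1_340     # treatment: 6.7%

p_c, p_t = x_c / n_c, x_t / n_t
p_pool = (x_c + x_t) / (n_c + n_t)                      # pooled rate under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
z = (p_t - p_c) / se

# Two-sided p-value from the normal CDF (math.erf is in the stdlib).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(round(p_t - p_c, 4), round(z, 2), round(p_value, 4))
```

Even with a small p-value, the answer above still requires checking practical significance and guardrails before recommending a ship.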
Likely follow-ups
What if conversion improves but refund rate increases?
How do you avoid peeking at test results too early?
What if the test is positive only for new users?
Framework — Check power -> inspect direction -> evaluate cost -> decide
A non-significant result does not automatically mean the feature has no effect. First check whether the test was powered to detect a meaningful effect. If sample size was too small or the metric is noisy, the result may be inconclusive.
Then inspect the effect size and confidence interval. If the interval includes both meaningful upside and meaningful downside, the test is uncertain. If the interval is tightly centered near zero, the feature likely has little impact on that metric.
Next consider cost and strategy. If the feature is expensive to maintain and shows no measurable benefit, do not ship or roll it back. If it is strategically important, low-risk, or improves qualitative user experience, consider iterating or measuring a better metric.
I would summarize the result as: what we tested, what we observed, how confident we are, what limitations exist, and what decision I recommend.
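The "inconclusive vs. little effect" distinction becomes concrete with a confidence interval on the difference in rates; a sketch using the normal approximation and hypothetical counts:

```python
import math

def diff_ci(x_c, n_c, x_t, n_t, z=1.96):
    """95% CI for the difference in conversion rates (normal approximation)."""
    p_c, p_t = x_c / n_c, x_t / n_t
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    d = p_t - p_c
    return d - z * se, d + z * se

# Small, noisy test: the interval spans meaningful upside AND downside -> inconclusive.
lo1, hi1 = diff_ci(60, 1_000, 66, 1_000)

# Large test centered near zero -> likely little impact on this metric.
lo2, hi2 = diff_ci(6_000, 100_000, 6_030, 100_000)

print((round(lo1, 4), round(hi1, 4)), (round(lo2, 4), round(hi2, 4)))
```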
Likely follow-ups
What is the difference between no effect and inconclusive?
How would you explain this to a non-technical stakeholder?
When would you run the test again?
Data Quality and Analytical Judgment
A data analyst is trusted only if the data is trusted. Interviews often test whether you can catch bad data before it becomes a bad decision.
First compare definitions. One dashboard may show gross revenue while another shows net revenue after refunds, discounts, taxes, or chargebacks. Revenue recognition timing may also differ: order date, payment date, shipment date, or invoice date.
Then compare data sources, filters, and grain. One dashboard may exclude test accounts, canceled orders, internal users, enterprise invoices, or certain geographies. Another may join order items incorrectly and duplicate revenue. Also check timezone and data freshness.
I would reconcile from a trusted source by building a bridge: start with revenue from dashboard A, then add or subtract differences step by step until it matches dashboard B. Each difference should have a named reason.
The final output should not just be “dashboard A is wrong.” It should be a corrected definition, owner, source of truth, and plan to prevent future confusion.
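A revenue bridge like that can be as simple as a running total where every step has a named reason (all figures and reasons hypothetical):

```python
# Bridge from dashboard A (gross, order date) to dashboard B (net, payment date).
dashboard_a = 1_000_000.0
bridge = [
    ("refunds",                            -42_000.0),
    ("discounts",                          -18_000.0),
    ("test/internal accounts",              -5_000.0),
    ("order-date vs payment-date timing",  -15_000.0),
]

running = dashboard_a
for reason, delta in bridge:
    running += delta
    print(f"{reason:<36} {delta:>12,.0f} -> {running:>12,.0f}")

dashboard_b = 920_000.0
assert running == dashboard_b   # every dollar of difference is explained
```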
Likely follow-ups
How would you decide the source of truth?
What if finance and product use different definitions?
First quantify missingness: which fields, how many rows, what percentage, and whether missingness varies by segment, time, source, or platform. Missing data is not always random.
Then diagnose why it is missing. It could be optional user input, tracking failure, late-arriving data, integration issues, privacy restrictions, or a legitimate not-applicable value. Treatment depends on cause.
Options include excluding rows, imputing values, creating an unknown category, backfilling from another source, or changing the analysis scope. I would avoid blindly filling missing values with zero because zero and unknown mean different things.
Finally, disclose the impact. State how missing data affects confidence and whether the recommendation changes under reasonable assumptions.
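Quantifying missingness by segment is a one-pass computation; a sketch with hypothetical rows showing why the "not random" check matters:

```python
from collections import defaultdict

# Hypothetical event rows; None marks a missing country field.
rows = [
    {"platform": "ios",     "country": "US"},
    {"platform": "ios",     "country": "DE"},
    {"platform": "android", "country": None},
    {"platform": "android", "country": None},
    {"platform": "android", "country": "US"},
]

# Missingness is rarely uniform: compute the null rate per segment.
totals, missing = defaultdict(int), defaultdict(int)
for r in rows:
    totals[r["platform"]] += 1
    if r["country"] is None:
        missing[r["platform"]] += 1

null_rate = {p: missing[p] / totals[p] for p in totals}
print(null_rate)   # iOS is complete; Android is mostly missing -> likely a tracking gap
```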
Likely follow-ups
When is it okay to exclude missing rows?
What is the risk of imputing values?
How would you detect tracking failure?
Worked Example
Data quality checklist before sharing analysis
You are about to present an analysis showing that a new onboarding flow improved activation.
1
Validate population
Confirm that the user cohort includes the intended signup dates, platforms, countries, and eligibility rules.
2
Validate metric
Confirm activation definition, event tracking, duplicate events, bot/test users, and whether late-arriving events are included.
3
Validate comparison
Check that control and treatment groups are comparable, or that pre/post comparisons account for seasonality and acquisition mix.
4
Validate conclusion
Test whether the conclusion holds across important segments and whether guardrail metrics moved negatively.
Result
The analysis becomes more credible because it explains not just the result, but why the result can be trusted.
Behavioral and Stakeholder Questions
Data analyst behavioral questions usually focus on ambiguity, stakeholder pressure, communication, prioritization, and moments when the data did not support what someone wanted to hear.
Choose a story where the analysis clearly affected a decision. Start with the business decision at stake: launch, pricing, marketing spend, product change, operations, or prioritization.
Then explain the analysis in plain language. What data did you use, what metric mattered, what segments did you inspect, and what was surprising? Avoid spending the whole answer on tools. The interviewer wants to know how your work changed thinking.
Next state the recommendation and impact. For example, your analysis showed that a campaign looked profitable overall but lost money in one channel, so the team shifted budget and improved ROI. Or a product feature increased clicks but reduced retention, so the team rolled it back.
A strong answer includes caveats and stakeholder communication. Explain how you handled uncertainty and how you made the recommendation understandable.
Likely follow-ups
How did you measure impact?
Who disagreed with the recommendation?
What would you do differently now?
Framework — Clarify urgency -> provide directional read -> state limitations -> plan follow-up
First clarify the decision and deadline. If the decision is low-risk, a directional answer may be acceptable. If it affects revenue, customers, compliance, or strategy, the quality bar should be higher.
Then explain what can be answered now and what cannot. I might provide a preliminary read with clear caveats: “Based on currently available data, the trend appears negative, but tracking is incomplete for Android users, so I would not make a final launch decision yet.”
I would separate the immediate answer from the follow-up plan. Immediate: best available estimate and confidence level. Follow-up: data cleaning steps, validation checks, source-of-truth reconciliation, and when a stronger answer will be ready.
This approach keeps the stakeholder moving while protecting analytical integrity. The worst answer is a confident number that is fast but wrong.
I prioritize based on business impact, urgency, effort, dependencies, and whether the analysis supports an irreversible or high-stakes decision. A request tied to a launch decision tomorrow usually beats a nice-to-have dashboard improvement.
I also clarify the decision each request supports. If a request has no clear decision or owner, it may need refinement before analysis starts. For recurring requests, I look for automation opportunities so the team is not trapped in manual reporting.
When priorities conflict, I communicate tradeoffs: “I can complete the churn analysis today or the dashboard refresh today, but not both. Since the churn analysis affects this week's retention plan, I recommend doing that first.”
Strong analysts do not just accept every request in order. They help the organization spend analytical time where it changes decisions.
Likely follow-ups
How do you say no to a stakeholder?
What work would you automate?
How do you handle executive requests?
Data Analyst Prep Strategy
Data analyst prep should combine SQL drills, business case practice, dashboard critique, spreadsheet fluency, and communication practice. The goal is to become accurate, clear, and decision-oriented.
4-week data analyst interview prep plan
1
Week 1: SQL fundamentals. Practice joins, group by, having, dates, nulls, case statements, and aggregation at the correct grain.
2
Week 2: intermediate SQL. Practice window functions, cohorts, retention, funnels, deduplication, ranking, and query debugging.
3
Week 3: analytics cases. Practice metric drops, experiment readouts, revenue decomposition, dashboard design, and business recommendations.
4
Week 4: communication and mock interviews. Practice explaining analysis to non-technical stakeholders, defending assumptions, and discussing past projects.
Role-specific prep by company type
—
Product analytics roles: focus on funnels, retention, experimentation, event data, segmentation, and product recommendations.
—
Marketing analytics roles: focus on attribution, CAC, LTV, channel performance, incrementality, and campaign reporting.
—
Finance or revenue analytics roles: focus on revenue definitions, cohort revenue, forecasting, pricing, churn, and reconciliation.
—
Operations analytics roles: focus on SLAs, capacity, defect rates, queue times, process bottlenecks, and root-cause analysis.
—
Marketplace analytics roles: focus on supply-demand balance, liquidity, matching, pricing, geography, and two-sided marketplace metrics.
Do not sound like a reporting tool
Data analyst interviews reward people who can make decisions clearer. Do not only say what chart you would build or what query you would run. Explain what the result would mean and what action it should inform.
Key Takeaway
Great data analyst interview answers combine SQL accuracy, metric judgment, data quality discipline, and clear business communication. The best analysts do not just produce numbers. They help teams make better decisions with the right level of confidence.
Practice these questions live
Interview Pilot gives you real-time Interview Copilot answer suggestions during live interviews, so you can respond clearly when Data Analyst questions come up.