Interview Questions
Data Analyst Interview Questions
Practice data analyst interview questions across SQL, metrics, analytics cases, dashboards, Excel, experimentation, data quality, and stakeholder communication. Use this as a focused question list alongside the full Data Analyst Interview Guide.
18 questions
7 categories
Data Analyst
Updated May 2026
SQL Interview Questions
The best way to prepare for analyst SQL is to practice realistic business questions. Focus on users, orders, events, subscriptions, retention, revenue, and funnel movement.
Framework — Filter -> join -> group -> order -> limit
Assume we have customers(customer_id, name) and orders(order_id, customer_id, order_date, status, revenue). I would first clarify whether revenue should include canceled or refunded orders. Usually, we should include completed orders only. The query filters orders to the last 30 days, filters to completed status, groups by customer, sums revenue, orders descending, and limits to 5. A clean version would use a CTE for eligible orders if the logic becomes more complex. Important details: use the correct date column, avoid joining to item-level tables unless pre-aggregated, and decide how to handle ties. If the database supports it, use a ranking window function when ties should be included rather than a simple LIMIT 5. A strong answer also validates the result by checking total eligible revenue, row count, and whether any top customer has suspiciously duplicated order rows.
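A minimal sketch of that query in PostgreSQL-style SQL, using the customers and orders columns assumed above; the 'completed' status value and the exact date logic are assumptions to confirm with the interviewer:

WITH eligible_orders AS (
    -- completed orders from the last 30 days only
    SELECT customer_id, revenue
    FROM orders
    WHERE status = 'completed'
      AND order_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT c.customer_id,
       c.name,
       SUM(e.revenue) AS total_revenue
FROM eligible_orders e
JOIN customers c ON c.customer_id = e.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_revenue DESC
LIMIT 5;   -- swap for a ranking window function if ties for 5th place should be included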
Likely follow-ups
How would you include ties for 5th place?
What if revenue is stored at order item level?
How would you exclude refunded orders?
Framework — Define cohort -> define return event -> left join -> aggregate
First define retention precisely. I would define the January cohort as users whose signup date is on or after January 1 and before February 1. A user is 7-day retained if they have at least one qualifying activity event on the 7th day after signup, or within days 1-7 if the company defines retention as returning within 7 days. Clarify this before querying. For exact day-7 retention, create a cohort CTE with user_id and signup_date. Then left join events on user_id where event_date equals signup_date + interval 7 days and event_name is a meaningful active event. Count distinct cohort users as the denominator and distinct users with a matching event as the numerator. The key is to preserve users with no activity by using a left join. If you use an inner join, you remove non-retained users and inflate retention. Also avoid multiple events duplicating retained users by counting distinct user_id. Final metric: retained_users / cohort_users. Segment by acquisition channel, platform, geography, or signup week if the interviewer asks for diagnosis.
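A minimal sketch of the exact day-7 version in PostgreSQL-style SQL, assuming hypothetical users(user_id, signup_date) and events(user_id, event_date, event_name) tables; the cohort dates and event names are placeholders:

WITH cohort AS (
    SELECT user_id, signup_date
    FROM users
    WHERE signup_date >= DATE '2026-01-01'
      AND signup_date <  DATE '2026-02-01'                 -- January signups only
),
day7_activity AS (
    SELECT DISTINCT e.user_id
    FROM events e
    JOIN cohort c ON c.user_id = e.user_id
    WHERE e.event_name IN ('session_start', 'core_action')  -- whatever counts as "active"
      AND e.event_date = c.signup_date + 7                  -- exactly the 7th day after signup
)
SELECT COUNT(*)         AS cohort_users,
       COUNT(a.user_id) AS retained_users,
       ROUND(COUNT(a.user_id)::numeric / COUNT(*), 4) AS day7_retention
FROM cohort c
LEFT JOIN day7_activity a ON a.user_id = c.user_id;  -- left join keeps non-retained users in the denominator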
Likely follow-ups
What is the difference between day-7 retention and rolling 7-day retention?
How would you handle users who signed up less than 7 days ago?
What events should count as active?
Framework — Rank purchases per user
Use a window function to rank completed purchases for each user by purchase timestamp. Filter to completed purchases first, then apply row_number() over partition by user_id order by purchase_at. The second purchase is where row_number = 2. The filtering order matters: if canceled or refunded purchases should not count, remove them before ranking. If multiple purchases share the same timestamp, add a deterministic tie-breaker such as order_id. The output can include user_id and second_purchase_at. If the interviewer asks for users who made at least two purchases, return only rows with rank 2. If they ask for all users with null for users without a second purchase, left join the result back to the users table. The database handles execution, but on large tables this query benefits from indexes on user_id and the purchase timestamp.
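A minimal sketch, assuming a hypothetical purchases(order_id, user_id, purchase_at, status) table:

WITH ranked AS (
    SELECT user_id,
           order_id,
           purchase_at,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY purchase_at, order_id   -- order_id breaks timestamp ties deterministically
           ) AS purchase_rank
    FROM purchases
    WHERE status = 'completed'                  -- exclude canceled/refunded purchases before ranking
)
SELECT user_id,
       purchase_at AS second_purchase_at
FROM ranked
WHERE purchase_rank = 2;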
Likely follow-ups
How would you include users with no second purchase?
What if two purchases happened at the same timestamp?
How would you calculate days between first and second purchase?
Framework — One row per user with step timestamps
I would create one row per user with the earliest timestamp for each funnel step. This avoids double-counting users who trigger the same event multiple times. Use conditional aggregation over the events table: min(case when event_name = signup then event_at end), min(case when event_name = email_verified then event_at end), and so on. Then enforce sequence if required. For example, email verification should happen after signup, onboarding after verification, and first purchase after onboarding. If the business wants loose conversion regardless of order, state that assumption explicitly. Conversion metrics: signup to verification, verification to onboarding, onboarding to purchase, and overall signup to purchase. Use distinct users at each step as numerator and the previous step as denominator. Segment by platform, acquisition channel, device, geography, and cohort date to identify where drop-off is concentrated. Important edge cases: duplicated events, users skipping steps, timezone boundaries, bot/test accounts, and late-arriving event data.
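A minimal sketch of the conditional-aggregation approach with sequence enforcement, assuming a hypothetical events(user_id, event_name, event_at) table; the event names are placeholders:

WITH user_steps AS (
    -- one row per user with the earliest timestamp for each funnel step
    SELECT user_id,
           MIN(CASE WHEN event_name = 'signup'          THEN event_at END) AS signup_at,
           MIN(CASE WHEN event_name = 'email_verified'  THEN event_at END) AS verified_at,
           MIN(CASE WHEN event_name = 'onboarding_done' THEN event_at END) AS onboarded_at,
           MIN(CASE WHEN event_name = 'first_purchase'  THEN event_at END) AS purchased_at
    FROM events
    GROUP BY user_id
)
SELECT COUNT(signup_at) AS signed_up,
       COUNT(CASE WHEN verified_at >= signup_at THEN 1 END) AS verified,
       COUNT(CASE WHEN onboarded_at >= verified_at
                   AND verified_at  >= signup_at THEN 1 END) AS onboarded,
       COUNT(CASE WHEN purchased_at >= onboarded_at
                   AND onboarded_at >= verified_at
                   AND verified_at  >= signup_at THEN 1 END) AS purchased
FROM user_steps
WHERE signup_at IS NOT NULL;   -- users who never signed up are not part of the funnel

Step-to-step conversion is then each count divided by the previous one; dropping the ordering conditions gives the loose, order-agnostic version.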
Likely follow-ups
How would you enforce event order?
How would you diagnose a sudden drop in onboarding completion?
How would you visualize this funnel?
Metrics and Business Case Questions
Analytics case interviews test whether you can define the right metric, diagnose movement, segment intelligently, and recommend an action with appropriate confidence.
Framework — Validate -> segment -> funnel -> external factors -> recommendation
First validate whether the drop is real. Check instrumentation changes, data pipeline delays, timezone issues, bot filtering, app release changes, and whether the drop appears across multiple dashboards or raw tables. Then segment the drop: platform, app version, geography, acquisition channel, user tenure, paid versus free users, device type, and traffic source. A global drop suggests a tracking, infrastructure, or broad product issue. A narrow drop points to a specific platform, release, market, or channel. Next inspect the user journey. Did app opens drop, login success drop, homepage loads fail, notifications decline, or core actions decline after users arrived? If app opens are stable but core activity is down, the problem is inside the product. If app opens are down, look at notifications, acquisition, seasonality, outages, or external events. Finally, recommend action based on the root cause. If iOS DAU dropped after a release and crash rate increased, roll back or hotfix. If only paid acquisition users dropped, check campaign spend and attribution. If data is delayed, communicate the limitation before creating false urgency.
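As one illustration of the segmentation step, a sketch assuming a hypothetical daily_active_users(activity_date, user_id, platform, app_version) table:

SELECT activity_date,
       platform,
       app_version,
       COUNT(DISTINCT user_id) AS dau
FROM daily_active_users
WHERE activity_date >= CURRENT_DATE - INTERVAL '28 days'
GROUP BY activity_date, platform, app_version
ORDER BY activity_date, platform, app_version;

A drop concentrated in one platform and app version right after a release date tells a very different story from a uniform decline across every segment.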
Likely follow-ups
What chart would you look at first?
How would you distinguish seasonality from a product issue?
What if DAU dropped but revenue increased?
Framework — Goal -> primary metric -> input metrics -> guardrails -> segments
Start with the goal. A recommendation feature might aim to increase discovery, engagement, conversion, retention, or order value. The primary metric should reflect the intended user value, not just clicks. For example, for ecommerce recommendations, a strong metric could be purchases or add-to-cart actions from recommended items per active user. Input metrics include recommendation impressions, click-through rate, add-to-cart rate, conversion rate, revenue per session, coverage, and diversity. Guardrails include returns, refunds, low-quality clicks, page latency, user complaints, and cannibalization of organic discovery. Segment by new versus returning users, category, device, traffic source, and recommendation surface. A model may improve average performance while hurting new users or long-tail product discovery. I would evaluate the feature with an A/B test if possible, then monitor longer-term retention and repeat purchase behavior. A short-term click lift is not enough if users are being nudged toward irrelevant or low-satisfaction items.
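A sketch of one possible primary metric, assuming hypothetical weekly_active_users(week, user_id) and rec_orders(order_id, user_id, week) tables, where rec_orders contains orders attributed to a recommended item:

SELECT w.week,
       COUNT(DISTINCT r.order_id) AS rec_purchases,
       COUNT(DISTINCT w.user_id)  AS active_users,
       COUNT(DISTINCT r.order_id)::numeric / COUNT(DISTINCT w.user_id) AS rec_purchases_per_active_user
FROM weekly_active_users w
LEFT JOIN rec_orders r
  ON r.user_id = w.user_id
 AND r.week = w.week
GROUP BY w.week
ORDER BY w.week;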
Likely follow-ups
What if CTR increases but purchases do not?
How would you measure recommendation quality?
How would you detect cannibalization?
Framework — Decompose revenue into traffic, conversion, AOV, mix, price, retention
Revenue is usually traffic times conversion rate times average order value, with additional effects from product mix, pricing, repeat purchase, and refunds. If revenue increased while conversion decreased, several explanations are possible. Traffic volume may have increased enough to offset lower conversion. Average order value may have risen due to pricing changes, bundles, enterprise customers, larger carts, or product mix shifting toward expensive items. The company may have cut low-intent traffic, which lowers the conversion rate even though the remaining purchases are more valuable. There could also be tracking changes affecting either revenue or conversion. I would segment by channel, product, geography, new versus returning users, customer tier, and device. Then decompose revenue into sessions, conversion, AOV, refund rate, and repeat purchases. I would also check whether conversion is measured at the session, user, or visitor level, because denominator changes can create misleading trends. The recommendation depends on quality. If revenue growth comes from healthier high-value customers, lower conversion may be acceptable. If it comes from a one-time price increase while new customer conversion is weakening, that may be a future growth risk.
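A sketch of the decomposition by channel, assuming hypothetical sessions(session_id, session_date, channel, converted) and orders(order_id, order_date, channel, revenue) tables and a placeholder date range:

WITH traffic AS (
    SELECT channel,
           COUNT(*) AS sessions,
           COUNT(*) FILTER (WHERE converted) AS converted_sessions
    FROM sessions
    WHERE session_date >= DATE '2026-04-01' AND session_date < DATE '2026-05-01'
    GROUP BY channel
),
sales AS (
    SELECT channel,
           COUNT(*)     AS orders,
           SUM(revenue) AS revenue
    FROM orders
    WHERE order_date >= DATE '2026-04-01' AND order_date < DATE '2026-05-01'
    GROUP BY channel
)
SELECT t.channel,
       t.sessions,
       t.converted_sessions::numeric / t.sessions AS conversion_rate,
       s.revenue / NULLIF(s.orders, 0)            AS avg_order_value,
       s.revenue
FROM traffic t
LEFT JOIN sales s ON s.channel = t.channel
ORDER BY s.revenue DESC NULLS LAST;

Comparing this output across periods shows which factor (sessions, conversion, or AOV) is driving the revenue change in each channel.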
Likely follow-ups
How would you build a revenue decomposition dashboard?
What if paid traffic changed significantly?
When is lower conversion acceptable?
Dashboards and Data Visualization
Dashboard questions test whether you know how to communicate data. A good dashboard is not a collection of charts; it is a decision tool for a specific audience.
Framework — Audience -> decisions -> metrics -> cuts -> alerts
First define the audience and decisions. Executives need a high-level view of growth, retention, monetization, and risk. They do not need every operational detail on the first screen. Top-level metrics: monthly recurring revenue, net revenue retention, gross revenue retention, new MRR, expansion MRR, contraction MRR, churned MRR, active customers, trial-to-paid conversion, ARPU, CAC payback if available, and forecast versus target. Useful cuts: customer segment, acquisition channel, plan, geography, company size, cohort month, and sales-assisted versus self-serve. Visuals should include trend lines, cohort retention, MRR bridge, churn reasons, and target variance. Design principles: show definitions, freshness timestamp, filters, owner, and alert thresholds. Avoid vanity metrics and avoid mixing user counts with revenue metrics without clear labels. The dashboard should answer: are we growing, why, where is risk, and what should leadership investigate next?
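As an example of how the MRR bridge behind such a dashboard might be computed, a sketch assuming a hypothetical mrr_by_customer_month(customer_id, month, mrr) table with one row per paying customer per month; the months are placeholders:

WITH cur AS (
    SELECT customer_id, mrr FROM mrr_by_customer_month WHERE month = DATE '2026-05-01'
),
prev AS (
    SELECT customer_id, mrr FROM mrr_by_customer_month WHERE month = DATE '2026-04-01'
)
SELECT SUM(CASE WHEN p.customer_id IS NULL THEN c.mrr END)  AS new_mrr,          -- paying now, not last month
       SUM(CASE WHEN c.mrr > p.mrr THEN c.mrr - p.mrr END)  AS expansion_mrr,
       SUM(CASE WHEN c.mrr < p.mrr THEN p.mrr - c.mrr END)  AS contraction_mrr,
       SUM(CASE WHEN c.customer_id IS NULL THEN p.mrr END)  AS churned_mrr       -- paying last month, not now
FROM cur c
FULL OUTER JOIN prev p ON p.customer_id = c.customer_id;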
Likely follow-ups
What would you put above the fold?
How would this differ for a product manager?
How would you prevent dashboard misuse?
Framework — Clarify decision -> explain risk -> propose better view
I would first clarify what decision they are trying to make. Sometimes a stakeholder asks for a specific chart because they already have a narrative in mind, but the real need is a business answer. Then I would explain the risk clearly and non-defensively. For example, a cumulative revenue chart may always go up and hide a recent slowdown. A pie chart with too many categories may obscure differences. A chart without confidence intervals may overstate precision. A conversion rate without traffic mix may mislead. I would propose an alternative that answers the same question more accurately: trend line, cohort chart, funnel, distribution, segmented bar chart, or metric decomposition. If they still need the original chart, I might include it with caveats, but I would not present misleading analysis as my recommendation. The goal is to preserve trust. Analysts should be helpful, but they also own analytical integrity.
Likely follow-ups
How would you handle pressure from a senior stakeholder?
What chart types are commonly misused?
How do you communicate uncertainty visually?
Excel and Spreadsheet Questions
Many data analyst roles still rely heavily on spreadsheets. Interviewers may test formulas, pivots, cleanup, reconciliation, and whether you can build a model that others can audit.
Framework — Profile -> standardize -> validate -> document
First profile the data: row count, column names, missing values, duplicates, data types, impossible values, date formats, and outliers. I would preserve a raw copy before making transformations. Then standardize fields: trim whitespace, normalize casing, parse dates, convert currencies or units, split combined fields if needed, and map inconsistent categories to a controlled list. For duplicates, define the business key before removing anything. Next validate totals against a trusted source. For example, total revenue should reconcile to finance exports, order counts should match source systems, and date ranges should be complete. I would create checks for null rates, distinct counts, and category values. Finally, document transformations. A spreadsheet used for decision-making should make assumptions visible, separate raw data from cleaned data, and avoid hidden manual edits that cannot be audited.
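If the same export is also loaded into a database, the profiling pass can be expressed as a single query; a sketch assuming a hypothetical raw_transactions staging table with transaction_id, customer_email, currency, transaction_date, and amount columns:

SELECT COUNT(*)                                       AS row_count,
       COUNT(*) - COUNT(DISTINCT transaction_id)      AS duplicate_key_rows,
       COUNT(*) FILTER (WHERE customer_email IS NULL) AS missing_email,
       COUNT(DISTINCT currency)                       AS distinct_currencies,
       MIN(transaction_date)                          AS first_date,
       MAX(transaction_date)                          AS last_date,
       SUM(amount)                                    AS total_amount   -- reconcile against the finance export
FROM raw_transactions;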
Likely follow-ups
How do you handle duplicate rows?
What spreadsheet formulas do you use most often?
How would you make the workbook auditable?
Framework — Exploration versus controlled calculation
Pivot tables are excellent for quick exploration, grouping, slicing, and summarizing data across dimensions. They are useful when the question is about totals, counts, averages, or trends by category and when stakeholders want interactive filtering. Formula-based analysis is better when logic is custom, multi-step, auditable, or needs precise control. Examples include cohort calculations, waterfall models, weighted scoring, exception flags, reconciliation checks, or inputs feeding a forecast. In practice, I often use both: pivot tables to explore patterns quickly, then formulas or a cleaner model to produce the final answer. For recurring reporting, I prefer a reproducible query or BI pipeline over a fragile manual spreadsheet.
Likely follow-ups
What are the risks of pivot tables?
How would you explain VLOOKUP versus INDEX/MATCH or XLOOKUP?
When should analysis move out of Excel into SQL or BI?
Experimentation and A/B Testing
Data analysts are often asked to design, read, or critique experiments. The important skill is knowing what conclusion the data supports and what it does not support.
Framework — Hypothesis -> randomization -> metrics -> guardrails -> decision
Hypothesis: the redesigned checkout page reduces friction and increases completed purchases without creating negative downstream effects. First confirm experiment setup: randomization unit, sample size, duration, exposure logging, eligibility, and whether users stay in the same variant across sessions. Randomizing by session instead of user could contaminate results if users return. Primary metric: checkout conversion rate from checkout start to completed purchase. Secondary metrics: payment failure rate, time to checkout, average order value, add-on attachment, and return visits. Guardrails: refunds, chargebacks, support tickets, latency, error rate, and customer complaints. Analysis should include statistical significance and practical significance. A tiny lift may not justify engineering complexity. Segment analysis can reveal whether mobile improved while desktop declined, but avoid overreacting to noisy subgroups. Recommendation: ship if the primary metric improves meaningfully, guardrails are healthy, instrumentation is trustworthy, and the effect persists across the full test period.
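A sketch of the primary-metric read, assuming hypothetical assignments(user_id, variant) and checkout_events(user_id, event_name) tables; the event names are placeholders:

SELECT a.variant,
       COUNT(DISTINCT CASE WHEN e.event_name = 'checkout_start'    THEN e.user_id END) AS started_checkout,
       COUNT(DISTINCT CASE WHEN e.event_name = 'purchase_complete' THEN e.user_id END) AS completed_purchase,
       COUNT(DISTINCT CASE WHEN e.event_name = 'purchase_complete' THEN e.user_id END)::numeric
           / NULLIF(COUNT(DISTINCT CASE WHEN e.event_name = 'checkout_start' THEN e.user_id END), 0)
           AS checkout_conversion
FROM assignments a
LEFT JOIN checkout_events e ON e.user_id = a.user_id
GROUP BY a.variant;

Significance testing happens on top of these per-variant counts; the query only produces the inputs.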
Likely follow-ups
What if conversion improves but refund rate increases?
How do you avoid peeking at test results too early?
What if the test is positive only for new users?
Framework — Check power -> inspect direction -> evaluate cost -> decide
A non-significant result does not automatically mean the feature has no effect. First check whether the test was powered to detect a meaningful effect. If sample size was too small or the metric is noisy, the result may be inconclusive. Then inspect the effect size and confidence interval. If the interval includes both meaningful upside and meaningful downside, the test is uncertain. If the interval is tightly centered near zero, the feature likely has little impact on that metric. Next consider cost and strategy. If the feature is expensive to maintain and shows no measurable benefit, do not ship or roll it back. If it is strategically important, low-risk, or improves qualitative user experience, consider iterating or measuring a better metric. I would summarize the result as: what we tested, what we observed, how confident we are, what limitations exist, and what decision I recommend.
Likely follow-ups
What is the difference between no effect and inconclusive?
How would you explain this to a non-technical stakeholder?
When would you run the test again?
Data Quality and Analytical Judgment
A data analyst is trusted only if the data is trusted. Interviews often test whether you can catch bad data before it becomes a bad decision.
Framework — Definitions -> sources -> filters -> grain -> timing -> reconciliation
First compare definitions. One dashboard may show gross revenue while another shows net revenue after refunds, discounts, taxes, or chargebacks. Revenue recognition timing may also differ: order date, payment date, shipment date, or invoice date. Then compare data sources, filters, and grain. One dashboard may exclude test accounts, canceled orders, internal users, enterprise invoices, or certain geographies. Another may join order items incorrectly and duplicate revenue. Also check timezone and data freshness. I would reconcile from a trusted source by building a bridge: start with revenue from dashboard A, then add or subtract differences step by step until it matches dashboard B. Each difference should have a named reason. The final output should not just be “dashboard A is wrong.” It should be a corrected definition, owner, source of truth, and plan to prevent future confusion.
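A sketch of the bridge, assuming a hypothetical orders table with gross_amount, discount_amount, refund_amount, tax_amount, order_date, and is_test_account columns; each step names one candidate reason for the gap:

SELECT SUM(gross_amount)                                                               AS gross_revenue,
       SUM(gross_amount) - SUM(discount_amount)                                        AS after_discounts,
       SUM(gross_amount) - SUM(discount_amount) - SUM(refund_amount)                   AS after_refunds,
       SUM(gross_amount) - SUM(discount_amount) - SUM(refund_amount) - SUM(tax_amount) AS net_revenue
FROM orders
WHERE order_date >= DATE '2026-04-01'
  AND order_date <  DATE '2026-05-01'
  AND is_test_account = FALSE;   -- one dashboard may include test accounts, the other may not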
Likely follow-ups
How would you decide the source of truth?
What if finance and product use different definitions?
How would you communicate the discrepancy?
Framework — Measure missingness -> diagnose cause -> choose treatment -> disclose
First quantify missingness: which fields, how many rows, what percentage, and whether missingness varies by segment, time, source, or platform. Missing data is not always random. Then diagnose why it is missing. It could be optional user input, tracking failure, late-arriving data, integration issues, privacy restrictions, or a legitimate not-applicable value. Treatment depends on cause. Options include excluding rows, imputing values, creating an unknown category, backfilling from another source, or changing the analysis scope. I would avoid blindly filling missing values with zero because zero and unknown mean different things. Finally, disclose the impact. State how missing data affects confidence and whether the recommendation changes under reasonable assumptions.
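A sketch of the measurement step, assuming a hypothetical events table with an event_at timestamp, a platform column, and a user_country field whose completeness is in question:

SELECT DATE_TRUNC('day', event_at) AS event_day,
       platform,
       COUNT(*)                                     AS total_events,
       COUNT(*) FILTER (WHERE user_country IS NULL) AS missing_country,
       ROUND(COUNT(*) FILTER (WHERE user_country IS NULL)::numeric / COUNT(*), 4) AS missing_rate
FROM events
WHERE event_at >= CURRENT_DATE - INTERVAL '28 days'
GROUP BY 1, 2
ORDER BY 1, 2;

A missing rate that jumps on one platform on a specific day usually points to a tracking or release issue rather than random missingness.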
Likely follow-ups
When is it okay to exclude missing rows?
What is the risk of imputing values?
How would you detect tracking failure?
Behavioral and Stakeholder Questions
Data analyst behavioral questions usually focus on ambiguity, stakeholder pressure, communication, prioritization, and moments when the data did not support what someone wanted to hear.
Framework — Decision -> analysis -> insight -> recommendation -> impact
Choose a story where the analysis clearly affected a decision. Start with the business decision at stake: launch, pricing, marketing spend, product change, operations, or prioritization. Then explain the analysis in plain language. What data did you use, what metric mattered, what segments did you inspect, and what was surprising? Avoid spending the whole answer on tools. The interviewer wants to know how your work changed thinking. Next state the recommendation and impact. For example, your analysis showed that a campaign looked profitable overall but lost money in one channel, so the team shifted budget and improved ROI. Or a product feature increased clicks but reduced retention, so the team rolled it back. A strong answer includes caveats and stakeholder communication. Explain how you handled uncertainty and how you made the recommendation understandable.
Likely follow-ups
How did you measure impact?
Who disagreed with the recommendation?
What would you do differently now?
Framework — Clarify urgency -> provide directional read -> state limitations -> plan follow-up
First clarify the decision and deadline. If the decision is low-risk, a directional answer may be acceptable. If it affects revenue, customers, compliance, or strategy, the quality bar should be higher. Then explain what can be answered now and what cannot. I might provide a preliminary read with clear caveats: “Based on currently available data, the trend appears negative, but tracking is incomplete for Android users, so I would not make a final launch decision yet.” I would separate the immediate answer from the follow-up plan. Immediate: best available estimate and confidence level. Follow-up: data cleaning steps, validation checks, source-of-truth reconciliation, and when a stronger answer will be ready. This approach keeps the stakeholder moving while protecting analytical integrity. The worst answer is a confident number that is fast but wrong.
Likely follow-ups
How do you push back on unrealistic deadlines?
When is directional analysis acceptable?
How do you communicate confidence level?
Framework — Impact -> urgency -> effort -> dependency -> stakeholder alignment
I prioritize based on business impact, urgency, effort, dependencies, and whether the analysis supports an irreversible or high-stakes decision. A request tied to a launch decision tomorrow usually beats a nice-to-have dashboard improvement. I also clarify the decision each request supports. If a request has no clear decision or owner, it may need refinement before analysis starts. For recurring requests, I look for automation opportunities so the team is not trapped in manual reporting. When priorities conflict, I communicate tradeoffs: “I can complete the churn analysis today or the dashboard refresh today, but not both. Since the churn analysis affects this week's retention plan, I recommend doing that first.” Strong analysts do not just accept every request in order. They help the organization spend analytical time where it changes decisions.
Likely follow-ups
How do you say no to a stakeholder?
What work would you automate?
How do you handle executive requests?
Practice these answers live
Interview Pilot gives you real-time Copilot answer suggestions during live interviews, so you can respond clearly when these questions come up.
