Interview Questions
Data Analyst Interview Questions
Practice data analyst interview questions across SQL, metrics, analytics cases, dashboards, Excel, experimentation, data quality, and stakeholder communication. Use this as a focused question list alongside the full Data Analyst Interview Guide.
18 questions
7 categories
Data Analyst
Updated May 2026
SQL Interview Questions
The best way to prepare for analyst SQL is to practice realistic business questions. Focus on users, orders, events, subscriptions, retention, revenue, and funnel movement.
Framework — Filter -> join -> group -> order -> limit
Assume we have customers(customer_id, name) and orders(order_id, customer_id, order_date, status, revenue). I would first clarify whether revenue should include canceled or refunded orders. Usually, we should include completed orders only. The query filters orders to the last 30 days, filters to completed status, groups by customer, sums revenue, orders descending, and limits to 5. A clean version would use a CTE for eligible orders if the logic becomes more complex. Important details: use the correct date column, avoid joining to item-level tables unless pre-aggregated, and decide how to handle ties. If the database supports it, use a ranking window function when ties should be included rather than a simple LIMIT 5. A strong answer also validates the result by checking total eligible revenue, row count, and whether any top customer has suspiciously duplicated order rows.
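A minimal sketch of that query in PostgreSQL-style SQL, using the customers and orders columns assumed above; the 'completed' status value and the exact date logic are assumptions to confirm with the interviewer:

WITH eligible_orders AS (
    -- completed orders from the last 30 days only
    SELECT customer_id, revenue
    FROM orders
    WHERE status = 'completed'
      AND order_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT c.customer_id,
       c.name,
       SUM(e.revenue) AS total_revenue
FROM eligible_orders e
JOIN customers c ON c.customer_id = e.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_revenue DESC
LIMIT 5;   -- swap for a ranking window function if ties for 5th place should be included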
Likely follow-ups
How would you include ties for 5th place?
What if revenue is stored at order item level?
How would you exclude refunded orders?
Framework — Define cohort -> define return event -> left join -> aggregate
First define retention precisely. I would define the January cohort as users whose signup date is on or after January 1 and before February 1. A user is 7-day retained if they have at least one qualifying activity event on the 7th day after signup, or within days 1-7 if the company defines retention as returning within 7 days. Clarify this before querying. For exact day-7 retention, create a cohort CTE with user_id and signup_date. Then left join events on user_id where event_date equals signup_date + interval 7 days and event_name is a meaningful active event. Count distinct cohort users as the denominator and distinct users with a matching event as the numerator. The key is to preserve users with no activity by using a left join. If you use an inner join, you remove non-retained users and inflate retention. Also avoid multiple events duplicating retained users by counting distinct user_id. Final metric: retained_users / cohort_users. Segment by acquisition channel, platform, geography, or signup week if the interviewer asks for diagnosis.
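A minimal sketch of the exact day-7 version in PostgreSQL-style SQL, assuming hypothetical users(user_id, signup_date) and events(user_id, event_date, event_name) tables; the cohort dates and event names are placeholders:

WITH cohort AS (
    SELECT user_id, signup_date
    FROM users
    WHERE signup_date >= DATE '2026-01-01'
      AND signup_date <  DATE '2026-02-01'                 -- January signups only
),
day7_activity AS (
    SELECT DISTINCT e.user_id
    FROM events e
    JOIN cohort c ON c.user_id = e.user_id
    WHERE e.event_name IN ('session_start', 'core_action')  -- whatever counts as "active"
      AND e.event_date = c.signup_date + 7                  -- exactly the 7th day after signup
)
SELECT COUNT(*)         AS cohort_users,
       COUNT(a.user_id) AS retained_users,
       ROUND(COUNT(a.user_id)::numeric / COUNT(*), 4) AS day7_retention
FROM cohort c
LEFT JOIN day7_activity a ON a.user_id = c.user_id;  -- left join keeps non-retained users in the denominator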
Likely follow-ups
What is the difference between day-7 retention and rolling 7-day retention?
How would you handle users who signed up less than 7 days ago?
What events should count as active?
Framework — Rank purchases per user
Use a window function to rank completed purchases for each user by purchase timestamp. Filter to completed purchases first, then apply row_number() over partition by user_id order by purchase_at. The second purchase is where row_number = 2. The filtering order matters: if canceled or refunded purchases should not count, remove them before ranking. If multiple purchases share the same timestamp, add a deterministic tie-breaker such as order_id. The output can include user_id and second_purchase_at. If the interviewer asks for users who made at least two purchases, return only rows with rank 2. If they ask for all users with null for users without a second purchase, left join the result back to the users table. The database handles execution, but on large tables this query benefits from indexes on user_id and the purchase timestamp.
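A minimal sketch, assuming a hypothetical purchases(order_id, user_id, purchase_at, status) table:

WITH ranked AS (
    SELECT user_id,
           order_id,
           purchase_at,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY purchase_at, order_id   -- order_id breaks timestamp ties deterministically
           ) AS purchase_rank
    FROM purchases
    WHERE status = 'completed'                  -- exclude canceled/refunded purchases before ranking
)
SELECT user_id,
       purchase_at AS second_purchase_at
FROM ranked
WHERE purchase_rank = 2;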
Likely follow-ups
How would you include users with no second purchase?
What if two purchases happened at the same timestamp?
How would you calculate days between first and second purchase?
Framework — One row per user with step timestamps
I would create one row per user with the earliest timestamp for each funnel step. This avoids double-counting users who trigger the same event multiple times. Use conditional aggregation over the events table: min(case when event_name = signup then event_at end), min(case when event_name = email_verified then event_at end), and so on. Then enforce sequence if required. For example, email verification should happen after signup, onboarding after verification, and first purchase after onboarding. If the business wants loose conversion regardless of order, state that assumption explicitly. Conversion metrics: signup to verification, verification to onboarding, onboarding to purchase, and overall signup to purchase. Use distinct users at each step as numerator and the previous step as denominator. Segment by platform, acquisition channel, device, geography, and cohort date to identify where drop-off is concentrated. Important edge cases: duplicated events, users skipping steps, timezone boundaries, bot/test accounts, and late-arriving event data.
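A minimal sketch of the conditional-aggregation approach with sequence enforcement, assuming a hypothetical events(user_id, event_name, event_at) table; the event names are placeholders:

WITH user_steps AS (
    -- one row per user with the earliest timestamp for each funnel step
    SELECT user_id,
           MIN(CASE WHEN event_name = 'signup'          THEN event_at END) AS signup_at,
           MIN(CASE WHEN event_name = 'email_verified'  THEN event_at END) AS verified_at,
           MIN(CASE WHEN event_name = 'onboarding_done' THEN event_at END) AS onboarded_at,
           MIN(CASE WHEN event_name = 'first_purchase'  THEN event_at END) AS purchased_at
    FROM events
    GROUP BY user_id
)
SELECT COUNT(signup_at) AS signed_up,
       COUNT(CASE WHEN verified_at >= signup_at THEN 1 END) AS verified,
       COUNT(CASE WHEN onboarded_at >= verified_at
                   AND verified_at  >= signup_at THEN 1 END) AS onboarded,
       COUNT(CASE WHEN purchased_at >= onboarded_at
                   AND onboarded_at >= verified_at
                   AND verified_at  >= signup_at THEN 1 END) AS purchased
FROM user_steps
WHERE signup_at IS NOT NULL;   -- users who never signed up are not part of the funnel

Step-to-step conversion is then each count divided by the previous one; dropping the ordering conditions gives the loose, order-agnostic version.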
Likely follow-ups
How would you enforce event order?
How would you diagnose a sudden drop in onboarding completion?
How would you visualize this funnel?
Metrics and Business Case Questions
Analytics case interviews test whether you can define the right metric, diagnose movement, segment intelligently, and recommend an action with appropriate confidence.
Framework — Validate -> segment -> funnel -> external factors -> recommendation
First validate whether the drop is real. Check instrumentation changes, data pipeline delays, timezone issues, bot filtering, app release changes, and whether the drop appears across multiple dashboards or raw tables. Then segment the drop: platform, app version, geography, acquisition channel, user tenure, paid versus free users, device type, and traffic source. A global drop suggests a tracking, infrastructure, or broad product issue. A narrow drop points to a specific platform, release, market, or channel. Next inspect the user journey. Did app opens drop, login success drop, homepage loads fail, notifications decline, or core actions decline after users arrived? If app opens are stable but core activity is down, the problem is inside the product. If app opens are down, look at notifications, acquisition, seasonality, outages, or external events. Finally, recommend action based on the root cause. If iOS DAU dropped after a release and crash rate increased, roll back or hotfix. If only paid acquisition users dropped, check campaign spend and attribution. If data is delayed, communicate the limitation before creating false urgency.
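As one illustration of the segmentation step, a sketch assuming a hypothetical daily_active_users(activity_date, user_id, platform, app_version) table:

SELECT activity_date,
       platform,
       app_version,
       COUNT(DISTINCT user_id) AS dau
FROM daily_active_users
WHERE activity_date >= CURRENT_DATE - INTERVAL '28 days'
GROUP BY activity_date, platform, app_version
ORDER BY activity_date, platform, app_version;

A drop concentrated in one platform and app version right after a release date tells a very different story from a uniform decline across every segment.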
Likely follow-ups
What chart would you look at first?
How would you distinguish seasonality from a product issue?
What if DAU dropped but revenue increased?
Framework — Goal -> primary metric -> input metrics -> guardrails -> segments
Start with the goal. A recommendation feature might aim to increase discovery, engagement, conversion, retention, or order value. The primary metric should reflect the intended user value, not just clicks. For example, for ecommerce recommendations, a strong metric could be purchases or add-to-cart actions from recommended items per active user. Input metrics include recommendation impressions, click-through rate, add-to-cart rate, conversion rate, revenue per session, coverage, and diversity. Guardrails include returns, refunds, low-quality clicks, page latency, user complaints, and cannibalization of organic discovery. Segment by new versus returning users, category, device, traffic source, and recommendation surface. A model may improve average performance while hurting new users or long-tail product discovery. I would evaluate the feature with an A/B test if possible, then monitor longer-term retention and repeat purchase behavior. A short-term click lift is not enough if users are being nudged toward irrelevant or low-satisfaction items.
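A sketch of one possible primary metric, assuming hypothetical weekly_active_users(week, user_id) and rec_orders(order_id, user_id, week) tables, where rec_orders contains orders attributed to a recommended item:

SELECT w.week,
       COUNT(DISTINCT r.order_id) AS rec_purchases,
       COUNT(DISTINCT w.user_id)  AS active_users,
       COUNT(DISTINCT r.order_id)::numeric / COUNT(DISTINCT w.user_id) AS rec_purchases_per_active_user
FROM weekly_active_users w
LEFT JOIN rec_orders r
  ON r.user_id = w.user_id
 AND r.week = w.week
GROUP BY w.week
ORDER BY w.week;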
Likely follow-ups
What if CTR increases but purchases do not?
How would you measure recommendation quality?
How would you detect cannibalization?
Framework — Decompose revenue into traffic, conversion, AOV, mix, price, retention
Revenue is usually traffic times conversion rate times average order value, with additional effects from product mix, pricing, repeat purchase, and refunds. If revenue increased while conversion decreased, several explanations are possible. Traffic volume may have increased enough to offset lower conversion. Average order value may have risen due to pricing changes, bundles, enterprise customers, larger carts, or product mix shifting toward expensive items. The company may have cut low-intent traffic, which lowers the conversion rate even though the remaining purchases are more valuable. There could also be tracking changes affecting either revenue or conversion. I would segment by channel, product, geography, new versus returning users, customer tier, and device. Then decompose revenue into sessions, conversion, AOV, refund rate, and repeat purchases. I would also check whether conversion is measured at the session, user, or visitor level, because denominator changes can create misleading trends. The recommendation depends on quality. If revenue growth comes from healthier high-value customers, lower conversion may be acceptable. If it comes from a one-time price increase while new customer conversion is weakening, that may be a future growth risk.
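A sketch of the decomposition by channel, assuming hypothetical sessions(session_id, session_date, channel, converted) and orders(order_id, order_date, channel, revenue) tables and a placeholder date range:

WITH traffic AS (
    SELECT channel,
           COUNT(*) AS sessions,
           COUNT(*) FILTER (WHERE converted) AS converted_sessions
    FROM sessions
    WHERE session_date >= DATE '2026-04-01' AND session_date < DATE '2026-05-01'
    GROUP BY channel
),
sales AS (
    SELECT channel,
           COUNT(*)     AS orders,
           SUM(revenue) AS revenue
    FROM orders
    WHERE order_date >= DATE '2026-04-01' AND order_date < DATE '2026-05-01'
    GROUP BY channel
)
SELECT t.channel,
       t.sessions,
       t.converted_sessions::numeric / t.sessions AS conversion_rate,
       s.revenue / NULLIF(s.orders, 0)            AS avg_order_value,
       s.revenue
FROM traffic t
LEFT JOIN sales s ON s.channel = t.channel
ORDER BY s.revenue DESC NULLS LAST;

Comparing this output across periods shows which factor (sessions, conversion, or AOV) is driving the revenue change in each channel.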
Likely follow-ups
How would you build a revenue decomposition dashboard?
What if paid traffic changed significantly?
When is lower conversion acceptable?
Dashboards and Data Visualization
Dashboard questions test whether you know how to communicate data. A good dashboard is not a collection of charts; it is a decision tool for a specific audience.
Framework — Audience -> decisions -> metrics -> cuts -> alerts
First define the audience and decisions. Executives need a high-level view of growth, retention, monetization, and risk. They do not need every operational detail on the first screen. Top-level metrics: monthly recurring revenue, net revenue retention, gross revenue retention, new MRR, expansion MRR, contraction MRR, churned MRR, active customers, trial-to-paid conversion, ARPU, CAC payback if available, and forecast versus target. Useful cuts: customer segment, acquisition channel, plan, geography, company size, cohort month, and sales-assisted versus self-serve. Visuals should include trend lines, cohort retention, MRR bridge, churn reasons, and target variance. Design principles: show definitions, freshness timestamp, filters, owner, and alert thresholds. Avoid vanity metrics and avoid mixing user counts with revenue metrics without clear labels. The dashboard should answer: are we growing, why, where is risk, and what should leadership investigate next?
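As an example of how the MRR bridge behind such a dashboard might be computed, a sketch assuming a hypothetical mrr_by_customer_month(customer_id, month, mrr) table with one row per paying customer per month; the months are placeholders:

WITH cur AS (
    SELECT customer_id, mrr FROM mrr_by_customer_month WHERE month = DATE '2026-05-01'
),
prev AS (
    SELECT customer_id, mrr FROM mrr_by_customer_month WHERE month = DATE '2026-04-01'
)
SELECT SUM(CASE WHEN p.customer_id IS NULL THEN c.mrr END)  AS new_mrr,          -- paying now, not last month
       SUM(CASE WHEN c.mrr > p.mrr THEN c.mrr - p.mrr END)  AS expansion_mrr,
       SUM(CASE WHEN c.mrr < p.mrr THEN p.mrr - c.mrr END)  AS contraction_mrr,
       SUM(CASE WHEN c.customer_id IS NULL THEN p.mrr END)  AS churned_mrr       -- paying last month, not now
FROM cur c
FULL OUTER JOIN prev p ON p.customer_id = c.customer_id;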
Likely follow-ups
What would you put above the fold?
How would this differ for a product manager?
How would you prevent dashboard misuse?
Framework — Clarify decision -> explain risk -> propose better view
I would first clarify what decision they are trying to make. Sometimes a stakeholder asks for a specific chart because they already have a narrative in mind, but the real need is a business answer. Then I would explain the risk clearly and non-defensively. For example, a cumulative revenue chart may always go up and hide a recent slowdown. A pie chart with too many categories may obscure differences. A chart without confidence intervals may overstate precision. A conversion rate without traffic mix may mislead. I would propose an alternative that answers the same question more accurately: trend line, cohort chart, funnel, distribution, segmented bar chart, or metric decomposition. If they still need the original chart, I might include it with caveats, but I would not present misleading analysis as my recommendation. The goal is to preserve trust. Analysts should be helpful, but they also own analytical integrity.
Likely follow-ups
How would you handle pressure from a senior stakeholder?
What chart types are commonly misused?
How do you communicate uncertainty visually?
Excel and Spreadsheet Questions
Many data analyst roles still rely heavily on spreadsheets. Interviewers may test formulas, pivots, cleanup, reconciliation, and whether you can build a model that others can audit.
Framework — Profile -> standardize -> validate -> document
First profile the data: row count, column names, missing values, duplicates, data types, impossible values, date formats, and outliers. I would preserve a raw copy before making transformations. Then standardize fields: trim whitespace, normalize casing, parse dates, convert currencies or units, split combined fields if needed, and map inconsistent categories to a controlled list. For duplicates, define the business key before removing anything. Next validate totals against a trusted source. For example, total revenue should reconcile to finance exports, order counts should match source systems, and date ranges should be complete. I would create checks for null rates, distinct counts, and category values. Finally, document transformations. A spreadsheet used for decision-making should make assumptions visible, separate raw data from cleaned data, and avoid hidden manual edits that cannot be audited.
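If the same export is also loaded into a database, the profiling pass can be expressed as a single query; a sketch assuming a hypothetical raw_transactions staging table with transaction_id, customer_email, currency, transaction_date, and amount columns:

SELECT COUNT(*)                                       AS row_count,
       COUNT(*) - COUNT(DISTINCT transaction_id)      AS duplicate_key_rows,
       COUNT(*) FILTER (WHERE customer_email IS NULL) AS missing_email,
       COUNT(DISTINCT currency)                       AS distinct_currencies,
       MIN(transaction_date)                          AS first_date,
       MAX(transaction_date)                          AS last_date,
       SUM(amount)                                    AS total_amount   -- reconcile against the finance export
FROM raw_transactions;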
Likely follow-ups
How do you handle duplicate rows?
What spreadsheet formulas do you use most often?
How would you make the workbook auditable?
Framework — Exploration versus controlled calculation
Pivot tables are excellent for quick exploration, grouping, slicing, and summarizing data across dimensions. They are useful when the question is about totals, counts, averages, or trends by category and when stakeholders want interactive filtering. Formula-based analysis is better when logic is custom, multi-step, auditable, or needs precise control. Examples include cohort calculations, waterfall models, weighted scoring, exception flags, reconciliation checks, or inputs feeding a forecast. In practice, I often use both: pivot tables to explore patterns quickly, then formulas or a cleaner model to produce the final answer. For recurring reporting, I prefer a reproducible query or BI pipeline over a fragile manual spreadsheet.
Likely follow-ups
What are the risks of pivot tables?
How would you explain VLOOKUP versus INDEX/MATCH or XLOOKUP?
When should analysis move out of Excel into SQL or BI?
Experimentation and A/B Testing
Data analysts are often asked to design, read, or critique experiments. The important skill is knowing what conclusion the data supports and what it does not support.
Framework — Hypothesis -> randomization -> metrics -> guardrails -> decision
Hypothesis: the redesigned checkout page reduces friction and increases completed purchases without creating negative downstream effects. First confirm experiment setup: randomization unit, sample size, duration, exposure logging, eligibility, and whether users stay in the same variant across sessions. Randomizing by session instead of user could contaminate results if users return. Primary metric: checkout conversion rate from checkout start to completed purchase. Secondary metrics: payment failure rate, time to checkout, average order value, add-on attachment, and return visits. Guardrails: refunds, chargebacks, support tickets, latency, error rate, and customer complaints. Analysis should include statistical significance and practical significance. A tiny lift may not justify engineering complexity. Segment analysis can reveal whether mobile improved while desktop declined, but avoid overreacting to noisy subgroups. Recommendation: ship if the primary metric improves meaningfully, guardrails are healthy, instrumentation is trustworthy, and the effect persists across the full test period.
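A sketch of the primary-metric read, assuming hypothetical assignments(user_id, variant) and checkout_events(user_id, event_name) tables; the event names are placeholders:

SELECT a.variant,
       COUNT(DISTINCT CASE WHEN e.event_name = 'checkout_start'    THEN e.user_id END) AS started_checkout,
       COUNT(DISTINCT CASE WHEN e.event_name = 'purchase_complete' THEN e.user_id END) AS completed_purchase,
       COUNT(DISTINCT CASE WHEN e.event_name = 'purchase_complete' THEN e.user_id END)::numeric
           / NULLIF(COUNT(DISTINCT CASE WHEN e.event_name = 'checkout_start' THEN e.user_id END), 0)
           AS checkout_conversion
FROM assignments a
LEFT JOIN checkout_events e ON e.user_id = a.user_id
GROUP BY a.variant;

Significance testing happens on top of these per-variant counts; the query only produces the inputs.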
Likely follow-ups
What if conversion improves but refund rate increases?
How do you avoid peeking at test results too early?
What if the test is positive only for new users?
Framework — Check power -> inspect direction -> evaluate cost -> decide
A non-significant result does not automatically mean the feature has no effect. First check whether the test was powered to detect a meaningful effect. If sample size was too small or the metric is noisy, the result may be inconclusive. Then inspect the effect size and confidence interval. If the interval includes both meaningful upside and meaningful downside, the test is uncertain. If the interval is tightly centered near zero, the feature likely has little impact on that metric. Next consider cost and strategy. If the feature is expensive to maintain and shows no measurable benefit, do not ship or roll it back. If it is strategically important, low-risk, or improves qualitative user experience, consider iterating or measuring a better metric. I would summarize the result as: what we tested, what we observed, how confident we are, what limitations exist, and what decision I recommend.
Likely follow-ups
What is the difference between no effect and inconclusive?
How would you explain this to a non-technical stakeholder?
When would you run the test again?
Data Quality and Analytical Judgment
A data analyst is trusted only if the data is trusted. Interviews often test whether you can catch bad data before it becomes a bad decision.
Framework — Definitions -> sources -> filters -> grain -> timing -> reconciliation
First compare definitions. One dashboard may show gross revenue while another shows net revenue after refunds, discounts, taxes, or chargebacks. Revenue recognition timing may also differ: order date, payment date, shipment date, or invoice date. Then compare data sources, filters, and grain. One dashboard may exclude test accounts, canceled orders, internal users, enterprise invoices, or certain geographies. Another may join order items incorrectly and duplicate revenue. Also check timezone and data freshness. I would reconcile from a trusted source by building a bridge: start with revenue from dashboard A, then add or subtract differences step by step until it matches dashboard B. Each difference should have a named reason. The final output should not just be “dashboard A is wrong.” It should be a corrected definition, owner, source of truth, and plan to prevent future confusion.
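A sketch of the bridge, assuming a hypothetical orders table with gross_amount, discount_amount, refund_amount, tax_amount, order_date, and is_test_account columns; each step names one candidate reason for the gap:

SELECT SUM(gross_amount)                                                               AS gross_revenue,
       SUM(gross_amount) - SUM(discount_amount)                                        AS after_discounts,
       SUM(gross_amount) - SUM(discount_amount) - SUM(refund_amount)                   AS after_refunds,
       SUM(gross_amount) - SUM(discount_amount) - SUM(refund_amount) - SUM(tax_amount) AS net_revenue
FROM orders
WHERE order_date >= DATE '2026-04-01'
  AND order_date <  DATE '2026-05-01'
  AND is_test_account = FALSE;   -- one dashboard may include test accounts, the other may not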
Likely follow-ups
How would you decide the source of truth?
What if finance and product use different definitions?
How would you communicate the discrepancy?
Framework — Measure missingness -> diagnose cause -> choose treatment -> disclose
First quantify missingness: which fields, how many rows, what percentage, and whether missingness varies by segment, time, source, or platform. Missing data is not always random. Then diagnose why it is missing. It could be optional user input, tracking failure, late-arriving data, integration issues, privacy restrictions, or a legitimate not-applicable value. Treatment depends on cause. Options include excluding rows, imputing values, creating an unknown category, backfilling from another source, or changing the analysis scope. I would avoid blindly filling missing values with zero because zero and unknown mean different things. Finally, disclose the impact. State how missing data affects confidence and whether the recommendation changes under reasonable assumptions.
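A sketch of the measurement step, assuming a hypothetical events table with an event_at timestamp, a platform column, and a user_country field whose completeness is in question:

SELECT DATE_TRUNC('day', event_at) AS event_day,
       platform,
       COUNT(*)                                     AS total_events,
       COUNT(*) FILTER (WHERE user_country IS NULL) AS missing_country,
       ROUND(COUNT(*) FILTER (WHERE user_country IS NULL)::numeric / COUNT(*), 4) AS missing_rate
FROM events
WHERE event_at >= CURRENT_DATE - INTERVAL '28 days'
GROUP BY 1, 2
ORDER BY 1, 2;

A missing rate that jumps on one platform on a specific day usually points to a tracking or release issue rather than random missingness.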
Likely follow-ups
When is it okay to exclude missing rows?
What is the risk of imputing values?
How would you detect tracking failure?
Behavioral and Stakeholder Questions
Data analyst behavioral questions usually focus on ambiguity, stakeholder pressure, communication, prioritization, and moments when the data did not support what someone wanted to hear.
Framework — Decision -> analysis -> insight -> recommendation -> impact
Choose a story where the analysis clearly affected a decision. Start with the business decision at stake: launch, pricing, marketing spend, product change, operations, or prioritization. Then explain the analysis in plain language. What data did you use, what metric mattered, what segments did you inspect, and what was surprising? Avoid spending the whole answer on tools. The interviewer wants to know how your work changed thinking. Next state the recommendation and impact. For example, your analysis showed that a campaign looked profitable overall but lost money in one channel, so the team shifted budget and improved ROI. Or a product feature increased clicks but reduced retention, so the team rolled it back. A strong answer includes caveats and stakeholder communication. Explain how you handled uncertainty and how you made the recommendation understandable.
Likely follow-ups
How did you measure impact?
Who disagreed with the recommendation?
What would you do differently now?
Framework — Clarify urgency -> provide directional read -> state limitations -> plan follow-up
First clarify the decision and deadline. If the decision is low-risk, a directional answer may be acceptable. If it affects revenue, customers, compliance, or strategy, the quality bar should be higher. Then explain what can be answered now and what cannot. I might provide a preliminary read with clear caveats: “Based on currently available data, the trend appears negative, but tracking is incomplete for Android users, so I would not make a final launch decision yet.” I would separate the immediate answer from the follow-up plan. Immediate: best available estimate and confidence level. Follow-up: data cleaning steps, validation checks, source-of-truth reconciliation, and when a stronger answer will be ready. This approach keeps the stakeholder moving while protecting analytical integrity. The worst answer is a confident number that is fast but wrong.
Likely follow-ups
How do you push back on unrealistic deadlines?
When is directional analysis acceptable?
How do you communicate confidence level?
Framework — Impact -> urgency -> effort -> dependency -> stakeholder alignment
I prioritize based on business impact, urgency, effort, dependencies, and whether the analysis supports an irreversible or high-stakes decision. A request tied to a launch decision tomorrow usually beats a nice-to-have dashboard improvement. I also clarify the decision each request supports. If a request has no clear decision or owner, it may need refinement before analysis starts. For recurring requests, I look for automation opportunities so the team is not trapped in manual reporting. When priorities conflict, I communicate tradeoffs: “I can complete the churn analysis today or the dashboard refresh today, but not both. Since the churn analysis affects this week's retention plan, I recommend doing that first.” Strong analysts do not just accept every request in order. They help the organization spend analytical time where it changes decisions.
Likely follow-ups
How do you say no to a stakeholder?
What work would you automate?
How do you handle executive requests?
Practice these answers live
Interview Pilot gives you real-time Copilot answer suggestions during live interviews, so you can respond clearly when these questions come up.
