数据分析面试高频考点:窗口函数实战详解(从sqlzoo到LeetCode)

张开发
2026/4/8 12:16:59 15 分钟阅读

分享文章

数据分析面试高频考点:窗口函数实战详解(从sqlzoo到LeetCode)
窗口函数实战指南从SQL基础到数据分析面试通关在数据分析岗位的面试中窗口函数Window Functions几乎是必考的核心技能点。无论是电商平台的用户行为分析还是金融领域的风险监控窗口函数都能高效解决传统SQL难以处理的复杂场景。本文将带你从基础概念出发通过真实业务案例掌握窗口函数在面试与实际工作中的灵活应用。1. 窗口函数核心概念解析窗口函数与传统GROUP BY聚合的本质区别在于它能够在保留原始行记录的同时进行跨行计算。这种特性使其成为解决排名、移动平均、累计求和等问题的利器。1.1 三大排序函数对比-- 创建测试数据 CREATE TABLE sales ( salesperson VARCHAR(50), region VARCHAR(50), amount DECIMAL(10,2) ); INSERT INTO sales VALUES (张三,华东,5000), (李四,华东,5000), (王五,华北,4000), (赵六,华南,6000);执行以下查询观察差异SELECT salesperson, region, amount, RANK() OVER(ORDER BY amount DESC) AS rank_val, DENSE_RANK() OVER(ORDER BY amount DESC) AS dense_rank_val, ROW_NUMBER() OVER(ORDER BY amount DESC) AS row_num FROM sales;结果对比salespersonregionamountrank_valdense_rank_valrow_num赵六华南6000111张三华东5000222李四华东5000223王五华北4000434关键区别RANK()并列排名会跳过后续名次如两个第2名后接第4名DENSE_RANK()并列排名不跳号两个第2名后接第3名ROW_NUMBER()始终生成连续唯一序号1.2 PARTITION BY的分区魔法PARTITION BY是窗口函数的精髓所在它相当于GROUP BY的分组概念但不会减少结果行数SELECT salesperson, region, amount, RANK() OVER(PARTITION BY region ORDER BY amount DESC) AS regional_rank FROM sales;salespersonregionamountregional_rank张三华东50001李四华东50001王五华北40001赵六华南60001提示当需要同时计算整体排名和分组排名时可以在一个查询中使用多个窗口函数2. 高频面试题型实战破解2.1 电商用户消费排名分析场景某电商平台需要分析用户消费行为计算每个用户的消费金额排名并找出各消费层级的关键用户。WITH user_orders AS ( SELECT user_id, SUM(order_amount) AS total_amount FROM orders WHERE order_date BETWEEN 2023-01-01 AND 2023-12-31 GROUP BY user_id ) SELECT user_id, total_amount, NTILE(5) OVER(ORDER BY total_amount DESC) AS user_tier, PERCENT_RANK() OVER(ORDER BY total_amount DESC) AS percentile FROM user_orders ORDER BY total_amount DESC;关键函数解析NTILE(n)将数据分为n个等级桶PERCENT_RANK()计算百分比排名0到1之间2.2 销售业绩环比分析需求计算各销售部门月度销售额的环比增长率WITH monthly_sales AS ( SELECT department, DATE_TRUNC(month, sale_date) AS month, SUM(amount) AS sales_amount FROM sales GROUP BY department, DATE_TRUNC(month, sale_date) ) SELECT department, TO_CHAR(month, YYYY-MM) AS month, sales_amount, LAG(sales_amount, 1) OVER(PARTITION BY department ORDER BY month) AS prev_month_amount, ROUND( (sales_amount - LAG(sales_amount, 1) OVER(PARTITION BY department ORDER BY month)) / LAG(sales_amount, 1) OVER(PARTITION BY department ORDER BY month) * 100, 2 ) AS mom_growth_rate FROM monthly_sales ORDER BY department, month;典型问题解决方案使用LAG()获取上期数据避免除零错误NULLIF(prev_month_amount, 0)处理首月无环比数据的情况COALESCE(mom_growth_rate, 0)3. 高级窗口函数技巧3.1 移动平均计算金融数据分析中常需要计算n日移动平均SELECT stock_code, trade_date, closing_price, AVG(closing_price) OVER( PARTITION BY stock_code ORDER BY trade_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW ) AS ma5, AVG(closing_price) OVER( PARTITION BY stock_code ORDER BY trade_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW ) AS ma20 FROM stock_daily ORDER BY stock_code, trade_date;窗口帧Window Frame关键参数ROWS BETWEEN n PRECEDING AND m FOLLOWINGRANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW3.2 累计百分比计算市场占有率分析常用技巧WITH product_sales AS ( SELECT product_id, SUM(amount) AS sales_amount FROM orders GROUP BY product_id ) SELECT product_id, sales_amount, SUM(sales_amount) OVER(ORDER BY sales_amount DESC) AS running_total, SUM(sales_amount) OVER() AS grand_total, ROUND(SUM(sales_amount) OVER(ORDER BY sales_amount DESC) / SUM(sales_amount) OVER() * 100, 2) AS cumulative_percentage FROM product_sales ORDER BY sales_amount DESC;4. 性能优化与面试避坑指南4.1 执行效率对比窗口函数 vs 子查询 vs 自连接方法可读性执行效率适用场景窗口函数★★★★★★★★★复杂分析场景子查询★★★★简单聚合自连接★★★★★历史数据对比注意在MySQL 8.0以下版本中窗口函数不可用需使用子查询或临时表替代4.2 常见面试陷阱忽略NULL值处理-- 错误示例 SELECT user_id, LAG(login_date) OVER(PARTITION BY user_id ORDER BY login_date) AS last_login FROM user_logins; -- 正确做法 SELECT user_id, COALESCE(LAG(login_date) OVER(PARTITION BY user_id ORDER BY login_date), login_date) AS last_login FROM user_logins;混淆ORDER BY作用在窗口函数中决定计算顺序在查询末尾决定最终结果排序分区过大导致性能问题-- 低效写法 SELECT AVG(price) OVER(PARTITION BY city) FROM properties; -- 优化方案 WITH city_avg AS ( SELECT city, AVG(price) AS avg_price FROM properties GROUP BY city ) SELECT p.*, c.avg_price FROM properties p JOIN city_avg c ON p.city c.city;在实际项目中窗口函数最常见的应用场景是制作日报/周报中的各类排名、对比指标。我曾在一个电商促销分析中使用RANK()结合CASE WHEN快速定位了各品类中表现最好和最差的商品帮助运营团队及时调整策略。

更多文章