前言
在 MYSQL LEFT JOIN 详解 这篇文章中我们已经知道了,LEFT JOIN 是自己选择驱动表的,而 JOIN 是 MYSQL 优化器选择驱动标的。
如果会优化的话,那么什么时候会优化呢?
在我们线上有这么一条慢 SQL(已处理),执行时间超过 0.5 秒。
select count(distinct order.order_id) from order force index(shop_id) left join `order_extend`on `order`.`order_id` = `order_extend`.`order_id` where `order`.`create_time` >= "2020-08-01 00:00:00" and `order`.`create_time` <= "2020-08-01 23:59:59" and `order`.`shop_id` = 328449726569069326 and `order`.`status` = 1 and `order_extend`.`shop_id` = 328449726569069326 and `order_extend`.`status` = 1复制代码通过 explain,再结合我们之前讲的 MYSQL 连接查询算法,驱动表为 order_extend,循环 3892 次,说多也不多,说少也不少,被驱动表数据查询类型为 eq_ref,所以应该不会太慢,那么问题就出现在 3892 次上面了,想办法将这个数字降下来即可。
顺着这个思路,既然驱动表变了,说明这条 SQL 变为 JOIN 语句了。
MYSQL 选择 order_extend 当做驱动表,说明在 where 条件下 order_extend 查询的数据更少,MYSQL 会选择一个小的表当做驱动表。
为了不影响我们的分析,我们使用 explain 语句,这样整个过程就都是估算的结果,模拟一下 MYSQL 分析的过程。
mysql> explain select count(distinct order.order_id) from order force index(shop_id) where `order`.`create_time` >= "2020-08-01 00:00:00" and `order`.`create_time` <= "2020-08-01 23:59:59" and `order`.`shop_id` = 328449726569069326 and `order`.`status` = 1;+----+-------------+-------+------------+------+--------------------------------+---------+---------+-------+--------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------------+------+--------------------------------+---------+---------+-------+--------+-------------+| 1 | SIMPLE | order | NULL | ref | PRIMARY,shop_id,create_time... | shop_id | 8 | const | 320372 | Using where |+----+-------------+-------+------------+------+--------------------------------+---------+---------+-------+--------+-------------+1 row in set, 1 warning (0.00 sec)复制代码select count(distinct order_extend.order_id) and `order_extend`.`shop_id` = 328449726569069326 and `order_extend`.`status` = 1+----+-------------+--------------+------------+------+------------------+---------+---------+-------+------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+--------------+------------+------+------------------+---------+---------+-------+------+----------+-------------+| 1 | SIMPLE | order_extend | NULL | ref | order_id,shop_id | shop_id | 8 | const | 3892 | 10.00 | Using where |+----+-------------+--------------+------------+------+------------------+---------+---------+-------+------+----------+-------------+1 row in set, 1 warning (0.00 sec)复制代码那么我们再来看看为什么 order 表会扫描这么多数据呢?在 2020-08-01 这一天可能也没有这么多数据啊。那么这个时候我们应该会很容易的想到,是强制走索引的问题,因为在上述查询语句中,我们强制走了 shop_id 索引,这个索引可能不是最优索引,我们把 force index(shop_id) 去掉再试试看
mysql> explain select count(distinct order.order_id) where `order`.`create_time` >= "2020-08-01 00:00:00" and `order`.`create_time` <= "2020-08-01 23:59:59" and `order`.`shop_id` = 328449726569069326 and `order`.`status` = 1;+----+-------------+-------+------------+------+---------------+-------------+---------+-------+-------+----------+--------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+-------+------------+------+---------------+-------------+---------+-------+-------+----------+--------------------------+| 1 | SIMPLE | order | NULL | ref | create_time | create_time | 8 | const | <3892 | 10.00 | Using where; Using index |+----+-------------+-------+------------+------+---------------+-------------+---------+-------+-------+----------+--------------------------+1 row in set, 1 warning (0.00 sec)复制代码所以最终的这条慢 SQL 的原因确定了,是因为我们强制走 shop_id 索引,导致 MYSQL 扫描的行数更多了,我们只需要去掉强制走索引即可,大多数时间 MYSQL 都会选择正确的索引,所以强制使用索引的时候一定要小心谨慎。
问题延伸
答案是会的。那么什么时候会出现这种情况呢?
为了方便阅读,我们将部分内容粘贴出来。
mysql> select * from goods left join goods_category on goods.category_id = goods_category.category_id;+----------+------------+-------------+-------------+---------------+| goods_id | goods_name | category_id | category_id | category_name |+----------+------------+-------------+-------------+---------------+| 1 | 男鞋1 | 1 | 1 | 鞋 || 2 | 男鞋2 | 1 | 1 | 鞋 || 3 | 男鞋3 | 3 | 3 | 羽绒服 || 4 | T恤1 | 2 | 2 | T恤 || 5 | T恤2 | 2 | 2 | T恤 |+----------+------------+-------------+-------------+---------------+5 rows in set (0.00 sec)mysql> select * from goods left join goods_category on goods.category_id = goods_category.category_id;+----------+------------+-------------+-------------+---------------+| goods_id | goods_name | category_id | category_id | category_name |+----------+------------+-------------+-------------+---------------+| 1 | 男鞋1 | 1 | 1 | 鞋 || 2 | 男鞋2 | 1 | 1 | 鞋 || 3 | 男鞋3 | 4 | NULL | NULL || 4 | T恤1 | 2 | 2 | T恤 || 5 | T恤2 | 2 | 2 | T恤 |+----------+------------+-------------+-------------+---------------+5 rows in set (0.00 sec)mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id and g.goods_name = 'T恤1');+----------+------------+-------------+-------------+---------------+| goods_id | goods_name | category_id | category_id | category_name |+----------+------------+-------------+-------------+---------------+| 1 | 男鞋1 | 1 | NULL | NULL || 2 | 男鞋2 | 1 | NULL | NULL || 3 | 男鞋3 | 4 | NULL | NULL || 4 | T恤1 | 2 | 2 | T恤 || 5 | T恤2 | 2 | NULL | NULL |+----------+------------+-------------+-------------+---------------+5 rows in set (0.00 sec)mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id and c.category_name = 'T恤');+----------+------------+-------------+-------------+---------------+| goods_id | goods_name | category_id | category_id | category_name |+----------+------------+-------------+-------------+---------------+| 1 | 男鞋1 | 1 | NULL | NULL || 2 | 男鞋2 | 1 | NULL | NULL || 3 | 男鞋3 | 4 | NULL | NULL || 4 | T恤1 | 2 | 2 | T恤 || 5 | T恤2 | 2 | 2 | T恤 |+----------+------------+-------------+-------------+---------------+5 rows in set (0.00 sec)mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id) where c.category_name = '鞋';+----------+------------+-------------+-------------+---------------+| goods_id | goods_name | category_id | category_id | category_name |+----------+------------+-------------+-------------+---------------+| 1 | 男鞋1 | 1 | 1 | 鞋 || 2 | 男鞋2 | 1 | 1 | 鞋 |+----------+------------+-------------+-------------+---------------+2 rows in set (0.00 sec)mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id) where g.goods_name = 'T恤1';+----------+------------+-------------+-------------+---------------+| goods_id | goods_name | category_id | category_id | category_name |+----------+------------+-------------+-------------+---------------+| 4 | T恤1 | 2 | 2 | T恤 |+----------+------------+-------------+-------------+---------------+1 row in set (0.00 sec)mysql> select * from goods g left join goods_category c on (g.category_id = c.category_id and g.goods_name = 'T恤2') where g.goods_name = 'T恤1';+----------+------------+-------------+-------------+---------------+| goods_id | goods_name | category_id | category_id | category_name |+----------+------------+-------------+-------------+---------------+| 4 | T恤1 | 2 | NULL | NULL |+----------+------------+-------------+-------------+---------------+1 row in set (0.00 sec)复制代码所以,我们可以想到,LEFT JOIN 优化为 JOIN 的条件为:where 条件中有被驱动表的非空条件时,LEFT JOIN 等价于 JOIN。
我们再来编写一个测试用例来验证一下我们的结论。
CREATE TABLE `A` ( `id` int(11) auto_increment, `a` int(11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `a` (`a`)) ENGINE=InnoDB;delimiter ;;create procedure idata()begin declare i int; set i=1; while(i<=100)do insert into A (`a`) values(i); set i=i+1; end while;end;;delimiter ;call idata();CREATE TABLE `B` ( `id` int(11) auto_increment, `b` int(11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `b` (`b`)) ENGINE=InnoDB;delimiter ;;create procedure idata()begin declare i int; set i=1; while(i<=100)do insert into B (`b`) values(i); set i=i+1; end while;end;;delimiter ;call idata();复制代码从上面看,给 B 表增加了 where 条件之后,如果 B 表扫描的行数更少,那么是有可能换驱动表的,这也说明了,LEFT JOIN 语句被优化成了 JOIN 语句。
总结
同时我们得出了一条结论:当有被驱动表的非空 where 条件时,MYSQL 会将 LEFT JOIN 语句优化为 JOIN 语句。
