Range Optimization 1
For a BTREE
index, an interval might be usable for conditions combined with AND
, where each condition compares a key part with a constant value using =, <=>, IS NULL, >, <, >=, <=, !=, <>, BETWEEN, or LIKE 'pattern' (where 'pattern' does not start with a wildcard)
. An interval can be used as long as it is possible to determine a single key tuple containing all rows that match the condition (or two intervals if <> or != is used).
The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL
. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE
, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction:
1
2
3
4
key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10
the single interval is
('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf)
总结,联合索引中从左到右只用一次range。
The key_len
specifies the number of bytes that MySQL uses from the key.
Indexes are always used left_to_right.2
Index Merge Optimization 3
以SELECT * FROM TB1 WHERE c1=”xxx” AND c2=”“xxx” 为例: 1、 当c1列和c2列选择性较高时,按照c1和c2条件进行查询性能较高且返回数据集较小,再对两个数据量较小的数据集求交集的操作成本也较低,最终整个语句查询高效; 2、 当c1列或c2列选择性较差且统计信息不准时,比如整表数据量2000万,按照c2列条件返回1500万数据,按照c1列返回1000条数据,此时按照c2列条件进行索引扫描+聚集索引查找的操作成本极高(可能是整表扫描的百倍消耗),对1000条数据和1500万数据求交集的成本也极高,最终导致整条SQL需要消耗大量CPU和IO资源且相应时间超长,而如果值使用c1列的索引,查询消耗资源较少且性能较高。
最左匹配原则
Good Ref: https://www.cnblogs.com/developer_chan/p/9223671.html
- 在最佳左前缀法则中,如果最左前列(带头大哥)的索引失效,则后面的索引都失效。4
About Explain
in Explain
, type ref
:
https://stackoverflow.com/questions/4508055/what-does-eq-ref-and-ref-types-mean-in-mysql-explain
Foreign Key
foreign key referential actions 5
1
2
3
4
5
6
7
8
[CONSTRAINT [symbol]] FOREIGN KEY
[index_name] (col_name, ...)
REFERENCES tbl_name (col_name,...)
[ON DELETE reference_option]
[ON UPDATE reference_option]
reference_option:
RESTRICT | CASCADE | SET NULL | NO ACTION | SET DEFAULT
CASCADE
: Delete or update the row from the parent table, and automatically delete or update the matching rows in the child table.SET NULL
: Delete or update the row from the parent table, and set the foreign key column or columns in the child table to NULL.- If you specify a SET NULL action, make sure that you have not declared the columns in the child table as NOT NULL.
RESTRICT
: (default) Rejects the delete or update operation for the parent table.NO ACTION
: A keyword from standard SQL. In MySQL, equivalent toRESTRICT
.
Does MySQL index foreign key columns automatically?
-> YES for InnoDB
owner SMALLINT UNSIGNED NOT NULL REFERENCES person(id),
1
2
3
4
5
6
7
8
9
10
11
12
13
CREATE TABLE parent (
id INT NOT NULL,
PRIMARY KEY (id)
) ENGINE=INNODB;
CREATE TABLE child (
id INT,
parent_id INT,
INDEX par_ind (parent_id),
FOREIGN KEY (parent_id)
REFERENCES parent(id)
ON DELETE CASCADE
) ENGINE=INNODB;
SQL Recap
three table:
S(SNO, SNAME, AGE, SEX, Sdept)
SC(SNO, CNO, GRADE)
C(CNO, CNAME, TEACHER)
Below is to form the table
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (5, 'java', 59);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (5, 'math', 30);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (5, 'sql', 70);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (6, 'java', 99);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (6, 'math', 87);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (6, 'sql', 48);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (7, 'java', 33);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (7, 'sql', 86);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (8, 'sql', 93);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (8, 'math', 71);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (9, 'py', 100);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (10, 'py', 80);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (11, 'py', 70);
INSERT INTO `class`(`id`, `subject`, `grade`) VALUES (12, 'py', 23);
CREATE TABLE `qd452db`.`student` ( `id` INT NOT NULL , `name` VARCHAR(10) NOT NULL , `age` INT NOT NULL , `sex` ENUM('M','F') NOT NULL , `department` VARCHAR(10) NOT NULL ) ENGINE = InnoDB;
CREATE TABLE `qd452db`.`module` ( `id` INT NOT NULL , `name` VARCHAR(10) NOT NULL , `teacher` VARCHAR(20) NOT NULL ) ENGINE = InnoDB;
INSERT INTO `module` (`id`, `name`, `teacher`) VALUES ('1', 'math', 'mathTeacher');
INSERT INTO `module` (`id`, `name`, `teacher`) VALUES ('2', 'music', 'musicTeacher');
INSERT INTO `module` (`id`, `name`, `teacher`) VALUES ('3', 'java', 'javaTeacher');
INSERT INTO `module` (`id`, `name`, `teacher`) VALUES ('4', 'sql', 'sqlTeacher');
INSERT INTO `module` (`id`, `name`, `teacher`) VALUES ('5', 'py', 'pyTeacher');
UPDATE module SET teacher = CONCAT(name, '_Teacher');
ALTER TABLE class ADD subject_id INT;
SELECT class.id as Student_ID, class.subject as subject, module.id as Subject_id FROM class LEFT JOIN module ON class.subject = module.name ORDER BY class.id ASC;
-- Important: update field based on another table
UPDATE class
SET class.subject_id = (SELECT module.id FROM module WHERE class.subject = module.name);
ALTER TABLE class DROP subject;
ALTER TABLE class RENAME COLUMN subject_id TO subject;
DELETE FROM student WHERE id=1;
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('1', 'a', '13', 'M', 'AAA');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('2', 'b', '10', 'M', 'AAA');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('3', 'c', '13', 'F', 'BBB');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('4', 'd', '16', 'M', 'BBB');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('5', 'e', '13', 'F', 'AAA');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('6', 'f', '16', 'M', 'BBB');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('7', 'g', '18', 'F', 'CCC');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('8', 'h', '11', 'M', 'BBB');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('9', 'i', '18', 'M', 'CCC');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('10', 'j', '15', 'F', 'BBB');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('11', 'k', '18', 'M', 'BBB');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('12', 'l', '18', 'M', 'CCC');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('13', 'm', '15', 'F', 'AAA');
INSERT INTO `student` (`id`, `name`, `age`, `sex`, `department`) VALUES ('14', 'n', '18', 'M', 'BBB');
1. select SNO where all module pass
1
2
3
SELECT sno
FROM sc
GROUP BY sno HAVING MIN(grade>=60);
2. select sno where has module < 60 and module >90
1
2
3
4
SELECT id
FROM class
WHERE grade<60
AND id IN (SELECT id FROM class WHERE grade>90);
3. select cno, avg_grade where avg grade <60
1
SELECT cno, AVG(grade) FROM sc GROUP BY cno HAVING AVG(grade)<60;
5. every module avg after remove highest and lowest
1
2
3
4
5
SELECT subject, AVG(grade)
FROM class
WHERE grade != (SELECT grade FROM class ORDER BY grade ASC LIMIT 1)
AND grade !=(SELECT grade FROM class ORDER BY grade DESC LIMIT 1)
GROUP BY subject;
6. select students who do not take module 3
1
2
3
SELECT DISTINCT(id)
FROM class
WHERE id NOT IN (SELECT id FROM class WHERE subject=3);
7. select students whose ‘math’ >90 or <60
1
2
3
SELECT DISTINCT(id)
FROM class
WHERE subject='math' AND grade NOT BETWEEN 60 AND 90;
8. query subject and number of student who took it
1
2
3
4
SELECT subject, COUNT(*) as Num_Student
FROM `class` GROUP BY subject;
SELECT subject, COUNT(id) as Num_Student
FROM `class` GROUP BY subject;
9. query student id, name, sex who took module num 3
1
2
3
4
5
6
7
SELECT student.id, student.name, student.sex
FROM class LEFT JOIN student ON student.id = class.id
WHERE class.subject = 3;
-- OR
SELECT student.id, name, sex
FROM student, class
WHERE student.id = class.id AND class.subject = 3;
10. average age of students who took module num 3
1
2
3
SELECT AVG(age)
FROM student, class
WHERE student.id = class.id AND class.subject = 3;
11. average age of students who took module num 3
1
2
3
SELECT AVG(age)
FROM student, class
WHERE student.id = class.id AND class.subject = 3;
12. Many more
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
-- select class have more than 4 student
SELECT subject FROM class GROUP BY subject HAVING COUNT(id)>4;
-- http://www.mysqltutorial.org/mysql-delete-duplicate-rows/
-- https://dba.stackexchange.com/questions/214946/how-to-delete-duplicate-records-in-mysql-in-a-table-without-ids
CREATE TABLE temp LIKE class;
INSERT INTO temp
SELECT DISTINCT * FROM class;
DROP TABLE class;
RENAME TABLE temp TO class;
-- select duplicate from class
SELECT id, subject FROM class GROUP BY id, subject HAVING COUNT(*)>1;
-- add constraint to class
ALTER TABLE class
ADD CONSTRAINT id_subject UNIQUE (id, subject);
-- select student id and name who took 'math'
SELECT student.id, student.name
FROM student WHERE student.id
IN (SELECT class.id
FROM class, module
WHERE class.subject=module.id and module.name='math');
-- Better
SELECT student.id, student.name
FROM student, class, module
WHERE student.id=class.id and class.subject=module.id and module.name='math';
-- select student who took either 1 or 2
SELECT DISTINCT id FROM class WHERE subject IN (1,2);
-- select student who took both 1 and 2
SELECT DISTINCT id FROM class WHERE subject=2 AND
id IN (SELECT id FROM class WHERE subject=1);
-- select students who do not take module 3
SELECT DISTINCT(id) FROM class WHERE id NOT IN (SELECT id FROM class WHERE subject=3);
-- select student name, age who didn't take module 2
SELECT id, name, age
FROM student
WHERE
id NOT IN (SELECT id FROM class WHERE subject = 2);
-- OR
SELECT id, name, age FROM student
WHERE NOT EXISTS
(SELECT * FROM class WHERE id=student.id AND subject=2);
-- TODO: EXISTS: https://dev.mysql.com/doc/refman/8.0/en/exists-and-not-exists-subqueries.html
-- select female student who took math and failed <35
SELECT student.id, student.name, student.sex
FROM student, class, module
WHERE module.id=class.subject AND module.name='math'
AND class.id=student.id
AND student.sex='F' and class.grade<35;
-- query module name and avg grade for every module
SELECT module.name, AVG(class.grade) as AVG_grade
FROM module LEFT JOIN class ON module.id=class.subject
GROUP BY module.name
ORDER BY AVG_grade ASC;
-- OR
SELECT module.name, AVG(class.grade) as AVG_grade
FROM module, class WHERE module.id=class.subject
GROUP BY module.name
ORDER BY AVG_grade ASC;
-- query student name and his/her average grade
SELECT student.name, AVG(grade) as grade FROM student, class
WHERE student.id=class.id
GROUP BY student.name
ORDER BY grade DESC;
Star Select SQL
https://selectstarsql.com/
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
-- Find the proportion of inmates with claims of innocence in their last statements.
SELECT
1.0 * COUNT(CASE WHEN last_statement LIKE '%innocent%'
THEN 1 ELSE NULL END) / COUNT(*)
FROM executions
-- https://selectstarsql.com/longtail.html
SELECT
county,
COUNT(CASE WHEN last_statement IS NOT NULL
THEN 1 ELSE NULL END) AS with,
COUNT(CASE WHEN last_statement IS NULL
THEN 1 ELSE NULL END) AS with
FROM executions
GROUP BY county
--OR
-- essentially they are the same, but just in different format
-- and below should be more efficient
SELECT
county,
last_statement IS NOT NULL AS has_last_statement,
COUNT(*)
FROM executions
where county='Bexar'
GROUP BY county, has_last_statement
-- Find the first and last name of the the inmate with the longest last statement (by character count).
SELECT first_name, last_name
FROM executions
WHERE LENGTH(last_statement) =
(SELECT LENGTH(last_statement) AS len
FROM executions
GROUP BY len
ORDER BY len DESC
LIMIT 1)
-- Much Simpler
SELECT first_name, last_name
FROM executions
WHERE LENGTH(last_statement) =
(SELECT MAX(LENGTH(last_statement))
FROM executions)
-- Insert the <count-of-all-rows> query to find the percentage of executions from each county.
SELECT
county,
100.0 * COUNT(*) / (SELECT COUNT(*) FROM executions)
AS percentage
FROM executions
GROUP BY county
ORDER BY percentage DESC
-- Look up the documentation to fix the query so that it returns the number of days between the dates.
-- https://www.sqlite.org/lang_datefunc.html
SELECT JULIANDAY('1993-08-10') -
JULIANDAY('1989-07-07') AS day_difference
start | end | dayDifference |
---|---|---|
1982-12-07 | 1984-03-14 | 463 |
1988-01-07 | 1988-11-03 | 301 |
2007-09-25 | 2008-06-11 | 260 |
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- generate above table
SELECT
last_ex_date AS start,
ex_date AS end,
JULIANDAY(ex_date) - JULIANDAY(last_ex_date) AS day_difference
FROM executions
JOIN (
SELECT
ex_number + 1 AS ex_number,
ex_date AS last_ex_date
FROM executions
) previous
ON executions.ex_number = previous.ex_number
ORDER BY day_difference DESC
LIMIT 10
-- More Elegant
SELECT
previous.ex_date AS start,
executions.ex_date AS end,
JULIANDAY(executions.ex_date) - JULIANDAY(previous.ex_date)
AS day_difference
FROM executions
JOIN executions previous
ON executions.ex_number = previous.ex_number+1
ORDER BY day_difference DESC
LIMIT 10
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
--Find the most networked senator. That is, the one with the most mutual cosponsorships.
WITH mutuals AS
(SELECT DISTINCT a.sponsor_name, a.cosponsor_name
FROM cosponsors a
JOIN cosponsors b
ON a.cosponsor_name = b.sponsor_name
WHERE a.sponsor_name = b.cosponsor_name)
SELECT sponsor_name, COUNT(*)
FROM mutuals
GROUP BY sponsor_name
ORDER BY COUNT(*) DESC
LIMIT 1;
-- Now find the most networked senator from each state.
WITH mutual_counts AS (
SELECT
senator, state, COUNT(*) AS mutual_count
FROM (
SELECT DISTINCT
c1.sponsor_name AS senator,
c1.sponsor_state AS state,
c2.sponsor_name AS senator2
FROM cosponsors c1
JOIN cosponsors c2
ON c1.sponsor_name = c2.cosponsor_name
AND c2.sponsor_name = c1.cosponsor_name
)
GROUP BY senator, state
),
state_max AS (
SELECT
state,
MAX(mutual_count) AS max_mutual_count
FROM mutual_counts
GROUP BY state
)
SELECT
mutual_counts.state,
mutual_counts.senator,
mutual_counts.mutual_count
FROM mutual_counts
JOIN state_max
ON mutual_counts.state = state_max.state
AND mutual_counts.mutual_count = state_max.max_mutual_count
-- Find the senators who cosponsored but didn't sponsor bills.
SELECT DISTINCT cosponsor_name
FROM cosponsors
WHERE
cosponsor_name NOT IN
(SELECT DISTINCT sponsor_name
FROM cosponsors)
-- MORE efficient way
SELECT DISTINCT c1.cosponsor_name
FROM cosponsors c1
LEFT JOIN cosponsors c2
ON c1.cosponsor_name = c2.sponsor_name
-- This join identifies cosponsors
-- who have sponsored bills
WHERE c2.sponsor_name IS NULL
-- LEFT JOIN + NULL is a standard trick for excluding
-- rows. It's more efficient than WHERE ... NOT IN.
一道SQL题
现有注册用户表table_user,有两个字段:user_id(用户id)、reg_tm(注册时间)。有订单表table_order,有三个字段:order_id(订单号)、order_tm(下单时间)、user_id(用户id)。
查询2019年1月1日至今,每天的注册用户数,下单用户数,以及注册当天即下单的用户数(请尽量在一个 sql语句中实现)。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
CREATE TABLE IF NOT EXISTS tb_user
(
user_id INT NOT NULL,
reg_tm DATE NOT NULL
);
INSERT INTO tb_user VALUES (1, '2019-1-1');
INSERT INTO tb_user VALUES (2, '2019-1-1');
INSERT INTO tb_user VALUES (3, '2019-1-1');
INSERT INTO tb_user VALUES (4, '2019-1-2');
INSERT INTO tb_user VALUES (5, '2019-1-2');
INSERT INTO tb_user VALUES (6, '2019-1-4');
INSERT INTO tb_user VALUES (7, '2019-1-5');
CREATE TABLE IF NOT EXISTS tb_order
(
order_id VARCHAR(3) NOT NULL,
order_tm DATE NOT NULL,
user_id INT NOT NULL
);
INSERT INTO tb_order VALUES (1, '2019-1-1', 1);
INSERT INTO tb_order VALUES (2, '2019-1-1', 2);
INSERT INTO tb_order VALUES (3, '2019-1-2', 3);
INSERT INTO tb_order VALUES (4, '2019-1-2', 3);
INSERT INTO tb_order VALUES (5, '2019-1-2', 4);
INSERT INTO tb_order VALUES (6, '2019-1-3', 5);
INSERT INTO tb_order VALUES (7, '2019-1-5', 6);
INSERT INTO tb_order VALUES (7, '2019-1-6', 7);
INSERT INTO tb_order VALUES (7, '2019-1-6', 7);
- mysql doesn’t have
FULL OUTER JOIN
, so here useUNION
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
SELECT
tb_date.reg_tm as date,
COUNT(DISTINCT tb_user.user_id) as dailyRegiUserAmount,
COUNT(DISTINCT tb_order.user_id) as dailyOrderUserAmount,
COUNT(DISTINCT
IF(tb_user.user_id=tb_order.user_id AND
tb_user.reg_tm=tb_order.order_tm, tb_user.user_id, NULL)
) as userOrderWhenJustRegister
FROM (SELECT reg_tm FROM tb_user
UNION
SELECT order_tm FROM tb_order
ORDER BY reg_tm) as tb_date
LEFT JOIN tb_user ON tb_date.reg_tm=tb_user.reg_tm
LEFT JOIN tb_order ON tb_date.reg_tm=tb_order.order_tm
WHERE tb_date.reg_tm>'2018-12-31'
GROUP BY tb_date.reg_tm;