dustland

dustball in dustland

关系代数和SQL语言

关系代数与SQL语句

关系模型术语

关系模型术语 对应到表
关系
元组
属性
关系实例 表中的特定行
属性的域 该属性列中所有可能的取值

关系模式:定义类型

关系:关系模式的实例

Employer(ID,salary,gender,age);这就是关系模式

Tom(00001,5k,male,20);这就是关系

相当于

1
2
3
4
5
6
7
8
9
10
11
12
13
class Employer{//Employer关系模式
int ID;
int salary;
bool gender;
int age;
Employer(const int &i,const int &s,const bool &g,const int &a){
ID=i;
salary=s;
gender=g;
age=a;
}
};
Employer Tom(1,5000,1,20);//关系,关系实例

码:区分不同关系的标志

比如人的身份证ID

超码:关系模式中,能够唯一标识一个关系的属性或者属性集合

比如Student(ID,eduID,class,grade,name)

学生(身份证号,学号,年级,班级,姓名)

这里ID和eduID都可以作为超码,

如果认为每个班里都没有重名的人,则(年级,班级,姓名)也可以作为超码

显然超码的任何一个超集也是超码

候选码:最小超码

比如Student(ID,eduID,class,grade,name)

由于ID是超码,因此ID的任何一个超集,比如(ID,grade)也是超码

ID可以作为候选码,因为ID已经是最小超码,但是(ID,grade)不是候选码,因为去了grade,只剩下ID仍然是超码

主码(primary key):实际中,数据库设计者选中作为区分不同关系的候选码

1
2
3
4
5
6
7
mysql> CREATE TABLE troop(
-> id int,
-> name varchar(255),
-> age int,
-> PRIMARY KEY(id)
-> );
Query OK, 0 rows affected (0.01 sec)

将id作为主码

外码:

关系模式r1的属性中可能包含其他关系模式r2的主码.

则该属性在r1上称作"参照r2的外码",

关系模式r1称为"外码依赖的参照关系"

关系模式r2称为"外码的被参照关系"

比如

Person=Person(ID,eduID,name,age,gender),其中身份证号ID为主键

Student=Student(eduID,grade,class,name,age,gender),其中学号eduID为主键

那么eduID就是Person的"参照Student的外码"

Person就是"eduID外码依赖的参照关系"

Student就是"eduID外码的被参照关系"

模式图:

用模式图(schema diagram)表示含有主码和外码依赖的数据库模式

这个玩意用软件画不用手画

关系代数

image-20220914113622895
Operation 中文 符号 \(\LaTeX\)
Projection 投影 Π \Pi
Selection 选择 σ \sigma
Renaming 重命名 ρ \rho
Aggregate Function 聚合函数 G \mathcal{G}
Union \cap
Intersection \cup
Natural Join 自然连接 \bowtie or \Join
Left Outer Join 左外连接 ... 这几个直接复制吧
Right Outer Join 右外连接
Full Outer Join 全外连接
Cartesian product 笛卡尔乘积 × \times
Divide ÷ \div
Assignment 赋值 \leftarrow
And 条件并列 \land or \vee
Negation ¬ \neg
Exist 存在 \exists
For All 对所有 \forall
下标文字 σusername _{\text{}}
粗体文字 Gcount(*) \textbf{}
长长长长括号 (((( \big( \Big( \bigg( \Bigg(
比较 >≥<≤≠ \gt \ge \lt \le \ne

递归定义

假设\(E_1,E_2\)都是关系代数表达式,显然该表达式的返回值是一个关系子集

那么关系代数表达式的表达式也是关系代数表达式

相当于整数的加减乘除等运算的结果还是整数,运算的结果还能参与运算

mysql建表代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
create table if not EXISTS instructor(
ID int,
name varchar(255),
dept_name varchar(255),
salary int,
primary key(id)
);

insert ignore into instructor(ID,name,dept_name,salary) values (10101,"Srinivasan","Comp.Sci.",65000);
insert ignore into instructor(ID,name,dept_name,salary) values (12121,"Wu","Finance",90000);
insert ignore into instructor(ID,name,dept_name,salary) values (15151,"Mozart","Music",40000);
insert ignore into instructor(ID,name,dept_name,salary) values (22222,"Einstein","Physics",95000);
insert ignore into instructor(ID,name,dept_name,salary) values (32343,"El Said","History",60000);
insert ignore into instructor(ID,name,dept_name,salary) values (33456,"Gold","Physics",87000);
insert ignore into instructor(ID,name,dept_name,salary) values (45565,"Katz","Comp.Sci.",75000);
insert ignore into instructor(ID,name,dept_name,salary) values (58583,"Caliiferi","History",62000);
insert ignore into instructor(ID,name,dept_name,salary) values (76543,"Singh","Finance",80000);
insert ignore into instructor(ID,name,dept_name,salary) values (76766,"Crick","Biology",72000);
insert ignore into instructor(ID,name,dept_name,salary) values (83821,"Brandt","Comp.Sci.",92000);
insert ignore into instructor(ID,name,dept_name,salary) values (98345,"Kim","Elec.Eng.",80000);


create table if not exists teaches(
ID int,
course_id varchar(255),
sec_id int,
semester varchar(255),
year int,
primary key(course_id)
);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (10101,"CS-101",1,"Fall",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (10101,"CS-315",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (10101,"CS-347",1,"Fall",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (12121,"FIN-201",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (15151,"MU-199",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (22222,"PHY-101",1,"Fall",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (32343,"HIS-351",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (45565,"CS-101",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (45565,"CS-319",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (76766,"BIO-101",1,"Summer",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (76766,"BIO-301",1,"Summer",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (83821,"CS-190",1,"Spring",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (83821,"CS-190",2,"Spring",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (83821,"CS-319",2,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (98345,"EE-181",1,"Spring",2017);

后来这两个表有变动,但是变动不大

选择\(\sigma\)

\[ \sigma_{谓词条件}(关系表) \]

从目标关系表中选出满足条件的元组行

比如对于以下troop表

1
2
3
4
5
6
7
+----+--------+------+
| id | name | age |
+----+--------+------+
| 1 | Tom | 20 |
| 2 | Vader | 40 |
| 3 | Anakin | 20 |
+----+--------+------+

\[ \sigma_{age>20}(troop) \]

就相当于

1
2
3
4
5
6
7
mysql> select * from troop where age>20;
+----+-------+------+
| id | name | age |
+----+-------+------+
| 2 | Vader | 40 |
+----+-------+------+
1 row in set (0.00 sec)

\[ select * from\ 关系表\ \color{red}where \color{white}\ 关系谓词\longleftrightarrow\color{red}\sigma_{\color{white}关系谓词}\color{white}(关系表) \]

这里关系谓词就是各种等于不等于,逻辑与或非

SQL中的where才相当于关系代数中的\(\sigma\)

投影\(\Pi\)

还是对于troop表

1
2
3
4
5
6
7
8
9
mysql> select * from troop;
+----+--------+------+
| id | name | age |
+----+--------+------+
| 1 | Tom | 20 |
| 2 | Vader | 40 |
| 3 | Anakin | 20 |
+----+--------+------+
3 rows in set (0.00 sec)

如果只想观察(id,name)属性,忽略age属性 \[ \Pi_{id,name}(troops) \] 就相当于

1
2
3
4
5
6
7
8
9
mysql> select id,name from troop;
+----+--------+
| id | name |
+----+--------+
| 1 | Tom |
| 2 | Vader |
| 3 | Anakin |
+----+--------+
3 rows in set (0.00 sec)

\[ select\ 属性1,属性2{...}\ from (数据表)= \Pi_{属性1,属性2,...}(数据表) \]

组合使用\(\Pi,\sigma\)

现在只关心军队中20岁以上士兵的(id,name),那么

1
2
3
4
5
6
7
mysql> select id,name from troop where age>20;
+----+-------+
| id | name |
+----+-------+
| 2 | Vader |
+----+-------+
1 row in set (0.00 sec)

相当于 \[ \Pi_{id,name}(\sigma_{age>20}(troop)) \] 由此可见,\(\sigma\)运算的结果仍然是一张表,可以在这个子表上再进行关系运算

\(\cup\)

假设军官表长这样

1
2
3
4
5
6
7
8
9
10
11
12
mysql> select * from officer;
+----+-------+------+--------+-----------+-----------+
| id | name | age | gender | rank | position |
+----+-------+------+--------+-----------+-----------+
| 1 | Vader | 50 | 1 | marshal | Army |
| 2 | Lisa | 20 | 0 | colonel | Regiment |
| 3 | Rick | 20 | 1 | major | Battalion |
| 4 | Rex | 25 | 1 | captain | Company |
| 5 | Alpha | 23 | 1 | major | Company |
| 6 | Omega | 23 | 0 | brigadier | Brigade |
+----+-------+------+--------+-----------+-----------+
6 rows in set (0.00 sec)

id,姓名,年龄,性别,均线,职位

现在想查询所有上校团长还有准将旅长的名字

1
2
3
4
5
6
7
8
mysql> select name from officer where rank="colonel" and position="Regiment" union select name from officer where rank="brigadier" and position="Brigade";
+-------+
| name |
+-------+
| Lisa |
| Omega |
+-------+
2 rows in set (0.00 sec)

用关系代数表示就是 \[ \Pi_{name}(\sigma_{rank=colonel\ \land\ position=Regiment}(officer) )\cup\Pi_{name}(\sigma_{rank=brigadier\ \land\ position=Brigade}(officer) ) \]

这是在同一张表上联合查询,并运算也允许在两张表上联合查询,但是要求的是两种关系模型的属性个数,以及每个对等位置的属性的意义都是相同的.

比如

海军军官(id,姓名,年龄,性别,军衔,职位)

陆军军官(id,姓名,年龄,性别,军衔,职位)

这里海军军官和陆军军官两种关系就满足要求

可以联合查询海军军官里面的上校团长和陆军军官里面的上校团长

\(-\)

r,s表示两种关系模型,则r-s表示在r中但是不在s中的关系元组

类似于集合的定义\(A-B=A-AB=A(1-B)=A\overline{B}\)

比如一般情况下,连长要上尉军衔的军官担任,现在要在军官表中查询军衔不是上尉的连长的名字

可以先查出连长名表,然后减去上尉名表

然而mysql不支持except和minus关键字,可以用not in代替

1
2
3
4
5
6
7
8
9
10
11
12
mysql> select name from officer
-> where
-> position="Company" and name not in(
-> select name from officer
-> where rank="captain"
-> );
+-------+
| name |
+-------+
| Alpha |
+-------+
1 row in set (0.00 sec)

用关系代数表示相当于 \[ \Pi_{name}(\sigma_{position = Company}(officer))-\Pi_{name}(\sigma_{rank=captain}(officer)) \] 由于这是在同一张表中查询的,可以不使用差集,都写在谓词里

1
2
3
4
5
6
7
8
mysql> select name from officer where position="Company" and rank !="captain";
+-------+
| name |
+-------+
| Alpha |
+-------+
1 row in set (0.00 sec)

用关系代数表示相当于 \[ \Pi_{name}(\sigma_{position=Company\ \land\ rank=captain}(officer)) \]

卷积\(\times\)

两个关系r,s的笛卡尔乘积\(r\times s\)意思是r和s的每一行都要组成一行填到新表里

假设现在要给所有连长任命连队,符合要求的军官有

1
2
3
4
5
6
7
8
mysql> select * from officer where position="Company";
+----+-------+------+--------+---------+----------+
| id | name | age | gender | rank | position |
+----+-------+------+--------+---------+----------+
| 4 | Rex | 25 | 1 | captain | Company |
| 5 | Alpha | 23 | 1 | major | Company |
+----+-------+------+--------+---------+----------+
2 rows in set (0.00 sec)

所有连队有:

1
2
3
4
5
6
7
8
9
mysql> select * from Companies;
+----+-----------+------------+
| id | name | popularity |
+----+-----------+------------+
| 1 | the Brave | 120 |
| 2 | the Loyal | 130 |
| 3 | the Fast | 110 |
+----+-----------+------------+
3 rows in set (0.00 sec)

那么两个连长与三个连队之间的所有任命情况为

1
2
3
4
5
6
7
8
9
10
11
12
mysql> select * from (select id,name,rank from officer where position="Company")as Officer cross join companies;
+----+-------+---------+----+-----------+------------+
| id | name | rank | id | name | popularity |
+----+-------+---------+----+-----------+------------+
| 4 | Rex | captain | 1 | the Brave | 120 |
| 5 | Alpha | major | 1 | the Brave | 120 |
| 4 | Rex | captain | 2 | the Loyal | 130 |
| 5 | Alpha | major | 2 | the Loyal | 130 |
| 4 | Rex | captain | 3 | the Fast | 110 |
| 5 | Alpha | major | 3 | the Fast | 110 |
+----+-------+---------+----+-----------+------------+
6 rows in set (0.00 sec)

这里(select id,name,rank from officer where position="Company")as Officer是给结果子表起了一个别名(alias),叫做Officer,不起别名会导致mysql报错

写成关系代数就相当于 \[ \Pi_{id,name,rank}(\sigma_{position=Company}(officer))\times\ companies \]

这种式子在计算的时候需要区分优先级,最开始计算的应该是 \[ A=\sigma_{position=Company}(officer) \]

然后计算 \[ B=\Pi_{id,name,rank}(A) \] 最后计算

\[ C=B\times companies \]

更名\(\rho\)

实际上就是mysql中的alias别名 \[ \rho_{alias(attr1,attr2,...)}(Expression) \] 其中attr1可以是原来的属性名,也可以是属性别名

在计算完Expression表达式之后,将返回的结果(一个关系子集)命名为alias

比如查询军官表中所有职位为连长的军官(id,name,rank),将该查询子表起名Companies

1
2
3
4
5
6
7
8
mysql> select id as compid,name,age as comage from officer as Companies where position="Company";
+--------+-------+--------+
| compid | name | comage |
+--------+-------+--------+
| 4 | Rex | 25 |
| 5 | Alpha | 23 |
+--------+-------+--------+
2 rows in set (0.00 sec)

查询officer表中所有职位为连长的军官的(id,name,age),并且将属性改名为(compid,name,comage),将该查询子表改名为Companies

写成关系代数相当于 \[ \rho_{Companies(compid,name,comage)}(\Pi_{id,name,rank}(\sigma_{position=Company}(officer))) \]

查询表中的最大薪水\(\Pi_{salary}(instructor)-\Pi_{instructor.salary}(\sigma_{instructor.salary<d.salary}(instructor\times \rho_d(instructor)))\)

假设有一张薪水表长这样

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> select * from instructor;
+-------+------------+-----------+--------+
| ID | name | dept_name | salary |
+-------+------------+-----------+--------+
| 10101 | Srinivasan | Comp.Sci. | 65000 |
| 12121 | Wu | Finance | 90000 |
| 15151 | Mozart | Music | 40000 |
| 22222 | Einstein | Physics | 95000 |
| 32343 | El Said | History | 60000 |
| 33456 | Gold | Physics | 87000 |
| 45565 | Katz | Comp.Sci. | 75000 |
| 58583 | Caliiferi | History | 62000 |
| 76543 | Singh | Finance | 80000 |
+-------+------------+-----------+--------+
9 rows in set (0.00 sec)

现在要查询表中的最大薪水是多少

一种想法是,将该表与自己做笛卡乘,假设记作\(A\times B\)

那么对于最高的薪水x,也就是不存在比他还高的薪水y

也就是说找不到\(a.x<b.y\)此时\(a.x\)就是最高薪水

"找不到"这个不大好实现,但是"找到\(a.x>b.y\)"容易实现,然后用薪水总集减去这"找到集"就是"找不到集"

书上是这样写的: \[ \Pi_{salary}(instructor)-\Pi_{instructor.salary}(\sigma_{instructor.salary<d.salary}(instructor\times \rho_d(instructor))) \] 又臭又长

咋理解呢?自顶向下来看,

最顶层是两个关系集合的差,左边是instructor中的所有薪水组成的集合,

相当于select salary from instructor

右侧是一大坨运算之后得到的表中instructor.salary组成的集合,下面着重分析一下右边干了啥,从最里层开始分析

自下往上来看

最里层\(\rho_{d}(instructor)\)

将instructor表起了一个别名d,

相当于select * from instructor as d

第二层是\(instructor\times \rho_d(instructor)\)

instructor关系集合自己和自己做了笛卡尔积,

为啥要起别名d然后相乘呢?为了方便后来分别引用两个表的属性.

相当于select * from instructor as instructor cross join instructor as d;

前面这个也必须起一个别名,mysql强制令任何过程子表都得有别名,之后最后打印到终端的结果表可以没有别名

第三层是\(\sigma_{instructor.salary<d.salary}(instructor\times \rho_d(instructor))\)

也就是在第二层的结果表中选择,

相当于select * from instructor as d cross join instructor as instructor where instructor.salary < d.salary;

第四层是\(\Pi_{instructor.salary}(\sigma_{instructor.salary<d.salary}(instructor\times \rho_d(instructor)))\)

在第三层的结果上,只保留\(instructor.salary\)这一个属性

相当于select instructor.salary from instructor as d cross join instructor as instructor where instructor.salary < d.salary;

最高层是\(\Pi_{salary}(instructor)-\Pi_{instructor.salary}(\sigma_{instructor.salary<d.salary}(instructor\times \rho_d(instructor)))\)

对两个次高层的过程子表做差

相当于select salary from instructor where salary not in(select instructor.salary from instructor as d cross join instructor as instructor where instructor.salary < d.salary);

这么老太太的裹脚的一坨,竟然语法没错能查出来,有点儿离谱

1
2
3
4
5
6
7
8
mysql> select salary from instructor where salary not in(select instructor.salary from instructor as d cross join instructor as instructor where instructor.salary < d.salary);
+--------+
| salary |
+--------+
| 95000 |
+--------+
1 row in set (0.00 sec)

\(\cap\)

mysql中没有交运算关键字,可以使用in关键字代替

比如要查询instructor表中物理行业子表和薪水超过90k的人的子表,要求查询所有信息 \[ \sigma_{dept\_name=Physics}(instructor)\cap \sigma_{salary>90000}(instructor) \]

1
2
3
4
5
6
7
mysql> select * from instructor where dept_name="Physics" and ID not in(select ID from instructor where salary>90000);
+-------+------+-----------+--------+
| ID | name | dept_name | salary |
+-------+------+-----------+--------+
| 33456 | Gold | Physics | 87000 |
+-------+------+-----------+--------+
1 row in set (0.00 sec)

由于这是在同一张表中查询,可以将表交关系都放到谓词里 \[ \sigma_{dept\_name=Physics\ \land\ salary>90000}(instructor) \]

自然连接\(\Join\)

卷积运算实现自然连接功能

假设有教讲师列表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
mysql> select * from instructor;
+-------+------------+-----------+--------+
| ID | name | dept_name | salary |
+-------+------------+-----------+--------+
| 10101 | Srinivasan | Comp.Sci. | 65000 |
| 12121 | Wu | Finance | 90000 |
| 15151 | Mozart | Music | 40000 |
| 22222 | Einstein | Physics | 95000 |
| 32343 | El Said | History | 60000 |
| 33456 | Gold | Physics | 87000 |
| 45565 | Katz | Comp.Sci. | 75000 |
| 58583 | Caliiferi | History | 62000 |
| 76543 | Singh | Finance | 80000 |
| 76766 | Crick | Biology | 72000 |
| 83821 | Brandt | Comp.Sci. | 92000 |
| 98345 | Kim | Elec.Eng. | 80000 |
+-------+------------+-----------+--------+
12 rows in set (0.00 sec)

还有课程列表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
mysql> select * from teaches;
+-------+-----------+--------+----------+------+
| ID | course_id | sec_id | semester | year |
+-------+-----------+--------+----------+------+
| 10101 | CS-101 | 1 | Fall | 2017 |
| 10101 | CS-315 | 1 | Spring | 2018 |
| 10101 | CS-347 | 1 | Fall | 2017 |
| 12121 | FIN-201 | 1 | Spring | 2018 |
| 15151 | MU-199 | 1 | Spring | 2018 |
| 22222 | PHY-101 | 1 | Fall | 2017 |
| 32343 | HIS-351 | 1 | Spring | 2018 |
| 45565 | CS-319 | 1 | Spring | 2018 |
| 76766 | BIO-101 | 1 | Summer | 2017 |
| 76766 | BIO-301 | 1 | Summer | 2018 |
| 83821 | CS-190 | 1 | Spring | 2017 |
| 98345 | EE-181 | 1 | Spring | 2017 |
+-------+-----------+--------+----------+------+
12 rows in set (0.00 sec)

teaches表中的ID是指任课教师编号,course_id才是这门课的编号

比如CS-101可能就是计导

现在要查询Srinivasan老师教哪些课

显然可以使用笛卡尔积然后约束teaches.ID=instructor.ID

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> select inst.name,teac.course_id
-> from
-> (select ID,name from instructor where name="Srinivasan") as inst
-> cross join
-> (select ID,course_id from teaches)as teac
-> where
-> inst.ID=teac.ID;
+------------+-----------+
| name | course_id |
+------------+-----------+
| Srinivasan | CS-101 |
| Srinivasan | CS-315 |
| Srinivasan | CS-347 |
+------------+-----------+
3 rows in set (0.03 sec)

写成关系代数的形式 $$ {inst.name,teac.course_id}( {inst.ID=teac.ID}(

    \rho_{inst}(\Pi_{name,ID}(instructor))
        \times
    \rho_{teac}(\Pi_{ID,course\_id}(teaches))

)

) $$

显然这样会造成很多不必要的计算,应该使用自然连接简化运算

实际上的自然连接

使用卷积的时候,每个讲师会和每个课程组合一下,但是显然教物理的教不了金融

并且teaches表中的ID列已经给出了这门课又哪个老师教,使用卷积组合时除了老师ID=课老师ID这种组合是有效的,其他组合都是寂寞

应该只保留老师ID=课老师ID的组合,这就是自然连接了

1
2
3
4
5
6
7
8
9
mysql> select name,course_id from (instructor natural join teaches) where instructor.name="Srinivasan";
+------------+-----------+
| name | course_id |
+------------+-----------+
| Srinivasan | CS-101 |
| Srinivasan | CS-315 |
| Srinivasan | CS-347 |
+------------+-----------+
3 rows in set (0.00 sec)

写成关系代数就是 \[ \Pi_{name,course\_id}(\sigma_{instructor.name=Srinivasan}(instructor\Join teaches)) \]

自然连接和卷积的关系

设r,s是两个关系,\(r\Join s\)表示两个关系的自然连接,R,S分别表示两个关系的属性集合

总结一下自然连接应该怎么用低级的关系代数式表示

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
mysql> select * from (instructor natural join teaches);
+-------+------------+-----------+--------+-----------+--------+----------+------+
| ID | name | dept_name | salary | course_id | sec_id | semester | year |
+-------+------------+-----------+--------+-----------+--------+----------+------+
| 10101 | Srinivasan | Comp.Sci. | 65000 | CS-101 | 1 | Fall | 2017 |
| 10101 | Srinivasan | Comp.Sci. | 65000 | CS-315 | 1 | Spring | 2018 |
| 10101 | Srinivasan | Comp.Sci. | 65000 | CS-347 | 1 | Fall | 2017 |
| 12121 | Wu | Finance | 90000 | FIN-201 | 1 | Spring | 2018 |
| 15151 | Mozart | Music | 40000 | MU-199 | 1 | Spring | 2018 |
| 22222 | Einstein | Physics | 95000 | PHY-101 | 1 | Fall | 2017 |
| 32343 | El Said | History | 60000 | HIS-351 | 1 | Spring | 2018 |
| 45565 | Katz | Comp.Sci. | 75000 | CS-319 | 1 | Spring | 2018 |
| 76766 | Crick | Biology | 72000 | BIO-101 | 1 | Summer | 2017 |
| 76766 | Crick | Biology | 72000 | BIO-301 | 1 | Summer | 2018 |
| 83821 | Brandt | Comp.Sci. | 92000 | CS-190 | 1 | Spring | 2017 |
| 98345 | Kim | Elec.Eng. | 80000 | EE-181 | 1 | Spring | 2017 |
+-------+------------+-----------+--------+-----------+--------+----------+------+
12 rows in set (0.00 sec)

mysql会自动搜索两个表中的相同属性,然后进行等值连接

比如有两张表\(S(A,B,C),T(B,C,D)\)

此时\(S\Join T=ST(A,B,C,D)\),mysql会考虑B,C都相同的元组合并成同一个元组,

如果\(\exist s(a,b,c,d)\in A,\forall t_i(b_i,c_i,d_i)\in T,!(b=b_i \land c=c_i)\)则该S表中的s元组会被抛弃

同理适用于t

说人话就是S,T表保留公共部分完全相同的表项进行合并

"公共部分"是指所有相同的列属性

那么有 \[ r\Join s=\Pi_{R\cup S}(\sigma_{r.a_1=s.a_1,r.a_2=s.a_2,...,r.a_n=s.a_n}(r\times s)) \]

其中\(a_i\in R\cap S\)表示r和s共有的属性

怎么理解这个公式呢?还是从下往上看这个公式

最里层\(r\times s\),笛卡尔积,r到s所有行的组合都有了

第二层\(\sigma_{r.a_1=s.a_1,r.a_2=s.a_2,...,r.a_n=s.a_n}(r\times s)\),在第一层的计算结果表中,取出\(r.a_1=s.a_1,r.a_2=s.a_2...\)的行组成新的子表

第三层\(\Pi_{R\cup S}(\sigma_{r.a_1=s.a_1,r.a_2=s.a_2,...,r.a_n=s.a_n}(r\times s))\)保留R和S所有属性

赋值\(\leftarrow\)

对于刚才的自然连接,可以使用赋值运算保存临时变量,分布计算 \[ \Pi_{name,course\_id}(\sigma_{instructor.name=Srinivasan}(instructor\Join teaches)) \]

\[ \begin{aligned} temp1&\leftarrow instructor\Join teaches\\ temp2&\leftarrow \sigma_{instructor.name=Srinivasan}(temp1)\\ result&\leftarrow \Pi_{name,course\_id}(temp2) \end{aligned} \]

但是在mysql上没有找到这种用法

外连接

自然连接时,如果s表中的某一项在r表中找不到共有属性完全相同的项则会被抛弃,而外连接将会保留s表中的这一项,并给他创建一个全空的r表对应项

举个例子说

教师表s中的教师可以分成教学的和不教学的两类

课程表t中的课程可以分成自习课和教学课两类

\(s\Join t\)只会保留教课老师和教学课,还得是这个老师和教的课都存在,显然不教学的老师不会出现在结果里,自习课也不会出现在结果里

比如现有课程表和教师表

左课程右教师

SELF-0这课对应教师ID=0,不存在这个老师,意味着自习课,显然在自然连接的时候会被扬了

ID=33456这个Gold老师,在teaches课程表上没有任课记录,自然连接的时候也会被扬了

现在令l表示左表,也就是teaches;令r表示右表,也就是instructor

对于\(teaches \ opt \ instructor\)

如果是opt全外连接则谁也不会被扬了,

如果opt是左外连接则SELF-0留下,33456号教师扬了

如果opt是右外连接则SELF-0扬了,33456号教师留下

如果opt是全外连接则都留下

这三个在\(\LaTeX\)中没有找到符号

左外连接\(⟕\)

谁在左边谁全留下,右边的该扬就得扬

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
mysql> select * from teaches as l LEFT OUTER JOIN instructor as r ON l.ID=r.ID;
+-------+-----------+--------+----------+------+-------+------------+-----------+--------+
| ID | course_id | sec_id | semester | year | ID | name | dept_name | salary |
+-------+-----------+--------+----------+------+-------+------------+-----------+--------+
| 10101 | CS-101 | 1 | Fall | 2017 | 10101 | Srinivasan | Comp.Sci. | 65000 |
| 10101 | CS-315 | 1 | Spring | 2018 | 10101 | Srinivasan | Comp.Sci. | 65000 |
| 10101 | CS-347 | 1 | Fall | 2017 | 10101 | Srinivasan | Comp.Sci. | 65000 |
| 12121 | FIN-201 | 1 | Spring | 2018 | 12121 | Wu | Finance | 90000 |
| 15151 | MU-199 | 1 | Spring | 2018 | 15151 | Mozart | Music | 40000 |
| 22222 | PHY-101 | 1 | Fall | 2017 | 22222 | Einstein | Physics | 95000 |
| 32343 | HIS-351 | 1 | Spring | 2018 | 32343 | El Said | History | 60000 |
| 45565 | CS-319 | 1 | Spring | 2018 | 45565 | Katz | Comp.Sci. | 75000 |
| 76766 | BIO-101 | 1 | Summer | 2017 | 76766 | Crick | Biology | 72000 |
| 76766 | BIO-301 | 1 | Summer | 2018 | 76766 | Crick | Biology | 72000 |
| 83821 | CS-190 | 1 | Spring | 2017 | 83821 | Brandt | Comp.Sci. | 92000 |
| 98345 | EE-181 | 1 | Spring | 2017 | 98345 | Kim | Elec.Eng. | 80000 |
| 0 | SELF-0 | 1 | Spring | 2018 | NULL | NULL | NULL | NULL |
+-------+-----------+--------+----------+------+-------+------------+-----------+--------+
13 rows in set (0.00 sec)

33456号老师已经被扬了

写成关系代数就是 \[ \sigma_{l.ID=r.ID}(\rho_l(teaches)⟕\rho_r(instructor)) \]

右外连接\(⟖\)

谁在右边谁全留下,左边该扬就得扬

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
mysql> select l.course_id,r.name from teaches as l RIGHT OUTER JOIN instructor as r ON l.ID=r.ID;
+-----------+------------+
| course_id | name |
+-----------+------------+
| CS-101 | Srinivasan |
| CS-315 | Srinivasan |
| CS-347 | Srinivasan |
| FIN-201 | Wu |
| MU-199 | Mozart |
| PHY-101 | Einstein |
| HIS-351 | El Said |
| CS-319 | Katz |
| BIO-101 | Crick |
| BIO-301 | Crick |
| CS-190 | Brandt |
| EE-181 | Kim |
| NULL | Gold |
| NULL | Caliiferi |
| NULL | Singh |
+-----------+------------+
15 rows in set (0.00 sec)

SELF-0号自习课已经被扬了

三个不任课的老师留下

写成关系代数的形式就是 \[ \Pi_{l.course\_id,r.name}(\sigma_{l.ID=r.ID}(\rho_l(teaches)⟖\rho_r(instructor))) \]

全外连接\(⟗\)

mysql中没有全外连接

但是可以使用左外连接联合查询右外连接

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
mysql> select l.course_id,r.name
-> from
-> teaches as l
-> LEFT OUTER JOIN
-> instructor as r
-> ON l.ID=r.ID
-> union
-> select l.course_id,r.name
-> from
-> teaches as l
-> RIGHT OUTER JOIN
-> instructor as r
-> ON l.ID=r.ID
-> ;
+-----------+------------+
| course_id | name |
+-----------+------------+
| CS-101 | Srinivasan |
| CS-315 | Srinivasan |
| CS-347 | Srinivasan |
| FIN-201 | Wu |
| MU-199 | Mozart |
| PHY-101 | Einstein |
| HIS-351 | El Said |
| CS-319 | Katz |
| BIO-101 | Crick |
| BIO-301 | Crick |
| CS-190 | Brandt |
| EE-181 | Kim |
| SELF-0 | NULL |
| NULL | Gold |
| NULL | Caliiferi |
| NULL | Singh |
+-----------+------------+
16 rows in set (0.00 sec)

\[ \Pi_{l.course\_id,r.name}(\sigma_{l.ID=r.ID}(\rho_l(teaches)⟗\rho_r(instructor))) \]

\(\div\)

设关系r的属性\(R=(A_1,A_2,...,A_m,B_1,B_2,...,B_n)\)

设关系s的属性\(S=(B_1,B_2,...,B_n)\)

所谓关系就是表

所谓属性就是表的列

\[ r\div s=(A_1,A_2,...,A_m) \]

严谨定义为 \[ r\div s=\{t|t\in \Pi_{R-S}(r)\ \land\forall u\in s\rightarrow tu\in r\} \]

t元组的属性是R-S剩下的属性,

并且t和s中任何元组的积都属于r.

则t就是\(r\div s\)集合的元素

练习

练习1卷出最值

查出account表中最大(小)存款(balance)数

image-20220915002527616 \[ \begin{aligned} 最大存款数&=\Pi_{balance}(account)-\Pi_{acnt.balance}(\sigma_{acnt.balance<account.balance}(\rho_{acnt}(account)\times account))\\ 最小存款数&=\Pi_{balance}(account)-\Pi_{account.balance}(\sigma_{acnt.balance<account.balance}(\rho_{acnt}(account)\times account)) \end{aligned} \]\(\sigma_{acnt.balance<account.balance}(\rho_{acnt}(account)\times account)\)这个选择蹉跎一下,就包含了所有前存款小于后存款的条目,

这些条目中,每一条都是acnt.balance作为前项也就是小的一项,account.balance作为后项也就是大的一项,

那么acnt.balance不含最大项,account.balance不含最小项

练习2连卷转化

查出至少有一个账户上有多于700块钱存款的用户

image-20220915000754242

用卷积做的话 \[ \Pi_{customer\_name}(\sigma_{account.number=depositor.number}(\sigma_{account.balance>700}(account)\times depositor)) \]

用自然连接做的话 \[ \Pi_{customer\_name}(\sigma_{account.balance>700}(account\Join depositor)) \]

练习3卷出多个账户

查询至少有两个账户的用户

image-20220915005408814

\[ \Pi_{customer\_name}(\sigma_{dpstr.customer\_name=depositer.custormer\_name\ \land \ dpstr.account\_number\ne depositor.account\_number}(\rho_{dpstr}(depositer)\times depositer)) \]

练习4先连后卷

查出至少在两个分行(branch)都有账户(account)的用户(customer)

image-20220915000336679

需要注意的是,一个人可以在同一个支行有多个账号,只考虑查询账户数多于两个的用户不成立

显然先把两个表以account_number为公共属性,自然连接起来,对这个自然连接表进行卷积,就能蹉跎出同一个人的不同支行账户组合了 \[ \Pi_{customer\_name}(\sigma_{l.custormer\_name=r.custormer\_name\land l.branch\_name\ne r.branch\_name}(\rho_l(depositor \Join account)\times \rho_r(depositor\Join account))) \]

练习5自然连接

查找在布鲁克林市有账户的用户

image-20220915002042041

\[ \Pi_{customer\_name}(\sigma_{branch\_city=Brooklyn}(account\Join branch)) \]

只要在是布鲁克林市有一个账户就可以了

练习6除法关系

查找在布鲁克林市任意支行都有账户的用户

image-20220915002042041

\[ \Pi_{customer\_name,branch\_name}(\sigma_{branch\_city=Brooklyn}(account\Join branch))\div \Pi_{branch\_name}(\sigma_{branch\_city=Brooklyn}(branch)) \]

关系代数表达式化简