关系代数与SQL语句

关系模型术语

关系模型术语	对应到表
关系	表
元组	行
属性	列
关系实例	表中的特定行
属性的域	该属性列中所有可能的取值

关系模式:定义类型

关系:关系模式的实例

Employer(ID,salary,gender,age);这就是关系模式

Tom(00001,5k,male,20);这就是关系

相当于

class Employer{//Employer关系模式
	int ID;
    int salary;
    bool gender;
    int age;
    Employer(const int &i,const int &s,const bool &g,const int &a){
    	ID=i;
    	salary=s;
    	gender=g;
    	age=a;
    }
};
Employer Tom(1,5000,1,20);//关系,关系实例

码:区分不同关系的标志

比如人的身份证ID

超码:关系模式中,能够唯一标识一个关系的属性或者属性集合

比如Student(ID,eduID,class,grade,name)

学生(身份证号,学号,年级,班级,姓名)

这里ID和eduID都可以作为超码,

如果认为每个班里都没有重名的人,则(年级,班级,姓名)也可以作为超码

显然超码的任何一个超集也是超码

候选码:最小超码

比如Student(ID,eduID,class,grade,name)

由于ID是超码,因此ID的任何一个超集,比如(ID,grade)也是超码

ID可以作为候选码,因为ID已经是最小超码,但是(ID,grade)不是候选码,因为去了grade,只剩下ID仍然是超码

主码(primary key):实际中,数据库设计者选中作为区分不同关系的候选码

mysql> CREATE TABLE troop(
 -> id int,
 -> name varchar(255),
 -> age int,
 -> PRIMARY KEY(id)
 -> );
Query OK, 0 rows affected (0.01 sec)

将id作为主码

外码:

关系模式r1的属性中可能包含其他关系模式r2的主码.

则该属性在r1上称作”参照r2的外码”,

关系模式r1称为”外码依赖的参照关系”

关系模式r2称为”外码的被参照关系”

比如

Person=Person(ID,eduID,name,age,gender),其中身份证号ID为主键

Student=Student(eduID,grade,class,name,age,gender),其中学号eduID为主键

那么eduID就是Person的”参照Student的外码”

Person就是”eduID外码依赖的参照关系”

Student就是”eduID外码的被参照关系”

模式图:

用模式图(schema diagram)表示含有主码和外码依赖的数据库模式

这个玩意用软件画不用手画

关系代数

Operation	中文	符号	$\LaTeX$
Projection	投影	Π	`\Pi`
Selection	选择	σ	`\sigma`
Renaming	重命名	ρ	`\rho`
Aggregate Function	聚合函数	G	`\mathcal{G}`
Union	交	∩	`\cap`
Intersection	补	∪	`\cup`
Natural Join	自然连接	⋈	`\bowtie` or `\Join`
Left Outer Join	左外连接	⟕	… 这几个直接复制吧
Right Outer Join	右外连接	⟖
Full Outer Join	全外连接	⟗
Cartesian product	笛卡尔乘积	×	`\times`
Divide	除	÷	`\div`
Assignment	赋值	←	`\leftarrow`
And	条件并列	∧	`\land` or `\vee`
Negation	非	¬	`\neg`
Exist	存在	∃	`\exists`
For All	对所有	∀	`\forall`
	下标文字	σusername	`_{\text{}}`
	粗体文字	Gcount(*)	`\textbf{}`
	长长长长括号	((((	`\big( \Big( \bigg( \Bigg(`
	比较	>≥<≤≠	`\gt \ge \lt \le \ne`

递归定义

假设$E_1,E_2$都是关系代数表达式,显然该表达式的返回值是一个关系子集

那么关系代数表达式的表达式也是关系代数表达式

相当于整数的加减乘除等运算的结果还是整数,运算的结果还能参与运算

mysql建表代码

create table if not EXISTS instructor(
    ID int,
    name varchar(255),
    dept_name varchar(255),
    salary int,
    primary key(id)
);

insert ignore into instructor(ID,name,dept_name,salary) values (10101,"Srinivasan","Comp.Sci.",65000);
insert ignore into instructor(ID,name,dept_name,salary) values (12121,"Wu","Finance",90000);
insert ignore into instructor(ID,name,dept_name,salary) values (15151,"Mozart","Music",40000);
insert ignore into instructor(ID,name,dept_name,salary) values (22222,"Einstein","Physics",95000);
insert ignore into instructor(ID,name,dept_name,salary) values (32343,"El Said","History",60000);
insert ignore into instructor(ID,name,dept_name,salary) values (33456,"Gold","Physics",87000);
insert ignore into instructor(ID,name,dept_name,salary) values (45565,"Katz","Comp.Sci.",75000);
insert ignore into instructor(ID,name,dept_name,salary) values (58583,"Caliiferi","History",62000);
insert ignore into instructor(ID,name,dept_name,salary) values (76543,"Singh","Finance",80000);
insert ignore into instructor(ID,name,dept_name,salary) values (76766,"Crick","Biology",72000);
insert ignore into instructor(ID,name,dept_name,salary) values (83821,"Brandt","Comp.Sci.",92000);
insert ignore into instructor(ID,name,dept_name,salary) values (98345,"Kim","Elec.Eng.",80000);


create table if not exists teaches(
  ID int,
  course_id varchar(255),
  sec_id int,
  semester varchar(255),
  year int,
  primary key(course_id)
);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (10101,"CS-101",1,"Fall",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (10101,"CS-315",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (10101,"CS-347",1,"Fall",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (12121,"FIN-201",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (15151,"MU-199",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (22222,"PHY-101",1,"Fall",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (32343,"HIS-351",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (45565,"CS-101",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (45565,"CS-319",1,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (76766,"BIO-101",1,"Summer",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (76766,"BIO-301",1,"Summer",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (83821,"CS-190",1,"Spring",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (83821,"CS-190",2,"Spring",2017);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (83821,"CS-319",2,"Spring",2018);
insert ignore into teaches(ID,course_id,sec_id,semester,year) values (98345,"EE-181",1,"Spring",2017);

后来这两个表有变动,但是变动不大

选择$\sigma$

$$
\sigma_{谓词条件}(关系表)
$$

从目标关系表中选出满足条件的元组行

比如对于以下troop表

+----+--------+------+
| id | name   | age  |
+----+--------+------+
|  1 | Tom    |   20 |
|  2 | Vader  |   40 |
|  3 | Anakin |   20 |
+----+--------+------+

$$
\sigma_{age>20}(troop)
$$

就相当于

mysql> select * from troop where age>20;
+----+-------+------+
| id | name  | age  |
+----+-------+------+
|  2 | Vader |   40 |
+----+-------+------+
1 row in set (0.00 sec)

$$
select * from\ 关系表\ \color{red}where \color{white}\ 关系谓词\longleftrightarrow\color{red}\sigma_{\color{white}关系谓词}\color{white}(关系表)
$$

这里关系谓词就是各种等于不等于,逻辑与或非

SQL中的where才相当于关系代数中的$\sigma$

投影$\Pi$

还是对于troop表

mysql> select * from troop;
+----+--------+------+
| id | name   | age  |
+----+--------+------+
|  1 | Tom    |   20 |
|  2 | Vader  |   40 |
|  3 | Anakin |   20 |
+----+--------+------+
3 rows in set (0.00 sec)

如果只想观察(id,name)属性,忽略age属性
$$
\Pi_{id,name}(troops)
$$
就相当于

mysql> select id,name from troop;
+----+--------+
| id | name   |
+----+--------+
|  1 | Tom    |
|  2 | Vader  |
|  3 | Anakin |
+----+--------+
3 rows in set (0.00 sec)

$$
select\ 属性1,属性2{…}\ from (数据表)= \Pi_{属性1,属性2,…}(数据表)
$$

组合使用$\Pi,\sigma$

现在只关心军队中20岁以上士兵的(id,name),那么

mysql> select id,name from troop where age>20;
+----+-------+
| id | name  |
+----+-------+
|  2 | Vader |
+----+-------+
1 row in set (0.00 sec)

相当于
$$
\Pi_{id,name}(\sigma_{age>20}(troop))
$$
由此可见,$\sigma$运算的结果仍然是一张表,可以在这个子表上再进行关系运算

并$\cup$

假设军官表长这样

mysql> select * from officer;
+----+-------+------+--------+-----------+-----------+
| id | name  | age  | gender | rank      | position  |
+----+-------+------+--------+-----------+-----------+
|  1 | Vader |   50 |      1 | marshal   | Army      |
|  2 | Lisa  |   20 |      0 | colonel   | Regiment  |
|  3 | Rick  |   20 |      1 | major     | Battalion |
|  4 | Rex   |   25 |      1 | captain   | Company   |
|  5 | Alpha |   23 |      1 | major     | Company   |
|  6 | Omega |   23 |      0 | brigadier | Brigade   |
+----+-------+------+--------+-----------+-----------+
6 rows in set (0.00 sec)

id,姓名,年龄,性别,均线,职位

现在想查询所有上校团长还有准将旅长的名字

mysql> select name from officer where rank="colonel" and position="Regiment" union select name from officer where rank="brigadier" and position="Brigade";
+-------+
| name  |
+-------+
| Lisa  |
| Omega |
+-------+
2 rows in set (0.00 sec)

用关系代数表示就是
$$
\Pi_{name}(\sigma_{rank=colonel\ \land\ position=Regiment}(officer) )\cup\Pi_{name}(\sigma_{rank=brigadier\ \land\ position=Brigade}(officer) )
$$

这是在同一张表上联合查询,并运算也允许在两张表上联合查询,但是要求的是两种关系模型的属性个数,以及每个对等位置的属性的意义都是相同的.

比如

海军军官(id,姓名,年龄,性别,军衔,职位)

陆军军官(id,姓名,年龄,性别,军衔,职位)

这里海军军官和陆军军官两种关系就满足要求

可以联合查询海军军官里面的上校团长和陆军军官里面的上校团长

差$-$

r,s表示两种关系模型,则r-s表示在r中但是不在s中的关系元组

类似于集合的定义$A-B=A-AB=A(1-B)=A\overline{B}$

比如一般情况下,连长要上尉军衔的军官担任,现在要在军官表中查询军衔不是上尉的连长的名字

可以先查出连长名表,然后减去上尉名表

然而mysql不支持except和minus关键字,可以用not in代替

mysql> select name from officer
    -> where
    -> position="Company" and name not in(
    ->     select name from officer
    ->     where rank="captain"
    -> );
+-------+
| name  |
+-------+
| Alpha |
+-------+
1 row in set (0.00 sec)

用关系代数表示相当于
$$
\Pi_{name}(\sigma_{position = Company}(officer))-\Pi_{name}(\sigma_{rank=captain}(officer))
$$
由于这是在同一张表中查询的,可以不使用差集,都写在谓词里

mysql> select name from officer where position="Company" and rank !="captain";
+-------+
| name  |
+-------+
| Alpha |
+-------+
1 row in set (0.00 sec)

用关系代数表示相当于
$$
\Pi_{name}(\sigma_{position=Company\ \land\ rank=captain}(officer))
$$

卷积$\times$

两个关系r,s的笛卡尔乘积$r\times s$意思是r和s的每一行都要组成一行填到新表里

假设现在要给所有连长任命连队,符合要求的军官有

mysql> select * from officer where position="Company";
+----+-------+------+--------+---------+----------+
| id | name  | age  | gender | rank    | position |
+----+-------+------+--------+---------+----------+
|  4 | Rex   |   25 |      1 | captain | Company  |
|  5 | Alpha |   23 |      1 | major   | Company  |
+----+-------+------+--------+---------+----------+
2 rows in set (0.00 sec)

所有连队有:

mysql> select * from Companies;
+----+-----------+------------+
| id | name      | popularity |
+----+-----------+------------+
|  1 | the Brave |        120 |
|  2 | the Loyal |        130 |
|  3 | the Fast  |        110 |
+----+-----------+------------+
3 rows in set (0.00 sec)

那么两个连长与三个连队之间的所有任命情况为

mysql> select * from (select id,name,rank from officer where position="Company")as Officer cross join companies;
+----+-------+---------+----+-----------+------------+
| id | name  | rank    | id | name      | popularity |
+----+-------+---------+----+-----------+------------+
|  4 | Rex   | captain |  1 | the Brave |        120 |
|  5 | Alpha | major   |  1 | the Brave |        120 |
|  4 | Rex   | captain |  2 | the Loyal |        130 |
|  5 | Alpha | major   |  2 | the Loyal |        130 |
|  4 | Rex   | captain |  3 | the Fast  |        110 |
|  5 | Alpha | major   |  3 | the Fast  |        110 |
+----+-------+---------+----+-----------+------------+
6 rows in set (0.00 sec)

这里(select id,name,rank from officer where position="Company")as Officer是给结果子表起了一个别名(alias),叫做Officer,不起别名会导致mysql报错

写成关系代数就相当于
$$
\Pi_{id,name,rank}(\sigma_{position=Company}(officer))\times\ companies
$$

这种式子在计算的时候需要区分优先级,最开始计算的应该是
$$
A=\sigma_{position=Company}(officer)
$$

然后计算
$$
B=\Pi_{id,name,rank}(A)
$$
最后计算

$$
C=B\times companies
$$

更名$\rho$

实际上就是mysql中的alias别名
$$
\rho_{alias(attr1,attr2,…)}(Expression)
$$
其中attr1可以是原来的属性名,也可以是属性别名

在计算完Expression表达式之后,将返回的结果(一个关系子集)命名为alias

比如查询军官表中所有职位为连长的军官(id,name,rank),将该查询子表起名Companies

mysql> select id as compid,name,age as comage from officer as Companies where position="Company";
+--------+-------+--------+
| compid | name  | comage |
+--------+-------+--------+
|      4 | Rex   |     25 |
|      5 | Alpha |     23 |
+--------+-------+--------+
2 rows in set (0.00 sec)

查询officer表中所有职位为连长的军官的(id,name,age),并且将属性改名为(compid,name,comage),将该查询子表改名为Companies

写成关系代数相当于
$$
\rho_{Companies(compid,name,comage)}(\Pi_{id,name,rank}(\sigma_{position=Company}(officer)))
$$

查询表中的最大薪水$\Pi_{salary}(instructor)-\Pi_{instructor.salary}(\sigma_{instructor.salary<d.salary}(instructor\times \rho_d(instructor)))$

假设有一张薪水表长这样

mysql> select * from instructor;
+-------+------------+-----------+--------+
| ID    | name       | dept_name | salary |
+-------+------------+-----------+--------+
| 10101 | Srinivasan | Comp.Sci. |  65000 |
| 12121 | Wu         | Finance   |  90000 |
| 15151 | Mozart     | Music     |  40000 |
| 22222 | Einstein   | Physics   |  95000 |
| 32343 | El Said    | History   |  60000 |
| 33456 | Gold       | Physics   |  87000 |
| 45565 | Katz       | Comp.Sci. |  75000 |
| 58583 | Caliiferi  | History   |  62000 |
| 76543 | Singh      | Finance   |  80000 |
+-------+------------+-----------+--------+
9 rows in set (0.00 sec)

现在要查询表中的最大薪水是多少

一种想法是,将该表与自己做笛卡乘,假设记作$A\times B$

那么对于最高的薪水x,也就是不存在比他还高的薪水y

也就是说找不到$a.x<b.y$此时$a.x$就是最高薪水

“找不到”这个不大好实现,但是”找到$a.x>b.y$”容易实现,然后用薪水总集减去这”找到集”就是”找不到集”

书上是这样写的:
$$
\Pi_{salary}(instructor)-\Pi_{instructor.salary}(\sigma_{instructor.salary<d.salary}(instructor\times \rho_d(instructor)))
$$
~~又臭又长~~

咋理解呢?自顶向下来看,

最顶层是两个关系集合的差,左边是instructor中的所有薪水组成的集合,

相当于select salary from instructor

右侧是一大坨运算之后得到的表中instructor.salary组成的集合,下面着重分析一下右边干了啥,从最里层开始分析

自下往上来看

最里层$\rho_{d}(instructor)$

将instructor表起了一个别名d,

相当于select * from instructor as d

第二层是$instructor\times \rho_d(instructor)$

instructor关系集合自己和自己做了笛卡尔积,

为啥要起别名d然后相乘呢?为了方便后来分别引用两个表的属性.

相当于select * from instructor as instructor cross join instructor as d;

前面这个也必须起一个别名,mysql强制令任何过程子表都得有别名,之后最后打印到终端的结果表可以没有别名

第三层是$\sigma_{instructor.salary<d.salary}(instructor\times \rho_d(instructor))$

也就是在第二层的结果表中选择,

相当于select * from instructor as d cross join instructor as instructor where instructor.salary < d.salary;

第四层是$\Pi_{instructor.salary}(\sigma_{instructor.salary<d.salary}(instructor\times \rho_d(instructor)))$

在第三层的结果上,只保留$instructor.salary$这一个属性

相当于 select instructor.salary from instructor as d cross join instructor as instructor where instructor.salary < d.salary;

最高层是$\Pi_{salary}(instructor)-\Pi_{instructor.salary}(\sigma_{instructor.salary<d.salary}(instructor\times \rho_d(instructor)))$

对两个次高层的过程子表做差

相当于 select salary from instructor where salary not in(select instructor.salary from instructor as d cross join instructor as instructor where instructor.salary < d.salary);

这么老太太的裹脚的一坨,竟然语法没错能查出来,有点儿离谱

mysql> select salary from instructor where salary not in(select instructor.salary from instructor as d cross join instructor as instructor where instructor.salary < d.salary);
+--------+
| salary |
+--------+
|  95000 |
+--------+
1 row in set (0.00 sec)

交$\cap$

mysql中没有交运算关键字,可以使用in关键字代替

比如要查询instructor表中物理行业子表和薪水超过90k的人的子表,要求查询所有信息
$$
\sigma_{dept_name=Physics}(instructor)\cap \sigma_{salary>90000}(instructor)
$$

mysql> select * from instructor where dept_name="Physics" and ID not in(select ID from instructor where salary>90000);
+-------+------+-----------+--------+
| ID    | name | dept_name | salary |
+-------+------+-----------+--------+
| 33456 | Gold | Physics   |  87000 |
+-------+------+-----------+--------+
1 row in set (0.00 sec)

由于这是在同一张表中查询,可以将表交关系都放到谓词里
$$
\sigma_{dept_name=Physics\ \land\ salary>90000}(instructor)
$$

自然连接$\Join$

卷积运算实现自然连接功能

假设有教讲师列表

mysql> select * from instructor;
+-------+------------+-----------+--------+
| ID    | name       | dept_name | salary |
+-------+------------+-----------+--------+
| 10101 | Srinivasan | Comp.Sci. |  65000 |
| 12121 | Wu         | Finance   |  90000 |
| 15151 | Mozart     | Music     |  40000 |
| 22222 | Einstein   | Physics   |  95000 |
| 32343 | El Said    | History   |  60000 |
| 33456 | Gold       | Physics   |  87000 |
| 45565 | Katz       | Comp.Sci. |  75000 |
| 58583 | Caliiferi  | History   |  62000 |
| 76543 | Singh      | Finance   |  80000 |
| 76766 | Crick      | Biology   |  72000 |
| 83821 | Brandt     | Comp.Sci. |  92000 |
| 98345 | Kim        | Elec.Eng. |  80000 |
+-------+------------+-----------+--------+
12 rows in set (0.00 sec)

还有课程列表

mysql> select * from teaches;
+-------+-----------+--------+----------+------+
| ID    | course_id | sec_id | semester | year |
+-------+-----------+--------+----------+------+
| 10101 | CS-101    |      1 | Fall     | 2017 |
| 10101 | CS-315    |      1 | Spring   | 2018 |
| 10101 | CS-347    |      1 | Fall     | 2017 |
| 12121 | FIN-201   |      1 | Spring   | 2018 |
| 15151 | MU-199    |      1 | Spring   | 2018 |
| 22222 | PHY-101   |      1 | Fall     | 2017 |
| 32343 | HIS-351   |      1 | Spring   | 2018 |
| 45565 | CS-319    |      1 | Spring   | 2018 |
| 76766 | BIO-101   |      1 | Summer   | 2017 |
| 76766 | BIO-301   |      1 | Summer   | 2018 |
| 83821 | CS-190    |      1 | Spring   | 2017 |
| 98345 | EE-181    |      1 | Spring   | 2017 |
+-------+-----------+--------+----------+------+
12 rows in set (0.00 sec)

teaches表中的ID是指任课教师编号,course_id才是这门课的编号

比如CS-101可能就是计导

现在要查询Srinivasan老师教哪些课

显然可以使用笛卡尔积然后约束teaches.ID=instructor.ID

mysql> select inst.name,teac.course_id
    -> from
    ->     (select ID,name from instructor where name="Srinivasan") as inst
    ->         cross join
    ->     (select ID,course_id from teaches)as teac
    -> where
    ->     inst.ID=teac.ID;
+------------+-----------+
| name       | course_id |
+------------+-----------+
| Srinivasan | CS-101    |
| Srinivasan | CS-315    |
| Srinivasan | CS-347    |
+------------+-----------+
3 rows in set (0.03 sec)

写成关系代数的形式
$$
\Pi_{inst.name,teac.course_id}(
\sigma_{inst.ID=teac.ID}(

    \rho_{inst}(\Pi_{name,ID}(instructor))
        \times
    \rho_{teac}(\Pi_{ID,course\_id}(teaches))

)

)
$$

显然这样会造成很多不必要的计算,应该使用自然连接简化运算

实际上的自然连接

使用卷积的时候,每个讲师会和每个课程组合一下,但是显然教物理的教不了金融

并且teaches表中的ID列已经给出了这门课又哪个老师教,使用卷积组合时除了老师ID=课老师ID这种组合是有效的,其他组合都是寂寞

应该只保留老师ID=课老师ID的组合,这就是自然连接了

mysql> select name,course_id from (instructor natural join teaches) where instructor.name="Srinivasan";
+------------+-----------+
| name       | course_id |
+------------+-----------+
| Srinivasan | CS-101    |
| Srinivasan | CS-315    |
| Srinivasan | CS-347    |
+------------+-----------+
3 rows in set (0.00 sec)

写成关系代数就是
$$
\Pi_{name,course_id}(\sigma_{instructor.name=Srinivasan}(instructor\Join teaches))
$$

自然连接和卷积的关系

设r,s是两个关系,$r\Join s$表示两个关系的自然连接,R,S分别表示两个关系的属性集合

总结一下自然连接应该怎么用低级的关系代数式表示

mysql> select * from (instructor natural join teaches);
+-------+------------+-----------+--------+-----------+--------+----------+------+
| ID    | name       | dept_name | salary | course_id | sec_id | semester | year |
+-------+------------+-----------+--------+-----------+--------+----------+------+
| 10101 | Srinivasan | Comp.Sci. |  65000 | CS-101    |      1 | Fall     | 2017 |
| 10101 | Srinivasan | Comp.Sci. |  65000 | CS-315    |      1 | Spring   | 2018 |
| 10101 | Srinivasan | Comp.Sci. |  65000 | CS-347    |      1 | Fall     | 2017 |
| 12121 | Wu         | Finance   |  90000 | FIN-201   |      1 | Spring   | 2018 |
| 15151 | Mozart     | Music     |  40000 | MU-199    |      1 | Spring   | 2018 |
| 22222 | Einstein   | Physics   |  95000 | PHY-101   |      1 | Fall     | 2017 |
| 32343 | El Said    | History   |  60000 | HIS-351   |      1 | Spring   | 2018 |
| 45565 | Katz       | Comp.Sci. |  75000 | CS-319    |      1 | Spring   | 2018 |
| 76766 | Crick      | Biology   |  72000 | BIO-101   |      1 | Summer   | 2017 |
| 76766 | Crick      | Biology   |  72000 | BIO-301   |      1 | Summer   | 2018 |
| 83821 | Brandt     | Comp.Sci. |  92000 | CS-190    |      1 | Spring   | 2017 |
| 98345 | Kim        | Elec.Eng. |  80000 | EE-181    |      1 | Spring   | 2017 |
+-------+------------+-----------+--------+-----------+--------+----------+------+
12 rows in set (0.00 sec)

mysql会自动搜索两个表中的相同属性,然后进行等值连接

比如有两张表$S(A,B,C),T(B,C,D)$

此时$S\Join T=ST(A,B,C,D)$,mysql会考虑B,C都相同的元组合并成同一个元组,

如果$\exist s(a,b,c,d)\in A,\forall t_i(b_i,c_i,d_i)\in T,!(b=b_i \land c=c_i)$则该S表中的s元组会被抛弃

同理适用于t

说人话就是S,T表保留公共部分完全相同的表项进行合并

“公共部分”是指所有相同的列属性

那么有
$$
r\Join s=\Pi_{R\cup S}(\sigma_{r.a_1=s.a_1,r.a_2=s.a_2,…,r.a_n=s.a_n}(r\times s))
$$

其中$a_i\in R\cap S$表示r和s共有的属性

怎么理解这个公式呢?还是从下往上看这个公式

最里层$r\times s$,笛卡尔积,r到s所有行的组合都有了

第二层$\sigma_{r.a_1=s.a_1,r.a_2=s.a_2,…,r.a_n=s.a_n}(r\times s)$,在第一层的计算结果表中,取出$r.a_1=s.a_1,r.a_2=s.a_2…$的行组成新的子表

第三层$\Pi_{R\cup S}(\sigma_{r.a_1=s.a_1,r.a_2=s.a_2,…,r.a_n=s.a_n}(r\times s))$保留R和S所有属性

赋值$\leftarrow$

对于刚才的自然连接,可以使用赋值运算保存临时变量,分布计算
$$
\Pi_{name,course_id}(\sigma_{instructor.name=Srinivasan}(instructor\Join teaches))
$$

$$
\begin{aligned}
temp1&\leftarrow instructor\Join teaches\
temp2&\leftarrow \sigma_{instructor.name=Srinivasan}(temp1)\
result&\leftarrow \Pi_{name,course_id}(temp2)
\end{aligned}
$$

但是在mysql上没有找到这种用法

外连接

自然连接时,如果s表中的某一项在r表中找不到共有属性完全相同的项则会被抛弃,而外连接将会保留s表中的这一项,并给他创建一个全空的r表对应项

举个例子说

教师表s中的教师可以分成教学的和不教学的两类

课程表t中的课程可以分成自习课和教学课两类

$s\Join t$只会保留教课老师和教学课,还得是这个老师和教的课都存在,显然不教学的老师不会出现在结果里,自习课也不会出现在结果里

比如现有课程表和教师表

SELF-0这课对应教师ID=0,不存在这个老师,意味着自习课,显然在自然连接的时候会被扬了

ID=33456这个Gold老师,在teaches课程表上没有任课记录,自然连接的时候也会被扬了

现在令l表示左表,也就是teaches;令r表示右表,也就是instructor

对于$teaches \ opt \ instructor$

如果是opt全外连接则谁也不会被扬了,

如果opt是左外连接则SELF-0留下,33456号教师扬了

如果opt是右外连接则SELF-0扬了,33456号教师留下

如果opt是全外连接则都留下

这三个在$\LaTeX$中没有找到符号

左外连接$⟕$

谁在左边谁全留下,右边的该扬就得扬

mysql> select * from teaches as l LEFT OUTER JOIN instructor as r ON l.ID=r.ID;
+-------+-----------+--------+----------+------+-------+------------+-----------+--------+
| ID    | course_id | sec_id | semester | year | ID    | name       | dept_name | salary |
+-------+-----------+--------+----------+------+-------+------------+-----------+--------+
| 10101 | CS-101    |      1 | Fall     | 2017 | 10101 | Srinivasan | Comp.Sci. |  65000 |
| 10101 | CS-315    |      1 | Spring   | 2018 | 10101 | Srinivasan | Comp.Sci. |  65000 |
| 10101 | CS-347    |      1 | Fall     | 2017 | 10101 | Srinivasan | Comp.Sci. |  65000 |
| 12121 | FIN-201   |      1 | Spring   | 2018 | 12121 | Wu         | Finance   |  90000 |
| 15151 | MU-199    |      1 | Spring   | 2018 | 15151 | Mozart     | Music     |  40000 |
| 22222 | PHY-101   |      1 | Fall     | 2017 | 22222 | Einstein   | Physics   |  95000 |
| 32343 | HIS-351   |      1 | Spring   | 2018 | 32343 | El Said    | History   |  60000 |
| 45565 | CS-319    |      1 | Spring   | 2018 | 45565 | Katz       | Comp.Sci. |  75000 |
| 76766 | BIO-101   |      1 | Summer   | 2017 | 76766 | Crick      | Biology   |  72000 |
| 76766 | BIO-301   |      1 | Summer   | 2018 | 76766 | Crick      | Biology   |  72000 |
| 83821 | CS-190    |      1 | Spring   | 2017 | 83821 | Brandt     | Comp.Sci. |  92000 |
| 98345 | EE-181    |      1 | Spring   | 2017 | 98345 | Kim        | Elec.Eng. |  80000 |
|     0 | SELF-0    |      1 | Spring   | 2018 |  NULL | NULL       | NULL      |   NULL |
+-------+-----------+--------+----------+------+-------+------------+-----------+--------+
13 rows in set (0.00 sec)

33456号老师已经被扬了

写成关系代数就是
$$
\sigma_{l.ID=r.ID}(\rho_l(teaches)⟕\rho_r(instructor))
$$

右外连接$⟖$

谁在右边谁全留下,左边该扬就得扬

mysql> select l.course_id,r.name from teaches as l RIGHT OUTER JOIN instructor as r ON l.ID=r.ID;
+-----------+------------+
| course_id | name       |
+-----------+------------+
| CS-101    | Srinivasan |
| CS-315    | Srinivasan |
| CS-347    | Srinivasan |
| FIN-201   | Wu         |
| MU-199    | Mozart     |
| PHY-101   | Einstein   |
| HIS-351   | El Said    |
| CS-319    | Katz       |
| BIO-101   | Crick      |
| BIO-301   | Crick      |
| CS-190    | Brandt     |
| EE-181    | Kim        |
| NULL      | Gold       |
| NULL      | Caliiferi  |
| NULL      | Singh      |
+-----------+------------+
15 rows in set (0.00 sec)

SELF-0号自习课已经被扬了

三个不任课的老师留下

写成关系代数的形式就是
$$
\Pi_{l.course_id,r.name}(\sigma_{l.ID=r.ID}(\rho_l(teaches)⟖\rho_r(instructor)))
$$

全外连接$⟗$

mysql中没有全外连接

但是可以使用左外连接联合查询右外连接

mysql> select l.course_id,r.name
    ->     from
    ->         teaches as l
    ->             LEFT OUTER JOIN
    ->         instructor as r
    ->             ON l.ID=r.ID
    -> union
    -> select l.course_id,r.name
    ->     from
    ->         teaches as l
    ->             RIGHT OUTER JOIN
    ->         instructor as r
    ->             ON l.ID=r.ID
    -> ;
+-----------+------------+
| course_id | name       |
+-----------+------------+
| CS-101    | Srinivasan |
| CS-315    | Srinivasan |
| CS-347    | Srinivasan |
| FIN-201   | Wu         |
| MU-199    | Mozart     |
| PHY-101   | Einstein   |
| HIS-351   | El Said    |
| CS-319    | Katz       |
| BIO-101   | Crick      |
| BIO-301   | Crick      |
| CS-190    | Brandt     |
| EE-181    | Kim        |
| SELF-0    | NULL       |
| NULL      | Gold       |
| NULL      | Caliiferi  |
| NULL      | Singh      |
+-----------+------------+
16 rows in set (0.00 sec)

$$
\Pi_{l.course_id,r.name}(\sigma_{l.ID=r.ID}(\rho_l(teaches)⟗\rho_r(instructor)))
$$

除$\div$

设关系r的属性$R=(A_1,A_2,…,A_m,B_1,B_2,…,B_n)$

设关系s的属性$S=(B_1,B_2,…,B_n)$

所谓关系就是表

所谓属性就是表的列

则
$$
r\div s=(A_1,A_2,…,A_m)
$$

严谨定义为
$$
r\div s={t|t\in \Pi_{R-S}(r)\ \land\forall u\in s\rightarrow tu\in r}
$$

t元组的属性是R-S剩下的属性,

并且t和s中任何元组的积都属于r.

则t就是$r\div s$集合的元素

练习

练习1卷出最值

查出account表中最大(小)存款(balance)数

$$
\begin{aligned}
最大存款数&=\Pi_{balance}(account)-\Pi_{acnt.balance}(\sigma_{acnt.balance<account.balance}(\rho_{acnt}(account)\times account))\
最小存款数&=\Pi_{balance}(account)-\Pi_{account.balance}(\sigma_{acnt.balance<account.balance}(\rho_{acnt}(account)\times account))
\end{aligned}
$$
用$\sigma_{acnt.balance<account.balance}(\rho_{acnt}(account)\times account)$这个选择蹉跎一下,就包含了所有前存款小于后存款的条目,

这些条目中,每一条都是acnt.balance作为前项也就是小的一项,account.balance作为后项也就是大的一项,

那么acnt.balance不含最大项,account.balance不含最小项

练习2连卷转化

查出至少有一个账户上有多于700块钱存款的用户

用卷积做的话
$$
\Pi_{customer_name}(\sigma_{account.number=depositor.number}(\sigma_{account.balance>700}(account)\times depositor))
$$

用自然连接做的话
$$
\Pi_{customer_name}(\sigma_{account.balance>700}(account\Join depositor))
$$

练习3卷出多个账户

查询至少有两个账户的用户

$$
\Pi_{customer_name}(\sigma_{dpstr.customer_name=depositer.custormer_name\ \land \ dpstr.account_number\ne depositor.account_number}(\rho_{dpstr}(depositer)\times depositer))
$$

练习4先连后卷

查出至少在两个分行(branch)都有账户(account)的用户(customer)

需要注意的是,一个人可以在同一个支行有多个账号,只考虑查询账户数多于两个的用户不成立

显然先把两个表以account_number为公共属性,自然连接起来,对这个自然连接表进行卷积,就能蹉跎出同一个人的不同支行账户组合了
$$
\Pi_{customer_name}(\sigma_{l.custormer_name=r.custormer_name\land l.branch_name\ne r.branch_name}(\rho_l(depositor \Join account)\times \rho_r(depositor\Join account)))
$$

练习5自然连接

查找在布鲁克林市有账户的用户

$$
\Pi_{customer_name}(\sigma_{branch_city=Brooklyn}(account\Join branch))
$$

只要在是布鲁克林市有一个账户就可以了

练习6除法关系

查找在布鲁克林市任意支行都有账户的用户

$$
\Pi_{customer_name,branch_name}(\sigma_{branch_city=Brooklyn}(account\Join branch))\div \Pi_{branch_name}(\sigma_{branch_city=Brooklyn}(branch))
$$

dustland

关系代数和SQL语言