Source From: https://15445.courses.cs.cmu.edu/fall2020/schedule.html
Databases
A database is an organized collection of inter-related data that models some aspect of the real-world(e.g. modeling the students in a class or a digital music store).
People often confuse “databases” with “databases management systems” (e.g. MySQL, Oracle, MongoDB). A database management system(DBMS) is the software that manages a database.
Consider a database that models a digital music store(e.g. Spotify). Let the database hold information about the artists and which albums those artists have released.
Flat File Strawman
Database is stored as comma-separated value(CSV) files that the DBMS manages. Each entity will be stored in its own file. The application has to parse files each time it wants to read or update records. Each entity has its own set of attributes, so in each file, different records are delimited by new lines, while each of the corresponding attributes within a record are delimited by a comma.
Keeping along with the digital music store example, there would be two files: one for artist and the other for album. An artist could have a name, year, and country attributes, while an album has name, artist and year attributes.
Issues with Flat File
- Data Integrity
- How do we ensure that the artist is the same for each album entry?
- What if somebody overwrites the album year with an invalid string?
- How do we store theat there are multiple artists on one album?
- Implementation
- How do you find a particular record?
- What if we now want to create a new application that uses the same database?
- What if two threads try to write to the same file at the same time?
- Durability
- What if machine crashes while out program is updating a record?
- What if we want to replicate the database on multiple machines for high availability?
Database Management System
A DBMS is a software that allows applications to store and analyze information in a database.
A general-purpos DBMS is designed to allow the definition, creation, querying, updation, and administration of databases.
Database applications were difficult to build and maintain because there was a tight coupling between logical and physical layers. The logical layer is which entities and attributes the database has while the physical layer is how those entities and attributes are being stored. Early on, the physical layer was defined in the application code, so if we wanted to change the physical layer the application was using, we would have to change all of the code to match the new physical layer.
Relational Model
Ted Codd noticed that people were rewriting DBMSs every time they wanted to change the physical layer, so in 1970 he proposed the relational model to avoid this. This relational model has three key points:
- Store database in simple data structures(relations).
- Access data through high-level language.
- Physical storage left up to implementation.
A data model is a collection of concepts for describing the data in a database. The relational model is an example of a data model.
A scheme is a description of a particular collection of data, using a given data model.
The relational data model defines three concepts:
- Structure: The definition of relations and their contents. This is the relations have and the values that those attributes can hold.
- Integrity: Ensure the database’s contents satify constraints. An example constraint would be that any value for the year attribute has to be a number.
- Manipulation: How to access and modify a database’s contents.
A relation is an unordered set that contains the relationship of attributes that represent entities. Since the relationships are unordered, the DBMS can store them in any way it wants, allowing for optimization.
A turple is a set of attribute values(also known as its domain) in the relation. Originally, values had to be atomic or scalar, but now values can also be lists or nested(嵌套的) data structures. Every attribute can be a special value, NULL, which means for a given turple the attribute is undefined.
A relation with n attributes is called an n-array relation.
A relation’s primary key uniquely identifies a single tuple. Some DBMSs automatically create an internal primary key if you do not define one. A lot of DBMSs have support fot autogenerated keys so an application does not have to manually increment the keys.
A foreign key specifies that an attribute from on relation has to map to a tuple in another relation.
Data Manipulation Languages(DMLs)
A language to store and retrieve information from a database. There are two classes of languages for this.
Procedural: The query specifies the (high-level) strategy the DBMS should use to find the desired result. This is called Relational Algebra
, which use fundamental operations to retrieve and manipulate tuples in s relation, based on set algebra. Each operator takes one or more relations as its inputs and outputs a new relation. So, we can chain operators together to create more complex operations.
Some operators: $\sigma$ means Select, $\pi$ means Projection, $\cup$ means Union, $\cap$ means Intersection, $-$ means Difference, $\times$ means Product, $\bowtie$ means Join.
- Select: Choose a subset of the tuples from a relation that satisfies a selection predicate(谓词). $\sigma_{predicate}(R)$,
SELECT * FROM R WHERE predicate
- Predicate acts as a filter to retain only tuples that fufill its qualifying requirement.
- Can combine multiple predicates using conjuctions(合取) / disjunctions(析取).
- Projection: Generate a relation with tuples that contains only the specified attributes. $\pi_{A_{1}A_{2},…,A_{n}}(R)$,
SELECT A1, A2,...,An FROM R
- Can rearrange attributes’ ordering.
- Can manipulate the values.
- Union: Generate a relation that contains all tuples that appear in either only one or both input relations. $R\cup S$
(SELECT * FROM R) UNION ALL (SELECT * FROM S)
- Intersection: Generate a relation that contains only the turples that appear in both of the input relations. $(R \cap S)$
(SELECT * FROM R) INTERSECT (SELECT * FROM S)
- Difference: Generate a relation that contains only the turples that appear in the first and not the second of the input relations. $R - S$
(SELECT * FROM R) EXCEPT (SELECT * FROM S)
- Product: Generate a relation that contains all possible combinaitions of tuples from the input relations. $R \times S$
SELECT * FROM R CROSS JOIN S
,SELECT * FROM R, S
- Join: Generate a relation that contains all tuples that are a combination of two tuples(one from each input relation) with a common value(s) for one or more attributes.
Non-Procedural: The query specifies only what data is wanted and not how to find it. This is calledRelational Calculus
.
- 本文作者: 夏花
- 本文链接: http://xiahua19.github.io/2022/08/30/cmu15-1-Relational-Model/
- 版权声明: 本博客所有文章除特别声明外,均采用 MIT 许可协议。转载请注明出处!