Source From: https://15445.courses.cs.cmu.edu/fall2020/schedule.html
Relational Languages
Edgar Codd published a major paper on relational models in the early 1970s. Originally, he only defined the mathematical notation(符号) for how a DBMS could execute queries on a relational model DBMS.
The user only needs to specify the result that they want using a declarative language(i.e. SQL). The DBMS is responsible fot determining the most efficient plan to produce that answer.
Relational algebra is based on sets(unordered, no duplicates). SQL is based on bags(unordered, allows duplicates).
SQL History
SQL is a declarative query language for relational databases. It was originally developed in the 1970s as part of the IBM System R project. IBM originally called it “SEQUEL” (Structured English Query Language). The name changed in the 1980s to just “SQL”(Structured Query Language).
The language is comprised of different classes of commands.
- Data Manipulation Language(DML): SELECT, INSERT, UPDATE, and DELETE statements.
- Data Definition Language(DDL): Schema definitions for tables, indexes, views, and other objects.
- Data Control Language(DCL): Security, access controls.
SQL is not a dead language. It is being update with new features every couple years. SQL-92 is the minimum that a DBMS has to support to claim they support SQL. Each follows the standard to a certain degree but there are many proprietary extensions.
Aggregates
An aggregation function takes in a bag of tuples as its input and then produces a single scalar value as its output. Aggregate functions can only be used in SELECT output list. SELECT AVG(gpa), COUNT(*) FROM student WHRER login LIKE '%@cs'
Output of other columns outside of an aggregate is undefined. Thus, other columns outside aggregate must be aggregated or used in a GROUP BY command.
1 | SELECT AVG(s.gpa), e.cid |
HAVING is a filter output results after aggregation. Like a WHERE clause for a GROUP BY.
1 | SELECT AVG(s.gpa) AS avg_gpa, e.cid |
String Operations
The SQL standard says that strings are case sensitive and single-quotes only. There are functions to manipulate strings that can be used in any part of a query.
Pattern Matching: The LIKE keyword is used for string matching in predicates.
- “%” matches any substrings(including empty)
- “-“ matches any one character.
Concatenation: Two vertical bars(“||”) will concatenate two or more strings together into a single string.
Output Redirection
Instead of having the result a query returned to the client(e.g. terminal), you can tell the DBMS to store the results into another table. You can then access this data in subsequent queries.
- New Table: Store the output if the query into a new (parmanent) table.
SELECT DISTINCT cid INTO CourseIds FROM enrolled
- Existing Table: Store the output of the query into a table that already exists in the database. The target table must have the same number of columns with the same types as the target table, but the names of the columns in the ouput query do not have to match.
INSERT INTO CourseIds (SELECT DISTINCT cid FROM enrolled)
Output Control
Since the results SQL are unordered, you have to use the ORDER BY clause to impose a sort on tuples.
1 | SELECT sid FROM enrolled WHERE cid = '15-721' |
You can use multiple ORDER BY clauses to break ties or do more complex sorting.
1 | SELECT sid FROM enrolled WHERE cid = '15-721' |
You can also use any arbitrary expression in the ORDER BY clause.
1 | SELECT sid FROM enrolled WHERE cid = '15-721' |
By default, the DBMS will return all of the tuples produced by the query. You can use the LIMIT clause to restrict the number of result tuples.
1 | SELECT sid, name, FROM student WHERE login LIKE '%@cs' LIMIT 10; |
Can also provide an offset to return a range in the results:
1 | SELECT sid, name FROM student WHERE login LIKE '%@cs' LIMIT 20 OFFSET 10; |
Unless you use an ORDER BY clause with a LIMIT, the DBMS could produce different tuples in the result on each invocation of the query because the relational model dose not impose an ordering.
Nested Queries
Invoke(调用) queries inside of other queries to execute more complex logic within a single query. The scope of outer query is included in inner query(i.e. inner query can access attributes from outer query), but not the other way around.
Inner queries can appear in almost any part of a query:
- SELECT Output Targets:
SELECT (SELECT 1) AS one FROM student
- FROM Clause:
SELECT name FROM student AS s, (SELECT sid FROM enrolled) AS e WHERE s.sid = e.sid
- WHERE Clause:
SELECT name FROM student WHERE sid IN (SELECT sid FROM enrolled)
Nest Query Results Expression
- ALL: Must satisfy expression for all rows in sub-query.
- ANY: Must satisfy expression for at least one row in sub-query.
- IN: Equivalent to =ANY().
- EXISTS: At least one row is returned.
Window Functions
Performs “moving” caculation across set of tuples. Like an aggregation but it still returns the original tuples. The window function can be any of the aggregation functions that we discussed above. There are also special window functions:
- ROW_NUMBER: The number of the current row.
- RANK: The order position of the current row.
Grouping: The OVER clause specifies how to group together tuples when computing the window function. Use PARTITION BY to specify group.
1 | SELECT cid, sid, ROW_NUMBER() OVER(PARTITION BY cid) |
You can also put an ORDER BY within OVER to ensure a deterministic ordering of results even if database changes internally.
1 | SELECT *, ROW_NUMBER() OVER (ORDER BY cid) |
Important: The DBMS computes RANK after the window function sorting, whereas it computes ROW_NUMBER before the sorting.
Common Table Expressions
Common Table Expressions(CTEs) are an alternative to windows or nested queries to writing more complex queries. One can think of a CTE like a temporary table for just on query. The WITH clause binds the output of the inner query to a temporary result with that name.
Generate a CTE called cteName that contains a single tuple with a single attribute set to “1”. The query at the bottom then just returns all the attributes from cteName.
1 | WITH cteName AS ( |
You can bind output columns to names before the AS:
1 | WITH cteName (col1, col2) AS ( |
A single query can contain multiple CTE declaratios:
1 | WITH cte1 (col1) AS ( |
Adding the RECURSIVE keyword after WITH allows a CTE to reference itself.
1 | WITH RECURSIVE cteSource (counter) AS ( |
- 本文作者: 夏花
- 本文链接: http://xiahua19.github.io/2022/08/30/cmu15-2-Intermediate-SQL/
- 版权声明: 本博客所有文章除特别声明外,均采用 MIT 许可协议。转载请注明出处!