In the last lesson, we learned the basics of SQL and are well on our way to knowing just about everything we'd want to know about how to perform query operations on a single table. But of course, a database is a collection of multiple tables, with each table storing a different dimension of information. In our e-commerce example, we have a table for Products and another for Reviews. Why? We'll get deeper into this in an upcoming lesson on Domain Modeling (which just a fancy way of saying "how to design tables") – but, given the existing design, how do we ask questions of multiple tables? In this lesson, we'll take a deep dive into doing just that.
JOINs
Sometimes the answer to a question is contained within more than one table. Consider our Products
table:
Name | Price | Department |
---|---|---|
Camera | $299 | Electronics |
Sofa | $699 | Furniture |
Dining Table | $1299 | Furniture |
Toaster | $79 | Housewares |
And Reviews
:
Product | Rating | Body |
---|---|---|
Camera | 4 | Takes nice pictures! |
Sofa | 5 | Comfy! |
Camera | 5 | Best camera ever!!! |
Toaster | 3 | Makes toast. |
And the question we want answered is:
Can I see all the reviews for the products in the Furniture department?
The results – our 👶 baby table – and the WHERE
clause would both need to contain columns from both the Products
and Reviews
tables. Let's follow our usual formula and see what happens:
- Identify the table(s) that contain the information you need –
Reviews
andProducts
- Identify the columns the hold the information you need – the
Product
,Rating
, andBody
fromReviews
, as well asDepartment
fromProducts
- Add any other conditions – products in the Furniture department
Before being able to ask the question, though, we need to add one additional step to this recipe.
- Identify the thing the tables have in common – the
Name
from theProducts
table is equivalent to theProduct
column from theReviews
table
Now, we can ask the question:
SELECT Products.Department, Reviews.Product, Reviews.Rating, Reviews.Body
FROM Reviews INNER JOIN Products ON Products.Name = Reviews.Product
WHERE Products.Department = "Furniture";
Whew! There seems to be a lot more here than what we've seen thus far... but don't worry; it's actually quite straightforward. Let's break it down.
Because there are multiple tables involved, we're prefixing each column name with the name of the table and a dot .
– e.g. because the Body
column is in the Reviews
table, we write it as Reviews.Body
. This is prevent conflicts in naming – for instance, what if the Products
and Reviews
tables both had a Body
column? This syntax clears that up.
And, the FROM
clause is more complex now. Previously, it was just the name of the single table we were reading from. Now, we're reading from the intersection of the two tables. This intersection is called a JOIN
, and the commonality between the two tables is specified in the ON
clause. The intersection between the table happens when the Name
from the Products
table is equivalent to the Product
column from the Reviews
table, so it's ON Products.Name = Reviews.Product
. Furthermore, if there was a third, fourth, fifth, etc. table involved, they would be chained together in the FROM
clause with more INNER JOIN
s.
The Primary Key
A shocking confession: the examples I've provided thus far – the Products
and Reviews
tables from our fictional e-commerce store that sells a rather eclectic variety of stuff – hasn't been entirely realistic. The one thing that basically all tables in a relational database have, and that's missing from these tables, is some sort of unique identifier for each record. A primary key. If you've ever seen a database table outside of this course, you may have seen a variety of different types of primary keys. Based on the database design, sometimes you'll see something like a Social Security Number, or some other strictly-enforced integer value. Sometimes it's a long alphanumeric string based on a software algorithm. Or, a lot of the time, it's simply an integer that starts at 1 and goes up by 1, forever. The auto-incrementing integer primary key is fairly common for a freshly-built application nowadays, and that will be the rule for primary keys that we'll follow for the reminder of this course.
Have a look at the permanent URL for the tweet above – that URL never changes, and will always direct you to the page for that tweet, unless Mr. Culkin decides to delete it. What is the 1298730289737293824 all about? That's the primary key. That's right, there was a tweet #1 at some point, and at the time this tweet was posted, it was the 1,298,730,289,737,293,824th tweet that had ever been posted. The number never goes down, and the number never gets reused. Luckily, there are a lot of numbers in the universe still left to use.
Foreign Keys
Now that we know what a primary key is, let's look at why it's so important to use them in database design. First and foremost, we use primary keys to make our data design less fragile. To see what I mean by that – let's look at a quick example. If we look at the existing table design we've been working with, which does not contain primary keys, we can see how we can easily break the integrity of the data. For instance, what if we change the name of the "camera" product to something else?
Name | Price | Department |
---|---|---|
$299 | Electronics | |
Sofa | $699 | Furniture |
Dining Table | $1299 | Furniture |
Toaster | $79 | Housewares |
Product | Rating | Body |
---|---|---|
Camera | 4 | Takes nice pictures! |
Sofa | 5 | Comfy! |
Camera | 5 | Best camera ever!!! |
Toaster | 3 | Makes toast. |
Anarchy! We've immediately broken the relationship between the Products
and Reviews
tables. Now, the rows in Reviews
that previously referenced the "Camera" are now orphaned, that is, they are left with no valid parent record that is referenced in its Product
column. The SQL query that we just wrote to get product reviews in the furniture department – broken. Of course, we could update each review to reference the new name of the product. That would be easy enough in a database of this size, but certainly not in a database of more complex design and possibly millions of records. It's much better to update our database design to include a primary key.
ID | Name | Price | Department |
---|---|---|---|
1 | Digital SLR Camera | $299 | Electronics |
2 | Sofa | $699 | Furniture |
3 | Dining Table | $1299 | Furniture |
4 | Toaster | $79 | Housewares |
The auto-incrementing primary key ID
column starts at 1 and simply increments by 1 for each new record in our Products
table. Next, we slightly re-design the Reviews
table to point to the product's ID
instead of its Name
:
ID | ProductID | Rating | Body |
---|---|---|---|
1 | 1 | 4 | Takes nice pictures! |
2 | 2 | 5 | Comfy! |
3 | 1 | 5 | Best camera ever!!! |
4 | 3 | 3 | Makes toast. |
And to fix our SQL from above, we join by the ID
and ProductID
, instead of the Name
of the product:
SELECT Products.Department, Reviews.Product, Reviews.Rating, Reviews.Body
FROM Reviews INNER JOIN Products ON Products.ID = Reviews.ProductID
WHERE Products.Department = "Furniture";
The ProductID
in the Reviews
column is known as the foreign key. A foreign key is simply a column in a table that points to a primary key in another table.
Lab 4: JOINs with Primary/Foreign Keys
See the files named lab-4-*.sql
for the questions, and modify each file until you achieve the desired result. See the answers when done.