In the last lesson, we learned the basics of SQL and are well on our way to knowing just about everything we'd want to know about how to perform query operations on a single table. But of course, a database is a collection of multiple tables, with each table storing a different dimension of information. In our e-commerce example, we have a table for Products and another for Reviews. Why? We'll get deeper into this in an upcoming lesson on Domain Modeling (which just a fancy way of saying "how to design tables") – but, given the existing design, how do we ask questions of multiple tables? In this lesson, we'll take a deep dive into doing just that.

JOINs

Sometimes the answer to a question is contained within more than one table. Consider our Products table:

Name Price Department
Camera $299 Electronics
Sofa $699 Furniture
Dining Table $1299 Furniture
Toaster $79 Housewares

And Reviews:

Product Rating Body
Camera 4 Takes nice pictures!
Sofa 5 Comfy!
Camera 5 Best camera ever!!!
Toaster 3 Makes toast.

And the question we want answered is:

Can I see all the reviews for the products in the Furniture department?

The results – our 👶 baby table – and the WHERE clause would both need to contain columns from both the Products and Reviews tables. Let's follow our usual formula and see what happens:

  • Identify the table(s) that contain the information you need – Reviews and Products
  • Identify the columns the hold the information you need – the Product, Rating, and Body from Reviews, as well as Department from Products
  • Add any other conditions – products in the Furniture department

Before being able to ask the question, though, we need to add one additional step to this recipe.

  • Identify the thing the tables have in common – the Name from the Products table is equivalent to the Product column from the Reviews table

Now, we can ask the question:

SELECT Products.Department, Reviews.Product, Reviews.Rating, Reviews.Body
FROM Reviews INNER JOIN Products ON Products.Name = Reviews.Product
WHERE Products.Department = "Furniture";

Whew! There seems to be a lot more here than what we've seen thus far... but don't worry; it's actually quite straightforward. Let's break it down.

Because there are multiple tables involved, we're prefixing each column name with the name of the table and a dot . – e.g. because the Body column is in the Reviews table, we write it as Reviews.Body. This is prevent conflicts in naming – for instance, what if the Products and Reviews tables both had a Body column? This syntax clears that up.

And, the FROM clause is more complex now. Previously, it was just the name of the single table we were reading from. Now, we're reading from the intersection of the two tables. This intersection is called a JOIN, and the commonality between the two tables is specified in the ON clause. The intersection between the table happens when the Name from the Products table is equivalent to the Product column from the Reviews table, so it's ON Products.Name = Reviews.Product. Furthermore, if there was a third, fourth, fifth, etc. table involved, they would be chained together in the FROM clause with more INNER JOINs.

The Primary Key

A shocking confession: the examples I've provided thus far – the Products and Reviews tables from our fictional e-commerce store that sells a rather eclectic variety of stuff – hasn't been entirely realistic. The one thing that basically all tables in a relational database have, and that's missing from these tables, is some sort of unique identifier for each record. A primary key. If you've ever seen a database table outside of this course, you may have seen a variety of different types of primary keys. Based on the database design, sometimes you'll see something like a Social Security Number, or some other strictly-enforced integer value. Sometimes it's a long alphanumeric string based on a software algorithm. Or, a lot of the time, it's simply an integer that starts at 1 and goes up by 1, forever. The auto-incrementing integer primary key is fairly common for a freshly-built application nowadays, and that will be the rule for primary keys that we'll follow for the reminder of this course.

https://twitter.com/IncredibleCulk/status/1298730289737293824

Have a look at the permanent URL for the tweet above – that URL never changes, and will always direct you to the page for that tweet, unless Mr. Culkin decides to delete it. What is the 1298730289737293824 all about? That's the primary key. That's right, there was a tweet #1 at some point, and at the time this tweet was posted, it was the 1,298,730,289,737,293,824th tweet that had ever been posted. The number never goes down, and the number never gets reused. Luckily, there are a lot of numbers in the universe still left to use.

Foreign Keys

Now that we know what a primary key is, let's look at why it's so important to use them in database design. First and foremost, we use primary keys to make our data design less fragile. To see what I mean by that – let's look at a quick example. If we look at the existing table design we've been working with, which does not contain primary keys, we can see how we can easily break the integrity of the data. For instance, what if we change the name of the "camera" product to something else?

Name Price Department
Camera Digital SLR Camera $299 Electronics
Sofa $699 Furniture
Dining Table $1299 Furniture
Toaster $79 Housewares
Product Rating Body
Camera 4 Takes nice pictures!
Sofa 5 Comfy!
Camera 5 Best camera ever!!!
Toaster 3 Makes toast.

Anarchy! We've immediately broken the relationship between the Products and Reviews tables. Now, the rows in Reviews that previously referenced the "Camera" are now orphaned, that is, they are left with no valid parent record that is referenced in its Product column. The SQL query that we just wrote to get product reviews in the furniture department – broken. Of course, we could update each review to reference the new name of the product. That would be easy enough in a database of this size, but certainly not in a database of more complex design and possibly millions of records. It's much better to update our database design to include a primary key.

ID Name Price Department
1 Digital SLR Camera $299 Electronics
2 Sofa $699 Furniture
3 Dining Table $1299 Furniture
4 Toaster $79 Housewares

The auto-incrementing primary key ID column starts at 1 and simply increments by 1 for each new record in our Products table. Next, we slightly re-design the Reviews table to point to the product's ID instead of its Name:

ID ProductID Rating Body
1 1 4 Takes nice pictures!
2 2 5 Comfy!
3 1 5 Best camera ever!!!
4 3 3 Makes toast.

And to fix our SQL from above, we join by the ID and ProductID, instead of the Name of the product:

SELECT Products.Department, Reviews.Product, Reviews.Rating, Reviews.Body
FROM Reviews INNER JOIN Products ON Products.ID = Reviews.ProductID
WHERE Products.Department = "Furniture";

The ProductID in the Reviews column is known as the foreign key. A foreign key is simply a column in a table that points to a primary key in another table.

Lab 4: JOINs with Primary/Foreign Keys

See the files named lab-4-*.sql for the questions, and modify each file until you achieve the desired result. See the answers when done.