SQL 3: Modifying the Database – Data and Structure

In the first and second lessons within our SQL unit, we've assumed that we're dealing with an existing database with existing data. This would certainly be the case for many people whose job it is to analyze and report on existing data – business analysts, data scientists, etc. For many of these job roles, what we've done so far might be all you'd ever need.

In this lesson, we're going to look at the other side of the equation. What is the world of SQL like for those who design databases, as well as those who add and manipulate data within the database design? The database administrators and developers who work on the applications that gather and generate the data that others consume – how do they use SQL?

Generally speaking, there are three main types (roles) of database users:

Those who have the ability to read from the database – i.e. can do SELECT
Those who have the ability to read and write – i.e. can add/edit/delete rows of data but not the structure of the tables
Those who have the ability to change the structure of the database – i.e. can change the tables and columns within

There are certainly sub-roles of each type of user, e.g. a user that can add a row but not delete, but let's stick to these three for now.

In a company setting, the person or team who administers the database can often allow/disallow access to some or any of these roles. With our SQLite databases, however, we are going to assume the default settings, which is that we have access to do any of these things.

A Blank Slate

While many developers/DBA do work on existing databases, we're going to start from the very beginning, so we can catch a glimpse of everything we can do.

If you're already running the Baseball database with sqlite, exit back to the command-line prompt by typing:

.exit

Once you're back at the prompt, let's create a new database. This one will be for a fictional school, so let's type the following:

sqlite3 school.sqlite3

This will create a new file called school.sqlite3 (which is a brand-new, empty database) and start sqlite3 again, using that database instead of the baseball database. Let's see what tables this new database contains:

.schema

When we ran this same command inside the baseball database, we saw the structure of the teams, players, and stats tables within. But since this is a new, empty database, the .schema command returns no output.

Creating Tables

What's the command in SQL that creates a table? We've already seen it – when we ran .schema in the baseball database – it's CREATE TABLE. Let's create a table that we'll use to store information about the students in our course:

CREATE TABLE students (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  first_name TEXT,
  last_name TEXT
);

Don't forget that semi-colon! As we see here, the syntax is quite straightforward – it's CREATE TABLE, followed by the name of the table we want to create, followed by a list of columns in parentheses. And, the column list is separated by commas. We can also see that our primary key field is an integer that automatically increments by 1 with every new record that is created.

If we ask for the .schema again, we'll see that our table has been created!

Column Types

In the CREATE TABLE command we just ran, each column in our list of columns has a type – i.e. INTEGER or TEXT. The list of available types does differ from one database software to another; for example, the list of available types is slightly different between SQLite, MySQL, and PostgreSQL. In SQLite, the main data types we'll be concerned within the context of this course are:

INTEGER – a whole number
REAL – a floating point number with decimals
TEXT – words, not just numbers

Just to make a finer point of it, in MySQL, these same data types are called INT, VARCHAR, and FLOAT. These are not going to be consistent across different database vendors. In a later lesson, we'll see how to properly handle these inconsistencies when building database-backed applications, but for now, we'll only worry about the types in SQLite.

Modifying Tables

Oops! We forgot to add a column in our students table to hold our students' email addresses. Since we already created our table, what can we do?

The first option is to delete the table entirely, and then create it again, adding the column we want this time around:

DROP TABLE students;

CREATE TABLE students (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  first_name TEXT,
  last_name TEXT,
  email TEXT
);

If we type .schema again, we can see that this worked fine. Our students table looks exactly the way we want it. But, there's a problem.

What if there is data in the table already?

If we DROP TABLE, the entire table and its data goes away. No "are you sure?", no undo function 😱. Obviously, this a dangerous operation and we should only do it if we are absolutely sure we are ok with losing any existing data in the table. Fortunately, there's a less scary way to simply add a column to our table. Let's add a phone number column:

ALTER TABLE students ADD COLUMN phone_number TEXT;

Much less scary, since it simply alters the structure of the existing table and adds a column. And, we can also decide that we don't want that column anymore and remove it – with the same caveat that comes with DROP TABLE – we must be absolutely sure we don't want the data in that column anymore, because it will go away forever.

ALTER TABLE students DROP COLUMN phone_number;

Manipulating Data – CRUD

CRUD is a fun acronym that refers to the four main operations that can be performed on an existing database table:

Create – add rows of data
Read – basically everything we've done up until now
Update – edit an existing row of data
Delete – delete or remove an existing row of data

We've already seen how to read data – we use the SELECT statement and its many optional clauses to produce whatever output we want from the database. Now, let's examine how we use SQL to perform the other three operations – Create, Update, and Delete.

Creating Data – The `INSERT` Statement

Let's fire up our School database again, and create some data in our Students table:

INSERT INTO students (
  first_name,
  last_name,
  email
)
VALUES (
  "Jane",
  "Doe",
  "jane@example.com"
);

We can then SELECT * FROM students; to see if it worked:

+----+------------+-----------+------------------+
| id | first_name | last_name |      email       |
+----+------------+-----------+------------------+
| 1  | Jane       | Doe       | jane@example.com |
+----+------------+-----------+------------------+

To break our INSERT statement down:

The INTO clause specifies the table name
A comma-delimited list of columns inside parentheses
The VALUES clause specifies another comma-delimited list of values, which follows the same order as the list of columns

Also, note that the id column was automatically set to 1 – our auto-incrementing primary key is working!

Updating Data – The `UPDATE` Statement

Let's suppose the student we created in the last step changes her email address. Let's update it:

UPDATE students
SET email = "janedoe@gmail.com"
WHERE id = 1;

Again, we can use SELECT * FROM students; to see the results:

+----+------------+-----------+-------------------+
| id | first_name | last_name |       email       |
+----+------------+-----------+-------------------+
| 1  | Jane       | Doe       | janedoe@gmail.com |
+----+------------+-----------+-------------------+

As we can see, there are 3 parts to a typical UPDATE statement:

A table name – students
A SET clause, where we can set the new column values – multiple columns can be specified here, when separated by a comma
A WHERE clause, which specifies conditions, i.e. which rows to update

The WHERE clause is the most critical part of the UPDATE statement! If we get this wrong, we'll update the wrong row(s) of data. Even worse, if we omit the WHERE clause entirely, we will update ALL rows of data. We must always ensure that we have a WHERE clause – one that refers to an id when updating a single row of data, or one that points to specific conditions if updating multiple rows.

Removing Data – The `DELETE` Statement

Finally, the DELETE statement gives us the ability to permanently remove rows of data from our database tables. This is another command, like UPDATE, that should always contain a WHERE clause, unless you really do want to delete every row from the table. Let's remove Jane Doe from our students table, using the id of the record in question:

DELETE FROM students
WHERE id = 1;

Doing a SELECT * FROM students; will now yield no results. For fun, let's do one final INSERT into the table. It's the same INSERT we did before:

INSERT INTO students (
  first_name,
  last_name,
  email
)
VALUES (
  "Jane",
  "Doe",
  "jane@example.com"
);

Let's check the new contents using SELECT * FROM students;

+----+------------+-----------+------------------+
| id | first_name | last_name |      email       |
+----+------------+-----------+------------------+
| 2  | Jane       | Doe       | jane@example.com |
+----+------------+-----------+------------------+

Even though we've added the exact same data as we did before, our new record now has the id of 2 – primary keys fields automatically increment by 1 and are never reused!

Conclusion

At this point, we have all the SQL syntax we need to fully manipulate both the structure and the data in our database. In the next lesson, we will begin to look at how to design multiple tables and the relationships in our database, so we can build applications from scratch.