SQL 3: Modifying the Database – Data and Structure
In the first and second lessons within our SQL unit, we've assumed that we're dealing with an existing database with existing data. This would certainly be the case for many people whose job it is to analyze and report on existing data – business analysts, data scientists, etc. For many of these job roles, what we've done so far might be all you'd ever need.
In this lesson, we're going to look at the other side of the equation. What is the world of SQL like for those who design databases, as well as those who add and manipulate data within the database design? The database administrators and developers who work on the applications that gather and generate the data that others consume – how do they use SQL?
Generally speaking, there are three main types (roles) of database users:
- Those who have the ability to read from the database – i.e. can do
SELECT
- Those who have the ability to read and write – i.e. can add/edit/delete rows of data but not the structure of the tables
- Those who have the ability to change the structure of the database – i.e. can change the tables and columns within
There are certainly sub-roles of each type of user, e.g. a user that can add a row but not delete, but let's stick to these three for now.
In a company setting, the person or team who administers the database can often allow/disallow access to some or any of these roles. With our SQLite databases, however, we are going to assume the default settings, which is that we have access to do any of these things.
A Blank Slate
While many developers/DBA do work on existing databases, we're going to start from the very beginning, so we can catch a glimpse of everything we can do.
If you're already running the Baseball database with sqlite
, exit back to the command-line prompt by typing:
.exit
Once you're back at the prompt, let's create a new database. This one will be for a fictional school, so let's type the following:
sqlite3 school.sqlite3
This will create a new file called school.sqlite3
(which is a brand-new, empty database) and start sqlite3
again, using that database instead of the baseball database. Let's see what tables this new database contains:
.schema
When we ran this same command inside the baseball database, we saw the structure of the teams
, players
, and stats
tables within. But since this is a new, empty database, the .schema
command returns no output.
Creating Tables
What's the command in SQL that creates a table? We've already seen it – when we ran .schema
in the baseball database – it's CREATE TABLE
. Let's create a table that we'll use to store information about the students in our course:
CREATE TABLE students (
id INTEGER PRIMARY KEY AUTOINCREMENT,
first_name TEXT,
last_name TEXT
);
Don't forget that semi-colon! As we see here, the syntax is quite straightforward – it's CREATE TABLE
, followed by the name of the table we want to create, followed by a list of columns in parentheses. And, the column list is separated by commas. We can also see that our primary key field is an integer that automatically increments by 1 with every new record that is created.
If we ask for the .schema
again, we'll see that our table has been created!
Column Types
In the CREATE TABLE
command we just ran, each column in our list of columns has a type – i.e. INTEGER
or TEXT
. The list of available types does differ from one database software to another; for example, the list of available types is slightly different between SQLite, MySQL, and PostgreSQL. In SQLite, the main data types we'll be concerned within the context of this course are:
INTEGER
– a whole numberREAL
– a floating point number with decimalsTEXT
– words, not just numbers
Just to make a finer point of it, in MySQL, these same data types are called INT
, VARCHAR
, and FLOAT
. These are not going to be consistent across different database vendors. In a later lesson, we'll see how to properly handle these inconsistencies when building database-backed applications, but for now, we'll only worry about the types in SQLite.
Modifying Tables
Oops! We forgot to add a column in our students
table to hold our students' email addresses. Since we already created our table, what can we do?
The first option is to delete the table entirely, and then create it again, adding the column we want this time around:
DROP TABLE students;
CREATE TABLE students (
id INTEGER PRIMARY KEY AUTOINCREMENT,
first_name TEXT,
last_name TEXT,
email TEXT
);
If we type .schema
again, we can see that this worked fine. Our students
table looks exactly the way we want it. But, there's a problem.
What if there is data in the table already?
If we DROP TABLE
, the entire table and its data goes away. No "are you sure?", no undo function 😱. Obviously, this a dangerous operation and we should only do it if we are absolutely sure we are ok with losing any existing data in the table. Fortunately, there's a less scary way to simply add a column to our table. Let's add a phone number column:
ALTER TABLE students ADD COLUMN phone_number TEXT;
Much less scary, since it simply alters the structure of the existing table and adds a column. And, we can also decide that we don't want that column anymore and remove it – with the same caveat that comes with DROP TABLE
– we must be absolutely sure we don't want the data in that column anymore, because it will go away forever.
ALTER TABLE students DROP COLUMN phone_number;
Manipulating Data – CRUD
CRUD is a fun acronym that refers to the four main operations that can be performed on an existing database table:
- Create – add rows of data
- Read – basically everything we've done up until now
- Update – edit an existing row of data
- Delete – delete or remove an existing row of data
We've already seen how to read data – we use the SELECT
statement and its many optional clauses to produce whatever output we want from the database. Now, let's examine how we use SQL to perform the other three operations – Create, Update, and Delete.
Creating Data – The INSERT
Statement
Let's fire up our School database again, and create some data in our Students
table:
INSERT INTO students (
first_name,
last_name,
email
)
VALUES (
"Jane",
"Doe",
"jane@example.com"
);
We can then SELECT * FROM students;
to see if it worked:
+----+------------+-----------+------------------+
| id | first_name | last_name | email |
+----+------------+-----------+------------------+
| 1 | Jane | Doe | jane@example.com |
+----+------------+-----------+------------------+
To break our INSERT
statement down:
- The
INTO
clause specifies the table name - A comma-delimited list of columns inside parentheses
- The
VALUES
clause specifies another comma-delimited list of values, which follows the same order as the list of columns
Also, note that the id
column was automatically set to 1 – our auto-incrementing primary key is working!
Updating Data – The UPDATE
Statement
Let's suppose the student we created in the last step changes her email address. Let's update it:
UPDATE students
SET email = "janedoe@gmail.com"
WHERE id = 1;
Again, we can use SELECT * FROM students;
to see the results:
+----+------------+-----------+-------------------+
| id | first_name | last_name | email |
+----+------------+-----------+-------------------+
| 1 | Jane | Doe | janedoe@gmail.com |
+----+------------+-----------+-------------------+
As we can see, there are 3 parts to a typical UPDATE
statement:
- A table name –
students
- A
SET
clause, where we can set the new column values – multiple columns can be specified here, when separated by a comma - A
WHERE
clause, which specifies conditions, i.e. which rows to update
The WHERE
clause is the most critical part of the UPDATE
statement! If we get this wrong, we'll update the wrong row(s) of data. Even worse, if we omit the WHERE
clause entirely, we will update ALL rows of data. We must always ensure that we have a WHERE
clause – one that refers to an id
when updating a single row of data, or one that points to specific conditions if updating multiple rows.
Removing Data – The DELETE
Statement
Finally, the DELETE
statement gives us the ability to permanently remove rows of data from our database tables. This is another command, like UPDATE
, that should always contain a WHERE
clause, unless you really do want to delete every row from the table. Let's remove Jane Doe from our students
table, using the id
of the record in question:
DELETE FROM students
WHERE id = 1;
Doing a SELECT * FROM students;
will now yield no results. For fun, let's do one final INSERT
into the table. It's the same INSERT
we did before:
INSERT INTO students (
first_name,
last_name,
email
)
VALUES (
"Jane",
"Doe",
"jane@example.com"
);
Let's check the new contents using SELECT * FROM students;
+----+------------+-----------+------------------+
| id | first_name | last_name | email |
+----+------------+-----------+------------------+
| 2 | Jane | Doe | jane@example.com |
+----+------------+-----------+------------------+
Even though we've added the exact same data as we did before, our new record now has the id
of 2
– primary keys fields automatically increment by 1 and are never reused!
Conclusion
At this point, we have all the SQL syntax we need to fully manipulate both the structure and the data in our database. In the next lesson, we will begin to look at how to design multiple tables and the relationships in our database, so we can build applications from scratch.