Intro to Domain Modeling

In the last lesson, we learned about SQL – its syntax and how the language is used to CRUD data as well as manipulate the design of database tables. When designing database tables from a blank slate, though, how do we get started? What decisions influence what tables and columns will be created? How do we balance a database design that is simple and easy-to-understand, versus one that is most performant? The approach to answering these questions and more is known as domain modeling.

It's Your World

The actual, physical world that is all around us is composed of many, many, billions of objects – living or not – each with their own behavior, attributes, and idiosyncrasies. It's the job of scientists and domain experts to study these objects and the ways in which they interact with each other, and to develop theories and laws to describe their observations, so that we can better understand the way the world works. It's all very complicated.

The world of the software we create is not nearly this complex. Everything that exists in this world is entirely up to us. We define what objects live in this world, their attributes and behaviors, and their relationships with one another. The result is a domain model of our own creation, and it can be as simple or as complex we need it to be.

From the Wikipedia article on domain modeling:

A domain model is a system of abstractions that describes selected aspects of a sphere of knowledge, influence or activity (a domain). The model can then be used to solve problems related to that domain. The domain model is a representation of meaningful real-world concepts pertinent to the domain that need to be modeled in software. The concepts include the data involved in the business and rules the business uses in relation to that data. A domain model leverages natural language of the domain.

A domain model generally uses the vocabulary of the domain, thus allowing a representation of the model to be communicated to non-technical stakeholders.

Oof. That's a lot of words. Let's TL;DR this thing. A domain model:

Is a real-world concept (aka domain) represented as software
Contains only the data and rules involved in the domain
Is described using the actual words (i.e. vocabulary) used by those working in the domain

It's important to note that a domain model doesn't need to include everything in the domain – only the pertinent bits we need right now in order to build our system. We can always add more later – it's just a CREATE TABLE statement, right? But how do we determine what is needed right now? Let's take a look at this through the lens of a startup.

Minimum Viable Product (MVP)

Popularized in the early 2000s, most notably by Eric Ries in his book The Lean Startup, the Minimum Viable Product (MVP) approach to building software products has become the de-facto standard in the startup community. There is plenty of in-depth reading and research we can do on the topic, both in-print and online; we're going to give you the highlight reel here.

In short, building an MVP means that, when building a new software product, we build only the features necessary (minimum) in order to get feedback from real users (viable). Build any more than those core features, and you're just wasting time and money building something that customers/users probably don't want anyway.

How do we begin to enumerate what those core features are, then? To get things started, it can be useful to create user stories.

User Stories

Our product's user stories is a written document that, in a regimented way, describes the core features in a user-centric style. How does a user of your system deal with their problem, perform their task, or otherwise receive value by interacting with it?

Let's begin by looking at the typical template for writing a user story:

As a [some user role], I want to [some goal], so I can [some value]

Pretty vague, right? Don't worry, it will make more sense once we look at a practical example.

Let's continue with the idea of a "school" system that we started building the database for in the last lesson. We'll begin with a top-level, overall vision for the system. Perhaps this is something that's written down, articulated in some way, or just something we're thinking about:

This will be a system used to manage our school. A student in our school can use the system to browse the available courses and enroll in them, based on the dates/times that best fit their schedule. Students are added to the system by teachers. So are the courses they teach.

From there, we can begin the process of transforming that general vision into the user story format. Let's look at a single story, one that we could argue is truly the core purpose of the system:

As a student, I want to browse available courses, seeing the course description and bio of the instructor, so I can decide if I want to enroll in it.

By going through this process, we've immediately transformed the somewhat loose, general vision into a concise story that clearly articulates what that part of the system is supposed to do from a user perspective, and the value it creates for the user. We can also see that we're using the language that's specific to this domain, that is, we're describing things in the vocabulary that would be typically used by the people using this system.

If you like, you can stop reading here, and see if you can come up with the rest of the stories.

Here's the list that we came up with:

As a student, I want to browse available courses, seeing the course description and bio of the instructor, so I can decide if I want to enroll in it.
As a student, I want to see the available sections (dates and times offered) and enroll in the section of my choosing, so that I can enroll in the course I want at the time that best works for my schedule.
As a teacher, I want to be able to maintain (add, edit, and remove) a list of students at our school, so that we know who our students are, their contact information (email and phone number), and provide access to enroll in classes.
As a teacher, I want to be able to maintain (add, edit, and remove) a list of teachers (including myself) at our school, so that we can manage teachers' bios and a list of classes they teach.
As a teacher, I want to be able to maintain (add, edit, and remove) a list of courses and the sections of those courses, so that we can publish it for students to browse and enroll.

In the end, there may be more stories, or a need to make certain stories more specific, but this is certainly a good list to start. Again, all we've done is taken a rather general vision for our overall product and its purpose, and used the user story template to help us think about what the pieces are, and how to articulate them in a way that illuminates the value of each part of the system.

After getting these stories down on paper, it's time to evaluate. If we have too many stories, this process may help us decide how to shave the list down to only the parts of the system needed to create value and receive feedback. If we have too few stories, this helps us learn what we'd need to add to make this a valuable system for our potential users.

It's important to note that user stories are not meant to be some sort of formal document that gets etched in stone; rather, it's more of an iterative process that help us to continually think about what the system is really supposed to do and the value it creates.

Wireframing

Wireframes are another great tool for iterating on a system's design. Once we have our user stories on paper, we can then create a visual representation of the user's experience, i.e. what will the system look like to the user?

We can use practically any tools we're comfortable with to draw our wireframes. These tools might include:

Pencil and paper
A presentation tool like Powerpoint or Keynote
Graphics/design apps like Figma, Sketch, Adobe XD or Photoshop
Wireframing tools and services like Balsamiq Mockups, Justinmind, or wireframe.cc
If you already know it, HTML and CSS

The tool doesn't matter. The goal is to transform our user stories into a visual of what the finished MVP might look like, so we can discover more detail than we knew before, or to iterate on what the needed functionality might be. It's very common to return to the user stories and make changes after wireframing.

Here's an example of a wireframe for our school enrollment app, created using Balsamiq. For the purposes of domain modeling, there are no specifications or formal process for wireframing; it's really all about helping to visualize what we've already articulated using our user stories.

Models

User stories and wireframes is a way to think about and discover all the entities in your world, as well as the relationships between them. We'll now shift that thinking to building our database. From now on, we'll call each real-world entity a model. And all of our models, together with all of their relationships, is called our domain model.

What are the real, tangible models we're dealing with in our school system? That is, things that really exist in the real world, outside the realm of software? From our user stories and wireframes, a couple of obvious ones might be:

Student
Teacher

Students and teachers do not exist in theory. They exist in the real, physical world. Students are the people who attend our school. Teachers are people who teach the courses.

But you may be wondering, why we don't we have a single model for People? After all, a student and a teacher are both people. This is a valid question and one that is quite nuanced in nature. It's because models are distinguished by their unique attributes. According to our user stories/wireframes, teachers have bios. Students do not. Students have contact information (email/phone number). Teachers do not. We might decide later that students need a bio, too – at which point, we'd go back and iterate on our user stories. But not right now.

Let's try to identify the attributes – the data needed for – these two models.

Student – name (first/last), email, and phone number
Teacher – name (first/last), bio

Database-Backed Models

If you've put 2+2 together, based on our last lesson on SQL, a model and its attributes can be neatly represented in a database table.

Each column represents a particular model attribute
Each row represents a unique model instance

Each model will need a different table, because the column definitions will be different for every model. The start of our domain model, along with some sample data, might look like this:

students

id	first_name	last_name	email	phone_number
1	Jane	Doe	jane@example.com	555-1212
2	Jenny	Smith	jenny@gmail.com	867-5309
3	John	Johnson	john@acme.com	456-7890

teachers

id	first_name	last_name	bio
1	Ben	Block	Often talks to a rubber ducky.
2	Brian	Eng	Loves tacos.

A decent start. Let's add the model for a Course. A course, while not necessarily a real, physical thing like a student or a teacher, is an entity that really exists that we need to retain data for – that is, it has attributes. We can also create instances of it, i.e. each row in the database table represents the data for each available course. Let's try and determine the attributes of a course. What data would we need to keep track of for every course, according to our user stories and wireframes?

A course name
A course description

And that's all for now:

courses

id	name	description
1	Introduction to Software Development	The course is focused on software development...
2	Taco-Making 101	In this course, you’ll learn how to build a proper taco...

Relationships

At this point, you might be asking yourself... wait, is that all the data needed for a course? What about the sections – the dates and times the course is available? Or the teachers who teach the course?

Knowing the difference between what belongs as an attribute on a model and what should be separated into its own model... is one of the keys to being good at domain modeling.

Is it possible to add the dates/times and teacher of each course to the Course model? Sure, let's see what that would look like:

courses

id	name	description	time_1	teacher_id_1	time_2	teacher_id_2
1	Introduction to Software Development	The course is focused on software development...	Tuesday 8:30-11:30am	2	Wednesday 6-9pm	1
2	Taco-Making 101	In this course, you’ll learn how to build a proper taco...	Wednesday 6-9pm	2	Thursday 6-9pm	1

Seems perfectly reasonable. But what happens when the demand for Taco-Making becomes so large that we add a third time? Or a fourth? Or a 10th? While we could, of course, continue to grow the courses table horizontally, it turns out that it's generally a much better idea to grow a database vertically. Exponential horizontal growth of a database table is an indicator that we should be doing something different.

What's we seeing here is a one-to-many relationship – one course can have many times it is available, each with a different possible teacher – i.e. Sections.

Remember a few paragraphs ago, when we defined what a domain model is? To refresh your memory:

... we'll call each real-world entity a model. And all of our models, together with all of their relationships, is called our domain model.

The Section model, in this particular case, is not so much a tangible, real-world thing. Rather, it's a model that's made necessary due to the relationships between our real-world entities.

It makes sense for Section to be its own model because:

We don't know how many sections of a course there could be. It might be none, only one, or a million.
A section has its own attributes, like the time it's offered, and the teacher who teaches it. This list of attributes could potentially change in the future, making it very difficult to manage if this data lived within the Courses model.

courses

id	name	description
1	Introduction to Software Development	The course is focused on software development...
2	Taco-Making 101	In this course, you’ll learn how to build a proper taco...

sections

id	time	course_id	teacher_id
1	Tuesday 8:30-11:30am	1	2
2	Wednesday 6-9pm	1	1
3	Wednesday 6-9pm	2	2
4	Thursday 6-9pm	2	1

Aaaahhh! So much better. Now the Course model only holds information about the course itself, delegating the data about each section to the Section model. We can now support as many sections as needed without resorting to changes in our database design.

We can also see that, in order to create a one-to-many relationship in a domain model, we simply need to add a foreign key column on the "many" side of the equation. In this example, because one course has many sections, we add a course_id to sections and we're there.

And yes, there are two foreign keys on sections, because there are actually two one-to-many relationships where Section is involved.

A course can have one to many sections. A teacher can have one to many sections.

Here we see the last piece of the puzzle:

Two one-to-many relationships = One many-to-many relationship

A course can have many teachers. A teacher can teach many courses. The relationship between our Course and Teacher models is a many-to-many relationship, and the Section model is known as their join model.

Naming Conventions

We've seen enough of domain modeling that it now makes sense to talk about the conventions we've been using. Although there is not a technical reason to adhere to these conventions, it is certainly a set of best practices that we will continue to follow for the remainder of this course.

When talking about models like Course and Section, we write them in the singular form, and beginning with a capital letter – it will make sense why we do this later in the course.
Database table names like courses and sections are always plural, and in all lower-case.
Column names are always in lower-case, e.g. name or description.
If a table or column name is made up of more than one word, we separate the words by underscores, e.g. first_name or teacher_id.

Lab

Begin with the template project on GitHub, create a new repo from the template, and open this repo in Gitpod.

Modify the school.sql script to make the courses, teachers, and sections tables real (the statement that creates the students table is already there). Execute the script by running .read school.sql at the SQLite prompt, and execute the .schema command to ensure that everything went as expected.
Design the model that allows students to sign up for classes – let's call it Enrollment – do it on paper, in Excel, or any other tool of your choice.
Add the enrollments table you designed above by writing a CREATE TABLE statement in the school.sql script, and execute the script. (For reference, you can find the SQL for adding tables here.)