Encryption & Data Security

If you haven't already, first read the previous post about Authentication.

At this point in our app, a user can sign up with their email and password and then log in with the same credentials. Their email may be widely known, but their password should (at least in theory) be a secret. Unfortunately, we're not doing anything to ensure that this secret credential isn't accidentally exposed. There are 2 points of vulnerability:

data in transit
data at rest

Data "in transit" refers to when the data is moving through our app, which usually occurs when it's submitted via a form. Data "at rest" refers to how it's stored in the database. This doesn't just apply to passwords. When dealing with any sensitive data in your application, you want to think about its exposure both in transit and at rest.

Data in transit

Go back to the user signup page (the new form) and submit the form again. If you look at the server log for that request, you'll see the password right there in the params. The server log is just a text file with all the application activity saved (you can find it in your app's files at /log/development.log). This file is easily accessed by anyone with server access, and typically it's also exposed to an external service for debugging purposes. That means a lot of people can see these user passwords being submitted (i.e. in transit). So let's fix it so that the password is hidden (obfuscated) when it hits the server log.

Obfuscating data in the server log is a common requirement, so Rails has a configuration we can set for this purpose.

Open up the /config/initializers/filter_parameter_logging.rb file. You'll see this code:

# Configure sensitive parameters which will be filtered from the log file.
Rails.application.config.filter_parameters += [
  # :passw, # TODO: comment back in to hide params beginning with "passw"
  :secret,
  :token,
  :_key,
  :crypt,
  :salt,
  :certificate,
  :otp,
  :ssn
]

The Rails framework actually comes preconfigured to obfuscate any params whose names match an entry in this array. We commented out :passw for demo purposes so you would see what it looks like in the server log. But now we can uncomment it and restore it.

# Configure sensitive parameters which will be filtered from the log file.
Rails.application.config.filter_parameters += [
  :passw,
  :secret,
  :token,
  :_key,
  :crypt,
  :salt,
  :certificate,
  :otp,
  :ssn
]

If you ever have other data that you want to hide from the server log (e.g. payment info), you can simply add the names of those params to this list.
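To make the matching behavior more concrete, here is a rough sketch in plain Ruby. It is not the Rails implementation. The filter_for_log method, the LOG_FILTERS constant, and the card_number param are all hypothetical names made up for illustration, and the sketch assumes that a param is hidden whenever its name contains one of the entries (which is why the comment in the config file talks about params beginning with "passw").

```ruby
# A simplified stand-in for log filtering, NOT the actual Rails code.
# LOG_FILTERS mirrors the list from the config file above.
LOG_FILTERS = [:passw, :secret, :token, :_key, :crypt, :salt, :certificate, :otp, :ssn]

# Hypothetical helper: returns a copy of the params that is safe to log.
# Assumption: a param is sensitive if its name contains any filter entry.
def filter_for_log(params, filters = LOG_FILTERS)
  params.to_h do |name, value|
    sensitive = filters.any? { |f| name.to_s.downcase.include?(f.to_s.downcase) }
    [name, sensitive ? "[FILTERED]" : value]
  end
end

submitted = {
  "email" => "pat@example.com",
  "password" => "puppies",
  "card_number" => "4111111111111111" # hypothetical payment param
}

default_view = filter_for_log(submitted)
custom_view  = filter_for_log(submitted, LOG_FILTERS + [:card_number])

puts default_view["password"]    # hidden by the :passw entry
puts default_view["card_number"] # still visible, nothing in the list matches
puts custom_view["card_number"]  # hidden once :card_number is added
```

The original hash is left untouched, which lines up with the idea that filtering only changes what gets written to the log, not what the app receives.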

Configuration files are often loaded once when an application starts, so before this change will take effect, you need to stop and restart the Rails server. Once you've done so, go back to the new user form in the browser and resubmit it. Check the server log and you'll see that the password is no longer visible. Instead, you should see "password"=>"[FILTERED]". The password is still being transmitted to our backend, but it's being hidden from view in the logs. If anyone gets their hands on the log file, they won't see any user passwords. Note: this isn't retroactive, so any passwords that were previously shown in the log are still visible.

The "[FILTERED]" value you see in the server log is not what's being stored in the database. If you run a SELECT statement from the sqlite console, you'll see the password's real value - so we'll have to fix that data at rest as well. But at least for now we've secured the server log.

We're choosing not to also obfuscate the email address. Since it's personally identifiable information, it too may need to be protected. On the other hand, when debugging, it can be helpful to search the logs for a user by their email address. So this choice is a tradeoff and you'll have to decide what's appropriate for your application. If you choose not to obfuscate it, just know that you may need to provide your users with the option to request that their email be purged entirely from the logs.

There's one other point in transit when the password data is exposed. When submitting the new user and new session forms, the password is visible in the form input itself. Anyone glancing over your shoulder as you submit the form will see your password. That's also bad. To fix that, open up each of those forms (/app/views/users/new.html.erb and /app/views/sessions/new.html.erb) and find the "password" input in the form code. Change the type HTML attribute from "text" to "password":

<input type="password" name="password">

This will produce a different HTML form input element with type="password" instead of type="text". When you reload the form in the browser and enter a password, you'll see that each character is masked (usually with dots or asterisks, depending on the browser). It's perhaps more prone to typos now since the user can't see what they're typing, but it's also more secure. Tradeoffs...always tradeoffs.

With these changes, our data in transit is now reasonably secure, so let's move on to data at rest.

Data at rest

We've hidden password data from view in the server logs and browser, but the passwords are still being stored in our users table in plain text. Let's take a quick look. Stop the Rails server and start the sqlite console by running sqlite3 db/development.sqlite3. We haven't done this in a while, but it's worth querying the users table to see the data:

SELECT * FROM users;

You should see a row for each submission of the new user form. And there, fully visible, are the users' passwords.

It might go without saying, but to be explicitly clear... Storing passwords as plain text in the database is BAD. It's almost inevitable that, at some point, your application will go through a security breach of some kind. You might get hacked or leave a machine unattended or rely on another library of code that has a security vulnerability. Try as hard as you can to avoid it, but there's no such thing as 100% secure software. So how you prepare for and respond to a security breach is critically important.

In the case of a security breach that exposes your database, you want to be able to state that "yes our database was compromised, but no sensitive information was accessible". The only way to be able to do so is to encrypt sensitive information (like passwords) so that even if it is accessed, the original values can't be read. In fact, the data should be encrypted in such a way that even you can't read it.

Data encryption comes in 2 common forms: 1-way and 2-way encryption. In both 1-way and 2-way encryption, the process begins with using an algorithm to manipulate the data into a new value. For example, a trivial encryption technique would be to replace all letters in the value with their subsequent letter in the alphabet. So "apple" becomes "bqqmf". This would be a terrible encryption technique, but gives you an idea of what encrypting data entails. In practice, we use well-known, industry-approved encryption algorithms that are impractical to crack.
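The letter-shifting example can be tried in a couple of lines of Ruby. This is only a toy to show what "manipulating the data into a new value" means. The method names shift_forward and shift_back are made up for this sketch, and it only handles lowercase letters.

```ruby
# Toy "encryption": replace each lowercase letter with the next one
# in the alphabet (z wraps around to a). Do not use this for real data.
def shift_forward(text)
  text.tr("a-z", "b-za")
end

# Undoing it is just as easy, which is exactly why it's a terrible technique.
def shift_back(text)
  text.tr("b-za", "a-z")
end

puts shift_forward("apple") # prints "bqqmf"
puts shift_back("bqqmf")    # prints "apple"
```

Because anyone who guesses the rule can run shift_back, the encrypted value protects nothing. Real algorithms are designed so that knowing the rule isn't enough.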

With 2-way encryption, the goal is to secure some data so that only an intended recipient will be able to decrypt it and read the original data. Any other observers in between will only see the encrypted data which is indecipherable.

With 1-way encryption (more formally known as hashing), the goal is to secure some data so that it can never be decrypted and read. Although it can never be read, it can be validated - you can prove you know the original data by encrypting it again and seeing that the encrypted versions match.
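Here's a small sketch of that "encrypt it again and compare" idea. It uses SHA-256 from Ruby's standard library purely as a stand-in 1-way function so the example runs without installing anything. That choice is an assumption made for illustration and it is not a good way to store passwords on its own. The knows_original? method name is also made up.

```ruby
require "digest"

# The only thing we keep is the 1-way output. There is no method we can
# call on it to get "puppies" back.
stored = Digest::SHA256.hexdigest("puppies")

# Hypothetical helper: run the attempt through the same 1-way function
# and check whether the results match.
def knows_original?(stored, attempt)
  Digest::SHA256.hexdigest(attempt) == stored
end

puts stored.length                      # prints 64 (hex characters)
puts knows_original?(stored, "puppies") # prints true
puts knows_original?(stored, "kittens") # prints false
```

Notice that the check never reads the original value out of stored. It only answers yes or no to a guess.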

Take a moment to think about these 2 approaches. Which do you think makes the most sense for securing passwords in the database?

Our goal is to encrypt the password so that only a person who knows the password can prove that they know it. We don't ever need to read the password from the database - in fact, we should never be able to read it. If the database is ever hacked or exposed, we want to be able to honestly say that "yes our database was compromised, but no sensitive information was accessible". We want 1-way encryption.

In our application, we're going to use a 1-way encryption algorithm called bcrypt. It may seem counterintuitive to use a well-known, open-source tool for security, but that's actually exactly what we want. The algorithm has been publicly scrutinized for many years and no practical way to reverse it is known - the fact that it's public doesn't change that. It's also deliberately slow to compute, which makes guessing passwords one at a time very expensive. There are several alternatives to bcrypt (just google "bcrypt alternatives"), but regardless of the algorithm you choose, unless you're a brilliant mathematician, you don't want to be writing your own.

To get started, open up the Gemfile - we haven't looked at this file yet, but this is the manifest of any libraries that our application is dependent on. These libraries get installed when you open up your repository in Gitpod or when you run the command bundle install. Notice that we've already added the bcrypt library for you. If you're ever starting your own application from scratch, you'll want to add this library here.

We want to encrypt the password when inserting it into the database table, so we need to go to the create action in the users controller (/app/controllers/users_controller.rb).

The line of code assigning the user password is:

@user["password"] = params["password"]

This is taking the user-submitted password from the form params and inserting it directly in its original form into the database. Instead, we want to first encrypt it and then store the encrypted value, so replace that line of code with this:

@user["password"] = BCrypt::Password.create(params["password"])

This is using the BCrypt library to encrypt the value. Submit the new user form again and you'll be redirected to the user show page where the user's password column is being displayed. Notice that this time, you don't see the original password but rather a long unintelligible string of characters. This is the encrypted value of the password that you submitted via the form. If you query the table in SQL, you'll see the same thing. The password is no longer exposed, and as we used 1-way encryption, it can never be decrypted - even developers with access have no way of reading user passwords. With that 1 line of code, the password data at rest is now secure.

There's one additional step - when authenticating, checking the user-submitted password on login against the password in the database will no longer work. Let's look at the code in the create action of the sessions controller (/app/controllers/sessions_controller.rb).

if @user["password"] == params["password"]

Try logging in with the newest user (the one you just created with the encrypted password) and you'll find that your login fails. This is because the password in the user row is no longer the same as the password from the form params. Remember, with 1-way encryption we should be able to prove a user knows their password by encrypting again and comparing. Change that code to the following:

if BCrypt::Password.new(@user["password"]) == params["password"]

It looks like a simple == comparison operation. But it's actually not. The BCrypt library is overriding that functionality and doing some fancy stuff behind the scenes. It's actually encrypting the params["password"] value on the right and then comparing it with the value on the left, the encrypted password from @user["password"].

Side note: It's actually even more complicated than that. Take the password "puppies" (a terrible password, but your own personal password behavior is a topic for another day). If you were to encrypt it a hundred times using BCrypt::Password.create("puppies"), you would get a different encrypted value every time. So how can I prove that I know "puppies" is my password if it's always encrypted differently? It's a mind-bender, but the simplified explanation is that when you compare "puppies" with an encrypted version of "puppies" using the code above, there's a random value (known as a salt) stored within the encrypted version that lets bcrypt repeat the same encryption and compare the 2 properly. If this interests you, there are many more detailed explanations you can find online. For our purposes, it's enough to know that the password is safely encrypted in the database and there is no decrypt operation a hacker can run to recover it. It can only be validated when compared to the original password.
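If you'd like to see how a value can be encrypted differently every time and still be checkable, here's a rough sketch in plain Ruby. It is not how bcrypt works internally. It assumes a much simpler scheme (a random value, often called a salt, mixed in with SHA-256 and stored in front of the result) just to show the idea. The toy_hash and toy_match? names and the "$" separator are made up for this example.

```ruby
require "digest"
require "securerandom"

# Hypothetical helper: mix a fresh random value (the salt) into the
# password before running the 1-way function, then store the salt right
# next to the result so it can be reused later.
def toy_hash(password, salt = SecureRandom.hex(8))
  [salt, Digest::SHA256.hexdigest(salt + password)].join("$")
end

# Hypothetical helper: pull the salt back out of the stored value,
# repeat the same steps with the guess, and compare the two strings.
def toy_match?(stored, guess)
  salt, _digest = stored.split("$", 2)
  toy_hash(guess, salt) == stored
end

first  = toy_hash("puppies")
second = toy_hash("puppies")

puts first == second               # prints false, the random salts differ
puts toy_match?(first, "puppies")  # prints true
puts toy_match?(second, "puppies") # prints true
puts toy_match?(first, "kittens")  # prints false
```

The stored value still can't be turned back into "puppies". The salt only makes it possible to repeat the exact same steps on a guess.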

At this point, you have a few user rows with unencrypted passwords, and trying to log in as those users will cause issues because this new code assumes they are encrypted. The easiest solution is to reset the data and create new users with encrypted passwords. You can do this by stopping your server and running rails db:setup. That will clear out this bad user data. Then you'll want to add the starting company and contact data again (which was also cleared) by running rails runner scripts/create_data.rb. You should see the output "There are now 3 companies, 4 contacts, and 2 activities." Lastly, restart your server with rails server.

Our basic authentication scheme is now complete - a user can sign up, we're encrypting the user's secret password, and we're then checking the user's credentials on login and responding appropriately. But is the user logged-in? Not yet. Remember that HTTP is stateless - each request/response lifecycle has no knowledge of previous requests. So even if the credentials are accurate, the browser forgets about it as soon as the next request begins (i.e. when the user is redirected).

How will we get the browser to know/remember that a user is logged-in from request to request? To solve this dilemma, we need to revisit the notion of browser state and learn about another browser tool: cookies (next post).