SQL Design Disorders – Null Fields

I’m going to continue my series about the use of SQL Server databases in software projects. In this blog post I’m going to discuss null fields and default fields.

Null Fields

Anyone who has worked with a database for more than five minutes knows what a null field is. If you’re just learning database design in school (or on your own), these fields can contain a null value as well as data in the type that field was set to. Null fields are very convenient when building a new database. You can just leave all the fields null (except for primary key fields) and that makes it easy to insert records. The “Allow Nulls” check box in SQL Server is defaulted to the checked state when using the table designer. If you’re not paying attention, then your field will be nullable by default.

Null fields are perfectly OK, if you intend them to be null and you properly handle them. If you’re lazy about them, you’ll get burned. The lazy method of handling null fields seems to be the default in the software development world. I know, because I have fixed so many poorly designed databases, that it is refreshing to run into a database that is properly designed. Unintended null fields are something that gets under my skin and the reason why is that it is so easy to fix the problem when the database is first designed. Once the database and software is built, null fields become a huge problem and are difficult to fix.

Testing Software to Respect Null Fields

So you threw together your database design in an afternoon, then you spent six months building the software and now it’s crashing regularly. You keep running into problems where a field contains null data and the software was never written to handle it. You throw in some code to detect a null field and you pass on default data. SQL Server will do this for you automatically! Let’s say that you have a table that has rows that contain records for a checking account. There is a withdrawal and a deposit field which is set as a money type. You left them null. Each time you try to total the fields up, you have to handle null and convert it into a zero. This is where you would design the database so that the field is set to zero as a default value and make the fields not nullable.

Here’s an example table you can add to SQL Server for testing purposes:

CREATE TABLE [dbo].[checking](
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[TransactionDate] [datetime] NOT NULL,
	[Withdrawal] [money] NOT NULL,
	[Deposit] [money] NOT NULL,
 CONSTRAINT [PK_checking] PRIMARY KEY CLUSTERED 
(
	[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

In the instance above, I don’t use any default fields, but I set all fields to not null. To insert data, you would have to provide all fields in your insert query, like this:

INSERT INTO checking (TransactionDate,Withdrawal,Deposit) 
VALUES
(GETDATE(), 0, 500)

Now, let’s add some defaults for the date and two money fields:

ALTER TABLE dbo.checking
  ADD CONSTRAINT checking_transactionDate
  DEFAULT GETDATE() FOR TransactionDate;
GO

ALTER TABLE dbo.checking
  ADD CONSTRAINT checking_withdrawal
  DEFAULT 0 FOR Withdrawal;
GO

ALTER TABLE dbo.checking
  ADD CONSTRAINT checking_deposit
  DEFAULT 0 FOR Deposit;
GO

Now you can create a short-hand insert query like this:

INSERT INTO checking (Withdrawal) VALUES (50)

That will cause a withdrawal transaction for today’s date to be inserted into the checking table. You should end up with records like this:

Let’s pretend that you don’t know if your software is going to blow up when a field is null. How do you test it? Look at the definition of your table that you want to test. Create an insert query that contains only the fields that are not nullable. Then insert a record with all nullable fields set to null. Then run your software and see what happens. You can search through the code and find every instance where that table is referenced, then test the web pages or screens that use that code. If the page blows up, then fix it. Keep fixing your problems until you account for all null fields.

Now you might be thinking that you have handled it in code. That your inserts account for all fields so there is no way they will get null values in them. Then one day, someone manually manipulates the database, or someone writes a new query and leaves fields null because they have no purpose for the field in their code. Sure, it works in the new code or it worked when the person inserted the record in the SSMS console. Then a dozen bugs show up and your customers are complaining because reports are crashing. Why? Because they were never designed to work with null fields. If a field is nullable, you can bet that some day, it will contain null data. Make it not nullable or make sure your software can account for the null data. In the case of my contrived example that you always insert all fields, then it doesn’t cost you anything to make all of those fields not null to protect yourself. You can make the fields not null without changing any software.

Bit fields, that are used for true/false data, are a great candidate for default data. Those fields should only be nullable when you are first adding them to an existing table. Then you should set the default constraint, pre-populate the data and set the bit field to not null. Normally, I set the default to zero, which is false, but it depends on your use of the field. With a default constraint on your field, you can modify the database first, your existing software will not crash and your new version of software will see the field right away. This is a safe way to deploy your software.

The Database is Your Foundation

If you read my last SQL Design Disorders post, I’m going to sound like a broken record, but I can’t say this enough. Plan your database design first.

I like to think of the database as the foundation of my software. If I spend a little more time on the design of the database and get really detailed about how it’s put together, I have found that my job as a software engineer is much simpler. Let’s face it, if I set all fields to not null that should be not null, then my code will blow up when I mess something up while I build it (i.e. I forgot to add that needed field to my insert query). The sooner it crashes, the easier it is to fix. If I use default fields where they make sense, then it makes my job easy when I create test data by hand (because I don’t have to include all fields in my insert query). By short cutting the database design, you’ll spend a lot more than that amount of time chasing bugs and structural problems.

This is where I’m going to mention another developer short cut that gets under my skin. The “fix it with software” fix. I have seen some really dumb things done in software. Somebody leaves a field null, or they don’t apply a primary key, or some other crazy thing (like nullable foreign key fields. Oh yeah, I’ve seen it). Then, the software crashes. Instead of carefully analyzing the situation, someone decides to perform the fix in code. Probably because they are a junior level developer with little to no database knowledge. Before long, I’ll stumble across code that renumbers the primary keys of a table because duplicate numbers show up. It might be code that inserts zeros in place of null fields that should have been zero. This code might be a process that runs when an error occurs or it might be a utility that someone runs once a week or something. This is not the right way to fix that problem. Fix the database first. Get the primary field deduplicated, then set it to a primary key field. Then run through your software and find out which code blows up. Then fix it. If some unknown code is causing fields to be null and you can’t find the code, don’t fix the end result code with a hack. Make the fields not nullable first. Then try and find the offending code or if that is difficult, wait for the next but report that identifies where the null is being inserted. Then fix the code that is causing the bug in the first place.

Real World Example

I worked for a company that had a database built in FoxPro. This was decades ago. I looked at the screen where you can graphically show all the foreign keys in the database. There were no foreign keys. I confronted the developer that had put the database together. He was the only developer when the software was first built and had a very limited knowledge of database design. He told me that he tried to add foreign key constraints but then the software blew up and he didn’t now how to fix it. In his mind, a database was a collection of spreadsheets that were related to each other. He had foreign key fields, but they were nullable. There was a lot of code that was written to get around problems that occurred. In the end the software itself was improperly designed and that was partially due to the poor design of the database. The import screens all were written around the idea that records can be marked as deleted and then they would be undeleted as they were detected in the import data (and modified if necessary). This caused the entire system to blow up if someone else was running a cleanup and compressed the records (which deletes the records marked as deleted). Another issue was that the programmer wrote all import processes to import the child records first, then create the parent record(s). If an import crashed in the middle, there would be a lot of orphaned child records. Sometimes these records did not get properly deleted or cleaned up before the next import and data did not total correctly. The problems were so numerous that this company had a staff of database scrubbers. The company owners thought that database scrubbing was just a normal part of owning a database. After I redesigned and normalized a new database and we further rewrote the software to match the new database design, it was discovered that the job “database scrubber” had no useful purpose.

These problem systems always start out small. Developers are in a hurry, they take short-cuts, then the monster grows. There are a few bugs at first and everything seems OK. Eventually, the product becomes unstable and bug reports are just pouring in. If you’re new to designing database driven software, save yourself the misery of having to fix these types of problems later on. They’re much easier to fix when you first design the database. Scrutinize every nullable field and ask yourself if you really want to write code to handle null data in that field. If you don’t, then make it not nullable up front. Then ask yourself what the default would be. If there is an answer to that, then set a default constraint for each of these fields. This will all take a little extra time. You’ll make up for that time when you start building your software and you don’t have to write a ton of code to get around “bad data.”

Leave a Reply