Database Basics or The School of Hard Knocks

This blog post is directed to students who are in college taking a database class for computer science or those who are just starting out in database design.  I’m going to give a little background story of my own early experience in database systems and I’m going to talk about how you can get beyond the eye-glazing boredom of learning databases and get into the hands-on world of actually learning about databases.

Some History

I took a database class at the University of Michigan.  I needed about 33 credits of 300 and above classes to complete my BS in Computer Science, so I filled those credits primarily with computer science classes.  As many as I could take.  Most of the specialized classes I took, like modeling and simulation, I blew through without studying.  Technically, I was hungry for information on simulation techniques and it was a fun class for me to take.  When I arrived at database class, our instructor was working from class notes that he developed from the main-frame era (which really wasn’t that long ago from when I took that class, but it was still a bit outdated).  He spent a lot of time on the theory of database design, foreign keys, candidate keys, First Normal Form, etc.  I understood what he was talking about as he was talking in class, but I really had to make up some flash cards for each concept and practice memorizing them to pass the test.  I assumed that databases were just not my thing.  Boy was I wrong!

So MS Access, version 1.0, arrived right around the time I graduated from college and I had a friend that wanted a database built for his business.  It was relatively easy to build, until I discovered that some things didn’t quite work right.  What compounded my problems was the bugginess of version 1.  Still, I assumed that databases were just not my thing.  When I arrived at Pirelli, I was tasked with building a real-time data collection system.  My database was again MS Access.  This was due to restrictions put on me by Pirelli’s IT department, not my choice.  By then, MS Access was up to version 3 and it was a more mature product.  Still, I made the decision up front to avoid using most of Access and just use the tables for storage and use VB version 5 to read and write to the tables.  This database was rather simple since it only collected basic information from the tire building machines in the factory.  The difficult part of that project was keeping up with real-time data.  If I had my choice today, with my current level of experience, I would insist on Oracle or MS SQL server and I might have even suggested experimenting with MapReduce (though, I don’t think they had that much data to store).

So I was recruited into Building Technology Associates by a former professor that just happened to be working as the IT manager at that company.  BTA had a problem.  OK, they had a few problems.  First, they consisted of 3 companies in the same building and two of the three companies had their own database development team that operating independent of each other.  Their second and most critical problem was that nobody in either team had any formal database knowledge when they built either database and both database were built on the premise that a database is just a collection of spreadsheets in one location.  Arrrgghh!! 

This is where I really learned how to properly design a database.  This is where reality met up with database theory.  Let me describe what I had to start with:

Database 1:
– Built with MS SQL version 6.5 on a PC running Windows NT 3, unpatched.
– Many foreign key fields were nullable and contained text.
– One foreign key field contained letters, each letter represented a different record from a lookup table (called subsystems) and the programmers used substring in their program to figure out which subsystem applied to each record.
– No relational integrity constraints.

Database 2:
– Built with MS FoxPro 5.0 hosted on the file server.
– Some tables contained old foreign keys that consisted of a unique number, but nullable fields.
– Some tables contained a new foreign key that was based on the millisecond clock and contained a string of characters that could contain an ASCII value between 0 to 255.  Also, nullable.
– No relational integrity constraints.


What I described above was just the beginning of my problems, but sometimes when things are really bad you have to Triage.  So the most important problem to fix first was the foreign keys and the relational integrity.  At first, I wanted to refactor everything (this was before the word “refactor” became as common as it is today).  Unfortunately, things were so bad that we had people dedicated to fixing data issues that was caused by the above database integrity problems coupled with bugs in the software. 

So I set into place a plan to build a new database in Oracle.  The first task I needed to do was determine how the primary keys would be produced.  I needed to make sure that all tables had a primary key and they had to be unique.  The problem was that we had a lot of external software that generated records that would be imported and renumbering keys was a nightmare.  So I invented a scheme involving a 12 character key that consisted of a 4 character unique prefix, issued by the master system and an 8 character locally generated key.  Each piece of software, when first installed would be missing the registry entry that contained the prefix.  So the software would ask for a prefix disk (floppies back then), and we distributed prefix disks with 10 prefixes on each.  We used 0-9, A-Z and the underscore as legal characters in this key scheme (uppercase only).  I had to analyze if we would run out of keys and when would we run out if data was generated continuously 24/7/365.  Since a digit consists of 37 possible characters and there are 8 digits, that gave us 37^8 number of combinations for each user (that’s 3.5 x 10^12 keys total).  If you generate 100 keys every second of every minute of every hour of every day each year, you’ll use up 3.7×10^10 keys.  Which comes out to about 92 years before burning up all the keys and requiring to get a new prefix.  So it’s safe to say that we’re not going to run out of keys soon.  Anyway, I check every few years to see who’s account has burned up the most keys (all the prefixes and keys are stored in the database for web users) and keys are not being used up anywhere near that speed.  My fallback plan is to expand the key size to 32 digits, but that’s only if data increases in speed.

Database 2 had a clever scheme by generating keys using the millisecond timer and just converting to an 8 character key.  Unfortunately, by the time I came on the scene, computers were becoming fast enough to generate keys faster than 1 millisecond.  So importing data that required new keys was causing duplicate keys to be generated.  Also, it was impossible to type a particular key into a query because many of the characters were unprintable. 

My second strict rule was NO NULLABLE FOREIGN KEYS!  Wow.  That caused a couple of ugly data problems to occur.  The person who designed database 2 was still working for the company when I was investigating these databases and I asked him why he didn’t use the integrity constraint capabilities available to FoxPro.  He claimed (I’m not kidding) that he tried it once but it just broke his software, so he deleted all the constraints that he set up.  AHHHHH!!!!!  So my third strict rule was to put the constraints in place and fix the software.  Much of the software had to be re-written to make it work with the new database, but we managed to make some of it work with the new database.  Then we created a converter to convert the old databases into the new Oracle database.  Oh, that was fun.  My determination paid dividends, because after we rolled out the new database, over half of our software problems vanished and our database integrity problems went to near zero.

So How Do you Learn the Lessons Before Meeting a Disaster?

If you are in database class, download MS SQL server express (it’s free) from here.  Install it, and play around with it.  Create some tables by hand.  Setup foreign keys.  Follow your book.  If you’re a professor teaching database class, create some sample problems containing violations of various normalization rules.

When you’re satisfied with your ability to perform basic database tasks with MS SQL, I would recommend putting in the extra effort to learn Oracle.  Once you learn one database, any other database is just a little different from the one you learned.  It’s best to see it up close a personal.  Click here to download the Oracle express version (free for developers).  There is a windows version that you can install that gets you up and running without installing a server.  If you’re familiar with Linux and you happen to have a Linux machine setup already, you can experiment with the Linux version of Oracle and learn all the nuances of connecting to an Oracle server.

If you install these databases on your machine and practice with them as your professor teaches the class, you’ll learn database theory much easier than just memorizing terms out of the book.  Trust me, I know all about foreign keys, Functional Dependencies, Candidate keys and various other terms, now that I know what they really are.

You can also practice with MS Access if you bought the professional version of Office that comes with MS Access.  Or you can purchase MS Access, but I would recommend sticking with one of the most commonly used database servers that I mentioned above.  There are also other servers like NoSQL and MySQL, but they are a bit looser with their rules.  Keep to a database system that has strict rules and learn it.  Then it’s a piece of cake to use one that doesn’t have strict rules.

That’s all for now, I’ll make a note to do a follow-up post with some example MS SQL and Oracle practice databases to help you learn more about database design.

 

Leave a Reply