Stored Procedures Vs. No Stored Procedures

There is a debate raging among developers: “Is it better to use stored procedures or not?”  At first glance this seems like a simple question, but there are some complicated implications behind it.  Here are the basic pros and cons of using stored procedures in your system:

Pros

  • You can isolate table changes from your front-end.
  • Updating business code is easy: just update the stored procedure while your system is running (assuming you already tested the changes off-line and everything is good).
  • Many queries can be reduced to one call to the database, making your system faster.

Cons

  • Your business processing is performed by your database server, which is expensive (license fees are much higher than for web servers).
  • Unit testing is all but impossible.
  • Version control is difficult.
  • You are married to your database server technology (MS SQL cannot be easily ported to Oracle).
  • Stored Procedure languages are not well structured.  This leads to spaghetti code.

Now, let’s expand a bit and discuss each of these issues:

Isolate Table Changes

It may seem like overkill to set up all of your select, insert, update and delete queries to call a stored procedure and have the stored procedure touch the tables, but it can be a boon in situations where you might change your table structure in the future.  An example would be a table that becomes too wide and needs to be broken into two tables.  Your stored procedures can keep the same interface to your program while the code inside handles the details of where the data is stored and read from.  Removing a field from a table can also be done safely: remove it inside the stored procedure but leave the parameter in place until you can root out all the calls to the database and change them (or just leave the dead parameter).
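
Here’s a rough sketch of what I mean (the table and procedure names below are made up for illustration): the front-end keeps calling the same procedure with the same parameters, while the body of the procedure now writes to two tables instead of one.

    -- Hypothetical example: the front-end still calls dbo.UpdateCustomer
    -- with the same parameters it always has, even though the underlying
    -- table has been split into Customer and CustomerAddress.
    CREATE PROCEDURE dbo.UpdateCustomer
        @CustomerId INT,
        @Name       VARCHAR(100),
        @Street     VARCHAR(100),
        @City       VARCHAR(50)
    AS
    BEGIN
        SET NOCOUNT ON;

        UPDATE dbo.Customer
        SET Name = @Name
        WHERE CustomerId = @CustomerId;

        UPDATE dbo.CustomerAddress
        SET Street = @Street,
            City   = @City
        WHERE CustomerId = @CustomerId;
    END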

Updating Business Code

Business code is the logic that computes and performs work on your database.  Your stored procedure might update one table, add an entry to a log table and remove corresponding records from another table.  Another use for a stored procedure is to pass filter and ordering information into a query that returns a list of data.  A dynamic query can be formed in the stored procedure and executed with all the parameters passed in.  This relieves your front-end of the task of sorting and filtering data, and it can also reduce the amount of raw data returned from the database.
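
A minimal sketch of that kind of dynamic query (table and column names are hypothetical): the WHERE and ORDER BY clauses are assembled from the parameters and executed with sp_executesql.

    -- Hypothetical filter/sort procedure; the front-end passes in optional
    -- filter values and a sort column, and the dynamic SQL is built here.
    CREATE PROCEDURE dbo.GetOrders
        @CustomerId INT         = NULL,
        @Status     VARCHAR(20) = NULL,
        @SortColumn VARCHAR(20) = 'OrderDate'
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @sql NVARCHAR(MAX) =
            N'SELECT OrderId, CustomerId, Status, OrderDate
              FROM dbo.[Order]
              WHERE 1 = 1';

        IF @CustomerId IS NOT NULL
            SET @sql += N' AND CustomerId = @CustomerId';
        IF @Status IS NOT NULL
            SET @sql += N' AND Status = @Status';

        -- Whitelist the sort column so user input never lands in the SQL text.
        SET @sql += N' ORDER BY ' +
            CASE @SortColumn
                WHEN 'CustomerId' THEN N'CustomerId'
                WHEN 'Status'     THEN N'Status'
                ELSE N'OrderDate'
            END;

        EXEC sp_executesql @sql,
            N'@CustomerId INT, @Status VARCHAR(20)',
            @CustomerId = @CustomerId,
            @Status     = @Status;
    END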

The point here is that your business code might need a change or an enhancement.  If there is a bug, it can be fixed and deployed on the fly.  A .Net application must be compiled and carefully deployed to servers that are not being accessed during the deployment, while a stored procedure can simply be changed in place.  If your website is a large monolithic application, this capability becomes a larger “pro”.

One Call to the Database

If a record edit requires a log entry, you’ll be forced to call your database twice.  If you use a stored procedure, its code can make both changes from inside the database rather than over the wire, which reduces the latency of the operation.
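
A stripped-down sketch of the pattern (again, the names are invented): one call from the application, two statements and a transaction inside the database.

    -- Hypothetical example: the edit and the log entry happen in one
    -- round trip instead of two calls over the wire.
    CREATE PROCEDURE dbo.UpdateProductPrice
        @ProductId INT,
        @NewPrice  DECIMAL(10, 2),
        @UserName  VARCHAR(50)
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRANSACTION;

        UPDATE dbo.Product
        SET Price = @NewPrice
        WHERE ProductId = @ProductId;

        INSERT INTO dbo.ProductLog (ProductId, ChangedBy, ChangedOn, NewPrice)
        VALUES (@ProductId, @UserName, GETDATE(), @NewPrice);

        COMMIT TRANSACTION;
    END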

Now for the list of cons…

Business Processing is Being Performed by the Database

MS SQL Server and Oracle licensing is expensive.  Both products are licensed per CPU and the price can get steep.  If you have a choice between performing your processing on a web server or a database server, it’s a no-brainer: do it on the web server.  Even a Windows Server license is cheaper than a SQL Server license.  Initially your business will not feel the cost difference because your system is small, but once your system grows you’ll watch your licensing costs climb sharply.

Unit Testing

Unit testing is all but unavailable for stored procedures.  You could probably get creative and set up a testing environment in Visual Studio, but it would be custom.  Without unit tests you cannot regression test.  If your stored procedures contain logic that allows different modes of operation, it can be difficult to properly test each mode.  An example is a stored procedure that performs filter and sorting operations: if you are building a dynamic query using “if” statements, you’ll have a combinatorial set of possible inputs.  How do you ensure that your dynamic query doesn’t have a missing “and”, a missing comma, or a union query with non-matching fields?  It’s difficult.  If this logic is in your website code, you can wrap each combination of inputs in unit tests to provide regression coverage when you add or change a filter or sorting option.
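
For contrast, if the filter-building logic lives in C#, each combination of inputs can be pinned down with an ordinary unit test.  The OrderQueryBuilder class below is a made-up example, tested here with MSTest:

    // Hypothetical query-builder living in website code instead of a
    // stored procedure, plus an MSTest regression test for one combination.
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    public class OrderQueryBuilder
    {
        public string Build(int? customerId, string status)
        {
            var sql = "SELECT OrderId FROM [Order] WHERE 1 = 1";
            if (customerId.HasValue) sql += " AND CustomerId = @CustomerId";
            if (status != null)      sql += " AND Status = @Status";
            return sql + " ORDER BY OrderDate";
        }
    }

    [TestClass]
    public class OrderQueryBuilderTests
    {
        [TestMethod]
        public void Build_WithCustomerAndStatus_IncludesBothFilters()
        {
            var sql = new OrderQueryBuilder().Build(42, "Open");

            StringAssert.Contains(sql, "AND CustomerId = @CustomerId");
            StringAssert.Contains(sql, "AND Status = @Status");
            StringAssert.EndsWith(sql, "ORDER BY OrderDate");
        }
    }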

Version Control

No matter how you divide your code, you’ll need to version control your database so you can keep a history of your changes and match those changes to your code.  Visual Studio allows you to define all of your database objects in a project (see the SQL Server Database Project type), and there are tools that can create change scripts between two Team Foundation Server versions, which can then be used to update multiple databases.  Versioning a database is still not a common practice, which is why I’ve put this under the “con” list instead of the “pro” list.  Companies that do keep their database definitions in a version control system can take this off the “con” list.

Married to Your Database Server

Till death do you part!  If you ever decide to switch from one database server technology to another, you’ll discover how steep the hill is that you need to climb.  Each stored procedure will have to be converted by hand, one by one.  If your system doesn’t use stored procedures, you’ll have an easier time converting.  Minor differences in triggers and indexes can still be an issue between Oracle and SQL Server, recursive (hierarchical) queries are written differently in Oracle, and you might even have issues with left and right outer joins if you used the “(+)” notation in Oracle.  But stored procedures will be your Achilles heel.
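
As a small taste of the conversion work, even a simple outer join written with Oracle’s “(+)” shorthand has to be rewritten in ANSI syntax (table names here are just illustrative):

    -- Oracle-specific outer join shorthand:
    SELECT c.name, o.order_date
    FROM customer c, customer_order o
    WHERE c.customer_id = o.customer_id (+);

    -- Equivalent ANSI syntax that works on both Oracle and SQL Server:
    SELECT c.name, o.order_date
    FROM customer c
    LEFT OUTER JOIN customer_order o ON c.customer_id = o.customer_id;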

Spaghetti Code

Writing stored procedures is a lot like writing Classic ASP: it’s messy.  I see a lot of sloppy coding practices.  There is no widely followed standard for formatting queries or T-SQL code, and everybody has their own short-cuts.  Once a system grows to the point where it contains tens of thousands of stored procedures, you’re faced with a mess that has little hope.  C# code has the luxury of being refactorable, which is a powerful capability for untangling code, and being able to break code into small, manageable chunks also helps.  If your database code is contained in a Visual Studio project you can perform some of the same refactoring, but you can’t test on the fly, so programmers prefer to change stored procedures directly on their test database, where refactoring is not available.

Conclusion

Are there more pros and cons?  Sure.  Every company has special needs for their database.  Some companies have a lot of bulk table to table processing that must be performed.  That screams stored procedures, and I would recommend sticking with that technique.  Other companies have a website front-end with large quantities of small transactions.  I would recommend those companies keep their business logic in their website code where they can unit test their logic.  In the end, you’ll need to take this list of pros and cons and decide which item to give more weight.  Your scale may tip one way or the other depending on which compromise you want to make.

 

LINQ and Entity Framework … sigh

Our company has a ton of legacy code using VB and simple text-based queries that are submitted to the database.  There are several disadvantages to this method compared to newer techniques like LINQ and EF.  First, syntax errors in the query itself can only be caught at run-time.  Second, syntax highlighting, automatic formatting and IntelliSense are not supported.

One of the improvements started earlier this year was the use of LINQ to SQL in place of string queries.  Immediately a flaw was discovered in the interface for setting up tables: there is no capability to synchronize changes made to tables in the underlying database.  The developer is forced to delete a table and re-add it in order to reflect any field changes.  That is tolerable if there are no stored procedures that return a result set for that table; otherwise you have to go to each stored procedure and point it at something else, replace the table, then switch the stored procedures back.  There are third-party add-ons that can sync the tables, but they seem to be of the bloat-ware variety and are not very effective (some of them don’t work right).  A second “feature” of LINQ to SQL is that it creates a settings.settings file in the project so database settings can be changed from an interface.  This seems like a great feature until one realizes that it overwrites the app.config file when changes are saved, and the side effect is that comments in app.config are removed.  Our company uses comments to switch between databases located on our local PCs, the staging server and the production server: PowerShell scripts comment and uncomment the sections for each of the connection strings to enable the correct database connection.  That goes out the window when the settings designer wipes out any commented connection strings (usually the staging server and production server settings).

So we began looking at Entity Framework.  This was a dream: it has a synchronize capability, making it much easier to use.  I jumped in with both feet and started replacing LINQ to SQL setups with EF (they can coexist, just keep the table names the same).  My plan was to create a data context of table definitions in each object’s directory.  This would create an organization where the tables used by the current object were in the same directory, and any abandoned tables would not be referenced by some far-off code unrelated to the object that might be refactored in the future.  Then I ran into a serious problem: EF doesn’t recognize namespaces.  Yikes!  So I changed plans quickly.  I placed the data context in a central directory under the project and decided we would just use one data context for all objects.  I further discovered that only one database can be used per data context.  While that hasn’t been an issue yet, it will be if we decide to use this for more of our larger objects (the ones that query two or more databases).  I theorized that we might be able to use nested “using” statements, but I haven’t tried it yet.  This issue, coupled with the namespace issue, has made me put on the brakes until I can do further analysis.  I have also heard rumors that EF is not as efficient at queries as LINQ to SQL.
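
For what it’s worth, the nested “using” idea I have in mind looks roughly like the untested sketch below.  The context and entity names (SalesEntities, BillingEntities, Orders, Invoices) are placeholders, and note that the join would happen in memory on the web server, not in a single cross-database SQL statement.

    // Untested sketch.  SalesEntities and BillingEntities stand in for two
    // generated EF contexts, one per database; Orders and Invoices are
    // hypothetical entity sets.
    using System.Linq;

    public static class CrossDatabaseExample
    {
        public static void ListBilledOrders(int customerId)
        {
            using (var sales = new SalesEntities())
            using (var billing = new BillingEntities())
            {
                // Pull each side into memory from its own database.
                var orders   = sales.Orders.Where(o => o.CustomerId == customerId).ToList();
                var invoices = billing.Invoices.Where(i => i.CustomerId == customerId).ToList();

                // Join the two result sets in memory on the web server.
                var combined = from o in orders
                               join i in invoices on o.OrderId equals i.OrderId
                               select new { o.OrderId, o.OrderDate, i.AmountBilled };

                foreach (var row in combined)
                {
                    System.Console.WriteLine("{0} {1} {2}", row.OrderId, row.OrderDate, row.AmountBilled);
                }
            }
        }
    }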

Another issue I discovered is that the objects created for the tables dragged into the context may have an “s” appended to them (the designer pluralizes the generated names by default), which causes all kinds of confusion about the table names.  It’s also a headache to have to trace back to the data context to get the real table name in order to find it in the database (if you are troubleshooting code you are unfamiliar with).  I would prefer an ORM* that created object names matching the tables exactly, followed the standard namespace conventions, had a visual interface that didn’t stomp on my app settings, and was flexible enough to synchronize changes with the database (because we make database changes).  Am I asking for too much?

So now what’s my plan?  At this point, I’m going to put some time and effort into NHibernate (I have posted on that subject before).  Stay tuned for my results on this ORM.



*LINQ to SQL and EF are both ORMs (Object-Relational Mappers).

 

More Database Basics

So I just completed a post about database basics, and at the end I showed how students (and faculty members) can download Oracle or MS SQL Server, install it on their PC and create a database to match the book they are using in database class.  Then I remembered that I haven’t really used Oracle since 2008 and maybe, just maybe, I’m out of practice.  There’s nothing worse than creating a post containing untested code or information that might not be valid, so I went out and downloaded Oracle 11g XE (Express).  First, I noticed that there was only a 32-bit Windows version and a warning that it doesn’t work on x64.  Being a skillful developer, I ignored that warning and tried it for myself.  There was some sort of error during the installation, but it was more of a warning that something was missing, and it hasn’t affected any of the capabilities I’ve used so far.  I’m sure it involves stored procedures or some other capability that I’m not going to need for an entry-level class.  So I dragged out my “Database Systems: principles, design, & implementation” book from the early 1800’s (OK, it’s copyrighted in 1990).

Here’s the starting sample database design from the book, along with the matching tables and data (shown in the book’s figures): a student table, a class table and an enrollment table.

So I installed Oracle (which took a very long time), then jumped right into the control panel, which is all new.  In fact, it’s a web interface now instead of a Java application.  No problem; I clicked around the menu options until I found a table designer interface (I’ll leave it up to the student to read up and figure out how to create a table, it’s easy).  So I created all three tables, two foreign key constraints, some not-null constraints and primary keys for the student and class tables.  Yes, I was cringing as I typed the data in and my mind was just screaming (that should be broken out into a separate table!).  But the purpose of this exercise in the book is to start with a somewhat functional but incorrectly designed database and work through the normalization rules to turn it into a properly designed one.
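
For reference, the DDL for those three tables looks roughly like this.  The column names are my own stand-ins, not necessarily the ones used in the book:

    -- Rough sketch of the three tables; the book's column names may differ.
    CREATE TABLE student (
        student_id   NUMBER       NOT NULL,
        student_name VARCHAR2(50) NOT NULL,
        major        VARCHAR2(30),
        CONSTRAINT pk_student PRIMARY KEY (student_id)
    );

    CREATE TABLE class (
        class_id   VARCHAR2(10) NOT NULL,
        class_name VARCHAR2(50) NOT NULL,
        instructor VARCHAR2(50),
        CONSTRAINT pk_class PRIMARY KEY (class_id)
    );

    CREATE TABLE enrollment (
        student_id NUMBER       NOT NULL,
        class_id   VARCHAR2(10) NOT NULL,
        grade      VARCHAR2(2),
        CONSTRAINT fk_enroll_student FOREIGN KEY (student_id) REFERENCES student (student_id),
        CONSTRAINT fk_enroll_class   FOREIGN KEY (class_id)   REFERENCES class (class_id)
    );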

So here are the three final tables and their data in Oracle:

Student Table, Class Table and Enrollment Table

As you can see, learning Oracle is like riding a bicycle.  Technically, all databases function similarly.  They all have tables, fields, foreign key constraints, etc.  There are some nuances in the way different databases handle cascade deletes and triggers and other capabilities, but the basic functions are nearly identical.

Any student should be able to download the express edition of Oracle and create the three tables above in an afternoon.  It only took me two hours (one and a half of that was install time).  Warning: If you’re an MS SQL Server user like me, try not to hit the F5 key to execute your query in Oracle.  It just refreshes the browser and erases the query you typed in.

Now you should be able to join the student table to the enrollment table to get a list of students and their grades:
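
Using the made-up column names from the DDL sketch above, the query looks something like this:

    -- Hypothetical column names; adjust to match your own tables.
    SELECT s.student_name, c.class_name, e.grade
    FROM student s
    JOIN enrollment e ON e.student_id = s.student_id
    JOIN class c      ON c.class_id   = e.class_id
    ORDER BY s.student_name;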

Student Grades

 
Now this is learning!

 

Database Basics or The School of Hard Knocks

This blog post is directed at students who are in college taking a database class for computer science, or those who are just starting out in database design.  I’m going to give a little background story about my own early experience with database systems, and I’m going to talk about how you can get beyond the eye-glazing boredom of memorizing database theory and into the hands-on world of actually using databases.

Some History

I took a database class at the University of Michigan.  I needed about 33 credits of 300-level and above classes to complete my BS in Computer Science, so I filled those credits primarily with computer science classes.  As many as I could take.  Most of the specialized classes I took, like modeling and simulation, I blew through without studying.  Truthfully, I was hungry for information on simulation techniques and it was a fun class for me to take.  When I arrived at database class, our instructor was working from class notes that he had developed in the mainframe era (which really wasn’t that long before I took the class, but the material was still a bit outdated).  He spent a lot of time on the theory of database design: foreign keys, candidate keys, First Normal Form, etc.  I understood what he was talking about in class, but I really had to make up flash cards for each concept and practice memorizing them to pass the test.  I assumed that databases were just not my thing.  Boy, was I wrong!

So MS Access, version 1.0, arrived right around the time I graduated from college, and I had a friend who wanted a database built for his business.  It was relatively easy to build, until I discovered that some things didn’t quite work right.  What compounded my problems was the bugginess of version 1.  Still, I assumed that databases were just not my thing.  When I arrived at Pirelli, I was tasked with building a real-time data collection system, and my database was again MS Access.  This was due to restrictions put on me by Pirelli’s IT department, not my choice.  By then, MS Access was up to version 3 and it was a more mature product.  Still, I made the decision up front to avoid using most of Access and just use the tables for storage, with VB version 5 reading and writing to them.  This database was rather simple since it only collected basic information from the tire building machines in the factory; the difficult part of that project was keeping up with the real-time data.  If I had my choice today, with my current level of experience, I would insist on Oracle or MS SQL Server, and I might even have suggested experimenting with MapReduce (though I don’t think they had that much data to store).

So I was recruited into Building Technology Associates by a former professor who just happened to be working as the IT manager at that company.  BTA had a problem.  OK, they had a few problems.  First, they consisted of three companies in the same building, and two of the three companies had their own database development teams operating independently of each other.  Their second and most critical problem was that nobody on either team had any formal database knowledge when they built their databases, and both databases were built on the premise that a database is just a collection of spreadsheets in one location.  Arrrgghh!!

This is where I really learned how to properly design a database.  This is where reality met up with database theory.  Let me describe what I had to start with:

Database 1:
– Built with MS SQL version 6.5 on a PC running Windows NT 3, unpatched.
– Many foreign key fields were nullable and contained text.
– One foreign key field contained letters, each letter represented a different record from a lookup table (called subsystems) and the programmers used substring in their program to figure out which subsystem applied to each record.
– No relational integrity constraints.

Database 2:
– Built with MS FoxPro 5.0 hosted on the file server.
– Some tables contained old foreign keys that consisted of a unique number, but the fields were nullable.
– Some tables contained a new foreign key based on the millisecond clock: a string of characters that could contain any ASCII value between 0 and 255.  Also nullable.
– No relational integrity constraints.


What I described above was just the beginning of my problems, but sometimes when things are really bad you have to triage.  The most important problems to fix first were the foreign keys and the relational integrity.  At first, I wanted to refactor everything (this was before the word “refactor” became as common as it is today).  Unfortunately, things were so bad that we had people dedicated to fixing data issues that were caused by the database integrity problems above, coupled with bugs in the software.

So I set into place a plan to build a new database in Oracle.  The first task was to determine how the primary keys would be produced.  I needed to make sure that all tables had a primary key and that the keys were unique.  The problem was that we had a lot of external software generating records that would be imported, and renumbering keys was a nightmare.  So I invented a scheme involving a 12 character key consisting of a 4 character unique prefix, issued by the master system, and an 8 character locally generated key.  Each piece of software, when first installed, would be missing the registry entry that contained the prefix, so the software would ask for a prefix disk (floppies back then), and we distributed prefix disks with 10 prefixes on each.  We used 0-9, A-Z and the underscore as legal characters in this key scheme (uppercase only).

I had to analyze whether we would run out of keys, and when, if data was generated continuously 24/7/365.  Since a digit can be any of 37 possible characters and there are 8 digits, that gives 37^8 combinations for each prefix (about 3.5 x 10^12 keys).  If you generate 100 keys every second of every minute of every hour of every day, you’ll use up roughly 3.2 x 10^9 keys a year, which comes out to over a thousand years before burning up all the keys under one prefix and needing a new one.  So it’s safe to say that we’re not going to run out of keys soon.  Anyway, I check every few years to see whose account has burned up the most keys (all the prefixes and keys are stored in the database for web users), and keys are not being used anywhere near that fast.  My fallback plan is to expand the key size to 32 digits, but only if the rate of data generation increases.
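
A rough sketch of that kind of key generator (the class and method names below are mine for illustration, not the original code):

    // Sketch of the 12 character key scheme: a 4 character prefix issued by
    // the master system plus an 8 character locally generated counter,
    // written in base 37 using the digits 0-9, A-Z and underscore.
    public class KeyGenerator
    {
        private const string Digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_";
        private readonly string _prefix;   // 4 characters, issued centrally
        private long _counter;             // last locally issued value

        public KeyGenerator(string prefix, long lastIssued)
        {
            _prefix  = prefix;
            _counter = lastIssued;
        }

        public string NextKey()
        {
            _counter++;   // a real implementation would request a new prefix near 37^8

            // Convert the counter to an 8 digit base-37 string.
            var local = new char[8];
            long value = _counter;
            for (int i = 7; i >= 0; i--)
            {
                local[i] = Digits[(int)(value % 37)];
                value /= 37;
            }
            return _prefix + new string(local);
        }
    }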

Database 2 had a clever scheme: it generated keys from the millisecond timer and converted the value into an 8 character key.  Unfortunately, by the time I came on the scene, computers had become fast enough to generate keys more often than once per millisecond, so importing data that required new keys was producing duplicate keys.  Also, it was impossible to type a particular key into a query because many of the characters were unprintable.

My second strict rule was NO NULLABLE FOREIGN KEYS!  Wow.  That caused a couple of ugly data problems to surface.  The person who designed database 2 was still working for the company when I was investigating these databases, and I asked him why he didn’t use the integrity constraint capabilities available in FoxPro.  He claimed (I’m not kidding) that he tried it once but it just broke his software, so he deleted all the constraints he had set up.  AHHHHH!!!!!  So my third strict rule was to put the constraints in place and fix the software.  Much of the software had to be re-written to work with the new database, but we managed to salvage some of it.  Then we created a converter to migrate the old databases into the new Oracle database.  Oh, that was fun.  My determination paid dividends, because after we rolled out the new database, over half of our software problems vanished and our database integrity problems dropped to near zero.
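
In database terms, the “no nullable foreign keys” rule just means declaring the foreign key column NOT NULL and letting a constraint enforce it.  The tables below are invented for illustration:

    -- Illustrative only: the foreign key column is NOT NULL, so every
    -- detail row must point at a real parent row.
    CREATE TABLE building (
        building_id CHAR(12)     NOT NULL,
        site_name   VARCHAR2(50) NOT NULL,
        CONSTRAINT pk_building PRIMARY KEY (building_id)
    );

    CREATE TABLE inspection (
        inspection_id   CHAR(12) NOT NULL,
        building_id     CHAR(12) NOT NULL,   -- no nullable foreign keys
        inspection_date DATE     NOT NULL,
        CONSTRAINT pk_inspection PRIMARY KEY (inspection_id),
        CONSTRAINT fk_inspection_building
            FOREIGN KEY (building_id) REFERENCES building (building_id)
    );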

So How Do You Learn the Lessons Before Meeting a Disaster?

If you are in a database class, download MS SQL Server Express (it’s free) from here.  Install it and play around with it.  Create some tables by hand.  Set up foreign keys.  Follow your book.  If you’re a professor teaching a database class, create some sample problems containing violations of the various normalization rules.

When you’re satisfied with your ability to perform basic database tasks with MS SQL, I would recommend putting in the extra effort to learn Oracle.  Once you learn one database, any other database is just a little different from the one you learned, and it’s best to see that up close and personal.  Click here to download the Oracle Express edition (free for developers).  There is a Windows version that gets you up and running without setting up a server.  If you’re familiar with Linux and you happen to have a Linux machine set up already, you can experiment with the Linux version of Oracle and learn all the nuances of connecting to an Oracle server.

If you install these databases on your machine and practice with them as your professor teaches the class, you’ll learn database theory much more easily than by just memorizing terms out of the book.  Trust me, I really know all about foreign keys, functional dependencies, candidate keys and the various other terms now that I know what they actually are.

You can also practice with MS Access if you bought the Professional version of Office (which includes Access), or you can purchase MS Access separately, but I would recommend sticking with one of the commonly used database servers I mentioned above.  There are also other options like MySQL and the various NoSQL databases, but they are a bit looser with their rules.  Stick to a database system that has strict rules and learn it; then it’s a piece of cake to use one that doesn’t.

That’s all for now.  I’ll make a note to do a follow-up post with some example MS SQL and Oracle practice databases to help you learn more about database design.