Documentation

Why is there No Documentation?

I’m surprised at the number of developers who don’t create any documentation.  There are ways to self-document code and there are packages to add automatic help to an API to document the services.  Unfortunately, that’s not enough.  I’ve heard all the arguments:

  • It chews up programming time.
  • It rapidly becomes obsolete.
  • The code should explain itself.
  • As a programmer, I’m more valuable if I keep the knowledge to myself.

Let me explain what I mean by documentation.  What should you document outside of your code?  Every time you create a program you usually create a design for it.  If it’s just drawings on a napkin, then it needs to go someplace where other developers can access it if they need to work on your software.  Configuration parameters need documentation.  Not a lot, just enough to describe what the purpose is.  Installation?  If there is some trick to making your program work in a production, QA or staging environment, then you should document it.  Where is the source code located?  Is there a deployment package?  Was there a debate on the use of one technology over another?

So what happens when you have no documentation?  First, you need to find the source code.  Hopefully it’s properly named and resides in the current repository.  Otherwise, you may be forced to dig through directories on old servers, or in some cases the source might not be available at all.  If the source is not available, your options are limited: decompile, rewrite or work with what is already built.  Looking at source code written by a programmer who no longer works at your company is a common occurrence (did that programmer think not documenting made him/her valuable?).  Usually such code is tightly coupled and contains poorly named methods and variables with no comments.  So here are the arguments for why you should do documentation:

  • Reverse engineering software chews up programming time.
  • Most undocumented code is not written to be self-explanatory.
  • Attempting to figure out why a programmer wrote a program the way he/she did can be difficult and sometimes impossible.
  • Programmers come and go no matter how little documentation exists.

It’s easy to go overboard with documentation.  This can be another trap.  Try to keep your documentation to just the facts.  Don’t write long-winded literature.  Keep it technical.  Start with lists of notes.  Expand as needed.  Remove any obsolete documentation.

Getting Your Documentation Started

The first step to getting your documentation started is to decide on a place to store it.  The best option is a wiki of some sort.  I prefer Confluence or GitHub.  They both have clean formatting and are easy to edit and drop in pictures/screenshots.

So you have a wiki set up and it’s empty.  Next, create some subjects.  If you have several software projects in progress, start with those.  Create a subject for each project and load up all the design specifications.  If your development team is performing a retrospective, type it directly into the wiki.  If there is a debate or committee meeting to discuss a change or some nuance with the software, type it into the wiki.  They can just be raw historical notes.

Next, add “documentation” as a story point to your project, or add it to each story.  This should be a mandatory process.  Make documentation part of the development process.  Developers can just add a few notes, or they can dig in and do a brain-dump.  Somewhere down the road a developer not involved in the project will need to add an enhancement or fix a bug.  That developer will have a starting point.

Another way to seed the wiki is to create subjects for each section of your existing legacy code and just do a dump of notes in each section.  Simple information off the top of everyone’s head is good enough.  The wiki can be reworked at a later date to make things more organized.  Divide and conquer.  If a developer has fixed a bug in a subsystem that nobody understands, that developer should input their knowledge into the wiki.  This will save a lot of time when another developer has to fix a bug in that system and it will prevent your developers from becoming siloed.

You Have Documentation – Now What?

One of the purposes of your technical documentation is to train new people.  This is something that is overlooked a lot.  When a new developer is hired, that person can get up to speed faster if they can just browse a wiki full of technical notes.  With this purpose in mind, you should expand your wiki to include instructions on how to set up a desktop/laptop for a development environment.  You can also add educational material to get a developer up to speed.  This doesn’t mean that you need to type in subjects on how to write an MVC application.  You should be able to link to articles that can be used by new developers to hone their skills.  By doing this, you can keep a new person busy while you coordinate your day-to-day tasks, instead of being tied down to an all-day training session to get that person up to speed.

Your documentation should also contain a subject on your company development standards.  What frameworks are acceptable?  What processes must be followed before introducing new technologies to the system?  Coding standards?  Languages that can be used?  Maybe a statement of goals that have been laid down.  What is the intended architecture of your system?  If your company has committees to decide what the goal of the department is, then maybe the meeting minutes would be handy.

Who is Responsible for Your Documentation?

Everyone should be responsible.  Everyone should participate.  Make sure you keep backups in case something goes wrong or someone makes a mistake.  Most wiki software is capable of tracking revisions.  Documentation should be treated like version control.  Don’t delete anything!  If you want to hide subjects that have been deprecated, then create a subject at the bottom for all your obsolete projects.  When a project is deprecated, move the wiki subject to that folder.  Someday, someone might ask a question about a feature that used to exist.  You can dig up the old subject and present what used to exist if necessary.  This is especially handy if you deprecated a subsystem that was a problem-child due to its design.  If someone wants to create that same type of mess, they can read about the lessons learned from the obsolete subsystem.  The answer can be: “We tried that, it didn’t work and here’s why…” or it can be: “We did that before and here’s how it used to work…”

If you make the task of documenting part of the work that is performed, developers can add documentation as software is created.  Developers can modify documentation when bugs are fixed or enhancements are made.  Developers can remove or archive documentation when software is torn down or replaced.  The documentation should be part of the software maintenance cycle.  This will prevent the documentation from getting out of sync with your software.

 

Three Tier Architecture

There is a lot of information on the Internet about the three-tier architecture, three-tier structure and other names for the concept of breaking a program into tiers.  The current system design paradigm is to break your software into APIs, and the three-tier architecture still applies.  I’m going to try to explain the three-tier architecture from a practical standpoint and describe the benefits of following this structure.  But first, I have to explain what happens when a handful of inexperienced programmers run in and build a system from the ground up…

Bad System Design

Every seasoned developer knows what I’m talking about.  It’s the organically grown, not very well planned system.  Programming is easy.  Once a person learns the basic syntax, the world is their oyster!  Until the system gets really big.  There’s a tipping point where tightly-coupled monolithic systems become progressively more difficult and time consuming to enhance.  Many systems that I have worked on were well beyond that point when I started working on them.  Let me show a diagram of what I’m talking about:

In a system like this, there is no firm division between front-end and back-end code.  The HTML/JavaScript is usually embedded in back-end code and there is typically business code scattered between the HTML.  Sometimes systems like this contain a lot of stored procedures, which do nothing more than marry you to the database that was first used to build the application.  Burying business code in stored procedures also carries a financial burden: it ensures you are performing your processing on the product with the most expensive licensing fees.  When your company grows, you’ll be forced to purchase more and more licenses for the database in order to keep up.  The cost grows rapidly, and you’ll come to a point where any tiny change in a stored procedure, function or table causes your user traffic to overwhelm the hardware that your database runs on.

I oftentimes joke about how I would like to build a time machine for no other reason than to go back in time, sit down with the developers of the system I am working on and tell them what they should do.  To head it off before it becomes a very large mess.  I suspect that this craziness occurs because companies are started by non-programmers who hook up with some young and energetic programmer with little real-world experience who can make magic happen.  Any programmer can get the seed started.  Poor programming practices and bad system design don’t show up right away.  A startup company might only have a few hundred users at first.  Hardware is cheap, SQL Server licenses seem reasonable, everything is working as expected.  I also suspect that those developers move on when the system becomes too difficult to manage.  They move on to another “new” project that they can start the same bad way.  Either that, or they learn their lesson and the next company they work at is lucky to get a programmer who knows how not to write a program.

Once the software gets to the point that I’ve described, then it takes programmers like me to fix it.  Sometimes it takes a lot of programmers with my kind of knowledge to fix it.  Fixing a system like this is expensive and takes time.  It’s a lot like repairing a jet while in flight.  The jet must stay flying while you shut down one engine and upgrade it to a new one.  Sound like fun?  Sometimes it is, usually it’s not.

Three-Tier Basics

In case you’re completely unfamiliar with the three-tier system, here is the simplified diagram:

It looks simple, but the design is a bit nuanced.  First of all, the HTML, JavaScript, front-end frameworks, etc. must be contained in the front-end box.  You need isolation from the back-end or middle-tier.  The whole purpose of the front-end is to handle the presentation or human interface part of your system.  The back-end or middle-tier is all the business logic.  It all needs to be contained in this section.  It must be loosely coupled and unit tested, preferably with an IOC container like AutoFac.  The database must be nothing more than a container for your data.  Reduce special database features as much as possible.  Your caching system is also located in this layer.

The connection between the front-end and back-end is usually an API connection using REST.  You can pass data back and forth between these two layers using JSON or XML or just perform “get”, “post”, “delete” and “put” operations.  If you treat your front-end as a system that communicates with another system called your back-end, you’ll have a successful implementation.  You’ll still have hardware challenges (like network bandwidth and server instances), but those can be solved much quicker and cheaper than rewriting software.
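Here’s a rough sketch of what one of those back-end REST endpoints might look like, assuming ASP.NET Web API 2 (the OrdersController, IOrderService and Order names are made up for illustration; your own business logic would sit behind the interface):

using System.Web.Http;

// Hypothetical business-logic interface and model for this sketch.
public interface IOrderService
{
    Order GetOrder(int id);
}

public class Order
{
    public int Id { get; set; }
    public string Description { get; set; }
}

// The front-end calls GET /api/orders/5 over REST and gets JSON back.
// It never talks to the database or the business logic directly.
public class OrdersController : ApiController
{
    private readonly IOrderService _orderService;

    public OrdersController(IOrderService orderService)
    {
        _orderService = orderService;
    }

    // GET api/orders/5
    public IHttpActionResult Get(int id)
    {
        var order = _orderService.GetOrder(id);
        if (order == null)
        {
            return NotFound();   // HTTP 404
        }
        return Ok(order);        // HTTP 200 with the order serialized as JSON
    }
}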

The connection between the back-end and database has another purpose.  Your goal should be to make sure your back-end is database technology independent as much as possible.  You want the option of switching to a database with cheap licensing costs.  If you work hard up-front, you’ll get a pay-back down the road when your company expands to a respectable size and the database licensing cost starts to look ugly.

What About APIs?

The above diagram looks like a monolithic program at first glance.  If you follow only the rules I’ve laid out so far, you’ll still end up with one large monolithic program.  So there’s one more level of separation you must be aware of.  You need to logically divide your system into independent APIs.  You can split your system into a handful of large APIs or hundreds of smaller APIs.  It’s better to build a lot of smaller APIs, but that can depend on what type of system is being built and how many logical boxes you can divide it into.  Here’s an example of a very simple system divided into APIs:

This is not a typical way to divide your APIs.  Typically, an API can share a database with another API and the front-end can be separate from the API itself.  For now, let’s talk about the advantages of this design as I’ve shown.

  1. Each section of your system is independent.  If a user decides to consume a lot of resources by executing a long-running task, it won’t affect any other section of your system.  You can contain the resource problem.  In the monolithic design, any long-running process will kill the entire system and all users will experience the slow-down.
  2. If one section of your system requires heavy resources, then you can allocate more resources for that one section and leave all other sections the same.  In other words, you can expand one API to be hosted by multiple servers, while other APIs are each on one server.
  3. Deployment is easy.  You only deploy the APIs that are updated.  If your front-end is well isolated, then you can deploy a back-end piece without the need for deployment of your front-end.
  4. Your technology can be mixed.  You can use different database technologies for each section.  You can use different programming languages for each back-end or different frameworks for each front-end.  This also gives you a way to convert some or all of your system to a Unix-hosted system.  A new API can be built using Python or PHP and that API can be hosted on a Linux virtual box.  Notice that neither the front-end nor the database requires a redesign; only the back-end software for one subsection of your system changes.

Converting a Legacy System

If you have a legacy system built around the monolithic design pattern, you’re going to want to take steps as soon as possible to get into a three-tier architecture.  You’ll also want to build any new parts using an API design pattern.  Usually it takes multiple iterations to remove the stored procedures and replace the code with decoupled front-end and back-end code.  You’ll probably start with something like this:

In this diagram the database is shared between the new API and the legacy system, which is still just a monolithic program.  Notice how stored procedures are avoided by the API on the right side.  All the business logic must be contained in the back-end so it can be unit tested.  Eventually, you’ll end up with something like this:

Some of your APIs can have their own data while others rely on the main database.  The monolithic section of your system should start to shrink.  The number of stored procedures should shrink.  This system is already easier to maintain than the complete monolithic system.  You’re still saddled with the monolithic section and the gigantic database with stored procedures.  However, you now have sections of your system that are independent and easy to maintain and deploy.  Another possibility is to do this:

In this instance the front-end is consistent.  One framework with common JavaScript can be contained as a single piece of your system.  This is OK because your front-end should not contain any business logic.

The Front-End

I need to explain a little more about the front-end that many programmers are not aware of.  Your system design goal for the front-end is to assume that your company will grow so large that you’re going to have front-end specialists.  These people should be artists who work with HTML, CSS and other front-end languages.  The front-end designer is concerned with usability and aesthetics.  The back-end designer is concerned about accuracy and speed.  These are two different skill sets.  The front-end person should be more of a graphic designer while the back-end person should be a programmer with knowledge of scalability and system performance.  Small companies will hire a programmer to perform both tasks, but a large company must begin to divide their personnel into distinct skill-sets to maximize the quality of their product.

Another overlooked aspect of the front-end is that it is going to become stale.  Somewhere down the road your front-end is going to be ugly compared to the competition.  If your front-end code is nothing more than HTML, CSS, JavaScript and maybe some frameworks, you can change the look and feel of the user interface with minimum disruption.  If you have HTML and JavaScript mixed into your business logic, you’ve got an uphill battle to try and upgrade the look and feel of your system.

The Back-End

When you connect to a database the common and simple method is to use something like ODBC or ADO.  Then SQL statements are sent as strings with parameters to the database directly.  There are many issues with this approach and the current solution is to use an ORM like Entity Framework, NHibernate or even Dapper.  Here’s a list of the advantages of an ORM:

  1. The queries are in LINQ and most errors can be found at compile time.
  2. The context can be easily changed to point to another database technology.
  3. If you include mappings that match your database, you can detect many database problems at compile time, like child to parent relationship issues (attempting to insert a child record with no parent).
  4. An ORM can break dependency with the database and provide an easy method of unit testing.
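To make the first two advantages concrete, here is a minimal sketch of an Entity Framework 6 style query (the Customer entity and ShopContext are hypothetical names used only for illustration; a real context also needs a connection string):

using System.Data.Entity;
using System.Linq;

// Hypothetical entity and context, just to show the shape of the code.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool IsActive { get; set; }
}

public class ShopContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
}

public class CustomerQueries
{
    // The query is LINQ, so a misspelled property or a type mismatch fails
    // at compile time instead of blowing up in the database at runtime.
    public static string[] GetActiveCustomerNames()
    {
        using (var context = new ShopContext())
        {
            return context.Customers
                .Where(c => c.IsActive)
                .OrderBy(c => c.Name)
                .Select(c => c.Name)
                .ToArray();
        }
    }
}

Because the data access goes through the context and LINQ rather than raw SQL strings, pointing the context at a different database provider becomes a configuration change rather than a rewrite.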

As I mentioned earlier, you must avoid stored procedures, functions and any other database technology specific features.  Don’t back yourself into a corner just because MS SQL Server has a feature that makes an enhancement easy to implement.  If your system is built around a set of stored procedures, you’ll be in trouble if you want to switch from MS SQL to MySQL, or from MS SQL to Oracle.

Summary

I’m hoping that this blog post is read by a lot of entry-level programmers.  You might have seen the three-tier architecture mentioned in your school book or on a website and didn’t realize what it was all about.  Many articles get into the technical details of how to implement a three-tier architecture using C# or some other language, glossing over the big picture of “why” it’s done this way.  Be aware that there are also other multi-tier architectures that can be employed.  Which technique you use doesn’t really matter as long as you know why it’s done that way.  When you build a real system, you have to be aware of what the implications of your design are going to be five or ten years from now.  If you’re just going to write some code and put it into production, you’ll run into a brick wall before long.  Keep these techniques in mind when you’re building your first system.  It will pay dividends down the road when you can enhance your software just by modifying a small API and tweaking some front-end code.

 

Dear Computer Science Majors…

Introduction

It has been a while since I wrote a blog post directed at newly minted Computer Science Majors.  In fact, the last time I wrote one of these articles was in 2014.  So I’m going to give all of you shiny-new Computer Scientists a leg-up in the working world by telling you some inside information about what companies need from you.  If you read through my previous blog post (click here) and read through this post, you’ll be ahead of the pack when you submit your resume for that first career-starting job.

Purpose of this Post

First of all, I’m going to tell you my motivation for creating these posts.  In other words: “What’s in it for Frank.”  I’ve been programming since 1978 and I’ve been employed as a software engineer/developer since 1994.  One of my tasks as a seasoned developer is to review submitted programming tests, create programming tests, read resumes, submit recommendations, interview potential developers, etc.  By examining the code submitted for a programming test, I can tell a lot about the person applying for a job.  I can tell how sophisticated they are.  I can tell if they are just faking it (i.e. they just Googled results, made it work and don’t really understand what they are doing).  I can tell if the person is interested in the job or not.  One of the trends that I see is that there is a large gap between what is taught in colleges and what is needed by a company.  That gap has been increasing for years.  I would like to close the gap, but it’s a monstrous job.  So YOU, the person reading this blog who really wants a good job, must do a little bit of your own legwork.  Do I have your attention?  Then read on…

What YOU Need to Do

First, go to my previous blog post on this subject and take notes on the following sections:

  • Practice
  • Web Presence

For those who are still in school and will not graduate for a few more semesters, start doing this now:

  • Programming competitions
  • Resume Workshops
  • Did I mention: Web Presence?

You’ll need to decide which track you’re going to follow and try to gain deep knowledge in that area.  Don’t go out and learn a hundred different frameworks, twenty databases and two dozen in-vogue languages.  Stick to something that is in demand and narrow your focus enough that you can gain genuinely useful knowledge.  Ultimately you’ll need to understand a subject well enough to make something work.  You’re not going to be an expert; that takes years of practice and a few failures.  If you can learn a subject well enough to speak about it, then you’re light-years ahead of the average newly minted BS graduate.

Now it’s time for the specifics.  You need to decide if you’re going to be a Unix person or a .Net person.  I’ve done both and you can cross over.  It’s not easy to cross over, but I’m proof that it can happen.  If you survive and somehow end up programming as long as I have, then you’ll have experience with both.  Your experience will not be even between the two sides.  It will be weighted toward one end or the other.  In my case, my experience is weighted toward .Net because that is the technology that I have been working on more recently.

If you’re in the Unix track, I’m probably not the subject expert on which technologies you need to follow.  Python, Ruby, frameworks, unit testing: you’ll need to read up and figure out what is in demand.  I would scan job sites such as Glassdoor, Indeed, LinkedIn, Stack Exchange or any other job sites just to see what is in demand.  Look for entry-level software developer positions.  Ignore the pay or what they are required to do and just take a quick tally of how many companies are asking for Python, PHP, Ruby, etc.  Then focus on some of those.

If you’re in the .Net track, I can tell you exactly what you need to get a great paying job.  First, you’re going to need to learn C#.  That is THE language of .Net.  Don’t let anybody tell you otherwise.  Your college taught you Java?  No problem, your language knowledge is already 99% there.  Go to Microsoft’s website and download the free version of Visual Studio (the community version) and install it.  Next, you’ll need a database and that is going to be MS SQL Server.  Don’t bother with MS Access.  There is a free version of SQL Server as well.  In fact the developer version is fully functional, but you probably don’t need to download and install that.  When you install Visual Studio, the Express version of SQL Server is normally installed with it.  You can gain real database knowledge from that version.

Follow this list:

  • Install Visual Studio Community.
  • Check for a pre-installed version of MS SQL Server Express.
  • Go out and sign up for a GitHub account.  Go ahead, I’ll wait (click here).
  • Download and install SourceTree (click here).

Now you have the minimum tools to build your knowledge.  Here’s a list of what you need to learn, using those tools:

  • How to program in C# using a simple console application.
  • How to create simple unit tests.
  • Create an MVC website, starting with the template site.
  • How to create tables in MS SQL Server.
  • How to insert, delete, update and select data in MS SQL Server.
  • How to create POCOs, fluent mappings and a database context in C# (see the sketch after this list).
  • How to troubleshoot a website or API (learn some basic IIS knowledge).
  • How to create a repository on GitHub.
  • How to check-in your code to GitHub using SourceTree.

That would pretty much do it.  The list above will take about a month of easy work or maybe a hard-driven weekend.  If you can perform these tasks and talk intelligently about them, you’ll have the ability to walk into a job.  In order to seal the deal, you’ll have to make sure this information is correctly presented on your resume.  So how should you do that?

First, make sure you polish your projects: remove any commented-out code and any unused or dead code.  If there are tricky areas, put in some comments.  Make sure you update your “Read-Me” file on GitHub for each of your projects.  Put your GitHub URL near the top of your resume.  If I see a programmer with a URL to a GitHub account, that programmer has already earned some points in my informal scale of who gets the job.  I usually stop reading the resume and go right to the GitHub account and browse their software.  If you work on a project for some time, you can check-in your changes as you progress.  This is nice for me to look at, because I can see how much effort you are putting into your software.  If I check the history and I see the first check-in was just a blank solution followed by several check-ins that show the code being refactored and re-worked into a final project, I’m going to be impressed.  That tells me that you’re conscientious enough to get your code checked in and protected from loss immediately.  Don’t wait for the final release.  Building software is a lot like producing sausage.  The process is messy, but the final product is good (assuming you like sausage).

If you really want to impress me and, by extension, any seasoned programmer, create a technical blog.  Your blog can be somewhat informal, but you need to make sure you express your knowledge of the subject.  A blog can be used as a tool to secure a job.  It doesn’t have to get a million hits a day to be successful.  In fact, if your blog only receives hits from companies that are reading your resume, it’s a success.  You see, the problem with the resume is that it doesn’t allow me to see into your head.  It’s just a sheet of paper (or more if you have a job history) with the bare minimum information on it.  It’s usually just enough to get you past the HR department.  In the “olden” days, when resumes were mailed with a cover letter, the rule was one page.  Managers did not have time to read novels, so they wanted the potential employee to narrow down their knowledge to one page.  Sort of a summary of who you are in the working world.  This piece of paper is compared against a dozen or hundreds of other single-page resumes to determine which handful of people will be called in to be interviewed.  Interviews take a lot of time, so the resume reading needs to be quick.  That has changed over the years and the old rules don’t apply to the software industry as a whole.  Even though technical resumes can go on for two or more pages, the one-page resume still applies to new graduates.  If you are sending in a resume that I might pick up and read, I don’t want to see that you worked at a Walmart check-out counter for three years, followed by a gig at the car wash.  If you had an internship at a tech company where you got some hands-on programming experience, I want to see that.  If you got an internship at Google filing paperwork, I don’t care.

Back to the blog.  What would I blog about if I wanted to impress a seasoned programmer?  Just blog about your experience with the projects you are working on.  It can be as simple as “My First MVC Project”, with a diary format like this:

Day 1, I created a template MVC project and started digging around.  Next I modified some text in the “View” to see what would happen.  Then I started experimenting with the ViewBag object.  That’s an interesting little object.  It allowed me to pass data between the controller and the view.

And so on…
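If you want to drop a snippet into a diary entry like that, it can be tiny.  Here’s a minimal sketch of the ViewBag idea, assuming a standard ASP.NET MVC 5 controller and Razor view (the message text is made up):

using System.Web.Mvc;

// HomeController.cs -- the controller drops a value into the ViewBag.
public class HomeController : Controller
{
    public ActionResult Index()
    {
        ViewBag.Message = "Hello from the controller!";
        return View();
    }
}

// Index.cshtml -- the Razor view reads the same ViewBag property:
// <h2>@ViewBag.Message</h2>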

Show that you did some research on the subject.  Expand your knowledge by adding a feature to your application.  Minor features like search, column sort and page indexing are important.  It demonstrates that you can take an existing program and extend it to do more.  When you enter the working world, 99% of what you will create will be a feature added to code you never wrote.  If your blog demonstrates that you can extend existing code, even code you wrote yourself, I’ll be impressed.

Taking the Test

Somewhere down the line, you’re going to generate some interest.  There will be a company out there that will want to start the process and the next step is the test.  Most companies require a programming test.  At this point in my career the programming test is just an annoyance.  Let’s call it a formality.  As a new and inexperienced programmer, the test is a must.  It will be used to determine if you’re worth someone’s time to interview.  Now I’ve taken many programming tests and I’ve been involved in designing and testing many different types of programming tests.  The first thing you need to realize is that different companies have different ideas about what to test.  If it was up to me, I would want to test your problem solving skills.  Unfortunately, it’s difficult to test for that skill without forcing you to take some sort of test that may ask for knowledge in a subject that you don’t have.  I’ve seen tests that allow the potential hire to use any language they want.  I’ve also seen tests that give very vague specifications and are rated according to how creative the solution is.  So here are some pointers for passing the test:

  • If it’s a timed test, try to educate yourself on the subjects you know will be on the test before it starts.
  • If it’s not a timed test, spend extra time on it.  Make it look like you spent some time to get it right.
  • Keep your code clean.
    • No “TODO” comments.
    • No commented code or dead-code.
    • Don’t leave code that is not used (another description for dead-code).
    • Follow the naming convention standards, no cryptic variable names (click here or here for examples).
  • If there is extra credit, do it.  The dirty secret is that this is a trick to see if you’re going to be the person who does just the minimum, or you go the extra mile.
  • Don’t get too fancy.
    • Don’t show off your knowledge by manually coding a B-tree structure instead of using the built-in “.Sort()” method.
    • Don’t perform something that is obscure just to look clever.
    • Keep your program as small as possible.
    • Don’t add any “extra” features that are not called for in the specification (unless the instructions specifically tell you to be creative).

When you’re a student in college, you are required to analyze algorithms and decide which is more efficient in terms of memory use and CPU speed.  In the working world, you are required to build a product that must be delivered in a timely manner.  Does it matter if you use the fastest algorithm?  It might not really matter.  It will not make a difference if you can’t deliver a working product on time just because you spent a large amount of your development time on a section of code that is only built for the purpose of making the product a fraction faster.  Many companies will need a product delivered that works.  Code can be enhanced later.  Keep that in mind when you’re taking your programming test.  Your program should be easy to follow so another programmer can quickly enhance it or repair bugs.

The Interview

For your interview, keep it simple.  You should study up on general terms, in case you’re asked.  Make sure you understand these terms:

  • Dependency injection
  • Polymorphism
  • Encapsulation
  • Single purpose
  • Model/View/Controller
  • REST
  • Base Class
  • Private/public methods/classes
  • Getters/Setters
  • Interface
  • Method overloading

Here’s a great place to do your study (click here).  These are very basic concepts and you should have learned them in one of your object oriented programming classes.  Just make sure you haven’t forgotten about them.  Make sure you understand the concepts that you learned from any projects that you checked into GitHub.  If you learned some unit testing, study the terms.  Don’t try to act like an expert for your first interview.  Just admit the knowledge that you have.  If I interview you and you have nothing more than a simple understanding of unit testing, I’m OK with that.  All it means is that there is a base-line of knowledge that you can build on with my help.

Wear a suit, unless explicitly specified that you don’t need one.  At a minimum, you need to dress one step better than the company dress policy.  I’m one of the few who can walk into an interview with red shoes, jeans and a technology T-shirt and get a job.  Even though I can get away with such a crazy stunt, I usually show up in a really nice suit.  To be honest, I only show up in rags when I’m lukewarm about a job and I expect to be wooed.  If I really like the company, I look sharp.  The interviewers can tell me to take off my tie if they think I’m too stuffy.  If you’re interviewing for your first job, wear a business suit.

Don’t BS your way through the interview.  If you don’t know something, just admit it.  I ask all kinds of questions to potential new hires just to see “if” by chance they know a subject.  I don’t necessarily expect the person to know the subject and it will not have a lot of bearing on the acceptance or rejection of the person being interviewed.  Sometimes I do it just to find out what their personality is like.  If you admit that you know SQL and how to write a query, I’m going to hand you a dry-erase marker and make you write a query to join two tables together.  If you pass that, I’m going to give you a hint that I want all records from the parent table to show up even if they have no child records.  If you don’t know how to do a left outer join, I’m not going to hold it against you.  If you are able to write a correct or almost correct left join, I’ll be impressed.  If you start performing a union query or try to fake it with a wild guess, I’ll know you don’t know.  I don’t want you to get a lucky guess.  I’m just trying to find out how much I’m going to have to teach you after you’re hired.  Don’t assume that another candidate is going to get the job over you just because they know how to do a left outer join.  That other candidate might not impress me in other ways that are more important.  Just do the best you can and be honest about it.
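On the whiteboard that query would be plain SQL, but since the examples in this post are C#, here is the same left-outer-join idea expressed in LINQ over two hypothetical in-memory tables.  The point is the behavior: every parent row appears, even when it has no children:

using System;
using System.Linq;

public class Department { public int Id; public string Name; }
public class Employee { public int DeptId; public string Name; }

public class LeftJoinExample
{
    public static void Main()
    {
        var departments = new[]
        {
            new Department { Id = 1, Name = "Sales" },
            new Department { Id = 2, Name = "IT" }
        };
        var employees = new[]
        {
            new Employee { DeptId = 1, Name = "Alice" }
        };

        // join ... into + DefaultIfEmpty() is the LINQ left outer join pattern.
        var result = from d in departments
                     join e in employees on d.Id equals e.DeptId into grp
                     from e in grp.DefaultIfEmpty()
                     select new { Department = d.Name, Employee = e == null ? "(none)" : e.Name };

        foreach (var row in result)
        {
            Console.WriteLine(row.Department + " - " + row.Employee);
        }
        // Output:
        //   Sales - Alice
        //   IT - (none)      <-- IT still shows up with no employees
    }
}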

Don’t worry about being nervous.  I’m still nervous when I go in for an interview and I really have no reason to be.  It’s natural.  Don’t be insulted if the interviewer dismisses you because they don’t think you’ll be a fit for their company.  You might be a fit and their interview process is lousy.  Of course, you might not be a fit.  The interviewer knows the company culture and they know the type of personality they are looking for.  There are no hard-and-fast rules for what an interviewer is looking for.  Every person who performs an interview is different.  Every company is different.

What Does a Career in Software Engineering Look Like?

This is where I adjust your working world expectations.  This will give you a leg-up on what you should focus on as you work your first job and gain experience.  Here’s the general list:

  • Keep up on the technologies.
  • Always strive to improve your skills.
  • Don’t be afraid of a technology you’ve never encountered.

Eventually you’re going to get that first job.  You’ll get comfortable with the work environment and you’ll be so good at understanding the software you’ve been working on that you won’t realize the world of computer programming has gone off on a different track.  You won’t be able to keep up on everything, but you should be able to recognize a paradigm shift when it happens.  Read.  Read a lot.  I’m talking about blogs, tech articles, whatever interests you.  If your company is having issues with the design process, do some research.  I learned unit testing because my company at the time had a software quality issue from the lack of regression testing.  The company was small and we didn’t have QA people to perform manual regression testing, so bugs kept appearing in subsystems that were not under construction.  Unit testing solved that problem.  It was difficult to learn how to do unit testing correctly.  It was difficult to apply unit testing after the software was already built.  It was difficult to break the dependencies that were created from years of adding enhancements to the company software.  Ultimately, the software was never 100% unit tested (if I remember correctly, it was around 10% when I left the company), but the unit tests that were applied had a positive effect.  When unit tests are used while the software is being developed, they are very effective.  Now that the IOC container is mainstream, dependencies are easy to break and unit tests are second nature.  Don’t get complacent about your knowledge.  I have recently interviewed individuals who have little to no unit testing experience and they have worked in the software field for years.  Now they have to play catch-up, because unit testing is a requirement, not an option.  Any company not unit testing their software is headed for bankruptcy.

APIs are another paradigm.  This falls under system architecture paradigms like SOA and Microservices.  The monolithic application is dying a long and slow death.  Good riddance.  Large applications are difficult to maintain.  They are slow to deploy.  Dependencies are usually everywhere.  Breaking a system into smaller chunks (called APIs) can ease the deployment and maintenance of your software.  This shift from monolithic design to APIs started to occur years ago.  I’m still stunned at the number of programmers that have zero knowledge of the subject.  If you’ve read my blog, you’ll know that I’m a big fan of APIs.  I have a lot of experience designing, debugging and deploying APIs.

I hope I was able to help you out.  I want to see more applicants that are qualified to work in the industry.  There’s a shortage of software developers who can do the job and that problem is getting worse every year.  The job market for seasoned developers is really good, but the working world is tough because there is a serious shortage of knowledgeable programmers.  Every company I’ve worked for has a difficult time filling a software developer position and I don’t see that changing any time in the near future.  That doesn’t mean that there is a shortage of Computer Science graduates each year.  What it means is that there are still too many people graduating with a degree who just don’t measure up.  Don’t be that person.

Now get started on that blog!

 


The Case for Unit Tests

Introduction

I’ve written a lot of posts on how to unit test, break dependencies, mock objects, create fakes, use dependency injection and work with IOC containers.  I am a huge advocate of writing unit tests.  Unit tests are not the solution to everything, but they do solve a large number of problems that occur in software that is not unit tested.  In this post, I’m going to build a case for unit testing.

Purpose of Unit Tests

First, I’m going to assume that the person reading this post is not sold on the idea of unit tests.  So let me start by defining what a unit test is and what is not a unit test.  Then I’ll move on to defining the process of unit testing and how unit tests can save developers a lot of time.

A unit test is a tiny, simple test on a method or logic element in your software.  The goal is to create a test for each logical purpose that your code performs.  For a given “feature” you might have a hundred unit tests (more or less, depending on how complex the feature is).  For a method, you could have one, a dozen or hundreds of unit tests.  You’ll need to make sure you can cover different cases that can occur for the inputs to your methods and test for the appropriate outputs.  Here’s a list of what you should unit test:

  • Fence-post inputs.
  • Obtain full code coverage.
  • Nullable inputs.
  • Zero or empty string inputs.
  • Illegal inputs.
  • Representative set of legal inputs.

Let me explain what all of this means.  Fence-post inputs are dependent on the input data type.  If you are expecting an integer, what happens when you input a zero?  What about the maximum possible integer (int.MaxValue)?  What about minimum integer (int.MinValue)?

Obtain full coverage means that you want to make sure you hit all the code that is inside your “if” statements as well as the “else” portion.  Here’s an example of a method:

public class MyClass
{
    public int MyMethod(int input1)
    {
        if (input1 == 0)
        {
            return 4;
        }
        else if (input1 > 0)
        {
            return 2;
        }
        return input1;
    }
}

How many unit tests would you need to cover all the code in this method?  You would need three:

  1. Test with input1 = 0, that will cover the code up to the “return 4;”
  2. Test with input = 1 or greater, that will cover the code to “return 2;”
  3. Test with input = -1 or less, that will cover the final “return input1;” line of code.

That will get you full coverage.  In addition to those three tests, you should account for min and max int values.  This is a trivial example, so min and max tests are overkill.  For larger code you might want to make sure that someone doesn’t break your code by changing the input data type.  Anyone changing the data type from int to something else would get failed unit tests that will indicate that they need to review the code changes they are performing and either fix the code or update the unit tests to provide coverage for the redefined input type.
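Here’s what those three tests might look like in MSTest, written against the MyClass example above (a sketch; the test names are my own):

using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class MyClassTests
{
    [TestMethod]
    public void my_method_returns_four_when_input_is_zero()
    {
        var myObject = new MyClass();
        Assert.AreEqual(4, myObject.MyMethod(0));
    }

    [TestMethod]
    public void my_method_returns_two_when_input_is_positive()
    {
        var myObject = new MyClass();
        Assert.AreEqual(2, myObject.MyMethod(1));
    }

    [TestMethod]
    public void my_method_returns_input_when_input_is_negative()
    {
        var myObject = new MyClass();
        Assert.AreEqual(-1, myObject.MyMethod(-1));
    }
}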

Nullable data can be a real problem.  Many programmers don’t account for all null inputs.  If you are using an input type that can have null data, then you need to account for what will happen to your code when it receives that input type.

The number zero can have bad consequences.  If someone adds code and the input is in the denominator, then you’ll get a divide by zero error, and you should catch that problem before your code crashes.  Even if you are not performing a divide, you should probably test for zero, to protect a future programmer from adding code to divide and cause an error.  You don’t necessarily have to provide code in your method to handle zero.  The example above just returns the number 4.  But, if you setup a unit test with a zero for an input, and you know what to expect as your output, then that will suffice.  Any future programmer that adds a divide with that integer and doesn’t catch the zero will get a nasty surprise when they execute the unit tests.

If your method allows input data types like “string”, then you should check for illegal characters.  Does your method handle carriage returns?  Unprintable characters?  What about an empty string?  Strings can be null as well.

Don’t forget to test for your legal data.  The three tests in the previous example test for three different legal inputs.

Fixing Bugs

The process of creating unit tests should occur as you are creating objects.  In fact, you should constantly think in terms of how you’re going to unit test your object before you start writing it.  Creating software is a lot like a sausage factory, and even I sometimes write objects before unit tests as well as the other way around.  I prefer to create an empty object and some proposed methods that I’ll be creating.  Just a small shell with maybe one or two methods that I want to start with.  Then I’ll think up unit tests that I’ll need ahead of time.  Then I add some code and that might trigger a thought for another unit test.  The unit tests go with the code that you are writing and it’s much easier to write the unit tests before or just after you create a small piece of code.  That’s because the code you just created is fresh in your mind and you know what it’s supposed to do.

Now you have a monster that was created over several sprints.  Thousands of lines of code and four hundred unit tests.  You deploy your code to a Quality environment and a QA person discovers a bug.  Something you would have never thought about, but it’s an easy fix.  Yeah, it was something stupid, and the fix will take about two seconds and you’re done!

Not so fast!  If you find a bug, create a unit test first.  Make sure the unit test triggers the bug.  If this is something that blew up one of your objects, then you need to create one or more unit tests that feeds the same input into your object and forces it to blow up.  Then fix the bug.  The unit test(s) should pass.

Now why did we bother?  If you’re a seasoned developer like me, you’ve seen numerous times when another developer un-fixes your bug fix.  It happens so often that I’m never surprised when it does.  Maybe your fix caused an unreported side effect, and another developer quietly resolves that issue by undoing your fix, not realizing that they are re-introducing the original bug.  If you put a unit test in to account for a bug, then a developer that un-fixes the bug will get an error from your unit test.  If your unit test is named descriptively, then that developer will realize that he/she is doing something wrong.  This episode just performed a regression test on your object.

Building Unit Tests is Hard!

At first unit tests are difficult to build.  The problem with unit testing has more to do with object dependency than with the idea of unit testing.  First, you need to learn how to write code that isn’t tightly coupled.  You can do this by using an IOC container.  In fact, if you’re not using an IOC container, then you’re just writing legacy code.  Somewhere down the line, some poor developer is going to have to “fix” your code so that they can create unit tests.

The next most difficult concept to overcome is learning how to mock or fake an object that is not being unit tested.  These can be external dependencies, like database access, file I/O, SMTP drivers, etc.  For these, learn how to use interfaces and wrappers.  Then you can use Moq to mock those dependencies in your unit tests.

Unit Tests are Small

You need to be conscious of what you are unit testing.  Don’t create a unit test that checks a whole string of objects at once (unless you want to consider those as integration tests).  Limit your unit tests to the smallest amount of code you need in order to test your functionality.  No need to be fancy.  Just simple.  Your unit tests should run fast.  Many slow running unit tests bring no benefit to the quality of your product.  Developers will avoid running unit tests if it takes 10 minutes to run them all.  If your unit tests are taking too long to run, you’ll need to analyze what should be scaled back.  Maybe your program is too large and should be broken into smaller pieces (like APIs).

There are other reasons to keep your unit tests small and simple: some day one or more unit tests are going to fail.  The developer modifying code will need to look at the failing unit test and analyze what it is testing.  The quicker a developer can analyze and determine what is being tested, the quicker he/she can fix the bug that was caused, or update the unit test for the new functionality.  A philosophy of keeping code small should translate into your entire programming work pattern.  Keep your methods small as well.  That will keep your code from being nested too deep.  Make sure your methods serve a single purpose.  That will make unit testing easier.

A unit test should only test the methods of one object.  The only time you’ll break other objects’ unit tests is if you change your object’s public methods or their parameters.  If you change something in a private method, only the unit tests for the object you’re working on will fail.

Run Unit Tests Often

For a continuous integration environment, your unit tests should run right after you build.  If you have a build server (and you should), your build server must run the unit tests.  If your tests do not pass, then the build needs to be marked as broken.  If you only run your unit tests after you end your sprint, then you’re going to be in for a nasty surprise when hundreds of unit tests fail and you need to spend days trying to fix all the problems.  Your programming pattern should be: type some code, build, test, repeat.  If you test after each build, then you’ll catch mistakes as you make them.  Your failing unit tests will be minimal and you can fix your problem while you are focused on the logic that caused the failure.

Learning to Unit Test

There are a lot of resources on the Internet for the subject of unit testing.  I have written many blog posts on the subject that you can study by clicking on the following links:

 

Mocking Your File System

Introduction

In this post, I’m going to talk about basic dependency injection and mocking a method that is used to access hardware.  The method I’ll be mocking is the System.IO.Directory.Exists().

Mocking Methods

One of the biggest headaches with unit testing is that you have to make sure you mock any objects that your method under test is calling.  Otherwise your test results could be dependent on something you’re not really testing.  As an example for this blog post, I will show how to apply unit tests to this very simple program:

class Program
{
    static void Main(string[] args)
    {
        var myObject = new MyClass();
        Console.WriteLine(myObject.MyMethod());
        Console.ReadKey();
    }
}

The object that is used above is:

public class MyClass
{
    public int MyMethod()
    {
        if (System.IO.Directory.Exists("c:\\temp"))
        {
            return 3;
        }
        return 5;
    }
}

Now, we want to create two unit tests to cover all the code in the MyMethod() method.  Here’s an attempt at one unit test:

[TestMethod]
public void test_temp_directory_exists()
{
    var myObject = new MyClass();
    Assert.AreEqual(3, myObject.MyMethod());
}

The problem with this unit test is that it will pass if your computer contains the c:\temp directory.  If your computer doesn’t contain c:\temp, then it will always fail.  If you’re using a continuous integration environment, you can’t control whether the directory exists or not.  To compound the problem, you really need to test both possibilities to get full test coverage of your method.  Adding a unit test to your test suite to cover the case where c:\temp does not exist would guarantee that one test passes and the other fails.

The newcomer to unit testing might think: “I could just add code to my unit tests to create or delete that directory before the test runs!”  Except, that would be a unit test that modifies your machine.  The behavior would destroy anything you have in your c:\temp directory if you happen to use that directory for something.  Unit tests should not modify anything outside the unit test itself.  A unit test should never modify database data.  A unit test should not modify files on your system.  You should avoid creating physical files if possible, even temp files because temp file usage will make your unit tests slower.

Unfortunately, you can’t just mock System.IO.Directory.Exists().  The way to get around this is to create a wrapper object, then inject the object into MyClass and then you can use Moq to mock your wrapper object to be used for unit testing only.  Your program will not change, it will still call MyClass as before.  Here’s the wrapper object and an interface to go with it:

public class FileSystem : IFileSystem
{
  public bool DirectoryExists(string directoryName)
  {
    return System.IO.Directory.Exists(directoryName);
  }
}

public interface IFileSystem
{
    bool DirectoryExists(string directoryName);
}

Your next step is to provide an injection point into your existing class (MyClass).  You can do this by creating two constructors, the default constructor that initializes this object for use by your method and a constructor that expects a parameter of IFileSystem.  The constructor with the IFileSystem parameter will only be used by your unit test.  That is where you will pass along a mocked version of your filesystem object with known return values.  Here are the modifications to the MyClass object:

public class MyClass
{
    private readonly IFileSystem _fileSystem;

    public MyClass(IFileSystem fileSystem)
    {
        _fileSystem = fileSystem;
    }

    public MyClass()
    {
        _fileSystem = new FileSystem();
    }

    public int MyMethod()
    {
        if (_fileSystem.DirectoryExists("c:\\temp"))
        {
            return 3;
        }
        return 5;
    }
}

This is the point where your program should operate as normal.  Notice how I did not need to modify the original call to MyClass that occurred at the “Main()” of the program.  The MyClass() object will create a FileSystem wrapper instance and use that object instead of calling System.IO.Directory.Exists() directly.  The result will be the same.  The difference is that now, you can create two unit tests with mocked versions of IFileSystem in order to test both possible outcomes of the existence of “c:\temp”.  Here is an example of the two unit tests:

[TestMethod]
public void test_temp_directory_exists()
{
    var mockFileSystem = new Mock<IFileSystem>();
    mockFileSystem.Setup(x => x.DirectoryExists("c:\\temp")).Returns(true);

    var myObject = new MyClass(mockFileSystem.Object);
    Assert.AreEqual(3, myObject.MyMethod());
}

[TestMethod]
public void test_temp_directory_missing()
{
    var mockFileSystem = new Mock<IFileSystem>();
    mockFileSystem.Setup(x => x.DirectoryExists("c:\\temp")).Returns(false);

    var myObject = new MyClass(mockFileSystem.Object);
    Assert.AreEqual(5, myObject.MyMethod());
}

Make sure you include the NuGet package for Moq.  You’ll notice that in the first unit test, we’re testing MyClass with a mocked up version of a system where “c:\temp” exists.  In the second unit test, the mock returns false for the directory exists check.

One thing to note: you must provide a matching input on x.DirectoryExists() in the mock setup.  If it doesn’t match what is used in the method, then you will not get the results you expect.  In this example, the directory being checked is hard-coded in the method and we know that it is “c:\temp”, so that’s how I mocked it.  If there is a parameter that is passed into the method, then you can mock some test value and pass that same test value into your method so they match (the actual test value doesn’t matter for the unit test, only that the setup and the call agree).
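For example, if MyMethod() were changed to accept the directory name as a parameter (a hypothetical variation of the class above), the setup value and the value passed to the method would need to match, or you could tell Moq to match any string:

[TestMethod]
public void test_directory_parameter_exists()
{
    var mockFileSystem = new Mock<IFileSystem>();

    // The setup value and the value passed to the method must match...
    mockFileSystem.Setup(x => x.DirectoryExists("c:\\anyTestPath")).Returns(true);

    // ...or use It.IsAny<string>() when the exact value doesn't matter:
    // mockFileSystem.Setup(x => x.DirectoryExists(It.IsAny<string>())).Returns(true);

    var myObject = new MyClass(mockFileSystem.Object);

    // Hypothetical overload of MyMethod() that takes the directory name.
    Assert.AreEqual(3, myObject.MyMethod("c:\\anyTestPath"));
}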

Using an IOC Container

This sample is set up to be extremely simple.  I’m assuming that you have existing .Net legacy code and you’re attempting to add unit tests to the code.  Normally, legacy code is hopelessly un-unit-testable.  In other words, it’s usually not worth the effort to apply unit tests because of the tightly coupled nature of legacy code.  There are situations where it’s not too difficult to add unit tests to legacy code.  This can occur if the code is relatively new and the developer(s) took some care in how they built it.  If you are building new code, you can use this same technique from the beginning, but you should also plan your entire project to use an IOC container.  I would not recommend refactoring an existing project to use an IOC container.  That is a level of madness that I have attempted more than once, with many man-hours of wasted time trying to figure out what is wrong with the scoping of my objects.

If your code is relatively new and you have refactored to use constructors as your injection points, you might be able to adapt to an IOC container.  If you are building your code from the ground up, you need to use an IOC container.  Do it now and save yourself the headache of trying to figure out how to inject objects three levels deep.  What am I talking about?  Here’s an example of a program that is tightly coupled:

class Program
{
    static void Main(string[] args)
    {
        var myRootClass = new MyRootClass();

        myRootClass.Increment();

        Console.WriteLine(myRootClass.CountExceeded());
        Console.ReadKey();
    }
}
public class MyRootClass
{
  readonly ChildClass _childClass = new ChildClass();

  public bool CountExceeded()
  {
    if (_childClass.TotalNumbers() > 5)
    {
        return true;
    }
    return false;
  }

  public void Increment()
  {
    _childClass.IncrementIfTempDirectoryExists();
  }
}

public class ChildClass
{
    private int _myNumber;

    public int TotalNumbers()
    {
        return _myNumber;
    }

    public void IncrementIfTempDirectoryExists()
    {
        if (System.IO.Directory.Exists("c:\\temp"))
        {
            _myNumber++;
        }
    }

    public void Clear()
    {
        _myNumber = 0;
    }
}

The example code above is very typical legacy code.  The “Main()” calls the first object, “MyRootClass”, and that object calls a child class that uses System.IO.Directory.Exists().  You can use the previous example to unit test ChildClass for the cases when c:\temp exists and when it doesn’t.  When you start to unit test MyRootClass, there’s a nasty surprise.  How do you inject your directory wrapper into that class?  If you have to inject class wrappers and mocked versions of every child class, the constructor of a class could become incredibly large.  This is where IOC containers come to the rescue.

As I’ve explained in other blog posts, an IOC container is like a dictionary of your objects.  When you create your objects, you must create a matching interface for each object.  The index of the IOC dictionary is the interface name that represents your object.  Then you only reference other objects using the interface as your data type and ask the IOC container for the object that is in the dictionary.  I’m going to make up a simple IOC container object just for demonstration purposes.  Do not use this in your code; use something like AutoFac for your IOC container.  This sample is just to show the concept of how it all works.  Here’s the container object:

public class IOCContainer
{
  private static readonly Dictionary<string,object> ClassList = new Dictionary<string, object>();
  private static IOCContainer _instance;

  public static IOCContainer Instance => _instance ?? (_instance = new IOCContainer());

  public void AddObject<T>(string interfaceName, T theObject)
  {
    ClassList.Add(interfaceName,theObject);
  }

  public object GetObject(string interfaceName)
  {
    return ClassList[interfaceName];
  }

  public void Clear()
  {
    ClassList.Clear();
  }
}

This object is a singleton (global object) so that it can be used by any object in your project/solution.  Basically, it’s a container that holds references to all of your object instances.  This is a very simple example, so I’m going to ignore scoping for now.  I’m going to assume that all your objects contain no special dependent initialization code.  In a real-world example, you’ll have to analyze what is initialized when your objects are created and determine how to set up the scoping in the IOC container.  AutoFac has options for when each object will be created.  This example creates all the objects before the program starts to execute.  There are many reasons why you might not want to create an object until it’s actually used.  Keep that in mind when you are looking at this simple example program.
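For reference, here is a rough sketch of what those creation options look like in AutoFac (this uses the constructor-injected versions of the classes shown later in the AutoFac section; the lifetime methods are AutoFac’s real API, but the scope chosen for each class here is just for illustration):

var builder = new ContainerBuilder();

// One shared instance for the whole application (similar to the singleton container above).
builder.RegisterType<FileSystem>().As<IFileSystem>().SingleInstance();

// A brand new instance every time IChildClass is resolved.
builder.RegisterType<ChildClass>().As<IChildClass>().InstancePerDependency();

// One instance per lifetime scope (for example, per web request).
builder.RegisterType<MyRootClass>().As<IMyRootClass>().InstancePerLifetimeScope();

var container = builder.Build();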

In order to use the above container, we’ll need the same FileSystem object and interface from the previous program.  Then create an interface for MyRootClass and ChildClass.  Next, you’ll need to go through your program and find every location where an object is instantiated (look for the “new” keyword).  Replace those instances like this:

public class ChildClass : IChildClass
{
    private int _myNumber;
    private readonly IFileSystem _fileSystem = (IFileSystem)IOCContainer.Instance.GetObject("IFileSystem");

    public int TotalNumbers()
    {
        return _myNumber;
    }

    public void IncrementIfTempDirectoryExists()
    {
        if (_fileSystem.DirectoryExists("c:\\temp"))
        {
            _myNumber++;
        }
    }

    public void Clear()
    {
        _myNumber = 0;
    }
}

Instead of creating a new instance of FileSystem, you’ll ask the IOC container to give you the instance that was created for the interface called IFileSystem.  Notice how there is no injection in this object.  AutoFac and other IOC containers have facilities to perform constructor injection automatically.  I don’t want to introduce that level of complexity in this example, so for now I’ll just pretend that we need to go to the IOC container object directly for the main program as well as the unit tests.  You should be able to see the pattern from this example.

Once all your classes are updated to use the IOC container, you’ll need to change your “Main()” to setup the container.  I changed the Main() method like this:

static void Main(string[] args)
{
    ContainerSetup();

    var myRootClass = (IMyRootClass)IOCContainer.Instance.GetObject("IMyRootClass");
    myRootClass.Increment();

    Console.WriteLine(myRootClass.CountExceeded());
    Console.ReadKey();
}

private static void ContainerSetup()
{
    IOCContainer.Instance.AddObject<IChildClass>("IChildClass",new ChildClass());
    IOCContainer.Instance.AddObject<IMyRootClass>("IMyRootClass",new MyRootClass());
    IOCContainer.Instance.AddObject<IFileSystem>("IFileSystem", new FileSystem());
}

Technically the MyRootClass object does not need to be included in the IOC container since no other object is dependent on it.  I included it to demonstrate that all objects should be inserted into the IOC container and referenced from the instance in the container.  This is the design pattern used by IOC containers.  Now we can write the following unit tests:

[TestMethod]
public void test_temp_directory_exists()
{
    var mockFileSystem = new Mock<IFileSystem>();
    mockFileSystem.Setup(x => x.DirectoryExists("c:\\temp")).Returns(true);

    IOCContainer.Instance.Clear();
    IOCContainer.Instance.AddObject("IFileSystem", mockFileSystem.Object);

    var myObject = new ChildClass();
    myObject.IncrementIfTempDirectoryExists();
    Assert.AreEqual(1, myObject.TotalNumbers());
}

[TestMethod]
public void test_temp_directory_missing()
{
    var mockFileSystem = new Mock<IFileSystem>();
    mockFileSystem.Setup(x => x.DirectoryExists("c:\\temp")).Returns(false);

    IOCContainer.Instance.Clear();
    IOCContainer.Instance.AddObject("IFileSystem", mockFileSystem.Object);

    var myObject = new ChildClass();
    myObject.IncrementIfTempDirectoryExists();
    Assert.AreEqual(0, myObject.TotalNumbers());
}

[TestMethod]
public void test_root_count_exceeded_true()
{
    var mockChildClass = new Mock<IChildClass>();
    mockChildClass.Setup(x => x.TotalNumbers()).Returns(12);

    IOCContainer.Instance.Clear();
    IOCContainer.Instance.AddObject("IChildClass", mockChildClass.Object);

    var myObject = new MyRootClass();
    myObject.Increment();
    Assert.AreEqual(true,myObject.CountExceeded());
}

[TestMethod]
public void test_root_count_exceeded_false()
{
    var mockChildClass = new Mock<IChildClass>();
    mockChildClass.Setup(x => x.TotalNumbers()).Returns(1);

    IOCContainer.Instance.Clear();
    IOCContainer.Instance.AddObject("IChildClass", mockChildClass.Object);

    var myObject = new MyRootClass();
    myObject.Increment();
    Assert.AreEqual(false, myObject.CountExceeded());
}

In these unit tests, we put the mocked up object used by the object under test into the IOC container.  I have provided a “Clear()” method to reset the IOC container for the next test.  When you use AutoFac or other IOC containers, you will not need the container object in your unit tests.  That’s because IOC containers like the one built into .Net Core and AutoFac use the constructor of the object to perform injection automatically.  That makes your unit tests easier because you just use the constructor to inject your mocked up object and test your object.  Your program uses the IOC container to magically inject the correct object according to the interface used by your constructor.

Using AutoFac

Take the previous example and create a new constructor for each class and pass the interface as a parameter into the object like this:

private readonly IFileSystem _fileSystem;

public ChildClass(IFileSystem fileSystem)
{
    _fileSystem = fileSystem;
}

Instead of asking the IOC container for the object that matches the interface IFileSystem, I have set up the object to expect the fileSystem object to be passed in as a parameter to the class constructor.  Make this change for each class in your project.  Next, change your main program to include AutoFac (NuGet package) and refactor your IOC container setup to look like this:

static void Main(string[] args)
{
    IOCContainer.Setup();

    using (var myLifetime = IOCContainer.Container.BeginLifetimeScope())
    {
        var myRootClass = myLifetime.Resolve<IMyRootClass>();

        myRootClass.Increment();

        Console.WriteLine(myRootClass.CountExceeded());
        Console.ReadKey();
    }
}

public static class IOCContainer
{
    public static IContainer Container { get; set; }

    public static void Setup()
    {
        var builder = new ContainerBuilder();

        builder.Register(x => new FileSystem())
            .As<IFileSystem>()
            .PropertiesAutowired()
            .SingleInstance();

        builder.Register(x => new ChildClass(x.Resolve<IFileSystem>()))
            .As<IChildClass>()
            .PropertiesAutowired()
            .SingleInstance();

        builder.Register(x => new MyRootClass(x.Resolve<IChildClass>()))
            .As<IMyRootClass>()
            .PropertiesAutowired()
            .SingleInstance();

        Container = builder.Build();
    }
}

I have ordered the builder.Register commands from the innermost to the outermost object classes.  This is not really necessary, since resolution will not occur until the object is requested from the IOC container.  In other words, you can define MyRootClass first, followed by FileSystem and ChildClass, or in any order you want.  The Register command just stores your definition of which concrete object will be represented by each interface and which dependencies it requires.

Now you can clean up your unit tests to look like this:

[TestMethod]
public void test_temp_directory_exists()
{
    var mockFileSystem = new Mock<IFileSystem>();
    mockFileSystem.Setup(x => x.DirectoryExists("c:\\temp")).Returns(true);

    var myObject = new ChildClass(mockFileSystem.Object);
    myObject.IncrementIfTempDirectoryExists();
    Assert.AreEqual(1, myObject.TotalNumbers());
}

[TestMethod]
public void test_temp_directory_missing()
{
    var mockFileSystem = new Mock<IFileSystem>();
    mockFileSystem.Setup(x => x.DirectoryExists("c:\\temp")).Returns(false);

    var myObject = new ChildClass(mockFileSystem.Object);
    myObject.IncrementIfTempDirectoryExists();
    Assert.AreEqual(0, myObject.TotalNumbers());
}

[TestMethod]
public void test_root_count_exceeded_true()
{
    var mockChildClass = new Mock<IChildClass>();
    mockChildClass.Setup(x => x.TotalNumbers()).Returns(12);

    var myObject = new MyRootClass(mockChildClass.Object);
    myObject.Increment();
    Assert.AreEqual(true, myObject.CountExceeded());
}

[TestMethod]
public void test_root_count_exceeded_false()
{
    var mockChildClass = new Mock<IChildClass>();
    mockChildClass.Setup(x => x.TotalNumbers()).Returns(1);

    var myObject = new MyRootClass(mockChildClass.Object);
    myObject.Increment();
    Assert.AreEqual(false, myObject.CountExceeded());
}

Do not include the AutoFac NuGet package in your unit test project.  It’s not needed.  Each object is isolated from all other objects.  You will still need to mock any injected objects, but the injection occurs at the constructor of each object.  All dependencies have been isolated so you can unit test with ease.

Where to Get the Code

As always, I have posted the sample code on my GitHub account.  This post has four different sample projects.  I would encourage you to download each sample and experiment/practice with them.  You can download the samples by following the links listed here:

  1. MockingFileSystem
  2. TightlyCoupledExample
  3. SimpleIOCContainer
  4. AutoFacIOCContainer
 

Vintage Hardware – The IBM 701

The Basics

The IBM 701 computer was one of IBM’s first commercially available computers.  Here are a few facts about the 701:

  • The 701 was introduced in 1953.
  • IBM sold only 30 units.
  • It used approximately 4,000 tubes.
  • Memory was stored on 72 CRT tubes called Williams Tubes with a total capacity of 2,048 words.
  • The word size was 36 bits wide.
  • The CPU contained two programmer-accessible registers: the accumulator and the multiplier/quotient register.

Tubes

There were two types of electronic switches in the early 1950’s: the relay and the vacuum tube.  Relays were unreliable, mechanical, noisy, power-hungry and slow.  Tubes were unreliable and power-hungry.  Compared to the relay, tubes were fast.  One of the limits on how big and complex a computer could be in those days was the reliability of the tube.  Tubes had a failure rate of roughly 0.3 percent per 1,000 hours (in the late 40’s, early 50’s).  That means that in 1,000 hours of use, 0.3 percent of the tubes will fail.  That’s pretty good if you’re talking about a dozen tubes.  With 4,000 tubes, though, that works out to about 12 tube failures per 1,000 hours (0.3% x 4,000 tubes = 12), or an average of one failure roughly every 80 hours, assuming an even distribution of failures.

Tubes use higher voltage levels than semiconductors, which means a lot more heat and more power usage.  Tubes are also much slower than semiconductors.  By today’s standards, tubes are incredibly slow.  Most tube circuit times were measured in microseconds instead of nanoseconds.  A typical add instruction on the IBM 701 took 60 microseconds to complete.

The main memory was composed of Williams tubes.  These are small CRT tubes that store data by exploiting the time it takes phosphor to fade.  Instead of using the visible properties of the phosphor to store data, the beam is increased past a threshold so that secondary emission induces a charge.  A thin conducting plate is mounted to the front of the CRT.  When the beam hits a point where the phosphor is already lit up, no charge is induced on the plate.  This is read as a one.  When the beam hits a place where there was no lit phosphor, a current will flow, indicating a zero.  The read operation causes the bit to be overwritten as a one, so the data must be re-written as it is read.  Wikipedia has an entire article on the Williams tube: Williams tube wiki.  Here’s an example of the dot pattern on a Williams tube:

By National Institute of Standards and Technology – National Institute of Standards and Technology, Public Domain

The tubes that were used for memory storage didn’t have visible phosphor, so a technician could plug a phosphor tube in parallel and see an image like the one pictured above for troubleshooting purposes.

Williams tube memory was called electrostatic storage and it had an access time of 12 microseconds.  Each tube stored 1,024 bits of data for one bit of the data bus.  To form a parallel data bus, 36 tubes together represented the full data path for 1,024 words of data.  The data is stored on the tube in 32 columns by 32 rows.  If you count the columns and rows in the tube pictured above, you’ll see that it is storing 16 by 16, representing a total of 256 bits of data.  The system came with 72 tubes total for a memory size of 2,048 words.  Since the word size is 36 bits wide, 2,048 words is equal to 9,216 bytes of storage.

Here’s a photo of one drawer containing two Williams tubes:

By www.Computerhistory.org

Williams tubes used electromagnets to steer the electron beam to an xy point on the screen.  What this means is that the entire assembly must be carefully shielded to prevent stray magnetic fields from redirecting the beam.  In the picture above you can see the black shielding around each of the long tubes to prevent interference between the two tubes.

In order to address the data from the tube, the addressing circuitry would control the x and y magnets to steer to the correct dot, then read or write the dot.  If you want to learn more about the circuitry that drives an electrostatic memory unit, you can download the schematics in PDF here.

One type of semiconductor was used in this machine: the germanium diode.  The 701 used a total of 13,000 germanium diodes.

CPU

The CPU or Analytical Control Unit could perform 33 calculator operations as shown in this diagram:

Here is the block diagram of the 701:

Instructions are read from the electrostatic storage into the memory register, where they are directed to the instruction register.  Data that is sent to external devices must go through the multiplier/quotient register.  Data read from external devices (i.e. tape, drum or card reader) is loaded into the multiplier/quotient register.  As you can see from the block diagram, there is a switch that feeds inputs from the tape, drum, card reader or the electrostatic memory.  This is a physical switch on the front panel of the computer.  The computer operator would change the switch position to the desired input before starting the machine.  Here’s a photo of the front panel; you can see the round switch near the lower left (click to zoom):

By Dan – Flickr: IBM 701, CC BY 2.0

The front panel also has switches and lights for each of the registers so the operator could manually input binary into the accumulator or enter a starting address (instruction counter).  Notice how the data width of this computer is wider than the address width.  Only 12 bits are needed to address all 2,048 words of memory.  If you look closely, you’ll also notice that the lights are grouped in octal sets (3 lights per group).  The operator can key in data written as octal numbers (0-7) without having to read a long string of ones and zeros.  The switches for entering data are grouped gray and white in octal groupings as well.

There is a front panel control to increment the machine cycle.  An operator or technician could troubleshoot a problem by executing one machine cycle at a time with one press of the button per machine cycle.  For a multiply instruction the operator could push the button 38 times to complete the operation or examine the partial result as each cycle was completed.  The machine also came with diagnostic programs that could be run to quickly identify a physical problem with the machine.  A set of 16 manuals was provided to assist the installation, maintenance, operation and programming of the 701.  The computer operator normally only operated the start, stop and reset controls as well as the input and output selectors on the machine.  The instruction counter and register controls are normally used by a technician to troubleshoot problems.

The CPU was constructed using a modular technique.  Each module or “Pluggable Unit” had a complete circuit on it and could be replaced all at once by a technician.  The units looked like this:

Image from Pinterest (Explore Vacuum Tube, IBM, and more!)

These units are aligned together into one large block:

(By Bitsavers, IBM 701)

Notice how all the tubes face the front of the machine.  One of the first things to burn out on a tube is the heating element.  By facing the tubes toward the front, a technician can quickly identify any burned-out tubes and replace them.  If all tubes are glowing, then the technician will need to run diagnostics and try to narrow down which tube or tubes are not working.  Another reason to design the modules with all tubes facing forward is that the technician can grab a tube and pull it out of its socket to put it into a tube tester and determine which tube really is bad.

My experience with tubes dates back to the late ’70s when people wanted to “fix” an old TV set (usually a TV that was built in the early 60s).  I would look for an unlit tube, read the tube number off the side and run down to the electronics shop to buy a new one.  If that failed, then my next troubleshooting step was to pull all the tubes (after the TV cooled off), put them in a box and run up to the electronics store where they had a tube tester (they wanted to sell new tubes, so they provided a self-help tester right at their store).  I would plug my tube into the tester, flip through the index pages for the tube type being tested, adjust the controls according to the instructions and push the test button.  If the tube was bad, then I bought a new one.  Other components such as diodes, resistors and capacitors rarely went bad.  Those would be the next thing on the troubleshooting list after all the tubes were determined to be good.  One other note about troubleshooting tubes: the tube tester had a meter that showed the test current.  The manual had a range of acceptable values (minimum and maximum).  The values were based on the tube manufacturer’s specifications.  For some devices, the tube would not work even though it was within manufacturer specifications.  We referred to tubes like this as “weak” tubes (in other words, the tube might not be able to pass as much current as needed by the circuit).  So a judgment call had to be made about whether to replace the tube.

Here’s an example of the type of tube tester I remember using:

(PDX Retro)

All cabinets were designed to be no larger than necessary so they could be transported up an elevator and fit through a normal-sized office door (see: Buchholz IBM 701 System Design, Page 1285 Construction and Design Details).  The entire system was divided into separate cabinets for this very purpose.  Cables would be run through the floor to connect cabinets together, much like the wiring in modern server rooms.

Storage Devices

The 701 could be configured with a drum storage unit and a tape storage unit.  The IBM 731 storage unit contained two physical drums organized as four logical drums able to store 2,048 full words of information each (for a total of 8,192 words, equal to 82,000 decimal digits).  Each logical drum reads and writes 36 bits in a row.  The drum spins at a rate of 2,929 RPM with a density of 50 bits to the inch.  There are 2,048 bits around the drum for each track.  Seek time could take up to 1,280 microseconds.  The storage capacity was intentionally designed to match the capacity of the electrostatic memory.  This was used to read all memory onto a drum or read a drum into the electrostatic memory.

Inputs

The primary input device for the 701 was the punch card reader.  Computers in the 50’s were designed as batch machines.  It cost a lot of money to run a computer room, so it was more economical to batch jobs and keep the computer processing data continuously.  It would have been too expensive to have an operator typing data into the computer and saving it on a storage device like we do today.  Time-sharing systems had not been invented yet, and this machine was too slow and too small for time sharing.  In order to run batch jobs, operators or programmers would use a keypunch machine to type their program and data onto punch cards (usually in a different room).  The keypunch machine was an electro-mechanical device that looked like a giant typewriter (IBM 026):

By Columbia University, The IBM 026 Keypunch

A punch card reader is connected as an input to the 701 in order to read in a batch of punch cards.  The cards can be read in at a rate of 150 cards per minute.  Each card holds one line of a program, up to 72 characters wide.  If the punch cards are used as binary input, then they can contain 24 full words per card.  A program can be punched onto a stack of cards and then loaded into the card reader for execution.  Programmers used a lot of common routines with their programs, so they would punch a routine on a small stack of cards and then include that stack with their main program to be run in sequence (see: Buchholz IBM 701 System Design, Page 1274 Card programming).  Then they could save the common routine cards to be used with other jobs.  The stacks of cards were the equivalent of today’s “files”.

By IBM Archives, IBM 711 Punch Card Reader

Outputs

The 701 was normally configured with a line printer that could print 150 lines per minute.  Information to be printed was transferred from the multiplier/quotient register to 72 thyratrons (a type of current-amplifier tube) connected directly to the printer.  The thyratrons were located inside the Analytical Control Unit and their function was shared by the printer and the card punch (thyratrons activated the print magnets or the punch magnets).  Data that was printed came directly from the electrostatic storage and needed to be converted by the program into decimal form before going to the printer.

Sources

I have provided links from each embedded photo above to the sources (with the exception of diagrams I copied from the sources below).  You may click on a photo to go to the source and obtain more information.  For more detailed information about the IBM 701, I would recommend clicking through these sites/documents:

 

Stored Procedures Vs. No Stored Procedures

There is a debate raging among developers: “Is it better to use stored procedures or not?”  At first glance this seems like a simple question, but there are some complicated implications behind it.  Here are the basic pros and cons of using stored procedures in your system:

Pros

  • You can isolate table changes from your front-end.
  • Updating business code is easy, just update the stored procedure while your system is running (assuming you already tested the changes off-line and everything is good).
  • Many queries can be reduced to one call to the database making your system faster.

Cons

  • Your business processing is performed by your database which is expensive (license fees compared to web servers).
  • Unit testing is all but impossible.
  • Version control is difficult.
  • You are married to your database server technology (MS SQL cannot be easily ported to Oracle).
  • Stored Procedure languages are not well structured.  This leads to spaghetti code.

Now, let’s expand a bit and discuss each of these issues:

Isolate Table Changes

It may seem like overkill to set up all of your select, insert, update and delete queries to call a stored procedure and have the stored procedure call the tables.  But this can be a boon in situations where you might change your table structure in the future.  An example would be a situation where a table becomes too wide and you want to break it into two tables.  Your stored procedures can keep the same interface to your program while the code inside handles all the details of where the data is stored and read from.  Removing a field from a table can also be done safely, since you can remove it inside the stored procedure but leave the parameter called from your software until you can root out all calls to the database and change those (or leave the dead parameter).

Updating Business Code

Business code is the logic that computes and performs work on your database.  Your stored procedure might update one table, add an entry to a log table and remove corresponding records from another table.  Another use for a stored procedure is to pass filter and ordering information into a query for a list of data.  A dynamic query can be formed in a stored procedure and executed with all the parameters entered.  This relieves your front-end from the task of sorting and filtering data.  It can also reduce the amount of raw data returned from the database.

The point here is that your business code might need a change or an enhancement.  If there is a bug, then it can be fixed and deployed on the fly.  A .Net application must be compiled and carefully deployed on servers that are not accessed during the time of deployment.  A stored procedure can be changed on the fly.  If your website is a large monolithic application, this capability becomes a larger “pro”.

One Call to the Database

If a record edit requires a log entry, then you’ll be forced to call your database twice.  If you use a stored procedure, the stored procedure code can make those calls, which would be performed from the database itself and not over the wire.  This can reduce the latency time to perform such an operation.

Now for the list of cons…

Business Processing is Being Performed by the Database

MS SQL Server and Oracle licensing is expensive.  Both licenses are based on the number of CPUs, and the price can get steep.  If you have a choice between performing your processing on a web server or a database server, it’s a no-brainer: do it on the web server.  Even a Windows Server license for a web server is cheaper than a SQL Server license.  Initially your business will not feel the cost difference because your system is small.  However, once your system grows you’ll see that cost difference grow quickly.

Unit Testing

Unit testing is not readily available for stored procedures.  You could probably get creative and set up a testing environment in Visual Studio, but it would be custom.  Without unit tests, you are not able to regression test.  If your stored procedures contain logic that allows different modes to occur, it can be difficult to properly test each mode.  An example is a stored procedure that performs filter and sorting operations.  If you are building a dynamic query using “if” statements, then you’ll have many combinations of possible inputs.  How do you ensure that your dynamic query doesn’t have a missing “and”, a missing comma, or a union query with non-matching fields?  It’s difficult.  If this logic is in your website code, you can wrap each combination of inputs in unit tests to provide regression testing when you add or change a filter or sorting option.
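By contrast, here’s a hedged sketch of the same kind of filter/sort logic kept in C#, where it can be covered by unit tests (the Product class, its fields and the ApplyFilter helper are made up for illustration; requires System.Linq and System.Collections.Generic):

public class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public static IQueryable<Product> ApplyFilter(IQueryable<Product> query, string nameFilter, bool sortByPrice)
{
    // Each branch here is a combination that a unit test can pin down.
    if (!string.IsNullOrEmpty(nameFilter))
    {
        query = query.Where(p => p.Name.Contains(nameFilter));
    }
    return sortByPrice ? query.OrderBy(p => p.Price) : query.OrderBy(p => p.Name);
}

[TestMethod]
public void filter_by_name_and_sort_by_price()
{
    var products = new List<Product>
    {
        new Product { Name = "Widget", Price = 5 },
        new Product { Name = "Gadget", Price = 2 },
        new Product { Name = "Widget Pro", Price = 9 }
    }.AsQueryable();

    var result = ApplyFilter(products, "Widget", sortByPrice: true).ToList();

    Assert.AreEqual(2, result.Count);
    Assert.AreEqual("Widget", result[0].Name);
}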

Version Control

No matter how you divide your code, you’ll need to version control your database so you can keep a history of your changes and match those changes with your code.  Visual Studio allows you to define all your database objects in a project (see the SQL Server Database Project type).  There are tools available that can create change scripts by comparing two versions in Team Foundation Server.  These can be used to update multiple databases.  Versioning a database is not a common practice, and that is why I’ve put this under the “con” list instead of the “pro” list.  Companies that keep their database definitions in a version control system can take this off the “con” list.

Married to Your Database Server

Till death do you part!  If you ever decide to switch from one database server technology to another, you’ll discover how steep the hill is that you’ll need to climb.  Each stored procedure will need to be converted by hand, one by one.  If your system doesn’t have stored procedures, then you’ll have an easier time converting.  Minor differences in triggers and indexes might be an issue between Oracle and SQL Server, and recursive queries are written differently in Oracle.  You might even have issues with left and right outer joins if you used the “(+)” symbol in Oracle.  Stored procedures will be your Achilles heel.

Spaghetti Code

Writing stored procedures is a lot like writing in Classic ASP.  It’s messy.  I see a lot of sloppy coding practices.  There is no standard for formatting queries or T-SQL code.  Everybody has their own short-cuts.  Once a system grows to the point where it contains tens of thousands of stored procedures, you’re faced with a mess that has no hope.  C# code has the luxury of being able to be refactored.  This is a powerful capability that can be used to reduce entangled code.  Being able to break code into small, manageable chunks is also helpful.  If your database code is contained in a Visual Studio project, you can perform some of the same refactoring, but you can’t test on the fly.  So programmers prefer to change stored procedures on their test database, where refactoring is not available.

Conclusion

Are there more pros and cons?  Sure.  Every company has special needs for its database.  Some companies have a lot of bulk table-to-table processing that must be performed.  That screams stored procedures, and I would recommend sticking with that technique.  Other companies have a website front-end with large quantities of small transactions.  I would recommend those companies keep their business logic in their website code where they can unit test their logic.  In the end, you’ll need to take this list of pros and cons and decide which items to give more weight.  Your scale may tip one way or the other depending on which compromise you want to make.

 

Caching

Summary

This subject is much larger than the blog title suggests.  I’m going to discuss some basics of using a caching system to take the load off of your database and speed up your website.  I’ll also discuss how cache should be handled by your software and what the pitfalls of caching can be.

Caching Data Calls

First of all, you can speed up your database routines by using a cache system configured as a helper.  Here’s a diagram of what I mean by configuring a cache system as a helper:

Basically, what this abstract diagram shows is that the website front-end retrieves its data from the database by going through the cache server.  Technically, that is not how the system is set up.  The cache and database are both accessed from the back-end, but the cache logic follows this flow chart:

You’ll notice that the first place to check is the cache system.  Only if the data is not in the cache, does the program connect to the database and read the data from there.  If the data is read from the database, then it must be saved in the cache first, then returned to the calling method.

One other thing to note here: it is vitally important to design and test your cache system so that it keeps working, without crashing, if the cache server is off or not responding.  If your cache server crashes, you don’t want your website to crash; you want it to continue operating directly from the database.  Your website will operate more slowly, but it will operate.  This prevents your cache system from becoming a single point of failure.
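Here’s a minimal sketch of that flow in C#, assuming a hypothetical _cacheClient wrapper around your cache server and a hypothetical _database data access object; the key point is that a cache failure falls through to the database instead of throwing:

public Customer GetCustomer(int id)
{
    string key = "customer:" + id;

    try
    {
        // 1. Check the cache first.
        var cached = _cacheClient.Get<Customer>(key);
        if (cached != null)
        {
            return cached;
        }
    }
    catch (Exception)
    {
        // Cache server is down or unreachable -- fall through to the database
        // so the site keeps working (just slower).
    }

    // 2. Not in the cache (or cache unavailable): read from the database.
    var customer = _database.GetCustomerById(id);

    try
    {
        // 3. Save the result in the cache for the next caller.
        _cacheClient.Set(key, customer, TimeSpan.FromMinutes(60));
    }
    catch (Exception)
    {
        // Ignore cache write failures for the same reason.
    }

    return customer;
}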

Why is Caching Faster?

Programmers who have never dealt with caching systems before might ask this question.  Throwing an extra server in the middle and adding more code to every data read doesn’t sound like a way to speed up a system.  The reason for the speed increase is that the cache system is typically RAM-based, while databases are mostly disk-based (though all modern databases do some caching of their own).  Also, a Redis server costs pennies compared to a SQL server in licensing costs alone.  By reducing the load on your SQL server, you can reduce the number of processors needed for your SQL instances and reduce your licensing fees.

This sounds so magical, but caching requires great care and can cause a lot of problems if not implemented correctly.

Caching Pitfalls to Avoid

One pitfall of caching is how to handle stale data.  If you are using an interactive interface with CRUD operations, you’ll want to make sure you flush your cached objects on edits and deletes.  You only need to delete the cache keys that relate to the data changed in the database.  This can become complicated if the data being changed shows up under more than one cache key.  An example is where you cache result sets instead of the raw data.  You might have one cached object that contains a list of products at a particular store (including the store name) and another cached object that contains the coupons offered by that store (also including the store name).  Cached data is not normalized; in this case each cached object is matched to the web page that needs the data.  Now, think about an example where the store name is changed for one reason or another.  If your software is responsible for clearing the cache when data is changed, then the store-name-change administration page must be aware of every cache key that could contain the store name and delete/expire those keys.  The quick and dirty method is to flush all the cache keys, but that could cause a slowdown of the system and is not recommended.
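For example, the store-name administration code might have to do something like this (the cache wrapper, method names and key names are hypothetical):

public void UpdateStoreName(int storeId, string newName)
{
    // Update the database first.
    _database.UpdateStoreName(storeId, newName);

    // Then expire every cached result set that embeds the store name.
    // You have to know every key that contains this store's data.
    _cacheClient.Delete("store-products:" + storeId);
    _cacheClient.Delete("store-coupons:" + storeId);
    _cacheClient.Delete("store-summary:" + storeId);
}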

It’s also tempting to cache every result in your website.  While this could theoretically be done, there is a point where caching becomes more of a headache than a help.  One instance is caching large result sets that constantly change.  For example, suppose you have a list of stores with sales figures that you refer to often.  The underlying query to compute this data might be SQL-intensive, so it’s tempting to cache the results.  However, if the data is constantly changing, then each change must clear the cache, and the underlying churn of caching and expiring keys can slow down your system.  Another example is caching a list of employees with a different cache key per sort or filter operation.  This can lead to a large number of cached sets that fill up your cache server and cause it to expire cached items early in order to make room for the constant stream of new data.  If you need to cache a list, then cache the whole list and do your filtering and sorting in your website after reading it from the cache.  If there is too much data to cache, you can limit the cache time.

Sticky Cache

You can adjust the cache expire time to improve system performance.  If your cached data doesn’t change very often, like a lookup table that is used everywhere in your system, then make the expire time infinite.  Handle all cache expiration through your interface by only expiring the data when it is changed.  This can be one of your greatest speed increases, especially if you have lookup tables that are hit by all kinds of pages, like your system settings or user rights matrix.  These lookup tables might get hit several times for each web page access.  If you can cache those lookups, then the cache system will take the hit instead of your database.  This type of caching is sometimes referred to as “sticky” because the cache keys stick in the system and never expire.

Short Cache Cycles

The opposite extreme is to assign a short expire time.  This can be used where you cache the results of a list page.  The cached list might have a sliding expiration of 5 minutes.  That allows a user to see the list and click the next-page button until they find what they are looking for.  After the user has found the piece of data to view or edit, the list is no longer needed.  The expire time can kick in and expire the cached data, or the view/edit screen can expire the cache when the user clicks the link to go to that page.  It can also reduce database calls by caching just the raw list.  Sorting and searching can be done from the front-end: when the user clicks on the header of a column to sort by that column, the data can be re-read from the cache and sorted by that column instead of being read from the database.

This particular use of caching should only be used after careful analysis of your data usage.  You’ll need the following data points:

  • Total amount of data per cache key.  This is the raw set of records that will be read by the user.
  • Total number of database accesses on first arrival at the web page in question, per user.  This can be used to compute the memory used by the cache system.
  • Average number of round trips to the database the website makes while a user performs a typical task.  This can be obtained by totaling the number of database accesses by each distinct user.

Multiple Cache Servers

If you get to a point where you need more than one cache server, there are many ways to divide up your cached items.  If you’re using multiple database instances, then it would make sense to use one cache server per instance (or one per two database instances, depending on usage).  Either way, you’ll need to know which instance your cached data is located on.  If you have a load-balanced web farm set up with a round-robin scheme, you don’t want to perform your caching per web server.  Such a setup would cause your users to get cache misses more often than hits, and you would duplicate your caching for most items.  It’s best to think of this type of caching as being married to your database: each database should be matched up with one cache server.

If you have multiple database instances that you maintain for your customer data and a common database for system-related lookup information, it would be advisable to set up a caching system for the common database first.  You’ll get more bang for the buck by doing this and you’ll reduce the load on your common system.  Your common data is usually where your sticky caching will be used.  If you’re fortunate, you’ll be able to cache all of your common data and only use your common database for loading the system when there are changes or outages.

Result Size to Cache

Let’s say you have a lookup table containing all the stores in your company.  This data doesn’t change very often and there is one administrative screen that edits it.  So sticky caching is your choice.  Analysis shows that each website call causes a read from this table to look up the store name from the id or some such information.  Do you cache the entire table as one cache item, or do you cache each store under its own cache key?

If your front-end only looks up a store by its id, then you can name your cache keys with the store id and it will be more efficient to store each key separately.  Loading the cache will take multiple reads from the database, but each time your front-end hits the cache, the minimum amount of cached data is sent over the wire.

If your front-end searches or filters by store name, zip code, state, etc., then it’s wiser to cache the whole table as one key.  Your front-end can then pull all the cached data and perform filtering and sorting as needed.  This will also depend on data size.

If your data size is very large, then you might need to create duplicated cached data for each store by id, each zip code, each state, etc.  This may seem wasteful at first, but remember the data is not stored permanently.  It’s OK to store duplicate results in cache.  The object of caching is not to conserve space but to reduce website latency and the load on your database.
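Here’s a rough sketch of the two approaches, using the same hypothetical cache wrapper as before (GetOrAdd is a made-up helper that performs the check-then-read-then-save flow described earlier; the Store class and database methods are also made up):

// Option 1: one cache key per store -- best when the front-end only looks up by id.
public Store GetStoreById(int storeId)
{
    return _cacheClient.GetOrAdd("store:" + storeId,
        () => _database.GetStore(storeId));
}

// Option 2: cache the whole lookup table under one key and filter in memory --
// best when the front-end searches by name, zip code, state, etc.
public List<Store> GetStoresByState(string state)
{
    var allStores = _cacheClient.GetOrAdd("stores:all",
        () => _database.GetAllStores());

    return allStores.Where(s => s.State == state).ToList();
}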

 

.Net MVC Project with AutoFac, SQL and Redis Cache

Summary

In this blog post I’m going to demonstrate a simple .Net MVC project that uses MS SQL Server to access data.  Then I’m going to show how to use Redis caching to cache your results and reduce the amount of traffic hitting your database.  Finally, I’m going to show how to use the AutoFac IOC container to tie it all together and how you can leverage inversion of control to break dependencies and unit test your code.

AutoFac

The AutoFac IOC container can be added to any .Net project using the NuGet manager.  For this project I created an empty MVC project and added a class called AutofacBootstrapper to the App_Start directory.  The class contains one static method called Run() just to keep it simple.  This class contains the container builder setup that is described in the instructions for AutoFac Quick Start: Quick Start.

Next, I added .Net library projects to my solution for the following purposes:

BusinessLogic – This will contain the business classes that will be unit tested.  All other projects will be nothing more than wire-up logic.

DAC – Data-tier Application.

RedisCaching – Redis backed caching service.

StoreTests – Unit testing library

I’m going to intentionally keep this solution simple and not make an attempt to break dependencies between dlls.  If you want to break dependencies between modules or dlls, you should create another project to contain your interfaces.  For this blog post, I’m just going to use the IOC container to ensure that I don’t have any dependencies between objects so I can create unit tests.  I’m also going to make this simple by only providing one controller, one business logic method and one unit test.

Each .Net project will contain one or more objects and each object that will be referenced in the IOC container must use an interface.  So there will be the following interfaces:

IDatabaseContext – The Entity Framework database context object.

IRedisConnectionManager – The Redis connection manager provides a pooled connection to a Redis server.  I’ll describe how to install Redis for Windows so you can use this.

IRedisCache – This is the cache object that will allow the program to perform caching without getting into the ugly details of reading and writing to Redis.

ISalesProducts – This is the business class that will contain one method for our controller to call.

Redis Cache

In the sample solution there is a project called RedisCaching.  This contains two classes: RedisConnectionManager and RedisCache.  The connection manager object needs to be set up in the IOC container first.  It requires the Redis server IP address, which would normally be read from a config file.  In the sample code, I fed the IP address into the constructor at the IOC container registration stage.  The second part of the Redis caching is the actual cache object.  This uses the connection manager object and is set up in the IOC container next, using the previously registered connection manager as a parameter, like this:

builder.Register(c => new RedisConnectionManager("127.0.0.1"))
    .As<IRedisConnectionManager>()
    .PropertiesAutowired()
    .SingleInstance();

builder.Register(c => new RedisCache(c.Resolve<IRedisConnectionManager>()))
    .As<IRedisCache>()
    .PropertiesAutowired()
    .SingleInstance();

In order to use the cache, just wrap your query with syntax like this:

return _cache.Get("ProductList", 60, () =>
{
  return (from p in _db.Products select p.Name);
});

The code between the { and } is the normal EF LINQ query.  Its result must be returned from the anonymous function: () =>

The cache key name in the example above is “ProductList” and it will stay in the cache for 60 minutes.  The _cache.Get() method checks the cache first; if the data is there, it returns the data and moves on.  If the data is not in the cache, then it calls the inner function, causing the EF query to be executed.  The result of the query is then saved to the cache server and returned.  This guarantees that any identical request within the next 60 minutes will be served directly from the cache.  If you dig into the Get() method code you’ll notice that there are multiple try/catch blocks that handle the case where the Redis server is down.  In that situation the inner query is executed and the result is returned.  In production your system would run a bit slower and you’d notice your database working harder, but the system keeps running.
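If you don’t want to dig through the source right away, the Get() method is conceptually something like the sketch below.  This is not the actual code from the project (details like serialization, expiration and error handling differ); it assumes StackExchange.Redis for the client, Newtonsoft.Json for serialization, and a GetDatabase() helper on the connection manager:

// Conceptual sketch only -- see the project source for the real implementation.
public T Get<T>(string key, int expireMinutes, Func<T> query)
{
    try
    {
        // Check the cache first (StackExchange.Redis + Newtonsoft.Json assumed).
        string cachedJson = _connectionManager.GetDatabase().StringGet(key);
        if (cachedJson != null)
        {
            return JsonConvert.DeserializeObject<T>(cachedJson);
        }
    }
    catch (Exception)
    {
        // Redis is down or unreachable: fall through and run the query directly.
    }

    // Not cached (or cache unavailable): run the EF query.
    var result = query();

    try
    {
        // Save the result for the next caller.
        _connectionManager.GetDatabase().StringSet(key,
            JsonConvert.SerializeObject(result),
            TimeSpan.FromMinutes(expireMinutes));
    }
    catch (Exception)
    {
        // Ignore cache write failures so the site keeps running.
    }

    return result;
}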

A precompiled version of Redis for Windows can be downloaded from here: Service-Stack Redis.  Download the files into a directory on your computer (I used C:\redis) then you can open a command window and navigate into your directory and use the following command to setup a windows service:

redis-server --service-install

Please notice that there are two “-” in front of the “service-install” instruction.  Once this is setup, then Redis will start every time you start your PC.

The Data-tier

The DAC project contains the POCOs, the fluent configurations and the context object.  There is one interface for the context object and that’s for AutoFac’s use:

builder.Register(c => new DatabaseContext("Server=SQL_INSTANCE_NAME;Initial Catalog=DemoData;Integrated Security=True"))
    .As<IDatabaseContext>()
    .PropertiesAutowired()
    .InstancePerLifetimeScope();

The connection string should be read from the configuration file before being injected into the constructor shown above, but I’m going to keep this simple and leave out the configuration pieces.

Business Logic

The business logic library is just one project that contains all the complex classes and methods that will be called by the API.  In a large application you might have two or more business logic projects.  Typically, though, you’ll divide your application into independent APIs that will each have their own business logic project as well as all the other wire-up projects shown in this example.  By dividing your application by function, you’ll be able to scale your services according to which function uses the most resources.  In summary, you’ll put all the complicated code inside this project and your goal is to apply unit tests to cover every combination of features that this business logic project contains.

This project will be wired up by AutoFac as well and it needs the caching and the data tier to be established first:

builder.Register(c => new SalesProducts(c.Resolve<IDatabaseContext>(), c.Resolve<IRedisCache>()))
    .As<ISalesProducts>()
    .PropertiesAutowired()
    .InstancePerLifetimeScope();

As you can see, the database context and the Redis cache are injected into the constructor of the SalesProducts class.  Typically, each class in your business logic project will be registered with AutoFac.  That ensures that you can treat each object independently of the others for unit testing purposes.

Unit Tests

There is one sample unit test that exercises the SalesProducts.Top10ProductNames() method.  This test only covers the case where there are more than 10 products, so the expected count is 10.  For effective testing, you should also test fewer than 10, zero, and exactly 10.  The database context is mocked using Moq.  The Redis caching system is faked using the interfaces supplied by StackExchange.  I chose to set up a dictionary inside the fake object to simulate a cached data point.  There is no check for cache expiration; the fake is only there to stand in for the caching.  Technically, I could have mocked the caching and just made it return whatever went into it.  The fake cache can be effective in testing edit scenarios to ensure that the cache is cleared when someone adds, deletes or edits a value.  The business logic should handle cache clearing, and a unit test should check for this case.
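Conceptually, the fake cache is just a dictionary sitting behind the IRedisCache interface, something along these lines (a sketch, not the exact class from the project; the Delete() method is an assumed part of the interface):

public class FakeRedisCache : IRedisCache
{
    private readonly Dictionary<string, object> _store = new Dictionary<string, object>();

    public T Get<T>(string key, int expireMinutes, Func<T> query)
    {
        // No expiration logic -- this fake only simulates cache hit/miss behavior.
        if (_store.ContainsKey(key))
        {
            return (T)_store[key];
        }

        var result = query();
        _store.Add(key, result);
        return result;
    }

    public void Delete(string key)
    {
        _store.Remove(key);
    }
}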

Other Tests

You can test to see if the real Redis cache is working by starting up SQL Server Management Studio and running the SQL Server Profiler.  Clear the profiler, start the MVC application.  You should see some activity:

Then stop the MVC program and start it again.  There should be no change to the profiler because the data is coming out of the cache.

One thing to note: you cannot use IQueryable as the return type for your query.  It must be a list, because the data read from Redis is in JSON format and it’s de-serialized all at once.  You can serialize and de-serialize a List<T> object.  I would recommend adding a logger to the cache object to catch errors like this (since there are try/catch blocks).
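In other words, the cached query from earlier needs to be materialized inside the delegate, for example:

return _cache.Get("ProductList", 60, () =>
{
    // Materialize the query so the result can be serialized to JSON for Redis.
    return (from p in _db.Products select p.Name).ToList();
});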

Another aspect of using an IOC container that you need to be conscious of is scope.  This comes into play when you deploy your application to a production environment.  Typically, developers do not have an easy way to test multi-user situations, so an object with a scope that is too long can cause cross-over data.  If, for instance, you register your business logic with a scope of SingleInstance() and the list it returns needs to be specific to each user accessing your system, then you’ll end up serving the data of the first person who accessed the API.  This can also happen if your API receives an ID to your data for each call: if the object only reads the data when the API first starts up, then you’ll have a problem.  This sample is so simple that it only contains one segment of data (the top 10 products).  It doesn’t matter who calls the API; they are all requesting the same data.

Other Considerations

This project is very minimalist, therefore, the solution does not cover a lot of real-world scenarios.

  • You should isolate your interfaces by creating a project just for all the interface classes.  This will break dependencies between modules or dlls in your system.
  • As I mentioned earlier, you will need to move all your configuration settings into the web.config file (or a corresponding config.json file).
  • You should think in terms of two or more instances of this API running at once (behind a load-balancer).  Will there be data contention?
  • Make sure you check for any memory leaks.  IOC containers can make your code logic less obvious.
  • Be careful of initialization code in an object that is started by an IOC container.  Your initialization might occur when you least expect it to.

Where to Get The Code

You can download the entire solution from my GitHub account by clicking here.  You’ll need to change the database instance in the code and you’ll need to set up a Redis server in order to use the caching feature.  A SQL Server script is provided so you can create a blank test database for this project.

 

DotNet Core vs. NHibernate vs. Dapper Smackdown!

The Contenders

Dapper

Dapper is a hybrid ORM.  This is a great ORM for those who have a lot of ADO legacy code to convert.  Dapper uses SQL queries, and parameters can be used just like in ADO, but the parameters to a query can be simplified into POCOs.  Select queries in Dapper can also be mapped into POCOs.  Converting legacy code can be accomplished in steps: the initial pass converts ADO calls to Dapper, followed by a step to add POCOs, then by changing queries into LINQ (if desired).  The speed differences in my tests show that Dapper is better than my implementation of ADO for select queries, but slower for inserts and updates.  I would expect ADO to perform the best, but there is probably a performance penalty for using the data set adapter instead of the straight SqlCommand method.

If you’re interested in Dapper you can find information here: Stack Exchange/Dapper.   Dapper has a NuGet package, which is the method I used for my sample program.
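As a quick taste of the API, a typical Dapper call looks something like this (the connection string, table and POCO names are made up for illustration):

// using Dapper;
// using System.Data.SqlClient;
// using System.Linq;
using (var connection = new SqlConnection(connectionString))
{
    // Parameters are passed as an anonymous object and each row maps to a POCO.
    var products = connection.Query<Product>(
        "SELECT Id, Name, Price FROM Product WHERE Price > @MinPrice",
        new { MinPrice = 10 }).ToList();
}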

ADO

I rarely use ADO these days, with the exception of legacy code maintenance or if I need to perform some sort of bulk insert operation for a back-end system.  Most of my projects are done in Entity Framework, using the .Net Core or the .Net version.  This comparison doesn’t feel complete without including ADO, even though my smackdown series is about ORM comparisons.  So I assembled a .Net console application with some ADO objects and ran a speed test with the same data as all the ORM tests.

NHibernate

NHibernate is the .Net version of Hibernate.  This is an ORM that I used at a previous company I worked for.  At the time, it was faster than Entity Framework 6 by a large amount.  The .Net Core version of Entity Framework has fixed the performance issues of EF, and it no longer makes sense to use NHibernate.  I am providing the numbers in this test just for comparison purposes.  NHibernate is still faster than ADO and Dapper for everything except the select.  Both EF 7 and NHibernate are so close in performance that I would have to conclude they are the same.  The version of NHibernate used for this test is the latest as of this post (version 4.1.1 with Fluent 2.0.3).

Entity Framework 7 for .Net Core

I have updated the NuGet packages for .Net Core for this project and re-tested the code to make sure the performance has not changed over time.  The last time I did a smackdown with EF .Net Core I was using .Net Core version 1.0.0; now I’m using .Net Core 1.1.1.  There were no measurable changes in performance for EF .Net Core.

The Results

Here are the results side-by-side with the .ToList() method helper and without:

Test for Yourself!

First, you can download the .Net Core version by going to my GitHub account here and downloading the source.  There is a SQL script file in the source that you can run against your local MS SQL server to setup a blank database with the correct tables.  The NHibernate speed test code can also be downloaded from my GitHub account by clicking here. The ADO version is here.  Finally, the Dapper code is here.  You’ll want to open the code and change the database server name.