Versioning Your APIs

Introduction

If you’ve ever written an API and used it in a real-world application, you’ll eventually need to make changes to enhance your software.  The problem with changing an API is that once the interface has been published and consumed by other applications, the existing endpoints cannot simply be changed.  The best way to deal with this is to version your API so that a new version can expose different endpoints.  This gives consumers time to update their code to match the new version.

One sensitive consumer of API data is the mobile application.  If your company produces mobile applications, then you’ll need to publish the new mobile app to the app store after the API has been deployed and is able to handle requests.  This creates a chicken-or-egg problem of which should go first.  If the API is versioned, then it can be deployed in advance of the mobile application being available for download.  The new mobile app can use the new version of the API while older mobile app versions still consume data from the previous versions of your API.  This also avoids forcing your end users to upgrade immediately.

Versioning Method 1

The first method of versioning your API is obvious but painful.  It relies on a feature in IIS that allows multiple applications to be created under one website.  The process is to make a copy of the previous version of your API, then change the code to represent the next version of the API.  Next, create a new application in IIS, say “V2.0”.  The path to your API will then be something like “myapi.com/V2.0/controllername/method”.

Here is a list of the drawbacks to this method:

  • Deployment involves the creation of a new application every time a new version is deployed.
  • Any web.config file in the root directory of IIS would be inherited by all applications.
  • Keeping track of multiple versions of the code becomes difficult.
  • Continuous integration becomes a headache because each version will need a deployment.

Several of these issues can be “fixed” by creative merging and splitting of branches.  The inheritance problem can be avoided by leaving the root directory empty.  The creation of new applications under IIS can be automated through PowerShell scripting (the deployment process can check whether the application exists and create it if it doesn’t).

Versioning Method 2

There is a NuGet package that can be added to your solution to allow automatic versioning of controllers (here’s the package for .Net Core).  There is a great blog post on this package here.  I looked over the blog post and decided to do a little testing of my own.  I tested a sample Web API project for .Net Core to see if I could get the same results as duplicating projects and installing them as applications under IIS.  This is what I came up with for my version 2 controller:

using System.Collections.Generic;
using Microsoft.AspNetCore.Mvc;

namespace WebApiVersionedSample.Controllers
{
  [ApiVersion("2.0")]
  [ApiVersion("2.1")]
  [Route("v{version:apiVersion}/Values")]
    public class Values2Controller : Controller
    {
      [HttpGet, MapToApiVersion("2.0")]
      public IEnumerable<string> GetV20()
      {
          return new string[] { "value1 - version 2", "value2 - version 2" };
      }

      [HttpGet, MapToApiVersion("2.1")]
      public IEnumerable<string> GetV21()
      {
        return new string[] { "value1 - version 2.1", "value2 - version 2.1" };
      }

      [HttpGet("{id}", Name = "GetV20"), MapToApiVersion("2.0")]
      public string Get20(int id)
      {
          return $"id={id} version 2.0";
      }

      [HttpGet("{id}", Name = "GetV21"), MapToApiVersion("2.1")]
      public string Get21(int id)
      {
        return $"id={id} version 2.1";
      }
    }
}

As you can see, I set this up to accept version 2.0 or version 2.1, and I removed the “api” default path.  If a consumer specifies version 2.0, it will hit the GetV20 method for a get operation, and Get20(int id) for any get that passes an integer id.  In my sample code, I only return the version number to show which code executed when I selected a particular version.  Normally, you’ll call a business class from your method, and that business class can be shared between two or more versions if the functionality didn’t change.  If, however, the business logic differs between version 2.0 and version 2.1, then you’ll need to create another business class to call from your get method for version 2.1 and leave your version 2.0 logic untouched.

If you want to keep things simple, you can start a new version by creating new controllers for each endpoint and just add the version number to the end of the name.  Then you can change any one or more get, post, put or delete to conform to your new version.  Just be aware that this logic will need to continue into your business classes if necessary.

For my version 1.0 controller, as an example, I used the ValuesController object with an attribute at the top like so:

[ApiVersion("1.0")]
[Route("v{version:apiVersion}/Values")]
public class ValuesController : Controller

The “Route” attribute shows how the version number precedes the “/Values” controller name.  To see this in your browser, press F5 to launch the application and change the version number in the URL.

Example: “http://localhost:64622/v1.0/values” or “http://localhost:64622/v2.1/values”.
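One piece not shown above is the service registration.  The versioning package needs to be wired up in Startup.ConfigureServices; here is a minimal sketch, assuming the Microsoft.AspNetCore.Mvc.Versioning package (your option values may differ):

public void ConfigureServices(IServiceCollection services)
{
    services.AddMvc();

    // Register API versioning so the [ApiVersion] and MapToApiVersion attributes take effect.
    services.AddApiVersioning(options =>
    {
        options.DefaultApiVersion = new ApiVersion(1, 0);
        options.AssumeDefaultVersionWhenUnspecified = true;
        options.ReportApiVersions = true;
    });
}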

To change your startup browser location, expand the Properties node in Solution Explorer and double-click the launchSettings.json file:

Now you can change the application url and the launchUrl:

{
  "iisSettings": {
    "windowsAuthentication": false,
    "anonymousAuthentication": true,
    "iisExpress": {
      "applicationUrl": "http://localhost:64622/",
      "sslPort": 0
    }
  },
  "profiles": {
    "IIS Express": {
      "commandName": "IISExpress",
      "launchBrowser": true,
      "launchUrl": "v1.0/values",
      "environmentVariables": {
        "ASPNETCORE_ENVIRONMENT": "Development"
      }
    },
    "WebApiVersionedSample": {
      "commandName": "Project",
      "launchBrowser": true,
      "launchUrl": "v1.0/values",
      "environmentVariables": {
        "ASPNETCORE_ENVIRONMENT": "Development"
      },
      "applicationUrl": "http://localhost:64623/"
    }
  }
}

This method of versioning your API has a few advantages over the previous method:

  • Deployments stay the same: add the new version’s code and re-deploy.
  • No need to keep multiple copies of source code.
  • No IIS changes are necessary.

There are some potential pitfalls that developers need to be aware of.  One complex situation is a database change.  This can ripple down to your Entity Framework POCOs and possibly your context object.  If you are cautious and make non-breaking changes (like adding a new field), then you can change your database repository code without breaking your previous versions.  If you have breaking changes (such as a change to a stored procedure), then you’ll need to get creative and design it so both the old and new versions of your code still work.

Where to Find the Code

You can download the sample code used in this blog post by going to my GitHub account (click here).

 

Dynamic or Extended Field Data

Summary

SQL databases are very powerful mechanisms for storing and retrieving data.  With careful design, a SQL database can store large quantities of records and retrieve them at lightning speed.  The downside is that a SQL database is a very rigid structure and, if not carefully designed, it can become slow and unwieldy.  There is a common mechanism that allows a database to be dynamically extended: the customer can add a field to a table as a holding place for data that the system was never designed to hold.  Such a mechanism is used in SaaS (Software as a Service) systems where there are multiple customers with different data needs that cannot be accommodated by enhancements from the SaaS provider.  I’m going to show several techniques that I’ve seen used, along with their pros and cons, and then I’m going to show an effective technique for performing this function.

Fixed Fields

I’m going to start with one of the worst techniques I’ve seen used: the fixed field technique.  The gist of this technique is that each table that allows extended data will have multiple extended fields of various types.  Here’s a sample table (called productextended):

If you’re not cringing after looking at the sample table design above, then you need to read on.  The table above is only a small sample of what I’ve seen in production.  Production systems I’ve worked on have more than 10 text, 10 datetime, 10 shortdate, 10 int, 10 float, etc.  To make such a system work, there is usually some sort of dictionary that records what each field is used for.  Then there is a configuration screen that allows the customer to choose what each of those fixed fields is used as.

Here is an example meta data lookup table with fields matching the sample data in the extended table above:

If you examine the lookup table above, you’ll see that the Money1 field represents a sale price and the Bit1 field represents a flag indicating that the product is sold out.  There is no referential integrity between these tables because they are not related in any normalized way.  If you delete a data type in the meta data lookup table and re-use the original field to represent other data, then whatever data already exists in that column will be treated as the new data.  You’ll need to write special code to handle the situation where a field is re-used for another purpose: your code would need to clear that field across the entire productextended table.

I’m going to list some obvious disadvantages to using this technique:

  • There are a limited number of extended fields available per table
  • Tables are extra wide and slow
  • First normal form is broken, and no referential integrity constraints can be used.

Using XML Field

The second example that I’ve seen is the use of one extended field with an XML data type.  This is a clever idea involving one field on each table, called “extended”, that is set up to contain XML data.  The data stored in this field is serialized and de-serialized by the software, and the actual values are read from a POCO object.  Here’s a sample table:

This is less cringe-worthy than the previous example.  The advantage of this setup is that the table width is still manageable and first normal form has not been broken (although there is still no relation to the extended data).  If the extended field is being serialized and de-serialized into a POCO, then the customer will not be able to change field data on the fly unless the POCO contains some clever setup, like an array of data fields that can be used at run-time (my example will show this ability).

Here is a sample POCO for the ProductXml table:

public class ProductXml
{
    public XElement XmlValueWrapper
    {
        get { return XElement.Parse(Extended); }
        set { Extended = value.ToString(); }
    }

    public int Id { get; set; }
    public int Store { get; set; }
    public string Name { get; set; } 
    public decimal? Price { get; set; }
    public string Extended { get; set; }

    public virtual Store StoreNavigation { get; set; }
}

The POCO for the xml data looks like this:

public class ExtendedXml
{
    public decimal SalePrice { get; set; }
    public bool OutOfStock { get; set; }
    public DataRecord ExtendedData = new DataRecord();
}

public class DataRecord
{
    public string Key { get; set; }
    public string Value { get; set; }
}

To make this work, you’ll need to tweak the EF configuration to look something like this:

public static class ProductXmlConfig
{
    public static void ProductMapping(this ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<ProductXml>(entity =>
        {
            entity.ToTable("ProductXml");

            entity.HasKey(e => e.Id);
            entity.Property(e => e.Id).HasColumnName("Id");
            entity.Property(e => e.Name).HasColumnType("varchar(50)");
            entity.Property(e => e.Price).HasColumnType("money");
            entity.Property(c => c.Extended).HasColumnType("xml");
            entity.Ignore(c => c.XmlValueWrapper);

            entity.HasOne(d => d.StoreNavigation)
                .WithMany(p => p.ProductXml)
                .HasForeignKey(d => d.Store)
                .OnDelete(DeleteBehavior.Restrict)
                .HasConstraintName("FK_store_product_xml");
        });
    }
}

Now the Linq code to insert a value could look like this:

using (var db = new DatabaseContext())
{
    var extendedXml = new ExtendedXml
    {
        SalePrice = 3.99m,
        OutOfStock = false,
        ExtendedData = new DataRecord
        {
            Key = "QuantityInWarehouse",
            Value = "5"
        }
    };

    var productXml = new ProductXml
    {
        Store = 1,
        Name = "Stuffed Animal",
        Price = 5.95m,
        Extended = extendedXml.Serialize()
    };

    db.ProductXmls.Add(productXml);
    db.SaveChanges();
}

The results of executing the Linq code from above will produce a new record like this:

If you click on the XML link in SQL, you should see something like this:

As you can see, there are two hard-coded POCO fields for the sale price and out-of-stock values.  These are not customer-controlled fields; they demonstrate that an enhancement could use the extended field to store new data without modifying the table.  The dictionary data contains one item called QuantityInWarehouse.  This is a customer-designed field; such fields can be added through a data entry screen and perhaps a meta data table that contains the names of the extra data fields stored in the extended field.

XML allows flexible serialization, so if you add a field to the xml POCO, it will still de-serialize xml that does not contain that data (just make sure the POCO field is nullable).
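The Serialize() call used in the Linq insert code above is an extension method from the sample project.  Here is a rough sketch of what such a method might look like, assuming XmlSerializer; the class and method names here are my own:

using System.IO;
using System.Xml.Serialization;

public static class XmlHelpers
{
    // Hypothetical sketch of the Serialize() extension used in the insert code above.
    public static string Serialize<T>(this T source)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, source);
            return writer.ToString();
        }
    }
}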

To see a working example, go to my GitHub account (see end of this article) and download the sample code.

You can use the following SQL query to extract data from the xml:

SELECT
    Extended.value('(/ExtendedXml/SalePrice)[1]', 'nvarchar(max)') as 'SalePrice',
    Extended.value('(/ExtendedXml/OutOfStock)[1]', 'nvarchar(max)') as 'OutOfStock', 
    Extended.value('(/ExtendedXml/ExtendedData/Key)[1]', 'nvarchar(max)') as 'Key',
    Extended.value('(/ExtendedXml/ExtendedData/Value)[1]', 'nvarchar(max)') as 'Value'
FROM 
    ProductXml

The query above should produce the following output:

Here are some disadvantages to using this technique:

  • It is difficult to query one extended field
  • Loading data is slow because the entire field of XML is loaded

Using Extended Table

This is the preferred way of designing an extended field system.  With this technique, data of any type can be added to any table without adding fields to the existing tables.  This technique does not break first normal form, and forming a query is easy and powerful.  The idea behind this technique is to create two tables: the first table contains metadata describing the table and field to extend, and the second table contains the actual value of the data stored.  Here’s an example of the MetaDataDictionary table:

Here’s an example of the ExtendedData table:


A custom query can be formed to output all of the extended data for each record.  Here’s an example for the above data for the product table:

SELECT
	p.*,
	(SELECT e.value FROM ExtendedData e WHERE e.RecordId = p.Id AND e.MetaDataDictionaryId=1) AS 'SalePrice',
	(SELECT e.value FROM ExtendedData e WHERE e.RecordId = p.Id AND e.MetaDataDictionaryId=2) AS 'SoldOut'
FROM	
	Product p

This will produce:

To obtain data from one extended field, a simple query can be formed to look up the value.  This leads to another bonus: Entity Framework and Linq can be used to query data that is organized in this fashion.  Why is this so important?  Because using EF and Linq allows all of the business logic to reside in code, where it is executed by the front-end and can be unit tested.  If a significant amount of code lives in a stored procedure, that code cannot be unit tested.
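To illustrate, here is a rough sketch of what such a Linq query could look like; the ExtendedData DbSet and its property names are assumptions based on the tables above:

using (var db = new DatabaseContext())
{
    // Join products to their extended values; dictionary id 1 is assumed to be "SalePrice".
    var salePrices =
        (from p in db.Products
         join e in db.ExtendedData on p.Id equals e.RecordId
         where e.MetaDataDictionaryId == 1
         select new { p.Id, p.Name, SalePrice = e.Value }).ToList();
}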

I’m going to list a few advantages of this method over the previous two methods:

  • Your implementation can have any number of extended fields
  • Any table in the system can be extended without modification to the database
  • Forming a query to grab one extended field value is simple

One thing to note about this method is that I’m storing the value in a varchar field.  You can change the size to accommodate any data stored.  You will need to perform some sort of data translation between the varchar and the actual data type you expect to store.  For example, if you are storing a date, then you might want some type checking when converting from the varchar to the date expected.  The conversion might occur at the Linq level, or you might do it with triggers on the extended value table (though I would avoid such a design, since it will probably chew up a lot of SQL CPU resources).
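As an example of the type checking mentioned above, a small helper could attempt the conversion and fall back to null when the stored text is not valid.  This is a sketch under my own naming; adjust the error handling to your own rules:

using System;

public static class ExtendedValueConversions
{
    // Convert a stored varchar value into a nullable date; invalid text returns null.
    public static DateTime? ToNullableDate(string extendedValue)
    {
        return DateTime.TryParse(extendedValue, out var parsed)
            ? parsed
            : (DateTime?)null;
    }
}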

Where to Get the Code

You can find the sample code used by the xml extended field example at my GitHub account (click here).  The project contains a file named “SqlServerScripts.sql”.  You can copy this script to your local SQL server to generate the tables in a demo database and populate the data used by this blog post (saving you a lot of time).

 

Test Driven Development – Sample

In this post, I’m going to create a small program and show how to use Test Driven Development (TDD) to make your life easier as a programmer.  Instead of creating the typical throw-away program, I’m going to do something a little more complicated: a .Net Core Web API application that calls another API to get an address from a database.  The “other” API will not be presented in this post because we don’t need it in order to write our code.  I’m going to show how to set up unit tests that pretend results are coming back from the other API, and then write the code based on those unit tests.  In real life, this scenario can happen when parallel development efforts occur.  Sometimes a fake API must be developed so that the work on the consuming side can be finished.  In this case, I’m going to skip the fake API and just mock the call to the API and feed sample data back.

To get started, all we need is an empty business logic project and a unit test project.  We don’t need to wire up any of the API front-end pieces because we’re going to exercise the business logic from unit tests.  Here’s the scenario:

  1. Our API will accept an address in JSON format, plus a person id from a database.
  2. The result will be true if the database contains the same address information submitted to the API for the person id given.

Ultimately, we’ll have a business object that is instantiated by the IOC container.  Let’s focus on the business object and see if we can come up with an empty shell.  For now we’ll assume that we need an address to compare with and the person id.

public class CheckAddress
{
  public bool IsEqual(int personId, AddressPoco currentAddress)
  {
    
  }
}

There will be a POCO for the address information.  We’ll use the same POCO for the data returned as for the method above:

public class AddressPoco
{
  public string Address { get; set; }
  public string City { get; set; }
  public string State { get; set; }
  public string Zip { get; set; }
}

So far, so good.  Inside the IsEqual() method of the CheckAddress class above, we’re going to call our address lookup API and then compare the result with the “currentAddress” value.  If they are equal, then we’ll return true; otherwise, false.  To call the other API, we could write an object like this:

using System;
using System.Net;
using System.Text;
using Newtonsoft.Json;

public class AddressApiLookup
{
  private const string Url = "http://myurl.com";

  public AddressPoco Get(int personId)
  {
    using (var webClient = new WebClient())
    {
      webClient.Headers["Accept-Encoding"] = "UTF-8";
      webClient.Headers["Content-Type"] = "application/json";

      var arr = webClient.DownloadData(new Uri($"{Url}/api/GetAddress/{personId}"));
      return JsonConvert.DeserializeObject<AddressPoco>(Encoding.ASCII.GetString(arr));
    }
  }
}

In our IOC container, we’ll need to make sure that we break dependencies with the AddressApiLookup, which means we’ll need an interface.  We’ll also need an interface for our CheckAddress object, but that interface will not be needed for this round of unit tests.  Here’s the interface for the AddressApiLookup object:

public interface IAddressApiLookup
{
  AddressPoco Get(int personId);
}

Now we can mock the AddressApiLookup by using Moq.  Don’t forget to add the interface to the class declaration, like this:

public class AddressApiLookup : IAddressApiLookup

One last change you’ll need to perform: The CheckAddress object will need to have the AddressApiLookup injected in the constructor.  Your IOC container is going to perform the injection for you when your code is complete, but for now, we’re going to inject our mocked object into the constructor.  Change your object to look like this:

public class CheckAddress
{
  private IAddressApiLookup _addressApiLookup;

  public CheckAddress(IAddressApiLookup addressApiLookup)
  {
    _addressApiLookup = addressApiLookup;
  }

  public bool IsEqual(int personId, AddressPoco currentAddress)
  {
    
  }
}

You can setup the usual unit tests, like this:

  1. Both addresses are alike
  2. Addresses are different
  3. No address returned from the remote API

You’ll probably want to test other scenarios, like a 500 error, but then you’ll need to change the behavior of the API-calling method to make sure it returns the result code.  We’ll stick to the simple unit tests for this post.  Here is the first unit test:

[Fact]
public void EqualAddresses()
{
  // arrange
  var address = new AddressPoco
  {
    Address = "123 main st",
    City = "Baltimore",
    State = "MD",
    Zip = "12345"
  };

  var addressApiLookup = new Mock<IAddressApiLookup>();
  addressApiLookup.Setup(x => x.Get(1)).Returns(address);

  // act
  var checkAddress = new CheckAddress(addressApiLookup.Object);
  var result = checkAddress.IsEqual(1, address);

  //assert
  Assert.True(result);
}

In the arrange segment, the address POCO is populated with some dummy data.  This data is used by both the API call (the mocked call) and the CheckAddress object, which guarantees that we get an equal result.  We’ll use “1” as the person id, which means we need to use “1” in the mock setup and “1” in the call to the IsEqual method.  Alternatively, we can code the setup to use “It.IsAny<int>()” as the matching input parameter and pass any number to the IsEqual method.

The act section creates an instance of the CheckAddress object and injects the mocked AddressApiLookup object.  Then the result is obtained from a call to the IsEqual with the same address passed as a parameter.  The assert just checks to make sure it’s all true.
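For reference, the looser arrange mentioned above would look something like this (a sketch; any person id passed to IsEqual would then match the setup):

var addressApiLookup = new Mock<IAddressApiLookup>();
addressApiLookup.Setup(x => x.Get(It.IsAny<int>())).Returns(address);

var checkAddress = new CheckAddress(addressApiLookup.Object);
var result = checkAddress.IsEqual(42, address);   // any id matches the It.IsAny setup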

If you execute your unit tests here, you’ll get a failure.  For now, let’s go ahead and write the other two unit tests:

[Fact]
public void DifferentAddresses()
{
  // arrange
  var addressFromApi = new AddressPoco
  {
    Address = "555 Bridge St",
    City = "Washington",
    State = "DC",
    Zip = "22334"
  };

  var address = new AddressPoco
  {
    Address = "123 main st",
    City = "Baltimore",
    State = "MD",
    Zip = "12345"
  };

  var addressApiLookup = new Mock<IAddressApiLookup>();
  addressApiLookup.Setup(x => x.Get(1)).Returns(addressFromApi);

  // act
  var checkAddress = new CheckAddress(addressApiLookup.Object);
  var result = checkAddress.IsEqual(1, address);

  //assert
  Assert.False(result);
}

[Fact]
public void NoAddressFound()
{
  // arrange
  var addressFromApi = new AddressPoco
  {
  };

  var address = new AddressPoco
  {
    Address = "123 main st",
    City = "Baltimore",
    State = "MD",
    Zip = "12345"
  };

  var addressApiLookup = new Mock<IAddressApiLookup>();
  addressApiLookup.Setup(x => x.Get(1)).Returns(addressFromApi);

  // act
  var checkAddress = new CheckAddress(addressApiLookup.Object);
  var result = checkAddress.IsEqual(1, address);

  //assert
  Assert.False(result);
}

In the DifferentAddresses test, I had to set up two addresses: one to be returned by the mocked object and one to be fed into the IsEqual method call.  For the final unit test, I created an empty POCO for the address returned by the API.

Now the only task left is to write the code that makes all the tests pass.  To perform TDD to the letter, you would create the first unit test and then write just enough code to make that test pass.  In this case, you could simply return true and the first unit test would pass.  Then you would create the second unit test and refactor the code to make that one pass as well.  Writing two or more unit tests before you start creating code can sometimes save you the time of creating a trivial code solution.  That’s what I’ve done here.  So let’s take a stab at writing the code:

public bool IsEqual(int personId, AddressPoco currentAddress)
{
  return currentAddress == _addressApiLookup.Get(personId);
}

Next, run your unit tests and they will all pass.

Except, there is one possible problem with the tests that were created: the equality check might only be comparing the references of the two address POCO instances.  In that case, the equal-addresses unit test would not actually be testing whether the data inside the objects is the same.  So let’s change the equal-addresses unit test to use two different instances containing the same address (copy one of them and change the name):

[Fact]
public void EqualAddresses()
{
  // arrange
  var addressFromApi = new AddressPoco
  {
    Address = "123 main st",
    City = "Baltimore",
    State = "MD",
    Zip = "12345"
  };
  var address = new AddressPoco
  {
    Address = "123 main st",
    City = "Baltimore",
    State = "MD",
    Zip = "12345"
  };

  var addressApiLookup = new Mock<IAddressApiLookup>();
  addressApiLookup.Setup(x => x.Get(1)).Returns(addressFromApi);

  // act
  var checkAddress = new CheckAddress(addressApiLookup.Object);
  var result = checkAddress.IsEqual(1, address);

  //assert
  Assert.True(result);
}

Now, if you run the unit tests you’ll get the following result:

Aha!  Just as I suspected.  This means that we need to refactor our method to properly compare the two POCO objects.  You’ll have to implement IComparable<AddressPoco> inside the AddressPoco:

public class AddressPoco : IComparable<AddressPoco>
{
    public string Address { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }

    public int CompareTo(AddressPoco other)
    {
        if (ReferenceEquals(this, other)) return 0;
        if (ReferenceEquals(null, other)) return 1;
        var addressComparison = string.Compare(Address, other.Address, StringComparison.Ordinal);
        if (addressComparison != 0) return addressComparison;
        var cityComparison = string.Compare(City, other.City, StringComparison.Ordinal);
        if (cityComparison != 0) return cityComparison;
        var stateComparison = string.Compare(State, other.State, StringComparison.Ordinal);
        if (stateComparison != 0) return stateComparison;
        return string.Compare(Zip, other.Zip, StringComparison.Ordinal);
    }
}

I have ReSharper installed on my machine, and it can auto-generate a CompareTo() method; that is what produced the method and the code inside it.  You could also override Equals() and use the == operator, but the CompareTo() approach is simpler (a sketch of that alternative appears after the IsEqual() change below).  Next, you’ll have to modify the IsEqual() method to use the CompareTo() method:

public bool IsEqual(int personId, AddressPoco currentAddress)
{
  return currentAddress.CompareTo(_addressApiLookup.Get(personId)) == 0;
}
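
As an aside, here is a rough sketch of the Equals()/== alternative mentioned above.  These members would be added to AddressPoco; this is an illustration under my own assumptions, not code from the sample project.  Note that the == operator compares references unless it is overloaded as well:

public override bool Equals(object obj)
{
    // Value comparison across all four address fields.
    return obj is AddressPoco other &&
           Address == other.Address &&
           City == other.City &&
           State == other.State &&
           Zip == other.Zip;
}

public override int GetHashCode()
{
    return (Address, City, State, Zip).GetHashCode();
}

// Overload == and != so the operators use the value comparison above.
public static bool operator ==(AddressPoco left, AddressPoco right) => Equals(left, right);
public static bool operator !=(AddressPoco left, AddressPoco right) => !Equals(left, right);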

Now run your unit tests:

Where to Get the Sample Code

You can download the sample code used in this blog post from my GitHub account by going here.  I swear by ReSharper, and I have purchased the Ultimate version (so I can use the unit test coverage tool).  For an individual, this product costs approximately $150 for the first purchase, or less for the upgrade.  ReSharper is one of the best software products I’ve ever bought.

 

Unit Testing EF Data With Moq

Introduction

I’ve discussed using the in-memory Entity Framework unit tests in a previous post (here).  In this post, I’m going to demonstrate a simple way to use Moq to unit test a method that uses Entity Framework Core.

Setup

For this sample, I used the POCOs, context and config files from this project (here).  You can copy the cs files from that project, or you can just download the sample project from GitHub at the end of this article.

You’ll need several parts to make your unit tests work:

  1. IOC container – Not in this post
  2. List object to DbSet Moq method
  3. Test data
  4. Context Interface

I found a method on Stack Overflow (here) that I use everywhere.  I created a unit test helper static class and placed it in my unit test project:

public static class UnitTestHelpers
{
  public static DbSet<T> GetQueryableMockDbSet<T>(List<T> sourceList) where T : class
  {
    var queryable = sourceList.AsQueryable();

    var dbSet = new Mock<DbSet<T>>();
    dbSet.As<IQueryable<T>>().Setup(m => m.Provider).Returns(queryable.Provider);
    dbSet.As<IQueryable<T>>().Setup(m => m.Expression).Returns(queryable.Expression);
    dbSet.As<IQueryable<T>>().Setup(m => m.ElementType).Returns(queryable.ElementType);
    dbSet.As<IQueryable<T>>().Setup(m => m.GetEnumerator()).Returns(() => queryable.GetEnumerator());
    dbSet.Setup(d => d.Add(It.IsAny<T>())).Callback<T>(sourceList.Add);

    return dbSet.Object;
  }
}
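
The context interface (item 4 in the list above) comes from the project linked in the Setup section.  A minimal sketch of what it could look like for this example, with the member list being an assumption, is:

public interface IDatabaseContext
{
    DbSet<Product> Products { get; set; }

    int SaveChanges();
}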

The next piece is the pretend data that you will use to test your method.  You’ll want to keep this as simple as possible.  In my implementation, I allow for multiple data sets.

public static class ProductTestData
{
  public static List<Product> Get(int dataSetNumber)
  {
    switch (dataSetNumber)
    {
      case 1:
      return new List<Product>
      {
        new Product
        {
          Id=0,
          Store = 1,
          Name = "Cheese",
          Price = 2.5m
        },
        ...

      };
    }
    return null;
  }
}

Now you can setup a unit test and use Moq to create a mock up of your data and then call your method under test.  First, let’s take a look at the method and see what we want to test:

public class ProductList
{
  private readonly IDatabaseContext _databaseContext;

  public ProductList(IDatabaseContext databaseContext)
  {
    _databaseContext = databaseContext;
  }

  public List<Product> GetTopTen()
  {
    var result = (from p in _databaseContext.Products select p).Take(10).ToList();

    return result;
  }
}

The ProductList class will be setup from an IOC container.  It has a dependency on the databaseContext object.  That object will be injected by the IOC container using the class constructor.  In my sample code, I set up the class for this standard pattern.  For unit testing purposes, we don’t need the IOC container, we’ll just inject our mocked up context into the class when we create an instance of the object.

Let’s mock the context:

[Fact]
public void TopTenProductList()
{
  var demoDataContext = new Mock<IDatabaseContext>();

}

As you can see, Moq uses the interface to create a mocked object.  This is the only line of code we need for the context mocking.  Next, we’ll mock some data.  We’re going to tell Moq to return data set 1 if the Products getter is called:

[Fact]
public void TopTenProductList()
{
  var demoDataContext = new Mock<IDatabaseContext>();
  demoDataContext.Setup(x => x.Products).Returns(UnitTestHelpers.GetQueryableMockDbSet(ProductTestData.Get(1)));

}

I’m using the GetQueryableMockDbSet unit test helper method to convert my list of objects into the required DbSet object.  Any time my method tries to read Products from the context, data set 1 will be returned.  This data set contains 12 items.  As you can see from the method under test, only ten items should be returned.  Let’s add the method-under-test setup:

[Fact]
public void TopTenProductList()
{
  var demoDataContext = new Mock<IDatabaseContext>();
  demoDataContext.Setup(x => x.Products).Returns(UnitTestHelpers.GetQueryableMockDbSet(ProductTestData.Get(1)));

  var productList = new ProductList(demoDataContext.Object);

  var result = productList.GetTopTen();
  Assert.Equal(10,result.Count);
}

The setup of the object under test is very basic: just create an instance and pass in the mocked context (you have to use .Object to get the mocked object).  Next, call the method to test.  Finally, perform an assert to conclude your unit test.  If the GetTopTen() method returns a count that is not ten, then there is an issue (for this data set).  Now we should test an empty set.  Add this to the test data switch statement:

case 2:
  return new List<Product>
  {
  };

Now the unit test:

[Fact]
public void EmptyProductList()
{
  var demoDataContext = new Mock<IDatabaseContext>();
  demoDataContext.Setup(x => x.Products).Returns(UnitTestHelpers.GetQueryableMockDbSet(ProductTestData.Get(2)));

  var productList = new ProductList(demoDataContext.Object);

  var result = productList.GetTopTen();
  Assert.Empty(result);
}

All the work has been done to set up the static test data object, so I only had to add one case to it.  The unit test is identical to the previous one, except that it has its own name and ProductTestData.Get() takes a parameter of 2 instead of 1 for the data set number.  Finally, I changed the assert to test for an empty set instead of ten items.  Execute the tests:

Now you can continue to add unit tests to test for different scenarios.

Where to Get the Code

You can go to my GitHub account and download the sample code (click here).  If you would like to create the sample tables to make this program work (you’ll need to add your own console app to call the GetTopTen() method), you can use the following MS SQL Server script:

CREATE TABLE [dbo].[Store](
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[Name] [varchar](50) NULL,
	[Address] [varchar](50) NULL,
	[State] [varchar](50) NULL,
	[Zip] [varchar](50) NULL,
 CONSTRAINT [PK_Store] PRIMARY KEY CLUSTERED 
(
	[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[Product](
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[Store] [int] NOT NULL,
	[Name] [varchar](50) NULL,
	[Price] [money] NULL,
 CONSTRAINT [PK_Product] PRIMARY KEY NONCLUSTERED 
(
	[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[Product]  WITH CHECK ADD  CONSTRAINT [FK_store_product] FOREIGN KEY([Store])
REFERENCES [dbo].[Store] ([Id])
GO

ALTER TABLE [dbo].[Product] CHECK CONSTRAINT [FK_store_product]
GO
 

Unit Tests are not an Option!

Introduction

I’ve been writing software since 1978, which is to say that I’ve seen many paradigm changes.  I witnessed the inception of object oriented programming.  I first became aware of objects when I read a Byte magazine article on a language called Smalltalk (the August 1981 issue).  I read and re-read that article many times to try to understand what the purpose of object oriented programming was.  It took another ten years before programmers began to recognize object oriented programming.  In the early 90’s, the University of Michigan only taught a few classes using object oriented C++.  It was still new and shiny.  Now all languages are object oriented, or they are legacy languages.

The web was another major paradigm shift that I witnessed.  Before the browser was invented (while I was in college), all programs were written to be executed on the machine they ran on.  I was immersed in the technology of the Internet while I was a student at UofM, and we used tools such as Telnet, FTP, Archie, DNS, and Gopher (to name a few) to navigate and find information.  The Internet was primarily composed of data about programming.  When Mosaic came along, as well as HTML, the programming world went crazy.  The technology didn’t mature until the early 2000’s.  Many programming languages were thrown together to accommodate the new infrastructure (I’m looking at you, “Classic ASP”).

Extreme Programming came of age in the late 90’s, though I did not get involved in XP until the mid 2000’s.  Waterfall was the way things were done.  The industry was struggling with automated testing suites.  Unit testing came onto the scene, but breaking dependencies was an unknown art.  It took some years before somebody came up with the idea of inversion of control.  The idea was so abstract that most programmers ignored it and moved on.

The latest paradigm change, and it’s much bigger than most will give it credit for, is the IOC container.  Even Microsoft has incorporated this technology into their latest tools; IOC is part of .Net Core.  If you’re a programmer and you haven’t used IOC containers yet, or you don’t understand the underlying reason for them, you had better get on the bandwagon.  I predict that within five years, IOC will be recognized as the industry standard, even for companies that build software for their internal use only.  It will be difficult to get a job as a programmer without understanding this technology.  Don’t believe me?  Pretend you’re a software engineer with no object oriented knowledge.  Now search for a job and see what results come up.  Grim, isn’t it?

Where am I going with this?  I currently work for a company that builds medical software.  We have a huge collection of legacy code.  I’m too embarrassed to admit how large this beast is; it just makes me cry.  Our company uses the latest tools, and we have advanced developers who know how to build IOC containers, unit tests, properly scoped objects, etc.  We also practice XP, to a limited degree.  We do the SCRUMs, stand-ups, code reviews (sometimes), demos, and sprint planning meetings.  What we don’t do is unit testing.  Oh, we have unit tests, but the company mandate is that they are optional.  When there is extra time to build software, unit tests are incorporated.  Only a small handful of developers incorporate unit tests into their software development process.  Even I have built some new software without unit tests (and I’ve paid the price).

The bottom line is that unit tests are not part of the software construction process.  The company is staffed with programmers who are unfamiliar with TDD (Test Driven Development), and in fact most are unfamiliar with unit testing altogether.  Every developer has heard of unit tests, but I suspect that many are not sold on the concept.  Many developers look at unit testing as just more work.  There are the usual arguments against unit testing: they need to be maintained, they become obsolete, they break when I refactor code, etc.  These are old arguments that were disproved years ago, but, like myths, they get perpetuated forever.

I’m going to divert from the subject a bit here, just to show how crazy this is.

Our senior developers have gathered in many meetings to discuss the agreed-upon architecture that we are aiming for.  That architecture is not much different from any other company’s: break our monolithic application into smaller APIs, use IOC containers, separate database concerns from business logic, and business logic from the front-end.  We have a couple dozen APIs, and they were written with this architecture in mind.  They are all written with IOC containers.  We use Autofac for our .Net applications, and .Net Core has its own IOC container technology.  Some of these APIs have unit tests.  These tests were primarily added after the code was written, which is OK.  Some of our APIs have no unit tests.  This is not OK.

So the big question is: Why go through the effort of using an IOC container in the first place, if there is no plan for unit tests?

The answer is usually “to break dependencies.”  Which is correct, except: why?  Why did anybody care about breaking dependencies?  Just breaking dependencies gains nothing.  The IOC container itself does not help with the maintainability of the code.  Is it safer to refactor code with an IOC container?  No.  Is it easier to troubleshoot and fix bugs in code that has dependencies broken?  Not unless you’re using unit tests.

My only conclusion to this crazy behavior is that developers don’t understand the actual purpose of unit testing.

Unit Test are Part of the Development Process

The most difficult part of creating unit tests is breaking dependencies.  IOC containers make that a snap.  Every object (with some exceptions) should be put into the container.  If an object instance must be created by another object, then it must be created inside the IOC container.  This breaks the dependency for you.  Now unit testing is easy.  Just focus on one object at a time and write tests for that object.  If the object needs other objects to be injected, then use a mocking framework to mock those objects.

As a programmer, you’ll need to go further than this.  If you want to build code that can be maintained, you’ll need to build your unit tests first, or at least concurrently.  You cannot run through like the Tasmanian Devil, building your code, and then follow up with a handful of unit tests.  You might think you’re clever by using a coverage tool to make sure you have full code coverage, but I’m going to show an example where code coverage is not the only reason for unit testing.  Your workflow must change.  At first, it will slow you down, like learning a new language.  Keep working at it and eventually you won’t have to think about the process.  You’ll just know.

I can tell you from experience that I don’t even think about how I’m going to build a unit test.  I just look at what I’ll need to test and I know what I need to do.  It took me years to get to this point, but I can say, hands down, that unit testing makes my workflow faster.  Why?  Because I don’t have to run the program in order to test for all the edge cases.  I write one unit test at a time and I run that test against my object.  I use unit testing as a harness for my objects.  That is the whole point of using an IOC container.  First, you take care of the dependencies, then you focus on one object at a time.

Example

I’m sure you’re riveted by my rambling prose, but I’m going to prove what I’m talking about.  At least on a small scale.  Maybe this will change your mind, maybe it won’t.

Let’s say, for example, I was writing some sort of API that needed to return a set of patient records from the database.  One of the requirements is that the calling program can pass filter parameters to select a date range for the records desired.  There is a start date and an end date filter parameter.  Furthermore, each date parameter can be null.  If both are null, then give me all records.  If the start parameter is null, then give me everything up to the end date.  If the end date is null, then give me everything from the start date to the latest record.  Each record in the database has a date when the patient saw the doctor.  This is hypothetical, but based on a real program that my company uses.  I’m sure this scenario applies to any company that queries a database for web use, so I’m going to use it.

Let’s say the program is progressing like this:

public class PatientData
{
  private DataContext _context;

  public List<PatientVisit> GetData(int patientId, DateTime? startDate, DateTime? endDate)
  {
    var filterResults = _context.PatientVisits.Where(x => x.BetweenDates(startDate,endDate));

    return filterResults.ToList();
  }
}

You don’t want to include the date range logic in your LINQ query, so you create an extension method to handle that part.  Your next task is to write the ugly code called “BetweenDates()”.  This will be a static extension class that can be used with any of your PatientVisit POCOs.  If you’re unfamiliar with a POCO (Plain Old CLR Object), then here’s a simple example:

public class PatientVisit
{
  public int PatientId { get; set; }
  public DateTime VisitDate { get; set; }
}

This is used by Entity Framework in a context.  If you’re still confused, please search through my blog for Entity Framework subjects and get acquainted with the technique.

Back to the “BetweenDates()” method.  Here’s the empty shell of what needs to be written:

public static class PatientVisitHelpers
{
  public static bool BetweenDates(this PatientVisit patientVisit, DateTime? startDate, DateTime? endDate)
  {
    
  }
}

Before you start to put logic into this method, start thinking about all the edge cases that you will be required to test.  If you run in like a tribe of Comanche Indians and throw the logic into this method, you’ll be left with a manual testing job that will probably take you half a day (assuming you’re thorough).  Later, down the road, if someone discovers a bug, you will need to fix this method and then redo all the manual tests.

Here’s where unit tests are going to make your job easy.  The unit tests are going to be part of the discovery process.  What discovery?  One aspect of writing software that is different from any other engineering discipline is that every project is new.  We don’t know what has to be built until we start to build it.  Then we “discover” aspects of the problem that we never anticipated.  In this sample, I’ll show how that occurs.

Let’s list the rules:

  1. If the dates are both null, give me all records.
  2. If the first date is null, give me all records up to the end date (including the end date).
  3. If the last date is null, give me all records from the starting date (including the start date).
  4. If both dates exist, then give me all records, including the start and end dates.

According to this list, there should be at least four unit tests.  If you discover any edge cases, you’ll need a unit test for each one.  If a bug is discovered, you’ll need to add a unit test that simulates the bug and then fix the bug.  Which tells you that you’ll keep adding unit tests to a project every time you fix a bug or add a feature (with the exception of cases where one or more unit tests were incorrect in the first place).  An incorrect unit test usually occurs when you misinterpret the requirements.  In such an instance, you’ll fix the unit test and then fix your code.

Now that we have determined that we need four unit tests, create four empty unit test methods:

public class PatientVisitBetweenDates
{
  [Fact]
  public void BothDatesAreNull()
  {

  }
  [Fact]
  public void StartDateIsNull()
  {

  }
  [Fact]
  public void EndDateIsNull()
  {

  }
  [Fact]
  public void BothDatesPresent()
  {

  }
}

I have left out the IOC container code from my examples.  I am testing a static object that has no dependencies, therefore, it does not need to go into a container.  Once you have established an IOC container and you have broken dependencies on all objects, you can focus on your code just like the samples I am showing here.

Now for the next step: Write the unit tests.  You already have the method stubbed out.  So you can complete your unit tests first and then write the code to make the tests pass.  You can do one unit test, followed by writing code, then the next test, etc.  Another method is to write all the unit tests and then write the code to pass all tests.  I’m going to write all the unit tests first.  By now, you might have analyzed my empty unit tests and realized what I meant earlier by “discovery”.  If you haven’t, then this will be a good lesson.

For the first test, we’ll need the setup data.  We don’t have to concern ourselves with any of the Entity Framework code other than the POCO itself.  In fact, the “BetweenDates()” method only looks at one instance, or rather, one record.  If the date of the record would be included in the returned set, then the method will return true; otherwise, it will return false.  The tiny scope of this method makes our unit testing easy.  So put one record of data in:

[Fact]
public void BothDatesAreNull()
{
  var testSample = new PatientVisit
  {
    PatientId = 1,
    VisitDate = DateTime.Parse("1/7/2015")
  };
}

Next, setup the object and perform an assert.  This unit test should return a true for the data given because both the start date and the end date passed into our method will be null and we return all records.

[Fact]
public void BothDatesAreNull()
{
  var testSample = new PatientVisit
  {
    PatientId = 1,
    VisitDate = DateTime.Parse("1/7/2015")
  };

  var result = testSample.BetweenDates(null,null);
  Assert.True(result);
}

This test doesn’t reveal anything yet.  Technically, you can put code into your method that just returns true, and this test will pass.  At this point, it would be valid to do so.  Then you can write your next test and then refactor to return the correct value.  This would be the method used for pure Test Driven Development.  Only use the simplest code to make the test pass.  The code will be completed when all unit tests are completed and they all pass.

I’m going to go on to the next unit test, since I know that the first unit test is a trivial case.  Let’s use the same data we used on the last unit test:

[Fact]
public void StartDateIsNull()
{
  var testSample = new PatientVisit
  {
    PatientId = 1,
    VisitDate = DateTime.Parse("1/7/2015")
  };
}

Did you “discover” anything yet?  If not, then go ahead and put the method setup in:

[Fact]
public void StartDateIsNull()
{
  var testSample = new PatientVisit
  {
    PatientId = 1,
    VisitDate = DateTime.Parse("1/7/2015")
  };
  
  var result = testSample.BetweenDates(null, DateTime.Parse("1/8/2015"));
}

Now you’re probably scratching your head, because we need at least two test cases and probably three.  Here are the test cases we need when the start date is null but the end date is filled in:

  1. Return true if the visit date is before the end date.
  2. Return false if the visit date is after the end date.

What if the date is equal to the end date?  Maybe we should test for that edge case as well.  Break the “StartDateIsNull()” unit test into three unit tests:

[Fact]
public void StartDateIsNullVisitDateIsBefore()
{
  var testSample = new PatientVisit
  {
    PatientId = 1,
    VisitDate = DateTime.Parse("1/7/2015")
  };
  var result = testSample.BetweenDates(null, DateTime.Parse("1/8/2015"));
  Assert.True(result);
}
[Fact]
public void StartDateIsNullVisitDateIsAfter()
{
  var testSample = new PatientVisit
  {
    PatientId = 1,
    VisitDate = DateTime.Parse("1/7/2015")
  };
  var result = testSample.BetweenDates(null, DateTime.Parse("1/3/2015"));
  Assert.False(result);
}
[Fact]
public void StartDateIsNullVisitDateIsEqual()
{
  var testSample = new PatientVisit
  {
    PatientId = 1,
    VisitDate = DateTime.Parse("1/7/2015")
  };
  var result = testSample.BetweenDates(null, DateTime.Parse("1/7/2015"));
  Assert.True(result);
}

Now you can begin to see the power of unit testing.  Would you have manually tested all three cases?  Maybe.

That also reveals that we will be required to expand the other two tests that contain dates.  The case where we have a null end date will need a similar set of three unit tests, and the in-between-dates case will need even more.  For the in-between case, we now need:

  1. Visit date is less than start date.
  2. Visit date is greater than start date but less than end date.
  3. Visit date is greater than end date.
  4. Visit date is equal to start date.
  5. Visit date is equal to end date.
  6. Visit date is equal to both start and end date (start and end are equal).

That makes six unit tests for the in-between case, bringing our total to 13 tests.

Fill in the code for the remaining tests.  When that is done, verify each test to make sure it represents a valid case.  Then you can write the code for the helper method.  You now have a complete, detailed specification for your method, written as unit tests.
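
For reference, here is one possible implementation of the helper that satisfies the cases above.  This is a sketch, not necessarily the code from the sample project:

public static class PatientVisitHelpers
{
  // Inclusive on both ends; a null boundary means "no limit" on that side.
  public static bool BetweenDates(this PatientVisit patientVisit, DateTime? startDate, DateTime? endDate)
  {
    if (startDate.HasValue && patientVisit.VisitDate < startDate.Value)
    {
      return false;
    }

    if (endDate.HasValue && patientVisit.VisitDate > endDate.Value)
    {
      return false;
    }

    return true;
  }
}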

Was that difficult?  Not really.  Most unit tests fall into this category.  Sometimes you’ll need to mock an object that your object under test depends on.  That is made easy by the IOC container.

Also, you can execute your code directly from the unit tests.  Instead of writing a test program to send inputs to your API, or using your API in a full system where you type data in manually, you just execute the unit test you are working with.  You type in your code, then run all the unit tests for this method.  As you create code to account for each test case, you’ll see your unit tests start to turn green.  When all unit tests are green, your work is done.

Now, if QA finds a bug that leads back to this method, you can re-verify your unit tests against the case that QA found.  You might discover a bug in code that is outside your method, or it could be a case missed by your unit tests.  Once you have fixed the bug, you can re-run the unit tests instead of manually testing each case.  In the long run, this will save you time.

Code Coverage

You should strive for 100% code coverage.  You’ll never get it, but the more code you can cover, the safer it will be to refactor in the future.  Any code not covered by unit tests is at risk of failure when code is refactored.  As I mentioned earlier, code coverage doesn’t solve all your problems.  In fact, if I wrote the helper code for the previous example and then created unit tests afterwards, I bet I could create two or three unit tests that cover 100% of the code in the helper method.  What I might not cover are edge cases, like the visit date being equal to the start date.  It’s best to use code coverage tools after the code and unit tests are written.  The code coverage will be your verification that you didn’t miss something.

Another problem with code coverage tools is that they can make you lazy.  You can easily look at the code and come up with a unit test that executes the code inside an “if” statement, and then create a unit test that executes the code inside the “else” part.  Those unit tests might not be valid.  You need to understand the purpose of the “if” and the “else” and the purpose of the code itself.  Keep that in mind.  If you are writing new code, create the unit tests first or concurrently.  Only use the code coverage tool after all your tests pass, to verify you covered all of your code.

Back to the 20,000 foot View

Let’s take a step back and talk about what the purpose of the exercise was.  If you’re a hold-out for a world of programming without unit tests, then you’re still skeptical of what was gained by performing all that extra work.  There is extra code.  It took time to write that code.  Now there are thirteen extra methods that must be maintained going forward.

Let’s pretend this code was written five years ago and has been humming along for quite some time without any bugs being detected.  Now some entry-level developer comes on the scene and decides to modify this code.  Maybe the developer in question thinks that tweaking this method is an easy short-cut to creating some enhancement that was demanded by the customer.  If the code is changed and none of the unit tests break, then we’re OK.  If the code is changed and one or more unit tests break, then the programmer modifying the code must look at those unit tests and determine whether the individual behaviors should change, or whether the tests broke because the change is not correct.  If the unit tests don’t exist, the programmer modifying the code has no idea what thought process and/or testing went into the original design.  The programmer probably doesn’t know the full specification of the code as it was written.  The suite of unit tests makes the purpose unambiguous.  Any programmer can look at the unit tests and see exactly what the specification is for the method under test.

What if a bug is found and all unit tests pass?  Then you have discovered an edge case that was not known at the time the method was written.  Before fixing the bug, the programmer must create a unit test with the edge case that causes the bug.  That unit test must fail with the current code, and it should fail in the same manner as the real bug.  Once the failing unit test is created, the bug should be fixed to make the unit test pass.  Once that has been accomplished, run all the unit tests and make sure you didn’t break previous features when fixing the bug.  This method ends the whack-a-mole technique of trying to fix bugs in software.

Next, try to visualize a future where all your business code is covered by unit tests.  If each class and method had a level of unit testing to the specification that this method has, it would be safe and easy to refactor code.  Any refactor that breaks code down the line will show up in the unit tests (as broken tests).  Adding enhancements would be easy and quick.  You would be virtually guaranteed to produce a quality product after adding an enhancement.  That’s because you are designing the software to be maintainable.

Not all code can be covered by unit tests.  In my view, this is a shame.  Unfortunately, there are sections of code that cannot be put into a unit test for one reason or another.  With an IOC container, your solution should be divided into projects that are unit testable and projects that are not.  Projects such as the project containing your Entity Framework repository are not unit testable.  That’s OK, and you should limit how much actual code exists in such a project.  All the code should be POCOs and some connecting code.  Your web interface should be limited to code that connects the outside world to your business classes.  Any code that is outside the realm of unit testing is going to be difficult to test, so try to limit the complexity of that code.

Finally…

I have looked over the shoulders of students building software for school projects at the University of Maryland, and I noticed that they incorporated unit testing into a Java project.  That made me smile.  While the project did not contain an IOC container, it’s a step in the right direction.  Hopefully, within the next few years, universities will begin to produce students who understand that unit tests are necessary.  There is still a large gap between those students and the people in the industry who have never used unit tests.  That gap must be filled in a self-taught manner.  If you are one of the many who don’t incorporate unit testing into your software development process, then you had better start doing it.  Now is the time to learn and get good at it.  If you wait too long, you’ll be one of those COBOL developers wondering who moved their cheese.

 

What .Net Developers Should Know about MS SQL and IIS

Summary

In this post, I’m going to explain a couple of techniques and tools that every developer should know.  If you are just graduating from college and you are looking for a .Net job, learn these simple techniques and advance your career.  If you’re a mid-level to advanced .Net developer and you don’t know these tools and techniques yet, learn them.  These tools and techniques will save you time and give you an edge in building better software.

SQL Profiler

Let’s assume you have built this outstanding program.  It’s a work of art.  It uses .Net Core 2.0 with IOC containers, Entity Framework Core 2.0 and other advanced technologies.  You’re testing your web-based software and you notice a pause when you click on a button or load a page.  The first question that should pop into your mind is: what is causing the slowdown?  Is it the database or the IIS server?  Finally, what can you do about it?

Let’s eliminate or confirm the database.  If you have installed SQL Server Profiler (I’m going to assume you did; otherwise you’ll need to run the SQL Server installer and add this tool), go to the Tools menu in SQL Server Management Studio and select SQL Server Profiler.  A new window will open and you’ll need to connect to your database instance just as you do when opening the management studio itself.  Once you open the profiler, it’s time to exercise the page that you are having issues with.  You can click on the stop button and use the eraser to clean up any records that have already shown up in the window.  Get to the point where you are about to click the button on your web page.  Then hit the run button in the profiler and hit the web page.  Once the page loads, hit the stop button in the profiler so nothing else is recorded.  Now you have records to analyze.  You’ll be surprised at the number of database calls EF will perform.

I used this project as a demo for the screenshot coming up:

https://github.com/fdecaire/MVCWithAutoFacIOC

Download the project, run the database create script (store_product_database_setup.sql) and run the application.  You should see something like this:

As you can see, there is one call to the database and you can see the “select” command.  Click on the “BatchStarting” line and notice the detail in the window at the bottom:

Now you can copy that query and paste it into SQL Server Management Studio to test the exact query that Entity Framework is sending to SQL Server:

This will indicate if you are querying for too many fields, or if the total number of records queried is crazy.  If you discover that your query result was a million records and your interface only shows the top 10 records, then you need to tweak your LINQ query to only ask for 10 records.  In this tiny example we have three records.  Let’s make it ask for 2 records.  Here’s my original LINQ query:

(from s in _db.Stores select s).ToList();

I changed it to:

(from s in _db.Stores select s).Take(2).ToList();

Re-run the program, capture the data in profiler and this is what I get:

Notice the “Top(2)” difference in the query.  This is the kind of performance tuning you should be aware of.  It’s very easy to create C# code and LINQ queries and never understand what is really going on behind the scenes.  Entity Framework takes your LINQ query, turns it into a string that represents a SELECT query and transmits that to MS SQL.  Then MS SQL queries the database and returns the results so that EF can turn them back into a list of objects.  With SQL Profiler, you can get into the nuts and bolts of what is really going on, and I would recommend you run the profiler at least once after you have built your software and think it is ready for prime-time.  If you see a query pause, copy the profiled SQL query into management studio and see if you can speed up the query while still getting the results that you need.
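
For context, here is a rough sketch of where those two LINQ queries might live, using a hypothetical Store entity and DbContext rather than the exact classes from the demo project; the comments show roughly what SQL Profiler captures for each:

using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context, standing in for the ones in the demo project.
public class Store
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class StoreContext : DbContext
{
    public StoreContext(DbContextOptions<StoreContext> options) : base(options) { }

    public DbSet<Store> Stores { get; set; }
}

public class StoreRepository
{
    private readonly StoreContext _db;

    public StoreRepository(StoreContext db)
    {
        _db = db;
    }

    // Profiler shows something like: SELECT [s].[Id], [s].[Name] FROM [Stores] AS [s]
    public List<Store> GetAllStores()
    {
        return (from s in _db.Stores select s).ToList();
    }

    // Profiler shows something like: SELECT TOP(2) [s].[Id], [s].[Name] FROM [Stores] AS [s]
    public List<Store> GetTopTwoStores()
    {
        return (from s in _db.Stores select s).Take(2).ToList();
    }
}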

Another tool you can use is the estimated execution plan tool.  The toolbar button looks like this:

This tool will break your query down into the pieces that will be executed to form the results.  In the case of my previous query, there is only one piece:

That piece of the query costs 100% of the execution time.  If your query included a union and maybe some sub-queries, this tool is very useful in determining which part of the query is costing you the most processing cycles.  Use this tool to decide which part of your query you want to focus your energy on.  Don’t waste time trying to optimize the portion of your query that only takes 2% of the execution time.  Maybe you can get that to go twice as fast, but the overall query will only be about 1% faster than before.  If you focus your energy on a section that takes 98% of your execution time, then you’ll be able to boost the performance in a noticeable manner.

Web Server Troubleshooting

If you’re using IIS, some day you’ll run into a problem where you don’t get any logs and your website or API crashes immediately (usually a 500 error).  This is always a difficult problem to troubleshoot, until you realize that there are only a handful of problems that cause it.  The most common problem is an issue with the XML formatting in your web.config file.  I can’t tell you how many times I have been bitten by this problem!  The easiest way to test and troubleshoot this error is to open the IIS Manager control panel, select your website and then click on one of the icons that displays a section of your web.config file, like “Modules”:

If there is an error, the line number in the web.config file will be shown.  You’ll be able to look at the XML in the web.config and see your missing tag, extra quote or other stray symbol (sometimes it’s an “&”, “>” or “<” character inside your database connection string password or something similar).  Fix the web.config issue and go back to Modules again.  If there is another error, fix it and return again, until it works.
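
As an example of how easy this is to hit, a connection string password containing an ampersand has to be XML-escaped inside web.config (the server, database and password below are made up):

<!-- Broken: the raw "&" makes the web.config unreadable and IIS returns a 500 -->
<add name="MyDb" connectionString="Server=MYSERVER;Database=MyDb;User Id=webuser;Password=p&ss1" />

<!-- Fixed: escape the ampersand as &amp; -->
<add name="MyDb" connectionString="Server=MYSERVER;Database=MyDb;User Id=webuser;Password=p&amp;ss1" />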

On .Net Core, there is an error log that can report startup errors.  This logging happens before your logging framework starts and is very handy for situations where you don’t get any logging at all.  When you publish your site to a folder in .Net Core (or Core 2.0), you’ll end up with a bunch of dlls, some config files and a web.config file.  The web.config file is mostly empty and might look like this:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
 <system.webServer>
 <handlers>
 <add name="aspNetCore" path="*" verb="*" modules="AspNetCoreModule" resourceType="Unspecified" />
 </handlers>
 <aspNetCore processPath="dotnet" arguments=".\Website.dll" stdoutLogEnabled="false" stdoutLogFile=".\logs\stdout" />
 </system.webServer>
</configuration>

Change your “stdoutLogFile” parameter to point to a file location that you can find.  I usually set mine to “C:\logs\myapplication_logging_error.txt” or something like that.  Then I run the program until it crashes and check in the c:\logs directory to see if the file exists.  If it does, it usually contains information about the crash that can be used to troubleshoot what is going on.  I’m assuming at this point in your troubleshooting, the website or API works from Visual Studio and you are having issues with the deployed application.  If you are having issues with executing your application in Visual Studio, you should be able to zero in on the error in VS using breakpoints and other tools.
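
As a rough example of what that edit looks like (the log path here is just my own convention), the aspNetCore element from the web.config above would become something like this; note that stdoutLogEnabled needs to be set to true as well, or nothing will be written:

<aspNetCore processPath="dotnet"
            arguments=".\Website.dll"
            stdoutLogEnabled="true"
            stdoutLogFile="C:\logs\myapplication_logging_error.txt" />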

For NLog, there is an error log for the error log.  This is where errors go when a problem is detected in the NLog code itself, usually caused by a configuration error.  At the top of your nlog.config file should be something like this:

<?xml version="1.0" encoding="utf-8" ?>
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 autoReload="true"
 internalLogLevel="Warn"
 internalLogFile="C:\logs\myapplication_nlog.txt">

As you can see, there is an “internalLogFile” parameter.  Set this to a location and filename that you can find.  Then execute your program and see if that log file appears.  If it does, then open it up and examine the contents.  Usually it’s an exception error involving the NLog logger.

Application Pools

The application pool type must match the .Net version that you are using.  For .Net Core and Core 2.0, you’ll need to use “no managed code”.  For .Net, you’ll set your pool to “.Net Framework v4.0”.  If you are unfortunate enough to be using really obsolete legacy code, you can set it to “.Net Framework v2.0”.

When you deploy your Core 2.0 application, you may need to cycle the application pool.  You’ll run into a problem where you deploy new code and the “bug fix” or “new feature” does not show up.  Cycle the app pool and see if it fixes your problem.  The application pool might be using a copy of the previous program in memory.

Sites

Inside your site settings, there is an ASP settings section:

You can turn on debugging flags inside here.  For your local development system, turn it all on:

Leave these settings off for any production system.  You do not want to advertise the line numbers of your code (and your code) when you have a crash.  For development purposes, you want to know all the data you can get from an exception.  If these settings are on, you’ll get the full dump of the exception message in your browser and then you can return to Visual Studio to find the exact line of code where the error occurred.

I hope this saves you some time when a frustrating deployment problem occurs.  The trick is to learn the most common causes of deployment problems so you don’t spend all day troubleshooting a tiny error instead of working on that next big chunk of code.

 

 

 

 

Agile Programming with Kanban

Summary

In this post I’m going to discuss the differences between the waterfall and agile methods of programming.  Then I’m going to focus more on Kanban and what advantages and disadvantages it has in comparison to Scrum.  I will also discuss the business case for using Kanban and how it can improve the process of developing software and reduce waste.

Producing Software

Producing software costs money.  That is the bottom line.  The largest cost to creating software is the cost of man-hours or how much money is paid to the analysts, developers and quality people to create the software.  After the software has been written and delivered, then the primary cost becomes the operating costs.  Operating costs can break down into licensing fees for database and operating systems as well as hardware or hosting fees.  There will also be labor expenses tied up in support such as IT personnel, help-desk and training personnel.  I’m going to focus on the cost it takes to create the software.  As a seasoned software architect, my job is to analyze and design the software with an eye for reducing the cost of operating the software after it is delivered.  That is a subject for another blog post.  In this post I’m concerned about the process of creating the software with an eye toward reducing wasted time and money.

Waterfall Method

Everyone has heard of the waterfall method.  It’s the most obvious way to design and build software.  Basically a team of software analysts communicate with the customer to determine a list of features required/desired for the software to be built.  They will then turn this into a requirements document.  Once the requirements document has been agreed upon, a massive design effort is launched.  This can take months and for grand projects it can take years.  The project is designed down to the smallest detail so that it can be estimated.  Estimating the amount of man-hours it takes to perform the programming can be done by analysts or, if the developers are already working for the company performing the work, they can perform the estimation (it’s preferable to obtain estimates from the developers that will actually work on the project because then you can get buy-in).  Next, the negotiations start with the customer.  The customer will most certainly decide that they can’t afford the “magic wand”1 version of the software that they described and will begin to reduce the features to get the price down to something reasonable.

Once a contract is signed, then the specifications are thrown over the wall (figuratively) to the developers.  I’ve seen companies that use Gantt2 charts with critical paths detailing when each feature will be scheduled, but most companies just hand over the spec with the idea that developers can divide-and-conquer.  Then months and probably years are spent developing the software.  There are usually demonstrations of what has been created to keep the customer from cancelling the project.  There is nothing worse than paying for a product and not hearing anything about it until a year later when the whole thing is delivered.  The customer will want progress reports.  Is it on schedule?  Hopefully, for the customer’s sake, they paid a flat rate for the software and will not have to shell out more money if the developers cannot produce the software by the deadline indicated in the estimates.  Otherwise, more than likely, the project will come in over budget and late.

Once the developers have completed their tasks, QA can start testing the whole project.  QA should be building test scripts while the developers are creating the code; otherwise, there is no need for QA people on the project until they are about ready to start testing.  Once the QA people have completed their tasks and the developers have fixed any bugs found, then it’s time for the release.

This is the waterfall method in a nutshell.

Problems with Waterfall

The biggest problem with the waterfall method is that it is wasteful.  Many projects get cancelled before completion and the customer is left with nothing.  If the tasks are not scheduled to ensure that sections of the software are workable, then there is no way to cut off development and deliver a working product with fewer features.  Also, if a project is cut because of cost overruns, then there is still QA work to be done.  Finally, the software analysts and designers must be paid for designing the entire project before any software is built.

Next, there are months where no software is being written.  This is the analysis, design and estimation phase.  The whole thing must be estimated before an agreement can be signed.  That means that time is burning while this whole process is going on and development doesn’t start until all of this work is completed.

Usability testing is difficult to perform before the software is built, and usability problems are expensive to fix after the whole project is complete (designed, built and QA’d).  The best way to make usability testing cheap and effective is to test usability as the software is being built.  This is not something that waterfall can accommodate without difficulty.  It would require very detailed scheduling, and changes would loop back into design work that has already been completed.  In practice, waterfall does not support effective usability testing.

The Agile Method

The theory behind the agile method is that the software is only roughly specified up front.  Then the most critical and important parts of the software are designed and estimated first.  Usually only a week or month’s worth of software is designed at a time.  Then the design is estimated by the developers who will start work immediately.  Once a few pieces of the software (usually called stories) are completed, they are QA’d and then deployed to a limited audience (the customer).  The customer will then review what is being demonstrated and normally they get hands-on access to test what has been created.  If the piece of software is large enough to be useful, the customer can start using the software.  A critique of the software can be fed back to the analysts to have some of the usability problems fixed in a future release.  Meanwhile the design team is working on the next specifications or stories to be put in the hopper.  At the same time the developers are working on the previously designed and estimated stories.  QA is working on the previously finished stories and the quality checked pieces are collected up for the next deployment.  This continues like a factory.

The stories need to be produced in the order that the customer demands.  That way the customer can put off minor enhancements to the end and have working software as early as possible.  If the customer decides that they have spent enough money and the product is good enough, then they can cut off the project and walk away with a product that is usable.  This reduces waste.

Benefits over Waterfall

It’s almost obvious what the benefits are:

  • Working software is delivered right away.
  • There is a short startup time when the designers create the first stories.  The developers start working right away instead of waiting months or years to start.
  • The customer is more involved in the creation of the product.  Instant feed-back about usability problems can help fix the problem before the developers have forgotten what they’ve worked on.
  • The customer can cut off the project at any time and walk away with a functioning product.
  • The customer, theoretically, could start using their product before it is finished.
  • Re-prioritizing features is quick and easy since developers don’t just grab any story at any time.  The customer has full control on when features are developed.

Scrum vs. Kanban

There are several methods of running an agile development team, but two are near the top: Kanban and Scrum.  Scrum is a method where there is a fixed amount of time that developers will work on a set of stories.  This is called the sprint, and it may last two, three or four weeks.  The sprint is usually a fixed time frame that is repeated throughout the project, and sprints are usually numbered.  For example: the analysts/designers will group stories into sprint 1 until they have filled two weeks’ worth of work for the team that will work on the software.  Then they will start to fill in sprint 2, etc.  In Scrum, there are daily stand-up meetings where the “team” discusses the progress since the previous stand-up.  The idea of a stand-up is that everyone stands and the meeting is limited to reporting progress, issues and blockers.  If a one-on-one discussion lasts more than a few minutes, it must be taken off-line, because it wastes the time of the entire team to sort out a problem that can be solved by two people.  Scrum provides an environment where the team knows what everyone else is working on during each sprint.

Stories that are completed are immediately passed on to QA during the sprint.  The QA personnel are included as part of the “team” and they attend the daily standup meetings as well.  At the end of the sprint the stories that are complete and QA’d are deployed and demonstrated to the customer who can give feed-back.  Any changes can be put into the next sprint or a future sprint.

Kanban is a bit different.  Kanban is broken down like a factory.  There are “lanes” where the stories will go when they are moved from the design phase to the deployment phase, like a pipeline.  Analysts/designers will put the stories into the backlog lane.  Once the stories are in the backlog, developers can immediately pick one story up and work on it.  The developer will move the story to their development lane.  When the story is complete, it is moved into the “To be QA’d” lane.  Then a QA person can pull a story out of that lane and put it in their “QA” lane.  After the QA person has completed their task, the story can be placed into the “To be Deployed” lane.  When enough stories are in that lane, they can be deployed to be reviewed by the customer.  Technically, each story can and should be deployed immediately.  This can be accomplished by an automated deployment system.

In the case where QA discovers a bug, the story must be moved back into the developer’s lane.  The developer must fix the bug as soon as he/she can and get it back into the “To be QA’d” lane.  There can be limits set to each lane to reduce the amount of allowed work in progress or WIP.  The WIP count controls the flow of stories going through the pipeline.  As you can see, Kanban is setup as a just-in-time delivery system.  The customer can literally end a project as soon as they decide they have enough features and the whole assembly line can stop without much waste in the system.  Scrum can have up to two-weeks (or three or four depending on the sprint size) worth of waste in the system when a cut-off occurs.  Keep in mind that scrum is still extremely efficient compared to waterfall and we are only splitting hairs over how much waste exists if a Kanban or Scrum project is cancelled.

Kanban

I’m going to focus on potential issues that can occur with Kanban.  First, it can be difficult to determine when a feature will be delivered.  This happens for items that are sitting in the backlog lane.  Items in the backlog can be re-prioritized until they are picked up and work begins.  Once work begins, the story must be completed (unless there is some circumstance that warrants stopping work on a particular story).  If a story is at the top of the backlog, an estimate of when it will be completed can only be determined when it is picked up by a developer.  Adjusting the WIP count to be low can make it easier to estimate when the story will go through the pipeline, but assumptions must be made about the timing of existing stories.  If the WIP count is high, then developers might have several stories in their lane at one time.  Now you’re probably scratching your head and thinking “why would a developer have two or more stories in their lane at a time?”  This situation usually happens when there is something blocking the completion of a story.  Maybe there is a question about how the story should be implemented, and the developer is waiting for an analyst to make a decision on how to proceed.  In such an instance, it is best for the developer to pick up the next story and start working on that one.  Developers are responsible for clearing their lane before picking up another story unless something is blocking a story.  In other words, no non-blocked stories should sit in a developer’s lane.

Straight Kanban assumes that all developers are minimally qualified to work on every task.  That is not usually realistic and there are work-arounds for shops that have specialized developers.  First, Kanban can be setup with multiple backlog lanes.  Let’s pretend that your shop has back-end developers and front-end developers3.  As stories are created by analysts/designers, they can be divided and organized by front-end work vs. back-end work and placed in the appropriate backlog lane.  Back-end developers will pull stories from the back-end backlog lane and so on.  Of course there is the scheduling problem where now back-end developers must finish an API before a front-end programmer can consume the API for their interface.  This can be mitigated by breaking off a story that creates the API “shell” program with dummy data that responds to requests from the front-end.  Then the front-end developer can start consuming the API data before the back-end developer has completed the API.  Both lanes must be monitored to ensure that back-end programming lines up with front-end programming.  Otherwise, there could be situations where the front-end programmers have no API to consume and the software to be QA’d and deployed is not usable.  For Scrum, the back-end programming and front-end can be staggered in different sprints to ensure that the APIs are completed before the front-end programmers start programming.  This technique can also be used in Kanban by starting the back-end group ahead of the front-end group of programmers.
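
As a quick illustration of that API “shell” idea, the back-end developer might check in a stub controller like the one below (the route and data are placeholders of my own) so the front-end developer has something to consume while the real back-end story is still in progress:

using System.Collections.Generic;
using Microsoft.AspNetCore.Mvc;

// Temporary stub: returns canned data so the front-end work can start.
// The real data access is wired in when the back-end story is completed.
[Route("api/orders")]
public class OrdersStubController : Controller
{
    [HttpGet]
    public IEnumerable<string> Get()
    {
        return new[] { "dummy order 1", "dummy order 2" };
    }
}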

As you can tell there is usually no daily standup for Kanban.  There’s no need.  Each individual developer can meet with the person they need to in order to complete their work.  Meetings can still be held for kick-offs or for retrospectives.  I would recommend a retrospective for every project completed, Scrum or Kanban.

One last feature of Kanban that is more accommodating is the idea of throwing more resources at the problem.  When there is a team working on a problem, there are multiple communication paths to consider.  Each team member must be made aware of what everyone else is working on.  In Kanban, the idea is to design the stories to be independent of each other.  If a developer can focus on one story at a time, then adding a new developer to the mix is easy.  The new developer will just pick up a story and start working on it4.

Pitfalls to Watch For

Here’s a big pitfall with the agile method that must be headed off early on: in an agile shop the entire project is not estimated up front.  The customer wants to know what the whole product will cost.  In fact, most customers want a cafeteria-style estimate of what every feature will cost so they can pick and choose what they want.  What the customer does not want is to pay for a month’s worth of work and then wonder how many more months it will take to get to a viable product.  It would be nice to know ahead of time how long it will take and how much it will cost.  To accommodate this, agile shops must be able to give a rough estimate of the entire project without designing the entire project.  In fact, the product does not have to be designed down to the nth degree to get an estimate.  Also, an estimate on a product that is designed down to the tiniest feature is not more accurate than an overall rough estimate.  Confidence level is something that should always be taken into consideration in software design.  As projects get larger, the confidence level drops lower and lower.  The confidence level does not increase just because the product is designed down to the detail.  Don’t believe me?  Search for any large government software project that was cancelled or over budget and you’ll discover that these projects missed their marks by 100 to 200% or more.  They are always underestimated.  Those projects were designed down to the intimate detail.  The problem with software design is that there are always so many unknowns.

Create a rough design.  List the features and give a rough estimate for each feature.  Add some time to features that are really gray in definition.  Tighten your estimates for features that you know can be built in “x” amount of time.  This estimate can be used for a contract to “not exceed…” “x” amount of months or “x” amount of resources.  When the project is about to run up against the end, the customer must be made aware of the shortfall (if there is any).  Normally a shortfall will occur because a customer thinks up a feature that they need while the project is in progress.  This is additional work that can be inserted into the work-flow to preempt one of the lower-priority features, or the customer can agree to an extension of the project.  Sometimes a feature takes longer than the estimate, and the customer should be notified of each feature that went over budget.

Customers can also be A.D.D. when it comes to deciding which stories to stuff into the backlog.  The backlog queue can churn like a cauldron of stories, making it unknown when features will be delivered.  If the customer is OK with the unknown delivery time, then the churn does not affect the development staff.  However, if stories are pulled out of the work lanes, then problems can start.  Shelving unfinished code can be hazardous, especially if a story is shelved for a month and then put back in play.  By that time the un-shelved code may not work with the current code-base and must be reworked, causing the estimate for the story to go long.

Documentation

I would recommend a wiki for the project.  The wiki should contain the design specifications and changes as they are made.  If you are using a product such as Confluence and Jira, you can use the forum feature to add questions to a story and follow up answers.  This becomes your documentation for the software.  If you add developers, they can read through the notes on what is going on and get a good idea of why the software was built the way it was built.  This documentation should be maintained as long as the software is in production.  Future development teams could use this documentation to see what ideas went into the original design.  When an enhancement is added, the notes for the enhancement should be appended to this documentation for future developers to refer to.  This documentation can also provide witness testimony for any disputes that occur between the customer and the entity developing the software.

Notes

  1. The term “Magic Wand” refers to the question: what would the customer want if they had a “Magic Wand” and could have every feature right now, for free?
  2. Gantt charts and critical-path methodology are used in physical construction projects.  Many people try to visualize software development as a “construction” project, like building a house.  Unfortunately, the methodology does not fit software design, because every software project is like inventing something new, whereas building a house is so methodical that there are books full of estimates for each task to be performed.  Gantt charts are suited to home construction; assembly-line theory fits software development more accurately.
  3. A typical shop with a large number of developers will contain experts in database design, front-end advanced developers, entry-level front-end developers, back-end developers (which are usually API experts) and other specialized developers.  In such a situation scheduling can get a bit dicey, but the same work-arounds apply.
  4. In theory this technique should always work.  In the real world there are pieces of the puzzle that are dependent on other pieces that are already completed.  A new developer will need some ramp-up time to get into the flow of what is being built.  This can also slow down existing developers who must explain what is going on.

 

 

Automated Deployment with .Net Core 2.0 Unit Tests

If you’re using an automated deployment system or continuous integration, you’ll need to get good at compiling and running your unit tests from the command line.  One of the issues I found with .Net Core was the difficulty in making xUnit work with Jenkins.  Jenkins has plug-ins for different types of unit testing modules, and support for MSTest is easy to implement.  There is no plug-in that makes xUnit work in Jenkins for .Net Core 1.  There is a plug-in for NUnit that works with the xUnit output if you convert the XML tags to match what is expected by the plug-in.  That’s where this PowerShell script becomes necessary:

https://blog.dangl.me/archive/unit-testing-and-code-coverage-with-jenkins-and-net-core/

If you’re attempting to use .Net Core 1 projects, follow the instructions at the link to make it work properly.

For .Net Core 2.0, there is an easier solution.  There is a logger switch that allows you to output the correct xml formatted result file that can be used by the MSTest report runner in Jenkins.  You’ll need to be in the directory containing the project file for the unit tests you want to run, then execute the following:

dotnet test --logger "trx;LogFileName=mytests.trx"

Run this command for each unit test project you have in your solution and then use the MSTest runner:

This will pick up any trx files and display the familiar unit test line chart.

The dotnet test command will run xUnit as well as MSTest, so you can mix and match test projects in your solution.  Both will produce the same formatted XML output trx file for consumption by Jenkins.

One note about the PowerShell script provided by Georg Dangl:

There are environment variables in the script that are only created when it is executed from Jenkins.  So you can’t test this script from outside of the Jenkins environment (unless you fake out all the variables before executing the script).  I would recommend modifying the script to convert all the $ENV variables into parameters passed into the script.  From Jenkins, the variable names would be the same as they are in the script (like $ENV:WORKSPACE), but you can pass a workspace path into the script if you want to test it on your desktop.  Oftentimes I’ll test my scripts on my desktop/laptop first to make sure the script works correctly.  Then I might test it on the Jenkins server under my user account.  After that, I test it from the Jenkins job itself.  Otherwise, it could take a lot of man-hours to fix a PowerShell script by re-running a Jenkins job over and over just to test the script.

 

 

Deploying Software

Story

I’ve worked for a lot of different companies, most of them small.  Several of the companies that I have worked for have had some serious growth in their user base.  Every company I have worked for seems to follow the same path from start-up to mid-sized company.  Start-ups are usually staffed by amateur programmers who know how to write a small program and get it working.  Inevitably the software becomes so large that they are overwhelmed and have no clue how to solve their deployment problems.  Here are the problems that they run into:

  1. The customers become numerous and bugs are reported faster than they can fix them.
  2. Deployments become lengthy and difficult.  Usually causing outages after deployment nights.
  3. Regression testing becomes an overwhelming task.
  4. Deployments cause the system to overload.
  5. Keeping environments in-sync becomes overwhelming.

Solutions

This is where continuous integration techniques come into play.  The first problem can be tackled by making sure there is proper logging of system crashes.  If there is no log of what is going on in your production system, then you have a very big problem.

Problem number two is one that can be easy to solve if it is tackled early in the software development phase.  This problem can only be solved by ensuring everyone is on-board with the solution.  Many companies seem to double-down on manual deployments and do incredibly naive things like throwing more people at the problem.  The issue is not the labor, the issue is time.  As your software grows, it becomes more complex and takes more time to test new enhancements.  Performing a scheduled deployment at night is a bad customer experience.  The proper way to deploy production is to do it in the background.

One method of performing this task is to create new servers to deploy the software to and test the software before hooking the servers into your load-balancer.  The idea is to automate the web server creation process, install the new software on the new servers and then add them to the load-balancer with the new features turned off.  The new software needs to be set up to behave identically to the old software when the new features are not turned on.  Once the new servers are deployed, the old servers are removed from load-balancing one at a time until they have been replaced.  During this phase, the load on your servers needs to be monitored (including your database servers).  If something doesn’t look right, you have the option to stop the process and roll back.
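
Here is a minimal sketch of what “new features turned off” can look like in code, using a hypothetical IFeatureToggle abstraction (the names and rates are invented; a real system would read the toggle state from configuration or a database):

public interface IFeatureToggle
{
    bool IsEnabled(string featureName);
}

public class ShippingService
{
    private readonly IFeatureToggle _features;

    public ShippingService(IFeatureToggle features)
    {
        _features = features;
    }

    public decimal CalculateShipping(decimal orderTotal)
    {
        // The new behavior ships dark: it is deployed but dormant until the
        // toggle is switched on, so the new servers act exactly like the old ones.
        if (_features.IsEnabled("NewShippingRates"))
        {
            return orderTotal * 0.05m;   // hypothetical new rate
        }

        return orderTotal * 0.10m;       // existing behavior
    }
}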

Database changes can be the challenging part.  You’ll need to design your software to work properly with any old table, view, stored procedure designs as well as the new ones.  Once the feature has been rolled out and turned on, a future clean-up version can be rolled out (possibly with the next feature release) to remove the code that recognizes the old tables, views, stored procedures.  This can also be tested when new web servers are created and before they are added to the web farm.

Once everything has been tested and properly deployed, the announcement that a new feature will be released can be made, followed by the switch-on of the new feature.  Remember, everything should be tested and deployed by the time the new feature is switched on.  If you are running a large web farm with tens of thousands (or more) of customers, you may want to do a canary release.  A canary release can be treated like a beta release, but it doesn’t have to be.  You randomly choose 5% of your customers and switch on the feature early in the day that the feature is to be released.  Give it an hour to monitor and see what happens.  If everything looks good, add another 5% or 10% of your customers.  By the time you switch on 20% of your customers you should feel confident enough to up it to 50%, then follow that with 100%.  All customers can be switched on within a 4-hour period.  This allows enough time to monitor and give a go or no-go on proceeding.  If your bug tracking logs report an uptick in bugs when you switch on the first 5%, then turn it back off and analyze the issue.  Fix the problem and proceed again.
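
One simple way to pick a stable “first 5%” is to bucket customers by a hash of their id.  The sketch below is only an illustration of the idea (the hashing scheme and method names are my own, not from any particular system), and it also shows how the set of canary customers can be rotated per release:

using System;

public static class CanaryRollout
{
    // Returns true when the customer falls inside the enabled percentage (0-100).
    // Raising percentEnabled from 5 to 10, 20, 50 and finally 100 widens the
    // rollout without moving anyone who already has the feature.
    public static bool IsInRollout(int customerId, int percentEnabled, int releaseNumber)
    {
        // Mixing in the release number rotates which customers land in the
        // first 5% from one release to the next.
        uint hash = Hash(customerId, releaseNumber);
        int bucket = (int)(hash % 100);
        return bucket < percentEnabled;
    }

    private static uint Hash(int customerId, int releaseNumber)
    {
        unchecked
        {
            return (uint)(customerId * 397) ^ (uint)(releaseNumber * 31);
        }
    }
}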

I’ve heard the complaint that a canary release is like a beta program: the first 5% are beta testing your software.  My answer to that is: if you are releasing to 100% of your customers at the same time, doesn’t that mean that all your customers are beta testers?  Let’s face the facts, the choice is not between different versions of the software.  The choice is between how many people will experience the software you are releasing, 5% or 100%.  That’s why I advocate random customer selection.  The best scenario rotates the customers each release so that each customer will be in the first 5% only once in twenty releases.  That means that every customer shares the pain 1/20th of the time, instead of a 100% release where every customer feels the pain every time.

Regression Testing

Regression testing is something that needs to be considered early in your software design.  Current technology provides developers with the tools to build this right into the software.  Unit testing, which I am a big advocate of, is something that needs to be done for every feature released.  The unit tests must be designed with the software and you must have adequate code coverage.  When a bug is found and reported, a unit test must be created to simulate this bug and then the bug is fixed.  This gives you regression testing ability.  It also gives a developer instant feed-back.  The faster a bug is reported, the cheaper it is to fix.

I have worked in many environments where there is a team of QA (Quality Assurance) workers who manually find bugs and report them back to the developer assigned to the enhancement causing the bug.  The problem with this work flow is that the developer is usually on to the next task and is “in-the-zone” of the next difficult coding problem.  If that developer needs to switch gears, shelve their changes, fix a bug and deploy it back to the QA environment, it causes a slowdown in the work flow.  If the developer checks in their software and the build server catches a unit test bug and reports it immediately, then that developer will still have the task in mind and be able to fix it right there.  No task switching is necessary.  Technically many unit test bugs are found locally if the developer runs the unit tests before check-in or if the system has a gated check-in that prevents bad builds from being checked in (then they are forced to fix their error before they can continue).

Load Testing

When your software becomes large and the number of customers accessing your system is large, you’ll need to perform load testing.  Load testing can be expensive, so young companies are not going to perform this task.  My experience with load testing is that it is never performed until after a load-related software deployment disaster occurs.  Then load testing seems “cheap” compared to hordes of angry customers threatening lawsuits and cancellations.  To determine when your company should start load-testing, keep an eye on your web farm and database performances.  You’ll need to keep track of your base-line performances as well as the peaks.  Over time you’ll see your server CPU and memory usage go up.  Keep yourself a large buffer to protect from a bad database query.  Eventually your customer size will get to a point where you need to load test before deployments because unpredictable customer behavior will overwhelm your servers in an unexpected manner.  Your normal load will ride around 50% one day, and then, because of year-end reporting, you wake up and all your servers are maxed out.  If it’s a web server load problem, that is easy to fix: Add more servers to the farm (keep track of what your load-balancer can handle).  If it’s a database server problem, you’re in deep trouble.  Moving a large database is not an easy task.

For database operations, you’ll need to balance your databases between server instances.  You might also need to increase memory or CPUs per instance.  If you are maxed out on the number of CPUs or memory per instance, then you are left with only one choice: Moving databases.  I could write a book on this problem alone and I’m not a full-time database person.

Environments

One issue I see is that companies grow and they build environments by hand.  This is a bad thing to do.  There are a lot of tools available to replicate servers and stand up a system automatically.  What inevitably happens is that the development, QA, staging and production environments get out of sync.  Sometimes shortcuts are taken for development and QA environments, and that can cause software to perform differently than it does in production.  This guarantees that deployments will go poorly.  Configure environments automatically.  Refresh your environments at regular intervals.  Companies I have worked for don’t do this enough and it always causes deployment issues.  If you are able to build a web farm with the click of a button, then you can perform this task for any environment.  By guaranteeing each environment is identical to production (except on a smaller scale), you can find environment-specific bugs early in the development phase and ensure that your software will perform as expected when it is deployed to your production environment.

Databases need to be synchronized as well.  There are tools to sync the database structure.  This task needs to be automated as much as possible.  If your development database can be synced once a week, then you’ll be able to purge any bad data that has accumulated during the week.  Developers need to alter their work-flow to account for this process.  If there are database structure changes (tables, views, functions, stored procedures, etc.), then they need to be checked into version control just like code, and the automated process needs to pick up these changes and apply them after the base database is synced down.

Why spend the time to automate this process?  If your company doesn’t automate this step, you’ll end up with a database that has sat un-refreshed for years.  It might have the right changes, it might not.  The database instance becomes the wild west.  It will also become full of test data that causes your development processes to slow down.  Many developer hours will be wasted trying to “fix” an issue caused by a bad database change that was not properly rolled back.  Imagine a database where the constraints are out of sync.  Once the software is working on the development database, it will probably fail in QA.  At that point, it’s more wasted troubleshooting time.  What if your QA database is out of sync too?  Then your developers start fixing environment-related issues all the way up the line until the software is deployed and crashes on the production system.  Now the development process is expensive.

Other Sources You Should Read

Educate yourself on deployment techniques early in the software design phase.  Design your software to be easy and safe to deploy.  If you can head off the beast before it becomes a nightmare, you can save yourself a lot of time and money.  Amazon has designed their system around microservices.  Their philosophy is to keep each software package small.  This makes it quick and easy to deploy.  Amazon deploys continuously at a rate that averages more than one deployment per second (50 million per year):

http://www.allthingsdistributed.com/2014/11/apollo-amazon-deployment-engine.html

Facebook uses PHP, but they have designed and built a compiler to improve the efficiency of their software by a significant margin.  Then they deploy a 1.5 gigabyte package using BitTorrent.  Facebook does daily deployments using this technique:

https://arstechnica.com/information-technology/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/

I stumbled across this blogger who used to work for GitHub.  He has a lengthy but detailed blog post describing how to make deployments boring.  I would recommend all developers read this article and begin to understand the process of deploying software:

https://zachholman.com/posts/deploying-software

Finally…

Believe it or not, your deployment process is the largest factor determining your customer experience.  If your deployments require you to shut down your system in the wee hours of the morning to keep a system-wide outage from affecting customers, then you’ll find it difficult to fix bugs that might affect only a handful of customers.  If you can smoothly deploy a version of your software in the middle of the day, you can fix a minor bug and run the deployment process without your customers being affected at all.  Ultimately, there will be bugs.  How quickly you can fix the bugs and how smoothly you can get that fix deployed will determine the customer experience.

 

 

 

Creating POCOs in .Net Core 2.0

Summary

I’ve shown how to generate POCOs (Plain Old C# Objects) using the scaffold tool for .Net Core 1 in an earlier post.  Now I’m going to show how to do it in Visual Studio 2017 with Core 2.0.

Install NuGet Packages

First, you’ll need to install the right NuGet Packages.  I prefer to use the command line because I’ve been doing this so long that my fingers type the command without me thinking about it.  If you’re not comfortable with the command line NuGet window, you can use the NuGet Package Manager Settings window under the project you want to create your POCOs in.  If you want, you can copy the commands here and paste them into the NuGet Package Manager Console window.  Follow these instructions:

  1. Create a .Net Core 2.0 library project in Visual Studio 2017.
  2. Type or copy and paste the following NuGet commands into the Nuget Package Manager Console window:
install-package Microsoft.EntityFrameworkCore.SqlServer
install-package Microsoft.EntityFrameworkCore.Tools
install-package Microsoft.EntityFrameworkCore.Tools.DotNet

If you open up your NuGet Dependencies treeview, you should see the following:

Execute the Scaffold Command

In the same package manager console window use the following command to generate your POCOs:

Scaffold-DbContext "Data Source=YOURSQLINSTANCE;Initial Catalog=DATABASENAME;Integrated Security=True" Microsoft.EntityFrameworkCore.SqlServer -OutputDir POCODirectory

You’ll need to update the datasource and initial catalog to point to your database.  If the command executes without error, then you’ll see a directory named “POCODirectory” that contains cs files for each table in the database you just converted.  There will also be a context that contains all the model builder entity mappings.  You can use this file “as-is” or you can split the mappings into individual files.

My process consists of generating these files in a temporary project, followed by copying each table POCO that I want to use in my project.  Then I copy the model builder mappings for each table that I use in my project.
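
To give a feel for what the scaffold produces, here is roughly what a generated POCO and its slice of the model-builder mappings look like for a hypothetical Product table (your property names, lengths and column types will differ):

// POCODirectory/Product.cs (generated, hypothetical table)
public partial class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

// Fragment of the generated DbContext's OnModelCreating method
// (the context class itself has a using Microsoft.EntityFrameworkCore; at the top)
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Product>(entity =>
    {
        entity.ToTable("Product");

        entity.Property(e => e.Name).HasMaxLength(50);

        entity.Property(e => e.Price).HasColumnType("money");
    });
}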

What This Does not Cover

Any views, stored procedures or functions that you want to access with Entity Framework will not show up with this tool.  You’ll still need to create the result POCOs for views, stored procedures and functions by hand (or find a custom tool).  Using EF with stored procedures is generally not recommended, but anyone who has to deal with legacy code and a legacy database will run into situations where they need to interface with an existing stored procedure.