Deploying Software

Story

I’ve worked for a lot of different companies, most of them small.  Several of them have seen serious growth in their user base, and every company I have worked for seems to follow the same path from start-up to mid-sized company.  Start-ups are usually staffed by amateur programmers who know how to write a small program and get it working.  Inevitably the software becomes so large that they are overwhelmed and have no clue how to solve their deployment problems.  Here are the problems that they run into:

  1. Customers become numerous and bugs are reported faster than the team can fix them.
  2. Deployments become lengthy and difficult, usually causing outages after deployment nights.
  3. Regression testing becomes an overwhelming task.
  4. Deployments cause the system to overload.
  5. Keeping environments in-sync becomes overwhelming.

Solutions

This is where continuous integration techniques come into play.  The first problem can be tackled by making sure there is proper logging of system crashes.  If there is no log of what is going on in your production system, then you have a very big problem.

Problem number two can be easy to solve if it is tackled early in the software development phase.  This problem can only be solved by ensuring everyone is on-board with the solution.  Many companies seem to double-down on manual deployments and do incredibly naive things like throwing more people at the problem.  The issue is not the labor; the issue is time.  As your software grows, it becomes more complex and takes more time to test new enhancements.  Performing a scheduled deployment at night is a bad customer experience.  The proper way to deploy to production is to do it in the background.

One method of performing this task is to create new servers to deploy the software to and test the software before hooking the servers into your load-balancer.  The idea is to automate the web server creation process, install the new software on the new servers and then add them to the load-balancer with the new features turned off.  The new software needs to be set up to behave identically to the old software when the new features are not turned on.  Once the new servers are deployed, the old servers are removed from load-balancing one at a time until they have been replaced.  During this phase, the load on your servers needs to be monitored (including your database servers).  If something doesn’t look right, you have the option to stop the process and roll back.
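
Here’s a minimal PowerShell sketch of that rolling swap.  The Add-ServerToPool and Remove-ServerFromPool functions are hypothetical stand-ins (every load-balancer has its own API or module), and the server names are made up; the point is the order of operations.

# These two functions are hypothetical stand-ins; replace their bodies with whatever
# your load balancer's API or PowerShell module actually provides.
function Add-ServerToPool      { param($Server) Write-Host "adding $Server to the load balancer" }
function Remove-ServerFromPool { param($Server) Write-Host "removing $Server from the load balancer" }

$newServers = 'WEB03', 'WEB04'   # new servers with the new build installed (features off)
$oldServers = 'WEB01', 'WEB02'   # servers running the old build

# Add every new server first, so capacity never drops
foreach ($server in $newServers) { Add-ServerToPool -Server $server }

# Then drain the old servers one at a time, watching load at each step
foreach ($server in $oldServers)
{
    Remove-ServerFromPool -Server $server
    Start-Sleep -Seconds 300   # give traffic time to settle; check web and database load here
    # if monitoring looks wrong, stop here and add the old servers back (roll-back)
}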

Database changes can be the challenging part.  You’ll need to design your software to work properly with the old table, view, and stored procedure designs as well as the new ones.  Once the feature has been rolled out and turned on, a future clean-up version can be rolled out (possibly with the next feature release) to remove the code that recognizes the old tables, views, and stored procedures.  This can also be tested when new web servers are created and before they are added to the web farm.

Once everything has been tested and properly deployed, the announcement that a new feature will be released can be made, followed by the switch-on of the new feature.  Remember, everything should be tested and deployed by the time the new feature is switched on.  If you are running a large web farm with tens of thousands (or more) of customers, you may want to do a canary release.  A canary release can be treated like a beta release, but it doesn’t have to be.  You randomly choose 5% of your customers and switch the feature on early in the day that the feature is to be released.  Give it an hour of monitoring and see what happens.  If everything looks good, add another 5% or 10% of your customers.  By the time you switch on 20% of your customers you should feel confident enough to up it to 50%, then follow that by 100%.  All customers can be switched on within a 4 hour period.  This allows enough time to monitor and give a go or no-go on proceeding.  If your bug tracking logs are reporting an uptick in bugs when you switched on the first 5%, then turn it back off and analyze the issue.  Fix the problem and proceed again.
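
One common way to pick that 5% is to hash a stable customer identifier into a bucket from 0 to 99 and compare it against the current rollout percentage.  This is a minimal sketch, not any particular feature-flag product; the function and parameter names are my own.

function Test-InCanaryGroup
{
    param(
        [string]$CustomerId,     # a stable, unique customer identifier
        [int]$RolloutPercent     # current rollout percentage: 5, 10, 20, 50, 100
    )

    # Hash the customer id so the same customer always lands in the same bucket
    $md5    = [System.Security.Cryptography.MD5]::Create()
    $bytes  = [System.Text.Encoding]::UTF8.GetBytes($CustomerId)
    $hash   = $md5.ComputeHash($bytes)
    $bucket = [BitConverter]::ToUInt32($hash, 0) % 100

    return $bucket -lt $RolloutPercent
}

# Example: is customer 12345 in the first 5%?
Test-InCanaryGroup -CustomerId "12345" -RolloutPercent 5

Because the bucket is derived from the customer id, raising the percentage only ever adds customers; the original 5% stay switched on.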

I’ve heard the complaint that a canary release is like a beta program.  The first 5% are beta testing your software.  My answer to that is: if you are releasing to 100% of your customers at the same time, doesn’t that mean that all your customers are beta testers?  Let’s face the facts, the choice is not between different versions of the software.  The choice is between how many people will experience the software you are releasing, 5% or 100%.  That’s why I advocate random customer selection.  The best scenario rotates the customers each release so that each customer will be in the first 5% only one in twenty releases.  That means that every customer shares the pain 1/20th of the time instead of a 100% release where every customer feels the pain every time.

Regression Testing

Regression testing is something that needs to be considered early in your software design.  Current technology provides developers with the tools to build this right into the software.  Unit testing, which I am a big advocate of, is something that needs to be done for every feature released.  The unit tests must be designed with the software and you must have adequate code coverage.  When a bug is found and reported, a unit test must be created to simulate this bug and then the bug is fixed.  This gives you regression testing ability.  It also gives a developer instant feedback.  The faster a bug is reported, the cheaper it is to fix.
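
As an illustration, here is what such a bug-reproducing test could look like using Pester, PowerShell’s test framework.  The Get-InvoiceTotal function is a made-up stand-in for whatever code the bug was reported against; the shape of the test is the point.

# Requires the Pester module (Install-Module Pester)

# Stand-in for the code under test (your real function lives in your codebase)
function Get-InvoiceTotal
{
    param([object[]]$LineItems)
    ($LineItems | ForEach-Object { $_.Quantity * $_.Price } | Measure-Object -Sum).Sum
}

Describe "Invoice total calculation" {
    It "handles a line item with zero quantity (regression test for a reported bug)" {
        $lineItems = @(
            @{ Quantity = 2; Price = 10.0 },
            @{ Quantity = 0; Price = 99.0 }   # the input that originally triggered the bug
        )

        Get-InvoiceTotal -LineItems $lineItems | Should -Be 20.0
    }
}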

I have worked in many environments where there is a team of QA (Quality Assurance) workers who manually find bugs and report them back to the developer assigned to the enhancement causing the bug.  The problem with this workflow is that the developer is usually on to the next task and is “in-the-zone” of the next difficult coding problem.  If that developer needs to switch gears, shelve their changes, fix a bug and deploy it back to the QA environment, it causes a slowdown in the workflow.  If the developer checks in their software and the build server catches a unit test bug and reports it immediately, then that developer will still have the task in mind and be able to fix it right there.  No task switching is necessary.  Technically many unit test bugs are found locally if the developer runs the unit tests before check-in or if the system has a gated check-in that prevents bad builds from being checked in (then they are forced to fix their error before they can continue).

Load Testing

When your software becomes large and the number of customers accessing your system is large, you’ll need to perform load testing.  Load testing can be expensive, so young companies are not going to perform this task.  My experience with load testing is that it is never performed until after a load-related software deployment disaster occurs.  Then load testing seems “cheap” compared to hordes of angry customers threatening lawsuits and cancellations.  To determine when your company should start load testing, keep an eye on your web farm and database performance.  You’ll need to keep track of your baseline performance as well as the peaks.  Over time you’ll see your server CPU and memory usage go up.  Keep yourself a large buffer to protect from a bad database query.  Eventually your customer base will get to a point where you need to load test before deployments because unpredictable customer behavior will overwhelm your servers in an unexpected manner.  Your normal load will ride around 50% one day, and then, because of year-end reporting, you wake up and all your servers are maxed out.  If it’s a web server load problem, that is easy to fix: add more servers to the farm (keep track of what your load-balancer can handle).  If it’s a database server problem, you’re in deep trouble.  Moving a large database is not an easy task.
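
Capturing those baseline numbers is easy to automate.  Here’s a small sketch that samples CPU and available memory on a couple of web servers and appends the readings to a CSV file; the server names and log path are assumptions.

$servers = "WEB01", "WEB02"              # hypothetical web server names
$logFile = "C:\logs\baseline-load.csv"   # assumes this folder already exists

foreach ($server in $servers)
{
    # Sample total CPU usage and available memory on the remote server
    $cpu = (Get-Counter -ComputerName $server -Counter '\Processor(_Total)\% Processor Time').CounterSamples[0].CookedValue
    $mem = (Get-Counter -ComputerName $server -Counter '\Memory\Available MBytes').CounterSamples[0].CookedValue

    # Append a timestamped row so baselines and peaks can be compared over time
    "$((Get-Date).ToString('s')),$server,$([Math]::Round($cpu, 1)),$mem" | Add-Content -Path $logFile
}

Run something like this on a schedule (Task Scheduler is enough) and you have the history you need to spot the trend before a deployment pushes you over the edge.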

For database operations, you’ll need to balance your databases between server instances.  You might also need to increase memory or CPUs per instance.  If you are maxed out on the number of CPUs or memory per instance, then you are left with only one choice: Moving databases.  I could write a book on this problem alone and I’m not a full-time database person.

Environments

One issue I see is that companies grow and they build environments by hand.  This is a bad thing to do.  There are a lot of tools available to replicate servers and stand up a system automatically.  What inevitably happens is that the development, QA, staging and production environments get out of sync.  Sometimes shortcuts are taken for development and QA environments and that can cause software to perform differently than in production.  This guarantees that deployments will go poorly.  Configure environments automatically.  Refresh your environments at regular intervals.  Companies I have worked for don’t do this enough and it always causes deployment issues.  If you are able to build a web farm with the click of a button, then you can perform this task for any environment.  By guaranteeing each environment is identical to production (except on a smaller scale), you can find environment-specific bugs early in the development phase and ensure that your software will perform as expected when it is deployed to your production environment.

Databases need to be synchronized as well.  There are tools to sync the database structure.  This task needs to be automated as much as possible.  If your development database can be synced up once a week, then you’ll be able to purge any bad data that has occurred during the week.  Developers need to alter their workflow to account for this process.  If there are database structure changes (tables, views, functions, stored procedures, etc.) then they need to be checked into version control just like code, and the automated process needs to pick up these changes and apply them after the base database is synced down.
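
The “apply the checked-in changes after the base sync” step can be as simple as running every script in a folder, in order.  A rough sketch using the SqlServer module’s Invoke-Sqlcmd is below; the folder, server and database names are made up, and it assumes scripts are named so alphabetical order matches the intended execution order.

# Requires the SqlServer module (Install-Module SqlServer)
$scriptFolder = "C:\source\MyProject\DatabaseChanges"   # schema change scripts from version control
$server       = "DEVSQL01"                              # development database server
$database     = "MyAppDb"

Get-ChildItem -Path $scriptFolder -Filter *.sql | Sort-Object Name | ForEach-Object {
    Write-Host "Applying $($_.Name)"
    Invoke-Sqlcmd -ServerInstance $server -Database $database -InputFile $_.FullName
}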

Why spend the time to automate this process?  If your company doesn’t automate this step, you’ll end up with a database that has sat un-refreshed for years.  It might have the right changes, it might not.  The database instance becomes the wild west.  It will also become full of test data that causes your development processes to slow down.  Many developer hours will be wasted trying to “fix” an issue caused by a bad database change that was not properly rolled back.  Imagine a database where the constraints are out of sync.  Once the software is working on the development database, it will probably fail in QA.  At that point, it’s more wasted troubleshooting time.  What if your QA database is out of sync too?  Then your developers start fixing environment-related issues all the way up the line until the software is deployed and crashes on the production system.  Now the development process is expensive.

Other Sources You Should Read

Educate yourself on deployment techniques early in the software design phase.  Design your software to be easy and safe to deploy.  If you can head off the beast before it becomes a nightmare, you can save yourself a lot of time and money.  Amazon has designed their system around microservices.  Their philosophy is to keep each software package small.  This makes it quick and easy to deploy.  Amazon deploys continuously at a rate that averages more than one deployment per second (50 million per year):

http://www.allthingsdistributed.com/2014/11/apollo-amazon-deployment-engine.html

Facebook uses PHP, but they have designed and built a compiler to improve the efficiency of their software by a significant margin.  Then they deploy a 1.5 gigabyte package using BitTorrent.  Facebook does daily deployments using this technique:

https://arstechnica.com/information-technology/2012/04/exclusive-a-behind-the-scenes-look-at-facebook-release-engineering/

I stumbled across this blogger who used to work for GitHub.  He has a lengthy but detailed blog post describing how to make deployments boring.  I would recommend all developers read this article and begin to understand the process of deploying software:

https://zachholman.com/posts/deploying-software

Finally…

Believe it or not, your deployment process is the largest factor determining your customer experience.  If your deployments require you to shut down your system in the wee hours of the morning to avoid the system-wide outage affecting customers, then you’ll find it difficult to fix bugs that might affect only a handful of customers.  If you can smoothly deploy a version of your software in the middle of the day, you can fix a minor bug and run the deployment process without your customers being affected at all.  Ultimately, there will be bugs.  How quickly you can fix the bugs and how smoothly you get that fix deployed will determine the customer experience.


Using Scripts

Summary

In this post I’m going to show how you can improve developer productivity by steering developers to use scripts where it makes sense.

Setting up IIS

As a back-end developer, I spend a lot of time standing up and configuring new APIs.  One of the tools I use to reduce the amount of man-hours it takes me to get an API up and running is PowerShell.  Personally, the word “PowerShell” makes my skin crawl.  Why?  Because it’s a scripting language with a syntax that feels like something built by Dr. Frankenstein.  Since I haven’t memorized each and every syntax nuance of PowerShell, I use a lot of Google searches.  Fortunately, after several years of use, I’ve become familiar with some of the capabilities of PowerShell and I can save a lot of time when I create IIS sites.

Now you’re probably wondering where I save my time, since the script has to be written and the site only needs to be set up once.  The time saving comes when I have to change something minor or I have to establish the site on another environment.  In the case of another environment, I can change the path name or URL to match the destination environment and run my script to create all the pieces necessary to run my API.

Before I get into the script, I’m going to go through the steps to create an IIS site for WebApi for .Net Core 2.0.

Step 1: Setup the Application Pool.

  • Open IIS and navigate to the Application Pool node.
  • Right-click and add.
  • Give your app pool a name that matches your site, so you can identify it quickly.  This will save you troubleshooting time.
  • For .Net Core, you need to set the .Net Framework Version to “No Managed Code”

Step 2: Setup IIS site.

  • Right-click on the “Sites” node and “Add Web Site”
  • I usually name my site the same as the URL, or at least the sub-domain of the URL, so I can find it quickly.  Again, this name is not used by the system; it is only used when I have to troubleshoot, and saving time troubleshooting is the number one priority.
  • Set the path to point to the root of your publish directory (make sure you have done a local publish from Visual Studio before performing this step).
  • Type in the host name.  This is the URL of your site.  If you are just testing locally, you can make up a URL that you’ll need to add to the Hosts file.
  • Select the Application Pool that you created earlier.

Step 3: Optional, set up the Hosts file.  Use this step if you are setting up a local website for testing purposes only.

  • Navigate to C:\Windows\System32\drivers\etc
  • Edit “Hosts” file.  You might have to edit with Administrator rights.
  • Add your URL to the hosts file: “127.0.0.1       MyDotNetWebApi.com” (this step can also be scripted, as shown below).
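
For local setups, even this step is scriptable.  A one-liner like the following appends the entry from the example above:

# Must be run from an elevated (Administrator) PowerShell prompt because the hosts file is protected
Add-Content -Path "C:\Windows\System32\drivers\etc\hosts" -Value "127.0.0.1`tMyDotNetWebApi.com"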

Now try to visualize performing this process for each environment that your company uses.  For me, that comes out to be about half a dozen environments.  In addition to this, each developer who needs your API set up on their PC will need to configure this as well.  Here’s where the time-saving comes in.  Create the PowerShell script first, and test the script.  Never create the site by hand.  Then use the script for each environment.  Provide the script for other developers to set up their own local copy.  This can be accomplished by posting the script on a wiki page or checking the script into your version control system with the code.

Here’s what an example PowerShell script would look like:

# if you get an error when executing this script, comment the line below to exclude the WebAdministration module
Import-Module WebAdministration

#setup all IIS sites here
$iisAppList = 
    "MyDotNetWebApi,MyDotNetWebApi.franksurl.com,c:\myapicodedirectory,", # use "v4.0" for non-core apps
    "testsite2,testsite2.franksurl.com,c:\temp,v4.0"
    


# setup the app pools and main iis websites
foreach ($appItem in $iisAppList)
{
    $temp = $appItem.split(',')
    
    $iisAppName = $temp[0]
    $iisUrl = $temp[1]
    $iisDirectoryPath = $temp[2]
    $dotNetVersion = $temp[3]
    
    #navigate to the app pools root
    cd IIS:\AppPools\

    if (!(Test-Path $iisAppName -pathType container))
    {
        #create the app pool
        $appPool = New-Item $iisAppName
        $appPool | Set-ItemProperty -Name "managedRuntimeVersion" -Value $dotNetVersion
    }
    
    
    #navigate to the sites root
    cd IIS:\Sites\
    
    if (!(Test-Path $iisAppName -pathType container))
    {
        #create the site
        $iisApp = New-Item $iisAppName -bindings @{protocol="http";bindingInformation=":80:" + $iisUrl} -physicalPath $iisDirectoryPath
        $iisApp | Set-ItemProperty -Name "applicationPool" -Value $iisAppName
        
        Write-Host $iisAppName "completed."
    }
}

c:

You can change the sites listed at the top of the script.  The app pool is set up first, followed by the IIS web site.  Each section tests to see if the app pool or site is already set up (in which case it skips that step), so you can run the PowerShell script again without causing errors.  Keep the script in a safe location; then you can add to the list and re-run the PowerShell script.  If you need to recreate your environment, you can create all sites with one script.

If you delete all your IIS sites and app pools you might run into the following error:

New-Item : Index was outside the bounds of the array.

To fix this “issue” create a temporary web site in IIS (just use a dummy name like “test”).  Run the script, then delete the dummy site and its app pool.  The error is caused by a bug where IIS is trying to create a new site ID.

Setting a Directory to an Application

There are times when you need to convert a directory in your website into its own application.  To do this in IIS, you would perform the following steps:

  • Expand the website node
  • Right-click on the directory that will be converted and select “Convert to Application”
  • Click “OK”

To perform this operation automatically in a script, add the following code after creating your IIS sites above (just before the “c:” line of code):

$iisAppList = 
    "MyDotNetWebApi,MyAppDirectory,MyAppDirectory,MyDotNetWebApi\MyAppDirectory,"

foreach ($appItem in $iisAppList)
{
    $temp = $appItem.split(',')

    $iisSiteName = $temp[0]
    $iisAppName = $temp[1]
    $iisPoolName = $temp[2]
    $iisPath = $temp[3]
    $dotNetVersion = $temp[4]

    cd IIS:\AppPools\

    if (!(Test-Path $iisPoolName -pathType container))
    {
        #create the app pool
        $appPool = New-Item $iisPoolName
        $appPool | Set-ItemProperty -Name "managedRuntimeVersion" -Value $dotNetVersion
    }

    cd IIS:\Sites\
    
    # remove and re-apply any IIS applications
    if (Get-WebApplication -Site $iisSiteName -Name $iisAppName)
    {
        Remove-WebApplication -Site $iisSiteName -Name $iisAppName
    }

    ConvertTo-WebApplication -PSPath $iisPath -ApplicationPool $iisPoolName
}

Now add any applications to the list.  The first parameter is the name of the IIS site.  The second parameter is the application name.  The third parameter is the pool name (this script will create a new pool for the application).  The fourth parameter is the path to the folder.  The last parameter is the .Net version (use v4.0 if this application is not a .Net Core project).

For the above script to run, you’ll need to create a blank directory called: C:\myapicodedirectory\MyAppDirectory

Now execute the script and notice that MyAppDirectory has been turned into an application.

You can add as many applications to each IIS website as you need by adding to the list.

The code above creates an application pool first (if it doesn’t already exist), then removes the existing application from the site, and finally converts the directory into an application for the specified site.  This script can also be executed multiple times without causing duplicates or errors.

If you run into problems executing your script, you might have to run it as an Administrator.  I usually start PowerShell in Administrator mode, then navigate to the directory containing the script and execute it there.  This allows me to see any errors in the console window.  If you right-click on the ps1 file and run it with PowerShell, your script could fail and exit before you can read the error message.

Feel free to copy the scripts from above and build your own automated installation scripts.


Environment Configuration

Introduction

We’ve all modified and saved variables in the web.config and app.config files of our applications.  I’m going to discuss issues that occur when a company maintains multiple environments.

Environments

When I talk about environments, I’m talking about entire systems that mimic the production system of a company.  An example is a development environment, a quality environment and sometimes a staging environment.  There may be other environments as well for purposes such as load testing and regression testing.  Typically a development or QA environment is smaller than the production environment, but all the web, database, etc. servers should be represented.

The purpose of each environment is to provide the ability to test your software during different stages of development.  For this reason it’s best to think of software development like a factory with a pipeline of steps that must be performed before delivering the final product.  The software developer might create software on his/her local machine and then check in changes to a version control system (like Git or TFS).  Then the software is installed on the development environment using a continuous integration/deployment system like TeamCity, BuildMaster or Jenkins.  Once the software is installed on the development system, then developer-level integration testing can begin.  Does it work with other parts of the software?  Does it work with real servers (like IIS)?

Once a feature is complete and it works in the development environment, it can be scheduled to be quality checked or QA’d.  The feature branch can be merged and deployed to the QA environment, tested by QA personnel, and run through regression scripts.  If the software is QA complete, then it can be merged up to the next level.  Load testing is common in larger environments.  A determination must be made if the new feature causes a higher load on the system than the previous version.  This may be accomplished by keeping an environment that contains the previous version of the code.  Baseline load numbers may also be maintained to be used as a comparison for future load testing.  It’s always best to keep a previous system because hardware can be upgraded and that will change the baseline numbers.

Once everything checks out, then your software is ready for deployment.  I’m not going to go into deployment techniques in this blog post (there are so many possibilities).  I’ll leave that for another post.  For this post, I’m going to dig into configuration of environments.

Setting up the physical or virtual hardware should be done in a fashion that mimics the production system as closely as possible.  If the development, QA and production systems are all different, then it’ll be difficult to get software working for each environment.  This is a waste of resources and needs to be avoided at all costs.  Most systems today use virtual servers on a host machine, making it easy to set up an identical environment.  The goal in a virtual environment is to automate the setup and tear-down of servers so environments can be created fresh when needed.

The Web.Config and App.Config

One issue with multiple environments is the configuration parameters in the web.config and app.config files (and now the appsettings.json for .NET Core apps).  There are other config files that might exist, like the nlog.config for NLog setup, but they all fall into the same category: they are specific to each environment.

There are many solutions available.  BuildMaster provides variable injection: basically, a template web.config is set up in BuildMaster and variables contain the data to be inserted for each deployment type.  There is a capability in Visual Studio called web.config transformation, where several web.config files can be set up in addition to a common web.config and merged when different configurations are built.  PowerShell can be used to replace text in a web.config: either one PowerShell script per environment, or a web.config with sections commented out where PowerShell removes the comment lines for the section that applies to the environment being deployed to.

These are all creative ways of dealing with this problem, but they all lack a level of security that is needed if your company has isolation between your developers and the production environment.  At some point your production system becomes so large that you’ll need an IT department to maintain the live system.  That’s the point where you need to keep developers from tweaking production code whenever they feel the need.  Some restrictions must come into play.  One restriction is the passwords used by the databases.  Production databases should be accessible for only those in charge of the production system.

What this means is that the config parameters cannot be checked into your version control system.  They’ll need to be applied when the software is deployed.  One method is to put placeholder parameters in the config for the values that will be applied at the end of deployment.  Then a list of variables and their respective values can exist on a server in each environment.  That list is specific to the environment and can be used by PowerShell on the deployment server to replace the variables in the web.config file.
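
Here is a rough sketch of that replacement step.  The placeholder token names, file path and values are all made up; the environment-specific list would live on the deployment server, out of developers’ reach.

$configPath = "D:\sites\MyDotNetWebApi\web.config"    # deployed copy of the config

# Environment-specific values; in practice this table lives on the deployment server
$variables = @{
    "__DB_CONNECTION__" = "Server=PRODSQL01;Database=MyAppDb;Integrated Security=True"
    "__LOG_LEVEL__"     = "Warn"
}

# Replace each placeholder token with its value and save the file back
$content = Get-Content -Path $configPath -Raw
foreach ($key in $variables.Keys)
{
    $content = $content.Replace($key, $variables[$key])
}
Set-Content -Path $configPath -Value $content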

There is a system called ZooKeeper that can hold all of your configuration parameters and be accessed centrally.  The downside is that you’ll need a cluster of servers to provide the throughput for a large system, plus another potential central point of failure.  The complexity of your environment just increased for the sole purpose of keeping track of configuration parameters.

Local Environments

Local environments are a special case problem.  Each software developer should have their own local environment to use for their development efforts.  By creating a miniature environment on the software developer’s desktop/laptop system, the developer has full flexibility to test code without the worry of destroying data that is used by other developers.  An automated method of refreshing the local environment is necessary.  Next comes the issue of configuring the local environment.  How do you handle local configuration files?

Solution 1: Manually alter local config files.

The developer needs to exclude the config files from check-in; otherwise one developer’s local changes could end up in the version control software.

Solution 2: Manually alter local config files and exclude from check-in by adding to ignore settings.

If there are any automatic updates to the config by Visual Studio, those changes will not be checked into your version control software.

Solution 3: Create config files with replaceable parameters and include a script to replace them as a post build operation.

Same issue as solution 1: the files could get checked in to version control, and that would wipe out the changes.

Solution 4: Move all config settings to a different config file.  Only maintain automatic settings in web.config and app.config.

This is a cleaner solution because the local config file can be excluded from version control.  A location (like a wiki page or the version control) must contain a copy of the local config file with the parameters to be replaced.  Then a script must be run the first time to populate the parameters or the developer must manually replace the parameters to match their system.  The web.config and app.config would only contain automatic parameters.

One issue with this solution is that it would be difficult to convert legacy code to use this method.  A search and replace for each parameter must be performed, or you can override the ConfigurationManager object and implement the AppSettings method to store the values in the custom config file (ditto for the database connection settings).

Why Use Config Files at all?

One of the questions I see a lot is the question about using the config file in the first place.  Variables used by your software can be stored in a database.  If your configuration data is specific to the application, then a table can be set up in your database to store application config data.  This can be cached and read all at once by your application.  There is only one catch: what about your database connection strings?  At least one connection string will be needed, and probably a connection string to your caching system (if you’re using Redis or Memcached or something).  That connection string will be used to read the application config variables, like other database connection strings, etc.  Keep this method in mind if you hope to keep your configuration rats’ nest manageable.  Each environment would read its config variables from the database that belongs to it.

The issues listed in the local environment are still valid.  In other words, each environment would need its own connection, and the production environment (and possibly other environments) connection would need to be kept secret from developers.

Custom Solutions

There are other solutions to the config replacement problem.  The web.config can be deserialized by a custom program and then each config parameter value can be replaced by a list of replacement values using the key (for appSettings as an example).  Then you can maintain a localized config settings file for each environment.  This technique has the added bonus of not breaking if someone checks in their local copy into the version control software.

Here is a code snippet to parse the appSettings section of the web.config file:

// read the web.config file (requires "using System.Xml;"; the fileName variable
// holds the full path to the web.config being updated)
var document = new XmlDocument();
document.Load(fileName);

var result = document.SelectNodes("//configuration/appSettings");

// note: you might also need to check for "//configuration/location/appSettings"

foreach (XmlNode childNodes in result)
{
    foreach (XmlNode keyItem in childNodes)
    {
        if (keyItem.Attributes != null)
        {
            if (keyItem.Attributes["key"] != null && keyItem.Attributes["value"] != null)
            {
                var key = keyItem.Attributes["key"].Value;

                // replace your value here (LookupValue is your own function that maps
                // a key to the environment-specific value for that key)
                keyItem.Attributes["value"].Value = LookupValue(key);
            }
        }
    }
}

// save back the web.config file
document.Save(fileName);

The same technique can be used for .json files: read, deserialize, alter, then save.
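
For .NET Core’s appsettings.json the built-in JSON cmdlets are enough.  A minimal sketch, assuming the file already contains a ConnectionStrings section (the path and values are made up):

$path     = "D:\sites\MyDotNetWebApi\appsettings.json"
$settings = Get-Content -Path $path -Raw | ConvertFrom-Json

# Overwrite the value for the target environment (the property must already exist in the file)
$settings.ConnectionStrings.DefaultConnection = "Server=QASQL01;Database=MyAppDb;Integrated Security=True"

$settings | ConvertTo-Json -Depth 10 | Set-Content -Path $path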

Conclusion

Most aspects of system deployment have industry-wide standards.  Managing configuration information isn’t one of them.  I am not sure why a standard has not been established.  Microsoft provides a method of setting up multiple environment configuration files, but it does not solve the issue of securing a production environment from the developers.  It also does not work when the developer executes their program directly from Visual Studio.  The transform operation only occurs when a project is published.

BuildMaster has a good technique.  That product can be configured to deploy different config files for each environment.  This product does not deploy to the developer’s local environment, so that is still an issue.  The final issue with this method is that automatically added parameters will not show up and must be added by hand.

Jenkins doesn’t seem to have any capability to handle config files (if you know of a plug-in for Jenkins, leave a comment, I’d like to try it out).  Jenkins leaves the dev-ops person with the task of setting up scripts to modify the config parameters during deployment/promotion operations.

If you have other ideas that work better than these, please leave a comment or drop me an email.  I’d like to hear about your experience.


Continuous Integration – Baby Steps

Introduction

In this blog post I’m going to skim the surface of a very large subject: Continuous Integration, or CI.

Where to Start

CI is a process, but it’s not an all-or-nothing proposition.  There are levels of CI that can be achieved.  As the title of this blog post suggests, you can start small and build on your process.  The most difficult aspect of getting to a CI environment is the natural resistance of people who have been running your company for years.  This occurs because companies always start out small, and software is easy when it’s small.  It’s more forgiving.  Manual deployment is not painful.  Unfortunately, by the time an organization discovers that it needs to do something, the manual deployment process is at a disastrous level.

To get the process rolling, identify what can be done quickly.  Each manual process that your organization is performing that can be easily and cheaply automated will save time in the future.  As you implement more and more automation, you will begin to see results as operations become smoother.

I’m going to identify some low-hanging fruit that can be done in any company creating software and deploying it on a regular basis.  First, developers should always use version control.  There are many products that are available, including free versions that are very good (Bitbucket is one example that allows free private repositories).  By using either GitHub or Bitbucket you will get historical records on changes in your software and you’ll get off-site backup protection.  Disaster recovery just got a bit easier.

Once your software is regularly checked in by developers, then it’s time to try out some build systems.

The Build Server

Your first level of CI is to have a system that builds your software whenever a change is checked in or merged, then notifies all developers if a build is broken.  At this point, process and rules must be put in place to ensure that the build gets fixed right away.  The longer a build goes without being fixed, the more difficult it becomes to find the problem.  Some companies force developers to stay late to fix their build (since it can affect other developers); other companies have rules that allow the build to be broken for a maximum tolerable time.  This all depends on the company and the number of developers involved.

Once a build system is in place, your software should use unit tests to ensure a change in the software does not break a previously established feature.  The build server must run these unit tests every time the build is completed and the build should be rejected if the unit tests don’t pass.  This is also something that must be fixed by developers right away.

Many version control packages allow a pre-build check to be used.  As you assemble your CI environment, somewhere down the road you’ll want to incorporate some sort of gated check-in system that doesn’t allow the software to be checked in unless it builds (and possibly passes the unit tests).  This will prevent change sets that are broken from being checked into your version control system.

Your next phase, and probably a more difficult phase, is to automate your deployment.  I’m assuming that you can acquire a development server or environment that mimics your production environment.  Once you scale up, you’ll need to add a quality environment for testing (QA) and some sort of staging environment.  Before you set up too many environments, you should get an automated deployment in place.  Jenkins is a good starting point, though you can get by with just PowerShell or batch files.  Initially you can automate the process of preparing your deployment package and then manually switch out existing directories with the automatically prepared directories.  Once you are comfortable with that process you can automate the process of backing up the current environment and deploying the new one.  Keep in mind that you should have a roll-back mechanism that works quickly.  For a web server the process would look something like this (a minimal PowerShell sketch of steps 3 through 6 follows the list):

  1. Create the directory with all the new files from your build server.
  2. Copy config files from the existing production environment.
  3. Stop the web server (in a web farm, perform this operation for one server at a time).
  4. Rename the existing website directory (give it a date so you can keep multiple backups if needed).
  5. Rename the new directory to the name that your web server expects.
  6. Start your web server.
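
Here is what steps 3 through 6 might look like for a single server using the WebAdministration module.  The site name and folder paths are assumptions; steps 1 and 2 (building the new directory and copying the config files) are assumed to have already happened.

Import-Module WebAdministration

$siteName   = "MyDotNetWebApi"
$liveFolder = "D:\sites\MyDotNetWebApi"               # folder the IIS site points at
$newFolder  = "D:\deploy\MyDotNetWebApi_new"          # new build output with config copied in
$backup     = "${liveFolder}_backup_$(Get-Date -Format 'yyyyMMdd_HHmmss')"

Stop-Website -Name $siteName                          # step 3

Move-Item -Path $liveFolder -Destination $backup      # step 4: keep the old version for roll-back
Move-Item -Path $newFolder  -Destination $liveFolder  # step 5: put the new version in place

Start-Website -Name $siteName                         # step 6

Roll-back is the same swap in reverse: stop the site, move the backup folder back into place, and start the site again.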

For web farms, you can deploy your software in the middle of the day by adding steps to your process to take one server out of the farm (mark it unhealthy or disable it in your load balancer).  Wait for the traffic to bleed off, then shut the server down.  Continue with the directory switch.  Then start the server back up and put it back in the farm.  Then perform the same operation on the next web server, continuing through all servers in the farm.

To enhance this operation you can put a stop after the first web server and allow testing to be performed before authorizing the continuation of the deployment to other web servers in the farm.

Next you’ll need to make sure you have a roll-back plan.  Roll-back can be accomplished by stopping your web server, putting the old directory back and starting your web server.  This process should be tested on a test or development system first.  It needs to be perfected because you’ll need it when something bad happens.

Logging

If your software doesn’t log exceptions, then you’ll need to add some sort of catch-all logging (like ELMAH).  You should at least log errors and send them to a text file or to an email address.  Be aware that if you are adding logging to an old legacy system that has had no logging in the past, your email system must either be robust enough to handle the load or able to dump older emails if the in-box is too full.  Otherwise, you’ll find yourself with buggy software and an email system that is down.  For text files, you’ll need to make sure that logging is set up to roll over (create a new file) when the text file gets too big.  Find out what the maximum comfortable file size is for the editor of your choice.  For products such as Notepad++, you’ll want to keep the files under 50 MB in size.  Sublime can handle larger files, but the file will load slowly as it approaches 500 MB in size.  You’ll also want to limit the number of roll-over files to prevent your hard drive from filling up and crashing your system.
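
Capping the number of roll-over files is another easy thing to script.  A sketch that keeps only the newest files in a log folder (the path and retention count are assumptions) can run on a schedule:

$logFolder = "D:\logs\MyDotNetWebApi"   # wherever your roll-over log files land
$keep      = 30                         # number of newest files to keep

Get-ChildItem -Path $logFolder -Filter *.log |
    Sort-Object LastWriteTime -Descending |
    Select-Object -Skip $keep |
    Remove-Item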

Once you have established a logging system, analyze what errors are being logged.  Focus on the largest quantity of one type of error and fix the problem in your software.  Eventually, you’ll get into obscure errors that occur in situations where a web bot hits your website with incorrect parameters (or something of that nature).

Next, you’ll want to log events that occur on your APIs.  Logging such events can reveal aspects of your software that you never knew existed.  Scenarios where an object is null after a call to an API can occur if the parameters are unexpected.  These bugs can be fixed to prevent 500 errors from tying up your resources.  It can also help prevent memory leaks and stuck web server processes.

More Automation

The last aspect of CI that you should focus on involves tests such as load testing.  Load tests can be performed during off-hours or on a system that is isolated from your production system.  Code coverage can be performed to determine if your unit tests are adequate. Be aware that code coverage can be misleading due to the fact that there is a difference between a large quantity of poorly designed unit tests and a small quantity of well designed unit tests.

Integration testing can also be automated.  This is a complex subject in its own right, but there are ways to script a process to perform gets and posts on APIs and web sites to probe for 500 errors.  Any manual test that is performed more than once should become a candidate for automation.  Manual regression tests are time consuming (i.e. expensive) and prone to errors.  Your test suite should consist of a collection of small individual test packages that can be run separately or in parallel.  Eventually, you’ll get to a point where you can perform a full regression test at night and reduce the amount of manual testing to new systems or tricky sections of code.  This type of testing can be brittle if not done properly, so be aware that you will need to identify what can be automated and what will need to be done manually.
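
The simplest version of that probing is a script that hits a list of GET endpoints and reports anything that doesn’t come back with a 200.  The URLs below are placeholders:

$endpoints = @(
    "https://mydotnetwebapi.franksurl.com/api/customers",
    "https://mydotnetwebapi.franksurl.com/api/orders?customerId=1"
)

foreach ($url in $endpoints)
{
    try
    {
        $response = Invoke-WebRequest -Uri $url -UseBasicParsing -TimeoutSec 30
        Write-Host "$($response.StatusCode) $url"
    }
    catch
    {
        # 4xx/5xx responses and connection failures both land here
        Write-Host "FAILED $url : $($_.Exception.Message)"
    }
}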

Repeat Often

One aspect of CI that is important is the theory of deploying small chunks often instead of large chunks of code rarely.  When you deploy software often (as in several times a week or even several times a day), you’ll gain confidence in your ability to deploy quality code.  The deployment process becomes automatic and your automated processes will become hardened and reliable.  Feedback from your logging should indicate if something went wrong (in which case you can roll back) or if your improvements have made an impact.  Feedback from your customers can also be addressed quickly, since your turn-around time consists of analysis and programming followed by your automated testing and deployment process.  If your testing and deployment process takes a month, then developers will have forgotten much of what they programmed by the time it’s deployed.

Other Candidates for Automation

Resetting the passwords of the databases used by your website should be easy to do.  Sometimes development continues at a fevered pace only to discover that two dozen (or more) databases are accessed from connection strings in various web.config files scattered across different servers.  You should at least keep a list of where the passwords are stored so you can change them all quickly.
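
Even that list can be generated.  A quick scan like the one below (the web root path is an assumption) finds every web.config under a folder that mentions a connection string and writes the locations to a CSV:

Get-ChildItem -Path "D:\sites" -Filter web.config -Recurse |
    Select-String -Pattern "connectionString" |
    Select-Object Path, LineNumber, Line |
    Export-Csv -Path "C:\temp\connection-string-locations.csv" -NoTypeInformation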

Third party connections can also fall into this category.  If you connect to an outside service, you should keep track of where that information is stored and how to change it.  It only takes one rogue programmer to make life miserable for a programming shop that has hundreds of config files scattered everywhere.  If you need to keep programmers out of your production system then you’ll need a method of changing passwords often (like once a month).

Finally

Be sure to click the “like” button if this information was helpful!