XML Serialization

Summary

In this post I’m going to demonstrate the proper way to serialize XML and setup unit tests using xUnit and .Net Core.  I will also be using Visual Studio 2017.

Generating XML

JSON is rapidly taking over as the data encoding standard of choice.  Unfortunately, government agencies are decades behind the technology curve and XML is going to be around for a long time to come.  One of the largest industries industries still using XML for a majority of their data transfer encoding is the medical industry.  Documents required by meaningful use are mostly encoded in XML.  I’m not going to jump into the gory details of generating a CCD.  Instead, I’m going to keep this really simple.

First, I’m going to show a method of generating XML that I’ve seen many times.  Usually coded by a programmer with little or no formal education in Computer Science.  Sometimes programmers just take a short-cut because it appears to be the simplest way to get the product out the door.  So I’ll show the technique and then I’ll explain why it turns out that this is a very poor way of designing an XML generator.

Let’s say for instance we wanted to generate XML representing a house.  First we’ll define the house as a record that can contain square footage.  That will be the only data point assigned to the house record (I mentioned this was going to be simple right).  Inside of the house record will be lists of walls and lists of roofs (assume a house could have two or more roofs like a tri-level configuration).  Next, I’m going to make a list of windows for the walls.  The window block will have a “Type” that is a free-form string input and the roof block will also have a “Type” that is a free-form string.  That is the whole definition.

public class House
{
  public List Walls = new List();
  public List Roofs = new List();
  public int Size { get; set; }
}

public class Wall
{
  public List Windows { get; set; }
}

public class Window
{
  public string Type { get; set; }
}

public class Roof
{
  public string Type { get; set; }
}

The “easy” way to create XML from this is to use the StringBuilder and just build XML tags around the data in your structure.  Here’s a sample of the possible code that a programmer might use:

public class House
{
  public List<Wall> Walls = new List<Wall>();
  public List<Roof> Roofs = new List<Roof>();
  public int Size { get; set; }

  public string Serialize()
  {
    var @out = new StringBuilder();

    @out.Append("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
    @out.Append("<House xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">");

    foreach (var wall in Walls)
    {
      wall.Serialize(ref @out);
    }

    foreach (var roof in Roofs)
    {
      roof.Serialize(ref @out);
    }

    @out.Append("<size>");
    @out.Append(Size);
    @out.Append("</size>");

    @out.Append("</House>");

    return @out.ToString();
  }
}

public class Wall
{
  public List<Window> Windows { get; set; }

  public void Serialize(ref StringBuilder @out)
  {
    if (Windows == null || Windows.Count == 0)
    {
      @out.Append("<wall />");
      return;
    }

    @out.Append("<wall>");
    foreach (var window in Windows)
    {
      window.Serialize(ref @out);
    }
    @out.Append("</wall>");
  }
}

public class Window
{
  public string Type { get; set; }

  public void Serialize(ref StringBuilder @out)
  {
    @out.Append("<window>");
    @out.Append("<Type>");
    @out.Append(Type);
    @out.Append("</Type>");
    @out.Append("</window>");
  }
}

public class Roof
{
  public string Type { get; set; }

  public void Serialize(ref StringBuilder @out)
  {
    @out.Append("<roof>");
    @out.Append("<Type>");
    @out.Append(Type);
    @out.Append("</Type>");
    @out.Append("</roof>");
  }
}

The example I’ve given is a rather clean example.  I have seen XML generated with much uglier code.  This is the manual method of serializing XML.  One almost obvious weakness is that the output produced is a straight line of XML, which is not human-readable.  In order to allow human readable XML output to be produced with an on/off switch, extra logic will need to be incorporated that would append the newline and add tabs for indents.  Another problem with this method is that it contains a lot of code that is unnecessary.  One typo and the XML is incorrect.  Future editing is hazardous because tags might not match up if code is inserted in the middle and care is not taken to test such conditions.  Unit testing something like this is an absolute must.

The easy method is to use the XML serializer.  To produce the correct output, it is sometimes necessary to add attributes to properties in objects to be serialized.  Here is the object definition that produces the same output:

public class House
{
  [XmlElement(ElementName = "wall")]
  public List Walls = new List();

  [XmlElement(ElementName = "roof")]
  public List Roofs = new List();

  [XmlElement(ElementName = "size")]
  public int Size { get; set; }
}

public class Wall
{
  [XmlElement(ElementName = "window")]
  public List Windows { get; set; }

  public bool ShouldSerializenullable()
  {
    return Windows == null;
  }
}

public class Window
{
  public string Type { get; set; }
}

public class Roof
{
  public string Type { get; set; }
}

In order to serialize the above objects into XML, you use the XMLSerializer object:

public static class CreateXMLData
{
  public static string Serialize(this House house)
  {
    var xmlSerializer = new XmlSerializer(typeof(House));

    var settings = new XmlWriterSettings
    {
      NewLineHandling = NewLineHandling.Entitize,
      IndentChars = "\t",
      Indent = true
    };

    using (var stringWriter = new Utf8StringWriter())
    {
      var writer = XmlWriter.Create(stringWriter, settings);
      xmlSerializer.Serialize(writer, house);

      return stringWriter.GetStringBuilder().ToString();
    }
  }
}

You’ll also need to create a Utf8StringWriter Class:

public class Utf8StringWriter : StringWriter
{
  public override Encoding Encoding
  {
    get { return Encoding.UTF8; }
  }
}

Unit Testing

I would recommend unit testing each section of your XML.  Test with sections empty as well as containing one or more items.  You want to make sure you capture instances of null lists or empty items that should not generate XML output.  If there are any special attributes, make sure that the XML generated matches the specification.  For my unit testing, I stripped newlines and tabs to compare with a sample XML file that is stored in my unit test project.  As a first-attempt, I created a helper for my unit tests:

public static class XmlResultCompare
{
  public static string ReadExpectedXml(string expectedDataFile)
  {
    var assembly = Assembly.GetExecutingAssembly();
    using (var stream = assembly.GetManifestResourceStream(expectedDataFile))
    {
      using (var reader = new StreamReader(stream))
      {
        return reader.ReadToEnd().RemoveWhiteSpace();
      }
    }
  }

  public static string RemoveWhiteSpace(this string s)
  {
    s = s.Replace("\t", "");
    s = s.Replace("\r", "");
    s = s.Replace("\n", "");
  return s;
  }
}

If you look carefully, I ‘m compiling my xml test data right into the unit test dll.  Why am I doing that?  The company that I work for as well as most serious companies use continuous integration tools such as a build server.  The problem with a build server is that your files might not make it to the same directory location on the build server that they are on your PC.  To ensure that the test files are there, compile them into the dll and reference them from the namespace using Assembly.GetExecutingAssembly().  To make this work, you’ll have to mark your xml test files as an Embedded Resource (click on the xml file and change the Build Action property to Embedded Resource).  To access the files, which are contained in a virtual directory called “TestData”, you’ll need to use the name space, the virtual directory and the full file name:

XMLCreatorTests.TestData.XMLHouseOneWallOneWindow.xml

Now for a sample unit test:

[Fact]
public void TestOneWallNoWindow()
{
  // one wall, no windows
  var house = new House { Size = 2000 };
  house.Walls.Add(new Wall());

  Assert.Equal(XmlResultCompare.ReadExpectedXml("XMLCreatorTests.TestData.XMLHouseOneWallNoWindow.xml"), house.Serialize().RemoveWhiteSpace());
}

Notice how I filled in the house object with the size and added one wall.  The ReadExpectedXml() method will remove whitespaces automatically, so it’s important to remove them off the serialized version of house in order to match.

Where to Get the Code

As always you can go to my GitHub account and download the sample application (click here).  I would recommend downloading the application and modifying it as a test to see how all the piece work.  Add a unit test to see if you can match your expected xml with the xml serializer.

 

 

 

Environment Configuration

Introduction

We’ve all modified and saved variables in the web.config and app.config files of our applications.  I’m going to discuss issues that occur when a company maintains multiple environments.

Environments

When I talk about environments, I’m talking about entire systems that mimic the production system of a company.  An example is a development environment, a quality environment and sometimes a staging environment.  There may be other environments as well for purposes such as load testing and regression testing.  Typically a development or QA environment is smaller than the production environment, but all the web, database, etc. servers should be represented.

The purpose of each environment is to provide the ability to test your software during different stages of development.  For this reason it’s best to think of software development like a factory with a pipe-line of steps that must be performed before delivering the final product.  The software developer might create software on his/her local machine and then check in changes to a version control system (like Git or TFS).  Then the software is installed on the development environment using a continuous integration/deployment system like TeamCity, BuildMaster or Jenkins.  Once the software is installed on the development system, then developer-level integration testing can begin.  Does it work with other parts of the software?  Does it work with real servers (like IIS)?

Once a feature is complete and it works in the development environment, it can be scheduled to be quality checked or QA’d.  The feature branch can be merged and deployed to the QA environment and tested by QA personnel and regression scripts can be run.  If the software is QA complete, then it can be merged up to the next level.  Load testing is common in larger environments.  A determination must be made if the new feature causes a higher load on the system than the previous version.  This may be accomplished through the use of an environment to contain the previous version of code.  Baseline load numbers may also be maintained to be used as a comparison for future load testing.  It’s always best to keep a previous system because hardware can be upgraded and that will change the baseline numbers.

Once everything checks out, then your software is ready for deployment.  I’m not going to go into deployment techniques in this blog post (there are so many possibilities).  I’ll leave that for another post.  For this post, I’m going to dig into configuration of environments.

Setting up the physical or virtual hardware should be done in a fashion that mimics the production system as close as possible.  If the development, QA and production systems are all different, then it’ll be difficult to get software working for each environment.  This is a waste of resources and needs to be avoided at all costs.  Most systems today use virtual servers on a host machine, making it easy to setup an identical environment.  The goal in a virtual environment is to automate the setup and tear-down of servers so environments can be created fresh when needed.

The Web.Config and App.Config

One issue with multiple environments is the configuration parameters in the web.config and app.config files (and now the appsettings.json for .net core apps).  There are other config files that might exist, like the nlog.config for nlog setup, but they all fall into the same category: They are specific for each environment.

There are many solutions available.  BuildMaster provides variable injection.  Basically, a template web.config is setup in BuildMaster and variables contain the data to be inserted for each deployment type.  There is a capability in Visual Studio called web.config transformation.  Several web.config files can be setup in addition to a common web.config to be merged when different configurations are built.  Powershell can be used to replace text in a web.config.  One powershell script per environment, or a web.config can have sections commented and powershell can remove the comment lines for the section that applies for the environment being deployed to.

These are all creative ways of dealing with this problem, but they all lack a level of security that is needed if your company has isolation between your developers and the production environment.  At some point your production system becomes so large that you’ll need an IT department to maintain the live system.  That’s the point where you need to keep developers from tweaking production code whenever they feel the need.  Some restrictions must come into play.  One restriction is the passwords used by the databases.  Production databases should be accessible for only those in charge of the production system.

What this means is that the config parameters cannot be checked into your version control system.  They’ll need to be applied when the software is deployed.  One method would be to insert parameters in for the values that will be applied at the end of deployment.  Then a list of variables and their respective parameters can exist on a server in each environment.  That list would be specific to the environment and can be used by powershell to replace variables in the web.config file by the deployment server.

There is a system called Zookeeper that can contain all of your configuration parameters and centrally accessed.  The downside to this is that you’ll need a cluster of servers to provide the throughput for a large system, plus another potential central point of failure.  The complexity of your environment just increased for the sole purpose of keeping track of configuration parameters.

Local Environments

Local environments are a special case problem.  Each software developer should have their own local environment to use for their development efforts.  By creating a miniature environment on the software developer’s desktop/laptop system, the developer has full flexibility to test code without the worry of destroying data that is used by other developers.  An automated method of refreshing the local environment is necessary.  Next comes the issue of configuring the local environment.  How do you handle local configuration files?

Solution 1: Manually alter local config files.

The developer needs to exclude the config files from check-in otherwise the local changes could end up in the version control software for one developer.

Solution 2: Manually alter local config files and exclude from check-in by adding to ignore settings.

If there are any automatic updates to the config by Visual Studio, those changes will not be checked into your version control software.

Solution 3: Create config files with replaceable parameters and include a script to replace them as a post build operation.

Same issue as solution 1, the files could get checked-in to version control and that would wipe out the changes.

Solution 4: Move all config settings to a different config file.  Only maintain automatic settings in web.config and app.config.

This is a cleaner solution because the local config file can be excluded from version control.  A location (like a wiki page or the version control) must contain a copy of the local config file with the parameters to be replaced.  Then a script must be run the first time to populate the parameters or the developer must manually replace the parameters to match their system.  The web.config and app.config would only contain automatic parameters.

One issue with this solution is that it would be difficult to convert legacy code to use this method.  A search and replace for each parameter must be performed, or you can override the ConfigurationManager object and implement the AppSettings method to store the values in the custom config file (ditto for the database connection settings).

Why Use Config Files at all?

One of the questions I see a lot is the question about using the config file in the first place.  Variables used by your software can be stored in a database.  If your configuration data is specific to the application, then a table can be setup in your database to store application config data.  This can be cached and read all at once by your application.  There is only one catch: What about your database connection strings?  At least one connection string will be needed and probably a connection string to your caching system (if you’re using Redis or Memcached or something).  That connection string will be used to read the application config variables, like other database connection strings, etc.  Keep this method in mind if you hope to keep your configuration rats-nest to something manageable.  Each environment would read it’s config variables from the database that belongs to it.

The issues listed in the local environment are still valid.  In other words, each environment would need it’s own connection and the production environment (and possibly other environments) connection would need to be kept secret from developers.

Custom Solutions

There are other solutions to the config replacement problem.  The web.config can be deserialized by a custom program and then each config parameter value can be replaced by a list of replacement values using the key (for appSettings as an example).  Then you can maintain a localized config settings file for each environment.  This technique has the added bonus of not breaking if someone checks in their local copy into the version control software.

Here is a code snippet to parse the appSettings section of the web.config file:

// read the web.config file
var document = new XmlDocument();
document.Load(fileName);

var result = document.SelectNodes("//configuration/appSettings");

// note: you might also need to check for "//configuration/location/appSettings"

foreach (XmlNode childNodes in result)
{
    foreach (XmlNode keyItem in childNodes)
    {
        if (keyItem.Attributes != null)
        {
            if (keyItem.Attributes["key"] != null && keyItem.Attributes["value"] != null)
            {
                var key = keyItem.Attributes["key"].Value;

                // replace your value here
                keyItem.Attributes["value"].Value = LookupValue(key);
            }
        }
    }
}

// save back the web.config file
document.Save(fileName);

The same technique can be used for .json files to read, deserialize, alter then save.

Conclusion

Most aspects of system deployment have industry-wide standards.  Managing configuration information isn’t one of them.  I am not sure why a standard has not been established.  Microsoft provides a method of setting up multiple environment configuration files, but it does not solve the issue of securing a production environment from the developers.  It also does not work when the developer executes their program directly from Visual Studio.  The transform operation only occurs when a project is published.

BuildMaster has a good technique.  That product can be configured to deploy different config files for each environment.  This product does not deploy to the developer’s local environment, so that is still an issue.  The final issue with this method is that automatically added parameters will not show up and must be added by hand.

Jenkins doesn’t seem to have any capability to handle config files (if you know of a plug-in for Jenkins, leave a comment, I’d like to try it out).  Jenkins leaves the dev-ops person with the task of setting up scripts to modify the config parameters during deployment/promotion operations.

If you have other ideas that work better than these, please leave a comment or drop me an email.  I’d like to hear about your experience.