XML Serialization

Summary

In this post I’m going to demonstrate the proper way to serialize XML and setup unit tests using xUnit and .Net Core.  I will also be using Visual Studio 2017.

Generating XML

JSON is rapidly taking over as the data encoding standard of choice.  Unfortunately, government agencies are decades behind the technology curve and XML is going to be around for a long time to come.  One of the largest industries industries still using XML for a majority of their data transfer encoding is the medical industry.  Documents required by meaningful use are mostly encoded in XML.  I’m not going to jump into the gory details of generating a CCD.  Instead, I’m going to keep this really simple.

First, I’m going to show a method of generating XML that I’ve seen many times.  Usually coded by a programmer with little or no formal education in Computer Science.  Sometimes programmers just take a short-cut because it appears to be the simplest way to get the product out the door.  So I’ll show the technique and then I’ll explain why it turns out that this is a very poor way of designing an XML generator.

Let’s say for instance we wanted to generate XML representing a house.  First we’ll define the house as a record that can contain square footage.  That will be the only data point assigned to the house record (I mentioned this was going to be simple right).  Inside of the house record will be lists of walls and lists of roofs (assume a house could have two or more roofs like a tri-level configuration).  Next, I’m going to make a list of windows for the walls.  The window block will have a “Type” that is a free-form string input and the roof block will also have a “Type” that is a free-form string.  That is the whole definition.

public class House
{
  public List Walls = new List();
  public List Roofs = new List();
  public int Size { get; set; }
}

public class Wall
{
  public List Windows { get; set; }
}

public class Window
{
  public string Type { get; set; }
}

public class Roof
{
  public string Type { get; set; }
}

The “easy” way to create XML from this is to use the StringBuilder and just build XML tags around the data in your structure.  Here’s a sample of the possible code that a programmer might use:

public class House
{
  public List<Wall> Walls = new List<Wall>();
  public List<Roof> Roofs = new List<Roof>();
  public int Size { get; set; }

  public string Serialize()
  {
    var @out = new StringBuilder();

    @out.Append("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
    @out.Append("<House xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">");

    foreach (var wall in Walls)
    {
      wall.Serialize(ref @out);
    }

    foreach (var roof in Roofs)
    {
      roof.Serialize(ref @out);
    }

    @out.Append("<size>");
    @out.Append(Size);
    @out.Append("</size>");

    @out.Append("</House>");

    return @out.ToString();
  }
}

public class Wall
{
  public List<Window> Windows { get; set; }

  public void Serialize(ref StringBuilder @out)
  {
    if (Windows == null || Windows.Count == 0)
    {
      @out.Append("<wall />");
      return;
    }

    @out.Append("<wall>");
    foreach (var window in Windows)
    {
      window.Serialize(ref @out);
    }
    @out.Append("</wall>");
  }
}

public class Window
{
  public string Type { get; set; }

  public void Serialize(ref StringBuilder @out)
  {
    @out.Append("<window>");
    @out.Append("<Type>");
    @out.Append(Type);
    @out.Append("</Type>");
    @out.Append("</window>");
  }
}

public class Roof
{
  public string Type { get; set; }

  public void Serialize(ref StringBuilder @out)
  {
    @out.Append("<roof>");
    @out.Append("<Type>");
    @out.Append(Type);
    @out.Append("</Type>");
    @out.Append("</roof>");
  }
}

The example I’ve given is a rather clean example.  I have seen XML generated with much uglier code.  This is the manual method of serializing XML.  One almost obvious weakness is that the output produced is a straight line of XML, which is not human-readable.  In order to allow human readable XML output to be produced with an on/off switch, extra logic will need to be incorporated that would append the newline and add tabs for indents.  Another problem with this method is that it contains a lot of code that is unnecessary.  One typo and the XML is incorrect.  Future editing is hazardous because tags might not match up if code is inserted in the middle and care is not taken to test such conditions.  Unit testing something like this is an absolute must.

The easy method is to use the XML serializer.  To produce the correct output, it is sometimes necessary to add attributes to properties in objects to be serialized.  Here is the object definition that produces the same output:

public class House
{
  [XmlElement(ElementName = "wall")]
  public List Walls = new List();

  [XmlElement(ElementName = "roof")]
  public List Roofs = new List();

  [XmlElement(ElementName = "size")]
  public int Size { get; set; }
}

public class Wall
{
  [XmlElement(ElementName = "window")]
  public List Windows { get; set; }

  public bool ShouldSerializenullable()
  {
    return Windows == null;
  }
}

public class Window
{
  public string Type { get; set; }
}

public class Roof
{
  public string Type { get; set; }
}

In order to serialize the above objects into XML, you use the XMLSerializer object:

public static class CreateXMLData
{
  public static string Serialize(this House house)
  {
    var xmlSerializer = new XmlSerializer(typeof(House));

    var settings = new XmlWriterSettings
    {
      NewLineHandling = NewLineHandling.Entitize,
      IndentChars = "\t",
      Indent = true
    };

    using (var stringWriter = new Utf8StringWriter())
    {
      var writer = XmlWriter.Create(stringWriter, settings);
      xmlSerializer.Serialize(writer, house);

      return stringWriter.GetStringBuilder().ToString();
    }
  }
}

You’ll also need to create a Utf8StringWriter Class:

public class Utf8StringWriter : StringWriter
{
  public override Encoding Encoding
  {
    get { return Encoding.UTF8; }
  }
}

Unit Testing

I would recommend unit testing each section of your XML.  Test with sections empty as well as containing one or more items.  You want to make sure you capture instances of null lists or empty items that should not generate XML output.  If there are any special attributes, make sure that the XML generated matches the specification.  For my unit testing, I stripped newlines and tabs to compare with a sample XML file that is stored in my unit test project.  As a first-attempt, I created a helper for my unit tests:

public static class XmlResultCompare
{
  public static string ReadExpectedXml(string expectedDataFile)
  {
    var assembly = Assembly.GetExecutingAssembly();
    using (var stream = assembly.GetManifestResourceStream(expectedDataFile))
    {
      using (var reader = new StreamReader(stream))
      {
        return reader.ReadToEnd().RemoveWhiteSpace();
      }
    }
  }

  public static string RemoveWhiteSpace(this string s)
  {
    s = s.Replace("\t", "");
    s = s.Replace("\r", "");
    s = s.Replace("\n", "");
  return s;
  }
}

If you look carefully, I ‘m compiling my xml test data right into the unit test dll.  Why am I doing that?  The company that I work for as well as most serious companies use continuous integration tools such as a build server.  The problem with a build server is that your files might not make it to the same directory location on the build server that they are on your PC.  To ensure that the test files are there, compile them into the dll and reference them from the namespace using Assembly.GetExecutingAssembly().  To make this work, you’ll have to mark your xml test files as an Embedded Resource (click on the xml file and change the Build Action property to Embedded Resource).  To access the files, which are contained in a virtual directory called “TestData”, you’ll need to use the name space, the virtual directory and the full file name:

XMLCreatorTests.TestData.XMLHouseOneWallOneWindow.xml

Now for a sample unit test:

[Fact]
public void TestOneWallNoWindow()
{
  // one wall, no windows
  var house = new House { Size = 2000 };
  house.Walls.Add(new Wall());

  Assert.Equal(XmlResultCompare.ReadExpectedXml("XMLCreatorTests.TestData.XMLHouseOneWallNoWindow.xml"), house.Serialize().RemoveWhiteSpace());
}

Notice how I filled in the house object with the size and added one wall.  The ReadExpectedXml() method will remove whitespaces automatically, so it’s important to remove them off the serialized version of house in order to match.

Where to Get the Code

As always you can go to my GitHub account and download the sample application (click here).  I would recommend downloading the application and modifying it as a test to see how all the piece work.  Add a unit test to see if you can match your expected xml with the xml serializer.

Leave a Reply