XML Serialization

Summary

In this post I’m going to demonstrate the proper way to serialize XML and set up unit tests using xUnit and .NET Core.  I will also be using Visual Studio 2017.

Generating XML

JSON is rapidly taking over as the data encoding standard of choice.  Unfortunately, government agencies are decades behind the technology curve, and XML is going to be around for a long time to come.  One of the largest industries still using XML for most of its data transfer encoding is the medical industry.  Documents required by Meaningful Use are mostly encoded in XML.  I’m not going to jump into the gory details of generating a CCD (Continuity of Care Document).  Instead, I’m going to keep this really simple.

First, I’m going to show a method of generating XML that I’ve seen many times, usually written by a programmer with little or no formal education in computer science.  Sometimes programmers just take a shortcut because it appears to be the simplest way to get the product out the door.  So I’ll show the technique and then explain why it turns out to be a very poor way of designing an XML generator.

Let’s say, for instance, we wanted to generate XML representing a house.  First we’ll define the house as a record that contains its square footage.  That will be the only data point assigned to the house record (I mentioned this was going to be simple, right?).  Inside the house record will be a list of walls and a list of roofs (assume a house could have two or more roofs, as in a tri-level configuration).  Next, I’m going to give each wall a list of windows.  The window block will have a “Type” that is a free-form string input, and the roof block will also have a “Type” that is a free-form string.  That is the whole definition.

using System.Collections.Generic;

public class House
{
  public List<Wall> Walls = new List<Wall>();
  public List<Roof> Roofs = new List<Roof>();
  public int Size { get; set; }
}

public class Wall
{
  public List<Window> Windows { get; set; }
}

public class Window
{
  public string Type { get; set; }
}

public class Roof
{
  public string Type { get; set; }
}

The “easy” way to create XML from this is to use the StringBuilder and just build XML tags around the data in your structure.  Here’s a sample of the possible code that a programmer might use:

using System.Collections.Generic;
using System.Text;

public class House
{
  public List<Wall> Walls = new List<Wall>();
  public List<Roof> Roofs = new List<Roof>();
  public int Size { get; set; }

  public string Serialize()
  {
    var @out = new StringBuilder();

    @out.Append("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
    @out.Append("<House xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">");

    foreach (var wall in Walls)
    {
      wall.Serialize(ref @out);
    }

    foreach (var roof in Roofs)
    {
      roof.Serialize(ref @out);
    }

    @out.Append("<size>");
    @out.Append(Size);
    @out.Append("</size>");

    @out.Append("</House>");

    return @out.ToString();
  }
}

public class Wall
{
  public List<Window> Windows { get; set; }

  public void Serialize(ref StringBuilder @out)
  {
    if (Windows == null || Windows.Count == 0)
    {
      @out.Append("<wall />");
      return;
    }

    @out.Append("<wall>");
    foreach (var window in Windows)
    {
      window.Serialize(ref @out);
    }
    @out.Append("</wall>");
  }
}

public class Window
{
  public string Type { get; set; }

  public void Serialize(ref StringBuilder @out)
  {
    @out.Append("<window>");
    @out.Append("<Type>");
    @out.Append(Type);
    @out.Append("</Type>");
    @out.Append("</window>");
  }
}

public class Roof
{
  public string Type { get; set; }

  public void Serialize(ref StringBuilder @out)
  {
    @out.Append("<roof>");
    @out.Append("<Type>");
    @out.Append(Type);
    @out.Append("</Type>");
    @out.Append("</roof>");
  }
}

The example I’ve given is a rather clean one; I have seen XML generated with much uglier code.  This is the manual method of serializing XML.  One fairly obvious weakness is that the output is a single line of XML, which is hard for a human to read.  To produce human-readable output with an on/off switch, extra logic would have to append newlines and tabs for indentation.  Another problem is that this method contains a lot of unnecessary code, and one typo makes the XML incorrect.  Future editing is hazardous because tags might not match up if code is inserted in the middle and care is not taken to test such conditions.  The values are also never escaped: a window Type containing an ampersand or a less-than sign (for example, Bay & Bow) produces malformed XML.  Unit testing something like this is an absolute must.
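To see the escaping problem concretely, here is a small sketch.  ManualWindow mimics the StringBuilder approach, SerializedWindow lets XmlSerializer write the same element, and IsWellFormed tries to parse the result with LINQ to XML.  The class and method names are hypothetical, chosen only for this illustration; the XmlRoot attribute is there so the root element is named window, as in the manual code.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

[XmlRoot("window")]
public class Window
{
    public string Type { get; set; }
}

public static class EscapingDemo
{
    // Concatenates tags around the raw value, like the StringBuilder approach (no escaping).
    public static string ManualWindow(string type)
    {
        return "<window><Type>" + type + "</Type></window>";
    }

    // Lets XmlSerializer write the element; reserved characters such as & and < are escaped.
    public static string SerializedWindow(string type)
    {
        var serializer = new XmlSerializer(typeof(Window));
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        // An empty namespace map suppresses the default xmlns:xsi / xmlns:xsd declarations.
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", "");

        using (var stringWriter = new StringWriter())
        {
            using (var writer = XmlWriter.Create(stringWriter, settings))
            {
                serializer.Serialize(writer, new Window { Type = type }, namespaces);
            }
            return stringWriter.ToString();
        }
    }

    // Returns true only if the text parses as a well-formed XML document.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For a Type of Bay & Bow, the manually built string fails to parse, while the serialized version contains Bay &amp; Bow and parses cleanly.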

The better method is to use the XmlSerializer class.  To produce the correct output, it is sometimes necessary to add attributes to the properties of the objects being serialized.  Here is the object definition that produces the same XML:

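using System.Collections.Generic;
using System.Xml.Serialization;

public class House
{
  [XmlElement(ElementName = "wall")]
  public List<Wall> Walls = new List<Wall>();

  [XmlElement(ElementName = "roof")]
  public List<Roof> Roofs = new List<Roof>();

  [XmlElement(ElementName = "size")]
  public int Size { get; set; }
}

public class Wall
{
  [XmlElement(ElementName = "window")]
  public List<Window> Windows { get; set; }

  // XmlSerializer calls ShouldSerialize{PropertyName}() when it exists.
  // A null or empty list already produces no <window> elements, so this is optional.
  public bool ShouldSerializeWindows()
  {
    return Windows != null && Windows.Count > 0;
  }
}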

public class Window
{
  public string Type { get; set; }
}

public class Roof
{
  public string Type { get; set; }
}
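The [XmlElement] attributes on the lists matter: without them, XmlSerializer wraps a list in an element named after the member and names each item after its type.  The sketch below serializes the same list both ways; PlainHouse, FlatHouse and ListMappingDemo are hypothetical names used only for this comparison.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Wall
{
}

// No attribute: the list is wrapped in <Walls> and each item is written as <Wall>.
public class PlainHouse
{
    public List<Wall> Walls = new List<Wall>();
}

// [XmlElement] on the list: no wrapper, one <wall> element per item.
public class FlatHouse
{
    [XmlElement(ElementName = "wall")]
    public List<Wall> Walls = new List<Wall>();
}

public static class ListMappingDemo
{
    // Serializes any object to an XML string with default XmlSerializer settings.
    public static string ToXml<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

With one wall, PlainHouse produces a <Walls> element containing <Wall />, while FlatHouse writes <wall /> directly under the root, which is the shape the manual StringBuilder code produced.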

To serialize the above objects into XML, use the XmlSerializer class:

using System.IO;
using System.Xml;
using System.Xml.Serialization;

public static class CreateXMLData
{
  public static string Serialize(this House house)
  {
    var xmlSerializer = new XmlSerializer(typeof(House));

    var settings = new XmlWriterSettings
    {
      NewLineHandling = NewLineHandling.Entitize,
      IndentChars = "\t",
      Indent = true
    };

    using (var stringWriter = new Utf8StringWriter())
    {
      // Dispose the XmlWriter before reading the text so all buffered output is flushed.
      using (var writer = XmlWriter.Create(stringWriter, settings))
      {
        xmlSerializer.Serialize(writer, house);
      }

      return stringWriter.ToString();
    }
  }
}

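You’ll also need to create a Utf8StringWriter class.  A plain StringWriter reports UTF-16 as its encoding, so the XML declaration would read encoding="utf-16"; overriding the Encoding property makes it read utf-8: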

using System.IO;
using System.Text;

public class Utf8StringWriter : StringWriter
{
  public override Encoding Encoding
  {
    get { return Encoding.UTF8; }
  }
}
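A quick way to confirm what the override changes is to write the same tiny document through both writers and look at the declaration.  DeclarationDemo and WriteHouse are hypothetical names; the Utf8StringWriter is the same class as above, repeated so the sketch compiles on its own.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class DeclarationDemo
{
    // Writes an empty House document into the given writer and returns the text,
    // including the XML declaration that XmlWriter derives from target.Encoding.
    public static string WriteHouse(StringWriter target)
    {
        using (var writer = XmlWriter.Create(target))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("House");
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
        return target.ToString();
    }
}
```

A plain StringWriter yields a declaration ending in encoding="utf-16"?>, while Utf8StringWriter yields encoding="utf-8"?>, even though both results are ordinary .NET strings in memory.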

Unit Testing

I would recommend unit testing each section of your XML.  Test with sections empty as well as containing one or more items.  You want to make sure you catch null lists or empty items that should not generate XML output.  If there are any special attributes, make sure that the generated XML matches the specification.  For my unit testing, I stripped newlines and tabs and compared the result with a sample XML file that is stored in my unit test project.  As a first attempt, I created a helper for my unit tests:

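using System;
using System.IO;
using System.Reflection;

public static class XmlResultCompare
{
  public static string ReadExpectedXml(string expectedDataFile)
  {
    var assembly = Assembly.GetExecutingAssembly();
    using (var stream = assembly.GetManifestResourceStream(expectedDataFile))
    {
      // GetManifestResourceStream returns null when the resource name is wrong.
      if (stream == null)
      {
        throw new InvalidOperationException("Embedded resource not found: " + expectedDataFile);
      }

      using (var reader = new StreamReader(stream))
      {
        return reader.ReadToEnd().RemoveWhiteSpace();
      }
    }
  }

  // Strips tabs and line breaks (but not spaces) so indentation differences don't matter.
  public static string RemoveWhiteSpace(this string s)
  {
    s = s.Replace("\t", "");
    s = s.Replace("\r", "");
    s = s.Replace("\n", "");
    return s;
  }
}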

If you look carefully, you’ll see that I’m compiling my XML test data right into the unit test DLL.  Why am I doing that?  The company I work for, like most serious companies, uses continuous integration tools such as a build server.  The problem with a build server is that your files might not end up in the same directory on the build server as they are on your PC.  To ensure that the test files are available, compile them into the DLL and load them through Assembly.GetExecutingAssembly().  To make this work, you’ll have to mark your XML test files as Embedded Resource (select the XML file and change its Build Action property to Embedded Resource).  The files live in a folder called “TestData”, so the resource name is made up of the default namespace, the folder name and the full file name:

XMLCreatorTests.TestData.XMLHouseOneWallOneWindow.xml
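If a resource name doesn’t match, it helps to ask the assembly which names it actually contains.  The sketch below uses the real Assembly.GetManifestResourceNames() method; the EmbeddedResourceHelper class and FindResourceName method are hypothetical, and the assumption is that matching on the file name suffix is unambiguous in your test project.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class EmbeddedResourceHelper
{
    // Returns the full manifest resource name ending with "." + fileName, or null if none matches.
    // Handy when you are unsure which namespace/folder prefix the build gave the resource.
    public static string FindResourceName(Assembly assembly, string fileName)
    {
        return assembly
            .GetManifestResourceNames()
            .FirstOrDefault(name => name.EndsWith("." + fileName, StringComparison.OrdinalIgnoreCase));
    }
}
```

Calling FindResourceName(Assembly.GetExecutingAssembly(), "XMLHouseOneWallOneWindow.xml") in the test project should return the full name shown above, or null if the file was not marked as an Embedded Resource.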

Now for a sample unit test:

[Fact]
public void TestOneWallNoWindow()
{
  // one wall, no windows
  var house = new House { Size = 2000 };
  house.Walls.Add(new Wall());

  Assert.Equal(XmlResultCompare.ReadExpectedXml("XMLCreatorTests.TestData.XMLHouseOneWallNoWindow.xml"), house.Serialize().RemoveWhiteSpace());
}

Notice how I filled in the house object with the size and added one wall.  The ReadExpectedXml() method strips tabs and newlines from the expected file automatically, so the same characters must be stripped from the serialized house for the two strings to match.
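Stripping characters works, but a different technique is to compare the parsed XML trees instead of strings.  This sketch swaps in LINQ to XML’s XNode.DeepEquals; the XmlStructureCompare class and AreEquivalent method are hypothetical names.  It assumes indentation-only whitespace is insignificant in your documents.

```csharp
using System.Xml.Linq;

public static class XmlStructureCompare
{
    // Parses both strings and compares the root element trees.
    // XDocument.Parse without LoadOptions drops insignificant whitespace, so indentation
    // and line breaks are ignored, but element names, attributes and text must match.
    public static bool AreEquivalent(string expectedXml, string actualXml)
    {
        var expected = XDocument.Parse(expectedXml);
        var actual = XDocument.Parse(actualXml);
        return XNode.DeepEquals(expected.Root, actual.Root);
    }
}
```

One design note: DeepEquals compares attributes in order, so the expected file should list attributes in the same order the serializer writes them.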

Where to Get the Code

As always, you can download the sample application from my GitHub account.  I would recommend downloading it and modifying it to see how all the pieces work.  Add a unit test and see if you can match your expected XML with the XmlSerializer output.


Serializing Data

Summary

In this blog post I’m going to talk about some tricky problems with serializing and deserializing data.  In particular, I’m going to demonstrate a problem with BinaryFormatter, the .NET class commonly used in C# to turn an object into a byte array.

Using the BinaryFormatter Serializer

If you serialize an object inside your project, store the data someplace, and then deserialize it into the same object in the same project, things will work as expected.  I’ll show an example.  First, I’ll define a simple class called AddressClass, which stores address information:

[Serializable]
public class AddressClass
{
    public string Address1 { get; set; }
    public string Address2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }
}

The first thing you’ll notice is the [Serializable] attribute.  BinaryFormatter requires it in order to serialize the object.  Next, I’ll create an instance of this object in my console application and populate it with some dummy data:

// create an instance and put some dummy data into it.
var addressClass = new AddressClass
{
    Address1 = "123 Main st",
    City = "New York",
    State = "New York",
    Zip = "12345"
};

Now we’re ready to serialize the data.  In this example, I’ll just serialize this object into a byte array:

// serialize the object
byte[] storedData;
using (var memoryStream = new MemoryStream())
{
    var binaryFormatter = new BinaryFormatter();
    binaryFormatter.Serialize(memoryStream, addressClass);

    storedData = memoryStream.ToArray();
}

There’s nothing fancy going on here.  You could even compress the data inside the code above before saving it someplace (like SQL or Redis) or transmitting it over the wire.  Now, I’m going to deserialize the data into a new object instance:

//deserialize the object
AddressClass newObject;
using (var memoryStream = new MemoryStream())
{
    var binaryFormatter = new BinaryFormatter();

    memoryStream.Write(storedData, 0, storedData.Length);
    memoryStream.Seek(0, SeekOrigin.Begin);

    newObject = (AddressClass)binaryFormatter.Deserialize(memoryStream);
}

If you put a breakpoint at the end of this code, you’ll see that newObject contains exactly the same data as the addressClass instance.  To make all the code above work in one program, you’ll have to include the following usings at the top:

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;



Deserializing in a Different Program

Here’s where the trouble starts.  Let’s say that you have two different programs.  One program serializes the data and stores it someplace (or transmits it), and another program reads that data and deserializes it for its own use.  To simulate this without writing a bunch of code that would distract from this blog post, I’m going to dump the serialized data as an array of integers in a text file, then paste that raw text into my second program as the initial value of a byte array.  Then I’m going to copy the AddressClass code and the deserialize code above into the second program.  This should deserialize the data into the new object as before, but that doesn’t happen.  Here’s the error that occurs:

Unable to find assembly ‘SerializatingDataBlogPost, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null’.

This error occurs on this line:

newObject = (AddressClass)binaryFormatter.Deserialize(memoryStream);

The serialized data contains the full name of the type and of the assembly (DLL) that defined it at serialization time.  The second program’s copy of AddressClass lives in a different assembly, so BinaryFormatter cannot find the original type and cannot convert the serialized data back into an object.
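To make the type identity visible, this sketch prints the two names involved: the type’s full name and the full name of its assembly.  Copying AddressClass into another project changes the assembly part, which is why the lookup fails.  The TypeIdentity class and Describe method are hypothetical helpers for illustration.

```csharp
using System;

[Serializable]
public class AddressClass
{
    public string Address1 { get; set; }
    public string Address2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }
}

public static class TypeIdentity
{
    // Combines the type's full name with the full name of the assembly that defines it,
    // e.g. "AddressClass in SomeAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null".
    public static string Describe(Type type)
    {
        return type.FullName + " in " + type.Assembly.FullName;
    }
}
```

Running Describe(typeof(AddressClass)) in each of the two programs shows the same class name but different assembly names.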

If you dig around, you’ll find numerous articles on how to get around this problem.  Writing a custom SerializationBinder that overrides BindToType is one method, as shown here:

Unable to find assembly with BinaryFormatter.Deserialize

And it goes down a rabbit hole from there.  

Another Solution

Another solution is to serialize the data into JSON format.  The Newtonsoft (Json.NET) serializer is very good at serializing objects into JSON.  Because JSON text carries no assembly information, the string can be deserialized into a matching class defined in another DLL.  Use the NuGet Package Manager to add Newtonsoft.Json to your project.  Then use the following code to serialize your addressClass object:

// serialize the object (requires: using Newtonsoft.Json;)
string resultSet = JsonConvert.SerializeObject(addressClass);


This will convert your object into the following string:

{"Address1":"123 Main st","Address2":null,"City":"New York","State":"New York","Zip":"12345"}

Inside another project, you can deserialize the string above using this:

// deserialize object
AddressClass newObject;
newObject = JsonConvert.DeserializeObject<AddressClass>(resultSet);
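If you’re on .NET Core 3.0 or later and would rather not add a NuGet package, the built-in System.Text.Json serializer can do the same round trip.  This is a swapped-in technique, not the Newtonsoft code above; JsonRoundTrip is a hypothetical helper name.

```csharp
using System.Text.Json;

public class AddressClass
{
    public string Address1 { get; set; }
    public string Address2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }
}

public static class JsonRoundTrip
{
    // Default options keep PascalCase property names and write null values.
    public static string ToJson(AddressClass address) => JsonSerializer.Serialize(address);

    // The receiving program only needs a class with matching property names.
    public static AddressClass FromJson(string json) => JsonSerializer.Deserialize<AddressClass>(json);
}
```

With the dummy address from earlier, ToJson produces the same string shown above for Newtonsoft.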



Compressing the Data

JSON is a text format and it can take up a lot of space, so you can add a compressor to your code to reduce the amount of space that your serialized data takes up.  You can use the following methods to compress and decompress string data into byte array data:

private static byte[] Compress(string input)
{
    // Encode as UTF-8 so non-ASCII characters survive the round trip.
    byte[] inputData = Encoding.UTF8.GetBytes(input);
    byte[] result;

    using (var memoryStream = new MemoryStream())
    {
        // Close the GZipStream before reading memoryStream so the final block is flushed.
        using (var zip = new GZipStream(memoryStream, CompressionMode.Compress))
        {
            zip.Write(inputData, 0, inputData.Length);
        }

        result = memoryStream.ToArray();
    }

    return result;
}

private static string Decompress(byte[] input)
{
    byte[] result;

    using (var outputMemoryStream = new MemoryStream())
    {
        using (var inputMemoryStream = new MemoryStream(input))
        {
            using (var zip = new GZipStream(inputMemoryStream, CompressionMode.Decompress))
            {
                zip.CopyTo(outputMemoryStream);
            }
        }

        result = outputMemoryStream.ToArray();
    }

    // Decode with the same encoding used in Compress.
    return Encoding.UTF8.GetString(result);
}

You’ll need the following usings:

using System.IO;
using System.IO.Compression;
using System.Text;


Then you can pass your JSON text to the compressor like this:

// compress text
byte[] compressedResult = Compress(resultSet);


And you’ll need to decompress back into JSON before deserializing:

// decompress text
string resultSet = Decompress(compressedResult);
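One caveat: the gzip format adds at least 18 bytes of header and trailer, so a payload as small as a single address may not shrink at all.  This sketch, with hypothetical CompressionCheck names, measures whether compression actually saves space for a given string before you commit to it.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CompressionCheck
{
    // GZip-compresses the UTF-8 bytes of the text and returns the compressed length.
    public static int CompressedLength(string text)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var zip = new GZipStream(output, CompressionMode.Compress))
            {
                zip.Write(input, 0, input.Length);
            }
            // MemoryStream.ToArray still works after the GZipStream has closed it.
            return output.ToArray().Length;
        }
    }

    // True when the compressed form is smaller than the uncompressed UTF-8 bytes.
    public static bool WorthCompressing(string text)
    {
        return CompressedLength(text) < Encoding.UTF8.GetByteCount(text);
    }
}
```

Large or repetitive JSON (long lists of similar records) compresses well, while a two-character payload such as {} grows.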



Where to Get the Code

As usual, you can download the projects from my GitHub account.


XML Serializing Nullable Optional Attribute

Summary

The title of this blog post is a bit of a mouthful.  I do a lot of XML serialization and deserialization.  It’s all part of the new paradigm of using APIs to communicate with other systems over the Internet.  One of the annoying “features” of the XmlSerializer is that it doesn’t support nullable attributes.  It’ll serialize nullable elements, but not attributes.  So I’m going to show how to serialize nullable attributes and make the attribute optional.

The Problem

Here’s an example of classes that the XML serializer will not be able to serialize:

public class House
{
    [XmlElement]
    public List<Room> rooms = new List<Room>();
}

public class Room
{
    [XmlAttribute(AttributeName = "name")]
    public string Name { get; set; }

    [XmlAttribute(AttributeName = "windows")]
    public int? NumberOfWindows { get; set; }
}

In this instance I’m attempting to serialize a nullable integer named “NumberOfWindows”.  My goal is to produce an XML file that looks something like this (I removed the schema info to make this easier to read):

<?xml version="1.0" encoding="utf-8"?>
<House>
  <rooms name="kitchen" windows="2" />
  <rooms name="bathroom" windows="0" />
  <rooms name="closet" />
</House>

Notice how the “closet” doesn’t have any windows.  For a closet, we’ll assume that a window does not apply.  So the windows property must be nullable and it must be optional.
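You can see the failure without writing any files: XmlSerializer inspects the types in its constructor and throws an InvalidOperationException when it reaches the int? attribute.  This sketch repeats the classes above; NullableAttributeCheck and CanCreateSerializer are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

public class House
{
    [XmlElement]
    public List<Room> rooms = new List<Room>();
}

public class Room
{
    [XmlAttribute(AttributeName = "name")]
    public string Name { get; set; }

    [XmlAttribute(AttributeName = "windows")]
    public int? NumberOfWindows { get; set; }
}

public static class NullableAttributeCheck
{
    // Returns true if XmlSerializer accepts the House type, false if it rejects it.
    public static bool CanCreateSerializer()
    {
        try
        {
            new XmlSerializer(typeof(House));
            return true;
        }
        catch (InvalidOperationException)
        {
            // XmlSerializer reports reflection problems, such as an int? attribute, this way.
            return false;
        }
    }
}
```

The exception’s InnerException chain carries the detailed reason, pointing at the NumberOfWindows member.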

How to Fix it
First, let’s change the Room class so that it will serialize without an error.  The key is to hide the nullable NumberOfWindows property from the serializer and expose it through a string property instead, treating an empty string as the null value:

public class Room
{
    [XmlAttribute(AttributeName = "name")]
    public string Name { get; set; }

    [XmlIgnore]
    public int? NumberOfWindows { get; set; }

    [XmlAttribute(AttributeName = "windows")]
    public string WindowsSerializable
    {
        get
        {
            if (NumberOfWindows != null)
            {
                return NumberOfWindows.ToString();
            }
            else
            {
                return "";
            }
        }
        set
        {
            // Use the incoming value; an empty value means "no number of windows".
            if (!string.IsNullOrEmpty(value))
            {
                NumberOfWindows = int.Parse(value);
            }
            else
            {
                NumberOfWindows = null;
            }
        }
    }
}

The getter checks NumberOfWindows for null and, if it is null, returns an empty string.  That will cause the closet to produce windows="", which is not quite what we want, but at least the serializer will run and generate XML output without an error.  Also, notice that I put an XmlIgnore attribute on NumberOfWindows, the property the program populates, so that it is not used to generate the serialized output.

Now we need to make the windows attribute optional.  To do that, we can add this method to the Room class:

public bool ShouldSerializeWindowsSerializable()
{
    return NumberOfWindows.HasValue;
}


You can also do something like this:

public bool ShouldSerializeWindowsSerializable()
{
    return WindowsSerializable != "";
}

XmlSerializer looks for a public method named ShouldSerialize{PropertyName}.  If one exists, the property is serialized only when the method returns true, so you can put any logic you want in this method to show or hide the attribute of your choice.
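Putting the pieces together, here is a minimal end-to-end sketch: the string surrogate, the ShouldSerialize method, and a serialize/deserialize round trip.  HouseXml and its methods are hypothetical helper names, and the use of CultureInfo.InvariantCulture is an assumption added so the number format doesn’t depend on the machine’s locale.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml.Serialization;

public class House
{
    [XmlElement]
    public List<Room> rooms = new List<Room>();
}

public class Room
{
    [XmlAttribute(AttributeName = "name")]
    public string Name { get; set; }

    // The value the program works with; XmlSerializer skips it.
    [XmlIgnore]
    public int? NumberOfWindows { get; set; }

    // String surrogate that XmlSerializer can write and read as an attribute.
    [XmlAttribute(AttributeName = "windows")]
    public string WindowsSerializable
    {
        get { return NumberOfWindows?.ToString(CultureInfo.InvariantCulture) ?? ""; }
        set
        {
            NumberOfWindows = string.IsNullOrEmpty(value)
                ? (int?)null
                : int.Parse(value, CultureInfo.InvariantCulture);
        }
    }

    // Write the windows attribute only when there is a value.
    public bool ShouldSerializeWindowsSerializable()
    {
        return NumberOfWindows.HasValue;
    }
}

public static class HouseXml
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(House));

    public static string ToXml(House house)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, house);
            return writer.ToString();
        }
    }

    // A missing windows attribute leaves NumberOfWindows at its default of null.
    public static House FromXml(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (House)Serializer.Deserialize(reader);
        }
    }
}
```

Serializing a kitchen with 2 windows, a bathroom with 0 and a closet with no value writes windows="2" and windows="0" and omits the attribute on the closet; reading the XML back restores null for the closet.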


Where to Get the Code

You can download the sample code from my GitHub account.