Dynamic or Extended Field Data

Summary

SQL databases are very powerful mechanisms for storing and retrieving data. With careful design, a SQL database can store large quantities of records and retrieve them at lightning speed. The downside to a SQL database is that it is a very rigid structure and, if not carefully designed, it can become slow and unwieldy. There is, however, a common mechanism that allows a database to be dynamically extended: it lets the customer add a field to a table as a holding place for data that the system was never designed to hold. Such a mechanism is used in SaaS (Software as a Service) systems where there are multiple customers with different data needs that cannot all be accommodated by enhancements from the SaaS company. I’m going to show several techniques that I’ve seen used, along with their pros and cons, and then I’m going to show an effective technique for performing this function.

Fixed Fields

I’m going to start with one of the worst techniques I’ve seen used: the fixed field technique. The gist of this technique is that each table that allows extended data has multiple pre-allocated extended fields of assorted types. Here’s a sample table (called productextended):
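A representative sketch of that table’s shape, written as an EF entity class (the exact column names and how many of each type are my own assumption; real systems often have far more of each):

public class ProductExtended
{
    public int Id { get; set; }                // key of the product record being extended

    public string Text1 { get; set; }          // generic text slots
    public string Text2 { get; set; }

    public int? Int1 { get; set; }             // generic integer slots
    public int? Int2 { get; set; }

    public DateTime? DateTime1 { get; set; }   // generic date/time slot

    public decimal? Money1 { get; set; }       // used as "sale price" in the lookup example below
    public bool? Bit1 { get; set; }            // used as "sold out" in the lookup example below
}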

If you’re not cringing after looking at the sample table design above, then you need to read on. The table above is only a small example compared to systems I’ve seen in production. Production systems I’ve worked on have more than 10 text, 10 datetime, 10 shortdate, 10 int, 10 float, etc. fields per table. To make such a system work, there is usually some sort of dictionary that records what each field is being used for, along with a configuration screen that allows the customer to choose what each of those fixed fields represents.

Here is an example meta data lookup table with fields matching the sample data in the extended table above:
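A sketch of what such a lookup table might contain, again expressed as an entity class (the class and column names are my assumption; the example values match the productextended fields above):

public class ExtendedFieldLookup
{
    public int Id { get; set; }
    public string TableName { get; set; }   // e.g. "productextended"
    public string FieldName { get; set; }   // e.g. "Money1" or "Bit1"
    public string UsedAs { get; set; }      // e.g. "SalePrice" or "SoldOut"
}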

If you examine the lookup table above, you’ll see that the Money1 field represents a sale price and the Bit1 field represents a flag indicating that the product is sold out. There is no referential integrity between these tables because this data is not normalized; the two tables are not actually related. If you delete an entry in the meta data lookup table and re-use the original field to represent other data, the values that already exist in that column will silently appear as the new data. You’ll need to write special code to handle a field being re-purposed: that code would have to clear the field for the entire productextended table.

I’m going to list some obvious disadvantages to using this technique:

  • There are a limited number of extended fields available per table
  • Tables become extra wide, which makes them slow to query
  • First normal form is broken, so no relational integrity constraints can be used

Using XML Field

The second example that I’ve seen is the use of a single extended field with the XML data type. This was a clever idea: each table gets one field called “extended”, set up to contain XML data. The software serializes and de-serializes the data stored in this field, and the actual values are then read from a POCO object. Here’s a sample table:

This is less cringe-worthy than the previous example. The advantage of this setup is that the table width stays manageable and first normal form has not been broken (although there is still no relation to the extended data). If the extended field is serialized and de-serialized into a POCO, the customer will not be able to change field data on the fly unless the POCO contains some clever setup, like an array of data fields that can be used at run-time (my example will show this ability).

Here is a sample POCO for the ProductXml table:

public class ProductXml
{
    public XElement XmlValueWrapper
    {
        get { return XElement.Parse(Extended); }
        set { Extended = value.ToString(); }
    }

    public int Id { get; set; }
    public int Store { get; set; }
    public string Name { get; set; } 
    public decimal? Price { get; set; }
    public string Extended { get; set; }

    public virtual Store StoreNavigation { get; set; }
}

The POCO for the xml data looks like this:

public class ExtendedXml
{
    public decimal SalePrice { get; set; }
    public bool OutOfStock { get; set; }
    public DataRecord ExtendedData = new DataRecord();
}

public class DataRecord
{
    public string Key { get; set; }
    public string Value { get; set; }
}

To make this work, you’ll need to tweak the EF configuration to look something like this:

public static class ProductXmlConfig
{
    public static void ProductMapping(this ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<ProductXml>(entity =>
        {
            entity.ToTable("ProductXml");

            entity.HasKey(e => e.Id);
            entity.Property(e => e.Id).HasColumnName("Id");
            entity.Property(e => e.Name).HasColumnType("varchar(50)");
            entity.Property(e => e.Price).HasColumnType("money");
            entity.Property(c => c.Extended).HasColumnType("xml");
            entity.Ignore(c => c.XmlValueWrapper);

            entity.HasOne(d => d.StoreNavigation)
                .WithMany(p => p.ProductXml)
                .HasForeignKey(d => d.Store)
                .OnDelete(DeleteBehavior.Restrict)
                .HasConstraintName("FK_store_product_xml");
        });
    }
}

Now the Linq code to insert a value could look like this:

using (var db = new DatabaseContext())
{
    var extendedXml = new ExtendedXml
    {
        SalePrice = 3.99m,
        OutOfStock = false,
        ExtendedData = new DataRecord
        {
            Key = "QuantityInWarehouse",
            Value = "5"
        }
    };

    var productXml = new ProductXml
    {
        Store = 1,
        Name = "Stuffed Animal",
        Price = 5.95m,
        Extended = extendedXml.Serialize()
    };

    db.ProductXmls.Add(productXml);
    db.SaveChanges();
}
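The Serialize() call above is an extension method that isn’t shown in this post. A minimal sketch of one way to implement it with XmlSerializer might look like this (an assumption on my part, not necessarily how the sample code does it):

using System.IO;
using System.Xml.Serialization;

public static class XmlExtensions
{
    // Serialize any POCO into an XML string suitable for the "extended" column.
    public static string Serialize<T>(this T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }
}

With something like this in place, the ExtendedXml object above serializes to roughly the following shape (element order and declaration details may vary), which is also what the stored XML contains:

<ExtendedXml>
  <SalePrice>3.99</SalePrice>
  <OutOfStock>false</OutOfStock>
  <ExtendedData>
    <Key>QuantityInWarehouse</Key>
    <Value>5</Value>
  </ExtendedData>
</ExtendedXml>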

The results of executing the Linq code from above will produce a new record like this:

If you click on the XML link in SQL, you should see something like this:

As you can see, there are two hard-coded POCO fields for the sale price and out-of-stock values. These are not customer-controlled fields; they demonstrate that an enhancement could use the extended field to store new data without modifying the table. The dictionary data contains one item, called QuantityInWarehouse. This is a customer-designed field; such fields can be added through a data entry screen, backed by a meta data table that holds the names of the extra data fields stored in the extended column.

XML allows flexible serialization: if you add a field to the xml POCO, it will still de-serialize xml that does not contain that data (just make sure the POCO field is nullable).
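For instance, a later release could add a hypothetical nullable property to the xml POCO; rows serialized before the property existed will still de-serialize, with the new property left null:

public class ExtendedXml
{
    public decimal SalePrice { get; set; }
    public bool OutOfStock { get; set; }

    // Hypothetical field added in a later release.  Because it is nullable,
    // older XML without a <MemberPrice> element still de-serializes cleanly.
    public decimal? MemberPrice { get; set; }

    public DataRecord ExtendedData = new DataRecord();
}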

To see a working example, go to my GitHub account (see end of this article) and download the sample code.

You can use the following SQL query to extract data from the xml:

SELECT
    Extended.value('(/ExtendedXml/SalePrice)[1]', 'nvarchar(max)') as 'SalePrice',
    Extended.value('(/ExtendedXml/OutOfStock)[1]', 'nvarchar(max)') as 'OutOfStock', 
    Extended.value('(/ExtendedXml/ExtendedData/Key)[1]', 'nvarchar(max)') as 'Key',
    Extended.value('(/ExtendedXml/ExtendedData/Value)[1]', 'nvarchar(max)') as 'Value'
FROM 
    ProductXml

The query above should produce the following output:

Here are some disadvantages to using this technique:

  • It is difficult to query a single extended field (the XML must be parsed with XML methods or de-serialized in code)
  • Loading data is slow because the entire XML field must be read and de-serialized

Using Extended Table

This is my preferred way to design an extended field system. With this technique, data of any type can be added to any table without adding fields to the existing tables. The technique does not break first normal form, and forming a query is easy and powerful. The idea is to create two tables: the first contains metadata describing the table and field being extended, and the second contains the actual values stored. Here’s an example of the MetaDataDictionary table:
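A sketch of the shape of this table, written as an entity class; the column names are my assumption, inferred from the query further down (where dictionary id 1 maps to SalePrice and id 2 maps to SoldOut):

public class MetaDataDictionary
{
    public int Id { get; set; }             // e.g. 1 = SalePrice, 2 = SoldOut
    public string TableName { get; set; }   // the table being extended, e.g. "Product"
    public string FieldName { get; set; }   // the customer-visible field name, e.g. "SalePrice"
}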

Here’s an example of the ExtendedData table:
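And a matching sketch of the ExtendedData table; RecordId, MetaDataDictionaryId, and Value are the column names used by the query below, while the rest is my assumption:

public class ExtendedData
{
    public int Id { get; set; }
    public int MetaDataDictionaryId { get; set; }   // which extended field this row holds
    public int RecordId { get; set; }               // Id of the record being extended (e.g. Product.Id)
    public string Value { get; set; }               // the value itself, stored as varchar
}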


A custom query can be formed to output all of the extended data for each record. Here’s an example for the data above, run against the Product table:

SELECT
	p.*,
	(SELECT e.value FROM ExtendedData e WHERE e.RecordId = p.Id AND e.MetaDataDictionaryId=1) AS 'SalePrice',
	(SELECT e.value FROM ExtendedData e WHERE e.RecordId = p.Id AND e.MetaDataDictionaryId=2) AS 'SoldOut'
FROM	
	Product p

This will produce:

To obtain data from one extended field, a simple query can be formed to look up the value. This leads to another bonus: Entity Framework and Linq can be used to query data that is organized in this fashion. Why is this so important? Because using EF and Linq allows all of the business logic to reside in code, where it is executed by the front-end and can be unit tested. If a significant amount of logic lives in a stored procedure instead, that code cannot easily be unit tested.
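As a hedged sketch of what that looks like in Linq (assuming the entity classes sketched above and a DbSet name like ExtendedDatas, which is my own), here is the sale-price lookup for a single product:

using (var db = new DatabaseContext())
{
    // Look up the "SalePrice" extended value (MetaDataDictionaryId = 1) for product Id 1.
    var salePrice = db.ExtendedDatas
        .Where(e => e.MetaDataDictionaryId == 1 && e.RecordId == 1)
        .Select(e => e.Value)
        .FirstOrDefault();
}

The same pattern composes into larger projections, so the per-field sub-queries in the SQL above can be expressed entirely in Linq and covered by unit tests.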

I’m going to list a few advantages of this method over the previous two methods:

  • Your implementation can have any number of extended fields
  • Any table in the system can be extended without modification to the database
  • Forming a query to grab one extended field value is simple

One thing to note about this method is that I’m storing the value in a varchar field. You can change the size to accommodate any data you expect to store, but you will need to perform some sort of translation between the varchar and the actual data type you expect. For example, if you are storing a date, you’ll want some type checking when converting from the varchar to the expected date type. The conversion might occur at the Linq level, or you might do it with triggers on the extended value table (though I would avoid that design, since it will probably chew up a lot of SQL CPU resources).
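As an illustration of that kind of conversion at the code level (the class, method, and expected-type values here are hypothetical, presumably driven by a type column in the meta data dictionary):

public static class ExtendedValueConverter
{
    // Convert the varchar value pulled from ExtendedData into the type the
    // meta data dictionary says it should hold, with basic validation.
    public static object Convert(string rawValue, string expectedType)
    {
        switch (expectedType)
        {
            case "date":
                return DateTime.TryParse(rawValue, out var dateValue) ? (object)dateValue : null;
            case "money":
                return decimal.TryParse(rawValue, out var moneyValue) ? (object)moneyValue : null;
            case "bit":
                return bool.TryParse(rawValue, out var bitValue) ? (object)bitValue : null;
            default:
                return rawValue;   // treat anything else as plain text
        }
    }
}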

Where to Get the Code

You can find the sample code used by the xml extended field example at my GitHub account (click here).  The project contains a file named “SqlServerScripts.sql”.  You can copy this script to your local SQL server to generate the tables in a demo database and populate the data used by this blog post (saving you a lot of time).

 
