Dynamic or Extended Field Data

Summary

SQL databases are very powerful mechanisms for storing and retrieving data.  With careful design, a SQL database can store large quantities of records and retrieve them at lightning speed.  The downside to a SQL database is that it is a rigid structure and, if not carefully designed, it can become slow and unwieldy.  There is, however, a common mechanism that allows a database to be dynamically extended.  This mechanism can allow the customer to add a field to a table as a holding place for data that the system was never designed to hold.  Such a mechanism is used in SaaS (Software as a Service) systems where multiple customers have different data needs that cannot all be accommodated by enhancements from the SaaS vendor.  I'm going to show several techniques that I've seen used, along with their pros and cons, and then I'm going to show an effective technique for performing this function.

Fixed Fields

I'm going to start with one of the worst techniques I've seen used: the fixed field technique.  The gist of this technique is that each table that allows extended data will have multiple extended fields.  Here's a sample table (called productextended):
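Something along these lines captures the pattern (a representative sketch; the exact column list varies by system, but the Money1 and Bit1 names match the lookup example below):

CREATE TABLE productextended
(
    Id        INT IDENTITY(1,1) PRIMARY KEY,
    Name      VARCHAR(50),
    Price     MONEY,
    Text1     VARCHAR(256),   -- extended fields; their meaning is assigned at run-time
    Text2     VARCHAR(256),
    Int1      INT,
    Int2      INT,
    Datetime1 DATETIME,
    Money1    MONEY,
    Bit1      BIT
)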

If you're not cringing after looking at the sample table design above, then you need to read on.  The table above is only a small example compared to systems I've seen in production.  Production systems I've worked on have more than 10 text, 10 datetime, 10 shortdate, 10 int, 10 float, etc. fields per table.  To make such a system work, there is usually some sort of dictionary stored that tells what each field is currently used for, along with a configuration screen that allows the customer to choose what each of those fixed fields will represent.

Here is an example metadata lookup table with fields matching the sample data in the extended table above:
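A minimal sketch of such a lookup table (the table and column names here are illustrative):

CREATE TABLE extendedlookup
(
    Id          INT IDENTITY(1,1) PRIMARY KEY,
    TableName   VARCHAR(50),    -- the table containing the fixed field
    FieldName   VARCHAR(50),    -- the fixed field being re-purposed
    Description VARCHAR(256)    -- what the customer is using it for
)

INSERT INTO extendedlookup (TableName, FieldName, Description)
VALUES
    ('productextended', 'Money1', 'Sale price'),
    ('productextended', 'Bit1', 'Product is sold out')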

If you examine the lookup table above, you'll see that the Money1 field represents a sale price and the Bit1 field represents a flag indicating that the product is sold out.  There is no referential integrity between these two tables; they are not related in any normalized sense.  If you delete an entry in the metadata lookup table and re-use the original field to represent something else, the data that already exists will silently become the new field's data.  You'll need to write special code to handle the situation where a field is re-used for another purpose, deleting the old data from that field for the entire productextended table.

I’m going to list some obvious disadvantages to using this technique:

  • There are a limited number of extended fields available per table
  • Tables are extra wide and slow
  • First normal form is broken, so no relational integrity constraints can be used

Using an XML Field

The second technique that I've seen is the use of a single extended field with the XML data type.  This was a clever idea involving one field on each table, called "Extended", set up to contain XML data.  The software serializes and de-serializes the data stored in this field, and the actual values are read from a POCO object.  Here's a sample table:
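Based on the EF configuration shown further down, the table would look something like this (a sketch; the foreign key assumes a Store table with an Id primary key):

CREATE TABLE ProductXml
(
    Id       INT IDENTITY(1,1) PRIMARY KEY,
    Store    INT NOT NULL,
    Name     VARCHAR(50),
    Price    MONEY,
    Extended XML,
    CONSTRAINT FK_store_product_xml FOREIGN KEY (Store) REFERENCES Store(Id)
)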

This is less cringe-worthy than the previous example.  The advantage to this setup is that the table width stays manageable and first normal form is not broken (although there is still no relation to the extended data).  If the extended field is serialized and de-serialized into a POCO, the customer will not be able to change field data on the fly unless the POCO contains some clever setup, like a set of key/value data fields that can be used at run-time (my example will show this ability).

Here is a sample POCO for the ProductXml table:

public class ProductXml
{
    // Wraps the Extended string column so callers can work with it as XML.
    // This property is ignored in the EF mapping below.
    public XElement XmlValueWrapper
    {
        get { return XElement.Parse(Extended); }
        set { Extended = value.ToString(); }
    }

    public int Id { get; set; }
    public int Store { get; set; }
    public string Name { get; set; } 
    public decimal? Price { get; set; }
    public string Extended { get; set; }

    public virtual Store StoreNavigation { get; set; }
}

The POCO for the xml data looks like this:

public class ExtendedXml
{
    public decimal SalePrice { get; set; }
    public bool OutOfStock { get; set; }

    // Customer-defined key/value data that can be added at run-time.
    public DataRecord ExtendedData = new DataRecord();
}

public class DataRecord
{
    public string Key { get; set; }
    public string Value { get; set; }
}

To make this work, you’ll need to tweak the EF configuration to look something like this:

public static class ProductXmlConfig
{
    public static void ProductMapping(this ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<ProductXml>(entity =>
        {
            entity.ToTable("ProductXml");

            entity.HasKey(e => e.Id);
            entity.Property(e => e.Id).HasColumnName("Id");
            entity.Property(e => e.Name).HasColumnType("varchar(50)");
            entity.Property(e => e.Price).HasColumnType("money");
            entity.Property(c => c.Extended).HasColumnType("xml");
            entity.Ignore(c => c.XmlValueWrapper);

            entity.HasOne(d => d.StoreNavigation)
                .WithMany(p => p.ProductXml)
                .HasForeignKey(d => d.Store)
                .OnDelete(DeleteBehavior.Restrict)
                .HasConstraintName("FK_store_product_xml");
        });
    }
}

Now the LINQ code to insert a value could look like this:

using (var db = new DatabaseContext())
{
    var extendedXml = new ExtendedXml
    {
        SalePrice = 3.99m,
        OutOfStock = false,
        ExtendedData = new DataRecord
        {
            Key = "QuantityInWarehouse",
            Value = "5"
        }
    };

    var productXml = new ProductXml
    {
        Store = 1,
        Name = "Stuffed Animal",
        Price = 5.95m,
        Extended = extendedXml.Serialize()
    };

    db.ProductXmls.Add(productXml);
    db.SaveChanges();
}
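The Serialize() call above is an extension method (the working version is in the sample code on GitHub).  A minimal sketch, assuming XmlSerializer with the XML declaration omitted, could look like this:

using System.IO;
using System.Xml;
using System.Xml.Serialization;

public static class SerializeExtension
{
    // Serialize a POCO into an XML string; the root element name
    // defaults to the class name (ExtendedXml in this case).
    public static string Serialize<T>(this T source)
    {
        var serializer = new XmlSerializer(typeof(T));
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (var stringWriter = new StringWriter())
        using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
        {
            serializer.Serialize(xmlWriter, source);
            return stringWriter.ToString();
        }
    }
}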

Executing the LINQ code above will produce a new record with the serialized XML stored in the Extended column.

If you click on the XML link in SQL Server Management Studio, you should see something like this (element names follow the POCO definitions above):
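<ExtendedXml>
  <SalePrice>3.99</SalePrice>
  <OutOfStock>false</OutOfStock>
  <ExtendedData>
    <Key>QuantityInWarehouse</Key>
    <Value>5</Value>
  </ExtendedData>
</ExtendedXml>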

As you can see, there are two hard-coded POCO fields for the sale price and out-of-stock values.  These are not customer-controlled fields; they demonstrate that a developer enhancement could use the extended field to store new data without modifying the table.  The dictionary data contains one item, called QuantityInWarehouse.  This is a customer-designed field; such fields can be added through a data entry screen, possibly backed by a metadata table that lists the names of the extra data fields stored in the extended field.

XML serialization is flexible, so if you add a field to the XML POCO, it will still de-serialize older XML that does not contain that data (just make sure the POCO field is nullable).

To see a working example, go to my GitHub account (see end of this article) and download the sample code.

You can use the following SQL query to extract data from the xml:

SELECT
    Extended.value('(/ExtendedXml/SalePrice)[1]', 'nvarchar(max)') as 'SalePrice',
    Extended.value('(/ExtendedXml/OutOfStock)[1]', 'nvarchar(max)') as 'OutOfStock', 
    Extended.value('(/ExtendedXml/ExtendedData/Key)[1]', 'nvarchar(max)') as 'Key',
    Extended.value('(/ExtendedXml/ExtendedData/Value)[1]', 'nvarchar(max)') as 'Value'
FROM 
    ProductXml

For the record inserted earlier, the query above should produce output along these lines:
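SalePrice  OutOfStock  Key                  Value
---------  ----------  -------------------  -----
3.99       false       QuantityInWarehouse  5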

Here are some disadvantages to using this technique:

  • It is difficult to query a single extended field
  • Loading data is slow because the entire XML field is loaded and parsed even when only one value is needed

Using an Extended Table

This is my preferred design for an extended field system.  With this technique, data of any type can be added to any table without adding fields to the existing tables.  The technique does not break first normal form, and forming a query is easy and powerful.  The idea is to create two tables: the first contains metadata describing the table and field being extended, and the second contains the actual values stored.  Here's an example of the MetaDataDictionary table:
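A sketch of the dictionary table (the Id values are the ones used in the query below; the other column names are assumptions):

CREATE TABLE MetaDataDictionary
(
    Id        INT IDENTITY(1,1) PRIMARY KEY,
    TableName VARCHAR(50),    -- the table being extended
    FieldName VARCHAR(50),    -- the customer-visible name of the extended field
    DataType  VARCHAR(20)     -- how to interpret the stored value
)

-- Entries matching the query below: 1 = SalePrice, 2 = SoldOut
INSERT INTO MetaDataDictionary (TableName, FieldName, DataType)
VALUES
    ('Product', 'SalePrice', 'money'),
    ('Product', 'SoldOut', 'bit')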

Here’s an example of the ExtendedData table:
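A sketch of the value table (RecordId and MetaDataDictionaryId match the query below; the rest is an assumption):

CREATE TABLE ExtendedData
(
    Id                   INT IDENTITY(1,1) PRIMARY KEY,
    MetaDataDictionaryId INT NOT NULL,    -- which extended field this value belongs to
    RecordId             INT NOT NULL,    -- the Id of the row in the extended table
    Value                VARCHAR(256),    -- the stored value
    CONSTRAINT FK_extended_data_dictionary
        FOREIGN KEY (MetaDataDictionaryId) REFERENCES MetaDataDictionary(Id)
)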


A custom query can be formed to output all of the extended data for each record.  Here's an example for the product table, using the dictionary entries above:

SELECT
	p.*,
	(SELECT e.value FROM ExtendedData e WHERE e.RecordId = p.Id AND e.MetaDataDictionaryId=1) AS 'SalePrice',
	(SELECT e.value FROM ExtendedData e WHERE e.RecordId = p.Id AND e.MetaDataDictionaryId=2) AS 'SoldOut'
FROM	
	Product p

This will produce one row per product, with the SalePrice and SoldOut values appended as columns.

To obtain data from one extended field, a simple query can be formed to look up the value.  This leads to another bonus: Entity Framework and LINQ can be used to query data organized in this fashion.  Why is this so important?  Because using EF and LINQ allows the business logic to reside in code, where it is executed by the front-end and can be unit tested.  If a significant amount of logic lives in a stored procedure, that code cannot easily be unit tested.
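Here's a sketch of such a lookup in LINQ (the context and DbSet names are assumptions; in practice the dictionary id would first be resolved by field name):

using (var db = new DatabaseContext())
{
    // Read one extended value (SalePrice, dictionary id 1) for product id 5.
    var salePrice = db.ExtendedData
        .Where(e => e.MetaDataDictionaryId == 1 && e.RecordId == 5)
        .Select(e => e.Value)
        .FirstOrDefault();
}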

I’m going to list a few advantages of this method over the previous two methods:

  • Your implementation can have any number of extended fields
  • Any table in the system can be extended without modification to the database
  • Forming a query to grab one extended field value is simple

One thing to note about this method is that I'm storing the value in a varchar field.  You can change the size to accommodate whatever data you expect to store, but you will need to perform some sort of translation between the varchar and the actual data type.  For example, if you are storing a date, then you'll want some type checking when converting from varchar to the expected date type.  The conversion might occur at the LINQ level, or you could do it with triggers on the extended value table (though I would avoid that design, since it will probably chew up a lot of SQL CPU resources).
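As a sketch of the LINQ-level type checking (the method and variable names here are hypothetical):

// Convert a varchar extended value into a date, rejecting bad data.
public static DateTime? ToDateValue(string extendedValue)
{
    return DateTime.TryParse(extendedValue, out var parsedDate)
        ? parsedDate
        : (DateTime?)null;
}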

Where to Get the Code

You can find the sample code used by the xml extended field example at my GitHub account (click here).  The project contains a file named “SqlServerScripts.sql”.  You can copy this script to your local SQL server to generate the tables in a demo database and populate the data used by this blog post (saving you a lot of time).

 

DotNet Core vs. NHibernate vs. Dapper Smackdown!

The Contenders

Dapper

Dapper is a hybrid ORM.  It's a great ORM for those who have a lot of legacy ADO code to convert.  Dapper uses SQL queries with parameters just like ADO, but the parameters to a query can be supplied as POCOs, and select results can be materialized directly into POCOs.  Converting legacy code can be accomplished in steps: the initial pass converts ADO calls to Dapper, a following step adds POCOs, and queries can be changed into LINQ later (if desired).  The speed difference in my tests shows that Dapper is better than my implementation of ADO for select queries, but slower for inserts and updates.  I would expect ADO to perform best, but there is probably a performance penalty for using the DataAdapter instead of the straight SqlCommand method.
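To illustrate, here's a minimal Dapper sketch (the connection string and the Product table are assumptions):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using Dapper;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal? Price { get; set; }
}

public static class ProductReader
{
    public static List<Product> GetProducts(string connectionString, int store)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            // Parameters come from an anonymous object and each row is
            // materialized directly into a Product POCO.
            return connection.Query<Product>(
                "SELECT Id, Name, Price FROM Product WHERE Store = @Store",
                new { Store = store }).ToList();
        }
    }
}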

If you’re interested in Dapper you can find information here: Stack Exchange/Dapper.   Dapper has a NuGet package, which is the method I used for my sample program.

ADO

I rarely use ADO these days, with the exception of legacy code maintenance or if I need to perform some sort of bulk insert operation for a back-end system.  Most of my projects are done in Entity Framework, using the .Net Core or the .Net version.  This comparison doesn’t feel complete without including ADO, even though my smackdown series is about ORM comparisons.  So I assembled a .Net console application with some ADO objects and ran a speed test with the same data as all the ORM tests.

NHibernate

NHibernate is the .Net version of Hibernate.  This is an ORM that I used at a previous company.  At the time, it was faster than Entity Framework 6 by a large amount.  The .Net Core version of Entity Framework has fixed the performance issues of EF, and it no longer makes sense to use NHibernate; I am providing the numbers in this test just for comparison purposes.  NHibernate is still faster than ADO and Dapper for everything except the select.  EF-7 and NHibernate are so close in performance that I would have to conclude they are effectively the same.  The version of NHibernate used for this test is the latest as of this post (version 4.1.1 with Fluent NHibernate 2.0.3).

Entity Framework 7 for .Net Core

I have updated the NuGet packages for .Net Core for this project and re-tested the code to make sure the performance has not changed over time.  The last time I did a smackdown with EF .Net Core I was using .Net Core version 1.0.0; now I'm using .Net Core 1.1.1.  There were no measurable changes in performance for EF .Net Core.

The Results

Here are the results side-by-side with the .ToList() method helper and without:

Test for Yourself!

First, you can download the .Net Core version by going to my GitHub account here and downloading the source.  There is a SQL script file in the source that you can run against your local MS SQL server to setup a blank database with the correct tables.  The NHibernate speed test code can also be downloaded from my GitHub account by clicking here. The ADO version is here.  Finally, the Dapper code is here.  You’ll want to open the code and change the database server name.

 

SQL Server is Crazy!

If you've ever used SQL Server for a serious project, you're already nodding your head at the title of this blog post.  There are many aspects of SQL Server that are a bit quirky.  Overall, I think SQL Server is a very good but expensive product.  I have experience with Oracle as well; Oracle has its own quirks and is also an expensive product.  But I'm not here to rate products or decide which one I think you should use.  I'm going to talk about one of the quirks of SQL Server that you might run into if you're attempting to work with a legacy product that was built around SQL Server.

Field Names

One of the problems with using an ORM with MS SQL Server is that SQL Server allows characters in its field names that C# cannot handle.  For example, '#' and '@' are allowed in a field name ('#' is just not allowed as the first character).  Double-quotes can appear in a field name (even as the first character).  Fields can start with a number (Yikes!).  Don't believe me?  Check this out:
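A table along these lines demonstrates the point (the table and field names here are illustrative):

CREATE TABLE [dbo].[sample]
(
    [id]        INT IDENTITY(1,1) PRIMARY KEY,
    [field#one] VARCHAR(50),   -- '#' is legal, just not as the first character
    [field@two] VARCHAR(50),   -- '@' is legal as well
    ["field3"]  VARCHAR(50),   -- double-quotes are allowed, even as the first character
    [1stfield]  VARCHAR(50)    -- starting with a number is allowed
)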


Now you have to wonder, what would Entity Framework do with this mess?  Let’s find out…


It looks like EF puts a "C" in front to prevent a number from being the first character, and then it turns all unacceptable symbols into underscores.  That could cause a problem if a field with the resulting name already existed… so let's see what happens if another field already has the name that EF wants to use for a renamed field…

Now, let’s see what EF does…


OK, it appears that EF will make the second occurrence of "field_one" unique by adding a number to the end of the field name, keeping the names unique.


Plural Table Names

I have a table in my example database named “person”.  When you create a LINQ query of this table you end up with this:

using (var db = new sampledataEntities())
{
    var query = (from p in db.people
                             select p).ToList();
}

The name "people" is the DbSet name for the "person" object that represents the table "person".  Now, what if we named the table "persons"?  EF keeps the DbSet name "persons".  Next, let's create a table named "people", just to mess with EF.

Yikes!  It named the record object "person" and uses "people" as the DbSet name.  That would be confusing if you thought you were working with the person table while your query was actually looking at the people table.  Now let's add "person", "persons" and "people" into EF all at once.

The first thing that happened to me is that none of the tables showed up in the edmx model diagram.  EF did, however, create three objects for the tables:

using (var db = new sampledataEntities())
{
    var query = (from p in db.people
                             select p).ToList();

    var query2 = (from p2 in db.people1
                                select p2).ToList();

    var query3 = (from p3 in db.persons
                                select p3).ToList();
}

One of the table objects is named "people", one is named "people1" and the last is named "persons".  In order to figure out which object points to which table, you'll need to open your edmx file with an XML editor (right-click on the edmx file, select "Open With", then choose "Automatic Editor Selector (XML)").  Then you can search for people, people1 or persons and reverse engineer which table is being accessed.


Summary

Keep this little exercise in mind when you design a database and choose field and table names.  Avoid any symbols that are not acceptable in a C# variable name.  Also, avoid using a plural name for a table.  Your goal is to choose a table name that will translate into an object name very similar to the table name itself.  Otherwise, any developer who maintains your code will have to deal with an extra layer of confusion.

 

How to detect if your code was called from a unit test

I've done a few unit test blog posts, but this time I'm going to demonstrate how to detect whether a method is running under a unit test.  This flag can be used to break dependencies in an object or to run against a test database instance instead of the production instance.

The first article I stumbled across was this stack overflow article on just such a problem:

determine-if-code-is-running-as-part-of-a-unit-test

Answer #22 had my solution: use reflection to determine whether one of the loaded assemblies is the UnitTestFramework.


using System;
using System.Linq;

public static class UnitTestDetector
{
    // Returns true when the MSTest framework assembly is loaded in the
    // current AppDomain, which only happens during a unit test run.
    public static bool IsInUnitTest()
    {
        string assemblyName =
            "Microsoft.VisualStudio.QualityTools.UnitTestFramework";
        return AppDomain.CurrentDomain.GetAssemblies()
            .Any(a => a.FullName.StartsWith(assemblyName));
    }
}


So what's the point?  The purpose of this class/method is to embed it into your database context code and force your ORM to use a test database when unit tests are running, and the production database otherwise.  By switching the database at the context level, you no longer have to break dependencies in each object that will be under unit test.
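Here's a sketch of how that might be wired into an EF code-first context (the connection strings are hypothetical):

using System.Data.Entity;

public class DatabaseContext : DbContext
{
    private const string ProductionDb =
        "data source=PRODSERVER;initial catalog=sampledata;integrated security=True";
    private const string TestDb =
        "data source=(localdb)\\MSSQLLocalDB;initial catalog=sampledata;integrated security=True";

    // Choose the connection string at construction time based on the
    // UnitTestDetector flag.
    public DatabaseContext()
        : base(UnitTestDetector.IsInUnitTest() ? TestDb : ProductionDb)
    {
    }
}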

Another advantage is that this technique can be used with multiple ORMs (such as EF, NHibernate, etc.).  It can even be used for direct queries if you have a SqlConnection wrapper that feeds the connection string from one place.

In a future post, I’ll demonstrate how to use this class/method in Entity Framework and show how to connect EF to SQLLocalDB for unit testing purposes.