NHibernate 4000 Character Limit, Sigh…

Summary

NHibernate has its share of issues.  One of its most notorious is the 4000 character limit, which reveals itself when you attempt to write a string longer than 4000 characters to a TEXT or VARCHAR field in your database.  In this post, I'm going to describe some solutions to this problem.


Mappings

One place the issue shows up is in the mappings, and I've mentioned this before.  To get around the mapping issue for VARCHAR(MAX) fields, you'll need to add a few extra calls to your mapping:


Map(u => u.ExtraData).CustomType("StringClob")
    .CustomSqlType("varchar(MAX)").Length(Int32.MaxValue);

This sample sets up the field named "ExtraData" with a custom type of StringClob, a SQL field type of VARCHAR(MAX), and a length of the maximum int value.
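
For context, here's a minimal sketch of how that mapping line might sit inside a complete Fluent NHibernate map class (the User entity and users table are made up for illustration):

public class UserMap : ClassMap<User>
{
    public UserMap()
    {
        Table("users");
        Id(u => u.Id);

        // force NHibernate to treat this column as a CLOB instead of
        // a string parameter capped at 4000 characters
        Map(u => u.ExtraData)
            .CustomType("StringClob")
            .CustomSqlType("varchar(MAX)")
            .Length(Int32.MaxValue);
    }
}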


Stored Procedure Parameters

Another crazy place where this problem shows up is in the parameters passed to a stored procedure or a SQL call:

db.CreateSQLQuery("MyStoredProc :myparameter")
    .SetParameter("myparameter", data, NHibernate.NHibernateUtil.StringClob)
    .ExecuteUpdate();



Here, "data" is the string that exceeds 4000 characters.  Passing NHibernate.NHibernateUtil.StringClob as the parameter type allows the parameter to handle a large string.

There are other methods of dealing with the mapping issue which I have not tried; here's one interesting example on Stack Overflow:

string-unlimited-still-limited-to-4000-characters




 

SQL Server Secrets…

Summary

If you're an old hand with databases, this blog post will not be new to you.  I'm going to show some capabilities of SQL Server that reveal how powerful a database server can be.  These "secrets" (OK, they're really not secret to anybody with a browser and Google) actually apply to other database servers as well.

SQL Has a Database of Your Databases

That's right.  You can open up the master database (under "System Databases") and inside the "Views -> System Views" folder you'll see a whole bunch of views covering all the databases on the server.  You can query things like the list of tables in your database:

SELECT * FROM yourdatabasename.INFORMATION_SCHEMA.TABLES


Both views and tables will be listed here.  If you want to know the columns in your tables you can query from this view:

SELECT * FROM yourdatabasename.INFORMATION_SCHEMA.COLUMNS

This view will tell you which field is nullable, what the data type is, the character length, etc.
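
For example, a narrower query against that view might look like this (yourtablename is a placeholder):

SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
FROM yourdatabasename.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'yourtablename'
ORDER BY ORDINAL_POSITION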

You can list all stored procedures with this query:

SELECT * FROM yourdatabasename.INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_TYPE = 'PROCEDURE'

You’ll need to use another query if you want the full text of the stored procedure:

USE [yourdatabasename]
SELECT OBJECT_DEFINITION(OBJECT_ID('stored_proc_name')) AS code

You might think you can use the ROUTINE_DEFINITION field from the previous query, but it's capped at 4000 characters and will cut off long stored procedure code.
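
If you want the full text of every stored procedure in one shot, a variation on the same idea (using the sys.procedures view) looks like this:

USE [yourdatabasename]
SELECT name, OBJECT_DEFINITION(object_id) AS code
FROM sys.procedures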

Here is how you can find a list of primary keys, foreign keys and unique keys:

SELECT * FROM yourdatabasename.INFORMATION_SCHEMA.TABLE_CONSTRAINTS

You should see something like this:


The CONSTRAINT_TYPE field will tell you which kind of constraint you are looking at, but this view will not tell you which table a foreign key connects to.  For that information you'll have to run this query (the holy grail of foreign key queries):

SELECT PKTABLE_QUALIFIER = CONVERT(SYSNAME,DB_NAME()),
       PKTABLE_OWNER = CONVERT(SYSNAME,SCHEMA_NAME(O1.SCHEMA_ID)),
       PKTABLE_NAME = CONVERT(SYSNAME,O1.NAME),
       PKCOLUMN_NAME = CONVERT(SYSNAME,C1.NAME),
       FKTABLE_QUALIFIER = CONVERT(SYSNAME,DB_NAME()),
       FKTABLE_OWNER = CONVERT(SYSNAME,SCHEMA_NAME(O2.SCHEMA_ID)),
       FKTABLE_NAME = CONVERT(SYSNAME,O2.NAME),
       FKCOLUMN_NAME = CONVERT(SYSNAME,C2.NAME),
       -- Force the column to be non-nullable (see SQL BU 325751)
       --KEY_SEQ = isnull(convert(smallint,k.constraint_column_id), sysconv(smallint,0)),
       UPDATE_RULE = CONVERT(SMALLINT,CASE OBJECTPROPERTY(F.OBJECT_ID,'CnstIsUpdateCascade')
                                        WHEN 1 THEN 0
                                        ELSE 1
                                      END),
       DELETE_RULE = CONVERT(SMALLINT,CASE OBJECTPROPERTY(F.OBJECT_ID,'CnstIsDeleteCascade')
                                        WHEN 1 THEN 0
                                        ELSE 1
                                      END),
       FK_NAME = CONVERT(SYSNAME,OBJECT_NAME(F.OBJECT_ID)),
       PK_NAME = CONVERT(SYSNAME,I.NAME),
       DEFERRABILITY = CONVERT(SMALLINT,7)   -- SQL_NOT_DEFERRABLE
FROM   SYS.ALL_OBJECTS O1,
       SYS.ALL_OBJECTS O2,
       SYS.ALL_COLUMNS C1,
       SYS.ALL_COLUMNS C2,
       SYS.FOREIGN_KEYS F
       INNER JOIN SYS.FOREIGN_KEY_COLUMNS K
         ON (K.CONSTRAINT_OBJECT_ID = F.OBJECT_ID)
       INNER JOIN SYS.INDEXES I
         ON (F.REFERENCED_OBJECT_ID = I.OBJECT_ID
             AND F.KEY_INDEX_ID = I.INDEX_ID)
WHERE  O1.OBJECT_ID = F.REFERENCED_OBJECT_ID
       AND O2.OBJECT_ID = F.PARENT_OBJECT_ID
       AND C1.OBJECT_ID = F.REFERENCED_OBJECT_ID
       AND C2.OBJECT_ID = F.PARENT_OBJECT_ID
       AND C1.COLUMN_ID = K.REFERENCED_COLUMN_ID
       AND C2.COLUMN_ID = K.PARENT_COLUMN_ID


which can be found on this blog post: Identify all of your foreign keys in a SQL Server database. 

This query can be used to generate foreign keys through code.  The code necessary to create a foreign key looks like this:

ALTER TABLE product ADD CONSTRAINT FK_ProductType FOREIGN KEY (producttype) REFERENCES producttype(Id)

So you’ll need variables for the two tables, the foreign key name and the two fields (producttype and id).
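
As a rough sketch, generating that statement from scraped metadata could be as simple as a string format (all of the values below are hypothetical stand-ins for data pulled from the foreign key query above):

// hypothetical values pulled from the foreign key query above
string fkName = "FK_ProductType";
string childTable = "product";
string childColumn = "producttype";
string parentTable = "producttype";
string parentColumn = "Id";

string sql = string.Format(
    "ALTER TABLE {0} ADD CONSTRAINT {1} FOREIGN KEY ({2}) REFERENCES {3}({4})",
    childTable, fkName, childColumn, parentTable, parentColumn);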

There are other views in the master database you can use to lookup your database configuration.  Just poke around and see what these views reveal.


So What’s the Purpose?

This information can be used to create a utility to compare the configuration of two databases.  For example, you might have a production database (or customer facing database) and a development database.  Many times these databases can get out of sync with each other.  A utility can be created to match tables, stored procedures, foreign keys, indexes, etc. to produce a report of differences.  By extension, a utility can be created to match one database to the other.  For instance, you might want to synchronize your stored procedures.  Or you might want to create a master template database and use it to generate indexes and foreign keys on your production and development databases.
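
As a minimal sketch of one such comparison (assuming both databases live on the same server, and using the made-up names proddb and devdb), an EXCEPT query will list tables present in one database but missing from the other:

SELECT TABLE_SCHEMA, TABLE_NAME FROM proddb.INFORMATION_SCHEMA.TABLES
EXCEPT
SELECT TABLE_SCHEMA, TABLE_NAME FROM devdb.INFORMATION_SCHEMA.TABLES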

I've used this information to create my ORM mapping code for Fluent NHibernate.  You could create another format to store your database configuration (i.e. store your tables, indexes and foreign keys in an Excel spreadsheet, XML, JSON, etc.).  This would be your master template that you can use to generate multiple databases.  In turn, this generating capability can be used in an installer to generate a clean start-up database that you can then pre-populate with your static information.

As I mentioned earlier, this is not limited to SQL Server.  You can store your database template in a format that is independent of the database you want to generate, then create multiple programs to generate the database type you want to use (like an Oracle database generator using the same tables, indexes and foreign key constraints).

One other application is to store your database configuration information in a format that can be easily copied and changed (e.g. Excel or a text file).  Then you can keep multiple versions of your database handy.  This can be linked to your software version, so you can specify which database version the latest software needs and your deployment/install program can verify the data structures before the software is upgraded.



 

 

Blogging Stuff…

Random Blogging Stuff…

First of all, I've decided to change the title of my blog to my name.  The subject of this blog is still software related and I might blog about hardware issues in the future.  My focus on ORMs and unit testing has narrowed the scope of this blog away from hardware, so I'm just going to use my name to give the blog a generic title.


Second, I have a whole bunch of posts in the hopper right now, but I haven't been pumping out articles as fast as I'd like because I'm working on an application that will replace the NHibernate table mapping generator that I threw together a few months ago.  The new application is an actual Windows app with a drop-down list of SQL servers to choose from and a check list box of databases to read for creating Fluent NHibernate ORM mapping files.  It also contains stored procedure and view scraping methods that can be used in a unit testing environment.  However, what will make this package truly powerful are the other pieces that I'm working on.  I am literally building an ORM/SQL Server unit testing suite that works together.  There will be unit test helpers that start and stop an instance of SQLLocalDB, as well as table truncation and utilities to pre-populate tables from XML or JSON files.  I would also like to add some tools to allow the unit test to create indexes and relational integrity constraints as needed.  These will work the same way the stored procedures do: the tool will scrape the information from MS SQL Server and the unit test can just specify that relational integrity will be included.


Some of the subjects still in the hopper include:

– NHibernate outer join issues
– Game design, back to Battlefield One.
– Using SQLLocalDB with LINQ-to-SQL
– Legacy code
– EF testing with SQLLocalDB
– ORM vs Non-ORM software design
– Turning off Log4Net logging for NHibernate
– NHibernate 4000 character limit (sigh).
– Unit testing JavaScript

You might conclude that I'm a bit ADD.  Maybe, but I work in this technology during the day and my only regret is that I don't have enough hours in the week to do all the research on these subjects that I'd like to do.  Some of this research occurs while I'm at work (to solve a problem).  Most of it occurs while I'm trying to put together a blog post (then it tends to make its way to work; it's a symbiotic relationship).


Other subjects on the back burner…

I'm also doing some reflection stuff.  I was thinking of blogging about that.  I have SQL Server knowledge that I'd like to blog about (how to use tools to tune your server and identify where to add indexes to reduce deadlocks and improve performance).  I'm doing some MongoDB research (yeah, I have a few minutes at the end of the week to try this out… LOL).  I still need to revisit micro-ORMs and do a full analysis on Dapper.Net.  I'd also like to connect NHibernate to Oracle and run some analysis on that.  I haven't blogged much on Oracle, but I have a lot of experience with it from my past projects.  I did install Oracle on my PC over a year ago when I demonstrated it in a blog post, so I'd like to do a bit more with my Oracle instance to show differences with SQL Server.  One other database that I have experience with is MySQL.  I haven't done anything serious with MySQL because the last time I analyzed it, it didn't have constraints, which made it a toy.  I'd like to demonstrate the use of NHibernate with MySQL to show how two free products can be used to create a real application (in other words, I'll compare the performance with SQL Server).

So many subjects, so little time…

 

SQL Server is Crazy!

If you've ever used SQL Server for a serious project, you're already nodding your head at the title of this blog post.  There are many aspects of SQL Server that are a bit quirky.  Overall I think SQL Server is a very good but expensive product.  I have experience with Oracle as well; Oracle has its own quirks and it is also an expensive product.  But I'm not here to rate products, or decide which product I think you should use.  I'm going to talk about one of the quirks of SQL Server that you might run into if you're attempting to work with a legacy product that was built around SQL Server.

Field Names

One of the problems with using an ORM with MS SQL Server is that SQL allows characters in its field names that C# cannot handle.  For example, '#' and '@' are allowed as characters in a field name ('#' is not allowed as the first character).  Double-quotes can appear in a field name (even the first character can be a quote).  Fields can start with a number (Yikes!).  Don't believe me?  Check this out:


Now you have to wonder, what would Entity Framework do with this mess?  Let’s find out…


It looks like EF puts a "C" in front to prevent numbers from being the first character, and then it turns all unacceptable symbols into underscores.  That could cause a problem if another field already had the resulting name… let's see what happens when a field is already named the name that EF wants to use for a renamed field…

  
Now, let’s see what EF does…


OK, it appears that EF will make the second occurrence of "field_one" unique by adding a number to the end of the field name, keeping the names unique.


Plural Table Names

I have a table in my example database named “person”.  When you create a LINQ query of this table you end up with this:

using (var db = new sampledataEntities())
{
    var query = (from p in db.people
                             select p).ToList();
}

The name "people" is the DbSet of the "person" object that represents the table "person".  Now, what if we named the table "persons"?  EF will name the set "persons".  Now, let's create a table named "people", just to mess up EF.

Yikes!  It named the record object “person” and it uses “people” as the table name.  That would be confusing if you were working with the person table and your query is actually looking at the people table.  Now let’s add “person”, “persons” and “people” into EF all at once.

The first thing that happened to me is that none of the tables show in the edmx model diagram.  It did, however, create three objects for the tables:

using (var db = new sampledataEntities())
{
    var query = (from p in db.people
                             select p).ToList();

    var query2 = (from p2 in db.people1
                                select p2).ToList();

    var query3 = (from p3 in db.persons
                                select p3).ToList();
}

One of the table objects is named “people”, one is named “people1” and the last is named “persons”.  In order to figure out which object points to which table, you’ll need to open your edmx file with an xml editor (right-click on the edmx file, select “open with”, then choose “automatic editor selector (XML)”).  Then you can search for people, people1 or persons and reverse engineer which table is being accessed.


Summary

Keep this little exercise in mind when you design a database and choose field and table names.  Avoid any symbols that are not acceptable in a C# variable name.  Also, avoid using plural table names.  Your goal is to choose a table name that will translate into an object name very similar to your table name.  Otherwise, any developer who maintains your code will have to deal with an extra layer of confusion.

 

Unit Testing with Stored Procedures

Summary

In this post, I’m going to show how to use a stored procedure in a SQLLocalDB MS Test setup.  Technically this is an integration test, so excuse the fact that I might call it a unit test out of habit.


The Technical Details


What I’m going to do is copy the code from a SQL Server stored procedure, and paste it into a class/method string or property.  Next I’m going to write a unit test helper method that will create the stored procedure in my SQLLocalDB test instance.  Finally, I’m going to run a unit test that executes a method containing a call to the stored procedure and verify it worked correctly.

If you're familiar with stored procedures and unit tests, you're probably scratching your head right now and thinking "what would be the purpose?"  My purpose in accomplishing this task is to get legacy code into a test harness so I can refactor my code and replace ugly stored procedures with LINQ code, or possibly clean up any stored procedures I'm forced to keep.

Now you’re thinking that if I change a stored procedure, my unit test will still work, even though my run-time code might not.  That’s a problem.  However, I have created a map class generator for NHibernate.  What I plan to do is enhance this generator to scrape the text out of the stored procedures in a database and put them into my project containing all my ORM mappings.  That way, I can regenerate my class mappings and synchronize the stored procedures all at once.  This will make sure that my unit tests are always using the same stored procedures that are defined in the database.  This will be demonstrated in a future blog post.

The Session Factory

I'm going to start by demonstrating how you can modify your session factory to make it execute in dual mode.  Which mode is used will depend on the calling assembly.  If the calling assembly is a unit test assembly, then we'll use the SQLLocalDB context.  If the calling assembly is not a unit test assembly, then we'll assume that we're supposed to connect to the real database.  I have stripped the session factory configuration out of the code shown below; you can download the demo application to see the real code.

public class MSSQLSessionFactory
{
    private static ISessionFactory _sessionFactory;
    private static ISessionFactory SessionFactory
    {
        get
        {
            if (_sessionFactory == null)
            {
                if (UnitTestHelpers.UnitTestHelpers.IsInUnitTest)
                {
                    // unit test context

                }
                else
                {
                    // production context

                }
            }
            return _sessionFactory;
        }
    }
    public static ISession OpenSession()
    {
        return SessionFactory.OpenSession();
    }
}

The code in my demo could use some folding and refactoring, but this is just to demonstrate the concept.  So I’m keeping it simple for now.
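
The UnitTestHelpers.IsInUnitTest property itself isn't shown here either; one common way to implement that kind of check (a sketch, not necessarily what the demo code does) is to look for the MS Test framework among the loaded assemblies:

using System;
using System.Linq;

public static class UnitTestDetector
{
    // true when the MS Test framework assembly is loaded, which is a
    // reasonable proxy for "this code is running under a unit test"
    public static bool IsInUnitTest
    {
        get
        {
            return AppDomain.CurrentDomain.GetAssemblies().Any(a =>
                a.FullName.StartsWith(
                    "Microsoft.VisualStudio.QualityTools.UnitTestFramework",
                    StringComparison.OrdinalIgnoreCase));
        }
    }
}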


The Stored Procedure

I scraped the stored procedure code from my database and pasted it into a string.  Eventually, I’ll need a method that can do that for me.  The code for this object looks like this:

using System.Data.SqlClient;
using System.Text.RegularExpressions;

public class StoredProcedureObjects
{
    public string Name { get { return "pSetClassForTeacher"; } }
    public string Code {
        get
        {
            return @"USE [facultydata]
GO
/****** Object:  StoredProcedure [dbo].[pSetClassForTeacher]    Script Date: 8/9/2014 6:06:40 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[pSetClassForTeacher]
@TeacherId int,
@ClassId int
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE class SET teacherid=@TeacherId WHERE id=@ClassId

END";
        }
    }

    public void CreateStoredProcedure(string databaseName)
    {
        SqlConnection db = new SqlConnection(
            @"server=(localdb)\" + databaseName +
            "testinstance;" +
            "Trusted_Connection=yes;" +
            "database=" + databaseName + ";" +
            "Integrated Security=true;");

        // first, drop the stored procedure if it already exists
        db.Open();

        string sp = @"if exists (select * from sys.objects
           where name = N'" + Name + @"' and type = N'P')
           begin
               drop procedure " + Name + @"
           end
";
        SqlCommand myCommand = new SqlCommand(sp, db);
        myCommand.ExecuteNonQuery();

        // the scraped text is a batch of separate queries, so split
        // on the GO statements and execute each chunk individually
        string[] TSQLcommandList = Regex.Split(Code, "GO");

        foreach (var tsqlCommand in TSQLcommandList)
        {
            myCommand = new SqlCommand(tsqlCommand, db);
            myCommand.ExecuteNonQuery();
        }

        db.Close();
    }
}

First, you should notice that we need to split the stored procedure script on the "GO" statements.  Technically, the SQL code above is a collection of separate batches, and we need to execute them individually.

Next, I have taken the liberty of changing "ALTER PROCEDURE" to "CREATE PROCEDURE".  Later, when I generate the stored procedure text by reading it from the database, I'll do a string replace to make the same change.
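
That replacement could be as simple as this one-liner (a sketch; scrapedText is a hypothetical variable holding the text read from the database):

// naive sketch: swap ALTER for CREATE in the scraped procedure text
string createText = scrapedText.Replace("ALTER PROCEDURE", "CREATE PROCEDURE");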

Last, I have written code to drop the stored procedure first, if it already exists.  Ideally, we would create all the stored procedures in the database at assembly start time and just use them for all tests.  There should be no expectation that a stored procedure will change while unit tests are in progress, so there's no need to drop and create stored procedures for each unit test.
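
In MS Test, one way to get that assembly-level setup (a sketch, assuming one creation per test run is enough) is an [AssemblyInitialize] method:

using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class AssemblySetup
{
    // runs once before any test in the assembly executes
    [AssemblyInitialize]
    public static void CreateStoredProcedures(TestContext context)
    {
        StoredProcedureObjects sp = new StoredProcedureObjects();
        sp.CreateStoredProcedure("facultydata");
    }
}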


The Unit Test

The unit test method looks like this:

[TestMethod]
public void test_stored_procedure_object()
{
    StoredProcedureObjects sp = new StoredProcedureObjects();
    sp.CreateStoredProcedure("facultydata");

    using (var db = MSSQLSessionFactory.OpenSession())
    {
        var teacher = new Teacher
        {
            name = "Smith"
        };
        db.Save(teacher);

        teacher = new Teacher
        {
            name = "Johnson"
        };
        db.Save(teacher);

        string[] classList = new string[] { "MTH 100",
              "CHM 101", "ENG 200", "PHY 230" };

        foreach (var aClass in classList)
        {
            var MyClass = new Class
            {
                name = aClass,
                teacherid = 1
            };
            db.Save(MyClass);
        }
        db.Flush();
    }

    TeacherRecordObjects teacherRecordObjects = new TeacherRecordObjects();
    teacherRecordObjects.SetClassForTeacher(2, 3);

    using (var db = MSSQLSessionFactory.OpenSession())
    {
        var query = (from c in db.Query<Class>()
                     where c.id == 3
                     select c).FirstOrDefault();

        Assert.AreEqual(2, query.teacherid);
    }
}

The first thing this unit test does is create the stored procedure.  Next, I have some insert queries to insert two teachers and four classes.  This is my test data setup.  Then I call the SetClassForTeacher method with parameters that I can check for after the method executes.  Last, I open another database context, read the result and test for equality.


Summary

There are a lot of pieces necessary to make this work properly.  You need to be able to check which project called the context and select the correct database context.  You need to be able to create a stored procedure in SQLLocalDB.  You need to be able to truncate data in your SQLLocalDB after your unit test has concluded (so your results don't mess up the next unit test).  Finally, you need to handle the case where someone tries to create a stored procedure from another unit test without realizing that it already exists.

You can download the sample code here:

UnitTestingStoredProcedures.zip



 

Scaling

Summary

In this post I'm going to talk about database application scaling.  For this discussion I'm going to assume the reader works at (or owns) a small company that has suddenly increased in capacity.  I'm going to further assume that the goal is to scale up to a larger size and better performance without replacing everything at once.


The Problems of Scaling

The number one problem that I have run into regarding scaling is the design of the software.  The software I’m talking about is a custom application built by the company a long time ago when the goal was to sell a product (not ensure that the product would scale).  Legacy software that I have dealt with seems to fit the category of poorly designed and built by a person (usually only one person) with only a basic knowledge of databases and software design.  As a company increases capacity, the normal and cheap way to scale up is to buy new hardware.  Why?  Because hardware is much cheaper than the wages of developers required to rewrite, and time is usually a critical factor.  Eventually the company will arrive at a point where new hardware cannot help or becomes prohibitively expensive.  This is where this blog post comes in.


Solutions Available

One of the easier solutions is sharding (sometimes called partitioning): you divide your data into two or more databases.  Then you need some sort of mechanism in your code to decide which database to retrieve the data from.  If your customers are numbered, you can do something clever, like putting even numbered customers in one database and odd numbered customers in another.  Then you'll need a mechanism to split your data from one database to another.  The coding of the initial mechanism is relatively simple (depending on how your legacy code is organized).  Creating scripts or programs to divide your accounts can be a bit more difficult.  Obviously, your common data will be duplicated (unless you want to move that to a third database) and you'll need a mechanism to keep the duplicate data in sync (maybe scripts to copy from one database to the other, or a change to your administrative software to save to both).

As you are contemplating your expansion, you'll need to think about two things: where your actual performance bottleneck really is, and, if splitting in two will nearly double your performance, whether you should leap ahead and split into four.  Let's pretend your database resides on one physical server and not a virtual server.  Let's also assume you have at least 4 open drive slots available.  Now your choice is between buying 2 hard drives (each large enough for a copy of the database) or 4 hard drives.  You'll need to install a separate database on each drive to gain performance.  You can use the same MS SQL Server (or Oracle, MySQL, etc.) instance and attach all the databases to it; this will simplify copying data between databases.

Making the decision to go with four databases instead of two at this point will save you headaches in the future if you need to scale further.  The reason is that you can write your software to use modulo 4 arithmetic to decide which database each account is assigned to (instead of even/odd).  Most of your developer time will be sunk into writing an app to divide the database, so you might decide to go directly to four now and skip refactoring your data converter later to divide two databases into four.

This raises the question: how many ways can you divide your database?  That depends on how much time you have to get your system scaled, and what hardware resources you have.  If your server cannot house 8 hard drives, or you want to upgrade to multiple servers, you'll have to account for hardware purchases and database licenses (unless you are lucky enough to have legacy software that already runs on a free database system).


Potential Pitfalls

Before you embark on an expedition of splitting your data into multiple databases, you should do some experimentation.  First, you need to be absolutely sure of where the performance bottleneck is located.  If you're running a web-based system and your real bottleneck is in your web server, all the database sharding in the world will just cause you more work with no gain in performance.  You should also analyze your server memory usage and hard disk technology.  Can you increase your server memory, or replace your hard drives with hybrid drives or SSDs?  Is your network or internet connection speed maxed out?  It's safe to assume your database CPUs are not maxed; that is a rare and unusual situation.  What about your software: is there any low-hanging fruit for increasing performance?  Analyze your database structure and determine whether you have indexes on all foreign keys, on your "order by" fields, and on fields that show up in your "where" clauses, "group by" clauses, etc.  You can analyze query performance by using the "Estimated Execution Plan" tool built into MS SQL Server Management Studio:



You’ll see an output similar to this:


This example is too simple to show the benefit of the tool, but you can execute a large and complex stored procedure and it will break down which part of each query is costing you the most time.  Focus on the big ticket items first.


Time to Get Serious

I’m going to assume you have already upgraded these systems, you have tuned your database to be as fast and efficient as possible and you are to the point that your database is still a bottleneck.

MS SQL Server has a few nice tools that I use a lot (as in, all the time).  One such tool is the "script to" feature that is available for any object in the database.  You can script the entire database to create a new database, then search and replace the names to create a different database inside your SQL Server.  For my example, I'm going to use the script tool to create 4 student databases called StudentData0 through StudentData3.  You should pay attention to the filename portion of the database create script.  You'll need to change each database to point to a separate hard drive.  If you make a mistake here and need to fix it later, that is easy: just "detach" the database, copy the physical file to the correct drive and "attach" the database again.

One other thing you need to consider in this endeavor is that you'll need a tool to make sure your four databases stay in sync.  In other words, all databases need to have identical table structures, keys, indexes, stored procedures, etc.  If one database is out of compliance, then your software will suffer.  You can purchase an off-the-shelf tool or write your own, depending on your SQL Server sophistication.


Identity Indexes

MS SQL Server has an automatic numbering scheme called "identity" that auto-increments a table's primary key on each insert operation.  There are some parameters in the identity definition that can make things easy for us.  First, there is the seed value: the starting number.  Normally this is set to 1, but we're going to set the seed to 0 for database 0, 1 for database 1, etc.  Next is the "identity increment", which determines how much the identity field grows with each insert.  Normally this is set to one, but we're going to set it to 4 for every table inside our databases.  The "TRUNCATE TABLE" command will reset the table to the starting seed value.

Now, when we insert new students, database 0 will contain student numbers that are equal to 0 modulo 4, database 1 will contain student numbers that are equal to 1 modulo 4, etc.
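
A sketch of what the student table might look like in database 0 (the column names are made up):

-- database 0: ids 0, 4, 8, 12, ...
CREATE TABLE student
(
    id INT IDENTITY(0, 4) NOT NULL PRIMARY KEY,
    name VARCHAR(50) NOT NULL
)
-- database 1 would use IDENTITY(1, 4), database 2 IDENTITY(2, 4), and so on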


Using an ORM

I’m going to use NHibernate for this example to show how to tweak the session factory and how to insert data into four databases.  Here’s the sample code:

public class QuadSessionFactory
{
    private static ISessionFactory[] _sessionFactory = new ISessionFactory[4];

    private static ISessionFactory SessionFactory(int databaseNumber)
    {
        if (_sessionFactory[databaseNumber] == null)
        {
            _sessionFactory[databaseNumber] = Fluently.Configure()
                .Database(MsSqlConfiguration.MsSql2005
                    .ConnectionString("Server={servername};Initial Catalog=StudentData" +
                        databaseNumber + ";Integrated Security=True"))
                .Mappings(m => m.FluentMappings.Add<StudentMap>())
                .ExposeConfiguration(config =>
                {
                    SchemaExport schemaExport = new SchemaExport(config);
                })
                .BuildSessionFactory();
        }
        return _sessionFactory[databaseNumber];
    }

    public static ISession OpenSession(int databaseNumber)
    {
        return SessionFactory(databaseNumber).OpenSession();
    }
}


You'll notice that this code is nearly identical to the session factory that I normally use in my demos.  All I did was turn the member variable "_sessionFactory" into an array of 4 factories and refactor my getter into a method that takes the database number as a parameter.  Now I can use my OpenSession method to open four different sessions, like this:

using (ISession db = QuadSessionFactory.OpenSession(dbNumber))
{
    // query code here
}


Now you'll have to get clever when you select data.  Why?  Because your data is spread across 4 different databases and you can't be sure which student is in which database unless you are searching by student number.  One solution is to query all 4 databases and then sub-select.  A more efficient method is to run the same query against each database and then concatenate the results at the end (and sort if desired).  Oh what fun.
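
Here's a rough sketch of that fan-out pattern, assuming a hypothetical Student entity with a name property mapped in each database:

using System.Collections.Generic;
using System.Linq;
using NHibernate;
using NHibernate.Linq;

public static IList<Student> FindStudentsByNamePrefix(string prefix)
{
    // run the same query against every database, then combine the results
    var students = new List<Student>();
    for (int dbNumber = 0; dbNumber < 4; dbNumber++)
    {
        using (ISession db = QuadSessionFactory.OpenSession(dbNumber))
        {
            students.AddRange(from s in db.Query<Student>()
                              where s.name.StartsWith(prefix)
                              select s);
        }
    }
    // sort once, after all four result sets are concatenated
    return students.OrderBy(s => s.name).ToList();
}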

Inserting data will involve a scheme to distribute your data.  You could choose a random number to select which database to insert into, or you could count the number of records in each database to determine which one contains the fewest active records (that will cost you some processing time).  Another method is to keep a table in database zero with an identity field for the student table: insert into this table first to get the next number, then take that number modulo 4.  This will also cost you some processing time, and you'll have to keep a separate counter table for each table of data that you want to distribute.  My choice would be the random number modulo 4: simple and fast.
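
A small helper makes that choice explicit (a sketch; the class and method names are my own):

using System;

public static class ShardPicker
{
    private static readonly Random _rng = new Random();

    // lookups: the database number is implied by the student number itself
    public static int ForStudent(int studentId)
    {
        return studentId % 4;
    }

    // brand-new inserts: a random database keeps the spread roughly even
    public static int ForNewStudent()
    {
        return _rng.Next(4);
    }
}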


Common Data

No matter how you slice your data, there will be some reference tables and data used by your application that should not be split between databases.  You can't just split this data, and you can't just keep one copy in one database.  These tables take part in your relational queries (like student standing: freshman, sophomore, etc.).  When you query each database for students and their standing, you'll need to join these records together inside each database.  This requires duplication.  That is OK, since most reference tables are rather small.  The space these tables occupy will be wasted, but hard drives are cheap and performance is expensive.

You'll have to take steps to keep your databases in sync.  You can loop your CRUD operations to save reference data into each database after a change occurs.  Another method is to change data in one designated database (let's say database zero, to keep it simple) and trigger a stored procedure to sync the data to the other databases.  Which approach you decide to use may depend on how quickly you need all databases to be identical.

Another solution to this mess is to extract all the common data into its own database.  This will break any relational integrity that you have set up, but it will simplify keeping your common data in sync.  One thing to keep in mind is that this one common database might become your bottleneck, undoing any work you put into splitting your primary data into 4 databases.


Summary

The task of splitting your database into parallel databases is not an easy one, and this blog post is nowhere near enough information on the subject to completely split your data.  You'll have to do more research.  Your first task should be to set up a sample system that has multiple databases with your data, then point your development systems at this setup and determine what will break and what will work.  You might get into a rat's nest that is a bit too deep.  Creating a simple and cheap test system is the place to make this discovery.

You can download the sample code here:

QuadDatabaseSample.zip

You can download the database create script here:

quad_database_script.zip




 

Fluent NHibernate Execute Stored Procedure

Summary

I’ve demonstrated stored procedures a couple of times on this blog, but this time I’m going to show you how to execute a stored procedure that doesn’t need to return a value.  You can use the same syntax to execute an update, insert or delete query (in cases where you might want to make bulk changes).


The Setup

I’m going to use my facultydata database from previous blog posts.  Here’s the ERD:



My stored procedure will look like this:

USE [facultydata]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[pSetClassForTeacher]
@TeacherId int,
@ClassId int
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE class
    SET teacherid=@TeacherId
    WHERE id=@ClassId

END


This is just a simple example.  It’s not very useful, but easy to understand.


The Project

First, I’m not going to do any mappings or table classes.  I’m going to create a console application with a session factory and just call the stored procedure.  Here’s the complete code (minus the session factory):

using (var db = MSSQLSessionFactory.OpenSession())
{
    db.CreateSQLQuery("exec pSetClassForTeacher @TeacherId=:TeacherId, @ClassId=:ClassId")
        .SetParameter("TeacherId", 3)
        .SetParameter("ClassId", 84)
        .ExecuteUpdate();
}


Don't forget that SQL Server uses a comma between parameters.

This stored procedure will update the class table and put a 3 in the teacher id field (make sure you're using valid values; my database has a class record with an id of 84 and a teacher with an id of 3).

If you have a long running stored procedure and you need to set the timeout longer, you can use the SetTimeout() method:

using (var db = MSSQLSessionFactory.OpenSession())
{
    db.CreateSQLQuery("exec pSetClassForTeacher @TeacherId=:TeacherId, @ClassId=:ClassId")
        .SetParameter("TeacherId", 3)
        .SetParameter("ClassId", 84)
        .SetTimeout(100)
        .ExecuteUpdate();
}


Keep in mind that the SetTimeout() method is measured in seconds.


Executing SQL Queries

You can use the same command to execute a query directly.  Maybe you want to delete all records of a given foreign key, or perform the same update that my stored procedure performed, without creating a stored procedure.  You can do it using this syntax:


using (var db = MSSQLSessionFactory.OpenSession())
{
    db.CreateSQLQuery("UPDATE facultydata..class SET teacherid=:TeacherId WHERE id=:ClassId")
        .SetParameter("TeacherId", 3)
        .SetParameter("ClassId", 84)
        .SetTimeout(100)
        .ExecuteUpdate();
}


You can download the sample project here:

 FluentNHibernateStoredProcedure.zip

 

Dapper .Net

Summary

A commenter on my blog mentioned Dapper .Net, so I decided to take a look.  Dapper is known as a micro-ORM.  It's very easy to get started with, and it can be used as a stepping stone between non-ORM code and a full ORM with LINQ queries.  First, I'll show how it's done:


Installing Dapper in Visual Studio

I created a console application in Visual Studio 2012, then I opened the NuGet package manager and searched for Dapper.  Just install the NuGet package:



Writing the Code

In your usings, you'll need to add "using Dapper;" and "using System.Data.SqlClient;".  The actual database connection uses the standard SqlClient .NET classes.  Here's the basic code I wrote for my console application:

private static string ConnectionString = "server=sqlservername;Trusted_Connection=yes;database=sampledata;Integrated Security=true;";

static void Main(string[] args)
{
    using (var db = new SqlConnection(ConnectionString))
    {
        db.Open();

        var query = db.Query<Store>("SELECT * FROM store");

        foreach (var item in query)
        {
            Console.WriteLine("Name:" + item.Name);
        }

        Console.ReadKey();
    }
}


I also had to create a class for "Store", which Dapper will map the result set to:

namespace DapperDotNetDemo
{
    public class Store
    {
        public int id { get; set; }
        public string Name { get; set; }
        public string Address { get; set; }
        public string City { get; set; }
        public string State { get; set; }
        public string Zip { get; set; }
        public string text { get; set; }
    }
}


That's it.  As you can see, the LINQ part of an ORM is not used here.  Instead, a query is written just like any other ADO.NET database access method.  The ORM portion is the mapping of the result set to a class.  Technically, the returned objects can then be used in LINQ-to-objects queries.
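
Worth noting, though not shown above: Dapper also handles parameterized queries through an anonymous object (a quick sketch; the State value is made up):

var stores = db.Query<Store>(
    "SELECT * FROM store WHERE State = @State",
    new { State = "FL" });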


Summary

I'm going to go deeper into this subject in a future blog post, but for now I just wanted to introduce Dapper for anybody who has not heard of it or of micro-ORMs.  If you want more examples of how to use Dapper, you can follow one of these links:

Dapper .Net
A Look at Dapper .Net
Dapper .Net by Example