Saturday, 2 June 2012

SqlbulkCopy3


High performance bulk loading to SQL Server using SqlBulkCopy

If you ever want to bulk load data into an SQL Server database as quickly as possible, the SqlBulkCopy class is your friend (in the System.Data.SqlClient namespace). Since being introduced in .NET 2.0, it has provided an extremely efficient way to bulk load data into SQL Server, and is one the classes that I see as a "must know about". A usual scenario is where you want to dump some data into the database to then do some processing on. Or you want to temporarily load some data in to then query to get some stats from.

To give an idea of how well it performs, I'll compare it to an SqlDataAdapter approach.

SqlBulkCopy approach
// Instantiate SqlBulkCopy with default options,
// supplying an open SqlConnection to the database
using (SqlBulkCopy bulkCopy=new SqlBulkCopy(connection))
{
    // The table I'm loading the data to
    bulkCopy.DestinationTableName = "BulkLoadTable";
    // How many records to send to the database in one go (all of them)
    bulkCopy.BatchSize = myDataTable.Rows.Count;
 
    // Load the data to the database
    bulkCopy.WriteToServer(myDataTable);
    
    // Close up          
    bulkCopy.Close();
}        

SqlDataAdapter approach
using (SqlCommand insertCommand=new SqlCommand(
"INSERT BulkLoadTable(FieldA, FieldB) VALUES (@FieldA, @FieldB)", connection))
{
    insertCommand.Parameters.Add("@FieldA", SqlDbType.VarChar, 10, "FieldA");
    insertCommand.Parameters.Add("@FieldB", SqlDbType.Int, 4, "FieldB");
    // Setting UpdatedRowSource is important if you want to batch up the inserts
    insertCommand.UpdatedRowSource = UpdateRowSource.None;
    using (SqlDataAdapter insertAdapter = new SqlDataAdapter())
    {
        insertAdapter.InsertCommand = insertCommand;
        // How many records to send to the database in one go (all of them)
        insertAdapter.UpdateBatchSize = myDataTable.Rows.Count;
        
        // Send the inserts to the database
        insertAdapter.Update(myDataTable);                   
    }
}

I've left out the boring stuff like the instantiation and opening of the SqlConnection (connection). Plus I haven't shown the generation of the DataTable (myDataTable) as that's not particularly exciting either.

For this test, I'm loading 2 columns of data to the table named "BulkLoadTable" in my database. The columns in myDataTable match those in the table which keeps it nice and simple. If they didn't match, then when using SqlBulkCopy, you need to map which columns in the DataTable map to which columns in the destination table using the ColumnMappings method on the SqlBulkCopy instance. All straight forward.

So how does SqlBulkCopy perform?

I populated the DataTable with 100,000 records containing some random data. FieldA is just a GUID, FieldB is an incrementing number. The times recorded are average measurements of the actual time to persist the data (i.e. the times to complete bulkCopy.WriteToServer and insertAdapter.Update respectively):
SqlBulkCopy:
1.5885s
SqlDataAdapter:
25.0729s

That's about a 93% drop in time taken when using the SqlBulkCopy class over SqlDataAdapter approach.

It gets better.

One of the SqlBulkCopy constructor overloads provides the ability to specify SqlBulkCopyOptions. For maximum performance, specify the TableLock option:
SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null);
This will obtain a bulk update lock on the destination table during the bulk copy process. By default, it would otherwise obtain row level locks. This further reduces the time to load 100,000 records:

SqlBulkCopy with TableLock Option:0.8229s

Of course, this lock can block out other processes until the loading is complete. Usually in my experience, this is not an issue. For optimal performance I recommend:
  • bulk loading into a new, empty table (staging table)
  • add suitable indexes on to the staging table after the data has been bulk loaded, that will help with subsequent processing/querying. If you add indexes at the start, you will just unnecessarily hinder the insert speed
  • use the TableLock option to maximise throughput

The data source doesn't have to be a DataTable either. The WriteToServer method will also accept a DataRow array, an IDataReader, or a DataTable with a DataRowState (it will persist only the DataTable rows matching the supplied DataRowState).

So, "can I use it for bulk updates?" I hear you ask. No. It's a mechanism for doing bulk inserts only. You can't have everything I suppose.

No comments:

Post a Comment