Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Enterprise Developer

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

    Search

    XML Feeds

    Google Ads

    « Reducing Code-Build-Test Friction with NCrunchMonitoring and Logging as a Service - Reviews »
    comments

    Earlier this week I came upon a post (Entity Framework Comparative Performance) by Luke McGregor (b|t) that compared the performance of several ORMs for handling batch data. Given the amount of batch data I've processed, I was curious how those ORM tests would line up against a couple common non-ORM methods.

    I decided to stick to ADO.Net methods for data and to focus on the insert, as a fast insert can be used to replace updates and deletes. SSIS and bcp would be alternative options, but would require additional setup to test alongside the .Net code.

    2012-07-11: In response to a comment below, I added a followup section to the post with a graph showing all the test results in the 100,000 record tests.

    Method

    The method for these tests closely resembles the one Luke followed in his original post. The main exception is that I am using a local SQL Server 2008 R2 instance rather than a remote VM. Unfortunately attempts to run a full test set against my remote SQL Server VM ran into errors each time I tried, generally in the Entity Framework setup, teardown, and assertion code. My local system has a large enough amount of RAM and cores that any impact from running the tests locally should be limited, with only the network constraint removed from the equation.

    Tests

    I focused entirely on the insert tests, adding two new tests to use SqlDataAdapter and SqlBulkCopy. The test lineup then became:

    • EntityFramework 4.1
      • Basic Configuration (No optimizations)
      • AutoDetectChanges Disabled
      • Tuned
    • EntityFramework 5.0 beta1
      • Basic Configuration (No optimizations)
      • AutoDetectChanges Disabled
      • Proxy Entities
      • Tuned
    • Dapper 1.8
      • Dapper Rainbow
      • My Best effort at making it go fast
      • EDIT Batching inserts/updates using transactions
    • LINQ to SQL
      • Basic Configuration (No optimizations)
      • Tuned
    • Raw ADO methods

    The raw code is available on my branch at github: https://github.com/tarwn/StaticVoid.OrmPerformance and the original is located on Luke's here: https://github.com/lukemcgregor/StaticVoid.OrmPerformance.

    SqlDataAdapter Method

    The SqlDataAdapter method is to create a local DataSet or DataTable then provide this to a SqlDataAdapter with a configured SqlCommand object and parameters. The SqlDataAdapter takes care of the details, getting all the individual rows of data into the database via that insert command. Because we are operating on the full set of data, the test stores all of the values in memory until it is told to commit them.

    SqlBulkCopy Method

    The SqlBulkCopy object is designed to "let you efficiently bulk load a SQL Server table" (MSDN). It provides similar functionality as bcp, but from inside our .Net code and without the format files. This test works similarly to the SqlDataAdapter test, in that it holds the entire DataTable of test data in memory until it is told to commit it in one go (some of the tests are iterative, some are bulk, the test method caters to both).

    Results

    I'll admit the results were not that surprising. SqlBulkCopy was the fastest method for inserting larger amounts of data, but had some initial overhead that made it slower for the 1, 10, and 100 record tests. Compared to the best times from the other methods (SqlBulkCopy is the SqlCommand representative in the chart), the performance difference is clear:

    Graph of Best Times for 1-10000 records
    Best Times for 1, 10, 100, 1000, 10000 scenarios

    Extending this to a larger set of 100000 records and the difference is relatively the same. Relative to the prior set of results, SqlBulkCopy is not as much faster on the 100,000 run as it was on the 10,000. It would be interesting to switch to increments of 10,000 and see if there is a pattern to it.

    Graph of Best Times for 1-100000 records
    Best Times for 1, 10, 100, 1000, 10000, 100000 scenarios

    I also thought it was interesting to see how well the tuning improved some of the ORM methods. In the case of Entity Framework, it's clear that if you intend to use it for batch data then tuning is a requirement, not an option. The out-of-the-box experience for Entity Framework 4.1 and 5 were roughly an order of magnitude slower than all other tests.

    Scaled out to show EF 4.1 and 5 Basic Performance
    Scaled out to show EF 4.1 and 5 Basic Performance

    The other key indicator is memory. Our two new methods store all the data and send it in a single command, so they will have a higher memory footprint to accommodate that data. The methods that incrementally send the data, like the Dapper scenarios and basic SqlCommand option, will use very little data since they are flushing each addition directly to SQL.

    Memory Usage per Test Type
    Memory Usage per Test Type

    This graph shows the memory/record of the 1000, 10000, and 100000 test runs. As we would expect, the memory/record for the full batch methods is reduced as the overhead is spread across more records. Entity Framework show consistently high memory usage, but the Proxy Entities method does bring it down to just about twice as much as Linq2SQL, which is in turn about 50% higher than the SqlCommand/SqlBulkCopy methods.

    Conclusions

    For pure, batch insertions, I still wouldn't use an ORM. These tests show that the ORMs tested are still significantly slower at batch insertion than tools built specifically for bulk operations, like SqlBulkCopy. We've also seen that when we do use an ORM, understanding it's performance characteristics and how to tune the ORM can make an enormous difference in how well or poorly it works.

    Follow-up

    Based on Tudor's comment below, I've generated a graph of the times for each method in the 100,000 record tests. Unlike the full chart above, I've scaled it to ignore the two basic Entity Framework entries that cause the line chart above to be so unreadable.

    Readable Execution Times for 1000,000 records
    Readable Execution Times for 1000,000 records

    Using ADO.Net does not automatically mean better performance than an ORM. SqlBulkCopy does clearly perform better, but using a SqlCommand.ExecuteNonQuery or a SqlAdapter.InsertCommand does not achieve the same level of performance. Many of the ORM tests kept up or outperformed the non-transactional SqlCommand and SqlAdapter tests, and Dapper kept up with the Transactional SqlCommand test. ADO.Net itself is not giving the boost in speed we see from SqlBulkCopy, it's the use of a tool that is built specifically for batch processing (all of the rest operate at the row level, ADO.Net or ORM).

    About the Author

    User bio imageEli delivers software and technology solutions for a living. His roles have included lone developer, accidental DBA, team lead, and even unintentional Solaris consultant once. With experience in adhoc, Lean, and Agile environments across NSF grants, SaaS products, and in-house IT groups, he is just as willing to chat about the principles of Lean or Continuous Delivery as he is to dive into Azure, SQL Server, or the last ATDD project he created.
    Social SitingsTwitterLinkedInHomePagedeliciousLTD RSS Feed
    orm, sql server
    InstapaperVote on HN

    12 comments

    Comment from: Luke McGregor [Visitor] · http://blog.staticvoid.co.nz
    Luke McGregor Really liked to see this post, I had also been planning to do a similar comparison.

    One of the things I've found really intriguing about these results (and the base set which I blogged about) is that when dealing with small numbers of items most ORMs have comparable performance, its not until you go into the large batch world that things diverge (as you have pointed out this is largely to do with the fact that ORMs don't change the type of SQL operation they use when dealing with batch's. I've also noticed that many of the ORMs which I've examined submit each operation (ie insert) in a separate request/query/packet to the SQL server which has a massive impact on performance especially when performing many synchronous operations or operating in a high latency environment.

    I would love to add your tests into the trunk, you should send me a pull request :)
    07/10/12 @ 17:53
    Comment from: Eli Weinstock-Herman (tarwn) [Member]
    Eli Weinstock-Herman (tarwn) Thanks Luke, I'm glad you found the post, I meant to send you a link but posted from work and it slipped my mind after I got home.

    The other thing I found surprising was how much of an impact tuning could have on Entity Framework. I had worked with earlier versions but had not done as much serious work with later ones, so it was encouraging to see just how much headroom there was in tuning it for this type of scenario.

    I sent a pull request for those two tests but didn't include the later changes, since I made configuration changes to help me run untrusted remote SQL connections prior to adding the PetaPoco tests.
    07/10/12 @ 19:00
    Comment from: Tudor [Visitor]
    Tudor I would say that such a test is not so relevant - most O/RMs were not designed for batch operations with large sets of data - for such cases it's much better to do a custom implementation that uses stored procedures or ADO.NET..
    Batch operations is the classical example when an O/RM is not the appropriate tool, so any perf tests are useless.
    07/11/12 @ 03:38
    Comment from: Eli Weinstock-Herman (tarwn) [Member]
    Eli Weinstock-Herman (tarwn) Tudor: I have to disagree. I would take hard data over an assumption or an "everyone knows" situation any time. The fact that these tests do show that the ORM methods are slower than 1 of the 3 methods for batching data in directly via ADO.Net may back up the hypothesis I had going in, but that doesn't make the tests a waste of time or effort.

    Your point about using a Stored Procedure or ADO.Net is incorrect. The SqlCommand results displayed above are SqlBulkCopy, the charts show only the best entry from each category. The SqlAdapter Insert and basic SqlCommand insert method were about the same speed as many of the tuned ORM methods. SqlCommand in a transaction was a good bit faster, but Dapper matched it. I'll add a graph with these details shortly.
    07/11/12 @ 04:28
    Comment from: Mark Rendle [Visitor] · http://blog.markrendle.net/
    Mark Rendle Hi Eli,

    You might want to take a look at Simple.Data, which provides ORM-like syntax but is intelligent about batch operations and will actually use SqlBulkCopy behind the scenes when running against SQL Server.

    http://github.com/markrendle/Simple.Data
    07/11/12 @ 04:41
    Comment from: Eli Weinstock-Herman (tarwn) [Member]
    Eli Weinstock-Herman (tarwn) Mark: Oddly enough I had considered adding it after Chrissie and I went back and forth with Simple.Data and PetaPoco a couple months ago. I added PetaPoco and was getting ready to add Simple.Data, but thought I had enough examples (I used results prior to adding PetaPoco for tis chart). Knowing Simple.Data uses SqlBulkCopy I think I have to find some time to go back now and include it, I'll try to do an additional follow-up with that method.
    07/11/12 @ 04:53
    Comment from: Moses [Visitor]
    Moses It would be interesting to see how nHibernate compares to the above.
    07/11/12 @ 05:56
    Comment from: Eli Weinstock-Herman (tarwn) [Member]
    Eli Weinstock-Herman (tarwn) Moses: I was thinking about that this morning myself, since it looks like I'll be either adding Simple.Data or conning Chrissie into writing some C# somehow (which means I'll probably be writing it). Alex offered to do the nHibernate tests when I first started playing with Luke's project, so I may circle back around and see if he is still willing. I personally have 0 experience with nHibernate, so wouldn't be a good candidate for writing even basic tests with it.
    07/11/12 @ 06:38
    Comment from: Shane [Visitor]
    Shane What do you mean by EF-Tuned? Did you do something manually to tune it? If so what things did you do?
    07/13/12 @ 14:02
    Comment from: Eli Weinstock-Herman (tarwn) [Member]
    Eli Weinstock-Herman (tarwn) Shane:
    I can't really speak to the implementation details of those tests, Luke did the hard work on them and I just reused his code (I haven't done anything serious with EF since v1). If you are familiar with EF, the relevant code is here: Harness.EntityFramework4-1/TunedConfiguration.cs on github

    In the setup he has turned off the options for AutoDetect changes, lazy loading, proxy creation, and validate on save. Several of the individual tests show the performance of turning off one of these at a time, the tuned option has all of them. The biggest performance delta seems to be change detection, so if you're interested I would definitely suggest reading his in-depth post on EF + AutoDetectChanges here: EntityFramework Performance and AutoDetectChanges
    07/13/12 @ 16:56
    Comment from: Tahir Khalid [Member] Email
    Checkout OpenAccess ORM by Telerik, not biased because I am using it now :D

    http://www.telerik.com/products/orm.aspx

    Using ADO.NET does affect it's performance, I still believe there are uses for it and something I intend to share with the good folks of LTD soon is a (very) light weight framework built around providers, data connectors and factories working on domain entities...
    07/14/12 @ 20:34
    Comment from: Eli Weinstock-Herman (tarwn) [Member]
    Eli Weinstock-Herman (tarwn) kermit: it's on github and VS 2012 RC and SQL Express are both free, let me know when you have it added in ;)

    I'm currently integrating in Simple.Data, PetaPoco, and NHibernate. Have some assertions failing that I have to knock out. If I manage to get all of those working i may take a look at OpenAccess next, will depend (have a couple other things cooking too and something I started and need to return too soon).
    07/17/12 @ 17:25

    Leave a comment


    Your email address will not be revealed on this site.

    To mislead the spambots.

    Your URL will be displayed.
    (Line breaks become <br />)
    (Name, email & website)
    (Allow users to contact you through a message form (your email will not be revealed.)