Tuesday, October 25, 2011

LINQ Pitfalls: Nested queries

NHibernate's LINQ provider is very easy to start with and sufficient for majority of use cases, but when used in more complex scenarios it can mislead and surprise. Here is one of the pitfalls (or perhaps bugs?) I've recently run into. Note that I'm using NHibernate 3.0 and the issue is already fixed, but general conclusions possibly still apply.

I have simple model with two-step relationship: River has many Countries, Country has one Continent. I want to query for all rivers located within given continent. For code simplicity, let's try to use two separate LINQ queries:

var countries = session.Query<Country>()
.Where(c => c.Continent.Id == continentId);

var data = session.Query<River>()
.Where(r => (r.Countries.Where(c => countries.Contains(c))).Any())
.ToList();

The Where condition in second query is satisfied if at least one of river's countries exists in the country list for given continent. I don't call ToList() on first query, so I assume it is not run separately but merged (nested) into second one (or at least NotSupportedException is thrown or something). Here is the query that actually runs:

What do we have here? It gets all rivers, filter it for rivers that are within countries (country2_ alias), that are equal to ANY country (country3_ alias) - no continent condition at all! The query is merged, it runs without errors and silently produces unexpected, wrong results.

How can we enforce LINQ provider to filter the results correctly? The easiest way is to materialize the list of countries in separate query to avoid buggy query merging. So let's add ToList() to the first query:

var countries = session.Query<Country>()
.Where(c => c.Continent.Id == continentId)
.ToList();

var data = session.Query<River>()
.Where(r => (r.Countries.Where(c => countries.Contains(c))).Any())
.ToList();

This is how these queries look like in SQL:

This time we are fetching the list of countries first and the second query is correctly filtered using IN clause. But this is not the best solution, for sure. There are two queries instead of one and we don't really need country values, especially if there may be thousands of rows in real-life model. In this case, we should try to rewrite the queries to express our filtering logic in single query. In River-Country-Continent example, it will look like this:

data = session.Query<River>()
.Where(r => r.Countries.Any(c => c.Continent.Id == continentId))
.ToList();

This time the SQL statement contains continent filter and results are as expected:

Conclusion: Use the latest stable versions of your software. Be careful when merging LINQ queries in NHibernate. Ensure all ToList's are in place or find different way to express your filtering. And always check what SQL is your query producing!

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.