Full text search in Entity Framework 6 using command interception

Background

One of the main problems we encountered using Entity Framework was its lack of support for full text search. Our search function is based around a series of processors that attempt to match the incoming phrase against a set of patterns that we're expecting (10+). For example, there's one for dates and times, recent financial quarters as well as more domain specific ones. Everything is cached and so is really fast, except when it isn't:

Here's the monitoring chart for our SQL DB in Azure

That's a spike to 100% resource utilisation because we fall through to a phrase search as a last resort (which cannot be cached) - it's exceedingly rare but it still hurts.

Before EF, the app used full text search and it was one of the casualties when the app was migrated to EF4 several years ago. The lack of full text search has been getting more and more painful as the amount of data has grown. We really needed full text search back...

Command interception

Entity Framework 6 allows one to intercept a command before (or after) it has been executed against the database. The interface you have to implement is IDbCommandInterceptor which supplies the hooks to adjust the query just before it gets executed. For example, Rowan Miller did a demonstration of command interception at Tech Ed 2014 to implement soft deleting entities.

The solution

My initial searching led me to this: http://www.entityframework.info/Home/FullTextSearch which got me most of the way there but there were a couple of significant problems worth mentioning.

1. You need to make sure your parameter is a valid full text search string
2. You must ensure that the full text search predicate will be applied to a table not a derived table.

There's a lot of information about problem 1 but problem 2 is a bit more interesting:

This SQL will error:

SELECT * 
FROM ( SELECT Body
       FROM   postings
     ) AS Extent1
WHERE CONTAINS(Extent1.Body, 'apple')

The error is:

Msg 7601, Level 16, State 3, Line 5
Cannot use a CONTAINS or FREETEXT predicate on column 'Body' because it is not full-text indexed.

The problem is that you can't always easily control how EF is going to generate your SQL when queries get complicated.

The workaround I'm currently using (which seems to work) is to always force the Full Text Index predicate into a separate sub query so the predicate is always executed against the base table rather than an intermediate result set.

For example, say I have a method to search blog posts that can take a phrase to full text search with as well as other parameters. I'd write the full text search bit as below:

public IEnumerable<BlogPost> SearchBlogPosts(
    string phrase, 
    Category category, 
    DateTime publishedBefore)
{
    using (var context = new MyContext())
    {
        var searchPhrase = FullTextSearchUtilities.AddFullTextSearchPrefix(phrase);
        var results = context.BlogPosts.Where(post => 
            post.Categories.Any(c => c.Id == category.Id));
        results = result.Where(post => post.PublishedDate <= publishedBefore);
        // more conditions 


        // ** Important bit **
        // Make sure FTS is fully formed subquery
        var ftsResults = context.BlogPosts.Where(post => 
            post.Body.Contains(searchPhrase))
                .Select(post => post.Id);
        return results.Where(post => ftsResults.Contains(post.Id)).ToList();
    }
}

This ensures that the SQL is generated with the FTS part as a separate exists subquery:

SELECT /* Some columns */
FROM /* Potentially complicated mess */ AS [Project1]
WHERE /* Some predicates */
   AND EXISTS (SELECT 
        1 AS [C1]
        FROM [dbo].[BlogPost] AS [Extent1]
        WHERE (contains([Extent1].[Body], @p__linq__0)) AND ([Extent1].[Id] = [Project1].[Id])
    )

I tried to make the SQL look a bit Entity Frameworky... Hopefully it looks reminiscent of EF's generated SQL anyway. The point is that with the pattern above, the FTS predicate seemingly can't be moved away from the base table and so never errors.

Honestly, combining full text search with other advanced filters probably isn't done by many people so this is in all likelihood a bit of an edge case. If you do come across the problem of "Cannot use a CONTAINS or FREETEXT predicate on column 'Body' because it is not full-text indexed." on a column you're sure has a full text index you'll have an idea of where to look!

Comments

  1. This comment has been removed by the author.

    ReplyDelete
  2. Hi. It's using two queries in your LINQ that makes Entity Framework generate the exists subquery. The interceptor code is only a slightly refactored version available at the linked address (http://www.entityframework.info/Home/FullTextSearch).

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. many thanks for then problem2 solution!

    ReplyDelete
  5. Many thanks for problem 2.

    Could you share a more complex example involving more than 1 table?

    ReplyDelete
    Replies
    1. Potentially - what sort of scenario were you thinking? There would need to be a navigation property (or collection) between the entities in the tables but the same principle applies. Create a separate full text search query and then use that inside the main query (on the other table(s)).

      Delete
  6. A scenario where table A has a navigation property to table B which in turn has a navigation property to table C. I want to apply the Contains clause on a field in table C.

    ReplyDelete
  7. Thank you very much for this post! Solved my issue ;)

    ReplyDelete

Post a Comment

Popular posts from this blog

Trimming strings in action parameters in ASP.Net Web API

Composing Expressions in C#