Full text search in Entity Framework 6 using command interception
Background
One of the main problems we encountered using Entity Framework was its lack of support for full text search. Our search function is based around a series of processors that attempt to match the incoming phrase against a set of patterns that we're expecting (10+). For example, there's one for dates and times, recent financial quarters as well as more domain specific ones. Everything is cached and so is really fast, except when it isn't:
Here's the monitoring chart for our SQL DB in Azure
That's a spike to 100% resource utilisation because we fall through to a phrase search as a last resort (which cannot be cached) - it's exceedingly rare but it still hurts.
Before EF, the app used full text search and it was one of the casualties when the app was migrated to EF4 several years ago. The lack of full text search has been getting more and more painful as the amount of data has grown. We really needed full text search back...
Command interception
Entity Framework 6 allows one to intercept a command before (or after) it has been executed against the database. The interface you have to implement is IDbCommandInterceptor which supplies the hooks to adjust the query just before it gets executed. For example, Rowan Miller did a demonstration of command interception at Tech Ed 2014 to implement soft deleting entities.
The solution
My initial searching led me to this: http://www.entityframework.info/Home/FullTextSearch which got me most of the way there but there were a couple of significant problems worth mentioning.
1. You need to make sure your parameter is a valid full text search string
2. You must ensure that the full text search predicate will be applied to a table not a derived table.
There's a lot of information about problem 1 but problem 2 is a bit more interesting:
This SQL will error:
The error is:
Msg 7601, Level 16, State 3, Line 5
Cannot use a CONTAINS or FREETEXT predicate on column 'Body' because it is not full-text indexed.
1. You need to make sure your parameter is a valid full text search string
2. You must ensure that the full text search predicate will be applied to a table not a derived table.
There's a lot of information about problem 1 but problem 2 is a bit more interesting:
This SQL will error:
SELECT * FROM ( SELECT Body FROM postings ) AS Extent1 WHERE CONTAINS(Extent1.Body, 'apple')
The error is:
Msg 7601, Level 16, State 3, Line 5
Cannot use a CONTAINS or FREETEXT predicate on column 'Body' because it is not full-text indexed.
The problem is that you can't always easily control how EF is going to generate your SQL when queries get complicated.
The workaround I'm currently using (which seems to work) is to always force the Full Text Index predicate into a separate sub query so the predicate is always executed against the base table rather than an intermediate result set.
For example, say I have a method to search blog posts that can take a phrase to full text search with as well as other parameters. I'd write the full text search bit as below:
public IEnumerable<BlogPost> SearchBlogPosts(
string phrase,
Category category,
DateTime publishedBefore)
{
using (var context = new MyContext())
{
var searchPhrase = FullTextSearchUtilities.AddFullTextSearchPrefix(phrase);
var results = context.BlogPosts.Where(post =>
post.Categories.Any(c => c.Id == category.Id));
results = result.Where(post => post.PublishedDate <= publishedBefore);
// more conditions
// ** Important bit **
// Make sure FTS is fully formed subquery
var ftsResults = context.BlogPosts.Where(post =>
post.Body.Contains(searchPhrase))
.Select(post => post.Id);
return results.Where(post => ftsResults.Contains(post.Id)).ToList();
}
}
This ensures that the SQL is generated with the FTS part as a separate exists subquery:
SELECT /* Some columns */ FROM /* Potentially complicated mess */ AS [Project1] WHERE /* Some predicates */ AND EXISTS (SELECT 1 AS [C1] FROM [dbo].[BlogPost] AS [Extent1] WHERE (contains([Extent1].[Body], @p__linq__0)) AND ([Extent1].[Id] = [Project1].[Id]) )
I tried to make the SQL look a bit Entity Frameworky... Hopefully it looks reminiscent of EF's generated SQL anyway. The point is that with the pattern above, the FTS predicate seemingly can't be moved away from the base table and so never errors.
Honestly, combining full text search with other advanced filters probably isn't done by many people so this is in all likelihood a bit of an edge case. If you do come across the problem of "Cannot use a CONTAINS or FREETEXT predicate on column 'Body' because it is not full-text indexed." on a column you're sure has a full text index you'll have an idea of where to look!
This comment has been removed by the author.
ReplyDeleteHi. It's using two queries in your LINQ that makes Entity Framework generate the exists subquery. The interceptor code is only a slightly refactored version available at the linked address (http://www.entityframework.info/Home/FullTextSearch).
ReplyDeleteThis comment has been removed by the author.
ReplyDeletemany thanks for then problem2 solution!
ReplyDeleteMany thanks for problem 2.
ReplyDeleteCould you share a more complex example involving more than 1 table?
Potentially - what sort of scenario were you thinking? There would need to be a navigation property (or collection) between the entities in the tables but the same principle applies. Create a separate full text search query and then use that inside the main query (on the other table(s)).
DeleteA scenario where table A has a navigation property to table B which in turn has a navigation property to table C. I want to apply the Contains clause on a field in table C.
ReplyDeleteThank you very much for this post! Solved my issue ;)
ReplyDelete