Login or Sign Up to become a member!
LessThanDot Sit Logo

LessThanDot

Data Management

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot friendfeed Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Your profile

Authors

Search

XML Feeds

Google Ads

« Push your DBA skills over the edge by becoming a better developerA couple of ways of getting the top 2 distinct values from a set in SQL Server »
comments
Rate Post:
submit to reddit Digg!FacebookDotnetkicks

I encountered this question last October in one of the forums I'm frequent. The question was: given the table with the keywords, find all records from some table which would include all these words (Implementing search with AND keyword).

Follow up:

From the first glance the problem looks trivial, but for me the solution was not very obvious and I used Borislav Borissov's help in solving it.

The code bellow illustrates the problem and the solution I found.

  1. DECLARE @MyTable TABLE (Id INT IDENTITY(1,1), Searched VARCHAR(200))
  2. DECLARE @Keys    TABLE (Word VARCHAR(200), WordId INT IDENTITY(1,1))
  3.  
  4. INSERT INTO @MyTable VALUES ('Mother Father Daughter Son')
  5. INSERT INTO @MyTable VALUES ('Mother Daughter Son')
  6. INSERT INTO @MyTable VALUES ('Mother Son')
  7. INSERT INTO @MyTable VALUES ('Daughter Son')
  8. INSERT INTO @MyTable VALUES ('Mother Father Son')
  9. INSERT INTO @MyTable VALUES ('Son Daughter Father')
  10. INSERT INTO @MyTable VALUES ('Mother Son')
  11. INSERT INTO @MyTable VALUES ('Other Word')
  12. INSERT INTO @MyTable VALUES ('Mother Father Daughter Brother Son')
  13. INSERT INTO @MyTable VALUES ('Mother Daughter Son Stepdaughter')
  14. INSERT INTO @MyTable VALUES ('Mother Son And Stepson and Daughter and Father and Grandfather')
  15. INSERT INTO @MyTable VALUES ('Daughter Son Family')
  16. INSERT INTO @MyTable VALUES ('Mother Brother Father Son Orphan')
  17. INSERT INTO @MyTable VALUES ('Son or Daughter or Father')
  18. INSERT INTO @MyTable VALUES ('Mother And Son')
  19. INSERT INTO @MyTable VALUES ('Other Word One More')
  20.  
  21. INSERT INTO @Keys VALUES ('Mother')
  22. INSERT INTO @Keys VALUES ('Father')
  23. INSERT INTO @Keys VALUES ('Son')
  24. INSERT INTO @Keys VALUES ('Daughter')
  25.  
  26. DECLARE @nAllWords INT
  27. SELECT @nAllWords = COUNT(*) FROM @Keys
  28.  
  29. SELECT MyTable.*
  30. FROM @MyTable MyTable
  31. INNER JOIN (SELECT MyTable.Id
  32.                    FROM @MyTable MyTable
  33.             INNER JOIN @Keys KeyWords ON ' ' + MyTable.Searched + ' ' LIKE '% ' + KeyWords.Word  + ' %'
  34.             GROUP BY MyTable.Id
  35.             HAVING COUNT(DISTINCT(KeyWords.Word)) = @nAllWords) Tbl1 ON MyTable.Id = Tbl1.Id

The above is the original test case and the solution we implemented with Borislav.

Based on Nikola's comments I re-worked the test case. The last suggestion performs the best. We can get further improvement if we include '% ' in our Keys and WordsToExclude tables. For the explanation of the last Nikola's query see http://msdn.microsoft.com/en-us/library/ms178543.aspx

See also this blog ALL, ANY and SOME: The Three Stooges

  1. USE [AllTests]
  2. GO
  3.  
  4. /****** Object:  Table [dbo].[RealTest]    Script Date: 08/19/2009 19:39:24 ******/
  5. SET ANSI_NULLS ON
  6. GO
  7.  
  8. SET QUOTED_IDENTIFIER ON
  9. GO
  10.  
  11. SET ANSI_PADDING ON
  12. GO
  13.  
  14. IF NOT EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[RealTest]') AND type in (N'U'))
  15. BEGIN
  16. CREATE TABLE [dbo].[RealTest](
  17.     [Id] [INT] IDENTITY(1,1) NOT NULL,
  18.     [Searched] [VARCHAR](200) NULL,
  19.  CONSTRAINT [PK_RealTest] PRIMARY KEY CLUSTERED
  20. (
  21.     [Id] ASC
  22. )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
  23. ) ON [PRIMARY]
  24. END
  25. GO
  26.  
  27. --ALTER TABLE [dbo].[RealTest] DISABLE CHANGE_TRACKING
  28. GO
  29.  
  30. SET ANSI_PADDING OFF
  31. GO
  32.  
  33. USE [AllTests]
  34. GO
  35.  
  36. USE [AllTests]
  37. GO
  38.  
  39. /****** Object:  Index [PK_RealTest]    Script Date: 08/19/2009 19:53:06 ******/
  40. IF NOT EXISTS (SELECT * FROM sys.indexes WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[RealTest]') AND name = N'PK_RealTest')
  41. ALTER TABLE [dbo].[RealTest] ADD  CONSTRAINT [PK_RealTest] PRIMARY KEY CLUSTERED
  42. (
  43.     [Id] ASC
  44. )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
  45. GO
  46.  
  47.  
  48. /****** Object:  Index [IX_RealTest]    Script Date: 08/19/2009 19:52:29 ******/
  49. IF NOT EXISTS (SELECT * FROM sys.indexes WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[RealTest]') AND name = N'IX_RealTest')
  50. CREATE NONCLUSTERED INDEX [IX_RealTest] ON [dbo].[RealTest]
  51. (
  52.     [Searched] ASC
  53. )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
  54. GO
  55.  
  56.  
  57.  
  58.  
  59. DECLARE @Keys    TABLE (Word VARCHAR(200), WordId INT IDENTITY(1,1))
  60. DECLARE @WordsToExclude TABLE (Word VARCHAR(200), WordID INT IDENTITY(1,1))
  61. DECLARE @I INT = 1
  62. SET NOCOUNT ON
  63. SELECT COUNT(*) FROM RealTest  
  64. WHILE @I<10000
  65. BEGIN
  66. INSERT INTO RealTest  VALUES ('Mother Father Daughter Son iteration' + CAST(@i AS VARCHAR(10)))
  67. INSERT INTO RealTest VALUES ('Mother Daughter Son iteration' + CAST(@i AS VARCHAR(10)))
  68. INSERT INTO RealTest VALUES ('Mother Son iteration' + CAST(@i AS VARCHAR(10)))
  69. INSERT INTO RealTest VALUES ('Daughter Son')
  70. INSERT INTO RealTest VALUES ('Mother Father Son')
  71. INSERT INTO RealTest VALUES ('Son Daughter Father')
  72. INSERT INTO RealTest VALUES ('Mother Son')
  73. INSERT INTO RealTest VALUES ('Other Word')
  74. INSERT INTO RealTest VALUES ('Mother Father Daughter Brother Son')
  75. INSERT INTO RealTest VALUES ('Exclude Mother Father Daughter Brother Son Orphan')
  76. INSERT INTO RealTest VALUES ('Exclude Mother Father Daughter Brother Son Orphan')
  77. INSERT INTO RealTest VALUES ('MotherFatherDaughterBrotherSon')
  78. INSERT INTO RealTest VALUES ('Exclude Mother Father Daughter Son Stepdaughter')
  79. INSERT INTO RealTest VALUES ('Brother Mother Father Daughter Son Stepdaughter')
  80. INSERT INTO RealTest VALUES ('Mother Son And Stepson and Daughter and Father and Grandfather')
  81. INSERT INTO RealTest VALUES ('Daughter Son Family')
  82. INSERT INTO RealTest VALUES ('Mother Brother Father Daughter Son Orphan')
  83. INSERT INTO RealTest VALUES ('Son or Daughter or Father')
  84. INSERT INTO RealTest VALUES ('Mother And Son')
  85. INSERT INTO RealTest VALUES ('Other Word One More')
  86.  
  87. SET @i = @i +1
  88. END
  89.  
  90. INSERT INTO @Keys VALUES ('Mother')
  91. INSERT INTO @Keys VALUES ('Father')
  92. INSERT INTO @Keys VALUES ('Son')
  93. INSERT INTO @Keys VALUES ('Daughter')
  94.  
  95. INSERT INTO @WordsToExclude VALUES ('Exclude')
  96. INSERT INTO @WordsToExclude VALUES ('Stepdaughter')
  97.  
  98. SELECT COUNT(*) FROM RealTest
  99.  
  100. SET STATISTICS TIME ON
  101. DECLARE @nAllWords INT
  102. SELECT @nAllWords = COUNT(*) FROM @Keys
  103.  
  104. SELECT MyTable.*
  105. FROM RealTest MyTable
  106. INNER JOIN (SELECT MyTable.Id
  107.                    FROM RealTest MyTable
  108.             INNER JOIN @Keys KeyWords ON ' ' + MyTable.Searched + ' ' LIKE '% ' + KeyWords.Word  + ' %'
  109.             WHERE not exists (SELECT 1 FROM RealTest M INNER JOIN @WordsToExclude W ON ' ' + M.Searched + ' ' LIKE '%' + W.Word + ' %' and M.Id = MyTable.Id)
  110.             GROUP BY MyTable.Id
  111.             HAVING COUNT(DISTINCT(KeyWords.Word)) = @nAllWords) Tbl1 ON MyTable.Id = Tbl1.Id --order by MyTable.Id
  112.  
  113. PRINT 'Regular INNER JOIN ' +  CAST(@@ROWCOUNT AS VARCHAR(10))
  114.            
  115. SELECT x.Id,
  116.        x.Searched
  117.   FROM ( SELECT m.Id,
  118.                 m.Searched,
  119.                 (SELECT COUNT(*)
  120.                    FROM @Keys k
  121.                   WHERE ' '+ m.Searched+' ' like '% ' + k.Word + ' %'
  122.                 ) AS n, (SELECT COUNT(*) FROM @WordsToExclude W WHERE ' ' + m.Searched + ' ' like '% ' + W.Word + ' %') AS WE
  123.            FROM RealTest m
  124.        ) AS x
  125.  WHERE n=@nAllWords and isnull(WE,0) = 0 --order by x.Id
  126.  
  127. PRINT 'Subquery solution ' +  CAST(@@ROWCOUNT AS VARCHAR(10))        
  128.  
  129. SELECT x.Id,
  130.        x.Searched
  131.   FROM ( SELECT m.Id,
  132.                 m.Searched,
  133.                 ( SELECT COUNT(*)
  134.                     FROM @Keys k
  135.                    WHERE ' '+ m.Searched+' ' like '% ' + k.Word + ' %'
  136.                 ) AS n
  137.            FROM RealTest m
  138.           WHERE not exists ( SELECT *
  139.                                FROM @WordsToExclude W
  140.                               WHERE ' ' + m.Searched + ' ' like '% ' + W.Word + ' %'
  141.                            )
  142.        ) AS x
  143.  WHERE n=@nAllWords
  144.  
  145. PRINT 'Optimized Nikola''s  solution #1 ' +  CAST(@@ROWCOUNT AS VARCHAR(10))        
  146.  
  147. SELECT m.Id, m.Searched
  148. FROM RealTest m
  149. WHERE not exists (SELECT * FROM @WordsToExclude W WHERE ' ' + m.Searched + ' ' like '% ' + W.Word + ' %')
  150. and 1 = ALL ( SELECT CASE WHEN ' ' + m.Searched + ' ' like '% ' + k.Word + ' %' THEN 1 ELSE 0 END FROM @Keys k)
  151.  
  152. PRINT 'Optimized Nikola''s  solution #2 ' +  CAST(@@ROWCOUNT AS VARCHAR(10))        
  153.  
  154.    
  155. SET STATISTICS TIME OFF

with the result on my PC
SQL Server Execution Times:

SQL Server Execution Times:
CPU time = 6599 ms, elapsed time = 9802 ms.
Regular INNER JOIN 79992


SQL Server Execution Times:
CPU time = 4961 ms, elapsed time = 6697 ms.
Subquery solution 79992


SQL Server Execution Times:
CPU time = 5366 ms, elapsed time = 6716 ms.
Optimized Nikola's solution #1 79992


SQL Server Execution Times:
CPU time = 2901 ms, elapsed time = 3670 ms.
Optimized Nikola's solution #2 79992

1764 views
submit to reddit Digg!FacebookDotnetkicks

Comments and Feedback

20 comments

Comment from: George Mastros [Member] Email
*****
Excellent post!
07/08/09 @ 20:27
Comment from: SQLDenis [Member] Email
*****
Very nice Naomi and congratz on the first post :-)
07/08/09 @ 21:03
Comment from: chrissie1 [Member] Email
*****
Welcome to the club.
07/08/09 @ 23:47
Comment from: niikola [Member] Email
What about next one, should be faster:

Select x.Id,
       x.Searched
  from ( Select m.Id,
                m.Searched,
                (Select Count(*)
                   From @Keys k
                  Where ' '+ m.Searched+' ' like '% ' + k.Word + ' %'
                ) as n
           from @myTable m
       ) as x
 Where n=(Select COUNT(*) From @Keys)
07/09/09 @ 02:08
Comment from: onpnt [Member] Email
*****
Nice!
07/09/09 @ 08:46
Comment from: Naomi [Member] Email
Hi Niikola,

I would need to test your solution - need to create a good test scenario and find time for this as well. To me this one is slightly harder to understand than the solution I posted.

Also today there was a little twist in the original problem - find all records including all the words, but excluding some other words (list of words). I have an idea, but don't have time to think / test in more details.
07/09/09 @ 09:10
Comment from: Naomi [Member] Email
Hi Niikola,

Your solution proved to be much faster. Also I added an ability to exclude certain words - see, if you can suggest a different idea.
07/11/09 @ 22:35
Comment from: Eralper [Visitor] · http://www.kodyaz.com
*****
Hi Naomi,

I liked much the trick you separate the sentence into words for the inner join using the below code fragment.
Excellent!

ON ' ' + MyTable.Searched + ' ' LIKE '% ' + KeyWords.Word + ' %'
07/12/09 @ 23:38
Comment from: Sridhar [Visitor] · http://sridharbabuk.blogspot.com/
*****
Excellent idea... we easily extend the solution in the same way how we can use Contains/FreeText
and ContainsTable/FreeTextTable in Full Text Search.
07/20/09 @ 22:43
Comment from: someguy198650 [Visitor]
*****
Nice,
gotta try that one.
07/24/09 @ 03:18
Comment from: niikola [Member] Email
*****
Slightly faster version with "Words to Exclude"

SELECT x.Id,
       x.Searched
  FROM ( SELECT m.Id,
                m.Searched,
                ( SELECT COUNT(*)
                    FROM @Keys k
                   WHERE ' '+ m.Searched+' ' like '% ' + k.Word + ' %'
                ) AS n
           FROM @myTable m
          WHERE not exists ( SELECT *
                               FROM @WordsToExclude W
                              WHERE ' ' + m.Searched + ' ' like '% ' + W.Word + ' %'
                           )
       ) AS x
 WHERE n=@nAllWords
08/18/09 @ 06:45
Comment from: niikola [Member] Email
It would be nice if we can use next syntax:

SELECT m.Id,
m.Searched
FROM @myTable
WHERE m.Searched LIKE ALL ( SELECT '% ' + k.Word + ' %' FROM @Keys k)
and not exists (SELECT * FROM @WordsToExclude W WHERE ' ' + m.Searched + ' ' like '% ' + W.Word + ' %')

but unfortunately ALL does not work with LIKE operator :-(
08/18/09 @ 07:03
Comment from: niikola [Member] Email

I forgot to add some comment why this query is faster, although I believe all of you know that already.


As WHERE clause is executed before SELECT part, it means sub query in select clause will not be executed for rows that contains at least one unwanted word, which further means less CPU and faster execution.


Changing sub query for unwanted words from (SELECT COUNT(*)...) to NOT EXISTS (...) reduces number of rows affected and number of LIKE comparisons because it will stop execution on the first match.


Effectively, instead of having nWantedWords+nUnwantedWords comparisons for each row, we will have - in case of rows that have at least one unwanted word - no one comparison with wanted words and between 1 and nUnwantedWords comparisons against unwanted ones


Rows without unwanted words will still have same number of comparisons, but without aggregates for unwanted words.

08/18/09 @ 09:06
Comment from: niikola [Member] Email
Maybe we could us ALL with a bit different sub query, although I'm not sure we would get any improvement:

SELECT m.Id, m.Searched
FROM @myTable
WHERE not exists (SELECT * FROM @WordsToExclude W WHERE ' ' + m.Searched + ' ' like '% ' + W.Word + ' %')
and 1 = ALL ( SELECT Case When ' ' + m.Searched + ' ' like '% ' + k.Word + ' %' Then 1 Else 0 End FROM @Keys k)
08/18/09 @ 09:13
Comment from: niikola [Member] Email
Me again :-)

1 = ALL (Select...)

should improve performance as ALL will stop execution of sub query for the first row in @Keys that doesn't mach LIKE criteria - which means less comparisons in total
08/18/09 @ 09:18
Comment from: Naomi [Member] Email
Niikola - thanks a lot for your comments, I'll try different variations at the first opportunity.
08/18/09 @ 15:37
Comment from: niikola [Member] Email
This is the last one, I promise :-)

Results with real tables (not the table variables) on both, SQL Server 2005 and 2008 makes difference between original and modified queries much smaller. It seems that joins with table variables are very costly (especially in 2008). Adding PKs and indexes makes differences even smaller, but still visible.

The best one is the last one (with 1=ALL()), followed by UnwantedWords in Where clause as second, 2 subqueries in select 3rd and JOIN 4th.

Another interesting fact is that comparing Execution Plans and estimated cost for each of the queries shows completely opposite order of efficiency.

Further improvement (not a big one, but affects all solutions) would be

Update @Keys Set Word = '% '+word+' %'
Update @WordsToExclude Set Word = '% '+word+' %'

In that case LIKE comparisons would be
' ' + M.Searched + ' ' LIKE Word
without unnecessary concatenation in every step

08/19/09 @ 08:57
Comment from: Naomi [Member] Email
Hi Niikola,

Thanks again and keep them coming - I'm sorry I didn't have a chance to play with it yesterday and not sure would have time this week, but I plan to test and update the blog - thanks again.
08/19/09 @ 10:31
Comment from: Naomi [Member] Email
I've updated the blog with your comments - thanks again.
08/19/09 @ 19:10
Comment from: Ben [Visitor] · http://www.geekzrule.com
*****
Thanks for this! This absolutely saved my bacon, and provided an awesome solution where I could not use Full Text Indexing. Thanks again!
09/12/09 @ 20:51

Leave a comment


Your email address will not be revealed on this site.

Your URL will be displayed.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)