I encountered this question last October in one of the forums I'm frequent. The question was: given the table with the keywords, find all records from some table which would include all these words (Implementing search with AND keyword).
Follow up:
From the first glance the problem looks trivial, but for me the solution was not very obvious and I used Borislav Borissov's help in solving it.
The code bellow illustrates the problem and the solution I found.
- DECLARE @MyTable TABLE (Id INT IDENTITY(1,1), Searched VARCHAR(200))
- DECLARE @Keys TABLE (Word VARCHAR(200), WordId INT IDENTITY(1,1))
- INSERT INTO @MyTable VALUES ('Mother Father Daughter Son')
- INSERT INTO @MyTable VALUES ('Mother Daughter Son')
- INSERT INTO @MyTable VALUES ('Mother Son')
- INSERT INTO @MyTable VALUES ('Daughter Son')
- INSERT INTO @MyTable VALUES ('Mother Father Son')
- INSERT INTO @MyTable VALUES ('Son Daughter Father')
- INSERT INTO @MyTable VALUES ('Mother Son')
- INSERT INTO @MyTable VALUES ('Other Word')
- INSERT INTO @MyTable VALUES ('Mother Father Daughter Brother Son')
- INSERT INTO @MyTable VALUES ('Mother Daughter Son Stepdaughter')
- INSERT INTO @MyTable VALUES ('Mother Son And Stepson and Daughter and Father and Grandfather')
- INSERT INTO @MyTable VALUES ('Daughter Son Family')
- INSERT INTO @MyTable VALUES ('Mother Brother Father Son Orphan')
- INSERT INTO @MyTable VALUES ('Son or Daughter or Father')
- INSERT INTO @MyTable VALUES ('Mother And Son')
- INSERT INTO @MyTable VALUES ('Other Word One More')
- INSERT INTO @Keys VALUES ('Mother')
- INSERT INTO @Keys VALUES ('Father')
- INSERT INTO @Keys VALUES ('Son')
- INSERT INTO @Keys VALUES ('Daughter')
- DECLARE @nAllWords INT
- SELECT @nAllWords = COUNT(*) FROM @Keys
- SELECT MyTable.*
- FROM @MyTable MyTable
- INNER JOIN (SELECT MyTable.Id
- FROM @MyTable MyTable
- INNER JOIN @Keys KeyWords ON ' ' + MyTable.Searched + ' ' LIKE '% ' + KeyWords.Word + ' %'
- GROUP BY MyTable.Id
- HAVING COUNT(DISTINCT(KeyWords.Word)) = @nAllWords) Tbl1 ON MyTable.Id = Tbl1.Id
The above is the original test case and the solution we implemented with Borislav.
Based on Nikola's comments I re-worked the test case. The last suggestion performs the best. We can get further improvement if we include '% ' in our Keys and WordsToExclude tables. For the explanation of the last Nikola's query see http://msdn.microsoft.com/en-us/library/ms178543.aspx
See also this blog ALL, ANY and SOME: The Three Stooges
- USE [AllTests]
- GO
- /****** Object: Table [dbo].[RealTest] Script Date: 08/19/2009 19:39:24 ******/
- SET ANSI_NULLS ON
- GO
- SET QUOTED_IDENTIFIER ON
- GO
- SET ANSI_PADDING ON
- GO
- IF NOT EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[RealTest]') AND type in (N'U'))
- BEGIN
- CREATE TABLE [dbo].[RealTest](
- [Id] [INT] IDENTITY(1,1) NOT NULL,
- [Searched] [VARCHAR](200) NULL,
- CONSTRAINT [PK_RealTest] PRIMARY KEY CLUSTERED
- (
- [Id] ASC
- )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
- ) ON [PRIMARY]
- END
- GO
- --ALTER TABLE [dbo].[RealTest] DISABLE CHANGE_TRACKING
- GO
- SET ANSI_PADDING OFF
- GO
- USE [AllTests]
- GO
- USE [AllTests]
- GO
- /****** Object: Index [PK_RealTest] Script Date: 08/19/2009 19:53:06 ******/
- IF NOT EXISTS (SELECT * FROM sys.indexes WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[RealTest]') AND name = N'PK_RealTest')
- ALTER TABLE [dbo].[RealTest] ADD CONSTRAINT [PK_RealTest] PRIMARY KEY CLUSTERED
- (
- [Id] ASC
- )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
- GO
- /****** Object: Index [IX_RealTest] Script Date: 08/19/2009 19:52:29 ******/
- IF NOT EXISTS (SELECT * FROM sys.indexes WHERE OBJECT_ID = OBJECT_ID(N'[dbo].[RealTest]') AND name = N'IX_RealTest')
- CREATE NONCLUSTERED INDEX [IX_RealTest] ON [dbo].[RealTest]
- (
- [Searched] ASC
- )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
- GO
- DECLARE @Keys TABLE (Word VARCHAR(200), WordId INT IDENTITY(1,1))
- DECLARE @WordsToExclude TABLE (Word VARCHAR(200), WordID INT IDENTITY(1,1))
- DECLARE @I INT = 1
- SET NOCOUNT ON
- SELECT COUNT(*) FROM RealTest
- WHILE @I<10000
- BEGIN
- INSERT INTO RealTest VALUES ('Mother Father Daughter Son iteration' + CAST(@i AS VARCHAR(10)))
- INSERT INTO RealTest VALUES ('Mother Daughter Son iteration' + CAST(@i AS VARCHAR(10)))
- INSERT INTO RealTest VALUES ('Mother Son iteration' + CAST(@i AS VARCHAR(10)))
- INSERT INTO RealTest VALUES ('Daughter Son')
- INSERT INTO RealTest VALUES ('Mother Father Son')
- INSERT INTO RealTest VALUES ('Son Daughter Father')
- INSERT INTO RealTest VALUES ('Mother Son')
- INSERT INTO RealTest VALUES ('Other Word')
- INSERT INTO RealTest VALUES ('Mother Father Daughter Brother Son')
- INSERT INTO RealTest VALUES ('Exclude Mother Father Daughter Brother Son Orphan')
- INSERT INTO RealTest VALUES ('Exclude Mother Father Daughter Brother Son Orphan')
- INSERT INTO RealTest VALUES ('MotherFatherDaughterBrotherSon')
- INSERT INTO RealTest VALUES ('Exclude Mother Father Daughter Son Stepdaughter')
- INSERT INTO RealTest VALUES ('Brother Mother Father Daughter Son Stepdaughter')
- INSERT INTO RealTest VALUES ('Mother Son And Stepson and Daughter and Father and Grandfather')
- INSERT INTO RealTest VALUES ('Daughter Son Family')
- INSERT INTO RealTest VALUES ('Mother Brother Father Daughter Son Orphan')
- INSERT INTO RealTest VALUES ('Son or Daughter or Father')
- INSERT INTO RealTest VALUES ('Mother And Son')
- INSERT INTO RealTest VALUES ('Other Word One More')
- SET @i = @i +1
- END
- INSERT INTO @Keys VALUES ('Mother')
- INSERT INTO @Keys VALUES ('Father')
- INSERT INTO @Keys VALUES ('Son')
- INSERT INTO @Keys VALUES ('Daughter')
- INSERT INTO @WordsToExclude VALUES ('Exclude')
- INSERT INTO @WordsToExclude VALUES ('Stepdaughter')
- SELECT COUNT(*) FROM RealTest
- SET STATISTICS TIME ON
- DECLARE @nAllWords INT
- SELECT @nAllWords = COUNT(*) FROM @Keys
- SELECT MyTable.*
- FROM RealTest MyTable
- INNER JOIN (SELECT MyTable.Id
- FROM RealTest MyTable
- INNER JOIN @Keys KeyWords ON ' ' + MyTable.Searched + ' ' LIKE '% ' + KeyWords.Word + ' %'
- WHERE not exists (SELECT 1 FROM RealTest M INNER JOIN @WordsToExclude W ON ' ' + M.Searched + ' ' LIKE '%' + W.Word + ' %' and M.Id = MyTable.Id)
- GROUP BY MyTable.Id
- HAVING COUNT(DISTINCT(KeyWords.Word)) = @nAllWords) Tbl1 ON MyTable.Id = Tbl1.Id --order by MyTable.Id
- PRINT 'Regular INNER JOIN ' + CAST(@@ROWCOUNT AS VARCHAR(10))
- SELECT x.Id,
- x.Searched
- FROM ( SELECT m.Id,
- m.Searched,
- (SELECT COUNT(*)
- FROM @Keys k
- WHERE ' '+ m.Searched+' ' like '% ' + k.Word + ' %'
- ) AS n, (SELECT COUNT(*) FROM @WordsToExclude W WHERE ' ' + m.Searched + ' ' like '% ' + W.Word + ' %') AS WE
- FROM RealTest m
- ) AS x
- WHERE n=@nAllWords and isnull(WE,0) = 0 --order by x.Id
- PRINT 'Subquery solution ' + CAST(@@ROWCOUNT AS VARCHAR(10))
- SELECT x.Id,
- x.Searched
- FROM ( SELECT m.Id,
- m.Searched,
- ( SELECT COUNT(*)
- FROM @Keys k
- WHERE ' '+ m.Searched+' ' like '% ' + k.Word + ' %'
- ) AS n
- FROM RealTest m
- WHERE not exists ( SELECT *
- FROM @WordsToExclude W
- WHERE ' ' + m.Searched + ' ' like '% ' + W.Word + ' %'
- )
- ) AS x
- WHERE n=@nAllWords
- PRINT 'Optimized Nikola''s solution #1 ' + CAST(@@ROWCOUNT AS VARCHAR(10))
- SELECT m.Id, m.Searched
- FROM RealTest m
- WHERE not exists (SELECT * FROM @WordsToExclude W WHERE ' ' + m.Searched + ' ' like '% ' + W.Word + ' %')
- and 1 = ALL ( SELECT CASE WHEN ' ' + m.Searched + ' ' like '% ' + k.Word + ' %' THEN 1 ELSE 0 END FROM @Keys k)
- PRINT 'Optimized Nikola''s solution #2 ' + CAST(@@ROWCOUNT AS VARCHAR(10))
- SET STATISTICS TIME OFF
with the result on my PC
SQL Server Execution Times:
SQL Server Execution Times:
CPU time = 6599 ms, elapsed time = 9802 ms.
Regular INNER JOIN 79992
SQL Server Execution Times:
CPU time = 4961 ms, elapsed time = 6697 ms.
Subquery solution 79992
SQL Server Execution Times:
CPU time = 5366 ms, elapsed time = 6716 ms.
Optimized Nikola's solution #1 79992
SQL Server Execution Times:
CPU time = 2901 ms, elapsed time = 3670 ms.
Optimized Nikola's solution #2 79992


LTD Social Sitings
Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.