Thursday, September 6, 2012

Searching SQL tips.. Multiple Fields… Phonetically… Without Full Text Index…

Searching Multiple Fields

Full Text Indexing takes up a lot of precious space on a HDD.  The Indexing takes a lot of precious resources to keep the index up to date.  This might not be a good fit for all instances in which you only want to search a couple fields but not the entire table or database.
Let’s take this example into practical means.  We have the Northwind database (Yes, I’m going there again).  We want to be able to search the Company Name, Contact Name, Address, City, or postal code from the Customers table so that we can find a customer (or list of customers) rather quickly.
We could create a site with textboxes for each column and create different searches for each and every field using the LIKE function in SQL.  This seems like a lot of work just to accomplish something rather simple.  I bet there is a better way to accomplish this task.  Let’s take a look.
We’ll start simple by selecting the fields we want:

SELECT [CustomerID]
,[CompanyName]
,[ContactName]
,[ContactTitle]
,[Address]
,[City]
,[PostalCode]
FROM
[Northwind].[dbo].[Customers]


Ahh, easy enough. This however, is a simple select statement and not a search like we want.  This is also where SQL Functions will work for us. Enter the LIKE Function:



SELECT [CustomerID]
,[CompanyName]
,[ContactName]
,[ContactTitle]
,[Address]
,[City]
,[PostalCode]
FROM
[Northwind].[dbo].[Customers]
Where CompanyName LIKE 'be'


If we execute this script, we get no results.  That is because LIKE uses pattern matching.  In this particular case, CompanyName would have to be equal to be in order for us to see results.  So think of the Where clause as saying – Where CompanyName has the pattern of be.  Now you can see why we get no results.  So how do we get it to return results where our pattern is in the field?  There are two WILDCARDS in SQL.  They are the percent sign (%) and the underscore (_). 
The underscore matches any ONE character.  For instance:




  • 1DFG would match _DFG


  • DFG3 would match DFG_ 


  • 1DFG3 would match _DFG_


  • 1DFG would NOT match DFG_



The percent sign matches one OR MORE characters.  For instance:




  • happier would match %pier


  • pierview would match pier%


  • happierview would match %pier%


  • happier would NOT match pier%



 



So if we change our WHERE clause to:



Where CompanyName LIKE '%be'

That would mean that any returns we get would have be as the last two characters in the CompanyName field.  If we change it to:




Where CompanyName LIKE 'be%'

That means be are the first two characters of the CompanyName.  As you can see we get a single return:


image



Finally, if we change it to:



Where CompanyName LIKE '%be%'

That means be needs to be a pattern in the CompanyName field.  Where you will notice that case is not important.  Thus, if we execute we get four (4) returns:


image



Ok so, now that we understand pattern matching, how do we do this on multiple fields?  Since we don’t care how our data in our Where clause looks, we can combine fields and then search those for our pattern:



SELECT [CustomerID]
,[CompanyName]
,[ContactName]
,[ContactTitle]
,[Address]
,[City]
,[PostalCode]
FROM
[Northwind].[dbo].[Customers]
Where
[CompanyName] +
[ContactName] +
[ContactTitle] +
[Address] +
[City] +
[PostalCode]
LIKE '%be%'

Notice if you Execute this, you get 21 rows.  When searching for your pattern, remember it can be in any of the fields (except CustomerID because it is not included in the Where clause).  For instance the second Row – CustomerID BERGS has several patterns that match the criteria, Company Name, Contact Name, and Address all have our pattern in it.



Search Phonetically



Yes, it is possible to search SQL by the sound of words.  For instance: Color would match Colour.  Now SQL did not exactly just grow ears, but Microsoft did a lot of research into languages.  It came up with a pretty simple solution on how to match words that sound the same.  They take out all the vowels, assign some super complex chinese algebra to the constants and WHAM! You can match words phonetically… With a little help from a SQL Function.  Let’s take my name as an example.  Chad.  Simple enough, but you can spell it Chad or Chadd or even Chahd and it still sounds the same.  Let’s apply this logic to SQL:



Select SOUNDEX('chad')
Select SOUNDEX('chadd')
Select SOUNDEX('chahd')

The Soundex function is the function to compare strings phonetically in SQL.  What SQL does is take the first letter of the word and then assigns it a 3 number value based on the before mentioned chinese algebraic super complex algorithm.  So when we execute it, they all equal C300.  Which means they all sound the same.  Let’s change that:




Select SOUNDEX('chad')
Select SOUNDEX('jeff')
Select SOUNDEX('chahd')

Returns


C300


J100


C300


So the first and third values sound the same.  Let’s make it a little more complex:




Select SOUNDEX('chad')
Select SOUNDEX('snuffaluppogaus')
Select SOUNDEX('snuphaluppogaus')

Notice the ff and ph difference between the last two values.  However, we get:


C300


S514


S514


So we can surmise that the last two entries are phonetically the same.



So going back to our Northwinds example, let’s see if we can find Berlin by spelling it phonetically the same as Burlin:



SELECT [CustomerID]
,[CompanyName]
,[ContactName]
,[ContactTitle]
,[Address]
,[City]
,[PostalCode]
FROM
[Northwind].[dbo].[Customers]
Where SOUNDEX('Burlin') = SOUNDEX(City)

And we get one return.


Pretty cool stuff… But wait.. There’s one more magical SQL Function.  It kind of builds on Soundex.  Difference works like Soundex but instead of assigned it some erroneous letter plus value that you can’t understand, it gives it a score.  A score of 1 to 4 to be exact.  1 being, eh, kinda sounds like it to 4 being DUDE! Same word. 
So using my name again:




Select DIFFERENCE('chad', 'chahd')

Which returns a 4, meaning it is the same word – you just suck at spelling.  Let’s go back to our Northwind example:




SELECT [CustomerID]
,[CompanyName]
,[ContactName]
,[ContactTitle]
,[Address]
,[City]
,[PostalCode]
FROM
[Northwind].[dbo].[Customers]
Where DIFFERENCE([City], 'Burlin') > 3

And…. When Executed we end up with the same row that we had before.



Håþþ¥ .ñꆆïñg…

No comments:

Post a Comment