Monday, October 29, 2012

Random Rows or Records from SQL for a specific number of rows.

Hack the Gibson

Unfortunately, SQL does not create random things very well.  Thus, selecting a row or record at random is not as straight forward as one would like to think.  So how does one get a random row or record from the database.

Why would we want to?
Let’s take an example where you want to display a random tip on your website.  Perhaps you own an auto mechanic site and want to display some tips for maintaining your car.  I actually own a Flight School, so one of my sites displays random question/answer for best practices of flying.  Lastly, perhaps you want to display one of your top 10 read posts on your blog.  Have it randomly select a record from your top ten all time greatest posts.  There are many reasons on why you might want to randomly select a record.

How do we do it?
Well we have to actually trick SQL into generating a random number and then picking from that number.  We also have to decide how many rows we want returned.  This is for obvious reasons, but how we accomplish the number of rows we want is actually in a different part of the select statement from where we tell SQL to select a random row.  Let’s take a look at the number of rows to return first, as it is simpler.

Return a specific number of rows
Let’s fire up the old Northwinds database and use the Customers table.  If we were to write the SQL statement in English it would look like -
Select the top five from the customers table and include the CustomerID, CompanyName, and ContactName columns.  That’s simple enough – lets look at the code:

SELECT 
TOP
5
[CustomerID]
, [CompanyName]
, [ContactName]
FROM [Northwind].[dbo].[Customers]

Now if we execute this we get -


image



Keywords here are Top and the number 5.  Suppose instead of 5 we put 10.  We get 10 records.


A little talked about secret of SQL is that it is far better at working with recordsets than the .Net Framework is.  So if you are looking at massaging the recordset as a whole, you want to do it in SQL rather than the framework.  My own rule of thumb is “If I can do it in SQL”…. I do.  Point blank.


So how do we get a single row?  That’s right, we say Top 1 in the select statement.


Additionally, we can add a where clause to limit the scope of the Top x records:




SELECT 
TOP
5
[CustomerID]
, [CompanyName]
, [ContactName]
FROM [Northwind].[dbo].[Customers]
Where Country = 'USA'

Notice that my Where clause contains a column that is not in my select statement.  This is because regardless of the select criteria, SQL will gather the entire record for the records that meet the where (or having) clause from an already created recordset that was created from the entire table.  It will then start removing records that does not fit the where clause.  From there it will remove the columns that have not been selected, and finally the user is presented with the recordset.



The GUID…


Remember how I said that SQL is not all that efficient in creating random numbers?  Well it actually, does extremely well at creating one specific uniquely random number called a GUID.  It is pronounced – goo-wee-eye-dee. 
What is a GUID?


A GUID is a 128-bit value that is displayed as 32 hexadecimal digits grouped by hyphens.  A GUID actually contains 6 fixed parameter bits and 122 random bits.  The total number of possible GUID combinations is 2122 which is a number so incredibly great, big, huge, gigantically large that the chances of randomly producing two identical numbers are nearly impossible (and if you do, you need to play the lottery). 



How do we produce a GUID?

SQL has a special function for this.  Specifically the NewID() function.  So, in SSMS type in NewID() like so:




Select NEWID()

You end up with a generated GUID -


image



Select a random record

Taking our new found function that we just learned about (NewID()) and remembering how specifically, SQL selects recordsets – we can easily apply the function so that it will randomly generate a number and attach it to the record, select one of those records, and then dismiss the generated number.  We do this by using the Order By clause.  In the stack SQL determines the source from the From statement, then moves to the Where clause to eliminate records, then sorts the list by the Order By clause and finally moves to the select statement to select the columns you asked for, like so:




SELECT 
TOP
1
[CustomerID]
, [CompanyName]
, [ContactName]
FROM [Northwind].[dbo].[Customers]
Where Country = 'USA'
Order By NEWID()

Notice I have a Top 1 statement.  This should only give me one record where we had already established that there were at least 5 in our previous statement.  Now if we execute it, it will give us one generated recordset -


image



Continuously press Execute.  It will randomly pick records.  Why then, do records reappear? Well remember that we are generating random numbers into a temp column and then selecting the top 1 of that.  So if the first record has a 123 it will obviously beat out a 567.  But what if it is assigned 126?  It still beats out 567.  The point is that just because it appears twice in a row does not mean that SQL is assigning it the same random number, just means that the number it was assigned beat out all the others twice in a row.

The less number of rows you have to pick the top 1 from, the more often the 1 will reappear.  For instance, if you have 1 record – it will naturally appear all the time randomly.


If you have two rows, the law of averages suggest it should appear half the time.


Three rows?  Well it should appear a third of the time.


Does it always work out this way?  No. But it is just a rule of thumb.



Additional Notes

Just for S&G’s, you can also create GUIDs in the .Net Framework.




Public Function createGUID() As Guid
createGUID = New Guid
End Function



Or in long form… It would look like:




Public Function createGUID() As Guid
Dim newID As Guid = New Guid
Return newID
End Function




Håþþ¥ .ñꆆïñg…

Hack the Gibson

No comments:

Post a Comment