Wednesday, October 10, 2012

Another look at Comparisons between two SQL tables, columns, and rows.

Hack the Gibson
In a previous article, I showed how to compare two complete SQL tables (article can be found here).  What if you only wanted to compare certain fields and wanted to know the difference of only those fields.  For instance, we wanted a report from the Customers table and Order Information table to see what the difference is between customer addresses and shipping addresses.  Typically, a good DBA would have the more prominent customers table updating the shipping table or even better a good developer would perform a join from the shipping table with a primary key to the customer table and get the address that way or create a view that automatically updates, thus there is nothing to keep in sync. – However, that won’t do for my hypothetical lookup.  So, in this example we have two tables that are supposed to have exactly the same customer data (address = shipping address) and we want to look at the differences.  To further complicate this, we want it in report format that will tell us the customer ID, customer name, field that is different, and the value from each table so we can decide which one is correct.  Sound like a lot?  It’s actually easier in code than it is to write what we want.
Assumptions:
We need to make an assumption here.  The report will only include accounts that are present in both tables.  However, with a little Left Outter Join magic you could easily remove this assumption and list accounts that do not exist in both tables (beyond the scope of this article).
Psuedocode:
What we will do, is create a temporary table to house the account information, customer name, name of field that is different, and the values of both fields in two separate columns.  Then we will select those records that are different.  So essentially our code will look like:
Select
        accountID
    ,    customername
    ,    fieldname
    ,    valueoftableA
    ,    valueoftableB
From customerTable cust
    Inner Join shippingTable ship 
        On cust.accountID = ship.accountID

To populate the temporary table.  To select our final results, we will do a query like so:




Select
        accountID        As    'Account #'
    ,    customername    As    'Customer Name'
    ,    fieldname        As    'Field With Difference'
    ,    valueoftableA    As    'Value of Customers Table'
    ,    valueoftableB    As    'Value of Shipping Table'From 
    temporarytable
Where
    valueoftableA <> valueoftableB
-- Can also be written as valueoftableA != valueoftableBOrder By accountID, customername

So this is our psuedocode to accomplish our stated task.  Let’s look at the code in a real example.

Practical Example:


First we need to create our temporary table:




Create Table
        #tempAddresses
        (
                AccountID        Int
            ,    CustName        Varchar(100)
            ,    FieldName        Varchar(50)
            ,    CustTableValue    Varchar(250)
            ,    ShipTableValue    Varchar(250)
        )

Next, we need to Insert the values from both tables into the temporary tables.  We will need to do this for each field we want to look at.  Our example would look like:




/* Street Address */Insert
    Into
        #tempAddresses
            Select
                    cust.AccountID
                ,    cust.customerName
                ,    'Street Address'
                ,    cust.StreetAddress
                ,    ship.StreetAddress
            From
                dbo.customers cust
            Inner Join
                dbo.shipping ship
                    On ship.AccountID = cust.AccountID
/* City */Insert
    Into
        #tempAddresses
            Select
                    cust.AccountID
                ,    cust.customerName
                ,    'City'
                ,    cust.City
                ,    ship.City
            From
                dbo.customers cust
            Inner Join
                dbo.shipping ship
                    On ship.AccountID = cust.AccountID
/* State */Insert
    Into
        #tempAddresses
            Select
                    cust.AccountID
                ,    cust.customerName
                ,    'State'
                ,    cust.[State]
                ,    ship.[State]
            From
                dbo.customers cust
            Inner Join
                dbo.shipping ship
                    On ship.AccountID = cust.AccountID
/* Zip */Insert
    Into
        #tempAddresses
            Select
                    cust.AccountID
                ,    cust.customerName
                ,    'Zip'
                ,    cust.Zip
                ,    ship.Zip
            From
                dbo.customers cust
            Inner Join
                dbo.shipping ship
                    On ship.AccountID = cust.AccountID


Now our table should be populated with the information that we want to compare.  Thus, we need to extract (or Select) the data that is different:



/* Select Different Records */Select
        AccountID        As 'Account #'
    ,    CustName        As 'Customer Name'
    ,    FieldName        As 'Field'
    ,    CustTableValue    As 'Customers Value'
    ,    ShipTableValue    As 'Shipping Value'From 
    #tempAddresses
Where
    CustTableValue <> ShipTableValue
-- Also written as CustTableValue != ShipTableValueOrder By 
    AccountID, CustName

Of course our last step is to perform a little cleanup and remove the temp table.  Failure to do so will keep the table in the system and will cause errors in the future:




/* Perform CleanUp */Drop
    Table
        #tempAddresses

Results:


What we eventually end up with is a list by Account Number, Customer Name that tells us the field name and values of both tables – but only if they are different, like so:
























    







Account # Customer Name Field Customers Value Shipping Value
ABC123 John Doe City Atlanta Dallas
DEF456Jane Smith Zip 90210 90120


Most likely, if the Street Address, City, State, or Zip is different, they are all different but now you are just being picky and need to stay within the concept of my hypothetical problem.  Work with me here..  I’m not responsible for data integrity, just to provide a method so that you can see where you screwed it up.


At any rate, we can now drop the dataset (or return set) into a report, gridview… etc…


Entire Code:


This is the code in its entirety:




Create Table
        #tempAddresses
        (
                AccountID        Int
            ,    CustName        Varchar(100)
            ,    FieldName        Varchar(50)
            ,    CustTableValue    Varchar(250)
            ,    ShipTableValue    Varchar(250)
        )/* Street Address */Insert
    Into
        #tempAddresses
            Select
                    cust.AccountID
                ,    cust.customerName
                ,    'Street Address'
                ,    cust.StreetAddress
                ,    ship.StreetAddress
            From
                dbo.customers cust
            Inner Join
                dbo.shipping ship
                    On ship.AccountID = cust.AccountID
/* City */Insert
    Into
        #tempAddresses
            Select
                    cust.AccountID
                ,    cust.customerName
                ,    'City'
                ,    cust.City
                ,    ship.City
            From
                dbo.customers cust
            Inner Join
                dbo.shipping ship
                    On ship.AccountID = cust.AccountID
/* State */Insert
    Into
        #tempAddresses
            Select
                    cust.AccountID
                ,    cust.customerName
                ,    'State'
                ,    cust.[State]
                ,    ship.[State]
            From
                dbo.customers cust
            Inner Join
                dbo.shipping ship
                    On ship.AccountID = cust.AccountID
/* Zip */Insert
    Into
        #tempAddresses
            Select
                    cust.AccountID
                ,    cust.customerName
                ,    'Zip'
                ,    cust.Zip
                ,    ship.Zip
            From
                dbo.customers cust
            Inner Join
                dbo.shipping ship
                    On ship.AccountID = cust.AccountID
/* Select Different Records */Select
        AccountID        As 'Account #'
    ,    CustName        As 'Customer Name'
    ,    FieldName        As 'Field'
    ,    CustTableValue    As 'Customers Value'
    ,    ShipTableValue    As 'Shipping Value'From 
    #tempAddresses
Where
    CustTableValue <> ShipTableValue
-- Also written as CustTableValue != ShipTableValueOrder By 
    AccountID, CustName
/* Perform CleanUp */Drop
    Table
        #tempAddresses

Hack the Gibson

Håþþ¥ .ñꆆïñg…

No comments:

Post a Comment