Thursday, January 5, 2012

SSIS–Import Image, XML, and Unzipping

YES! We are going to do all of it in SSIS and all in the same package (Hope you like to type).
We do have a single dependency.  We need to go and download the Ionic.Zip assemblies.  These will facilitate us with unzipping our file and putting it in a directory for us.
You can download the Ionic.Zip Assembly Here.  Unzip the package into a directory where you can find it again because we are going to have to do some background work.  First and foremost we have to add the assembly to the GAC.  Microsoft thought ahead and gave us a tool to facilitate this for us.  Additionally, because Microsoft is not all that and a bag of chips, we have to use the reduced ionic.zip library.  Why? Well because most developers keep pace with current technology.  VS2010 uses .Net Framework 4.0.  However, the latest version of BIDS uses .Net Framework 2.0 – How unfortunate… For us.. Anyways, here’s what you do.

Registering Ionic.Zip to the GAC:
Locate the where your copy of the gacutil is located (Different depending on many factors).  Easiest way to determine this is to go to your C: in file File Explorer and in the search field type gacutil.exe.
Once you have its location, open a command prompt and change your directory to match where the gacutil is.  From there type:
gacutil /I c:\(directory where you unzipped the ionic.zip library and make sure to use the reduced library)\ionic.zip.dll
If you run the command correctly, it will give you notification that it was added successfully.
Ok, Now we can move into BIDS and add an SSIS Project.

The SSIS Project:
The first thing we need to do is add a GLOBAL variable called FileName of type String and no default.  You’ll see where we utilize this later.

Drag and drop a For Each Container to the workspace.  Double Left click it to get into the properties.  Click on the Collections tab on the left.  Now BIDS is rather finicky so you have to physically select Foreach File Enumerator in order to get the properties of it to show below it.
For the Folder, Browse to where you will have the .Zip file physically located.  Also retrieve the name and extension.  Now click the Variable Mappings.  This is where we choose the FileName which turns into User::FileName (Because it is a user created variable).

Now to do the Unzip.

Drag and drop a Script Task and drop it INSIDE the For Each Task.  You should physically see it located inside the For Each Loop.  Now we need to create two more GLOBAL variables.  SourceFolder and DestinationFolder that are of type String and the default for SourceFolder is where you expect the zip file to be located (in relation to the SQL Server) and DestinationFolder is where you want the unzipped files to go (In relation to the SQL Server).  Double Left Click the Script Task to get the properties editor.  You need to add the FileName, DestinationFolder, and SourceFolder variables to the ReadWriteVariables list collection by clicking the … to the right of the field.  Change the Language to VB and then click Edit Script…
Wow, we have already done a full day’s work.  Now your typing skills will be put to the test.
First things first, we need to add a reference to the Ionic.Zip.Reduced Assembly.  So click Project | Add Reference.  Click the Browse tab and then Browse to where you unpacked the Ionic.Zip Library and select the REDUCED Zip Library (Release).  Now – on to the typing.
We need to add an Import (Or we could use strongly typed namespaces, but in this article we’ll make it easy and follow standard practice).
Add the line:
Imports Ionic.Zip
to the Declarations of the script.
From there, this is the rest of our script:

Public Sub Main()



 



        Dim strSourceFile As String



        Dim strDestinationDirectory As String



 



        'MsgBox("Current File: " & Dts.Variables("Filename").Value.ToString)



 



        strDestinationDirectory = Dts.Variables("DestinationFolder").Value.ToString



        strSourceFile = Dts.Variables("SourceFolder").Value.ToString & "\" & Dts.Variables("Filename").Value.ToString



 



        Dim zip As ZipFile = ZipFile.Read(strSourceFile)



        For Each E As ZipEntry In zip



            E.Extract(strDestinationDirectory, ExtractExistingFileAction.OverwriteSilently)



        Next



 



 



        zip.Dispose()



        Dts.TaskResult = ScriptResults.Success



    End Sub




Congratulations, we have just unzipped ALL Zip files that are located in the SourceFolder.  Ofcourse, with a little manipulation you can make it where it only unzips a single file or certain files (remove the For Each Loop).



The important part of the script is that we dispose of it before we exit the Sub.  If we fail to do this regardless if SQL Job Agent is running the package or not, you will run into sharing violations because it will not have released the file.



From there we want to add another For Each Loop container.  Once again you have to physically choose Foreach File Enumerator in the Collections portion.  Set the directory to the directory we just unzipped in and files of type *.xml.  Again we will click the radio button on Files and Extension.  In the Variable mappings tab, we want to map our User::xmlFilename to Index 0.  This will populate our filename so that we can manipulate the file later.



Inside the Foreach Loop Container add a Script Task.  Set it to VB like before and and create variables for each node that you want to read from the XML file.



These are our Imports:





Imports System



Imports Microsoft.SqlServer.Dts.Runtime



Imports System.Xml



Imports System.Data



Imports System.Data.SqlClient



Imports System.IO




And this is our script (In this example I am reading in checking data and an actual image of a cashed check into the database (the check is of .tif type):





Public Sub Main()



        Dts.Variables("xmlFullPath").Value = Dts.Variables("DestinationFolder").Value.ToString & "\" & Dts.Variables("xmlFilename").Value.ToString



        Dim xmlDoc As New XmlDocument



        xmlDoc.Load(Dts.Variables("xmlFullPath").Value.ToString())



        'Dim xmlReader As XmlTextReader = New XmlTextReader(Dts.Variables("xmlFullPath").Value.ToString())



        Dim n As XmlNode



        For Each n In xmlDoc.SelectNodes("MetaData/Header/Col")



            Dts.Variables(n.Attributes("name").Value.ToString()).Value = n.Attributes("value").Value.ToString()



            'MsgBox(n.Attributes("name").Value & ", " & n.Attributes("value").Value)



            'MsgBox(Dts.Variables(n.Attributes("name").Value.ToString()).Name.ToString() & ", " & Dts.Variables(n.Attributes("name").Value.ToString()).Value.ToString())



        Next



        For Each n In xmlDoc.SelectNodes("MetaData/Detail/Col")



            Dts.Variables(n.Attributes("name").Value.ToString()).Value = n.Attributes("value").Value.ToString()



        Next



        For Each n In xmlDoc.SelectNodes("MetaData/FileName")



            Dts.Variables("ImageFileName").Value = n.Attributes("value").Value.ToString()



        Next



        Dts.Variables("ImageFileFullPath").Value = Dts.Variables("DestinationFolder").Value.ToString() & "\" & Dts.Variables("ImageFileName").Value.ToString()



 



 



        'MsgBox(Dts.Variables("xmlFilename").Value.ToString())



        'MsgBox(Dts.Variables("ImageFileFullPath").Value.ToString())



        Dim imageData As Byte() = getImageData()



        InsertIntoDatabase(imageData)



 



        Dts.TaskResult = ScriptResults.Success



    End Sub



    Public Sub InsertIntoDatabase(ByVal imageData As Byte())



        'Dim cm As ConnectionManager = Dts.Connections("ga0sql01p.tcm_dm")



        Dim conn As SqlConnection = New SqlConnection(Dts.Variables("ConnectionString").Value.ToString())



        'Dim conn As SqlConnection = New SqlConnection(cm.AcquireConnection(Dts.Transaction).ToString())



        Dim cmd As New SqlCommand("dbo.usp_paperhost_data_insert", conn)



        cmd.CommandType = CommandType.StoredProcedure



 



        cmd.Parameters.Add(New SqlParameter("@FileObject_ID", Dts.Variables("FileObject_Id").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@LockBoxNumber", Dts.Variables("LockBoxNumber").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@DepositDateTime", Dts.Variables("DepositDateTime").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@BatchNumber", Dts.Variables("BatchNumber").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@TransactionID", Dts.Variables("TransactionID").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@CheckSequence", Dts.Variables("CheckSequence").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@CheckAmount", Dts.Variables("CheckAmount").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@RT", Dts.Variables("RT").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@AccountNumber", Dts.Variables("AccountNumber").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@SerialNumber", Dts.Variables("SerialNumber").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@BatchAmount", Dts.Variables("BatchAmount").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@CheckCount", Dts.Variables("CheckCount").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@CheckPostmarkDate", Dts.Variables("CheckPostmarkDate").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@ReaderDocumentSource", Dts.Variables("ReaderDocumentSource").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@GlobalBatchID", Dts.Variables("GlobalBatchID").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@RemitterName", Dts.Variables("RemitterName").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@RemitterID", Dts.Variables("RemitterID").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@IndexID", Dts.Variables("Indexid").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@Amount", Dts.Variables("Amount").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@Serial", Dts.Variables("Serial").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@Remitter", Dts.Variables("Remitter").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@Filename", Dts.Variables("ImageFileName").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@InvoiceNumber", Dts.Variables("InvoiceNumber").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@PHTransferred", Dts.Variables("PHTransferred").Value.ToString()))



        cmd.Parameters.Add(New SqlParameter("@Account", Dts.Variables("Account").Value.ToString()))



        cmd.Parameters.AddWithValue("@Image", SqlDbType.Binary).Value = imageData



        conn.Open()



        cmd.ExecuteNonQuery()



        conn.Close()



    End Sub



    Public Function getImageData() As Byte()



        Dim st As New FileStream(Dts.Variables("ImageFileFullPath").Value.ToString(), FileMode.Open)



        Dim buffer As Byte() = New Byte(st.Length - 1) {}



        st.Read(buffer, 0, DirectCast(st.Length, Long))



        st.Close()



        Return buffer



    End Function




In Main() we are reading the xml document by node branch.  So in this case, the actual xml document has a standard like so:





<?xml version="1.0" encoding="UTF-8">



    <MetaData>



        <Header>        



            <Col name="FileObject_ID" value="1" />



            <Col name="CheckSequence" value="1" />



            <Col name="DepositDateTime" value="11/04/2011 00:00:00" />



            ...



        </Header>



        <Detail>



            <Col name="Amount" value="1109.1100" />



            <Col name="CurrencyType" value="USD" />



            ...



        </Detail>



        <FileName value="checkNumber_1_1_image.tif" />



    </MetaData>


















So what it does is read the Col name and finds the variable for it with the exact same name and then populates it with the value column.



From there we read in our image that is defined in the FileName field into a ByteStream and finally pass it to the InsertIntoDatabase Sub. 
It’s important to note that you must have the User defined variables defined at the ForEach Loop Container scope.  Then in the Script Task have them set as ReadWriteVariables on the Script Tab.  Otherwise you will get collection errors.


Now that our data is housed in the database, we will need to do some housekeeping type chores.



Drag a System File Task inside the same ForEach Loop Container and set it to CopyFile in its operation parameter.  Set the IsDestinationPathVariable/IsSourcePathVariable to true and populate the SourceVariable/DestinationVariable to the full path of the file.



Drag another System File Task OUTSIDE of the ForEachLoop Container and set it to delete the directory contents.  Set the variable to the source directory and…….. We’re done.



 



So here we have unzipped a file, read the xml and image file into a sql database, and finally cleaned out the directories after we were done.  Quite a bit of work.



Happy .Netting… Saleh

1 comment:

  1. I needed to post you this little bit of observation to finally thank you so much as before regarding the lovely principles you have contributed above. This has been really incredibly open-handed with people like you to supply extensively just what most people could have supplied for an ebook in making some cash for themselves, notably considering that you might have done it if you desired. The smart ideas in addition worked like a fantastic way to know that other people online have the same keenness just as my very own to grasp somewhat more on the subject of this issue. I am sure there are many more pleasurable opportunities up front for those who view your blog.
    Avid Disc Brake Bleed Kit

    ReplyDelete