SQL Server Integration Services can read XML files, that’s known by every BI developer.
his first example is a really simple XML file containing a list of colors with their corresponding RGB code.
Let’s import this into a database. Open up the BIDS, create an SSIS project and throw a Data Flow Task into the package and open it up.
The component that we’re now most interested in is the XML Source, one of the components in the Data Flow Sources category in the Toolbox.
Add one of those to your Data Flow and double-click it to open up the XML Source Editor.
The Data Access Mode should be set to XML file location, which is the default setting. The other options are XML file from variable – useful if you’ve got the file path and name of the XML file in a variable
As XML Location, select the .xml file. Our XML sample does not have an inline schema, so we can’t use that checkbox. And we can’t click the OK button either, it’s grayed out. The source component really expects a description of the XML structure before the editor can be closed.
The bottom of the screen even shows a warning with the following message:
XML Schema (XSD) is not specified. Select an existing XSD or click Generate XSD to create an XSD from the XML file.
So, what are you waiting for, Click the Generate XSD button to let the XML Source Editor generate the XSD schema for us. Real easy, right?
Remember where you save the file, and when it’s generated, select the .xsd file in the XSD location textbox. As you can see, the OK button will become available. But don’t click it just yet.
Here’s what the XML Source Editor now looks like:
Let’s now move on to the second page of the XML Source Editor, called Columns. When you open it, you’ll receive the following popup with a couple of warnings:
The editor is letting us know that the columns that are being generated do not have a maximum length specified. So it’s setting them to Unicode (DT_WSTR) with a length of 255. Click the OK button to get rid of that message and to be able to see the generated columns.
As you can see, our only attribute – RGB, is nicely put in a column with the same name. The value of each <color> node however is not put in a column called Color. By default, this value is put into a column called “text”. Which is a weird name for a column in an SSIS data flow if you ask me. The good thing is that you can just rename it by changing the Output Column value.
Let’s test this out. My favorite way is to add a Multicast component to the Data Flow, then add a Data Viewer on the connector (right-click the green arrow, select Data Viewers, click Add > OK > OK). Now execute the package to get this result:
Mission accomplished, we’ve retrieved data from a very basic XML file!