
The Guru's Guide to SQL Server Architecture and Internals

Last Updated 2/3/2009 3:43:01 PM


Abstract
This chapter from "The Guru's Guide to SQL Server Architecture and Internals" gives you an architectural and a practical-use overview of XML for SQL Server (SQLXML). You'll find out how the SQLXML technologies are designed and how they fit together, and you'll learn about practical applications such as using OPENXML, accessing SQL Server over HTTP, and using URL queries.



The key to everything is happiness. Do what you can to be happy in this world. Life is short, too short to do otherwise. The deferred gratification you mention so often is more deferred than gratifying.
— H. W. Kenton

NOTE: This chapter assumes that you're running, at a minimum, SQL Server 2000 with SQLXML 3.0. The SQLXML Web releases have changed and enhanced SQL Server's XML functionality significantly. For the sake of staying current with the technology, I'm covering the latest version of SQLXML rather than the version that shipped with the original release of SQL Server 2000.

This chapter updates the coverage of SQLXML in my last book, The Guru's Guide to SQL Server Stored Procedures, XML, and HTML. That book was written before Web Release 1 (the update to SQL Server 2000's original SQLXML functionality) had shipped. As of this writing, SQLXML 3.0 (which would be the equivalent of Web Release 3 had Microsoft not changed the naming scheme) has shipped, and Yukon, the next version of SQL Server, is about to go into beta test.

This chapter will also get more into how the SQLXML technologies are designed and how they fit together from an architectural standpoint. As with the rest of the book, my intent here is to get beyond the "how to" and into the "why" behind how SQL Server's technologies work.

I must confess that I was conflicted when I sat down to write this chapter. I wrestled with whether to update the SQLXML coverage in my last book, which was more focused on the practical application of SQLXML but which I felt really needed updating, or to write something completely new on just the architectural aspects of SQLXML, with little or no discussion of how to apply them in practice. Ultimately, I decided to do both things. In keeping with the chief purpose of this book, I decided to cover the architectural aspects of SQLXML, and, in order to stay up with the current state of SQL Server's XML family of technologies, I decided to update the coverage of SQLXML in my last book from the standpoint of practical use. So, this chapter updates what I had to say previously about SQLXML and also delves into the SQLXML architecture in ways I've not done before.


OVERVIEW

With the popularity and ubiquity of XML, it's no surprise that SQL Server has extensive support for working with it. Like most modern DBMSs, SQL Server regularly needs to work with and store data that may have originated in XML. Without this built-in support, getting XML to and from SQL Server would require the application developer to translate XML data before sending it to SQL Server and again after receiving it back. Obviously, this could quickly become very tedious given the pervasiveness of the language. SQL Server is an XML-enabled DBMS. This means that it can read and write XML data. It can return data from databases in XML format, and it can read and update data stored in XML documents. As Table 18.1 illustrates, out of the box, SQL Server's XML features can be broken down into eight general categories.

We'll explore each of these in this chapter and discuss how they work and how they interoperate.


MSXML

SQL Server uses Microsoft's XML parser, MSXML, to load XML data, so we'll begin our discussion there. There are two basic ways to parse XML data using MSXML: using the Document Object Model (DOM) or using the Simple API for XML (SAX). DOM is a W3C standard; SAX is a de facto standard that grew out of the XML-DEV mailing list rather than the W3C. The DOM method involves parsing the XML document and loading it into a tree structure in memory. The entire document is materialized and stored in memory when processed this way. An XML document parsed via DOM is known as a DOM document (or just "DOM" for short). XML parsers provide a variety of ways to manipulate DOM documents. Listing 18.1 shows a short Visual Basic app that demonstrates parsing an XML document via DOM and querying it for a particular node set. (You can find the source code to this app in the CH18\msxmltest subfolder on the CD accompanying this book.)

Listing 18.1
Private Sub Command1_Click()
 
  Dim bstrDoc As String

  bstrDoc = "<Songs> " & _
                "<Song>One More Day</Song>" & _
                "<Song>Hard Habit to Break</Song>" & _
                "<Song>Forever</Song>" & _
                "<Song>Boys of Summer</Song>" & _
                "<Song>Cherish</Song>" & _
                "<Song>Dance</Song>" & _
                "<Song>I Will Always Love You</Song>" & _
                "</Songs>"
				   
  Dim xmlDoc As New DOMDocument30
  
  If Len(Text1.Text) = 0 Then
    Text1.Text = bstrDoc
  End If

  If Not xmlDoc.loadXML(Text1.Text) Then
    MsgBox "Error loading document"
  Else
    Dim oNodes As IXMLDOMNodeList
    Dim oNode As IXMLDOMNode
    Dim sName As String
    Dim sData As String
	
    If Len(Text2.Text) = 0 Then
      Text2.Text = "//Song"
    End If
    Set oNodes = xmlDoc.selectNodes(Text2.Text)
	
    For Each oNode In oNodes
      If Not (oNode Is Nothing) Then
        sName = oNode.nodeName
        sData = oNode.xml
        MsgBox "Node <" + sName + ">:" _
            + vbNewLine + vbTab + sData + vbNewLine
      End If
    Next
	
    Set xmlDoc = Nothing
  End If
End Sub

We begin by instantiating a DOMDocument object, then call its loadXML method to parse the XML document and load it into the DOM tree. We call its selectNodes method to query it via XPath. The selectNodes method returns a node list object, which we then iterate through using For Each. In this case, we display each node name followed by its contents via VB's MsgBox function. We're able to access and manipulate the document as though it were an object because that's exactly what it is: parsing an XML document via DOM turns the document into a memory object that you can then work with just as you would any other object.

SAX, by contrast, is an event-driven API. You process an XML document via SAX by configuring your application to respond to SAX events. As the SAX processor reads through an XML document, it raises events each time it encounters something the calling application should know about, such as an element starting (along with that element's attributes) or ending, a run of character data, and so on. It passes the relevant data about the event to the application's handler for the event. The application can then decide what to do in response: it could store the event data in some type of tree structure, as is the case with DOM processing; it could ignore the event; it could search the event data for something in particular; or it could take some other action. Once the event is handled, the SAX processor continues reading the document. At no point does it persist the document in memory as DOM does. It's really just a parsing mechanism to which an application can attach its own functionality. In fact, SAX is the underlying parsing mechanism for MSXML's DOM processor. Microsoft's DOM implementation sets up SAX event handlers that simply store the data handed to them by the SAX engine in a DOM tree.
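To make the event-driven model concrete, here is a minimal sketch of a SAX handler. It is written in Python against the standard library's xml.sax module rather than in Visual Basic against MSXML, so treat it as an illustration of the pattern, not of MSXML's API. The SongHandler class and the collect_songs function are names I made up for this example. The handler keeps only the song titles it is interested in; no document tree is ever built.

```python
import xml.sax


class SongHandler(xml.sax.ContentHandler):
    """Collects the text of every <Song> element as SAX events arrive."""

    def __init__(self):
        super().__init__()
        self.songs = []        # finished song titles
        self._buffer = None    # text pieces of the <Song> being read, if any

    def startElement(self, name, attrs):
        # Raised when the parser encounters an opening tag.
        if name == "Song":
            self._buffer = []

    def characters(self, content):
        # Raised for character data; may fire several times per element.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        # Raised when the parser encounters a closing tag.
        if name == "Song":
            self.songs.append("".join(self._buffer))
            self._buffer = None


def collect_songs(xml_text):
    """Parse xml_text with SAX and return the list of song titles."""
    handler = SongHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.songs


doc = "<Songs><Song>One More Day</Song><Song>Forever</Song></Songs>"
print(collect_songs(doc))
```

Everything the application wants to remember, it has to store itself in the handler. That is the extra setup work SAX asks of you in exchange for not materializing the document.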

As you've probably surmised by now, SAX consumes far less memory than DOM does. That said, it's also much more trouble to set up and use. By persisting documents in memory, the DOM API makes working with XML documents as easy as working with any other kind of object.

SQL Server uses MSXML and the DOM to process documents you load via sp_xml_preparedocument. It restricts the virtual memory MSXML can use for DOM processing to one-eighth of the physical memory on the machine or 500MB, whichever is less. In actual practice, it's highly unlikely that MSXML would be able to access 500MB of virtual memory, even on a machine with 4GB of physical memory. The reason for this is that, by default, SQL Server reserves most of the user mode address space for use by its buffer pool. You'll recall that we talked about the MemToLeave space in Chapter 11 and noted that the non-thread stack portion defaults to 256MB on SQL Server 2000. This means that, by default, MSXML won't be able to use more than 256MB of memory (and probably considerably less, given that other things are also allocated from this region), regardless of the amount of physical memory on the machine.
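The nominal ceiling described above is simple arithmetic, and a few lines of Python make it easy to see where the 500MB cap starts to matter. This is only a sketch of the stated rule (one-eighth of physical memory or 500MB, whichever is less); the function name msxml_dom_ceiling is my own, and the sketch deliberately ignores the MemToLeave constraint, which lowers the practical limit further.

```python
MB = 1024 * 1024


def msxml_dom_ceiling(physical_bytes, cap_bytes=500 * MB):
    """One-eighth of physical memory or the cap, whichever is less."""
    return min(physical_bytes // 8, cap_bytes)


for gb in (1, 2, 4):
    ceiling = msxml_dom_ceiling(gb * 1024 * MB)
    print(f"{gb}GB physical -> {ceiling // MB}MB nominal ceiling")
```

With 1GB and 2GB of physical memory the rule yields 128MB and 256MB. At 4GB, one-eighth would be 512MB, so the 500MB cap is what applies.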

The reason MSXML is limited to no more than 500MB of virtual memory use regardless of the amount of memory on the machine is that SQL Server calls the GlobalMemoryStatus Win32 API function to determine the amount of available physical memory. GlobalMemoryStatus populates a MEMORYSTATUS structure with information about the status of memory use on the machine. On machines with more than 4GB of physical memory, GlobalMemoryStatus can return incorrect information, so Windows returns a -1 to indicate an overflow. The Win32 API function GlobalMemoryStatusEx exists to address this shortcoming, but SQLXML does not call it. You can see this for yourself by working through the following exercise.

Exercise 18.1: Determining How MSXML Computes Its Memory Ceiling
  1. Restart your SQL Server, preferably from a console since we will be attaching to it with WinDbg. This should be a test or development system, and, ideally, you should be its only user.
  2. Start Query Analyzer and connect to your SQL Server.
  3. Attach to SQL Server using WinDbg. (Press F6 and select sqlservr.exe from the list of running tasks; if you have multiple instances, be sure to select the right one.)
  4. At the WinDbg command prompt, add the following breakpoint:
    bp kernel32!GlobalMemoryStatus
  5. Once the breakpoint is added, type g and hit Enter to allow SQL Server to run.
  6. Next, return to Query Analyzer and run the following query:
    declare @doc varchar(8000)
    set @doc='
    <Songs>
      <Song name="She''s Like the Wind" artist="Patrick Swayze"/>
      <Song name="Hard to Say I''m Sorry" artist="Chicago"/>
      <Song name="She Loves Me" artist="Chicago"/>
      <Song name="I Can''t Make You Love Me" artist="Bonnie Raitt"/>
      <Song name="Heart of the Matter" artist="Don Henley"/>
      <Song name="Almost Like a Song" artist="Ronnie Milsap"/>
      <Song name="I''ll Be Over You" artist="Toto"/>
    </Songs>
    '
    declare @hDoc int
    exec sp_xml_preparedocument @hDoc OUT, @doc
  7. The first time you parse an XML document using sp_xml_preparedocument, SQLXML calls GlobalMemoryStatus to retrieve the amount of physical memory in the machine, then calls an undocumented function exported by MSXML to restrict the amount of virtual memory it may allocate. (I had you restart your server so that we'd be sure to go down this code path.) This undocumented MSXML function is exported by ordinal rather than by name from the MSXMLn.DLL and was added to MSXML expressly for use by SQL Server.
  8. At this point, Query Analyzer should appear to be hung because your breakpoint has been hit in WinDbg and SQL Server has been stopped. Switch back to WinDbg and type kv at the command prompt to dump the call stack of the current thread. Your stack should look something like this (I've omitted everything but the function names):
    KERNEL32!GlobalMemoryStatus (FPO: [Non-Fpo])
    sqlservr!CXMLLoadLibrary::DoLoad+0x1b5
    sqlservr!CXMLDocsList::Load+0x58
    sqlservr!CXMLDocsList::LoadXMLDocument+0x1b
    sqlservr!SpXmlPrepareDocument+0x423
    sqlservr!CSpecProc::ExecuteSpecial+0x334
    sqlservr!CXProc::Execute+0xa3
    sqlservr!CSQLSource::Execute+0x3c0
    sqlservr!CStmtExec::XretLocalExec+0x14d
    sqlservr!CStmtExec::XretExecute+0x31a
    sqlservr!CMsqlExecContext::ExecuteStmts+0x3b9
    sqlservr!CMsqlExecContext::Execute+0x1b6
    sqlservr!CSQLSource::Execute+0x357
    sqlservr!language_exec+0x3e1
  9. You'll recall from Chapter 3 that we discovered that the entry point for T-SQL batch execution within SQL Server is language_exec. You can see the call to language_exec at the bottom of this stack—this was called when you submitted the T-SQL batch to the server to run. Working upward from the bottom, we can see the call to SpXmlPrepareDocument, the internal "spec proc" (an extended procedure implemented internally by the server rather than in an external DLL) responsible for implementing the sp_xml_preparedocument xproc. We can see from there that SpXmlPrepareDocument calls LoadXMLDocument, LoadXMLDocument calls a method named Load, Load calls a method named DoLoad, and DoLoad calls GlobalMemoryStatus. So, that's how we know how MSXML computes the amount of physical memory in the machine, and, knowing the limitations of this function, that's how we know the maximum amount of virtual memory MSXML can use.
  10. Type q and hit Enter to quit WinDbg. You will have to restart your SQL Server.

FOR XML

Despite MSXML's power and ease of use, SQL Server doesn't leverage MSXML in all of its XML features. It doesn't use it to implement server-side FOR XML queries, for example, even though it's trivial to construct a DOM document programmatically and return it as text. MSXML has facilities that make this quite easy. For example, Listing 18.2 presents a Visual Basic app that executes a query via ADO and constructs a DOM document on-the-fly based on the results it returns.

Listing 18.2
Private Sub Command1_Click()

  Dim xmlDoc As New DOMDocument30
  Dim oRootNode As IXMLDOMNode

  Set oRootNode = xmlDoc.createElement("Root")

  Set xmlDoc.documentElement = oRootNode

  Dim oAttr As IXMLDOMAttribute
  Dim oNode As IXMLDOMNode

  Dim oConn As New ADODB.Connection
  Dim oComm As New ADODB.Command
  Dim oRs As New ADODB.Recordset

  oConn.Open (Text3.Text)
  oComm.ActiveConnection = oConn

  oComm.CommandText = Text1.Text
  Set oRs = oComm.Execute

  Dim oField As ADODB.Field

  While Not oRs.EOF
    Set oNode = xmlDoc.createElement("Row")
    For Each oField In oRs.Fields
      Set oAttr = xmlDoc.createAttribute(oField.Name)
      oAttr.Value = oField.Value
      oNode.Attributes.setNamedItem oAttr
    Next
    oRootNode.appendChild oNode
    oRs.MoveNext
  Wend
  
  oConn.Close
  
  Text2.Text = xmlDoc.xml
  
  Set xmlDoc = Nothing
  Set oRs = Nothing
  Set oComm = Nothing
  Set oConn = Nothing
End Sub

As you can see, translating a result set to XML doesn't require much code. The ADO Recordset object even supports being streamed directly to an XML document (via its Save method), so if you don't need complete control over the conversion process, you might be able to get away with even less code than in my example.

As I've said, SQL Server doesn't use MSXML or build a DOM document in order to return a result set as XML. Why is that? And how do we know that it doesn't use MSXML to process server-side FOR XML queries? I'll answer both questions in just a moment.

The answer to the first question should be pretty obvious. Building a DOM from a result set before returning it as text would require SQL Server to persist the entire result set in memory. Given that the memory footprint of the DOM version of an XML document is roughly three to five times as large as the document itself, this doesn't paint a pretty resource usage picture. If they had to first be persisted entirely in memory before being returned to the client, even moderately large FOR XML result sets could use huge amounts of virtual memory (or run into the MSXML memory ceiling and therefore be too large to generate).

To answer the second question, let's again have a look at SQL Server under a debugger.

Exercise 18.2: Determining Whether Server-Side FOR XML Uses MSXML
  1. Restart your SQL Server, preferably from a console since we will be attaching to it with WinDbg. This should be a test or development system, and, ideally, you should be its only user.
  2. Start Query Analyzer and connect to your SQL Server.
  3. Attach to SQL Server using WinDbg. (Press F6 and select sqlservr.exe from the list of running tasks; if you have multiple instances, be sure to select the right one.) Once the WinDbg command prompt appears, type g and press Enter so that SQL Server can continue to run.
  4. Back in Query Analyzer, run a FOR XML query of some type:
    SELECT * FROM (
    SELECT 'Summer Dream' as Song
    UNION
    SELECT 'Summer Snow'
    UNION
    SELECT 'Crazy For You'
    ) s FOR XML AUTO
    This query unions some SELECT statements together, then queries the union as a derived table using a FOR XML clause.
  5. After you run the query, switch back to WinDbg. You will likely see some ModLoad messages in the WinDbg command window. WinDbg displays a ModLoad message whenever a module is loaded into the process being debugged. If MSXMLn.DLL were being used to service your FOR XML query, you'd see a ModLoad message for it. As you've noticed, there isn't one. MSXML isn't used to service FOR XML queries.
  6. If you've done much debugging, you may be speculating that perhaps the MSXML DLL is already loaded; hence, we wouldn't see a ModLoad message for it when we ran our FOR XML query. That's easy enough to check. Hit Ctrl+Break in the debugger, then type lm in the command window and hit Enter. The lm command lists the modules currently loaded into the process space. Do you see MSXMLn.DLL in the list? Unless you've been interacting with SQL Server's other XML features since you recycled your server, it should not be there. Type g in the command window and press Enter so that SQL Server can continue to run.
  7. As a final test, let's force MSXMLn.DLL to load by parsing an XML document. Reload the query from Exercise 18.1 above in Query Analyzer and run it. You should see a ModLoad message for MSXML's DLL in the WinDbg command window.
  8. Hit Ctrl+Break again to stop WinDbg, then type q and hit Enter to stop debugging. You will need to restart your SQL Server.
So, based on all this, we can conclude that SQL Server generates its own XML when it processes a server-side FOR XML query. There is no memory-efficient mechanism in MSXML to assist with this, so it is not used.
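To illustrate why generating XML directly is so much cheaper than building a DOM first, here is a small Python sketch of the streaming approach. It is not SQL Server's implementation, which isn't shown here; it is just the general technique of emitting one element per row as the row arrives. The function name stream_rows_as_xml and the dictionary-per-row input format are assumptions of mine.

```python
from xml.sax.saxutils import quoteattr


def stream_rows_as_xml(rows, element_name="row"):
    """Yield one XML element per row without building the whole document."""
    for row in rows:
        # quoteattr escapes the value and wraps it in quotes.
        attrs = " ".join(
            f"{column}={quoteattr(str(value))}" for column, value in row.items()
        )
        yield f"<{element_name} {attrs}/>"


# A generator stands in for a result set that is produced row by row.
songs = ({"Song": title} for title in ("Summer Dream", "Summer Snow", "Crazy For You"))

for chunk in stream_rows_as_xml(songs):
    print(chunk)   # each chunk can be sent to the client immediately
```

Because both the input and the output are generators, only one row is held in memory at a time, no matter how large the result set is. A DOM-based version would have to hold every row until the document was complete.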


USING FOR XML

As you saw in Exercise 18.2, you can append FOR XML AUTO to the end of a SELECT statement in order to cause the result to be returned as an XML document fragment. Transact-SQL's FOR XML syntax is much richer than this, though: it supports several options that extend its usefulness in numerous ways. In this section, we'll discuss a few of these and work through examples that illustrate them.

SELECT...FOR XML (Server-Side)


As I'm sure you've already surmised, you can retrieve XML data from SQL Server by using the FOR XML option of the SELECT command. FOR XML causes SELECT to return query results as an XML stream rather than a traditional rowset. On the server-side, this stream can have one of three formats: RAW, AUTO, or EXPLICIT. The basic FOR XML syntax looks like this:
SELECT column list
FROM table list
WHERE filter criteria
FOR XML RAW | AUTO | EXPLICIT [, XMLDATA] [, ELEMENTS]
    [, BINARY BASE64]
RAW returns column values as attributes and wraps each row in a generic row element. AUTO returns column values as attributes and wraps each row in an element named after the table from which it came. EXPLICIT lets you completely control the format of the XML returned by a query.

XMLDATA causes an XML-Data schema to be returned for the document being retrieved. ELEMENTS causes the columns in XML AUTO data to be returned as elements rather than attributes. BINARY BASE64 specifies that binary data is to be returned using BASE64 encoding.

I'll discuss these options in more detail in just a moment. Also note that there are client-side specific options available with FOR XML queries that aren't available in server-side queries. We'll talk about those in just a moment, too.

RAW Mode


RAW mode is the simplest of the three basic FOR XML modes. It performs a very basic translation of the result set into XML. Listing 18.3 shows an example.

Listing 18.3

SELECT CustomerId, CompanyName
FROM Customers FOR XML RAW
(Results abridged)
XML_F52E2B61-18A1-11d1-B105-00805F49916B

------------------------------------------------------------------
<row CustomerId="ALFKI" CompanyName="Alfreds Futterkiste"/><row Cu
CompanyName="Ana Trujillo Emparedados y helados"/><row CustomerId=
CompanyName="Antonio Moreno Taquería"/><row CustomerId="AROUT" Com
Horn"/><row CustomerId="BERGS" CompanyName="Berglunds snabbköp"/><
CustomerId="BLAUS" CompanyName="Blauer See Delikatessen"/><row Cus
CompanyName="Blondesddsl père et fils"/><row CustomerId="WELLI"
CompanyName="Wellington Importadora"/><row CustomerId="WHITC" Comp
Clover Markets"/><row CustomerId="WILMK" CompanyName="Wilman Kala"
CustomerId="WOLZA"
CompanyName="Wolski Zajazd"/>

Each column becomes an attribute in the result set, and each row becomes an element with the generic name of row.

As I've mentioned before, the XML that's returned by FOR XML is not well formed because it lacks a root element. It's technically an XML fragment and must include a root element in order to be usable by an XML parser. From the client side, you can set an ADO Command object's xml root property in order to automatically generate a root node when you execute a FOR XML query.
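You can see the root element problem for yourself with any XML parser. The following Python sketch uses the standard library's xml.dom.minidom in place of MSXML; the parses and wrap_in_root helpers, and the root element name "root", are my own choices for illustration and have nothing to do with ADO's xml root property beyond achieving a similar result.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Two sibling elements with no enclosing root, as FOR XML RAW returns them.
fragment = (
    '<row CustomerId="ALFKI" CompanyName="Alfreds Futterkiste"/>'
    '<row CustomerId="WOLZA" CompanyName="Wolski Zajazd"/>'
)


def parses(xml_text):
    """Return True if xml_text is a well-formed XML document."""
    try:
        xml.dom.minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


def wrap_in_root(xml_fragment, root_name="root"):
    """Enclose a fragment in a single root element (the name is arbitrary)."""
    return f"<{root_name}>{xml_fragment}</{root_name}>"


print(parses(fragment))                 # False: two top-level elements
print(parses(wrap_in_root(fragment)))   # True: one root encloses the rows
```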

AUTO Mode


FOR XML AUTO gives you more control than RAW mode over the XML fragment that's produced. To begin with, each row in the result set is named after the table, view, or table-valued UDF that produced it. For example, Listing 18.4 shows a basic FOR XML AUTO query.

Listing 18.4

SELECT CustomerId, CompanyName
FROM Customers FOR XML AUTO
(Results abridged)
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerId="ALFKI" CompanyName="Alfreds Futterkiste"/><
CustomerId="ANATR" CompanyName="Ana Trujillo Emparedados y helados
CustomerId="ANTON" CompanyName="Antonio Moreno Taquería"/><Custome
Henderson_book.fm Page 682 Thursday, September 25, 2003 5:23 AM
CustomerId="AROUT" CompanyName="Around the Horn"/><Customers Custo
CompanyName="Vins et alcools Chevalier"/><Customers CustomerId="WA
CompanyName="Wartian Herkku"/><Customers CustomerId="WELLI" Compan
Importadora"/><Customers CustomerId="WHITC" CompanyName="White Clo
Markets"/><Customers CustomerId="WILMK" CompanyName="Wilman Kala"/
CustomerId="WOLZA"
CompanyName="Wolski Zajazd"/>

Notice that each row is named after the table from whence it came: Customers. For results with more than one row, this amounts to having more than one top-level (root) element in the fragment, which isn't allowed in XML.

One big difference between AUTO and RAW mode is the way in which joins are handled. In RAW mode, a simple one-to-one translation occurs between columns in the result set and attributes in the XML fragment. Each row becomes an element in the fragment named row. These elements are technically empty themselves: they contain no values or subelements, only attributes. Think of attributes as specifying characteristics of an element, while data and subelements compose its contents. In AUTO mode, each row is named after the source from which it came, and the rows from joined tables are nested within one another. Listing 18.5 presents an example.

Listing 18.5

SELECT Customers.CustomerID, CompanyName, OrderId
FROM Customers JOIN Orders
ON (Customers.CustomerId=Orders.CustomerId)
FOR XML AUTO
(Results abridged and formatted)
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerID="ALFKI" CompanyName="Alfreds Futterkiste">
  <Orders OrderId="10643"/><Orders OrderId="10692"/>
  <Orders OrderId="10702"/><Orders OrderId="10835"/>
  <Orders OrderId="10952"/><Orders OrderId="11011"/>
</Customers>
<Customers CustomerID="ANATR" CompanyName="Ana Trujillo Emparedado
  <Orders OrderId="10308"/><Orders OrderId="10625"/>
  <Orders OrderId="10759"/><Orders OrderId="10926"/></Customers>
<Customers CustomerID="FRANR" CompanyName="France restauration">
  <Orders OrderId="10671"/><Orders OrderId="10860"/>
  <Orders OrderId="10971"/>
</Customers>

I've formatted the XML fragment to make it easier to read; if you run the query yourself from Query Analyzer, you'll see an unformatted stream of XML text.

Note the way in which the Orders for each customer are contained within each Customers element. As I said, AUTO mode nests the rows returned by joins. Note my use of the full table name in the join criterion. Why didn't I use a table alias? Because AUTO mode uses the table aliases you specify to name the elements it returns. If you use shortened monikers for a table, its elements will have that name in the resulting XML fragment. While useful in traditional Transact-SQL, this makes the fragment difficult to read if the alias isn't sufficiently descriptive.
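The nesting that AUTO mode performs on joined rows can be modeled in a few lines of Python. This is my own simplified model, not the server's algorithm: it opens a new Customers element whenever the customer columns change from one row to the next and nests an Orders element for each row inside it. The nest_auto function and the tuple layout of the rows are assumptions made for this sketch.

```python
from itertools import groupby
from xml.sax.saxutils import quoteattr

# Joined rows as (CustomerID, CompanyName, OrderId), already grouped by customer.
rows = [
    ("ALFKI", "Alfreds Futterkiste", 10643),
    ("ALFKI", "Alfreds Futterkiste", 10692),
    ("ANATR", "Ana Trujillo Emparedados y helados", 10308),
]


def nest_auto(rows):
    """Nest Orders elements inside one Customers element per run of equal customers."""
    parts = []
    # groupby merges only adjacent rows, so the input order matters.
    for (cust_id, company), group in groupby(rows, key=lambda r: (r[0], r[1])):
        parts.append(
            f"<Customers CustomerID={quoteattr(cust_id)} "
            f"CompanyName={quoteattr(company)}>"
        )
        for _, _, order_id in group:
            parts.append(f'<Orders OrderId="{order_id}"/>')
        parts.append("</Customers>")
    return "".join(parts)


print(nest_auto(rows))
```

Because the model only compares adjacent rows, feeding it rows that are not grouped by customer would produce more than one Customers element for the same customer.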

ELEMENTS Option


The ELEMENTS option of the FOR XML AUTO clause causes AUTO mode to return nested elements instead of attributes. Depending on your business needs, element-centric mapping may be preferable to the default attribute-centric mapping. Listing 18.6 gives an example of a FOR XML query that returns elements instead of attributes.

Listing 18.6

SELECT CustomerID, CompanyName
FROM Customers
FOR XML AUTO, ELEMENTS
(Results abridged and formatted)
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers>
  <CustomerID>ALFKI</CustomerID>
  <CompanyName>Alfreds Futterkiste</CompanyName>
</Customers>
<Customers>
  <CustomerID>ANATR</CustomerID>
  <CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
</Customers>
<Customers>
  <CustomerID>ANTON</CustomerID>
  <CompanyName>Antonio Moreno Taquería</CompanyName>
</Customers>
<Customers>
  <CustomerID>AROUT</CustomerID>
  <CompanyName>Around the Horn</CompanyName>
</Customers>
<Customers>
  <CustomerID>WILMK</CustomerID>
  <CompanyName>Wilman Kala</CompanyName>
</Customers>
<Customers>
  <CustomerID>WOLZA</CustomerID>
  <CompanyName>Wolski Zajazd</CompanyName>
</Customers>

Notice that the ELEMENTS option has caused what were being returned as attributes of the Customers element to instead be returned as subelements. Each attribute is now a pair of element tags that enclose the value from a column in the table.

NOTE: Currently, AUTO mode does not support GROUP BY or aggregate functions. The heuristics it uses to determine element names are incompatible with these constructs, so you cannot use them in AUTO mode queries. Additionally, FOR XML itself is incompatible with COMPUTE, so you can't use it in FOR XML queries of any kind.


EXPLICIT Mode


If you need more control over the XML that FOR XML produces, EXPLICIT mode is more flexible (and therefore more complicated to use) than either RAW mode or AUTO mode. EXPLICIT mode queries define XML documents in terms of a "universal table": a mechanism for returning a result set from SQL Server that describes what you want the document to look like, rather than composing the document itself. A universal table is just a SQL Server result set with special column headings that tell the server how to produce an XML document from your data. Think of it as a set-oriented method of making an API call and passing parameters to it. You use the facilities available in Transact-SQL to make the call and pass it parameters.

A universal table consists of one column for each table column that you want to return in the XML fragment, plus two additional columns: Tag and Parent. Tag is a positive integer that uniquely identifies each tag that is to be returned by the document; Parent establishes parent-child relationships between tags.

The other columns in a universal table (the ones that correspond to the data you want to include in the XML fragment) have special names that actually consist of multiple segments delimited by exclamation points (!). These special column names pass muster with SQL Server's parser and provide specific instructions regarding the XML fragment to produce. They have the following format:
Element!Tag!Attribute!Directive
We'll see some examples of these shortly.
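If it helps to think of these headings as structured data, the following Python sketch splits one into its four segments. It only models the naming format described above; the ColumnSpec type and the parse_column_name function are hypothetical names of mine, and the sketch treats the attribute and directive segments as optional.

```python
from typing import NamedTuple, Optional


class ColumnSpec(NamedTuple):
    element: str
    tag: int
    attribute: Optional[str]
    directive: Optional[str]


def parse_column_name(name):
    """Split an EXPLICIT mode column heading into its four segments."""
    parts = name.split("!")
    if not 2 <= len(parts) <= 4:
        raise ValueError(f"expected 2 to 4 '!'-delimited segments: {name!r}")
    parts += [None] * (4 - len(parts))   # pad the optional trailing segments
    element, tag, attribute, directive = parts
    # Empty segments are treated the same as missing ones.
    return ColumnSpec(element, int(tag), attribute or None, directive or None)


print(parse_column_name("Customers!1!CustomerId"))
print(parse_column_name("Customers!1"))
print(parse_column_name("Customers!1!ContactName!element"))
```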

The first thing you need to do to build an EXPLICIT mode query is to determine the layout of the XML document you want to end up with. Once you know this, you can work backward from there to build a universal table that will produce the desired format. For example, let's say we want a simple customer list based on the Northwind Customers table that returns the customer ID as an attribute and the company name as an element. The XML fragment we're after might look like this:
<Customers CustomerId="ALFKI">Alfreds Futterkiste</Customers>
Listing 18.7 shows a Transact-SQL query that returns a universal table that specifies this layout.

Listing 18.7
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1]
FROM Customers
(Results abridged)
Tag     Parent  Customers!1!CustomerId Customers!1
------ -------- ---------------------- ---------------------------
1       NULL    ALFKI                  Alfreds Futterkiste
1       NULL    ANATR                  Ana Trujillo Emparedados y
1       NULL    ANTON                  Antonio Moreno Taquería

The first two columns are the extra columns I mentioned earlier. Tag specifies an identifier for the tag we want to produce. Since we want to produce only one element per row, we hard-code this to 1. The same is true of Parent—there's only one element and a top-level element doesn't have a parent, so we return NULL for Parent in every row.

Since we want to return the customer ID as an attribute, we specify an attribute name, CustomerId, in the heading of column 3. And since we want to return CompanyName as an element rather than an attribute, we omit the attribute name in column 4.

By itself, this table accomplishes nothing. We have to add FOR XML EXPLICIT to the end of it in order for the odd column names to have any special meaning. Add FOR XML EXPLICIT to the query and run it from Query Analyzer. Listing 18.8 shows what you should see.

Listing 18.8

SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1]
FROM Customers
FOR XML EXPLICIT
(Results abridged and formatted)
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerId="ALFKI">Alfreds Futterkiste</Customers>
<Customers CustomerId="ANATR">Ana Trujillo Emparedados y helados
  </Customers>
<Customers CustomerId="WHITC">White Clover Markets</Customers>
<Customers CustomerId="WILMK">Wilman Kala</Customers>
<Customers CustomerId="WOLZA">Wolski Zajazd</Customers>

As you can see, each CustomerId value is returned as an attribute, and each CompanyName is returned as the element data for the Customers element, just as we specified.
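To show how the column headings drive the output, here is a deliberately limited Python sketch that renders the universal table from Listing 18.7 as an XML fragment. It handles only the single-level case used so far: it ignores Tag and Parent, supports no directives, and is in no way the server's implementation. The function name render_flat_universal_table is my own.

```python
from xml.sax.saxutils import escape, quoteattr

columns = ["Tag", "Parent", "Customers!1!CustomerId", "Customers!1"]
rows = [
    (1, None, "ALFKI", "Alfreds Futterkiste"),
    (1, None, "WOLZA", "Wolski Zajazd"),
]


def render_flat_universal_table(columns, rows):
    """Render a one-level universal table (no nesting, no directives) as XML."""
    out = []
    for row in rows:
        element, attrs, text = None, [], ""
        # Skip Tag and Parent; this sketch handles only top-level rows.
        for heading, value in zip(columns[2:], row[2:]):
            parts = heading.split("!")
            element = parts[0]
            if value is None:
                continue
            if len(parts) >= 3 and parts[2]:
                # An attribute name was supplied: emit an attribute.
                attrs.append(f"{parts[2]}={quoteattr(str(value))}")
            else:
                # No attribute name: the value becomes the element's content.
                text += escape(str(value))
        attr_text = (" " + " ".join(attrs)) if attrs else ""
        out.append(f"<{element}{attr_text}>{text}</{element}>")
    return "".join(out)


print(render_flat_universal_table(columns, rows))
```

The data rows never change between Listing 18.7 and Listing 18.8; only the interpretation of the headings does.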

Directives
The fourth part of the multivalued column headings supported by EXPLICIT mode queries is the directive segment. You use it to further control how data is represented in the resulting XML fragment. As Table 18.2 illustrates, the directive segment supports eight values.

Of these, element is the most frequently used. It causes data to be rendered as a subelement rather than an attribute. For example, let's say that, in addition to CustomerId and CompanyName, we wanted to return ContactName in our XML fragment and we wanted it to be a subelement rather than an attribute. Listing 18.9 shows how the query would look.

Listing 18.9

SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
ContactName AS [Customers!1!ContactName!element]
FROM Customers
FOR XML EXPLICIT
(Results abridged and formatted)
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerId="ALFKI">Alfreds Futterkiste
  <ContactName>Maria Anders</ContactName>
</Customers>
<Customers CustomerId="ANATR">Ana Trujillo Emparedados y helados
  <ContactName>Ana Trujillo</ContactName>
</Customers>
<Customers CustomerId="ANTON">Antonio Moreno Taquería
  <ContactName>Antonio Moreno</ContactName>
</Customers>
<Customers CustomerId="AROUT">Around the Horn
  <ContactName>Thomas Hardy</ContactName>
</Customers>
<Customers CustomerId="BERGS">Berglunds snabbköp
  <ContactName>Christina Berglund</ContactName>
</Customers>
<Customers CustomerId="WILMK">Wilman Kala
  <ContactName>Matti Karttunen</ContactName>
</Customers>
<Customers CustomerId="WOLZA">Wolski Zajazd
  <ContactName>Zbyszek Piestrzeniewicz</ContactName>
</Customers>

As you can see, ContactName is nested within each Customers element as a subelement. The element directive encodes any special characters in the data it returns. To retrieve the same data without encoding, use the xml directive, as shown in Listing 18.10.

Listing 18.10

SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
ContactName AS [Customers!1!ContactName!xml]
FROM Customers
FOR XML EXPLICIT

The xml directive (bolded) causes the column to be returned without encoding any special characters it contains.
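
The difference between the two directives only shows up when a column actually contains special characters. The query below is a minimal sketch, assuming the Northwind sample database; the string literals and the EncodedNote and RawNote names are made up for illustration and don't exist in Northwind.

```sql
-- Sketch: one value returned with the element directive and a similar
-- one with the xml directive. The literals and the EncodedNote/RawNote
-- names are hypothetical.
SELECT 1 AS Tag,
       NULL AS Parent,
       CustomerId AS [Customers!1!CustomerId],
       '<b>A & B</b>' AS [Customers!1!EncodedNote!element],
       '<b>A and B</b>' AS [Customers!1!RawNote!xml]
FROM Customers
WHERE CustomerId = 'ALFKI'
FOR XML EXPLICIT
```

The EncodedNote subelement should come back with its angle brackets and ampersand entity-encoded (&lt;, &gt;, and &amp;), while RawNote should contain a literal b element. Because the xml directive skips encoding, it's up to you to make sure the column holds well-formed XML, which is why the second literal avoids a bare ampersand.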

Establishing Data Relationships
Thus far, we've been listing the data from a single table, so our EXPLICIT queries haven't been terribly complex. That would still be true even if we queried multiple tables as long as we didn't mind repeating the data from each table in each top-level element in the XML fragment. Just as the column values from joined tables are often repeated in the result sets of Transact-SQL queries, we could create an XML fragment that contained data from multiple tables repeated in each element. However, that wouldn't be the most efficient way to represent the data in XML. Remember: XML supports hierarchical relationships between elements. You can establish these hierarchies by using EXPLICIT mode queries and T-SQL UNIONs. Listing 18.11 provides an example.

Listing 18.11

SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
NULL AS [Orders!2!OrderId],
NULL AS [Orders!2!OrderDate!element]
FROM Customers
UNION
SELECT 2 AS Tag,
1 AS Parent,
CustomerId,
NULL,
OrderId,
OrderDate
FROM Orders
ORDER BY [Customers!1!CustomerId], [Orders!2!OrderDate!element]
FOR XML EXPLICIT

This query does several interesting things. First, it links the Customers and Orders tables using the CustomerId column they share. Notice the third column in each SELECT statement: it returns the CustomerId column from each table. The Tag and Parent columns establish the details of the relationship between the two tables. The Tag and Parent values in the second query link it to the first. They establish that Order records are children of Customer records. Lastly, note the ORDER BY clause. It arranges the elements in the table in a sensible fashion: first by CustomerId and second by the OrderDate of each Order. Listing 18.12 shows the result set.

Listing 18.12

(Results abridged and formatted)

XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerId="ALFKI">Alfreds Futterkiste
  <Orders OrderId="10643">
    <OrderDate>1997-08-25T00:00:00</OrderDate>
  </Orders>
  <Orders OrderId="10692">
    <OrderDate>1997-10-03T00:00:00</OrderDate>
  </Orders>
  <Orders OrderId="10702">
    <OrderDate>1997-10-13T00:00:00</OrderDate>
  </Orders>
  <Orders OrderId="10835">
    <OrderDate>1998-01-15T00:00:00</OrderDate>
  </Orders>
  <Orders OrderId="10952">
    <OrderDate>1998-03-16T00:00:00</OrderDate>
  </Orders>
  <Orders OrderId="11011">
    <OrderDate>1998-04-09T00:00:00</OrderDate>
  </Orders>
</Customers>
<Customers CustomerId="ANATR">Ana Trujillo Emparedados y helados
  <Orders OrderId="10308">
    <OrderDate>1996-09-18T00:00:00</OrderDate>
  </Orders>
  <Orders OrderId="10625">
    <OrderDate>1997-08-08T00:00:00</OrderDate>
  </Orders>
  <Orders OrderId="10759">
    <OrderDate>1997-11-28T00:00:00</OrderDate>
  </Orders>
  <Orders OrderId="10926">
    <OrderDate>1998-03-04T00:00:00</OrderDate>
  </Orders>
</Customers>

As you can see, each customer's orders are nested within its element.
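
It can help to look at the universal table itself before FOR XML EXPLICIT turns it into XML. The following sketch is Listing 18.11 with the FOR XML clause removed and a WHERE clause added to each SELECT so that only one customer is returned; the filter on ALFKI is an arbitrary choice for illustration.

```sql
-- Sketch: Listing 18.11 without FOR XML EXPLICIT, limited to a single
-- customer, so the universal table comes back as a plain rowset.
SELECT 1 AS Tag,
       NULL AS Parent,
       CustomerId AS [Customers!1!CustomerId],
       CompanyName AS [Customers!1],
       NULL AS [Orders!2!OrderId],
       NULL AS [Orders!2!OrderDate!element]
FROM Customers
WHERE CustomerId = 'ALFKI'
UNION
SELECT 2 AS Tag,
       1 AS Parent,
       CustomerId,
       NULL,
       OrderId,
       OrderDate
FROM Orders
WHERE CustomerId = 'ALFKI'
ORDER BY [Customers!1!CustomerId], [Orders!2!OrderDate!element]
```

The first row should carry Tag 1 with NULL in both Orders columns, followed by one Tag 2 row per order. NULLs sort first in an ascending ORDER BY, which is why the customer row precedes its orders. That row order matters: EXPLICIT mode nests each child row under the nearest preceding row that has the parent tag, so a universal table sorted differently can attach orders to the wrong customer.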

The hide Directive


The hide directive omits a column you've included in the universal table from the resulting XML document. One use of this functionality is to order the result by a column that you don't want to include in the XML fragment. When you aren't using UNION to merge tables, this isn't a problem because you can order by any column you choose. However, the presence of UNION in a query requires every ORDER BY column to appear in the result set. The hide directive gives you a way to satisfy this requirement without being forced to return data you don't want to. Listing 18.13 shows an example.

Listing 18.13

SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
PostalCode AS [Customers!1!PostalCode!hide],
NULL AS [Orders!2!OrderId],
NULL AS [Orders!2!OrderDate!element]
FROM Customers
UNION
SELECT 2 AS Tag,
1 AS Parent,
CustomerId,
NULL,
NULL,
OrderId,
OrderDate
FROM Orders
ORDER BY [Customers!1!CustomerId], [Orders!2!OrderDate!element],
[Customers!1!PostalCode!hide]
FOR XML EXPLICIT

Notice the hide directive (bolded) that's included in the column 5 heading. It allows the column to be specified in the ORDER BY clause without actually appearing in the resulting XML fragment.
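
In Listing 18.13 the hidden column is the last sort key, and because CustomerId is unique it doesn't change the order of the output. The sketch below shows one way to let the hidden column drive the order instead. It assumes the Northwind schema, and the join in the second SELECT is an addition to the listing: it copies each customer's PostalCode into that customer's Orders rows so that the child rows sort alongside their parent.

```sql
-- Sketch: sort customers by PostalCode without emitting PostalCode.
-- The join that carries PostalCode into the Orders rows is an
-- assumption added for illustration; it is not part of Listing 18.13.
SELECT 1 AS Tag,
       NULL AS Parent,
       c.PostalCode AS [Customers!1!PostalCode!hide],
       c.CustomerId AS [Customers!1!CustomerId],
       c.CompanyName AS [Customers!1],
       NULL AS [Orders!2!OrderId],
       NULL AS [Orders!2!OrderDate!element]
FROM Customers c
UNION
SELECT 2 AS Tag,
       1 AS Parent,
       c.PostalCode,
       c.CustomerId,
       NULL,
       o.OrderId,
       o.OrderDate
FROM Orders o
JOIN Customers c ON c.CustomerId = o.CustomerId
ORDER BY [Customers!1!PostalCode!hide],
         [Customers!1!CustomerId],
         [Orders!2!OrderId]
FOR XML EXPLICIT
```

CustomerId stays in the ORDER BY as a tiebreaker so that customers sharing a postal code (or having none) don't interleave their orders, and the NULL OrderId on each customer row keeps that row ahead of its children.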

The cdata Directive


CDATA sections may appear anywhere in an XML document that character data may appear. A CDATA section is used to escape characters that would otherwise be recognized as markup (e.g., <, >, &, and so on). Thus CDATA sections allow you to include sections in an XML document that might otherwise confuse the parser. To render a CDATA section from an EXPLICIT mode query, include the cdata directive, as demonstrated in Listing 18.14.

Listing 18.14

SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
Fax AS [Customers!1!!cdata]
FROM Customers
FOR XML EXPLICIT
(Results abridged and formatted)
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerId="ALFKI">Alfreds Futterkiste
  <![CDATA[030-0076545]]>
</Customers>
<Customers CustomerId="ANATR">Ana Trujillo Emparedados y helados
  <![CDATA[(5) 555-3745]]>
</Customers>
<Customers CustomerId="ANTON">Antonio Moreno Taquería
</Customers>
<Customers CustomerId="AROUT">Around the Horn
  <![CDATA[(171) 555-6750]]>
</Customers>
<Customers CustomerId="BERGS">Berglunds snabbköp
  <![CDATA[0921-12 34 67]]>
</Customers>

As you can see, each value in the Fax column is returned as a CDATA section in the XML fragment. Note the omission of the attribute name in the cdata column heading (bolded). This is because attribute names aren't allowed for CDATA sections. Again, they represent escaped document segments, so the XML parser doesn't process any attribute or element names they may contain.
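
The Fax values in Listing 18.14 contain no markup characters, so the effect of the CDATA section is easy to miss. The sketch below, again assuming Northwind, wraps the Fax value in made-up fax tags and returns it twice: once as an ordinary subelement and once as a CDATA section. The tags and the Markup name are hypothetical and serve only to put angle brackets into the data.

```sql
-- Sketch: the same marked-up string as an encoded subelement and as
-- a CDATA section. The <fax> wrapper and the Markup name are
-- hypothetical.
SELECT 1 AS Tag,
       NULL AS Parent,
       CustomerId AS [Customers!1!CustomerId],
       '<fax>' + Fax + '</fax>' AS [Customers!1!Markup!element],
       '<fax>' + Fax + '</fax>' AS [Customers!1!!cdata]
FROM Customers
WHERE CustomerId = 'ALFKI'
FOR XML EXPLICIT
```

In the Markup subelement the angle brackets should appear as &lt; and &gt;, whereas the CDATA section should carry them through untouched. Either way, a parser reading the fragment sees the same character data; the two forms differ only in how that data is escaped.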

The id, idref, and idrefs Directives


The ID, IDREF, and IDREFS data types can be used to represent relational data in an XML document. Set up in a DTD or XML-Data schema, they establish relationships between elements. They're handy in situations where you need to exchange complex data and want to minimize the amount of data duplication in the document.

EXPLICIT mode queries can use the id, idref, and idrefs directives to specify relational fields in an XML document. Naturally, this approach works only if a schema is used to define the document and identify the columns used to establish links between entities. FOR XML's XMLDATA option provides a means of generating an inline schema for its XML fragment. In conjunction with the id directives, it can identify relational fields in the XML fragment. Listing 18.15 gives an example.

Listing 18.15

SELECT 1 AS Tag,
       NULL AS Parent,
       CustomerId AS [Customers!1!CustomerId!id],
       CompanyName AS [Customers!1!CompanyName],
       NULL AS [Orders!2!OrderID],
       NULL AS [Orders!2!CustomerId!idref]
FROM Customers
UNION
SELECT 2,
       NULL,
       NULL,
       NULL,
       OrderID,
       CustomerId
FROM Orders
ORDER BY [Orders!2!OrderID]
FOR XML EXPLICIT, XMLDATA
(Results abridged and formatted)
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Schema name="Schema2" xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <ElementType name="Customers" content="mixed" model="open">
    <AttributeType name="CustomerId" dt:type="id"/>
    <AttributeType name="CompanyName" dt:type="string"/>
    <attribute type="CustomerId"/>
    <attribute type="CompanyName"/>
  </ElementType>
  <ElementType name="Orders" content="mixed" model="open">
    <AttributeType name="OrderID" dt:type="i4"/>
    <AttributeType name="CustomerId" dt:type="idref"/>
    <attribute type="OrderID"/>
    <attribute type="CustomerId"/>
  </ElementType>
</Schema>
<Customers xmlns="x-schema:#Schema2" CustomerId="ALFKI"
  CompanyName="Alfreds Futterkiste"/>
<Customers xmlns="x-schema:#Schema2" CustomerId="ANATR"
  CompanyName="Ana Trujillo Emparedados y helados"/>
<Customers xmlns="x-schema:#Schema2" CustomerId="ANTON"
  CompanyName="Antonio Moreno Taquería"/>
<Customers xmlns="x-schema:#Schema2" CustomerId="AROUT"
  CompanyName="Around the Horn"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10248"
  CustomerId="VINET"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10249"
  CustomerId="TOMSP"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10250"
  CustomerId="HANAR"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10251"
  CustomerId="VICTE"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10252"
  CustomerId="SUPRD"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10253"
  CustomerId="HANAR"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10254"
  CustomerId="CHOPS"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10255"
  CustomerId="RICSU"/>

Note the use of the id and idref directives in the CustomerId columns of the Customers and Orders tables (bolded). These directives link the two tables by using the CustomerId column they share.

If you examine the XML fragment returned by the query, you'll see that it starts off with the XML-Data schema that the XMLDATA option created. This schema is then referenced in the XML fragment that follows.


