XSLT for data mapping

Improve your IAF skills!




XSLT is a powerful tool that is widely used in data processing. By data mapping we mean the process of lining up values in one data source against values in another data source. Data mapping needs only a small subset of what XSLT offers, and this article addresses that subset. In most cases, mappings are based on mapping documentation provided as a spreadsheet or document. A template for clients to fill out is handy: it helps keep the mapping document one-to-one with the XSLT.

 

What is XSLT?

XSLT is a language designed to transform XML documents. The transformed document can be XML, HTML, plain text, and more. The original document is usually XML, but in fact any document that can be queried with XPath can be used. XSLT doesn’t change the original document but produces a new output document.

 

 

The Plan

 

  • Discuss what a mapping template should look like.
  • Create and test an xslt-stylesheet on the basis of this mapping.

 

Files

To start off this tutorial you need to have three files in your document root:

  • A filled-in mapping template. On the left you write out the destination: the structure your XSLT is supposed to output. On the right you place the structure of the source document. Path and Element can be combined into one column, but I prefer to keep them separate, with the path being relative. Condition specifies under which condition a group or element needs to be mapped. Transformation contains domain translations and other transformations on the source data.

  • A base xslt document into which we’re going to write our code.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<!-- templates go here -->
</xsl:stylesheet>
  • An example input document. (original file see below)
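
Since every step depends on the shape of the input, here is a hypothetical stand-in for the input document. It is not the original file: the element names are taken from the XPath expressions used in the stylesheets of this article, while all values (names, streets, contract types) are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample input. Element names follow the XPath expressions
     used in this article; all values are made up. -->
<request>
  <BP>
    <!-- FirstName is left empty so that the Initials fallback is exercised -->
    <FirstName/>
    <Initials>jp</Initials>
    <LastName>Jansen</LastName>
    <Address>
      <!-- AddressType is the element name used from STEP 3 onwards -->
      <AddressType>04</AddressType>
      <Street>Stationsweg</Street>
      <HouseNumber>12</HouseNumber>
      <HouseNumberAddition>a</HouseNumberAddition>
      <City>Utrecht</City>
      <PostalCode>3511 AA</PostalCode>
    </Address>
    <Address>
      <AddressType>10</AddressType>
      <Street>Handelskade</Street>
      <HouseNumber>7</HouseNumber>
      <City>Utrecht</City>
      <PostalCode>3512 BB</PostalCode>
    </Address>
    <Address>
      <AddressType>12</AddressType>
      <Street>Dorpsstraat</Street>
      <HouseNumber>3</HouseNumber>
      <City>Zeist</City>
      <PostalCode>3701 CC</PostalCode>
    </Address>
  </BP>
  <Contract>
    <Type>delivery</Type>
    <Object>router</Object>
  </Contract>
  <Contract>
    <Type>installation</Type>
    <Object>router</Object>
  </Contract>
  <Contract>
    <Type>insurance</Type>
    <Object>router</Object>
  </Contract>
</request>
```

Saving this as input.xml gives you something to run the stylesheets against; the exact output will of course differ from the original author's.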

 

 

 

 

 

 

 

 

Planning

To test your transformations you need a tool that can apply your stylesheet to your input files. What I use most is XMLSpy, but Eclipse also has a built-in processor. To start off quick and dirty you could use one of the many available online tools, like freeformatter. Just insert your input XML and your stylesheet and click the transform button. The stylesheets in this tutorial declare version 2.0 and use XSLT 2.0 features later on, so pick a tool that supports XSLT 2.0. While working through this tutorial I would seriously recommend running the examples instead of just reading them. Just work in two tabs and you’re done.

 

Let’s start coding!

 

STEP 1 Most of the time I start the mapping by writing down the result structure. So let’s do this: start mapping and forget about repeating groups, translations and conditions for now. To produce this you only have to analyse the destination part of your mapping document.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
 <businesspartner>
  <firstname><xsl:value-of select="request/BP/FirstName"/></firstname>
  <lastname><xsl:value-of select="request/BP/LastName"/></lastname>
  <address> 
    <type><xsl:value-of select="request/BP/Address/AddressType"/></type> 
    <street><xsl:value-of select="request/BP/Address/Street"/></street> 
    <housenumber><xsl:value-of select="request/BP/Address/HouseNumber"/></housenumber>
    <city><xsl:value-of select="request/BP/Address/City"/></city>
    <postalcode><xsl:value-of select="request/BP/Address/PostalCode"/></postalcode>
  </address>
  <contract>
    <type><xsl:value-of select="request/Contract/Type"/></type>
    <object><xsl:value-of select="request/Contract/Object"/></object>
  </contract>
 </businesspartner>
</xsl:template>
</xsl:stylesheet> 

 

Applying this stylesheet to the supplied input.xml gives the following result:

 

 

 

 

 

Now have a look at the mapping template and try to find what needs to be improved. We see four important differences:

  1. There is an empty firstname; we need to insert a set of initials there.
  2. We wanted all addresses, not just one!
  3. The same goes for the contract element: we want to have them all listed.
  4. We miss the conditions and transformations from the mapping document.

 

STEP 2 

Let’s tackle the firstname: in our input document we can have either a firstname element or an initials element. That means we need to make a choice in our stylesheet: when we have a non-empty firstname we choose that one, otherwise we take the initials, put a dot after each character, and output that as the firstname. xsl:choose is the construction for this, with an xsl:when and an xsl:otherwise element in it. Each xsl:when has a mandatory test attribute, which takes a valid XPath expression; xsl:otherwise takes no attributes.

            <firstname>
                <xsl:choose>
                    <xsl:when test="request/BP/FirstName!='' "><xsl:value-of select="request/BP/FirstName"/></xsl:when>
                    <xsl:otherwise>
                        no name known
                    </xsl:otherwise>
                </xsl:choose>
            </firstname>

This gives us a silly “<firstname>no name known</firstname>” when there is no first name, but when we do have a name it fills in that one. Let’s fix that:

            <firstname>
                <xsl:choose>
                    <xsl:when test="request/BP/FirstName!='' "><xsl:value-of select="request/BP/FirstName"/></xsl:when>
                    <xsl:otherwise><xsl:value-of select="request/BP/Initials"/></xsl:otherwise>
                </xsl:choose>
            </firstname>

 

 

Getting closer: we have a firstname now in both cases. But look at our template: we must add a dot after each letter, and therefore we need to analyze a string. Luckily XSLT 2.0 provides instructions for this: xsl:analyze-string and xsl:matching-substring. You use a regular expression to pick out parts of your input string. In this case we will use ‘.’ (DOT), which matches any character. A slightly better regex would be [a-zA-Z]; because we have no xsl:non-matching-substring, characters that do not match are simply dropped. In the xsl:matching-substring element we again use ‘.’ (DOT), but this time as the XPath context item, which here is the matched substring. As a cherry on our pie we will capitalize the letters we find. However, we could just as well omit this capitalization; after all, it wasn’t explicitly mentioned in the specs!

<!-- analyze string easy solution -->
            <firstname>
                <xsl:choose>
                    <xsl:when test="request/BP/FirstName!='' "><xsl:value-of select="request/BP/FirstName"/></xsl:when>
                    <xsl:otherwise><xsl:analyze-string select="request/BP/Initials" regex=".">
                            <xsl:matching-substring><xsl:value-of select="." />.</xsl:matching-substring>
                        </xsl:analyze-string></xsl:otherwise>
                </xsl:choose>
            </firstname>

 
 <!-- analyze string best try -->
            <firstname>
                <xsl:choose>
                    <xsl:when test="request/BP/FirstName!='' "><xsl:value-of select="request/BP/FirstName"/></xsl:when>
                    <xsl:otherwise><xsl:analyze-string select="request/BP/Initials" regex="[a-zA-Z]">
                            <xsl:matching-substring><xsl:value-of select="upper-case( . )" />.</xsl:matching-substring>
                        </xsl:analyze-string></xsl:otherwise>
                </xsl:choose>
            </firstname>
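
To try the initials logic in isolation, here is a minimal sketch that ignores the content of the input document and runs xsl:analyze-string on a literal string. The sample value 'j.p' is an assumption made up for this illustration, and an XSLT 2.0 processor is required. With this value the dot in the middle does not match [a-zA-Z] and is dropped, so the sketch should produce J.P. inside the firstname element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Runs against any input document; the input content is not read. -->
  <xsl:template match="/">
    <firstname>
      <!-- 'j.p' is a made-up sample value standing in for request/BP/Initials -->
      <xsl:analyze-string select="'j.p'" regex="[a-zA-Z]">
        <xsl:matching-substring>
          <!-- inside xsl:matching-substring, "." is the matched letter -->
          <xsl:value-of select="upper-case(.)"/>
          <xsl:text>.</xsl:text>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </firstname>
  </xsl:template>
</xsl:stylesheet>
```

Using xsl:text for the dot, instead of a bare dot in the stylesheet, keeps stray whitespace out of the result when the instruction is spread over several lines.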

 

STEP 3   Now we are ready to fix the addresses. It’s kind of sloppy to omit all the other addresses that were in the original XML. Let’s first incorporate all the addresses, and after that turn to the specs. We will need an xsl:for-each construct to grab all the Address nodes.

    <xsl:for-each select="request/BP/Address">
     <address>
       <type><xsl:value-of select="AddressType"/></type>
       <street><xsl:value-of select="Street"/></street>
       <housenumber><xsl:value-of select="HouseNumber"/></housenumber>
       <city><xsl:value-of select="City"/></city>
       <postalcode><xsl:value-of select="PostalCode"/></postalcode>
     </address>
   </xsl:for-each>

This results in the following output:

This is getting better, since all the addresses are in our output now. But the specs give us some more work to do: the type number has to be transformed to the corresponding string value. This will be an xsl:choose, to which we add three xsl:when clauses.

    <xsl:for-each select="request/BP/Address">
     <address>
        <type><xsl:choose>
          <xsl:when test="AddressType='04'">delivery</xsl:when>
          <xsl:when test="AddressType='10'">business</xsl:when>
          <xsl:when test="AddressType='12'">home</xsl:when>
          </xsl:choose>
        </type>
        <street><xsl:value-of select="Street"/></street>
        <housenumber><xsl:value-of select="HouseNumber"/></housenumber>
        <city><xsl:value-of select="City"/></city>
        <postalcode><xsl:value-of select="PostalCode"/></postalcode>
     </address>
   </xsl:for-each>

Transforming address type number to a specified string.

One more thing… the specs say clearly that only addresses with a type not equal to 12 should be included. Maybe there is a reason not to include the home address, something to do with privacy perhaps? Let’s try this out by simply omitting the xsl:when clause for address type 12. That gives us:

 

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

   <xsl:template match="/">
      <businesspartner>
         <firstname>
             <xsl:choose>
                 <xsl:when test="request/BP/FirstName!='' "><xsl:value-of select="request/BP/FirstName"/></xsl:when>
                 <xsl:otherwise><xsl:analyze-string select="request/BP/Initials" regex=".">
                     <xsl:matching-substring><xsl:value-of select="." />.</xsl:matching-substring>
                     </xsl:analyze-string></xsl:otherwise>
             </xsl:choose>
         </firstname>
         <lastname><xsl:value-of select="request/BP/LastName"/></lastname>
         <xsl:for-each select="request/BP/Address">
            <address>
               <type>
                  <xsl:choose>
                     <xsl:when test="AddressType='04'">delivery</xsl:when>
                     <xsl:when test="AddressType='10'">business</xsl:when>
                  </xsl:choose>
               </type>
               <street><xsl:value-of select="Street"/></street>
               <housenumber>
                  <xsl:value-of select="HouseNumber"/>
                  <xsl:if test="HouseNumberAddition!=''"><xsl:text> </xsl:text><xsl:value-of select="HouseNumberAddition"/></xsl:if>
               </housenumber>
               <city><xsl:value-of select="City"/></city>
               <postalcode><xsl:value-of select="PostalCode"/></postalcode>
            </address>
         </xsl:for-each>
         <contract>
            <type><xsl:value-of select="request/Contract/Type"/></type>
            <object><xsl:value-of select="request/Contract/Object"/></object>
         </contract>
      </businesspartner>
   </xsl:template>
   

</xsl:stylesheet>

 
Unexpected result: the home (private) address is not omitted; the only effect is that it now has an empty type tag.

We will have to write an XPath expression to leave out the private addresses after all. We can use the not() function for that, in a predicate on the for-each.

            <xsl:for-each select="request/BP/Address[ not( ./AddressType = '12' ) ]">
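
As a sketch of why not() is used here rather than the != operator: the two predicates differ for an Address that has no AddressType at all. not(AddressType = '12') keeps such an address, while AddressType != '12' drops it, because a comparison against an empty node sequence is always false. The wrapper element names below are invented for this illustration; the paths assume the input structure used in this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <!-- comparison, keptWithNot and keptWithNotEquals are illustrative names -->
    <comparison>
      <!-- counts every Address that is not type 12, including untyped ones -->
      <keptWithNot>
        <xsl:value-of select="count(request/BP/Address[not(AddressType = '12')])"/>
      </keptWithNot>
      <!-- counts only Addresses that have an AddressType other than 12 -->
      <keptWithNotEquals>
        <xsl:value-of select="count(request/BP/Address[AddressType != '12'])"/>
      </keptWithNotEquals>
    </comparison>
  </xsl:template>
</xsl:stylesheet>
```

When every Address carries exactly one AddressType, both counts are equal; they only diverge on incomplete input, which is exactly where a mapping tends to surprise you.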

 

Now the house number addition. When there is a HouseNumberAddition, we want to append a space and then the addition to the HouseNumber. We can use an xsl:if here, and combine it with xsl:text, which outputs literal text.

 

<housenumber>
    <xsl:value-of select="HouseNumber"/>
    <xsl:if test="HouseNumberAddition!=''">
        <xsl:text> </xsl:text>
        <xsl:value-of select="HouseNumberAddition"/>
    </xsl:if>
</housenumber>

Resulting in the following:

While mapping you should take into account that XSLT can put more into your output than you asked for. Whenever xsl:apply-templates reaches nodes that none of your templates match, the built-in template rules take over and copy their text to the output, so text from the input can unexpectedly appear in your result. Our result has a surprise of its own: see the contract element below. We unexpectedly have three objects in there! Time to work on that issue.
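
The leaking of input text into the result can be reproduced, and suppressed, with a small sketch like the one below. It assumes the input structure used in this article; the broad select="request/*" is chosen on purpose so that apply-templates also visits BP, for which this sketch has no template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <businesspartner>
      <!-- visits every child of request, including elements without a template -->
      <xsl:apply-templates select="request/*"/>
    </businesspartner>
  </xsl:template>

  <xsl:template match="Contract">
    <contract><xsl:value-of select="Object"/></contract>
  </xsl:template>

  <!-- Without this empty template the built-in rules would copy all text
       inside BP (names, streets and so on) into the output. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Removing the last template and running the transformation again shows the difference. The other way to avoid the problem is to keep every select narrow, so that apply-templates only reaches elements you have written a template for.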

 

 

 

 

 

 

STEP 4   When selecting nodes, XSLT selects all the nodes that satisfy the XPath query, and in XSLT 2.0 xsl:value-of outputs all of them, separated by spaces. Hence the triple router, and the delivery, installation and insurance in one tag. Let’s fix that by defining a separate template for the Contract element. On its own this snippet does nothing: we still have to apply the template from our code.

    <xsl:template match="Contract">
        <contract>
            <type><xsl:value-of select="Type"/></type>
            <object><xsl:value-of select="Object"/></object>
        </contract>
    </xsl:template>    

Just before </businesspartner>, in place of the old <contract> block, we can simply insert <xsl:apply-templates select="request/Contract"/>, which puts the template to work. We can make one major improvement on this, which makes it more readable what we are doing:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
          <businesspartner>
            <xsl:apply-templates select="request/BP"/>
            <xsl:apply-templates select="request/Contract"/>
          </businesspartner>
  </xsl:template>
  
<xsl:template match="BP">
      <firstname>
        <xsl:choose>
          <xsl:when test="FirstName!='' "><xsl:value-of select="FirstName"/></xsl:when>
          <xsl:otherwise>
            <xsl:analyze-string select="Initials" regex=".">
              <xsl:matching-substring><xsl:value-of select="." />.</xsl:matching-substring>
            </xsl:analyze-string>
          </xsl:otherwise>
        </xsl:choose>
      </firstname>
      <lastname><xsl:value-of select="LastName"/></lastname>
          <xsl:for-each select="Address[ not(./AddressType = '12' ) ]">
            <address>
              <type>
                <xsl:choose>
                  <xsl:when test="AddressType='04'">delivery</xsl:when>
                  <xsl:when test="AddressType='10'">business</xsl:when>
                </xsl:choose>
              </type>
              <street><xsl:value-of select="Street"/></street>
              <housenumber>
                <xsl:value-of select="HouseNumber"/>
                <xsl:if test="HouseNumberAddition!=''"><xsl:text> </xsl:text><xsl:value-of select="HouseNumberAddition"/></xsl:if>
              </housenumber>
              <city><xsl:value-of select="City"/></city>
              <postalcode><xsl:value-of select="PostalCode"/></postalcode>
            </address>
      </xsl:for-each>
</xsl:template>

  <xsl:template match="Contract">
    <contract>
      <type><xsl:value-of select="Type"/></type>
      <object><xsl:value-of select="Object"/></object>
    </contract>
  </xsl:template> 
</xsl:stylesheet>

 

Final stylesheet

 

 

We have split the code into a person/address part and a contract part. When using for-each or apply-templates we need to keep in mind that paths inside them are relative to the nodes selected by the for-each or apply-templates. Compare how the XPath expression that excludes private addresses changed after this block was made into a separate template.
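
The effect of the context node on paths can be seen in a stripped-down sketch. The wrapper element names (paths, fromRoot, fromBP) are invented for this illustration; the input structure is the one assumed throughout this article. Both elements should end up with the same last name, reached once by a full path and once by a relative one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <paths>
      <!-- the context is the document root, so the full path is needed -->
      <fromRoot><xsl:value-of select="request/BP/LastName"/></fromRoot>
      <xsl:apply-templates select="request/BP"/>
    </paths>
  </xsl:template>

  <xsl:template match="BP">
    <!-- the context is now the BP element, so the path starts at its children -->
    <fromBP><xsl:value-of select="LastName"/></fromBP>
  </xsl:template>
</xsl:stylesheet>
```

Writing request/BP/LastName inside the BP template would select nothing, because BP has no request child.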

 

 

 

 

 

 

Final output

 
