Class XmlTransform

java.lang.Object
  extended by com.cleancode.xml.XmlTransform

public class XmlTransform
extends Object

Transforms and/or schema-validates a tree of files, optionally generating multi-level indices and adding navigational linkages.

XmlTransform is a general-purpose transformation toolkit for manipulating a large collection of hierarchically organized files. So what does it really do? The short answer: saves you a lot of time. Consider probably the most common task--though by no means the only application for XmlTransform--of creating and maintaining a website. If you've done it before by manually creating a collection of HTML files, you most likely encountered maintenance issues such as:

XmlTransform can transform from one flavor of XML to another, or from XML to XHTML, or to any other output you specify in a standard XSL file. You may also selectively validate your input files, your output files, or both, against an XML schema. During transformation, you may add vertical and horizontal linkages among files in the directory tree, and you may automatically generate a contents file for each subdirectory in your directory tree. Finally, XmlTransform expands your custom abbreviations (via XSL) so you never need to write the same code or text twice.

Besides the description here, I have also published two articles discussing the practical uses of this tool, available on the DevX online magazine: XmlTransform: A General-Purpose XSLT Pre-Processor and Add Custom XML Documentation Capability To Your SQL Code.

XmlTransform may be used either as a library module or as a standalone utility. As a library class, the API is quite simple:

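    InputOptions settings = new InputOptions(args);       // parse the configuration options
    XmlTransform converter = new XmlTransform(settings);  // may throw ParserConfigurationException
    converter.process();                                  // perform the configured processing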
The InputOptions class is another CleanCode module that handles the configuration options. In its simplest form, you merely provide the command-line argument array from your main routine.

As a standalone tool, the utility is invoked as:

   java XmlTransform options-or-option-file
   java XmlTransform --help
The latter form will list all the available configuration options, described in the following sections.

Basic Steps

Step 1 -- What To Do

There are several boolean switches to select from:


Step 2 -- Where To Do It

You specify where your input tree resides (sourcePath) and where your output tree should be generated (targetPath). Either one will default to the current directory if not specified. Next, you specify the input file extension (inExtension) and the output file extension (outExtension). These default to DEFAULT_IN_EXTENSION and DEFAULT_OUT_EXTENSION, respectively, if not specified.


Step 3 -- What To Do It With

In order to transform each file, you must provide an XSL file that specifies how this is done (xslName). If you specify an absolute path, that file will be used at any subdirectory depth. If, however, you specify just a base file name (such as "stuff.xsl"), then a couple of things happen. The program will look for such a file within each subdirectory that is processed. If it doesn't find one, then it will look for the same name in your root directory (sourcePath). This provides a flexible mechanism whereby you may specify one global XSL specification, but override it in specific instances that require it. If, on the other hand, you do not provide a root XSL file, then only those subdirectories containing a local XSL file will be transformed.
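
The lookup order described above can be sketched in a few lines of Java. This is only an illustration of the rule as stated, not XmlTransform's actual implementation; the class name XslLookup and the method resolve are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper illustrating the xslName lookup order; not part of CleanCode. */
final class XslLookup {

    /**
     * Returns the XSL file that applies to files in currentDir, or an empty
     * Optional if neither a local nor a root-level XSL file exists.
     */
    static Optional<Path> resolve(Path sourcePath, Path currentDir, String xslName) {
        Path given = Paths.get(xslName);
        if (given.isAbsolute()) {
            return Optional.of(given);              // same file at every depth
        }
        Path local = currentDir.resolve(xslName);   // per-directory override
        if (Files.isRegularFile(local)) {
            return Optional.of(local);
        }
        Path global = sourcePath.resolve(xslName);  // fall back to the root directory
        return Files.isRegularFile(global) ? Optional.of(global) : Optional.empty();
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(resolve(root, root, "stuff.xsl"));
    }
}
```

An empty result corresponds to the last case in the paragraph: with no root XSL file and no local one, the subdirectory is simply not transformed.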

In order to generate a table of contents for any given subdirectory, you must provide an XML file template--this is just an XML file like any other among your files, with the exception that it has one or more placeholders for referencing lists of other files. The template file name consists of an underscore, then the contentsBaseName parameter, a dot, and the inExtension parameter (e.g. _myDir.xml). The filled-in template is stored in your input tree. The name of it depends on whether you specify to store the file in the same directory or the parent; if in the same directory, the name will be the same base name-dot-inExtension, less the underscore. (See "Generating a Contents File" for more details.)

[Advanced use] If you wish to validate your transformation XSL file (xslName) then you must provide an XML Schema file with which to do so (xslSchema). This is not often necessary since any errors will be promptly reported when you attempt to use the XSL file in any transformation. So this option (controlled by the validateXslBySchema switch) is perhaps more of an academic exercise. (Note that this validates only the transformation XSL not the contents-generating XSL.)


Step 4 -- How To Do It

Finally, you need to provide a few details on how the program should operate.


Navigational Controls

XmlTransform, as you're aware, operates on a tree of directories and files, and therefore has an implicit notion of first/final and next/previous within a directory, and up/down between directories. As such, during the XSL transformation of each file, XmlTransform provides parameters from which you could create navigational controls. The parameters are:

If your XSL file creates HTML, for example, you could create a set of navigation buttons or links for next and previous entries with something like this:

   <xsl:param name="prevLink"/>
   <xsl:param name="nextLink"/>
   <!-- the following belongs inside one of your templates -->
   <xsl:choose>
     <xsl:when test="$prevLink">
       <a><xsl:attribute name="href">
         <xsl:value-of select="$prevLink"/>
         </xsl:attribute>Previous file</a>
     </xsl:when>
     <xsl:otherwise><span>(no previous file)</span></xsl:otherwise>
   </xsl:choose>
   <xsl:choose>
     <xsl:when test="$nextLink">
       <a><xsl:attribute name="href">
         <xsl:value-of select="$nextLink"/>
         </xsl:attribute>Next file</a>
     </xsl:when>
     <xsl:otherwise><span>(no next file)</span></xsl:otherwise>
   </xsl:choose>
Then each file within a directory would have a link to the one that follows it (in alphabetical order) and the one that precedes it. The only exception is at the endpoints--the first and final files in the directory--where XmlTransform will pass an empty string. Your XSL code must account for this as in the above example; the <xsl:when> element above will catch just the non-empty strings. Similarly, the upLink will be empty when already at the top-level directory. The last two navigational controls, firstLink and finalLink, will only be empty when the directory contains but a single file.
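
To see how an empty parameter falls through to the otherwise branch, here is a small self-contained Java sketch that runs a cut-down stylesheet through the standard JAXP Transformer. How XmlTransform itself passes the parameters is not stated here; Transformer.setParameter is simply the standard JAXP way to do it. The class name NavParamDemo and the inline stylesheet are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustrative only: shows an XSL parameter being tested for emptiness. */
final class NavParamDemo {

    // Cut-down stylesheet handling just prevLink, mirroring the example above.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:param name='prevLink'/>"
        + "<xsl:template match='/'>"
        + "<xsl:choose>"
        + "<xsl:when test='$prevLink'><a href='{$prevLink}'>Previous file</a></xsl:when>"
        + "<xsl:otherwise><span>(no previous file)</span></xsl:otherwise>"
        + "</xsl:choose>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms a dummy document with the given prevLink value and returns the output. */
    static String render(String prevLink) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setParameter("prevLink", prevLink);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(""));               // first file: no previous link
        System.out.println(render("antispam.html"));  // any other file
    }
}
```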

Finally, a note with respect to contents generation: if you choose to create a contents file and have it generated within the same directory (contentsToParent is false), that file will be considered the first file in the directory, notwithstanding alphabetical order. That way using the firstLink will immediately take the user to this contents file. If the contents file is generated in the parent, it receives no such special treatment.
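
The ordering rules for the sibling links can be summarized in code. The sketch below is an assumption-laden illustration rather than the library's implementation: the class NavLinks and its method are hypothetical, and plain String ordering stands in for whatever "alphabetical order" means in XmlTransform (for instance, case handling may differ).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for the four sibling links; not part of CleanCode. */
final class NavLinks {
    final String prevLink, nextLink, firstLink, finalLink;

    private NavLinks(String prev, String next, String first, String last) {
        this.prevLink = prev;
        this.nextLink = next;
        this.firstLink = first;
        this.finalLink = last;
    }

    /**
     * Computes the links for one file. contentsFile may be null; if it names
     * a file in the directory, that file is moved to the front of the order.
     */
    static NavLinks forFile(List<String> names, String contentsFile, String current) {
        List<String> ordered = new ArrayList<>(names);
        Collections.sort(ordered);
        if (contentsFile != null && ordered.remove(contentsFile)) {
            ordered.add(0, contentsFile);   // contents file is always first
        }
        int i = ordered.indexOf(current);
        if (i < 0) {
            throw new IllegalArgumentException("not in directory: " + current);
        }
        int last = ordered.size() - 1;
        boolean single = ordered.size() == 1;
        return new NavLinks(
                i > 0 ? ordered.get(i - 1) : "",      // empty at the first file
                i < last ? ordered.get(i + 1) : "",   // empty at the final file
                single ? "" : ordered.get(0),         // empty only for a lone file
                single ? "" : ordered.get(last));
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("browser.html", "index.html", "antispam.html", "style.html");
        NavLinks n = forFile(files, "index.html", "antispam.html");
        System.out.println(n.prevLink + " | " + n.nextLink + " | " + n.firstLink + " | " + n.finalLink);
    }
}
```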

Generating a Contents File

As mentioned briefly above, you may automatically generate a table of contents for each directory. The mechanism to do this provides complete flexibility over constructing a table of contents using the power of XSL, but--alas--that means that you must be conversant with XSL. For those who are not and are interested in the challenge, I do walk you through each step below, but you may need to consult an XSL reference as you go! (Note that the generateContents configuration option must be turned on; otherwise, contents generation will not occur.)

Step 1 -- Naming a Contents Template File

XmlTransform first generates an XML file--in your input tree--from your contents-template XML file. The template file name is just _contentsBaseName.inExtension (note the leading underscore). (Both contentsBaseName and inExtension are configuration options.) This file is then transformed along with all your other input files into your output tree so that it may benefit from the same transformations and provide the same output file structure. You have a choice where in your input tree the contents file should be stored, controlled by the contentsToParent configuration option. It will be stored either in the current subdirectory (false), or in the current subdirectory's parent (true). The choice depends on your needs and your application. Let's say you have a subdirectory "fauna" containing birds.xml, mammals.xml, invertebrates.xml, etc. If you create the contents file in the same directory, it will be called contentsBaseName.inExtension (without the leading underscore). For example, if contentsBaseName is "contents" and inExtension is "xml", then the assembled contents file name is "contents.xml" (and the corresponding template is "_contents.xml"). If, however, you choose to generate the contents in the parent directory, then the contents file will be named directory.inExtension. Using our "fauna" example, that would simply result in "fauna.xml". This allows you to have multiple subdirectories, all sending their contents to a common parent, each with a unique name.
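
The naming rules in this step reduce to two small functions. The following Java sketch restates them under the assumption that names are built by plain concatenation as described; ContentsNames and its methods are hypothetical names, not part of the CleanCode API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper restating the contents-file naming rules. */
final class ContentsNames {

    /** Template name: underscore, contentsBaseName, dot, inExtension. */
    static String templateName(String contentsBaseName, String inExtension) {
        return "_" + contentsBaseName + "." + inExtension;
    }

    /** Location of the assembled contents file in the input tree for directory dir. */
    static Path contentsFile(Path dir, String contentsBaseName,
                             String inExtension, boolean contentsToParent) {
        if (contentsToParent) {
            // Stored beside the directory, named after the directory itself.
            return dir.resolveSibling(dir.getFileName() + "." + inExtension);
        }
        // Stored inside the directory, named after contentsBaseName.
        return dir.resolve(contentsBaseName + "." + inExtension);
    }

    public static void main(String[] args) {
        Path fauna = Paths.get("site", "fauna");
        System.out.println(templateName("contents", "xml"));
        System.out.println(contentsFile(fauna, "contents", "xml", false));
        System.out.println(contentsFile(fauna, "contents", "xml", true));
    }
}
```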


Step 2 -- Instrumenting the Contents Template File

After naming the contents template, the next salient point is what goes inside. The contents template is just like every other file in your input tree in all respects but two. First, inside this template you include a place holder where XmlTransform will insert a list of files in the current directory. The place holder is an XML element with no content (though it may have attributes, discussed shortly). The name of the place holder element is a configuration option (groupPlaceHolder) so you may specify any element you like. Say, for example, you use <files/>. When you execute XmlTransform, that element in your template will be expanded with a list of <file> children, enumerating the files in the current directory.

The second vital element required in your template tells XmlTransform in subsequent executions which files it is appropriate to overwrite with regenerated contents. The generator element (named by the generatorNode configuration option) should be empty; it will then be replaced with a standard HTML <meta> of the form:

<meta content="generator-id" name="generator"/>
The generator-id string is the value of the GENERATOR_ID field.

Note that this must be converted in your XSL file, using a template such as:

   <xsl:param name="generator"/>
   <xsl:template match="cc:generator">
     <meta content="{$generator}" name="generator"/>
   </xsl:template>

This element must show up in the generated file; otherwise, the next time you wish to regenerate the same contents file, XmlTransform will not be able to tell that it generated the file, and will thus not overwrite it. Finally, regardless of whether you are transforming to HTML or not, the generator element must be the standard HTML <meta> element, since that is what XmlTransform looks for.
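
A check of this kind might look like the Java sketch below. It is a guess at the general idea, not the library's code: the class GeneratorCheck is hypothetical, the generator string in main is made up for illustration, and the real check may differ in which file it inspects and how it parses it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical check for a generator <meta> element in an XML document. */
final class GeneratorCheck {

    /** Returns true if xml contains <meta name="generator" content="generatorId"/>. */
    static boolean wasGeneratedBy(String xml, String generatorId) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList metas = doc.getElementsByTagName("meta");
            for (int i = 0; i < metas.getLength(); i++) {
                Element meta = (Element) metas.item(i);
                if ("generator".equals(meta.getAttribute("name"))
                        && generatorId.equals(meta.getAttribute("content"))) {
                    return true;   // safe to overwrite: we generated this file
                }
            }
            return false;          // hand-written file: leave it alone
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        // "Example-Generator 1.0" is a made-up value, not the real GENERATOR_ID.
        String page = "<html><head><meta content=\"Example-Generator 1.0\" name=\"generator\"/></head></html>";
        System.out.println(wasGeneratedBy(page, "Example-Generator 1.0"));
    }
}
```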


Step 3 -- Filling in the Template with the Current Directory

Now let's examine the <files/> element in more detail. If a given directory has the 4 files shown below, and you include a <files/> element in your _index.xml contents template, here's an example of an XML fragment created automatically in your index.xml contents file,

   . . . etcetera...
You'll observe that the XML is quite simple. Under the <files> root element is a list of individual <file> elements. Each <file> element contains a relative file path to the file in your output tree (for linking), and an absolute file path to the file in your input tree (for data extraction). (Recall that a contents file may be generated within a directory, or in its parent. The above XML is from the former case, since the <relfile> elements show just a plain base filename. If the contents file was in the parent directory, and the current directory is named "webstuff", then the <relfile> elements would appear as "webstuff/accessibility.html", etc.) Your job, then, is to add transformation code to your XSL file that converts XML in the form shown into appropriate output in your output file.

Step 4 -- Transforming the Contents Data

The <files> element above is stored inside your contents file in the input tree. Your XSL must then handle conversion of this data to an appropriate display, which is stored as the contents file in the output tree. Using the same file data in the previous example, here's example output when converted to HTML/XHTML:

   <a href="accessibility.html">Accessibility</a>:
   Don't discriminate on physical ability when you design web pages.
   <a href="antispam.html">Anti-Spam</a>:
   Design defensively so you do not make it easy for spammers
   to enlist you to help them.
   <a href="browser.html">Browser Compliance</a>:
   Design economically by considering the technology of your audience.
   <a href="style.html">Issues of Style</a>:
   Style is the grey that is leftover after you've considered
   all the black and white of the other topics presented here.
Notice that the <relfile> elements have been inserted directly into the HTML text, while the <absfile> elements have been used by the XSL transformation to look up information within each file to provide a description of that very file.

Let's examine where the components of an entry come from in that piece of HTML. The general form is

 <li><a href="relfile">
     file title</a>:file description</li>
The relfile is the same relfile discussed above--the relative file reference to your output tree.

The file title comes from--in this example--the <cc:title> element contained by the <cc:head> element contained by the root <cc:cleanCodeDoc> element, which is in the file specified by absfile. Similarly, the file description comes from the content attribute of the HTML <meta> element whose name attribute is description. Here's the XSL used to extract that information:

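 <xsl:template match="cc:files">
   <!-- assumption: the list wrapper is <ul>, matching the <li> form above -->
   <ul><xsl:apply-templates select="file"/></ul>
 </xsl:template>

 <xsl:template match="file">
   <xsl:variable name="extNode" select="document(absfile)/cc:cleanCodeDoc"/>
   <li><a href="{relfile}">
     <xsl:value-of select="$extNode/cc:head/cc:title"/></a>:
     <!-- assumption: the description <meta> is an un-namespaced descendant of the root -->
     <xsl:value-of select="$extNode//meta[@name='description']/@content"/></li>
 </xsl:template>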
To see a specific example, here's a fragment of the accessibility.xml file, showing where the file title and file description came from for one entry. The extracted information has been emboldened in the fragment below.
 <meta name="description"
  content="Don't discriminate on physical ability
  when you design web pages." />
 . . .

Step 5 -- Separating the Current Directory into Groups

This step is optional, but it makes the contents generator even more flexible. Hearken back to the discussion of the <files/> element above in steps 2 and 3. With the implementation described therein, we used a single <files/> element in the template, and references to all the files in the current directory were stored there. But you may also subdivide the current directory into groups of similar files and place each group in your contents file using multiple <files/> elements. Add the group attribute with a designation of your own choosing. Say, for example, you have a directory containing chocolate recipes and you wish to group these by variety. In your contents template, before each list of files, you want to have a paragraph or two of introduction, followed by the file list. So your template might look, in part, like this (for brevity, I'll just use titles to introduce each list here):

   <cc:files group="bs"/>
   <cc:files group="ss"/>
   <cc:files group="milk"/>
   <cc:files group=""/>
For each <files/> element, the files belonging to that group will be inserted. Note particularly the last one (it does not have to be last, by the way, it just tends to naturally want to be last). There you'll see an empty group attribute; omitting the attribute entirely is equivalent, and in fact, is the very element we started with in the simpler, non-grouped discussion earlier. Any files that do not have a group designation will be stored in this group without a group designator.

So we have instrumented the contents template; now we must also instrument each file to assign it to a group. You create an element in each of your files that specifies what group the file is in. This element may have any name you choose; let's use group for purposes of discussion. So inside, for example, the file recipe23.xml that contains a bittersweet recipe, you would include:

since bs is what we used as the bittersweet group designator in the contents template.

The last step is just to connect the two ends--that is, to tell XmlTransform where in each file to look for a group designator that it has identified in the template file. To do this, you define the configuration option groupIdXpath, which specifies an Xpath expression to traverse your XML to the appropriate element. Let's peek at just a few more lines in one of your XML files:

       . . .
     . . .
Given the above structure, you could use this definition:
to point to the group designation.
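
The grouping mechanism can be illustrated with the standard javax.xml.xpath API. Because the sample structure and definition are not reproduced here, the sketch uses a hypothetical document layout and a hypothetical Xpath, /doc/head/group; the class GroupByXpath is likewise illustrative and not part of CleanCode.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical illustration of assigning files to groups via an Xpath. */
final class GroupByXpath {

    /** Returns the group designator found at groupIdXpath, or "" if there is none. */
    static String groupOf(String xml, String groupIdXpath) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(groupIdXpath, new InputSource(new StringReader(xml))).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Maps each group designator to the names of the files that carry it. */
    static Map<String, List<String>> partition(Map<String, String> xmlByFileName, String groupIdXpath) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : xmlByFileName.entrySet()) {
            String group = groupOf(e.getValue(), groupIdXpath);
            groups.computeIfAbsent(group, k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("recipe23.xml", "<doc><head><group>bs</group></head></doc>");
        files.put("recipe24.xml", "<doc><head/></doc>");   // no designator
        System.out.println(partition(files, "/doc/head/group"));
    }
}
```

A file with no matching element evaluates to the empty string, which lines up with the empty group attribute in the template.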


Validating Your Files

XmlTransform has two main purposes: generating and transforming files from one form to another, discussed above, and validating your files for correctness (both well-formedness and validity). Checking for well-formedness is done intrinsically with any processing of your XML files. Formal XML Schema validation, however, is a separate option that you may turn on and off, which we discuss here.

Under "Basic Steps" above you'll observe the two switches for independently validating your input tree XML and your output tree XML. (There is also a third, more for experimentation, to validate your XSL files.) Once turned on, XmlTransform will validate your selection against an XML Schema (DTD validation is not supported). You may specify the Schema either within each file or as a global Schema file. You may specify up to two global Schema files, one for your input tree of XML files, and one for your output tree. If you specify a value for the global input tree Schema, then all files in your input tree will be validated against that Schema, regardless of whether any individual file specifies its own Schema file. Note that a Schema specified inside an XML file may be either a local file reference or a URL reference. However, a global Schema file may only be a local file reference.
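
For readers who want to try Schema validation outside of XmlTransform, here is a minimal Java sketch of validating a document against a single global Schema. It uses the javax.xml.validation API, which is a different route from the JAXP 1.2 properties named by this class's constants, so treat it as an equivalent illustration rather than what XmlTransform does internally. The class SchemaCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical illustration of XML Schema validation with javax.xml.validation. */
final class SchemaCheck {

    /**
     * Validates xml against xsd. Returns null if the document is valid,
     * otherwise the message of the first schema or validation error.
     */
    static String validate(String xml, String xsd) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                   + "<xs:element name='note' type='xs:string'/></xs:schema>";
        System.out.println(validate("<note>hello</note>", xsd));  // valid document
        System.out.println(validate("<memo>hello</memo>", xsd));  // unexpected root element
    }
}
```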


Leveraging the Parameter Map

Earlier the --help command-line option was mentioned, which displays the list of configuration options accessible from XmlTransform. Each of the settings used by XmlTransform was explained above as well. But if you run XmlTransform with --help, you'll see several settings that were not mentioned. What, you may wonder, is going on? The remaining settings that you find are not used by XmlTransform (at least not directly) but rather by the CleanCode Diagnostic module. Those settings are not documented above to avoid duplication and, if they change, to prevent the documentation from becoming inaccurate. But more than that, the list of options displayed with --help is not even generated by XmlTransform; that list is requested from the Diagnostic module, also. Carrying this to the next level, let's say you wish to use XmlTransform programmatically, rather than from the command line. You can use the same methodology of asking XmlTransform for its list of settings, rather than re-documenting them yourself. You'll want to review the ParamMap and InputOptions classes for further details, but essentially you need to create a ParamMap object for your own settings, then use its putAll method to add in the settings for each library module you use. In the case of XmlTransform, it uses only one module which uses InputOptions, so it includes just this one line of code --

-- to add the Diagnostic module's parameters to its own.

Using Diagnostics

XmlTransform uses the CleanCode Diagnostic library module to provide flexible diagnostic output options. Six diagnostic levels are used:

diagnostic level       purpose
XMLTRANSFORM_A_DIAG    final statistics
XMLTRANSFORM_C_DIAG    processed entries
XMLTRANSFORM_D_DIAG    processing notes
XMLTRANSFORM_E_DIAG    validation info
XMLTRANSFORM_F_DIAG    contents group/file details

If you use the --debug switch, all of the above diagnostics are activated. This will provide a large amount of diagnostic information; you may want to use individual diagnostics selectively by either setting the ones you want to match the DIAG_LEVEL mask from the Diagnostic module, or using the --diagList shorthand. If you say, for example, --diagList=ACF, that will enable XMLTRANSFORM_A_DIAG, XMLTRANSFORM_C_DIAG, and XMLTRANSFORM_F_DIAG. Another option to be aware of is that if you use --noEnable (or --enable=false) and have used neither --debug nor --diagList, then the "C" and "D" diagnostics are activated automatically. After all, why would you want to run the program with all actions disabled if not to have it report what it would do if actions were enabled?
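
The --diagList shorthand amounts to expanding each letter into a diagnostic name. The Java sketch below shows only that naming pattern as described above; the actual parsing lives in the Diagnostic module and may well differ, and the class DiagList is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the --diagList letter-to-name expansion. */
final class DiagList {

    /** Expands a string such as "ACF" into the corresponding diagnostic names. */
    static List<String> expand(String diagList) {
        List<String> names = new ArrayList<>();
        for (char letter : diagList.toCharArray()) {
            names.add("XMLTRANSFORM_" + letter + "_DIAG");
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(expand("ACF"));
    }
}
```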

Getting Started

Take a look at the XmlTransform sandbox, containing several simple examples of using XmlTransform, which you may copy and modify to your needs. For a real-world example, take a look at the actual files, XSLT, and Schema for the main portion of the CleanCode website itself here.

To Be Done

Since:
CleanCode 0.9
Version:
$Revision: 380 $
Author:
Michael Sorens
See Also:
Diagnostic, ParamMap, InputOptions

Field Summary
static String DEFAULT_CONTENTS_BASENAME
          Default base prefix for contents file.
static String DEFAULT_GENERATOR_NODE
          Default node name for the generator element in the contents template file.
static String DEFAULT_GROUP_ID_XPATH
          Default simple Xpath expression to locate group identifier in each source file.
static String DEFAULT_GROUP_PLACEHOLDER
          Default node name for placing file list in content file.
static String DEFAULT_IN_EXTENSION
          Default input extension.
static String DEFAULT_OUT_EXTENSION
          Default output extension.
static String DEFAULT_XSL_NAME
          Default base XSL file name for translating from input to output.
static String GENERATOR_ID
          Combination of name and version of this generator, for use in XSL files to identify generated files.
static String JAXP_SCHEMA_LANGUAGE
          JAXP 1.2 schema language value.
static String JAXP_SCHEMA_SOURCE
          JAXP 1.2 schema source value.
static ParamMap paramMap
          Parameter map for this class.
static String VERSION
          Current version of this class.
static String W3C_XML_SCHEMA
          JAXP 1.2 xml schema value.
Constructor Summary
XmlTransform(InputOptions settings)
          Creates an XmlTransform object with the specified configuration options.
Method Summary
static void main(String[] args)
          Main program for standalone mode.
 void process()
          Performs all the processing as specified by the configuration options.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


public static final String VERSION
Current version of this class.


public static final String GENERATOR_ID
Combination of name and version of this generator, for use in XSL files to identify generated files.


public static final String JAXP_SCHEMA_LANGUAGE
JAXP 1.2 schema language value.

See Also:
Constant Field Values


public static final String W3C_XML_SCHEMA
JAXP 1.2 xml schema value.

See Also:
Constant Field Values


public static final String JAXP_SCHEMA_SOURCE
JAXP 1.2 schema source value.

See Also:
Constant Field Values


public static final String DEFAULT_IN_EXTENSION
Default input extension.

See Also:
Constant Field Values


public static final String DEFAULT_OUT_EXTENSION
Default output extension.

See Also:
Constant Field Values


public static final String DEFAULT_XSL_NAME
Default base XSL file name for translating from input to output.

See Also:
Constant Field Values


public static final String DEFAULT_CONTENTS_BASENAME
Default base prefix for contents file.

See Also:
Constant Field Values


public static final String DEFAULT_GROUP_PLACEHOLDER
Default node name for placing file list in content file.

See Also:
Constant Field Values


public static final String DEFAULT_GENERATOR_NODE
Default node name for the generator element in the contents template file.

See Also:
Constant Field Values


public static final String DEFAULT_GROUP_ID_XPATH
Default simple Xpath expression to locate group identifier in each source file.

See Also:
Constant Field Values


public static ParamMap paramMap
Parameter map for this class. Merge it with the parameter map from your invoking class to get a full set of available configuration options.

Constructor Detail


public XmlTransform(InputOptions settings)
             throws ParserConfigurationException
Creates an XmlTransform object with the specified configuration options. See the text above for descriptions of all available options.

Parameters:
settings - an InputOptions object containing the configuration options.
Throws:
ParserConfigurationException - if there are any problems with the JAXP parsers.
Method Detail


public void process()
Performs all the processing as specified by the configuration options.


public static void main(String[] args)
                 throws ParserConfigurationException
Main program for standalone mode. Processes the directory tree according to the supplied configuration options.

Usage: XmlTransform options-or-option-file
Usage: XmlTransform --help

Parameters:
args - list of configuration options
Throws:
ParserConfigurationException - if there are any problems with the JAXP parsers.

CleanCode Java Libraries Copyright © 2001-2012 Michael Sorens - Revised 2012.12.10