java.lang.Object com.cleancode.xml.XmlTransform
public class XmlTransform
Transforms and/or schema-validates a tree of files, optionally generating multi-level indices and adding navigational linkages.
XmlTransform is a general-purpose transformation toolkit for manipulating a large collection of hierarchically organized files. So what does it really do? The short answer: it saves you a lot of time. Consider what is probably the most common task (though by no means the only application for XmlTransform): creating and maintaining a website. If you've done it before by manually creating a collection of HTML files, you most likely encountered maintenance issues such as:
XmlTransform can transform from one flavor of XML to another, from XML to XHTML, or to any other output you specify in a standard XSL file. You may also selectively validate your input files, your output files, or both against an XML schema. During transformation, you may add vertical and horizontal linkages among files in the directory tree, and you may automatically generate a contents file for each subdirectory in your directory tree. Finally, XmlTransform expands your custom abbreviations (via XSL) so you never need to write the same code or text twice.
Besides the description here, I have also published two articles discussing the practical uses of this tool, available in the DevX online magazine: "XmlTransform: A General-Purpose XSLT Pre-Processor" and "Add Custom XML Documentation Capability To Your SQL Code".
XmlTransform may be used either as a library module or as a standalone utility. As a library class, the API is quite simple:
```java
InputOptions settings = new InputOptions(args);
XmlTransform converter = new XmlTransform(settings);
converter.process();
```

The InputOptions class is another CleanCode module that handles the configuration options. In its simplest form, you merely provide the command-line argument array from your main routine.
As a standalone tool, the utility is invoked as:

```
java XmlTransform options-or-option-file
java XmlTransform --help
```

The latter form will list all the available configuration options, described in the following sections.
There are several boolean switches to select from:

- xslTransform: transform input to output (using your XSLT specification).
- generateContents: generate contents files in the input tree (using your contents templates).
- validateInputToSchema: validate the input (using your XML Schema definition).
- validateOutputToSchema: validate the output (using your XML Schema definition).
- validateXslBySchema: validate the XSL file itself (not commonly needed, since any errors will be evident when xslTransform is turned on).

parameter | default |
---|---|
xslTransform | true |
generateContents | true |
validateInputToSchema | false |
validateOutputToSchema | false |
validateXslBySchema | false |
You specify where your input tree resides (sourcePath) and where your output tree should be generated (targetPath). Either one defaults to the current directory if not specified. Next, you specify the input file extension (inExtension) and the output file extension (outExtension). These default to DEFAULT_IN_EXTENSION and DEFAULT_OUT_EXTENSION, respectively, if not specified.
parameter | default |
---|---|
sourcePath | . |
targetPath | . |
inExtension | DEFAULT_IN_EXTENSION |
outExtension | DEFAULT_OUT_EXTENSION |
In order to transform each file, you must provide an XSL file that specifies how this is done (xslName). If you specify an absolute path, that file will be used at any subdirectory depth. If, however, you specify just a base file name (such as "stuff.xsl"), then a couple of things happen. The program will look for such a file within each subdirectory that is processed. If it doesn't find one, then it will look for the same name in your root directory (sourcePath). This provides a flexible mechanism whereby you may specify one global XSL specification but override it in specific instances that require it. If, on the other hand, you do not provide a root XSL file, then only those subdirectories containing a local XSL file will be transformed.
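The xslName lookup rule just described can be summarized in a few lines of Java. This is a minimal sketch, not XmlTransform's actual code: the class XslResolver and its resolve method are hypothetical names, and a null return stands for "do not transform this subdirectory".

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the xslName lookup rule (not part of CleanCode). */
final class XslResolver {
    private XslResolver() {}

    /**
     * Picks the XSL file for one subdirectory.
     *
     * @param xslName    the xslName option: an absolute path or a bare file name
     * @param sourceRoot the root of the input tree (sourcePath)
     * @param subdir     the subdirectory currently being processed
     * @return the XSL file to use, or null if the subdirectory is skipped
     */
    static Path resolve(String xslName, Path sourceRoot, Path subdir) {
        Path named = Path.of(xslName);
        if (named.isAbsolute()) {
            // An absolute path is used at every subdirectory depth.
            return named;
        }
        // A local XSL file in the subdirectory overrides the global one.
        Path local = subdir.resolve(xslName);
        if (Files.isRegularFile(local)) {
            return local;
        }
        // Otherwise fall back to the same name in the root directory.
        Path global = sourceRoot.resolve(xslName);
        if (Files.isRegularFile(global)) {
            return global;
        }
        // Neither exists: this subdirectory is not transformed.
        return null;
    }
}
```

The local-before-global order is what lets a single root stylesheet act as the default while individual subdirectories override it.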
In order to generate a table of contents for any given subdirectory, you must provide an XML template file. This is just an XML file like any other among your files, except that it has one or more placeholders for referencing lists of other files. The template file name consists of an underscore, then the contentsBaseName parameter, a dot, and the inExtension parameter (e.g. _myDir.xml). The filled-in template is stored in your input tree. Its name depends on whether you choose to store the file in the same directory or in the parent; if in the same directory, the name is the same base name, a dot, and inExtension, without the underscore. (See "Generating a Contents File" for more details.)
[Advanced use]
If you wish to validate your transformation XSL file (xslName), then you must provide an XML Schema file with which to do so (xslSchema). This is not often necessary, since any errors will be promptly reported when you attempt to use the XSL file in any transformation. So this option (controlled by the validateXslBySchema switch) is perhaps more of an academic exercise. (Note that this validates only the transformation XSL, not the contents-generating XSL.)
parameter | default |
---|---|
xslName | DEFAULT_XSL_NAME |
xslSchema | null |
Finally, you need to provide a few details on how the program should operate.

You specify which subdirectories of sourcePath to process using dirList. This string should be a comma-separated or semicolon-separated list of subdirectory names (relative to sourcePath), as in "sub1, sub1/subsub1, sub2, sub3". If omitted, only sourcePath itself (with no subdirectories) will be processed.
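A dirList value can be split on either separator with a single regular expression. The sketch below is hypothetical (DirListParser is not part of the CleanCode API); it assumes whitespace around each name is ignored and, as described above, falls back to "." (sourcePath alone) when the option is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of parsing the dirList option. */
final class DirListParser {
    private DirListParser() {}

    static List<String> parse(String dirList) {
        List<String> dirs = new ArrayList<>();
        if (dirList == null || dirList.isBlank()) {
            // Option omitted: process only sourcePath itself.
            dirs.add(".");
            return dirs;
        }
        // Commas and semicolons are both accepted as separators.
        for (String entry : dirList.split("[,;]")) {
            String name = entry.trim();
            if (!name.isEmpty()) {
                dirs.add(name);
            }
        }
        return dirs;
    }
}
```

For example, parse("sub1, sub1/subsub1; sub2") yields the three names sub1, sub1/subsub1, and sub2, each relative to sourcePath.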
You can run without making changes by using the enable flag: set it to false and no actual work will be done, but the program will report what it would do. If omitted, the default is true.
processAll
flag. If omitted, the default is false.
XmlTransform also passes a parameter named generator (set to GENERATOR_ID) to your XSL, which you may use or not as you choose. To access the parameter, you would include this line in your XSL:

```xml
<xsl:param name="generator"/>
```

Then to use it, you might use something like this, if you are creating HTML or XHTML:

```xml
<meta content="{$generator}" name="generator"/>
```

See the section on generating contents files for one application of this.
The following XSL template builds a relative path prefix consisting of "../" repeated level times:

```xml
<xsl:template name="buildpath">
  <xsl:param name="level"/>
  <xsl:if test="$level > 0">../<xsl:call-template name="buildpath">
      <xsl:with-param name="level">
        <xsl:value-of select="$level - 1"/>
      </xsl:with-param>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
```

The startDepth configuration option (default 0) allows you to specify an offset between your root (sourcePath) and the location of the include files to be referenced. So if your include files are in the directory above where your HTML files are, you could specify a startDepth of 1, directing the above XSL routine to add an extra ".." to the path. Note that XmlTransform passes the depth of the subdirectory it is processing to the XSL, as an offset to this starting depth. The XSL parameter name is level. Hence you would use a call-template element in XSL, passing the $level parameter as its argument, to create your path string in the above example. Note that this $level parameter has other uses as well; for instance, you may invoke different templates within your XSL depending on your current level.
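The buildpath recursion, together with the startDepth offset, amounts to a simple calculation. Here is a hypothetical Java sketch (LevelPrefix is an illustrative name, not a library class); it assumes level equals startDepth plus the number of directory steps between sourcePath and the current subdirectory, as the text describes.

```java
import java.nio.file.Path;

/** Hypothetical sketch of the level/startDepth arithmetic. */
final class LevelPrefix {
    private LevelPrefix() {}

    /** Depth of subdir below sourceRoot, offset by startDepth (assumed formula). */
    static int level(Path sourceRoot, Path subdir, int startDepth) {
        Path rel = sourceRoot.relativize(subdir);
        // Relativizing a path to itself gives an empty path, which still
        // reports one (empty) name element, so treat it as depth 0.
        int depth = rel.toString().isEmpty() ? 0 : rel.getNameCount();
        return startDepth + depth;
    }

    /** Java equivalent of the buildpath template: "../" repeated level times. */
    static String buildPath(int level) {
        return "../".repeat(Math.max(0, level));
    }
}
```

With startDepth 1, a file in sourcePath itself gets the prefix "../", and a file two subdirectories down gets "../../../".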
You may pass parameters of your own to your XSL with the xslParmList configuration option. This string should be a comma-separated or semicolon-separated list of parameter settings. Each parameter setting must have the form name:value and obviously may contain neither commas nor semicolons. This is useful for passing in things like a copyright date or a release version, as in "--xslParmList=copyright:2006,relVersion:v1.2".
parameter | default |
---|---|
dirList | . |
enable | true |
processAll | false |
startDepth | 0 |
xslParmList | null |
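Parameters such as those from xslParmList reach an XSL stylesheet through the standard JAXP Transformer.setParameter method. The sketch below (the class XslParams is hypothetical) parses the name:value list and applies it to a transformation. It illustrates the mechanism only; it is an assumption that XmlTransform hands over its own parameters (generator, level, the navigation links) the same way.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: parse xslParmList and pass it to an XSLT transform. */
final class XslParams {
    private XslParams() {}

    static Map<String, String> parse(String xslParmList) {
        Map<String, String> params = new LinkedHashMap<>();
        if (xslParmList == null) {
            return params;
        }
        for (String setting : xslParmList.split("[,;]")) {
            String s = setting.trim();
            if (s.isEmpty()) {
                continue;
            }
            // Split at the first colon only, so a value may itself contain ':'.
            int colon = s.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected name:value but got " + s);
            }
            params.put(s.substring(0, colon).trim(), s.substring(colon + 1).trim());
        }
        return params;
    }

    static String transform(String xsl, String xml, Map<String, String> params)
            throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        // Each entry becomes a value for a matching <xsl:param> in the stylesheet.
        params.forEach(t::setParameter);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

A stylesheet declaring <xsl:param name="copyright"/> would then see the value 2006 for the example setting above; parameters the stylesheet does not declare are simply unused.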
XmlTransform, as you're aware, operates on a tree of directories and files, and therefore has an implicit notion of first/final and next/previous within a directory, and up/down between directories. As such, during the XSL transformation of each file, XmlTransform provides parameters from which you could create navigational controls. The parameters are:
If your XSL file creates HTML, for example, you could create a set of navigation buttons or links for next and previous entries with something like this:
```xml
<xsl:param name="prevLink"/>
<xsl:param name="nextLink"/>

<xsl:choose>
  <xsl:when test="$prevLink">
    <a><xsl:attribute name="href">
        <xsl:value-of select="$prevLink"/>
      </xsl:attribute>Previous file</a>
  </xsl:when>
  <xsl:otherwise><span>(no previous file)</span></xsl:otherwise>
</xsl:choose>

<xsl:choose>
  <xsl:when test="$nextLink">
    <a><xsl:attribute name="href">
        <xsl:value-of select="$nextLink"/>
      </xsl:attribute>Next file</a>
  </xsl:when>
  <xsl:otherwise><span>(no next file)</span></xsl:otherwise>
</xsl:choose>
```

Then each file within a directory would have a link to the one that follows it (in alphabetical order) and the one that precedes it. The only exception is at the endpoints (the first and final files in the directory), where XmlTransform will pass an empty string. Your XSL code must account for this, as in the above example; the <xsl:when> element above will catch just the non-empty strings. Similarly, the upLink will be empty when already at the top-level directory. The last two navigational controls, firstLink and finalLink, will only be empty when the directory contains just a single file.
Finally, a note with respect to contents generation: if you choose to create a contents file and have it generated within the same directory (contentsToParent is false), that file will be considered the first file in the directory, notwithstanding alphabetical order. That way, using the firstLink will immediately take the user to this contents file. If the contents file is generated in the parent, it receives no such special treatment.
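The navigation link rules above can be sketched as follows. This is a hypothetical illustration (NavLinks is not part of the library): it assumes a plain String sort stands in for "alphabetical order", covers prevLink, nextLink, firstLink, and finalLink for one directory, and omits upLink.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the per-directory navigation link rules. */
final class NavLinks {
    private NavLinks() {}

    /**
     * Orders the files of one directory: alphabetical, except that a contents
     * file generated in this directory (contentsToParent false) comes first.
     * Pass null for contentsFile when no such file exists.
     */
    static List<String> order(List<String> files, String contentsFile) {
        List<String> ordered = new ArrayList<>(files);
        Collections.sort(ordered);
        if (contentsFile != null && ordered.remove(contentsFile)) {
            ordered.add(0, contentsFile);
        }
        return ordered;
    }

    /** Link parameters for the file at position i; empty strings at the endpoints. */
    static Map<String, String> links(List<String> ordered, int i) {
        int n = ordered.size();
        Map<String, String> p = new LinkedHashMap<>();
        p.put("prevLink", i > 0 ? ordered.get(i - 1) : "");
        p.put("nextLink", i < n - 1 ? ordered.get(i + 1) : "");
        // firstLink and finalLink are empty only when the directory has one file.
        p.put("firstLink", n > 1 ? ordered.get(0) : "");
        p.put("finalLink", n > 1 ? ordered.get(n - 1) : "");
        return p;
    }
}
```

For the ordering ["index.html", "a.html", "b.html"], the contents file index.html has an empty prevLink and a nextLink of a.html, and every file's firstLink points at index.html.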
As mentioned briefly above, you may automatically generate a table of contents for each directory. The mechanism to do this provides complete flexibility over constructing a table of contents using the power of XSL, but, alas, that means you must be conversant with XSL. For those who are not and are interested in the challenge, I do walk you through each step below, but you may need to consult an XSL reference as you go! (Note that the generateContents configuration option must be turned on; otherwise, contents generation will not occur.)
XmlTransform first generates an XML file, in your input tree, from your contents-template XML file. The template file name is just _contentsBaseName.inExtension (note the leading underscore). (Both contentsBaseName and inExtension are configuration options.) This file is then transformed along with all your other input files into your output tree, so that it may benefit from the same transformations and provide the same output file structure.

You have a choice of where in your input tree the contents file should be stored, controlled by the contentsToParent configuration option. It will be stored either in the current subdirectory (false) or in the current subdirectory's parent (true). The choice depends on your needs and your application. Let's say you have a subdirectory "fauna" containing birds.xml, mammals.xml, invertebrates.xml, etc. If you create the contents file in the same directory, it will be called contentsBaseName.inExtension (without the leading underscore). For example, if contentsBaseName is "contents" and inExtension is "xml", then the assembled contents file name is "contents.xml" (and the corresponding template is "_contents.xml"). If, however, you choose to generate the contents in the parent directory, then the contents file will be named directory.inExtension. Using our "fauna" example, that would simply result in "fauna.xml". This allows you to have multiple subdirectories, all sending their contents to a common parent, each with a unique name.
parameter | default |
---|---|
contentsBaseName | DEFAULT_CONTENTS_BASENAME |
contentsToParent | true |
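The naming rules for the template and for the generated contents file reduce to a small helper. This is a hypothetical sketch (ContentsNames is an illustrative name, not a library class) built on java.nio.file.Path.

```java
import java.nio.file.Path;

/** Hypothetical sketch of the contents-file naming rules. */
final class ContentsNames {
    private ContentsNames() {}

    /** Template name: underscore + contentsBaseName + "." + inExtension. */
    static String templateName(String contentsBaseName, String inExtension) {
        return "_" + contentsBaseName + "." + inExtension;
    }

    /** Where the filled-in contents file is written in the input tree. */
    static Path generatedFile(Path dir, String contentsBaseName,
                              String inExtension, boolean contentsToParent) {
        if (contentsToParent) {
            // Named after the directory itself, e.g. fauna/ -> fauna.xml
            // placed next to it in the parent directory.
            return dir.resolveSibling(dir.getFileName() + "." + inExtension);
        }
        // Same directory: the template name without the leading underscore.
        return dir.resolve(contentsBaseName + "." + inExtension);
    }
}
```

Using the "fauna" example, generatedFile(Path.of("web", "fauna"), "contents", "xml", true) gives web/fauna.xml, while passing false gives web/fauna/contents.xml.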
After naming the contents template, the next salient point is what goes inside. The contents template is just like every other file in your input tree in all respects but two.

First, inside this template you include a placeholder where XmlTransform will insert a list of files in the current directory. The placeholder is an XML element with no content (though it may have attributes, discussed shortly). The name of the placeholder element is a configuration option (groupPlaceHolder), so you may specify any element you like. Say, for example, you use <files/>. When you execute XmlTransform, that element in your template will be expanded with a list of <file> children, enumerating the files in the current directory.

The second vital element required in your template tells XmlTransform, in subsequent executions, which files it is appropriate to overwrite with regenerated contents. The generator element (named by the generatorNode configuration option) should be empty; it will then be replaced with a standard HTML <meta> element of the form:

```xml
<meta content="generator-id" name="generator"/>
```

The generator-id string was mentioned earlier (GENERATOR_ID). Note that this conversion must be done in your XSL file, using a template such as:

```xml
<xsl:param name="generator"/>
<xsl:template match="cc:generator">
  <meta content="{$generator}" name="generator"/>
</xsl:template>
```

This element must show up in the generated file; otherwise, the next time you wish to regenerate the same contents file, XmlTransform will not be able to tell that it generated the file, and will thus not overwrite it. Finally, regardless of whether you are transforming to HTML or not, the generator element must be the standard HTML <meta> element, since that is what XmlTransform looks for.
parameter | default |
---|---|
groupPlaceHolder | DEFAULT_GROUP_PLACEHOLDER |
generatorNode | DEFAULT_GENERATOR_NODE |
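A check along the following lines could decide whether a file carries the generator marker and is therefore safe to overwrite. This is a sketch under assumptions, not XmlTransform's implementation: GeneratorCheck is a hypothetical name, it accepts a <meta> element in any namespace, and the generatorId argument stands in for the real GENERATOR_ID value, which is not reproduced here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: does this XML contain <meta name="generator" content="id"/>? */
final class GeneratorCheck {
    private GeneratorCheck() {}

    static boolean isGenerated(String xml, String generatorId) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace-aware so getLocalName() works for prefixed or XHTML elements.
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if ("meta".equals(e.getLocalName())
                    && "generator".equals(e.getAttribute("name"))
                    && generatorId.equals(e.getAttribute("content"))) {
                return true;
            }
        }
        // No matching marker: treat the file as hand-written and leave it alone.
        return false;
    }
}
```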
Now let's examine the <files/> element in more detail. If a given directory has the 4 files shown below, and you include a <files/> element in your _index.xml contents template, here's an example of the XML fragment created automatically in your index.xml contents file:

```xml
<files>
  <file>
    <relfile>accessibility.html</relfile>
    <absfile>/usr/ms/webRules/accessibility.xml</absfile>
  </file>
  <file>
    <relfile>antispam.html</relfile>
    <absfile>/usr/ms/webRules/antispam.xml</absfile>
  </file>
  <file>
    <relfile>browser.html</relfile>
    <absfile>/usr/ms/webRules/browser.xml</absfile>
  </file>
  <file>
    <relfile>style.html</relfile>
    <absfile>/usr/ms/webRules/style.xml</absfile>
  </file>
</files>
```

You'll observe that the XML is quite simple. Under the <files> root element is a list of individual <file> elements. Each <file> element contains a relative file path to the file in your output tree (for linking) and an absolute file path to the file in your input tree (for data extraction). (Recall that a contents file may be generated within a directory or in its parent. The above XML is from the former case, since the <relfile> elements show just a plain base filename. If the contents file were in the parent directory, and the current directory were named "webstuff", then the <relfile> elements would appear as "webstuff/accessibility.html", etc.)
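A fragment of this shape can be produced with the standard DOM API. The hypothetical FileListBuilder below is a sketch, not XmlTransform's implementation; it assumes the output name is the input name with inExtension swapped for outExtension, and that relfile is prefixed with the directory name when contentsToParent is true.

```java
import java.nio.file.Path;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch of building the <files> list for one directory. */
final class FileListBuilder {
    private FileListBuilder() {}

    /** Relative output-tree reference used for linking. */
    static String relFile(Path dir, String inputName, String inExtension,
                          String outExtension, boolean contentsToParent) {
        String suffix = "." + inExtension;
        String base = inputName.endsWith(suffix)
                ? inputName.substring(0, inputName.length() - suffix.length())
                : inputName;
        String outputName = base + "." + outExtension;
        // From the parent directory, the link must descend into dir first.
        return contentsToParent ? dir.getFileName() + "/" + outputName : outputName;
    }

    static Document build(Path dir, List<String> inputNames, String inExtension,
                          String outExtension, boolean contentsToParent)
            throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element files = doc.createElement("files");
        doc.appendChild(files);
        for (String name : inputNames) {
            Element file = doc.createElement("file");
            Element rel = doc.createElement("relfile");
            rel.setTextContent(relFile(dir, name, inExtension, outExtension, contentsToParent));
            // Absolute input-tree path, for data extraction via document() in XSL.
            Element abs = doc.createElement("absfile");
            abs.setTextContent(dir.resolve(name).toAbsolutePath().toString());
            file.appendChild(rel);
            file.appendChild(abs);
            files.appendChild(file);
        }
        return doc;
    }
}
```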
Your job, then, is to add transformation code to your XSL file that converts XML in the form shown into appropriate output in your output file. The <files> element above is stored inside your contents file in the input tree. Your XSL must then handle conversion of this data to an appropriate display, which is stored as the contents file in the output tree. Using the same file data as in the previous example, here's example output when converted to HTML/XHTML:

```html
<li>
  <a href="accessibility.html">Accessibility</a>:
  Don't discriminate on physical ability when you design web pages.
</li>
<li>
  <a href="antispam.html">Anti-Spam</a>:
  Design defensively so you do not make it easy for spammers to enlist you to help them.
</li>
<li>
  <a href="browser.html">Browser Compliance</a>:
  Design economically by considering the technology of your audience.
</li>
<li>
  <a href="style.html">Issues of Style</a>:
  Style is the grey that is leftover after you've considered all the black and white of the other topics presented here.
</li>
```

Notice that the relfile values have been inserted directly into the HTML text, but the <absfile> elements have been used by the XSL transformation to look up information within each file to provide a description of that very file.
Let's examine where the components of an entry come from in that piece of HTML. The general form is

```html
<li><a href="relfile">file title</a>: file description</li>
```

The relfile is the same relfile discussed above: the relative file reference to your output tree. The file title comes from (in this example) the <cc:title> element contained by the <cc:head> element contained by the root <cc:cleanCodeDoc> element, which is in the file specified by absfile. Similarly, the file description comes from the content attribute of the HTML <meta> element whose name attribute is description. Here's the XSL used to extract that information:

```xml
<xsl:template match="cc:files">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="file">
  <!-- Load the input-tree file named by absfile and select its root element -->
  <xsl:variable name="extNode" select="document(absfile)/cc:cleanCodeDoc"/>
  <li>
    <a href="{relfile}">
      <xsl:value-of select="$extNode/cc:head/cc:title"/>
    </a>:
    <xsl:value-of select="$extNode/cc:head/xhtml:meta[@name='description']/@content"/>
  </li>
</xsl:template>
```

To see a specific example, here's a fragment of the accessibility.xml file, showing where the file title and file description came from for one entry: the text of <cc:title> and the content attribute of the description <meta> element.

```xml
<cc:cleanCodeDoc>
  <cc:head>
    <cc:title>CleanCode::Web::Accessibility</cc:title>
    <meta name="description"
          content="Don't discriminate on physical ability when you design web pages." />
    . . .
```
This step is optional, but it makes the contents generator even more flexible. Hearken back to the discussion of the <files/> element above in steps 2 and 3. With the implementation described therein, we used a single <files/> element in the template, and references to all the files in the current directory were stored there. But you may also subdivide the current directory into groups of similar files and place each group in your contents file using multiple <files/> elements. Add the group attribute with a designation of your own choosing.

Say, for example, you have a directory containing chocolate recipes and you wish to group these by variety. In your contents template, before each list of files, you want to have a paragraph or two of introduction, followed by the file list. So your template might look, in part, like this (for brevity, I'll just use titles to introduce each list here):

```xml
<h2>BitterSweet</h2>
<cc:files group="bs"/>
<h2>Semi-Sweet</h2>
<cc:files group="ss"/>
<h2>Milk</h2>
<cc:files group="milk"/>
<h2>Miscellaneous</h2>
<cc:files group=""/>
```

For each <files/> element, the files belonging to that group will be inserted. Note particularly the last one (it does not have to be last, by the way; it just tends naturally to be last). There you'll see an empty group attribute; omitting the attribute entirely is equivalent, and in fact is the very element we started with in the simpler, non-grouped discussion earlier. Any files that do not have a group designation will be stored in this group without a group designator.
So we have instrumented the contents template; now we must also instrument each file to assign it to a group. You create an element in each of your files that specifies what group the file is in. This element may have any name you choose; let's use group for purposes of discussion. So inside, for example, the file recipe23.xml that contains a bittersweet recipe, you would include:
```xml
<group>bs</group>
```

Here bs is what we used as the bittersweet group designator in the contents template.
The last step is just to connect the two ends, that is, to tell XmlTransform where in each file to look for a group designator that it has identified in the template file. To do this, you define the configuration option groupIdXpath, which specifies an XPath expression to traverse your XML to the appropriate element. Let's peek at just a few more lines in one of your XML files:

```xml
<doc>
  <head>
    <group>bs</group>
    . . .
  </head>
  . . .
</doc>
```

Given the above structure, you could use this definition to point to the group designation:

```
--groupIdXpath=/doc/head/group
```
parameter | default |
---|---|
groupIdXpath | DEFAULT_GROUP_ID_XPATH |
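Resolving each file's group with groupIdXpath maps directly onto the standard JAXP XPath API. In this hypothetical sketch (GroupSorter is an illustrative name), files without a matching element land in the empty-string group, mirroring the <cc:files group=""/> catch-all; for self-containment it takes XML text keyed by file name rather than reading files from disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical sketch of grouping files by a groupIdXpath expression. */
final class GroupSorter {
    private GroupSorter() {}

    /** Group designator for one file; "" when the XPath matches nothing. */
    static String groupOf(String xml, String groupIdXpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // XPath.evaluate returns the string value of the first match, or "".
        return XPathFactory.newInstance().newXPath().evaluate(groupIdXpath, doc).trim();
    }

    /** Maps each group designator to the names of the files in that group. */
    static Map<String, List<String>> group(Map<String, String> xmlByFileName,
                                           String groupIdXpath) throws Exception {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> e : xmlByFileName.entrySet()) {
            String id = groupOf(e.getValue(), groupIdXpath);
            groups.computeIfAbsent(id, k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }
}
```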
XmlTransform has two main purposes: generating and transforming files from one form to another, discussed above, along with validating your files for correctness (both well-formed and valid). Checking for well-formedness is done intrinsically with any processing of your XML files. Doing formal XML Schema validation, however, is a separate option that you may turn on and off, which we discuss here.
Under "Basic Steps" above you'll observe the two switches for independently validating your input tree XML and your output tree XML. (There is also a third, intended more for experimentation, to validate your XSL files.) Once a switch is turned on, XmlTransform validates your selection against an XML Schema (DTD validation is not supported). You may specify the Schema either within each file or as a global Schema file. You may specify up to two global Schema files: one for your input tree of XML files and one for your output tree. If you specify a value for the global input tree Schema, then all files in your input tree will be validated against that Schema, regardless of whether any individual file specifies its own Schema file. Note that a Schema specified inside an XML file may be either a local file reference or a URL reference; a global Schema file, however, may only be a local file reference.
parameter | default |
---|---|
inputSchemaSource | null |
outputSchemaSource | null |
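For reference, validating a document against a single global Schema can be done with the javax.xml.validation API. Note that this is a swapped-in technique: the JAXP_SCHEMA_LANGUAGE and JAXP_SCHEMA_SOURCE fields suggest that XmlTransform itself uses the older JAXP 1.2 parser properties. The hypothetical SchemaCheck below reads the Schema and document from strings for self-containment; with a real inputSchemaSource or outputSchemaSource file you would pass a File to newSchema instead.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical sketch of XML Schema validation with javax.xml.validation. */
final class SchemaCheck {
    private SchemaCheck() {}

    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // A malformed Schema throws here rather than reporting "invalid".
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Not well-formed, or well-formed but not valid against the Schema.
            return false;
        }
    }
}
```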
Earlier, the --help command-line option was mentioned, which displays the list of configuration options accessible from XmlTransform. Each of the settings used by XmlTransform was explained above as well. But if you run XmlTransform with --help, you'll see several settings that were not mentioned. What, you may wonder, is going on? The remaining settings are not used by XmlTransform (at least not directly) but rather by the CleanCode Diagnostic module. Those settings are not documented here to avoid duplication and to keep this documentation from becoming inaccurate if they change. Moreover, the list of options displayed with --help is not even generated by XmlTransform; that list is also requested from the Diagnostic module.

Carrying this to the next level, let's say you wish to use XmlTransform programmatically rather than from the command line. You can use the same methodology of asking XmlTransform for its list of settings, rather than re-documenting them yourself. You'll want to review the ParamMap and InputOptions classes for further details, but essentially you need to create a ParamMap object for your own settings, then use its putAll method to add in the settings for each library module you use. In the case of XmlTransform, it uses only one module that uses InputOptions, so it includes just this one line of code to add the Diagnostic module's parameters to its own:

```java
paramMap.putAll(Diagnostic.paramMap);
```
XmlTransform uses the
CleanCode Diagnostic
library module
to provide flexible diagnostic output options.
Six diagnostic levels are used:
diagnostic level | purpose |
---|---|
XMLTRANSFORM_A_DIAG | final statistics |
XMLTRANSFORM_B_DIAG | trace |
XMLTRANSFORM_C_DIAG | processed entries |
XMLTRANSFORM_D_DIAG | processing notes |
XMLTRANSFORM_E_DIAG | validation info |
XMLTRANSFORM_F_DIAG | contents group/file details |
If you use the --debug switch, all of the above diagnostics are activated. This will provide a large amount of diagnostic information; you may want to enable individual diagnostics selectively, either by setting the DIAG_LEVEL mask from the Diagnostic module to match the ones you want, or by using the --diagList shorthand. If you say, for example, --diagList=ACF, that will enable XMLTRANSFORM_A_DIAG, XMLTRANSFORM_C_DIAG, and XMLTRANSFORM_F_DIAG. Another option to be aware of: if you use --noEnable (or --enable=false) and have used neither --debug nor --diagList, then the "C" and "D" diagnostics are activated automatically. After all, why would you want to run the program with all actions disabled if not to have it report what it would do if actions were enabled?
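The rules for which diagnostics become active can be summarized as a small decision function. This is a hypothetical sketch: DiagSelection and its Diag enum are illustrative stand-ins for XMLTRANSFORM_A_DIAG through XMLTRANSFORM_F_DIAG, not the Diagnostic module's API, and it assumes --diagList letters are upper case and that no diagnostics are active otherwise.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of selecting the active XmlTransform diagnostics. */
final class DiagSelection {
    private DiagSelection() {}

    /** Stand-ins for XMLTRANSFORM_A_DIAG ... XMLTRANSFORM_F_DIAG. */
    enum Diag { A, B, C, D, E, F }

    static Set<Diag> select(boolean debug, String diagList, boolean enable) {
        if (debug) {
            // --debug turns everything on.
            return EnumSet.allOf(Diag.class);
        }
        if (diagList != null) {
            // --diagList=ACF enables exactly the listed letters;
            // an unknown letter makes valueOf throw IllegalArgumentException.
            Set<Diag> selected = EnumSet.noneOf(Diag.class);
            for (char c : diagList.toCharArray()) {
                selected.add(Diag.valueOf(String.valueOf(c)));
            }
            return selected;
        }
        if (!enable) {
            // A dry run with no explicit choice reports entries and notes.
            return EnumSet.of(Diag.C, Diag.D);
        }
        return EnumSet.noneOf(Diag.class);
    }
}
```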
See Also: Diagnostic, ParamMap, InputOptions
Field Summary | |
---|---|
static String | DEFAULT_CONTENTS_BASENAME: Default base prefix for the contents file. |
static String | DEFAULT_GENERATOR_NODE: Default node name for the generator placeholder in a contents file. |
static String | DEFAULT_GROUP_ID_XPATH: Default simple XPath expression to locate the group identifier in each source file. |
static String | DEFAULT_GROUP_PLACEHOLDER: Default node name for placing the file list in a contents file. |
static String | DEFAULT_IN_EXTENSION: Default input extension. |
static String | DEFAULT_OUT_EXTENSION: Default output extension. |
static String | DEFAULT_XSL_NAME: Default base XSL file name for translating from input to output. |
static String | GENERATOR_ID: Combination of name and version of this generator, for use in XSL files to identify generated files. |
static String | JAXP_SCHEMA_LANGUAGE: JAXP 1.2 schema language value. |
static String | JAXP_SCHEMA_SOURCE: JAXP 1.2 schema source value. |
static ParamMap | paramMap: Parameter map for this class. |
static String | VERSION: Current version of this class. |
static String | W3C_XML_SCHEMA: JAXP 1.2 XML schema value. |
Constructor Summary | |
---|---|
XmlTransform(InputOptions settings) | Creates an XmlTransform object with the specified configuration options. |
Method Summary | |
---|---|
static void | main(String[] args): Main program for standalone mode. |
void | process(): Performs all the processing as specified by the configuration options. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String VERSION
public static final String GENERATOR_ID
public static final String JAXP_SCHEMA_LANGUAGE
public static final String W3C_XML_SCHEMA
public static final String JAXP_SCHEMA_SOURCE
public static final String DEFAULT_IN_EXTENSION
public static final String DEFAULT_OUT_EXTENSION
public static final String DEFAULT_XSL_NAME
public static final String DEFAULT_CONTENTS_BASENAME
public static final String DEFAULT_GROUP_PLACEHOLDER
public static final String DEFAULT_GENERATOR_NODE
public static final String DEFAULT_GROUP_ID_XPATH
public static ParamMap paramMap
Constructor Detail |
---|
public XmlTransform(InputOptions settings) throws ParserConfigurationException
Creates an XmlTransform object with the specified configuration options. See the text above for descriptions of all available options.
Parameters: settings - an InputOptions object containing the configuration options.
Throws: ParserConfigurationException - if there are any problems with the JAXP parsers.
Method Detail |
---|
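public void process()
Performs all the processing as specified by the configuration options.

public static void main(String[] args) throws ParserConfigurationException
Main program for standalone mode. Usage:
XmlTransform options-or-option-file
XmlTransform --help
Parameters: args - list of configuration options.
Throws: ParserConfigurationException - if there are any problems with the JAXP parsers.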
CleanCode Java Libraries | Copyright © 2001-2012 Michael Sorens - Revised 2012.12.10 |