Wiley Beginning XSLT and XPath: Transforming XML Documents and Data

Contents

1. Williams c01.tex V3 - 07/31/2009 2:53pm Page 2. Chapter 1: First Steps with XSLT. If you haven't already done so, download the source code for this book from this book's web page at www.wrox.com. You'll be using the source files from now on in the examples that follow. When you've unzipped the download, open the folder for Chapter 1 and locate the file xml_stylesheet.xml. Listing 1-1 shows a pared-down version of the source document describing the <xsl:stylesheet> element.

   Listing 1-1

       <?xml version="1.0" encoding="UTF-8"?>
       <reference>
         <body>
           <title>xsl:stylesheet</title>
           <purpose>
             <p>The root element of a stylesheet.</p>
           </purpose>
           <usage>
             <p>The <element>stylesheet</element> is always the root element, even if a stylesheet is included in, or imported into, another. It must have a <attr>version</attr> attribute, indicating the version of XSLT that the stylesheet requires.</p>
             <p>For this version of XSLT, the value should normally be 2.0. For a stylesheet designed to execute under either XSLT 1.0 or XSLT 2.0, create a core module for each version number; then use <element>xsl:include</element> or <element>xsl:import</element> to incorporate common code, which should specify <code>version="2.0"</code> if it uses XSLT 2.0 features, or <code>version="1.0"</cod
2.
       <xsl:template match="code">
         <xsl:copy>
           <xsl:apply-templates/>
         </xsl:copy>
       </xsl:template>

   Here is the schema definition:

       <xs:element name="copy" substitutionGroup="xsl:instruction">
         <xs:complexType>
           <xs:complexContent mixed="true">
             <xs:extension base="xsl:sequence-constructor">
               <xs:attribute name="copy-namespaces" type="xsl:yes-or-no" default="yes"/>
               <xs:attribute name="inherit-namespaces" type="xsl:yes-or-no" default="yes"/>
               <xs:attribute name="use-attribute-sets" type="xsl:QNames" default=""/>
               <xs:attribute name="type" type="xsl:QName"/>
               <xs:attribute name="validation" type="xsl:validation-type"/>
             </xs:extension>
           </xs:complexContent>
         </xs:complexType>
       </xs:element>

   There are two copy instructions in XSLT. The <xsl:copy> instruction is a shallow copy: it copies only the context node, but nothing under it. You specify the output in the sequence constructor inside <xsl:copy>. This instruction is most useful when copying element nodes. It causes the current XML node in the source document to be copied to the output. The actual effect depends on whether the node is an element, an attribute, or a text node. For an element, the start and end element tags are copied; the attributes, character content, and child elements are copied
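The difference between a shallow and a deep copy is easier to see side by side. The following is a minimal sketch, not taken from the book's download: the code element comes from the example above, while sample is a hypothetical element name chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Shallow copy: xsl:copy emits only the <code> start and end tags.
       The children reach the output only because xsl:apply-templates
       processes them inside the copy; attributes on <code> are not
       selected by this call, so they are not copied. -->
  <xsl:template match="code">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Deep copy: xsl:copy-of copies the selected node together with its
       attributes and all of its descendants.
       "sample" is a hypothetical element name. -->
  <xsl:template match="sample">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

The shallow form is usually the better choice when the children still need their own template rules; the deep form suits subtrees that should pass through unchanged.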
3.
             </dc:description>
             <dc:format><xsl:value-of select="content/@type"/></dc:format>
             <xsl:for-each select="category">
               <dc:subject><xsl:value-of select="@label"/></dc:subject>
             </xsl:for-each>
           </item>
         </xsl:template>
       </xsl:stylesheet>

   RSS 1.0 Results

   To run the transform, add a scenario in the Oxygen IDE using atom.xml as the source and rss_feed.xsl as the stylesheet. Listing 1-7 shows a matching fragment of the transformed RSS 1.0 feed.

           <link>http://feeds.oreilly.com/oreilly/xml</link>
           <items>
             <rdf:Seq>
               <rdf:li rdf:resource="tag:broadcast.oreilly.com,2008://53.34667"/>
               <rdf:li rdf:resource="tag:broadcast.oreilly.com,2008://53.34679"/>
               <rdf:li rdf:resource="tag:broadcast.oreilly.com,2008://53.34620"/>
               <rdf:li rdf:resource="tag:broadcast.oreilly.com,2008://53.34524"/>
               <rdf:li rdf:resource="tag:broadcast.oreilly.com,2008://53.34508"/>
             </rdf:Seq>
           </items>
           <item rdf:about="tag:broadcast.oreilly.com,2008://53.34667">
             <link>http://feeds.oreilly.co
4. The remaining Saxon CLI options are extensive and quite powerful. If you are interested in pursuing the CLI approach, you should review the Saxon documentation at your leisure before using them.

   Transforming XML Data to XML

   The next example illustrates how simple it can be to transform content from one XML format to another. A common transform problem is that two similar schemas will use different names for identical content values. Another problem is that in one case an attribute is used for a value, while another uses an element for the same purpose. The next stylesheet uses two common metadata vocabularies that express information in roughly the same manner. One is the Atom 1.0 format, increasingly used for blogs and news feeds; the other is RSS 1.0, which uses a combination of the Dublin Core Metadata Initiative vocabulary and RDF/XML.

   There is also a version 2.0 branch of RSS. Although it is often assumed that RSS 2.0 supersedes RSS 1.0, it doesn't, and the versions are incompatible in several ways. RSS 2.0 is also in widespread use, but we won't be using it in this chapter. If you want to explore the structure of RSS 2.0, go to http://cyber.law.harvard.edu/rss/rss.html.

   Of course, you wouldn't bother serializing either of these feeds to XML if the data were in a SQL database or an RDF triple store. However, if you are aggregating the data from feed URLs, or you have been provided with source data in XML, you won't have much c
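Both renaming problems can be handled with one small template each. The sketch below is an illustration under stated assumptions rather than the book's listing: it uses two mappings that this chapter applies to the feed (Atom updated to dc:date, and the label attribute of category to dc:subject), and it assumes an XSLT 2.0 processor so that xpath-default-namespace can be used for the Atom source.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xpath-default-namespace="http://www.w3.org/2005/Atom">

  <!-- Different names for the same value:
       Atom <updated> becomes <dc:date>. -->
  <xsl:template match="updated">
    <dc:date><xsl:value-of select="."/></dc:date>
  </xsl:template>

  <!-- Attribute in one vocabulary, element content in the other:
       the label attribute of <category> becomes the text of <dc:subject>. -->
  <xsl:template match="category">
    <dc:subject><xsl:value-of select="@label"/></dc:subject>
  </xsl:template>

</xsl:stylesheet>
```

Run on its own, this fragment would also emit the text of every unmatched element through the built-in rules, so in practice these two rules sit inside a larger stylesheet that controls which nodes are processed.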
5. File.
   2. In the dialog that appears, select 1.0 as the Stylesheet version.
   3. Enter browser.xsl as the filename and click Finish. The new file should open with the following contents:

       <?xml version="1.0" encoding="UTF-8"?>
       <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
       </xsl:stylesheet>

   Because this stylesheet is an XML document, it should begin with an XML declaration specifying the version number and the encoding. The root element in a stylesheet is <xsl:stylesheet>, though the synonym <xsl:transform> may also be used. You must always specify the XSLT namespace, and it is important to set the version attribute correctly to match the type of processing required. After you've declared the namespace, all the XSLT element names require the namespace prefix, which is xsl by convention. The prefix also makes it clear which element is referenced if other namespaces are in use.

   Browsers that I have used will not complain about the version number, and many version 1.0 features are unchanged in version 2.0. In any case, it is good practice to document your intentions.

   You can now process the sample. Open the xsl_stylesheet.xml file from a browser using the File > Open menu command. You should see something like the output shown in Listing 1-2.

   Listing 1-2

       xsl:styleshee
6. or XHTML, the output will be in XML format. It is also possible to add user-defined methods.

   You can define the type of output in the declaration <xsl:output>. You saw the schema definition in this book's introduction, but here it is as a reminder. The attribute list is quite extensive, but for now I'd like to focus on just a few attributes.

       <xs:element name="output" substitutionGroup="xsl:declaration">
         <xs:complexType mixed="true">
           <xs:complexContent mixed="true">
             <xs:extension base="xsl:generic-element-type">
               <xs:attribute name="name" type="xsl:QName"/>
               <xs:attribute name="method" type="xsl:method"/>
               <xs:attribute name="byte-order-mark" type="xsl:yes-or-no"/>
               <xs:attribute name="cdata-section-elements" type="xsl:QNames"/>
               <xs:attribute name="doctype-public" type="xs:string"/>
               <xs:attribute name="doctype-system" type="xs:string"/>
               <xs:attribute name="encoding" type="xs:string"/>
               <xs:attribute name="escape-uri-attributes" type="xsl:yes-or-no"/>
               <xs:attribute name="include-content-type" type="xsl:yes-or-no"/>
               <xs:attribute name="indent" type="xsl:yes-or-no"/>
               <xs:attribute name="media-type" type="xs:string"/>
               <xs:attribute name="normalization-form" type="xs:NMTOKEN"/>
               <xs:attribute name="omit-xml-declaration" type="xsl:yes-or-no"/>
7. root element of a stylesheet.</p>
           <h2>Usage</h2>
           <p>The <code>stylesheet</code> is always the root element, even if a stylesheet is included in, or imported into, another. It must have a <code>version</code> attribute, indicating the version of XSLT that the stylesheet requires.</p>
           <p>For this version of XSLT, the value should normally be <code>2.0</code>. For a stylesheet designed to execute under either XSLT 1.0 or XSLT 2.0, create a core module for each version number; then use <code>xsl:include</code> or <code>xsl:import</code> to incorporate common code, which should specify <code>version="2.0"</code> if it uses XSLT 2.0 features, or <code>version="1.0"</code> otherwise.</p>
           <p>The <code>xsl:transform</code> element is allowed as a synonym.</p>
           <p>The namespace declaration <code>xmlns:xsl="http://www.w3.org/1999/XSL/Transform"</code> by convention uses the prefix <code>xsl</code>.</p>
           <p>An element occurring as a child of the <code>stylesheet</code> element is called a declaration. These top-level elements are all optional, and may occur zero or more times.</p>
         </body>
       </html>

   Using the Command Line

   Another way to invoke a stylesheet processor is to use a command-line interface. The specifics of the interface will
8. Still in the main template, you need to loop through the entries again to create a series of complete <item> elements in the output.

       <xsl:for-each select="entry">
         <xsl:apply-templates select="."/>
       </xsl:for-each>

   In a template matching <entry> elements, you can handle the translation from Atom to Dublin Core. Most of the translations are straightforward. Dublin Core doesn't have an equivalent of the <atom:updated> element, so you use that value in <dc:date>. The language can be obtained from the <content> element's xml:lang attribute. Another point to note is that there can be multiple categories in entries, just as there can be multiple <dc:subject> elements. Therefore, you need to select the label attribute on the <category> element inside another <xsl:for-each> loop that creates the subject elements. In neither of these two schemas does the order of elements matter, or the number of occurrences, so you can simply let the source sequence drive the process.

       <xsl:template match="entry">
         <item>
           <xsl:attribute name="rdf:about">
             <xsl:value-of select="id"/>
           </xsl:attribute>
           <link><xsl:value-of select="link/@href"/></link>
           <dc:identifier><xsl:value-of select="id"/></dc:identifier>
           <dc:language><xsl:value-of select="content/@xml:lang"/></dc:language>
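The loop that only calls <xsl:apply-templates select="."/> can also be written as a single instruction that selects the entries directly. The sketch below is an alternative under stated assumptions, not the book's listing: it assumes the main template matches the document root (as the feed/title selections elsewhere in the chapter suggest), which is why the path is written as feed/entry, and the entry template is cut down to one attribute to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xpath-default-namespace="http://www.w3.org/2005/Atom">

  <xsl:output method="xml" encoding="UTF-8"/>

  <xsl:template match="/">
    <rdf:RDF>
      <!-- One instruction selects every entry, in document order, and
           hands each one to the best-matching template rule. -->
      <xsl:apply-templates select="feed/entry"/>
    </rdf:RDF>
  </xsl:template>

  <!-- Abbreviated entry rule: only rdf:about is produced here. -->
  <xsl:template match="entry">
    <item>
      <xsl:attribute name="rdf:about">
        <xsl:value-of select="id"/>
      </xsl:attribute>
    </item>
  </xsl:template>

</xsl:stylesheet>
```

One difference to keep in mind: inside the entry template, position() reports the entry's place among all selected entries in this form, whereas with the loop each entry is applied on its own and position() is always 1.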
9. 1><xsl:value-of select="."/></h1>
       </xsl:template>

       <xsl:template match="purpose">
         <h2>Purpose</h2>
         <xsl:apply-templates select="p"/>
       </xsl:template>

       <xsl:template match="usage">
         <h2>Usage</h2>
         <xsl:apply-templates select="p"/>
       </xsl:template>

       <xsl:template match="p">
         <p><xsl:apply-templates/></p>
       </xsl:template>

   In the next template, you match the XML source <attr> (attribute) and <element> names using the XPath union operator (|), and output a containing <code> literal result element. The union operator performs a logical OR, matching either of the source element names.

       <xsl:template match="attr | element">
         <code><xsl:value-of select="."/></code>
       </xsl:template>

   The output for an element name will look like this:

       <p>An element occurring as a child of the <code>xsl:stylesheet</code> element is called a declaration. These top-level elements are all optional, and may occur zero or more times.</p>

   Copying Content

   When content in both the source and the output should be identical, you can simply copy the source nodes to the result. With the <xsl:copy> instruction, you copy the source <code> element name and its content to the output.
10. First Steps with XSLT

    In this chapter, you will get started with XSLT by developing two stylesheets. In the first stylesheet, you'll see how to generate an XHTML web page from an XML document. The second stylesheet illustrates how to transform one XML data format to another, in this case from the Atom 1.0 syndication format to RSS 1.0. You'll learn about:

    - Key XSLT elements and structure
    - Built-in template rules
    - XPath expressions for matching and selection
    - Different ways to invoke a stylesheet processor

    Transforming an XML Document to a Web Page

    Probably the most common application of XSLT is to generate one or more pages of a website from an XML source of some kind. For example, you might want to split a large file into chapters, each with a separate page, or display a news feed. There are a couple of ways to accomplish this. You might want to rely on a browser's client-side processor to transform the content; alternatively, you could generate static content for a server to render.

    Let's start with an example that relies on a browser's built-in processor. It is drawn from the case study that you will work on later in this book. The case study in Chapter 11 illustrates the production of a website from a set of XML source documents that describe each of the XSLT elements and functions. The same information was used to produce the XSLT Quick Reference in Appendix C.
11.
           <summary>Can we define a family of markup languages that used the Unicode properties, and which could accept a fair imitation of XML and produce a SAX-like event stream?</summary>
           <author>
             <name>Rick Jelliffe</name>
           </author>
           <category term="xml" label="xml" scheme="http://www.sixapart.com/ns/types#tag"/>
           <content type="html" xml:lang="en" xml:base="http://broadcast.oreilly.com">Can we define a family of markup languages that used the Unicode properties, and which could accept a fair imitation of XML and produce a SAX-like event stream? &lt;img src="http://feeds.oreilly.com/~r/oreilly/xml/~4/487372046" height="1" width="1"&gt;</content>
         </entry>
       </feed>

    This feed will be well out of date when you read this. To get a current version, go to http://feeds.oreilly.com/oreilly/xml, copy the source, and save it as a replacement. Alternatively, you can load the data directly from the feed site, using the URL containing the feed source.

    Developing the Stylesheet

    As with the XML-to-HTML transform, we'll take the development one step at a time. The approach is essentially the same, with XML as the target rather than HTML. The vocabularies are, of course, different, but the matching process will work similarly. It is not too important at present to absorb the details of the Atom and RSS 1.0 formats, but if you would like to do so, here are the relevant URLs: Atom 1.0: www.at
12. be copied to the output. This gives you considerable freedom to construct output from any source in your target XML vocabulary. For this XHTML page, you will start with something like the following skeleton:

       <xsl:template match="/">
         <html>
           <head>
             <title><xsl:value-of select="reference/body/title"/></title>
           </head>
           <body>
             <p>The body goes here</p>
           </body>
         </html>
       </xsl:template>

    The preceding code will render the following output:

       <html>
         <head>
           <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
           <title>xsl:stylesheet</title>
         </head>
         <body>
           <p>The body goes here</p>
         </body>
       </html>

    Selecting Source Content

    To output source content, you select from the element to be transformed using the <xsl:value-of> instruction. The select attribute defines the XPath expression to use. The next code snippet shows how:

       <h1 class="section"><xsl:value-of select="title"/></h1>

    As you learned in the introduction to this book, the content of the <xsl:value-of> element can be a sequence constructor, which is a series of XSLT instructions. This is the schema declaration:

       <xs:element name="value-of" substitutionGroup="xsl:instruction">
         <xs:complexType>
           <xs:c
13. c attribute values. These will result in the processor generating correct declarations in the output before the <html> element.

       <?xml version="1.0" encoding="UTF-8"?>
       <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
         <xsl:output method="xml" encoding="UTF-8"
           doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
           doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"/>
       </xsl:stylesheet>

    An XSLT 2.0 stylesheet may contain multiple <xsl:output> declarations and may include or import stylesheet modules that also contain <xsl:output> declarations. This enables you to use one or more stylesheets to output results using different methods. So you might, for instance, output both a CSV file and a web page in one pass. If you use multiple declarations in this way, the name attribute must be specified on each <xsl:output> element to identify it. These names should match a set of format attribute values on <xsl:result-document> instructions (which I discuss in Chapter 7) in the stylesheet. The following snippet briefly illustrates how this might work:

       <xsl:output name="web" method="xhtml" encoding="UTF-8"/>
       <xsl:output name="csv" method="text"/>
       <xsl:template match="/">
         <xsl:result-document format="web">
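For reference, here is one way a complete two-output stylesheet could look. It is a minimal sketch under stated assumptions, not the book's listing: the source vocabulary (row and cell elements) and the output file names report.html and report.csv are hypothetical, and an XSLT 2.0 processor such as Saxon is assumed, because <xsl:result-document> and named output definitions are not available in XSLT 1.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Two named output definitions, one per serialization method. -->
  <xsl:output name="web" method="xhtml" encoding="UTF-8"/>
  <xsl:output name="csv" method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- The format attribute picks the named xsl:output to use.
         report.html and report.csv are hypothetical file names. -->
    <xsl:result-document format="web" href="report.html">
      <html xmlns="http://www.w3.org/1999/xhtml">
        <head><title>Report</title></head>
        <body>
          <p>Rows: <xsl:value-of select="count(//row)"/></p>
        </body>
      </html>
    </xsl:result-document>

    <xsl:result-document format="csv" href="report.csv">
      <!-- "row" and "cell" are hypothetical source element names. -->
      <xsl:for-each select="//row">
        <!-- separator joins the cell values of one row with commas -->
        <xsl:value-of select="cell" separator=","/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```

Relative href values are resolved against the base output URI, so where the two files land depends on how the processor is invoked.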
14. ckets. You'll learn more about predicates in the next chapter. Change the code to look like this:

       <channel>
         <xsl:variable name="feedurl" select="feed/link[@rel='self']/@href"/>
         <xsl:attribute name="rdf:about">
           <xsl:value-of select="$feedurl"/>
         </xsl:attribute>
         <title><xsl:value-of select="feed/title"/></title>
         <link><xsl:value-of select="$feedurl"/></link>
       </channel>

    Item Listing

    To create an item listing to act as a table of contents, enter the literal result element <items> and an RDF sequence element, rdf:Seq. The <xsl:for-each> instruction will take the processor to all of the matching nodes one by one, changing the context node as it goes. You will learn more about <xsl:for-each> in Chapter 4. By selecting with the XPath expression feed/entry, you operate on all the entry elements in the feed. For each entry, you add an RDF list item, rdf:li, and set its rdf:resource attribute value from the <link> element in each individual entry.

       <items>
         <rdf:Seq>
           <xsl:for-each select="feed/entry">
             <rdf:li>
               <xsl:attribute name="rdf:resource">
                 <xsl:value-of select="link/@href"/>
               </xsl:attribute>
             </rdf:li>
           </xsl:for-each>
         </rdf:Seq>
       </items>

    Entry Elements
15. dy>
             <xsl:apply-templates select="reference/body"/>
           </body>
         </html>
       </xsl:template>

       <xsl:template match="title">
         <h1><xsl:value-of select="."/></h1>
       </xsl:template>

       <xsl:template match="purpose">
         <h2>Purpose</h2>
         <xsl:apply-templates select="p"/>
       </xsl:template>

       <xsl:template match="usage">
         <h2>Usage</h2>
         <xsl:apply-templates select="p"/>
       </xsl:template>

       <xsl:template match="p">
         <p><xsl:apply-templates/></p>
       </xsl:template>

       <xsl:template match="attr | element">
         <code><xsl:value-of select="."/></code>
       </xsl:template>

       <xsl:template match="code">
         <xsl:copy>
           <xsl:apply-templates/>
         </xsl:copy>
       </xsl:template>

       </xsl:stylesheet>

    Transforming Locally

    This time, you'll run the stylesheet processor locally rather than on the browser. The setup suggestions that follow make further use of the Oxygen IDE. You'll also look briefly at invoking the Java command-line interface for the Saxon processor.

    Try It Out: Configuring a Transformation

    You can use the Oxygen IDE to set up a transformation, providing a range of configuration values. Essentially, you provide input and output value
16. e> otherwise.</p>
             <p>The <element>xsl:transform</element> element is allowed as a synonym.</p>
             <p>The namespace declaration <code>xmlns:xsl="http://www.w3.org/1999/XSL/Transform"</code> by convention uses the prefix <code>xsl</code>.</p>
             <p>An element occurring as a child of the <element>stylesheet</element> element is called a declaration. These top-level elements are all optional, and may occur zero or more times.</p>
           </usage>
         </body>
       </reference>

    In the quick reference documents, I use an XML grammar based on the Darwin Information Typing Architecture (DITA) reference content model. DITA is finding increasing support among the larger publishers of technical documentation. It differs considerably from the longer-established DocBook format, using a more modular approach covering the concept/task/reference pattern often found in software help systems. You can look ahead to see details of the schema in Chapter 11.

    In addition to markup like <body>, <p>, and <code>, which you'll recognize from XHTML, note that the root element in this example is <reference>. To keep the example simple, only the sections on <purpose> and <usage> are included, as are the inline <attr> (attribute) and <element> names. In later chapters, I'll introduce more elements from the reference vocabulary.

    Using a Browser

    To
17. element>

    To add the rdf:about attribute to the <channel> element, enter the <xsl:attribute> element right after the <channel> element, and use the <xsl:value-of> instruction to obtain the link URL, http://news.oreilly.com, from the href attribute on the source feed's <link> element.

       <channel>
         <xsl:attribute name="rdf:about">
           <xsl:value-of select="feed/link/@href"/>
         </xsl:attribute>
       </channel>

    Completing the Feed Elements

    Adding a title is straightforward, again using the <xsl:value-of> element (you could also have used <xsl:copy-of>, because the elements are identical in each vocabulary).

       <title><xsl:value-of select="feed/title"/></title>

    There's nothing you can use to fill the <description> element, except perhaps the feed's subtitle, but that might not be a good idea because it is optional in the Atom schema.

    Next you come to the <link> element in the RSS feed, but wait: you've just used the required value in the rdf:about attribute. Let's backtrack and create a reusable variable, feedurl. You can also refine the selection to choose the link that has the rel attribute set to self, because there are two link elements in the source. To do this, you use a predicate inside square bra
18. has strong origins in the library community, and Atom was primarily developed for the requirements of web logs.

       Atom            DC            Description
       id              identifier    Identifier of the resource
       title           title         Name by which the resource is known
       published       date          Date of publication
       updated         -             Date updated
       author          -             Container for name, e-mail, and URI elements
       name            creator       The person or organization responsible for creating the resource
       email           -             Author's e-mail address
       uri             -             URI associated with the author
       contributor     contributor   Contributor to a work; same structure as author
       summary         description   Description of the resource
       -               language      Principal language of the resource
       content/@type   format        File format
       -               type          Defines either the genre or intellectual type of the resource
       -               publisher     Supplier of the resource
       source          source        Identifier for source material for the resource, assuming it is derived from another format
       content         -             Container of, or link to, the content
       -               coverage      Locations or periods that are subjects of the resource
       category        subject       Subjects of the resource
       rights          rights        Rights information
       -               relation      Reference to a related resource

    Listing 1-5 shows part of an Atom feed from the xml.com website. We'll use this document as the
19. hoice.

    Atom and RSS Elements

    The next two tables compare the feed elements and entry elements in the Atom 1.0 schema with the equivalents in RDF Site Summary (RSS) 1.0. The pros and cons of the different ways to describe metadata can be a contentious issue, but just now we need not be concerned with the details.

    The following table lists the top-level elements that define the properties of the Atom feed in the <feed> and RSS 1.0 <channel> elements. The matches are quite weak at this level, perhaps reflecting the history of how these structures were developed. The Atom specification provides a richer set of values on the whole.

       Atom         RSS 1.0       Description
       feed         channel       Root element of the feed document
       title        title         Feed title
       id           -             Feed identifier
       updated      -             Date the feed was most recently updated
       subtitle     -             Feed subtitle
       generator    -             Application that generated the feed
       link         link          URL for the HTML version of the feed
       -            description   Feed description
       logo, icon   image         URI for a feed image
       -            items         RDF sequence acting as a table of contents
       entry        item          Feed entry container

    The next table shows elements in the Atom <entry> and RSS 1.0 <item> elements. The item contents for RSS 1.0 are in the Dublin Core (DC) namespace. The matches between the schemas are much closer here, and the differences reflect the fact that the DC format
20. ibutes used in <link rel="stylesheet"> in HTML. The content type expressed need not be XSLT, and this processing instruction is often used to specify multiple CSS files to handle different types of media, using the value text/css. The content type for XSLT 1.0 was never specified in the W3C recommendation. Microsoft invented the text/xsl value for Internet Explorer, which seems to have stuck in practice, though browsers may also accept other values such as text/xml. The XSLT 2.0 recommendation formally registers the media type application/xslt+xml.

    Built-in Rules

    We can now process the sample by writing a bare-bones transform. It is not very exciting, but it illustrates the default behavior of a processor using a built-in template rule specified in the XSLT specification. XSLT defines built-in rules for processing templates, and the rule for document and element nodes ensures that the root element and all of its children will be handled recursively, even if there are no element-specific templates.

    This book generally specifies XSLT version 2.0 for stylesheets. However, in the following Try It Out, you'll create an XSLT 1.0 transform using a single root <xsl:stylesheet> element to demonstrate this built-in behavior.

    Try It Out: A Root Element Stylesheet

    To create the transform, follow these steps:

    1. In the Oxygen IDE (mentioned in this book's Introduction), create a new document by choosing New > Stylesheet XSL
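It can help to see the built-in rules written out as if they were ordinary templates. The stylesheet below is a sketch that paraphrases the built-in template rules described in the XSLT 1.0 recommendation; a processor supplies these implicitly, so an empty <xsl:stylesheet> behaves as though it contained them. The comments are mine.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Document node and element nodes: output nothing for the node
       itself, but keep processing the children recursively. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy the string value to the output. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Processing instructions and comments: do nothing. -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

Taken together, these rules explain why a stylesheet with no templates of its own emits the text content of the source document with all of the markup stripped away.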
21. isting 1-4 shows the XHTML source code.

    [Figure 1-2: the transformed xsl:stylesheet reference page as rendered in the browser, with the headings Purpose and Usage]

    Listing 1-4

       <?xml version="1.0" encoding="UTF-8"?>
       <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
       <html>
         <head>
           <title>xsl:stylesheet</title>
         </head>
         <body>
           <h1>xsl:stylesheet</h1>
           <h2>Purpose</h2>
           <p>The
22.
               <xs:attribute name="standalone" type="xsl:yes-or-no-or-omit"/>
               <xs:attribute name="undeclare-prefixes" type="xsl:yes-or-no"/>
               <xs:attribute name="use-character-maps" type="xsl:QNames"/>
               <xs:attribute name="version" type="xs:NMTOKEN"/>
             </xs:extension>
           </xs:complexContent>
         </xs:complexType>
       </xs:element>

    In XSLT 1.0, the method attribute can take the values xml, html, or text. For instance, you would use the text method to output a CSV file, which you'll learn how to do in Chapter 9. You would use xml as a value for SVG output, and also for PDF, because it requires transforming to the XSL-FO format as an intermediate step. The XSLT 2.0 specification adds xhtml to the possible attribute values. However, in the next XSLT 1.0 example, you'll use xml as a value, as the output is XHTML.

    The version attribute on the <xsl:output> element is rather confusing. It has absolutely nothing to do with the version of XSLT; rather, it refers to the version of the output method, which for the xml method means the version of XML to be output.

    You can define an encoding attribute, which specifies the preferred character encoding of the output document. All XSLT processors and XML parsers are required to support UTF-8 and UTF-16. Choose an encoding that can represent your content: a single-byte encoding such as ISO-8859-1 cannot represent Chinese or Japanese characters directly, whereas UTF-8 and UTF-16 can. On this occasion, you'll set it to UTF-8.

    You can also add two more attributes specifying the XHTML doctype-system and doctype-publi
23. m/~r/oreilly/xml/~3/487860677/xforms-a-pause-for-reflection.html</link>
             <dc:language>en</dc:language>
             <dc:title>XForms: a pause for reflection</dc:title>
             <dc:date>2008-12-17T18:05:12Z</dc:date>
             <dc:creator>Philip Fennell</dc:creator>
             <dc:description>The other day I had what could only be described as a "Roy Scheider" moment; you know, the bit in the film Jaws where the camera tracks in whilst zooming out at the same time. Well, whilst debugging an XForms-enabled application, the Mozilla XForms plug-in had exposed the host document (XForms and all) as the content of the empty xf:instance. How odd, I mean, what good is that? That's when it struck me (in a Roy Scheider sort of way): this was Reflection, the ability of a program to look at itself and change its behaviour.</dc:description>
             <dc:format>html</dc:format>
             <dc:subject>xforms</dc:subject>
             <dc:subject>xml</dc:subject>
             <dc:subject>xrx</dc:subject>
           </item>
         </channel>
       </rdf:RDF>

    Summary

    In this chapter, you created two stylesheets, the first of which handled a typical document transformation from XML to XHTML. The second transform was a little more complex, involving two different schemas. You used an XSLT version 2.0 stylesheet with XML output, and learned about using the <xsl:for-each> instruction to handle repeating, uniform data structures. Y
24. of select="$feedurl"/>
             </xsl:attribute>
             <title><xsl:value-of select="feed/title"/></title>
             <link><xsl:value-of select="$feedurl"/></link>
             <items>
               <rdf:Seq>
                 <xsl:for-each select="feed/entry">
                   <rdf:li>
                     <xsl:attribute name="rdf:resource">
                       <xsl:value-of select="id"/>
                     </xsl:attribute>
                   </rdf:li>
                 </xsl:for-each>
               </rdf:Seq>
             </items>
             <xsl:for-each select="entry">
               <xsl:apply-templates select="."/>
             </xsl:for-each>
           </channel>
         </rdf:RDF>
       </xsl:template>

    Listing 1-6 (continued)

       <xsl:template match="entry">
         <item>
           <xsl:attribute name="rdf:about"><xsl:value-of select="id"/></xsl:attribute>
           <link><xsl:value-of select="link/@href"/></link>
           <dc:language><xsl:value-of select="content/@xml:lang"/></dc:language>
           <dc:title><xsl:value-of select="title"/></dc:title>
           <dc:date><xsl:value-of select="updated"/></dc:date>
           <dc:creator><xsl:value-of select="author/name"/></dc:creator>
           <dc:description><xsl:value-of select="summary"/>

    Listing 1-7

       <?xml version="1.0" encoding="UTF-8"?>
       <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
                xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
         <channel rdf:about="http://feeds.oreilly.com/oreilly/xml">
           <title>O'Reilly News XML</title>
25. omenabled.org/developers/syndication/atom-format-spec.php
RSS 1.0: http://web.resource.org/rss/1.0/spec

I'll call the top-level elements feed elements and the individual entries entry elements, using the Atom terminology.

Preliminaries

Let's start with the basics of the stylesheet, rss_feed.xsl. This time you'll set 2.0 as the value of the stylesheet's version attribute. Inside the <xsl:stylesheet> element are two namespaces to declare, using the rdf and dc prefixes. Always check the source file for a default namespace declaration. In this case it is http://www.w3.org/2005/Atom. You need to set the xpath-default-namespace attribute on the <xsl:stylesheet> element to this value; otherwise nothing from the source file will be output.

Next, declare the output method as xml and the encoding as UTF-8. In the main template, create the literal result elements <rdf:RDF> and <channel>, in that order, as the container for your output. The namespaces must be declared again on the <rdf:RDF> element.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xpath-default-namespace="http://www.w3.org/2005/Atom">
  l
26. omplexContent mixed="true">
      <xs:extension base="xsl:sequence-constructor">
        <xs:attribute name="select" type="xsl:expression"/>
        <xs:attribute name="separator" type="xsl:avt"/>
        <xs:attribute name="disable-output-escaping" type="xsl:yes-or-no" default="no"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

Processing Specific Source Elements

The processor's built-in template rules have a lower priority than other templates, so by adding rules for individual elements you can override the defaults. Now you can add specific templates for the structural elements in the XML source: <title>, <purpose>, <usage>, and <p>. The match attribute identifies the element, and the output is specified with literal result elements. The select attribute value for the <title> element is ".", an XPath expression that refers to the current node being processed. Because both the <purpose> and the <usage> elements can contain paragraphs, we apply processing to all the <p> content and its inline markup.

These templates are located at the top level, like the main template you have just written, but their order is not significant. The XSLT processor treats the source elements in document order and will look in the templates for matches as it goes. I generally put them in rough document order in simple stylesheets.

<xsl:template match="title">
  <h
27. only if xsl:apply-templates is used within xsl:copy.

In contrast, if you use <xsl:copy-of>, each new node will contain copies of all the children, attributes, and namespaces of the original node, recursively. This is often called a deep copy. This instruction has a select attribute, providing you with more flexibility in selection.

<xs:element name="copy-of" substitutionGroup="xsl:instruction">
  <xs:complexType>
    <xs:complexContent mixed="true">
      <xs:extension base="xsl:versioned-element-type">
        <xs:attribute name="select" type="xsl:expression" use="required"/>
        <xs:attribute name="copy-namespaces" type="xsl:yes-or-no" default="yes"/>
        <xs:attribute name="type" type="xsl:QName"/>
        <xs:attribute name="validation" type="xsl:validation-type"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

Listing 1-3 shows the completed stylesheet. Save this version as local.xsl.

Listing 1-3

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html" encoding="UTF-8"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"/>
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="reference/body/title"/></title>
      </head>
      <bo
28. ou used three methods to invoke your first stylesheet: the <?xml-stylesheet?> processing instruction, the Oxygen IDE, and the Saxon CLI. Along the way you learned about the main structural XSLT elements, defining output methods, matching nodes in source documents, and selecting content to transform. You also encountered some common XPath syntax, more of which is introduced in Chapter 2.

Key Points

A stylesheet processor uses built-in rules for processing by default. You override these rules by using specific template rules to match elements in an XML source document with XPath expressions.

You can specify different output methods in a stylesheet (XML, XHTML, HTML, and text) and define the preferred character encoding too.

Literal result elements and attributes are often used to define output structures. Output element content is usually specified by selecting values in the source with XPath expressions.

Always check for default namespace declarations in source files, and set the xpath-default-namespace attribute on the <xsl:stylesheet> element to this value.
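The last key point can be illustrated with a minimal sketch, assuming an Atom source document whose default namespace is http://www.w3.org/2005/Atom. The output element names titles and entry-title are invented for this illustration and do not come from the chapter's listings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/2005/Atom">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Main template: processing starts at the root of the source tree -->
  <xsl:template match="/">
    <!-- "titles" and "entry-title" are hypothetical output names -->
    <titles>
      <!-- Unprefixed names in these paths are resolved in the Atom namespace -->
      <xsl:for-each select="feed/entry">
        <entry-title>
          <xsl:value-of select="title"/>
        </entry-title>
      </xsl:for-each>
    </titles>
  </xsl:template>

</xsl:stylesheet>
```

If the xpath-default-namespace attribute is removed, the path feed/entry looks for elements in no namespace, matches nothing in an Atom document, and the titles element comes out empty.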
29. reference> element. Path expressions in this context will be relative to this location.

The contained <xsl:apply-templates> instruction selects the <body> element in the source document (see Listing 1-1) for processing, showing the relative XPath expression reference/body in the select attribute. This means that the <body> element and everything inside it have been selected for processing. This instruction simply defines a set of nodes to be processed using the template rules for each source node to be matched.

You are not restricted to following the nested nodes as shown in this example. You might want to select all the paragraphs in the source document for processing, in which case you would use <xsl:apply-templates select="//p"/>. There's more on XPath expressions in Chapter 2.

Literal Result Elements

You have two options when it comes to generating the element names that will be output. Usually the most straightforward is to create what is called a literal result element, by typing the element name with start and end tags straight into the stylesheet and then populating the new elements with selected content from the source XML. Literal result elements are treated as data to be copied from the stylesheet directly to the result tree. These elements can have any name, and the content may be XSLT instructions, nested literal result elements, or text. If you set attributes on the literal result elements they will also
30. run any transform inside a browser, you need to add a processing instruction to the source document, which will give the browser the location of the stylesheet you want to use. This goes immediately after the XML declaration in xsl_stylesheet.xml. Save the update while you start work on the stylesheet.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="browser.xsl" type="text/xsl"?>
<reference>
</reference>

You may have used processing instructions in other XML applications; what appear to be attributes in the instruction are in fact known as pseudo-attributes. The href pseudo-attribute locates the stylesheet, browser.xsl. The file extension .xsl is a convention, which some applications may rely on for identification. The type pseudo-attribute defines the content type, text/xsl. In this case you use a relative URI for a stylesheet in the same directory as the source document.

In this book I use the more general term URI, which is an identifier that may not imply a specific location, whereas the term URL implies a location from which you can obtain a representation of a resource, such as an HTML page.

The W3C recommendation for this processing instruction is separate from the XSLT specifications, and is at www.w3.org/TR/xml-stylesheet. The semantics of pseudo-attributes is the same as that of the attr
31. s, which are associated with the source file and saved automatically for reuse. To avoid typing long paths, you can use editor variables recognized by the application, such as ${cfdu} for current file directory URL.

1. Open xsl_stylesheet.xml in the XML editor.
2. Choose XML > Configure Stylesheet Transformation.
3. Click New in the dialog that opens, and name the scenario xml2xhtml in the Edit Scenario dialog that appears, as shown in Figure 1-1.
4. By default the variable ${currentFileURL} is used for the XML URL setting. On the XSLT tab, insert ${cfdu}/local.xsl in the XSL URL control.
5. Choose Saxon6.5.5 in the Transformer drop-down. Figure 1-1 shows the settings.
6. On the Output tab, accept the default setting, Save as ${cfn}.html, which will save the XHTML file in the same directory as the source, with the current filename.
7. Check Show in Browser and click OK.
8. In the main dialog, click Transform Now.

Figure 1-1: the Edit scenario dialog, with Scenario Name xml2xhtml, XML URL ${currentFileURL}, and Transformer Saxon6.5.5.

The transformed document should open in your browser. Figure 1-2 shows the browser output and L
32. source for the transformation. Some content, an additional namespace declaration, and stylesheet processing instructions have been removed for clarity. The listing shows the <feed> element and its content to the end of the first <entry> element. The code download is in the file atom.xml.

Listing 1-5

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>O'Reilly News XML</title>
  <link rel="alternate" type="text/html" href="http://news.oreilly.com/"/>
  <id>tag:news.oreilly.com,2008-08-01://44</id>
  <updated>2008-12-17T07:32:30Z</updated>
  <subtitle>O'Reilly News Spreading the knowledge of innovators</subtitle>
  <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.21-en</generator>
  <link rel="self" href="http://feeds.oreilly.com/oreilly/xml" type="application/atom+xml"/>
  <entry>
    <title>Defining markup languages using Unicode properties</title>
    <link rel="alternate" type="text/html" href="http://feeds.oreilly.com/~r/oreilly/xml/~3/487372046/defining-markup-languages-usin.html"/>
    <id>tag:broadcast.oreilly.com,2008://53.34679</id>
    <published>2008-12-17T03:01:23Z</published>
    <updated>2008-12-17T07:32:30Z</updated>
33. </xsl:result-document>
  <xsl:result-document format="csv">
  </xsl:result-document>
</xsl:template>

The form of name attributes on XSLT elements is defined as a lexical QName, or namespace-qualified name. It applies, for example, to templates, attribute sets, variables, parameters, and so on. Typically this is a simple name, but it may also be qualified with a namespace prefix, such as <xsl:function name="xm:getentry-by-id">. If two qualified names are compared, the namespace URI that is declared with the prefix and the local name are used.

Main Template

The <xsl:template> element, which is covered in more detail in Chapter 3, is a basic building block of a stylesheet. This element is used to declare templates that match elements in the XML source and to generate nodes in a result tree. Usually a stylesheet has a main template with the match attribute set to "/":

<xsl:template match="/">
  <xsl:apply-templates select="reference/body"/>
</xsl:template>

This value is an XPath expression that means "match the root of the source tree." Note that this is not the same thing as the root element of the source document. The root of the source tree is outside of everything, including the containing top-level element.

This means that processing will begin right at the start of the XML source tree, outside the <
34. <dc:title><xsl:value-of select="title"/></dc:title>
    <dc:date><xsl:value-of select="published"/></dc:date>
    <dc:creator><xsl:value-of select="author/name"/></dc:creator>
    <dc:description><xsl:value-of select="summary"/></dc:description>
    <dc:format><xsl:value-of select="content/@type"/></dc:format>
    <xsl:for-each select="category">
      <dc:subject><xsl:value-of select="@label"/></dc:subject>
    </xsl:for-each>
  </item>
</xsl:template>

The full stylesheet is shown in Listing 1-6.

Listing 1-6

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xpath-default-namespace="http://www.w3.org/2005/Atom">
  <xsl:output method="xml"/>
  <xsl:variable name="site">testurl</xsl:variable>
  <xsl:template match="/">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <channel>
        <xsl:variable name="feedurl" select="feed/link[@rel='self']/@href"/>
        <xsl:attribute name="rdf:about">
          <xsl:value
35. <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:template match="/">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <channel>
      </channel>
    </rdf:RDF>
  </xsl:template>
</xsl:stylesheet>

Specifying Attributes

Both the <channel> and <item> elements require the rdf:about attribute. An attribute can be set directly on a literal result element if you know its value ahead of time, but in this case you need to use another approach, with the <xsl:attribute> instruction. This element should always come first in any set of sequence constructor instructions. You can use either the element content or the select attribute, but note that these approaches are mutually exclusive. The XSLT 2.0 schema definition looks like this:

<xs:element name="attribute" substitutionGroup="xsl:instruction">
  <xs:complexType>
    <xs:complexContent mixed="true">
      <xs:extension base="xsl:sequence-constructor">
        <xs:attribute name="name" type="xsl:avt" use="required"/>
        <xs:attribute name="namespace" type="xsl:avt"/>
        <xs:attribute name="select" type="xsl:expression"/>
        <xs:attribute name="separator" type="xsl:avt"/>
        <xs:attribute name="type" type="xsl:QName"/>
        <xs:attribute name="validation" type="xsl:validation-type"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs
36. tThe root element of a stylesheet. The xsl:stylesheet element is always the root element, even if a stylesheet is included in, or imported into, another. It must have a version attribute, indicating the version of XSLT that the stylesheet requires. For this version of XSLT, the value should normally be 2.0. For a stylesheet designed to execute under either XSLT 1.0 or XSLT 2.0, create a core module for each version number; then use xsl:include or xsl:import to incorporate common code, which should specify version="2.0" if it uses XSLT 2.0 features, or version="1.0" otherwise. The xsl:transform element is allowed as a synonym. The namespace declaration xmlns:xsl="http://www.w3.org/1999/XSL/Transform" by convention uses the prefix xsl. An element occurring as a child of the xsl:stylesheet element is called a declaration. These top-level elements are all optional, and may occur zero or more times.

What has happened here? Without any further instructions, the processor has output all the text nodes from the source document. Safari 3 4 on Windows reports an empty stylesheet error and renders nothing, which suggests that something may not be quite right with the implementation of built-in rules. Google Chrome produces the same result, presumably because it is based on the same core engine.

Defining an Output Method

You can provide hints to the stylesheet processing by adding some output specifications to your stylesheet. Unless otherwise specified as HTML
37. vary according to which processor you use. The next example runs the CLI of the open source version of Saxon on local.xsl. If you intend to use the CLI frequently, you may prefer to run it from an open source tool like jEdit (www.jedit.org) rather than from the file system console.

If you are using the Oxygen IDE and just want to experiment with the CLI, you will find the jar file in the lib directory. If you are not using a bundled version of Saxon, you can download the Java version of the Saxon processor from SourceForge: http://sourceforge.net/project/showfiles.php?groupid=29872. Unzip the download to a convenient directory. Add the saxon9.jar file to the classpath so that the command in the following Try It Out will locate the main program, net.sf.saxon. The schema-aware version is com.saxonica.

Full documentation is available on the Saxonica site, which you should consult for installation and configuration instructions: www.saxonica.com/documentation/contents.html.

Try It Out: The Saxon CLI

Enter the following code on the command line and execute it:

java net.sf.saxon.Transform -s:xsl_stylesheet.xml -xsl:local.xsl -o:xsl_stylesheet.html

The options in the example have the following meanings:

-s:filename      The source XML file
-xsl:filename    The XSL stylesheet to use
-o:filename      The output filename
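If you do not have the chapter's local.xsl to hand, a small stand-in stylesheet is enough to try the command above. The following is only a sketch, not the book's Listing 1-3: it assumes the reference/body/title structure described for xsl_stylesheet.xml, and the choice of h1 and p as output elements is an assumption made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Main template: builds the page skeleton from literal result elements -->
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="reference/body/title"/></title>
      </head>
      <body>
        <xsl:apply-templates select="reference/body"/>
      </body>
    </html>
  </xsl:template>

  <!-- Overrides the built-in rule for title (h1 is an assumed choice) -->
  <xsl:template match="title">
    <h1><xsl:value-of select="."/></h1>
  </xsl:template>

  <!-- Paragraphs: keep processing any inline markup inside them -->
  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Elements without a template of their own, such as purpose and usage, fall through to the built-in rules, so their p children are still reached and processed.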
