What’s XPath and What’s it Used for?
XML Path Language (XPath) is used to identify, address and navigate through parts of an XML document. An XPath expression can be used to search an XML document and extract information from anywhere in the document.
Several versions of XPath are currently available:
- XPath 1.0 was released in November 1999 and is the most widely implemented and used specification.
- XPath 2.0 was released in January 2007 and revised in a second edition in December 2010. This specification offers a much richer expression language and function library than XPath 1.0.
- XPath 3.0 was published in April 2014.
- XPath 3.1 was released in March 2017. It supports JSON as well as XML and adds maps and arrays. It is the latest version, specified in the W3C Recommendation of March 21, 2017.
How Does XPath Work?
XPath interprets the XML document as a set of nodes arranged in a tree structure. Every item in that structure, not only each element, is called a node. Nodes are organized both by their order of appearance in the document (document order) and by their relationships to one another, such as parent, child and sibling.
The XPath data model identifies seven types of nodes with different functions:
- Element node
- Document node (root node)
- Attribute node
- Text node
- Namespace node
- Processing instruction node
- Comment node
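To make the node types more concrete, here is a minimal sketch that walks a tiny document with Python's standard xml.dom.minidom module and prints the type of each node it finds. The sample XML, the NODE_TYPE_NAMES table and the describe() helper are made up for illustration. Note that minidom implements the DOM, which is close to the XPath data model but not identical to it: for example, it does not expose namespace nodes as a separate node type.

```python
from xml.dom import minidom, Node

# Hypothetical sample document, invented for this illustration.
XML = (
    '<?xml version="1.0"?>'
    '<?app hint="demo"?>'
    '<!-- sample comment -->'
    '<file original="demo.txt">Hello</file>'
)

# Map DOM node type constants to the names used in the list above.
NODE_TYPE_NAMES = {
    Node.DOCUMENT_NODE: "document (root)",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
}


def describe(node):
    """Return (type name, node name) pairs for a node and everything below it."""
    found = [(NODE_TYPE_NAMES.get(node.nodeType, "other"), node.nodeName)]
    # In the DOM, attributes are not child nodes, so list them separately.
    if node.nodeType == Node.ELEMENT_NODE:
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            found.append((NODE_TYPE_NAMES[attr.nodeType], attr.nodeName))
    for child in node.childNodes:
        found.extend(describe(child))
    return found


doc = minidom.parseString(XML)
for kind, name in describe(doc):
    print(f"{kind}: {name}")
```

The document node sits above everything else, which is why the processing instruction and the comment show up as siblings of the file element rather than as its children.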
XPath Syntax:
An XPath expression is a text string that represents a path in the document tree. The simplest of the expressions look like file paths as seen in Windows Explorer or the Linux shell.
Evaluating an XPath expression means looking for the nodes in the document that conform to the path defined in the expression. The result of the evaluation is the set of all nodes that match the expression. In order to evaluate an XPath expression, the document must be well-formed.
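The following sketch illustrates both points with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath: evaluation returns every matching node, and a document that is not well-formed cannot be evaluated at all. The sample strings and the evaluate() helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, invented for this illustration.
WELL_FORMED = "<body><target>Hola</target><target>Mundo</target></body>"
MALFORMED = "<body><target>Hola</body>"  # <target> is never closed


def evaluate(xml_text, path):
    """Parse xml_text and return the text of every node matching path.

    Returns None when the document is not well-formed and cannot be parsed.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    return [node.text for node in root.findall(path)]


print(evaluate(WELL_FORMED, "./target"))  # ['Hola', 'Mundo']
print(evaluate(MALFORMED, "./target"))    # None
```

An expression that matches nothing is not an error: it simply yields an empty result.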
Axis:
The axis allows us to select a subset of document nodes and corresponds to paths in the document tree. Element nodes are indicated by the element name.
- /: at the beginning of an expression, it indicates the root node; anywhere else, it indicates “child.” It is normally followed by the name of an element. Example: /xliff/file/body/group/trans-unit/target
- //: indicates “descendant,” that is, a match at any depth below the current point. Example: //target
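As a rough illustration of the difference between the two, the sketch below runs both kinds of path against a small XLIFF-like sample using Python's standard xml.etree.ElementTree module. The sample document is invented. One assumption to keep in mind: ElementTree evaluates paths relative to the root element it has already parsed, so the absolute path /xliff/file/body/group/trans-unit/target is written here as ./file/body/group/trans-unit/target, and //target as .//target.

```python
import xml.etree.ElementTree as ET

# Hypothetical XLIFF-like sample; real XLIFF files also declare a namespace.
XLIFF = """
<xliff>
  <file>
    <body>
      <group>
        <trans-unit id="1"><source>Hello</source><target>Hola</target></trans-unit>
      </group>
      <trans-unit id="2"><source>World</source><target>Mundo</target></trans-unit>
    </body>
  </file>
</xliff>
"""

root = ET.fromstring(XLIFF)  # root is the <xliff> element

# Child steps: every element in the path must be a direct child of the previous one.
via_children = [t.text for t in root.findall("./file/body/group/trans-unit/target")]

# Descendant step: matches <target> at any depth below the root element.
via_descendants = [t.text for t in root.findall(".//target")]

print(via_children)     # ['Hola']
print(via_descendants)  # ['Hola', 'Mundo']
```

The second trans-unit is not inside a group, so only the descendant form finds its target.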
Predicate:
The predicate is enclosed in square brackets, following the axis.
- [@attribute]: selects the elements that have the attribute.
- [number]: if there are several results, selects one of them by its position, counting from 1; last() selects the last of them.
- [condition]: selects the nodes that meet the condition.
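Here is a small sketch of these predicate forms, again using Python's standard xml.etree.ElementTree module, which implements only part of XPath 1.0 but does cover these cases. The sample document and the ids() helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<body>
  <trans-unit id="1" translate="no"><target>Uno</target></trans-unit>
  <trans-unit id="2"><target>Dos</target></trans-unit>
  <trans-unit id="3"><target>Tres</target></trans-unit>
</body>
"""

root = ET.fromstring(XML)


def ids(path):
    """Return the id attribute of every element matched by path."""
    return [unit.get("id") for unit in root.findall(path)]


print(ids("./trans-unit[@translate]"))  # has the attribute -> ['1']
print(ids("./trans-unit[1]"))           # first match       -> ['1']
print(ids("./trans-unit[last()]"))      # last match        -> ['3']
print(ids("./trans-unit[@id='2']"))     # condition         -> ['2']
```

Positions are counted from 1, not from 0, which is a common source of off-by-one mistakes for people used to programming languages.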
The following operators are allowed:
- Logical operators: and, or, not()
- Arithmetic operators: +, -, *, div, mod
- Comparison operators: =, !=, <, >, <=, >=
Comparisons can be made between node values and attribute values, or against literal strings or numbers.
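The sketch below shows two comparisons with the = operator, one against an attribute value and one against the text of a child node. It uses Python's standard xml.etree.ElementTree module, which is an important limitation: that module supports only a subset of XPath 1.0 and does not accept and, or, or the arithmetic operators. A full XPath 1.0 engine would take [@approved='yes' and source='World']; here the same filter is approximated by chaining two predicates. The sample document and the ids() helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = """
<body>
  <trans-unit id="1" approved="yes"><source>Hello</source><target>Hola</target></trans-unit>
  <trans-unit id="2" approved="no"><source>World</source><target>Mundo</target></trans-unit>
  <trans-unit id="3" approved="yes"><source>World</source><target>Mundo</target></trans-unit>
</body>
"""

root = ET.fromstring(XML)


def ids(path):
    """Return the id attribute of every element matched by path."""
    return [unit.get("id") for unit in root.findall(path)]


# Compare an attribute value with a literal string.
approved = ids("./trans-unit[@approved='yes']")

# Compare the text of a child node with a literal string.
world = ids("./trans-unit[source='World']")

# Chained predicates: each one filters the result of the previous one,
# which here gives the same result as combining both conditions with "and".
approved_world = ids("./trans-unit[@approved='yes'][source='World']")

print(approved)        # ['1', '3']
print(world)           # ['2', '3']
print(approved_world)  # ['3']
```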
Where is XPath Used in Localization?
XPath can be used in the localization process to create custom file types in Trados. New formats keep appearing, and some of them aren’t supported by Trados out of the box, so creating a new filter is sometimes the only way to process the files. With XPath, it’s possible to create new filters for XML files without the need to learn regular expressions. Trados supports the XPath 1.0 specification, so you must use expressions from this version of XPath.
You can play around with XPath and its expressions in an online XPath tester. The HTML strip XPath tester has examples and supports XPath 1.0 and 2.0, so it’s a good one to try.
If you prefer to explore something locally, you can use the Notepad++ plugin called “XPatherizer,” which allows you to analyze multiple XPath queries, verify and improve XML documents, and more.
Conclusion
XPath is a useful tool in the localization industry as well as for developers and authors working in XML.