Upgrade your JSPs to XML (but not to .jspx)

20 January 2006

Bart Schuller

Recently we were redesigning the views for a web application. The previous version used JSP files and Struts Tiles. To be as standards-compliant as possible, we decided that our pages should generate XHTML, so it made sense to write them as JSP documents, the fancy name for .jspx files.

This, then, should be a proper JSPX file:


<?xml version="1.0" encoding="UTF-8"?>
<jsp:root xmlns:jsp="http://java.sun.com/JSP/Page" version="2.0"
        xmlns="http://www.w3.org/1999/xhtml">
<jsp:directive.page contentType="application/xhtml+xml; charset=UTF-8"/>
<html>
 <head>
  <title>Entities in jspx files</title>
 </head>
 <body>
  <h1>Entities in jspx files: This should be a proper ampersand: &amp;</h1>
 </body>
</html>
</jsp:root>

But when you try it out with Firefox, it rightly complains about a lone ampersand appearing. Viewing the source, you notice that it now says: ampersand: &</h1>.

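What seems to happen is that the container parses the page as XML, which turns &amp; into a bare &, and then writes the result out as plain text without escaping it again. So every ampersand used for an entity or character reference needs to be escaped itself, as in &amp;amp;. It's hard to describe just how wrong this is, but let me list a few things you can no longer do: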

  • Paste pre-written XHTML into your JSPX
  • Use XSLT to transform your own XML into JSPX
  • Change a JSP to a JSPX by only changing the first few lines of the file and adding a <jsp:root>
  • Tell technical writers: this is just an XML format like any other (DocBook for example). Start your XML editor and start typing
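Here is a minimal sketch of the mechanism, assuming the container reads the JSP document with an ordinary XML parser and then writes the character data out as plain text. It uses the standard JAXP DOM API, not any JSP container code, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class JspxEntityDemo {

    // Hypothetical helper: parse an XML fragment, then return the character
    // data of its root element exactly as the parser reports it. Nothing is
    // escaped again, which mimics writing template text to a plain text stream.
    static String parseAndEmitAsText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // The parser resolves &amp; to a bare ampersand, so the output is not XML-safe.
        System.out.println(parseAndEmitAsText("<h1>ampersand: &amp;</h1>"));
        // Escaping the ampersand of the entity itself survives the round trip.
        System.out.println(parseAndEmitAsText("<h1>ampersand: &amp;amp;</h1>"));
    }
}
```

The first call prints a lone ampersand, the second prints a well-formed &amp;. The second is the double escaping a JSPX author is forced into.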

Didn't the spec writers notice how wrong this is? They did. From the JSP 2.1 Proposed Final Draft, unchanged from the 2.0 spec:

JSP.6.5.1 Generating XML Content Natively

All JSP 2.0 content is textual, even when using JSP documents to generate XML content. This is quite acceptable, and even ideal, for some applications, but in some other applications XML documents are the main data type being manipulated. For example, the data source may be an XML document repository, perhaps queried using XQuery, some of the manipulation on this data internal to the JSP page will use XML concepts (XPath, XSLT operations), and the generated XML document may be part of some XML pipeline.

In one such application, it is appealing not to transform back and forth between a stream of characters (text) and a parsed representation of the XML document. The JSP expert group has explored different approaches on how such XML-awareness could be added, and a future version of JSP could support this functionality.
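For contrast, this rough sketch shows what XML-aware output looks like outside JSP: the document is parsed and then written back by a serializer that knows it is producing XML, so the ampersand is escaped again on the way out. It uses only the standard JAXP parser and identity transformer. The class and method names are hypothetical, and nothing here is a JSP API or a proposal from the expert group.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class XmlAwareOutputDemo {

    // Hypothetical helper: parse the input into a DOM tree and serialize the
    // tree back to a string with an identity transform. The serializer
    // escapes special characters in text nodes as it writes them.
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed for: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // The single-escaped ampersand comes out single-escaped again.
        System.out.println(roundTrip("<h1>ampersand: &amp;</h1>"));
    }
}
```

With output like this, pre-written XHTML could be pasted in unchanged, because whatever the parser unescapes the serializer escapes again.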

The rub is in this sentence: This is quite acceptable, and even ideal, for some applications, but in some other applications XML documents are the main data type being manipulated.

No, it is not acceptable!

The main use of JSP(X) technology, 99% of it, is producing some form of HTML or XML. The escaping rules for HTML are no different from those for XML, so, crazy as it may sound, our 99% falls into the "some other applications" category.

There are no other words for it; it has to be said:

JSP documents are broken as designed.

So, for us, in this project, it's back to plain JSP.

P.S.

If you wonder how we can get away with serving application/xhtml+xml content without complaints from Internet Explorer users, that's food for a separate article.

P.P.S.

After writing this article I started wondering how Sun's JavaServer Faces copes with this problem. It turns out that yes, it is a problem for JSF files as well. See this blog entry for the gory details.