After a bit of reading and probing, I've found several distinct differences between HTML and XHTML, and in the way each kind of document is served. Instead of diving straight into the HTML 4.01 Strict DTD (as mentioned in my previous post), I'll begin by looking at the differences between these technologies, and address things like:
- syntactical differences
- architectural differences
- reasons for choosing one over another
There's XHTML 1.0, XHTML 1.1, and XHTML 2.0, which is currently in the works. It turns out that XHTML 1.0 is simply HTML 4 reformulated to adhere to the stricter constraints of a well-formed XML document. This means slightly more work for the coder, but it allows simpler, more automated processing of the document by the user agent (web browser or other HTTP client) whenever the document is actually parsed as XML. Some important things to know about well-formed XML documents:
- each open tag must be explicitly closed in proper nesting order
- all attribute values are enclosed in single or double quotes
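- all element and attribute names are CASE-SENSITIVE (and XHTML defines its own names in lowercase)
- every attribute MUST have a value (a bare attribute name with no value is not allowed)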
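To convince myself of these rules, here is a minimal sketch that feeds a few fragments to an XML parser and reports whether each one is well-formed. It uses Python's standard xml.etree.ElementTree module; the is_well_formed helper and the sample fragments are my own illustrations, not anything taken from the W3C specifications. Note that this only checks well-formedness, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise.

    Hypothetical helper for illustration: it checks well-formedness only,
    not validity against any XHTML DTD.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Illustrative fragments, one per rule in the list above.
samples = {
    "properly nested and closed": '<p>Hello <em>world</em></p>',
    "unclosed element": '<p>Hello <br></p>',
    "improper nesting": '<p><em>Hello</p></em>',
    "unquoted attribute value": '<p class=intro>Hello</p>',
    "attribute without a value": '<input disabled />',
    "case mismatch between tags": '<P>Hello</p>',
}

for label, markup in samples.items():
    print(f"{label}: {is_well_formed(markup)}")
```

Only the first fragment should parse; every other one breaks exactly one of the rules listed above.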
XHTML 1.1 (Modular)
This actually sounds like a nice idea -- modularize XHTML so that small building blocks can be combined into a customized, extensible DTD. The full working draft of XHTML Modularization is available at the W3C site. If you browse through section 5, you'll find details on the types of modules that may be combined to form a more complex document type. Inline frames, objects, targets, and more are now separate modules. I think this will become slightly clearer when I actually start coding with it. In the meantime, have a look at this W3C tutorial on modular XHTML. I'd like to use XHTML 1.1 as the primary technology for my web development activities, but I'm not so sure about its compatibility with today's browsers. Still, it's worth learning about because I expect its adoption to grow over time.
The Fork in the Road
So...we now understand that there are several flavors of HTML that we can use to start making something meaningful: HTML 4.01, XHTML 1.0, and XHTML 1.1. If we use HTML 4.01, we can probably get away without using stylesheets, although I see no reason we'd want to do that, because the idea of separating the content from the presentation seems very appealing. I like this idea, and I think I'm drawing a connection between XHTML and Model-View-Controller architecture. I realize MVC can be implemented in traditional HTML applications, but XHTML seems like it lends itself better to this more scalable, flexible architecture.
The leap from HTML with minimal CSS to independent XHTML and CSS compartments seems like the leap from populating a list box with every data item, to creating a view for the list box that only displays relevant data. If you've ever populated a list box with 10,000 items, you probably realize how slow and unmanageable it is (Oh, what, you'd like to add 1 more item? Let me re-organize all 10,000 items). However, if you create, say, an AbstractTableModel (Java) for accessing and updating your data, the user controls become much more responsive, reliable, and efficient.
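To make the list box analogy a little more concrete, here is a rough sketch of the model/view split in Python. It is loosely inspired by the idea behind Java's AbstractTableModel, but it is not a port of that class: RowModel, ListView, and all of their methods are hypothetical names I made up for illustration. The point is only that the view asks the model for the handful of rows it needs to show, so adding one more item does not mean touching the other 10,000.

```python
class RowModel:
    """Hypothetical model: owns the full data set and answers queries on demand."""

    def __init__(self, rows):
        self._rows = list(rows)

    def row_count(self):
        return len(self._rows)

    def value_at(self, index):
        return self._rows[index]

    def add_row(self, value):
        # Appending does not require rebuilding anything the view holds.
        self._rows.append(value)


class ListView:
    """Hypothetical view: renders only the rows inside its visible window."""

    def __init__(self, model, visible_rows=5):
        self.model = model
        self.visible_rows = visible_rows
        self.first_visible = 0

    def render(self):
        last = min(self.first_visible + self.visible_rows, self.model.row_count())
        return [self.model.value_at(i) for i in range(self.first_visible, last)]


model = RowModel(f"item {i}" for i in range(10_000))
view = ListView(model, visible_rows=3)
print(view.render())

# Add one more item and scroll to the end; only three rows are fetched.
model.add_row("item 10000")
view.first_visible = model.row_count() - view.visible_rows
print(view.render())
```

The view never holds a copy of the data, which is roughly the separation I'm hoping XHTML plus CSS gives me between content and presentation.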
Gmail uses HTML 4.01 Transitional. Wikipedia uses XHTML 1.0 Transitional. What should I use? On one hand, we know that Gmail and Wikipedia are broadly compatible sites, as they are viewed by millions of users with a plethora of user agents (Firefox, Netscape, IE, Opera, etc.). I'd rather go with XHTML than HTML because I don't mind making my document a well-formed XML document, but what about XHTML 1.1?
After some Googling around, I found that there is tremendous debate about which technology to use. In Lachy's Log, the author doubts that XHTML documents written to be served as text/html will survive a transition to being served as XML documents. There's also some insightful discussion at the end of that post, along with a list of nuances to consider when coding in XHTML.
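Part of that debate comes down to which media type the server sends. As a minimal sketch of the idea, the function below picks between application/xhtml+xml and text/html based on the HTTP Accept header sent by the user agent. The function name choose_content_type and the sample header strings are my own assumptions for illustration, and the logic is deliberately simplified: it only checks whether application/xhtml+xml is listed with a non-zero q-value, and does not rank it against the other types the client lists.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a media type for an XHTML document from an HTTP Accept header.

    Hypothetical helper: returns application/xhtml+xml only when the client
    lists it explicitly with q > 0, and falls back to text/html otherwise.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type == "application/xhtml+xml" and q > 0:
            return "application/xhtml+xml"
    return "text/html"


# Illustrative Accept headers, not captured from real browsers.
xml_aware = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
html_only = "image/gif, image/jpeg, */*"

print(choose_content_type(xml_aware))
print(choose_content_type(html_only))
```

A user agent that never advertises application/xhtml+xml keeps getting text/html, which is exactly the situation where the document has to survive being parsed as ordinary HTML.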
I like a challenge, and I'd like to adopt new technology, but I'm not so sure about the adoption of 1.1 (see here). And so, I will start coding using the XHTML 1.0 Strict DTD, and I will adhere to the compatibility guidelines provided by the W3C.
Lessons Learned:
There are driving forces in the web technology sector that are pushing for new standards (e.g. the W3C), and there are forces that are pushing back until existing standards are implemented properly (e.g. Internet Explorer's non-conformance issues). The best thing a new web developer can do to stay on the cutting edge while developing applications for today's user agent implementations is to:
- Write documents that have high compatibility from the get-go
- Recognize points of failure when writing documents that implement work-arounds to maintain compatibility
- Learn and practice using new standards with new, compliant user agents
Next Steps:
The next steps to take (not necessarily in this order) will involve:
- Reviewing the rules of properly structured XML
- Reviewing the compatibility guidelines from W3C for serving XHTML as text/html (HTML compatible, see the first paragraph of HTML Media Types)
- Learning XHTML tags and their attributes
- Learning CSS