Friday, February 29, 2008

(X)HTML Part 2: XML/HTML/XHTML/WTFML

Let me preface this post with a disclaimer: I am not a subject matter expert on (X)HTML. I am merely someone who knows HTML basics and is looking to see what the latest standards are. Please feel free to correct me if I make false or ambiguous statements. I'd like to find out what standards I should aim to comply with when I do start coding something real. So, without further ado, here is the path of exploration I've taken today...

After a bit of reading and probing, I've discovered that there are several distinct differences between HTML, XHTML, and the means by which the documents are served. Instead of immediately diving into the HTML 4.01 Strict DTD (as mentioned in my previous post), I'll begin by looking at the differences between these technologies, and address things like:
  • syntactical differences
  • architectural differences
  • reasons for choosing one over another
What is XHTML?
There's XHTML 1.0, XHTML 1.1, and XHTML 2.0, which is currently in the works. It turns out XHTML 1.0 is simply HTML rewritten to adhere to the stricter constraints of a well-formed XML document. This means slightly more work on behalf of the coder, but faster, more automated processing of the document by the user agent (web browser or HTTP client). Some important things to know about well-formed XML documents:
  • each open tag must be explicitly closed in proper nesting order
  • all attribute values are enclosed in single or double quotes
  • all element and attribute names are CASE-SENSITIVE
  • every attribute MUST have a value
This doesn't seem too hard to accomplish, as it only requires small modifications to HTML code. XHTML DTDs exist for strict, loose (transitional), and frameset documents, just as in HTML 4.01. In order to maintain backwards compatibility with HTML 4.01, XHTML documents should be served as the MIME type text/html and adhere to the compatibility guidelines specified by W3C. So far, I see no reason not to use XHTML 1.0.
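
To get a feel for the well-formedness rules listed above, here is a minimal sketch that runs a few fragments through Python's standard XML parser. The helper name is_well_formed and the sample fragments are my own, made up for illustration. Note that this only checks XML well-formedness; it does not validate anything against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Sample fragments, one per rule from the list above (all invented for this sketch).
samples = [
    '<p class="a">Hello<br /></p>',  # closed tags, quoted attribute value
    '<p>Hello<br></p>',              # <br> is never closed
    '<p class=a>Hello</p>',          # attribute value is not quoted
    '<P>Hello</p>',                  # names are case-sensitive, so P and p do not match
    '<input disabled />',            # attribute has no value
]

for fragment in samples:
    print(fragment, "->", is_well_formed(fragment))
```

Only the first fragment should come back as well-formed; the other four each break one of the rules.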

XHTML 1.1 (Modular)
This actually sounds like a nice idea -- modularize XHTML such that small building blocks can be combined into a customized, extensible DTD. The full working draft of XHTML Modularization is available at the W3C site. If you browse through section 5, you'll find details on the type of modules that may be included to form a more complex document. Inline-frames, Objects, Targets, and more are now modules. I think this will become slightly clearer when I actually start coding with it. In the meantime, have a look at this W3C tutorial on modular XHTML. I'd like to use XHTML 1.1 as the primary technology for my web development activities, but I'm not so sure about the compatibility of it at this point. Still, it's worth learning about because it will likely see an increasing rate of adoption at some point.

The Fork in the Road

So...we now understand that there are several flavors of HTML that we can use to start making something meaningful: HTML 4.01, XHTML 1.0, and XHTML 1.1. If we use HTML 4.01, we can probably get away without using stylesheets, although I see no reason we'd want to do that, because the idea of separating the content from the presentation seems very appealing. I like this idea, and I think I'm drawing a connection between XHTML and Model-View-Controller architecture. I realize MVC can be implemented in traditional HTML applications, but XHTML seems like it lends itself better to this more scalable, flexible architecture.

The leap from HTML with minimal CSS to independent XHTML and CSS compartments seems like the leap from populating a list box with every data item, to creating a view for the list box that only displays relevant data. If you've ever populated a list box with 10,000 items, you probably realize how slow and unmanageable it is (Oh, what, you'd like to add 1 more item? Let me re-organize all 10,000 items). However, if you create, say, an AbstractTableModel (Java) for accessing and updating your data, the user controls become much more responsive, reliable, and efficient.

GMail uses HTML 4.01 Transitional. Wikipedia uses XHTML 1.0 Transitional. What should I use? On one hand, we know that GMail and Wikipedia are very compatible sites, as they are viewed by millions of users from a plethora of user agents (Firefox, Netscape, IE, Opera, etc). I'd rather go with XHTML than HTML because I don't mind making my document a well-formed XML document, but what about XHTML 1.1?

After some Googling around, I find out there is tremendous debate about which technology to use. In Lachy's Log, the author is doubtful that XHTML documents served as text/html will be able to survive a transition to being served as XML documents. There's also some insightful discussion at the end of that post, and there is a list of nuances to consider when coding in XHTML.
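
One common way to handle the text/html versus XML question is to look at what the user agent says it accepts. Below is a rough sketch of that idea, not anything taken from Lachy's Log or the W3C guidelines. The function name pick_media_type and the sample Accept headers are assumptions of mine, and the sketch deliberately ignores q-values, which a real server would need to honor.

```python
XHTML_TYPE = "application/xhtml+xml"  # the registered media type for XHTML served as XML
HTML_TYPE = "text/html"


def pick_media_type(accept_header: str) -> str:
    """Pick a media type from an HTTP Accept header (simplified: q-values are ignored)."""
    accepted = {
        part.split(";")[0].strip().lower()  # drop parameters such as ";q=0.9"
        for part in accept_header.split(",")
    }
    return XHTML_TYPE if XHTML_TYPE in accepted else HTML_TYPE


# Hypothetical Accept headers, for illustration only.
print(pick_media_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(pick_media_type("text/html, */*"))
```

Falling back to text/html whenever the XML type is not listed keeps the document readable in user agents that do not announce XML support.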

I like a challenge, and I'd like to adopt new technology, but I'm not so sure about adoption of 1.1 (see here). And so, I will start coding using the XHTML 1.0 Strict DTD, and I will adhere to the compatibility guidelines provided by W3C.


Lessons Learned:
There are driving forces in the web technology sector that are pushing for new standards (e.g. the W3C), and there are forces that are pushing back until existing standards are implemented properly (e.g. Internet Explorer non-conformance issues). The best thing a new web developer can do to stay on the cutting edge while developing applications for today's user agent implementations is to:
  • Write documents that have high compatibility from the get-go
  • Recognize points of failure when writing documents that implement work-arounds to maintain compatibility
  • Learn and practice using new standards with new, compliant user agents
User agents are very different. The Acid2 conformance test is clear evidence of that. It's also clear from the start that if we wish to maintain compatibility across many different browsers, we will need to piggy-back off of the hard-working individuals that were forced to eat hot pockets and drink coffee until 4:30am to find solutions and work-arounds.

Next Steps:
The next steps to take (not necessarily in this order) will involve:
  1. Reviewing the rules of properly structured XML
  2. Reviewing the compatibility guidelines from W3C for serving XHTML as text/html (HTML compatible, see the first paragraph of HTML Media Types)
  3. Learning XHTML tags and their attributes
  4. Learning CSS

Wednesday, February 27, 2008

HTML Part 1: Standards

There are a ton of tutorials, quick-starts, and reference sites out there for HTML. In this post, I attempt to guide myself through the plethora of digital documents in search of a better understanding of what HTML is and how to use it. What are all the tags? What do they do? How can we apply them in a useful manner?

Standards
The standard for HTML documents is maintained by the "World Wide Web Consortium," or "W3C." The current HTML standard is up to version 4.01, and version 5 is in the works. We will focus on version 4.01. Here is a link to the HTML 4.01 Specification (kudos if you read through it all -- it's large). I am taking this approach in learning HTML because examples provided by some sites will display differently in Firefox and IE, etc. Learning HTML from the W3C is like hearing the news straight from the horse's mouth. We'll learn the nuances of browsers at some other point, but for now we will learn to be standards compliant.

Everything you need to know about syntax...(maybe)
After clicking around a bit on the W3C site, I found the mother lode of information on HTML: the DTD, or "Document-Type Definition." Basically, the DTD describes valid keywords and syntax for HTML. If you can read the DTD, you can refer to it when coding your site. The DTD is a plain-text file that acts as a schema (strictly speaking, it is written in SGML DTD syntax rather than as an XML Schema). In other words, it is a document that describes the format and structure of a given type of document, namely an HTML document. I won't lie. You'll need to brush up on your XML/SGML to be able to read the document, but I'll give you the basics here as I read. Without further ado, let us parse through this esoteric text...

We see at first glance that comments in this document start with "<!--" and end with "-->". Luckily for us, the DTD is well-documented and contains hyper links to relevant sections of the HTML 4.01 standard. As a note, I had to use the escape sequences for less than and greater than signs while typing this post. Click View->Source to check it out.
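
For the curious, the escaping mentioned above can be sketched in a couple of lines with Python's standard html module. The sample comment string is made up for illustration.

```python
import html

# A made-up DTD-style comment to escape.
comment = "<!-- a comment in the DTD -->"

# quote=False leaves quotation marks alone; only &, < and > are replaced.
escaped = html.escape(comment, quote=False)
print(escaped)  # &lt;!-- a comment in the DTD --&gt;

# unescape() reverses the substitution.
print(html.unescape(escaped) == comment)
```

The escaped form is what has to be typed into the post so that the browser shows the angle brackets instead of treating them as markup.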

The first line that I'm curious about is the one that contains the following ubiquitous text:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

Why do we need this at the beginning of each page? According to this site, we need it so our HTML and CSS will validate properly. In other words, it essentially allows us to tell the web browser that we are standard-compliant with said DTD. It turns out this is extremely important if you'd like your web page to render correctly and predictably in IE, Firefox, etc. According to A List Apart,
Using an incomplete or outdated DOCTYPE—or no DOCTYPE at all—throws these same browsers into “Quirks” mode, where the browser assumes you’ve written old-fashioned, invalid markup and code per the depressing industry norms of the late 1990s.

In this setting, the browser will attempt to parse your page in backward–compatible fashion, rendering your CSS as it might have looked in IE4, and reverting to a proprietary, browser–specific DOM. (IE reverts to the IE DOM; Mozilla and Netscape 6 revert to who knows what.)

Here is a useful link to W3C that contains a table of all of the current DTDs.

There are two main DTDs: "loose" and "strict." The "loose," or transitional DTD provides better backwards compatibility, as it includes some components that were dropped in strict HTML 4.01. Some of these components are popular, familiar tags such as "APPLET," "FONT," and "CENTER." I can code web pages, but this is all new to me. Such is the glory of participating in the tech industry, where innovation drives change, creating a culture of continuous education.
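
As a small illustration, here is a sketch that scans a chunk of markup for the three tags named above, using Python's standard html.parser module. The names LOOSE_ONLY_TAGS, LooseTagFinder, and find_loose_only_tags are hypothetical, and the set only contains those three examples, not the full list of elements that the strict DTD drops.

```python
from html.parser import HTMLParser

# Only the three tags mentioned above; NOT the complete list from the spec.
LOOSE_ONLY_TAGS = {"applet", "font", "center"}


class LooseTagFinder(HTMLParser):
    """Collects start tags that appear in LOOSE_ONLY_TAGS (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <FONT> arrives as "font".
        if tag in LOOSE_ONLY_TAGS:
            self.found.append(tag)


def find_loose_only_tags(markup: str) -> list:
    finder = LooseTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


sample = '<CENTER><FONT color="red">Hi</FONT></CENTER><p>ok</p>'
print(find_loose_only_tags(sample))  # ['center', 'font']
```

A non-empty result hints that the page would not validate against the strict DTD, although the validator remains the authority on that.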

Lesson Learned:
Use the DOCTYPE declaration correctly in every HTML document, and use the W3C compliance testing tool (the validator) at http://validator.w3.org/ to make sure your document contains no errors. If you plan on using deprecated tags (see this list and look for 'L' in the last column), use the loose DTD. If you plan on using frames, use the frameset DTD. Otherwise, use the strict DTD to ensure a more uniform structure for your site. See this article at HTMLHelp for a more thorough explanation of when to use which DTD.
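
The decision rule in this lesson is simple enough to write down as a sketch. The function name choose_doctype and its two flags are my own invention; the three DOCTYPE strings are the HTML 4.01 strict, transitional (loose), and frameset declarations.

```python
STRICT = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
          '"http://www.w3.org/TR/html4/strict.dtd">')
LOOSE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
         '"http://www.w3.org/TR/html4/loose.dtd">')
FRAMESET = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" '
            '"http://www.w3.org/TR/html4/frameset.dtd">')


def choose_doctype(uses_frames: bool = False, uses_deprecated_tags: bool = False) -> str:
    """Pick an HTML 4.01 DOCTYPE following the rule of thumb above (hypothetical helper)."""
    if uses_frames:
        return FRAMESET
    if uses_deprecated_tags:
        return LOOSE
    return STRICT


print(choose_doctype())                           # strict by default
print(choose_doctype(uses_deprecated_tags=True))  # loose / transitional
print(choose_doctype(uses_frames=True))           # frameset
```

Frames are checked first because a frameset page needs the frameset DTD whether or not it also uses deprecated tags.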

For the Next Post: I'll start diving into the strict DTD to see what we're allowed to put inside our standard-compliant HTML document. I'm also curious about XHTML, the successor of HTML, so we'll touch on that and see what branches of investigation warrant attention.

Helpful Definitions:
  • XML Schema - "a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntax constraints imposed by XML itself. An XML schema provides a view of the document type at a relatively high level of abstraction" (source: Wikipedia)

The Lofty Goals of this Blog

First off, let me express that the intended audience for this Blog is people who are familiar with and interested in programming languages and software development technologies. Now, on to the meat of the post...

College is great. Every day, students have the ability to meet new and intelligent people, learn new and cutting-edge technology, and exchange ideas with well-published professors. All this interaction, studying, and learning stimulates the mind and keeps students full of motivation and creativity.

So what happens to you when you graduate and launch your career into the real world? Will you use the skills and techniques you learned in school? No. It's likely that you will not use a large part, if not a majority, of the vast breadth of your academic knowledge. Instead, you will most likely use your newfound analytical skills to learn several specialized tasks at your workplace. Eventually, you will find your mind becoming bored with the repeated activities you perform on a daily basis.

Thus, the purpose of this blog is:
  • to exercise my own analytical and learning ability
  • to expose myself to technologies I do not regularly use
  • to hone my expertise on nuances and 'gotchas' of said technologies
  • to share my findings, lessons learned, and inspirations with you
The initial technologies I plan on addressing in this blog are as follows:
  • HTML - Yes, I already know HTML. Am I an expert? No. It's time for a review.
  • CSS - Let's learn how to do this the right way.
  • Ajax - What exactly does it mean, and how can it make my blog better?
  • JavaScript - Let's dig into the features, syntax, and capabilities
After I have finished addressing these technologies, I will use the knowledge I acquired to enhance and add features to this blog. Nifty, eh? Next, I will switch to programming languages I currently do not know or do not actively use, namely:
  • Python - I've heard wonderful things about this language. I'd like to explore how powerful it really is, then write a few example applications in it.
  • Ruby - Does it annoy you when you see buzz-terms flying in your face and you have no idea what they are? Yeah, me too.
  • Perl - It's been around for a while, looks ugly as hell, but is extremely powerful, especially at searching text.
  • many more, e.g. awk, bash, MATLAB, LabVIEW, PHP, ASP, Flash, C#, ColdFusion, Fortran, Adobe AIR, and even MS-DOS batch files. If there is a particular technology you would like me to address first, please let me know!
In general, when exploring a given technology or programming language, I will try to provide information with regards to:
  • Development environments (Windows and Linux)
  • Basic syntax
  • Gotchas (e.g. garbage collection in Java, pointers and references in C++, function name mangling in DLLs)
  • How to achieve functionality (e.g. network communication, multi-threading with thread-safe shared data, user input, GUI design, file I/O, external function calls)
  • Unique and exciting features of each language (e.g. reflection in C#, via the System.Reflection namespace)
I personally cannot wait to start diving into programming languages, but will restrain myself until I learn how to properly represent content in a web browser without using a template or someone else's HTML/CSS.

In closing, I'd like to welcome you to the start of a new journey that will help you and me 'kick our brains' to foster creativity, excitement, and inspiration, and to avoid the plague of complacency and doldrums.