Wednesday, February 27, 2008

HTML Part 1: Standards

There are a ton of tutorials, quick-starts, and reference sites out there for HTML. In this post, I try to guide myself through this plethora of documents in search of a better understanding of what HTML is and how to use it. What are all the tags? What do they do? How can we apply them usefully?

Standards
The standard for HTML documents is maintained by the World Wide Web Consortium, or "W3C." The current HTML standard is version 4.01, and version 5 is in the works. We will focus on version 4.01. Here is a link to the HTML 4.01 Specification (kudos if you read through it all; it's large). I am learning HTML from the specification because examples on some sites display differently in Firefox, IE, and other browsers. Learning HTML from the W3C is like getting the news straight from the horse's mouth. We'll learn the quirks of individual browsers at some other point, but for now we will learn to be standards-compliant.

Everything you need to know about syntax...(maybe)
After clicking around a bit on the W3C site, I found the mother lode of information on HTML: the DTD, or "Document Type Definition." Basically, the DTD describes the valid elements, attributes, and syntax of HTML. If you can read the DTD, you can refer to it when coding your site. The DTD is a plain-text file written in SGML, the markup metalanguage that HTML 4.01 is defined in. It is not an XML Schema, but it does the same kind of job: it describes the format and structure of a given type of document, namely an HTML document. I won't lie. You'll need to brush up on your XML/SGML to be able to read it, but I'll give you the basics here as I read. Without further ado, let us parse through this esoteric text...

We see at first glance that comments in this document start with "<!--" and end with "-->". Luckily for us, the DTD is well documented and contains hyperlinks to relevant sections of the HTML 4.01 specification. As a note, I had to use escape sequences (character entity references) for the less-than and greater-than signs while typing this post. Click View->Source to check it out.
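Here is a small sketch of both ideas in plain HTML: a comment, which the browser ignores, and the character entity references &lt;, &gt;, and &amp;, which make the browser display <, >, and & as text instead of reading them as markup. The sentences in the example are my own illustration.

```html
<!-- A comment: the browser does not display anything in here. -->
<P>To start a paragraph, type <CODE>&lt;P&gt;</CODE>.</P>
<P>Write a literal ampersand as <CODE>&amp;amp;</CODE>, as in Fish &amp; Chips.</P>
```

In the browser, the first paragraph reads "To start a paragraph, type <P>." and the second reads "Write a literal ampersand as &amp;, as in Fish & Chips."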

The first line that I'm curious about is the one that contains the following ubiquitous text:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

Why do we need this at the beginning of each page? According to this site, we need it so our HTML and CSS will validate properly. In other words, it tells the browser (and the validator) which DTD the page claims to follow. It turns out this is extremely important if you want your page to render correctly and predictably in IE, Firefox, and other browsers. According to A List Apart,
Using an incomplete or outdated DOCTYPE—or no DOCTYPE at all—throws these same browsers into “Quirks” mode, where the browser assumes you’ve written old-fashioned, invalid markup and code per the depressing industry norms of the late 1990s.

In this setting, the browser will attempt to parse your page in backward–compatible fashion, rendering your CSS as it might have looked in IE4, and reverting to a proprietary, browser–specific DOM. (IE reverts to the IE DOM; Mozilla and Netscape 6 revert to who knows what.)
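Putting this into practice, here is a minimal sketch of a page that declares the strict DTD. The DOCTYPE comes first, before the HTML element; the title and text are placeholders of my own. As far as I can tell this passes the W3C validator. The META line declares the character encoding, which the validator may otherwise complain about.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
  <HEAD>
    <!-- Declare the character encoding of this file. -->
    <META http-equiv="Content-Type" content="text/html; charset=utf-8">
    <!-- TITLE is required inside HEAD. -->
    <TITLE>My first standards-compliant page</TITLE>
  </HEAD>
  <BODY>
    <!-- Under the strict DTD, BODY holds block-level elements such as P. -->
    <P>Hello, world!</P>
  </BODY>
</HTML>
```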

Here is a useful link to W3C that contains a table of all of the current DTDs.

The two main DTDs are "loose" and "strict." (There is also a third, the frameset DTD, for pages built with frames.) The "loose," or transitional, DTD provides better backward compatibility, as it includes some elements that were deprecated and left out of strict HTML 4.01. Among them are popular, familiar tags such as "APPLET," "FONT," and "CENTER." I can code web pages, but this is all new to me. Such is the glory of participating in the tech industry, where innovation drives change, creating a culture of continuous education.
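To see the difference, here is a small sketch of a transitional page that uses two of those tags. With the loose DOCTYPE shown, the validator should accept it; under the strict DTD, the same FONT and CENTER markup would be flagged as errors. The page text is just an illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
  <HEAD>
    <META http-equiv="Content-Type" content="text/html; charset=utf-8">
    <TITLE>Transitional example</TITLE>
  </HEAD>
  <BODY>
    <!-- CENTER and FONT are allowed by the loose DTD, but not by the strict one. -->
    <CENTER>
      <FONT color="red" size="5">Welcome!</FONT>
    </CENTER>
  </BODY>
</HTML>
```

Under the strict DTD, the same look would come from CSS instead, for example <P style="text-align: center; color: red">Welcome!</P>.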

Lesson Learned:
Use the correct DOCTYPE declaration in every HTML document, and use the W3C compliance testing tool (the validator) at http://validator.w3.org/ to make sure your document contains no errors. If you plan on using these deprecated tags (see this list and look for 'L' in the last column), use the loose DTD. If you plan on using frames, use the frameset DTD. Otherwise, use the strict DTD to ensure a more uniform structure for your site. See this article at HTMLHelp for a more thorough explanation of when to use which DTD.
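For the frameset case mentioned above, here is a minimal sketch of a page split into two frames. The file names nav.html and content.html are hypothetical. Note that a frameset page has a FRAMESET element where a normal page has its BODY; the NOFRAMES section gives browsers without frame support something to show.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
  "http://www.w3.org/TR/html4/frameset.dtd">
<HTML>
  <HEAD>
    <META http-equiv="Content-Type" content="text/html; charset=utf-8">
    <TITLE>Frameset example</TITLE>
  </HEAD>
  <!-- Two columns: 20% of the window for navigation, 80% for content. -->
  <FRAMESET cols="20%, 80%">
    <FRAME src="nav.html" title="Navigation">
    <FRAME src="content.html" title="Content">
    <!-- Shown only by browsers that cannot display frames. -->
    <NOFRAMES>
      <BODY>
        <P>This page uses frames. <A href="content.html">View the content without frames.</A></P>
      </BODY>
    </NOFRAMES>
  </FRAMESET>
</HTML>
```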

For the Next Post: I'll start diving into the strict DTD to see what we're allowed to put inside our standards-compliant HTML document. I'm also curious about XHTML, the successor to HTML, so we'll touch on that and see what branches of investigation warrant attention.

Helpful Definitions:
  • XML Schema - "a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntax constraints imposed by XML itself. An XML schema provides a view of the document type at a relatively high level of abstraction" (source: Wikipedia)
