HTML Versus XHTML

How XHTML and HTML are similar, why you might use XHTML instead of HTML, and how HTML, XHTML, and HTML5 compare
GregDeamons,New Zealand,Professional
Published Date:03-08-2017
CHAPTER 16
XHTML

In this chapter:
• Why XHTML?
• Creating XHTML Documents
• HTML Versus XHTML
• XHTML 1.1
• Should You Use XHTML?

Despite its name, you don’t use the Extensible Markup Language (XML) to directly create and mark up web documents. Instead, you use XML to define a new markup language, which you then use to mark up web documents. This should come as no surprise to anyone who has read the preceding chapter in this book. Nor, then, should it surprise you that one of the first languages defined using XML is an XML-ized version of HTML, the most popular markup language ever. HTML is being disciplined and cleaned up by XML, to bring it back into line with the larger family of markup languages. This standard is XHTML 1.0.

Because of HTML’s legacy features and oddities, using XML to describe HTML was not an easy job for the World Wide Web Consortium (W3C). In fact, certain HTML rules, as we’ll discuss later, cannot be expressed with XML. Nonetheless, if the W3C has its way, XHTML will ultimately replace the HTML we currently know and love.

So much of XHTML is identical to HTML’s current standard, version 4.01, that you can apply almost everything presented elsewhere in this book to both HTML and XHTML. We detail the differences, both good and bad, in this chapter. To become fluent in XHTML, you’ll first need to absorb the rest of this book, and then adjust your thinking to embrace what we present in this chapter.

* Throughout this chapter, we use “XHTML” to mean the XHTML 1.0 standard. There is a nascent XHTML 1.1 standard that diverges from HTML 4.01 and is more restrictive than XHTML 1.0. We describe the salient features of XHTML 1.1 in section 16.4.

16.1 Why XHTML?

As we described in the preceding chapter, HTML began as a simple markup language similar in appearance and usage to other Standard Generalized Markup Language (SGML)-based markup languages. In its early years, little effort was put into making HTML perfectly SGML compliant.
As a result, odd features and a lax attitude toward enforcing the rules became standard parts of both HTML and the browsers that processed HTML documents.

As the Web grew from an experiment into an industry, the desire for a standard version of HTML led to the creation of several official versions, culminating most recently with version 4.01. As HTML has stabilized into this latest version, browsers have become more alike in their support of various HTML features. In general, the world of HTML has settled into a familiar set of constructs and usage rules.

Unfortunately, HTML offers only a limited set of document-creation primitives, is incapable of handling nontraditional content such as chemical formulae, musical notation, or mathematical expressions, and fails to adequately support alternative display media such as handheld computers or intelligent cellular phones. We need new ways to deliver information that can be parsed, processed, displayed, sliced, and diced by the many different communication technologies that have emerged since the Web sparked the digital communication revolution a decade ago.

Instead of trying to rein in another herd of maverick, nonstandard markup languages, the W3C introduced XML as a standard way to create new markup languages. XML is the framework upon which organizations can develop their own markup languages to suit the needs of their users. XML is an updated version of SGML, streamlined and enhanced for today’s dynamic systems. And while the W3C originally intended it as a tool to create document markup languages, XML is also becoming quite useful as a standard way to define small languages that different applications use as data-exchange protocols.

Of course, we don’t want to abandon the plethora of documents already marked up with HTML, or the infrastructure of knowledge, tools, and technologies that currently support HTML and the Web. Yet, we do not want to miss the opportunities of XML, either. XHTML is the bridge.
It uses the features of XML to define a markup language that is nearly identical to standard HTML 4.01 and gets us all started down the XML road.

16.1.1 XHTML Document Type Definitions

HTML 4.01 comes in three variants, each defined by a separate SGML Document Type Definition (DTD). XHTML also comes in three variants, with XML DTDs corresponding to the three SGML DTDs that define HTML 4.01. To create an XHTML document, you must choose one of these DTDs and then create a document that uses that DTD’s elements and rules.

The first XHTML DTD corresponds to the “strict” HTML DTD. The strict definition excludes all deprecated elements (tags and attributes) in HTML 4.01 and forces authors to use only those features that are fully supported in HTML. Many of the HTML elements and attributes dealing with presentation and appearance, such as the <font> tag and the align attribute, are missing from the strict XHTML DTD and have been replaced by the equivalent properties in the Cascading Style Sheets (CSS) model.

Most HTML authors find the strict XHTML DTD too restrictive because many of the deprecated elements and attributes are still in widespread use throughout the Web. More importantly, lots of content out there on the Web uses the legacy elements and attributes, and the popular browsers still support most of the deprecated elements. The only real advantage of using the strict XHTML DTD is that compliant documents are guaranteed to be fully supported in future versions of XHTML.

Most authors will probably choose to use the “transitional” XHTML DTD. It’s closest to the current HTML standard and includes all those wonderful, but deprecated, features that make life as an HTML author easier. With the transitional XHTML DTD, you can ease into the XML family while staying current with the browser industry.

The third DTD is for frames.
It is identical to the transitional DTD in all other respects; the only difference is the replacement of the document body with appropriate frame elements. You might think that, for completeness’s sake, there would be strict and transitional frame DTDs, but the W3C decided that if you use frames, you might as well use all the deprecated elements as well.

16.2 Creating XHTML Documents

For the most part, creating an XHTML document is no different from creating an HTML document. Using your favorite text editor, simply add the markup elements to your document’s contents in the right order, and display it using your favorite browser. To be strictly correct (“valid,” as they say at the W3C), your XHTML document needs a boilerplate declaration upfront that specifies the DTD you used to create the document and defines a namespace for the document.

* If the W3C has its way, HTML won’t change beyond version 4.01. No more HTML; all new developments will be in XHTML and many other XML-based languages.

16.2.1 Declaring Document Types

For an XHTML browser to correctly parse and display your XHTML document, you should tell it which version of XML is being used to create the document. You must also state which XHTML DTD defines the elements in your document.

The XML version declaration uses a special XML processing directive. In general, these XML directives begin with <? and end with ?>, but otherwise they look like typical tags in your document. To declare that you are using XML version 1.0, place this directive in the first line in your document:

    <?xml version="1.0" encoding="UTF-8"?>

This tells the browser that you are using XML 1.0 along with UTF-8, the 8-bit encoding of the Unicode character set and the one most commonly used today. The encoding attribute’s value should reflect the character encoding actually used to store your document. For other encoding names, refer to the IANA character set registry, which includes the names of the International Organization for Standardization (ISO) character sets such as ISO-8859-1.
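
The requirement that the encoding attribute match the stored bytes can be observed with any XML parser. The following is a minimal sketch, assuming Python’s standard xml.etree.ElementTree module stands in for an XML-aware browser; the one-paragraph document and the helper names make_document and read_paragraph are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph text containing one non-ASCII character ("café").
TEXT = "caf\u00e9"


def make_document(declared: str, actual: str) -> bytes:
    """Build document bytes stored as `actual` but declaring `declared`."""
    source = f'<?xml version="1.0" encoding="{declared}"?><p>{TEXT}</p>'
    return source.encode(actual)


def read_paragraph(document: bytes):
    """Return the paragraph text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


# Declaration and stored bytes agree: the text is decoded intact.
matching_utf8 = read_paragraph(make_document("UTF-8", "utf-8"))
matching_latin1 = read_paragraph(make_document("ISO-8859-1", "iso-8859-1"))

# Declaration says UTF-8 but the bytes are ISO-8859-1: the parser gives up.
mismatched = read_paragraph(make_document("UTF-8", "iso-8859-1"))
```

An HTML browser would typically guess at the encoding and display something; an XML parser treats the mismatch as a fatal error, which is one more reason to keep the declaration accurate.
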
Once you’ve gotten the important issue of the XML version squared away, you should then declare the markup language’s DTD:

    <!DOCTYPE html
        PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

With this statement, you declare that your document’s root element is html, as defined in the DTD whose public identifier is defined as "-//W3C//DTD XHTML 1.0 Strict//EN". The browser may know how to find the DTD matching this public identifier. If it does not, it can use the URL following the public identifier as an alternative location for the DTD.

As you may have noticed, the preceding <!DOCTYPE> directive told the browser to use the strict XHTML DTD. Here’s the one you’ll probably use for your transitional XHTML documents:

    <!DOCTYPE html
        PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

And, as you might expect, the <!DOCTYPE> directive for the frame-based XHTML DTD is:

    <!DOCTYPE html
        PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

* Why do XML directives start with <? rather than <! as the DOCTYPE directive does? Because <! was already taken.

16.2.2 Understanding Namespaces

As described in the last chapter, an XML DTD defines any number of element and attribute names as part of the markup language. These elements and attributes are stored in a namespace that is unique to the DTD. As you reference elements and attributes in your document, the browser looks them up in the namespace to find out how they should be used.

For instance, the <a> tag’s name (a) and attributes (e.g., href and style) are defined in the XHTML DTD, and their names are placed in the DTD’s namespace. Any processing agent (usually a browser, but your eyes and brain can serve the same function) can look up the name in the appropriate DTD to figure out what the markup means and what it should do.

With XML, your document actually can use more than one DTD and therefore require more than one namespace.
For example, you might create a transitional XHTML document but also include special markup for some math expressions according to an XML math language. What happens when both the XHTML DTD and the math DTD use the same name to define different elements, such as <a> for XHTML hypertext and <a> for an absolute value in math? How does the browser choose which namespace to use?

The answer is the xmlns* attribute. Use it to define one or more alternative namespaces within your document. You can place it within the start tag of any element within your document, and its URL-like† value defines the namespace that the browser should use for all content within that element.

With XHTML, according to XML conventions, you should at the very least include within your document’s <html> tag an xmlns attribute that identifies the primary namespace used throughout the document:

    <html xmlns="http://www.w3.org/1999/xhtml">

If and when you need to include math markup, use the xmlns attribute again to define the math namespace. So, for instance, you could use the xmlns attribute within some math-specific tag of your otherwise common XHTML document (assuming the MATH element exists, of course):

    <div xmlns="http://www.w3.org/1998/Math/MathML">x2/x</div>

* XML namespace, xmlns, get it? This is why XML doesn’t let you begin any element or attribute with the three-letter prefix of “xml”: it’s reserved for special XML attributes and elements.

† It looks like a URL, and you might think that it references a document that contains the namespace, but alas, it doesn’t. It is simply a unique name that identifies the namespace. Display agents use that placeholder to refer to their own resources for how to treat the named element or attribute.

In this case, the XML-compliant browser would use the http://www.w3.org/1998/Math/MathML namespace to divine that this is the MATH, not the XHTML, version of the <div> tag, and should therefore be displayed as a division equation.
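
To see how a namespace-aware processor tells the two <div> elements apart, here is a minimal sketch using Python’s standard xml.etree.ElementTree module in place of a browser. As in the text, the math <div> element is assumed purely for illustration, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATH_NS = "http://www.w3.org/1998/Math/MathML"

# The second <div> overrides the default namespace with its own xmlns
# attribute. A math <div> element is assumed here, as it is in the text.
document = f"""
<html xmlns="{XHTML_NS}">
  <body>
    <div>An ordinary XHTML division</div>
    <div xmlns="{MATH_NS}">x2/x</div>
  </body>
</html>
"""

root = ET.fromstring(document)

# ElementTree reports every tag as "{namespace}localname", so the two
# elements named div end up with different fully qualified names.
div_tags = [el.tag for el in root.iter() if el.tag.endswith("}div")]
```

After parsing, div_tags holds one name qualified by the XHTML namespace and one qualified by the math namespace. The parser never fetches anything from either URL-like value; it only compares them as strings.
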
It would quickly become tedious if you had to embed the xmlns attribute into each and every <div> tag anytime you wanted to show a division equation in your document. A better way, particularly if you plan to apply it to many different elements in your document, is to identify and label the namespace at the beginning of your document, and then refer to it by that label as a prefix to the affected element in your document. For example:

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:math="http://www.w3.org/1998/Math/MathML">

The math namespace can now be abbreviated to “math” later in your document. So the streamlined:

    <math:div>x2/x</math:div>

now has the same effect as the lengthy earlier example of the math <div> tag containing its own xmlns attribute.

The vast majority of XHTML authors will never need to define multiple namespaces and so will never have to use fully qualified names containing the namespace prefix. Even so, you should understand that multiple namespaces exist and that you will need to manage them if you choose to embed content based on one DTD within content defined by another DTD.

16.2.3 A Minimal XHTML Document

As a courtesy to all fledgling XHTML authors, we now present the minimal and correct XHTML document, including all the appropriate XML, XHTML, and namespace declarations. With this most difficult part out of the way, you need only supply content to create a complete XHTML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html
        PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <title>Every document must have a title</title>
      </head>
      <body>
        ...your content goes here...
      </body>
    </html>

Working through the minimal document one element at a time, we begin by declaring that we are basing the document on the XML 1.0 standard and using UTF-8-encoded Unicode characters to express its contents and markup.
We then announce, in the familiar HTML-like <!DOCTYPE> statement, that we are following the markup rules defined in the transitional XHTML 1.0 DTD, which allow us free rein to use nearly any HTML 4.01 element in our document.

Our document content actually begins with the <html> tag, which has its xmlns attribute declare that the XHTML namespace is the default namespace for the entire document. Also note the lang attribute, in both the XML and XHTML namespaces, which declares that the document language is English.

Finally, we include the familiar document <head> and <body> tags, along with the required <title> tag.

16.3 HTML Versus XHTML

The majority of HTML is completely compatible with XHTML, and this book is devoted to that majority. In this chapter, however, we talk about the minority: where the HTML 4.01 standard and the XHTML DTD differ. If you truly desire to create documents that are both HTML and XHTML compliant, you must heed the various warnings and caveats we outline in the following sections.

The biggest difference (that’s Difference with a capital D, and that spells difficult) is that writing XHTML documents requires much more discipline and attention to detail than even the most fastidious HTML author ever dreamed necessary. In W3C parlance, that means your documents must be impeccably well formed. Throughout the history of HTML, and in this book, authors have been encouraged to create well-formed documents, but you have to break rank with the HTML standards for your documents to be considered well formed by XML standards.

Nonetheless, your efforts to master XHTML will be rewarded with documents that are well formed and a sense of satisfaction from playing by the new rules. You will truly benefit in the future, too: through XML, your documents will be able to appear in places you never dreamed would exist (mostly good places, we hope).

16.3.1 Correctly Nested Elements

One requirement of a well-formed XHTML document is that its elements are nested correctly.
This isn’t any different from the HTML standards: simply close the markup elements in the order in which you opened them. If one element is within another, the end tag of the inner element must appear before the end tag of the outer element.

Hence, in the following well-formed XHTML segment, we end the italics tag before we end the bold one, because we started italicizing after we started bolding the content:

    <b>Close the italics tag <i>first</i></b>.

On the other hand, the following:

    <b>Well formed, this is <i>not</b></i>

is not well formed.

XHTML strictly enforces other nesting restrictions that have always been part of HTML but have not always been enforced. These restrictions are not formally part of the XHTML DTD; they are instead defined as part of the XHTML standard that is based on the DTD.

Nesting restrictions include the following:

• The <a> tag cannot contain another <a> tag.
• The <pre> tag cannot contain <img>, <object>, <big>, <small>, <sub>, or <sup> tags.
• The <button> tag cannot contain <input>, <select>, <textarea>, <label>, <button>, <form>, <fieldset>, <iframe>, or <isindex> tags.
• The <label> tag cannot contain other <label> tags.
• The <form> tag cannot contain other <form> tags.

These restrictions apply to nesting at any level. For example, because an <a> tag cannot contain any other <a> tags, any tag contained within that <a> tag cannot itself contain an <a> tag, even though it might otherwise.

16.3.2 End Tags

As we’ve documented throughout this book, any HTML tag that contains other tags or content has a corresponding end tag. However, one of the hallmarks of HTML (codified in the 4.01 standard) is that you may leave out the end tags if the processing agent can infer their presence. This is why most of us HTML authors commonly leave out the </p> end tag between adjacent paragraphs.
Also, lists and tables can be complicated to wade through, and not having to visually stumble over all the </li>, </td>, </th>, and </tr> end tags certainly makes HTML easier to read, albeit a bit more ambiguous.

This is not so for XHTML. Every tag that contains other tags or content must have a corresponding end tag present, correctly nested within the XHTML document. A missing end tag is an error and renders the document noncompliant. Although seemingly draconian, this and the nesting rules nonetheless remove any and all ambiguities as to where one tag starts and another tag ends.

* On the nesting restrictions of section 16.3.1: this is hair splitting within the XHTML standard. The XML standard has no mechanism to define which tags may not be placed within another tag. SGML, upon which XML is based, does have such a feature, but it was removed from XML to make the language easier to use and implement. As a result, these restrictions are simply listed in an appendix of the XHTML standard instead of being explicitly defined in the XHTML DTD.

16.3.3 Handling Empty Elements

In XML, and thus XHTML, every tag must have a corresponding end tag, even those that aren’t allowed to contain other tags or content. Accordingly, XHTML expects the line break to appear as <br></br> in your document. Ugh.

Fortunately, there is an acceptable alternative: include a slash before the closing bracket of the tag to indicate its ending (e.g., <br />). If the tag has attributes, the slash comes after all the attributes so that an image could be defined as:

    <img src="kumquat.gif" />

While this notation may seem foreign and annoying to an HTML author, it actually serves a useful purpose. Any XHTML element that has no content can be written this way. Thus, an empty paragraph can be written as <p />, and an empty table cell can be written as <td />. This is a handy way to mark empty table cells.

Clever as it may seem, writing empty tags in this abbreviated way may confuse HTML browsers.
To avoid compatibility problems, you can fool the HTML browsers by placing a space before the forward slash in an empty element using the XHTML version of its end tag. For example, use <br />, with a space between the br and /, instead of the XHTML equivalents <br/> and <br></br>. Table 16-1 contains all of the empty HTML tags, expressed in their acceptable XHTML (transitional DTD) forms.

Table 16-1. HTML empty tags in XHTML format

    <area />      <base />      <basefont />   <br />
    <col />       <frame />     <hr />         <img />
    <input />     <isindex />   <link />       <meta />
    <param />

16.3.4 Case Sensitivity

If you thought getting all those end tags in the right place and cleaning up the occasional nesting error would make writing XHTML documents difficult, hold on to your hat. XHTML is case-sensitive for all tag and attribute names. In an XHTML document, <a> and <A> are different tags; src and SRC are different attributes, and so are sRc and SrC. How forgiving HTML seems now.

The XHTML DTD defines all former HTML tags and attributes using lowercase letters. Uppercase tag or attribute names are not valid XHTML tags or attributes.

This can be a difficult situation for any author wishing to convert existing HTML documents into XHTML-compliant ones. Lots of web pages use uppercase tag and attribute names, to make them stand out from the surrounding lowercase content. To become compliant, all those names must be converted to lowercase, even the ones you used in your CSS stylesheet definitions. Fortunately, it’s easy to accomplish this kind of change with various editing tools, and XHTML authoring systems should perform the conversion for you.

16.3.5 Quoted Attribute Values

As if all those case-sensitive attribute names weren’t aggravating enough, XHTML requires that you enclose every attribute value (even the numeric ones) in quotes, conventionally double quotes. In HTML, you could quote anything your heart desired, but quote marks are required only if the attribute value included whitespace or other special characters.
To be XHTML compliant, every attribute value must be enclosed in quotes. For example:

    <table border=3>

is wrong in XHTML. It is correctly written as:

    <table border="3">

16.3.6 Explicit Attribute Values

Within HTML, there are a small number of attributes that have no value. Instead, their mere presence within a tag causes that tag to behave differently. In general, these attributes represent a sort of on/off switch for the tag, like the compact attribute for the various list tags or the ismap attribute for the <img> tag.

In XHTML, every attribute must have a value. Those without values must use their own names. Thus, compact in XHTML is correctly specified as compact="compact", and checked becomes checked="checked". Each must contain the required attribute value enclosed in quotes. Table 16-2 contains a list of attributes with the required XHTML values.

Table 16-2. XHTML values for valueless HTML attributes

    checked="checked"       compact="compact"       declare="declare"
    defer="defer"           disabled="disabled"     ismap="ismap"
    multiple="multiple"     noresize="noresize"     noshade="noshade"
    nowrap="nowrap"         readonly="readonly"     selected="selected"

Be aware that this attribute value requirement may cause some old HTML browsers to ignore the attribute altogether. None of the modern browsers has that problem, so the vast majority of users won’t notice any difference. There is no good solution to this problem, other than distributing HTML 4.0-compliant browsers to the needy.

16.3.7 Handling Special Characters

XHTML is more sensitive than HTML to the use of the < and & characters in JavaScript and CSS declarations within your documents. In HTML, you can avoid potential conflicts by enclosing your scripts and stylesheets in comments (<!-- and -->). XML browsers, however, may simply remove all the contents of comments from your document, thereby deleting your hidden scripts and stylesheets.
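
Both the rules of sections 16.3.1 through 16.3.6 and the comment behavior just described can be observed with an ordinary XML parser. The following is a rough sketch, assuming Python’s standard xml.etree.ElementTree module plays the part of an XML browser; the helper name is_well_formed and the sample fragments are hypothetical. Note that this parser checks only well-formedness, not validity against the XHTML DTD, so consistently uppercase names such as <P>...</P> would pass here even though they are not valid XHTML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the XML parser accepts the fragment without error."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# One acceptable and one unacceptable form for each rule.
checks = {
    "nested correctly": is_well_formed("<b>bold <i>both</i></b>"),
    "overlapping tags": is_well_formed("<b>bold <i>both</b></i>"),
    "end tags present": is_well_formed("<ul><li>one</li><li>two</li></ul>"),
    "end tags omitted": is_well_formed("<ul><li>one<li>two</ul>"),
    "empty tag closed": is_well_formed("<p>line<br /></p>"),
    "empty tag left open": is_well_formed("<p>line<br></p>"),
    "matching case": is_well_formed("<p>text</p>"),
    "mismatched case": is_well_formed("<P>text</p>"),
    "quoted value": is_well_formed('<table border="3"></table>'),
    "unquoted value": is_well_formed("<table border=3></table>"),
    "explicit value": is_well_formed('<input checked="checked" />'),
    "minimized attribute": is_well_formed("<input checked />"),
}

# By default this parser discards comments, so a script "hidden" inside
# one leaves the <script> element with no text at all.
script = ET.fromstring("<script><!-- var x = 1; --></script>")
hidden_script_text = script.text
```

Every fragment written the HTML way is rejected outright rather than repaired, and hidden_script_text comes back empty, which is exactly the hazard of hiding scripts in comments.
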
To properly shield your special characters from XML browsers, enclose your styles or scripts in a CDATA section. This tells the XML browser that any characters contained within are plain old characters, without special meanings. For example:

    <script language="JavaScript">
    <![CDATA[
    ...JavaScript here...
    ]]>
    </script>

This doesn’t solve the problem, though. HTML browsers ignore the contents of the CDATA XML tag but honor the contents of comment-enclosed scripts and stylesheets, whereas XML browsers do just the opposite. We recommend that you put your scripts and styles in external files and reference them in your document with appropriate external links.

Special characters in attribute values are problematic in XHTML, too. In particular, you always should write an ampersand within an attribute value using &amp; and not simply an & character. Similarly, play it safe and encode less-than and greater-than signs using their &lt; and &gt; entities. For example, while:

    <img src=seasonings.gif alt="Salt & pepper">

is perfectly valid HTML, you must write it as:

    <img src="seasonings.gif" alt="Salt &amp; pepper" />

for it to be compliant XHTML.

16.3.8 The id and name Attributes

Early versions of HTML used the name attribute with the <a> tag to create a fragment identifier in the document. This fragment could then be used in a URL to refer to a particular spot within a document. The name attribute was later added to other tags, such as <frame> and <img>, allowing those elements to be referenced by name from other spots in the document.

With HTML 4.0, the W3C added the id attribute to almost every tag. Like name, id lets you associate an identifier with nearly any element in a document for later reference and use, perhaps by a hyperlink or a script.

XHTML has a strong preference for the id attribute as the anchor of choice within a document. The name attribute is defined but formally deprecated for those elements that have historically used it.
With widespread support of HTML 4.0 now in place, you should begin to avoid the name attribute where possible and instead use the id attribute to bind names to elements in your documents. If you must use the name attribute on certain tags, include an identical id attribute to ensure that the tag will behave similarly when processed by a strict XHTML browser.

16.4 XHTML 1.1

In May 2001, the W3C released an updated XHTML standard, XHTML 1.1. While most standards expand upon their previous versions, XHTML 1.1 takes the unusual step of defining a more restrictive version of XHTML. If you think of XHTML 1.0 as unwieldy, picky, and time consuming, you’ll find XHTML 1.1 even more so. In our opinion, XHTML 1.1 is an example of the standards process taken to absurd levels, defining a standard that may be academically pure but is essentially unusable.

16.4.1 Differences in XHTML 1.1

XHTML 1.1 begins with the XHTML 1.0 strict DTD and makes a few modifications. By supporting only the strict version of XHTML 1.0, version 1.1 eliminates all deprecated elements and all browser extensions still in common use on the Web. It also makes the following minor changes:

• The lang attribute has been removed from every element. Instead, authors should use the xml:lang attribute.
• The name attribute has been removed from the <a> and <map> elements. Authors should use the id attribute in its place.

Finally, the XHTML 1.1 standard defines a new set of elements that implement a typographic feature known as ruby text. Ruby text is short runs of text placed alongside the base text; it is often used to annotate the text or to indicate pronunciation. Ruby text has its roots in East Asian documents, particularly Chinese schoolbooks and Japanese books and magazines. Ruby text is typically displayed in a smaller font than the base text and follows certain alignment rules to ensure that it appears adjacent to the appropriate base text element.
You define and manage ruby text with a set of elements that provides grouping and layout control. We’ll be blunt: this new feature is so esoteric and of so little importance to the vast majority of HTML authors (even those who would subject themselves to the needless agony of XHTML 1.1 conformance) that it does not warrant extensive coverage in this book. Those who are interested can find a complete discussion of ruby text at http://www.w3.org/TR/ruby.

* The origin of the name “ruby” lies in the name that printers use for the 5.5-point font used by the British press to set this smaller adjacent text.

For the rest of us, it is sufficient to know that there are a few new elements in XHTML 1.1 that you would be wise not to use in your own DTDs, if only to prevent confusion with the XHTML 1.1 DTD. These new elements are:

<ruby>
    Defines a segment of ruby text
<rb>
    Defines the ruby base text
<rt>
    Defines the ruby text associated with the base text
<rp>
    Serves as a “ruby parenthesis,” supplying fallback parentheses around the ruby text for browsers that do not support ruby
<rbc>
    Serves as a ruby base text container to group several base text elements
<rtc>
    Serves as a ruby text container to group several ruby elements

Should you encounter any of these elements in a document, refer to the aforementioned specification for details on how they are used. In general, you’ll find a single outer <ruby> element with at least one <rb> and <rt> element within it. You can group multiple <rb> and <rt> elements within the <rbc> and <rtc> container elements, and <rp> elements may be added so that browsers without ruby support show the ruby text in parentheses.

16.5 Should You Use XHTML?

For a document author used to HTML, XHTML is clearly a more painful and certainly a less forgiving document markup language. Whereas at one time we prided ourselves on being able to crank out HTML with pencil and paper, it’s much more tedious to write XHTML without special document-preparation applications. Why should any author want to take on that extra baggage?
16.5.1 The Dusty Deck Problem

Over just a few years, authors have generated billions upon billions of web pages. It is a safe bet that the majority of these pages are not compliant with any defined version of HTML. It is an even safer bet that the vast majority of these pages are not XHTML compliant.

The harsh reality is that these billions of pages will never be converted to XHTML. Who has the time to go back, root out these old pages, and tweak them to make them XHTML compliant, especially when the end result, as perceived by the user, will not change? Like the dusty decks of COBOL programs that lay unchanged for decades before Y2K forced programmers to bring them up to snuff, these dusty decks of web pages will also lie untouched until a similarly dramatic event forces us to update them.

However, the dusty-deck problem is no excuse for not writing compliant documents going forward. Leave those old documents alone, but don’t create a new conversion problem every time you create a new document. A little effort now will help your documents work across a wider range of browsers in the future.

16.5.2 Automatic Conversion

If your sense of responsibility leads you to undertake the conversion of your existing HTML documents into XHTML, you’ll find a utility named Tidy to be exceptionally useful. Written by Dave Raggett, one of the movers and shakers at the W3C, it automates a significant amount of the work required to convert HTML documents into XHTML.

While Tidy’s capabilities are too varied and wonderful to be fully listed here, we can at least assure you that Tidy can detect and correct problems with tag and attribute case, unquoted attribute values, and improper element nesting. For the complete list of features and the latest version of Tidy for various computing platforms, visit http://tidy.sourceforge.net.
16.5.3 Lenient Browsers and Lazy Authors

There is a good rule of thumb regarding data sharing, especially on the Internet: be lenient in what you accept and strict in what you produce. This is not a commentary on social policy, but rather a pragmatic admonition to tolerate ambiguity and errors in data you receive while making sure that anything you send is scrupulously correct.

Web browsers are good examples of lenient acceptors. Most current web pages have some sort of error in them, albeit often just an error of omission. Nonetheless, browsers accept the error and present a reasonable document to the user. This leniency lets authors get away with all sorts of things, often without even knowing they’ve made a mistake.

Most authors stop developing a page when it looks good and works the way they want it to. Very few take the time to run their pages through the various HTML-compliance tools to catch potential errors. Many of those who do try to test for compliance are so overwhelmed by the number of minor errors they have committed that they simply give up and continue to create bad pages that can be handled by good browsers.

Because the number of bad pages continues to grow, browsers cannot afford to start being strict. Any browser that tried to enforce even the most basic rules of the HTML standard would be abandoned by users who want to see web pages, not error messages. A vicious cycle ensues: bad pages force the use of lenient browsers, which encourage the creation of more bad pages. Break the cycle by vowing to create only XHTML-compliant content whenever you can.

16.5.4 Time, Money, and Standards

XHTML was developed as an XML representation of the HTML standard. It is intended, going forward, to become the single standard everyone should use to create content for the Web.

In a perfect world, standards are universally adopted and used. Full compliance is required of any document before it is placed on the Web.
Conversion of legacy documents is done immediately.

In the real world, a shortage of time and money prevents the universal use of standards. Under pressure to quickly deliver something that works, developers turn out pages that work only well enough. Because browsers allow second-rate content to exist on the Web, the need to comply with a standard becomes a secondary issue, one that is too quickly ignored in the dizzying pace of web development.

16.5.5 Man Versus Machine

All is not lost, however. While XHTML is painful and tedious for humans to create, it is quite easy for machines to create. The number of web-authoring tools continues to increase, and the pages created by these machines should be completely XHTML compliant. While it doesn’t make much economic sense for a web author to spend a lot of time getting all those end tags in the right spot, it does make sense for the programmer developing an authoring tool to ensure that the tool generates all those correct end tags. The effort the web author expends is leveraged exactly once for each page; the effort of the tool creator is leveraged over and over, each time the tool produces a new page.

It seems that the real future of XHTML lies in the realm of machine-generated content. XHTML is far too picky to be successfully used by the millions of casual web authors who create small sites. However, if those same authors use a tool to create their pages, they could be generating XHTML-compliant pages and never even know it.

If you are among that small community of developers who create tools that generate HTML output, you are doing a great disservice to your many potential customers if your tool does not generate excruciatingly correct XHTML-compliant output. There is no technical excuse for any tool not to generate XHTML-compliant output.
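As a rough illustration of why correct output is cheap for a tool to produce, the following Python sketch builds a page as an element tree and lets an XML serializer write it out, so that every end tag, quoted attribute, and escaped character is produced by the library rather than by hand. The helper name build_page and its parameters are hypothetical. The sketch assumes the XHTML 1.0 Strict DOCTYPE and the standard library’s xml.etree.ElementTree module, and it aims only at well-formed output; it does not validate the result against the DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
)


def build_page(title, paragraphs):
    """Return an XHTML document as a string (hypothetical helper).

    paragraphs is a list of paragraphs; each paragraph is a non-empty
    list of text lines that will be separated by <br /> elements.
    """
    html = ET.Element("html", {"xmlns": XHTML_NS})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h2").text = title
    for lines in paragraphs:
        p = ET.SubElement(body, "p")
        p.text = lines[0]
        for line in lines[1:]:
            # Text following an element is stored as that element's tail.
            br = ET.SubElement(p, "br")
            br.tail = line
    # The serializer closes every element and escapes &, <, and >.
    return DOCTYPE + ET.tostring(html, encoding="unicode")


page = build_page("My first XHTML document",
                  [["Greetings from", "O'Reilly & friends"]])
print(page)
```

Because the markup is never assembled by string concatenation, the author of the page cannot forget an end tag or leave an ampersand unescaped; that burden falls once on the tool, not on every page.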
If there are compatibility issues surrounding how the output might be used (with a non-XHTML browser, perhaps), the tool should provide a switch that lets the author select XHTML-compliant output as an option.

16.5.6 What to Do?

We recommend that all HTML authors take the time to absorb the differences between HTML and XHTML outlined in this chapter. Given the resources and opportunity, you should try to create XHTML-compliant pages wherever possible for the sites you are creating. Certainly you should choose authoring tools that support XHTML and give you the option of generating XHTML-compliant pages.

One day, XHTML may replace HTML as the official standard language of the Web. Even so, the number of noncompliant pages on the Web is overwhelming, forcing browsers to honor old HTML constructs and features for at least the next five years. For better or worse, HTML is here to stay as the de facto standard for web authors for years to come.

CHAPTER 17
Tips, Tricks, and Hacks

In this chapter:
• Top of the Tips
• Cleaning Up After Your HTML Editor
• Tricks with Tables
• Tricks with Windows and Frames

We’ve sprinkled a number of tips, tricks, and hacks throughout this book, along with style guidelines, examples, and instructions. So why have a special chapter on tips, tricks, and hacks? Because HTML and XHTML are the languages, albeit constrained, that make the Web the exciting place that it is, and interested readers want to know, “How do I do the cool stuff?”

17.1 Top of the Tips

The most important tip for even veteran authors is to surf the Web yourself. We can show and explain a few neat tricks to get you started, but hundreds of thousands of authors out there are combining and recombining HTML and XHTML tags and juggling content to create compelling and useful documents. All the popular browsers provide a way to view the source for the web pages that you download.
Examine (don’t steal) them to see how they create eye-catching and effective features, and use them to guide your own creations. Get a feel for the more effective web collections. How are their documents organized? How large is each document? We all learn from experience, so go get it!

17.1.1 Design for Your Audience

We repeatedly argue throughout this book that content matters most, not look. But that doesn’t mean presentation doesn’t matter.

Effective documents match your target audience’s expectations, giving them a familiar environment in which to explore and gather information. Serious academicians, for instance, expect a journal-like appearance for a treatise on the physiology of the kumquat: long on meaningful words, figures, and diagrams and short on frivolous trappings like cute bullets and font abuse. Don’t insult the reader’s eye, except when exercising artistic license to jar or to attack your reader’s sensibilities. By anticipating your audience and designing your documents to appeal to their tastes, you also subtly deflect unwanted surfers from your pages.

For instance, use subtle colors and muted text transitions between sections for a classical art museum’s collection, to mimic the hushed environment of a real classical art museum. The typical rock ’n’ roll-crazed web-surfer maniac probably won’t take more than a glance at your site, but the millionaire arts patron might.

Also, use effective layout to gently guide your readers’ eyes to areas of interest in your documents. Do that by adhering to the basic rules of document layout and design, such as placing figures and diagrams near (if not inline with) their content references. Nothing’s worse than having to scroll up and down the browser window in a desperate search for a picture that can explain everything.

We won’t lie and suggest that we’re design experts. We aren’t, but they’re not hard to find. So, another tip for the serious web page author is to seek professional help.
The best situation is to have design experience yourself. Next best is to have a pro looking over your shoulder, or at least somewhere within earshot. Make a trip to your local library and do some reading on your own, too. Better yet, browse the various online guides. Check out Web Design in a Nutshell by Jennifer Niederst Robbins (O’Reilly). Your readers will be glad you did. [Tools for the Web Designer, 1.6]

17.1.2 Consistent Documents

The next best tip we can give you is to reuse your documents. Don’t start from scratch each time. Rather, develop a consistent framework, even to the point of a content outline into which you add the detail and character for each page. And endeavor to create CSS2-based stylesheets so that the look and feel of your documents remains consistent across your collection.

17.2 Cleaning Up After Your HTML Editor

Although you can create and edit HTML/XHTML documents with a text editor, such as vi or Notepad, most HTML authors use an application that is designed for creating web pages. Several are free of charge, many offer a free evaluation period, and most are available for download over the Web. Be forewarned, though; in our experience, you will rarely (if ever) be able to create a web document from one of these editors without having to inspect, add to, edit, and sometimes even repair the source HTML that the editor generates. The following sections discuss a few things that you should know about and watch out for.

17.2.1 Where Did My Document Go?

One of the first things you will notice is that many of the HTML editors automatically add markup to your document that you did not explicitly select or write. Remember this very simple HTML document that we started with in Chapter 2?
    <html>
    <head>
    <title>My first HTML document</title>
    </head>
    <body>
    <h2>My first HTML document</h2>
    Hello, <i>World Wide Web!</i>
    <!-- No "Hello, World" for us -->
    <p>
    Greetings from<br>
    <a href="http://www.ora.com">O'Reilly Media</a>
    <p>
    Composed with care by:
    <cite>(insert your name here)</cite>
    <br>&copy;2000 and beyond
    </body>
    </html>

Here is what the source looks like after you load it into Microsoft Word from Office XP:

    <html xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:w="urn:schemas-microsoft-com:office:word"
    xmlns="http://www.w3.org/TR/REC-html40">
    <head>
    <meta http-equiv=Content-Type content="text/html; charset=windows-1252">
    <meta name=ProgId content=Word.Document>
    <meta name=Generator content="Microsoft Word 10">
    <meta name=Originator content="Microsoft Word 10">
    <link rel=File-List href="html_files/filelist.xml">
    <title>&lt;html&gt;</title>
    <!--[if gte mso 9]><xml>
     <w:WordDocument>
      <w:Compatibility>
       <w:BreakWrappedTables/>
       <w:SnapToGridInCell/>
       <w:WrapTextWithPunct/>
       <w:UseAsianBreakRules/>
      </w:Compatibility>
      <w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>
     </w:WordDocument>
    </xml><![endif]-->
    <style>
     /* Style Definitions */
    p.MsoNormal, li.MsoNormal, div.MsoNormal
        {mso-style-parent:"";
        margin:0in;
        margin-bottom:.0001pt;
        mso-pagination:widow-orphan;
        font-size:12.0pt;
        font-family:"Times New Roman";
        mso-fareast-font-family:"Times New Roman";}
    p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
        {margin:0in;
        margin-bottom:.0001pt;
        mso-pagination:widow-orphan;
        font-size:10.0pt;
        font-family:"Courier New";
        mso-fareast-font-family:"Times New Roman";}
    @page Section1
        {size:8.5in 11.0in;
        margin:1.0in 65.95pt 1.0in 65.95pt;
        mso-header-margin:.5in;
        mso-footer-margin:.5in;
        mso-paper-source:0;}
    div.Section1
        {page:Section1;}
    </style>
    <!--[if gte mso 10]>
    <style>
     /* Style Definitions */
    table.MsoNormalTable
        {mso-style-name:"Table Normal";
        mso-tstyle-rowband-size:0;
        mso-tstyle-colband-size:0;
        mso-style-noshow:yes;
        mso-style-parent:"";
        mso-padding-alt:0in 5.4pt 0in 5.4pt;
        mso-para-margin:0in;
        mso-para-margin-bottom:.0001pt;
        mso-pagination:widow-orphan;
        font-size:10.0pt;
        font-family:"Times New Roman";}
    </style>
    <![endif]-->
    </head>
    <body lang=EN-US style='tab-interval:.5in'>