A primer on HTML

By Jonathan Schofield of Watershed Creative

Introduction

HyperText Markup Language (HTML) is the predominant language for structuring (marking up) web pages (and HTML emails).

There are some great HTML editing tools available now that do a good job of protecting content owners from having to understand HTML. One of the most common browser-based tools is TinyMCE, a WYSIWYG editor. Great though tools like this are, they lull you into a false sense of security that everything is simple and taken care of. Because of its carefully nested structure, HTML is easy to break or do badly using these tools, so…

Knowing the basics of HTML is an invaluable skill for you to:

Fix errors when things go awry in TinyMCE (or other editors)
Realise your investment in your stylesheet
Promote yourself on third party sites as effectively as possible

Impatient or just need a reminder? Then skip to the essentials summary.

Want to dive in and learn? Then read on!

Note: I know that the following won’t be read and absorbed in one sitting and that the best way to learn is by doing. Nonetheless, by breaking it down into logical sections, my hope is that it forms a useful point of reference to come back to and one where specific guidance can be found quickly.

Essential terminology

HTML elements are the ‘building blocks’ of the language. Actually, as almost every element can be the ‘parent’ of another element, it’s more helpful to think of elements as parts of a tree structure.
Each element is written (marked up) in plain text using an HTML tag.
Every tag can include markup for one or more HTML attributes and their values, each of which can:
- Modify the behaviour of the element in question
- Provide additional information to the user
- Provide additional structural mechanisms by which stylesheets and JavaScript can function
Stylesheets are collections of one or more user-defined sets of style rules.
Style rules are collections of one or more style selectors and one or more style properties.
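
To make these terms concrete, here is a small fragment with the parts labelled in comments. The class name intro and the text content are placeholders invented for illustration.

```html
<!-- The div element is the 'parent' of the h2 and p elements nested inside it -->
<div class="intro">
  <!-- class is an attribute; "intro" is its value -->
  <h2>Welcome</h2>
  <!-- <p ...> is the opening tag and </p> the closing tag of a paragraph element -->
  <p title="Opening remarks">
    This paragraph is the parent of an <em>emphasis</em> element.
  </p>
</div>
```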

Fundamental concepts

HTML is a markup language not a programming language: it uses special sequences of text to add structure and additional semantics to the text content that is presented to the reader; it doesn’t have any facility for defining controls that create outcomes such as if scenario A, do something, if scenario B, do something else.

The ‘special sequences of text’ are nothing more than the names of the defined HTML elements and their attribute values in combination with five specific characters¹: <, =, ", / and >. Put together in the right syntax, these form HTML tags.

¹ Double and single quotation marks serve the same purpose and can be used interchangeably (in pairs), but it’s best to stick to one or the other consistently.

When it comes to the styling of your content, you should know that almost any² HTML element in the body of a page can be made to display in any way you wish using an appropriate stylesheet. So the choice of element used to add structure to your content is a semantic one³, not a styling one!

² Explorer 6 does not recognise or correctly handle a few HTML elements, most notably those for abbreviations and short quotations. And there are a couple of other infrastructural elements that cannot be styled.

³ HTML email is beset by the limitations of many email applications so certain layout effects can only be achieved by using HTML elements in an expedient unsemantic manner.

HTML elements

HTML is currently at version 4 in its evolution, with 85 elements available for use in the particular variant known as XHTML Strict. (There are another 9 deprecated elements in older variants.) Here’s how the 85 break down:

Only 13 elements are needed to mark up the vast majority of text and image content
27 elements are for more exacting text markup (of which 12 are for specialist use)
10 elements are for tables (tabulated content in rows and columns like a spreadsheet)
10 elements are for forms (input fields, checkboxes, buttons and associated elements)
2 elements are for embedded media other than images (eg: Flash movies)
18 elements are for surrounding infrastructure (of which 4 are essential)
And 5 elements are deprecated


The head and the body

HTML markup has two primary and mandatory ‘branches’. To continue with the ‘tree’ metaphor from above:

The head element is the root system of the HTML ‘tree’: vital for browser interaction but tucked away where no one can see it.
The body element is the trunk from which the readable content branches out. All markup you create via your CMS is placed within the body element (with the exception of meta tags).
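
As a rough sketch, a minimal page in the XHTML Strict variant could take the following shape. The title, description and body text are placeholders, and a real page produced by a CMS will usually carry more in its head than this.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <!-- Not displayed in the page itself, but used by browsers and search robots -->
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Example page title</title>
    <meta name="description" content="Example page description" />
  </head>
  <body>
    <!-- Everything the reader sees goes here -->
    <h1>Example primary heading</h1>
    <p>Example paragraph.</p>
  </body>
</html>
```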

Block and inline elements

There are two groups of elements that may go in the body:

Block elements force line breaks before and after them. Examples of such elements are headings, paragraphs and list items. Block elements may only be placed inside other block elements.
Note: each block element has its own rules as to which other block element may go inside it.
Inline elements appear in the flow of text as determined by the bounds of their containing ‘parent’ element. Examples of such elements are links, abbreviations and text emphasis. Inline elements may only contain other inline elements.
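
The nesting rules above can be illustrated with a short fragment. The text is a placeholder, and the invalid case is commented out so that the fragment itself stays valid.

```html
<!-- A block element (p) inside another block element (blockquote): allowed -->
<blockquote>
  <p>
    <!-- Inline elements inside a block element, and inside each other: allowed -->
    A paragraph containing <a href="URL">a link with <em>emphasis</em> inside it</a>.
  </p>
</blockquote>

<!-- Not allowed: a block element (p) inside an inline element (em) -->
<!-- <em><p>This nesting is invalid.</p></em> -->
```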


HTML tags

For every HTML element, there is a corresponding tag…

The anatomy of HTML tags

An example HTML tag

The tag that drives the whole web is the <a> tag which is used to create a link element:

The markup for a simple link element looks like this:
```
<a href="URL">Link text</a>
```
And renders like this: Link text

Tag pairs

Like most (but not all) elements, a link element has a pair of tags: an opening tag, <a>, and a closing tag, </a>, with the text content of the element written between them.

Empty elements

Elements with no text content have ‘self closing’ tags with a penultimate / preceded by a word space. For example, an image — which must also have a src (source) attribute — has the form <img src="http://example.com/image.jpg" />.
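
For illustration, here are three common empty elements written in this self-closing form: a line break, a horizontal rule and an image. The address text is a placeholder.

```html
<p>
  First line of an address<br />
  Second line of an address
</p>
<hr />
<img src="http://example.com/image.jpg" alt="Short description" />
```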

The most commonly used HTML elements and their tags

Note: the example markup provided is indented for ease of reading. TinyMCE generates markup devoid of such indenting — which makes it more robust but harder to read!

For heading levels 1 to 4

<h1> for a primary heading
<h2> for a secondary heading
<h3> for a tertiary heading
<h4> for a quaternary heading

<h1>This example markup is for a primary heading</h1>

Note: heading elements may only contain inline elements.

For paragraphs

<p>This example markup is for a one sentence paragraph.</p>

Note: paragraph elements may only contain inline elements.

For lists

<ol> for ordered lists of one or more (typically numbered) items
<ul> for unordered lists of one or more (typically bullet pointed) items
<li> for each list item (within either <ol> or <ul>)

<ol>
  <li>This example markup is for an ordered list</li>
  <li>With two list items
    <ul>
      <li>Which contains a nested unordered list</li>
      <li>Also with two list items</li>
    </ul>
  </li>
</ol>

Note: ordered list and unordered list elements may only contain list item elements directly, but the list item elements themselves may contain any block or inline elements.

For quotations worthy of one or more paragraphs in their own right

<blockquote>
  <p>
    This example markup is for a quotation that deserves to 
    stand out and runs to more than just a few words that would 
    normally be quoted inline within a sentence.
  </p>
</blockquote>

Note: quoted text must be nested within a block element (most commonly a paragraph) that is itself inside the blockquote element.

For links

<a href="URL">Link text</a>

Note: link elements may only contain inline elements.

For text emphasis

<em> for basic emphasis (typically styled as italic)
<strong> for strong emphasis (typically styled as bold)

<p>
  This example markup includes <em>emphasised text</em> 
  and <strong>strongly emphasised text</strong>.
</p>

Note: text emphasis elements may only contain inline elements.

For embedded images

<img src="URL" />

HTML attributes

HTML tags without attributes do nothing more than add structure to your content — useful, but of limited value. HTML attributes and their values are what make tags work harder — they are the key to leveraging the full power of CSS. Some attributes are mandatory for particular tags.

The anatomy of HTML attributes

Attributes are always declared in the opening tag for an element
Attributes comprise a valid name (eg: href), an equals character =, and a user-defined value (eg: URL) enclosed in double or single quotation marks
For example, <a href="URL">

The most commonly used HTML attributes

href: mandatory for links
title: for ‘tool tips’
src: mandatory for images and embedded media
alt: for images
width and height: for images and embedded media
class: empowering CSS and JavaScript
id: empowering in-page links, CSS and JavaScript

href attribute

The prime example of an attribute is the href, which refers to the destination of a link, its URL. It is a mandatory attribute of the <a> tag and can be absolute or relative:

Absolute links have an href accessible from anywhere on the internet and typically begin with http://. Links in HTML email must be of this form.
Relative links are valid only for paths to content that is accessible on the same web server. Our link example from above, <a href="URL">Link text</a>, is an example of a relative link: Link text.
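
The difference is easiest to see side by side. The paths below are hypothetical and assume a site hosted at example.com with an about folder.

```html
<!-- Absolute: works from anywhere, including HTML email -->
<a href="http://example.com/about/contact.html">Contact us</a>

<!-- Relative to the folder of the current page -->
<a href="contact.html">Contact us</a>

<!-- Relative to the root of the same web server (note the leading slash) -->
<a href="/about/contact.html">Contact us</a>

<!-- Relative, one folder up from the current page -->
<a href="../index.html">Back to the section home page</a>
```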

title attribute

Extending our example link tag, we could mark up:
<a href="URL" title="More information">Link text</a>.
‘More information’ then appears in a box upon mouse-over of Link text. title attributes can be applied to (almost) any tag. They are particularly useful on links <a> to explain the function or destination of a link before clicking it, and on abbreviations <abbr> where they can provide the full text upon mouse-over of the abbreviated term*.

* But not in IE6 which doesn’t recognise the <abbr> tag!

Note: title attributes tend to have no effect in HTML email applications.

src attribute

src attributes describe the location of images and other embedded media such as Flash movies, essential if you want such elements to appear in your page (only images should be used in HTML email). A src attribute is mandatory for an <img /> tag and is also how the (deprecated) <embed /> tag points to its media; the <object> tag uses a data attribute for the same purpose. As with href attributes, src attributes can be absolute or relative. In HTML email, src attributes must be absolute. An example minimal markup for an image is: <img src="http://example.com/image.jpg" />.

alt attribute

alt attributes provide a short description of any image that is important to your content and not merely decorative, vital for any user who has images turned off (or in email not yet loaded) or can’t see them. An example markup for an image with alt attribute is: <img src="http://example.com/image.jpg" alt="Short description" />.

width and height attributes

width and height attributes apply to images and other embedded media such as Flash movies. (They can also be applied to other HTML elements but this is deprecated in favour of defining width and height in CSS.) If no width or height attributes are applied to an image, it will display at its full pixel dimensions. In this respect, width and height attributes are not required for images but they do help the browser (or email application) render the page (or email) faster.
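
Extending the image example once more, the markup might look like this. The dimensions are invented for illustration; in practice they should match the actual pixel dimensions of the image.

```html
<!-- The values are in pixels and are written without a unit -->
<img src="http://example.com/image.jpg" alt="Short description" width="320" height="240" />
```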

class attribute

Along with HTML tags themselves and id attributes, class attributes provide a way to add structure that CSS and JavaScript can exploit. For example, you might want an image to align left and for text to flow around it to the right with a suitable margin to the right of the image. Because there may be more than one instance of a left-aligned image in the page, we would use a class attribute. Assuming there are counterpart styling properties in the stylesheet, we could extend our example image markup to: <img src="http://example.com/image.jpg" alt="Short description" class="img-left" /> The accompanying CSS could be something like:

.img-left
{
  float: left; 
  /* Allows text to ‘float’ (flow) around it */
  margin: 6px 6px 6px 0; 
  /* Margin order: top, right, bottom, left */
}

id attribute

As the name implies, an id identifies one element uniquely: any given id value can be used only once in a given page. id attributes serve two important functions:

id attributes provide a means for links (from anywhere) to target the location of a specific element within a page. For example, because the sub-heading for this section on attributes is marked up as <h2 id="attributes">HTML attributes</h2>, we can link directly to it with <a href="#attributes">link</a>.
id attributes provide a powerful structural hook that CSS and JavaScript can exploit. For example, suppose that you had an option to subscribe to a mailing list on your page and you wanted the heading for it to stand out in some way, perhaps with an email graphic of some kind. In basic markup terms it might be just another sub-heading, but by giving it an id, it can interact with whatever counterpart styling is described in the stylesheet. The markup to apply all those styles could simply be: <h2 id="subscribe">Subscribe to our newsletter</h2>.
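
Both functions can be sketched in one small page. The style properties and the image file name email-icon.gif are assumptions made for illustration, not the styles used on this site.

```html
<html>
  <head>
    <title>id attribute example</title>
    <style type="text/css">
      /* Applies only to the one element with id="subscribe" */
      #subscribe {
        padding-left: 24px; /* Leaves room for the graphic */
        background: url(email-icon.gif) no-repeat left center; /* Hypothetical image file */
      }
    </style>
  </head>
  <body>
    <!-- Jumps to the element with id="subscribe" further down the same page -->
    <p><a href="#subscribe">Skip to the newsletter sign-up</a></p>

    <h2 id="subscribe">Subscribe to our newsletter</h2>
  </body>
</html>
```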

About semantic markup

Nearly all HTML elements describe the intended semantics — the meaning — of the content they enclose. So, a primary heading should be in an <h1> tag, an abbreviation should be in an <abbr> tag, etc.

Do it right

The key to good markup is to use the HTML tags that your content naturally suggests and then let the CSS take care of appearance.
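
As a simple contrast, the first snippet below picks tags for how they look and the second picks them for what the content is. The heading and sentence are placeholders.

```html
<!-- Avoid: markup chosen for appearance (font is deprecated) -->
<p><font size="5"><b>Our standards</b></font></p>

<!-- Prefer: markup chosen for meaning, with appearance left to the CSS -->
<h2>Our standards</h2>
<p>Our pages follow <abbr title="World Wide Web Consortium">W3C</abbr> guidelines.</p>
```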

Form markup

Forms are one of the most complex areas of user interaction, so it is essential to understand and to use form markup best practice. Though it requires diligence and attention to detail, it’s not hard to do, and there are substantial accessibility and usability benefits for all users, whether they have a disability or not.

Form markup is beyond the scope of this article and ought in any case to be the responsibility of your development team. Websemantics.co.uk have an excellent guide to accessible form markup.

The benefits of semantic markup

Compared with outdated and verbose ways of doing markup, semantically marked up content is:

More accessible, usable and meaningful for both human users and machine users such as search robots
Easier to read, maintain, adapt and reuse by both you and your developers
Easier to style
Often quicker to download, reducing demands on hosting and improving performance for the user
More interoperable with other software, extending its reach and value (eg: initiatives such as Microformats)

What not to do

Sometimes it can be easier to understand what to do by learning what not to do…

Don’t use tables unsemantically just to achieve the layout of elements — doing so reduces the accessibility and adaptability of your content.
Don’t skip heading levels because it frustrates users of assistive technologies such as screen readers. For example, an <h1> should not be followed later by an <h3> without the use of an <h2> somewhere between.
Don’t use line breaks <br /> or empty paragraphs to achieve extra line spacing.
Don’t leave the erroneous line breaks or empty elements that are all too easy to create in TinyMCE (don’t be afraid to use the HTML source view to seek them out and remove them).
Don’t mimic lists using special characters such as • for bullet points (use proper list elements instead).
Don’t use inline styles except in HTML email, and know how to avoid those generated by Microsoft Office documents.
More generally, don’t use expedient markup to compensate for a perceived (or actual) lack of stylesheet rules that would achieve the effect you are after — if you need more or amended styles, commission them!
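
Two of these points, mimicked lists and spacing by line breaks or empty paragraphs, can be shown in a before-and-after sketch. The list content is a placeholder.

```html
<!-- Avoid: a mimicked list using special characters, line breaks and an empty paragraph -->
<p>
  • Apples<br />
  • Pears
</p>
<p></p>

<!-- Prefer: a proper list, with spacing left to the stylesheet -->
<ul>
  <li>Apples</li>
  <li>Pears</li>
</ul>
```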

Appearance and CSS

Note: the aim of this section is merely to explain how CSS works in enough detail that you understand how to activate it via your HTML markup.

The advantage of CSS

HTML can describe appearance — using deprecated, anachronistic tags and attributes such as <font color="red"> — but it shouldn’t!

In the mid to late 1990s, around four years after the formal birth of HTML, the Cascading StyleSheet (CSS) companion language was invented to do the job of controlling presentational appearance far more elegantly and efficiently.

The objective of CSS is to separate the structural markup of content from its appearance, making each much easier to edit independently.

Default browser styles

It’s worth understanding that all visual web browsers come with built-in default styles that are applied to each and every HTML element. If you remove or disable the stylesheet associated with a web page these default styles are what you see. Because these default styles differ from browser to browser, Watershed utilise a stylesheet ‘reset’ that equalises the styling of all elements across all browsers. Custom styles are then appended to deliver the design you want.


Stylesheets, rules, selectors, scope and properties

Stylesheets interact only with HTML with which they are linked or within which they are embedded.
Stylesheets define one or more style rules, each of which is comprised of one or more (comma separated) style selectors and one or more style properties (wrapped in curly braces) {} — see example rule below.
Style selectors affect all HTML elements that match their scope, which can be as broad or as precise as required by the design:
- A scope as broad as to affect all HTML elements of a given type, site-wide (eg: all <h2> elements across a site)
- A scope as precise as to affect only a single HTML element that has a specific HTML attribute type (eg: id) with a specific value (eg: subscribe) (eg: <h2 id="subscribe">)
Style selectors can be defined by custom selector names (eg: subscribe). Such selectors only affect HTML elements that have attributes with a case-sensitive value of the same name (eg: subscribe is different from Subscribe).
Style properties each comprise a valid property (eg: color), a colon :, a valid value (eg: #919fab), and a semi-colon ;.
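
As a sketch of the two ways a stylesheet can be attached to a page, the head element below links one stylesheet and embeds another. The file path /css/site.css is hypothetical, and the embedded rule is only an illustration of the selector and property syntax.

```html
<head>
  <title>Stylesheet example</title>

  <!-- Linked: the rules live in a separate file that many pages can share -->
  <link rel="stylesheet" type="text/css" href="/css/site.css" />

  <!-- Embedded: the rules apply to this page only -->
  <style type="text/css">
    /* One rule with two comma separated selectors and two properties */
    h2, h3 {
      color: #919fab;
      font-weight: bold;
    }
  </style>
</head>
```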

An example

The style rule for all <h2> elements on this page is:

h2
{
  margin-top: 2em; 
  /* The ‘em’ unit is equivalent to font-size */
  border-top: 5px solid #dee2e6; 
  /* ‘px’ is short for pixels */
  padding: 1em 0 0.5em; 
  /* Padding order: top, left and right, bottom */
  font-weight: bold;
  font-size: 153.85%;
  line-height: 1;
  color: #919fab; 
  /* Colours are specified in hexadecimal */
}

Note: the lines wrapped in /* and */ are non-functional CSS comments.

The cascade (how styles are targeted and inherited)

One of HTML’s key strengths is that its elements can be nested inside each other. Cascading StyleSheets have built-in rules that play to that strength, controlling the way that styles are inherited — cascade — or not. The basic rules are:

Selectors targeted at one or more HTML elements of the same type…
Are overridden or appended to by selectors targeted at HTML elements that share a class attribute…
Which are overridden or appended to by selectors targeted at a single HTML element with an id attribute.
Which are overridden or appended to by inline style attributes.

For example:

h2 {some style rules} provides default styling for all <h2> elements.
.cta {some 'call to action' style rules} sets up a custom selector name that targets any HTML element with an attribute of class="cta".
h2.cta {some 'call to action' style rules} is more specific as it applies only to <h2 class="cta">.
#subscribe {some style rules} targets any HTML element with an id attribute of id="subscribe".
#subscribe p {some style rules} is an example of a descendant rule that targets all <p> elements that are nested within any other HTML element with an id attribute of id="subscribe".
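
Putting those selectors into one page shows how they interact. The colours and other properties are arbitrary choices made so that the effect of each rule is easy to spot.

```html
<html>
  <head>
    <title>Cascade example</title>
    <style type="text/css">
      h2           { color: black; font-weight: bold; } /* All h2 elements */
      .cta         { color: green; }                    /* Any element with class="cta" */
      h2.cta       { text-transform: uppercase; }       /* Only h2 elements with class="cta" */
      #subscribe   { color: red; }                      /* The one element with id="subscribe" */
      #subscribe p { font-style: italic; }              /* Paragraphs nested inside it */
    </style>
  </head>
  <body>
    <h2>Black and bold</h2>
    <h2 class="cta">Green, bold and uppercase</h2>
    <p class="cta">Green, but not uppercase</p>

    <!-- The div matches both .cta and #subscribe; the id selector wins, so its text is red -->
    <div id="subscribe" class="cta">
      <p>Red (inherited from the div) and italic</p>
    </div>
  </body>
</html>
```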

There are other powerful and precise selectors but unfortunately the world isn’t ready for their use by the majority until Microsoft Internet Explorer 6 and 7 are used only by a small minority. Internet Explorer 8 will have the capabilities that other browsers like Firefox and Safari have had for years!

The key to exploiting the power of CSS as a content owner

Make sure all the HTML elements you intend to use have been explicitly styled in the stylesheet (or that you are content for them to be styled by the user’s web browser defaults).
Know what class and id selector names have been declared and styled in your stylesheet so that you can apply them as HTML attributes in your markup.

Beware inline styles and pasting content from Microsoft Office

Style properties can be specified ‘inline’ using the style attribute within HTML markup. For example: <h2 style="color: red;">. There are two big problems with using inline styles in web content:

Inline style properties take precedence over those written for any other CSS selector and therefore override and undo any carefully crafted stylesheet!
Inline styles are embedded within the HTML and only apply to the single instance of the HTML element for which they are an attribute, thus undoing the whole advantage of using CSS!
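
A minimal sketch of the first problem, reusing the heading colour from the earlier example rule:

```html
<html>
  <head>
    <title>Inline style example</title>
    <style type="text/css">
      h2 { color: #919fab; } /* The stylesheet's intended colour for every h2 */
    </style>
  </head>
  <body>
    <h2>Styled by the stylesheet</h2>

    <!-- The inline style takes precedence over the h2 rule above, so this one heading is red -->
    <h2 style="color: red;">Styled inline</h2>
  </body>
</html>
```

Changing the colour in the stylesheet later would update the first heading but leave the second one red.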

Pasting content from Microsoft Office into TinyMCE

Microsoft Office applications, especially Word, make the unreasonable assumption that the styling you have in your Office document is exactly the styling you wish to apply in your web content. As a consequence, when you paste text copied from an Office document directly into TinyMCE, there is a slew of inline styles that comes with it! The solution is to either:

Paste into Notepad (or another plain text editor) first, and then paste into TinyMCE
In TinyMCE, click the Paste from Word button, paste into the pop-up field provided, and then click Insert.

CSS in HTML email

The extent to which the power of CSS can be exploited in HTML email is severely constrained due to the limited and differing support for it from one email application to the next. CSS can be used effectively in HTML email but with great caution and understanding.

With HTML email, inline styles are actually a requirement because they are the only way to get CSS to work in a number of important email applications such as Gmail and Hotmail. Even though inline styles are inefficient, they are still easy to create and maintain thanks to the wonderful inline CSS tool provided by Campaign Monitor.
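
For illustration, a fragment of HTML email marked up with inline styles might look like the sketch below. The fonts, sizes, colours and URLs are placeholders, and this is hand-written rather than the output of any particular tool.

```html
<!-- Each element carries its own styles because some email applications only honour inline styles -->
<h2 style="margin: 0 0 10px 0; font-family: Arial, sans-serif; font-size: 20px; color: #333333;">
  Newsletter heading
</h2>
<p style="margin: 0 0 10px 0; font-family: Arial, sans-serif; font-size: 14px; color: #333333;">
  Read the <a href="http://example.com/news/" style="color: #0066cc;">full story</a>.
</p>
<!-- Link and image addresses are absolute, as HTML email requires -->
<img src="http://example.com/image.jpg" alt="Short description" width="320" height="240" />
```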

Essential learning and recap

Knowledge is power, but if time and inclination to absorb all the above are limited, here, in order of priority, is the stuff you really should know!…

Learn the terminology and understand the fundamental concepts.
Learn the anatomy of HTML tags and HTML attributes.
Know the most commonly used HTML elements and their tags.
Make sure your stylesheet has what you need and that you know its named selectors so that you can apply them in your TinyMCE editor.
Know the most commonly used HTML attributes, especially class and id as they leverage the full power of your stylesheet.
Use HTML elements and their tags semantically.
Know what not to do.