A List Apart has a great article on using special characters in HTML. According to the article, decimal codes are the way to go, and you shouldn’t trust FrontPage and Dreamweaver to insert the correct character codes for viewing across browsers.
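If you generate pages from a script instead of a WYSIWYG editor, you can follow the article's advice mechanically. Here is a minimal Python sketch. The helper name to_decimal_refs is hypothetical and does not come from the article. It escapes HTML's markup characters, then uses the standard library's "xmlcharrefreplace" error handler to rewrite every non-ASCII character as a decimal reference:

```python
import html

def to_decimal_refs(text: str) -> str:
    """Escape markup characters, then turn every non-ASCII character into &#NNNN;."""
    # Escape &, <, > first so the result is safe inside HTML markup.
    escaped = html.escape(text, quote=False)
    # "xmlcharrefreplace" writes each non-ASCII character as a decimal reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_decimal_refs("Pages 2\u20133 \u2014 R&D"))
# Pages 2&#8211;3 &#8212; R&amp;D
```

The escaping step comes first on purpose. If it ran afterwards, it would turn the ampersand of each new &#NNNN; reference into &amp;.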
The article also includes a discussion, well worth reading, of the differences between hyphens, em dashes, and en dashes, and how to use each:
Hyphens are Not Dashes
Stop! Go back and re-read the subhead above—at least 2–3 times—then let it sink in before continuing.
The sentence above illustrates the proper use of the hyphen and the two main types of dashes. They are not the same, and must not be confused with each other. In some fancy fonts the difference is more than just the width—hyphens have a distinct serif. If you don’t know the rules already, let’s review them. First, though, a definition:
An “em” is a unit of measurement defined as the point size of the font—12-point type uses a 12-point “em.” An “en” is one-half of an “em.”
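As arithmetic, the definition is simply em = point size and en = em / 2. Here is a trivial Python sketch. The function name em_and_en is hypothetical:

```python
def em_and_en(point_size: float) -> tuple[float, float]:
    """Return the em and the en, in points, for type of the given point size."""
    em = point_size  # an em equals the point size of the font
    en = em / 2      # an en is half an em
    return em, en

print(em_and_en(12))  # (12, 6.0): 12-point type has a 12-point em and a 6-point en
```

These are units of measure. How wide a particular font actually draws its dash glyphs is up to the type designer.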
Though some of the finer points in the rules are complex, their basic applications are clear-cut and their misuse easily identifiable. First, neither an em dash nor an en dash should be confused with the hyphen (-), which is used to join compound words together.
The correct use of em and en
The em dash (—) is used to indicate a sudden break in thought (“I was thinking about writing a—what time did you say the movie started?”), to set off a parenthetical statement that deserves more attention than parentheses would give it, or to link clauses in place of a colon or semicolon. It is also used to indicate an open range, such as from a given date with no end yet (as in “Peter Sheerin [1969—] authored this document.”), and in vague dates (as a stand-in for the last two digits of a four-digit year).
Two adjacent em dashes (a 2-em dash) are used to indicate missing letters in a word (“I just don’t f——ing care about 3.0 browsers”).
Three adjacent em dashes (a 3-em dash) substitute for the author’s name when a series of works by the same author is listed in a bibliography. They also indicate an entire missing word in the text.
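Each of the em dash forms above is built from the single em dash character (U+2014, decimal reference &#8212;). Here is a minimal Python sketch. The helper em_dashes and the bracketed bibliography placeholder are hypothetical:

```python
EM_DASH = "\u2014"  # em dash, decimal reference &#8212;

def em_dashes(count: int) -> str:
    """Return 1, 2, or 3 adjacent em dashes (em dash, 2-em dash, 3-em dash)."""
    if count not in (1, 2, 3):
        raise ValueError("count must be 1, 2, or 3")
    return EM_DASH * count

# Sudden break in thought
print(f"I was thinking about writing a{em_dashes(1)}what time did you say the movie started?")
# Open range with no end date yet
print(f"Peter Sheerin [1969{em_dashes(1)}]")
# 3-em dash replacing a repeated author name (placeholder title)
print(f"{em_dashes(3)}. [Title of the next work].")
# The same 3-em dash written as decimal references for an HTML page
print("&#8212;" * 3)
```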
The en dash (–) is used to indicate a range of just about anything with numbers, including dates, numbers, game scores, and pages in any sort of document.
It is also used instead of the word “to” or a hyphen to indicate a connection between things, including geographic references (like the Mason–Dixon Line) and routes (such as the New York–Boston commuter train).
It is used to hyphenate compounds of compounds, where at least one pair is already hyphenated (as in “Netscape 6.1 is an Open-Source–based browser.”). The Chicago Manual of Style also states that it should be used instead of a hyphen “Where one of the components of a compound adjective contains more than one word” (as in “Netscape 6.1 is an Open Source–based browser”). Both of these rules are for clarity in indicating exactly what is being modified by the compound.
Other sources also specify the use of an en dash when referring to joint authors, as in the “Bose–Einstein” paper. Some also prefer it to a hyphen when text is set in all capital letters.
Some typographers prefer to use an en dash surrounded by full spaces instead of an em dash. Others prefer to insert hair spaces on either side of the em dash, but this is problematic with some web browsers (see the section on spaces for more detail).
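The en dash cases above cover ranges, connections between places, and joint authors. For those uses, each comes down to joining items with U+2013 (decimal reference &#8211;) and no surrounding spaces. Here is a minimal Python sketch with a hypothetical helper, en_join:

```python
EN_DASH = "\u2013"  # en dash, decimal reference &#8211;

def en_join(*parts: object) -> str:
    """Join range endpoints or linked names with a closed-up en dash."""
    return EN_DASH.join(str(part) for part in parts)

print(en_join(2, 3))                  # 2–3 (pages, scores, dates)
print(en_join("Mason", "Dixon"))      # Mason–Dixon
print(en_join("New York", "Boston"))  # New York–Boston
print(en_join("Bose", "Einstein"))    # Bose–Einstein
```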
That hyphen you can insert with the key next to the zero on your keyboard is an ambiguous character suffering from an identity crisis. It can’t decide if it’s a hyphen, a minus, or an en dash—in fact, the Unicode specification describes it as “hyphen-minus” and defines very specific replacements for each of its personalities.
Use it if you need to insert a hyphen, but never for a minus (−) or a dash: it does not have the correct width for either, nor the correct vertical position for the minus (compare “1+4-2=3” to “1+4−2=3”).
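Python's unicodedata module confirms the identity crisis. The keyboard character's official name is HYPHEN-MINUS, while the true minus is a separate character, U+2212 (decimal reference &#8722;). The fix_minus helper below is a hypothetical and deliberately naive sketch. It only swaps a hyphen-minus that sits between two digits, so it suits strings you already know are arithmetic. A numeric range such as 2-3 should get an en dash instead, and the sketch does not handle a leading negative sign.

```python
import re
import unicodedata

MINUS = "\u2212"  # minus sign, decimal reference &#8722;

def fix_minus(expression: str) -> str:
    """Swap a hyphen-minus found between two digits for a true minus sign.

    Naive heuristic: only for text known to be arithmetic. It skips
    leading negatives and spaced operators, and it would wrongly
    convert a numeric range, which needs an en dash instead.
    """
    return re.sub(r"(?<=\d)-(?=\d)", MINUS, expression)

print(unicodedata.name("-"))    # HYPHEN-MINUS
print(unicodedata.name(MINUS))  # MINUS SIGN
print(fix_minus("1+4-2=3"))     # 1+4−2=3
```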
The soft hyphen (decimal code &#173;, a.k.a. “discretionary hyphen” and “optional hyphen”) is to be used for one purpose only—to indicate where a word may be broken at the end of a line. Otherwise, it is to remain invisible and not affect the appearance of the word.
Some browsers display it no matter where it falls, but this is not the correct behavior. Some authors have recommended against its use in the past because its behavior was not well defined, but the HTML 4.01 spec makes its use and behavior clear and unambiguous.
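Soft hyphens are usually placed with the help of a hyphenation dictionary, but the mechanics are simple. In this minimal Python sketch, mark_breaks is a hypothetical helper and the syllable split is supplied by hand. The word is joined with U+00AD at each permitted break. Removing the soft hyphens gives back the original word, which is why they should stay invisible unless the line actually breaks there:

```python
SOFT_HYPHEN = "\u00ad"  # soft hyphen, decimal reference &#173;

def mark_breaks(syllables: list[str]) -> str:
    """Join syllables with soft hyphens, marking each allowed line-break point."""
    return SOFT_HYPHEN.join(syllables)

word = mark_breaks(["hy", "phen", "a", "tion"])
# Invisible unless a line breaks there: stripping them restores the word.
print(word.replace(SOFT_HYPHEN, "") == "hyphenation")  # True
# The same word written with decimal references for HTML source
print(word.replace(SOFT_HYPHEN, "&#173;"))  # hy&#173;phen&#173;a&#173;tion
```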
Three other hyphen characters exist in Unicode, but are unfortunately not defined in the HTML entity set (although they should be):
- The non-breaking hyphen (‑, not in HTML) does just what its name implies: it looks like a hyphen but does not allow a line break at that point.
- The hyphen character (‐, not in HTML) is meant to be used in place of the hyphen-minus when a hyphen is exactly the desired character.
- The hyphenation point (‧, not in HTML) is that bullet-like character you find in some dictionaries to separate syllables. That is its only use, but if you’re creating an online dictionary, using it will make your entries look more professional.
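HTML defines no named entities for these three characters, but decimal references still work for them. Whether the glyph actually appears depends on the reader's fonts. Here is a short Python sketch that prints each character's decimal reference, code point, and official Unicode name. The describe helper is hypothetical:

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the decimal reference, code point, and Unicode name of ch."""
    return f"&#{ord(ch)};  U+{ord(ch):04X}  {unicodedata.name(ch)}"

for ch in ("\u2011", "\u2010", "\u2027"):
    print(describe(ch))
# &#8209;  U+2011  NON-BREAKING HYPHEN
# &#8208;  U+2010  HYPHEN
# &#8231;  U+2027  HYPHENATION POINT
```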
(Alas, I think I am guilty of violating many of those rules in my own web publishing. Out of laziness, I tend to use a double-hyphen rather than an em dash, and ignore the en dash altogether.)