This site does not refer to the standard SGML data types as used in the HTML DTD, but mixes that information with the prose for the definitions listed here. If you need the syntax rules as defined by the DTD, go use the DTD.
In HTML, tags and attributes are case-insensitive. Attribute values, however, can be case-sensitive, case-insensitive, or case-neutral. URIs, for instance, are case-sensitive while list types are not. Number values are case-neutral.
A [character] value is a single character from the document character set. A character may also be referenced by its character entity (escape code).
Basically, [text] stands for text. It can take any and all characters from the document character set and may include character entities (escape codes).
[name] values may include capital letters (A-Z), small letters (a-z), digits (0-9), hyphens (-), periods (.), underscores (_), and colons (:). However, they must begin with a letter.
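The [name] rule can be sketched as a regular expression. This is only an illustration in Python; `is_valid_name` is a hypothetical helper, not part of any specification. The pattern also accepts digits (0-9) after the first letter, as the HTML 4 specification does for ID and NAME tokens.

```python
import re

# First character: a letter. After that: letters, digits, hyphens,
# periods, underscores, and colons, in any order.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9\-._:]*")


def is_valid_name(value):
    """Return True if value follows the [name] rules (hypothetical helper)."""
    return NAME_PATTERN.fullmatch(value) is not None


print(is_valid_name("URI"))          # a plain name
print(is_valid_name("section-2.1"))  # hyphen, digit, and period after a letter
print(is_valid_name("2cool"))        # rejected: does not begin with a letter
```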
[number] values may include any positive integer and zero (0) unless further restricted.
[URI] values are defined by RFC2396, and include relative and absolute URIs.
URIs include URLs, so even if you've never heard of URIs, this data type should be nothing radically new.
URI=Uniform Resource Identifier
URL=Uniform Resource Locator.
A URL is the address to a file, given as the protocol followed by the path. The protocol is the method, or "language", used to fetch the file and is separated from the path by a colon (:).
Most protocols you will run into (http, ftp) are followed by two slashes after the colon; 'mailto' is not.
The first part of the path for an http or ftp URL is the location of the machine (server) that has the file on its system. A number (the IP address) designates each individual server on the WWW; however, these numbers are rarely used. Instead, domain names associated with the servers are used. Domain names are like nicknames--the same server may be called by different names, and the domain name can also be used to represent a specific directory on the server.
Domain names are case insensitive:
fantasai.tripod.com is the same as Fantasai.Tripod.COM
Following the domain name is the path to the file itself. Paths are given UNIX-style, with forward slashes separating the directories. If a filename is not given, the server typically looks for an index file (e.g. index.html) to return; if none is found, it may return a list of all files in the directory.
Paths are case sensitive:
https://fantasai.tripod.com/UTF-8/contents.htm is not the same as https://fantasai.tripod.com/utf-8/contents.htm or https://fantasai.tripod.com/UTF-8/contents.HTM
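The two case rules can be checked with a small sketch using Python's standard `urllib.parse`. `same_resource` is a hypothetical helper of mine; it is a simplification that ignores ports and query strings and treats the protocol as case-insensitive (an assumption on my part).

```python
from urllib.parse import urlsplit


def same_resource(url_a, url_b):
    """Compare two URLs: domain names ignore case, paths do not.

    Hypothetical helper; ports and query strings are ignored.
    """
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (
        a.scheme.lower() == b.scheme.lower()  # protocol (assumed case-insensitive)
        and a.hostname == b.hostname          # .hostname is already lowercased
        and a.path == b.path                  # the path is compared exactly
    )


BASE = "https://fantasai.tripod.com/UTF-8/contents.htm"
print(same_resource(BASE, "https://Fantasai.Tripod.COM/UTF-8/contents.htm"))
print(same_resource(BASE, "https://fantasai.tripod.com/utf-8/contents.htm"))
```

The first comparison differs only in the domain name's capitalization, the second in the path's.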
Mailto URLs are simply the protocol followed by the email address:
Relative URLs give the location of a file relative to a base URL. (The base URL is the location of the source anchor's file unless overridden by <BASE>.) One of the nice things about relative URLs is that you don't have to change them every time you move your files from place to place. This allows you to check your files for broken links on your hard drive before uploading them.
Relative URLs work like a set of point to point directions. You start in the base URL directory. If the destination file is in the same directory, you simply specify the filename. So if I wanted to go from this file (datatype.htm) to the index of this directory, I would simply specify a URL of
If you're going down a directory to get to the destination file, you specify the directory and then the filename. For example, a link from my main page (contents.htm) to this page (datatype.htm) would specify a URL of
To go up a directory level, you specify "../". Therefore, to link back up to my main page from here, I would specify
You can also combine these for relationships that go up and down several directory levels in the tree. A URL from here to the tag entry for hypertext links would be "../HTML4/Links/a.html". This goes up one directory level into "UTF-8/", down into "HTML4/", down from there into "Links/", and then picks out "a.html" from that directory.
Fragment identifiers can be used to specify a specific element, or part of the document, as the destination anchor. The element must be named to function as an anchor--either by the name attribute of the anchor tag (<A>) or by the id attribute on any element. To refer to this named element, a URL first designates the file by either an absolute URL or a relative one, followed by a hash (#), then the identifier.
ex: "https://fantasai.tripod.com/UTF-8/Appendix/datatype.htm#URI" refers to the heading of the entry for URIs (on this page).
From "a.html", I can also use "../../Appendix/datatype.htm#URI".
From within the same file, no file needs to be specified.
"#URI" from here will take you back up to the URI header.
[content-type] values are media types as defined by RFC2045 and RFC2046. Do not confuse them with media types as used in HTML, which are different. (While the RFC uses "media type" in its definition, I will be using "content type" to avoid confusion.)
There are five discrete top-level content types. Each is followed by a slash (/) and a subtype. Example: text/html, where text is the top-level content type, and html is the subtype.
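As a rough sketch, a content type can be split at the slash like this. The five discrete top-level types listed are the ones from RFC2046; `split_content_type` is a hypothetical helper, and it does not handle parameters such as "; charset=...". Lowercasing is my choice here, since media type names are matched case-insensitively.

```python
# The five discrete top-level types defined in RFC2046.
DISCRETE_TYPES = {"text", "image", "audio", "video", "application"}


def split_content_type(value):
    """Split 'type/subtype' into its two parts (hypothetical helper)."""
    top_level, slash, subtype = value.strip().partition("/")
    if not slash or not top_level or not subtype:
        raise ValueError("expected 'type/subtype', got %r" % value)
    return top_level.lower(), subtype.lower()


top_level, subtype = split_content_type("text/html")
print(top_level, subtype, top_level in DISCRETE_TYPES)
```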
A complete list of registered MIME types.
[language code] values are language codes as defined in RFC1766. I had quite a time tracking down a copy--the original server doesn't exist anymore, but the W3C was kind enough to update their resource links for HTML 4.01.
A language code consists of two parts: the primary language tag, optionally followed by a hyphen (-) and a hyphen-separated series of subtags. Example: "en" is the code for English. To be more specific, you can also specify "en-US", indicating the US variant of English.
The primary tag uses a language code from ISO 639. It may also take the values "i" and "x", whose uses are defined in RFC1766.
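Taking a language code apart is just a matter of splitting on the hyphens. A minimal sketch, with `parse_language_code` as a hypothetical helper; it does not check the primary tag against ISO 639.

```python
def parse_language_code(code):
    """Split a language code into (primary tag, list of subtags).

    Hypothetical helper; no validation against ISO 639 is attempted.
    """
    primary, *subtags = code.split("-")
    return primary, subtags


print(parse_language_code("en"))     # primary tag only
print(parse_language_code("en-US"))  # primary tag plus one subtag
```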
The subtag can be used to indicate:
[character set] values are character sets from the IANA character set registry.
[datetime] values use the ISO date format (ISO 8601). Since the document is not readily available (ya have to pay -_-;;), the HTML specification covers the format it uses:
YYYY-MM-DDThh:mm:ssTZD or [year]-[month]-[date]T[hour]:[minute]:[second][time zone designator]
|Time Zone Description||Time Zone Designator|
|UTC (Coordinated Universal Time), a.k.a. GMT (Greenwich Mean Time)||Z|
|Time zones ahead of UTC (to the east)||+[hours]:[minutes] ahead of UTC|
|Time zones behind UTC (to the west)||-[hours]:[minutes] behind UTC|
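Here is a sketch of producing a [datetime] value in this format with Python's standard `datetime` module. `html_datetime` is a hypothetical helper; it writes Z for UTC, as the HTML 4 specification does, and a signed hh:mm offset otherwise. The sample date is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def html_datetime(moment):
    """Format a time-zone-aware datetime as YYYY-MM-DDThh:mm:ssTZD."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("a time zone is needed for the designator")
    stamp = moment.strftime("%Y-%m-%dT%H:%M:%S")
    if offset == timedelta(0):
        return stamp + "Z"  # UTC
    sign = "+" if offset > timedelta(0) else "-"
    total_minutes = abs(offset) // timedelta(minutes=1)
    hours, minutes = divmod(total_minutes, 60)
    return "%s%s%02d:%02d" % (stamp, sign, hours, minutes)


utc_time = datetime(1997, 7, 16, 19, 20, 30, tzinfo=timezone.utc)
east_time = utc_time.replace(tzinfo=timezone(timedelta(hours=1)))
print(html_datetime(utc_time))
print(html_datetime(east_time))
```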
An RGB triplet codes for a color. It is a six-digit number in hexadecimal form, preceded by a hash (#).
Each two digits represent a color:
|The first two represent Red.||--> R|
|The next two represent Green.||--> G|
|The last two represent Blue.||--> B|
The higher the number for the color, the more it is shown, and the brighter it is. For example, #FF0000 would be bright red, while #660000 would be a really deep, dark red. What if you wanted to make your color lighter, say, pink? You would add some green and blue to lighten it. Just don't add more green and blue than you have red, or you'll get greenish-blue.
If all the colors are equal, you'll get a shade of gray. So #FFFFFF would be the brightest gray (white), and #000000 would be the darkest gray (black).
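To make the Red/Green/Blue split concrete, here is a small sketch that pulls a triplet apart into its three numbers. `parse_rgb_triplet` and `is_gray` are hypothetical helpers for illustration only.

```python
def parse_rgb_triplet(triplet):
    """Turn '#RRGGBB' into a (red, green, blue) tuple of numbers from 0 to 255."""
    digits = triplet[1:] if triplet.startswith("#") else triplet
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits, got %r" % triplet)
    # Each pair of digits is one color: red, then green, then blue.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def is_gray(triplet):
    """True when all three colors are equal (a shade of gray)."""
    red, green, blue = parse_rgb_triplet(triplet)
    return red == green == blue


print(parse_rgb_triplet("#FF0000"))  # bright red
print(parse_rgb_triplet("#660000"))  # dark red: less red, still no green or blue
print(is_gray("#FFFFFF"), is_gray("#000000"), is_gray("#FF0000"))
```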
Don't really understand any of this? Check out BigNoseBird's page about COLORS.
The hexadecimal number system has sixteen digits (0 - F), rather than ten digits (0 - 9) like the decimal system we normally use. Therefore a one in the "ten's" place now represents sixteen. 10 in hexadecimal is the same as saying 16 in decimal, and F stands for fifteen. Most of the time, the three colors' numbers are written separately in decimal form, and can take values from 0 to 255. In the hexadecimal system, this would correspond to 0 - FF.
Be sure to convert each color by itself.
To convert manually, take the decimal number and divide by sixteen. If the quotient is a two-digit number, convert it to a letter-numeral. Write down the number in the sixteen's place ("ten's place"). Take the remainder, and convert that if necessary. Write it down next to the quotient, in the one's place.
216 ÷ 16 = 13 R8, and 13 converts to D, so 216 (dec) = D8 (hex)
Now convert the next color.
Once you have all three numbers converted, write them one right after the other in the order Red, Green, Blue. This is your RGB triplet.
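The manual steps above translate almost line for line into code. This is a sketch with hypothetical helper names (`to_hex_pair`, `to_rgb_triplet`); Python's built-in `format(216, "02X")` gives the same answer for a single color.

```python
HEX_DIGITS = "0123456789ABCDEF"


def to_hex_pair(value):
    """Convert one color (0 to 255) into two hexadecimal digits."""
    if not 0 <= value <= 255:
        raise ValueError("color values run from 0 to 255")
    # The quotient goes in the sixteen's place, the remainder in the one's place.
    quotient, remainder = divmod(value, 16)
    return HEX_DIGITS[quotient] + HEX_DIGITS[remainder]


def to_rgb_triplet(red, green, blue):
    """Convert each color by itself, then write them in the order Red, Green, Blue."""
    return "#" + to_hex_pair(red) + to_hex_pair(green) + to_hex_pair(blue)


print(to_hex_pair(216))           # D8, as in the worked example
print(to_rgb_triplet(255, 0, 0))  # #FF0000
```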
|Decimal-Hexadecimal Digit Conversions|
[16color name] values can take one of sixteen color names corresponding with RGB triplets as defined below:
|Color Name||RGB Triplet|
<SCRIPT> element instead. (For those of you about to defend Microsoft's browser, I have version THREE! So don't explain that it works fine in 5.0 or whatever.)
You can also define your own link types by using a meta data profile. I have no idea what a profile is, so here is the specification entry: HTML 4.0 - Meta Data Profiles. It's in the section on the META tag, so there will be references to that tag and its attributes.