Last modified: Wednesday, 30-Oct-2019 18:29:56 UTC. Maintained by: Elisa E. Beshero-Bondar (eeb4 at psu.edu). Powered by firebellies.

Introduction to HTML

The headquarters of the W3C at the MIT campus in Cambridge, Massachusetts.

Introduction: HTML in Context

In our course, at the point we expect you’ll be reading this, you have been writing and planning XML documents. By now you’re accustomed to writing well-formed XML code, as well as writing rules to constrain what elements, attributes, and content you want an XML file to contain. So far, your code has followed syntax rules for well-formedness, leaving the naming and content of your elements and attributes up to you, to control with your own schema rules. We now turn to HTML (HyperText Markup Language), which has its origins in the concept of “hypertexts” as linkable (or “hyperlinked”) documents, and which developed into markup controlling the presentation of pages to be networked and shared on the World Wide Web (W3).

In writing HTML, we work within a standardized set of element and attribute names designed to be read by web browsers. XHTML and the other forms of HTML that we’ll tell you about all rely on standards formulated by the World Wide Web Consortium (abbreviated W3C), an organization founded in 1994 to develop open, platform-independent standards for coding, along with best practices for sharing and displaying documents in web browsers around the world. A web browser is software designed to share and display documents and other resources on the World Wide Web accessed through the internet. A web browser (like Chrome, Firefox, or Safari) is considered “standards-compliant” when it supports the coding approved by the W3C for the creation of web pages, their styling, their linking to other sites, their representation of metadata, and their dynamic features to be customized by site visitors. Those curious to read about the history of HTML and the origins of the World Wide Web (and the much earlier origins of the Internet) can read more at LivingInternet.com (a wonderfully extensive resource), or this concise and witty walk-through: “Internet History: HTML Code Evolution 1.0 to 5.0.”

This guide orients you to XHTML (or eXtensible HyperText Markup Language), which at the time of this writing is the most strictly defined content model for hypertexts. XHTML requires XML syntax using the hierarchical, nested elements and the start and end tags that you are familiar with from writing XML. XHTML is the form of HTML used by the W3C on their site pages, and it has served as the long-term recommendation for precise code designed to be interoperable with other XML data formats, such as SVG (Scalable Vector Graphics, which you’ll later be learning to draw and code). Interoperability is a term referring to a technology’s capacity to communicate effectively with a different kind of technology. HTML and the World Wide Web were first developed in the early 1990s in an attempt to make various information retrieval structures speak to each other in an interoperable way, and XHTML, due to its strict syntax control, effectively maximizes the interoperability of HTML, which is especially important for those of us developing XML-based projects with a public face on the World Wide Web.

Basic Requirements of XHTML

Valid XHTML syntax requires the following basic structure, beginning with a <!DOCTYPE> declaration:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>A Title Here</title>
</head>
<body>
</body>
</html>

Note: We will make some modifications to this in the next section to adapt it to newtFire’s server environment.

The <!DOCTYPE> and namespace declarations tell the web browser what version of HTML you’re working with. <oXygen/> inserts this declaration for us when we choose it from the “Framework Template” list: To open a new (X)HTML file with the current doctype and namespace declarations in <oXygen/>, open a new file (under File → New ), and enter HTML in the search bar. This will open an HTML document that follows our XHTML syntax rules.

You’ll recognize our root element, which is always <html> and must always indicate an XML namespace as an attribute: the xmlns="http://www.w3.org/1999/xhtml" part. This identifies your root <html> element, and everything inside it, as belonging to the vocabulary defined in the standards the W3C has published for XHTML. Note: <oXygen/> comes with up-to-date W3C schemas to validate that the code you’re writing is good, strict XHTML, but if you weren’t working in <oXygen/>, or if you’re just curious about whether a site you find on the web is valid, you can always use the W3C Markup Validation Service.

The head element is always required and must contain a title element, but note that this does not display on the browser page (though it often appears in the tabs above the browser window). The part of your web page that displays in the browser is coded within the body element.

You really don’t need many elements to build a website, so we are introducing a simple selection that we find ourselves always using in our pages (including the course pages you’re reading). For more, we recommend the w3schools site as a useful ready reference to look up HTML elements, see how they display in browsers, and learn how to style them using Cascading Style Sheets (CSS) code, which we’ll be covering, too, in our Introduction to CSS.

To get a view of how most of the elements we’re discussing fit together on a web page, try selecting “View Page Source” (by right-clicking in your browser window, or locating it in your browser menu options). And for a quick visual overview of how websites work and how webpages are structured, check out Basic Web Pages on one of our favorite go-to resources, Interneting is Hard.

How to Save Your Work for Your Server Environment (with newtFire Protocols)

Note that different server environments have different protocols and ways of handling XHTML Doctype declarations and file extensions, so you will want to find out the specific protocols for file extensions and posting files from your network administrator. You may save HTML files with either of the following extensions: .xhtml or .html. On the newtFire server we tend to prefer the more commonly used .html extension, and we also add code to ensure that UTF-8 characters are always served in every browser, and to help make the HTML page fully responsive on a wide range of screens (from mobile devices to wide-screen monitors). If you are developing a newtFire project, then we recommend that you save your work as .html and apply the following setup for your Doctype declaration and <head> element:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title>Your page title here <!--to appear in the browser tab, but not on the page--> </title>
      <meta charset="UTF-8" /> <!--declares the UTF-8 character encoding-->
      <meta name="viewport" content="width=device-width, initial-scale=1.0" />
      <link rel="stylesheet" type="text/css" href="explain.css"/>
   </head>
   <body> <!--Code the viewable part of your site here.-->
   </body>
</html>

Block-Level Body Elements

Block-level elements are the major structural components of an XHTML page, and usually we do not nest these inside each other: They are discrete “blocks” formatted for distinct display on a page. Each of these elements opens on its own line and closes before the next block-level element opens. Block-level elements are the children of the <body> element (the root <html> element itself contains only <head> and <body>), and they include headings, paragraphs, lists, and tables.

Headings

Heading elements are for title and section headings throughout your page. HTML defines six levels of headings, with the idea that the first level is usually the largest and strongest, and others get smaller and smaller. The heading tags are: <h1>, <h2>, <h3>, <h4>, <h5>, and <h6>. The idea is to use these in order, so that you typically only use <h1> once to give the title of the whole page, and then use <h2> for major sections and <h3> for subsections (and so on). Have a look at this visual example from w3schools to see how these six elements typically display in a browser.
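
As a minimal sketch of how these levels nest, the fragment below outlines a hypothetical page. The heading and paragraph text is invented for illustration, and the fragment is meant to sit inside the <body> element:

```html
<!-- Hypothetical outline: one <h1> for the page title, <h2> for major sections, <h3> for subsections -->
<h1>Introduction to HTML</h1>
<h2>Block-Level Body Elements</h2>
<h3>Headings</h3>
<p>A paragraph under the Headings subsection.</p>
<h3>Lists</h3>
<p>A paragraph under the Lists subsection.</p>
<h2>In-line Body Elements</h2>
<p>A paragraph under the second major section.</p>
```

Each heading closes before the next block begins, and no level is skipped on the way down.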

Paragraphs

Body paragraphs are simply coded with <p> elements. See w3schools visual example of code and browser display.

Lists

Lists are made with two elements, one nested inside the other: A list needs a “wrapper” (or container) element that indicates it’s a list and what kind of list (and that’s the block level part): The wrapper element is either <ol> for an “ordered” (or numbered) list, or <ul> for an “unordered” or bulleted list. (Whether your list is numbered or bulleted depends entirely on the wrapper element.) Inside that single wrapper we have multiple “items” coded with <li> elements (for “list item”). Here’s a sample of coding for an unordered list, followed by an example of its visual display in the browser:

<ul>
<li>apples</li>
<li>oranges</li>
<li>bananas</li>
</ul>

Here’s the browser display of the coding above:

• apples
• oranges
• bananas

If I change my wrapper element from <ul> to <ol>, I generate an ordered (numbered) list:

1. apples
2. oranges
3. bananas
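
For comparison, the ordered version of the same list might be coded as follows. Only the wrapper element changes, and the <li> items stay exactly as they were:

```html
<!-- Same three items as before; <ol> numbers them instead of bulleting them -->
<ol>
<li>apples</li>
<li>oranges</li>
<li>bananas</li>
</ol>
```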

(You’re probably noticing how much extra spacing I have in my unordered list: That’s because I have styled my unordered lists with CSS code to be “padded” with extra spaces on my course pages. Here’s a visual example from w3schools so you can see a default (unstyled) browser view. Try editing the code on the w3schools page to turn the unordered list into an ordered list!)

Tables

Tables are a little more complicated: These are made with three nested elements:

<table>: This is the “wrapper” block-level element on the outside that defines the table. We can place an attribute called "border" on table to outline it and each of its internal cells: <table border="1">
<tr>: Table row elements which define each horizontal row of a table.
<td> or <th>: Inside each table row are individual table cells (called “td” for “table data”). You can designate “heading” cells, which are styled with a little more emphasis, using <th> (or “table header” cell).

This may seem odd, but there isn’t an element for wrapping columns in a table. Instead, columns are created by stacking the individual td cells inside their tr rows. If a table row has five cells inside, you have a table with five columns.

Here is a sample of code for an HTML table, outlined with a border, and containing three rows and three columns. In the first row, we’ve designated the table cells to be <th> heading cells, followed by two rows containing ordinary <td> cells.

<table border="1">
<tr>
<th>Row 1, left column (heading cell)</th>
<th>Row 1, middle column (heading cell)</th>
<th>Row 1, right column (heading cell)</th>
</tr>
<tr>
<td>Row 2, left column</td>
<td>Row 2, middle column</td>
<td>Row 2, right column</td>
</tr>
<tr>
<td>Row 3, left column</td>
<td>Row 3, middle column</td>
<td>Row 3, right column</td>
</tr>
</table>

Here’s a visual display of the table coding above:

Row 1, left column (heading cell)	Row 1, middle column (heading cell)	Row 1, right column (heading cell)
Row 2, left column	Row 2, middle column	Row 2, right column
Row 3, left column	Row 3, middle column	Row 3, right column

For more examples of tables including styling you might want to try applying to their borders, see w3schools’ assortment of tables.

In-line Body Elements

In-line elements are used inside the block-level elements, to set apart certain passages with emphasis, or to link out to other pages, or display an image or render a multimedia file in the browser. We use the following in-line elements most frequently in our work:

Bolding: Two choices: <b> or <strong>. The <b> element simply indicates bold, while <strong> is an example of a semantic element, which we like to use when we want to convey strong emphasis or importance.
Italics: Again, two choices (with similar significance): <i> or <em>. The semantic element here is <em>, again to indicate real emphasis. We use the <i> element for titles of books, movies, etc., where we would typically italicize a title, as in Mary Shelley’s novel, Frankenstein. The output is really the same, but the underlying code can carry some meaning behind the presentational display.
Links: Use the “anchor” element <a> to wrap text to make a clickable link, and define the target of the link with the @href attribute, like this (code view followed by browser display):

<p>Here’s some text inside a paragraph with an <a href="http://greensburg.pitt.edu">absolute link to Pitt-Greensburg’s homepage</a>. And then some more text.</p>

Here’s some text inside a paragraph with an absolute link to Pitt-Greensburg’s homepage. And then some more text.

An absolute link is a full website URL (a website address starting with the http:// or https:// prefix). Use this kind of link when you are pointing to a webpage or web file hosted outside your own site.

A relative link is for pages or files in your own web directory space, and is a simple filepath mapped from the current file to the new file. To link to a file in the same directory, the link just needs to target the filename (with its extension) directly. To link to a file accessed in a directory above the current file, use ../ to climb up.
Images: <img/> with its required @src and @alt attributes. Image elements are distinct from other in-line elements because they are empty (or self-closing) elements: they contain only attribute content, not textual content.
<img src="sandbox.jpg" alt="image of a sandbox"/>

Making an image display in your web page is a little like coding a link: You point to a target with the @src attribute, whose value indicates a separate image file, and you set the @alt attribute to some text that serves as a stand-in for the image. The W3C requires @alt in case the image does not display in the browser, as for example in braille browsers or screen readers that convert text to speech for visually impaired users. The image file might be sitting in the same local directory with your web page, as in the example above. Or we can point to an image at some other file or address on the World Wide Web, like this:

<img src="wolf_and_eagle.jpg" alt="Leonardo Da Vinci’s drawing of the Allegory of the Wolf and the Eagle"/>

Sources like Wikimedia Commons provide public domain images with information on their sources, so we recommend browsing here. (Notice the kinds of information Wikimedia Commons makes available about the Da Vinci image in our example.)
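
To pull the in-line elements in the list above together, here is a hedged sketch of two paragraphs and an image that use them. The file names (syllabus.html, ../index.html, images/sandbox.jpg) and the directory layout are hypothetical, chosen only to illustrate relative paths:

```html
<!-- Assumed layout: this page sits in the same folder as syllabus.html,
     with an images/ subfolder below it and an index.html one level up. -->
<p>Mary Shelley’s novel <i>Frankenstein</i> is <em>not</em> named for the creature:
   <strong>Frankenstein is the creator.</strong></p>
<p>See the <a href="syllabus.html">syllabus</a> (same directory)
   or return to the <a href="../index.html">main page</a> (one directory up).</p>
<p><img src="images/sandbox.jpg" alt="image of a sandbox"/></p>
```

Notice that every in-line element here opens and closes inside a block-level <p>, and that the relative paths would stop working if the files were moved apart from each other.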

Generic Block and Inline Body Elements: (div and span)

You should know about two more extremely versatile HTML elements. These are used to block off portions of your document to format in a precise way, such as to create boxes sitting side by side on your page (as we did above to set a text next to an image). These are extremely useful for styling with CSS, as you’ll be learning shortly. Here are the generic elements that we frequently use:

<div>: The div element lets you wrap a portion of your page in a block-level division, perhaps to style in a particular way or locate in a particular space on your page. We’ve used div elements all over this page to organize it visually. We typically use an @class attribute to designate particular kinds of divs, to designate divs that hold images, or divs that hold blocks of display code, for example. Example: <div class="inner-box">
<span>: The span element is our in-line generic element for grabbing and highlighting significant spans of text that we want to stand out, perhaps with color coding. We’ve used the <span> element on this page and on our other guide pages to highlight the code snippets we show you in-line, and again we typically use an @class attribute to classify different kinds of spans. Example: <span class="code-snippet">
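
As a small sketch of the two generic elements working together, the fragment below reuses the class names from the examples above; the content inside is invented for illustration, and the classes do nothing visible until CSS rules are written for them:

```html
<!-- A block-level box containing a heading and a paragraph with one highlighted span -->
<div class="inner-box">
   <h3>A boxed section</h3>
   <p>The root element of every XHTML page is
      <span class="code-snippet">&lt;html&gt;</span>.</p>
</div>
```

The angle brackets inside the span are written as the entities &lt; and &gt; so that the browser displays them as text instead of reading them as a tag.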

Editing and Uploading Your XHTML Pages: Working with Server Space

You can edit XHTML pages and save them, together with their associated image, stylesheet, and other files, in a single directory to be viewed in a web browser. That directory could be on your own personal computer: While you are first developing pages you might simply save files locally, even disconnected from the internet, and view them in a web browser while you’re drafting (though of course any content you have linked to at remote locations on the World Wide Web will not load if you are not online). Typically we create web pages to have a public-facing presence on the World Wide Web, or at least to have a community presence on an intranet (shared within a firewall while a site is under development).

Making our pages available for others to access requires uploading all the files involved to a web server using File Transfer Protocol (FTP), a standard rule system for exchanging files between computers over a network. Various security measures have been developed to guard web servers from invasive hacking attacks, so many web servers require Secure File Transfer Protocol (SFTP) and nearly always require registration and authentication with a username and password. SFTP can be accessed from the command line, though more frequently people use one of several freely available SFTP software clients with a GUI (Graphical User Interface) that stores your site connection information and, on connection, shows the files on your computer and the files on the remote server, making it easy to upload and download. (We’ve posted information on a few good SFTP clients on our course syllabus.)

When HTML pages are uploaded to a web server, they are given a specific URL, or Uniform Resource Locator, otherwise known as a website address. It usually begins with http:// or https://, followed by a distinct locator for the web server and your directories and files on it, as in http://newtfire.org/dh/CDASyll.html.

By convention, the main page you place in a particular website folder is named index.html. You don’t have to have an index.html, but if you do, the address of your site’s main page can be abbreviated to the name of the site directory holding the web files, like this one for my personal Pitt homepage: http://www.pitt.edu/~ebb8/, or these from our newtFire server for Jon Horanic, Stacey Triplette, Brooke Stewart, and Becca Parker. By default, when given that address, the web server returns the index.html file I have placed in that space, and if it doesn’t find one it typically responds with an error. (The site address leads to exactly the same place as http://www.pitt.edu/~ebb8/index.html .)

If you are a student enrolled in our course at Pitt-Greensburg, we have shown you (or are about to show you) how to access our class’s newtFire web server with SFTP (instructions posted on Courseweb Announcements) so that you can access your personal folder to upload files through an SFTP client and then view those files in a web browser. As with most colleges and universities, students, faculty, and staff across the Pitt system have access to public-facing personal web space, which we encourage you to learn about and set up following Pitt’s posted instructions. Enrolled students in our course will post files to our newtFire server for HTML-related homework exercises and for course project website development, and we provide information to you privately on how to access your assigned web space.

You want to be mindful of your file management in your server space, and we recommend taking care to choose simple and easily understandable file names. Keep the names short and simple. Do not use your last names in your filenames for anything you post on a web server (this is not like posting to Courseweb and our ordinary homework submission rules do not apply here). Consider setting up special directories for different kinds of files as you build your websites in order to make everything you need easy to locate later. Mirror your file directories on your associated GitHub repositories, so that anything you post to the web server is also saved in the same file directory structure (and backed up) on your GitHub repository should our server go down or should you choose to transfer your files to a new server someday. We recommend following the guidelines on Obdurodon’s Project directory and file structure tutorial for sustainably organizing and managing your web space.

Setting up XSLT to Make HTML

When you learn to write XSLT to transform XML into HTML, you will need to configure some settings and create a template rule that matches on the document node. The stylesheet should start out something like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns="http://www.w3.org/1999/xhtml"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
   <xsl:output method="xhtml" encoding="utf-8" doctype-system="about:legacy-compat" omit-xml-declaration="yes" />
   <xsl:template match="/">
      <html>
         <head>
            <title><!-- title will go here --></title>
            <meta name="viewport" content="width=device-width, initial-scale=1.0" />
            <link rel="stylesheet" type="text/css" href="whatever.css"/>
         </head>
         <body>
            <!-- you'll normally use one or more <xsl:apply-templates> rules here -->
         </body>
      </html>
   </xsl:template>
</xsl:stylesheet>

Basically, we map out the skeleton of the HTML file, with all its basic structural components, within a first template rule matching on the document node. One oddity is the @doctype-system attribute: We have to set this to "about:legacy-compat" because otherwise XSLT will not generate the DOCTYPE line that the output HTML needs to be valid. The output HTML document will have a longer doctype line (<!DOCTYPE html SYSTEM "about:legacy-compat"> instead of just <!DOCTYPE html>), but it indicates the same thing to a web browser. XSLT 2.0, the version declared in this stylesheet, can only output the longer form of the doctype line, because of the way it serializes its output. The “legacy compatibility” label indicates that the HTML was produced by an older tool that cannot write the short doctype, while remaining compatible with current HTML standards.
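
As a rough sketch of the result, the serialized output of the stylesheet above might begin something like this. The exact whitespace will vary, some processors add a Content-Type <meta> element to the <head>, and the placeholder comment in the body is ours, standing in for whatever your template rules produce:

```html
<!DOCTYPE html SYSTEM "about:legacy-compat">
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title></title>
      <meta name="viewport" content="width=device-width, initial-scale=1.0" />
      <link rel="stylesheet" type="text/css" href="whatever.css" />
   </head>
   <body>
      <!-- placeholder: output of your template rules appears here -->
   </body>
</html>
```

The comments written inside the stylesheet itself are not copied to the output, which is why the <title> element comes out empty until a template rule supplies its content.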