Incorporate
  1. SOAP; use SOAP Primer from W3C
  2.  XML Web services from Microsoft's .NET initiative.
 Gray text is from W3C.

Drafted: 5/14/00; 5/8/03
Currently being created!

COSC 330

LEARNING MODULE X
EXTENSIBLE MARKUP LANGUAGE

       Extensible Markup Language (XML) is a new open standard, proposed by the W3C, for a customizable markup language for defining data formats for the Web.  It is designed specifically to provide an extensible tag-based language for specifying new data formats so that any kind of data can be transmitted seamlessly over TCP/IP networks.  It has been said that "XML is to data what HTML is to text", but since text is a specific form of data, XML is a more general markup language than HTML.  Like HTML, XML is derived from the metalanguage SGML.  XML is a specialized version of SGML that can describe any type of structured data and is efficient enough to limit the bandwidth required for TCP/IP transmissions.  Three excellent references, from which this LM was developed and with which it is integrated, are the XML home page of the W3C, the TechEncyclopedia definition of XML (and associated entries), and Netscape's XML Developer Central.
See the Study Guide for this learning module.

The Objectives of this learning module are to:

  1. specify the basic concepts of XML
  2. relate XML to the associated concepts DTD, XSL, and XHTML
  3. compare XML, XHTML, and HTML
    1. show how XML is used to create XHTML
    2. show how HTML can be translated into XHTML
  4. present a perspective on the effect of XML on the future of Web development
TPQ 1: Rewrite the preceding objectives in terms of personal accomplishments to be attained after finishing the study of this learning module.

This LM covers the same content as Chapter 26 in Niederst, Web Design in a Nutshell, but reorganizes it to better associate related techniques. The sequence of presentations in this learning module is as follows.   You can click on any link to jump directly to that section.

  1. BASIC CONCEPTS IN XML
  2. DOCUMENT TYPE DEFINITION (DTD)
  3. XSL, THE STYLESHEET LANGUAGE FOR XML
  4. XHTML IS HTML RECAST WITH XML
  5. COMPARISON OF XML AND HTML
  6. APPLICATIONS OF XML
  7. REFERENCES ON XML, XHTML, AND XSL


1.  BASIC CONCEPTS IN XML:

  1.  XML is a customizable markup language specifically designed for representing "structured data" (e.g. spreadsheets, databases, address books, financial tables, technical drawings, etc.) in a text file.
    1. Formatting all data as simple text has several advantages:
      1. Data formatted as text is independent of the application which created it. In general, the format of data saved on secondary storage is characteristic of the application that generated it.  Typically, such data has either a binary format or a text format. However, unlike the binary form, data formatted as text is application independent, i.e. it can be displayed and modified without the application that produced it.
      2. Data formatted as text allows developers to modify it with simple text editors instead of using the application associated with the data.
    2. The disadvantages of text markup languages can be compensated for.  Normally, data files in a text format are larger than comparable binary files. However, using a text markup technology was a conscious decision by the XML developers based on the advantages mentioned in the previous section. "The disadvantages can usually be compensated at a different level.  Disk space isn't as expensive anymore as it used to be, and programs like zip and gzip can compress files very well and very fast. Those programs are available for nearly all platforms (and are usually free). In addition, communication protocols such as modem protocols and HTTP/1.1 (the core protocol of the Web) can compress data on the fly, thus saving bandwidth as effectively as a binary format."
    3. According to the W3C, XML is a set of recommended conventions for creating text formats for structured data files that are:
      1. easy to generate and read (by a computer),
      2. unambiguous,
      3. extensible,
      4. platform independent, and
      5. a universal standard.
    4. Unlike HTML tags, XML tags  are used only to delimit data elements; the interpretation of the data is done entirely by the application that reads it.  The customizable tags enable developers to define unique types of data that can be transmitted over a TCP/IP network, validated and interpreted by the receiver.
    5. XML can be used by any individual or group of individuals or companies that wants to share information in a consistent way.
    6. According to the W3C's activity page, XML will
      1. Enable internationalized media-independent electronic publishing
      2. Allow industries to define platform-independent protocols for the exchange of data, especially the data of electronic commerce
      3. Deliver information to user agents in a form that allows automatic processing after receipt
      4. Make it easier to develop software to handle specialized information distributed over the Web
      5. Make it easy for people to process data using inexpensive software
      6. Allow people to display information the way they want it, under stylesheet control
      7. Make it easier to provide metadata (data about data) that will help people find information and help information producers and consumers find each other
      Outside of W3C, many groups are already defining new formats for information interchange. The number of XML applications is growing rapidly, and the growth appears likely to continue. There are many areas, for example, the health-care industry, the Inland Revenue, government and finance, where XML applications may soon be used to store and process data. XML as a simple method for data representation and organization will mean that problems of data incompatibility and tedious manual re-keying will become more manageable.
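
The trade-off described in item 1 (text is larger than binary, but portable and easy to compress) can be illustrated with a minimal Python sketch. The address-book element names and the sample entries are invented for this example, and the data is deliberately repetitive, so the compression ratio it prints should not be read as typical.

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical address-book data; the element names are invented for this example.
entries = [("Ada Lovelace", "UK"), ("Grace Hopper", "US")] * 50

book = ET.Element("addressbook")
for name, country in entries:
    person = ET.SubElement(book, "person", country=country)
    ET.SubElement(person, "name").text = name

# Serialize to plain text: any text editor could open and modify this.
xml_bytes = ET.tostring(book, encoding="utf-8")

# Compress the text, as an archive tool or a transport layer might.
compressed = gzip.compress(xml_bytes)
restored = gzip.decompress(compressed)

print(len(xml_bytes), "bytes as text;", len(compressed), "bytes gzipped")

# The receiver can parse the restored text without the program that wrote it.
names = [p.findtext("name") for p in ET.fromstring(restored).iter("person")]
```

The compression step is lossless, so the receiver sees exactly the text that was written and can read it with any XML parser.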
  2. XML is "extensible" because, unlike HTML, the markup symbols are unlimited and self-defining.
  3. XML is actually part of a family of related W3C specifications.  The primary specification is XML 1.0  (Feb '98), the document that defines XML tags, attributes, etc.  In addition, there is an evolving collection of optional modules that provide sets of tags & attributes, or guidelines for specific tasks.  These include (from W3C's XML in 10 Points):
    1. XLink (still in development as of November 1999), which describes a standard way to add hyperlinks to an XML file.
    2. XPointer & XFragments (also still being developed), which are syntaxes for pointing to parts of an XML document. (An XPointer is a bit like a URL, but instead of pointing to documents on the Web, it points to pieces of data inside an XML file.)
    3. CSS, the style sheet language, is applicable to XML as it is to HTML.
    4. XSL (autumn 1999) is the advanced language for expressing style sheets. It is based on XSLT, a transformation language that is often useful outside XSL as well, for rearranging, adding or deleting tags & attributes.
    5. The DOM is a standard set of function calls for manipulating XML (and HTML) files from a programming language.
    6. XML Namespaces is a specification that describes how you can associate a URL with every single tag and attribute in an XML document. What that URL is used for is up to the application that reads the URL, though. (RDF, W3C's standard for metadata, uses it to link every piece of metadata to a file defining the type of that data.)
    7. XML Schemas 1 and 2 help developers to precisely define their own XML-based formats.
    There are several more modules and tools available or under development. Keep an eye on W3C's technical reports page. The XML Activity Statement explains the W3C's ongoing work in more detail.
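
As a rough illustration of the DOM entry in the list above, the following Python sketch uses the standard library's `xml.dom.minidom`, one implementation of the DOM interface. The `order` and `item` elements and the `sku` attribute are hypothetical names chosen for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical document; the element and attribute names are made up.
doc = parseString(
    "<order><item sku='A1'>Widget</item><item sku='B2'>Gadget</item></order>"
)

# Read: standard DOM calls locate elements and their attributes.
items = doc.getElementsByTagName("item")
skus = [item.getAttribute("sku") for item in items]

# Modify: create a new element and attach it to the tree.
extra = doc.createElement("item")
extra.setAttribute("sku", "C3")
extra.appendChild(doc.createTextNode("Gizmo"))
doc.documentElement.appendChild(extra)

# Serialize the modified tree back to text.
result = doc.documentElement.toxml()
print(result)
```

The same method names (`getElementsByTagName`, `createElement`, `appendChild`) appear in DOM bindings for other programming languages, which is the sense in which the DOM is a "standard set of function calls".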
  4. XML is based on SGML, the Standard Generalized Markup Language.  According to the W3C, "The designers of XML took the best parts of SGML, guided by the experience with HTML, and produced something that is no less powerful than SGML, but vastly more regular and simpler to use. Some evolutions, however, are hard to distinguish from revolutions... And it must be said that while SGML is mostly used for technical documentation and much less for other kinds of data, with XML it is exactly the opposite."
  5. XML syntax is virtually identical to that of HTML (and all markup languages derived from SGML).   It has the dual advantage of being easily processed by computers while remaining understandable to humans.
    1. Data elements are delimited by matching start and end tags, e.g. <name> and </name>.
    2. Elements may contain attributes, name-value pairs e.g., country="US".
    From the user's point of view, XML documents look like HTML documents. Users can display and print XML documents as if they were HTML documents.  Access these examples of XML to see its general syntax.
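
The two syntax rules in item 5 can be seen in a short Python sketch using the standard library's `xml.etree.ElementTree`. Only `<name>` and `country="US"` come from the text above; the enclosing `person` element and the sample name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: a start tag with one attribute, a nested element, an end tag.
text = '<person country="US"><name>Pat Smith</name></person>'

person = ET.fromstring(text)

element_type = person.tag            # taken from the start tag <person ...>
country = person.get("country")      # value of the name-value pair country="US"
name = person.find("name").text      # data between <name> and </name>

print(element_type, country, name)
```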
  6. XML is planned as the foundation on which many of the W3C markup languages are (and will be) based.  This is illustrated in the following diagram.
RELATIONS BETWEEN MARKUP LANGUAGES
from the W3C XML Activity Page.
  7. The last of the W3C's XML in 10 Points says, "XML is license-free, platform-independent and well-supported.  By choosing XML as the basis for some project, you buy into a large and growing community of tools (one of which may already do what you need!) and engineers experienced in the technology. Opting for XML is a bit like choosing SQL for databases: you still have to build your own database and your own programs/procedures that manipulate it, but there are many tools available and many people that can help you. And since XML, as a W3C technology, is license-free, you can build your own software around it without paying anybody anything. The large and growing support means that you are also not tied to a single vendor. XML isn't always the best solution, but it is always worth considering."
2.  DOCUMENT TYPE DEFINITION (DTD) :
  1. A DTD is a file (or collection of files), written in XML's declaration syntax, which contains a formal definition of a particular document type. It declares each tag and specifies how the tags fit together, so that an application processing the document knows what structure to expect.  In particular the DTD specifies
    1. the tag names that can be used as data element types,
    2. where names may be used, and
    3. the syntax of type definitions.
    For example, to define a document type that can describe lists of text items, the DTD would contain (among other declarations)
     
      <!ELEMENT List (Item)+>
      <!ELEMENT Item (#PCDATA)>


    These statements declare a list (type List) of items (type Item); the "+" indicates that the list can have one or more items.  The #PCDATA specifies that the items contain only text. These declarations would allow a list of three text elements to be written as
     

      <List><Item>FIBs</Item><Item>SAQs</Item><Item>TPQs</Item></List>


    The actual appearance of the displayed list depends on the particular CSS.
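
To make the meaning of the two declarations concrete, here is a minimal Python sketch that checks a document against them by hand. It is not a DTD validator: the parser in Python's standard library is non-validating, so the function `conforms_to_list_dtd` (a hypothetical name) simply hard-codes the rules that `(Item)+` and `(#PCDATA)` express.

```python
import xml.etree.ElementTree as ET

def conforms_to_list_dtd(xml_text):
    """Hand-written check mirroring the two declarations above.

    Hard-codes <!ELEMENT List (Item)+> and <!ELEMENT Item (#PCDATA)>;
    it does not read or interpret a real DTD.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well-formed
    if root.tag != "List" or root.attrib or len(root) == 0:
        return False  # root must be List, no attributes declared, (Item)+ needs a child
    if (root.text or "").strip():
        return False  # no character data directly inside List
    for child in root:
        if child.tag != "Item" or child.attrib or len(child) != 0:
            return False  # only Item children; #PCDATA allows no child elements
        if (child.tail or "").strip():
            return False  # no stray text between Items
    return True

sample = "<List><Item>FIBs</Item><Item>SAQs</Item><Item>TPQs</Item></List>"
print(conforms_to_list_dtd(sample))
```

A validating parser would derive these checks from the DTD itself, which is what makes a DTD reusable across many documents of the same type.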

  2. A DTD provides applications with advance specification of how data is to be defined.
  3. Using a DTD guarantees that all data files which are declared to have the same data type will be consistently formatted so that they can be used by browsers, search engines, database management systems, and other applications.
  4. XML is the formal specification language that allows the DTD to be parsed into declarations that can then be used to identify every element in a data file and their interrelationships.
  5. XML does not have to be associated with a DTD.  In practice the design of a DTD can be difficult, so XML was designed so it can be used either with or without a DTD.
    1. "DTDless operation" means markup tags can be used without being defined formally; however, this sacrifices the great advantage of DTDs, i.e. the ability to easily create additional documents of the same type.
    2. Because XML files do not require a DTD, DTD-less HTML files can be converted into XML by making a few simple changes.  (See Niederst, p. 447.)
  6. There are thousands of SGML DTDs already in existence in all kinds of areas (see the SGML Web pages for examples). Many of them can be downloaded and used freely.  Existing SGML DTDs do need to be converted to XML for use with XML systems.
3. XSL (EXTENSIBLE STYLESHEET LANGUAGE), THE STYLESHEET LANGUAGE FOR XML:
  1. XSL, developed by the W3C XSL Working Group, is a language for expressing stylesheets for XML files. It consists of two parts:
    1. a language for transforming XML documents, and
    2. an XML vocabulary for specifying formatting semantics.
  2. An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary.
  3. For background information on style sheets, see the Web style sheets resource page. Discussions about XSL are carried out on the XSL-List at mulberrytech.com and on the newsgroup comp.infosystems.www.authoring.stylesheets.
  4. The latest versions of the XSL specification are:
    1. Extensible Stylesheet Language (XSL) - W3C Working Draft
    2. XSL Transformations (XSLT) Specification - W3C Recommendation
    3. XML Path Language (XPath) - W3C Recommendation
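
The idea of transforming one XML document into another can be sketched in Python. This is a hand-written stand-in, not XSLT: Python's standard library has no XSLT processor, and the target vocabulary here is an HTML-style `ul`/`li` list rather than the XSL formatting vocabulary. The input reuses the List example from section 2, and the path passed to `findall` uses the limited XPath subset that `ElementTree` supports.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<List><Item>FIBs</Item><Item>SAQs</Item><Item>TPQs</Item></List>"
)

# Rule for List: emit a <ul>. Rule for each Item: emit an <li> with the same text.
ul = ET.Element("ul")
for item in source.findall("./Item"):  # XPath-style path selecting the Item children
    li = ET.SubElement(ul, "li")
    li.text = item.text

html = ET.tostring(ul, encoding="unicode")
print(html)
```

An XSLT stylesheet would state the same two rules declaratively as templates, leaving the tree walking to the XSLT processor.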
4. XHTML IS HTML RECAST WITH XML:
  1. During 1999, HTML 4.0 was rewritten in XML and named XHTML 1.0.   It is intended to be combined with other XML applications, so that XHTML tags can be mixed with tags from any other XML application, e.g. SMIL, MathML, or SVG tags, as illustrated in the following figure.
from the W3C HTML Activity Statement
  2. Since it is written in XML, XHTML itself is actually an XML application.
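
Because XHTML is an XML application, a generic XML parser can read an XHTML page, including one that mixes in tags from another vocabulary. The Python sketch below assumes a small invented page that embeds one MathML element; the two namespace URIs are the ones the W3C assigns to XHTML and MathML.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical page: XHTML markup with one embedded MathML fragment.
page = f"""<html xmlns="{XHTML}">
  <head><title>Mixed vocabularies</title></head>
  <body>
    <p>Let <math xmlns="{MATHML}"><mi>x</mi></math> be a variable.</p>
  </body>
</html>"""

root = ET.fromstring(page)

# ElementTree reports each tag as {namespace}localname.
title = root.find(f"{{{XHTML}}}head/{{{XHTML}}}title").text

# Collect the distinct namespaces used anywhere in the document.
vocabularies = sorted({el.tag.split("}")[0].lstrip("{") for el in root.iter()})

print(title)
print(vocabularies)
```

The namespace attached to each tag is what lets an application tell an XHTML element from a MathML one, even when both appear in the same file.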
5. COMPARISON OF XML AND HTML:
  1.  Like HTML, XML makes use of tags and attributes, but while HTML specifies what each tag and attribute means (and often how the text between them will look in a browser), XML uses the tags only to delimit pieces of data, and leaves the interpretation of the data completely to the application that reads it. In other words, if you see "<p>" in an HTML file, you know it is a paragraph, but, in an XML file, depending on the context, it may be a price, a parameter, a person, a p... (b.t.w., who says it has to be a word with a "p"?) .... For example, a <PHONENUM> could indicate that the data that followed it was a phone number. This means that an XML file can be processed purely as data by a program or it can be stored with similar data on another computer or, like an HTML file, that it can be displayed. For example, depending on how the application in the receiving computer wanted to handle the phone number, it could be stored, displayed, or dialed.
  2. But the rules for XML files are much stricter than for HTML. A forgotten tag or an attribute without quotes makes the file unusable, while in HTML such practice is often explicitly allowed, or at least tolerated. The official XML specification states that applications are not allowed to try to second-guess the creator of a broken XML file; if the file is broken, an application has to stop right there and issue an error.
  3. One of the shortcomings of HTML is that, although it is great for transferring documents from a server to an HTTP client, it is useless for sending them to other media like a printer or a ticker.  XML is specifically designed to facilitate this.
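
The strictness described in item 2 can be observed with any conforming XML parser. The Python sketch below uses the standard library's `xml.etree.ElementTree`; the helper name `is_well_formed`, the `type` attribute, and the sample phone number are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the XML parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = '<PHONENUM type="home">555-0100</PHONENUM>'
missing_end_tag = '<PHONENUM type="home">555-0100'
unquoted_attribute = '<PHONENUM type=home>555-0100</PHONENUM>'

for sample in (good, missing_end_tag, unquoted_attribute):
    print(is_well_formed(sample), sample)
```

The parser does not attempt to repair the last two samples; it stops and raises an error, which is the behavior the XML specification requires.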
6.  APPLICATIONS OF XML:
  1. It is expected that HTML and XML will be used together in many Web applications.
  2. Early applications of XML include Microsoft's Channel Definition Format (CDF), which describes a channel, a portion of a Web site that has been downloaded to your hard disk and is then updated periodically as information changes. A specific CDF file contains data that specifies an initial Web page and how frequently it is updated.
  3. Current support in Web browsers is given at: http://www.xml.com/pub/2000/05/03/browserchart/index.html
7. REFERENCES ON XML, XHTML, AND XSL:
  1. References from WhatIs:
    1. XML has been developed by a working group under the auspices of the World Wide Web Consortium, which offers you the most knowledgeable place to learn more. Their introduction is called Extensible Markup Language (XML).
    2. Another good source is Robin Cover's The SGML/XML Web Page.
    3. Another early user of XML is ChartWare, maker of a medical charting application, which uses XML to describe medical charts so that they can be shared by doctors.
    4. IBM provides an XML Developer Web site that includes free online XML courses, articles, and frequently-asked questions.
    5. Here is IBM's Introduction to XML, including an online tutorial.
    6. Microsoft provides an example of Building an Interactive Frequent-Flyer Web Site Using XML.
  2. W3C references of topics related to XML:
    1. XML Schema
    2. DOM
    3. CSS
    4. XSL
    5. XHTML
  3. A series of articles/tutorials on the integration of XML tools in Web authoring tools (beginning with Netscape 6's Mozilla) is given at xml.com.