Incorporate
- SOAP; use SOAP
Primer from W3C
- XML
Web services from Microsoft's .NET initiative.
Gray text is from W3C.
Drafted: 5/14/00;
5/8/03
Currently being created!
COSC 330
LEARNING MODULE X
EXTENSIBLE MARKUP LANGUAGE
Extensible Markup Language (XML) is a new open standard, proposed by
the W3C, for a customizable markup language for defining
data formats for the Web. It is designed specifically to provide
an extensible tag-based language for specifying new data formats so that any
kind of data can be transmitted seamlessly over TCP/IP networks. It
has been said that "XML is to data what HTML is to text", but since
text is a specific form of data, XML is a more general markup language than
HTML. Like HTML, XML is derived from the metalanguage SGML. XML
is a specialized version of SGML that can describe any
type of structured data and is efficient enough to limit the bandwidth required
for TCP/IP transmissions. Three excellent references, from which
this LM was developed and with which it is integrated, are the XML home page of the W3C, the TechEncyclopedia
definition of XML (and associated entries), and Netscape's XML Developer
Central.
See
the Study Guide for this learning module.
The Objectives of this learning module are to:
- specify the basic concepts of XML
- relate XML to the associated concepts DTD, XSL, and XHTML
- compare XML, XHTML, and HTML
- show how XML is used to create XHTML
- show how HTML can be translated into XHTML
- present a perspective on the future of XML and of Web development
TPQ 1: Rewrite the preceding objectives in terms of personal accomplishments
to be attained after finishing the study of this learning module.
This LM covers the same content
as Chapter 26 in Niederst, Web Design in
a Nutshell, but reorganizes it in order to
group related techniques together. The sequence of presentations in this
learning module is as follows. You can click on any link to jump
directly to that section.
- BASIC CONCEPTS IN XML
- DOCUMENT TYPE DEFINITION (DTD)
- XSL, THE STYLESHEET LANGUAGE FOR XML
- XHTML IS HTML RECAST WITH XML
- COMPARISON OF XML AND HTML
- APPLICATIONS OF XML
- REFERENCES ON XML, XHTML, AND XSL
1. BASIC CONCEPTS IN XML:
- XML is a customizable
markup language specifically designed for representing "structured data" (e.g. spreadsheets, databases, address books, financial
tables, technical drawings, etc.) in a text file.
- Formatting all data as simple
text has several advantages:
- Data formatted as text is
independent of the application which created it. In general, the format
of data saved on secondary storage is characteristic of the application that
generated it. Typically, such data has either a binary format or a text
format. However, unlike the binary form, data formatted as text is application
independent, i.e. it can be displayed and modified without the application
that produced it.
- Data formatted as text allows
developers to modify it with simple text editors instead of using the
application associated with the data.
- The disadvantages of text
markup languages can be compensated for. Normally, data files in
a text format are larger than comparable binary files. However,
using a text markup technology was a conscious decision by the XML developers,
based on the advantages listed above. "The disadvantages can usually be compensated at a different
level. Disk space isn't as expensive anymore as it used to be, and
programs like zip and gzip can compress files very well
and very fast. Those programs are available for nearly all platforms (and
are usually free). In addition, communication protocols such as modem protocols
and HTTP/1.1 (the core
protocol of the Web) can compress data on the fly, thus saving bandwidth as
effectively as a binary format."
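The compression point in the quotation above can be checked with a short sketch. It assumes Python's standard gzip module; the address-book records are invented for illustration, and the exact sizes will vary with the data.

```python
import gzip

# Hypothetical address-book records, invented for illustration only.
records = "".join(
    f"<entry><name>Person {i}</name><phone>555-{i:04d}</phone></entry>"
    for i in range(200)
)
xml_text = f"<addressbook>{records}</addressbook>".encode("utf-8")

# Compress the text, then restore it to confirm nothing is lost.
compressed = gzip.compress(xml_text)
restored = gzip.decompress(compressed)

print(len(xml_text), "bytes as plain XML text")
print(len(compressed), "bytes after gzip")
```

Repetitive tag names are what make markup of this kind compress well, which is why the size penalty of a text format is usually small in practice.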
- According to the W3C, XML
is a set of recommended conventions for creating text formats for structured
data files that are:
- easy to generate and read
(by a computer),
- unambiguous,
- extensible,
- platform independent,
and
- a universal standard.
- Unlike HTML tags, XML tags are used only to delimit data elements; the interpretation of
the data is done entirely by the application that reads
it. The customizable tags enable developers to define unique types of data that can be
transmitted over a TCP/IP network, validated and interpreted by the receiver.
- XML can be used by any individual or group of
individuals or companies that wants to share information in a consistent
way.
- According to the W3C's activity page, XML will
- Enable internationalized media-independent
electronic publishing
- Allow industries to define platform-independent
protocols for the exchange of data, especially the data of electronic
commerce
- Deliver information to user agents in a form
that allows automatic processing after receipt
- Make it easier to develop software to handle
specialized information distributed over the Web
- Make it easy for people to process data using
inexpensive software
- Allow people to display information the way
they want it, under stylesheet control
- Make it easier to provide metadata (data
about data) that will help people find information and help information
producers and consumers find each other
Outside of W3C, many groups are already defining new
formats for information interchange. The number of XML applications is growing
rapidly, and the growth appears likely to continue. There are many areas,
for example, the health-care industry, the Inland Revenue, government and
finance, where XML applications may soon be used to store and process data.
XML as a simple method for data representation and organization will mean
that problems of data incompatibility and tedious manual re-keying will become
more manageable.
- XML is "extensible" because, unlike HTML, the
markup symbols are unlimited and self-defining.
- XML is actually part of a family of related W3C specifications.
The primary specification is XML 1.0 (Feb '98), the document that defines XML tags, attributes,
etc. In addition, there is an evolving collection of
optional modules that provide sets of tags
& attributes, or guidelines for specific tasks. These include
(from W3C's XML in
10 Points):
- XLink
(still in development as of November 1999), which describes a standard way
to add hyperlinks to an XML file.
- XPointer & XFragments (also still being
developed) are syntaxes for pointing to parts of an XML document. (An XPointer
is a bit like a URL, but instead of pointing to documents on the Web, it
points to pieces of data inside an XML file.)
- CSS,
the style sheet language, is applicable to XML as it is to HTML.
- XSL
(autumn 1999) is the advanced
language for expressing style sheets. It is based on XSLT, a transformation language
that is often useful outside XSL as well, for rearranging, adding or deleting
tags & attributes.
- The DOM is a standard set of
function calls for manipulating XML (and HTML) files from a
programming language.
- XML Namespaces is a specification
that describes how you can associate a URL with every single tag and attribute
in an XML document. What that URL is used for is up to the application that
reads the URL, though. (RDF,
W3C's standard for metadata, uses it to link every piece of metadata to a
file defining the type of that data.)
- XML
Schemas 1 and 2 help
developers to precisely define their own XML-based formats.
There are several more modules and tools available
or under development. Keep an eye on W3C's
technical reports page. The XML Activity Statement explains the W3C's ongoing work in more detail.
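As an illustration of the DOM item in the list above, the following sketch uses the DOM implementation in Python's standard library (xml.dom.minidom). The document content, element names, and phone number are hypothetical and chosen only for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical document, invented for illustration.
doc = parseString('<addressbook><name country="US">Ada</name></addressbook>')

# DOM calls: look up an element by tag name, read an attribute, read its text.
name = doc.getElementsByTagName("name")[0]
country = name.getAttribute("country")
text = name.firstChild.data

# DOM calls: create a new element and attach it to the tree.
phone = doc.createElement("phone")
phone.appendChild(doc.createTextNode("555-0100"))
doc.documentElement.appendChild(phone)

# Serialize the modified tree back to XML text.
result = doc.documentElement.toxml()
print(result)
```

The same function names (getElementsByTagName, createElement, appendChild) are defined by the DOM for other programming languages as well, which is the point of having a standard set of calls.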
- XML is based on SGML, the Standard Generalized Markup Language. According
to the W3C, "The designers of XML took the best parts
of SGML, guided by the experience with HTML, and produced something that
is no less powerful than SGML, but vastly more regular and simpler to use.
Some evolutions, however, are hard to distinguish from revolutions... And
it must be said that while SGML is mostly used for technical documentation
and much less for other kinds of data, with XML it is exactly the opposite."
- XML syntax is virtually identical to that of
HTML (and all markup languages derived from SGML). It has the
dual advantage of being easily processed by computers while remaining understandable to humans.
- Data elements are delimited by matching start
and end tags, e.g. <name> and </name>.
- Elements may contain attributes, name-value
pairs e.g., country="US".
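The two syntax rules above (matching start and end tags, and attributes as name-value pairs) can be seen by parsing a single element. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module; the person's name is a placeholder.

```python
import xml.etree.ElementTree as ET

# One data element with a start tag, an end tag, and one attribute.
element = ET.fromstring('<name country="US">Example Person</name>')

print(element.tag)     # the element type taken from the start tag
print(element.attrib)  # the attributes as name-value pairs
print(element.text)    # the data between the start and end tags
```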
From the user's point of view, XML
documents look like HTML documents. Users can display and print XML documents
as if they were HTML documents. Access these examples of XML
to see its general syntax.
- XML is planned as the foundation on which many of the W3C markup
languages are (and will be) based. This is illustrated in the following
diagram.
- The last of the W3C's XML in 10 Points says, "XML is license-free,
platform-independent and well-supported. By choosing XML as the basis
for some project, you buy into a large and growing community of tools (one
of which may already do what you need!) and engineers experienced in the
technology. Opting for XML is a bit like choosing SQL for databases: you
still have to build your own database and your own programs/procedures that
manipulate it, but there are many tools available and many people that can
help you. And since XML, as a W3C technology, is license-free, you can build
your own software around it without paying anybody anything. The large and
growing support means that you are also not tied to a single vendor. XML
isn't always the best solution, but it is always worth considering."
2. DOCUMENT TYPE DEFINITION (DTD):
- A DTD is a file (or collection of files),
written in XML's declaration syntax, which contains a formal definition of a particular
document type. It declares each tag and how the tags fit together, so that
an application processing the document knows what structure to expect. In particular, the DTD specifies
- the tag names that can be used as
data element types,
- where names may be used, and
- the syntax of type definitions.
For example, to define a document type that can describe lists of
text items, the DTD would contain (among other declarations)
<!ELEMENT List (Item)+>
<!ELEMENT Item (#PCDATA)>
These statements declare a
list (type List) of items (type Item); the "+" indicates that the
list can have one or more items. The #PCDATA specifies that the items
contain only text. These declarations would allow a list of three text items
to be written as
<List><Item>FIBs</Item><Item>SAQs</Item><Item>TPQs</Item></List>
The actual appearance of the displayed list
depends on the particular CSS stylesheet applied to it.
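The meaning of the two declarations can be made concrete with a small hand-written check. This is only a sketch: the parser in Python's standard library does not validate against a DTD, so the function below (a hypothetical helper, conforms_to_list_rules) simply restates the two rules in code.

```python
import xml.etree.ElementTree as ET

def conforms_to_list_rules(xml_text):
    """Hand-written check mirroring the two declarations; not a DTD validator."""
    root = ET.fromstring(xml_text)
    if root.tag != "List":
        return False
    items = list(root)
    # (Item)+ : one or more children, all of type Item.
    if not items or any(child.tag != "Item" for child in items):
        return False
    # (#PCDATA) : each Item holds text only, no child elements.
    if any(len(item) > 0 for item in items):
        return False
    # No stray text directly inside List, between or around the Items.
    stray = [root.text] + [item.tail for item in items]
    return not any(s and s.strip() for s in stray)

print(conforms_to_list_rules(
    "<List><Item>FIBs</Item><Item>SAQs</Item><Item>TPQs</Item></List>"))
print(conforms_to_list_rules("<List></List>"))
```

A validating XML parser performs this kind of check automatically from the DTD itself, for any document type.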
- A DTD provides applications with advance
specification of how data is to be defined.
- Using a DTD guarantees that all data
files which are declared to have the same data type will be consistently
formatted, so that they can be used by browsers, search engines, database
management systems, and other applications.
- XML is the formal specification language
that allows the DTD to be parsed into declarations that can then be used
to identify every element in a data file and their interrelationships.
- XML does not have to be associated
with a DTD. In practice the design of a DTD can be difficult, so XML
was designed so it can be used either with or without a DTD.
- "DTDless operation" means markup
tags can be used without being defined formally; however, this sacrifices
the great advantage of DTDs, i.e. the ability to easily create additional
documents of the same type.
- Because XML files do not require
a DTD, DTD-less HTML files can be converted into XML by making a few
simple changes. (See
Niederst, p. 447.)
- There are thousands of SGML DTDs already
in existence in all kinds of areas (see the SGML Web pages
for examples). Many of them can be downloaded and used freely. Existing
SGML DTDs do need to be converted to XML for use with XML systems.
3. XSL (EXTENSIBLE STYLESHEET LANGUAGE), THE STYLESHEET LANGUAGE FOR XML:
- XSL,
developed by the W3C XSL Working Group, is a language for expressing stylesheets for XML files. It consists
of two parts:
- a language
for transforming XML documents, and
- an XML vocabulary for specifying formatting
semantics.
- An XSL stylesheet specifies the presentation
of a class of XML documents by describing how an instance of the class is
transformed into an XML document that uses the formatting vocabulary.
- For background information on style sheets,
see the Web style sheets resource page.
Discussions about XSL are carried out on the XSL-List at mulberrytech.com
and on the newsgroup comp.infosystems.www.authoring.stylesheets.
- The latest versions of the XSL specification
are:
- Extensible Stylesheet Language
(XSL) - W3C Working Draft
- XSL Transformations (XSLT)
Specification - W3C Recommendation
- XML Path
Language (XPath) - W3C Recommendation
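XPath, the last specification listed above, is the notation XSLT uses to address parts of an XML document. Python's standard xml.etree.ElementTree module supports only a limited subset of XPath, but that subset is enough for a small sketch. The document used here is the three-item list from the DTD section.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<List><Item>FIBs</Item><Item>SAQs</Item><Item>TPQs</Item></List>"
)

# "./Item" selects every Item child of the current (root) element.
all_items = [item.text for item in root.findall("./Item")]

# "./Item[2]" selects the second Item child; XPath positions start at 1.
second = root.find("./Item[2]").text

print(all_items)
print(second)
```

Running XSLT transformations themselves requires a separate XSLT processor, which this sketch does not attempt.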
4. XHTML IS HTML RECAST WITH XML:
- During 1999, HTML 4.0 was rewritten in XML and named
XHTML 1.0. It is intended to be used with other XML applications,
so that XHTML tags can be combined with tags from any other XML application,
e.g. SMIL, MathML, or SVG tags, as illustrated in the following figure.
- Since it is written in XML, XHTML itself is actually
an XML application.
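Combining XHTML tags with tags from another XML application relies on XML Namespaces (described in Section 1), which tie each tag to a URL identifying its vocabulary. The sketch below assumes Python's standard xml.etree.ElementTree module; the page content is invented for illustration, while the two namespace URLs are the ones the W3C assigned to XHTML and MathML.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical XHTML page that embeds a MathML fragment.
page = f"""<html xmlns="{XHTML}">
  <body>
    <p>The value is</p>
    <math xmlns="{MATHML}"><mi>x</mi></math>
  </body>
</html>"""

root = ET.fromstring(page)

# ElementTree reports each tag as "{namespace-url}localname";
# group the local names by the vocabulary they belong to.
vocabularies = {}
for element in root.iter():
    uri, _, local = element.tag[1:].partition("}")
    vocabularies.setdefault(uri, []).append(local)

for uri, names in vocabularies.items():
    print(uri, names)
```

An application can use this grouping to hand each part of the document to the code that understands that vocabulary.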
5. COMPARISON OF XML AND HTML:
- Like HTML, XML makes use of tags and attributes,
but while HTML specifies what each tag and attribute means (and often how
the text between them will look in a browser), XML uses the tags only to
delimit pieces of data, and leaves the interpretation of the data completely
to the application that reads it. In other words, if you see "<p>"
in an HTML file, you know it is a paragraph, but, in an XML file, depending
on the context, it may be a price, a parameter, a person, a p... (b.t.w.,
who says it has to be a word with a "p"?) .... For example, a <PHONENUM> could indicate that the
data that followed it was a phone number. This means that an XML file can
be processed purely as data by a program or it can be stored with similar
data on another computer or, like an HTML file, that it can be displayed.
For example, depending on how the application in the receiving computer wanted
to handle the phone number, it could be stored, displayed, or dialed.
- But the rules for XML files are much stricter
than for HTML. A forgotten tag, or an attribute without quotes, makes the
file unusable, while in HTML such practice is often explicitly allowed, or
at least tolerated. It is written in the official XML specification: applications
are not allowed to try to second-guess the creator of a broken XML
file; if the file is broken, an application has to stop right there and issue
an error.
- One of the shortcomings of
HTML is that, although it is great for transferring documents from a server
to an HTTP client, it is useless for sending them to other media like a
printer or a ticker. XML is specifically designed to facilitate this.
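The stricter rules described earlier in this section (a forgotten tag or an unquoted attribute makes the file unusable) can be observed with any conforming XML parser. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module; try_parse is a hypothetical helper and the phone data is invented.

```python
import xml.etree.ElementTree as ET

def try_parse(xml_text):
    """Return "ok" if the text is well-formed, else the parser's error."""
    try:
        ET.fromstring(xml_text)
        return "ok"
    except ET.ParseError as error:
        # The parser stops at the first error instead of guessing.
        return f"rejected: {error}"

well_formed = try_parse('<phone type="home">555-0100</phone>')
missing_end_tag = try_parse("<entry><phone>555-0100</entry>")
unquoted_attribute = try_parse("<phone type=home>555-0100</phone>")

print(well_formed)
print(missing_end_tag)
print(unquoted_attribute)
```

An HTML browser would typically display both broken inputs without complaint, which is the contrast the section draws.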
6. APPLICATIONS OF XML:
- It is expected that HTML and XML will be used
together in many Web applications.
- Early applications of XML include Microsoft's
Channel Definition Format (CDF), which describes a
channel, a portion of a Web site that has been
downloaded to your hard disk and is then updated periodically as information
changes. A specific CDF file contains data that specifies an initial Web
page and how frequently it is updated.
- Current support in Web browsers
is given at: http://www.xml.com/pub/2000/05/03/browserchart/index.html
7. REFERENCES ON XML, XHTML, AND XSL:
- References from WhatIs:
- XML has been developed by a working group under
the auspices of the World Wide Web Consortium, which offers you the most
knowledgeable place to learn more. Their introduction is called Extensible Markup Language (XML).
- Another good source is Robin Cover's The SGML/XML
Web Page.
- Another early user of XML is ChartWare, maker of a medical
charting application, which uses XML as a way to describe medical charts
so that they can be shared by doctors.
- IBM provides an XML Developer Web
site that includes free online XML courses, articles, and frequently-asked
questions.
- Here is IBM's Introduction to XML, including an online tutorial.
- Microsoft provides an example of Building
an Interactive Frequent-Flyer Web Site Using XML.
- W3C references of topics related to XML:
- XML Schema
- DOM
- CSS
- XSL
- XHTML
- A series
of articles/tutorials on the integration of XML tools in Web authoring
tools (beginning with Netscape 6's Mozilla) is given at xml.com.